API Reference

Perform Bi-Persistence clustering. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.

Parameters:

min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.
min_cluster_size (int, default=10) – The minimum number of samples required to form a cluster.
distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).
metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.
lens (str or callable or array of shape (n_samples,)) – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, must return a float32 array of lens values. If an array, must be a float32 array of lens values.
lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.
memory (str or None, default=None) – A path to store the cache or None to disable caching.
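For orientation, the mutual reachability distance used by HDBSCAN* can be sketched in plain Python. This is not BPSCAN's implementation: the helper names are hypothetical, and counting a point as its own first neighbor when computing the core distance is an assumption of this sketch.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor.

    Assumption: the point itself counts as its first neighbor.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, min_samples):
    """Full matrix of max(core(a), core(b), d(a, b)), with a zero diagonal."""
    cores = core_distances(points, min_samples)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
mr = mutual_reachability(points, min_samples=2)
```

The isolated point (5, 5) has a large core distance, so its mutual reachability distance to every other point is at least that large. This is how a larger min_samples pushes sparse points away from dense regions.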

distances_

The mutual reachability distance matrix in condensed form.

Type: array of shape (n_samples, n_samples)

lens_values_

The computed lens values.

Type: array of shape (n_samples,)

lens_grades_

The lens grade for each point.

Type: array of shape (n_samples,)

minimal_presentation_

The minimal presentation of the bi-filtration.

Type: MinimalPresentation

merges_

The detected merges.

Type: MergeList

simplified_merges_

The simplified merges.

Type: SimplifiedHierarchy

linkage_hierarchy_

The linkage hierarchy graph. This property is computed on demand and not cached.

Type: LinkageHierarchy

membership_

A binary membership matrix indicating which points belong to which groups. Groups can overlap and relate to the child and parent sides of the simplified merges.

Type: array of shape (n_samples, n_groups)

labels_

The computed cluster labels. The labels identify points with the same membership combinations.

Type: array of shape (n_samples,)
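The relation between the membership matrix and the labels can be illustrated with a small sketch: points whose membership rows are identical receive the same label. The function name is hypothetical, and the label numbering (order of first appearance) is an assumption; the numbering BPSCAN assigns, including how points that belong to no group are labelled, may differ.

```python
def labels_from_membership(membership):
    """Give points with identical membership rows the same integer label.

    Labels are numbered in order of first appearance (an assumption).
    """
    seen = {}
    labels = []
    for row in membership:
        key = tuple(row)
        if key not in seen:
            seen[key] = len(seen)
        labels.append(seen[key])
    return labels


# Five points, three (possibly overlapping) groups.
membership = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]
labels = labels_from_membership(membership)
```

Here the first and third points share the combination (1, 0, 0), and the second and fifth share (1, 1, 0), so each pair ends up with a common label.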

timers_

The time spent on each step of the clustering process.

Type: dict

fit(X: ndarray[float64], y: ndarray = None)

Performs BPSCAN clustering on the given data.

Parameters:

X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric='precomputed'.
y (None) – Ignored.

Returns:

self – Returns the instance itself.

Return type:

object
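When a condensed distance array is passed with metric='precomputed', each entry corresponds to one unordered pair of points. The sketch below shows the pair-to-index mapping under the assumption that the array follows SciPy's pdist ordering (row-major over the upper triangle); the source does not state the ordering, and condensed_index is a hypothetical helper.

```python
def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed distance array.

    Assumes the upper triangle is stored row by row, as scipy's pdist does.
    """
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = 4
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
indices = [condensed_index(i, j, n) for i, j in pairs]
```

For n points the condensed array holds n * (n - 1) // 2 entries, which matches the second shape listed for X.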

Perform Bi-Persistence clustering on the given data. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.

Parameters:

X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric='precomputed'.
min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.
min_cluster_size (int, default=10) – The minimum number of samples required to form a cluster.
distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).
metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.
lens (str or callable or array of shape (n_samples,)) – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, must return a float32 array of lens values. If an array, must be a float32 array of lens values.
lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.
memory (str or None, default=None) – A path to store the cache or None to disable caching.

Returns:

distances (array of shape (n_samples, n_samples)) – The mutual reachability distance matrix in condensed form.
lens_values (array of shape (n_samples,)) – The computed lens values.
lens_grades (array of shape (n_samples,)) – The lens grade for each point.
col_to_edge (array of shape (n_edges,)) – Mapping from column index to index in the condensed distance matrix.
row_to_point (array of shape (n_samples,)) – Mapping from minimal presentation row index to data point index.
minimal_presentation (dict) – The minimal presentation of the bi-filtration. Contains the following keys: ‘lens_grade’, ‘distance_grade’, ‘parent’, ‘child’.
merges (dict) – The merges extracted from the minimal presentation. Contains the following keys: ‘start_column’, ‘end_column’, ‘lens_grade’, ‘distance_grade’, ‘parent’, ‘child’, ‘parent_side’, ‘child_side’. Merges are ordered by increasing end column.
simplified_merges (dict) – The merges that remain after combining similar merges. Contains the following keys: ‘parent’, ‘child’, ‘parent_side’, ‘child_side’.
membership (array of shape (n_samples, n_groups)) – A binary membership matrix indicating which points belong to which groups. Groups can overlap and relate to the child and parent sides of the simplified merges.
labels (array of shape (n_samples,)) – The computed cluster labels.
timers (dict) – The time spent on each step of the clustering process.

Lenses

This module implements several point-cloud measures that can be used as lenses with BPSCAN. Most of these functions are based on the documentation of the Python implementation of Mapper.

biperscan.lenses.negative_density(X: ndarray, distance_matrix: ndarray, *, sigma: float = 0.3, **kwargs) → ndarray

Computes the negative point-cloud density.

Parameters:

X (2D NumPy array) – The original data matrix. Not used in this function.
distance_matrix (1D NumPy array) – The condensed distance matrix.
sigma (float, optional (default = 0.3)) – Standard deviation of the Gaussian smoothing kernel.

Returns:

An N by 1 NumPy array containing the negative vertex density values, normalized to lie between 0 and 1.
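One plausible reading of this lens is a Gaussian kernel density estimate that is negated and then min-max scaled. The sketch below follows that reading in plain Python; the exact kernel and normalization the library uses are not stated here, so the formula, the function name, and the explicit n argument are all assumptions.

```python
import math


def negative_density_sketch(distances, n, sigma=0.3):
    """Negated Gaussian-kernel density from a condensed distance list.

    `distances` is assumed to be in scipy pdist order for `n` points.
    """
    density = [1.0] * n  # every point contributes exp(0) = 1 to itself
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            weight = math.exp(-distances[k] ** 2 / (2.0 * sigma ** 2))
            density[i] += weight
            density[j] += weight
            k += 1
    negated = [-d for d in density]
    lo, hi = min(negated), max(negated)
    if hi == lo:
        return [0.0] * n  # constant input: nothing to scale
    return [(v - lo) / (hi - lo) for v in negated]


# Three points on a line at 0.0, 0.1 and 1.0.
values = negative_density_sketch([0.1, 1.0, 0.9], n=3)
```

Under these assumptions the point in the densest region gets the value 0 and the most isolated point gets 1, so low lens values correspond to dense areas.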

biperscan.lenses.negative_distance_to_mean(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict = None, **kwargs) → ndarray

Computes the negative distance to the mean centroid.

Parameters:

X (2D NumPy array) – The original data matrix.
distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.

Returns:

An N by 1 NumPy array containing the negative distance to centroid values, scaled to lie between 0 and 1.
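The computation described above can be sketched as follows for the Euclidean metric. The function name is hypothetical, min-max scaling is an assumption about how the values are brought into the range 0 to 1, and the metric and metric_kws arguments of the real function are ignored.

```python
import math


def negative_distance_to_mean_sketch(X):
    """Negated Euclidean distance to the mean centroid, min-max scaled."""
    n = len(X)
    centroid = tuple(sum(column) / n for column in zip(*X))
    negated = [-math.dist(row, centroid) for row in X]
    lo, hi = min(negated), max(negated)
    if hi == lo:
        return [0.0] * n  # all points equally far from the centroid
    return [(v - lo) / (hi - lo) for v in negated]


X = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
values = negative_distance_to_mean_sketch(X)
```

Because the distance is negated before scaling, the point closest to the centroid receives the value 1 and the farthest point receives 0.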

biperscan.lenses.negative_distance_to_median(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict = None, **kwargs) → ndarray

Computes the negative distance to the median centroid.

Parameters:

X (2D NumPy array) – The original data matrix.
distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.

Returns:

An N by 1 NumPy array containing the negative distance to centroid values, scaled to lie between 0 and 1.

biperscan.lenses.negative_eccentricity(X: ndarray, distance_matrix: ndarray, *, power: float = inf, **kwargs) → ndarray

Computes the negative point-cloud eccentricity.

Parameters:

X (2D NumPy array) – The original data matrix. Not used in this function.
distance_matrix (1D NumPy array) – The condensed distance matrix.
power (float, optional (default = np.inf)) – The power to use, may also be infinite.

Returns:

An N by 1 NumPy array containing the negative vertex eccentricity values, scaled to lie between 0 and 1.
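As a rough illustration, eccentricity in the Mapper sense is a power mean of a point's distances to all other points, which becomes the maximum distance when the power is infinite. The sketch below assumes that definition, divides by the total number of points, and applies min-max scaling; none of these details are confirmed by this reference, and the function name and n argument are hypothetical.

```python
import math


def negative_eccentricity_sketch(distances, n, power=math.inf):
    """Negated eccentricity from a condensed distance list (pdist order)."""
    # Collect, for each point, its distances to all other points.
    per_point = [[] for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            per_point[i].append(distances[k])
            per_point[j].append(distances[k])
            k += 1
    if math.isinf(power):
        ecc = [max(d) for d in per_point]
    else:
        ecc = [(sum(x ** power for x in d) / n) ** (1.0 / power)
               for d in per_point]
    negated = [-e for e in ecc]
    lo, hi = min(negated), max(negated)
    if hi == lo:
        return [0.0] * n
    return [(v - lo) / (hi - lo) for v in negated]


# Three points on a line at 0, 1 and 3.
condensed = [1.0, 3.0, 2.0]
by_max = negative_eccentricity_sketch(condensed, n=3)
by_mean = negative_eccentricity_sketch(condensed, n=3, power=1)
```

With an infinite power only the farthest neighbor matters, so both end points tie; a finite power distinguishes them by their average distance to the rest.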

biperscan.lenses.normalize(values: ndarray[float64]) → ndarray[float32]: Scales values to lie between 0 and 1.
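A minimal sketch of such a scaling, assuming it is ordinary min-max normalization. The function name is hypothetical, the handling of constant input is a choice made for this example, and the conversion to float32 that the real function performs is left out.

```python
def normalize_sketch(values):
    """Min-max scale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input has no spread; return zeros rather than divide by 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = normalize_sketch([2.0, 4.0, 6.0, 10.0])
```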

Plot classes

class biperscan.plots.LinkageHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], linkage_hierarchy: dict[str, ndarray])

A class for plotting and transforming linkage hierarchies.

as_networkx(): Returns the hierarchy as a networkx graph.

as_pandas(): Returns the hierarchy as a pandas DataFrame.

plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)

Plots the hierarchy as a network.

Parameters:

layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.
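When layout is given as a dictionary, it maps each node to a position. The sketch below builds such a mapping by placing nodes on a circle. The node identifiers (plain integers) and the use of (x, y) tuples as positions are assumptions, and circular_layout is a hypothetical helper, not part of the library.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Map each node to an (x, y) position evenly spaced on a circle."""
    nodes = list(nodes)
    step = 2.0 * math.pi / max(len(nodes), 1)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(nodes)
    }


layout = circular_layout([0, 1, 2, 3])
```

A dictionary like this could then be passed as the layout argument in place of one of the named layouts, provided its keys match the node identifiers of the plotted graph.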

plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, labels: bool = True, offset_x: float = 0.02, offset_y: float = 0.0, node_kws: dict | None = None, line_kws: dict | None = None, text_kws: dict | None = None)

Plots the distance and lens grades (or values) of the hierarchy.

Parameters:

view_type (str) – The type of view to plot. Either “grade” or “value”.
transposed (bool) – Whether to transpose the plot.
labels (bool) – Whether to plot the labels.
offset_x (float) – The x offset for the labels.
offset_y (float) – The y offset for the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the lines.
text_kws (dict, optional) – Additional keyword arguments for plotting the labels.

class biperscan.plots.MinimalPresentation(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray])

A class for plotting and transforming minimal presentations.

as_networkx(): Returns the minimal presentation as a networkx graph.

as_pandas(): Returns the minimal presentation as a pandas DataFrame.

plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)

Plots the minimal presentation as a network.

Parameters:

layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.

plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, line_kws: dict | None = None)

Plots the distance and lens grades (or values) of the minimal presentation.

Parameters:

view_type (str) – The type of view to plot. Either “grade” or “value”.
transposed (bool) – Whether to transpose the plot.
line_kws (dict, optional) – Additional keyword arguments for plotting the lines.