API Reference
- class biperscan.BPSCAN(min_samples: int | None = None, min_cluster_size: int = 10, distance_fraction: float = 1.0, max_label_depth: int | None = None, metric: str | Callable = 'euclidean', metric_kws: dict | None = None, lens: str | Callable | ndarray[float32] = 'negative_distance_to_median', lens_kws: dict | None = None, memory: str | None = None)
Perform Bi-Persistence clustering. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.
- Parameters:
min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.
min_cluster_size (int, default=10) – The minimum number of samples to be a cluster.
distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.
max_label_depth (int, default=None) – The maximum depth to extract labels from the simplified hierarchy.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).
metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.
lens (str or callable or array of shape (n_samples,), default='negative_distance_to_median') – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, it must return a float32 array of lens values. If an array, it must be a float32 array of lens values.
lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.
memory (str or None, default=None) – A path to store the cache or None to disable caching.
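A minimal sketch of a custom lens callable follows. It assumes the callable receives the same arguments as the built-in functions in biperscan.lenses (the data matrix, the condensed distance matrix, and extra keyword arguments). The function name negative_first_coordinate and the min-max scaling are illustrative choices, not part of the library.

```python
import numpy as np


def negative_first_coordinate(X, distance_matrix=None, **kwargs):
    """Hypothetical lens: the negated first feature, min-max scaled to [0, 1]."""
    values = -np.asarray(X, dtype=np.float64)[:, 0]
    span = values.max() - values.min()
    if span == 0:
        # All points share one value, so return a constant lens.
        return np.zeros(values.shape[0], dtype=np.float32)
    # The lens parameter is documented to expect float32 values.
    return ((values - values.min()) / span).astype(np.float32)


X = np.array([[0.0, 1.0], [2.0, 0.0], [4.0, 3.0]])
lens_values = negative_first_coordinate(X, None)
```

Either the callable or the resulting float32 array could then be passed as the lens argument.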
- distances_
The mutual reachability distance matrix in condensed form.
- Type:
array of shape (n_samples, n_samples)
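As an illustration of what a condensed mutual reachability matrix contains, the sketch below uses the HDBSCAN*-style definition max(core(a), core(b), d(a, b)). It is not the library's implementation. It assumes Euclidean distances and a core distance that counts the point itself as its first neighbour. The helper names are hypothetical. The condensed ordering is the upper triangle read row by row, as in scipy.spatial.distance.squareform.

```python
import numpy as np


def mutual_reachability_condensed(X, min_samples):
    """Sketch: condensed mutual reachability distances for Euclidean data."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    # Full pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Core distance: distance to the min_samples-th nearest neighbour,
    # counting the point itself as the first neighbour (an assumption).
    core = np.sort(dist, axis=1)[:, min_samples - 1]
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    # Condensed form: upper triangle without the diagonal, row by row.
    rows, cols = np.triu_indices(n, k=1)
    return mreach[rows, cols]


def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed matrix of n points."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


X = np.array([[0.0], [1.0], [3.0], [7.0]])
condensed = mutual_reachability_condensed(X, min_samples=3)
```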
- lens_values_
The computed lens values.
- Type:
array of shape (n_samples,)
- lens_grades_
The lens grade for each point.
- Type:
array of shape (n_samples,)
- minimal_presentation_
The minimal presentation of the bi-filtration.
- Type:
biperscan.plots.MinimalPresentation
- merge_hierarchy_
The merge hierarchy graph.
- Type:
biperscan.plots.MergeHierarchy
- simplified_hierarchy_
The simplified hierarchy graph.
- Type:
biperscan.plots.SimplifiedHierarchy
- linkage_hierarchy_
The linkage hierarchy graph. This property is computed on demand and not cached.
- Type:
biperscan.plots.LinkageHierarchy
- labels_
The computed cluster labels.
- Type:
array of shape (n_samples,)
- membership_
A binary matrix indicating which points are members of which clusters. Columns are ordered by the merge hierarchy, so that the first non-zero column can be used to extract a labelling. Cluster membership overlaps.
- Type:
array of shape (n_samples, n_clusters)
- first_nonzero_membership()
Return, for each point, the index of the first non-zero column in membership_.
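The idea of reading a labelling off the first non-zero membership column can be sketched with NumPy alone. This is not the library's implementation. The helper name is hypothetical, and using -1 for points that belong to no cluster is an assumption borrowed from the usual noise-label convention.

```python
import numpy as np


def first_nonzero_column(membership, fill=-1):
    """Sketch: index of the first non-zero column per row, or `fill` if none."""
    nonzero = np.asarray(membership) != 0
    # argmax returns the first True per row (and 0 for all-False rows).
    first = nonzero.argmax(axis=1)
    has_member = nonzero.any(axis=1)
    return np.where(has_member, first, fill)


# Toy binary membership matrix: 4 points, 3 overlapping clusters.
membership = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
])
labels = first_nonzero_column(membership)
```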
- fit(X: ndarray[float64], y: ndarray | None = None)
Performs BPSCAN clustering on the given data.
- Parameters:
X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric=’precomputed’.
y (None) – Ignored.
- Returns:
self – Returns the instance itself.
- Return type:
BPSCAN
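A short usage sketch, using only the constructor arguments, fit() and labels_ documented above. The toy data, the helper names and the parameter values are illustrative assumptions. biperscan is imported lazily so that the data helper runs without the package installed.

```python
import numpy as np


def make_two_blobs(n_per_blob=100, seed=0):
    """Hypothetical helper: two Gaussian blobs as a float64 feature array."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(n_per_blob, 2))
    b = rng.normal(loc=(3.0, 3.0), scale=0.3, size=(n_per_blob, 2))
    return np.vstack([a, b]).astype(np.float64)


def cluster(X, **params):
    """Fit BPSCAN on X and return the cluster labels."""
    # Imported here so the rest of the sketch works without biperscan.
    from biperscan import BPSCAN

    model = BPSCAN(**params).fit(X)
    return model.labels_


X = make_two_blobs()
# With biperscan installed (parameter values chosen for illustration only):
# labels = cluster(X, min_samples=5, min_cluster_size=20)
```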
- biperscan.bpscan(X, *, min_samples: int | None = None, min_cluster_size: int = 10, distance_fraction: float = 1.0, max_label_depth: int | None = None, metric: str | Callable = 'euclidean', metric_kws: dict | None = None, lens: str | Callable | ndarray[float32] = 'negative_distance_to_median', lens_kws: dict | None = None, memory: str | None = None)
Perform Bi-Persistence clustering on the given data. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.
- Parameters:
X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric=’precomputed’.
min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.
min_cluster_size (int, default=10) – The minimum number of samples to be a cluster.
distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.
max_label_depth (int, default=None) – The maximum depth to extract labels from the simplified hierarchy.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).
metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.
lens (str or callable or array of shape (n_samples,), default='negative_distance_to_median') – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, it must return a float32 array of lens values. If an array, it must be a float32 array of lens values.
lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.
memory (str or None, default=None) – A path to store the cache or None to disable caching.
- Returns:
distances (array of shape (n_samples, n_samples)) – The mutual reachability distance matrix in condensed form.
lens_values (array of shape (n_samples,)) – The computed lens values.
lens_grades (array of shape (n_samples,)) – The lens grade for each point.
col_to_edge (array of shape (n_edges,)) – Mapping from column index to index in the condensed distance matrix.
row_to_point (array of shape (n_samples,)) – Mapping from minimal presentation row index to data point index.
minimal_presentation (dict) – The minimal presentation of the bi-filtration. Contains the following keys: ‘lens_grade’, ‘distance_grade’, ‘parent’, ‘child’.
merges (dict) – The merges extracted from the minimal presentation. Contains the following keys: ‘start_column’, ‘end_column’, ‘root_one’, ‘root_two’, ‘side_one’, ‘side_two’, ‘lens_grade’, ‘distance_grade’. Merges are ordered by increasing end column. Rows that share the same ‘start_column’ and ‘end_column’ pair are merges that originate from the same edges. To find all points included in a given row, take the union of ‘side_one’ and ‘side_two’ over all rows with the same ‘start_column’ and ‘end_column’, up to and including that row.
merge_hierarchy (networkx.DiGraph) – The merge hierarchy graph.
simplified_hierarchy (networkx.DiGraph) – The simplified hierarchy graph.
labels (array of shape (n_samples,)) – The computed cluster labels.
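The union rule described for the merges dictionary can be sketched as follows. The toy dictionary is hypothetical, and the sketch assumes that each ‘side_one’ and ‘side_two’ entry is a sequence of data point indices, which the reference does not state explicitly.

```python
def points_per_merge(merges):
    """Sketch: the set of points contained in every merge row."""
    seen = {}  # (start_column, end_column) -> points accumulated so far
    result = []
    for row in range(len(merges["start_column"])):
        key = (merges["start_column"][row], merges["end_column"][row])
        points = seen.setdefault(key, set())
        # Accumulate both sides over all rows sharing the same column pair.
        points.update(merges["side_one"][row])
        points.update(merges["side_two"][row])
        result.append(set(points))  # copy, so later rows do not alter it
    return result


# Hypothetical merges: rows 0 and 1 originate from the same edges.
merges = {
    "start_column": [0, 0, 2],
    "end_column": [3, 3, 5],
    "side_one": [[0, 1], [4], [6, 7]],
    "side_two": [[2, 3], [5], [8]],
}
contained = points_per_merge(merges)
```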
Lenses
This module implements several point-cloud measures that can be used as lenses with BPSCAN. Most of these functions are based on the documentation of the Python implementation of Mapper.
- biperscan.lenses.negative_density(X: ndarray, distance_matrix: ndarray, *, sigma: float = 0.3, **kwargs) ndarray
Computes the negative point-cloud density.
- Parameters:
X (2D NumPy array) – The original data matrix. Not used in this function.
distance_matrix (1D NumPy array) – The condensed distance matrix.
sigma (float, optional (default = 0.3)) – Standard deviation of the Gaussian smoothing kernel.
- Returns:
N by 1 NumPy array containing the negative vertex density values, normalized to lie between 0 and 1.
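A rough sketch of a Gaussian-kernel density lens computed from a condensed distance matrix. The kernel form, the min-max normalization and the function name are assumptions made for illustration. The library's exact formula may differ. X is used here only to obtain the number of points.

```python
import numpy as np


def negative_density_sketch(X, distance_matrix, *, sigma=0.3):
    """Sketch: negated Gaussian kernel density, min-max scaled to [0, 1]."""
    n = len(X)
    d = np.asarray(distance_matrix, dtype=np.float64)
    # Pair (rows[k], cols[k]) corresponds to condensed entry k.
    rows, cols = np.triu_indices(n, k=1)
    kernel = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    density = np.zeros(n)
    # Each pairwise kernel value contributes to both of its end points.
    np.add.at(density, rows, kernel)
    np.add.at(density, cols, kernel)
    values = -density
    span = values.max() - values.min()
    if span == 0:
        return np.zeros(n, dtype=np.float32)
    return ((values - values.min()) / span).astype(np.float32)


# Points at 0, 0.1, 0.2 and 5 on a line, with condensed distances.
X = np.array([[0.0], [0.1], [0.2], [5.0]])
distances = np.array([0.1, 0.2, 5.0, 0.1, 4.9, 4.8])
lens_values = negative_density_sketch(X, distances, sigma=0.3)
```

With this scaling the densest point receives the lowest lens value and the most isolated point the highest.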
- biperscan.lenses.negative_distance_to_mean(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict | None = None, **kwargs) ndarray
Computes the negative distance to the mean centroid.
- Parameters:
X (2D NumPy array) – The original data matrix.
distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.
- Returns:
N by 1 NumPy array containing the negative distance-to-centroid values, scaled to lie between 0 and 1.
- biperscan.lenses.negative_distance_to_median(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict | None = None, **kwargs) ndarray
Computes the negative distance to the median centroid.
- Parameters:
X (2D NumPy array) – The original data matrix.
distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.
- Returns:
N by 1 NumPy array containing the negative distance-to-centroid values, scaled to lie between 0 and 1.
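The centroid-based lenses can be sketched in a few lines. This sketch assumes Euclidean distance, a coordinate-wise median (or mean) as the centroid, and min-max scaling. The function name and the centroid argument are hypothetical and do not mirror the library's signature.

```python
import numpy as np


def negative_distance_to_centroid(X, centroid="median"):
    """Sketch: negated Euclidean distance to a centroid, scaled to [0, 1]."""
    X = np.asarray(X, dtype=np.float64)
    # Coordinate-wise median or mean (an assumption about the centroid).
    center = np.median(X, axis=0) if centroid == "median" else X.mean(axis=0)
    values = -np.linalg.norm(X - center, axis=1)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros(X.shape[0], dtype=np.float32)
    return ((values - values.min()) / span).astype(np.float32)


X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
median_lens = negative_distance_to_centroid(X, "median")
mean_lens = negative_distance_to_centroid(X, "mean")
```

In this toy data the outlier at x = 10 pulls the mean centroid towards it, while the median centroid stays among the three nearby points.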
- biperscan.lenses.negative_eccentricity(X: ndarray, distance_matrix: ndarray, *, power: float = inf, **kwargs) ndarray
Computes the negative point-cloud eccentricity.
- Parameters:
X (2D NumPy array) – The original data matrix. Not used in this function.
distance_matrix (1D NumPy array) – The condensed distance matrix.
power (float, optional (default = np.inf)) – The power to use, may also be infinite.
- Returns:
N by 1 NumPy array containing the negative vertex eccentricity values, scaled to lie between 0 and 1.
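A sketch of an eccentricity lens computed from a condensed distance matrix. It assumes the common Mapper-style definition: the p-th root of the mean p-th power of a point's distances, or the maximum distance when the power is infinite. The function name and the min-max scaling are also assumptions, and X is used only to obtain the number of points.

```python
import numpy as np


def negative_eccentricity_sketch(X, distance_matrix, *, power=np.inf):
    """Sketch: negated eccentricity, min-max scaled to [0, 1]."""
    n = len(X)
    d = np.asarray(distance_matrix, dtype=np.float64)
    # Expand the condensed distances into a full symmetric matrix.
    rows, cols = np.triu_indices(n, k=1)
    full = np.zeros((n, n))
    full[rows, cols] = d
    full[cols, rows] = d
    if np.isinf(power):
        ecc = full.max(axis=1)
    else:
        ecc = (np.sum(full ** power, axis=1) / n) ** (1.0 / power)
    values = -ecc
    span = values.max() - values.min()
    if span == 0:
        return np.zeros(n, dtype=np.float32)
    return ((values - values.min()) / span).astype(np.float32)


# Points at 0, 1, 3 and 7 on a line, with condensed distances.
X = np.array([[0.0], [1.0], [3.0], [7.0]])
distances = np.array([1.0, 3.0, 7.0, 2.0, 6.0, 4.0])
lens_inf = negative_eccentricity_sketch(X, distances)
lens_one = negative_eccentricity_sketch(X, distances, power=1)
```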
Plot classes
- class biperscan.plots.LinkageHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], linkage_hierarchy: dict[str, ndarray])
A class for plotting and transforming linkage hierarchies.
- as_networkx()
Returns the hierarchy as a networkx graph.
- as_pandas()
Returns the hierarchy as a pandas DataFrame.
- plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)
Plots the hierarchy as a network.
- Parameters:
layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.
- plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, labels: bool = True, offset_x: float = 0.02, offset_y: float = 0.0, node_kws: dict | None = None, line_kws: dict | None = None, text_kws: dict | None = None)
Plots the distance and lens grade (or values) of the hierarchy.
- Parameters:
view_type (str) – The type of view to plot. Either “grade” or “value”.
transposed (bool) – Whether to transpose the plot.
labels (bool) – Whether to plot the labels.
offset_x (float) – The x offset for the labels.
offset_y (float) – The y offset for the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the lines.
text_kws (dict, optional) – Additional keyword arguments for plotting the labels.
- class biperscan.plots.MergeHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray], merge_hierarchy: DiGraph)
A class for plotting and transforming merge hierarchies.
- as_networkx()
Returns the merge hierarchy as a networkx graph.
- as_pandas()
Transform the merge hierarchy into pandas DataFrames.
- Returns:
nodes (pd.DataFrame) – One row for each merge in the merge hierarchy, can be used to extract data points contained in each aggregated node.
edges (pd.DataFrame) – Edges in the merge hierarchy.
- plot_merges(xs: ndarray, ys: ndarray, *, s: int = 2, title_y: float = 0.9, arrowsize: int = 10, linewidth: float = 1)
Plots the points in each merge in the merge hierarchy.
- plot_network(*, layout: str | dict = 'dot', nodes: bool = False, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)
Plots the hierarchy as a network.
- Parameters:
layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.
- plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, distance_offset: float = 1.05, line_kws: dict | None = None, font_kws: dict | None = None)
Plots the distance and lens grade (or values) of the merge hierarchy.
- Parameters:
view_type (str) – The type of view to plot. Either “grade” or “value”.
transposed (bool) – Whether to transpose the plot.
distance_offset (float) – A factor that controls the upper distance limit.
line_kws (dict, optional) – Additional keyword arguments for plotting the lines.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
- class biperscan.plots.MinimalPresentation(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray])
A class for plotting and transforming minimal presentations.
- as_networkx()
Returns the minimal presentation as a networkx graph.
- as_pandas()
Returns the minimal presentation as a pandas DataFrame.
- plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)
Plots the minimal presentation as a network.
- Parameters:
layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.
- class biperscan.plots.SimplifiedHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray], merge_hierarchy: DiGraph, simplified_hierarchy: DiGraph)
A class for plotting and transforming simplified merge hierarchies.
- as_networkx()
Transforms the simplified hierarchy into a networkx DiGraph.
- Returns:
simplified_hierarchy (nx.DiGraph) – The simplified hierarchy.
merge_hierarchy (nx.DiGraph) – The merge hierarchy. Needed to map nodes in the simplified hierarchy to data points contained in the merge hierarchy.
- as_pandas()
Transform the simplified hierarchy into pandas DataFrames.
- Returns:
nodes (pd.DataFrame) – One node for each connected component in the merge hierarchy. Lists which merges are part of the component.
edges (pd.DataFrame) – Edges in the simplified hierarchy.
merges (pd.DataFrame) – One row for each merge in the merge hierarchy, can be used to extract data points contained in each aggregated node.
- plot_merges(xs: ndarray, ys: ndarray, *, s: int = 2, title_y: float = 0.9)
Plots the points in each merge in the merge hierarchy.
- plot_network(*, layout: str | dict = 'dot', nodes: bool = False, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks=True)
Plots the hierarchy as a network.
- Parameters:
layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.
nodes (bool) – Whether to plot the nodes.
edges (bool) – Whether to plot the edges.
labels (bool) – Whether to plot the labels.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the edges.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.
hide_ticks (bool) – Whether to hide ticks.
- plot_persistence_areas(view_type: str = 'grade', transposed: bool = False, distance_offset: float = 1.05, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None)
Plots the distance and lens grade (or values) of the simplified merge hierarchy.
- Parameters:
view_type (str) – The type of view to plot. Either “grade” or “value”.
transposed (bool) – Whether to transpose the plot.
distance_offset (float) – A factor that controls the upper distance limit.
node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.
line_kws (dict, optional) – Additional keyword arguments for plotting the lines.
font_kws (dict, optional) – Additional keyword arguments for plotting the labels.