API Reference

class biperscan.BPSCAN(min_samples: int | None = None, min_cluster_size: int = 10, distance_fraction: float = 1.0, max_label_depth: int | None = None, metric: str | Callable = 'euclidean', metric_kws: dict | None = None, lens: str | Callable | ndarray[float32] = 'negative_distance_to_median', lens_kws: dict | None = None, memory: str | None = None)

Perform Bi-Persistence clustering. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.

Parameters:
  • min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.

  • min_cluster_size (int, default=10) – The minimum number of samples to be a cluster.

  • distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.

  • max_label_depth (int, default=None) – The maximum depth to extract labels from the simplified hierarchy.

  • metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).

  • metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.

  • lens (str or callable or array of shape (n_samples,), default='negative_distance_to_median') – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, it must return a float32 array of lens values. If an array, it must be a float32 array of lens values.

  • lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.

  • memory (str or None, default=None) – A path to store the cache or None to disable caching.

distances_

The mutual reachability distance matrix in condensed form.

Type:

array of shape (n_samples, n_samples)

lens_values_

The computed lens values.

Type:

array of shape (n_samples,)

lens_grades_

The lens grade for each point.

Type:

array of shape (n_samples,)

minimal_presentation_

The minimal presentation of the bi-filtration.

Type:

MinimalPresentation

merge_hierarchy_

The merge hierarchy graph.

Type:

MergeHierarchy

simplified_hierarchy_

The simplified hierarchy graph.

Type:

SimplifiedHierarchy

linkage_hierarchy_

The linkage hierarchy graph. This property is computed on demand and not cached.

Type:

LinkageHierarchy

labels_

The computed cluster labels.

Type:

array of shape (n_samples,)

membership_

A binary matrix indicating which points are members of which clusters. Columns are ordered by the merge hierarchy, so that the first non-zero column can be used to extract a labelling. Cluster memberships can overlap.

Type:

array of shape (n_samples, n_clusters)

first_nonzero_membership()

Returns the index of the first non-zero membership column.
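The idea of reading a labelling off the first non-zero membership column can be sketched with plain NumPy. This is an illustration only, not the library's implementation: the helper name `first_nonzero_column` and the choice of -1 for points that belong to no cluster are assumptions.

```python
import numpy as np


def first_nonzero_column(membership: np.ndarray, noise_label: int = -1) -> np.ndarray:
    """Return, per row, the index of the first non-zero column.

    Rows without any non-zero entry get ``noise_label`` (an assumed convention).
    """
    nonzero = membership != 0
    # argmax on a boolean array returns the position of the first True per row.
    first = nonzero.argmax(axis=1)
    has_member = nonzero.any(axis=1)
    return np.where(has_member, first, noise_label)


# Three points, three overlapping clusters; the last point is in no cluster.
membership = np.array(
    [
        [0, 1, 1],
        [1, 0, 1],
        [0, 0, 0],
    ]
)
labels = first_nonzero_column(membership)
```

Because memberships overlap, a point can appear in several columns; taking the first one picks the cluster that comes earliest in the column ordering.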

fit(X: ndarray[float64], y: ndarray | None = None)

Performs BPSCAN clustering on the given data.

Parameters:
  • X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric='precomputed'.

  • y (None) – Ignored.

Returns:

self – Returns the instance itself.

Return type:

object
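For readers unfamiliar with condensed distance arrays, the sketch below shows the layout under the assumption that the usual SciPy `pdist` ordering is meant: the upper triangle of the square matrix, read row by row. The helper names `condensed_distances` and `condensed_index` are hypothetical and not part of biperscan.

```python
import numpy as np


def condensed_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distances for all pairs i < j, in row-major upper-triangle order."""
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of the pair (i, j) in a condensed array for n points."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
d = condensed_distances(X)  # n * (n - 1) // 2 = 6 entries for n = 4
```

The same index arithmetic is what makes a mapping from an edge to a position in the condensed array possible without storing the full square matrix.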

labels_at_depth(depth: int | None = None)

Recomputes labels from the simplified hierarchy.

Parameters:

depth (int or None, default=None) – The maximum depth to extract labels from the simplified hierarchy.

Returns:

labels – The computed cluster labels.

Return type:

array of shape (n_samples,)
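As a usage sketch of the class documented above, the snippet below builds two synthetic Gaussian blobs and shows how the estimator could be fitted on them. The helper names `make_two_blobs` and `cluster` and all parameter values are illustrative choices, not recommendations from the library. The `biperscan` import is deferred into `cluster` so that the data helper runs on its own.

```python
import numpy as np


def make_two_blobs(n_per_blob: int = 100, seed: int = 0) -> np.ndarray:
    """Two well-separated 2D Gaussian blobs as a float64 feature array."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(n_per_blob, 2))
    b = rng.normal(loc=(3.0, 3.0), scale=0.3, size=(n_per_blob, 2))
    return np.vstack([a, b]).astype(np.float64)


def cluster(X: np.ndarray, depth=None):
    """Fit BPSCAN and return the default labels and labels at a given depth."""
    # Deferred import: only needed when clustering is actually requested.
    from biperscan import BPSCAN

    model = BPSCAN(
        min_samples=5,  # arbitrary example value
        min_cluster_size=20,  # arbitrary example value
        lens="negative_distance_to_median",
    ).fit(X)
    return model.labels_, model.labels_at_depth(depth)


X = make_two_blobs()
# With biperscan installed:
# labels, labels_at_depth_one = cluster(X, depth=1)
```

Since `fit` returns the instance itself, construction and fitting can be chained as shown.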

biperscan.bpscan(X, *, min_samples: int | None = None, min_cluster_size: int = 10, distance_fraction: float = 1.0, max_label_depth: int | None = None, metric: str | Callable = 'euclidean', metric_kws: dict | None = None, lens: str | Callable | ndarray[float32] = 'negative_distance_to_median', lens_kws: dict | None = None, memory: str | None = None)

Perform Bi-Persistence clustering on the given data. BPSCAN adapts HDBSCAN* to operate on a bi-filtration of the data, where the filtration is defined by a lens function and a mutual reachability distance.

Parameters:
  • X (array of shape (n_samples, n_features), or array of shape (1, n_samples * (n_samples - 1) // 2)) – A feature array, or condensed distance array if metric='precomputed'.

  • min_samples (int, default=None) – The number of samples in a neighborhood for a point to be considered as a core point. If None, defaults to min_cluster_size.

  • min_cluster_size (int, default=10) – The minimum number of samples to be a cluster.

  • distance_fraction (float, default=1.0) – The fraction of the maximum distance grade to use as an upper distance limit when extracting merges.

  • max_label_depth (int, default=None) – The maximum depth to extract labels from the simplified hierarchy.

  • metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix. Otherwise, X is passed to the metric function as argument(s).

  • metric_kws (dict, default=None) – Additional keyword arguments to pass to the metric function.

  • lens (str or callable or array of shape (n_samples,), default='negative_distance_to_median') – The lens function to use when computing the bi-filtration. If a string, it must be a key in biperscan.lenses.available_lenses(). If a callable, it must return a float32 array of lens values. If an array, it must be a float32 array of lens values.

  • lens_kws (dict, default=None) – Additional keyword arguments to pass to the lens function.

  • memory (str or None, default=None) – A path to store the cache or None to disable caching.

Returns:

  • distances (array of shape (n_samples, n_samples)) – The mutual reachability distance matrix in condensed form.

  • lens_values (array of shape (n_samples,)) – The computed lens values.

  • lens_grades (array of shape (n_samples,)) – The lens grade for each point.

  • col_to_edge (array of shape (n_edges,)) – Mapping from column index to index in the condensed distance matrix.

  • row_to_point (array of shape (n_samples,)) – Mapping from minimal presentation row index to data point index.

  • minimal_presentation (dict) – The minimal presentation of the bi-filtration. Contains the following keys: ‘lens_grade’, ‘distance_grade’, ‘parent’, ‘child’.

  • merges (dict) – The merges extracted from the minimal presentation. Contains the following keys: ‘start_column’, ‘end_column’, ‘root_one’, ‘root_two’, ‘side_one’, ‘side_two’, ‘lens_grade’, ‘distance_grade’. Merges are ordered by increasing end column. Rows with the same (‘start_column’, ‘end_column’) pair indicate merges that originate from the same edges. To find all points included in a given row, take the union of ‘side_one’ and ‘side_two’ over all rows with the same ‘start_column’ and ‘end_column’, up to and including that row.

  • merge_hierarchy (networkx.DiGraph) – The merge hierarchy graph.

  • simplified_hierarchy (networkx.DiGraph) – The simplified hierarchy graph.

  • labels (array of shape (n_samples,)) – The computed cluster labels.
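The union rule described for the merges entry can be sketched as follows. The layout of ‘side_one’ and ‘side_two’ is an assumption here: each is taken to hold one collection of point indices per row. The helper name `points_in_merge` and the toy data are hypothetical.

```python
def points_in_merge(merges: dict, row: int) -> set:
    """Collect all points of a merge row.

    Takes the union of both sides over every row up to and including ``row``
    that shares the same (start_column, end_column) pair.
    """
    key = (merges["start_column"][row], merges["end_column"][row])
    points = set()
    for r in range(row + 1):
        if (merges["start_column"][r], merges["end_column"][r]) == key:
            points.update(int(p) for p in merges["side_one"][r])
            points.update(int(p) for p in merges["side_two"][r])
    return points


# Toy data: rows 0 and 1 originate from the same edges, row 2 does not.
merges = {
    "start_column": [0, 0, 2],
    "end_column": [3, 3, 5],
    "side_one": [[0, 1], [4], [7]],
    "side_two": [[2, 3], [5], [8, 9]],
}
row_zero = points_in_merge(merges, 0)
row_one = points_in_merge(merges, 1)
row_two = points_in_merge(merges, 2)
```

Row 1 accumulates the points of row 0 because both share the same column pair, while row 2 starts from an empty set.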

Lenses

This module implements several point-cloud measures that can be used as lenses with BPSCAN. Most of these functions are based on the documentation of the Python implementation of Mapper.

biperscan.lenses.negative_density(X: ndarray, distance_matrix: ndarray, *, sigma: float = 0.3, **kwargs) → ndarray

Computes the point-cloud density.

Parameters:
  • X (2D NumPy array) – The original data matrix. Not used in this function.

  • distance_matrix (1D NumPy array) – The condensed distance matrix.

  • sigma (float, optional (default = 0.3)) – stddev of Gaussian smoothing kernel.

Returns:

N by 1 numpy array containing the negative vertex density values normalized to lie between 0 and 1.
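One plausible reading of a Gaussian-kernel density lens is sketched below. It is not the library's code: it takes a square distance matrix for readability (the documented function takes the condensed form), and the exact kernel and normalization are assumptions. The function name `negative_gaussian_density` is hypothetical.

```python
import numpy as np


def negative_gaussian_density(square_distances: np.ndarray, sigma: float = 0.3) -> np.ndarray:
    """Negated Gaussian-kernel density per point, min-max scaled to [0, 1]."""
    D = np.asarray(square_distances, dtype=np.float64)
    # Each point sums a Gaussian kernel over its distances to all points.
    density = np.exp(-(D ** 2) / (2.0 * sigma ** 2)).sum(axis=1)
    negated = -density
    span = negated.max() - negated.min()
    if span == 0:
        return np.zeros(negated.shape, dtype=np.float32)
    return ((negated - negated.min()) / span).astype(np.float32)


# Two points close together and one isolated point.
D = np.array(
    [
        [0.0, 0.1, 5.0],
        [0.1, 0.0, 5.0],
        [5.0, 5.0, 0.0],
    ]
)
lens = negative_gaussian_density(D)
```

Under this reading, points in dense regions get values near 0 and isolated points get values near 1, because the density is negated before scaling.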

biperscan.lenses.negative_distance_to_mean(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict | None = None, **kwargs) → ndarray

Computes distance to the mean centroid.

Parameters:
  • X (2D NumPy array) – The original data matrix.

  • distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.

Returns:

N by 1 numpy array containing the negative distance to centroid values scaled to lie between 0 and 1.

biperscan.lenses.negative_distance_to_median(X: ndarray, distance_matrix: ndarray, *, metric: str = 'euclidean', metric_kws: dict | None = None, **kwargs) → ndarray

Computes distance to the median centroid.

Parameters:
  • X (2D NumPy array) – The original data matrix.

  • distance_matrix (1D NumPy array) – The condensed distance matrix. Not used in this function.

Returns:

N by 1 numpy array containing the negative distance to centroid values scaled to lie between 0 and 1.

biperscan.lenses.negative_eccentricity(X: ndarray, distance_matrix: ndarray, *, power: float = inf, **kwargs) → ndarray

Computes the point-cloud eccentricity.

Parameters:
  • X (2D NumPy array) – The original data matrix. Not used in this function.

  • distance_matrix (1D NumPy array) – The condensed distance matrix.

  • power (float, optional (default = np.inf)) – The power to use, may also be infinite.

Returns:

N by 1 numpy array containing the negative vertex eccentricity values scaled to lie between 0 and 1.
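The role of the power parameter can be illustrated with the eccentricity definition commonly used with Mapper: the p-th root of the mean p-th power of a point's distances, which becomes the maximum distance for an infinite power. The sketch assumes that definition and a square distance matrix. It is not taken from the library's source, and the function name is hypothetical.

```python
import numpy as np


def negative_eccentricity_sketch(square_distances: np.ndarray, power: float = np.inf) -> np.ndarray:
    """Negated eccentricity per point, min-max scaled to [0, 1]."""
    D = np.asarray(square_distances, dtype=np.float64)
    if np.isinf(power):
        # Infinite power: distance to the farthest point.
        ecc = D.max(axis=1)
    else:
        # Finite power: generalized mean of the distances.
        ecc = np.mean(D ** power, axis=1) ** (1.0 / power)
    negated = -ecc
    span = negated.max() - negated.min()
    if span == 0:
        return np.zeros(negated.shape, dtype=np.float32)
    return ((negated - negated.min()) / span).astype(np.float32)


# Three points on a line at positions 0, 1 and 2.
D = np.array(
    [
        [0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0],
    ]
)
lens_inf = negative_eccentricity_sketch(D)
lens_one = negative_eccentricity_sketch(D, power=1.0)
```

In both cases the middle point is the least eccentric, so after negation and scaling it receives the largest lens value.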

biperscan.lenses.normalize(values: ndarray[float64]) → ndarray[float32]

Scales values to lie between 0 and 1.
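A custom lens can follow the same pattern as the functions in this module. The sketch below assumes that a callable lens is invoked like the built-in ones, with the data matrix first and the distance matrix second; that calling convention is inferred from the signatures above rather than stated explicitly. `min_max` stands in for a scaling step and `height_lens` is a hypothetical example lens that uses the first coordinate.

```python
import numpy as np


def min_max(values: np.ndarray) -> np.ndarray:
    """Scale values linearly to [0, 1] and return them as float32."""
    values = np.asarray(values, dtype=np.float64)
    low, high = values.min(), values.max()
    if high == low:
        # Constant input: avoid dividing by zero.
        return np.zeros(values.shape, dtype=np.float32)
    return ((values - low) / (high - low)).astype(np.float32)


def height_lens(X: np.ndarray, distance_matrix=None, **kwargs) -> np.ndarray:
    """Hypothetical lens: the first coordinate of each point, scaled to [0, 1]."""
    return min_max(np.asarray(X)[:, 0])


X = np.array([[0.0, 1.0], [2.0, 5.0], [4.0, 3.0]])
lens_values = height_lens(X)
```

Per the `lens` parameter description, either the callable itself or a precomputed float32 array such as `lens_values` could then be supplied as the lens.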

Plot classes

class biperscan.plots.LinkageHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], linkage_hierarchy: dict[str, ndarray])

A class for plotting and transforming linkage hierarchies.

as_networkx()

Returns the hierarchy as a networkx graph.

as_pandas()

Returns the hierarchy as a pandas DataFrame.

plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)

Plots the hierarchy as a network.

Parameters:
  • layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.

  • nodes (bool) – Whether to plot the nodes.

  • edges (bool) – Whether to plot the edges.

  • labels (bool) – Whether to plot the labels.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the edges.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.

  • hide_ticks (bool) – Whether to hide ticks.

plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, labels: bool = True, offset_x: float = 0.02, offset_y: float = 0.0, node_kws: dict | None = None, line_kws: dict | None = None, text_kws: dict | None = None)

Plots the distance and lens grade (or values) of the hierarchy.

Parameters:
  • view_type (str) – The type of view to plot. Either “grade” or “value”.

  • transposed (bool) – Whether to transpose the plot.

  • labels (bool) – Whether to plot the labels.

  • offset_x (float) – The x offset for the labels.

  • offset_y (float) – The y offset for the labels.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the lines.

  • text_kws (dict, optional) – Additional keyword arguments for plotting the labels.

class biperscan.plots.MergeHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray], merge_hierarchy: DiGraph)

A class for plotting and transforming merge hierarchies.

as_networkx()

Returns the merge hierarchy as a networkx graph.

as_pandas()

Transform the merge hierarchy into pandas DataFrames.

Returns:

  • nodes (pd.DataFrame) – One row for each merge in the merge hierarchy, can be used to extract data points contained in each aggregated node.

  • edges (pd.DataFrame) – Edges in the merge hierarchy.

plot_merges(xs: ndarray, ys: ndarray, *, s: int = 2, title_y: float = 0.9, arrowsize: int = 10, linewidth: float = 1)

Plots the points in each merge in the merge hierarchy.

Parameters:
  • xs (np.ndarray) – The x-coordinates of the points.

  • ys (np.ndarray) – The y-coordinates of the points.

  • s (int) – The size of the points.

  • title_y (float) – The y-coordinate of the title.

  • arrowsize (int) – The size of the arrows.

  • linewidth (float) – The width of the arrows.

plot_network(*, layout: str | dict = 'dot', nodes: bool = False, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)

Plots the hierarchy as a network.

Parameters:
  • layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.

  • nodes (bool) – Whether to plot the nodes.

  • edges (bool) – Whether to plot the edges.

  • labels (bool) – Whether to plot the labels.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the edges.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.

  • hide_ticks (bool) – Whether to hide ticks.

plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, distance_offset: float = 1.05, line_kws: dict | None = None, font_kws: dict | None = None)

Plots the distance and lens grade (or values) of the merge hierarchy.

Parameters:
  • view_type (str) – The type of view to plot. Either “grade” or “value”.

  • transposed (bool) – Whether to transpose the plot.

  • distance_offset (float) – A factor that controls the upper distance limit.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the lines.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.

class biperscan.plots.MinimalPresentation(distances: ndarray[float32], point_lens_values: ndarray[float32], point_lens_grades: ndarray[uint32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray])

A class for plotting and transforming minimal presentations.

as_networkx()

Returns the minimal presentation as a networkx graph.

as_pandas()

Returns the minimal presentation as a pandas DataFrame.

plot_network(*, layout: str | dict = 'sfdp', nodes: bool = True, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks: bool = True)

Plots the hierarchy as a network.

Parameters:
  • layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.

  • nodes (bool) – Whether to plot the nodes.

  • edges (bool) – Whether to plot the edges.

  • labels (bool) – Whether to plot the labels.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the edges.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.

  • hide_ticks (bool) – Whether to hide ticks.

plot_persistence_areas(*, view_type: str = 'grade', transposed: bool = False, line_kws: dict | None = None)

Plots the distance and lens grade (or values) of the minimal presentation.

Parameters:
  • view_type (str) – The type of view to plot. Either “grade” or “value”.

  • transposed (bool) – Whether to transpose the plot.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the lines.

class biperscan.plots.SimplifiedHierarchy(distances: ndarray[float32], point_lens_values: ndarray[float32], col_to_edge: ndarray[uint32], row_to_point: ndarray[uint32], minpres: dict[str, ndarray], merge_hierarchy: DiGraph, simplified_hierarchy: DiGraph)

A class for plotting and transforming simplified merge hierarchies.

as_networkx()

Transforms the simplified hierarchy into a networkx DiGraph.

Returns:

  • simplified_hierarchy (nx.DiGraph) – The simplified hierarchy.

  • merge_hierarchy (nx.DiGraph) – The merge hierarchy. Needed to map nodes in the simplified hierarchy to data points contained in the merge hierarchy.

as_pandas()

Transform the simplified hierarchy into pandas DataFrames.

Returns:

  • nodes (pd.DataFrame) – One node for each connected component in the merge hierarchy. Lists which merges are part of the component.

  • edges (pd.DataFrame) – Edges in the simplified hierarchy.

  • merges (pd.DataFrame) – One row for each merge in the merge hierarchy, can be used to extract data points contained in each aggregated node.

plot_merges(xs: ndarray, ys: ndarray, *, s: int = 2, title_y: float = 0.9)

Plots the points in each merge in the merge hierarchy.

Parameters:
  • xs (np.ndarray) – The x-coordinates of the points.

  • ys (np.ndarray) – The y-coordinates of the points.

  • s (int) – The size of the points.

  • title_y (float) – The y-coordinate of the title.

  • linewidth (float) – The width of the arrows.

plot_network(*, layout: str | dict = 'dot', nodes: bool = False, edges: bool = True, labels: bool = True, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None, hide_ticks=True)

Plots the hierarchy as a network.

Parameters:
  • layout (str or dict) – The layout of the network. If a string, it should be one of “sfdp”, “neato”, “dot”, “spring”, or “spectral”. If a dictionary, it should be a mapping from node to position.

  • nodes (bool) – Whether to plot the nodes.

  • edges (bool) – Whether to plot the edges.

  • labels (bool) – Whether to plot the labels.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the edges.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.

  • hide_ticks (bool) – Whether to hide ticks.

plot_persistence_areas(view_type: str = 'grade', transposed: bool = False, distance_offset: float = 1.05, node_kws: dict | None = None, line_kws: dict | None = None, font_kws: dict | None = None)

Plots the distance and lens grade (or values) of the simplified merge hierarchy.

Parameters:
  • view_type (str) – The type of view to plot. Either “grade” or “value”.

  • transposed (bool) – Whether to transpose the plot.

  • distance_offset (float) – A factor that controls the upper distance limit.

  • node_kws (dict, optional) – Additional keyword arguments for plotting the nodes.

  • line_kws (dict, optional) – Additional keyword arguments for plotting the lines.

  • font_kws (dict, optional) – Additional keyword arguments for plotting the labels.