Bi-Persistence Clustering on 3D Horse-shaped Mesh Reconstruction Data
This notebook demonstrates the BPSCAN clustering algorithm on a 3D horse-shaped mesh reconstruction data set. The data was obtained from a STAD repository. The meshes were originally created or adapted for the paper "Deformation Transfer for Triangle Meshes" by Robert W. Sumner and Jovan Popović (2004). They can be downloaded and are described in more detail on the authors' website.
[4]:
%load_ext autoreload
%autoreload 2
[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from umap import UMAP
from biperscan import BPSCAN
from lib.plotting import *
palette = configure_matplotlib()
Uniformly distributed noise points are added to make the clustering task more difficult.
[2]:
def noise_from(values, num_samples):
    """Draw uniform noise spanning the range of `values`, padded by 10% on each side."""
    low, high = values.min(), values.max()
    offset = 0.1 * (high - low)
    return np.random.uniform(low - offset, high + offset, num_samples)
horse = pd.read_csv("data/horse/horse.csv").sample(frac=0.2)
num_noise_samples = horse.shape[0] // 10  # one noise point per ten mesh points
df = pd.concat(
    (
        horse[["x", "y", "z"]],
        pd.DataFrame(
            dict(
                x=noise_from(horse.x, num_noise_samples),
                y=noise_from(horse.y, num_noise_samples),
                z=noise_from(horse.z, num_noise_samples),
            )
        ),
    )
)
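A quick shape check (an illustrative addition, not part of the original notebook) confirms the mix of mesh and noise points:
print(f"{horse.shape[0]} mesh points + {num_noise_samples} noise points = {df.shape[0]} rows")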
The BPSCAN algorithm requires a bit of tuning to extract the branches. In particular, performance appears better when min_samples is smaller than min_cluster_size. The distance_fraction parameter is used to filter out the noise points; values up to \(0.1\) appear to work quite well in general.
[18]:
c = BPSCAN(min_samples=15, min_cluster_size=100, distance_fraction=0.05).fit(df)
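To gauge how sensitive the result is to distance_fraction, a quick sweep can count the detected clusters. This is a sketch, assuming noise points receive the label -1, as the colormap used further below suggests:
for frac in (0.02, 0.05, 0.1):
    labels = BPSCAN(
        min_samples=15, min_cluster_size=100, distance_fraction=frac
    ).fit(df).labels_
    num_clusters = len(set(labels) - {-1})  # -1 assumed to mark noise points
    print(f"distance_fraction={frac}: {num_clusters} clusters")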
A side view of the dataset makes it difficult to see whether the legs are clustered correctly. So, we also show a 2D UMAP projection that separates the legs and tail.
[19]:
p = UMAP(n_neighbors=50, repulsion_strength=0.1).fit(df.values)
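UMAP's layout is stochastic, so the projection differs between runs. Fixing the seed makes the figure reproducible, at the cost of single-threaded optimisation (a variation on the call above, not the original run):
p = UMAP(n_neighbors=50, repulsion_strength=0.1, random_state=42).fit(df.values)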
[20]:
sized_fig(0.5)
plt.scatter(df.z, df.y, s=1, c=c.lens_values_, cmap="viridis")
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
sized_fig(0.5)
plt.scatter(*p.embedding_.T, s=1, c=c.lens_values_, cmap="viridis")
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
plt.show()


The simplified hierarchy is quite complex: multiple edges indicate clusters that are related through both the parent side and the child side.
[21]:
sized_fig(0.5, aspect=0.8)
merges = c.merges_
merges.plot_merges(df.z.to_numpy(), df.y.to_numpy(), s=0.33, title_y=0.6, arrowsize=3, linewidth=0.5)
plt.subplots_adjust(0, 0, 1, 0.975, wspace=0)
plt.axis("off")
plt.show()

[22]:
sized_fig(0.5)
c.simplified_merges_.plot_persistence_areas()
plt.show()

The merges themselves appear to follow the expected structure more clearly. The legs, tail, head, and even the ears and nose are detected as branches.
[23]:
sized_fig(0.5, 0.8)
c.simplified_merges_.plot_merges(df.z.to_numpy(), df.y.to_numpy(), s=0.5)

Only at the highest shown depth level are all expected branches separated into their own clusters. In this case, because the simplified hierarchy is a graph rather than a tree, the depth limit influences when the merges are visited rather than whether they are visited. This leads to subtle differences in the labelling.
[24]:
scatter_kwargs = dict(s=1, alpha=0.5, linewidths=0, edgecolor="none")
colors = plt.cm.tab20.colors
color_kwargs = dict(
    # silver is mapped to label -1, i.e., the noise points
    cmap=ListedColormap(["silver"] + [colors[i] for i in range(20)]),
    vmin=-1,
    vmax=19,
)
[25]:
sized_fig(0.5)
# Fold the labels onto ten colors but keep noise at -1 so it stays silver
# (-1 % 10 would otherwise collide with cluster 9).
labels = np.where(c.labels_ < 0, -1, c.labels_ % 10)
plt.scatter(df.z, df.y, c=labels, **color_kwargs, **scatter_kwargs)
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
sized_fig(0.5)
plt.scatter(*p.embedding_.T, c=labels, **color_kwargs, **scatter_kwargs)
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
plt.show()


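As a final check (an addition, not part of the original notebook), tabulating the label counts shows how many points end up in each cluster versus noise, again assuming -1 marks noise:
print(pd.Series(c.labels_).value_counts().sort_index())  # -1 = noise points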