Bi-Persistence Clustering on 3D Horse-shaped Mesh Reconstruction Data
This notebook demonstrates the BPSCAN clustering algorithm on a 3D horse-shaped mesh reconstruction data set. The data was obtained from a STAD repository. The meshes were originally created or adapted for a paper by Robert W. Sumner and Jovan Popovic (2004). They can be downloaded and are described in more detail on their website.
[2]:
%load_ext autoreload
%autoreload 2
[26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from umap import UMAP
from biperscan import BPSCAN
from lib.plotting import *
palette = configure_matplotlib()
Uniformly distributed noise points are added to make the clustering task more difficult.
[6]:
def noise_from(values, num_samples):
min, max = values.min(), values.max()
range = max - min
offset = 0.01 * range
return np.random.uniform(min - offset, max - offset, num_samples)
horse = pd.read_csv("data/horse/horse.csv").sample(frac=0.2)
num_noise_samples = horse.shape[0] // 10
df = pd.concat(
(
horse[["x", "y", "z"]],
pd.DataFrame(
dict(
x=noise_from(horse.x, num_noise_samples),
y=noise_from(horse.y, num_noise_samples),
z=noise_from(horse.z, num_noise_samples),
)
),
)
)
The BPSCAN algorithm requires a bit of tuning to extract the branches. In particular, performance appears better when min_samples
is smaller than min_cluster_size
. distance_fraction
is used to filter out the noise points. Values up to \(0.1\) appear to work quite well in general.
[7]:
c = BPSCAN(min_samples=10, min_cluster_size=60, distance_fraction=0.04).fit(df)
Visualizing a side-view of the dataset makes it difficult to see whether the legs are correctly clustered. So, we also show a 2D UMAP projection that separates the legs and tail.
[9]:
p = UMAP(n_neighbors=50, repulsion_strength=0.1).fit(df.values)
[17]:
sized_fig(0.5)
plt.scatter(df.z, df.y, s=1, c=c.lens_values_, cmap="viridis")
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
sized_fig(0.5)
plt.scatter(*p.embedding_.T, s=1, c=c.lens_values_, cmap="viridis")
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
plt.show()
data:image/s3,"s3://crabby-images/24381/24381f9849315d2e402cd3d4d63cb29ec8119b13" alt="_images/Horse_Bifiltration_9_0.png"
data:image/s3,"s3://crabby-images/5bfa5/5bfa571866753dccfd68849bcab9fb2ec84f8eed" alt="_images/Horse_Bifiltration_9_1.png"
The simplified hierarchy is quite complex, with multiple edges indicating a relation through both the parent side and the child side.
[23]:
sized_fig(0.5)
c.simplified_hierarchy_.plot_persistence_areas()
plt.show()
data:image/s3,"s3://crabby-images/e619a/e619a08f365e3050815baa38fa7766ed64ecaaf9" alt="_images/Horse_Bifiltration_11_0.png"
The merges themselves appear more clearly follow the expected structure. The legs, tail, head, and even ears and nose are detected as branches.
[22]:
sized_fig(0.5, 0.8)
c.simplified_hierarchy_.plot_merges(df.z, df.y, s=0.5)
data:image/s3,"s3://crabby-images/b4cc8/b4cc82a3037737c1799e17a79060e811d7f83338" alt="_images/Horse_Bifiltration_13_0.png"
Only at the highest shown depth-level are all expected branches separated into their own cluster. In this case, because the simplified hierarchy is a graph rather than a tree, the depth limit influences when the merges are visited rather than whether they are visited. This leads to subtle differences in the labelling.
[27]:
scatter_kwargs = dict(s=1, alpha=0.5, linewidths=0, edgecolor="none")
colors = plt.cm.tab20.colors
color_kwargs = dict(
cmap=ListedColormap(["silver"] + [colors[i] for i in range(20)]), vmin=-1, vmax=19
)
[29]:
for i in range(4):
labels = c.labels_at_depth(i)
sized_fig(0.25)
plt.scatter(df.z, df.y, c=labels, **color_kwargs, **scatter_kwargs)
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
sized_fig(0.25)
plt.scatter(*p.embedding_.T, c=labels, **color_kwargs, **scatter_kwargs)
plt.subplots_adjust(0, 0, 1, 1)
plt.axis("off")
plt.show()
data:image/s3,"s3://crabby-images/386bb/386bb035f903ce53c8165201c4a4b495b9a82c02" alt="_images/Horse_Bifiltration_16_0.png"
data:image/s3,"s3://crabby-images/d8831/d883179bfd6bb3da5099664eb9bb68cf5c929b4d" alt="_images/Horse_Bifiltration_16_1.png"
data:image/s3,"s3://crabby-images/5c208/5c208f57efde348393a483f8cf4ca780bc11ffd6" alt="_images/Horse_Bifiltration_16_2.png"
data:image/s3,"s3://crabby-images/11ec6/11ec6f1a89dc53a8f28ca24d37ae96be433e2fb6" alt="_images/Horse_Bifiltration_16_3.png"
data:image/s3,"s3://crabby-images/5493c/5493ca77e023b5ca2b09db758c98dc9b19bdfb90" alt="_images/Horse_Bifiltration_16_4.png"
data:image/s3,"s3://crabby-images/aaccd/aaccd0a36401e8674500d912811137b5f1c7e459" alt="_images/Horse_Bifiltration_16_5.png"
data:image/s3,"s3://crabby-images/47742/47742916da54e9094643654f8512465f2e22e13c" alt="_images/Horse_Bifiltration_16_6.png"
data:image/s3,"s3://crabby-images/da88b/da88bf191d8551e642630194cdededb952cd12d6" alt="_images/Horse_Bifiltration_16_7.png"
The first-non-zero membership labelling finds most of the branching structure but does not separate the legs from each other.
[31]:
sized_fig()
plt.scatter(
df.z,
df.y,
c=c.first_nonzero_membership(),
**scatter_kwargs,
**color_kwargs,
)
plt.axis("off")
plt.show()
data:image/s3,"s3://crabby-images/e4ad8/e4ad850195d061943957b63a4f761f1e9c65456b" alt="_images/Horse_Bifiltration_18_0.png"