Example: Branches in 3D Horse
This notebook demonstrates how FLASC behaves on a 3D point cloud sampled from the surface of a horse. The sampling density varies with the level of detail on the surface; for example, there are far more points on the head than on the stomach. In addition, the dataset contains several hollow regions, which FLASC's centrality metric has to handle accurately.
The horse-shaped mesh-reconstruction dataset is obtained from a STAD repository. The meshes were originally created or adapted for a paper by Robert W. Sumner and Jovan Popovic (2004). They can be downloaded and are described in more detail on their website.
[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from flasc import FLASC
from lib.plotting import *
palette = configure_matplotlib()
[2]:
df = pd.read_csv('./data/horse/horse.csv')
X = df.to_numpy()
sized_fig(1, aspect=0.618/3)
plt.subplot(1, 3, 1)
plt.scatter(X.T[2], X.T[1], 1, alpha=0.1)
frame_off()
plt.subplot(1, 3, 2)
plt.scatter(X.T[2], X.T[0], 1, alpha=0.1)
frame_off()
plt.subplot(1, 3, 3)
plt.scatter(X.T[0], X.T[1], 1, alpha=0.1)
frame_off()
plt.show()

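As a quick, optional sanity check before clustering, we can print the size and bounding box of the point cloud loaded above. This sketch uses only the X array and numpy, both already available in the notebook:
[ ]:
print(X.shape)        # number of points and dimensions
print(X.min(axis=0))  # lower corner of the bounding box
print(X.max(axis=0))  # upper corner of the bounding box
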
Single cluster
First, let's analyze this dataset as if it were a single cluster. For that purpose, we set override_cluster_labels to assign every point to cluster 0. This is different from allow_single_cluster, as it gives each point the same cluster membership probability. All branches in this point cloud (legs, head, and tail) are quite large, so we set min_branch_size=20. Because there are no noise points, and to speed up the computation, we keep min_samples=5. Note that, at this value, connections are unlikely to cross the hollow regions in the legs and torso.
[3]:
c = FLASC(min_samples=5, min_branch_size=20, branch_selection_method="leaf").fit(
    X, labels=np.zeros(X.shape[0], dtype=np.intp)
)
g = c.cluster_approximation_graph_
With these settings, we find two main sides of the cluster, each containing three persistent branches.
[4]:
sized_fig()
c.branch_condensed_trees_[0].plot(leaf_separation=0.2)
plt.ylabel('Eccentricity')
plt.show()

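The branch counts can also be verified numerically. Below is a minimal sketch using the combined labels_ attribute from the fit above; with a single overridden cluster, each label corresponds to one branch, plus one extra label for the central points:
[ ]:
labels, counts = np.unique(c.labels_, return_counts=True)
print(f"{labels.size} labels in total")
for label, count in zip(labels, counts):
    print(f"label {label}: {count} points")
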
The resulting branches neatly correspond to the legs, head, ears, and tail. The torso gets its own label, marking the cluster's most central points.
[5]:
sized_fig(1, aspect=0.618/3)
plt.subplot(1, 3, 1)
plt.scatter(X.T[2], X.T[1], 1, c.labels_,
            alpha=0.1, cmap='tab10', vmax=10)
frame_off()
plt.subplot(1, 3, 2)
plt.scatter(X.T[2], X.T[0], 1, c.labels_,
            alpha=0.1, cmap='tab10', vmax=10)
frame_off()
plt.subplot(1, 3, 3)
plt.scatter(X.T[0], X.T[1], 1, c.labels_,
            alpha=0.1, cmap='tab10', vmax=10)
frame_off()
plt.show()

Computing the branch centrality vectors and re-assigning the central points to the closest branch centroid in the cluster approximation graph lets each branch grow into the torso, giving a potentially more useful segmentation.
[6]:
from flasc.prediction import (
    branch_centrality_vectors,
    update_labels_with_branch_centrality,
)
v = branch_centrality_vectors(c)
l, _ = update_labels_with_branch_centrality(c, v)
[7]:
sized_fig(1, aspect=0.618 / 3)
plt.subplot(1, 3, 1)
plt.scatter(X.T[2], X.T[1], 1, l, alpha=0.1, cmap="tab10", vmax=10)
frame_off()
plt.subplot(1, 3, 2)
plt.scatter(X.T[2], X.T[0], 1, l, alpha=0.1, cmap="tab10", vmax=10)
frame_off()
plt.subplot(1, 3, 3)
plt.scatter(X.T[0], X.T[1], 1, l, alpha=0.1, cmap="tab10", vmax=10)
frame_off()
plt.show()

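To quantify the effect of this re-assignment, the updated labels can be compared against the original ones. A small sketch, assuming (as described above) that only central points change label:
[ ]:
# Points whose label changed were pulled from the central region into a branch.
n_changed = int(np.sum(l != c.labels_))
print(f"{n_changed} of {X.shape[0]} points changed label")
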
Multiple clusters
This dataset can also be viewed as multiple density-based clusters. To cluster the dataset we specify min_samples=20 and min_cluster_size=1000. This results in four clusters: the tail, the head, the back legs, and the front legs. Parts of the torso are classified as noise. To detect branches within these clusters, we set min_branch_size=20 and branch_selection_persistence=0.1.
[8]:
c = FLASC(
    min_samples=20,
    min_cluster_size=1000,
    min_branch_size=20,
    branch_selection_persistence=0.1,
).fit(X)
g = c.cluster_approximation_graph_
The clustering finds two sides that each contain two clusters.
[9]:
sized_fig()
c.condensed_tree_.plot()
plt.show()

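The cluster count and the amount of noise can also be read directly from the cluster labels. A quick check (noise points carry a negative label):
[ ]:
n_clusters = np.unique(c.cluster_labels_[c.cluster_labels_ >= 0]).size
n_noise = int(np.sum(c.cluster_labels_ < 0))
print(f"{n_clusters} clusters, {n_noise} noise points")
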
The resulting clusters neatly capture the tail, back legs, front legs and head.
[10]:
sized_fig(1, aspect=0.618/3)
plt.subplot(1, 3, 1)
colors = [palette[l] if l >= 0 else (0.8, 0.8, 0.8) for l in c.cluster_labels_]
plt.scatter(X.T[2], X.T[1], 1, colors,
            alpha=0.1)
frame_off()
plt.subplot(1, 3, 2)
plt.scatter(X.T[2], X.T[0], 1, colors,
            alpha=0.1)
frame_off()
plt.subplot(1, 3, 3)
plt.scatter(X.T[0], X.T[1], 1, colors,
            alpha=0.1)
frame_off()
plt.show()

Within these clusters, the individual lower legs, hips, and shoulder region are detected as distinct branches.
[11]:
sized_fig(1, aspect=0.618/3)
palette = plt.get_cmap('tab20').colors
plt.subplot(1, 3, 1)
colors = [
    palette[l] if l >= 0 else (0.8, 0.8, 0.8)
    for l in c.labels_
]
plt.scatter(X.T[2], X.T[1], 1, colors,
            alpha=0.1)
frame_off()
plt.subplot(1, 3, 2)
plt.scatter(X.T[2], X.T[0], 1, colors,
            alpha=0.1)
frame_off()
plt.subplot(1, 3, 3)
plt.scatter(X.T[0], X.T[1], 1, colors,
            alpha=0.1)
frame_off()
plt.show()

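How the combined labels split over the clusters can be tabulated with pandas. A minimal sketch using only the cluster_labels_ and labels_ attributes shown above (each combined label corresponds to one branch or to a cluster's central points):
[ ]:
overview = pd.DataFrame({
    "cluster": c.cluster_labels_,
    "label": c.labels_,
})
# Number of distinct combined labels within each (non-noise) cluster.
print(
    overview[overview.cluster >= 0]
    .groupby("cluster")["label"]
    .nunique()
)
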
The branch grouping is easier to see when drawn over the cluster approximation graph. A potential issue with this segmentation is that the label for the most central points in the front legs consists of two distinct connected components. This happens for U-shaped clusters in general, where the centroid lies between the branches and both branches have a local centrality maximum. A possible workaround is sketched after the plot below.
[12]:
sized_fig(1, aspect=1)
g.plot(edge_alpha=0.1, node_color=[
    palette[l % 20] for l in c.labels_[g.point_mask]
])
frame_off()
plt.show()

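One way to work around the split central label is to reuse the re-assignment step from the single-cluster section, handing each central point to its closest branch centroid. A sketch with the functions already imported from flasc.prediction:
[ ]:
# Re-assign central points to the closest branch centroid, as done for the
# single-cluster case above. This removes the standalone central labels at
# the cost of no longer marking the most central points separately.
v = branch_centrality_vectors(c)
merged_labels, _ = update_labels_with_branch_centrality(c, v)
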
The cluster approximation graphs also show how the centrality metric behaves.
[13]:
sized_fig(1, aspect=1)
g.plot(node_alpha=0, edge_color='centrality', edge_alpha=0.1)
frame_off()
plt.show()
