Multiprocessing Behaviour

FLASC’s branch detection step and HDBSCAN*’s core distance computation can run in parallel. Because the implementation is memory bound and joblib’s loky multiprocess backend adds process start-up and copy overhead, running multiple processes is only beneficial for larger datasets.

This notebook investigates at which dataset sizes multiprocessing becomes beneficial, in order to find a good default behaviour. Some parts of FLASC are re-implemented here so the branch-detection and core-distance steps can be investigated on their own. Unlike HDBSCAN*, FLASC respects the specified num_jobs parameter, making it easy to override the default behaviour.
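For example, the default can be overridden by passing num_jobs explicitly when constructing the clusterer. The snippet below is a minimal sketch; the parameter values are illustrative, matching values used later in this notebook.

from flasc import FLASC

# Request four worker processes regardless of dataset size;
# num_jobs=None would keep the size-based default behaviour.
clusterer = FLASC(min_samples=10, min_cluster_size=100, num_jobs=4)
# clusterer.fit(X)  # X is the data to cluster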

Setup

This cell loads the libraries required to run this notebook.

[1]:
%load_ext autoreload
%autoreload 2
[1]:
import time
import itertools
import numpy as np
import pandas as pd
from tqdm import trange
from sklearn.utils import shuffle
from joblib.externals.loky import get_reusable_executor

from flasc import FLASC

import seaborn as sns
import matplotlib.pyplot as plt
from _plotting import *
%matplotlib inline
palette = configure_matplotlib()

Datasets

The same dataset generation procedure is used as for the Computational Cost comparison. Clusters are generated using a varying number of random walks starting from the same position. The clusters are placed at uniform random positions sampled from a volume that fits 5 times the specified number of clusters. These datasets result in non-trivially structured data, which should provide a more useful description of FLASC’s multi-processing behaviour.

[2]:
def generate_cluster(
    n_dims=2, n_walks=5, walk_length=50, std_step=0.1
):
    """Generates a cluster by repeating a random walk from (0,0)."""
    # The possible directions
    vectors = np.eye(n_dims)
    # Output collection
    points = []
    # Sample random walks
    for r in range(n_walks):
        directions = np.random.choice(np.arange(n_dims), walk_length)
        lengths = np.random.normal(0, std_step, (walk_length, 1))
        steps = vectors[directions] * lengths
        pts = np.cumsum(steps, axis=0)
        points.append(pts)
    # Combine the walks into a single point cloud
    points = np.concatenate(points)
    return points

def generate_clusters(n_clusters=2, n_dims=2, min_dist=1, n_walks=5):
    # Uniform random samples within a volume spaced to fit extra_spacing_factor * n_clusters
    extra_spacing_factor = 5
    volume = n_clusters * (min_dist**n_dims)
    length = np.power(extra_spacing_factor * volume, 1./n_dims)
    coords = np.random.uniform(high=length, size=(n_clusters, n_dims))

    # Perform random walks at each coord
    points = np.concatenate([
        generate_cluster(n_dims=n_dims, n_walks=n_walks) + coord
        for coord in coords
    ])
    # Create labels and return
    y = np.repeat(np.arange(n_clusters), 50*n_walks)
    return shuffle(points, y)

We generated 5 datasets for each combination of:

- number of dimensions,
- number of clusters,
- number of walks per cluster.

The cluster radius values, which indicate the 95th percentile distance from a cluster’s center to its points, are copied from the Computational Cost comparison notebook.
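As an illustration of how such a radius could be measured, the sketch below computes the 95th percentile distance from a generated cluster’s mean to its points; the exact procedure used in the Computational Cost notebook may differ.

# Illustrative only: estimate a cluster radius as the 95th percentile
# of distances from the cluster mean to its points.
pts = generate_cluster(n_dims=2, n_walks=5)
radius = np.quantile(np.linalg.norm(pts - pts.mean(axis=0), axis=1), 0.95)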

[3]:
repeats = list(range(5))
num_dims = [2, 8, 16]
cluster_radius = [0.95555435, 0.80832393, 0.7837516]
num_clusters = np.round(np.exp(np.linspace(np.log(2), np.log(800), 10)))
num_walks = np.asarray([5, 10, 20])

params = pd.DataFrame([
    (r, ds[0], ds[1], c, w, c * w * 50)
    for r, ds, c, w in itertools.product(
        repeats,
        zip(num_dims, cluster_radius),
        num_clusters,
        num_walks
    ) if c * w * 50 <= 200000
], columns=['repeat','num_dims', 'min_dist', 'num_clusters', 'num_walks', 'num_points'])
params['X'], params['y'] = zip(*[
    generate_clusters(
        n_dims=int(params.num_dims[i]),
        min_dist=float(params.min_dist[i]),
        n_clusters=int(params.num_clusters[i]),
        n_walks=int(params.num_walks[i])
    )
    for i in trange(params.shape[0])
])
params.to_pickle('./data/generated/threading_comparison_datasets.pickle')

Core distances

The first step that can benefit from multiprocessing is finding each point’s core distance and neighbours. In the cell below, this step’s implementation is extracted from the main FLASC function to analyze its run time separately from the other steps. (cell hidden in docs)

In addition to the dataset parameters, we vary whether clusters are overridden in this parameter sweep, because with overridden clusters only part of the HDBSCAN* algorithm is evaluated.
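The hidden helper is not reproduced here. Conceptually, a point’s core distance is its distance to the min_samples-th nearest neighbour; a rough, hypothetical sketch of that computation (not the hidden _hdbscan_linkage helper, and without the parallelism being timed) could look like:

from sklearn.neighbors import KDTree

def core_distances_sketch(X, min_samples=10):
    """Distance from each point to its min_samples-th nearest neighbour."""
    tree = KDTree(X)
    distances, _ = tree.query(X, k=min_samples)  # the query includes the point itself
    return distances[:, -1]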

[6]:
# Parameter values to compare
num_jobs = [1, 4]
override_clusters = [True, False]

# Create single data frame with combinations
sweep = pd.DataFrame([
        (d, c, w, c * w * 50, j, o)
        for d, c, w, j, o in itertools.product(
            num_dims,
            num_clusters,
            num_walks,
            num_jobs,
            override_clusters
        ) if c * w * 50 <= 200000
    ],
    columns=[
        'num_dims', 'num_clusters', 'num_walks', 'num_points',
        'num_jobs', 'override_clusters'
    ]
)
id_vars = sweep.columns.to_list()

To make sure the multi-processing pools are not reused, we force a shutdown before calling the timed function. Essentially, this assumes a cold run in which the worker pool has not been initialized yet.

[7]:
def measure_setting(p):
    """Computes the run times of the given setting"""
    num_repeats = len(repeats)
    times = np.nan * np.ones(num_repeats, dtype=np.double)

    # Evaluate num_repeat times
    for i in range(num_repeats):
        # Extract the dataset
        param_i = params[
            (params.num_dims == p.num_dims) &
            (params.num_clusters == p.num_clusters) &
            (params.num_walks == p.num_walks) &
            (params.repeat == i)
        ].index[0]
        X = params.loc[param_i, 'X']

        # Clean up processing backend
        get_reusable_executor().shutdown(wait=True)

        # Run the fit
        start = time.perf_counter()
        _hdbscan_linkage(X, num_jobs=p.num_jobs, run_override=p.override_clusters)
        end = time.perf_counter()

        # Store the run time
        times[i] = end - start
    return times

The cell below runs the parameter sweep, which takes about 2 hours and 40 minutes.

[8]:
sweep['run_times'] = [ measure_setting(sweep.iloc[i]) for i in trange(sweep.shape[0]) ]
sweep.to_pickle('./data/generated/thread_scaling_core_dists.pickle')
100%|██████████████████████████████████████████████████████████████████████████████| 300/300 [2:40:05<00:00, 32.02s/it]

Results

In the cells below, we try to find out at which data-set sizes multi-processing becomes beneficial. First, we load the data files, so it is possible to recreate the figures without running the entire parameter sweep.

[9]:
params = pd.read_pickle('./data/generated/threading_comparison_datasets.pickle')
sweep = pd.read_pickle('./data/generated/thread_scaling_core_dists.pickle')
repeats = np.arange(len(sweep.run_times[0]))
sweep['repeats'] = [repeats for _ in range(sweep.shape[0])]
sweep = sweep.explode(['run_times', 'repeats'])

Then, we compute the speedup between 1 job and 4 jobs for all datasets.

[ ]:
pivotted = pd.pivot(sweep,
         index=[
            'num_dims','num_clusters', 'num_walks','num_points',
            'override_clusters', 'repeats'
        ],
         columns='num_jobs',
         values='run_times'
)
one_job = pivotted[1].to_numpy()[None].T
multi_jobs = pivotted.iloc[:, 1:]
speedup = (one_job / multi_jobs).reset_index()
speedup = speedup.rename(columns={
    'repeat': 'Repeat',
    'num_dims': 'Num dimensions',
    'num_clusters': 'Num clusters',
    'num_walks': 'Num walks',
    'num_points': 'Num points',
    'override_clusters': 'Override clusters',
    'num_detected_clusters': 'Num detected clusters',
    'num_jobs': 'Num jobs',
    4: 'Speedup'
})

Now, we can plot the average speedup for the different data-set and cluster sizes:

[33]:
g = sns.FacetGrid(
    speedup,
    col='Override clusters',
    col_order=[False, True],
)
g.map_dataframe(
    sns.lineplot,
    x='Num points',
    y='Speedup',
    hue='Num dimensions',
    palette='tab10'
)
for a in plt.gcf().axes:
    a.plot([0, 200000], [1, 1], 'k:', linewidth=0.5)
    a.set_xlim([0, 200000])
plt.sca(a)
l = plt.legend(title='Num dimensions')
adjust_legend_subtitles(l)
g.set_titles('Override {col_name}')
size_fig(1, aspect=0.618*2/3)
# plt.xscale('log')
plt.subplots_adjust(hspace=0.2, wspace=0.2)
plt.show()
_images/Comparison_multi_processing_22_0.png

For these datasets, HDBSCAN*’s threading threshold of roughly 16,000 points is too low: benefits only start to appear from about 50,000 points. For our implementation, we use 125,000 points as a conservative threshold, ensuring no slow-downs occur on 2-dimensional datasets.
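A hypothetical sketch of such a size-based default (the function name and fall-back job counts are illustrative, not FLASC’s actual internals):

def default_num_jobs(num_points, requested=None, threshold=125_000):
    """Hypothetical: only spin up worker processes for large datasets."""
    if requested is not None:
        return requested  # an explicitly requested value always wins
    return -1 if num_points >= threshold else 1  # -1 means "use all cores"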

The raw numbers:

[31]:
speedup.groupby(by=[
    'Num clusters', 'Override clusters', 'Num dimensions'
]).Speedup.mean().reset_index().pivot(
    index=['Override clusters', 'Num dimensions'],
    values='Speedup',
    columns='Num clusters'
)
[31]:
Num clusters 2.0 4.0 8.0 15.0 29.0 56.0 109.0 211.0 411.0 800.0
Override clusters Num dimensions
False 2 0.005177 0.010299 0.022059 0.043019 0.088026 0.187841 0.412534 0.548403 0.761539 1.715687
8 0.015312 0.033583 0.098570 0.195107 0.367741 0.603581 0.830696 0.964007 1.051215 1.281790
16 0.024017 0.055282 0.136907 0.281519 0.503949 0.733526 0.915181 0.980591 1.007645 1.050434
True 2 0.002616 0.005449 0.011458 0.023468 0.051625 0.133886 0.346752 0.478879 0.703685 1.986652
8 0.007089 0.016132 0.046569 0.097261 0.208068 0.410467 0.748709 0.910315 1.203641 2.349060
16 0.010657 0.025276 0.063608 0.139045 0.300975 0.529191 0.817763 0.928171 1.040766 1.312453

Branch Detection

The branch detection step can be performed in parallel for each cluster separately. Here we analyze whether that has a benefit, separately from the other steps. The functions below implement all FLASC steps that occur before the branch detection step, as well as the branch detection step on its own. Joblib’s Memory caching is used to speed up the parts that are not measured. This requires roughly 20 GB of free disk space, which is cleaned up when the sweep completes. (code cell hidden in docs)

[5]:
import tempfile
from joblib.parallel import Parallel
from joblib.memory import Memory

tempdir = tempfile.TemporaryDirectory()
memory = Memory(tempdir.name, verbose=0)
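As a rough illustration of the per-cluster parallelism described above, the sketch below fans a branch-detection callable out over clusters with joblib; the helper and the detect_in_cluster callable are hypothetical stand-ins, not FLASC’s hidden _flasc_branches implementation.

from joblib import Parallel, delayed

def detect_branches_per_cluster(cluster_point_sets, detect_in_cluster, num_jobs=1):
    # Run a branch-detection callable on every cluster, optionally in parallel.
    return Parallel(n_jobs=num_jobs)(
        delayed(detect_in_cluster)(points) for points in cluster_point_sets
    )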

FLASC’s main parameters are varied so we can find a good threshold for each variant of the algorithm:

- branch detection method,
- cluster override.

FLASC’s generic algorithm variant, which computes the full distance matrix, is not tested here, as no speedups were observed with that variant previously.

[7]:
# Parameter values to compare
branch_detection_method = ['core', 'full']
override_clusters = [True, False]
num_jobs = [1, 4]

# Create single data frame with combinations
sweep = pd.DataFrame([
        (d, c, w, c * w * 50, j, b, o)
        for d, c, w, j, b, o in itertools.product(
            num_dims,
            num_clusters,
            num_walks,
            num_jobs,
            branch_detection_method,
            override_clusters
        ) if c * w * 50 <= 200000
    ],
    columns=[
        'num_dims', 'num_clusters', 'num_walks', 'num_points',
        'num_jobs', 'branch_detection_method', 'override_clusters'
    ]
)
id_vars = sweep.columns.to_list()

As before, the loky backend is shut down between runs to make sure no caching benefits influence the comparison.

[8]:
def measure_setting(p):
    """Computes the run times of the given setting"""
    num_repeats = len(repeats)
    times = np.nan * np.ones(num_repeats, dtype=np.double)

    # Evaluate num_repeat times
    for i in range(num_repeats):
        # Find data-set index
        param_i = params[
            (params.num_dims == p.num_dims) &
            (params.num_clusters == p.num_clusters) &
            (params.num_walks == p.num_walks) &
            (params.repeat == i)
        ].index[0]

        # Compute clusters and points
        run_generic = False
        run_override = p.override_clusters
        run_core = p.branch_detection_method == 'core'
        preparation = memory.cache(_flasc_clusters)(
            param_i, run_generic=run_generic, run_override=run_override
        )

        # Clean up processing backend
        get_reusable_executor().shutdown(wait=True)

        # Run the branch detection step
        start = time.perf_counter()
        _flasc_branches(
            *preparation,
            run_generic=run_generic,
            run_override=run_override,
            run_core=run_core,
            num_jobs=p.num_jobs
        )
        end = time.perf_counter()

        # Store the run time
        times[i] = end - start
    return times

The cell below runs the actual sweep, which takes about 4 hours.

[9]:
sweep['run_times'] = [ measure_setting(sweep.iloc[i]) for i in trange(sweep.shape[0]) ]
sweep.to_pickle('./data/generated/thread_scaling_branches.pickle')
tempdir.cleanup()
100%|██████████████████████████████████████████████████████████████████████████████| 600/600 [4:01:28<00:00, 24.15s/it]

Results

In this section we plot the results from the branch-detection sweep. The data files are read in again so that the figures can be re-created without running the sweep.

[44]:
params = pd.read_pickle('./data/generated/threading_comparison_datasets.pickle')
sweep = pd.read_pickle('./data/generated/thread_scaling_branches.pickle')
repeats = np.arange(len(sweep.run_times[0]))
sweep['repeats'] = [repeats for _ in range(sweep.shape[0])]
sweep = sweep.explode(['run_times', 'repeats'])

The speedup from 1 to 4 jobs is computed:

[ ]:
pivotted = pd.pivot(sweep,
         index=[
            'num_dims', 'num_clusters', 'num_walks', 'num_points',
            'branch_detection_method', 'override_clusters', 'repeats'
        ],
         columns='num_jobs',
         values='run_times'
)
one_job = pivotted[1].to_numpy()[None].T
multi_jobs = pivotted.iloc[:, 1:]
speedup = (one_job / multi_jobs).reset_index()
speedup = speedup.rename(columns={
    'repeat': 'Repeat',
    'num_dims': 'Num dimensions',
    'num_clusters': 'Num clusters',
    'num_walks': 'Num walks',
    'num_points': 'Num points',
    'branch_detection_method': 'Branch detection',
    'override_clusters': 'Override clusters',
    'num_jobs': 'Num jobs',
    4: 'Speedup'
})

The figure below shows the speedups for the different FLASC parameter combinations and dataset dimensions. Again, 2D datasets benefit the least from multiprocessing.

[43]:
g = sns.FacetGrid(
    speedup,
    row='Branch detection',
    col="Override clusters",
    row_order=['core', 'full'],
    col_order=[False, True]
)
g.map_dataframe(
    sns.lineplot,
    x='Num points',
    y='Speedup',
    hue='Num walks',
    style='Num dimensions',
    palette='tab10'
)
for a in plt.gcf().axes:
    a.plot([0, 200000], [1, 1], 'k:', linewidth=0.5)
    a.set_xlim([0, 200000])
g.set_titles('{row_name} | Override {col_name}')
# g.set(ylim=(0, 250))
# plt.xscale('log')
size_fig(1, 1)
plt.subplots_adjust(hspace=0.2, wspace=0.2)
# plt.savefig('./images/threading_best_size.png')
plt.show()
_images/Comparison_multi_processing_39_0.png

For the core branch detection approach, spinning up multiple processes is not worth it (at this min_samples value), regardless of dataset size. Only when the pool is re-used from the core-distance step could there be a benefit, but even that appears unlikely. For the full branch detection method, multiprocessing becomes beneficial from around 150,000 data points. However, because a pool is already created for the core-distance step at 125,000 points, re-using that pool is likely beneficial before that. So, in the final implementation, we disable the pool for the branch detection step when the core branch-detection method or the generic variant of the algorithm is used. Otherwise, the pool from the core-distance step is re-used.

It can be worth enabling multiprocessing manually for datasets with tens of thousands of points if they have more than 2 dimensions.
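A hypothetical sketch of that decision logic (names and the exact job counts are illustrative, not FLASC’s actual code):

def branch_step_num_jobs(num_points, branch_detection_method, run_generic,
                         requested=None, threshold=125_000):
    """Hypothetical sketch of the default described above."""
    if requested is not None:
        return requested  # an explicitly requested value always wins
    if run_generic or branch_detection_method == 'core':
        return 1  # no pool: parallelism does not pay off for these variants
    # 'full' method: re-use the pool already created for the core-distance step.
    return -1 if num_points >= threshold else 1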

[13]:
speedup.groupby(by=[
    'Num clusters', 'Num walks', 'Override clusters', 'Branch detection', 'Num dimensions'
]).Speedup.mean().reset_index().pivot(
    index=['Branch detection', 'Override clusters', 'Num walks', 'Num dimensions'],
    values='Speedup',
    columns='Num clusters'
)
[13]:
Num clusters 2.0 4.0 8.0 15.0 29.0 56.0 109.0 211.0 411.0 800.0
Branch detection Override clusters Num walks Num dimensions
core False 5 2 0.003159 0.005944 0.014465 0.025352 0.049430 0.090837 0.173306 0.303712 0.483539 0.755368
8 0.002931 0.007304 0.014776 0.027717 0.051618 0.096351 0.173453 0.267089 0.394351 0.291465
16 0.002928 0.007461 0.015128 0.027281 0.051547 0.094157 0.167719 0.277289 0.124852 0.090819
10 2 0.005429 0.011559 0.027093 0.049754 0.091805 0.172753 0.255159 0.494996 NaN NaN
8 0.005681 0.013227 0.027950 0.054397 0.097699 0.172399 0.284143 0.384808 NaN NaN
16 0.005520 0.015637 0.027964 0.051809 0.095650 0.163954 0.251415 0.199158 NaN NaN
20 2 0.010943 0.021226 0.051036 0.097723 0.177076 0.307708 0.529663 NaN NaN NaN
8 0.011226 0.026767 0.052873 0.094384 0.173779 0.294398 0.416041 NaN NaN NaN
16 0.010768 0.028713 0.055042 0.099928 0.179951 0.278628 0.297049 NaN NaN NaN
True 5 2 0.003137 0.008425 0.016174 0.030059 0.057410 0.106009 0.196040 0.339635 0.551901 0.872365
8 0.003665 0.008627 0.017128 0.032247 0.059790 0.112970 0.202119 0.332817 0.487964 0.430270
16 0.003711 0.009020 0.018092 0.033078 0.061428 0.113519 0.196277 0.312589 0.343147 0.177158
10 2 0.006769 0.017592 0.034283 0.064142 0.119286 0.220623 0.340419 0.624888 NaN NaN
8 0.007501 0.019264 0.038454 0.070137 0.129781 0.236366 0.395940 0.593900 NaN NaN
16 0.008003 0.020320 0.040147 0.073285 0.135198 0.244074 0.392161 0.415146 NaN NaN
20 2 0.016858 0.043566 0.084196 0.151690 0.281300 0.487886 0.781616 NaN NaN NaN
8 0.018425 0.047230 0.093067 0.170709 0.306283 0.517214 0.759545 NaN NaN NaN
16 0.020348 0.051696 0.100968 0.181717 0.325953 0.537935 0.648758 NaN NaN NaN
full False 5 2 0.005859 0.009929 0.027037 0.045271 0.096476 0.169733 0.330739 0.524932 0.811864 1.149715
8 0.005682 0.015299 0.036559 0.077567 0.170626 0.354356 0.637906 1.042073 1.434513 1.214936
16 0.006155 0.017117 0.049194 0.111786 0.266430 0.561697 1.085606 1.842024 1.719904 1.371313
10 2 0.018080 0.040749 0.066864 0.121523 0.198410 0.385023 0.674564 0.894895 NaN NaN
8 0.014676 0.040010 0.094555 0.207498 0.399099 0.728690 1.175111 1.582926 NaN NaN
16 0.017140 0.055636 0.136961 0.332426 0.742876 1.285111 2.011022 2.329959 NaN NaN
20 2 0.039610 0.066367 0.128860 0.314227 0.463832 0.693282 1.017464 NaN NaN NaN
8 0.054310 0.133418 0.262903 0.479026 0.819095 1.236615 1.644502 NaN NaN NaN
16 0.059054 0.176911 0.427189 0.828805 1.560656 2.101381 2.590795 NaN NaN NaN
True 5 2 0.005828 0.013195 0.028016 0.052473 0.106243 0.193333 0.353224 0.594588 0.909688 1.302847
8 0.005980 0.017393 0.039808 0.083182 0.180777 0.384853 0.693834 1.176123 1.648200 1.545613
16 0.006963 0.018492 0.051550 0.116409 0.274195 0.569178 1.115421 1.842648 2.146585 1.990661
10 2 0.015712 0.037300 0.072851 0.136760 0.255589 0.454890 0.751886 1.092214 NaN NaN
8 0.016179 0.048635 0.114401 0.248468 0.500768 0.884831 1.434070 1.877409 NaN NaN
16 0.019345 0.061627 0.150492 0.338740 0.759894 1.317657 2.101171 2.474793 NaN NaN
20 2 0.045825 0.116512 0.218807 0.379507 0.642007 0.972706 1.363354 NaN NaN NaN
8 0.054985 0.168832 0.351439 0.673266 1.121121 1.699989 2.107128 NaN NaN NaN
16 0.068774 0.196071 0.460291 0.881114 1.608466 2.141666 2.603280 NaN NaN NaN

Full implementation

Finally, let’s check the full FLASC implementation to validate that the default behaviour does not introduce slow-downs.

[6]:
# Parameter values to compare
branch_detection_method = ['core', 'full']
override_clusters = [True, False]
enable_threading = [True, False]

# Create single data frame with combinations
sweep = pd.DataFrame([
        (d, c, w, c * w * 50, j, b, o)
        for d, c, w, j, b, o in itertools.product(
            num_dims,
            num_clusters,
            num_walks,
            enable_threading,
            branch_detection_method,
            override_clusters
        ) if c * w * 50 <= 200000
    ],
    columns=[
        'num_dims', 'num_clusters', 'num_walks', 'num_points',
        'enable_threading', 'branch_detection_method', 'override_clusters'
    ]
)
id_vars = sweep.columns.to_list()
[7]:
def measure_setting(p):
    """Computes the run times of the given setting"""
    num_repeats = len(repeats)
    times = np.nan * np.ones(num_repeats, dtype=np.double)

    # Evaluate num_repeat times
    for i in range(num_repeats):
        # Find data-set index
        param_i = params[
            (params.num_dims == p.num_dims) &
            (params.num_clusters == p.num_clusters) &
            (params.num_walks == p.num_walks) &
            (params.repeat == i)
        ].index[0]
        X = params.X[param_i]
        y = params.y[param_i]

        # Compute clusters and points
        clusterer = FLASC(
            min_samples=10,
            min_cluster_size=100,
            min_branch_size=20,
            allow_single_cluster=True,
            override_cluster_labels=y if p.override_clusters else None,
            branch_detection_method=p.branch_detection_method,
            num_jobs=None if p.enable_threading else 1,
        )

        # Clean up processing backend
        get_reusable_executor().shutdown(wait=True)

        # Run the full FLASC fit
        start = time.perf_counter()
        clusterer.fit(X)
        end = time.perf_counter()

        # Store the run time
        times[i] = end - start
    return times

The sweep below takes roughly 7 hours.

[8]:
sweep['run_times'] = [ measure_setting(sweep.iloc[i]) for i in trange(sweep.shape[0]) ]
sweep.to_pickle('./data/generated/thread_scaling.pickle')
100%|██████████████████████████████████████████████████████████████████████████████| 600/600 [6:50:00<00:00, 41.00s/it]

Results

In this section we plot the results from the full-implementation sweep. The data files are read in again so that the figures can be re-created without running the sweep.

[9]:
params = pd.read_pickle('./data/generated/threading_comparison_datasets.pickle')
sweep = pd.read_pickle('./data/generated/thread_scaling.pickle')
repeats = np.arange(len(sweep.run_times[0]))
sweep['repeats'] = [repeats for _ in range(sweep.shape[0])]
sweep = sweep.explode(['run_times', 'repeats'])

Then, the speedup from disabled to enabled threading is computed:

[ ]:
pivotted = pd.pivot(sweep,
         index=[
            'num_dims', 'num_clusters', 'num_walks', 'num_points',
            'branch_detection_method', 'override_clusters', 'repeats'
        ],
         columns='enable_threading',
         values='run_times'
)
one_job = pivotted[False].to_numpy()
multi_jobs = pivotted[True]
speedup = (one_job / multi_jobs).reset_index()
speedup = speedup.rename(columns={
    'repeat': 'Repeat',
    'num_dims': 'Num dimensions',
    'num_clusters': 'Num clusters',
    'num_walks': 'Num walks',
    'num_points': 'Num points',
    'branch_detection_method': 'Branch detection',
    'override_clusters': 'Override clusters',
    True: 'Speedup'
})

The figures below show the speedup for the different parameter combinations:

[13]:
g = sns.FacetGrid(
    speedup,
    row='Branch detection',
    col="Override clusters",
    row_order=['core', 'full'],
    col_order=[False, True]
)
g.map_dataframe(
    sns.lineplot,
    x='Num points',
    y='Speedup',
    hue='Num walks',
    style='Num dimensions',
    palette='tab10'
)
g.set_titles('{row_name} | Override {col_name}')
g.set(ylim=(0, 3))
# plt.xscale('log')
size_fig(1, 1)
plt.subplots_adjust(hspace=0.2, wspace=0.2)
# plt.savefig('./images/threading_best_size.png')
plt.show()
_images/Comparison_multi_processing_52_0.png

This is a reasonable result. There is a noticeable speedup for large datasets, and no slowdown for smaller datasets with more dimensions. Manually enabling multi-processing would yield additional speedups in those cases. There is some variation in speedup at smaller dataset sizes, for which no multi-processing occurs; this is likely due to background tasks or other interference.

The raw values are shown below:

[12]:
speedup.groupby(by=[
    'Num clusters', 'Num walks', 'Override clusters', 'Branch detection', 'Num dimensions'
]).Speedup.mean().reset_index().pivot(
    index=['Branch detection', 'Override clusters', 'Num walks', 'Num dimensions'],
    values='Speedup',
    columns='Num clusters'
)
[12]:
Num clusters 2.0 4.0 8.0 15.0 29.0 56.0 109.0 211.0 411.0 800.0
Branch detection Override clusters Num walks Num dimensions
core False 5 2 1.048687 0.998321 0.991910 1.004897 0.990731 1.027229 0.994402 0.991903 1.000755 1.343989
8 1.002823 0.898074 0.988875 1.011377 1.008605 0.998200 0.998690 1.000843 1.002410 1.247001
16 1.001668 0.998073 0.998362 1.021311 1.000981 1.000242 1.001140 1.000171 0.998409 1.056936
10 2 0.994004 0.775257 0.986888 1.000605 1.053610 1.005856 1.009160 1.000453 NaN NaN
8 0.984689 1.014262 0.997846 0.997620 0.998605 1.001291 1.004902 0.997786 NaN NaN
16 1.002216 1.000690 0.998147 0.883028 0.998437 0.998638 1.002251 1.000784 NaN NaN
20 2 0.999919 0.996704 1.036251 0.995041 0.998312 0.993591 1.000946 NaN NaN NaN
8 1.015203 1.015858 1.000028 0.997828 1.003294 1.000881 1.005032 NaN NaN NaN
16 1.001609 1.000310 1.130976 0.891607 1.000934 0.998499 1.000345 NaN NaN NaN
True 5 2 0.858018 1.025632 1.000907 0.993969 1.003129 1.006935 0.993899 0.979987 0.986470 1.449443
8 0.969761 1.016629 0.998081 0.988468 1.008540 0.999815 0.995208 1.001120 0.998286 1.993373
16 0.982635 1.006680 1.000523 1.015217 0.997253 1.000820 0.999782 1.001271 0.999376 1.272039
10 2 0.981934 0.622268 1.002810 1.017507 0.998458 0.993172 0.999530 0.998511 NaN NaN
8 1.008657 0.994673 0.996742 0.988056 1.001082 1.001694 0.999746 1.005091 NaN NaN
16 0.998010 1.004363 1.024920 0.935080 0.999444 0.998022 1.004893 0.995623 NaN NaN
20 2 0.971642 0.984365 1.008843 1.011050 1.001648 1.000334 0.998262 NaN NaN NaN
8 0.965328 1.002476 0.999736 0.998644 1.003580 1.002242 0.999195 NaN NaN NaN
16 0.994387 1.018864 1.067334 0.810977 0.996211 1.005306 1.001005 NaN NaN NaN
full False 5 2 0.902959 0.936036 1.026920 1.024985 0.988197 0.993021 0.999647 0.999481 0.999064 1.675548
8 0.987607 0.998359 0.995219 1.000916 1.001613 1.000171 0.998957 1.000638 0.999232 1.377506
16 1.007962 0.998622 0.999076 0.980647 1.000885 0.998939 1.000502 0.999154 1.000504 1.175418
10 2 1.003720 0.995133 1.026779 0.999799 1.003259 1.011203 0.997672 1.002850 NaN NaN
8 0.995194 0.997441 0.978555 1.003955 0.997489 0.999588 1.000746 0.997281 NaN NaN
16 0.992229 0.999902 1.025964 1.148526 1.002297 0.998628 0.998925 0.998720 NaN NaN
20 2 1.040651 0.994715 0.998396 0.954799 1.000177 1.000735 1.072320 NaN NaN NaN
8 1.002801 0.996368 1.000081 1.017886 0.993969 0.999157 1.001497 NaN NaN NaN
16 0.984919 0.997827 1.074300 0.997348 1.001499 1.000763 0.999747 NaN NaN NaN
True 5 2 0.991949 0.979542 0.999035 0.955455 1.006953 0.982647 1.003248 0.994500 0.999074 1.995182
8 0.996766 1.003658 1.003242 0.998856 1.005461 1.003568 0.998295 0.999856 0.999647 2.218780
16 0.962467 0.998671 1.004725 0.998648 1.002510 1.000000 1.002248 1.001481 0.981688 1.672515
10 2 1.062156 0.913745 0.985154 1.007887 0.996800 1.011242 1.000094 0.995252 NaN NaN
8 1.004948 0.995675 0.999816 0.988373 1.002524 0.998651 1.003883 0.998305 NaN NaN
16 0.983020 0.969100 1.002134 1.106752 1.005494 0.997581 0.999471 0.998845 NaN NaN
20 2 1.027089 1.013243 1.002259 0.984969 1.003379 0.997785 0.998740 NaN NaN NaN
8 1.009391 0.996882 1.004752 1.000431 0.994762 1.001176 0.997090 NaN NaN NaN
16 0.991513 0.998886 1.098382 0.989049 0.997396 0.997037 1.002327 NaN NaN NaN