Installation and logistics¶
Installation¶
Available via pip:
pip install hypercluster
Or bioconda:
conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster
If you are having problems installing with conda, try changing your channel priority. Priority of conda-forge > bioconda > defaults is recommended.
To check channel priority: conda config --get channels
It should look like:
--add channels 'defaults' # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge' # highest priority
If it doesn’t look like that, try:
conda config --add channels bioconda
conda config --add channels conda-forge
Quick reference for clustering and evaluation¶
| Clusterer | Type |
|---|---|
| KMeans/MiniBatchKMeans | Partitioner |
| Affinity Propagation | Partitioner |
| Mean Shift | Partitioner |
| DBSCAN | Clusterer |
| OPTICS | Clusterer |
| Birch | Partitioner |
| HDBSCAN | Clusterer |
| NMF | Partitioner |
| LouvainCluster | Partitioner |
| LeidenCluster | Partitioner |
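The practical difference between the two types: a partitioner takes the number of clusters as a hyperparameter, while a clusterer infers it from the data (tuning other parameters, such as a density radius, instead). A minimal sketch of this distinction using scikit-learn's KMeans and DBSCAN (illustrative only; hypercluster wraps these same estimators):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Three well-separated blobs of synthetic data
data, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Partitioner: you must choose the number of clusters up front
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

# Clusterer: the number of clusters is discovered from density;
# the neighborhood radius eps is the hyperparameter instead
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(data)

print(len(set(kmeans_labels)))             # always exactly 3
print(len(set(dbscan_labels) - {-1}))      # found from the data (-1 marks noise)
```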
| Metric | Type |
|---|---|
| adjusted_rand_score | Needs ground truth |
| adjusted_mutual_info_score | Needs ground truth |
| homogeneity_score | Needs ground truth |
| completeness_score | Needs ground truth |
| fowlkes_mallows_score | Needs ground truth |
| mutual_info_score | Needs ground truth |
| v_measure_score | Needs ground truth |
| silhouette_score | Inherent metric |
| calinski_harabasz_score | Inherent metric |
| davies_bouldin_score | Inherent metric |
| smallest_largest_clusters_ratio | Inherent metric |
| number_of_clusters | Inherent metric |
| smallest_cluster_size | Inherent metric |
| largest_cluster_size | Inherent metric |
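The two metric types differ in what they require: ground-truth metrics compare predicted labels against known labels, while inherent metrics judge a labeling from the data alone. A small sketch using the underlying scikit-learn functions directly (one of each type):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

data, true_labels = make_blobs(n_samples=100, centers=3, random_state=0)
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

# Needs ground truth: compares predicted labels to the known labels
ari = adjusted_rand_score(true_labels, pred)

# Inherent metric: scores the labeling using only the data and the labels
sil = silhouette_score(data, pred)

print(round(ari, 2), round(sil, 2))
```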
Quickstart and examples¶
With snakemake:¶
snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=.
With python:¶
import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()

# With a range of algorithms
clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()
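Under the hood, this kind of tool automates a hyperparameter sweep: fit many candidate configurations, score each with an evaluation metric, and compare. The loop it replaces can be sketched with plain scikit-learn (this is an illustration of the idea, not hypercluster's internals), scanning the number of clusters for KMeans and scoring each fit with the silhouette:

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data, _ = make_blobs(n_samples=200, centers=4, random_state=0)

# Score each candidate number of clusters with an inherent metric
scores = {
    k: silhouette_score(
        data,
        KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data),
    )
    for k in range(2, 8)
}
best_k = max(scores, key=scores.get)
print(pd.Series(scores).round(3))
print("best k:", best_k)
```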
Example workflows for both python and snakemake are here
Source code is available here