# CLI for single-cell analyses

This repository provides a no-frills command-line interface for single-cell RNA-seq data analysis from a Matrix Market file.

It is mostly intended for testing the performance of the underlying C++ libraries without needing to fiddle with data analysis frameworks in R or Python.
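For readers unfamiliar with the input format, the sketch below writes a tiny Matrix Market file in coordinate layout. The file name `tiny.mtx`, the counts, and the assumption that rows are genes and columns are cells are all illustrative. A matrix this small only shows the layout and is not meant as a usable dataset.

```shell
# Write a tiny 3-gene x 2-cell count matrix in Matrix Market coordinate format.
# Assumption: rows are genes and columns are cells; the file name is hypothetical.
cat > tiny.mtx <<'EOF'
%%MatrixMarket matrix coordinate integer general
3 2 4
1 1 5
2 1 1
3 2 2
1 2 7
EOF

# The second line gives rows, columns and the number of non-zero entries.
# Every following line is one entry: row index, column index, value (1-based).
nnz=$(tail -n +3 tiny.mtx | wc -l | tr -d ' ')
echo "non-zero entries: $nnz"
```

The entry count on the size line should match the number of entry lines that follow it, which is what the last command checks.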

To build, run:

```
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```

This will produce a `scran` executable in `build`, which can be run on the command line:

```
./build/scran --help
## Single-cell RNA-seq analyses on the command-line
## Usage: ./build/scran [OPTIONS] path
##
## Positionals:
## path TEXT REQUIRED Path to the Matrix Market file
##
## Options:
## -h,--help Print this help message and exit
## -t,--nthreads INT=1 Number of threads to use (+2 for UMAP and t-SNE, which use their own threads)
## -o,--output TEXT=output Path to the output directory
## --skip-output BOOLEAN=0 Run the analysis but do not save results
## --qc-nmads FLOAT=3 Number of MADs to use for filtering
## --hvg-span FLOAT=0.4 LOWESS span for variance modelling
## --hvg-num INT=2500 Number of HVGs to use for PCA
## --pca-num INT=25 Number of PCs to keep
## --nn-approx BOOLEAN=1 Whether to use an approximate neighbor search
## --snn-neighbors INT=10 Number of neighbors to use for the SNN graph
## --snn-scheme ENUM:value in {jaccard->2,number->1,ranked->0} OR {2,1,0}=0
## Edge weighting scheme: ranked, number or jaccard
## --snn-res FLOAT=1 Resolution to use in multi-level community detection
## --tsne-perplexity FLOAT=30 Perplexity to use in t-SNE
## --tsne-iter INT=500 Number of iterations to use in t-SNE
## --umap-neighbors INT=15 Number of neighbors to use in the UMAP
## --umap-mindist FLOAT=0.01 Minimum distance to use in the UMAP
## --umap-epochs INT=500 Number of epochs to use in the UMAP
```
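As a rough sketch of how the defaults can be overridden, the snippet below assembles an invocation using only options listed in the help text above. The chosen values, the `results` output directory and the `matrix.mtx.gz` input name are illustrative assumptions, not recommended settings.

```shell
# Assemble an invocation that overrides a few defaults from the help text.
# All values and the input file name are illustrative assumptions.
cmd="./build/scran --nthreads 4 --output results --hvg-num 2000 --pca-num 20 --snn-scheme jaccard matrix.mtx.gz"
echo "$cmd"

# Run it only if the executable has been built and the input file exists.
if [ -x ./build/scran ] && [ -f matrix.mtx.gz ]; then
  $cmd
fi
```

Any option that is left out keeps the default shown after the `=` sign in the help text.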

To illustrate, we’ll use the Bach mammary dataset (25k cells) here.

To run the analysis with 8 threads without saving the results, we can do:

```
time ./build/scran -t 8 --skip-output matrix.mtx.gz
## Initializing matrix... 8.762s
## Computing QC metrics... 0.026s
## Computing QC thresholds... 0.002s
## Filtering cells... 0s
## Log-normalizing the counts... 0s
## Mean-variance modelling... 0.251s
## Principal components analysis... 12.245s
## Building the neighbor index... 1.142s
## Finding neighbors for clustering... 0.347s
## Finding neighbors for t-SNE... 1.558s
## Finding neighbors for UMAP... 0.418s
## SNN graph construction... 0.068s
## Multi-level clustering... 4.57s
## Marker detection... 3.314s
## UMAP calculation... 21.141s
## t-SNE calculation... 28.357s
##
## real 0m53.146s
## user 1m55.276s
## sys 0m0.865s
```

If we were to keep the output (which is the default behavior), we would get a directory at the specified `output` path.

This contains QC metrics, variance modelling results, PCA coordinates, cluster assignments, UMAP/t-SNE values and marker statistics for each cluster.