Get the most out of your single cell data.
Explore the docs » · View Demo · Report Bug · Request Feature
SignacX is software developed by the Savova lab at Sanofi with a focus on single cell genomics for clinical applications. SignacX classifies the cellular phenotype for each individual cell in single cell RNA-sequencing data using neural networks trained with sorted bulk gene expression data from the Human Primary Cell Atlas. In this R implementation, we provide functions and vignettes that demonstrate how to: integrate single cell data (mapping cells from one data set to another), classify non-human data, identify novel cell types, and classify single cell data across many tissues, diseases and technologies. To learn more, check out the pre-print here.
Here, we provide interactive access to data from the pre-print with SPRING Viewer. Just click the “Explore” links below and search for your favorite gene:
Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
---|---|---|---|---|---|---|
Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 |
Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 |
Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 |
Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 |
Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 |
Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 |
Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 |
Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 |
Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 |
Explore | PBMCs | Healthy | 7,902 | 1 | 10X Genomics | v2.0.7 |
Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 |
Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 |
Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 |
Note:
* Cell type annotations are provided at four levels (immune, celltypes, cellstates and novel celltypes).
* When available, we also provide information about sample covariates (e.g., disease, age, gender, FACS).
* Cell type annotations for all 13 data sets were generated with the Signac function using the default settings.
Special thanks to Allon Klein’s lab (particularly Caleb Weinreb and Sam Wolock) for hosting the data.
To install SignacX in R, simply do:
install.packages("SignacX")
The main functions in Signac are:
# load the library
library(SignacX)
# Generate initial labels
labels = Signac(E = your_data_here)
# Get cell type labels
celltypes = GenerateLabels(labels, E = your_data_here)
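The `E` argument above is a placeholder. As a minimal sketch, assuming Signac accepts a sparse gene-by-cell count matrix with gene symbols as row names (check `?Signac` for the exact requirements), a toy input of that shape could be built as follows. The gene list, cell barcodes and counts are made up for illustration, and a real data set needs far more genes and cells than this.

```r
library(Matrix)  # ships with standard R installations

set.seed(1)
genes <- c("CD3E", "CD19", "CD14", "NKG7", "MS4A1")  # illustrative gene symbols
cells <- paste0("cell_", 1:6)                         # hypothetical cell barcodes

# simulate a small genes x cells count matrix
counts <- matrix(rpois(length(genes) * length(cells), lambda = 2),
                 nrow = length(genes),
                 dimnames = list(genes, cells))

# store it as a sparse matrix (genes in rows, cells in columns)
E <- Matrix(counts, sparse = TRUE)
dim(E)
```

A real matrix of this form would then take the place of `your_data_here` in the calls above.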
Sometimes we don’t have time to run Signac, and need a quick solution. Although Signac scales fine with large data sets (>300,000 cells), we developed SignacFast to quickly classify single cell data:
# load the library
library(SignacX)
# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)
To make life easier, SignacX was integrated with Seurat (versions 3 and 4), and with SPRING. We provide a few vignettes:
In the pre-print, we often used Signac integrated with SPRING. To reproduce our findings and to generate new results with SPRING, please visit the SPRING repository which has example notebooks and installation instructions, particularly for processing CITE-seq and scRNA-seq data from 10X Genomics. Briefly, Signac is integrated seamlessly with the output files of SPRING in R, requiring only a few functions:
# load the Signac library
library(SignacX)
# dir points to the "FullDataset_v1" directory generated by the SPRING Jupyter notebook
dir = "./FullDataset_v1"
# load the expression data
E = CID.LoadData(dir)
# generate cellular phenotype labels
labels = Signac(E, spring.dir = dir)
celltypes = GenerateLabels(labels, E = E, spring.dir = dir)
# write cell types and Louvain clusters to SPRING
dat <- CID.writeJSON(celltypes, spring.dir = dir)
After running the above functions, cellular phenotypes and Louvain clusters are ready to be visualized with SPRING Viewer, which can be set up locally as described here.
Another way to use Signac is with Seurat. In this vignette, we performed multi-modal analysis of CITE-seq PBMCs from 10X Genomics using Signac integrated with Seurat.
Note:
* The same data set was also processed with SPRING in this notebook and subsequently classified with Signac. The resulting SPRING layouts were used in the pre-print (Figures 2-4) and are available for interactive exploration here.
Sometimes, we have single cell genomics data with disease information, and we want to know which cellular phenotypes are enriched for disease. In this vignette, we applied Signac to classify cellular phenotypes in healthy and lupus nephritis kidney cells, and then we used MASC to identify which cellular phenotypes were disease-enriched.
Note:
* MASC typically requires equal numbers of cells and samples between case and control: an unequal number might skew the clustering of cells towards one sample (i.e., a “batch effect”), which could cause spurious disease enrichment in the mixed effect model. Since Signac classifies each cell independently (without using clusters), Signac annotations can be used with MASC without a priori balancing of samples or cells, unlike cluster-based annotation methods.
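Because every cell carries its own label, the labels can be tabulated against sample covariates directly. The sketch below is not MASC and performs no statistical test. It only illustrates, on made-up data, the per-cell table of cell type, disease status and sample that a mixed effect model would start from. All names (`meta`, `celltype`, `status`, `sample`) are hypothetical.

```r
# toy per-cell metadata: one row per cell (all values are made up)
meta <- data.frame(
  celltype = c("B", "B", "TNK", "TNK", "TNK", "MPh", "B", "TNK", "MPh", "MPh"),
  status   = c("healthy", "healthy", "healthy", "healthy", "disease",
               "disease", "disease", "disease", "disease", "healthy"),
  sample   = c("s1", "s1", "s1", "s2", "s3", "s3", "s3", "s4", "s4", "s2")
)

# number of cells per cell type and disease status
counts <- table(meta$celltype, meta$status)

# fraction of each cell type within each status (each column sums to 1)
fractions <- prop.table(counts, margin = 2)
round(fractions, 2)
```

The groups here do not need to be balanced for the tabulation to work, since no clustering step is involved.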
In Supplemental Figure 8 of the pre-print, we classified single cell data for a model organism (cynomolgus monkey), for which flow-sorted datasets were generally lacking, without any additional species-specific training. Instead, we mapped homologous genes from the Macaca fascicularis genome to the human genome in the single cell data, and then performed cell type classification with Signac. We demonstrate how we mapped the gene symbols here.
Note:
* This code can be used to identify homologous genes between any two species.
* Monkey data used in Supplemental Figure 8 are available for interactive exploration in the table above.
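The mapping step can be sketched with a plain lookup table. This is an illustration on made-up data, not the code from the vignette: the homolog table, its column names and the source gene identifiers are all hypothetical, and in practice the table would come from an orthology resource.

```r
# toy expression matrix with source-species gene identifiers as row names
E_monkey <- matrix(1:12, nrow = 4,
                   dimnames = list(c("mfa_g1", "mfa_g2", "mfa_g3", "mfa_g4"),
                                   c("cell_1", "cell_2", "cell_3")))

# hypothetical homolog table: source gene -> human gene symbol
homologs <- data.frame(
  source_gene = c("mfa_g1", "mfa_g2", "mfa_g4"),
  human_gene  = c("CD3E", "CD19", "CD19"),
  stringsAsFactors = FALSE
)

# look up the human symbol for every row; unmapped genes become NA
human <- homologs$human_gene[match(rownames(E_monkey), homologs$source_gene)]

# keep mapped genes only, and only the first row for each human symbol
keep <- !is.na(human) & !duplicated(human)
E_human <- E_monkey[keep, , drop = FALSE]
rownames(E_human) <- human[keep]
E_human
```

Dropping all but the first row for a repeated human symbol is one simple choice among several; summing the rows would be another.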
In Figure 6 of the pre-print, we compiled data from three sources (CellPhoneDB, GWAS catalog and Fang et al. 2020) to find genes of immunological / pharmacological interest. These genes and their annotations can be accessed from within Signac:
# load the library
library(SignacX)
# See ?Genes_Of_Interest
data("Genes_Of_Interest")
In Figure 4 of the pre-print, we demonstrated that Signac mapped cell type labels from one single cell data set to another; learning CD56bright NK cells from CITE-seq data. Here, we provide a vignette for reproducing this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell-model (Note: the CD56bright NK cells appear in the “CellStates” annotation layer as red cells).
Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
---|---|---|---|---|---|---|
Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 + CD56bright NK |
Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 + CD56bright NK |
Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 + CD56bright NK |
Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 + CD56bright NK |
Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 + CD56bright NK |
Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 + CD56bright NK |
Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 + CD56bright NK |
Sometimes we don’t have time to run Signac and need a faster solution. Although Signac scales well to large data sets (>300,000 cells), typically taking less than an hour even then, we developed SignacFast to classify single cell data more quickly:
# load the library
library(SignacX)
# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)
Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, speeding up classification ~5-10 fold. These models were generated from the HPCA training data like so:
# load the library
library(SignacX)
# load the HPCA training data
training_HPCA = GetTrainingData_HPCA()
# generate models
Models_HPCA = ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)
The “Models_HPCA” are accessed from within the R package:
# load the library
library(SignacX)
# load pre-trained neural network ensemble model
Models = GetModels_HPCA()
We demonstrate how to use SignacFast in this vignette, which shows that the results are broadly consistent with running Signac.
Note:
* SignacFast is a fine alternative to Signac when only the major cell types (i.e., TNK and MPh cells) are of concern.
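One way to check how closely the two methods agree on a given data set is to cross-tabulate their labels. The sketch below uses made-up label vectors. With real output one would presumably compare a field such as `celltypes$CellTypes` against `celltypes_fast$CellTypes`, which is an assumption about the structure returned by `GenerateLabels`.

```r
# toy per-cell labels from the two methods (values are made up)
signac_labels <- c("TNK", "TNK", "B", "MPh", "MPh", "B", "TNK", "MPh")
fast_labels   <- c("TNK", "TNK", "B", "MPh", "B",   "B", "TNK", "MPh")

# rows: Signac label, columns: SignacFast label
confusion <- table(Signac = signac_labels, SignacFast = fast_labels)

# fraction of cells on which the two methods agree
agreement <- mean(signac_labels == fast_labels)

confusion
agreement
```

Off-diagonal entries of `confusion` show which cell types the two methods disagree on, which is more informative than the overall agreement alone.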
In Figures 2-3 of the pre-print, we validated Signac with CITE-seq PBMCs. Here, we reproduced that analysis with SPRING (in this vignette; as was performed in the pre-print) and additionally with Seurat (in this vignette), and provide interactive access to the data here.
In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here.
In Table 1 of the pre-print, we benchmarked Signac across seven different technologies: CEL-seq, Drop-Seq, inDrop, 10X (v2), 10X (v3), Seq-Well and Smart-Seq2; this analysis was reproduced here.
See the open issues for a list of proposed features (and known issues).
Any contributions you make are greatly appreciated.
* Create a feature branch (git checkout -b feature/AmazingFeature)
* Commit your changes (git commit -m 'Add some AmazingFeature')
* Push to the branch (git push origin feature/AmazingFeature)

You can also open a pull request to commit to the master branch.
Distributed under the GPL v3.0 License. See LICENSE for more information.
Mathew Chamberlain - chamberlainphd@gmail.com
Project Link: https://github.com/mathewchamberlain/SignacX