This vignette shows how to use the massiveGST package.
Get the stable release from CRAN
install.packages("massiveGST")
Ongoing bug fixes and future improvements are available on GitHub:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("stefanoMP/massiveGST")
suppressPackageStartupMessages(library(massiveGST, quietly = TRUE))
In any case, the names of the list must be gene names in one of their codings (gene symbol, Entrez, Ensembl, etc.), matching the coding used in the gene-sets. To explain some details of the functions, we assume that genes with positive values in the gene-profile are up-regulated in the treatment group, while genes with negative values are up-regulated in the control group. In other words, positive genes are associated with the treatment group and negative genes with the control samples.
Here, we consider the working gene-profile from Frattini et al. (2018), stored as external data in the package.
fname <- system.file("extdata", package="massiveGST")
fname <- file.path(fname, "pre_ranked_list.txt")
geneProfile <- get_geneProfile(fname)
class(geneProfile)
## [1] "numeric"
head(geneProfile)
## WASF3 VAV3 PFKM LGR6 SLC6A11 KIAA1147
## 4.819656 4.332110 4.244893 4.182461 4.088992 4.045543
tail(geneProfile)
## SLC47A2 STON1 DKFZp434K191 PLEKHO1 PDE6B IFNGR2
## -2.728522 -2.748892 -2.881312 -2.942627 -3.141299 -3.158738
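To make the sign convention concrete, the sketch below (base R only, no massiveGST calls) rebuilds a six-gene named numeric vector from the values printed above and splits the genes by sign. It is only a toy stand-in for the full gene-profile.

```r
# A toy gene-profile: a named numeric vector built from six values printed above
profile <- c(WASF3 = 4.819656, VAV3 = 4.332110, PFKM = 4.244893,
             PLEKHO1 = -2.942627, PDE6B = -3.141299, IFNGR2 = -3.158738)

# Sort from most positive to most negative, as in a pre-ranked list
profile <- sort(profile, decreasing = TRUE)

# Positive values: genes up-regulated in (associated with) the treatment group
treatment_genes <- names(profile)[profile > 0]
# Negative values: genes up-regulated in (associated with) the control group
control_genes <- names(profile)[profile < 0]

print(treatment_genes)
print(control_genes)
```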
The msigdbr package is an R distribution of the most recent release of the MSigDB collection of gene-sets. The function get_geneSets_from_msigdbr is a wrapper that creates the data structure used by the subsequent gene-set enrichment analysis. Its parameters are those of the msigdbr function, except for ‘what’, which selects the coding of the gene names.
This example extracts gene-sets whose gene names are gene symbols. The other options are listed in the help page (‘man’) of the function; essentially, they are the names of the columns of the data frame returned by the msigdbr function.
geneSets <- get_geneSets_from_msigdbr(category = "H", what = "gene_symbol")
## msigdbr: R package version 7.5.1
class(geneSets)
## [1] "list"
head(names(geneSets))
## [1] "HALLMARK_ADIPOGENESIS" "HALLMARK_ALLOGRAFT_REJECTION"
## [3] "HALLMARK_ANDROGEN_RESPONSE" "HALLMARK_ANGIOGENESIS"
## [5] "HALLMARK_APICAL_JUNCTION" "HALLMARK_APICAL_SURFACE"
Optionally, the gene-set collection can be stored in a GMT-formatted file
fname <- file.path(tempdir(), "hallmarks.gmt")
write_geneSets_to_gmt(geneSets, fileName = fname)
and retrieved with the code
fname <- file.path(tempdir(), "hallmarks.gmt")
tmp <- get_geneSets_from_local_files(fname)
class(geneSets)
## [1] "list"
head(names(geneSets))
## [1] "HALLMARK_ADIPOGENESIS" "HALLMARK_ALLOGRAFT_REJECTION"
## [3] "HALLMARK_ANDROGEN_RESPONSE" "HALLMARK_ANGIOGENESIS"
## [5] "HALLMARK_APICAL_JUNCTION" "HALLMARK_APICAL_SURFACE"
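The GMT format is plain text: one gene-set per line, with tab-separated fields giving the set name, a description, and then the member genes. As a sketch of that layout only, the hypothetical helpers write_gmt_sketch() and read_gmt_sketch() below (they are not massiveGST functions) write a named list of gene-sets, the same structure returned above, to a GMT file and read it back. The placeholder description "na" is an arbitrary choice.

```r
# Hypothetical helpers illustrating the GMT layout; NOT massiveGST functions.
# Each line: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
write_gmt_sketch <- function(gene_sets, file) {
  lines <- vapply(names(gene_sets), function(nm) {
    paste(c(nm, "na", gene_sets[[nm]]), collapse = "\t")
  }, character(1))
  writeLines(lines, file)
}

read_gmt_sketch <- function(file) {
  fields <- strsplit(readLines(file), "\t", fixed = TRUE)
  gene_sets <- lapply(fields, function(f) f[-(1:2)])    # drop name and description
  names(gene_sets) <- vapply(fields, `[`, character(1), 1)
  gene_sets
}

# A toy collection: a named list of character vectors
toy_sets <- list(SET_A = c("WASF3", "VAV3", "PFKM"),
                 SET_B = c("PDE6B", "IFNGR2"))
gmt_file <- file.path(tempdir(), "toy_sets.gmt")
write_gmt_sketch(toy_sets, gmt_file)
round_trip <- read_gmt_sketch(gmt_file)
str(round_trip)
```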
system.time({ans <- massiveGST(geneProfile, geneSets, alternative = "two.sided")})
## user system elapsed
## 0.040 0.011 0.053
class(ans)
## [1] "mGST" "data.frame"
ans[1:6,]
## size actualSize NES odd logit2NES
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866418 1.419209 0.5050872
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907051 2.233160 1.1590869
## HALLMARK_ADIPOGENESIS 200 176 0.5946149 1.466790 0.5526627
## HALLMARK_MITOTIC_SPINDLE 199 187 0.5508146 1.226252 0.2942556
## HALLMARK_BILE_ACID_METABOLISM 112 112 0.5856305 1.413305 0.4990726
## HALLMARK_FATTY_ACID_METABOLISM 158 148 0.5649870 1.298782 0.3771593
## abs_logit2NES p.value BH.value
## HALLMARK_ESTROGEN_RESPONSE_EARLY 0.5050872 5.120719e-05 2.560360e-04
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 1.1590869 2.230464e-18 2.788080e-17
## HALLMARK_ADIPOGENESIS 0.5526627 1.514858e-05 8.415875e-05
## HALLMARK_MITOTIC_SPINDLE 0.2942556 1.664775e-02 4.380987e-02
## HALLMARK_BILE_ACID_METABOLISM 0.4990726 1.752278e-03 6.226110e-03
## HALLMARK_FATTY_ACID_METABOLISM 0.3771593 6.386537e-03 1.995793e-02
## B.value relevance
## HALLMARK_ESTROGEN_RESPONSE_EARLY 2.560360e-03 59
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 1.115232e-16 59
## HALLMARK_ADIPOGENESIS 7.574288e-04 56
## HALLMARK_MITOTIC_SPINDLE 8.323875e-01 52
## HALLMARK_BILE_ACID_METABOLISM 8.761391e-02 48
## HALLMARK_FATTY_ACID_METABOLISM 3.193268e-01 48
The analysis result is essentially a data frame; the additional class ‘mGST’, listed first, allows other methods (such as summary and plot) to be hooked to the output.
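To illustrate how a leading class lets methods be hooked to a data frame, the sketch below attaches a hypothetical class, demoGST (not part of massiveGST), to a small data frame built from three rows of the tables in this vignette, and defines a summary method for it. The order_by argument only mimics the one used later; the implementation is an assumption, not the package's code.

```r
# Three rows of the results, stored as a plain data frame
res <- data.frame(NES = c(0.5866, 0.6907, 0.2472),
                  p.value = c(5.1e-05, 2.2e-18, 6.8e-33),
                  row.names = c("HALLMARK_ESTROGEN_RESPONSE_EARLY",
                                "HALLMARK_OXIDATIVE_PHOSPHORYLATION",
                                "HALLMARK_INTERFERON_GAMMA_RESPONSE"))

# Put a hypothetical class in front of "data.frame"
class(res) <- c("demoGST", class(res))

# S3 method: summary(res) now dispatches here instead of summary.data.frame
summary.demoGST <- function(object, order_by = "p.value", ...) {
  df <- object
  class(df) <- "data.frame"                    # drop the extra class
  df[order(df[[order_by]]), , drop = FALSE]    # ascending order of the chosen column
}

summary(res)
inherits(res, "data.frame")   # still usable as an ordinary data frame
```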
The results can be saved as a tab-separated values (TSV) file or as an XLS file.
fname <- file.path(tempdir(), "massiveGST_results.tsv")
save_as_tsv(ans, file_name = fname)
fname <- file.path(tempdir(), "massiveGST_results.xls")
save_as_xls(ans, file_name = fname)
The summary method formats the results table and lets the user customize it (row ordering, column selection, formattable output).
summary(ans)[1:10,]
## size actualSize NES logit2NES p.value
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_MITOTIC_SPINDLE 199 187 0.5508 0.2943 1.7e-02
## HALLMARK_BILE_ACID_METABOLISM 112 112 0.5856 0.4991 1.8e-03
## HALLMARK_FATTY_ACID_METABOLISM 158 148 0.5650 0.3772 6.4e-03
## HALLMARK_MTORC1_SIGNALING 200 188 0.5372 0.2152 7.9e-02
## HALLMARK_G2M_CHECKPOINT 200 178 0.5475 0.2748 2.9e-02
## HALLMARK_UV_RESPONSE_DN 144 132 0.5528 0.3059 3.6e-02
## HALLMARK_HEME_METABOLISM 200 180 0.5325 0.1877 1.3e-01
## BH.value B.value relevance
## HALLMARK_ESTROGEN_RESPONSE_EARLY 2.6e-04 2.6e-03 59.0
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 2.8e-17 1.1e-16 59.0
## HALLMARK_ADIPOGENESIS 8.4e-05 7.6e-04 56.0
## HALLMARK_MITOTIC_SPINDLE 4.4e-02 8.3e-01 52.0
## HALLMARK_BILE_ACID_METABOLISM 6.2e-03 8.8e-02 48.0
## HALLMARK_FATTY_ACID_METABOLISM 2.0e-02 3.2e-01 48.0
## HALLMARK_MTORC1_SIGNALING 1.6e-01 1.0e+00 47.0
## HALLMARK_G2M_CHECKPOINT 6.9e-02 1.0e+00 46.5
## HALLMARK_UV_RESPONSE_DN 8.2e-02 1.0e+00 42.0
## HALLMARK_HEME_METABOLISM 2.5e-01 1.0e+00 41.0
For example, the rows can be ordered by a different column:
summary(ans, order_by = "NES")[1:10,]
## size actualSize NES logit2NES p.value
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
## HALLMARK_BILE_ACID_METABOLISM 112 112 0.5856 0.4991 1.8e-03
## HALLMARK_FATTY_ACID_METABOLISM 158 148 0.5650 0.3772 6.4e-03
## HALLMARK_NOTCH_SIGNALING 32 30 0.5572 0.3316 2.8e-01
## HALLMARK_UV_RESPONSE_DN 144 132 0.5528 0.3059 3.6e-02
## HALLMARK_MITOTIC_SPINDLE 199 187 0.5508 0.2943 1.7e-02
## HALLMARK_G2M_CHECKPOINT 200 178 0.5475 0.2748 2.9e-02
## HALLMARK_PROTEIN_SECRETION 96 93 0.5459 0.2656 1.3e-01
## BH.value B.value
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 2.8e-17 1.1e-16
## HALLMARK_ADIPOGENESIS 8.4e-05 7.6e-04
## HALLMARK_ESTROGEN_RESPONSE_EARLY 2.6e-04 2.6e-03
## HALLMARK_BILE_ACID_METABOLISM 6.2e-03 8.8e-02
## HALLMARK_FATTY_ACID_METABOLISM 2.0e-02 3.2e-01
## HALLMARK_NOTCH_SIGNALING 4.2e-01 1.0e+00
## HALLMARK_UV_RESPONSE_DN 8.2e-02 1.0e+00
## HALLMARK_MITOTIC_SPINDLE 4.4e-02 8.3e-01
## HALLMARK_G2M_CHECKPOINT 6.9e-02 1.0e+00
## HALLMARK_PROTEIN_SECRETION 2.4e-01 1.0e+00
summary(ans, order_by = "p.value")[1:10,]
## size actualSize NES logit2NES p.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 200 188 0.2472 -1.6070 6.8e-33
## HALLMARK_INTERFERON_ALPHA_RESPONSE 97 87 0.1688 -2.3000 1.3e-26
## HALLMARK_ALLOGRAFT_REJECTION 200 189 0.2816 -1.3512 4.4e-25
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_INFLAMMATORY_RESPONSE 200 192 0.3517 -0.8825 1.4e-12
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 200 186 0.3554 -0.8592 1.1e-11
## HALLMARK_IL6_JAK_STAT3_SIGNALING 87 83 0.3064 -1.1789 1.1e-09
## HALLMARK_COMPLEMENT 200 190 0.3869 -0.6643 7.8e-08
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
## BH.value B.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 3.4e-31 3.4e-31
## HALLMARK_INTERFERON_ALPHA_RESPONSE 3.3e-25 6.6e-25
## HALLMARK_ALLOGRAFT_REJECTION 7.3e-24 2.2e-23
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 2.8e-17 1.1e-16
## HALLMARK_INFLAMMATORY_RESPONSE 1.4e-11 7.2e-11
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 8.9e-11 5.3e-10
## HALLMARK_IL6_JAK_STAT3_SIGNALING 7.7e-09 5.4e-08
## HALLMARK_COMPLEMENT 4.9e-07 3.9e-06
## HALLMARK_ADIPOGENESIS 8.4e-05 7.6e-04
## HALLMARK_ESTROGEN_RESPONSE_EARLY 2.6e-04 2.6e-03
summary(ans, order_by = "bonferroni")[1:10,]
## size actualSize NES logit2NES p.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 200 188 0.2472 -1.6070 6.8e-33
## HALLMARK_INTERFERON_ALPHA_RESPONSE 97 87 0.1688 -2.3000 1.3e-26
## HALLMARK_ALLOGRAFT_REJECTION 200 189 0.2816 -1.3512 4.4e-25
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_INFLAMMATORY_RESPONSE 200 192 0.3517 -0.8825 1.4e-12
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 200 186 0.3554 -0.8592 1.1e-11
## HALLMARK_IL6_JAK_STAT3_SIGNALING 87 83 0.3064 -1.1789 1.1e-09
## HALLMARK_COMPLEMENT 200 190 0.3869 -0.6643 7.8e-08
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
## BH.value B.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 3.4e-31 3.4e-31
## HALLMARK_INTERFERON_ALPHA_RESPONSE 3.3e-25 6.6e-25
## HALLMARK_ALLOGRAFT_REJECTION 7.3e-24 2.2e-23
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 2.8e-17 1.1e-16
## HALLMARK_INFLAMMATORY_RESPONSE 1.4e-11 7.2e-11
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 8.9e-11 5.3e-10
## HALLMARK_IL6_JAK_STAT3_SIGNALING 7.7e-09 5.4e-08
## HALLMARK_COMPLEMENT 4.9e-07 3.9e-06
## HALLMARK_ADIPOGENESIS 8.4e-05 7.6e-04
## HALLMARK_ESTROGEN_RESPONSE_EARLY 2.6e-04 2.6e-03
(tmp <- summary(ans, order_by = "p.value", cols_to_remove = c("BH.value", "B.value"))[1:10,])
## size actualSize NES logit2NES p.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 200 188 0.2472 -1.6070 6.8e-33
## HALLMARK_INTERFERON_ALPHA_RESPONSE 97 87 0.1688 -2.3000 1.3e-26
## HALLMARK_ALLOGRAFT_REJECTION 200 189 0.2816 -1.3512 4.4e-25
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_INFLAMMATORY_RESPONSE 200 192 0.3517 -0.8825 1.4e-12
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 200 186 0.3554 -0.8592 1.1e-11
## HALLMARK_IL6_JAK_STAT3_SIGNALING 87 83 0.3064 -1.1789 1.1e-09
## HALLMARK_COMPLEMENT 200 190 0.3869 -0.6643 7.8e-08
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
summary(ans, as.formattable = TRUE)
gene-set | size | actualSize | NES | logit2NES | p.value | BH.value | B.value | relevance |
---|---|---|---|---|---|---|---|---|
HALLMARK_ESTROGEN_RESPONSE_EARLY | 200 | 184 | 0.5866 | 0.5051 | 5.1e-05 | 2.6e-04 | 2.6e-03 | 59.0 |
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 200 | 177 | 0.6907 | 1.1591 | 2.2e-18 | 2.8e-17 | 1.1e-16 | 59.0 |
HALLMARK_ADIPOGENESIS | 200 | 176 | 0.5946 | 0.5527 | 1.5e-05 | 8.4e-05 | 7.6e-04 | 56.0 |
HALLMARK_MITOTIC_SPINDLE | 199 | 187 | 0.5508 | 0.2943 | 1.7e-02 | 4.4e-02 | 8.3e-01 | 52.0 |
HALLMARK_BILE_ACID_METABOLISM | 112 | 112 | 0.5856 | 0.4991 | 1.8e-03 | 6.2e-03 | 8.8e-02 | 48.0 |
HALLMARK_FATTY_ACID_METABOLISM | 158 | 148 | 0.5650 | 0.3772 | 6.4e-03 | 2.0e-02 | 3.2e-01 | 48.0 |
HALLMARK_MTORC1_SIGNALING | 200 | 188 | 0.5372 | 0.2152 | 7.9e-02 | 1.6e-01 | 1.0e+00 | 47.0 |
HALLMARK_G2M_CHECKPOINT | 200 | 178 | 0.5475 | 0.2748 | 2.9e-02 | 6.9e-02 | 1.0e+00 | 46.5 |
HALLMARK_UV_RESPONSE_DN | 144 | 132 | 0.5528 | 0.3059 | 3.6e-02 | 8.2e-02 | 1.0e+00 | 42.0 |
HALLMARK_HEME_METABOLISM | 200 | 180 | 0.5325 | 0.1877 | 1.3e-01 | 2.5e-01 | 1.0e+00 | 41.0 |
HALLMARK_MYC_TARGETS_V1 | 200 | 178 | 0.5228 | 0.1319 | 2.9e-01 | 4.3e-01 | 1.0e+00 | 33.5 |
HALLMARK_PROTEIN_SECRETION | 96 | 93 | 0.5459 | 0.2656 | 1.3e-01 | 2.4e-01 | 1.0e+00 | 32.0 |
HALLMARK_NOTCH_SIGNALING | 32 | 30 | 0.5572 | 0.3316 | 2.8e-01 | 4.2e-01 | 1.0e+00 | 29.0 |
HALLMARK_ESTROGEN_RESPONSE_LATE | 200 | 193 | 0.5037 | 0.0215 | 8.6e-01 | 8.9e-01 | 1.0e+00 | 28.0 |
HALLMARK_UNFOLDED_PROTEIN_RESPONSE | 113 | 101 | 0.5285 | 0.1647 | 3.2e-01 | 4.6e-01 | 1.0e+00 | 26.0 |
HALLMARK_E2F_TARGETS | 200 | 171 | 0.5113 | 0.0651 | 6.1e-01 | 6.9e-01 | 1.0e+00 | 22.0 |
HALLMARK_PEROXISOME | 104 | 100 | 0.5181 | 0.1047 | 5.3e-01 | 6.2e-01 | 1.0e+00 | 20.0 |
HALLMARK_WNT_BETA_CATENIN_SIGNALING | 42 | 41 | 0.5308 | 0.1782 | 4.9e-01 | 6.0e-01 | 1.0e+00 | 20.0 |
HALLMARK_ANGIOGENESIS | 36 | 36 | 0.5321 | 0.1856 | 5.0e-01 | 6.0e-01 | 1.0e+00 | 19.0 |
HALLMARK_ANDROGEN_RESPONSE | 100 | 91 | 0.5133 | 0.0765 | 6.6e-01 | 7.2e-01 | 1.0e+00 | 14.0 |
HALLMARK_PI3K_AKT_MTOR_SIGNALING | 105 | 99 | 0.5016 | 0.0090 | 9.6e-01 | 9.8e-01 | 1.0e+00 | 11.0 |
HALLMARK_CHOLESTEROL_HOMEOSTASIS | 74 | 70 | 0.5008 | 0.0049 | 9.8e-01 | 9.8e-01 | 1.0e+00 | 6.0 |
HALLMARK_MYC_TARGETS_V2 | 58 | 50 | 0.4820 | -0.1042 | 6.6e-01 | 7.2e-01 | 1.0e+00 | -9.0 |
HALLMARK_HEDGEHOG_SIGNALING | 36 | 35 | 0.4661 | -0.1958 | 4.9e-01 | 6.0e-01 | 1.0e+00 | -11.0 |
HALLMARK_APICAL_SURFACE | 44 | 41 | 0.4641 | -0.2073 | 4.3e-01 | 5.5e-01 | 1.0e+00 | -16.0 |
HALLMARK_PANCREAS_BETA_CELLS | 40 | 40 | 0.4560 | -0.2544 | 3.4e-01 | 4.6e-01 | 1.0e+00 | -21.0 |
HALLMARK_KRAS_SIGNALING_DN | 200 | 181 | 0.4795 | -0.1181 | 3.4e-01 | 4.6e-01 | 1.0e+00 | -25.5 |
HALLMARK_XENOBIOTIC_METABOLISM | 200 | 189 | 0.4930 | -0.0405 | 7.4e-01 | 7.9e-01 | 1.0e+00 | -25.5 |
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY | 49 | 44 | 0.4501 | -0.2888 | 2.5e-01 | 3.9e-01 | 1.0e+00 | -27.0 |
HALLMARK_TGF_BETA_SIGNALING | 54 | 52 | 0.4511 | -0.2829 | 2.2e-01 | 3.7e-01 | 1.0e+00 | -30.0 |
HALLMARK_DNA_REPAIR | 150 | 134 | 0.4646 | -0.2048 | 1.6e-01 | 2.8e-01 | 1.0e+00 | -31.0 |
HALLMARK_SPERMATOGENESIS | 135 | 123 | 0.4582 | -0.2420 | 1.1e-01 | 2.2e-01 | 1.0e+00 | -32.0 |
HALLMARK_P53_PATHWAY | 200 | 181 | 0.4716 | -0.1640 | 1.9e-01 | 3.2e-01 | 1.0e+00 | -32.5 |
HALLMARK_APICAL_JUNCTION | 200 | 186 | 0.4749 | -0.1452 | 2.4e-01 | 3.8e-01 | 1.0e+00 | -33.0 |
HALLMARK_MYOGENESIS | 200 | 195 | 0.4813 | -0.1082 | 3.7e-01 | 4.8e-01 | 1.0e+00 | -36.0 |
HALLMARK_UV_RESPONSE_UP | 158 | 148 | 0.4567 | -0.2504 | 6.9e-02 | 1.5e-01 | 1.0e+00 | -37.0 |
HALLMARK_KRAS_SIGNALING_UP | 200 | 185 | 0.4523 | -0.2764 | 2.5e-02 | 6.3e-02 | 1.0e+00 | -45.0 |
HALLMARK_COAGULATION | 138 | 132 | 0.4100 | -0.5250 | 3.6e-04 | 1.4e-03 | 1.8e-02 | -50.0 |
HALLMARK_APOPTOSIS | 161 | 157 | 0.4170 | -0.4835 | 3.3e-04 | 1.4e-03 | 1.7e-02 | -52.0 |
HALLMARK_GLYCOLYSIS | 200 | 187 | 0.4489 | -0.2961 | 1.6e-02 | 4.4e-02 | 8.0e-01 | -53.0 |
HALLMARK_HYPOXIA | 200 | 186 | 0.4338 | -0.3842 | 1.9e-03 | 6.2e-03 | 9.3e-02 | -55.0 |
HALLMARK_IL2_STAT5_SIGNALING | 199 | 180 | 0.4164 | -0.4872 | 1.1e-04 | 5.0e-04 | 5.5e-03 | -55.0 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 87 | 83 | 0.3064 | -1.1789 | 1.1e-09 | 7.7e-09 | 5.4e-08 | -55.0 |
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION | 200 | 191 | 0.4449 | -0.3191 | 8.7e-03 | 2.6e-02 | 4.4e-01 | -60.0 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 97 | 87 | 0.1688 | -2.3000 | 1.3e-26 | 3.3e-25 | 6.6e-25 | -63.0 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 200 | 186 | 0.3554 | -0.8592 | 1.1e-11 | 8.9e-11 | 5.3e-10 | -66.0 |
HALLMARK_COMPLEMENT | 200 | 190 | 0.3869 | -0.6643 | 7.8e-08 | 4.9e-07 | 3.9e-06 | -69.0 |
HALLMARK_INFLAMMATORY_RESPONSE | 200 | 192 | 0.3517 | -0.8825 | 1.4e-12 | 1.4e-11 | 7.2e-11 | -76.0 |
HALLMARK_INTERFERON_GAMMA_RESPONSE | 200 | 188 | 0.2472 | -1.6070 | 6.8e-33 | 3.4e-31 | 3.4e-31 | -76.0 |
HALLMARK_ALLOGRAFT_REJECTION | 200 | 189 | 0.2816 | -1.3512 | 4.4e-25 | 7.3e-24 | 2.2e-23 | -76.5 |
summary(ans, order_by = "p.value", cols_to_remove = c("BH.value", "B.value"), as.formattable = TRUE)
gene-set | size | actualSize | NES | logit2NES | p.value |
---|---|---|---|---|---|
HALLMARK_INTERFERON_GAMMA_RESPONSE | 200 | 188 | 0.2472 | -1.6070 | 6.8e-33 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 97 | 87 | 0.1688 | -2.3000 | 1.3e-26 |
HALLMARK_ALLOGRAFT_REJECTION | 200 | 189 | 0.2816 | -1.3512 | 4.4e-25 |
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 200 | 177 | 0.6907 | 1.1591 | 2.2e-18 |
HALLMARK_INFLAMMATORY_RESPONSE | 200 | 192 | 0.3517 | -0.8825 | 1.4e-12 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 200 | 186 | 0.3554 | -0.8592 | 1.1e-11 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 87 | 83 | 0.3064 | -1.1789 | 1.1e-09 |
HALLMARK_COMPLEMENT | 200 | 190 | 0.3869 | -0.6643 | 7.8e-08 |
HALLMARK_ADIPOGENESIS | 200 | 176 | 0.5946 | 0.5527 | 1.5e-05 |
HALLMARK_ESTROGEN_RESPONSE_EARLY | 200 | 184 | 0.5866 | 0.5051 | 5.1e-05 |
HALLMARK_IL2_STAT5_SIGNALING | 199 | 180 | 0.4164 | -0.4872 | 1.1e-04 |
HALLMARK_APOPTOSIS | 161 | 157 | 0.4170 | -0.4835 | 3.3e-04 |
HALLMARK_COAGULATION | 138 | 132 | 0.4100 | -0.5250 | 3.6e-04 |
HALLMARK_BILE_ACID_METABOLISM | 112 | 112 | 0.5856 | 0.4991 | 1.8e-03 |
HALLMARK_HYPOXIA | 200 | 186 | 0.4338 | -0.3842 | 1.9e-03 |
HALLMARK_FATTY_ACID_METABOLISM | 158 | 148 | 0.5650 | 0.3772 | 6.4e-03 |
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION | 200 | 191 | 0.4449 | -0.3191 | 8.7e-03 |
HALLMARK_GLYCOLYSIS | 200 | 187 | 0.4489 | -0.2961 | 1.6e-02 |
HALLMARK_MITOTIC_SPINDLE | 199 | 187 | 0.5508 | 0.2943 | 1.7e-02 |
HALLMARK_KRAS_SIGNALING_UP | 200 | 185 | 0.4523 | -0.2764 | 2.5e-02 |
HALLMARK_G2M_CHECKPOINT | 200 | 178 | 0.5475 | 0.2748 | 2.9e-02 |
HALLMARK_UV_RESPONSE_DN | 144 | 132 | 0.5528 | 0.3059 | 3.6e-02 |
HALLMARK_UV_RESPONSE_UP | 158 | 148 | 0.4567 | -0.2504 | 6.9e-02 |
HALLMARK_MTORC1_SIGNALING | 200 | 188 | 0.5372 | 0.2152 | 7.9e-02 |
HALLMARK_SPERMATOGENESIS | 135 | 123 | 0.4582 | -0.2420 | 1.1e-01 |
HALLMARK_PROTEIN_SECRETION | 96 | 93 | 0.5459 | 0.2656 | 1.3e-01 |
HALLMARK_HEME_METABOLISM | 200 | 180 | 0.5325 | 0.1877 | 1.3e-01 |
HALLMARK_DNA_REPAIR | 150 | 134 | 0.4646 | -0.2048 | 1.6e-01 |
HALLMARK_P53_PATHWAY | 200 | 181 | 0.4716 | -0.1640 | 1.9e-01 |
HALLMARK_TGF_BETA_SIGNALING | 54 | 52 | 0.4511 | -0.2829 | 2.2e-01 |
HALLMARK_APICAL_JUNCTION | 200 | 186 | 0.4749 | -0.1452 | 2.4e-01 |
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY | 49 | 44 | 0.4501 | -0.2888 | 2.5e-01 |
HALLMARK_NOTCH_SIGNALING | 32 | 30 | 0.5572 | 0.3316 | 2.8e-01 |
HALLMARK_MYC_TARGETS_V1 | 200 | 178 | 0.5228 | 0.1319 | 2.9e-01 |
HALLMARK_UNFOLDED_PROTEIN_RESPONSE | 113 | 101 | 0.5285 | 0.1647 | 3.2e-01 |
HALLMARK_PANCREAS_BETA_CELLS | 40 | 40 | 0.4560 | -0.2544 | 3.4e-01 |
HALLMARK_KRAS_SIGNALING_DN | 200 | 181 | 0.4795 | -0.1181 | 3.4e-01 |
HALLMARK_MYOGENESIS | 200 | 195 | 0.4813 | -0.1082 | 3.7e-01 |
HALLMARK_APICAL_SURFACE | 44 | 41 | 0.4641 | -0.2073 | 4.3e-01 |
HALLMARK_HEDGEHOG_SIGNALING | 36 | 35 | 0.4661 | -0.1958 | 4.9e-01 |
HALLMARK_WNT_BETA_CATENIN_SIGNALING | 42 | 41 | 0.5308 | 0.1782 | 4.9e-01 |
HALLMARK_ANGIOGENESIS | 36 | 36 | 0.5321 | 0.1856 | 5.0e-01 |
HALLMARK_PEROXISOME | 104 | 100 | 0.5181 | 0.1047 | 5.3e-01 |
HALLMARK_E2F_TARGETS | 200 | 171 | 0.5113 | 0.0651 | 6.1e-01 |
HALLMARK_MYC_TARGETS_V2 | 58 | 50 | 0.4820 | -0.1042 | 6.6e-01 |
HALLMARK_ANDROGEN_RESPONSE | 100 | 91 | 0.5133 | 0.0765 | 6.6e-01 |
HALLMARK_XENOBIOTIC_METABOLISM | 200 | 189 | 0.4930 | -0.0405 | 7.4e-01 |
HALLMARK_ESTROGEN_RESPONSE_LATE | 200 | 193 | 0.5037 | 0.0215 | 8.6e-01 |
HALLMARK_PI3K_AKT_MTOR_SIGNALING | 105 | 99 | 0.5016 | 0.0090 | 9.6e-01 |
HALLMARK_CHOLESTEROL_HOMEOSTASIS | 74 | 70 | 0.5008 | 0.0049 | 9.8e-01 |
The summary method returns its value invisibly, but that value is a data frame that can be stored and printed:
tmp
## size actualSize NES logit2NES p.value
## HALLMARK_INTERFERON_GAMMA_RESPONSE 200 188 0.2472 -1.6070 6.8e-33
## HALLMARK_INTERFERON_ALPHA_RESPONSE 97 87 0.1688 -2.3000 1.3e-26
## HALLMARK_ALLOGRAFT_REJECTION 200 189 0.2816 -1.3512 4.4e-25
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 200 177 0.6907 1.1591 2.2e-18
## HALLMARK_INFLAMMATORY_RESPONSE 200 192 0.3517 -0.8825 1.4e-12
## HALLMARK_TNFA_SIGNALING_VIA_NFKB 200 186 0.3554 -0.8592 1.1e-11
## HALLMARK_IL6_JAK_STAT3_SIGNALING 87 83 0.3064 -1.1789 1.1e-09
## HALLMARK_COMPLEMENT 200 190 0.3869 -0.6643 7.8e-08
## HALLMARK_ADIPOGENESIS 200 176 0.5946 0.5527 1.5e-05
## HALLMARK_ESTROGEN_RESPONSE_EARLY 200 184 0.5866 0.5051 5.1e-05
Three functions remove rows from the results table as needed; their output can be formatted with the summary method.
The first, cut_by_significance, removes the non-significant gene-sets. By default, a 5% significance level is applied to the BH.values (Benjamini-Hochberg adjustment of the p-values).
summary(cut_by_significance(ans), as.formattable = TRUE)
gene-set | size | actualSize | NES | logit2NES | p.value | BH.value | B.value | relevance |
---|---|---|---|---|---|---|---|---|
HALLMARK_ESTROGEN_RESPONSE_EARLY | 200 | 184 | 0.5866 | 0.5051 | 5.1e-05 | 2.6e-04 | 2.6e-03 | 59.0 |
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 200 | 177 | 0.6907 | 1.1591 | 2.2e-18 | 2.8e-17 | 1.1e-16 | 59.0 |
HALLMARK_ADIPOGENESIS | 200 | 176 | 0.5946 | 0.5527 | 1.5e-05 | 8.4e-05 | 7.6e-04 | 56.0 |
HALLMARK_MITOTIC_SPINDLE | 199 | 187 | 0.5508 | 0.2943 | 1.7e-02 | 4.4e-02 | 8.3e-01 | 52.0 |
HALLMARK_BILE_ACID_METABOLISM | 112 | 112 | 0.5856 | 0.4991 | 1.8e-03 | 6.2e-03 | 8.8e-02 | 48.0 |
HALLMARK_FATTY_ACID_METABOLISM | 158 | 148 | 0.5650 | 0.3772 | 6.4e-03 | 2.0e-02 | 3.2e-01 | 48.0 |
HALLMARK_COAGULATION | 138 | 132 | 0.4100 | -0.5250 | 3.6e-04 | 1.4e-03 | 1.8e-02 | -50.0 |
HALLMARK_APOPTOSIS | 161 | 157 | 0.4170 | -0.4835 | 3.3e-04 | 1.4e-03 | 1.7e-02 | -52.0 |
HALLMARK_GLYCOLYSIS | 200 | 187 | 0.4489 | -0.2961 | 1.6e-02 | 4.4e-02 | 8.0e-01 | -53.0 |
HALLMARK_HYPOXIA | 200 | 186 | 0.4338 | -0.3842 | 1.9e-03 | 6.2e-03 | 9.3e-02 | -55.0 |
HALLMARK_IL2_STAT5_SIGNALING | 199 | 180 | 0.4164 | -0.4872 | 1.1e-04 | 5.0e-04 | 5.5e-03 | -55.0 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 87 | 83 | 0.3064 | -1.1789 | 1.1e-09 | 7.7e-09 | 5.4e-08 | -55.0 |
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION | 200 | 191 | 0.4449 | -0.3191 | 8.7e-03 | 2.6e-02 | 4.4e-01 | -60.0 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 97 | 87 | 0.1688 | -2.3000 | 1.3e-26 | 3.3e-25 | 6.6e-25 | -63.0 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 200 | 186 | 0.3554 | -0.8592 | 1.1e-11 | 8.9e-11 | 5.3e-10 | -66.0 |
HALLMARK_COMPLEMENT | 200 | 190 | 0.3869 | -0.6643 | 7.8e-08 | 4.9e-07 | 3.9e-06 | -69.0 |
HALLMARK_INFLAMMATORY_RESPONSE | 200 | 192 | 0.3517 | -0.8825 | 1.4e-12 | 1.4e-11 | 7.2e-11 | -76.0 |
HALLMARK_INTERFERON_GAMMA_RESPONSE | 200 | 188 | 0.2472 | -1.6070 | 6.8e-33 | 3.4e-31 | 3.4e-31 | -76.0 |
HALLMARK_ALLOGRAFT_REJECTION | 200 | 189 | 0.2816 | -1.3512 | 4.4e-25 | 7.3e-24 | 2.2e-23 | -76.5 |
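The adjusted columns appear consistent with R's standard corrections: with 50 Hallmark gene-sets, for instance, the B.value of HALLMARK_INTERFERON_GAMMA_RESPONSE is 50 × 6.8e-33 = 3.4e-31. The sketch below, on made-up p-values, computes the Bonferroni and Benjamini-Hochberg adjustments by hand, checks them against base R's p.adjust(), and applies a 5% cut. Whether cut_by_significance uses exactly this comparison (for instance < versus ≤) is an assumption.

```r
# Made-up raw p-values for five hypothetical gene-sets
p <- c(gsA = 0.001, gsB = 0.012, gsC = 0.020, gsD = 0.041, gsE = 0.200)
m <- length(p)

# Bonferroni: multiply by the number of tests and cap at 1
bonf <- pmin(1, m * p)

# Benjamini-Hochberg: p * m / rank, then a running minimum from the largest p
# downwards, so adjusted values never decrease as the raw p-value grows
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)            # largest p-value first (rank m)
  adj <- pmin(1, cummin(m / (m:1) * p[o]))    # m / rank * p, made monotone
  adj[order(o)]                               # back to the original order
}
bh <- bh_manual(p)

# Agreement with base R
all.equal(unname(bh), unname(p.adjust(p, method = "BH")))
all.equal(unname(bonf), unname(p.adjust(p, method = "bonferroni")))

# Gene-sets kept at a 5% level on each scale
names(p)[bh < 0.05]     # BH keeps more gene-sets
names(p)[bonf < 0.05]   # Bonferroni is stricter
```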
As a toy example, …
summary(cut_by_significance(ans, level_of_significance = 0.01, where = "bonferroni"),
cols_to_remove = c("BH.value", "NES", "size"),
order_by = "logit2NES",
as.formattable = TRUE)
gene-set | actualSize | logit2NES | p.value | B.value |
---|---|---|---|---|
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 177 | 1.1591 | 2.2e-18 | 1.1e-16 |
HALLMARK_ADIPOGENESIS | 176 | 0.5527 | 1.5e-05 | 7.6e-04 |
HALLMARK_ESTROGEN_RESPONSE_EARLY | 184 | 0.5051 | 5.1e-05 | 2.6e-03 |
HALLMARK_IL2_STAT5_SIGNALING | 180 | -0.4872 | 1.1e-04 | 5.5e-03 |
HALLMARK_COMPLEMENT | 190 | -0.6643 | 7.8e-08 | 3.9e-06 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 186 | -0.8592 | 1.1e-11 | 5.3e-10 |
HALLMARK_INFLAMMATORY_RESPONSE | 192 | -0.8825 | 1.4e-12 | 7.2e-11 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 83 | -1.1789 | 1.1e-09 | 5.4e-08 |
HALLMARK_ALLOGRAFT_REJECTION | 189 | -1.3512 | 4.4e-25 | 2.2e-23 |
HALLMARK_INTERFERON_GAMMA_RESPONSE | 188 | -1.6070 | 6.8e-33 | 3.4e-31 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 87 | -2.3000 | 1.3e-26 | 6.6e-25 |
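The functions cut_by_NES and cut_by_logit2NES remove the rows whose association, in either direction, is weaker than a given NES/logit2NES threshold (in the tables below, gene-sets with a strongly negative logit2NES are kept). The two are equivalent, since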
\[logit2NES = \log_2 \frac{NES}{1-NES},\] and back \[NES = \frac{2^{logit2NES}}{1+2^{logit2NES}}.\]
The advantage of representing the Normalized Enrichment Score (\(NES\)) as \(logit2NES\) is that the latter is signed: a positive value means association with the genes of the treatment group, while a negative value signals association with the control samples. The default thresholds are 0.6 for \(NES\) and 0.58 for \(logit2NES\). Both correspond to odds of 1.5 (\(0.6/0.4 = 1.5\), \(\log_2 1.5 \approx 0.585\)): the probability that the gene-set is associated with the treatment group is 1.5 times the probability that it is associated with the control group.
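The two formulas can be checked numerically. The sketch below implements both conversions in base R and applies them to four significant Hallmark gene-sets whose NES values appear in the tables. In the tables below, the default cut keeps HALLMARK_COMPLEMENT (logit2NES -0.66) but drops HALLMARK_ADIPOGENESIS (0.55) and HALLMARK_COAGULATION (-0.53), which suggests the threshold is applied to the absolute value; the sketch adopts that reading as an assumption.

```r
# Conversions between NES and logit2NES, following the two formulas above
nes_to_logit2 <- function(nes) log2(nes / (1 - nes))
logit2_to_nes <- function(l2) 2^l2 / (1 + 2^l2)

# The default NES threshold 0.6 corresponds to odds of 1.5, i.e. logit2NES of about 0.585
nes_to_logit2(0.6)

# NES values of four significant Hallmark gene-sets, copied from the tables
nes <- c(HALLMARK_OXIDATIVE_PHOSPHORYLATION = 0.6907,
         HALLMARK_ADIPOGENESIS = 0.5946,
         HALLMARK_COAGULATION = 0.4100,
         HALLMARK_COMPLEMENT = 0.3869)
l2 <- nes_to_logit2(nes)
round(l2, 4)

# Assumed symmetric rule: keep gene-sets whose |logit2NES| reaches the threshold,
# so strong associations with either group survive
kept <- names(l2)[abs(l2) >= 0.58]
kept
```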
Trimming the table of results by \(NES\)/\(logit2NES\) means giving more weight to the descriptive interpretation of the NES as a measure of the strength of association. Given a gene-set, summarized as a single value (the average of the ranks of its genes), the \(NES\) is the percentile rank of that value within the universe of genes outside the gene-set.
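One reading of this definition, assumed here rather than taken from the package source, is the Mann-Whitney statistic scaled to [0, 1]: the fraction of (in-set, out-of-set) gene pairs in which the in-set gene has the higher profile value. The sketch computes it from the average rank of the gene-set genes on a simulated profile and can be checked against a brute-force pair count and base R's wilcox.test(). It also takes actualSize to be the number of gene-set genes present in the profile, which is likewise an assumption.

```r
set.seed(1)
# Simulated gene-profile: 200 genes with random values (toy data)
profile <- setNames(rnorm(200), paste0("G", 1:200))
# Make the first 20 genes lean towards the treatment group
profile[1:20] <- profile[1:20] + 1

# A toy gene-set; one member is absent from the profile
gene_set <- c(paste0("G", 1:20), "NOT_IN_PROFILE")
in_set <- names(profile) %in% gene_set
actual_size <- sum(in_set)            # assumed meaning of 'actualSize'

n_in <- sum(in_set)
n_out <- sum(!in_set)
r <- rank(profile)                    # rank 1 = most negative value
mean_rank_in <- mean(r[in_set])       # the gene-set summarised as one average rank

# Mann-Whitney U from the rank sum, scaled by the number of (in, out) pairs.
# Equivalently: (mean_rank_in - (n_in + 1) / 2) / n_out, a percentile rank
# of the average in-set rank among the genes outside the set.
u <- n_in * mean_rank_in - n_in * (n_in + 1) / 2
nes <- u / (n_in * n_out)
nes
```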
tmp <- cut_by_significance(ans)
summary(cut_by_logit2NES(tmp), as.formattable = TRUE, order_by = "NES")
gene-set | size | actualSize | NES | logit2NES | p.value | BH.value | B.value |
---|---|---|---|---|---|---|---|
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 200 | 177 | 0.6907 | 1.1591 | 2.2e-18 | 2.8e-17 | 1.1e-16 |
HALLMARK_COMPLEMENT | 200 | 190 | 0.3869 | -0.6643 | 7.8e-08 | 4.9e-07 | 3.9e-06 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 200 | 186 | 0.3554 | -0.8592 | 1.1e-11 | 8.9e-11 | 5.3e-10 |
HALLMARK_INFLAMMATORY_RESPONSE | 200 | 192 | 0.3517 | -0.8825 | 1.4e-12 | 1.4e-11 | 7.2e-11 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 87 | 83 | 0.3064 | -1.1789 | 1.1e-09 | 7.7e-09 | 5.4e-08 |
HALLMARK_ALLOGRAFT_REJECTION | 200 | 189 | 0.2816 | -1.3512 | 4.4e-25 | 7.3e-24 | 2.2e-23 |
HALLMARK_INTERFERON_GAMMA_RESPONSE | 200 | 188 | 0.2472 | -1.6070 | 6.8e-33 | 3.4e-31 | 3.4e-31 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 97 | 87 | 0.1688 | -2.3000 | 1.3e-26 | 3.3e-25 | 6.6e-25 |
summary(cut_by_NES(tmp), as.formattable = TRUE, order_by = "NES")
gene-set | size | actualSize | NES | logit2NES | p.value | BH.value | B.value |
---|---|---|---|---|---|---|---|
HALLMARK_OXIDATIVE_PHOSPHORYLATION | 200 | 177 | 0.6907 | 1.1591 | 2.2e-18 | 2.8e-17 | 1.1e-16 |
HALLMARK_COMPLEMENT | 200 | 190 | 0.3869 | -0.6643 | 7.8e-08 | 4.9e-07 | 3.9e-06 |
HALLMARK_TNFA_SIGNALING_VIA_NFKB | 200 | 186 | 0.3554 | -0.8592 | 1.1e-11 | 8.9e-11 | 5.3e-10 |
HALLMARK_INFLAMMATORY_RESPONSE | 200 | 192 | 0.3517 | -0.8825 | 1.4e-12 | 1.4e-11 | 7.2e-11 |
HALLMARK_IL6_JAK_STAT3_SIGNALING | 87 | 83 | 0.3064 | -1.1789 | 1.1e-09 | 7.7e-09 | 5.4e-08 |
HALLMARK_ALLOGRAFT_REJECTION | 200 | 189 | 0.2816 | -1.3512 | 4.4e-25 | 7.3e-24 | 2.2e-23 |
HALLMARK_INTERFERON_GAMMA_RESPONSE | 200 | 188 | 0.2472 | -1.6070 | 6.8e-33 | 3.4e-31 | 3.4e-31 |
HALLMARK_INTERFERON_ALPHA_RESPONSE | 97 | 87 | 0.1688 | -2.3000 | 1.3e-26 | 3.3e-25 | 6.6e-25 |
The results can be displayed in two ways: as a bar plot or as a network graph. In both cases, the corresponding function renders the whole input table, so the table usually has to be trimmed beforehand by significance or NES.
plot(ans)
A more meaningful display uses only the significant gene-sets:
plot(cut_by_significance(ans), top = 30)
Here, the maximum number of bars has been restricted to 30.
The horizontal axis (signed-NES) is a linear transformation of the NES, \[\mbox{signed-NES} = 2\cdot NES - 1,\] needed to a) show the direction of the association and b) bound the bars between -1.0 and 1.0.
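The sketch below applies this transformation to six NES values copied from the tables and draws a horizontal bar plot with base R graphics (to a null device), keeping the `top` bars with the largest absolute signed-NES. It only approximates the idea behind plot(); how the package actually selects, orders and colors the bars is not assumed here.

```r
# NES values of six gene-sets, copied from the tables
nes <- c(HALLMARK_OXIDATIVE_PHOSPHORYLATION = 0.6907,
         HALLMARK_ADIPOGENESIS = 0.5946,
         HALLMARK_ESTROGEN_RESPONSE_EARLY = 0.5866,
         HALLMARK_ALLOGRAFT_REJECTION = 0.2816,
         HALLMARK_INTERFERON_GAMMA_RESPONSE = 0.2472,
         HALLMARK_INTERFERON_ALPHA_RESPONSE = 0.1688)

# Linear map from [0, 1] to [-1, 1]: NES = 0.5 (no association) becomes 0
signed_nes <- 2 * nes - 1

# Keep at most 'top' bars, choosing the largest absolute values (illustrative rule)
top <- 4
shown <- head(signed_nes[order(abs(signed_nes), decreasing = TRUE)], top)

# Horizontal bars on a fixed [-1, 1] axis, drawn to a null device
pdf(NULL)
barplot(sort(shown), horiz = TRUE, xlim = c(-1, 1), las = 1,
        col = "grey", xlab = "signed-NES")
invisible(dev.off())
```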
The network plot requires ‘as.network = TRUE’ and the gene-set collection, passed as ‘gene_sets’.
plot(cut_by_significance(ans), gene_sets = geneSets, as.network = TRUE)
The similarity \(S(A, B)\) between two gene-sets \(A\) and \(B\) is the convex combination \[S(A,B)= \epsilon \cdot \delta_1(A, B)+ (1-\epsilon)\cdot \delta_0(A, B),\] with \(0\leq \epsilon \leq 1\), where \(\delta_0\) is the Jaccard similarity and \(\delta_1\) is the overlap index. \(logit2NES\) controls the color of the nodes: red for positive values (association with the treatment), green otherwise. \(actualSize\) controls the size of the nodes.
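The similarity above can be written directly in base R. In the sketch below, the standard definitions are used: Jaccard similarity as shared genes over the union, and overlap index as shared genes over the size of the smaller set. The value of \(\epsilon\) used by the package is not stated here, so it is left as a parameter; the function names are hypothetical.

```r
# Jaccard similarity (delta_0): shared genes over all genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
# Overlap index (delta_1): shared genes over the size of the smaller set
overlap_index <- function(a, b) length(intersect(a, b)) / min(length(a), length(b))

# S(A, B) = epsilon * delta_1 + (1 - epsilon) * delta_0, with 0 <= epsilon <= 1
gs_similarity <- function(a, b, epsilon) {
  stopifnot(epsilon >= 0, epsilon <= 1)
  epsilon * overlap_index(a, b) + (1 - epsilon) * jaccard(a, b)
}

A <- c("g1", "g2", "g3", "g4")
B <- c("g3", "g4", "g5")
gs_similarity(A, B, epsilon = 0.5)   # 0.5 * 2/3 + 0.5 * 2/5
```

The overlap index reaches 1 whenever the smaller set is contained in the larger one, while the Jaccard similarity also penalizes the difference in size; \(\epsilon\) weighs the two views.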
Here we show a more realistic analysis, based on a larger collection of gene-sets.
The gene-profile included in the package strictly follows the treatment-versus-control logic: 9 FGFR3-TACC3 fusion-positive samples were compared to 535 other samples. Since the interest is in the treatment group, the analysis uses the alternative hypothesis “greater”.
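In the outputs of this vignette, the p-value of HALLMARK_OXIDATIVE_PHOSPHORYLATION, a treatment-associated set, goes from 2.2e-18 (two-sided) to 1.1e-18 (“greater”), which is what a symmetric test statistic gives. Using base R's wilcox.test() as a stand-in for the package's test (an assumption, not the massiveGST code), the sketch shows that the one-sided “greater” test halves the p-value of an up-shifted set and gives a large p-value to a down-shifted one, so control-associated sets are not reported.

```r
set.seed(2)
outside <- rnorm(180)          # profile values of genes outside the set (toy data)
up_set <- rnorm(20) + 1        # set leaning towards the treatment group
down_set <- rnorm(20) - 1      # set leaning towards the control group

# Normal approximation (exact = FALSE) with the default continuity correction
p_two <- wilcox.test(up_set, outside, alternative = "two.sided", exact = FALSE)$p.value
p_greater <- wilcox.test(up_set, outside, alternative = "greater", exact = FALSE)$p.value
p_greater_down <- wilcox.test(down_set, outside, alternative = "greater", exact = FALSE)$p.value

c(two.sided = p_two, greater = p_greater, greater_for_down_set = p_greater_down)
```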
system.time({C5BP_gs <- get_geneSets_from_msigdbr(category = "C5",
subcategory = "BP",
what = "gene_symbol")})
## msigdbr: R package version 7.5.1
## user system elapsed
## 38.386 0.114 38.741
system.time({C5MF_gs <- get_geneSets_from_msigdbr(category = "C5",
subcategory = "MF",
what = "gene_symbol")})
## msigdbr: R package version 7.5.1
## user system elapsed
## 1.653 0.052 1.705
system.time({C5CC_gs <- get_geneSets_from_msigdbr(category = "C5",
subcategory = "CC",
what = "gene_symbol")})
## msigdbr: R package version 7.5.1
## user system elapsed
## 1.072 0.053 1.124
system.time({H_gs <- get_geneSets_from_msigdbr(category = "H",
what = "gene_symbol")})
## msigdbr: R package version 7.5.1
## user system elapsed
## 0.108 0.024 0.132
# merging gene-sets collections
geneSets <- c(C5MF_gs, C5CC_gs, C5BP_gs, H_gs)
length(geneSets)
## [1] 10452
# running the analysis
system.time({ans <- massiveGST(geneProfile, geneSets,
alternative = "greater")})
## user system elapsed
## 6.842 0.037 6.880
# removing non significant results
ans <- cut_by_significance(ans,
                           level_of_significance = 0.05,
                           where = "bonferroni")
# Tabular results
summary(ans, as.formattable = TRUE, cols_to_remove = "BH.value")
gene-set | collection | size | actualSize | NES | logit2NES | p.value | B.value | relevance |
---|---|---|---|---|---|---|---|---|
HALLMARK_OXIDATIVE_PHOSPHORYLATION | H | 200 | 177 | 0.6907 | 1.1591 | 1.1e-18 | 1.2e-14 | 14877.5 |
GOBP_AEROBIC_RESPIRATION | C5 BP | 187 | 132 | 0.6906 | 1.1586 | 2.0e-14 | 2.1e-10 | 14717.0 |
GOCC_INNER_MITOCHONDRIAL_MEMBRANE_PROTEIN_COMPLEX | C5 CC | 155 | 96 | 0.7345 | 1.4683 | 1.0e-15 | 1.1e-11 | 14710.0 |
GOCC_MITOCHONDRIAL_PROTEIN_CONTAINING_COMPLEX | C5 CC | 281 | 212 | 0.6502 | 0.8946 | 2.5e-14 | 2.6e-10 | 14624.0 |
GOBP_RESPIRATORY_ELECTRON_TRANSPORT_CHAIN | C5 BP | 113 | 85 | 0.7199 | 1.3616 | 1.2e-12 | 1.3e-08 | 14579.0 |
GOCC_RESPIRASOME | C5 CC | 101 | 77 | 0.7332 | 1.4585 | 7.6e-13 | 7.9e-09 | 14556.5 |
GOBP_CELLULAR_RESPIRATION | C5 BP | 231 | 169 | 0.6508 | 0.8981 | 7.0e-12 | 7.3e-08 | 14532.5 |
GOBP_ATP_SYNTHESIS_COUPLED_ELECTRON_TRANSPORT | C5 BP | 92 | 67 | 0.7495 | 1.5809 | 8.3e-13 | 8.7e-09 | 14500.0 |
GOBP_OXIDATIVE_PHOSPHORYLATION | C5 BP | 139 | 88 | 0.6928 | 1.1731 | 2.1e-10 | 2.2e-06 | 14475.0 |
GOCC_MITOCHONDRIAL_MATRIX | C5 CC | 473 | 394 | 0.6132 | 0.6649 | 6.9e-15 | 7.2e-11 | 14309.5 |
GOCC_OXIDOREDUCTASE_COMPLEX | C5 CC | 120 | 102 | 0.6562 | 0.9325 | 2.5e-08 | 2.6e-04 | 14287.0 |
GOCC_ORGANELLE_INNER_MEMBRANE | C5 CC | 551 | 412 | 0.6117 | 0.6556 | 4.2e-15 | 4.4e-11 | 14284.0 |
GOMF_OXIDOREDUCTION_DRIVEN_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY | C5 MF | 72 | 57 | 0.7193 | 1.3577 | 5.1e-09 | 5.3e-05 | 14275.0 |
GOBP_ENERGY_DERIVATION_BY_OXIDATION_OF_ORGANIC_COMPOUNDS | C5 BP | 321 | 242 | 0.6177 | 0.6922 | 1.5e-10 | 1.6e-06 | 14241.0 |
GOBP_MITOCHONDRIAL_ELECTRON_TRANSPORT_NADH_TO_UBIQUINONE | C5 BP | 51 | 42 | 0.7660 | 1.7110 | 1.2e-09 | 1.3e-05 | 14140.5 |
GOBP_MITOCHONDRIAL_RESPIRATORY_CHAIN_COMPLEX_ASSEMBLY | C5 BP | 94 | 57 | 0.6796 | 1.0845 | 1.4e-06 | 1.4e-02 | 14063.0 |
GOCC_MITOCHONDRIAL_ENVELOPE | C5 CC | 783 | 602 | 0.5950 | 0.5552 | 1.0e-15 | 1.1e-11 | 14058.5 |
GOMF_OXIDOREDUCTASE_ACTIVITY_ACTING_ON_NAD_P_H_QUINONE_OR_SIMILAR_COMPOUND_AS_ACCEPTOR | C5 MF | 57 | 49 | 0.6950 | 1.1880 | 1.2e-06 | 1.2e-02 | 14051.5 |
GOMF_ELECTRON_TRANSFER_ACTIVITY | C5 MF | 125 | 106 | 0.6327 | 0.7844 | 1.2e-06 | 1.2e-02 | 14044.0 |
GOMF_NADH_DEHYDROGENASE_ACTIVITY | C5 MF | 45 | 38 | 0.7665 | 1.7146 | 6.6e-09 | 6.9e-05 | 14036.5 |
GOBP_FATTY_ACID_BETA_OXIDATION | C5 BP | 76 | 66 | 0.6582 | 0.9456 | 4.4e-06 | 4.6e-02 | 14003.5 |
GOCC_NADH_DEHYDROGENASE_COMPLEX | C5 CC | 49 | 40 | 0.7211 | 1.3704 | 6.5e-07 | 6.8e-03 | 13972.0 |
GOCC_MITOCHONDRION | C5 CC | 1648 | 1278 | 0.5825 | 0.4806 | 3.6e-23 | 3.7e-19 | 13906.5 |
GOBP_ELECTRON_TRANSPORT_CHAIN | C5 BP | 167 | 135 | 0.6147 | 0.6737 | 2.1e-06 | 2.2e-02 | 13899.5 |
GOBP_NADH_DEHYDROGENASE_COMPLEX_ASSEMBLY | C5 BP | 58 | 40 | 0.7026 | 1.2406 | 4.6e-06 | 4.8e-02 | 13875.0 |
GOBP_TRICARBOXYLIC_ACID_CYCLE | C5 BP | 31 | 29 | 0.8451 | 2.4477 | 6.3e-11 | 6.6e-07 | 13847.0 |
GOMF_PRIMARY_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY | C5 MF | 179 | 163 | 0.6049 | 0.6143 | 2.0e-06 | 2.0e-02 | 13827.0 |
GOBP_ATP_METABOLIC_PROCESS | C5 BP | 273 | 203 | 0.5971 | 0.5677 | 9.4e-07 | 9.8e-03 | 13775.0 |
GOBP_ACETYL_COA_METABOLIC_PROCESS | C5 BP | 33 | 31 | 0.7381 | 1.4944 | 2.2e-06 | 2.3e-02 | 13758.0 |
GOBP_ORGANIC_ACID_CATABOLIC_PROCESS | C5 BP | 243 | 211 | 0.5907 | 0.5295 | 2.8e-06 | 3.0e-02 | 13666.0 |
GOBP_VASCULAR_PROCESS_IN_CIRCULATORY_SYSTEM | C5 BP | 263 | 245 | 0.5863 | 0.5029 | 1.7e-06 | 1.8e-02 | 13623.0 |
GOBP_INNER_MITOCHONDRIAL_MEMBRANE_ORGANIZATION | C5 BP | 40 | 27 | 0.7526 | 1.6047 | 2.8e-06 | 2.9e-02 | 13616.0 |
GOCC_ENVELOPE | C5 CC | 1237 | 966 | 0.5687 | 0.3987 | 3.3e-13 | 3.4e-09 | 13522.5 |
GOBP_GENERATION_OF_PRECURSOR_METABOLITES_AND_ENERGY | C5 BP | 494 | 394 | 0.5720 | 0.4182 | 5.0e-07 | 5.2e-03 | 13462.5 |
GOMF_ACTIVE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY | C5 MF | 406 | 371 | 0.5725 | 0.4214 | 8.4e-07 | 8.8e-03 | 13449.0 |
GOBP_MITOCHONDRION_ORGANIZATION | C5 BP | 518 | 390 | 0.5711 | 0.4132 | 7.5e-07 | 7.8e-03 | 13430.0 |
GOMF_TRANSPORTER_ACTIVITY | C5 MF | 1235 | 1048 | 0.5448 | 0.2594 | 5.4e-07 | 5.6e-03 | 12720.0 |
GOBP_INTRACELLULAR_TRANSPORT | C5 BP | 1546 | 1264 | 0.5386 | 0.2230 | 2.3e-06 | 2.4e-02 | 12434.0 |
GOCC_CATALYTIC_COMPLEX | C5 CC | 1430 | 1162 | 0.5389 | 0.2247 | 4.6e-06 | 4.8e-02 | 12432.0 |
# Saving the table
fname <- file.path(tempdir(), "massiveGST_results.tsv")
save_as_xls(ans, file_name = fname)
# Inspecting the network of gene-sets
plot(ans, gene_sets = geneSets, as.network = TRUE)
Since version 1.2, a new function, massiveORT, implements the over-representation test (Fisher’s exact test). It is integrated with both the tabular and the graphical functions of this package.
Getting the gene sets and the gene-profile
geneSets <- get_geneSets_from_msigdbr(category = "C5", subcategory = "CC", what = "gene_symbol")
## msigdbr: R package version 7.5.1
fname <- system.file("extdata", package="massiveGST")
fname <- file.path(fname, "pre_ranked_list.txt")
geneProfile <- get_geneProfile(fname)
To mimic a set of significant genes coming from a differential expression procedure, we take the first 500 genes of the pre-ranked list, i.e. the genes most up-regulated in the treatment group,
geneList <- names(head(geneProfile, 500))
and run the enrichment analysis
ans <- massiveORT(geneList, geneSets)
## greater
The function massiveORT is essentially a wrapper around fisher.test that 1) arranges the input and feeds it to fisher.test for each gene-set in turn, 2) arranges the output into a data frame compatible with the other functions of the package, and 3) computes the universe of genes for the analysis.
By default, the universe of genes required by Fisher’s test is the collection of genes included at least once in any gene-set. A different universe can be supplied; see the help file (?massiveORT).
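The three steps above can be sketched in a few lines of base R. The function `toy_ort` below is hypothetical and is not the massiveGST implementation; it assumes that gene-sets are stored as a named list of character vectors and that the test is one-sided (`alternative = "greater"`), which is what the `## greater` message printed by massiveORT suggests. It also illustrates why the effective geneList can be shorter than the input: genes outside the universe are dropped, which is presumably why the geneList size in the table below is 344 rather than 500.

```r
# Hypothetical sketch, NOT the massiveGST code: an over-representation
# analysis that runs stats::fisher.test() once per gene-set.
toy_ort <- function(gene_list, gene_sets,
                    universe = unique(unlist(gene_sets, use.names = FALSE))) {
  # Step 3: by default the universe is every gene found in at least one set;
  # genes of the list outside the universe cannot enter any 2x2 table
  gene_list <- intersect(gene_list, universe)

  # Step 1: build a 2x2 table and call fisher.test() for each gene-set
  rows <- lapply(names(gene_sets), function(set_name) {
    gene_set <- intersect(gene_sets[[set_name]], universe)
    n11 <- length(intersect(gene_list, gene_set))  # in list, in set
    n12 <- length(gene_list) - n11                 # in list, not in set
    n21 <- length(gene_set) - n11                  # not in list, in set
    n22 <- length(universe) - n11 - n12 - n21      # in neither
    tab <- matrix(c(n11, n12, n21, n22), nrow = 2, byrow = TRUE)
    ft <- fisher.test(tab, alternative = "greater")

    # Step 2: one data-frame row per gene-set
    data.frame(gene_set      = set_name,
               universe_size = length(universe),
               geneList_size = length(gene_list),
               geneSet_size  = length(gene_set),
               overlap       = n11,
               odds_ratio    = unname(ft$estimate),
               p.value       = ft$p.value)
  })
  do.call(rbind, rows)
}

# Toy input (hypothetical gene names)
toy_sets <- list(SET_A = c("g1", "g2", "g3", "g4"),
                 SET_B = c("g5", "g6", "g7", "g8"),
                 SET_C = c("g1", "g5", "g9", "g10"))
toy_list <- c("g1", "g2", "g3", "g11")  # g11 is in no gene-set
res <- toy_ort(toy_list, toy_sets)
res
```

With this toy input the universe has 10 genes and only three of the four listed genes enter the test, because g11 belongs to no gene-set. SET_A contains all three of them and gets the smallest p-value, the hypergeometric tail probability choose(4, 3) / choose(10, 3) = 4/120.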
The tabular result comes from
summary(cut_by_significance(ans), as.formattable = TRUE)
gene-set | collection | universe size | geneList size | geneSet size | geneList in geneSet | odds_ratio | log2 odds_ratio | p.value | BH.value | B.value | relevance |
---|---|---|---|---|---|---|---|---|---|---|---|
GOCC_OXIDOREDUCTASE_COMPLEX | C5 CC | 14332 | 344 | 120 | 17 | 7.0050 | 2.8084 | 4.0e-09 | 2.8e-06 | 4.0e-06 | 752.0 |
GOCC_RESPIRASOME | C5 CC | 14332 | 344 | 101 | 14 | 6.7762 | 2.7605 | 1.3e-07 | 2.5e-05 | 1.3e-04 | 732.0 |
GOCC_INNER_MITOCHONDRIAL_MEMBRANE_PROTEIN_COMPLEX | C5 CC | 14332 | 344 | 155 | 16 | 4.8595 | 2.2808 | 9.9e-07 | 1.4e-04 | 1.0e-03 | 728.0 |
GOCC_ORGANELLE_INNER_MEMBRANE | C5 CC | 14332 | 344 | 551 | 36 | 3.0574 | 1.6123 | 4.9e-08 | 1.2e-05 | 4.9e-05 | 712.0 |
GOCC_MITOCHONDRIAL_PROTEIN_CONTAINING_COMPLEX | C5 CC | 14332 | 344 | 281 | 21 | 3.4323 | 1.7792 | 4.4e-06 | 4.9e-04 | 4.4e-03 | 709.0 |
GOCC_MITOCHONDRIAL_ENVELOPE | C5 CC | 14332 | 344 | 783 | 47 | 2.8490 | 1.5105 | 5.5e-09 | 2.8e-06 | 5.5e-06 | 707.0 |
GOCC_MITOCHONDRION | C5 CC | 14332 | 344 | 1648 | 75 | 2.2004 | 1.1377 | 2.6e-08 | 8.6e-06 | 2.6e-05 | 689.0 |
GOCC_NADH_DEHYDROGENASE_COMPLEX | C5 CC | 14332 | 344 | 49 | 7 | 6.8945 | 2.7854 | 1.6e-04 | 1.0e-02 | 1.6e-01 | 688.0 |
GOCC_GLIAL_CELL_PROJECTION | C5 CC | 14332 | 344 | 33 | 7 | 11.1480 | 3.4787 | 1.1e-05 | 1.1e-03 | 1.1e-02 | 685.0 |
GOCC_INTRINSIC_COMPONENT_OF_PRESYNAPTIC_MEMBRANE | C5 CC | 14332 | 344 | 73 | 8 | 5.0986 | 2.3501 | 3.5e-04 | 2.1e-02 | 3.5e-01 | 684.5 |
GOCC_PLASMA_MEMBRANE_REGION | C5 CC | 14332 | 344 | 1233 | 59 | 2.2594 | 1.1759 | 2.1e-07 | 3.4e-05 | 2.1e-04 | 683.0 |
GOCC_INTRINSIC_COMPONENT_OF_SYNAPTIC_MEMBRANE | C5 CC | 14332 | 344 | 159 | 12 | 3.4027 | 1.7667 | 4.6e-04 | 2.4e-02 | 4.6e-01 | 681.0 |
GOCC_LEADING_EDGE_MEMBRANE | C5 CC | 14332 | 344 | 176 | 13 | 3.3306 | 1.7358 | 3.3e-04 | 2.1e-02 | 3.3e-01 | 678.0 |
GOCC_ENVELOPE | C5 CC | 14332 | 344 | 1237 | 57 | 2.1556 | 1.1081 | 1.2e-06 | 1.5e-04 | 1.2e-03 | 676.0 |
GOCC_MITOCHONDRIAL_MATRIX | C5 CC | 14332 | 344 | 473 | 27 | 2.5859 | 1.3707 | 2.9e-05 | 2.7e-03 | 2.9e-02 | 676.0 |
GOCC_CELL_PROJECTION_MEMBRANE | C5 CC | 14332 | 344 | 340 | 21 | 2.7855 | 1.4780 | 7.6e-05 | 5.8e-03 | 7.6e-02 | 674.0 |
GOCC_BASOLATERAL_PLASMA_MEMBRANE | C5 CC | 14332 | 344 | 228 | 15 | 2.9481 | 1.5598 | 4.1e-04 | 2.3e-02 | 4.1e-01 | 664.5 |
GOCC_BASAL_PART_OF_CELL | C5 CC | 14332 | 344 | 271 | 16 | 2.6268 | 1.3933 | 8.7e-04 | 4.2e-02 | 8.8e-01 | 649.5 |
GOCC_CELL_LEADING_EDGE | C5 CC | 14332 | 344 | 416 | 21 | 2.2372 | 1.1617 | 1.1e-03 | 4.9e-02 | 1.0e+00 | 644.0 |
GOCC_ASTROCYTE_PROJECTION | C5 CC | 14332 | 344 | 19 | 5 | 14.7062 | 3.8784 | 6.8e-05 | 5.7e-03 | 6.9e-02 | 641.0 |
GOCC_TRICARBOXYLIC_ACID_CYCLE_ENZYME_COMPLEX | C5 CC | 14332 | 344 | 16 | 4 | 13.6904 | 3.7751 | 4.7e-04 | 2.4e-02 | 4.8e-01 | 621.0 |
GOCC_DIHYDROLIPOYL_DEHYDROGENASE_COMPLEX | C5 CC | 14332 | 344 | 9 | 3 | 20.4829 | 4.3563 | 1.0e-03 | 4.7e-02 | 1.0e+00 | 574.5 |
GOCC_OXOGLUTARATE_DEHYDROGENASE_COMPLEX | C5 CC | 14332 | 344 | 5 | 3 | 61.4033 | 5.9402 | 1.3e-04 | 9.5e-03 | 1.3e-01 | 555.5 |
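As a sanity check, one row of the table can be recomputed by hand with base R's fisher.test() from the four counts it reports. This is a sketch under assumptions, not the package's code: it assumes a one-sided test (`alternative = "greater"`) and that odds_ratio is fisher.test's conditional estimate. Under those assumptions we would expect values close to the odds_ratio (7.0050) and p.value (4.0e-09) printed for GOCC_OXIDOREDUCTASE_COMPLEX. The last lines illustrate p.adjust() on three of the raw p-values; the BH.value and B.value columns look consistent with Benjamini-Hochberg and Bonferroni adjustments over all tested gene-sets (not only the three used here), so these toy numbers will not match the table.

```r
# Counts copied from the GOCC_OXIDOREDUCTASE_COMPLEX row of the table above
universe_size <- 14332
list_size     <- 344
set_size      <- 120
overlap       <- 17

# 2x2 table: rows = in / not in geneList, columns = in / not in geneSet
tab <- matrix(c(overlap,            list_size - overlap,
                set_size - overlap, universe_size - list_size - set_size + overlap),
              nrow = 2, byrow = TRUE)

ft <- fisher.test(tab, alternative = "greater")
ft$estimate        # conditional maximum-likelihood estimate of the odds ratio
log2(ft$estimate)
ft$p.value

# Adjusted p-values, ASSUMING BH.value = Benjamini-Hochberg and
# B.value = Bonferroni
p_raw <- c(4.0e-09, 5.5e-09, 1.1e-03)
p.adjust(p_raw, method = "BH")          # values 8.25e-09, 8.25e-09, 1.1e-03
p.adjust(p_raw, method = "bonferroni")  # values 1.2e-08, 1.65e-08, 3.3e-03
```

With Benjamini-Hochberg the two smallest p-values receive the same adjusted value; the same happens in the table, where GOCC_OXIDOREDUCTASE_COMPLEX and GOCC_MITOCHONDRIAL_ENVELOPE both show a BH.value of 2.8e-06.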
The graphical output is from
plot(cut_by_significance(ans))
and
plot(cut_by_significance(ans), gene_sets = geneSets, as.network = TRUE)
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 20.3
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=it_IT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] massiveGST_1.2.3 visNetwork_2.1.0 igraph_1.3.2 WriteXLS_6.4.0
## [5] msigdbr_7.5.1 formattable_0.2.1
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 pillar_1.7.0 bslib_0.3.1 compiler_4.2.3
## [5] jquerylib_0.1.4 tools_4.2.3 digest_0.6.29 jsonlite_1.8.0
## [9] evaluate_0.15 lifecycle_1.0.1 tibble_3.1.7 pkgconfig_2.0.3
## [13] rlang_1.0.3 cli_3.6.0 DBI_1.1.3 rstudioapi_0.13
## [17] yaml_2.3.5 xfun_0.31 fastmap_1.1.0 stringr_1.4.0
## [21] dplyr_1.0.9 knitr_1.39 generics_0.1.2 htmlwidgets_1.5.4
## [25] sass_0.4.1 vctrs_0.4.1 tidyselect_1.1.2 glue_1.6.2
## [29] babelgene_22.3 R6_2.5.1 fansi_1.0.3 rmarkdown_2.14
## [33] purrr_0.3.4 magrittr_2.0.3 ellipsis_0.3.2 htmltools_0.5.2
## [37] assertthat_0.2.1 utf8_1.2.2 stringi_1.7.6 crayon_1.5.1