library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
library(DTSEA)
#> This package SHOULD NOT BE USED UNDER INTEL MATH KERNEL LIBRARY ON ANY OCCASION.
#> There is an avoidable but critical bug with Intel Math Kernel Library (MKL) on various operating systems.
#>
#> ======================================
#> For better performance, we recommend not using RStudio on Windows because RStudio cannot take advantage of the multi-core capabilities available on modern computers.
The Drug Target Set Enrichment Analysis (DTSEA) is a novel tool used to identify the most effective drug set against a particular disease based on the Gene Set Enrichment Analysis (GSEA).
The central hypothesis of DTSEA is that the targets of potential candidates for a specific disease (e.g., COVID-19) ought to be close to each other, or at least not so far away from the disease. The DTSEA algorithm determines whether a drug is potent for the chosen disease by the proximity between drug targets and the disease-related genes. Under the central hypothesis of DTSEA, the DTSEA consists of two main parts:
To evaluate the influence, we compute the disease-node distance by using the random walk with restart (RwR) algorithm, then rank the nodes reversely.
The GSEA approach is adopted in this part to identify whether candidate drug targets are disease-related (top) or disease-unrelated (bottom) on the human PPI list. The specific disease gene list is normalized by the median and is set zero as the arbitrary cutoff point to classify the relations manually.
This vignette illustrates how to use the DTSEA easily. Here, using functions in this package, users could identify potential drugs for disease by the DTSEA algorithm.
The function DTSEA
is used to calculate the enrichment
scores of drugs. The parameters are as follows:
network
: The human protein-protein interactome network.
It should be or be preconverted before being inputted in DTSEA.disease
: The disease-related nodes.drugs
: The drug-target long format dataframe. It
includes at least columns with the drug_id and drug_target.rwr.pt
: The random walk p0 vector. Set it to zero if
you wish DTSEA automatically compute it, or you can provide your
predetermined p0 vector.eps
: The boundary of calculating the p value.nPermSimple
: Number of permutations in the simple fgsea
implementation for preliminary estimation of p-values.gseaParam
: GSEA parameter value, all gene-level
statistics are raised to the power of ‘gseaParam’ before calculating
GSEA enrichment scores.verbose
: Show the messages.# Load the data
data("example_disease_list", package = "DTSEA")
data("example_drug_target_list", package = "DTSEA")
data("example_ppi", package = "DTSEA")
# Perform a simple DTSEA analysis using default optional parameters then sort
# the result dataframe by normalized enrichment scores (NES)
<- DTSEA(network = example_ppi,
result disease = example_disease_list,
drugs = example_drug_target_list, verbose = FALSE
%>%
) arrange(desc(NES))
#> Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.81% of the list).
#> The order of those tied genes will be arbitrary, which may produce unexpected results.
head(result)
#> # A tibble: 6 × 8
#> drug_id pval padj log2err ES NES size leadingEdge
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <list>
#> 1 DB00112 0.00580 0.0367 0.407 0.729 1.83 9 <chr [4]>
#> 2 DB09079 0.0986 0.268 0.0926 0.509 1.40 12 <chr [10]>
#> 3 DB11952 0.0770 0.268 0.104 0.890 1.38 2 <chr [2]>
#> 4 DB05017 0.104 0.268 0.0888 0.866 1.34 2 <chr [2]>
#> 5 DB11526 0.181 0.369 0.0673 0.681 1.31 4 <chr [4]>
#> 6 DB01611 0.215 0.372 0.0609 0.719 1.26 3 <chr [3]>
You can arrange the positive results by NES
select(result, -leadingEdge) %>%
arrange(desc(NES)) %>%
filter(NES > 0 & pval < .05) %>% head()
#> # A tibble: 1 × 7
#> drug_id pval padj log2err ES NES size
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 DB00112 0.00580 0.0367 0.407 0.729 1.83 9
Or we can draw the enrichment plot of the first predicted drug using
fgsea
package.
::plotEnrichment(
fgseapathway = example_drug_target_list %>%
filter(drug_id == slice(result, 1)$drug_id) %>%
pull(gene_target),
stats = random.walk(network = example_ppi,
p0 = calculate_p0(nodes = example_ppi,
disease = example_disease_list)
) )
The function random.walk
calculates the pt
vector. The parameters are as follows:
network
The input graph object. It should be either
igraph object or edge list matrix / data frame.p0
The starting vector on time 0.edge_weight
Logical to indicate whether the input graph
contains weight information.gamma
The restart probability used for RwR. The
gamma
takes the value from 0 to 1, controlling the
probability that a node would go back to its starting node.threshold
The threshold used for RwR. The
threshold
indicates the stabilization status, which is a
stopping criterion of RwR.pt.post.processing
The way to scale the pt
vector. It can be ‘none’, ‘zscore’, and ‘log’.pt.align
The way to normalize the output
pt
vector. It can be ‘mean’ to manually cut the up- and
down-regulated genes, ‘median’ to avoid the influence of the
distribution shape, or ‘none’ for no normalization.# Calculate p0
<- calculate_p0(nodes = example_ppi, disease = example_disease_list)
p0
# Then perform random walk
random.walk(network = example_ppi, p0 = p0, verbose = FALSE) %>%
head()
#> MYD88 MAPK14 TRAF1 JAK2 AGT IRAK2
#> 5.010910 4.994586 4.985955 4.970095 4.964825 4.959323
The function cronbach.alpha
and kendall.w
are used to calculate the prediction consistency.
If you have several prediction results, you can use either function to describe the consistency.
# Just report the results
kendall.w(data)$report
#> [1] "Kendall's coefficient W = 0.51, p = 0.335"
# Or just report the alpha
cronbach.alpha(data)
#> [1] 0.1510749
You can enable the multicore feature to utilize the multicore advantages. Here is the benchmark.
# Load the data
data("example_disease_list", package = "DTSEA")
data("example_drug_target_list", package = "DTSEA")
data("example_ppi", package = "DTSEA")
# set up environment
<- function() {
single.core suppressWarnings(capture.output(DTSEA(network = example_ppi,
disease = example_disease_list,
drugs = example_drug_target_list,
nproc = 0)))
NULL
}
<- function() {
dual.core suppressWarnings(capture.output(DTSEA(network = example_ppi,
disease = example_disease_list,
drugs = example_drug_target_list,
nproc = 10)))
NULL
}
system.time(single.core()) - system.time(dual.core())
In this package, we provide the example data, which is a small set of data to demonstrate the usage and the main idea behind DTSEA. We provide some extra data files, the real data we used in the DTSEA paper. The supplementary package is now on the GitHub. Anyone can obtain this package by:
if (!"devtools" %in% as.data.frame(installed.packages())$Package)
install.packages("devtools")
::install_github("hanjunwei-lab/DTSEAdata") devtools
Here is a list of the supplementary datasets
Data name | Description |
---|---|
deletion |
Drug prediction results on trimmed graphs after enrichment |
drug_indications |
The drug indications data from ChEMBL |
drug_predicted |
The drug target data predicted by ChEMBL |
drug_targets |
The combined drug target data |
expr_raw |
The raw expression profile of GSE164805 |
graph |
The human gene functional interaction network we used |
ncbi_list |
The COVID-19-related genes provided by NCBI |
The Intel Math Kernel Library (MKL) performs poorly with this package when dealing with linear algebra operations. If you use MKL-based BLAS or MKL-based R distribution (like Microsoft R Open), you will get unexpected or zero results in certain circumstances. Please install the Automatically Tuned Linear Algebra package (libatlas) or the multi-threaded OpenBlas library in order to get higher performance and reliable results with:
sudo apt install libatlas3-base -y
or
sudo apt install libopenblas-base -y