KSEAapp Package Overview

Welcome to the KSEAapp package overview

This vignette will demonstrate potential usage for the available features within this package.

The fundamental calculations underlying this package are based on work published in Casado et al. (2013) Sci Signal. 6(268):rs6. Please refer to this paper for details on the formula.

Further details on this R package and its applications can be found in Wiredja et al. (2017) Bioinformatics 33(21):3489-3491.

Please note: To maintain compliance with updated CRAN policies, KSEA App package v2.0 no longer offers KSEA.Complete(), as this function writes outputs directly to the user’s filespace.

Summary of the package contents

This package has the following functions:

KSEA.KS_table()
KSEA.Scores()
KSEA.Barplot()
KSEA.Heatmap()

This package includes a few sample datasets to use for exercises:

KSData: a Kinase-Substrate (K-S) dataset used for calculations (this file is abbreviated for simplicity and file size). The full file is available on the GitHub page github.com/casecpb/KSEA/ as “PSP&NetworKIN_Kinase_Substrate_Dataset_July2016.csv”. Please note that this file may be replaced with more recent versions in the future. In that case, the appended date will also be updated (e.g., “July2016” will become “July2030”).
PX: a sample experimental dataset that is correctly formatted for input
KSEA.Scores.1: a sample output from KSEA.Scores() function
KSEA.Scores.2: a sample output from KSEA.Scores() function
KSEA.Scores.3: a sample output from KSEA.Scores() function

Additional notes on the PX format:

The following is a detailed description of each column in PX:

Protein = the Uniprot Accession ID for the parent protein (if unavailable, write “NULL”)
Gene = the HUGO gene name for the parent protein
Peptide = the peptide sequence (if unavailable, write “NULL”)
Residue.Both = all phosphosites from that peptide, separated by semicolons if applicable; each must be formatted as the single-letter amino acid abbreviation followed by the residue position (e.g., S102)
p = the p-value of that peptide representing differential phosphorylation between the control and treatment groups (if none was calculated, write “NULL”; the entry cannot be NA)
FC = the fold change (not log-transformed); usually the control sample is the denominator

The listed columns must be presented in that exact order. There can be no NA values, or else the entire row will be discarded from analysis. Although Protein, Peptide, and p entries are optional, the column headers are mandatory.
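
As an illustration of the layout described above, the following is a minimal sketch of a PX-style data frame built by hand, together with a small format check. The rows are made-up values, and check.px.format() is a hypothetical helper written for this example; it is not part of KSEAapp.

```r
# Made-up example rows; only the column names and their order follow the PX description.
px.example <- data.frame(
  Protein      = c("NULL", "P49841"),
  Gene         = c("AKT1S1", "GSK3B"),
  Peptide      = c("NULL", "NULL"),
  Residue.Both = c("S183;T246", "S9"),
  p            = c("0.004", "NULL"),
  FC           = c(2.5, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a KSEAapp function): checks the column order,
# flags rows containing NA, and flags phosphosites not written like "S102".
check.px.format <- function(px) {
  expected <- c("Protein", "Gene", "Peptide", "Residue.Both", "p", "FC")
  stopifnot(identical(colnames(px), expected))
  sites <- unlist(strsplit(as.character(px$Residue.Both), ";", fixed = TRUE))
  list(
    rows.with.na = which(!complete.cases(px)),
    bad.sites    = sites[!grepl("^[A-Z][0-9]+$", sites)]
  )
}

px.check <- check.px.format(px.example)
```

Both elements of px.check are empty for a correctly formatted table. Any row index reported in rows.with.na would be dropped from the analysis, as noted above.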

Overview of KSEAapp package functionality

The goal of KSEAapp is to generate relative kinase activity inferences from quantitative phosphoproteomics data.

Given an experimental dataset as input, you will generate three forms of output:

a table of all the kinase-substrate (K-S) relationships used for the calculations
a table of all the KSEA kinase scores
a summary plot (or alternatively, a heatmap) highlighting the results

You can achieve this result using the KSEA.KS_table(), KSEA.Scores(), KSEA.Barplot(), and KSEA.Heatmap() functions. This series of functions allows everything to be created as objects within the R environment, which gives the user additional flexibility for downstream data manipulation. For a multi-condition experiment, the user can employ KSEA.Heatmap() rather than KSEA.Barplot() to compile the results into a single heatmap instead of separate bar plots.

The following are detailed walk-throughs on how to navigate through this process.

Walk-through of the different functions

This exercise requires the following datasets included in the package:

KSData
PX
KSEA.Scores.1 (only if using KSEA.Heatmap() function)
KSEA.Scores.2 (only if using KSEA.Heatmap() function)
KSEA.Scores.3 (only if using KSEA.Heatmap() function)
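
One possible way to bring these datasets into the session is base R's data() function. The sketch below assumes that KSEAapp is installed and that the datasets are exposed under exactly the names listed above; the call is guarded so that nothing fails if the package is missing.

```r
# Dataset names as listed above.
needed <- c("KSData", "PX", "KSEA.Scores.1", "KSEA.Scores.2", "KSEA.Scores.3")

# Only attempt to load if the package is installed (assumption: it may not be).
pkg.available <- requireNamespace("KSEAapp", quietly = TRUE)

if (pkg.available) {
  # Loads the named datasets into the current environment.
  data(list = needed, package = "KSEAapp")
  print(head(PX))
} else {
  message("KSEAapp is not installed; install it before running the walk-through.")
}
```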

1) Generate the K-S table using the KSEA.KS_table() function

This is the overview of all the required parameters for KSEA.KS_table():

KSData: the Kinase-Substrate (K-S) dataset as described above
PX: the experimental data file as described above
NetworKIN: a binary input of TRUE or FALSE, indicating whether or not to include NetworKIN predictions; NetworKIN = TRUE means inclusion of NetworKIN predictions
NetworKIN.cutoff: a numeric value between 1 and infinity setting the minimum NetworKIN score (can be left out if NetworKIN = FALSE)

Here is an example type-up for the R Console:

KSData.dataset <- KSEA.KS_table(KSData, PX, NetworKIN=TRUE, NetworKIN.cutoff=5)

This generates a complete table listing ALL the K-S relationships identified from the experimental dataset, including relationships for kinases that are not featured in the bar plot. For each kinase, every substrate identified from the dataset was used for the KSEA calculations (in other words, there was no filtering of the substrates). The columns are as follows:

Kinase.Gene = the gene name for each kinase
Substrate.Gene = the gene name for each substrate linked to that kinase
Substrate.Mod = the substrate’s specific amino acid residue that was modified
Source = the database from which the K-S annotation was derived
log2FC = the log2(fold change) value of that particular substrate phosphosite from the experiment; if the same site was detected across multiple peptides that map to the same protein, the average log2FC is reported
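
To make the log2FC averaging concrete, here is a small base-R sketch on made-up data. It assumes that semicolon-separated residues are split into one row per phosphosite and that log2FC values (rather than raw fold changes) are averaged; it illustrates the idea and is not the package's internal code.

```r
# Toy PX-style rows (made-up values); column names follow the PX format.
px.toy <- data.frame(
  Gene         = c("AKT1S1", "AKT1S1", "GSK3B"),
  Residue.Both = c("T246", "S183;T246", "S9"),
  FC           = c(4, 1, 0.5),
  stringsAsFactors = FALSE
)

# One row per phosphosite: split the semicolon-separated residues.
sites <- strsplit(px.toy$Residue.Both, ";", fixed = TRUE)
long <- data.frame(
  Gene    = rep(px.toy$Gene, lengths(sites)),
  Residue = unlist(sites),
  log2FC  = rep(log2(px.toy$FC), lengths(sites)),
  stringsAsFactors = FALSE
)

# Average log2FC when the same site appears on several peptides.
site.means <- aggregate(log2FC ~ Gene + Residue, data = long, FUN = mean)
print(site.means)
```

In this toy input, T246 of AKT1S1 appears on two peptides with log2FC values of 2 and 0, so a single averaged entry is kept for that site.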

2) Generate the KSEA kinase scores using the KSEA.Scores() function

This is the overview of all the required parameters for KSEA.Scores():

KSData: the Kinase-Substrate (K-S) dataset as described above
PX: the experimental data file as described above
NetworKIN: a binary input of TRUE or FALSE, indicating whether or not to include NetworKIN predictions; NetworKIN = TRUE means inclusion of NetworKIN predictions
NetworKIN.cutoff: a numeric value between 1 and infinity setting the minimum NetworKIN score (can be left out if NetworKIN = FALSE)

Here is an example type-up for the R Console:

Scores <- KSEA.Scores(KSData, PX, NetworKIN=TRUE, NetworKIN.cutoff=5)

This is a complete table listing ALL the kinases that have at least one identified substrate in the input dataset, including those that are not featured in the bar plot. Please refer to the original Casado et al. publication for a detailed description of these columns and what they represent. In brief:

Kinase.Gene = the gene name for each kinase
mS = the mean log2(fold change) of all the kinase’s substrates
Enrichment = the background-adjusted value of the kinase’s mS
m = the total number of detected substrates from the experimental dataset for each kinase
z.score = the normalized score for each kinase, weighted by the number of identified substrates
p.value = the statistical assessment for the z.score
FDR = the p-value adjusted for multiple hypothesis testing using the Benjamini & Hochberg method
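
The following base-R sketch shows how columns of this kind could be derived from a K-S table. It rests on assumptions that should be checked against Casado et al.: the z-score is taken as (mS - mP) * sqrt(m) / delta, where mP and delta are the mean and standard deviation of log2FC across all phosphosites in the experiment, and the p-value is a one-tailed normal probability. All values are made up, and the Enrichment column is omitted.

```r
# Toy K-S table (made-up values); column names follow the KSEA.KS_table() output.
ks.toy <- data.frame(
  Kinase.Gene = c("AKT1", "AKT1", "AKT1", "AKT1", "CDK1", "CDK1"),
  log2FC      = c(1.5, 2.0, 1.0, 1.5, -0.5, -1.5),
  stringsAsFactors = FALSE
)

# Toy background: log2FC of every phosphosite in the experiment (assumed values).
all.log2FC <- c(ks.toy$log2FC, 0, 0.5, -0.5, 0.2, -0.2, 0.1)
mP    <- mean(all.log2FC)  # background mean (assumption: all sites are used)
delta <- sd(all.log2FC)    # background standard deviation

# Per-kinase mean log2FC (mS) and substrate count (m).
mS <- tapply(ks.toy$log2FC, ks.toy$Kinase.Gene, mean)
m  <- tapply(ks.toy$log2FC, ks.toy$Kinase.Gene, length)
scores.toy <- data.frame(Kinase.Gene = names(mS), mS = as.vector(mS),
                         m = as.vector(m), stringsAsFactors = FALSE)

# Assumed z-score form, one-tailed p-value, and Benjamini & Hochberg adjustment.
scores.toy$z.score <- (scores.toy$mS - mP) * sqrt(scores.toy$m) / delta
scores.toy$p.value <- pnorm(-abs(scores.toy$z.score))
scores.toy$FDR     <- p.adjust(scores.toy$p.value, method = "BH")
print(scores.toy)
```

The sqrt(m) factor is what weights the score by the number of identified substrates: for the same mS, a kinase with more substrates receives a larger absolute z-score.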

3) For a single-condition experiment, generate a summary bar plot using the KSEA.Barplot() function

This is the overview of all the required parameters for KSEA.Barplot():

KSData: the Kinase-Substrate (K-S) dataset as described above
PX: the experimental data file as described above
NetworKIN: a binary input of TRUE or FALSE, indicating whether or not to include NetworKIN predictions; NetworKIN = TRUE means inclusion of NetworKIN predictions
NetworKIN.cutoff: a numeric value between 1 and infinity setting the minimum NetworKIN score (can be left out if NetworKIN = FALSE)
m.cutoff: a numeric value between 0 and infinity indicating the minimum number of substrates a kinase must have to be included in the bar plot output
p.cutoff: a numeric value between 0 and 1 indicating the p-value cutoff for indicating significant kinases in the bar plot

Here is an example type-up for the R Console:

KSEA.Barplot(KSData, PX, NetworKIN=TRUE, NetworKIN.cutoff=5, m.cutoff=5, p.cutoff=0.01)

This is the bar plot that summarizes the KSEA results. Note that not all kinases are included. The kinase substrate count cutoff, set by m.cutoff, decides which kinases to include in this plot. The p-value cutoff, set by p.cutoff, decides which kinases to color blue/red for visual annotation of kinases that reach statistical significance. Kinases with non-significant scores will be black.
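
The effect of the two cutoffs can be sketched in base R on a toy score table. This is an illustration under stated assumptions, not the package's plotting code: the values are made up, the comparisons are assumed to be m >= m.cutoff and p.value < p.cutoff, and blue/red are assumed to mark negative/positive scores respectively.

```r
# Toy stand-in for a KSEA.Scores() result (made-up values).
bar.scores <- data.frame(
  Kinase.Gene = c("AKT1", "CDK1", "MTOR", "PRKACA"),
  m           = c(8, 6, 2, 5),
  z.score     = c(3.1, -2.9, 4.0, 0.4),
  p.value     = c(0.001, 0.002, 0.00003, 0.34),
  stringsAsFactors = FALSE
)
m.cutoff <- 5
p.cutoff <- 0.01

# Kinases with too few substrates are left out of the plot entirely.
shown <- bar.scores[bar.scores$m >= m.cutoff, ]

# Significant kinases are colored by the sign of the score; the rest stay black.
significant <- shown$p.value < p.cutoff
shown$color <- ifelse(!significant, "black",
                      ifelse(shown$z.score < 0, "blue", "red"))
print(shown)
```

Here MTOR is dropped despite its small p-value because it has only 2 substrates, which is why the two cutoffs are best chosen together. A vector such as shown$color could be passed to the col argument of base R's barplot() to reproduce a similar annotation.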

4) For multi-condition experiments, alternatively generate a summary heatmap using the KSEA.Heatmap() function

Important notes:

This function is designed for compiling results of studies that have 2+ treatment/case conditions; if there is just a single treatment group and a single control group, use the KSEA.Barplot() function instead.
Before using KSEA.Heatmap(), you need to generate the outputs from KSEA.Scores() for each desired pairwise comparison in the study.
The separate objects from KSEA.Scores() will need to be put into a list for input into KSEA.Heatmap().

This is the overview of all the required parameters for KSEA.Heatmap():

score.list: the data frame outputs from the KSEA.Scores() function, compiled in a list format
sample.labels: a character vector of all the sample names for heatmap annotation; the names must be in the same order as the data in score.list; please avoid long names, as they may get cropped in the final image
stats: character string of either “p.value” or “FDR” indicating the data column to use for marking statistically significant scores
m.cutoff: a numeric value between 0 and infinity indicating the minimum number of substrates a kinase must have to be included in the heatmap
p.cutoff: a numeric value between 0 and 1 indicating the p-value or FDR cutoff for indicating significant kinases in the heatmap
sample.cluster: a binary input of TRUE or FALSE, indicating whether or not to perform hierarchical clustering of the sample columns

Here is an example type-up for the R Console:

KSEA.Heatmap(score.list=list(KSEA.Scores.1, KSEA.Scores.2, KSEA.Scores.3), 
             sample.labels=c("Tumor.A", "Tumor.B", "Tumor.C"), 
             stats="p.value", m.cutoff=3, p.cutoff=0.05, sample.cluster=TRUE)

This should result in a heatmap. Blue = negative kinase scores; White = zero-valued kinase scores; Red = positive kinase scores; Asterisks = scores that met the statistical cutoff, as indicated by the p.cutoff parameter.
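
For readers who want the underlying numbers rather than the image, the sketch below compiles several score tables into a kinase-by-sample matrix of z-scores with a matching matrix of asterisks. It uses made-up stand-ins for the KSEA.Scores() outputs and assumes that a kinase is kept only if it meets m.cutoff in every condition; how KSEA.Heatmap() itself handles kinases missing from some conditions is not confirmed here.

```r
# Toy stand-ins for three KSEA.Scores() outputs (made-up values).
s1 <- data.frame(Kinase.Gene = c("AKT1", "CDK1", "MTOR"), m = c(6, 4, 2),
                 z.score = c(2.5, -1.0, 0.3), p.value = c(0.006, 0.16, 0.38),
                 stringsAsFactors = FALSE)
s2 <- data.frame(Kinase.Gene = c("AKT1", "CDK1", "MTOR"), m = c(5, 3, 4),
                 z.score = c(0.2, -2.2, 1.1), p.value = c(0.42, 0.014, 0.14),
                 stringsAsFactors = FALSE)
s3 <- data.frame(Kinase.Gene = c("CDK1", "AKT1"), m = c(5, 7),
                 z.score = c(-1.9, 3.0), p.value = c(0.03, 0.001),
                 stringsAsFactors = FALSE)

score.list    <- list(s1, s2, s3)
sample.labels <- c("Tumor.A", "Tumor.B", "Tumor.C")
m.cutoff <- 3
p.cutoff <- 0.05

# Apply the substrate-count cutoff per condition, then keep shared kinases (assumption).
kept   <- lapply(score.list, function(s) s[s$m >= m.cutoff, ])
common <- Reduce(intersect, lapply(kept, function(s) s$Kinase.Gene))

# Pull one column per sample, aligned on the shared kinase names.
collect <- function(column) {
  out <- do.call(cbind, lapply(kept, function(s) s[[column]][match(common, s$Kinase.Gene)]))
  dimnames(out) <- list(common, sample.labels)
  out
}
z.matrix <- collect("z.score")
stars    <- ifelse(collect("p.value") < p.cutoff, "*", "")
print(z.matrix)
print(stars)
```

Matching on Kinase.Gene rather than on row position matters because the score tables need not list kinases in the same order, as in the third toy table.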