Help for package CNSigs

Type:

Package

Title:

Analysis of Copy Number Signatures

Version:

0.1.0

Maintainer:

Shawn Striker <striker.35@osu.edu>

Description:

A workflow to generate and analyze signatures based on copy number data using non-negative matrix factorization (NMF), in an approach similar to that used for mutational signatures. It can be used to extract features from copy number segment data and use them to find a subset of copy number signatures, which can then be correlated with other relevant data. For more on 'NMF' see Gaujoux (2013) <doi:10.1186/1471-2105-11-367>.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.3

Depends:

R (≥ 2.10), NMF

Imports:

methods, doParallel, foreach, flexmix, limSolve, ggplot2, snow, cowplot, pheatmap, RColorBrewer, viridisLite, colorspace, stats, utils

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-12-29 13:06:42 UTC; str236

Author:

David Tallman [aut], Shawn Striker [cre, ctb], Daniel Stover [cph]

Repository:

CRAN

Date/Publication:

2026-01-08 19:30:19 UTC

addPloidyData

Description

This function is used to append your ploidy data onto the sample component matrix. The function is called in runPipeline if you specify ploidy as a desired feature and give a vector of ploidy values.

Usage

addPloidyData(scm, ploidyData)

Arguments

scm

The sample by component matrix (scm) to append the ploidy data to

ploidyData

Vector of ploidy data to append to the scm

Value

Returns scm including the ploidy data

Components derived from TCGA

Description

These generated components were derived from all of the copy number data available from TCGA. They are meant to be used to look for signatures in cancer data so that you do not have to model new data, and so that you can easily compare signatures.

Usage

cancerComps

Format

A list with 6 flexmix objects:

segsize: Mixture of normal distributions
bp10MB: Mixture of normal distributions
osCN: Mixture of poisson distributions
changepoint: Mixture of normal distributions
copynumber: Mixture of normal distributions
bpchrarm: Mixture of poisson distributions

Signatures derived from TCGA

Description

These signatures were derived from all of the copy number data available from TCGA. They are meant to be used so that you can compare newly found signatures to these described signatures. There are a total of 25 signatures. The components used to derive these signatures can be found in the cancerComps variable.

Usage

cancerSigs

Format

An object of class matrix (inherits from array) with 28 rows and 25 columns.

checkCompOverlap

Description

This function is used to check for overlapping components and remove them if they overlap. The mixed modeling can sometimes run into an error in which it produces multiple essentially identical components. This function attempts to find and remove these duplicates.

Usage

checkCompOverlap(comps, pois = FALSE)

Arguments

comps

Component parameters

pois

Whether the components are Poisson (TRUE) or normal (FALSE) distributions

Value

Returns the components with no overlaps

Signatures derived from TCGA and collapsed.

Description

These signatures were derived from all of the copy number data available from TCGA. These pan-cancer signatures were collapsed by similarity to a total of 13 signatures and were built using ploidy data. They are meant to be used so that you can compare newly found signatures to these described signatures.

Usage

collapsedSigs

Format

An object of class data.frame with 29 rows and 13 columns.

compareExposures

Description

This function is used to compare two signature sets and show how the sample exposures differ across the two runs.

Usage

compareExposures(reference, toCompare)

Arguments

reference

Results from your reference analysis

toCompare

Results from the run that you want to compare to the reference

Value

Prints out the difference in signature exposures

Examples

compareExposures(referenceExp, referenceExp)

Components fitted onto featsExp

Description

The generated components for the segDataExp dataset. Generated using the function fitModels(featsExp). Each data frame has a value and an ID column. The ID tells you which sample the observed value is from.

Usage

compsExp

Format

A list with 6 flexmix objects:

segsize: Mixture of normal distributions
bp10MB: Mixture of normal distributions
osCN: Mixture of poisson distributions
changepoint: Mixture of normal distributions
copynumber: Mixture of normal distributions
bpchrarm: Mixture of poisson distributions

createSigs

Description

This function is used to create the final signatures. It generates the resulting NMF object, from which you can extract the feature contribution to each signature using NMF::basis(), and the signature contribution of each sample using NMF::scoef().

Usage

createSigs(scm, nsig, cores = 1, runName = "", saveRes = FALSE, saveDir = NULL)

Arguments

scm

The sample_by_component matrix to run NMF on

nsig

Number of signatures for the NMF to create

cores

Number of cores to use in parallel process

runName

Name of the run used in file names, Default is ""

saveRes

Whether or not to save the results, Default is FALSE

saveDir

Where to save the results, must be provided if using saveRes

Value

Returns the resulting NMF object

Examples


createSigs(scmExp,5) #Generates 5 signatures from the SCM
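
To give a feel for what the factorization produces, here is a minimal base R sketch of NMF using multiplicative updates on a toy non-negative matrix. It is only an illustration: toy_nmf and the random matrix are hypothetical, and this is not the algorithm configuration used by the NMF package or by createSigs.

```r
# Toy multiplicative-update NMF: V (components x samples) ~ W %*% H.
# toy_nmf is a hypothetical helper, not part of CNSigs or the NMF package.
toy_nmf <- function(V, k, iters = 200, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # component weight per signature
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # signature weight per sample
  eps <- 1e-9                                   # guards against division by zero
  for (i in seq_len(iters)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

set.seed(42)
V <- matrix(runif(8 * 6), 8, 6)   # made-up non-negative input
fit <- toy_nmf(V, k = 2)
dim(fit$W)   # 8 rows (components), 2 columns (signatures)
dim(fit$H)   # 2 rows (signatures), 6 columns (samples)
```

In this sketch W plays the role of the matrix returned by NMF::basis() and H the role of the coefficient matrix; both stay non-negative because the updates only multiply non-negative quantities.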

Default features to use for copy number signatures

Description

These are the default features that are used in the package. Use this to get a list of the feature names and you can remove values from this and pass it in to the package using the featsToUse parameter seen in multiple functions.

Usage

defaultFeats

Format

An object of class character of length 6.

detSigNumPipeline

Description

This function allows you to run the copy number signature pipeline up until the determineNumSigs call. This is useful if you want to repeatedly check the optimal number of signatures for different sample sets. May take a while, especially if not given multiple cores.

Usage

detSigNumPipeline(
  segData,
  cores = 1,
  components = NULL,
  saveRes = FALSE,
  runName = "Run",
  rmin = 3,
  rmax = 12,
  max_comps = NULL,
  min_comps = NULL,
  saveDir = NULL,
  smooth = FALSE,
  colMap = NULL,
  pR = FALSE,
  gbuild = "hg19",
  featsToUse = NULL,
  ploidyData = NULL
)

Arguments

segData

The data to be analyzed. If a path name, readSegs is used to make the list. Otherwise the list must be formatted correctly. Refer to ?readSegs for format information.

cores

The number of computer cores to be used for parallel processing

components

Can be used when fixing components. Default is NULL.

saveRes

Whether or not to save the resulting tables and plots. Default is FALSE

runName

Used to title plots and files when saving results

rmin

Minimum number of signatures to look for. Default is 3.

rmax

Maximum number of signatures to look for. Default is 12.

max_comps

vector of length 6 specifying the max number of components for each feature. Passed to fitModels. Default is 10 for all features

min_comps

vector of length 6 specifying the min number of components for each feature. Passed to fitModels. Default is 2 for all features

saveDir

Used to specify where to save the results, must be provided if using saveRes

smooth

Whether or not to smooth the input data. Default is FALSE.

colMap

Mapping of column names when reading from text file. Default column names are ID, chromosome, start, end, segVal.

pR

Peak Reduction reduces peaks in modeling to make modeling easier. Default is FALSE.

gbuild

The reference genome build. Default is hg19. Also supports hg18 and hg38.

featsToUse

Vector of feature names that you wish to use

ploidyData

The ploidy data to use as a feature

Value

Returns a list with all of the results from the pipeline

Examples

#Runs the pipeline up to the signature number determination on the example
#data, giving it 6 cores and a name of "TCGA Test"

detSigNumPipeline(segDataExp, cores = 6, saveRes = FALSE, runName = "TCGA Test")

determineNumSigs

Description

This function uses the extracted features and modelled components, and it performs NMF on these ranging from the minimum number of signatures to the max. It repeats this on randomized data and computes various measures to help inform the user on how many signatures to proceed with. This function may take a while to run since it repeats the NMF process many times. It is suggested to give it multiple cores to allow for parallel processing.

Usage

determineNumSigs(
  scm,
  rmin = 3,
  rmax = 12,
  cores = 1,
  nrun = 250,
  saveRes = FALSE,
  saveDir = NULL,
  runName = ""
)

Arguments

scm

Sample by component matrix used to find signatures

rmin

The lower bound of signature numbers to check. Default is 3.

rmax

The upper bound of signature numbers to check. Default is 12.

cores

The number of cores to use for parallel analysis. Default is 1.

nrun

Number of runs for NMF. Default is 250.

saveRes

Whether or not to save the plot. Default is FALSE.

saveDir

Directory to save plot in, must be provided if using saveRes

runName

Used to title plots and files when saving results

Value

Creates a series of plots to help the user decide how many signatures to use

Examples


determineNumSigs(generateSCM(featsExp,compsExp))

diffCompSigSim

Description

This function is used to determine the similarity between two signatures that have different underlying components. It uses a KS-statistic-based measure to estimate similarity for components based on normal distributions, and a correlation measure for components based on Poisson distributions.

Usage

diffCompSigSim(refComps, refWeights, valComps, valWeights)

Arguments

refComps

Reference component parameters

refWeights

Reference component weights

valComps

Component parameters to compare against

valWeights

Component weights to compare against

Value

Returns a correlation value
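
As a rough picture of a KS-style comparison between two normal components, the sketch below takes the largest gap between their cumulative distribution functions on a shared grid. The helper ks_distance, the grid size, and the "1 minus distance" similarity are all assumptions made for illustration; the exact measure used by diffCompSigSim is not described here.

```r
# KS distance between two normal components, evaluated on a shared grid.
# ks_distance is a hypothetical helper, not CNSigs' implementation.
ks_distance <- function(m1, s1, m2, s2, n = 2001) {
  lo <- min(m1 - 6 * s1, m2 - 6 * s2)
  hi <- max(m1 + 6 * s1, m2 + 6 * s2)
  grid <- seq(lo, hi, length.out = n)
  max(abs(pnorm(grid, m1, s1) - pnorm(grid, m2, s2)))
}

d_same <- ks_distance(1, 0.5, 1, 0.5)     # identical components
d_far  <- ks_distance(1, 0.5, 10, 0.5)    # well-separated components
sim    <- 1 - ks_distance(1, 0.5, 1.2, 0.5)   # one possible similarity score
```

Identical components give a distance of 0 and well-separated ones approach 1, so a score such as 1 minus the distance behaves like a similarity.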

extractCNFeats

Description

This function is used to extract the six copy number features that are eventually used in order to make the signatures. It does this using six sub functions to extract each feature. Before extracting the features, the segments are passed through a validation function to make sure the data is formatted correctly and there are no invalid segments. Can be done in parallel using the cores parameter.

Usage

extractCNFeats(
  segData,
  gbuild = "hg19",
  cores = 1,
  featsToExtract = CNSigs::defaultFeats
)

Arguments

segData

The copy number segment data

gbuild

The reference genome build. Default is hg19. Also supports hg18 and hg38.

cores

The number of cores to use for parallel processing. Default 1.

featsToExtract

The names of the features to extract.

Value

List of data frames containing the results for the six copy number features

Examples

extractCNFeats(segDataExp)

featureFuncs

Description

This group of functions returns vectors of the corresponding features for the samples passed in. Some of the functions use datasets internal to the CNSigs package that specify the chromosome lengths and the centromere positions.

Usage

extractSegsize(segData)

extractBP10MB(segData, chrlen)

extractOscillations(segData, chrlen)

extractBPChrArm(segData, centromeres, chrlen)

extractChangepoints(segData, centromeres, chrlen)

extractCN(segData)

Arguments

segData

The samples to extract data from

chrlen

The lengths of the chromosomes from reference genome

centromeres

The positions of the centromeres in reference genome

extractSegsize

This function returns a vector of all the segment sizes for all of the samples.

extractBP10MB

This function returns a vector of the average number of breakpoints per 10MB for each chromosome.

extractOscillations

This function returns a vector of the number of oscillation events found on each of the chromosomes.

extractBPChrArm

This function returns a vector of the total number of breakpoints per chromosome arm.

extractChangepoints

This function returns a vector of the average size of changepoints per chromosome.

extractCN

This function returns a vector of the average copy number per chromosome.

Features from segDataExp

Description

The extracted features from the segDataExp dataset. Generated using the function extractCNFeats(segDataExp). Each data frame has a value and an ID column. The ID tells you which sample the observed value is from.

Usage

featsExp

Format

A list with 6 data frames:

segsize: Size of every segment
bp10MB: Average # of breakpoints per 10MB per chromosome
osCN: Number of oscillation events per chromosome
changepoint: Average changepoint per chromosome
copynumber: Average copy number per chromosome
bpchrarm: Number of breakpoints per chromosome arm

findExposures

Description

This function is used to find the signature exposures of a set of samples using fixed signatures found earlier. It does this using the least squares optimization method with constraints to keep the output as non-negative using the lsei function from the package limSolve.

Usage

findExposures(scm, fixedSigs, runName = "", saveRes = FALSE, saveDir = NULL)

Arguments

scm

Sample by component matrix

fixedSigs

The fixed signatures

runName

Name of the run used in file names, Default is ""

saveRes

Whether or not to save the results, Default is FALSE

saveDir

Where to save the results, must be provided if using saveRes

Value

Returns the resulting matrix of exposures

Examples

findExposures(t(scmExp), sigsExp)
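
The idea of fitting exposures against fixed signatures under a non-negativity constraint can be sketched in base R as below. This is an assumption-laden stand-in: nnls_exposure is a hypothetical helper that uses stats::optim with box constraints, whereas the package itself is documented to use lsei from limSolve; the toy signatures are invented.

```r
# Non-negative least squares for one sample via box-constrained optimisation.
# nnls_exposure is a hypothetical helper; CNSigs itself uses limSolve::lsei.
nnls_exposure <- function(sigs, sample_vec) {
  # sigs: components x signatures; sample_vec: one sample's component values
  obj  <- function(x) sum((sigs %*% x - sample_vec)^2)
  grad <- function(x) as.vector(2 * t(sigs) %*% (sigs %*% x - sample_vec))
  fit <- optim(rep(1 / ncol(sigs), ncol(sigs)), obj, grad,
               method = "L-BFGS-B", lower = 0)   # lower = 0 keeps exposures >= 0
  fit$par
}

sigs <- cbind(c(0.7, 0.2, 0.1, 0.0), c(0.0, 0.1, 0.3, 0.6))   # made-up signatures
truth <- c(0.25, 0.75)                                        # made-up exposures
sample_vec <- as.vector(sigs %*% truth)
est <- nnls_exposure(sigs, sample_vec)
round(est, 3)
```

Because the toy sample is an exact mixture of the two signatures, the estimate should land close to the exposures used to build it.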

fitComponent

Description

This function is used to fit a mixture model of either normal or Poisson distributions to the input data. This function is mainly used by the fitModels function to create the components from the extracted features.

Usage

fitComponent(
  toFit,
  min_prior = 0.001,
  min_comp = 2,
  max_comp = 10,
  dist = "norm",
  pR = FALSE,
  seed = 77777,
  model_sel = "BIC",
  niter = 10000,
  nrep = 1
)

Arguments

toFit

Extracted features to fit models to.

min_prior

Minimum prior probability of a cluster. Default is 0.001.

min_comp

Minimum number of models to fit. Default is 2.

max_comp

Maximum number of models to fit. Default is 10.

dist

Type of distribution to fit. Either "norm" or "pois". Default "norm"

pR

Peak Reduction reduces peaks in modeling to make modeling easier. Default is FALSE.

seed

Seed to be used for modeling. Default is 77777

model_sel

Type of model_selection method to be used. Default "BIC". See flexmix package for more options.

niter

Max number of iterations for modeling. Default is 10000.

nrep

Number of repetitions for modeling attempts. Default is 1.

Value

Returns the flexmix object for the fit model.

Examples

fitComponent(featsExp$bp10MB[,2]) #Fits 2-10 normal distributions

#Tries to fit exactly 4 poisson distributions
fitComponent(featsExp$osCN[,2],dist="pois",min_comp = 4, max_comp = 4)

fitModels

Description

This function takes all of the extracted copy number features and attempts to fit a mixture of poisson and normal distributions to the data, and returns a mixture of components that can be used to build the signatures. The order of features is "segsize","bp10MB","osCN","changepoint","copynumber","bpchrarm". Therefore if you only want to change the maximum number of components for osCN to 5 then you would use max_comps = c(10,10,5,10,10,10).

Usage

fitModels(
  CN_features,
  max_comps = NULL,
  min_comps = NULL,
  cores = 1,
  pR = FALSE,
  min_prior = NULL,
  featsToModel = CNSigs::defaultFeats
)

Arguments

CN_features

List of features received from extractCNFeats

max_comps

vector of length 6 specifying the max number of components for each feature. default is 10 for all features

min_comps

vector of length 6 specifying the min number of components for each feature. default is 2 for all features

cores

Number of parallel cores to use. Default is 1.

pR

Peak Reduction reduces peaks in modeling to make modeling easier. Default is FALSE.

min_prior

Used to override the minimum prior probability of a cluster

featsToModel

The names of the features to model.

Value

Returns a list of the different components that contain flexmix objects for each feature

Examples


fitModels(featsExp)

#Models an exact number of components, useful when comparing two different
#datasets
min_comps = c(7, 3, 3, 2, 2, 3)
max_comps = c(7, 3, 3, 2, 10, 3)
fitModels(featsExp, max_comps, min_comps)
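
Because max_comps and min_comps are positional, it can be less error-prone to build them by feature name and then drop the names. The sketch below uses only base R and the feature order stated in the Description; the variable names are illustrative.

```r
# Feature order as documented for fitModels
feats <- c("segsize", "bp10MB", "osCN", "changepoint", "copynumber", "bpchrarm")

# Start from the documented default of 10 and change one feature by name
max_comps <- setNames(rep(10, length(feats)), feats)
max_comps["osCN"] <- 5
max_comps <- unname(max_comps)   # plain vector of length 6, in feature order
max_comps
```

The resulting vector could then be passed on, for example as fitModels(featsExp, max_comps = max_comps).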

generateSCM

Description

This function takes in an extracted set of features and a defined set of components, and calculates the sum of the posterior probabilities for each feature. This sum represents how much each component contributes to a sample and corresponds to one column in the matrix.

Usage

generateSCM(feats, comps, runName = "", saveRes = FALSE, saveDir = NULL)

Arguments

feats

List of features received from extractCNFeats

comps

List of components modelled using fitModels

runName

Name of the run used in file names, Default is ""

saveRes

Whether or not to save the results, Default is FALSE

saveDir

Where to save the results, must be provided if using saveRes

Value

Creates a sample by component matrix

Examples

generateSCM(featsExp,compsExp)
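
The summing of posterior probabilities can be illustrated with a small base R sketch for a single feature modelled by two normal components. All numbers and names below are made up, and the code is a simplified picture of the idea rather than the package's internal implementation.

```r
# Toy posterior sums for one feature with two normal components.
# All values below are made up; this is not CNSigs' internal code.
obs <- data.frame(ID = c("s1", "s1", "s2", "s2", "s2"),
                  value = c(0.9, 1.2, 4.8, 5.1, 1.0))
comp_mean <- c(1, 5)
comp_sd   <- c(0.5, 0.5)
comp_wt   <- c(0.5, 0.5)   # mixing proportions

# Weighted density of every observation under every component (rows = obs)
dens <- sapply(seq_along(comp_mean), function(k)
  comp_wt[k] * dnorm(obs$value, comp_mean[k], comp_sd[k]))
post <- dens / rowSums(dens)   # posterior probabilities, each row sums to 1
colnames(post) <- c("comp1", "comp2")

# Sum posteriors within each sample: one row per sample, one column per component
scm_toy <- rowsum(post, group = obs$ID)
scm_toy
```

Each row of scm_toy sums to the number of observations that sample contributed, which is why samples with more segments carry more weight in a matrix built this way.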

matchSigs

Description

This function is used to find a mapping between two similar sets of signatures. It compares the signature values to see how similar the proposed signatures are and shows you the best matches. It uses the measure of cosine similarity to compare signatures. The two signature sets must have the same underlying components to be matched.

Usage

matchSigs(referenceSigs, toCompareSigs)

Arguments

referenceSigs

Signature matrix from your reference analysis

toCompareSigs

Signature matrix from run that you want to compare to reference

Value

Prints out the signature mapping and returns the average similarity

Examples

matchSigs(referenceExp$sigs, referenceExp$sigs)
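
Cosine similarity between two signature sets can be sketched in base R as follows. The helper cosine_match is hypothetical, it assumes signatures are stored in columns (as in cancerSigs), and it picks the best match for each reference signature independently, which may differ from how matchSigs resolves its mapping.

```r
# Cosine similarity between the columns of two signature matrices
# (rows = components, columns = signatures). cosine_match is hypothetical.
cosine_match <- function(ref, cmp) {
  ref_n <- sweep(ref, 2, sqrt(colSums(ref^2)), "/")   # unit-length columns
  cmp_n <- sweep(cmp, 2, sqrt(colSums(cmp^2)), "/")
  sim <- t(ref_n) %*% cmp_n                           # ref x cmp similarities
  best <- apply(sim, 1, which.max)                    # best match per reference
  list(similarity = sim, best = best,
       average = mean(sim[cbind(seq_along(best), best)]))
}

ref <- cbind(A = c(0.6, 0.3, 0.1), B = c(0.1, 0.2, 0.7))   # made-up signatures
cmp <- ref[, c("B", "A")]                                  # same set, swapped order
res <- cosine_match(ref, cmp)
res$best
```

Here the comparison set is the reference with its columns swapped, so each reference signature is matched to the other column position with a similarity of 1.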

plotComp

Description

This function plots the specified mixed model so that it can be visualized. It utilizes the gamma function to allow approximations of the poisson distributions, allowing for a smooth plot.

Usage

plotComp(comps, compName, saveRes = FALSE, saveDir = NULL, runName = "")

Arguments

comps

List of components to be plotted. Output from fitModels.

compName

Name of the component to plot

saveRes

Whether or not to save results. Default is FALSE.

saveDir

Where to save plots, must be provided if using saveRes

runName

Used to add a runName to the file output. Default is "".

Value

Plots the components to allow visualization

Examples

plotComp(compsExp, compName = "segsize")

plotComps

Description

This function plots all of the mixed models so that it can be visualized. It utilizes the gamma function to allow approximations of the poisson distributions, allowing for a smooth plot.

Usage

plotComps(comps, saveRes = FALSE, saveDir = NULL, runName = "")

Arguments

comps

List of components to be plotted. Output from fitModels.

saveRes

Whether or not to save results. Default is FALSE.

saveDir

Where to save plots, must be provided if using saveRes

runName

Used to add a runName to the file output. Default is "".

Value

Plots all the components to allow visualization

Examples

plotComps(compsExp)

plotScm

Description

This function is used to generate the sample by component matrix plot.

Usage

plotScm(
  scm,
  runName = "",
  saveRes = FALSE,
  saveDir = NULL,
  rowOrder = FALSE,
  colOrder = TRUE
)

Arguments

scm

Sample by component matrix

runName

Name of the run used in plot titles, Default is ""

saveRes

Whether or not to save the plots, Default is FALSE

saveDir

Where to save the plots, must be provided if using saveRes

rowOrder

Ordering specification for the rows of the heatmap. Three possible options:

TRUE: Uses hierarchical clustering to determine row order.
FALSE: (default) Leaves rows in the order they were given.
A numeric vector the same length as the number of rows, specifying the indices of the input matrix.

colOrder

Ordering specification for the columns of the heatmap. See rowOrder for options. Default value is TRUE.

Value

pheatmap figure of component result by sample

Examples

plotScm(scmExp)

plotScm(scmExp, rowOrder = FALSE, colOrder = FALSE)

newOrder = sample(1:ncol(scmExp), ncol(scmExp))
plotScm(scmExp, colOrder = newOrder)

plotSegs

Description

This function is used to create a plot of a sample's segments. The input can either be a single data.frame of the segments of one patient or a list of data.frames for multiple samples.

Usage

plotSegs(
  samples,
  name = "",
  chrom = -1,
  gbuild = "hg19",
  sep = FALSE,
  alpha = 1
)

Arguments

samples

The samples to plot. If a list is given, all of its samples are drawn on the same plot

name

The name of the sample. Used for plot title

chrom

Which chromosome to plot. Default plots all of them.

gbuild

The reference genome build. Default is hg19. Also supports hg18 and hg38.

sep

Whether to place the different members of the list on separate axes (TRUE) or on the same axis (FALSE)

alpha

Allows you to adjust the transparency of the lines. Takes a value from 0 to 1

Value

displays a plot of the segments

Examples

plotSegs(segDataExp[[1]]) #Plots all of the first sample's segments
plotSegs(segDataExp[[1]], chrom = 1) #Only plots the first chromosome segments
plotSegs(segDataExp[1:2]) #Plots first two samples on same axis
plotSegs(segDataExp[1:2], sep = TRUE) #Plots first two samples separately

plotSig

Description

This function is used to create a plot for the specified signature to look at the contribution of each of the components to that signature.

Usage

plotSig(sigs, sigNum)

Arguments

sigs

The dataset of component contribution to each signature

sigNum

The signature number to plot

Value

displays a plot of the signature

Examples

plotSig(referenceExp$sigs, 1) #Plots first signature

plotSigExposure

Description

This function plots the signature exposure for all of the samples as a stacked bar plot. There are a number of different options for how to sort the resulting plot.

Usage

plotSigExposure(
  sigExposure,
  saveRes = FALSE,
  saveDir = NULL,
  runName = "",
  trackData = NULL,
  sort = FALSE,
  sortOrder = "m",
  method = NULL,
  colors = NULL
)

Arguments

sigExposure

Signature exposure matrix to be plotted

saveRes

Whether or not to save results. Default is FALSE.

saveDir

Where to save plots, must be provided if using saveRes

runName

Used to add a runName to the file output. Default is "".

trackData

Data used to plot tracks

sort

Whether or not to sort the plot

sortOrder

The order in which to sort the plot

method

The method by which to sort the main plot

colors

Colors used in plotting

Details

Adding data tracks to the plot: One of the major features of this function is that it allows the user to add in some additional data for the samples to be plotted as a track alongside the main signature exposure stacked bar plot. These additional data points can be passed in as a vector of corresponding values in the same order. If you want to plot multiple tracks you can pass in a list of vectors using the trackData parameter.

Specifying how to sort the plot: When you give the function a set of trackData, you can also specify the sortOrder, which controls the order in which the plot is sorted. "m" represents the main plot, and "t" followed by the number of the track (e.g. "t1", "t2", ...) represents the tracks. By chaining the values together you can specify a variety of ways to sort the final plot. As an example, the sortOrder of "mt1t2" specifies that the plot should be sorted by the signature exposures first, followed by the first track and finally the second track. In another example, the sortOrder of "t2mt1" specifies that the plot should be sorted by track number 2 first, followed by the signature exposures and lastly by track number 1.

Sorting method: The two methods of sorting the signature exposure are either "hclust" or "group". The hclust uses the ward.D method to cluster the exposures and then cuts the tree to split the data. The group method splits the samples into groups based on which signatures they had the highest exposure to.
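
To make the sortOrder notation and the "group" idea concrete, here is a small base R sketch. The helper parse_sort_order and the toy exposure matrix are hypothetical and only mirror the description above; they are not functions or data from the package.

```r
# Split a sortOrder string such as "t2mt1" into its ordered keys.
# parse_sort_order is a hypothetical helper, not a CNSigs function.
parse_sort_order <- function(sortOrder) {
  regmatches(sortOrder, gregexpr("m|t[0-9]+", sortOrder))[[1]]
}

keys1 <- parse_sort_order("mt1t2")   # main plot, then track 1, then track 2
keys2 <- parse_sort_order("t2mt1")   # track 2, then main plot, then track 1

# "group" idea: assign each sample to the signature with its highest exposure
expo <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 3,
               dimnames = list(paste0("Sig", 1:3), c("s1", "s2")))
grp <- apply(expo, 2, which.max)     # index of the top signature per sample
grp
```

The toy matrix follows the orientation of sigExposExp, with signatures in rows and samples in columns.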

Value

Plots the signature exposure to allow visualization

Examples

plotSigExposure(sigExposExp)

plotSigExposureMat

Description

This function is used to generate the signature exposure matrix heatmap plot.

Usage

plotSigExposureMat(
  sigExposure,
  runName = "",
  saveRes = FALSE,
  saveDir = NULL,
  rowOrder = FALSE,
  colOrder = TRUE
)

Arguments

sigExposure

Sample by signature matrix

runName

Name of the run used in plot titles, Default is ""

saveRes

Whether or not to save the plots, Default is FALSE

saveDir

Where to save the plots, must be provided if using saveRes

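rowOrder

Ordering specification for the rows of the heatmap. Three possible options:

TRUE: Uses hierarchical clustering to determine row order.
FALSE: (default) Leaves rows in the order they were given.
A numeric vector the same length as the number of rows, specifying the indices of the input matrix.

colOrder

Ordering specification for the columns of the heatmap. See rowOrder for options. Default value is TRUE.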

Value

pheatmap figure of signature exposure by patient

Examples

plotSigExposureMat(sigExposExp)

plotSigExposureMat(sigExposExp, rowOrder = FALSE, colOrder = FALSE)

newOrder = sample(1:ncol(sigExposExp), ncol(sigExposExp))
plotSigExposureMat(sigExposExp, colOrder = newOrder)

plotSigMat

Description

This function is used to generate the signature by component matrix plot.

Usage

plotSigMat(sigs, runName = "", saveRes = FALSE, saveDir = NULL)

Arguments

sigs

Signature by component matrix

runName

Name of the run used in plot titles, Default is ""

saveRes

Whether or not to save the plots, Default is FALSE

saveDir

Where to save the plots, must be provided if using saveRes

Value

pheatmap figure of component weights by signature

Examples

plotSigMat(sigsExp)

plotSigs

Description

This function plots all of the signatures so that they can be visualized. It does this by looping through the signatures and calling the plotSig function.

Usage

plotSigs(sigs, saveRes = FALSE, saveDir = NULL, runName = "")

Arguments

sigs

The dataset of component contribution to each signature

saveRes

Whether or not to save results. Default is FALSE.

saveDir

Where to save plots, must be provided if using saveRes

runName

Used to add a runName to the file output. Default is "".

Value

Plots all the signatures to allow visualization

Examples

plotSigs(referenceExp$sigs)

postProb

Description

This function calculates the probability that each new data point falls into each of the distributions defined by the parameters. It is used when calculating the sample by component matrix.

Usage

postProb(params, newData)

Arguments

params

A vector of the distribution parameters

newData

The new data to calculate the probabilities for

Value

Returns the probability that the newData is in the distributions

readSegs

Description

This function is used to read in the segments. It can either take a file path to a csv to read in the data, or it can take in a long data frame and convert it to the format needed for the pipeline. The variable colMap is used to map your column names to what the pipeline expects. For instance, if the column that holds the chromosome numbers is titled "chrom" instead of the expected "chromosome", then you would specify the colMap as c("ID","chrom","start","end","segVal"). If your data is separated into major and minor allele copy numbers, then the segVal part of the colMap should be formatted as "nMajor+nMinor" to let the function know to add them together.

Usage

readSegs(path, colMap = NULL, readPloidy = FALSE)

Arguments

path

The path to the .txt file with the data in it, or a folder containing the .txt files

colMap

The mapping of column names. The default is c("ID","chromosome","start","end","segVal"). If your column names vary from this please pass a vector similar to the above with the changes.

readPloidy

Whether or not the input file has ploidy and should be read

Value

Returns the segments in a list formatted to be run through the pipeline
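
What the column mapping and the "nMajor+nMinor" convention amount to can be sketched by hand in base R. The toy table and its column names are invented, and the final list layout (one data frame per sample) is an assumption based on the description of segDataExp, not a statement of what readSegs does internally.

```r
# A long table whose column names differ from the expected ones and whose
# copy number is split by allele. The data and column names are invented.
long <- data.frame(sample = c("s1", "s1", "s2"),
                   chrom  = c(1, 2, 1),
                   start  = c(1, 1, 1),
                   end    = c(5e6, 8e6, 3e6),
                   nMajor = c(2, 1, 3),
                   nMinor = c(1, 1, 0))

# Rename to the expected columns and add the allele-specific copy numbers
segs <- data.frame(ID = long$sample,
                   chromosome = long$chrom,
                   start = long$start,
                   end = long$end,
                   segVal = long$nMajor + long$nMinor)   # total copy number

# One data frame per sample, keyed by ID
segList <- split(segs, segs$ID)
names(segList)
```

With readSegs, the corresponding mapping for this toy table would presumably be written as colMap = c("sample","chrom","start","end","nMajor+nMinor").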

reducePeaks

Description

This function is used to reduce the peaks within a feature distribution so that the models can be fitted properly. The flexmix package can struggle to converge on a solution if there are large spikes in the distribution.

Usage

reducePeaks(toReduce)

Arguments

toReduce

Input feature distribution

Value

Returns the input feature with the peaks reduced

The result object from Pipeline using segDataExp.

Description

The generated result object from the entire pipeline using segDataExp. Function used to create: referenceExp = runPipeline(segDataExp,nsigs = 5)

Usage

referenceExp

Format

A list with 7 elements:

func: The function call used in the pipeline run
Input_data: The data the features were extracted from
CN_features: The extracted features
CN_components: The fitted component models
scm: The sample by component matrix
nmf_Results: The results of the NMF run
sigs: The signature by component matrix.

remapResults

Description

This function is used to remap the results from a runPipeline run with a different order of the signatures or different names. To reorder, pass a new mapping of the signatures, which is simply the order in which you want them. For instance, if you want to swap the first and second signatures and you have a total of 4 signatures, you would pass in c(2,1,3,4) for the sigMap parameter. To rename the signatures, pass a vector of names that is the same length as the number of signatures for the sigNames parameter.

Usage

remapResults(path, sigMap = NULL, sigNames = NULL, saveRes = FALSE)

Arguments

path

The path to the results folder to remap

sigMap

The new order for the signatures

sigNames

New signature names

saveRes

Whether or not to save results. Default is FALSE.

Details

Overall, this function will create a duplicate results folder in the same directory and regenerate all of the plots and result files into the new order or with the new names. This means that you don't have to regenerate all of the plots manually.

Value

No return value
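
The effect of a signature mapping or a set of new names can be pictured on a plain matrix in base R. The toy matrix and the new labels below are invented, and whether remapResults relabels columns after reordering is not stated here, so treat this only as an illustration of the indexing.

```r
# Toy signature matrix: rows = components, columns = signatures (made up)
sigs <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("comp", 1:3), paste0("Sig", 1:4)))

sigMap <- c(2, 1, 3, 4)        # swap the first two signatures
reordered <- sigs[, sigMap]
colnames(reordered)

# Renaming instead: one new name per signature (invented labels)
sigNames <- c("SigA", "SigB", "SigC", "SigD")
renamed <- sigs
colnames(renamed) <- sigNames
```

A mapping vector must list every signature exactly once, and a name vector must have one entry per signature, which matches the length requirement described above.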

runPipeline

Description

This function allows you to run the entire Copy number signature pipeline in one go. May take a while, especially if not given multiple cores. For more information on what actually happens in the pipeline, refer to the CNSigs vignette.

Usage

runPipeline(
  segData,
  cores = 1,
  nsigs = 0,
  saveRes = FALSE,
  runName = "Run",
  rmin = 3,
  rmax = 12,
  components = NULL,
  max_comps = NULL,
  min_comps = NULL,
  fixedSigs = NULL,
  saveDir = NULL,
  smooth = FALSE,
  colMap = NULL,
  pR = FALSE,
  gbuild = "hg19",
  featsToUse = NULL,
  ploidyData = NULL,
  plot = TRUE
)

Arguments

segData

The data to be analyzed. If a path name, readSegs is used to make the list. Otherwise the list must be formatted correctly. Refer to ?readSegs for format information.

cores

The number of computer cores to be used for parallel processing

nsigs

The number of signatures to look for. A value of 0 runs the determineNumSigs function to look for the optimal number. Default is 0.

saveRes

Whether or not to save the resulting tables and plots. Default is FALSE

runName

Used to title plots and files when saving results

rmin

Minimum number of signatures to look for. Default is 3.

rmax

Maximum number of signatures to look for. Default is 12.

components

Can be used when fixing components. Default is NULL.

max_comps

vector of length 6 specifying the max number of components for each feature. Passed to fitModels. Default is 10 for all features

min_comps

vector of length 6 specifying the min number of components for each feature. Passed to fitModels. Default is 2 for all features

fixedSigs

Signature x Component matrix. Used when fixing signatures. Default is NULL

saveDir

Used to specify where to save the results, must be provided if using saveRes

smooth

Whether or not to smooth the input data. Default is FALSE.

colMap

Mapping of column names when reading from text file. Default column names are ID, chromosome, start, end, segVal.

pR

Peak Reduction reduces peaks in modeling to make modeling easier. Default is FALSE.

gbuild

The reference genome build. Default is hg19. Also supports hg18 and hg38.

featsToUse

Vector of feature names that you wish to use

ploidyData

The ploidy data to use as a feature

plot

Whether or not to generate the plots. Default is TRUE.

Value

Returns a list with all of the results from the pipeline

Examples

#Runs the entire pipeline on the example data giving it 6 cores and specifying
#5 signatures with a name of "TCGA Test"

runPipeline(segDataExp, cores = 6, nsigs = 5, saveRes = FALSE, runName = "TCGA Test")

Sample by component matrix for segDataExp

Description

The generated sample by component matrix (scm) for the segDataExp dataset, created using generateSCM(featsExp, compsExp). It is a matrix showing how much each extracted component contributes to each sample. This matrix is the input to NMF and is used to create the signatures.

Usage

scmExp

Format

An object of class matrix (inherits from array) with 20 rows and 20 columns.

Segmentation Data from TCGA BRCA samples

Description

A small example subset of BRCA samples from TCGA in list format. Each item contains the segmentation data for that sample.

Usage

segDataExp

Format

A list with 20 elements and 5 variables:

ID: ID number for the sample
chromosome: chromosome the segment is found on
start: starting position of the segment
end: end position of the segment
segVal: copy number value for the segment

Source

https://portal.gdc.cancer.gov/
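As a rough illustration of the layout described above, the sketch below builds a small list in the same shape from a long-format table. It assumes that each list element is a data frame holding one sample's segments; the sample IDs and coordinates are made up, and `segs` and `segList` are hypothetical names, not objects shipped with the package.

```r
# Hypothetical toy data; column names follow the Format list above.
segs <- data.frame(
  ID         = c("S1", "S1", "S1", "S2", "S2"),
  chromosome = c("1", "1", "2", "1", "X"),
  start      = c(1, 5000001, 1, 1, 1),
  end        = c(5000000, 9000000, 7000000, 12000000, 3000000),
  segVal     = c(2, 3, 1, 2, 4),
  stringsAsFactors = FALSE
)

# One data frame per sample, keyed by sample ID (assumed layout).
segList <- split(segs, segs$ID)

names(segList)
vapply(segList, nrow, integer(1))
```

For real data, readSegs is the documented way to build this list from a file.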

segStats

Description

This function gives an overview of some of the features of your samples. It outputs summary statistics for the segments, including segment size and number of segments per sample, along with various other measures.

Usage

segStats(segTabs)

Arguments

segTabs

The list of per-sample copy number segments

Value

Outputs a summary of the statistics

Examples

segStats(segDataExp)
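To give a feel for the kind of summary involved, here is a minimal base R sketch that computes segment counts per sample and pooled segment sizes on toy data. It is not the segStats implementation: the toy values, the names `segList`, `segsPerSample` and `segSizes`, and the choice of `end - start` as the size are all assumptions made for illustration.

```r
# Toy list of per-sample segment tables (hypothetical values).
segList <- list(
  S1 = data.frame(chromosome = c("1", "1", "2"),
                  start = c(1, 5000001, 1),
                  end = c(5000000, 9000000, 7000000),
                  segVal = c(2, 3, 1)),
  S2 = data.frame(chromosome = c("1", "X"),
                  start = c(1, 1),
                  end = c(12000000, 3000000),
                  segVal = c(2, 4))
)

# Number of segments in each sample.
segsPerSample <- vapply(segList, nrow, integer(1))

# Segment sizes pooled across all samples (size taken as end - start).
segSizes <- unlist(lapply(segList, function(s) s$end - s$start),
                   use.names = FALSE)

summary(segsPerSample)
summary(segSizes)
```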

The generated sigExposure from the segDataExp run

Description

The generated signature exposure matrix for the segDataExp dataset. Extracted from the referenceExp data object using referenceExp$sigExposure. This matrix shows how much each signature contributes to the patient samples.

Usage

sigExposExp

Format

An object of class data.frame with 5 rows and 20 columns.

sigSim

Description

This function compares two sets of signatures by computing the similarity matrix across both signature sets. If the signatures have the same underlying components, similarity is calculated using cosine similarity. If the signatures have different underlying components, similarity is estimated using a measure based on the KS (Kolmogorov-Smirnov) statistic. See the package vignette for more information.

Usage

sigSim(reference, toCompare, plot = TRUE, text = TRUE)

Arguments

reference

Results from your reference analysis

toCompare

Results from run that you want to compare to reference

plot

If TRUE, displays the heatmap plot

text

If TRUE, displays the similarity value on the plot

Value

Plots signature similarity and returns the average similarity

Examples

sigSim(referenceExp, referenceExp)
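For the case where both signature sets share the same components, the cosine similarity can be sketched in base R as below. This is an illustration of the measure only, not the sigSim code: `cosineSim`, `ref` and `new` are hypothetical names, and the component-by-signature orientation of the matrices is an assumption.

```r
# Cosine similarity between every column of A and every column of B.
# Rows are components, columns are signatures (assumed orientation).
cosineSim <- function(A, B) {
  normA <- sqrt(colSums(A^2))
  normB <- sqrt(colSums(B^2))
  crossprod(A, B) / outer(normA, normB)
}

# Two toy signature sets defined over the same three components.
ref <- cbind(sig1 = c(1, 0, 2), sig2 = c(0, 3, 1))
new <- cbind(sigA = c(2, 0, 4), sigB = c(1, 1, 1))

sim <- cosineSim(ref, new)
round(sim, 3)
```

Here sigA is a scaled copy of sig1, so that pair has a similarity of 1. Cosine similarity ignores overall scale and compares only the relative weight of each component.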

The generated Signatures from the segDataExp run

Description

The generated signature by component matrix for the segDataExp dataset. Extracted from the referenceExp data object using referenceExp$sigs. This matrix shows how much each component contributes to the signatures.

Usage

sigsExp

Format

An object of class matrix (inherits from array) with 20 rows and 5 columns.

smoothSegs

Description

This function attempts to smooth the input copy number segments in order to reduce the biasing effect of the technology and copy number caller used to make the segments. It does this by joining together nearby segments and removing small aberrant segments.

Usage

smoothSegs(segData, cores = 1)

Arguments

segData

The segData to be smoothed

cores

Number of cores to be used for parallel smoothing. Default is 1.

Value

Returns the smoothed segments

Examples

smoothSegs(segDataExp)
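The joining step can be pictured with a deliberately simplified sketch: merge consecutive segments on the same chromosome that carry the same copy number value. This is not the smoothSegs algorithm, whose joining and removal criteria are not described here; `mergeAdjacent` is a hypothetical helper and the rule it applies is an assumption chosen for illustration.

```r
# Merge consecutive segments that share chromosome and segVal.
# Assumes seg is sorted by chromosome and start position.
mergeAdjacent <- function(seg) {
  n <- nrow(seg)
  if (n < 2) return(seg)
  # A new run starts whenever the chromosome or the copy number changes.
  newRun <- c(TRUE,
              seg$chromosome[-1] != seg$chromosome[-n] |
                seg$segVal[-1] != seg$segVal[-n])
  first <- which(newRun)            # first row of each run
  last  <- c(first[-1] - 1L, n)     # last row of each run
  data.frame(
    chromosome = seg$chromosome[first],
    start      = seg$start[first],
    end        = seg$end[last],
    segVal     = seg$segVal[first],
    stringsAsFactors = FALSE
  )
}

seg <- data.frame(
  chromosome = c("1", "1", "1", "2"),
  start      = c(1, 101, 201, 1),
  end        = c(100, 200, 300, 50),
  segVal     = c(2, 2, 3, 3),
  stringsAsFactors = FALSE
)

smoothed <- mergeAdjacent(seg)
smoothed
```

The first two segments on chromosome 1 collapse into one spanning positions 1 to 200, while the segment on chromosome 2 stays separate even though its value matches its predecessor.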

sumOfPosteriors

Description

This function is used to calculate the sum of posteriors for a given feature. It returns a vector of posterior probabilities which describe how much each component contributes to the distribution of features passed in.

Usage

sumOfPosteriors(feat, comps, name)

Arguments

feat

Feature to calculate the sum from.

comps

Component parameters for that feature

name

Name of feature to sum across

Value

Returns the sum of the posteriors for the specified feature.
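The idea of a sum of posteriors can be sketched for a mixture of normal distributions as follows. The example assumes the component means, standard deviations and mixing weights are already known; `sumPosteriorsNormal` and its arguments are hypothetical and do not mirror the sumOfPosteriors signature, which takes fitted component parameters.

```r
# Sum of posterior probabilities of each mixture component over all values.
sumPosteriorsNormal <- function(x, means, sds, weights) {
  # Weighted density of each observation (rows) under each component (columns).
  dens <- sapply(seq_along(means), function(k) {
    weights[k] * dnorm(x, mean = means[k], sd = sds[k])
  })
  dens <- matrix(dens, nrow = length(x))
  # Normalise each row so the posteriors for one observation sum to 1.
  post <- dens / rowSums(dens)
  colSums(post)
}

# Toy feature values: two near 0 and three near 5.
x <- c(0.1, -0.2, 5.1, 4.8, 5.3)
res <- sumPosteriorsNormal(x, means = c(0, 5), sds = c(1, 1),
                           weights = c(0.5, 0.5))
round(res, 2)
```

Because each observation's posteriors sum to 1, the result always sums to the number of observations. Each entry can be read as the approximate number of values attributed to that component.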

validateSegData

Description

This function validates and cleans up the input data. It converts the columns that need to be numeric to numeric, and filters out invalid segments, such as those with zero or negative length. It also converts the chromosome tags to the proper format for feature extraction.

Usage

validateSegData(segData, cores = 1)

Arguments

segData

The copy number segment data

cores

The number of cores to use for parallel processing. Default is 1.

Value

Returns a list of data frames containing the converted segment data

Examples

validateSegData(segDataExp)
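The clean-up steps described above can be sketched for a single sample in base R. This is an approximation rather than the validateSegData code: `cleanSegs` is a hypothetical helper, and stripping a leading "chr" prefix is only an assumption about what the proper chromosome format looks like.

```r
cleanSegs <- function(seg) {
  # Coerce coordinate and value columns to numeric (they may arrive as text).
  for (col in c("start", "end", "segVal")) {
    seg[[col]] <- as.numeric(seg[[col]])
  }
  # Strip a leading "chr" prefix; the target format is an assumption.
  seg$chromosome <- sub("^chr", "", as.character(seg$chromosome))
  # Keep only complete segments with positive length.
  keep <- !is.na(seg$start) & !is.na(seg$end) & !is.na(seg$segVal) &
    seg$end > seg$start
  seg[keep, , drop = FALSE]
}

# Toy input: the second segment has zero length, the third negative length.
raw <- data.frame(
  chromosome = c("chr1", "chr1", "chrX"),
  start      = c("100", "500", "900"),
  end        = c("400", "500", "800"),
  segVal     = c("2", "3", "1"),
  stringsAsFactors = FALSE
)

cleaned <- cleanSegs(raw)
cleaned
```

Only the first segment survives, with numeric coordinates and the chromosome recorded as "1".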