| Type: | Package |
| Title: | Comprehensive Database for Pathway Enrichment Analysis |
| Version: | 0.1.0 |
| Description: | Provides access to large-scale genomics data from the South Dakota State University's bioinformatics database, a unified platform for pathway analysis of over 13,000 organisms. It includes various gene mappings, gene characteristics, and pathway mapping data from KEGG, GOBP, GOCC, and many more pathway databases. Also provides various helper functions for processing RNA-Seq data for differential expression analysis and pathway enrichment analysis, occasionally sourced from code from Integrated Differential Expression & Pathway analysis (iDEP), developed by Ge, S.X., Son, E.W. & Yao, R. (2018) <doi:10.1186/s12859-018-2486-6>. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Language: | en-US |
| Imports: | DBI, dplyr, edgeR, R.utils, RSQLite, stats, utils, tools |
| RoxygenNote: | 7.3.3 |
| Suggests: | clusterProfiler, curl, knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1.0) |
| URL: | https://github.com/aidanfred24/pathdb, https://aidanfred24.github.io/pathdb/ |
| BugReports: | https://github.com/aidanfred24/pathdb/issues |
| Config/Needs/editorial: | spelling |
| NeedsCompilation: | no |
| Packaged: | 2026-05-29 23:02:25 UTC; aidan |
| Author: | Aidan Frederick [aut, cre], Xijin Ge [fnd] |
| Maintainer: | Aidan Frederick <Aidan.Frederick@sdstate.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-03 13:00:02 UTC |
pathdb: Comprehensive Database for Pathway Enrichment Analysis
Description
Provides access to large-scale genomics data from the South Dakota State University's bioinformatics database, a unified platform for pathway analysis of over 13,000 organisms. It includes various gene mappings, gene characteristics, and pathway mapping data from KEGG, GOBP, GOCC, and many more pathway databases. Also provides various helper functions for processing RNA-Seq data for differential expression analysis and pathway enrichment analysis, occasionally sourced from code from Integrated Differential Expression & Pathway analysis (iDEP), developed by Ge, S.X., Son, E.W. & Yao, R. (2018) doi:10.1186/s12859-018-2486-6.
Author(s)
Maintainer: Aidan Frederick Aidan.Frederick@sdstate.edu
Other contributors:
Xijin Ge Xijin.Ge@sdstate.edu [funder]
See Also
Useful links:
Report bugs at https://github.com/aidanfred24/pathdb/issues
TERM2GENE Data Prep for iDEP Database
Description
Prepares background genes for enrichment analysis functions in the format of TERM2GENE data, using pathway information from various databases. Requires ID for a species, and can filter for specific vector of genes.
Usage
T2G_prep(species_id = NULL, category = "GOBP", genes = NULL)
Arguments
species_id |
Numeric. The ID of a desired species from database, found
using |
category |
Character. A vector or character constant of pathway categories/databases (e.g. KEGG, GOBP, GOCC, etc.). It is not recommended to use all categories, as some species have many, leading to performance issues |
genes |
Character. A character vector of genes to add to query |
Value
A data frame containing TERM2GENE Data (pathways to genes)
Examples
# CAUTION: The human database is very large, running these examples require
# the download of the human database.
# Prepare background genes mapping for Hypoxia dataset
# Useful for pathway enrichment analysis of our data
bg_genes <- T2G_prep(
species_id = 96,
category = "KEGG",
genes = rownames(hypoxia_deseq)
)
head(bg_genes)
Query Bioinformatics Database
Description
Retrieves database (.db file) from the SDSU bioinformatics database, creates a connection via SQLite
Usage
connect_database(species_id = NULL)
Arguments
species_id |
ID of species selected (Loads organism info data if NULL) |
Value
SQLite connection to the downloaded file
Examples
# Connect to organism information database
conn <- connect_database()
# Query information using connection
x <- DBI::dbGetQuery(
conn = conn,
statement = "select * from orgInfo;"
)
head(x)
# Disconnect from database file
DBI::dbDisconnect(conn = conn)
# Connect to species information database (e.g. Indian Cobra)
conn <- connect_database(species_id = 99)
# Query information using connection
x <- DBI::dbGetQuery(
conn = conn,
statement = "select * from geneInfo;"
)
head(x)
# Disconnect from database file
DBI::dbDisconnect(conn = conn)
Convert Gene IDs to Ensembl
Description
Queries the database to map user-provided gene identifiers to Ensembl/Entrez IDs. To ensure best matching and conversion, please verify that all gene identifiers have no whitespace and are at least 2 characters long.
Usage
convert_id(genes, data = NULL, species_id, id_type = "ens")
Arguments
genes |
A vector or character string of gene identifiers to convert. |
data |
Optional data frame or matrix. If provided, the function attempts
to match |
species_id |
Numeric. The ID of the species for the database connection. |
id_type |
Character. The type of ID to convert to:
|
Value
A data frame.
If
dataisNULL: Returns a mapping table with original IDs and IDs of selected type.If
datais provided: Returnsdatamerged with the IDs of selected type. ReturnsNULLifspecies_idis missing or no matches are found.Any whitespace found in original IDs will be removed.
Examples
# CAUTION: The human database is very large, running these examples require
# the download of the human database.
# View our experimental gene IDs
head(rownames(hypoxia_reads))
# Convert IDs to Ensembl format for further analysis
ens_conv <- convert_id(genes = rownames(hypoxia_reads),
species_id = 96)
# Yields a conversion table for our genes
head(ens_conv)
# Can also convert to Entrez IDs, if needed
entrez_conv <- convert_id(genes = rownames(hypoxia_reads),
species_id = 96,
id_type = "entrez")
# Yields a conversion table for our genes
head(entrez_conv)
# We want to automatically convert our IDs within our data
ens_hypoxia <- convert_id(genes = rownames(hypoxia_reads),
species_id = 96,
data = hypoxia_reads)
# Original data
head(hypoxia_reads)
# Converted data
head(ens_hypoxia)
Get Gene Information
Description
Retrieves gene information (e.g., Ensembl IDs, positions) for a specific species, optionally filtered by a list of user-provided gene identifiers after converting to Ensembl IDs.
Usage
get_genes(species_id, genes = NULL)
Arguments
species_id |
Numeric. The ID of the desired species. |
genes |
A vector or list of gene identifiers to filter by.
If |
Value
A data frame containing gene information (from the geneInfo table).
If genes are provided, the result is filtered to match the converted Ensembl IDs.
Examples
# CAUTION: The human database is very large, running these examples require
# the download of the human database.
# We have gene IDs that are not commonly recognized
head(rownames(hypoxia_reads))
# Retrieve gene information for genes in our sample
# Converts to Ensembl IDs first
genes <- get_genes(species_id = 96,
genes = rownames(hypoxia_reads))
head(genes)
# Retrieve all genes for desired species
all_genes <- get_genes(species_id = 96)
head(all_genes)
# This is the same as running get_table(96, "geneInfo")
all(get_genes(96) == get_table(96, "geneInfo"), na.rm = TRUE)
Get Species Pathways
Description
Retrieves pathway information for a specific species and optionally filters for specific genes.
Usage
get_pathways(species_id, genes = NULL, category = "GOBP")
Arguments
species_id |
Numeric. The ID of the desired species
(e.g., from |
genes |
A vector or column of a data frame containing gene IDs of interest.
If |
category |
Character. A vector or character constant of pathway categories/databases (e.g. KEGG, GOBP, GOCC, etc.). It is not recommended to use all categories, as some species have many, leading to performance issues |
Details
The function first retrieves the pathway and pathwayInfo tables for the
specified species. If a list of genes is provided, it converts the IDs to
Ensembl IDs, matches them against the pathway map, and joins the results
with pathway metadata.
Value
A data frame containing pathway information. If genes are provided,
the data frame is filtered to include only pathways containing those genes
and joined with gene mapping data.
Examples
# CAUTION: The human database is very large, running these examples require
# the download of the human database.
# Get GOBP pathways for our genes of interest
path_info <- get_pathways(
species_id = 96,
genes = rownames(hypoxia_reads),
category = "GOBP"
)
head(path_info)
Get Table from Selected Database
Description
Retrieves a specific table from the database for a selected species.
Usage
get_table(species_id = NULL, table = NULL)
Arguments
species_id |
Numeric. The selected species ID.
If |
table |
Character. The name of the table to retrieve (e.g., "geneInfo", "pathway").
If |
Value
A data frame containing the data from the selected table.
See Also
list_tables to see available tables for a species.
Examples
# Retrieve geneInfo table for Indian Cobra Species
cobra_genes <- get_table(species_id = 99,
table = "geneInfo")
# View table
head(cobra_genes)
Example KEGG Pathway Mapping for Hypoxia
Description
An example TERM2GENE mapping tailored for the hypox_deseq dataset,
demonstrating the T2G_prep function's output.
Usage
hypoxia_T2G
Format
A data frame with 2 columns:
- description
Pathway ID or description
- gene
Ensembl Gene ID
Hypoxia Data Differential Expression Analysis Results
Description
Results of performing differential expression analysis (DESeq2) on gene counts gathered in the following experiment: RNAseq transcriptomic profile of glioblastoma stem-like cells derived from U87MG cell line treated with a selective A3 adenosine receptor antagonist (MRS1220) under hypoxia.
Usage
hypoxia_deseq
Format
hypox_deseq
A data frame with 13,818 rows and 6 columns:
- baseMean
Mean of normalized counts for all samples
- log2FoldChange
Log2 fold change between treated and control
- lfcSE
Standard error estimate for the log2 fold change estimate
- stat
Wald statistic
- pvalue
Wald test p-value
- padj
Benjamini-Hochberg adjusted p-value
Source
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100146
Hypoxia Data Gene Counts
Description
Gene counts gathered in the following experiment: RNAseq transcriptomic profile of glioblastoma stem-like cells derived from U87MG cell line treated with a selective A3 adenosine receptor antagonist (MRS1220) under hypoxia.
Usage
hypoxia_reads
Format
hypoxia_reads
A data frame with 35,238 rows and 4 columns:
- MRS1220_hypoxia_rep1
Treatment replication 1 counts
- MRS1220_hypoxia_rep2
Treatment replication 2 counts
- vehicle_hypoxia_rep1
Control replication 1 counts
- vehicle_hypoxia_rep2
Control replication 2 counts
Source
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100146
List Table Options
Description
Lists all available tables within the database for a specific species.
Usage
list_tables(species_id = NULL)
Arguments
species_id |
Numeric. The selected species ID.
If |
Value
A character vector of table names available in the database connection.
Examples
# List all tables available for species 99 (Indian Cobra)
list_tables(species_id = 99)
Retrieve Pathway Categories
Description
Retrieves pathway category options (e.g. KEGG, GOBP, etc.) for a given species. May take longer for well-documented species (i.e. Human)
Usage
path_categories(species_id = NULL)
Arguments
species_id |
Numeric. The ID of a desired species from database, found
using |
Value
Data frame of pathway categories for given species
Examples
# Get pathway categories for species 99 (Indian Cobra)
categories <- path_categories(species_id = 99)
head(categories)
Filter Pathway Information By Category
Description
Retrieves pathway mapping information for a specific species, filtered by one or more pathway categories (e.g., "GOBP", "KEGG"). Optionally, the results can be further restricted to a specific list of genes.
Usage
path_filter(species_id, genes = NULL, category = "GOBP")
Arguments
species_id |
Numeric. The ID of the species to search for. |
genes |
Character vector (optional). A vector of gene IDs to filter the pathways.
If |
category |
Character or character vector. The pathway category or categories
to filter by (e.g., |
Value
A data frame containing the pathway mapping information (such as gene, pathway ID, and description) for the specified categories and genes.
Examples
# Get all GO Biological Process pathways for Human (ID 96)
gobp_paths <- path_filter(species_id = 96, category = "GOBP")
# Get KEGG pathways for specific genes in a dataset
data(hypoxia_reads)
kegg_paths <- path_filter(
species_id = 96,
genes = rownames(hypoxia_reads)[1:100],
category = "KEGG"
)
Process Gene Expression Data
Description
Performs pre-processing, missing value imputation, filtering, and transformation on gene expression count data.
Usage
process_data(
data,
missing_value = "geneMedian",
min_cpm = 0.5,
n_min_samples = 1,
rescale = FALSE
)
Arguments
data |
A numeric matrix or data frame (> 1 columns) of gene expression counts. |
missing_value |
Character. Method to handle missing values. Options:
|
min_cpm |
Numeric. Minimum counts per million threshold for filtering genes. |
n_min_samples |
Numeric. Minimum number of samples that must meet
the |
rescale |
Logical. TRUE allows for rescaling if values are exceedingly large. |
Value
The processed and transformed data matrix.
Examples
# Check example data
summary(pathdb::hypoxia_reads)
nrow(pathdb::hypoxia_reads)
# YOU decide how your data is transformed.
# Here, we want to:
# Replace missing values with median
# Set minimum counts-per-million of 0.4
# Meet CPM threshold in 2 samples
# Keep raw counts
hypox_filtered <- process_data(data = pathdb::hypoxia_reads,
missing_value = "geneMedian",
min_cpm = 0.4,
n_min_samples = 2)
# Check filtered data
summary(hypox_filtered)
nrow(hypox_filtered)
Search for Species by Name
Description
Searches the organism database for species matching a query string.
Usage
search_species(query, name_type = "all")
Arguments
query |
Character. The species name, partial name, or ID to search for. |
name_type |
Character. The type of name to search against. Options:
|
Value
A data frame containing information for all matching species. Throws an error if no species are found.
Examples
# Search all names for "Human"
search_species(query = "Human", name_type = "all")
# Search primary names for "Human"
search_species(query = "Human", name_type = "primary")
# Search academic names for "Homo sapiens"
search_species(query = "Homo sapiens", name_type = "academic")
# Search by species ID
search_species(query = 96, name_type = "id")