R version: R version 4.5.2 (2025-10-31)

Bioconductor version: 3.22

Package: 0.1.2

Introduction

Sharing biological data such as genomic, epigenomic, and transcriptomic data is a crucial initiative that enhances research transparency, promotes data reuse, and facilitates large-scale data reanalysis. Representative public repositories include the Gene Expression Omnibus (GEO) (Clough et al. 2023), the Sequence Read Archive (SRA) (Katz et al. 2021), and ArrayExpress (Sarkans et al. 2020), have greatly contributed to this effort. As the value of such data continues to increase, careful consideration is required when handling individual research datasets. Each study inherently involves various technical factors (e.g., experimenters, reagents, and measurement instruments), as well as biological factors (e.g., species, developmental stage, and tissue). Both technical and biological variations have been well documented to influence the reproducibility of microarray and RNA sequencing (RNA-Seq) experiments (Fischer and Hoffmann 2022).

Meta-analysis has emerged as a promising strategy for addressing these challenges, enabling the identification of reliable gene expression changes and underlying biological processes. It is a statistical method for integrating the results of multiple studies on the same topic, and is widely applied in genome research (Tseng, Ghosh, and Feingold 2012). For example, on the basis of the vote-counting method, earlier studies identified abiotic stress-responsive genes in A. thaliana and Oryza sativa (Tamura and Bono 2022; Shintani, Tamura, and Bono 2024). This method led to the development of a p53 Expression Score to capture p53-dependent gene expression (Fischer et al. 2016). Reanalysis of transcriptomic data across multiple studies is expected to identify significant differences that were not detected in individual studies. For example, see (De Toma 2021; Nozu et al. 2024).

In our previous study (Fukuda, Kawaguchi, and Fukushima 2025), we introduced the Stress Response score (SRscore) as a new metric using a modified vote-counting method based on methods reported in earlier studies (Ono and Bono 2021; Tamura and Bono 2022). When calculating fold-changes (FCs), whereas the HN-score (Ono and Bono 2021; Tamura and Bono 2022) restricts the predefined combinations of experimental and control groups, the SRscore considers all possible combinations.

We developed the R package SRscore to facilitate a simple and intuitive transcriptome meta-analysis across multiple research projects. The SRscore package was designed to address two key issues: (1) although some existing tools employ the vote-counting method, they are not structured for seamless integration into downstream analyses, and (2) the substantial effort and time required for meta-analyses often limits research efficiency. The SRscore package offers a standardized meta-analysis pipeline that can be readily utilized by researchers without specialized expertise in bioinformatics. Its applicability extends beyond stress response analysis, encompassing a broad spectrum of two-group comparisons, including organ-specific and drug-induced responses. The primary output, the SRscore, serves as an integrative metric that quantifies the number of studies supporting a particular finding, thereby providing an intuitively interpretable measure of reproducibility and consensus.

Installation

# library(devtools)
# install_github("fusk-kpu/SRscore", build_vignettes = TRUE)
library(SRscore)

Functions Overview

  • expand_by_group() Creates a data frame containing all possible control-treatment sample combinations within each group.

  • calcSRratio() Calculates the gene expression ratio between control and treatment samples.

  • calcSRscore() Computes the SRscore, summarizing overall gene expression trends.

  • directly_calcSRscore() Executes all steps above and returns the results as a list.

  • find_diffexp() Retrieves expression ratios for the specified gene across experiments.

Demonstration datasets

The SRscore package includes three demonstration datasets derived from microarray studies available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and our previous study (Fukuda, Kawaguchi, and Fukushima 2025). Of these, two datasets include six GEO series examining A. thaliana under abscisic acid (ABA) treatment conditions, corresponding to the accession numbers GSE6638, GSE7112, GSE19520, GSE28800, GSE39384, and GSE135489. The demo datasets are a practical resource for users to gain a basic understanding of the SRscore analysis workflow. The following datasets are available in the SRscore package.

MetadataABA

The SRscore package includes metadata, provided as object MetadataABA. These datasets contain information corresponding to the microarray studies described above. The following details were extracted from GEO records associated with the listed GEO accession numbers.

  • Study ID (e.g., GSE; a unique identifier for each study)
  • Sample ID (e.g., GSM; a unique identifier for each sample)
  • Tissue
  • Processing conditions

The primary role of the metadata in the SRscore package is to define the sample combinations to be compared within each study. Consequently, the minimum required information comprises only the study ID and sample ID.

library(tibble)
data("MetadataABA")
tibble(MetadataABA)
#> # A tibble: 19 × 5
#>    Series    control_sample treated_sample treatment tissue      
#>    <chr>     <chr>          <chr>          <chr>     <chr>       
#>  1 GSE135489 GSM4012617     GSM4012620     1mM       shoot system
#>  2 GSE135489 GSM4012618     GSM4012621     1mM       shoot system
#>  3 GSE135489 GSM4012619     GSM4012622     1mM       shoot system
#>  4 GSE39384  GSM967132      GSM967140      10uM      seedling    
#>  5 GSE39384  GSM967133      GSM967141      10uM      seedling    
#>  6 GSE39384  GSM967148      GSM967156      10uM      seedling    
#>  7 GSE39384  GSM967149      GSM967157      10uM      seedling    
#>  8 GSE39384  GSM967164      GSM967172      10uM      seedling    
#>  9 GSE39384  GSM967165      GSM967173      10uM      seedling    
#> 10 GSE28800  GSM713249      GSM713255      10uM      seedling    
#> 11 GSE28800  GSM713250      GSM713256      10uM      seedling    
#> 12 GSE28800  GSM713251      GSM713257      10uM      seedling    
#> 13 GSE19520  GSM486916      GSM486928      50uM      leaf        
#> 14 GSE19520  GSM486917      GSM486929      50uM      leaf        
#> 15 GSE19520  GSM486918      GSM486930      50uM      leaf        
#> 16 GSE6638   GSM153922      GSM153924      0.5uM     seedling    
#> 17 GSE6638   GSM153923      GSM153925      0.5uM     seedling    
#> 18 GSE7112   GSM170896      GSM170897      50uM      leaf        
#> 19 GSE7112   GSM170923      GSM170931      50uM      leaf

TranscriptomeABA

The package includes a gene expression dataset called TranscriptomeABA, which stores sample × gene expression profiles. All microarray data were downloaded from GEO as raw CEL files and preprocessed using the Robust Multi-array Average (RMA) (Irizarry et al. 2003), as implemented in the Bioconductor package affy (Gautier et al. 2004). Users can also perform the analysis workflow with their own data by replacing the contents of the demo datasets, provided that the data conform to the same format. This flexibility allows the workflow to be extended to other platforms, such as RNA-Seq, to different omics fields, such as metabolomics, and to experimental conditions beyond stress responses, including drug treatments.

data("TranscriptomeABA")
tibble(TranscriptomeABA)
#> # A tibble: 1,000 × 39
#>    ensembl_gene_id GSM153922 GSM153923 GSM153924 GSM153925 GSM170896 GSM170897
#>    <chr>               <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 AT1G01010            6.56      7.17      6.66      6.92      7.13      7.01
#>  2 AT1G01030            6.85      6.68      7.01      6.28      6.65      6.74
#>  3 AT1G01040            6.61      7.47      6.81      7.32      6.70      6.28
#>  4 AT1G01050           11.7      11.8      11.8      11.9      11.5      11.4 
#>  5 AT1G01060            9.68     10.7       9.30     10.7       9.68      9.29
#>  6 AT1G01070            7.21      7.34      6.81      5.90      8.15      8.87
#>  7 AT1G01080            9.39     10.6       8.71     10.4      11.2      11.1 
#>  8 AT1G01090           13.2      12.5      12.6      12.3      12.0      10.8 
#>  9 AT1G01100           13.6      13.8      14.1      13.8      12.7      12.8 
#> 10 AT1G01110            6.89      6.62      6.95      6.73      6.12      6.28
#> # ℹ 990 more rows
#> # ℹ 32 more variables: GSM170923 <dbl>, GSM170931 <dbl>, GSM4012617 <dbl>,
#> #   GSM4012618 <dbl>, GSM4012619 <dbl>, GSM4012620 <dbl>, GSM4012621 <dbl>,
#> #   GSM4012622 <dbl>, GSM486916 <dbl>, GSM486917 <dbl>, GSM486918 <dbl>,
#> #   GSM486928 <dbl>, GSM486929 <dbl>, GSM486930 <dbl>, GSM713249 <dbl>,
#> #   GSM713250 <dbl>, GSM713251 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>,
#> #   GSM713257 <dbl>, GSM967132 <dbl>, GSM967133 <dbl>, GSM967140 <dbl>, …

SRGA (Stress Responsive Gene Atlas)

These data were generated by integrating SRscores calculated separately from the experimental datasets under 11 different stress conditions (Fukuda, Kawaguchi, and Fukushima 2025). Given that the SRscore scale differs among different conditions, the values were standardized using z-scores. In subsequent Template Matching (Pavlidis and Noble 2001), we searched for genes with similar SRscore patterns under different stress conditions based on SRGA.

data("SRGA")
tibble(SRGA)
#> # A tibble: 1,000 × 13
#>    ensembl_gene_id   ABA  Cold DC3000 Drought  Heat `High-light` Hypoxia Osmotic
#>    <chr>           <dbl> <dbl>  <dbl>   <dbl> <dbl>        <dbl>   <dbl>   <dbl>
#>  1 AT1G01010           0     0      3       1     0            0       0       0
#>  2 AT1G01030           0     0      1       1     0            0       0       0
#>  3 AT1G01040           0     0      0       0     0            0       0       0
#>  4 AT1G01050           0     0      0       0     0            0       0       0
#>  5 AT1G01060           0     4     -1       1    -1           -4      -2       1
#>  6 AT1G01070           2     0      0       0     0            0      -1       1
#>  7 AT1G01080           0    -1     -3      -2    -1            0      -1      -4
#>  8 AT1G01090           0     0      0       0     0            0       0       0
#>  9 AT1G01100           0     0      0       0     0            0       0       0
#> 10 AT1G01110           0     0      0       0     0            0       0       0
#> # ℹ 990 more rows
#> # ℹ 4 more variables: Oxidation <dbl>, Salt <dbl>, Wound <dbl>, SYMBOL <chr>

Example Workflow

1. Comparison Pairs

The expand_by_group() function generates all possible combinations between the two groups (control and stress-treated groups) in each research project.

grp <- "Series"
var1 <- "control_sample"
var2 <- "treated_sample"

ebg <- expand_by_group(
  .data = MetadataABA,
  grp = grp,
  var1 = var1,
  var2 = var2
)

unique_series <- unique(MetadataABA$Series)
unique_series
#> [1] "GSE135489" "GSE39384"  "GSE28800"  "GSE19520"  "GSE6638"   "GSE7112"

lapply(unique_series, function(x) subset(ebg, Series == x))
#> [[1]]
#>      Series control_sample treated_sample
#> 1 GSE135489     GSM4012617     GSM4012620
#> 2 GSE135489     GSM4012617     GSM4012621
#> 3 GSE135489     GSM4012617     GSM4012622
#> 4 GSE135489     GSM4012618     GSM4012620
#> 5 GSE135489     GSM4012618     GSM4012621
#> 6 GSE135489     GSM4012618     GSM4012622
#> 7 GSE135489     GSM4012619     GSM4012620
#> 8 GSE135489     GSM4012619     GSM4012621
#> 9 GSE135489     GSM4012619     GSM4012622
#> 
#> [[2]]
#>      Series control_sample treated_sample
#> 28 GSE39384      GSM967132      GSM967140
#> 29 GSE39384      GSM967132      GSM967141
#> 30 GSE39384      GSM967132      GSM967156
#> 31 GSE39384      GSM967132      GSM967157
#> 32 GSE39384      GSM967132      GSM967172
#> 33 GSE39384      GSM967132      GSM967173
#> 34 GSE39384      GSM967133      GSM967140
#> 35 GSE39384      GSM967133      GSM967141
#> 36 GSE39384      GSM967133      GSM967156
#> 37 GSE39384      GSM967133      GSM967157
#> 38 GSE39384      GSM967133      GSM967172
#> 39 GSE39384      GSM967133      GSM967173
#> 40 GSE39384      GSM967148      GSM967140
#> 41 GSE39384      GSM967148      GSM967141
#> 42 GSE39384      GSM967148      GSM967156
#> 43 GSE39384      GSM967148      GSM967157
#> 44 GSE39384      GSM967148      GSM967172
#> 45 GSE39384      GSM967148      GSM967173
#> 46 GSE39384      GSM967149      GSM967140
#> 47 GSE39384      GSM967149      GSM967141
#> 48 GSE39384      GSM967149      GSM967156
#> 49 GSE39384      GSM967149      GSM967157
#> 50 GSE39384      GSM967149      GSM967172
#> 51 GSE39384      GSM967149      GSM967173
#> 52 GSE39384      GSM967164      GSM967140
#> 53 GSE39384      GSM967164      GSM967141
#> 54 GSE39384      GSM967164      GSM967156
#> 55 GSE39384      GSM967164      GSM967157
#> 56 GSE39384      GSM967164      GSM967172
#> 57 GSE39384      GSM967164      GSM967173
#> 58 GSE39384      GSM967165      GSM967140
#> 59 GSE39384      GSM967165      GSM967141
#> 60 GSE39384      GSM967165      GSM967156
#> 61 GSE39384      GSM967165      GSM967157
#> 62 GSE39384      GSM967165      GSM967172
#> 63 GSE39384      GSM967165      GSM967173
#> 
#> [[3]]
#>      Series control_sample treated_sample
#> 19 GSE28800      GSM713249      GSM713255
#> 20 GSE28800      GSM713249      GSM713256
#> 21 GSE28800      GSM713249      GSM713257
#> 22 GSE28800      GSM713250      GSM713255
#> 23 GSE28800      GSM713250      GSM713256
#> 24 GSE28800      GSM713250      GSM713257
#> 25 GSE28800      GSM713251      GSM713255
#> 26 GSE28800      GSM713251      GSM713256
#> 27 GSE28800      GSM713251      GSM713257
#> 
#> [[4]]
#>      Series control_sample treated_sample
#> 10 GSE19520      GSM486916      GSM486928
#> 11 GSE19520      GSM486916      GSM486929
#> 12 GSE19520      GSM486916      GSM486930
#> 13 GSE19520      GSM486917      GSM486928
#> 14 GSE19520      GSM486917      GSM486929
#> 15 GSE19520      GSM486917      GSM486930
#> 16 GSE19520      GSM486918      GSM486928
#> 17 GSE19520      GSM486918      GSM486929
#> 18 GSE19520      GSM486918      GSM486930
#> 
#> [[5]]
#>     Series control_sample treated_sample
#> 64 GSE6638      GSM153922      GSM153924
#> 65 GSE6638      GSM153922      GSM153925
#> 66 GSE6638      GSM153923      GSM153924
#> 67 GSE6638      GSM153923      GSM153925
#> 
#> [[6]]
#>     Series control_sample treated_sample
#> 68 GSE7112      GSM170896      GSM170897
#> 69 GSE7112      GSM170896      GSM170931
#> 70 GSE7112      GSM170923      GSM170897
#> 71 GSE7112      GSM170923      GSM170931

2. Calculation of the SRratio

The calcSRratio() function calculates the SRratio, which is the average gene expression ratio calculated based on combinations. If the argument is.log2 is set to FALSE, the logarithmic conversion is performed using 2 as the base.

SRratio <- calcSRratio(
  .data = TranscriptomeABA,
  var1 = var1,
  var2 = var2,
  pair = ebg,
  is.log2 = TRUE
)

tibble(SRratio)
#> # A tibble: 1,000 × 20
#>    ensembl_gene_id GSM4012620 GSM4012621 GSM4012622 GSM486928 GSM486929
#>    <chr>                <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
#>  1 AT1G01010          0.110       0.161     0.364      0.326     0.195 
#>  2 AT1G01030         -0.267       0.152     0.00155    0.253     0.0655
#>  3 AT1G01040          0.280       0.118     0.0964    -1.20     -1.62  
#>  4 AT1G01050         -0.00145    -0.0498    0.0914    -0.134    -0.0378
#>  5 AT1G01060         -0.00587     0.221     0.220     -0.0879   -1.63  
#>  6 AT1G01070          0.0928      0.0119    0.152      2.49      1.95  
#>  7 AT1G01080          0.0545      0.0377   -0.129     -0.759     0.0305
#>  8 AT1G01090         -0.00156    -0.0615    0.0319    -1.01     -0.511 
#>  9 AT1G01100         -0.139      -0.107    -0.0671     0.0846    0.851 
#> 10 AT1G01110          0.0816      0.0434    0.141     -0.220     0.0114
#> # ℹ 990 more rows
#> # ℹ 14 more variables: GSM486930 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>,
#> #   GSM713257 <dbl>, GSM967140 <dbl>, GSM967141 <dbl>, GSM967156 <dbl>,
#> #   GSM967157 <dbl>, GSM967172 <dbl>, GSM967173 <dbl>, GSM153924 <dbl>,
#> #   GSM153925 <dbl>, GSM170897 <dbl>, GSM170931 <dbl>

Alternatively, you can calculate SRratio directly without using expand_by_group(), as in (Tamura and Bono 2022).

conventional_SRratio <- calcSRratio(
  TranscriptomeABA,
  var1 = var1,
  var2 = var2,
  pair = MetadataABA,
  is.log2 = TRUE
)

tibble(conventional_SRratio)
#> # A tibble: 1,000 × 20
#>    ensembl_gene_id GSM4012620 GSM4012621 GSM4012622 GSM967140 GSM967141
#>    <chr>                <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
#>  1 AT1G01010          0.244      0.0300      0.361     0.0989  -0.114  
#>  2 AT1G01030         -0.277     -0.0872      0.251    -0.0687  -0.0984 
#>  3 AT1G01040          0.333      0.0626      0.0982   -0.321   -0.147  
#>  4 AT1G01050          0.0394    -0.0205      0.0213    0.0828   0.0433 
#>  5 AT1G01060         -0.00125    0.173       0.265     0.104   -0.132  
#>  6 AT1G01070         -0.146      0.113       0.289    -0.491   -0.110  
#>  7 AT1G01080         -0.0353    -0.0239      0.0223   -0.248    0.121  
#>  8 AT1G01090          0.0373    -0.00145    -0.0671   -0.0131  -0.0894 
#>  9 AT1G01100         -0.0588    -0.185      -0.0700    0.0326   0.0345 
#> 10 AT1G01110          0.206     -0.145       0.205    -0.230    0.00207
#> # ℹ 990 more rows
#> # ℹ 14 more variables: GSM967156 <dbl>, GSM967157 <dbl>, GSM967172 <dbl>,
#> #   GSM967173 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>, GSM713257 <dbl>,
#> #   GSM486928 <dbl>, GSM486929 <dbl>, GSM486930 <dbl>, GSM153924 <dbl>,
#> #   GSM153925 <dbl>, GSM170897 <dbl>, GSM170931 <dbl>

3. Calculate SRscore

The SRscore summarizes gene expression variations across the entire dataset by classifying the SRratio according to a pre-specified threshold. However, although this threshold can be freely specified, by default, log2FC values above 1 (corresponding to a 2-fold change on the linear scale) are classified as an increase in expression, whereas values below -1 (corresponding to a 1/2-fold change on the linear scale) correspond to a decline in expression, and values between -1 and 1 signify no change. The SRscore is calculated by subtracting the total number of expression decrease determinations from the total number of expression increase determinations.

SRscore <- calcSRscore(SRratio, threshold = c(-2, 2))
head(SRscore)
#>   ensembl_gene_id up dn unchange all score
#> 1       AT1G01010  0  0       19  19     0
#> 2       AT1G01030  0  0       19  19     0
#> 3       AT1G01040  0  0       19  19     0
#> 4       AT1G01050  0  0       19  19     0
#> 5       AT1G01060  0  0       19  19     0
#> 6       AT1G01070  3  0       16  19     3
tibble(SRscore)
#> # A tibble: 1,000 × 6
#>    ensembl_gene_id    up    dn unchange   all score
#>    <chr>           <dbl> <dbl>    <dbl> <dbl> <dbl>
#>  1 AT1G01010           0     0       19    19     0
#>  2 AT1G01030           0     0       19    19     0
#>  3 AT1G01040           0     0       19    19     0
#>  4 AT1G01050           0     0       19    19     0
#>  5 AT1G01060           0     0       19    19     0
#>  6 AT1G01070           3     0       16    19     3
#>  7 AT1G01080           0     0       19    19     0
#>  8 AT1G01090           0     0       19    19     0
#>  9 AT1G01100           0     0       19    19     0
#> 10 AT1G01110           0     0       19    19     0
#> # ℹ 990 more rows

Simple Plotting

Here, we introduce fundamental visualization functions to aid in the intuitive understanding of the SRscore, the primary output of the SRscore package. Utilizing these functions allows for the straightforward assessment of the distribution of SRscores and the identification of gene groups exhibiting distinctive SRscores.

SRscore distribution

plot_SRscore_distr() aggregates the number of genes for each SRscore value (excluding zero) and visualizes this distribution as a bar chart. Setting the log argument to TRUE (default is FALSE) converts the y-axis to a logarithmic scale, making distributions across a wide range of values easier to interpret.

plot_SRscore_distr(SRscore)

plot_SRscore_distr(SRscore, log = TRUE)

Ranked SRscore

plot_SRscore_rank() sorts genes in descending order of SRscore and visualizes their distribution. The top or bottom range can be highlighted with color coding using the threshold argument (default is c(-1, 1)).

plot_SRscore_rank(SRscore)

Top SRscore

plot_SRscore_top() extracts genes with the highest absolute SRscore values and visualizes them as a bar chart. The number of top genes to visualize can be flexibly specified using the argument top_n (default is 20).

plot_SRscore_top(SRscore)

plot_SRscore_top(SRscore, top_n = 30)

All-in-One Execution

directly_calcSRscore() aggregates the results of expand_by_group(), calcSRratio(), and calcSRscore() into a list.

res <- directly_calcSRscore(
  .data1 = MetadataABA,
  grp = grp,
  var1 = var1,
  var2 = var2,
  .data2 = TranscriptomeABA,
  is.log2 = TRUE,
  threshold = c(-2, 2)
)

names(res)
#> [1] "pairs"   "SRratio" "SRscore"
tibble(res$SRscore)
#> # A tibble: 1,000 × 6
#>    ensembl_gene_id    up    dn unchange   all score
#>    <chr>           <dbl> <dbl>    <dbl> <dbl> <dbl>
#>  1 AT1G01010           0     0       19    19     0
#>  2 AT1G01030           0     0       19    19     0
#>  3 AT1G01040           0     0       19    19     0
#>  4 AT1G01050           0     0       19    19     0
#>  5 AT1G01060           0     0       19    19     0
#>  6 AT1G01070           3     0       16    19     3
#>  7 AT1G01080           0     0       19    19     0
#>  8 AT1G01090           0     0       19    19     0
#>  9 AT1G01100           0     0       19    19     0
#> 10 AT1G01110           0     0       19    19     0
#> # ℹ 990 more rows

Metadata viewer

The find_diffexp() function extracts the pre-calculated SRratio and corresponding metadata (research ID, group information, sample ID, etc.) for the specified gene. This enables users to intuitively understand how the expression of a specific gene changes in studies and samples.

set.seed(1)
res <- find_diffexp(
  sample(SRratio$ensembl_gene_id, 1),
  SRratio,
  SRscore,
  MetadataABA
)

tibble(res$result)
#> # A tibble: 19 × 6
#>    AT1G10230 Series    control_sample treated_sample treatment tissue      
#>        <dbl> <chr>     <chr>          <chr>          <chr>     <chr>       
#>  1   -0.228  GSE135489 GSM4012617     GSM4012620     1mM       shoot system
#>  2   -0.178  GSE135489 GSM4012618     GSM4012621     1mM       shoot system
#>  3   -0.246  GSE135489 GSM4012619     GSM4012622     1mM       shoot system
#>  4    1.18   GSE19520  GSM486916      GSM486928      50uM      leaf        
#>  5    1.35   GSE19520  GSM486917      GSM486929      50uM      leaf        
#>  6    0.932  GSE19520  GSM486918      GSM486930      50uM      leaf        
#>  7   -0.433  GSE28800  GSM713249      GSM713255      10uM      seedling    
#>  8   -0.0363 GSE28800  GSM713250      GSM713256      10uM      seedling    
#>  9    0.0359 GSE28800  GSM713251      GSM713257      10uM      seedling    
#> 10    0.0581 GSE39384  GSM967132      GSM967140      10uM      seedling    
#> 11    0.229  GSE39384  GSM967133      GSM967141      10uM      seedling    
#> 12    0.214  GSE39384  GSM967148      GSM967156      10uM      seedling    
#> 13    0.477  GSE39384  GSM967149      GSM967157      10uM      seedling    
#> 14    0.150  GSE39384  GSM967164      GSM967172      10uM      seedling    
#> 15    0.418  GSE39384  GSM967165      GSM967173      10uM      seedling    
#> 16   -0.222  GSE6638   GSM153922      GSM153924      0.5uM     seedling    
#> 17    0.282  GSE6638   GSM153923      GSM153925      0.5uM     seedling    
#> 18    0.608  GSE7112   GSM170896      GSM170897      50uM      leaf        
#> 19   -0.417  GSE7112   GSM170923      GSM170931      50uM      leaf
tibble(res$SRscore)
#> # A tibble: 1 × 6
#>   ensembl_gene_id    up    dn unchange   all score
#>   <chr>           <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1 AT1G10230           0     0       19    19     0

Multiple genes can also be specified:

set.seed(1)
res2 <- find_diffexp(
  sample(SRratio$ensembl_gene_id, 10),
  SRratio,
  SRscore,
  MetadataABA
)

tibble(res2$result)
#> # A tibble: 19 × 15
#>    AT1G02490 AT1G03130 AT1G04040  AT1G04330 AT1G06160 AT1G06580 AT1G08430
#>        <dbl>     <dbl>     <dbl>      <dbl>     <dbl>     <dbl>     <dbl>
#>  1   -0.0967    0.0417  -0.424    0.0660      -1.13     -0.182     0.247 
#>  2   -0.0619    0.0828  -0.614    0.192       -0.676    -0.0644    0.323 
#>  3   -0.228    -0.108   -0.600    0.190       -0.915    -0.0900    0.582 
#>  4   -0.0803   -0.760   -1.68     0.371       -1.08     -0.0134    0.0308
#>  5    0.271    -0.840   -1.74     0.357       -1.48      0.322     0.117 
#>  6    0.124    -0.937   -2.00     0.189        0.0219   -0.0672    0.342 
#>  7   -0.229    -1.43    -0.335    0.0928      -2.44     -0.102     0.701 
#>  8   -0.102    -0.393   -1.67    -0.0750      -3.00      0.0930    0.233 
#>  9   -0.0385   -0.493   -1.82     0.0482      -2.86     -0.106     0.521 
#> 10   -0.0524   -0.188   -0.00345 -0.166       -1.49     -0.0212   -0.229 
#> 11   -0.0957    0.0790   0.107   -0.0390      -1.17      0.118    -0.567 
#> 12    0.178    -0.0976  -0.0455   0.168       -1.79     -0.160     0.626 
#> 13    0.287     0.0909  -0.0420  -0.223       -1.65      0.0514   -0.103 
#> 14    0.0181   -0.126   -1.32     0.165       -1.80     -0.0835   -0.0648
#> 15    0.481    -0.0347  -1.14    -0.0000311   -1.90      0.145    -1.10  
#> 16    0.171    -1.79    -1.67    -0.107        0.407    -0.263     0.121 
#> 17    0.0578   -5.29    -0.869   -0.213       -0.111     0.206     0.719 
#> 18    0.0334   -0.966   -0.466    0.172        0.150     0.211     0.0766
#> 19   -0.0158    0.0653   0.0327  -0.0908       0.169     0.0440   -0.104 
#> # ℹ 8 more variables: AT1G10230 <dbl>, AT1G11290 <dbl>, AT1G11820 <dbl>,
#> #   Series <chr>, control_sample <chr>, treated_sample <chr>, treatment <chr>,
#> #   tissue <chr>
tibble(res2$SRscore)
#> # A tibble: 10 × 6
#>    ensembl_gene_id    up    dn unchange   all score
#>    <chr>           <dbl> <dbl>    <dbl> <dbl> <dbl>
#>  1 AT1G02490           0     0       19    19     0
#>  2 AT1G03130           0     1       18    19    -1
#>  3 AT1G04040           0     1       18    19    -1
#>  4 AT1G04330           0     0       19    19     0
#>  5 AT1G06160           0     3       16    19    -3
#>  6 AT1G06580           0     0       19    19     0
#>  7 AT1G08430           0     0       19    19     0
#>  8 AT1G10230           0     0       19    19     0
#>  9 AT1G11290           0     0       19    19     0
#> 10 AT1G11820           0     0       19    19     0

Downstream analysis

The SRscore package can be applied to various downstream analyses (e.g., enrichment analysis, heatmap, and template matching).

Enrichment Analysis

With respect to downstream analysis, it is important to identify the biological functions and pathways associated with the listed genes extracted based on SRscore values. This can be performed based on enrichment analysis using the clusterProfiler package (Yu et al. 2012). In the present study, genes assigned an SRscore of 1 or higher were designated as up-regulated genes, for which Gene Ontology (GO) enrichment analysis was performed using the clusterProfiler package. This accordingly revealed that the GO term “response to abscisic acid,” associated with the ABA stress response, was the most significantly enriched.

library(clusterProfiler)
library(ggplot2)

ego <- enrichGO(
  gene = SRscore$ensembl_gene_id[SRscore$score >= 1],
  OrgDb = "org.At.tair.db",
  keyType = "TAIR",
  ont = "BP",
  maxGSSize = 2000
)

dotplot(ego, showCategory = 5, font.size = 14) +
  theme(text = element_text(size = 14))

Heatmap

Using the find_diffexp() function to extract the SRratio for a specified group of genes linked to metadata, and applying the ComplexHeatmap package (Gu, Eils, and Schlesner 2016), provides visual confirmation of the extracted information

library(ComplexHeatmap)
#> Loading required package: grid
#> ========================================
#> ComplexHeatmap version 2.26.0
#> Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
#> Github page: https://github.com/jokergoo/ComplexHeatmap
#> Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
#> 
#> If you use it in published research, please cite either one:
#> - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
#> - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
#>     genomic data. Bioinformatics 2016.
#> 
#> 
#> The new InteractiveComplexHeatmap package can directly export static 
#> complex heatmaps into an interactive Shiny app with zero effort. Have a try!
#> 
#> This message can be suppressed by:
#>   suppressPackageStartupMessages(library(ComplexHeatmap))
#> ========================================
library(RColorBrewer)

cor_breaks <- seq(-2, 2, length.out = 51)
cor_color <- colorRampPalette(c("blue", "white", "red"))(51)

annotation_row <- res2$result[, c("treatment", "tissue")]
pal_treatment <- brewer.pal(length(unique(annotation_row$treatment)), "Set1")
pal_tissue <- brewer.pal(length(unique(annotation_row$tissue)), "Set2")

names(pal_treatment) <- unique(annotation_row$treatment)
names(pal_tissue) <- unique(annotation_row$tissue)

ComplexHeatmap::pheatmap(
  as.matrix(res2$result[, sapply(res2$result, is.numeric)]),
  breaks = cor_breaks,
  color = cor_color,
  cluster_rows = FALSE,
  name = "SRratio",
  annotation_row = annotation_row,
  annotation_colors = list(
    treatment = pal_treatment,
    tissue = pal_tissue
    )
)

Template Matching

By applying Template Matching method (Pavlidis and Noble 2001), other genes with similar patterns can be identified based on the specific SRscore pattern (template) of a given gene. Using galactinol synthase 3 (GolS3), one of the cold stress response genes, as an example, Figure 4 shows the results obtained when extracting the top five genes with the most similar SRscore patterns from the 1,000 genes contained within the sample data SRGA.

library(genefilter)
#> 
#> Attaching package: 'genefilter'
#> The following object is masked from 'package:ComplexHeatmap':
#> 
#>     dist2
library(DT)

cl <- colnames(Filter(is.numeric, SRGA))
df <- as.matrix(column_to_rownames(SRGA, var = "ensembl_gene_id")[cl])

template <- "AT1G09350"

close_genes <- genefinder(
  df,
  ilist = template,
  numResults = 5,
  method = "euclidean"
)

datatable(
  SRGA[SRGA$ensembl_gene_id == template, ],
  options = list(dom = "lrtBip"),
  rownames = FALSE
)

datatable(
  SRGA[close_genes[[1]]$indices, ],
  options = list(dom = "lrtBip"),
  rownames = FALSE,
)

Summary

This study demonstrates the utility and applicability of the SRscore package (https://github.com/fusk-kpu/SRscore) by applying it to real datasets. This presents a rational and reproducible workflow for generating control-treatment comparisons, calculating gene expression ratios, and scoring expression patterns across multiple transcriptome datasets. The package streamlines transcriptome meta-analysis, enhances reproducibility, and enables researchers to focus on interpreting the biological significance of the results rather than expending effort on data organization and manual calculations. Furthermore, the core functions of the SRscore package are not restricted to transcriptomic data but can also be applied to other omics data, such as metabolomics, thereby making it a versatile tool for future applications.

Session info

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sequoia 15.5
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Asia/Tokyo
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] grid      stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] DT_0.34.0             genefilter_1.92.0     RColorBrewer_1.1-3   
#> [4] ComplexHeatmap_2.26.0 tibble_3.3.0          SRscore_0.1.2        
#> [7] BiocStyle_2.38.0     
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.50.0       circlize_0.4.17       shape_1.4.6.1        
#>  [4] rjson_0.2.23          xfun_0.55             bslib_0.9.0          
#>  [7] htmlwidgets_1.6.4     GlobalOptions_0.1.3   lattice_0.22-7       
#> [10] Biobase_2.70.0        crosstalk_1.2.2       vctrs_0.6.5          
#> [13] tools_4.5.2           generics_0.1.4        stats4_4.5.2         
#> [16] parallel_4.5.2        AnnotationDbi_1.72.0  RSQLite_2.4.5        
#> [19] cluster_2.1.8.1       blob_1.2.4            pkgconfig_2.0.3      
#> [22] Matrix_1.7-4          S4Vectors_0.48.0      lifecycle_1.0.4      
#> [25] compiler_4.5.2        Biostrings_2.78.0     tinytex_0.58         
#> [28] Seqinfo_1.0.0         codetools_0.2-20      clue_0.3-66          
#> [31] htmltools_0.5.9       sass_0.4.10           yaml_2.3.12          
#> [34] pillar_1.11.1         crayon_1.5.3          jquerylib_0.1.4      
#> [37] tidyr_1.3.2           cachem_1.1.0          magick_2.9.0         
#> [40] iterators_1.0.14      foreach_1.5.2         tidyselect_1.2.1     
#> [43] digest_0.6.39         dplyr_1.1.4           purrr_1.2.0          
#> [46] bookdown_0.46         splines_4.5.2         fastmap_1.2.0        
#> [49] colorspace_2.1-2      cli_3.6.5             magrittr_2.0.4       
#> [52] survival_3.8-3        XML_3.99-0.20         utf8_1.2.6           
#> [55] withr_3.0.2           bit64_4.6.0-1         XVector_0.50.0       
#> [58] rmarkdown_2.30        httr_1.4.7            matrixStats_1.5.0    
#> [61] bit_4.6.0             otel_0.2.0            png_0.1-8            
#> [64] GetoptLong_1.1.0      memoise_2.0.1         evaluate_1.0.5       
#> [67] knitr_1.51            IRanges_2.44.0        doParallel_1.0.17    
#> [70] rlang_1.1.6           Rcpp_1.1.0            xtable_1.8-4         
#> [73] glue_1.8.0            DBI_1.2.3             BiocManager_1.30.27  
#> [76] BiocGenerics_0.56.0   rstudioapi_0.17.1     annotate_1.88.0      
#> [79] jsonlite_2.0.0        R6_2.6.1              MatrixGenerics_1.22.0

References

Clough, Emily, Tanya Barrett, Stephen E Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F Kim, Maxim Tomashevsky, et al. 2023. “NCBI GEO: Archive for Gene Expression and Epigenomics Data Sets: 23-Year Update.” Nucleic Acids Research 52 (D1): D138–44. https://doi.org/10.1093/nar/gkad965.
De Toma, Cesar AND Dierssen, Ilario AND Sierra. 2021. “Meta-Analysis of Transcriptomic Data Reveals Clusters of Consistently Deregulated Gene and Disease Ontologies in down Syndrome.” PLOS Computational Biology 17 (9): 1–26. https://doi.org/10.1371/journal.pcbi.1009317.
Fischer, Martin, Patrick Grossmann, Megha Padi, and James A. DeCaprio. 2016. “Integration of TP53, DREAM, MMB-FOXM1 and RB-E2F Target Gene Analyses Identifies Cell Cycle Gene Regulatory Networks.” Nucleic Acids Research 44 (13): 6070–86. https://doi.org/10.1093/nar/gkw523.
Fischer, Martin, and Steve Hoffmann. 2022. “Synthesizing Genome Regulation Data with Vote-Counting.” Trends in Genetics 38 (12): 1208–16. https://doi.org/https://doi.org/10.1016/j.tig.2022.06.012.
Fukuda, Yusuke, Kohei Kawaguchi, and Atsushi Fukushima. 2025. “AtSRGA: A Shiny Application for Retrieving and Visualizing Stress-Responsive Genes in Arabidopsis Thaliana.” Plant Physiology 197 (4): kiaf105. https://doi.org/10.1093/plphys/kiaf105.
Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—Analysis of Affymetrix GeneChip Data at the Probe Level.” Bioinformatics 20 (3): 307–15. https://doi.org/10.1093/bioinformatics/btg405.
Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics 32 (18): 2847–49.
Irizarry, Rafael A, Bridget Hobbs, Francois Collin, Yasmin D Beazer-Barclay, Kristen J Antonellis, Uwe Scherf, and Terence P Speed. 2003. “Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data.” Biostatistics 4 (2): 249–64.
Katz, Kenneth, Oleg Shutov, Richard Lapoint, Michael Kimelman, J Rodney Brister, and Christopher O’Sullivan. 2021. “The Sequence Read Archive: A Decade More of Explosive Growth.” Nucleic Acids Research 50 (D1): D387–90. https://doi.org/10.1093/nar/gkab1053.
Nozu, Ryo, Mitsutaka Kadota, Masaru Nakamura, Shigehiro Kuraku, and Hidemasa Bono. 2024. “Meta-Analysis of Gonadal Transcriptome Provides Novel Insights into Sex Change Mechanism Across Protogynous Fishes.” Genes to Cells 29 (11): 1052–68. https://doi.org/https://doi.org/10.1111/gtc.13166.
Ono, Yoko, and Hidemasa Bono. 2021. “Multi-Omic Meta-Analysis of Transcriptomes and the Bibliome Uncovers Novel Hypoxia-Inducible Genes.” Biomedicines 9 (5). https://doi.org/10.3390/biomedicines9050582.
Pavlidis, P, and W S Noble. 2001. “Analysis of Strain and Regional Variation in Gene Expression in Mouse Brain.” Genome Biol. 2 (10): RESEARCH0042.
Sarkans, Ugis, Anja Füllgrabe, Ahmed Ali, Awais Athar, Ehsan Behrangi, Nestor Diaz, Silvie Fexova, et al. 2020. “From ArrayExpress to BioStudies.” Nucleic Acids Research 49 (D1): D1502–6. https://doi.org/10.1093/nar/gkaa1062.
Shintani, Mitsuo, Keita Tamura, and Hidemasa Bono. 2024. “Meta-Analysis of Public RNA Sequencing Data of Abscisic Acid-Related Abiotic Stresses in Arabidopsis Thaliana.” Frontiers in Plant Science Volume 15 - 2024. https://doi.org/10.3389/fpls.2024.1343787.
Tamura, Keita, and Hidemasa Bono. 2022. “Meta-Analysis of RNA Sequencing Data of Arabidopsis and Rice Under Hypoxia.” Life 12 (7). https://doi.org/10.3390/life12071079.
Tseng, George C., Debashis Ghosh, and Eleanor Feingold. 2012. “Comprehensive Literature Review and Statistical Considerations for Microarray Meta-Analysis.” Nucleic Acids Research 40 (9): 3785–99. https://doi.org/10.1093/nar/gkr1265.
Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters.” OMICS 16 (5): 284–87.

Appendix


Figure 1. Overview of the SRscore package
The SRscore package is an R package that enables the estimation of variations in gene expression across multiple datasets through meta-analysis. The SRscore calculation process is illustrated in three key steps within the workflow, each highlighted in corresponding colored boxes. (a) Metadata and gene expression data are loaded. (b) Sample combinations are determined to compare expression levels. (c) SRratio is calculated. The procedure is similar to that for fold change (FC); however, in this study, FC is calculated for all possible pairs and averaged. Batch effects are considered, and calculations are performed separately for each dataset (e.g., the GEO series). (d) SRscore is calculated. Based on the SRratio calculated for each dataset, gene expression changes are classified as follows (SRratio ≥ 1: expression increase, −1 < SRratio < 1: no change, SRratio ≤ −1: expression decrease). The number of expression decreases is subtracted from the number of expression increases to obtain the SRscore, and gene expression variation is evaluated based on its magnitude. (e) The SRratio, SRscore, etc., of genes (or groups of genes) of interest are checked while linking them to metadata.