11 PADOG

In this script, we will go through the process of running PADOG. Note that PADOG requires the transformed RNA-Seq data and the genes in the Entrez ID format.

11.1 Libraries

All necessary packages are available on Bioconductor, and should be installed from there if not already available on your machine.

install.packages("BiocManager")
BiocManager::install("PADOG")
BiocManager::install("tweeDEseqCountData")
BiocManager::install("KEGGREST")

Load libraries

Summary of the packages:

  • PADOG: Provides the implementation of the method PADOG

  • tweeDEseqCountData: In addition to the preprocessed Pickrell data set, which we load in the following step, PADOG requires the conditions of the samples

  • KEGGREST: To obtain the character value for a variety of organisms the user is required to provide in the function padog(). While the human organism is indicated by default, it might become necessary to consult the KEGGREST package if working with a different organism

11.2 Load Data

# we load the voom-transformed Pickrell data set 
load("data/Results_RNASeq_Transformation/expression_data_voomtransformed_Entrez.Rdata")

# alternatively: load the gene expression measurements that have been transformed using 
# load("data/expression_data_vsttransformed_Entrez.Rdata")

# additionally, we load the pickrell data set so that we can access the sample conditions
data(pickrell)

The sample conditions (i.e. phenotype labels) of the pickrell data set can be accessed using

pickrell.eset$gender 
#>  [1] male   male   female male   female male   female male  
#>  [9] female male   female male   female male   female male  
#> [17] female female male   female male   female female male  
#> [25] female male   female female male   female female male  
#> [33] female male   female female female female male   female
#> [41] male   male   female female male   female female male  
#> [49] female female male   female male   male   female female
#> [57] male   female male   female male   female female male  
#> [65] female female female male   female
#> Levels: female male

We proceed with the voom-transformed pickrell data set and the corresponding phenotype labels.

# gene expression measurements (transformed)
# note: you can also proceed with the vst-transformed gene expression measurements 
expression_data_transformed <- expression_data_voomtransformed_Entrez

# sample conditions
sample_conditions <- pickrell.eset$gender

11.3 PADOG

11.3.1 Step 1: Prepare Sample Conditions

First, we inspect the form of the initial (raw) sample conditions.

# look at the class: 
class(sample_conditions)
#> [1] "factor"
# -> the sample labels are already coded as factors

# the current levels are:
levels(sample_conditions)
#> [1] "female" "male"

PADOG requires a character vector with the class labels of the samples. It can only contain “c” for control samples or “d” for disease samples.

# prepare sample conditions
# we want to convert 
# (i) "female" to "c"
# (ii) "male" to "d"
sample_conditions_prep <- factor(sample_conditions, 
                                levels=c("female","male"), 
                                labels=c("c","d"))

Inspect the prepared sample conditions:

sample_conditions_prep
#>  [1] d d c d c d c d c d c d c d c d c c d c d c c d c d c c
#> [29] d c c d c d c c c c d c d d c c d c c d c c d c d d c c
#> [57] d c d c d c c d c c c d c
#> Levels: c d

11.3.2 Step 2: Run PADOG

It is recommended to set a seed to ensure exact reproducibility of the results if the code is run at multiple time points

You can specify any integer number as the seed. It is VERY IMPORTANT to choose the seed arbitrarily and WITHOUT INSPECTING the results. The seed should NEVER be specified based on which value yields the most preferable results.

# run PADOG: 
PADOG_results <- padog(esetm = as.matrix(expression_data_transformed), 
                        group = sample_conditions_prep, 
                        dseed = 1)

Function arguments:

  • esetm: matrix that contains the expression measurements

    • note: since the expression data is initially stored in a data frame, we transform it to a matrix when running PADOG
  • group: sample conditions (has values “c” and “d”)

  • dseed : seed for random number generation (used in the process of phenotype permutation)

additional arguments (we do not use here):

  • paired: indicates whether the samples in both groups are paired

  • block: if the samples are paired (i.e. argument paired = TRUE), then the paired samples must have the same block value

  • gslist: gives instructions on how to cluster the genes into gene sets

    • ‘gslist = “KEGGRESTpathway”’ (default): gene sets correspond to KEGG pathways

    • alternative: provide a user-defined list of gene sets

  • organism: organism from which the gene expression measurements are taken

    • for human, set organism = “hsa”

    • the required character value for other organisms can be extracted from the KEGGREST package:

    • obtain required organisms from column organism

  • annotation: required if gslist is set to “KEGGRESTpathway” and the rownames of esetm are probe IDs

    • can be set to NULL of gslist is set to “KEGGRESTpathway” and the rownames of esetm are in the Entrez gene ID format

    • if rownames are other gene IDs, then sett annotation = NULL and make sure that the rownames are elements of gslist (and unique!)

  • gs.names: contains names of gene sets -> character vector

    • must have the same length as gslist
  • NI: number of phenotype permutations employed in the assessment of the significance of a given gene set

We want to take a first look at the results table. Note, however, that that we still have to add the adjusted p-values (see next step), based on which differential enrichment is eventually assessed.

head(PADOG_results , n = 10)
#>       Name    ID Size meanAbsT0   padog0 PmeanAbsT Ppadog
#> 04380 <NA> 04380   66  2.622104 3.832812     0.005  0.002
#> 04810 <NA> 04810  111  2.326027 5.476170     0.004  0.004
#> 03010 <NA> 03010   89  5.378671 9.659220     0.014  0.010
#> 05213 <NA> 05213   23  1.528150 1.461810     0.027  0.010
#> 05171 <NA> 05171  118  4.788746 8.020661     0.013  0.013
#> 05417 <NA> 05417  112  2.094534 3.096663     0.015  0.017
#> 05162 <NA> 05162   73  2.089532 3.042040     0.010  0.020
#> 00760 <NA> 00760   17  1.871113 3.583774     0.026  0.023
#> 04064 <NA> 04064   59  1.931306 3.552921     0.019  0.025
#> 04010 <NA> 04010  169  1.987648 4.770146     0.018  0.028

11.3.3 Step 3: Adjust for Multiple Testing

PADOG does not perform multiple testing adjustment internally so that must be performed by the user. We here work with a function from the R package base which uses the method by Benjamini and Hochberg by default.

# add adjusted p-value in column Ppadog_adjusted
PADOG_results$Ppadog_adjusted <- p.adjust(PADOG_results$Ppadog)

11.3.4 Step 4: Interpretation of the results

We want to inspect (a part of) the results and interpret the columns provided in the results table.

head(PADOG_results, n = 10)
#>       Name    ID Size meanAbsT0   padog0 PmeanAbsT Ppadog
#> 04380 <NA> 04380   66  2.622104 3.832812     0.005  0.002
#> 04810 <NA> 04810  111  2.326027 5.476170     0.004  0.004
#> 03010 <NA> 03010   89  5.378671 9.659220     0.014  0.010
#> 05213 <NA> 05213   23  1.528150 1.461810     0.027  0.010
#> 05171 <NA> 05171  118  4.788746 8.020661     0.013  0.013
#> 05417 <NA> 05417  112  2.094534 3.096663     0.015  0.017
#> 05162 <NA> 05162   73  2.089532 3.042040     0.010  0.020
#> 00760 <NA> 00760   17  1.871113 3.583774     0.026  0.023
#> 04064 <NA> 04064   59  1.931306 3.552921     0.019  0.025
#> 04010 <NA> 04010  169  1.987648 4.770146     0.018  0.028
#>       Ppadog_adjusted
#> 04380           0.696
#> 04810           1.000
#> 03010           1.000
#> 05213           1.000
#> 05171           1.000
#> 05417           1.000
#> 05162           1.000
#> 00760           1.000
#> 04064           1.000
#> 04010           1.000

Differential enrichment of a given gene set can now be assessed based on the adjusted p-value in column Ppadog_adjusted. For instance: detect all gene sets with Ppadog_adjusted < 0.05 as differentially enriched.

Additional columns:

  • Name: Name of gene set

  • ID: Identifier of gene set

  • Size: number of genes in gene set

  • meanAbsT0: Mean of absolute (moderated) t-statistic of all genes that are a member of the given gene set

  • padog0: Mean of abolute weighted moderate t-statistic of all genes that are a member of the given gene set

  • PmeanAbsT: significance of of meanAbsT0

  • Ppadog: significance of padog0