Introduction to the Elja package

Marwan EL HOMSI

library(Elja)

Environment-Wide Association Studies (EWAS) study the association between a health event and many exposures, testing the exposures one after the other. This package makes it possible to carry out an EWAS in a simple way and produces easily interpretable results.

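Before running the analysis, a few points must be defined. The Elja package then works step by step to perform an EWAS analysis: preparing the dataset, choosing the type of model according to the health event, and running the corresponding function.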

This document introduces the basic use of this package in an EWAS analysis.

Data: PimaIndiansDiabetes

To illustrate the use of the Elja package, we use the PimaIndiansDiabetes dataset, available in the mlbench package (https://mlbench.github.io/).

library(mlbench)
data(PimaIndiansDiabetes)
head(PimaIndiansDiabetes)
#>   pregnant glucose pressure triceps insulin mass pedigree age diabetes
#> 1        6     148       72      35       0 33.6    0.627  50      pos
#> 2        1      85       66      29       0 26.6    0.351  31      neg
#> 3        8     183       64       0       0 23.3    0.672  32      pos
#> 4        1      89       66      23      94 28.1    0.167  21      neg
#> 5        0     137       40      35     168 43.1    2.288  33      pos
#> 6        5     116       74       0       0 25.6    0.201  30      neg

This dataset, which contains a health event (diabetes), will allow us to illustrate how the Elja package works.

Preparation of the data set

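Before running the analysis function, we have to make sure that the dataset is well structured.

To do so, we check two elements:

- the dataset contains only the health event and the exposures;
- each variable has the correct class.
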
str(PimaIndiansDiabetes)
#> 'data.frame':    768 obs. of  9 variables:
#>  $ pregnant: num  6 1 8 1 0 5 3 10 2 8 ...
#>  $ glucose : num  148 85 183 89 137 116 78 115 197 125 ...
#>  $ pressure: num  72 66 64 66 40 74 50 0 70 96 ...
#>  $ triceps : num  35 29 0 23 35 0 32 0 45 0 ...
#>  $ insulin : num  0 0 0 94 168 0 88 0 543 0 ...
#>  $ mass    : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
#>  $ pedigree: num  0.627 0.351 0.672 0.167 2.288 ...
#>  $ age     : num  50 31 32 21 33 30 26 29 53 54 ...
#>  $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...

Apart from diabetes, our target health event, the dataset contains only exposures. In addition, every variable has the appropriate class: the exposures are numeric and diabetes is a factor.
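These two checks can also be scripted. The sketch below is not part of Elja: `check_ewas_data()` is a hypothetical helper that assumes, as in this example, that every exposure should be numeric. It reports the class of the outcome and lists any exposure that is not numeric, so it can be inspected by hand.

```r
library(mlbench)
data(PimaIndiansDiabetes)

# Hypothetical helper (not part of Elja): checks that the outcome column exists
# and lists the exposures (all other columns) that are not numeric.
check_ewas_data <- function(data, outcome) {
  stopifnot(is.data.frame(data), outcome %in% names(data))
  exposures <- setdiff(names(data), outcome)
  is_num <- vapply(data[exposures], is.numeric, logical(1))
  list(
    outcome_class = class(data[[outcome]])[1],
    exposures     = exposures,
    non_numeric   = exposures[!is_num]
  )
}

chk <- check_ewas_data(PimaIndiansDiabetes, "diabetes")
chk$outcome_class  # "factor"
chk$non_numeric    # character(0): all exposures are numeric
```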

Determine the type of model you want to use

The appropriate model depends on the class of the outcome, so the model must match the type of variable chosen as the health event.

We have seen previously that our health event is binary: diabetes (neg/pos).

str(PimaIndiansDiabetes$diabetes)
#>  Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...

We can therefore use a logistic regression model.
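This reasoning can be written as a small rule. The sketch below uses a hypothetical helper, `suggest_elja_function()`, which is not part of Elja: it maps a binary factor to ELJAlogistic and a numeric variable to ELJAlinear, and falls back to ELJAglm for anything else. The fallback is an assumption; note also that `is.numeric()` matches count variables, which may be better suited to a GLM.

```r
library(mlbench)
data(PimaIndiansDiabetes)

# Hypothetical helper (not part of Elja): suggests an Elja function
# from the class of the outcome variable.
suggest_elja_function <- function(y) {
  if (is.factor(y) && nlevels(y) == 2) {
    "ELJAlogistic"  # binary outcome: logistic regression
  } else if (is.numeric(y)) {
    "ELJAlinear"    # continuous outcome: linear regression
  } else {
    "ELJAglm"       # assumed fallback: GLM with a suitable family
  }
}

suggest_elja_function(PimaIndiansDiabetes$diabetes)  # "ELJAlogistic"
suggest_elja_function(PimaIndiansDiabetes$glucose)   # "ELJAlinear" (for illustration)
```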

Use of the ELJAlogistic function

The same approach applies to linear models with the ELJAlinear function and to generalized linear models with the ELJAglm function.

With the dataset prepared and the model chosen, we can proceed to the analysis.

The function needs the following information:

- var: the name of the health event variable, here 'diabetes';
- data: the dataset, here PimaIndiansDiabetes.

Other optional arguments control the output; the example below uses manplot, Bonferroni, FDR, nbvalmanplot and manplotsign.

# Runs the logistic EWAS; the detailed results are then available in `results`
ELJAlogistic(var = 'diabetes', data = PimaIndiansDiabetes, manplot = TRUE,
             Bonferroni = TRUE, FDR = TRUE, nbvalmanplot = 30, manplotsign = FALSE)

results
#>                      level odd_ratio    ci_low  ci_high      p_value   n      AIC
#> pregnant_pregnant pregnant  1.147008 1.0970869 1.200315 2.147445e-09 768 960.2099
#> glucose_glucose    glucose  1.038599 1.0321816 1.045439 2.378098e-31 768 812.7196
#> pressure_pressure pressure  1.007452 0.9994922 1.015902 7.299362e-02 768 994.1276
#> triceps_triceps    triceps  1.009911 1.0005344 1.019455 3.881576e-02 768 993.1890
#> insulin_insulin    insulin  1.002301 1.0010311 1.003607 4.353455e-04 768 984.8104
#> mass_mass             mass  1.098044 1.0730012 1.124942 8.449577e-15 768 924.7142
#> pedigree_pedigree pedigree  2.953073 1.8799770 4.713627 3.702926e-06 768 974.8609
#> age_age                age  1.042922 1.0296867 1.056659 1.773155e-10 768 954.7203

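The function produces a Manhattan plot summarizing the EWAS results and a data frame with the detailed results: for each exposure, the odds ratio (odd_ratio) with its confidence interval bounds (ci_low, ci_high), the p-value, the number of observations (n) and the AIC of the model.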
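To make the principle behind these outputs concrete, the sketch below reproduces a similar analysis in base R: one univariate logistic regression per exposure, Bonferroni and FDR (Benjamini-Hochberg) adjustments with `p.adjust()`, and a Manhattan-style plot. It is not Elja's implementation. We assume profile-likelihood 95% intervals from `confint()`, Wald p-values from `summary()` and a Bonferroni line at 0.05 divided by the number of exposures; Elja may compute these differently.

```r
library(mlbench)
data(PimaIndiansDiabetes)

outcome   <- "diabetes"
exposures <- setdiff(names(PimaIndiansDiabetes), outcome)

# EWAS principle: one logistic regression per exposure, tested one at a time
rows <- lapply(exposures, function(x) {
  fit <- glm(reformulate(x, response = outcome),
             family = binomial, data = PimaIndiansDiabetes)
  # 95% profile-likelihood interval for the log-odds ratio
  ci <- suppressMessages(confint(fit, parm = x))
  data.frame(
    exposure  = x,
    odd_ratio = unname(exp(coef(fit)[x])),
    ci_low    = unname(exp(ci[1])),
    ci_high   = unname(exp(ci[2])),
    p_value   = coef(summary(fit))[x, "Pr(>|z|)"],
    n         = nobs(fit),
    AIC       = AIC(fit)
  )
})
res <- do.call(rbind, rows)

# Multiple-testing corrections
res$p_bonferroni <- p.adjust(res$p_value, method = "bonferroni")
res$p_fdr        <- p.adjust(res$p_value, method = "BH")

# Manhattan-style plot: -log10(p) per exposure, dashed Bonferroni threshold
plot(seq_len(nrow(res)), -log10(res$p_value), pch = 19, xaxt = "n",
     xlab = "Exposure", ylab = "-log10(p-value)")
axis(1, at = seq_len(nrow(res)), labels = res$exposure, las = 2)
abline(h = -log10(0.05 / nrow(res)), lty = 2)
```

Fitting each exposure separately is what distinguishes an EWAS from a single multivariable model; the price is many tests, hence the corrections.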
