Using xptcleaner

2022-08-16

Description of Vignette

Describes how to use the python package xptcleaner to apply JSON ontology terms to clean SEND xpt files.

Getting started

Before we are ready to use the functions in the package, we must ensure that a minimum of prerequisites are fulfilled.

xptcleaner python package and sendigR R package installation

Python and R installation

R version 4.1.2 and above, Python 3.9.6 and above were the packages used to develop and test the code. Other versions can be used, but some issues may arise depending on versions.

xptcleaner python package installation

Using pip

Probably the easiest way: from your conda, virtualenv or just base installation do:

pip install xptcleaner

If you are running on a machine without admin rights, and you want to install against your base installation you can do:

pip install xptcleaner --user

Using source archive or using wheel file.

In addtional to install from Python Package Index(PyPI), the source archive and the wheel archive can also be used for installation.

The source archive and the wheel for xptcleaner can be obtained from sendigR Github sendigR- xptcleaner

  • Using source archive: Using the below shell command to install the xptcleaner package, assume that the source archive is under ‘dist’ sub folder. Replace {version} with the correct version number, e.g. 1.0.0.
$ py -m pip install ./dist/xptcleaner-{version}.tar.gz
  • Using wheel: Using the below shell command to install the xptcleaner package, assume that the wheel file is under ‘dist’ sub folder.
$ py -m pip install ./dist/xptcleaner-{version}-py3-none-any.whl

The following required python packages will be installed during the xptcleaner package installation:

* pandas

* pyreadstat

sendigR R package installation

Install sendigR packages, refer to README for more details.

# Get CRAN version
install.packages("sendigR")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github('phuse-org/sendigR')

Locating the scripts

sendigR is located at: https://github.com/phuse-org/sendigR/
The importStudies.R script is located at: https://github.com/phuse-org/sendigR/blob/main/importStudies.R
The Python code to generate a JSON file for the XPT cleanup is located at: https://github.com/phuse-org/sendigR/tree/main/python/xptcleaner

The sample CDISC CT file and the extensible CT file are located under the ‘data-raw’ subfolder of the sendigR package.

Creating the JSON for Vocabulary Mapping

library(reticulate)
library(sendigR)
#input CDISC and Extensible CT files.
infile1 <- "{path to CT file}/SEND_Terminology_EXTENSIBLE.txt"
infile2 <- "{path to CT file}/SEND Terminology_2021_12_17.txt"
#output JSON file
jsonfile <- "{path to CT file to be created}/SENDct.json"
#Call the gen_vocab function with the input and output files
sendigR::gen_vocab(list(infile1, infile2),jsonfile )

Standardize xpt files with the json file created

library(reticulate)
library(sendigR)

#JSON file used for the xpt cleaning
jsonfile <- "{path to CT file to be created}/SENDct.json"

#folder containing the source xpt files
rawXptFolder <- "{path to xpt files}/96298/"
#folder containing the cleaned target xpt files
cleanXptFolder <- "{path to cleaned xpt files}/96298/"
#Call the standardize_file function to clean the xpt file
sendigR::standardize_file(rawXptFolder, cleanXptFolder, jsonfile )