Version: 0.1.1
What | kibior is an R package that eases data handling in science, particularly for biological data. |
Where | kibior uses Elasticsearch as its database and search engine. |
Who | kibior is built for data science and data manipulation: any data-related task, notably sharing data. It mainly targets bioinformaticians and, more broadly, data scientists. |
When | Available now from this repository or from CRAN. |
Public instances | Use the $get_kibio_instance() method to connect to Kibio and access known datasets. See Kibio datasets at the end of this document for a complete list. |
Cite this package | In an R session, run citation("kibior"). |
Publication | Coming soon. |
This package allows:
- Pushing, pulling, joining, sharing and searching tabular data between an R session and one or multiple Elasticsearch instances/clusters.
- Massive data query and filter with the Elasticsearch engine.
- Multiple live Elasticsearch connections to different addresses.
- Method autocompletion in suitable environments (e.g. R CLI, RStudio).
- Import and export of datasets from and to files.
- Server-side execution for most operations (i.e. on Elasticsearch instances/clusters).

# Get from CRAN
install.packages("kibior")
# or get the latest from GitHub
devtools::install_github("regisoc/kibior")
# load
library(kibior)
# Get a specific instance
kc <- Kibior$new("server_or_address", port)
# Or try something bigger...
kibio <- Kibior$get_kibio_instance()
kibio$list()
Here is an extract of some of the features offered by KibioR. See the Introduction vignette for more advanced usage.
push datasets
# Push data (R memory -> Elasticsearch)
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")
pull datasets
# Pull data with columns selection (Elasticsearch -> R memory)
kc$pull("sw", query = "homeworld:(naboo || tatooine)",
        columns = c("name", "homeworld", "height", "mass", "species"))
# see vignette for query syntax
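To make the intent of the query above easier to read, here is a rough in-memory counterpart written with dplyr only. It does not call kibior or Elasticsearch. It assumes the query matches homeworld values case-insensitively, which depends on how the Elasticsearch index analyzes text, so treat it as a sketch for sanity-checking pulled results rather than a description of kibior's behavior. The name `local_subset` is illustrative.

```r
library(dplyr)

# In-memory approximation of: homeworld:(naboo || tatooine)
# Assumption: the Elasticsearch match is case-insensitive.
local_subset <- dplyr::starwars %>%
  filter(tolower(homeworld) %in% c("naboo", "tatooine")) %>%
  # Same column selection as the `columns` argument of the pull call
  select(name, homeworld, height, mass, species)

print(local_subset)
```

Comparing the rows of `local_subset` with the pulled data is a quick way to check that a query does what you expect.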
copy datasets
# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")
delete datasets
# Delete datasets
kc$delete("sw_copy")
list, match dataset names
# List available datasets
kc$list()
# Search for index names starting with "s"
kc$match("s*")
columns names and list unique keys in values
# Get columns of all datasets starting with "s"
kc$columns("s*")
# Get unique values of a column
kc$keys("sw", "homeworld")
# Count number of lines in dataset
kc$count("st")
# Count number of lines with query (name of the storm is Anita)
kc$count("st", query = "name:anita")
# Generic stats on two columns
kc$stats("sw", c("height", "mass"))
# Specific descriptive stats with query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
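As a cross-check for the counts and averages above, the same quantities can be computed locally on the original data frames. This sketch uses dplyr and base R only. It assumes the server-side query matches case-insensitively and that missing values are ignored when averaging; neither is confirmed here for kibior, so small differences from the server-side numbers are possible. The variable names are illustrative.

```r
library(dplyr)

# Local counterpart of the total line count for dplyr::storms
n_storms <- nrow(dplyr::storms)

# Local counterpart of the count with query "name:anita"
# Assumption: the Elasticsearch match is case-insensitive.
n_anita <- sum(tolower(dplyr::storms$name) == "anita")

# Local counterpart of the average of height and mass for "homeworld:naboo"
# Assumption: missing values are skipped when averaging.
naboo <- dplyr::starwars %>% filter(tolower(homeworld) == "naboo")
naboo_means <- colMeans(naboo[, c("height", "mass")], na.rm = TRUE)

print(n_storms)
print(n_anita)
print(naboo_means)
```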
join
# Inner join between:
#   1/ an Elasticsearch-based dataset with query ("sw"),
#   2/ and an in-memory R dataset (dplyr::starwars)
kc$inner_join("sw", dplyr::starwars,
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
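For readers more familiar with dplyr, the join above can be approximated entirely in memory: filter and trim the left table first, then join on the key. This is a sketch of the semantics, not of kibior's implementation. It assumes `left_query` filters the left dataset and `left_columns` restricts its columns before the join, as the argument names suggest. How kibior names columns present on both sides is not covered here; dplyr appends `.x` and `.y` suffixes.

```r
library(dplyr)

# Left side: apply the query filter and the column selection first
left <- dplyr::starwars %>%
  filter(hair_color == "black") %>%
  select(name, mass, height)

# Inner join on the shared key; right side is the full in-memory dataset
joined <- inner_join(left, dplyr::starwars, by = "name")

# Overlapping columns (mass, height) get dplyr's default .x/.y suffixes
print(names(joined))
```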