There are several “helper” functions which can simplify the definition of complex patterns. First we define some functions that will help us display the patterns:
one.pattern <- function(pat){
  if(is.character(pat)){
    pat
  }else{
    nc::var_args_list(pat)[["pattern"]]
  }
}
show.patterns <- function(...){
  L <- list(...)
  str(lapply(L, one.pattern))
}
nc::field for reducing repetition
The nc::field function can be used to avoid repetition when defining patterns of the form "variable: value". The example below shows three (mostly) equivalent ways to write a regex that captures the text after the colon and space; the captured text is stored in the group or output column named variable:
show.patterns(
  "variable: (?<variable>.*)", # repetitive regex string
  list("variable: ", variable=".*"), # repetitive nc R code
  nc::field("variable", ": ", ".*")) # helper function avoids repetition
#> List of 3
#> $ : chr "variable: (?<variable>.*)"
#> $ : chr "(?:variable: (.*))"
#> $ : chr "(?:variable: (?:(.*)))"
Note that the first version above has a named capture group, whereas the second and third patterns, generated by nc, have an un-named capture group and some non-capturing groups. All three nevertheless match the same text.
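To check that the list and nc::field versions capture the same text, a minimal sketch along the following lines could be used. It assumes the nc package is installed and uses nc::capture_first_vec to apply each pattern; the subject strings are invented for illustration.

```r
# Hypothetical subject strings, invented for this illustration.
subject <- c("variable: foo", "variable: bar baz")

# Pattern written as a list with a named capture group.
by.list <- nc::capture_first_vec(
  subject,
  list("variable: ", variable=".*"))

# Same pattern written with the nc::field helper.
by.field <- nc::capture_first_vec(
  subject,
  nc::field("variable", ": ", ".*"))

# Both results should have a column named "variable" with the same values.
stopifnot(identical(by.list$variable, by.field$variable))
print(by.field)
```

Although the generated regex contains an un-named group, the output column still takes its name from the R argument name.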
Another example:
show.patterns(
  "Alignment (?<Alignment>[0-9]+)",
  list("Alignment ", Alignment="[0-9]+"),
  nc::field("Alignment", " ", "[0-9]+"))
#> List of 3
#> $ : chr "Alignment (?<Alignment>[0-9]+)"
#> $ : chr "(?:Alignment ([0-9]+))"
#> $ : chr "(?:Alignment (?:([0-9]+)))"
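As a sketch of how such a field pattern might be applied to text, the code below uses nc::capture_first_vec on two invented subject strings. We assume here that a type conversion function such as as.integer can be given after the pattern, as with other named nc groups; that assumption is not confirmed by the text above.

```r
# Hypothetical subjects, invented for this illustration.
subject <- c("Alignment 107", "Alignment 5")

# The arguments after the separator define the capture group. We assume a
# conversion function (as.integer) is accepted here, as in other nc groups.
align.dt <- nc::capture_first_vec(
  subject,
  nc::field("Alignment", " ", "[0-9]+", as.integer))

# Inspect the type and values of the captured column.
str(align.dt$Alignment)
```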
Another example:
nc::quantifier for fewer parentheses
Another helper function is nc::quantifier, which makes patterns easier to read by reducing the number of parentheses required to define sub-patterns with quantifiers. For example, all three patterns below create an optional non-capturing group which contains a named capture group:
show.patterns(
  "(?:-(?<chromEnd>[0-9]+))?", # regex string
  list(list("-", chromEnd="[0-9]+"), "?"), # nc pattern using lists
  nc::quantifier("-", chromEnd="[0-9]+", "?")) # quantifier helper function
#> List of 3
#> $ : chr "(?:-(?<chromEnd>[0-9]+))?"
#> $ : chr "(?:(?:-([0-9]+))?)"
#> $ : chr "(?:(?:-([0-9]+))?)"
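To see the optional group in action, a sketch such as the following applies the quantifier pattern to two invented genomic range strings, one with an end position and one without. The chrom and chromStart groups are assumptions added for illustration; only the chromEnd part comes from the patterns above.

```r
# Hypothetical subjects: the second one has no "-end" part.
subject <- c("chr10:213054000-213055000", "chrM:111000")

range.dt <- nc::capture_first_vec(
  subject,
  chrom="chr.*?",       # assumed group: chromosome name, matched lazily
  ":",
  chromStart="[0-9]+",  # assumed group: start position
  nc::quantifier("-", chromEnd="[0-9]+", "?"))  # optional end position

print(range.dt)
```

Because the whole "-end" sub-pattern is optional, the second subject should still match; its chromEnd entry is expected to be empty or missing rather than causing an error.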
Another example with a named capture group inside an optional non-capturing group:
nc::alternatives for simplified alternation
We also provide a helper function for defining regex patterns with alternation. The following three lines are equivalent.