re is an R package designed to simplify working with regular expressions by providing a set of functions modeled on Python’s re module. The package includes utilities for compiling regular expressions with specific flags, checking for matches, escaping special characters, extracting all matches, splitting strings, and substituting text. By emulating the functionality and naming conventions of Python’s re module, re aims to make regex operations in R more intuitive and accessible, especially for those already familiar with Python.
You can install the re package directly from GitHub.
# Install devtools if you haven't already
install.packages("devtools")
# Install re package from GitHub
devtools::install_github("pythonicr/re")
Here are some examples demonstrating how to use the functions provided by the re package.
The re_compile function compiles a regular expression pattern with the specified flags. This step is optional, since patterns and flags can be passed directly to any of the functions.
library(re)
# Compile a pattern with case-insensitive matching
pattern <- re_compile("^abc", IGNORECASE = TRUE)
# Compile a pattern with multi-line matching (flag abbreviations such as M follow Python's re module)
pattern <- re_compile("end$", M = TRUE)
pattern <- re_compile("end$", MULTILINE = TRUE)
# Compile a pattern with the DOTALL flag
pattern <- re_compile("a.b", DOTALL = TRUE)
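To make the three flags concrete, the sketch below expresses them as PCRE inline modifiers in base R. The helper with_flags() is hypothetical and is not taken from re; it only illustrates what IGNORECASE, MULTILINE, and DOTALL change, assuming PCRE matching via perl = TRUE.

```r
# Hypothetical helper, not part of re: prepend PCRE inline modifiers
# ((?i) case-insensitive, (?m) multi-line, (?s) dot-all) to a pattern.
with_flags <- function(pattern, IGNORECASE = FALSE, MULTILINE = FALSE,
                       DOTALL = FALSE) {
  mods <- c(i = IGNORECASE, m = MULTILINE, s = DOTALL)
  if (!any(mods)) return(pattern)
  paste0("(?", paste(names(mods)[mods], collapse = ""), ")", pattern)
}

p_ci  <- with_flags("^abc", IGNORECASE = TRUE)  # "(?i)^abc"
p_ml  <- with_flags("end$", MULTILINE = TRUE)   # "(?m)end$"
p_dot <- with_flags("a.b", DOTALL = TRUE)       # "(?s)a.b"

grepl(p_ci, "Abcdef", perl = TRUE)              # TRUE
grepl("end$", "the end\nmore", perl = TRUE)     # FALSE without the flag
grepl(p_ml, "the end\nmore", perl = TRUE)       # TRUE: $ also matches before "\n"
grepl("a.b", "a\nb", perl = TRUE)               # FALSE: . skips newlines by default
grepl(p_dot, "a\nb", perl = TRUE)               # TRUE
```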
The re_contains function checks whether a specified pattern is found within each element of a character vector.
# Check if strings contain a pattern
# pattern still holds the last value compiled above: "a.b" with DOTALL
re_contains(pattern, "Abcdef")
#> [1] FALSE
re_contains("xyz$", "hello world xyz")
#> [1] TRUE
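For comparison, base R's grepl() gives the same kind of one-logical-per-element result. This is a sketch of equivalent behavior, assuming re_contains() is vectorized over its input as described. It is not the package's source.

```r
# grepl() returns one TRUE/FALSE per element of the character vector
x <- c("hello world xyz", "xyz at the start", "no match here")
hits <- grepl("xyz$", x, perl = TRUE)
hits  # TRUE FALSE FALSE

# ignore.case = TRUE plays the role of the IGNORECASE flag
grepl("^abc", "Abcdef", ignore.case = TRUE, perl = TRUE)  # TRUE
```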
The re_escape function escapes all special characters in a regular expression string.
# Escape special characters in a string
escaped_pattern <- re_escape("a[bc].*d?")
print(escaped_pattern)
#> [1] "a\\[bc\\]\\.\\*d\\?"
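Here is a rough base-R sketch of escaping. The hypothetical escape_regex() below backslash-escapes a fixed set of metacharacters with gsub(). That set is an assumption chosen to reproduce the output above, not necessarily the exact set re_escape() handles. When you only need a literal match, base R's fixed = TRUE avoids escaping altogether.

```r
# Hypothetical helper, not re's implementation: put a backslash in front
# of each regex metacharacter so the text is matched literally.
escape_regex <- function(s) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", s, perl = TRUE)
}

escaped <- escape_regex("a[bc].*d?")
print(escaped)                                  # same string as re_escape() above
grepl(escaped, "xa[bc].*d?y", perl = TRUE)      # TRUE: matched literally
grepl("a[bc].*d?", "xa[bc].*d?y", fixed = TRUE) # TRUE: base R literal matching
```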
The re_findall function extracts all occurrences of a specified pattern from each element of a character vector.
# Extract all words from a string
pattern <- re_compile("\\b\\w+\\b")
re_findall(pattern, "This is a test.")
#> [[1]]
#> [1] "This" "is" "a" "test"
re_findall("\\d+", "123 abc 456")
#> [[1]]
#> [1] "123" "456"
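In base R, the closest equivalent pairs gregexpr(), which locates every non-overlapping match, with regmatches(), which extracts the matched text. The sketch is illustrative only. How re_findall() itself represents an element with no matches isn't shown above, whereas this approach yields character(0).

```r
x <- c("123 abc 456", "no digits", "7 8 9")
# gregexpr() finds every match; regmatches() returns one vector per element
all_hits <- regmatches(x, gregexpr("\\d+", x, perl = TRUE))
all_hits[[1]]  # "123" "456"
all_hits[[2]]  # character(0): no match
all_hits[[3]]  # "7" "8" "9"

s <- "This is a test."
words <- regmatches(s, gregexpr("\\b\\w+\\b", s, perl = TRUE))[[1]]
words          # "This" "is" "a" "test"
```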
The re_fullmatch function checks whether each element of a character vector matches a specified pattern in its entirety, returning the matched text, or NA when the element does not match completely.
# Check for full matches in a string
pattern <- re_compile("\\d{3}-\\d{2}-\\d{4}")
re_fullmatch(pattern, "123-45-6789")
#> [[1]]
#> [1] "123-45-6789"
re_fullmatch("123-45-6789", "123-45-6789 and more")
#> [[1]]
#> [1] NA
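One way to emulate a full match in base R is to anchor the pattern at both ends. The helper full_match() is hypothetical. It uses \A and \z rather than ^ and $ because PCRE's $ also matches before a trailing newline, and it wraps the pattern in (?:...) so an alternation such as "a|b" stays inside the anchors. For brevity it returns a character vector rather than the list shown above.

```r
# Hypothetical helper, not part of re
full_match <- function(pattern, x) {
  # \A and \z anchor the absolute start and end of each string
  anchored <- paste0("\\A(?:", pattern, ")\\z")
  ifelse(grepl(anchored, x, perl = TRUE), x, NA_character_)
}

ssn <- "\\d{3}-\\d{2}-\\d{4}"
full_match(ssn, c("123-45-6789", "123-45-6789 and more"))  # "123-45-6789" NA
full_match(ssn, "123-45-6789\n")  # NA: a trailing newline is not accepted
```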
The re_match function checks whether each element of a character vector matches a specified pattern at its start, returning the matched text, or NA when it does not.
# Check for matches at the start of a string
pattern <- re_compile("\\d{3}")
re_match(pattern, "123abc")
#> [[1]]
#> [1] "123"
re_match("abc", "xyzabc")
#> [[1]]
#> [1] NA
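A comparable base-R sketch for matching at the start anchors only the beginning of the string. start_match() is a hypothetical helper. Because regmatches() drops elements without a match, it pads those positions with NA to keep one result per input element.

```r
# Hypothetical helper, not part of re
start_match <- function(pattern, x) {
  # \A anchors at the very start; (?:...) keeps an alternation inside it
  m <- regexpr(paste0("\\A(?:", pattern, ")"), x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # regmatches() returns matches only
  out
}

start_match("\\d{3}", c("123abc", "abc123"))  # "123" NA
start_match("abc", "xyzabc")                  # NA
```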
The re_search function searches for a specified pattern anywhere within each element of a character vector, not only at the start as re_match does.
# Search for a pattern in a string
pattern <- re_compile("\\d+")
re_search(pattern, "abc 123 xyz")
#> [[1]]
#> [1] "123"
re_search("\\bword\\b", "A sentence with the word.")
#> [[1]]
#> [1] "word"
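Python's re.search() reports only the first match, and base R draws the same line between regexpr() (first match per element) and gregexpr() (all matches). The short illustration below assumes re_search() follows Python here; the examples above only contain a single match each, so they do not settle it.

```r
x <- "abc 123 xyz 456"
first_hit <- regmatches(x, regexpr("\\d+", x, perl = TRUE))   # first match only
all_hits  <- regmatches(x, gregexpr("\\d+", x, perl = TRUE))  # every match
first_hit      # "123"
all_hits[[1]]  # "123" "456"

# regexpr() marks a non-match with -1, and regmatches() then drops that element
no_hit <- regmatches("no digits", regexpr("\\d+", "no digits", perl = TRUE))
no_hit         # character(0)
```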
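The re_split function splits each element of a character vector into substrings based on a specified pattern. The optional maxsplit argument caps the number of splits, leaving the remainder of the string intact.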
# Split strings based on a pattern
pattern <- re_compile("\\s+")
re_split(pattern, "Split this string")
#> [[1]]
#> [1] "Split" "this" "string"
re_split("\\W+", "Split,with!punctuation.morestuff", maxsplit = 2)
#> [[1]]
#> [1] "Split" "with" "punctuation.morestuff"
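Base R's strsplit() has no maxsplit argument, so the hypothetical split_max() below emulates it by peeling off at most maxsplit matches with regexpr() and keeping the remainder whole. It assumes the pattern cannot match the empty string, and it is a sketch rather than how re_split() is implemented.

```r
# Hypothetical helper, not part of re; splits a single string
split_max <- function(pattern, s, maxsplit = 0) {
  # maxsplit = 0 means "no limit", as in Python
  if (maxsplit <= 0) return(strsplit(s, pattern, perl = TRUE)[[1]])
  pieces <- character(0)
  rest <- s
  for (i in seq_len(maxsplit)) {
    m <- regexpr(pattern, rest, perl = TRUE)
    if (m == -1) break                                  # no more separators
    pieces <- c(pieces, substr(rest, 1, m - 1))         # text before the match
    rest <- substr(rest, m + attr(m, "match.length"), nchar(rest))
  }
  c(pieces, rest)                                       # remainder kept whole
}

split_max("\\W+", "Split,with!punctuation.morestuff", maxsplit = 2)
# "Split" "with" "punctuation.morestuff"
split_max("\\s+", "Split this string")
# "Split" "this" "string"
```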
The re_sub function replaces all occurrences of a specified pattern in each element of a character vector with a replacement string.
# Substitute patterns in a string
pattern <- re_compile("\\d+")
re_sub(pattern, "number", "Replace 123 with text.")
#> [1] "Replace number with text."
re_sub("\\s+", "-", "Split and join")
#> [1] "Split-and-join"
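The base-R counterparts are gsub(), which replaces every match, and sub(), which replaces only the first. The snippet also shows a capture-group backreference; note that R writes \1 as "\\1" inside a string literal. Whether re_sub() accepts the same replacement syntax is an assumption not covered by the examples above.

```r
# gsub() replaces every match; sub() replaces only the first
all_replaced <- gsub("\\d+", "number", "Replace 123 with text.", perl = TRUE)
all_replaced   # "Replace number with text."
first_only <- sub("\\s+", "-", "Split and join", perl = TRUE)
first_only     # "Split-and join"
every_gap <- gsub("\\s+", "-", "Split and join", perl = TRUE)
every_gap      # "Split-and-join"

# "\\2 at \\1" swaps the two capture groups
swapped <- gsub("(\\w+)@(\\w+)", "\\2 at \\1", "user@example", perl = TRUE)
swapped        # "example at user"
```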
We welcome contributions to the re package. If you have suggestions, bug reports, or want to contribute code, please open an issue or submit a pull request on our GitHub repository.
re is released under the MIT License. See the LICENSE file in the package’s repository for more details.