rtiktoken: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models

A thin wrapper around the tiktoken-rs crate that encodes text into Byte-Pair-Encoding (BPE) tokens and decodes tokens back to text. This is useful for understanding how Large Language Models (LLMs) perceive text.
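
As a rough illustration of the encode/decode round trip described above, here is a toy byte-level BPE in Python. It is a sketch only: it is not the rtiktoken or tiktoken-rs API, it does not use any OpenAI vocabulary, and the names train_merges, merge, encode and decode are hypothetical helpers made up for this example. Real tokenizers ship a fixed, pre-trained merge table instead of learning one from a short string.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_merges(corpus: bytes, num_merges: int):
    """Learn up to `num_merges` merge rules from a byte string (toy training)."""
    ids = list(corpus)  # start from raw bytes, i.e. token ids 0..255
    merges = {}         # (left_id, right_id) -> new_id, in learned order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:   # nothing repeats any more, so stop early
            break
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text: str, merges):
    """Text -> token ids: UTF-8 bytes, then apply merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Token ids -> text: expand each id back to its bytes, then decode UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


# Toy usage: learn a few merges, then round-trip a string.
merges = train_merges(b"low lower lowest low", 5)
tokens = encode("lowest low", merges)
restored = decode(tokens, merges)
```

The point of the sketch is that frequently repeated byte pairs collapse into single tokens, so the token sequence is shorter than the raw byte sequence while decoding recovers the original text exactly.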

Version: 0.0.6
Suggests: testthat (≥ 3.0.0)
Published: 2024-11-06
DOI: 10.32614/CRAN.package.rtiktoken
Author: David Zimmermann-Kollenda [aut, cre], Roger Zurawicki [aut] (tiktoken-rs Rust library), Authors of the dependent Rust crates [aut] (see AUTHORS file)
Maintainer: David Zimmermann-Kollenda <david_j_zimmermann at hotmail.com>
BugReports: https://github.com/DavZim/rtiktoken/issues
License: MIT + file LICENSE
URL: https://davzim.github.io/rtiktoken/, https://github.com/DavZim/rtiktoken/
NeedsCompilation: yes
SystemRequirements: Cargo (Rust's package manager), rustc >= 1.65.0
Materials: README NEWS
CRAN checks: rtiktoken results

Documentation:

Reference manual: rtiktoken.pdf

Downloads:

Package source: rtiktoken_0.0.6.tar.gz
Windows binaries: r-devel: rtiktoken_0.0.6.zip, r-release: rtiktoken_0.0.6.zip, r-oldrel: rtiktoken_0.0.6.zip
macOS binaries: r-release (arm64): rtiktoken_0.0.6.tgz, r-oldrel (arm64): rtiktoken_0.0.6.tgz, r-release (x86_64): rtiktoken_0.0.6.tgz, r-oldrel (x86_64): rtiktoken_0.0.6.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=rtiktoken to link to this page.