Unsupervised Word Segmentation for Neural Machine Translation and Text Generation (Python, updated Aug 7, 2024)
Unsupervised text tokenizer focused on computational efficiency
Fast and customizable text tokenization library with BPE and SentencePiece support
Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.
Explains NLP building blocks in a simple manner.
Fast bare-bones BPE for modern tokenizer training
Build LLM from scratch
Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
High-performance .NET BPE tokenizer — up to 618 MiB/s, competitive with Rust. Zero-allocation counting, multilingual cache, o200k/cl100k/r50k/p50k encodings + HuggingFace tokenizer.json support.
Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
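Most of the repositories above center on byte-pair encoding (BPE). As a rough illustration of the idea (a minimal sketch, not taken from any of the listed libraries), BPE training repeatedly counts adjacent symbol pairs across the corpus and merges the most frequent one into a new token:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merges from a word list (toy example, word-frequency based)."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_train(["low", "low", "lower", "newest", "newest", "widest"], 5)
print(merges)
```

Production tokenizers add byte-level fallback, regex pre-tokenization, and far faster pair counting, but the merge loop above is the core of the algorithm the listed libraries implement.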