sentencepiece by davila7

Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.

Coding
15.7K Stars
1.4K Forks
Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill provides specialized capabilities for davila7's codebase.

Use Cases

  • Developing new features in the davila7 repository
  • Refactoring existing code to follow davila7 standards
  • Understanding and working with davila7's codebase structure

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Skill Version
main
Community
15.7K 1.4K
Updated At Jan 12, 2026, 05:31 AM

Skill Stats

SKILL.md 236 Lines
Total Files 1
Total Size 0 B
License MIT