tiktoken is a fast BPE tokeniser for use with OpenAI's models
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm
Zero-copy PDF text extraction library written in Zig
Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML
Allows manipulating MIME messages
Unifying 3D Mesh Generation with Language Models
The home of the ICU project source code
Large-language-model & vision-language-model based on Linear Attention
Unified Multimodal Understanding and Generation Models
Create & scan cute QR codes easily
SOTA discrete acoustic codec models with 40/75 tokens per second
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Chinese Llama-3 LLMs developed from Meta Llama 3
Source code to formatted text converter
Binary / hex editor and component written in Java
Another drawing editor for LaTeX with PSTricks & TikZ
Code for the paper Language Models are Unsupervised Multitask Learners
Production space for the TEI Linguistics SIG
Change encoding of text files
Convert Scala source code to Kotlin source code
Framework that is dedicated to making neural data processing
Converts PMWiki to Markdown