Minimal, clean code for the Byte Pair Encoding (BPE) algorithm
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Minimal reproduction of OneRec
800,000 step-level correctness labels on LLM solutions to MATH problems
Deep Clustering for Unsupervised Learning of Visual Features
A Framework for Comparing Password Guessing Strategies
Automatic SQL Injection Exploitation Tool