Designed for text embedding and ranking tasks
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Network analysis in Python
Code for the paper "Evaluating Large Language Models Trained on Code"
The best free open source website change detection and restock service
Multi-lingual large voice generation model, providing inference
Python module for parsing semi-structured text into Python tables
Open Source Document Management System for Digital Archives
Instant voice cloning by MIT and MyShell. Audio foundation model
A Python library that makes AMR parsing, generation and visualization
AutoGluon: AutoML for Image, Text, and Tabular Data
Open source NLP guide with models, methods, and real use cases
Long-form streaming TTS system for multi-speaker dialogue generation
Code and models for ICML 2024 paper, NExT-GPT
Main repository for the Sphinx documentation builder
The official repo of Qwen chat & pretrained large language model
Multilingual sentence & image embeddings with BERT
"Big Model" trains a visual multimodal model (VLM) with 26M parameters
A system for agentic LLM-powered data processing and ETL
An opinionated CLI to transcribe audio files with Whisper on-device
Diffusion Transformer with Fine-Grained Chinese Understanding
An open source implementation of CLIP
OCR expert VLM powered by Hunyuan's native multimodal architecture
A lightweight approach to removing Google web service dependency
21 Lessons, Get Started Building with Generative AI