Achieving a 3x+ generation speedup on reasoning tasks
Uncover insights, surface problems, monitor, and fine-tune your LLM
Qwen3 is the large language model series developed by the Qwen team
Towards Human-Sounding Speech
Any model. Any hardware. Zero compromise
Parallax is a distributed model serving framework
Ling is an MoE LLM provided and open-sourced by InclusionAI
Performance-optimized AI inference on your GPUs
Official Python inference and LoRA trainer package
State-of-the-art Parameter-Efficient Fine-Tuning
Taming Stable Diffusion for Lip Sync
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
Accelerate local LLM inference and finetuning
Run Local LLMs on Any Device. Open-source
Minimal Python framework for building scalable AI inference servers fast
Efficient few-shot learning with Sentence Transformers
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
Powering Amazon custom machine learning chips
Personal AI, On Personal Devices
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Phi-3.5 for Mac: Locally-run Vision and Language Models
Libraries for applying sparsification recipes to neural networks
Data manipulation and transformation for audio signal processing
Multilingual Automatic Speech Recognition with word-level timestamps
Technical principles related to large models