Low-latency REST API for serving text embeddings
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
State-of-the-art Parameter-Efficient Fine-Tuning
Powering Amazon custom machine learning chips
A graphical manager for Ollama that can manage your LLMs
Run 100B+ language models at home, BitTorrent-style
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code