A high-performance ML model serving framework that offers dynamic batching
Implement a CPU from scratch and play with large model deployments
ChatGLM3 series: Open Bilingual Chat LLMs
ChatGLM2-6B: An Open Bilingual Chat LLM
Chinese Llama-3 LLMs developed from Meta Llama 3
Low-latency REST API for serving text-embeddings
Gemma open-weight LLM library, from Google DeepMind
Tensor search for humans
Tools for merging pretrained large language models
Run LLMs locally on Cloud Workstations
Run Mixtral-8x7B models in Colab or on consumer desktops
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Chinese LLaMA & Alpaca large language model + local CPU/GPU training