MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
Qwen2.5-VL is the multimodal large language model series
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Let's make video diffusion practical
Industrial-level controllable zero-shot text-to-speech system
Fast, Sharp & Reliable Agentic Intelligence
Provides convenient access to the Anthropic REST API from any Python 3
DeepSeek Coder: Let the Code Write Itself
A Systematic Framework for Interactive World Modeling
RGBD video generation model conditioned on camera input
Easy Docker setup for Stable Diffusion with user-friendly UI
A multimodal model for brain response prediction
Revolutionizing Database Interactions with Private LLM Technology
Strong, Economical, and Efficient Mixture-of-Experts Language Model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Visual Causal Flow
Recovering the Visual Space from Any Views
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Advancing Open-source World Models
AlphaFold 3 inference pipeline
ChatGPT interface with better UI
Claude Code action for GitHub PRs
Flux 2 image generation model pure C inference
Official repository for LTX-Video