Official repository for LTX-Video
Python bindings for llama.cpp
Recovering the Visual Space from Any Views
Video Object and Interaction Deletion
LTX-Video Support for ComfyUI
Contexts Optical Compression
Open-source multi-speaker long-form text-to-speech model
Visual Causal Flow
The official repo of the Qwen chat & pretrained large language models
Ultra-Efficient LLMs on End Device
Audio foundation model excelling in audio understanding
Sharp Monocular Metric Depth in Less Than a Second
Diffusion Transformer with Fine-Grained Chinese Understanding
Open Source Speech Language Model
Video understanding codebase from FAIR for reproducing video models
Official implementation of DreamCraft3D
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
OCR expert VLM powered by Hunyuan's native multimodal architecture
Large language model & vision-language model based on linear attention
AI-powered tool to quickly remove watermarks from images flawlessly
AI Suite for upscaling, interpolating & restoring images/videos
Qwen2.5-Coder is the code version of the Qwen2.5 large language model