Phi-3.5 for Mac: Locally-run Vision and Language Models
Tiny vision language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
A series of math-specific large language models built on Qwen2
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Pretrained time-series foundation model developed by Google Research
CodeGeeX2: A More Powerful Multilingual Code Generation Model
A state-of-the-art open visual language model
Open Source Speech Language Model
Fast-stable-diffusion + DreamBooth
Hunyuan Translation Model Version 1.5
Multimodal embedding and reranking models built on Qwen3-VL
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" CVPR 2024
High-resolution models for human tasks
Ling is a MoE LLM provided and open-sourced by InclusionAI
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
General-purpose image editing model that delivers high-fidelity edits
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM