Awesome multilingual OCR toolkits based on PaddlePaddle
Contexts Optical Compression
Accurate × Fast × Comprehensive
OCR expert VLM powered by Hunyuan's native multimodal architecture
Visual Causal Flow
Repo for the Qwen2-Audio chat and pretrained large audio language models
Qwen3-Coder is the code version of Qwen3
Official inference repo for FLUX.2 models
A Powerful Native Multimodal Model for Image Generation
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Official code for Style Aligned Image Generation via Shared Attention
Code release for ConvNeXt V2 model
Portuguese ASR model fine-tuned on XLSR-53 for 16 kHz audio input
Russian ASR model fine-tuned on Common Voice and CSS10 datasets