PyTorch extensions for fast R&D prototyping and Kaggle farming
simplejson is a simple, fast, extensible JSON encoder/decoder
Bringing BERT into modernity via both architecture changes and scaling
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning
This repository contains the official implementation of FastVLM
Unified Multimodal Understanding and Generation Models
Industrial-level controllable zero-shot text-to-speech system
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Collection of Gemma 3 variants that are trained for performance
Video-based AI memory library. Store millions of text chunks in MP4 files
End-to-end speech processing toolkit
Open-source industrial-grade ASR models
PyTorch code and models for V-JEPA self-supervised learning from video
Self-supervised visual learning using momentum contrast in PyTorch
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for V-JEPA 2 self-supervised learning from video
Visual Causal Flow
Accurate × Fast × Comprehensive
A simple but complete full-attention transformer
Towards Real-World Vision-Language Understanding
Official inference repo for FLUX.2 models
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
Fast multimodal LLM for real-time voice interaction and AI apps
Qwen2.5-VL is the multimodal large language model series
Retrieval and Retrieval-augmented LLMs