Bringing BERT into modernity via both architecture changes and scaling
Library for streaming over RTMP and RTSP on Android. All code in Java
Unified Multimodal Understanding and Generation Models
This repository contains the official implementation of FastVLM
Industrial-level controllable zero-shot text-to-speech system
Collection of Gemma 3 variants that are trained for performance
An incredibly fast, pure Elixir JSON library
End-to-end speech processing toolkit
PyTorch code and models for V-JEPA self-supervised learning from video
PyTorch code and models for V-JEPA 2 self-supervised learning from video
Visual Causal Flow
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
Official inference repo for FLUX.2 models
Accurate × Fast × Comprehensive
Audio codecs extracted from Android Open Source Project
Towards Real-World Vision-Language Understanding
Fast multimodal LLM for real-time voice interaction and AI apps
Retrieval and retrieval-augmented LLMs
Provides code for running inference with the Segment Anything Model
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
go-querystring is a Go library for encoding structs into URL query parameters
Encoder of greater-than-word-length text trained on a variety of data
Java application for encryption
Multimodal model achieving SOTA performance
TorchMultimodal is a PyTorch library