OCR expert VLM powered by Hunyuan's native multimodal architecture
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
The ChatGPT Retrieval Plugin lets you easily find personal documents
Chat & pretrained large audio language model proposed by Alibaba Cloud
FlashMLA: Efficient Multi-head Latent Attention Kernels
Release for Improved Denoising Diffusion Probabilistic Models
StudioOllamaUI is a local, portable interface for Ollama
Encoder of greater-than-word length text trained on a variety of data
Open Multilingual Multimodal Chat LMs
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Dataset of GPT-2 outputs for research in detection, biases, and more
Official repo for consistency models
Chinese LLaMA & Alpaca large language model + local CPU/GPU training
Repo for external large-scale work
800,000 step-level correctness labels on LLM solutions to MATH problems
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Learning to Act by Watching Unlabeled Online Videos
PyTorch implementation of MAE
An implementation of model parallel GPT-2 and GPT-3-style models
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Large-scale autoregressive pixel model for image generation by OpenAI
Learning Continuous Signed Distance Functions for Shape Representation
Generate embeddings from large-scale graph-structured data
A library for Multilingual Unsupervised or Supervised word Embeddings