Solve puzzles. Learn CUDA
Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun
A bidirectional pipeline parallelism algorithm
Variational Quantum Circuit Simulator for Quantum Computation Research
Running a big model on a small laptop
Running large language models on a single GPU
Distributed parallelization of stencil-based GPU and CPU applications
Multi-platform high-performance compute language extension for Rust
AirLLM: 70B inference with a single 4GB GPU
SwissGL is a minimalistic wrapper on top of the WebGL2 JS API
Open source machine learning framework
A language for fast, portable data-parallel computation
The Zoo Design Studio app
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Fast inference engine for Transformer models
Tensors and neural networks in Haskell
Tensor Learning in Python
Analyze computation-communication overlap in DeepSeek-V3/R1
Faster Whisper transcription with CTranslate2
Easily compute clip embeddings and build a clip retrieval system
Meridian is a marketing mix modeling (MMM) framework
A high-performance inference engine for AI models
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code
Advanced evolutionary computation library built on top of PyTorch
A massively parallel, high-level programming language