Besides the usual FP32, rwkv.cpp supports FP16 and quantized INT4, INT5 and INT8 inference. The project is focused on CPU, but cuBLAS is also supported. RWKV is a novel large language model architecture; the largest model in the family has 14B parameters. Unlike the Transformer, whose attention is O(n^2) in context length, RWKV needs only the state from the previous step to calculate logits. This makes RWKV very CPU-friendly at large context lengths.
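To illustrate why only the previous step's state is needed, here is a minimal single-channel sketch based on the WKV formula from the RWKV-4 paper. It is an assumption-laden illustration, not rwkv.cpp's code: the function names, the scalar (one-channel) inputs, and the parameter values are hypothetical, and the numerically stable max-exponent trick used in real implementations is omitted. The quadratic form re-reads every earlier token for each output, while the recurrent form carries only a running numerator `a` and denominator `b`, so per-token cost and state size stay constant as the context grows.

```python
import math


def wkv_quadratic(w, u, ks, vs):
    """Reference form: each output sums over all earlier tokens (O(n^2) total)."""
    outs = []
    for t in range(len(ks)):
        # Current token gets the "bonus" weight exp(u + k_t).
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        # Earlier tokens decay exponentially with distance, at rate w.
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        outs.append(num / den)
    return outs


def wkv_recurrent(w, u, ks, vs):
    """Recurrent form: keeps only the (a, b) state between steps (O(n) total)."""
    a, b = 0.0, 0.0  # decayed running numerator and denominator over past tokens
    decay = math.exp(-w)
    outs = []
    for k, v in zip(ks, vs):
        bonus = math.exp(u + k)
        outs.append((a + bonus * v) / (b + bonus))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(k) * v
        b = decay * b + math.exp(k)
    return outs


if __name__ == "__main__":
    # Hypothetical keys, values, decay w and bonus u for one channel.
    ks = [0.1, -0.3, 0.5, 0.2]
    vs = [1.0, 2.0, -1.0, 0.5]
    print(wkv_quadratic(0.5, 0.3, ks, vs))
    print(wkv_recurrent(0.5, 0.3, ks, vs))
```

Both functions print the same values up to floating-point rounding; the recurrent one is the form that lends itself to step-by-step CPU inference over long contexts.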
Features
- Windows / Linux / MacOS
- Build the library yourself
- Get an RWKV model
- Requirements: Python 3.x with PyTorch and tokenizers
- ggml moves fast, and can occasionally break compatibility with older file formats
- Requirements: Python 3.x with PyTorch
License
MIT License