Llama 2 Everywhere (L2E) is an open-source implementation of the Llama 2 large language model architecture that demonstrates how transformer-based language models can be run with extremely little code. Rather than relying on large machine learning frameworks, the project implements inference for LLaMA-style models in a compact C program, prioritizing simplicity and educational clarity. Models are trained with a Python pipeline and then run through a lightweight C inference program that requires very few dependencies. The runtime mirrors the structure of the Llama 2 model family, so compatible model checkpoints can be converted and executed in the simplified environment. Because the implementation is intentionally minimal, it also serves as a teaching tool for understanding how transformer architectures operate at a low level.
Features
- Minimal implementation of the LLaMA-2 architecture in pure C
- Lightweight inference engine with very few dependencies
- Python training pipeline paired with C inference runtime
- Compatibility with converted LLaMA-2 model checkpoints
- Educational reference implementation for understanding transformer models
- Ability to run small language models efficiently on standard hardware