Showing 43 open source projects for "llama.cpp"

View related business solutions
  • Cycloid: Hybrid Cloud DevOps collaboration platform Icon
    Cycloid: Hybrid Cloud DevOps collaboration platform

    For Developers, DevOps, IT departments, MSPs

    Enable your developers to do their best work and increase time-to-market speed with a leading DevOps and Hybrid Cloud platform.
    Learn More
  • AI-Powered Identity Governance Icon
    AI-Powered Identity Governance

    For IT Teams and MSPs in need of a solution to simplify, optimize and secure their SaaS, file, and device management operations

    Define governance policies, manage access, and optimize licenses with unified visibility across every identity, app, and file.
    Learn More
  • 1
    llama.cpp

    llama.cpp

    Port of Facebook's LLaMA model in C/C++

    The llama.cpp project enables the inference of Meta's LLaMA model (and other models) in pure C/C++ without requiring a Python runtime. It is designed for efficient and fast model execution, offering easy integration for applications needing LLM-based capabilities. The repository focuses on providing a highly optimized and portable implementation for running large language models directly within C/C++ environments.
    Downloads: 184 This Week
    Last Update:
    See Project
  • 2
    llama.cpp Python Bindings

    llama.cpp Python Bindings

    Python bindings for llama.cpp

    llama-cpp-python provides Python bindings for llama.cpp, enabling the integration of LLaMA (Large Language Model Meta AI) language models into Python applications. This facilitates the use of LLaMA's capabilities in natural language processing tasks within Python environments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    LLamaSharp

    LLamaSharp

    C#/.NET binding of llama.cpp, including LLaMa/GPT model inference

    The C#/.NET binding of llama.cpp. It provides APIs to infer the LLaMa Models and deploy it on the local environment. It works on both Windows, Linux and MAC without the requirement for compiling llama.cpp yourself. Its performance is close to llama.cpp. Furthermore, it provides integrations with other projects such as BotSharp to provide higher-level applications and UI.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Downloads: 1 This Week
    Last Update:
    See Project
  • Non Emergency Medical Transportation (NEMT) Software Icon
    Non Emergency Medical Transportation (NEMT) Software

    Healthcare providers in search of a scheduling and dispatch solution for non emergency medical transportation

    NovusMED is an ecosystem that includes call center, administrative, driver applications, and client/clinic booking applications. NovusMED is the platform of choice for a wide range of medical transportation services and includes configurations for brokerage, providers, senior, community, and home health programs. Accurately manage calls and patient information. Monitor real-time performance and adjust resource capacity to meet changes in service demand. Manage will calls, confirmation calls, and recurring trips/standing orders in real time. Improved mileage reimbursement and cost calculators to manage multiple contractors, funding sources (payors), multiple providers, and volunteer driver programs. Enhanced credential management for vehicles and drivers. Manage subcontractor outsourcing with provider mobile, trip bidding, and trip offers. Able to see the closest vehicle and perform immediate bookings.
    Learn More
  • 5
    Maid

    Maid

    Maid is a cross-platform Flutter app for interfacing with GGUF

    Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely. Maid is a cross-platform free and open source application for interfacing with llama.cpp models locally, and remotely with Ollama, Mistral, Google Gemini and OpenAI models remotely. Maid supports Sillytavern character cards to allow you to interact with all your favorite characters.
    Downloads: 33 This Week
    Last Update:
    See Project
  • 6
    wllama

    wllama

    WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

    wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    GPT4All

    GPT4All

    Run Local LLMs on Any Device. Open-source

    ...The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This project also supports Python integrations for easy automation and customization. GPT4All is ideal for individuals and businesses seeking private, offline access to powerful LLMs.
    Downloads: 123 This Week
    Last Update:
    See Project
  • 8
    OpenJarvis

    OpenJarvis

    Personal AI, On Personal Devices

    ...The framework provides shared primitives for building local-first agents, along with evaluation tools that measure performance using metrics such as energy consumption, latency, cost, and accuracy. OpenJarvis integrates with local inference engines like Ollama, vLLM, SGLang, and llama.cpp to run language models directly on personal hardware. It also includes a learning loop that allows models to improve over time using locally generated interaction traces. By prioritizing local execution and efficiency, OpenJarvis aims to provide a foundation for privacy-preserving personal AI assistants.
    Downloads: 209 This Week
    Last Update:
    See Project
  • 9
    Clippy

    Clippy

    Clippy, now with some AI

    ...The project serves as both a playful homage to the early days of personal computing and a practical demonstration of local AI inference. Clippy integrates with the llama.cpp runtime to run models directly on a user’s computer without requiring cloud-based AI services. It supports models in the GGUF format, which allows it to run many publicly available open-source LLMs efficiently on consumer hardware. Users interact with the system through a simple animated assistant interface that can answer questions, generate text, and perform conversational tasks. ...
    Downloads: 43 This Week
    Last Update:
    See Project
  • RouteGenie NEMT software Icon
    RouteGenie NEMT software

    Modern software for non-emergency medical transportation providers, built to improve scheduling, billing, routing, and dispatching processes.

    RouteGenie NEMT software is a modern system built to automate all non-emergency medical transportation processes including routing, scheduling, dispatching, and billing. It helps manage everyday challenges like vehicle breakdowns, traffic problems, cancelations, driver call-offs, will calls, no shows, add-on trips, on-demand trips, and more.
    Learn More
  • 10
    LocalAI

    LocalAI

    The free, Open Source alternative to OpenAI, Claude and others

    ...LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. It integrates with multiple backends like llama.cpp, transformers, and diffusers to support different AI workloads. With its self-hosted architecture and OpenAI-compatible API, LocalAI enables developers to build secure, local-first AI applications.
    Downloads: 59 This Week
    Last Update:
    See Project
  • 11
    FreedomGPT

    FreedomGPT

    React and Electron-based app that executes the FreedomGPT LLM locally

    ...The app's setup is simple, and it includes clear installation guides for both macOS and Windows platforms, as well as detailed instructions for building necessary libraries like llama.cpp.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 12
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, VLLM and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    node-llama-cpp

    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    node-llama-cpp is a JavaScript and Node.js binding that allows developers to run large language models locally using the high-performance inference engine provided by llama.cpp. The library enables applications built with Node.js to interact directly with local LLM models without requiring a remote API or external service. By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    CoPaw

    CoPaw

    Your Personal AI Assistant; easy to install, deploy on local or coud

    ...Built by the AgentScope team, it connects to multiple chat platforms—including DingTalk, Feishu, QQ, Discord, iMessage, and more—through a single unified assistant. CoPaw supports both cloud-based LLM providers and fully local models such as llama.cpp, MLX, and Ollama, allowing you to operate without API keys if preferred. It includes a browser-based Console for chatting, configuring models, managing memory, and extending capabilities with custom skills. With built-in cron scheduling, heartbeat check-ins, and extensible skill loading, CoPaw grows with your workflow over time. ...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 15
    mac code

    mac code

    Claude Code, but it runs on your Mac for free

    ...It operates as a CLI-based assistant that routes user prompts into different execution paths such as chat, shell commands, or web search, functioning as a multi-purpose development agent. The system integrates with inference engines like llama.cpp and Apple’s MLX framework, allowing users to run models up to 35B parameters locally with varying performance trade-offs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Harbor LLM

    Harbor LLM

    Run a full local LLM stack with one command using Docker

    ...With a single command, users can start preconfigured tools like Ollama and Open WebUI, enabling chat, workflows, and integrations immediately. Harbor supports multiple inference engines, including llama.cpp and vLLM, and connects them seamlessly to user interfaces. It also includes tools for web retrieval, image generation, voice interaction, and workflow automation. Built on Docker, Harbor allows services to run in isolated containers while communicating over a local network. It is intended for local development and experimentation rather than production deployment, giving developers a flexible way to explore AI systems, test configurations, and manage complex LLM stacks without manual wiring or setup overhead.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    llama2.c

    llama2.c

    Inference Llama 2 in one file of pure C

    ...The goal of llama2.c is to demonstrate how a compact and transparent implementation can perform meaningful inference even with small models, emphasizing simplicity, clarity, and accessibility. The project builds upon lessons from nanoGPT and takes inspiration from llama.cpp, focusing instead on minimalism and educational value over large-scale performance.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    llama.vim

    llama.vim

    Vim plugin for LLM-assisted code/text completion

    ...The plugin enables developers to access AI-assisted text and code completion features without leaving their terminal-based development environment. Instead of relying on remote AI services, the plugin is designed to work with locally running LLM inference engines such as llama.cpp. This approach allows developers to benefit from AI-assisted coding features while maintaining full control over their data and avoiding external API dependencies. The plugin focuses on simplicity and performance, providing fast completions and editing assistance even on consumer-grade hardware. By integrating AI functionality directly into Vim workflows, the tool enables developers to write and edit code more efficiently while staying within a familiar development interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    llama.vscode

    llama.vscode

    VS Code extension for LLM-assisted code/text completion

    llama.vscode is a Visual Studio Code extension that provides AI-assisted coding features powered primarily by locally running language models. The extension is designed to be lightweight and efficient, enabling developers to use AI tools even on consumer-grade hardware. It integrates with the llama.cpp runtime to run language models locally, eliminating the need to rely entirely on external APIs or cloud providers. The extension supports common AI development features such as code completion, conversational chat assistance, and AI-assisted code editing directly within the IDE. Developers can select and manage models through a configuration interface that automatically downloads and runs the required models locally. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Paddler

    Paddler

    Open-source LLM load balancer and serving platform for hosting LLMs

    ...The system acts as a specialized load balancer and serving layer for language models, enabling organizations to run inference workloads without relying on external API providers. It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The architecture is designed with privacy and cost control in mind, making it suitable for organizations that handle sensitive data or require predictable operational costs. Paddler also includes tools for monitoring, request buffering, and autoscaling integration so that deployments can adapt dynamically to changing workloads. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    llamafile

    llamafile

    Distribute and run LLMs with a single file

    llamafile lets you distribute and run LLMs with a single file. (announcement blog post). Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation. The easiest way to try it for yourself is to download our example llamafile for the LLaVA model (license: LLaMA 2, OpenAI). LLaVA is a new LLM that can do more than just chat; you can also upload images and ask it questions about them. ...
    Downloads: 42 This Week
    Last Update:
    See Project
  • 22
    qvac-fabric-llm.cpp

    qvac-fabric-llm.cpp

    QVAC Fabric: cross-platform LLM inference and fine-tuning

    qvac-fabric-llm.cpp is a cross-platform large language model inference and fine-tuning engine built as an advanced fork of llama.cpp, designed to run efficiently across desktops, mobile devices, and heterogeneous GPU environments. The project focuses on removing hardware limitations traditionally associated with LLM deployment by enabling support for a wide range of backends, including Vulkan, Metal, CUDA, and CPU, making it accessible on devices ranging from smartphones to enterprise servers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    DevoxxGenie

    DevoxxGenie

    DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLM's

    Devoxx Genie is a fully Java-based LLM Code Assistant plugin for IntelliJ IDEA, designed to integrate with local LLM providers such as Ollama, LMStudio, GPT4All, Llama.cpp, and Exo but also cloud-based LLMs such as OpenAI, Anthropic, Mistral, Groq, Gemini, DeepInfra, DeepSeek, OpenRouter and Azure OpenAI.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    Text Generation Web UI

    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering.
    Downloads: 47 This Week
    Last Update:
    See Project
  • 25
    Qwen3

    Qwen3

    Qwen3 is the large language model series developed by Qwen team

    Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K tokens. It delivers higher quality and more helpful text generation across multiple languages and domains, including mathematics, coding, science, and tool usage. Various quantized versions,...
    Downloads: 30 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB