Page 3 | llama-cpp-static free download

Showing 201 open source projects for "llama-cpp-static"

1

model2Vec

Fast State-of-the-Art Static Embeddings

model2vec is an innovative embedding framework that converts large sentence transformer models into compact, high-speed static embedding models while preserving much of their semantic performance. The project focuses on dramatically reducing the computational cost of generating embeddings, achieving significant improvements in speed and model size without requiring large datasets for retraining. By using a distillation-based approach, it can produce lightweight models that run efficiently on CPUs, making it suitable for edge applications and large-scale processing pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-27
See Project
2

promptfoo

Evaluate and compare LLM outputs, catch regressions, improve prompts

...Use built-in metrics, LLM-graded evals, or define your own custom metrics. Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow. Use OpenAI, Anthropic, and open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API.

Downloads: 2 This Week

Last Update: 2026-04-14
See Project
3

SGLang

SGLang is a fast serving framework for large language models

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

Downloads: 1 This Week

Last Update: 2026-04-08
See Project
4

AutoCoder

A long-running autonomous coding agent powered by the Claude Agent

...The core idea is to accelerate software production while preserving correctness and readability, minimizing the cognitive overhead that comes from switching between concept and implementation. Its architecture typically integrates language models with static analysis and template logic so that generated code is not only syntactically valid but also idiomatic and testable.

Downloads: 3 This Week

Last Update: 2026-02-05
See Project
5

Curated Transformers

PyTorch library of curated Transformer models and their components

...It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary allocations and initialization.

Downloads: 3 This Week

Last Update: 2024-04-17
See Project
6

Elia

Terminal-based LLM chat tool with multi-model and local support

...It runs entirely in the command line, offering a keyboard-driven experience that reduces the need for switching between apps. Users can chat with both proprietary models like ChatGPT and Claude, as well as local models such as Llama 3, Mistral, and Gemma. Elia stores conversations in a local SQLite database, making it easy to revisit past interactions. It supports flexible usage with inline and full-screen chat modes, along with simple configuration through a single file. Installation is straightforward via pipx, and users can customize themes, system prompts, and model settings. ...

Downloads: 2 This Week

Last Update: 2026-03-19
See Project
7

Tribe AI

Low code tool to rapidly build and coordinate multi-agent teams

Low code tool to rapidly build and coordinate multi-agent teams. Have you heard the saying, 'Two minds are better than one'? That's true for agents too. Tribe builds on the LangGraph framework to let you customize and coordinate teams of agents easily. By splitting up tough tasks among agents that are good at different things, each one can focus on what it does best. This makes solving problems faster and better.

Downloads: 2 This Week

Last Update: 2024-10-07
See Project
8

LLMFarm

Llama and other large language models on iOS and macOS, offline

LLMFarm is a framework designed to simplify the deployment, management, and utilization of large language models in local or self-hosted environments, focusing on accessibility and efficient resource usage. It enables users to run LLMs on personal hardware or private infrastructure, reducing dependency on external APIs and improving data privacy. The system typically provides a user-friendly interface for loading models, configuring inference parameters, and interacting with them through...

Downloads: 1 This Week

Last Update: 2026-03-19
See Project
9

Vulnhuntr

AI tool for detecting complex vulnerabilities in Python codebases

Vulnhuntr is an open source security tool that uses large language models to analyze codebases and identify remotely exploitable vulnerabilities. It focuses on Python projects and applies static code analysis combined with LLM reasoning to trace how user input flows through an application. Instead of scanning entire repositories at once, it builds call chains step by step, allowing deeper inspection of complex, multi-stage issues that traditional tools may miss. Vulnhuntr can generate detailed findings, including vulnerability explanations and potential exploit paths, helping developers and security teams understand risks faster. ...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
10

LLaMA-MoE

Building Mixture-of-Experts from LLaMA with Continual Pre-training

LLaMA-MoE is an open-source project that builds mixture-of-experts language models from LLaMA through expert partitioning and continual pre-training. The repository is centered on making MoE research more accessible by offering smaller and more affordable models with only about 3.0 to 3.5 billion activated parameters, which helps reduce deployment and experimentation costs.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
11

Pruna AI

Pruna is a model optimization framework built for developers

Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models. It is ideal for companies or teams looking to reduce reliance on external APIs while...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
12

chat

Chat web app for teams, SaaS with user management and rate limiting

...The project supports OpenAI, Azure OpenAI, Claude, Gemini, and Ollama-hosted models, giving teams flexibility in how they connect model backends. Its feature set includes shareable static pages generated from conversations, searchable conversation snapshots, text file uploads, multimedia file support when the model allows it, and prompt management with shortcut-based access. The repository structure also points to both web and Flutter-based mobile components, suggesting a broader product surface than a simple browser-only interface.

Downloads: 0 This Week

Last Update: 1 day ago
See Project
13

Meta Agents Research Environments (ARE)

Meta Agents Research Environments is a comprehensive platform

Meta Agents Research Environments (ARE) is a simulation and benchmarking platform designed to evaluate AI agents in dynamic, evolving, multi-step tasks. Unlike static benchmarks, ARE supports environments where agents must adapt to changes over time and reason over sequences of actions while interacting with applications and facing uncertainty. The included Gaia2 benchmark offers 800 scenarios across multiple “universes” and can test reasoning, memory, tool use, and adaptability. It integrates with simulated applications and agent APIs (email, file system, etc.). ...

Downloads: 0 This Week

Last Update: 2026-04-14
See Project
14

MCP Hub

An MCP client for Neovim that seamlessly integrates MCP servers

mcphub.nvim is an MCP (Model Context Protocol) client plugin for Neovim that seamlessly integrates MCP servers into your editing workflow with an intuitive interface for managing, testing, and using MCP servers with your favorite chat plugins. To create your first MCP-capable agent, you need only 6 lines of code. Works with any LangChain-supported LLM that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.). Explore MCP capabilities and generate starter code with the interactive code builder.

Downloads: 1 This Week

Last Update: 2025-08-15
See Project
15

h2oGPT

Private chat with local GPT with document, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...

Downloads: 1 This Week

Last Update: 2025-02-22
See Project
16

Agents-Flex

Agents-Flex is an elegant LLM Application Framework like LangChain

Agents-Flex includes a variety of network protocols for connecting LLMs, such as HTTP, SSE, and WS. Its simple and flexible design allows developers to easily connect to various LLMs, including OpenAI, Llama, and other AI models. Agents-Flex provides a rich set of development templates and Prompt Frameworks, including FEW-SHOT, CRISPE, BROKE, and ICIO. Developers can also customize their own unique prompt templates. Agents-Flex has a very flexible Function Calling component. It supports local method definitions, parsing, callbacks through LLMs, and executing local methods to obtain results. ...

Downloads: 1 This Week

Last Update: 5 days ago
See Project
17

NullClaw

Fastest, smallest, and fully autonomous AI assistant infrastructure

NullClaw is the smallest fully autonomous AI assistant infrastructure, built entirely in Zig as a single static binary with zero runtime dependencies. At just 678 KB with ~1 MB peak RAM usage, it boots in under 2 milliseconds and runs on virtually any hardware, including low-cost ARM boards. Despite its size, it delivers a complete AI stack with 22+ model providers, 18+ communication channels, integrated tools, hybrid memory, and sandboxed runtime support.

Downloads: 5 This Week

Last Update: 4 days ago
See Project
18

Cosmos-RL

Cosmos-RL is a flexible and scalable Reinforcement Learning framework

...The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to leverage large GPU clusters effectively. It is built with compatibility in mind, supporting popular model families such as LLaMA, Qwen, and diffusion-based world models, as well as integration with Hugging Face ecosystems. cosmos-rl also includes support for advanced RL algorithms, low-precision training, and fault-tolerant execution, making it suitable for large-scale production workloads.

Downloads: 0 This Week

Last Update: 2026-04-14
See Project
19

EmoLLM

Pre & Post-training & Dataset & Evaluation & Deploy & RAG

...Its repository includes multiple model variants and training configurations spanning several underlying model families, including InternLM, Qwen, DeepSeek, Mixtral, LLaMA, and others, which shows that the initiative is structured as a broad ecosystem rather than a single release. The project also covers more than just model weights, with material for datasets, fine-tuning, evaluation, deployment, demos, RAG, and related subprojects such as its psychological digital assistant work.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
20

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation model

...It is model-agnostic and advertises support for a variety of TTS and speech models such as ChatTTS, CosyVoice, Fish-Speech, FireredTTS and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example GPT- or LLaMA-style models), which can be used to pre-process text and manage conversations.

Downloads: 2 This Week

Last Update: 2026-02-02
See Project
21

AICGSecEval

A.S.E (AICGSecEval) is a repository-level AI-generated code security

...By simulating realistic development scenarios, the benchmark assesses how well AI code generation systems handle security-sensitive programming tasks. AICGSecEval combines static and dynamic evaluation techniques to analyze generated code for vulnerabilities and functional correctness. The framework includes datasets, test cases, and evaluation metrics that measure how AI programming tools perform across multiple programming languages and vulnerability categories.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
22

Animated Drawings

Code to accompany "A Method for Animating Children's Drawings"

AnimatedDrawings is a framework that converts user sketches or line drawings into fully animated 2D motion sequences using learned motion priors. The idea is that you draw a simple static figure (stick figure, silhouette, or contour lines), and the system produces plausible skeletal motion (walking, jumping, dancing) that adheres to the drawn shape constraints. The architecture separates shape embedding (to understand user-drawn geometry) from motion embedding / generation (to produce temporally coherent movement). ...

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
23

LLM-Pruner

On the Structural Pruning of Large Language Models

LLM-Pruner is an open-source framework designed to compress large language models through structured pruning techniques while maintaining their general capabilities. Large language models often require enormous computational resources, making them expensive to deploy and inefficient for many practical applications. LLM-Pruner addresses this issue by identifying and removing non-essential components within transformer architectures, such as redundant attention heads or feed-forward...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
24

TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
25

tlm

Local CLI Copilot, powered by Ollama

...Instead of relying on cloud APIs or paid AI services, TLM runs entirely on the user’s workstation and integrates with local models managed through the Ollama runtime. This approach allows developers to use powerful open-source models such as Llama, Phi, DeepSeek, and Qwen while maintaining privacy and avoiding external service dependencies. The system supports contextual queries where the AI analyzes files within a directory and generates answers based on project documentation or source code. It also detects the user’s shell environment automatically, allowing it to generate commands tailored to shells such as Bash, Zsh, or PowerShell.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project