inference free download

Showing 37 open source projects for "inference"

Filters: Artificial Intelligence, TypeScript

1

BrowserAI

Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser

BrowserAI is a cutting-edge platform that allows users to run large language models (LLMs) directly in their web browser without the need for a server. It leverages WebGPU for accelerated performance and supports offline functionality, making it a highly efficient and privacy-conscious solution. The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as...

Downloads: 7 This Week

Last Update: 3 days ago
See Project
2

wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms.

Downloads: 1 This Week

Last Update: 2026-03-10
See Project
3

NemoClaw

NVIDIA plugin for secure installation of OpenClaw

...It installs and configures the NVIDIA OpenShell runtime, which provides a secure environment for running autonomous AI agents. NemoClaw enables users to launch sandboxed agent environments that control network access, file permissions, and inference requests through policy-based security. The platform integrates with AI models such as NVIDIA Nemotron and supports multiple inference backends including cloud APIs, local NIM deployments, and vLLM. Through its command-line interface, developers can deploy, monitor, and manage AI assistants running inside isolated sandboxes. By combining sandbox orchestration, agent management, and AI model integration, NemoClaw provides a secure foundation for building and operating autonomous AI assistants.

Downloads: 8 This Week

Last Update: 7 hours ago
See Project
4

Frigate NVR

NVR with realtime local object detection for IP cameras

...The system uses OpenCV and TensorFlow to analyze video feeds and detect objects such as people, vehicles, and animals in real time. Frigate is optimized for efficiency and supports hardware acceleration across a wide range of devices, including GPUs and specialized inference hardware. It also provides event recording, snapshot management, and searchable video history to improve home or small-business security workflows. Overall, Frigate functions as a privacy-focused, AI-powered NVR platform for intelligent video monitoring.

Downloads: 13 This Week

Last Update: 2026-03-19
See Project
5

RWKV Runner

A RWKV management and startup tool, full automation, only 8MB

RWKV (pronounced "RwaKuv") is an RNN with GPT-level LLM performance that can also be trained directly like a GPT transformer (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM use, "infinite" ctxlen, and free text embedding. It is also 100% attention-free. The default configs enable custom CUDA kernel acceleration, which is much faster and consumes much less VRAM. If you encounter compatibility issues, go to the Configs page and turn off "Use Custom CUDA kernel to Accelerate".

Downloads: 3 This Week

Last Update: 2026-02-01
See Project
6

MLX Engine

LM Studio Apple MLX engine

MLX Engine is the Apple MLX-based inference backend used by LM Studio to run large language models efficiently on Apple Silicon hardware. Built on top of the mlx-lm and mlx-vlm ecosystems, the engine provides a unified architecture capable of supporting both text-only and multimodal models. Its design focuses on high-performance on-device inference, leveraging Apple’s MLX stack to accelerate computation on M-series chips.

Downloads: 1 This Week

Last Update: 7 days ago
See Project
7

Secret Llama

Fully private LLM chatbot that runs entirely within a browser

Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the modern chat UX you’d expect (streaming responses, markdown, and a clean layout), so there’s no usability tradeoff to gain privacy. Under the hood it uses a web-native inference engine to accelerate model execution with GPU/WebGPU when available, keeping replies fast even without a backend. ...

Downloads: 1 This Week

Last Update: 2025-11-07
See Project
8

Riffusion App

Stable diffusion for real-time music generation (web app)

...The application is built with modern web technologies including Next.js, React, and three.js, providing a responsive and visually engaging interface for experimentation. It relies on a separate inference server to perform model computations, enabling flexible deployment depending on hardware capabilities. Users can input prompts or modify parameters to influence the style, tempo, and characteristics of generated audio, making it useful for creative exploration and prototyping.

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
9

Eko

Build Production-ready Agentic Workflow with Natural Language

Eko (Eko Keeps Operating) is a JavaScript framework designed for building production-ready agent-based workflows using natural language commands. It allows developers to create automated agents that can handle complex workflows in both computer and browser environments. With a focus on high development efficiency, Eko simplifies the creation of multi-step workflows, enabling users to integrate and automate tasks across platforms. It provides a unified interface for managing agents, offering...

Downloads: 3 This Week

Last Update: 2025-12-29
See Project
10

Harbor LLM

Run a full local LLM stack with one command using Docker

...With a single command, users can start preconfigured tools like Ollama and Open WebUI, enabling chat, workflows, and integrations immediately. Harbor supports multiple inference engines, including llama.cpp and vLLM, and connects them seamlessly to user interfaces. It also includes tools for web retrieval, image generation, voice interaction, and workflow automation. Built on Docker, Harbor allows services to run in isolated containers while communicating over a local network. It is intended for local development and experimentation rather than production deployment, giving developers a flexible way to explore AI systems, test configurations, and manage complex LLM stacks without manual wiring or setup overhead.

Downloads: 4 This Week

Last Update: 4 days ago
See Project
11

WeKnora

LLM framework for document understanding and semantic retrieval

...This approach enables the system to provide more reliable answers by grounding model reasoning in the content of uploaded documents. WeKnora is designed with a modular architecture that separates components for document processing, search strategies, and model inference, allowing developers to customize or extend different parts of the pipeline. It supports knowledge base management and conversational question answering built on top of structured and unstructured documents.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
12

Latent Box

A collection of awesome-lists for AI, creativity and art

...The platform emphasizes usability by providing a clean user interface that allows users to load models, configure parameters, and interact with them without needing deep technical knowledge of underlying frameworks. It supports local inference workflows, which are increasingly important for privacy-conscious users and organizations seeking to reduce reliance on external APIs. Latent Box also enables extensibility through plugins or integrations, allowing developers to customize model pipelines or connect additional tools.

Downloads: 0 This Week

Last Update: 2026-03-20
See Project
13

lmstudio.js

LM Studio TypeScript SDK

lmstudio.js is the official TypeScript and JavaScript SDK that enables developers to programmatically interact with LM Studio’s local AI runtime. The library exposes the same capabilities used internally by the LM Studio desktop application, allowing external apps to load models, run inference, and build autonomous AI workflows. It is designed to simplify the creation of local AI tools by handling complex concerns such as dependency management, hardware compatibility, and model configuration. The SDK introduces an agent-style API that can execute multi-step tool-using workflows through a single call, enabling more advanced automation scenarios. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
14

Ollama Grid Search

A multi-platform desktop application to evaluate and compare LLM

Ollama Grid Search is a desktop application designed to automate the evaluation and comparison of large language models, prompts, and inference parameters in a structured and repeatable way. Instead of manually testing combinations, the tool performs grid search experiments by iterating across different models, prompt variations, and parameter configurations, allowing users to quickly identify optimal setups for specific tasks. It provides a visual interface where experiment results can be inspected, compared, and refined, making it especially useful for prompt engineering and benchmarking workflows. ...

Downloads: 0 This Week

Last Update: 14 hours ago
See Project
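
The grid search described for Ollama Grid Search above amounts to enumerating the Cartesian product of the candidate values and running one experiment per combination. The following is a minimal sketch of that enumeration only, not the project's actual code: the `ParamGrid` and `Combination` types, the `expandGrid` function, and the model names and prompts are all hypothetical names chosen for illustration.

```typescript
// Hypothetical types for illustration; not Ollama Grid Search's data model.
type ParamGrid = Record<string, readonly (string | number)[]>;
type Combination = Record<string, string | number>;

// Expand a grid of candidate values into every combination
// (the Cartesian product of all value lists).
function expandGrid(grid: ParamGrid): Combination[] {
  let combos: Combination[] = [{}];
  for (const [name, values] of Object.entries(grid)) {
    const next: Combination[] = [];
    for (const partial of combos) {
      for (const value of values) {
        // Extend each partial combination with one value of this parameter.
        next.push({ ...partial, [name]: value });
      }
    }
    combos = next;
  }
  return combos;
}

const experiments = expandGrid({
  model: ["model-a", "model-b"], // placeholder model names
  prompt: ["Summarize: {text}", "TL;DR: {text}"],
  temperature: [0.2, 0.8],
});
console.log(experiments.length); // 2 * 2 * 2 = 8 combinations
```

Each returned object could then be passed to whatever function sends one request to the model. The number of runs grows multiplicatively with every added parameter, which is why a tool that automates and records the runs is useful.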
15

Clippy

Clippy, now with some AI

Clippy is an open-source desktop assistant that allows users to run modern large language models locally while presenting them through a nostalgic interface inspired by Microsoft’s classic Clippy assistant from the 1990s. The project serves as both a playful homage to the early days of personal computing and a practical demonstration of local AI inference. Clippy integrates with the llama.cpp runtime to run models directly on a user’s computer without requiring cloud-based AI services. It supports models in the GGUF format, which allows it to run many publicly available open-source LLMs efficiently on consumer hardware. Users interact with the system through a simple animated assistant interface that can answer questions, generate text, and perform conversational tasks. ...

Downloads: 40 This Week

Last Update: 2026-03-09
See Project
16

lms

LM Studio CLI

...The tool allows developers to control model execution directly from the terminal, providing programmatic access to features that are otherwise available through graphical interfaces. Through the CLI, users can load and unload models, start or stop local inference servers, and inspect the inputs and outputs generated by language models. LMS is built using the LM Studio JavaScript SDK and integrates tightly with the LM Studio runtime environment. The interface is designed to simplify automation workflows and scripting tasks related to local AI deployment. By exposing model management capabilities through command-line commands, the tool enables developers to integrate local LLM operations into development pipelines and backend services. ...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project
17

Generative AI JS

This SDK is now deprecated, use the new unified Google GenAI SDK

...Though deprecated in favor of the unified Google GenAI SDK, the repo shows how to wrap HTTP/WS endpoints, manage streaming responses, and interoperate with browser UI or server logic. The examples include chat widgets, prompt pipelines, and generalized inference utilities. It also covers streaming cancellation, retries, backoff logic, and message chunk assembly to help developers handle real-world use. Because it’s JavaScript, the repo supports both ESM and CommonJS contexts, making it usable in both backend and frontend setups. Many of its patterns remain a useful reference for how streaming, chunking, and prompt logic can be implemented by hand in JS.

Downloads: 0 This Week

Last Update: 2025-10-06
See Project
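
The retry and backoff logic mentioned for Generative AI JS above can be illustrated with a generic exponential backoff helper. This is a hand-written sketch of the general pattern, not code taken from that SDK: `backoffDelayMs`, `withRetry`, and the default values (250 ms base, 8000 ms cap, 4 attempts) are assumptions made for the example.

```typescript
// Delay before retrying after failed attempt number `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs. Defaults are illustrative assumptions.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async operation, waiting longer after each failure.
// `sleep` is injectable so callers (or tests) can replace the real timer.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not wait after the final attempt; just fall through and rethrow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap a single request, for example `withRetry(() => fetch(url))`, where `url` is whatever endpoint is being called. Production code usually also adds random jitter to the delay and retries only on errors that are known to be transient.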
18

FlowGram

Extensible workflow development framework

...Instead of shipping as a ready-made product, it provides the building blocks — a canvas for wiring together nodes, a form engine for configuring node parameters, a variable-scope and type-inference engine, and a set of “materials” (pre-built node types such as code execution, conditional logic, LLM calls, etc.) that can be composed into larger workflows. This makes FlowGram highly flexible: you can prototype data-processing pipelines, AI-agent flows, automation scripts, or even business process automation without writing all the plumbing yourself. ...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
19

Operit AI

Powerful Android AI agent with tools, automation, and Linux shell

Operit is a full-featured AI assistant and agent platform designed specifically for Android devices, aiming to go far beyond traditional chat-based interfaces. It integrates deep system-level capabilities with a wide range of tools, allowing the AI to perform real tasks such as file management, automation, and system control directly on the device. A standout aspect of the project is its built-in Ubuntu 24 environment, which enables users to run Linux commands, scripts, and development tools...

Downloads: 13 This Week

Last Update: 4 days ago
See Project
20

Kodus

AI code reviews, just like your senior dev would do

Kodus-AI is a framework for building, training, and deploying intelligent agents and models, especially focusing on practical AI workflows for businesses and automation. It provides a structured set of tools and abstractions that help teams design agent behaviors, orchestrate data pipelines, optimize inference, and integrate AI capabilities with applications or services. The platform often includes model management, scalable training workflows, and orchestration patterns that help teams move from research or prototypes to production-ready AI deployments. Through configurable pipelines and a focus on modularity, it supports experimentation while maintaining reproducibility and performance. ...

Downloads: 6 This Week

Last Update: 13 hours ago
See Project
21

Nanocoder

A beautiful local-first coding agent running in your terminal

Nanocoder is an open-source, local-first coding assistant that runs in the command line and allows developers to use AI models to assist with programming tasks directly from their terminal environment. The tool is designed as a privacy-focused alternative to proprietary AI coding assistants, allowing users to run local models or connect to external APIs while keeping full control over their data and development workflow. Built with TypeScript and distributed as a CLI application, nanocoder...

Downloads: 11 This Week

Last Update: 2026-04-14
See Project
22

Open Responses

Specification for multi-provider, interoperable LLM interfaces

...It enables you to run a local or private server that speaks the standard Responses API, so tools, applications, and agents built against that API can operate without contacting OpenAI’s cloud and can instead route calls to any large language model provider you choose, such as Claude, Qwen, Ollama, or others. This makes it a powerful option for teams or individuals who want full control over their AI infrastructure, prioritize privacy, or need to standardize inference calls across multiple backends without rewriting their code.

Downloads: 2 This Week

Last Update: 2026-04-13
See Project
23

Cognita

Open source RAG framework for building scalable modular AI apps

Cognita is an open source framework designed to help developers build, organize, and deploy Retrieval-Augmented Generation (RAG) applications in a structured and production-ready way. It addresses the gap between quick experimentation in notebooks and the complexity of deploying scalable AI systems by introducing a modular and API-driven architecture. Cognita provides reusable components such as parsers, data loaders, embedders, retrievers, and query controllers, allowing teams to customize...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
24

Jaaz

Open source multimodal creative AI assistant with infinite canvas tool

...It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and cloud-based inference systems, enabling flexible creative workflows. Jaaz emphasizes privacy and local-first operation, allowing creators to run AI models locally so that their data does not leave their device. It also includes collaborative planning tools such as visual layouts and storyboard organization to support complex creative projects. By combining generative AI with a canvas-based interface, the project aims to provide a creative platform.

Downloads: 2 This Week

Last Update: 2026-03-17
See Project
25

node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama

node-llama-cpp is a JavaScript and Node.js binding that allows developers to run large language models locally using the high-performance inference engine provided by llama.cpp. The library enables applications built with Node.js to interact directly with local LLM models without requiring a remote API or external service. By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. ...

Downloads: 2 This Week

Last Update: 2026-03-17
See Project