Showing 51 open source projects for "inference"

  • 1
    wllama

    WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

    wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms.
    Downloads: 6 This Week
    See Project
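    For a sense of the API, here is a minimal sketch of in-browser completion with wllama, based on the usage pattern in the project README; the wasm asset imports, model URL, and sampling options are illustrative and version-dependent.

    ```typescript
    import { Wllama } from '@wllama/wllama';
    // Bundler-resolved URLs for the wasm binaries (Vite-style "?url" imports).
    import wasmSingle from '@wllama/wllama/esm/single-thread/wllama.wasm?url';
    import wasmMulti from '@wllama/wllama/esm/multi-thread/wllama.wasm?url';

    const wllama = new Wllama({
      'single-thread/wllama.wasm': wasmSingle,
      'multi-thread/wllama.wasm': wasmMulti,
    });

    // Any small GGUF model served over HTTP works; this URL is illustrative.
    await wllama.loadModelFromUrl('https://example.com/models/tinyllama.gguf');
    const completion = await wllama.createCompletion('Once upon a time,', {
      nPredict: 50,
      sampling: { temp: 0.7 },
    });
    console.log(completion);
    ```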
  • 2
    BrowserAI

    Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser

    BrowserAI is a cutting-edge platform that allows users to run large language models (LLMs) directly in their web browser without the need for a server. It leverages WebGPU for accelerated performance and supports offline functionality, making it a highly efficient and privacy-conscious solution. The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as...
    Downloads: 8 This Week
    See Project
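    A minimal sketch, assuming the @browserai/browserai package and the loadModel/generateText calls shown in the project README; the model identifier is illustrative, so verify names against the current docs.

    ```typescript
    import { BrowserAI } from '@browserai/browserai';

    const browserAI = new BrowserAI();

    // Downloads and caches one of the pre-configured models on first use
    // (identifier is illustrative).
    await browserAI.loadModel('llama-3.2-1b-instruct');

    const response = await browserAI.generateText('Explain WebGPU in one sentence.');
    console.log(response);
    ```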
  • 3
    RWKV Runner

    An RWKV management and startup tool; fully automated, only 8 MB

    RWKV (pronounced "RwaKuv") is an RNN with GPT-level LLM performance that can also be trained directly like a GPT transformer (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM use, "infinite" context length, and free text embedding. It is also 100% attention-free. The default config enables custom CUDA kernel acceleration, which is much faster and uses much less VRAM; if you run into compatibility issues, go to the Configs page and turn off "Use Custom CUDA kernel to Accelerate".
    Downloads: 7 This Week
    See Project
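    The Runner's backend serves an OpenAI-compatible API for the model it manages, so any standard client works against it. A minimal sketch, assuming the default local address and port (both configurable in the app):

    ```typescript
    // POST to the OpenAI-compatible chat endpoint exposed by RWKV Runner.
    const res = await fetch('http://127.0.0.1:8000/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: 'Why is RWKV attention-free?' }],
        temperature: 0.7,
      }),
    });

    // The response follows the OpenAI chat-completion schema.
    const data = await res.json();
    console.log(data.choices[0].message.content);
    ```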
  • 4
    Zod

    TypeScript-first schema validation with static type inference

    Zod is a TypeScript-first schema declaration and validation library, where "schema" broadly refers to any data type, from a simple string to a complex nested object. Zod is designed to be as developer-friendly as possible; the goal is to eliminate duplicative type declarations. With Zod, you declare a validator once and Zod automatically infers the static TypeScript type, as in the sketch below.
    Downloads: 10 This Week
    See Project
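    A minimal, self-contained example of the declare-once pattern described above:

    ```typescript
    import { z } from 'zod';

    // Declare the validator once...
    const User = z.object({
      name: z.string(),
      age: z.number().int().nonnegative(),
    });

    // ...and the static type is inferred from it, with no duplicate declaration.
    type User = z.infer<typeof User>; // { name: string; age: number }

    const ada = User.parse({ name: 'Ada', age: 36 }); // throws ZodError on invalid input
    ```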
  • 5
    MLX Engine

    LM Studio Apple MLX engine

    MLX Engine is the Apple MLX-based inference backend used by LM Studio to run large language models efficiently on Apple Silicon hardware. Built on top of the mlx-lm and mlx-vlm ecosystems, the engine provides a unified architecture capable of supporting both text-only and multimodal models. Its design focuses on high-performance on-device inference, leveraging Apple’s MLX stack to accelerate computation on M-series chips.
    Downloads: 1 This Week
    See Project
  • 6
    NemoClaw

    NVIDIA plugin for secure installation of OpenClaw

    ...It installs and configures the NVIDIA OpenShell runtime, which provides a secure environment for running autonomous AI agents. NemoClaw enables users to launch sandboxed agent environments that control network access, file permissions, and inference requests through policy-based security. The platform integrates with AI models such as NVIDIA Nemotron and supports multiple inference backends including cloud APIs, local NIM deployments, and vLLM. Through its command-line interface, developers can deploy, monitor, and manage AI assistants running inside isolated sandboxes. By combining sandbox orchestration, agent management, and AI model integration, NemoClaw provides a secure foundation for building and operating autonomous AI assistants.
    Downloads: 5 This Week
    See Project
  • 7
    Secret Llama

    Fully private LLM chatbot that runs entirely in your browser

    Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the modern chat UX you’d expect—streaming responses, markdown, and a clean layout—so there’s no usability tradeoff to gain privacy. Under the hood it uses a web-native inference engine to accelerate model execution with GPU/WebGPU when available, keeping responses responsive even without a backend. ...
    Downloads: 2 This Week
    See Project
  • 8
    Harbor LLM

    Run a full local LLM stack with one command using Docker

    ...With a single command, users can start preconfigured tools like Ollama and Open WebUI, enabling chat, workflows, and integrations immediately. Harbor supports multiple inference engines, including llama.cpp and vLLM, and connects them seamlessly to user interfaces. It also includes tools for web retrieval, image generation, voice interaction, and workflow automation. Built on Docker, Harbor allows services to run in isolated containers while communicating over a local network. It is intended for local development and experimentation rather than production deployment, giving developers a flexible way to explore AI systems, test configurations, and manage complex LLM stacks without manual wiring or setup overhead.
    Downloads: 16 This Week
    See Project
  • 9
    Eko

    Build Production-ready Agentic Workflow with Natural Language

    Eko (Eko Keeps Operating) is a JavaScript framework designed for building production-ready agent-based workflows using natural language commands. It allows developers to create automated agents that can handle complex workflows in both computer and browser environments. With a focus on high development efficiency, Eko simplifies the creation of multi-step workflows, enabling users to integrate and automate tasks across platforms. It provides a unified interface for managing agents, offering...
    Downloads: 10 This Week
    See Project
  • 10
    Frigate NVR

    NVR with realtime local object detection for IP cameras

    ...The system uses OpenCV and TensorFlow to analyze video feeds and detect objects such as people, vehicles, and animals in real time. Frigate is optimized for efficiency and supports hardware acceleration across a wide range of devices, including GPUs and specialized inference hardware. It also provides event recording, snapshot management, and searchable video history to improve home or small-business security workflows. Overall, Frigate functions as a privacy-focused, AI-powered NVR platform for intelligent video monitoring.
    Downloads: 3 This Week
    See Project
  • 11
    WeKnora

    LLM framework for document understanding and semantic retrieval

    ...This approach enables the system to provide more reliable answers by grounding model reasoning in the content of uploaded documents. WeKnora is designed with a modular architecture that separates components for document processing, search strategies, and model inference, allowing developers to customize or extend different parts of the pipeline. It supports knowledge base management and conversational question answering built on top of structured and unstructured documents.
    Downloads: 6 This Week
    See Project
  • 12
    Generative AI JS

    This SDK is now deprecated; use the new unified Google GenAI SDK

    ...Though marked deprecated (likely superseded by newer SDKs), the repo shows how to wrap HTTP/WS endpoints, manage streaming responses, and interoperate with browser UI or server logic. The examples include chat widgets, prompt pipelines, and generalized inference utilities. It also deals with streaming cancellation, retries, backoff logic, and message chunk assembly to help developers handle real-world use. Because it’s JavaScript, the repo supports both ESM and CommonJS contexts, making it versatile in backend and frontend setups. The deprecation label reflects that newer or official SDKs may have replaced it, but many of its patterns still serve as a useful reference to understand how streaming, chunking, and prompt logic can be implemented by hand in JS.
    Downloads: 8 This Week
    See Project
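    Assuming this repository is the deprecated @google/generative-ai npm package, here is a minimal sketch of the streaming pattern mentioned above; the model name is illustrative.

    ```typescript
    import { GoogleGenerativeAI } from '@google/generative-ai';

    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? '');
    const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });

    // Streamed generation: the SDK assembles message chunks as they arrive.
    const result = await model.generateContentStream('Write a haiku about inference.');
    for await (const chunk of result.stream) {
      process.stdout.write(chunk.text());
    }
    ```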
  • 13
    Riffusion App

    Stable diffusion for real-time music generation (web app)

    ...The application is built with modern web technologies including Next.js, React, and three.js, providing a responsive and visually engaging interface for experimentation. It relies on a separate inference server to perform model computations, enabling flexible deployment depending on hardware capabilities. Users can input prompts or modify parameters to influence the style, tempo, and characteristics of generated audio, making it useful for creative exploration and prototyping.
    Downloads: 1 This Week
    See Project
  • 14
    Latent Box

    A collection of awesome lists for AI, creativity, and art

    ...The platform emphasizes usability by providing a clean user interface that allows users to load models, configure parameters, and interact with them without needing deep technical knowledge of underlying frameworks. It supports local inference workflows, which are increasingly important for privacy-conscious users and organizations seeking to reduce reliance on external APIs. latentbox also enables extensibility through plugins or integrations, allowing developers to customize model pipelines or connect additional tools.
    Downloads: 0 This Week
    See Project
  • 15
    lmstudio.js

    LM Studio TypeScript SDK

    lmstudio.js is the official TypeScript and JavaScript SDK that enables developers to programmatically interact with LM Studio’s local AI runtime. The library exposes the same capabilities used internally by the LM Studio desktop application, allowing external apps to load models, run inference, and build autonomous AI workflows. It is designed to simplify the creation of local AI tools by handling complex concerns such as dependency management, hardware compatibility, and model configuration. The SDK introduces an agent-style API that can execute multi-step tool-using workflows through a single call, enabling more advanced automation scenarios. ...
    Downloads: 0 This Week
    See Project
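    A minimal sketch, assuming the @lmstudio/sdk package with LM Studio running locally; the model identifier is illustrative, and method names may differ between SDK versions.

    ```typescript
    import { LMStudioClient } from '@lmstudio/sdk';

    // Connects to the LM Studio instance running on this machine.
    const client = new LMStudioClient();

    // Load (or attach to) a locally downloaded model.
    const model = await client.llm.model('qwen2.5-7b-instruct');

    const answer = await model.respond('List three uses of a local LLM.');
    console.log(answer.content);
    ```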
  • 16
    RestorePhotos.io

    Restoring old and blurry face photos with AI

    ...The project is production-oriented, not just a toy: it uses Bytescale for storage and image processing, Vercel for hosting and serverless functions, Auth.js + Neon for authentication and database, and Upstash Redis for rate limiting. This combination makes it a good blueprint for building real-world AI apps that must deal with authentication, quotas, and storage as well as inference.
    Downloads: 2 This Week
    See Project
  • 17
    FlowGram

    Extensible workflow development framework

    ...Instead of shipping as a ready-made product, it provides the building blocks — a canvas for wiring together nodes, a form engine for configuring node parameters, a variable-scope and type-inference engine, and a set of “materials” (pre-built node types such as code execution, conditional logic, LLM calls, etc.) that can be composed into larger workflows. This makes FlowGram highly flexible: you can prototype data-processing pipelines, AI-agent flows, automation scripts, or even business process automation without writing all the plumbing yourself. ...
    Downloads: 7 This Week
    See Project
  • 18
    Clippy

    Clippy, now with some AI

    Clippy is an open-source desktop assistant that allows users to run modern large language models locally while presenting them through a nostalgic interface inspired by Microsoft’s classic Clippy assistant from the 1990s. The project serves as both a playful homage to the early days of personal computing and a practical demonstration of local AI inference. Clippy integrates with the llama.cpp runtime to run models directly on a user’s computer without requiring cloud-based AI services. It supports models in the GGUF format, which allows it to run many publicly available open-source LLMs efficiently on consumer hardware. Users interact with the system through a simple animated assistant interface that can answer questions, generate text, and perform conversational tasks. ...
    Downloads: 44 This Week
    See Project
  • 19
    lms

    LM Studio CLI

    ...The tool allows developers to control model execution directly from the terminal, providing programmatic access to features that are otherwise available through graphical interfaces. Through the CLI, users can load and unload models, start or stop local inference servers, and inspect the inputs and outputs generated by language models. LMS is built using the LM Studio JavaScript SDK and integrates tightly with the LM Studio runtime environment. The interface is designed to simplify automation workflows and scripting tasks related to local AI deployment. By exposing model management capabilities through command-line commands, the tool enables developers to integrate local LLM operations into development pipelines and backend services. ...
    Downloads: 0 This Week
    See Project
  • 20
    type-challenges

    Collection of TypeScript type challenges with online judge

    ...Each challenge is a miniature kata where you implement types that transform other types—parsing strings, inferring tuples, mapping unions—without writing any runtime code. Problems are arranged from warm-ups to brain-twisters, letting developers build intuition about distributive conditional types, type inference in extends clauses, variance, and other corner cases of the type system. The repository includes tests for each puzzle so you get immediate, compiler-driven feedback when your solution is correct. As a result, it doubles as both training material and a living reference for advanced patterns used in real libraries. Many engineers report that solving a handful of these dramatically improves their ability to write safe, expressive APIs with minimal runtime overhead.
    Downloads: 0 This Week
    See Project
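    For flavor, here is a small solution in the style of the challenges, together with the compiler-checked test pattern; the Expect/Equal helpers mirror the repository's test utilities.

    ```typescript
    // Extract the first element of a tuple type using a conditional type
    // and `infer` in an extends clause.
    type First<T extends readonly unknown[]> =
      T extends readonly [infer Head, ...unknown[]] ? Head : never;

    type A = First<[3, 2, 1]>; // 3
    type B = First<[]>;        // never

    // Compiler-driven "tests": these lines fail to typecheck if a solution is wrong.
    type Expect<T extends true> = T;
    type Equal<X, Y> =
      (<T>() => T extends X ? 1 : 2) extends (<T>() => T extends Y ? 1 : 2)
        ? true
        : false;
    type _cases = [Expect<Equal<A, 3>>, Expect<Equal<B, never>>];
    ```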
  • 21
    node-llama-cpp

    Run AI models locally on your machine with Node.js bindings for llama.cpp

    node-llama-cpp is a JavaScript and Node.js binding that allows developers to run large language models locally using the high-performance inference engine provided by llama.cpp. The library enables applications built with Node.js to interact directly with local LLM models without requiring a remote API or external service. By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. ...
    Downloads: 19 This Week
    See Project
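    A minimal sketch of local chat inference, following the pattern in the project's documentation; the model path is illustrative, and the method names reflect the v3-style API, which may differ across versions.

    ```typescript
    import { getLlama, LlamaChatSession } from 'node-llama-cpp';

    const llama = await getLlama();  // picks the best available compute backend
    const model = await llama.loadModel({ modelPath: 'models/example.gguf' });
    const context = await model.createContext();
    const session = new LlamaChatSession({
      contextSequence: context.getSequence(),
    });

    console.log(await session.prompt('Summarize what a GGUF file is.'));
    ```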
  • 22
    Jaaz

    Open source multimodal creative AI assistant with an infinite canvas tool

    ...It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and cloud-based inference systems, enabling flexible creative workflows. Jaaz emphasizes privacy and local-first operation, allowing creators to run AI models locally so that their data does not leave their device. It also includes collaborative planning tools such as visual layouts and storyboard organization to support complex creative projects. By combining generative AI with a canvas-based interface, the project aims to provide a creative platform.
    Downloads: 14 This Week
    See Project
  • 23
    PasteGuard

    Masks sensitive data and secrets before they reach AI

    ...PasteGuard supports two primary modes: mask mode, which anonymizes data and still uses external APIs; and route mode, which forwards sensitive requests to a local LLM inference engine while sending the rest to the cloud. It can be self-hosted via Docker, works with a wide range of SDKs and tools, and includes a browser extension for automatic protection in everyday AI chats.
    Downloads: 9 This Week
    See Project
  • 24
    better-all

    Better Promise.all with automatic dependency optimization

    better-all is a TypeScript library that reinvents the familiar Promise.all construct by automatically analyzing and optimizing dependency graphs between asynchronous tasks, enabling maximal parallelization without manual orchestration. It addresses a common limitation where developers must manually refactor their promise chains to achieve efficient concurrency when some tasks depend on others, which can be error-prone and hard to maintain. With an object-based API, each task is declared as...
    Downloads: 8 This Week
    See Project
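    To illustrate the problem better-all automates, here is the manual Promise.all orchestration it is designed to replace; the fetch functions are hypothetical, and the library's actual object-based API is documented in its README.

    ```typescript
    // Hypothetical tasks forming a small dependency graph: `orders` and
    // `settings` both depend on `user`, and the result depends on both.
    declare function fetchUser(): Promise<{ id: string }>;
    declare function fetchOrders(user: { id: string }): Promise<string[]>;
    declare function fetchSettings(user: { id: string }): Promise<Record<string, string>>;

    // Hand-rolled orchestration: the developer must notice that only the
    // middle two tasks can run concurrently, and refactor if the graph changes.
    async function manualOrchestration() {
      const user = await fetchUser();                 // level 0: no dependencies
      const [orders, settings] = await Promise.all([  // level 1: parallel
        fetchOrders(user),
        fetchSettings(user),
      ]);
      return { user, orders, settings };              // level 2: join
    }
    ```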
  • 25
    llama.vscode

    VS Code extension for LLM-assisted code/text completion

    llama.vscode is a Visual Studio Code extension that provides AI-assisted coding features powered primarily by locally running language models. The extension is designed to be lightweight and efficient, enabling developers to use AI tools even on consumer-grade hardware. It integrates with the llama.cpp runtime to run language models locally, eliminating the need to rely entirely on external APIs or cloud providers. The extension supports common AI development features such as code...
    Downloads: 11 This Week
    See Project