Page 2 | inference free download

Showing 453 open source projects for "inference"

View related business solutions

Artificial Intelligence Python Clear Filters & Widen Search

Top Corporate LMS for Training | Best Learning Management Software
Deliver and Track Online Training and Stay Compliant - with Axis LMS!

Axis LMS enables you to deliver online and virtual learning and training through a scalable, easy-to-use LMS that is designed to enhance your training, automate your workflows, engage your learners and keep you compliant.

Learn More
Contract Management Software | Concord
AI-powered contract management that helps businesses track spending, negotiate smarter, and never miss deadlines.

Concord serves small and mid-sized businesses and Fortune 500 companies. This robust, web-based platform is used by human resource, sales, procurement, and legal teams, and virtually anyone who deals with contracts.

Learn More
1

Nano-vLLM

A lightweight vLLM implementation built from scratch

Nano-vLLM is a lightweight implementation of the vLLM inference engine designed to run large language models efficiently while maintaining a minimal and readable codebase. The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
2

Stable Diffusion Version 2

High-Resolution Image Synthesis with Latent Diffusion Models

...The repository provides code for training and running Stable Diffusion-style models, instructions for installing dependencies (with notes about performance libraries like xformers), and guidance on hardware/driver requirements for efficient GPU inference and training. It’s organized as a practical, developer-focused toolkit: model code, scripts for inference, and examples for using memory-efficient attention and related optimizations are included so researchers and engineers can run or adapt the model for their own projects. The project sits within a larger ecosystem of Stability AI repositories (including inference-only reference implementations like SD3.5 and web UI projects) and the README points users toward compatible components, recommended CUDA/PyTorch versions.

Downloads: 9 This Week

Last Update: 2025-10-02
See Project
3

Oumi

Everything you need to build state-of-the-art foundation models

Oumi is an open-source framework that provides everything needed to build state-of-the-art foundation models, end-to-end. It aims to simplify the development of large-scale machine-learning models.

Downloads: 5 This Week

Last Update: 2026-01-28
See Project
4

llama2.c

Inference Llama 2 in one file of pure C

llama2.c is a minimalist implementation of the Llama 2 language model architecture designed to run entirely in pure C. Created by Andrej Karpathy, this project offers an educational and lightweight framework for performing inference on small Llama 2 models without external dependencies. It provides a full training and inference pipeline: models can be trained in PyTorch and later executed using a concise 700-line C program (run.c). While it can technically load Meta’s official Llama 2 models, current support is limited to fp32 precision, meaning practical use is capped at models up to around 7B parameters. ...

Downloads: 3 This Week

Last Update: 1 day ago
See Project
Network Discovery Software | JDisc Discovery
JDisc Discovery supports the IT organizationss of medium-sized businesses and large-scale enterprises.

JDisc Discovery is a comprehensive network inventory and IT asset management solution designed to help organizations gain clear, up-to-date visibility into their IT environment. It automatically scans and maps devices across the network, including servers, workstations, virtual machines, and network hardware, to create a detailed inventory of all connected assets. This includes critical information such as hardware configurations, software installations, patch levels, and relationshipots between devices.

Learn More
5

Causal ML

Uplift modeling and causal inference with machine learning algorithms

Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research [1]. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. ...

Downloads: 2 This Week

Last Update: 2026-02-06
See Project
6

bitnet.cpp

Official inference framework for 1-bit LLMs

bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous compute infrastructure. ...

Downloads: 5 This Week

Last Update: 2026-03-10
See Project
7

huggingface_hub

The official Python client for the Huggingface Hub

The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.

Downloads: 10 This Week

Last Update: 3 days ago
See Project
8

ModelScope

Bring the notion of Model-as-a-Service to life

...Once integrated, model inference, fine-tuning, and evaluations can be done with only a few lines of code.

Downloads: 4 This Week

Last Update: 2026-04-11
See Project
9

AirLLM

AirLLM 70B inference with single 4GB GPU

AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of VRAM. AirLLM preprocesses model weights so that each transformer layer can be loaded independently during computation, reducing the memory footprint while still performing full inference.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
Reliable Phone Service for Your Home or Business
Businesses that want a modern business phone system using their current phones

Calling made modern. Your business number. Your employees' phones. Our amazing features. A dial menu spoken by our voice actors. Callers press numbers to make purchases, hear MP3s, connect to specific staff, and more. Make and answer calls using your number on multiple phones without the caller ever knowing. Employees hear secret in-house menus, transfer calls, and send voicemails to their email, all from their dialpad. These business features require no new software or hardware. Your dialpad come to life. Porting your business or personal number at the press of a button. Select from our menu of modern voice features for your business or personal line. We'll activate these features on your current phone for you. No work (or learning) required from you. We'll be here to transform your number whenever your desires change.

Learn More
10

BentoML

Unified Model Serving Framework

...Standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.

Downloads: 0 This Week

Last Update: 2026-04-02
See Project
11

GPflow

Gaussian processes in TensorFlow

GPflow is a package for building Gaussian process models in Python. It implements modern Gaussian process inference for composable kernels and likelihoods. GPflow builds on TensorFlow 2.4+ and TensorFlow Probability for running computations, which allows fast execution on GPUs.

Downloads: 0 This Week

Last Update: 2025-05-29
See Project
12

OpenVINO Notebooks

Jupyter notebook tutorials for OpenVINO

...The project is particularly useful for developers who want to learn how to optimize machine learning inference pipelines for production environments.

Downloads: 2 This Week

Last Update: 4 days ago
See Project
13

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

...Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. This project is licensed under the Apache-2.0 License. Ensure you have access to an AWS account i.e. setup your environment such that awscli can access your account via either an IAM user or an IAM role.

Downloads: 5 This Week

Last Update: 2 days ago
See Project
14

Wan2.1

Wan2.1: Open and Advanced Large-Scale Video Generative Model

...The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. Wan2.1’s architecture balances generation quality and inference cost, paving the way for later improvements seen in Wan2.2 such as Mixture-of-Experts and enhanced aesthetics. It was trained on large-scale video and image datasets, providing generalization across diverse scenes and motion patterns.

1 Review

Downloads: 96 This Week

Last Update: 2026-03-05
See Project
15

SageMaker Python SDK

Training and deploying machine learning models on Amazon SageMaker

SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker...

Downloads: 3 This Week

Last Update: 3 days ago
See Project
16

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU memory usage / improve efficiency. Parallel inference code to speed up sampling, utilities and tests included.

1 Review

Downloads: 5 This Week

Last Update: 2025-09-23
See Project
17

Faster Whisper

Faster Whisper transcription with CTranslate2

Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based on their needs. ...

Downloads: 14 This Week

Last Update: 2026-04-06
See Project
18

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).

Downloads: 0 This Week

Last Update: 2025-06-02
See Project
19

SAM 3

Code for running inference and finetuning with SAM 3 model

SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...

Downloads: 51 This Week

Last Update: 2026-04-12
See Project
20

Transformer Engine

A library for accelerating Transformer models on NVIDIA GPUs

...As the number of parameters in Transformer models continues to grow, training and inference for architectures such as BERT, GPT, and T5 become very memory and compute-intensive. Most deep learning frameworks train with FP32 by default. This is not essential, however, to achieve full accuracy for many deep learning models.

Downloads: 6 This Week

Last Update: 2026-03-31
See Project
21

OpenLLM

Operating LLMs in production

An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2， StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.

Downloads: 9 This Week

Last Update: 2025-04-21
See Project
22

tiny-llm

A course of learning LLM inference serving on Apple Silicon

tiny-llm is an educational open-source project designed to teach system engineers how large language model inference and serving systems work by building them from scratch. The project is structured as a guided course that walks developers through the process of implementing the core components required to run a modern language model, including attention mechanisms, token generation, and optimization techniques. Rather than relying on high-level machine learning frameworks, the codebase uses mostly low-level array and matrix manipulation APIs so that developers can understand exactly how model inference works internally. ...

Downloads: 2 This Week

Last Update: 7 days ago
See Project
23

HunyuanVideo-I2V

A Customizable Image-to-Video Model based on HunyuanVideo

...It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and support for parallel inference via xDiT. Resolution, video length, stability mode, flow shift, seed, CPU offload etc. Parallel inference support using xDiT for multi-GPU speedups. LoRA training / fine-tuning support to add special effects or customize generation.

Downloads: 1 This Week

Last Update: 2026-04-07
See Project
24

FLUX.1

Official inference repo for FLUX.1 models

FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic fidelity. ...

Downloads: 28 This Week

Last Update: 2026-01-19
See Project
25

MiniCPM4.1

Achieving 3+ generation speedup on reasoning tasks

...The model also supports both dense and sparse attention mechanisms, enabling more efficient computation depending on the selected inference framework. With improved pretraining on longer sequences and enhanced scaling techniques, MiniCPM4.1 delivers better performance in long-context tasks and complex problem solving.

Downloads: 5 This Week

Last Update: 6 days ago
See Project