Showing 139 open source projects for "inference"

  • 1
    Mistral Inference

    Official inference library for Mistral models

    Open and portable generative AI for devs and businesses. We release open-weight models for everyone to customize and deploy wherever they want. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available under both a free non-commercial license and a commercial license.
    Downloads: 3 This Week
    See Project
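    A rough sketch of loading a downloaded model and generating a chat completion with mistral-inference. The module paths follow the library's documented pattern but may differ across versions, and the weights/tokenizer locations are placeholders:

    ```python
    # Illustrative only: module paths follow mistral-inference's documented
    # pattern but may differ between releases; file paths are placeholders.
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate

    tokenizer = MistralTokenizer.from_file("mistral_model/tokenizer.model.v3")
    model = Transformer.from_folder("mistral_model")  # downloaded weights

    request = ChatCompletionRequest(
        messages=[UserMessage(content="Explain KV caching briefly.")]
    )
    tokens = tokenizer.encode_chat_completion(request).tokens

    out_tokens, _ = generate(
        [tokens], model, max_tokens=128, temperature=0.0,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
    )
    print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
    ```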
  • 2
    SageMaker Hugging Face Inference Toolkit

    Library for serving Transformers models on Amazon SageMaker

    SageMaker Hugging Face Inference Toolkit is an open-source library for serving Transformers models on Amazon SageMaker. This library provides default pre-processing, prediction, and post-processing for certain Transformers models and tasks. It utilizes the SageMaker Inference Toolkit for starting up the model server, which is responsible for handling inference requests.
    Downloads: 3 This Week
    See Project
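    In practice the toolkit is usually exercised through the SageMaker Python SDK's HuggingFaceModel, which points at the container that bundles it. A minimal sketch; the role ARN, versions, and model ID are placeholders:

    ```python
    from sagemaker.huggingface import HuggingFaceModel

    # Placeholders: substitute your own IAM role and preferred versions.
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

    huggingface_model = HuggingFaceModel(
        env={
            "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
            "HF_TASK": "text-classification",
        },
        role=role,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )

    predictor = huggingface_model.deploy(
        initial_instance_count=1, instance_type="ml.m5.xlarge"
    )
    print(predictor.predict({"inputs": "This toolkit makes serving painless."}))
    ```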
  • 3
    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...
    Downloads: 19 This Week
    See Project
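    A minimal sketch of the Python builder flow (TensorRT 8.x-era API; model.onnx is a placeholder): parse an ONNX model, enable FP16, and serialize an engine.

    ```python
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Parse a trained model exported to ONNX (placeholder path).
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # optimize for lower precision

    engine_bytes = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(engine_bytes)
    ```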
  • 4
    Gen.jl

    A general-purpose probabilistic programming system

    An open-source stack for generative modeling and probabilistic inference. Gen’s inference library gives users building blocks for writing efficient probabilistic inference algorithms that are tailored to their models, while automating the tricky math and the low-level implementation details. Gen helps users write hybrid algorithms that combine neural networks, variational inference, sequential Monte Carlo samplers, and Markov chain Monte Carlo.
    Downloads: 3 This Week
    See Project
  • 5
    ONNX Runtime

    ONNX Runtime: cross-platform, high performance ML inferencing

    ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. ...
    Downloads: 62 This Week
    See Project
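    A minimal sketch of running an exported model with ONNX Runtime's Python API; the model path and input shape are placeholders:

    ```python
    import numpy as np
    import onnxruntime as ort

    # Load a model; ONNX Runtime applies graph optimizations automatically.
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

    input_name = session.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input

    outputs = session.run(None, {input_name: x})  # None = return all outputs
    print(outputs[0].shape)
    ```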
  • 6
    AIMET

    AIMET is a library that provides advanced quantization and compression

    Qualcomm Innovation Center (QuIC) is at the forefront of enabling low-power inference at the edge through its pioneering model-efficiency research. QuIC has a mission to help migrate the ecosystem toward fixed-point inference. With this goal, QuIC presents the AI Model Efficiency Toolkit (AIMET) - a library that provides advanced quantization and compression techniques for trained neural network models. AIMET enables neural networks to run more efficiently on fixed-point AI hardware accelerators. ...
    Downloads: 33 This Week
    See Project
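    A rough sketch of the quantization-simulation workflow AIMET documents for PyTorch. The class and method names follow aimet_torch but should be checked against the current API; the model and calibration callback are placeholders:

    ```python
    import torch
    from torchvision import models
    from aimet_torch.quantsim import QuantizationSimModel

    model = models.resnet18(weights=None).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Wrap the model with fake-quantization ops to simulate fixed-point behavior.
    sim = QuantizationSimModel(model, dummy_input=dummy_input)

    def calibrate(sim_model, _):
        # Placeholder calibration pass; use representative data in practice.
        with torch.no_grad():
            sim_model(dummy_input)

    # Collect activation/parameter ranges (encodings) from the calibration pass.
    sim.compute_encodings(forward_pass_callback=calibrate,
                          forward_pass_callback_args=None)
    ```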
  • 7
    KServe

    Standardized Serverless ML Inference Platform on Kubernetes

    KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU autoscaling, scale-to-zero, and...
    Downloads: 5 This Week
    See Project
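    The custom resource itself is declarative. A minimal InferenceService manifest, written here as a Python dict for illustration (the name and storageUri are placeholders), would be applied as YAML with kubectl or via the KServe client:

    ```python
    # Minimal InferenceService, expressed as a Python dict for illustration.
    # Apply the equivalent YAML with kubectl; name and storageUri are placeholders.
    inference_service = {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": "sklearn-iris"},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    "storageUri": "gs://my-bucket/models/sklearn/iris",
                }
            }
        },
    }
    ```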
  • 8
    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It puts artificial intelligence right at your fingertips with no third-party dependencies, and runs faster than all other known open-source frameworks on mobile-phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and create intelligent apps.
    Downloads: 87 This Week
    See Project
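    ncnn also ships Python bindings that mirror the C++ API. A rough sketch: the param/bin paths and the blob names "data"/"output" depend on your converted model, and constructing ncnn.Mat from a NumPy array is assumed from the binding's documented usage:

    ```python
    import numpy as np
    import ncnn

    net = ncnn.Net()
    net.load_param("model.param")  # placeholder: converted model files
    net.load_model("model.bin")

    ex = net.create_extractor()
    # Blob names depend on the converted model; "data"/"output" are placeholders.
    ex.input("data", ncnn.Mat(np.random.rand(3, 224, 224).astype(np.float32)))
    ret, out = ex.extract("output")
    print(np.array(out).shape)
    ```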
  • 9
    MegEngine

    Easy-to-use deep learning framework with 3 key features

    ...On Windows 10 you can either install the Linux distribution through Windows Subsystem for Linux (WSL) or install the Windows distribution directly. Many other platforms are supported for inference.
    Downloads: 2 This Week
    See Project
  • 10
    bitnet.cpp

    Official inference framework for 1-bit LLMs

    bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous compute infrastructure. ...
    Downloads: 5 This Week
    See Project
  • 11
    BentoML

    Unified Model Serving Framework

    ...Standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intensive model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate a distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate Docker images for production deployment.
    Downloads: 0 This Week
    See Project
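    A minimal sketch in BentoML's 1.x service style; the saved model tag "iris_clf:latest" is a placeholder and assumes a scikit-learn model was previously saved with bentoml.sklearn.save_model:

    ```python
    import numpy as np
    import bentoml
    from bentoml.io import NumpyNdarray

    # Placeholder tag: assumes a model saved earlier via bentoml.sklearn.save_model.
    runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

    svc = bentoml.Service("iris_classifier", runners=[runner])

    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
    async def classify(input_array: np.ndarray) -> np.ndarray:
        # Adaptive batching groups concurrent requests before they hit the runner.
        return await runner.predict.async_run(input_array)
    ```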
  • 12
    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    ...Deep Learning Containers provide optimized environments with TensorFlow and MXNet, NVIDIA CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries, and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for SageMaker jobs such as training, inference, and transforms. They have also been tested for machine learning workloads on Amazon EC2, Amazon ECS, and Amazon EKS. This project is licensed under the Apache-2.0 License. Ensure you have access to an AWS account, i.e., set up your environment so that the awscli can access your account via either an IAM user or an IAM role.
    Downloads: 5 This Week
    See Project
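    The SageMaker Python SDK can look up the matching container image in ECR for a given framework/version combination. A small sketch; the region and version values are illustrative:

    ```python
    from sagemaker import image_uris

    # Resolve the ECR URI of a Deep Learning Container; values are illustrative.
    uri = image_uris.retrieve(
        framework="tensorflow",
        region="us-east-1",
        version="2.12",
        image_scope="inference",
        instance_type="ml.c5.xlarge",
    )
    print(uri)
    ```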
  • 13
    OpenShell

    OpenShell is the safe, private runtime for autonomous AI agents.

    ...Each agent runs inside a containerized sandbox governed by declarative YAML security policies that control network access, file permissions, and process behavior. The platform includes a gateway service that manages sandbox lifecycles and routes AI inference requests through controlled providers. OpenShell also features a privacy-aware routing system that prevents sensitive information from leaving the sandbox environment. By combining container isolation, policy enforcement, and agent orchestration, OpenShell offers a secure infrastructure for developing and operating AI agents.
    Downloads: 28 This Week
    See Project
  • 14
    AWS Neuron

    Powering Amazon custom machine learning chips

    AWS Neuron is a software development kit (SDK) for running machine learning inference using AWS Inferentia chips. It consists of a compiler, runtime, and profiling tools that enable developers to run high-performance, low-latency inference on AWS Inferentia-based Amazon EC2 Inf1 instances. Using Neuron, developers can easily train their machine learning models on any popular framework such as TensorFlow, PyTorch, or MXNet, and run them optimally on Amazon EC2 Inf1 instances. ...
    Downloads: 0 This Week
    See Project
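    On the PyTorch side, the Inf1 workflow documented for torch-neuron compiles a traced model for Inferentia. A sketch under those assumptions; the model choice is a placeholder:

    ```python
    import torch
    import torch_neuron  # registers the Neuron backend (torch-neuron package)
    from torchvision import models

    model = models.resnet50(pretrained=True).eval()
    example = torch.rand(1, 3, 224, 224)

    # Compile for Inferentia; unsupported ops fall back to CPU automatically.
    model_neuron = torch.neuron.trace(model, example_inputs=[example])
    model_neuron.save("resnet50_neuron.pt")
    ```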
  • 15
    Scala 3

    The Scala 3 compiler, also known as Dotty

    Scala 3 is the latest major release of the Scala language—featuring a complete compiler rewrite (Dotty), new syntax with optional braces and given/using contextual abstractions, union/intersection types, opaque types, first-class enums, and better type inference. It unifies object-oriented and functional programming paradigms into a safer, more expressive language running on the JVM with full Java interoperability.
    Downloads: 19 This Week
    See Project
  • 16
    MNN

    MNN is a blazing fast, lightweight deep learning framework

    MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models, and has industry-leading performance for on-device inference and training. At present, MNN has been integrated into more than 20 apps of Alibaba Inc., such as Taobao, Tmall, Youku, DingTalk, and Xianyu, covering more than 70 usage scenarios such as live broadcast, short-video capture, search recommendation, product search by image, interactive marketing, equity distribution, and security risk control. ...
    Downloads: 13 This Week
    See Project
  • 17
    NNCF

    Neural Network Compression Framework for enhanced OpenVINO

    NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.
    Downloads: 0 This Week
    See Project
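    A minimal sketch of NNCF's post-training quantization entry point; random tensors stand in for a real calibration set:

    ```python
    import torch
    import nncf
    from torchvision import models

    model = models.resnet18(weights=None).eval()

    # A handful of random tensors stands in for representative calibration data.
    calibration_items = [torch.randn(1, 3, 224, 224) for _ in range(10)]
    calibration_dataset = nncf.Dataset(calibration_items)

    # Post-training quantization: inserts quantizers tuned on the calibration set.
    quantized_model = nncf.quantize(model, calibration_dataset)
    ```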
  • 18
    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...
    Downloads: 4 This Week
    See Project
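    A minimal sketch of the MusicGen Python API; the checkpoint name and prompt are just examples:

    ```python
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=8)  # seconds of audio to generate

    wav = model.generate(["lo-fi beat with warm piano"])  # one clip per prompt
    # Writes clip.wav with loudness normalization at the model's sample rate.
    audio_write("clip", wav[0].cpu(), model.sample_rate, strategy="loudness")
    ```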
  • 19
    SageMaker TensorFlow Training Toolkit

    Toolkit for running TensorFlow training scripts on SageMaker

    ...To use your TensorFlow Serving model on SageMaker, you first need to create a SageMaker Model. After creating a SageMaker Model, you can use it to create SageMaker Batch Transform Jobs for offline inference, or create SageMaker Endpoints for real-time inference. A SageMaker Model contains references to a model.tar.gz file in S3 containing serialized model data, and a Docker image used to serve predictions with that model. A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.
    Downloads: 0 This Week
    See Project
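    A sketch of both serving paths described above via the SageMaker Python SDK; the bucket paths, role ARN, and framework version are placeholders:

    ```python
    from sagemaker.tensorflow import TensorFlowModel

    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

    model = TensorFlowModel(
        model_data="s3://my-bucket/model/model.tar.gz",  # serialized model data
        role=role,
        framework_version="2.12",
    )

    # Offline inference: a Batch Transform job over input objects in S3.
    transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
    transformer.transform("s3://my-bucket/input", content_type="application/json")

    # Real-time inference: a SageMaker Endpoint serving HTTP requests.
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    ```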
  • 20
    XNNPACK

    High-efficiency floating-point neural network inference operators

    XNNPACK is a highly optimized, low-level neural network inference library developed by Google for accelerating deep learning workloads across a variety of hardware architectures, including ARM, x86, WebAssembly, and RISC-V. Rather than serving as a standalone ML framework, XNNPACK provides high-performance computational primitives—such as convolutions, pooling, activation functions, and arithmetic operations—that are integrated into higher-level frameworks like TensorFlow Lite, PyTorch Mobile, ONNX Runtime, TensorFlow.js, and MediaPipe. ...
    Downloads: 0 This Week
    See Project
  • 21
    Gitleaks

    Protect and discover secrets using Gitleaks

    Gitleaks is a fast, lightweight, portable, and open-source secret scanner for git repositories, files, and directories. With over 6.8 million Docker downloads, 11.2k GitHub stars, 1.7 million GitHub downloads, thousands of weekly clones, and over 400k Homebrew installs, Gitleaks is the most trusted secret scanner among security professionals, enterprises, and developers. Gitleaks-Action is our official GitHub Action. You can use it to automatically run a gitleaks scan on all your team's pull...
    Downloads: 29 This Week
    See Project
  • 22
    DALI

    A GPU-accelerated library containing highly optimized building blocks

    ...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.
    Downloads: 1 This Week
    See Project
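    A minimal sketch of a DALI pipeline that moves decoding and resizing onto the GPU; the file_root directory is a placeholder:

    ```python
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def image_pipeline():
        jpegs, labels = fn.readers.file(file_root="images/")  # placeholder dir
        images = fn.decoders.image(jpegs, device="mixed")     # decode on the GPU
        images = fn.resize(images, resize_x=224, resize_y=224)
        return images, labels

    pipe = image_pipeline()
    pipe.build()
    images, labels = pipe.run()  # one preprocessed batch, resident on the GPU
    ```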
  • 23
    PyMC3

    Probabilistic programming in Python

    ...A Gaussian process (GP) can be used as a prior probability distribution whose support is over the space of continuous functions. PyMC3 provides rich support for defining and using GPs. Variational inference saves computational cost by turning a problem of integration into one of optimization. PyMC3's variational API supports a number of cutting edge algorithms, as well as minibatch for scaling to large datasets.
    Downloads: 1 This Week
    See Project
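    A minimal sketch of the variational API described above, fitting a toy model with ADVI:

    ```python
    import numpy as np
    import pymc3 as pm

    data = np.random.normal(0.5, 1.0, size=200)  # toy observations

    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=1.0)             # prior
        pm.Normal("obs", mu=mu, sigma=1.0, observed=data)   # likelihood
        # Variational inference: optimization instead of integration.
        approx = pm.fit(n=10_000, method="advi")

    trace = approx.sample(1000)  # draws from the fitted approximation
    print(trace["mu"].mean())
    ```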
  • 24
    Cthulhu.jl

    The slow descent into madness

    Cthulhu.jl is a powerful introspection tool for exploring the Julia compiler’s method dispatch and type inference system. It allows users to interactively descend into the type-inferred, lowered, and LLVM IR representations of Julia functions from the REPL. This makes it ideal for developers who want to optimize performance, debug type instability, or understand how Julia compiles code. Named after the Lovecraftian idea of descending into madness, Cthulhu reveals the "underworld" of Julia compilation.
    Downloads: 1 This Week
    See Project
  • 25
    pytype

    A static type analyzer for Python code

    ...Documentation and a user guide detail configuration, supported features, and how inference interacts with modern Python idioms.
    Downloads: 0 This Week
    See Project
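    A tiny example of the kind of error pytype reports from inference alone, with no annotations in the source; running `pytype example.py` on this file should flag the bad call:

    ```python
    # example.py -- no annotations; pytype infers types from usage.

    def greet(name):
        return "Hello, " + name  # inferred: name must be concatenable with str

    greet(42)  # flagged by pytype: int is not a valid operand for str + ...
    ```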