cpu disk memory for java free download

FlexLLMGen

Running large language models on a single GPU

...The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. ...

Downloads: 0 This Week

Last Update: 2026-03-10


whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisper.cpp is a lightweight C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model, designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to a custom ggml format and runs inference on all .wav samples in the samples folder....

Downloads: 365 This Week

Last Update: 2026-03-19


LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

LMCache is an extension layer for LLM serving engines that accelerates inference, especially with long contexts, by storing and reusing key-value (KV) attention caches across requests. Instead of rebuilding KV states for repeated or shared text segments, LMCache persists them in multiple tiers (GPU memory, CPU DRAM, and local disk), retrieves them on demand, and injects them into subsequent requests to reduce time to first token (TTFT) and increase throughput. Its design supports reuse beyond strict prefix matching and enables sharing across serving instances, improving efficiency under real multi-tenant traffic. The broader project includes examples, tests, a server component, and public posts describing cross-engine sharing and inter-GPU KV transfers. ...
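The tiered lookup described above can be illustrated with a toy sketch. This is not LMCache's API: the class `TieredKVCache`, its methods, and the entry-count capacities are hypothetical, and plain in-memory dictionaries stand in for GPU memory, CPU DRAM, and local disk. It only models exact matches on a whole token chunk, not the reuse beyond prefix matching that the project describes.

```python
from collections import OrderedDict
import hashlib


class TieredKVCache:
    """Toy three-tier cache; dicts stand in for GPU memory, CPU DRAM, and disk."""

    def __init__(self, capacities=(2, 4, 8)):
        # One LRU-ordered dict per tier, fastest first.
        # Capacities are entry counts (an assumption made for illustration).
        self.names = ("gpu", "cpu", "disk")
        self.tiers = [OrderedDict() for _ in self.names]
        self.capacities = capacities

    @staticmethod
    def key_for(tokens):
        # Hash the token chunk so identical text segments map to the same entry.
        return hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: the entry is dropped
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > self.capacities[level]:
            # Evict the least recently used entry and demote it one tier down.
            old_key, old_value = tier.popitem(last=False)
            self._insert(level + 1, old_key, old_value)

    def put(self, tokens, kv_state):
        key = self.key_for(tokens)
        for tier in self.tiers:
            tier.pop(key, None)  # keep at most one copy across tiers
        self._insert(0, key, kv_state)

    def get(self, tokens):
        """Return (kv_state, tier_name), or (None, None) on a miss."""
        key = self.key_for(tokens)
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)  # promote the hit to the fastest tier
                return value, self.names[level]
        return None, None
```

A caller would check `get` before running prefill for a segment and call `put` with the freshly computed KV state on a miss. The promote-on-hit, demote-on-evict policy is one simple choice among many.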

Downloads: 12 This Week

Last Update: 2026-04-07


MCP Monitor

A system monitoring tool that exposes system metrics

The MCP System Monitor is a tool that exposes system metrics via the Model Context Protocol (MCP), allowing Large Language Models (LLMs) to retrieve real-time system information through an MCP-compatible interface.

Downloads: 2 This Week

Last Update: 2025-08-02


NNVM

Open deep learning compiler stack for CPU, GPU

The vision of the Apache NNVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging machine learning models for any hardware platform. It provides compilation of deep learning models into minimum deployable modules, and infrastructure to automatically generate and optimize models on more backends with better performance....

Downloads: 0 This Week

Last Update: 2022-08-12


Search Results for "cpu disk memory for java"

Showing 5 open source projects for "cpu disk memory for java"


