
Intel LLM Library for PyTorch: v0.1.0-mas (Multi-Arc Serving)
| Name | Modified | Size |
| --- | --- | --- |
| Multi-Arc Serving release 0.1.0 source code.tar.gz | 2025-04-07 | 4.2 MB |
| Multi-Arc Serving release 0.1.0 source code.zip | 2025-04-07 | 6.0 MB |
| README.md | 2025-04-07 | 1.9 kB |

Totals: 3 items, 10.2 MB

Overview

This release introduces the latest update to the Multi-Arc vLLM serving solution, optimized for Intel Xeon + Arc platforms running vLLM with ipex-llm. The new version delivers low-latency, high-throughput LLM serving with improved model compatibility and resource efficiency. Major component upgrades include:

  • vLLM upgraded to 0.6.6
  • PyTorch upgraded to 2.6
  • oneAPI upgraded to 2025.0
  • oneCCL patch updated to 0.0.6.6
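
For context, a typical launch of the OpenAI-compatible server on this stack looks like the sketch below. The entrypoint module and flags follow ipex-llm's vLLM serving documentation; the model path, low-bit format, and sizing values are illustrative assumptions, not release-specific settings.

```bash
# Hedged sketch: serve a model across two Arc GPUs with ipex-llm's vLLM fork.
# Model path and tuning flags below are placeholder assumptions.
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --model /llm/models/Qwen2.5-7B-Instruct \
  --served-model-name Qwen2.5-7B-Instruct \
  --device xpu \
  --dtype float16 \
  --load-in-low-bit fp8 \
  --max-model-len 4096 \
  --tensor-parallel-size 2 \
  --port 8000
```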

New Features

  • Optimized vLLM serving for Intel Xeon + Arc multi-GPU platforms, enabling lower latency and higher throughput.
  • Supported a broader range of LLM models.
  • Enhanced support for loading models with a minimal memory footprint.
  • Refined the Docker image for easier use and deployment.
  • Improved WebUI model connectivity and stability.
  • Added the `VLLM_LOG_OUTPUT=1` option to enable detailed input/output logging for vLLM (see the snippet after this list).
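
As referenced in the last item, the new logging switch is an environment variable set before the server starts; a minimal sketch:

```bash
# Enable detailed input/output logging for vLLM requests (new in this release).
# Assumption: when unset, logging stays at its default verbosity.
export VLLM_LOG_OUTPUT=1
```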

Bug Fixes

  • Resolved multimodal issues, including `get_image` failures and inference errors with models such as MiniCPM-V-2_6, Qwen2-VL, and GLM-4v-9B.
  • Fixed a Qwen2-VL multi-request crash by removing `Qwen2VisionAttention`'s `attention_mask` and addressing `mrope_positions` instability.
  • Updated `profile_run` usage to avoid out-of-memory (OOM) crashes.
  • Resolved GQA kernel issues that caused errors with multiple concurrent outputs.
  • Fixed a crash related to `--enable-prefix-caching` in specific cases.
  • Addressed a low-bit overflow that produced garbled `!!!!!!` output with DeepSeek-R1-Distill-Qwen-14B.
  • Resolved GPTQ- and AWQ-related errors to improve compatibility across more models.
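
Several of the fixes above (GQA kernels, the Qwen2-VL multi-request crash, prefix caching) only surface under concurrent load. A quick way to exercise those paths, assuming the server from the earlier sketch is listening on port 8000:

```bash
# Fire two concurrent completion requests at the standard OpenAI-compatible
# endpoint to exercise batched/concurrent decoding paths; the endpoint and
# model name match the placeholder launch sketch above.
for i in 1 2; do
  curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen2.5-7B-Instruct", "prompt": "Hello", "max_tokens": 32}' &
done
wait
```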

Docker Images
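
A minimal sketch of running a serving container on multi-Arc hardware follows; the image name and tag are assumptions to verify against the project's published images, while `/dev/dri` passthrough and a large shared-memory segment are the usual requirements for Intel multi-GPU containers.

```bash
# Hedged sketch: run the serving container with Arc GPUs passed through.
# Image name/tag are placeholder assumptions; confirm against the README.
docker run -itd \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -v /path/to/models:/llm/models \
  --name=multi-arc-serving \
  intelanalytics/ipex-llm-serving-xpu:latest
```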

Source: README.md, updated 2025-04-07