LMDeploy - Browse /v0.12.0 at SourceForge.net

Name	Modified	Size	Downloads / Week
lmdeploy-0.12.0+cu128-cp310-cp310-manylinux2014_x86_64.whl	2026-02-04	98.8 MB	0
lmdeploy-0.12.0+cu128-cp310-cp310-win_amd64.whl	2026-02-04	36.1 MB	0
lmdeploy-0.12.0+cu128-cp311-cp311-manylinux2014_x86_64.whl	2026-02-04	98.8 MB	0
lmdeploy-0.12.0+cu128-cp311-cp311-win_amd64.whl	2026-02-04	36.1 MB	0
lmdeploy-0.12.0+cu128-cp312-cp312-manylinux2014_x86_64.whl	2026-02-04	98.8 MB	0
lmdeploy-0.12.0+cu128-cp312-cp312-win_amd64.whl	2026-02-04	36.1 MB	0
lmdeploy-0.12.0+cu128-cp313-cp313-manylinux2014_x86_64.whl	2026-02-04	98.8 MB	0
lmdeploy-0.12.0+cu128-cp313-cp313-win_amd64.whl	2026-02-04	36.1 MB	0
README.md	2026-02-04	5.9 kB	0
v0.12.0 source code.tar.gz	2026-02-04	1.4 MB	0
v0.12.0 source code.zip	2026-02-04	2.2 MB	0
Totals: 11 Items		543.2 MB	0
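
The wheel names above follow the standard wheel naming pattern (package, version with a `+cu128` local tag, Python tag, ABI tag, platform tag). As a rough sketch of how to pick the matching file for a given interpreter, the helper below rebuilds the filename from the Python version and operating system. The function name `wheel_name` and the lookup tables are hypothetical and only mirror the eight wheels listed in this release; the sketch assumes an x86-64 machine and does not check the CPU architecture.

```python
# Hypothetical helper: maps an interpreter to one of the wheels listed above.
# The constants are copied from the file listing; nothing here is part of LMDeploy.
VERSION = "0.12.0"
CUDA_TAG = "cu128"
SUPPORTED_PYTHONS = {"cp310", "cp311", "cp312", "cp313"}
PLATFORM_TAGS = {
    "Linux": "manylinux2014_x86_64",  # assumes an x86-64 Linux host
    "Windows": "win_amd64",           # assumes 64-bit Windows
}


def wheel_name(py_major: int, py_minor: int, system: str) -> str:
    """Return the listed wheel filename for a Python version and OS name.

    `system` is expected in the form returned by platform.system(),
    for example "Linux" or "Windows".
    """
    py_tag = f"cp{py_major}{py_minor}"
    if py_tag not in SUPPORTED_PYTHONS:
        raise ValueError(f"no wheel listed for Python tag {py_tag}")
    if system not in PLATFORM_TAGS:
        raise ValueError(f"no wheel listed for platform {system}")
    # For these wheels the Python tag and ABI tag are identical.
    return (
        f"lmdeploy-{VERSION}+{CUDA_TAG}-{py_tag}-{py_tag}-"
        f"{PLATFORM_TAGS[system]}.whl"
    )
```

Calling `wheel_name(sys.version_info.major, sys.version_info.minor, platform.system())` with the standard `sys` and `platform` modules would select the file for the running interpreter. A Python 3.9 interpreter raises `ValueError`, which is consistent with this release dropping Python 3.9 support.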

What's Changed

🚀 Features

Add Gloo communication to turbomind by @irexyc in https://github.com/InternLM/lmdeploy/pull/3362
[Feat] Support llm-compressor AWQ models in TurboMind by @43758726 in https://github.com/InternLM/lmdeploy/pull/4290
Router replay for gpt oss by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4298
Support llm-compressor symmetric quantized model inference in TurboMind by @43758726 in https://github.com/InternLM/lmdeploy/pull/4305
Support Intern-S1-Pro by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4318

💥 Improvements

Configurable max CTAs and NVLS usage for CUDA IPC communicator by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4227
Improve aborting all sessions by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4215
Moe Reduce kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/4228
Refactor attn by @grimoire in https://github.com/InternLM/lmdeploy/pull/4238
Optimize exception raising and error process by @grimoire in https://github.com/InternLM/lmdeploy/pull/4236
[AsyncEngine Refactor 1/N] define MultimodalProcessor to handle multimodal data processing by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4250
[AsyncEngine Refactor 2/N] Remove deprecates from chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4252
Configurable uvicorn timeout by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4255
Adapt to dlslime v0.0.2 by @JimyMa in https://github.com/InternLM/lmdeploy/pull/4242
[Fix] fix quant calibration dataset by @43758726 in https://github.com/InternLM/lmdeploy/pull/4256
lmdeploy support parallel embedding by @Tsundoku958 in https://github.com/InternLM/lmdeploy/pull/4192
Refactor turbomind engine by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4223
Refactor Engine & ModelAgent interact by @grimoire in https://github.com/InternLM/lmdeploy/pull/4265
Support sleep and destroy deepep buffer by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4246
add yarn truncate by @grimoire in https://github.com/InternLM/lmdeploy/pull/4301
[AsyncEngine Refactor 3/N] Introduce Session and SessionManager by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4253
Add warning about NCCL 2.27 memory leaks by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4313

🐞 Bug fixes

Fix fope cos/sin coef device type by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4240
Fix include_stop_str_in_output with output_logits Exception by @windreamer in https://github.com/InternLM/lmdeploy/pull/4244
fix logit softcapping is None by @grimoire in https://github.com/InternLM/lmdeploy/pull/4247
Fix performance regression for prefix caching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4270
convert float16 weight to bfloat16 for FP8 models by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4276
[ascend] fix dp multinode rank_table mapping by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/4268
[Fix] move calibrate load dataset location by @43758726 in https://github.com/InternLM/lmdeploy/pull/4289
fix ignore-eos by @grimoire in https://github.com/InternLM/lmdeploy/pull/4282
fix MPEngine poll by @grimoire in https://github.com/InternLM/lmdeploy/pull/4287
Fix prefix caching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4292
Fix gemma chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4280
Fix scheduler metrics by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4294
Fix NVLS init for mixed DP+TP by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4296
[side-effect] The tool message dump is incomplete by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4299
Fix mla with spec tokens by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4302
fix stop long context by @grimoire in https://github.com/InternLM/lmdeploy/pull/4309
fix crash on client disconnect (Ctrl+C) by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4308
Ensure the pipe benchmark uses kwargs when calling pipe.stream_infer by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4312
fix get_ppl for long context by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4314
fix sleep engine for dp=1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4315

🌐 Other

[ci] fix fail testcase and add generate testcase in pr test by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4231
Pin nvshmem version by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4257
fix: Pin timm version to avoid failed tests by @windreamer in https://github.com/InternLM/lmdeploy/pull/4258
docs: add generated openapi spec documentation by @windreamer in https://github.com/InternLM/lmdeploy/pull/4251
fix: get rid of buggy timm-1.0.23 by @windreamer in https://github.com/InternLM/lmdeploy/pull/4260
[ascend] fix paged prefill by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/4254
Fix ascend/maca/camb runtime_requirements by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/4262
docs: refine the documents by @windreamer in https://github.com/InternLM/lmdeploy/pull/4259
docs: add cli docs by @windreamer in https://github.com/InternLM/lmdeploy/pull/4264
Drop support for Python 3.9 as it has reached end-of-life by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4281
bump version to v0.12.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4300

New Contributors

@43758726 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4256

Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.11.1...v0.12.0

Source: README.md, updated 2026-02-04

LMDeploy Files

LMDeploy is a toolkit for compressing, deploying, and serving LLMs
