LMDeploy - Browse /v0.12.2 at SourceForge.net

Name	Modified	Size	Downloads / Week
lmdeploy-0.12.2+cu128-cp310-cp310-manylinux2014_x86_64.whl	2026-03-18	115.9 MB	0
lmdeploy-0.12.2+cu128-cp310-cp310-win_amd64.whl	2026-03-18	45.8 MB	0
lmdeploy-0.12.2+cu128-cp311-cp311-manylinux2014_x86_64.whl	2026-03-18	115.9 MB	0
lmdeploy-0.12.2+cu128-cp311-cp311-win_amd64.whl	2026-03-18	45.8 MB	0
lmdeploy-0.12.2+cu128-cp312-cp312-manylinux2014_x86_64.whl	2026-03-18	115.9 MB	0
lmdeploy-0.12.2+cu128-cp312-cp312-win_amd64.whl	2026-03-18	45.8 MB	0
lmdeploy-0.12.2+cu128-cp313-cp313-manylinux2014_x86_64.whl	2026-03-18	115.9 MB	0
lmdeploy-0.12.2+cu128-cp313-cp313-win_amd64.whl	2026-03-18	45.8 MB	0
README.md	2026-03-18	4.1 kB	0
v0.12.2 source code.tar.gz	2026-03-18	1.5 MB	0
v0.12.2 source code.zip	2026-03-18	2.3 MB	0
Totals: 11 Items		650.9 MB	0
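
The wheel names above follow the standard wheel file name layout (distribution, version, Python tag, ABI tag, platform tag). As a minimal sketch, the snippet below splits those names and picks the one matching the running interpreter. The helper names (parse_wheel, pick_wheel, current_tags) are hypothetical, the reading of the "cu128" local version label as a CUDA 12.8 build is an assumption, and the platform check is a deliberate simplification of what pip does when resolving tags.

```python
import sys

# File names copied verbatim from the listing above.
WHEELS = [
    "lmdeploy-0.12.2+cu128-cp310-cp310-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.2+cu128-cp310-cp310-win_amd64.whl",
    "lmdeploy-0.12.2+cu128-cp311-cp311-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.2+cu128-cp311-cp311-win_amd64.whl",
    "lmdeploy-0.12.2+cu128-cp312-cp312-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.2+cu128-cp312-cp312-win_amd64.whl",
    "lmdeploy-0.12.2+cu128-cp313-cp313-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.2+cu128-cp313-cp313-win_amd64.whl",
]


def parse_wheel(filename):
    """Split a wheel file name into its five dash-separated fields."""
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    # The part after "+" is the local version label (here "cu128",
    # assumed to mark the CUDA build the wheel was compiled against).
    public, _, local = version.partition("+")
    return {
        "name": name,
        "version": public,
        "local": local,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def pick_wheel(py_tag, platform_prefix, wheels=WHEELS):
    """Return the first wheel matching the Python tag and platform prefix."""
    for wheel in wheels:
        info = parse_wheel(wheel)
        if info["python"] == py_tag and info["platform"].startswith(platform_prefix):
            return wheel
    return None


def current_tags():
    """Rough tags for the running interpreter (simplified: ignores CPU
    architecture and glibc version, which pip also checks)."""
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_prefix = "win" if sys.platform.startswith("win") else "manylinux"
    return py_tag, platform_prefix


if __name__ == "__main__":
    match = pick_wheel(*current_tags())
    print(match or "no matching wheel in this release")
```

In practice pip performs this matching itself when given a wheel file or index; the sketch only illustrates how to read the file names in the table.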

What's Changed

support glm5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4355
Qwen/Internlm/Llama Dense/Moe model fp8 quant online by @43758726 in https://github.com/InternLM/lmdeploy/pull/4324
Qwen3.5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4351
GLM-4.7-Flash Turbomind support by @lapy in https://github.com/InternLM/lmdeploy/pull/4362
Support router replay and ignore quant layer for qwen3.5 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4394
[Feature] Add TurboMind support for Qwen3.5 models (dense + MoE) by @lapy in https://github.com/InternLM/lmdeploy/pull/4389
support repetition ngram logits processor by @grimoire in https://github.com/InternLM/lmdeploy/pull/4288

Compatible with transformers 5.0 at TurboMind side by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4304
Support fp32 head for qwen and internlm models by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4160
Reduce MLA kv-cache memory by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4373
add recurrent_gated_delta_rule kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/4376
[ascend]adapt for s1-pro dp*tp+ep by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4380
Support glm4.7 with mtp by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4346
Faster MLA kernels by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4391
Attention kernel self-registration and decoupled dispatching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4396

fix: change debug log from ERROR to DEBUG in RepetitionPenaltyKernel by @murray-macdonald in https://github.com/InternLM/lmdeploy/pull/4363
Fix quant config parsing for internvl awq model by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4369
Fix XGrammar bitmask initialization and add null check for gen_config in generate method by @windreamer in https://github.com/InternLM/lmdeploy/pull/4349
fix the logic of closing session by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4370
Fix authorization by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4338
Fix some minor issues and provide tests for Pipeline by @windreamer in https://github.com/InternLM/lmdeploy/pull/4365
fix dllm mask on set_step by @grimoire in https://github.com/InternLM/lmdeploy/pull/4278
fix models for transformers>=5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4381
fix exception when aborting a request by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4403
fix inference crashed on v100 with qwen3.5-0.8b by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4420

ci(lint): skip flaky deadlink test for python wiki page by @windreamer in https://github.com/InternLM/lmdeploy/pull/4357
fix fa3 install by @irexyc in https://github.com/InternLM/lmdeploy/pull/4361
fix lint by @windreamer in https://github.com/InternLM/lmdeploy/pull/4375
upgrade triton and torch by @grimoire in https://github.com/InternLM/lmdeploy/pull/4379
Add speculative decoding test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4377
ci: integrate clang-format lint into pre-commit hooks by @windreamer in https://github.com/InternLM/lmdeploy/pull/4390
Update dockerfile by removing cu11 and changing cu12.4 to cu12.6 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4398
manually build dev image instead of publishing it every version by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4409
bump version to v0.12.2 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4378

Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.12.1...v0.12.2

Source: README.md, updated 2026-03-18