| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| lmdeploy-0.12.2+cu128-cp310-cp310-manylinux2014_x86_64.whl | 2026-03-18 | 115.9 MB | |
| lmdeploy-0.12.2+cu128-cp310-cp310-win_amd64.whl | 2026-03-18 | 45.8 MB | |
| lmdeploy-0.12.2+cu128-cp311-cp311-manylinux2014_x86_64.whl | 2026-03-18 | 115.9 MB | |
| lmdeploy-0.12.2+cu128-cp311-cp311-win_amd64.whl | 2026-03-18 | 45.8 MB | |
| lmdeploy-0.12.2+cu128-cp312-cp312-manylinux2014_x86_64.whl | 2026-03-18 | 115.9 MB | |
| lmdeploy-0.12.2+cu128-cp312-cp312-win_amd64.whl | 2026-03-18 | 45.8 MB | |
| lmdeploy-0.12.2+cu128-cp313-cp313-manylinux2014_x86_64.whl | 2026-03-18 | 115.9 MB | |
| lmdeploy-0.12.2+cu128-cp313-cp313-win_amd64.whl | 2026-03-18 | 45.8 MB | |
| README.md | 2026-03-18 | 4.1 kB | |
| v0.12.2 source code.tar.gz | 2026-03-18 | 1.5 MB | |
| v0.12.2 source code.zip | 2026-03-18 | 2.3 MB | |
| Totals: 11 Items | | 650.9 MB | 0 |

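The wheel names in the table follow the standard wheel filename layout (`name-version-python tag-abi tag-platform tag.whl`), with the CUDA build carried in the local version suffix (`+cu128`). As a minimal sketch of how one might pick the matching file for the running interpreter, the helper names `parse_wheel_name` and `pick_wheel` below are hypothetical and not part of lmdeploy, and the platform choice assumes an x86-64 machine:

```python
import sys


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename (without a build tag) into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six parts; not handled here.
        raise ValueError(f"unexpected wheel name layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    public, _, local = version.partition("+")
    return {
        "name": name,
        "version": public,    # e.g. "0.12.2"
        "local": local,       # e.g. "cu128" (CUDA 12.8 build)
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def pick_wheel(filenames, python_tag, platform_tag):
    """Return the first filename matching both tags, or None."""
    for filename in filenames:
        info = parse_wheel_name(filename)
        if info["python"] == python_tag and info["platform"] == platform_tag:
            return filename
    return None


# The eight wheels listed in the table above.
WHEELS = [
    f"lmdeploy-0.12.2+cu128-{tag}-{tag}-{plat}.whl"
    for tag in ("cp310", "cp311", "cp312", "cp313")
    for plat in ("manylinux2014_x86_64", "win_amd64")
]

if __name__ == "__main__":
    # Assumption: CPython on x86-64; other platforms have no wheel in this list.
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    plat = "win_amd64" if sys.platform == "win32" else "manylinux2014_x86_64"
    print(pick_wheel(WHEELS, tag, plat))
```

The selected file can then be passed to `pip install`. If the sketch prints `None`, no listed wheel matches the interpreter, and building from the source archive is the remaining option.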
What's Changed
🚀 Features
- support glm5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4355
- Qwen/Internlm/Llama Dense/Moe model fp8 quant online by @43758726 in https://github.com/InternLM/lmdeploy/pull/4324
- Qwen3.5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4351
- GLM-4.7-Flash Turbomind support by @lapy in https://github.com/InternLM/lmdeploy/pull/4362
- Support router replay and ignore quant layer for qwen3.5 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4394
- [Feature] Add TurboMind support for Qwen3.5 models (dense + MoE) by @lapy in https://github.com/InternLM/lmdeploy/pull/4389
- support repetition ngram logits processor by @grimoire in https://github.com/InternLM/lmdeploy/pull/4288
💥 Improvements
- Compatible with transformers 5.0 at TurboMind side by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4304
- Support fp32 head for qwen and internlm models by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4160
- Reduce MLA kv-cache memory by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4373
- add recurrent_gated_delta_rule kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/4376
- [ascend]adapt for s1-pro dp*tp+ep by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4380
- Support glm4.7 with mtp by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4346
- Faster MLA kernels by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4391
- Attention kernel self-registration and decoupled dispatching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4396
🐞 Bug fixes
- fix: change debug log from ERROR to DEBUG in RepetitionPenaltyKernel by @murray-macdonald in https://github.com/InternLM/lmdeploy/pull/4363
- Fix quant config parsing for internvl awq model by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4369
- Fix XGrammar bitmask initialization and add null check for gen_config in generate method by @windreamer in https://github.com/InternLM/lmdeploy/pull/4349
- fix the logic of closing session by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4370
- Fix authorization by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4338
- Fix some minor issues and provide tests for Pipeline by @windreamer in https://github.com/InternLM/lmdeploy/pull/4365
- fix dllm mask on set_step by @grimoire in https://github.com/InternLM/lmdeploy/pull/4278
- fix models for transformers>=5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4381
- fix exception when aborting a request by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4403
- fix inference crash on V100 with qwen3.5-0.8b by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4420
🌐 Other
- ci(lint): skip flaky deadlink test for python wiki page by @windreamer in https://github.com/InternLM/lmdeploy/pull/4357
- fix fa3 install by @irexyc in https://github.com/InternLM/lmdeploy/pull/4361
- fix lint by @windreamer in https://github.com/InternLM/lmdeploy/pull/4375
- upgrade triton and torch by @grimoire in https://github.com/InternLM/lmdeploy/pull/4379
- Add speculative decoding test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4377
- ci: integrate clang-format lint into pre-commit hooks by @windreamer in https://github.com/InternLM/lmdeploy/pull/4390
- Update dockerfile by removing cu11 and changing cu12.4 to cu12.6 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4398
- manually build dev image instead of publishing it every version by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4409
- bump version to v0.12.2 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4378
New Contributors
- @murray-macdonald made their first contribution in https://github.com/InternLM/lmdeploy/pull/4363
- @lapy made their first contribution in https://github.com/InternLM/lmdeploy/pull/4362
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.12.1...v0.12.2