| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| lmdeploy-0.12.1+cu128-cp310-cp310-manylinux2014_x86_64.whl | 2026-02-13 | 98.8 MB | |
| lmdeploy-0.12.1+cu128-cp310-cp310-win_amd64.whl | 2026-02-13 | 36.1 MB | |
| lmdeploy-0.12.1+cu128-cp311-cp311-manylinux2014_x86_64.whl | 2026-02-13 | 98.8 MB | |
| lmdeploy-0.12.1+cu128-cp311-cp311-win_amd64.whl | 2026-02-13 | 36.1 MB | |
| lmdeploy-0.12.1+cu128-cp312-cp312-manylinux2014_x86_64.whl | 2026-02-13 | 98.9 MB | |
| lmdeploy-0.12.1+cu128-cp312-cp312-win_amd64.whl | 2026-02-13 | 36.1 MB | |
| lmdeploy-0.12.1+cu128-cp313-cp313-manylinux2014_x86_64.whl | 2026-02-13 | 98.9 MB | |
| lmdeploy-0.12.1+cu128-cp313-cp313-win_amd64.whl | 2026-02-13 | 36.1 MB | |
| README.md | 2026-02-13 | 2.5 kB | |
| v0.12.1 source code.tar.gz | 2026-02-13 | 1.5 MB | |
| v0.12.1 source code.zip | 2026-02-13 | 2.2 MB | |
| Totals: 11 Items | | 543.4 MB | 1 |
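
The wheel filenames above encode the CPython version (`cp310` to `cp313`) and the platform (`manylinux2014_x86_64` or `win_amd64`). As a minimal sketch of how one might pick the matching file for a given interpreter, the snippet below maps a Python version and a `sys.platform` value onto those names. The helper `pick_wheel` and its platform mapping are hypothetical and not part of lmdeploy. The sketch assumes an x86_64 machine and does not check the CPU architecture or the installed CUDA version.

```python
import sys

# Wheel filenames copied verbatim from the file listing above.
WHEELS = [
    "lmdeploy-0.12.1+cu128-cp310-cp310-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.1+cu128-cp310-cp310-win_amd64.whl",
    "lmdeploy-0.12.1+cu128-cp311-cp311-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.1+cu128-cp311-cp311-win_amd64.whl",
    "lmdeploy-0.12.1+cu128-cp312-cp312-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.1+cu128-cp312-cp312-win_amd64.whl",
    "lmdeploy-0.12.1+cu128-cp313-cp313-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.1+cu128-cp313-cp313-win_amd64.whl",
]

# Assumed mapping from sys.platform values to the platform tags in the listing.
PLATFORM_TAGS = {"linux": "manylinux2014_x86_64", "win32": "win_amd64"}


def pick_wheel(py_version=None, platform=None):
    """Return the listed wheel for a (major, minor) version and platform, or None.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if py_version is None:
        py_version = sys.version_info[:2]
    if platform is None:
        platform = sys.platform

    plat_tag = PLATFORM_TAGS.get(platform)
    if plat_tag is None:
        return None  # no wheel listed for this platform

    py_tag = f"cp{py_version[0]}{py_version[1]}"
    name = f"lmdeploy-0.12.1+cu128-{py_tag}-{py_tag}-{plat_tag}.whl"
    return name if name in WHEELS else None


if __name__ == "__main__":
    # Prints the matching filename, or None if this release has no wheel for it.
    print(pick_wheel())
```

A `None` result means the listing has no prebuilt wheel for that combination, in which case the source archives are the remaining option in this release.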
What's Changed
🚀 Features
- support glm-4.7-flash by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4320
- [ascend] support ep by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3696
💥 Improvements
- fix rotary embedding for transformers v5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4303
- Improve metrics log by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4297
- Support ignore layers in quant config for qwen3 models by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4293
- add custom noaux kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/4345
- fix qwen3vl with transformers5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4348
🐞 Bug fixes
- fix tool call parser's streaming cursor by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4333
- Fix data race for guided decoding in TP mode by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4341
- fa3 check by @grimoire in https://github.com/InternLM/lmdeploy/pull/4340
- Fix time series preprocess by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4339
- Negative KV sequence length error in Attention op by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/4316
- fix qwen3-vl-moe long context by @grimoire in https://github.com/InternLM/lmdeploy/pull/4342
- fix: move quantized norm to CPU instead of stale q_linear reference in smooth_quant by @Mr-Neutr0n in https://github.com/InternLM/lmdeploy/pull/4352
- update noaux-kernel check by @grimoire in https://github.com/InternLM/lmdeploy/pull/4358
🌐 Other
- change INPUT_CUDA_VERSION to 12.6.2 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4322
- add Qwen3-8B accuracy evaluation in llm_compressor.md by @43758726 in https://github.com/InternLM/lmdeploy/pull/4319
- [ci] refactor ete testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4274
- Set alias interns1_1 for interns1_pro by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4334
- build(docker): skip FA2 when use cu13 by @windreamer in https://github.com/InternLM/lmdeploy/pull/4356
- bump version to v0.12.1 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4350
New Contributors
- @Mr-Neutr0n made their first contribution in https://github.com/InternLM/lmdeploy/pull/4352
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.12.0...v0.12.1