| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| lmdeploy-0.12.0+cu128-cp310-cp310-manylinux2014_x86_64.whl | 2026-02-04 | 98.8 MB | |
| lmdeploy-0.12.0+cu128-cp310-cp310-win_amd64.whl | 2026-02-04 | 36.1 MB | |
| lmdeploy-0.12.0+cu128-cp311-cp311-manylinux2014_x86_64.whl | 2026-02-04 | 98.8 MB | |
| lmdeploy-0.12.0+cu128-cp311-cp311-win_amd64.whl | 2026-02-04 | 36.1 MB | |
| lmdeploy-0.12.0+cu128-cp312-cp312-manylinux2014_x86_64.whl | 2026-02-04 | 98.8 MB | |
| lmdeploy-0.12.0+cu128-cp312-cp312-win_amd64.whl | 2026-02-04 | 36.1 MB | |
| lmdeploy-0.12.0+cu128-cp313-cp313-manylinux2014_x86_64.whl | 2026-02-04 | 98.8 MB | |
| lmdeploy-0.12.0+cu128-cp313-cp313-win_amd64.whl | 2026-02-04 | 36.1 MB | |
| README.md | 2026-02-04 | 5.9 kB | |
| v0.12.0 source code.tar.gz | 2026-02-04 | 1.4 MB | |
| v0.12.0 source code.zip | 2026-02-04 | 2.2 MB | |
| Totals: 11 items | | 543.2 MB | 0 |

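
The table lists CUDA 12.8 (`cu128`) wheels for CPython 3.10 through 3.13, for x86_64 Linux (`manylinux2014_x86_64`) and Windows (`win_amd64`). To pick a file, match the interpreter tag (`cpXY`) and the platform tag. The helper below is a minimal sketch that rebuilds the filenames shown in this table. `wheel_name` is a hypothetical helper, not part of lmdeploy. It assumes x86_64 hardware, and it does not check whether the installed CUDA driver supports 12.8.

```python
import sys

# Values taken from the v0.12.0 file listing above.
VERSION = "0.12.0"
CUDA_TAG = "cu128"
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12), (3, 13)}
# Maps sys.platform values to the wheel platform tags in the listing.
# Assumption: the machine is x86_64; no other architectures are listed.
PLATFORM_TAGS = {"linux": "manylinux2014_x86_64", "win32": "win_amd64"}


def wheel_name(py_version=None, os_name=None):
    """Return the listed wheel filename for a CPython (major, minor) and OS.

    Hypothetical helper: defaults to the running interpreter and sys.platform.
    """
    major, minor = py_version if py_version is not None else tuple(sys.version_info[:2])
    os_name = os_name if os_name is not None else sys.platform
    if (major, minor) not in SUPPORTED_PYTHONS:
        raise ValueError(f"no v{VERSION} wheel for Python {major}.{minor}")
    if os_name not in PLATFORM_TAGS:
        raise ValueError(f"no v{VERSION} wheel for platform {os_name!r}")
    abi = f"cp{major}{minor}"  # e.g. cp312, used as both the python and ABI tag
    return f"lmdeploy-{VERSION}+{CUDA_TAG}-{abi}-{abi}-{PLATFORM_TAGS[os_name]}.whl"
```

For example, `wheel_name((3, 12), "linux")` returns `lmdeploy-0.12.0+cu128-cp312-cp312-manylinux2014_x86_64.whl`. After downloading, you can install it with `pip install <that file>`. Requesting Python 3.9 raises `ValueError`, which is consistent with this release dropping Python 3.9 support.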
## What's Changed
### 🚀 Features
- Add Gloo communication to turbomind by @irexyc in https://github.com/InternLM/lmdeploy/pull/3362
- [Feat] Support llm-compressor AWQ models in TurboMind by @43758726 in https://github.com/InternLM/lmdeploy/pull/4290
- Router replay for gpt oss by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4298
- Support llm-compressor symmetric quantized model inference in TurboMind by @43758726 in https://github.com/InternLM/lmdeploy/pull/4305
- Support Intern-S1-Pro by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4318
### 💥 Improvements
- Configurable max CTAs and NVLS usage for CUDA IPC communicator by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4227
- Improve aborting all sessions by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4215
- MoE reduce kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/4228
- Refactor attn by @grimoire in https://github.com/InternLM/lmdeploy/pull/4238
- Optimize exception raising and error process by @grimoire in https://github.com/InternLM/lmdeploy/pull/4236
- [AsyncEngine Refactor 1/N] define MultimodalProcessor to handle multimodal data processing by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4250
- [AsyncEngine Refactor 2/N] Remove deprecates from chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4252
- Configurable uvicorn timeout by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4255
- Adapt to dlslime v0.0.2 by @JimyMa in https://github.com/InternLM/lmdeploy/pull/4242
- [Fix] fix quant calibration dataset by @43758726 in https://github.com/InternLM/lmdeploy/pull/4256
- lmdeploy supports parallel embedding by @Tsundoku958 in https://github.com/InternLM/lmdeploy/pull/4192
- Refactor turbomind engine by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4223
- Refactor Engine & ModelAgent interact by @grimoire in https://github.com/InternLM/lmdeploy/pull/4265
- Support sleep and destroy deepep buffer by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4246
- add yarn truncate by @grimoire in https://github.com/InternLM/lmdeploy/pull/4301
- [AsyncEngine Refactor 3/N] Introduce Session and SessionManager by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4253
- Add warning about NCCL 2.27 memory leaks by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4313
### 🐞 Bug fixes
- Fix fope cos/sin coef device type by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4240
- Fix include_stop_str_in_output with output_logits Exception by @windreamer in https://github.com/InternLM/lmdeploy/pull/4244
- Fix logit softcapping when it is None by @grimoire in https://github.com/InternLM/lmdeploy/pull/4247
- Fix performance regression for prefix caching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4270
- convert float16 weight to bfloat16 for FP8 models by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4276
- [ascend] fix dp multinode rank_table mapping by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/4268
- [Fix] move calibrate load dataset location by @43758726 in https://github.com/InternLM/lmdeploy/pull/4289
- fix ignore-eos by @grimoire in https://github.com/InternLM/lmdeploy/pull/4282
- fix MPEngine poll by @grimoire in https://github.com/InternLM/lmdeploy/pull/4287
- Fix prefix caching by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4292
- Fix gemma chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4280
- Fix scheduler metrics by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4294
- Fix NVLS init for mixed DP+TP by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4296
- [side-effect] The tool message dump is incomplete by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4299
- Fix mla with spec tokens by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4302
- fix stop long context by @grimoire in https://github.com/InternLM/lmdeploy/pull/4309
- fix crash on client disconnect (Ctrl+C) by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4308
- Ensure the pipe benchmark uses kwargs when calling `pipe.stream_infer` by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4312
- Fix `get_ppl` for long context by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4314
- fix sleep engine for dp=1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4315
### 🌐 Other
- [ci] fix failing test cases and add a generate test case to PR tests by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4231
- Pin nvshmem version by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4257
- fix: Pin `timm` version to avoid failed tests by @windreamer in https://github.com/InternLM/lmdeploy/pull/4258
- docs: add generated OpenAPI spec documentation by @windreamer in https://github.com/InternLM/lmdeploy/pull/4251
- fix: get rid of buggy timm-1.0.23 by @windreamer in https://github.com/InternLM/lmdeploy/pull/4260
- [ascend] fix paged prefill by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/4254
- Fix ascend/maca/camb runtime_requirements by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/4262
- docs: refine the documents by @windreamer in https://github.com/InternLM/lmdeploy/pull/4259
- docs: add cli docs by @windreamer in https://github.com/InternLM/lmdeploy/pull/4264
- Drop support for Python 3.9 as it has reached end-of-life by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4281
- bump version to v0.12.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4300
## New Contributors
- @43758726 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4256
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.11.1...v0.12.0