| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| lmdeploy-0.12.3+cu128-cp310-cp310-manylinux2014_x86_64.whl | 2026-04-08 | 136.3 MB | |
| lmdeploy-0.12.3+cu128-cp310-cp310-win_amd64.whl | 2026-04-08 | 57.9 MB | |
| lmdeploy-0.12.3+cu128-cp311-cp311-manylinux2014_x86_64.whl | 2026-04-08 | 136.3 MB | |
| lmdeploy-0.12.3+cu128-cp311-cp311-win_amd64.whl | 2026-04-08 | 57.9 MB | |
| lmdeploy-0.12.3+cu128-cp312-cp312-manylinux2014_x86_64.whl | 2026-04-08 | 136.3 MB | |
| lmdeploy-0.12.3+cu128-cp312-cp312-win_amd64.whl | 2026-04-08 | 57.9 MB | |
| lmdeploy-0.12.3+cu128-cp313-cp313-manylinux2014_x86_64.whl | 2026-04-08 | 136.3 MB | |
| lmdeploy-0.12.3+cu128-cp313-cp313-win_amd64.whl | 2026-04-08 | 57.9 MB | |
| README.md | 2026-04-08 | 5.3 kB | |
| v0.12.3 source code.tar.gz | 2026-04-08 | 1.6 MB | |
| v0.12.3 source code.zip | 2026-04-08 | 2.4 MB | |
| Totals: 11 Items | | 780.6 MB | 1 |
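
The wheel filenames above follow the standard wheel naming scheme (`name-version-python_tag-abi_tag-platform_tag.whl`), so the right file for a given interpreter and OS can be picked mechanically. The sketch below is only an illustration: `parse_wheel_name` and `pick_wheel` are hypothetical helpers, not part of lmdeploy, and reading the `+cu128` local version label as a CUDA 12.8 build is an assumption based on common naming practice rather than something the listing states.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into its naming-scheme components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename.removesuffix(".whl").split("-")
    # The wheels in this listing carry no optional build tag,
    # so exactly five dash-separated parts are expected.
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    # Anything after "+" is a local version label (e.g. "cu128").
    public_version, _, local_label = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "local_label": local_label,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def pick_wheel(filenames, python_tag, platform_tag):
    """Return the first filename matching both tags, or None."""
    for filename in filenames:
        info = parse_wheel_name(filename)
        if info["python_tag"] == python_tag and info["platform_tag"] == platform_tag:
            return filename
    return None


# A subset of the files from the listing above.
WHEELS = [
    "lmdeploy-0.12.3+cu128-cp310-cp310-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.3+cu128-cp312-cp312-manylinux2014_x86_64.whl",
    "lmdeploy-0.12.3+cu128-cp312-cp312-win_amd64.whl",
]

print(pick_wheel(WHEELS, "cp312", "win_amd64"))
```

In practice `pip` performs this tag matching itself; the helper is only meant to make the meaning of each filename segment explicit.
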
What's Changed
🚀 Features
- Support video inputs by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4360
- feat: fully implement compressed-tensors gs32 support in TurboMind by @lapy in https://github.com/InternLM/lmdeploy/pull/4429
- Draft model update params by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4452
💥 Improvements
- support qwen3.5 on volta by @grimoire in https://github.com/InternLM/lmdeploy/pull/4405
- Optimize Qwen3.5 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4434
- Builtin mrope by @grimoire in https://github.com/InternLM/lmdeploy/pull/4393
- delete ray remote function return value by @grimoire in https://github.com/InternLM/lmdeploy/pull/4422
- support cache_seqlen on recurrent-gdr and causal-conv1d-update by @grimoire in https://github.com/InternLM/lmdeploy/pull/4417
- safe ray api by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4455
- add R3 for qwen3-vl-moe models by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4457
- Align rope init in lmdeploy by @RangiLyu in https://github.com/InternLM/lmdeploy/pull/4466
- Make tilelang a Linux-only dependency (like triton) by @Copilot in https://github.com/InternLM/lmdeploy/pull/4469
- prepare chunk indices before cache initialize by @grimoire in https://github.com/InternLM/lmdeploy/pull/4458
- unify rope device by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4467
- custom processor args by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4472
- Assign sequential api_server ports when proxy_url is unset by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4416
- disable fla intracard_backend by @grimoire in https://github.com/InternLM/lmdeploy/pull/4482
- [Fix][Feat] Fix worker sorting with external pg bundles & Support persistent buffer for update_params by @CyCle1024 in https://github.com/InternLM/lmdeploy/pull/4397
- simplify interns1 pro codes by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4480
🐞 Bug fixes
- fix test_hf_overrides for transformers>5 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4418
- fix qwen3.5 pytorch multimodal inference by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4430
- fix `generate` endpoint by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4432
- Make Intern-S1-Pro compatible with Transformers 5.0+ by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4435
- fix multiround chat by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4438
- fix(async_engine): make safe_run cancellation cleanup reliable with shield and SafeRunException by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4439
- release state cache by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4462
- Split/tool call args json for qwen3coder tool calls (Qwen3.5) by @lapy in https://github.com/InternLM/lmdeploy/pull/4433
- fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace by @windreamer in https://github.com/InternLM/lmdeploy/pull/4456
- fix metrics by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4410
- fix security issues by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4447
- fix qwen3.5 fp8 support by @grimoire in https://github.com/InternLM/lmdeploy/pull/4470
- fix image / video resize function by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4478
- fix dynamic ntk device by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4483
- fix pagedattention pointer range by @grimoire in https://github.com/InternLM/lmdeploy/pull/4494
- fix glm4.7-flash by @grimoire in https://github.com/InternLM/lmdeploy/pull/4500
- Fix torch awq by @grimoire in https://github.com/InternLM/lmdeploy/pull/4503
🌐 Other
- [ci] add legacy test workflow and test config by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4387
- chore: add CLAUDE.md and Claude Code skills by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4413
- Fix CI errors including linting error and unit test error by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4431
- Use pyupgrade and ruff to modernize LMDeploy Python Code by @windreamer in https://github.com/InternLM/lmdeploy/pull/4392
- reduce ci memory by @irexyc in https://github.com/InternLM/lmdeploy/pull/4471
- fix: add safe.directory for git in docker workflows by @windreamer in https://github.com/InternLM/lmdeploy/pull/4474
- [ci] add nightly docker build workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4406
- split docker wheel preparation into staged build steps and use python 3.12 as the default version by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4476
- [Feat]: Support qwen35 with mtp by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4437
- bump version to v0.12.3 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4493
New Contributors
- @RangiLyu made their first contribution in https://github.com/InternLM/lmdeploy/pull/4466
- @Copilot made their first contribution in https://github.com/InternLM/lmdeploy/pull/4469
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.12.2...v0.12.3