
NVIDIA Model Optimizer 0.41.0
Name                                         Modified     Size
nvidia_modelopt-0.41.0-py3-none-any.whl      2026-01-20   934.6 kB
ModelOpt 0.41.0 Release source code.tar.gz   2026-01-19   11.7 MB
ModelOpt 0.41.0 Release source code.zip      2026-01-19   12.4 MB
README.md                                    2026-01-19   1.5 kB
Totals: 4 items, 25.1 MB

Bug Fixes

  • Fix Megatron KV Cache quantization checkpoint restore for QAT/QAD (device placement, amax sync across DP/TP, flash_decode compatibility).
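For context on the amax sync part of this fix, the idea can be illustrated abstractly: under data or tensor parallelism, each rank holds a locally calibrated amax for the KV cache, and all ranks must agree on the global maximum before quantization scales are derived, or ranks would dequantize with mismatched scales. A minimal sketch in plain Python, simulating an all-reduce(MAX) over hypothetical ranks; this is not the actual Megatron/ModelOpt code, which would use torch.distributed collectives:

```python
def allreduce_max(per_rank_amax):
    """Simulate an all-reduce(MAX) across parallel ranks.

    In a real DP/TP setup this would be torch.distributed.all_reduce
    with op=ReduceOp.MAX; here a plain list stands in for the
    per-rank amax values.
    """
    global_amax = max(per_rank_amax)
    # Every rank adopts the same global amax so quantization scales match.
    return [global_amax] * len(per_rank_amax)

# Each rank calibrated a slightly different amax for the KV cache.
local = [0.91, 1.07, 0.88, 1.02]
synced = allreduce_max(local)
print(synced)  # every rank now holds 1.07
```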

New Features

  • Add support for Transformer Engine quantization for Megatron Core models.
  • Add support for Qwen3-Next model quantization.
  • Add support for dynamically linked TensorRT plugins in the ONNX quantization workflow.
  • Add support for KV Cache Quantization in the vLLM FakeQuant PTQ script. See examples/vllm_serve/README.md for more details.
  • Add support for subgraphs in ONNX autocast.
  • Add support for parallel draft heads in Eagle speculative decoding.
  • Add support for custom emulated quantization backends. See register_quant_backend for details and tests/unit/torch/quantization/test_custom_backend.py for an example.
  • Add examples/llm_qad for QAD training with Megatron-LM.
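To illustrate what an emulated (fake) quantization backend does, here is a minimal per-tensor symmetric INT8 quantize-dequantize sketch in plain Python. The registry and decorator below are purely hypothetical stand-ins for the registration idea; the real register_quant_backend API lives in ModelOpt and its signature is not reflected here:

```python
QUANT_BACKENDS = {}  # hypothetical registry, not the ModelOpt one


def register_backend(name):
    """Decorator that records a fake-quant function under a name."""
    def deco(fn):
        QUANT_BACKENDS[name] = fn
        return fn
    return deco


@register_backend("int8_emulated")
def fake_quant_int8(values, amax):
    """Emulated per-tensor symmetric INT8 quantize-dequantize.

    Values are scaled to [-127, 127], rounded, clamped, then rescaled
    back, so the tensor stays in floating point but carries the
    rounding error a real INT8 kernel would introduce.
    """
    scale = amax / 127.0
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-127, min(127, q))  # clamp to the INT8 range
        out.append(q * scale)
    return out


deq = QUANT_BACKENDS["int8_emulated"]([0.5, -1.0, 2.0], amax=2.0)
```

Emulated backends like this let accuracy be evaluated without deploying real low-precision kernels, which is what makes them useful for PTQ/QAT experiments.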

Deprecations

  • Deprecate the num_query_groups parameter in Minitron pruning (mcore_minitron). Use ModelOpt 0.40.0 or earlier if you still need to prune it.

Backward Breaking Changes

  • Remove torchprofile as a default dependency of ModelOpt, since it is used only for FLOPs-based FastNAS pruning of computer vision models. Install it separately if needed.
Source: README.md, updated 2026-01-19