ModelOpt 0.40.0 Release

Bug Fixes

  • Fix a bug in FastNAS pruning of computer vision models where model parameters were sorted twice, corrupting their ordering.
  • Fix Q/DQ/Cast node placement for tensors that must remain in FP32 within custom ops in the ONNX quantization workflow.

New Features

  • Add MoE pruning support (e.g. Qwen3-30B-A3B, gpt-oss-20b) for the num_moe_experts, moe_ffn_hidden_size, and moe_shared_expert_intermediate_size parameters in Minitron pruning (mcore_minitron).
  • Add specdec_bench example to benchmark speculative decoding performance. See examples/specdec_bench/README.md for more details.
  • Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
  • Add KL Divergence loss-based auto_quantize method. See auto_quantize API docs for more details.
  • Add support for saving and resuming auto_quantize search state. This speeds up the auto_quantize process by skipping the score estimation step if the search state is provided.
  • Add the trt_plugins_precision flag to ONNX autocast to indicate the precision of custom ops, similar to the existing flag in the quantization workflow.
  • Add support for PyTorch Geometric quantization.
  • Add per-tensor and per-channel MSE calibrator support.
  • Add support for PTQ/QAT checkpoint export and loading for running fakequant evaluation in vLLM. See examples/vllm_serve/README.md for more details.
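
An MSE calibrator picks the quantization clipping range (amax) that minimizes mean squared error between the original and fake-quantized values, rather than simply using the absolute maximum. Below is a minimal pure-Python sketch of that idea; it is a toy illustration only, not ModelOpt's actual calibrator, and the names `fake_quant` and `mse_calibrate` are hypothetical:

```python
# Toy sketch of MSE-based calibration for symmetric int8 fake quantization.
# Not ModelOpt's implementation; names here are illustrative only.

def fake_quant(values, amax, num_bits=8):
    """Symmetrically fake-quantize `values` with clipping range [-amax, amax]."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = amax / qmax
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))        # clip to the integer range
        out.append(q * scale)               # dequantize back to float
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mse_calibrate(values, num_steps=20):
    """Search candidate amax values (fractions of the absolute max) and
    return the (amax, error) pair with the lowest reconstruction MSE."""
    absmax = max(abs(v) for v in values)
    best_amax, best_err = absmax, mse(values, fake_quant(values, absmax))
    for i in range(1, num_steps):
        amax = absmax * i / num_steps
        err = mse(values, fake_quant(values, amax))
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax, best_err
```

A per-channel variant simply runs the same search independently over each channel's slice of the tensor, producing one amax per channel instead of one for the whole tensor.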

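A KL-divergence-based auto_quantize score measures how much quantizing a layer perturbs the model's output distribution: the smaller the divergence from the full-precision outputs, the more aggressively that layer can be quantized. The sketch below is a toy illustration of such a score in plain Python; `kl_sensitivity` is a hypothetical name and this is not the ModelOpt API:

```python
# Toy sketch of a KL-divergence sensitivity score between full-precision
# and quantized model outputs. Illustrative only, not the ModelOpt API.
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kl_sensitivity(fp_logits, quant_logits):
    """Score a quantization choice by how far it shifts the output
    distribution: higher KL means more sensitive, keep higher precision."""
    return kl_divergence(softmax(fp_logits), softmax(quant_logits))
```

Identical logits score (near) zero, and larger output perturbations score higher, which is what lets a search rank candidate quantization configurations against each other.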
Misc

  • NVIDIA TensorRT Model Optimizer is now officially rebranded as NVIDIA Model Optimizer. GitHub automatically redirects the old repository path (NVIDIA/TensorRT-Model-Optimizer) to the new one (NVIDIA/Model-Optimizer). The documentation URL has also changed to nvidia.github.io/Model-Optimizer.
  • Bump the TensorRT-LLM test docker image to 1.2.0rc4.
  • Bump the minimum recommended transformers version to 4.53.
  • Replace the ONNX simplification package onnxsim with onnxslim.
Source: README.md, updated 2025-12-11