Megatron-LM
Ongoing research training transformer models at scale
...Megatron-LM is widely used in research and industry for pretraining GPT-, BERT-, T5-, and multimodal-style models, and it includes tooling for checkpoint conversion and interoperability with Hugging Face. Overall, it is a production-grade system for organizations training large language models at scale.