Step1X-3D is an open-source framework for generating high-fidelity textured 3D assets, producing both geometry and surface textures with modern generative techniques. The pipeline has two stages: a geometry stage, in which a hybrid VAE-DiT model outputs a watertight 3D representation (e.g. a TSDF surface), and a texture-synthesis stage, in which a diffusion-based module conditions on the generated geometry, and optionally on a reference image or prompt, to produce view-consistent textures. The result is a complete 3D asset (mesh plus textures) that renders coherently from any viewpoint and can be used directly in 3D applications. To support training, the project also provides a large curated dataset: from more than 5 million candidate 3D assets, filtering and standardization yield a high-quality subset of roughly 2 million assets.
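The two-stage flow described above can be sketched as a minimal Python program. The types and function names here are illustrative placeholders, not the project's actual API; the point is only the ordering: geometry is generated first, and texture synthesis is conditioned on it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types standing in for the pipeline's real outputs.
@dataclass
class Mesh:
    vertices: List[Tuple[float, float, float]]
    faces: List[Tuple[int, int, int]]

@dataclass
class TexturedAsset:
    mesh: Mesh
    texture_size: Tuple[int, int]

def generate_geometry(prompt: str) -> Mesh:
    """Stage 1 (sketch): the real model samples a latent with a VAE-DiT,
    decodes it to a TSDF grid, and extracts a watertight mesh.
    Here we return a fixed placeholder tetrahedron."""
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
             (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    return Mesh(verts, faces)

def synthesize_texture(mesh: Mesh, prompt: str) -> TexturedAsset:
    """Stage 2 (sketch): the real module renders the mesh from several
    views and runs geometry-conditioned diffusion to produce consistent
    texture maps. Here we just attach a nominal texture resolution."""
    return TexturedAsset(mesh=mesh, texture_size=(1024, 1024))

def generate_asset(prompt: str) -> TexturedAsset:
    mesh = generate_geometry(prompt)         # geometry first...
    return synthesize_texture(mesh, prompt)  # ...texture conditioned on it

asset = generate_asset("a weathered bronze statue")
print(len(asset.mesh.faces), asset.texture_size)  # 4 (1024, 1024)
```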
Features
- Two-stage 3D-native generative pipeline: geometry generation (via hybrid VAE-DiT) + diffusion-based texture synthesis
- Watertight mesh generation (TSDF → mesh) enabling clean, usable 3D assets for rendering or export
- View-consistent texturing — textures remain coherent across angles/views thanks to geometry conditioning and latent-space synchronization
- Large curated training dataset (~2M high-quality 3D assets) distilled from more than 5M raw assets through rigorous filtering and standardization
- Support for diverse styles (photorealistic, cartoon, sketch, stylized) and potential for user control and conditioning akin to 2D generative workflows
- Fully open-source: training code, inference code, model weights, dataset identifiers — enabling adaptation, fine-tuning, and reproducible research
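To see why a TSDF yields the watertight meshes mentioned above: the surface is the zero level set of a (truncated) signed distance field, which by construction separates inside from outside with no holes. The toy example below, under the assumption of a simple sphere sampled on a coarse grid, counts the grid cells whose corners straddle the zero crossing; these are exactly the cells a marching-cubes pass would triangulate into the closed surface.

```python
import math

N = 16          # grid resolution per axis
RADIUS = 0.5    # sphere radius (illustrative shape)
TRUNC = 0.1     # truncation distance of the TSDF

def tsdf(x, y, z):
    d = math.sqrt(x * x + y * y + z * z) - RADIUS  # signed distance to sphere
    return max(-TRUNC, min(TRUNC, d))              # truncate to [-TRUNC, TRUNC]

def sample(i):
    return -1.0 + 2.0 * i / (N - 1)  # map grid index to [-1, 1]

# Count cells whose eight corners straddle the zero level set.
surface_cells = 0
for i in range(N - 1):
    for j in range(N - 1):
        for k in range(N - 1):
            corners = [tsdf(sample(i + a), sample(j + b), sample(k + c))
                       for a in (0, 1) for b in (0, 1) for c in (0, 1)]
            if min(corners) < 0.0 <= max(corners):
                surface_cells += 1

print(surface_cells > 0)  # True: the zero crossing forms a closed shell
```

Because the sign of the field is defined everywhere on the grid, the extracted shell cannot have open boundaries, which is what makes the resulting meshes directly usable for rendering and export.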
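The dataset curation step (5M+ raw assets filtered down to ~2M) can be pictured as a quality-gating pass over per-asset metadata. The criteria below (watertightness, face-count bounds, texture presence) are illustrative assumptions, not the project's published pipeline.

```python
from dataclasses import dataclass

# Hypothetical metadata record for one candidate 3D asset.
@dataclass
class AssetRecord:
    asset_id: str
    watertight: bool
    face_count: int
    has_texture: bool

def passes_filters(a: AssetRecord) -> bool:
    return (a.watertight                        # geometry stage needs closed surfaces
            and 100 <= a.face_count <= 500_000  # drop degenerate or bloated meshes
            and a.has_texture)                  # texture stage needs supervision

candidates = [
    AssetRecord("a1", True, 12_000, True),   # keep
    AssetRecord("a2", False, 8_000, True),   # drop: not watertight
    AssetRecord("a3", True, 50, True),       # drop: too few faces
    AssetRecord("a4", True, 30_000, False),  # drop: untextured
]

kept = [a.asset_id for a in candidates if passes_filters(a)]
print(kept)  # ['a1']
```

In practice such a pass would run over millions of records, so the filters are typically cheap metadata checks applied before any expensive geometric processing.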