Alternatives to HunyuanWorld
Compare HunyuanWorld alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to HunyuanWorld in 2026. Compare features, ratings, user reviews, pricing, and more from HunyuanWorld competitors and alternatives in order to make an informed decision for your business.
-
1
Hunyuan T1
Tencent
Hunyuan T1 is Tencent's deep-thinking AI model, now fully open to all users through the Tencent Yuanbao platform. This model excels in understanding multiple dimensions and potential logical relationships, making it suitable for handling complex tasks. Users can experience various AI models on the platform, including DeepSeek-R1 and Tencent Hunyuan Turbo. The official version of the Tencent Hunyuan T1 model will also be launched soon, providing external API access and other services. Built upon Tencent's Hunyuan large language model, Yuanbao excels in Chinese language understanding, logical reasoning, and task execution. It offers AI-based search, summaries, and writing capabilities, enabling users to analyze documents and engage in prompt-based interactions. -
2
Hunyuan-TurboS
Tencent
Tencent's Hunyuan-TurboS is a next-generation AI model designed to offer rapid responses and outstanding performance in various domains such as knowledge, mathematics, and creative tasks. Unlike previous models that require "slow thinking," Hunyuan-TurboS enhances response speed, doubling word output speed and reducing first-word latency by 44%. Through innovative architecture, it provides superior performance while lowering deployment costs. This model combines fast thinking (intuition-based responses) with slow thinking (logical analysis), ensuring quicker, more accurate solutions across diverse scenarios. Hunyuan-TurboS excels in benchmarks, competing with leading models like GPT-4 and DeepSeek V3, making it a breakthrough in AI-driven performance. -
3
HunyuanVideo
Tencent
HunyuanVideo is an advanced AI-powered video generation model developed by Tencent, designed to seamlessly blend virtual and real elements, offering limitless creative possibilities. It delivers cinematic-quality videos with natural movements and precise expressions, capable of transitioning effortlessly between realistic and virtual styles. This technology overcomes the constraints of short dynamic images by presenting complete, fluid actions and rich semantic content, making it ideal for applications in advertising, film production, and other commercial industries. -
4
HunyuanOCR
Tencent
Tencent Hunyuan is a large-scale, multimodal AI model family developed by Tencent that spans text, image, video, and 3D modalities, designed for general-purpose AI tasks like content generation, visual reasoning, and business automation. Its model lineup includes variants optimized for natural language understanding, multimodal vision-language comprehension (e.g., image & video understanding), text-to-image creation, video generation, and 3D content generation. Hunyuan models leverage a mixture-of-experts architecture and other innovations (like hybrid “mamba-transformer” designs) to deliver strong performance on reasoning, long-context understanding, cross-modal tasks, and efficient inference. For example, the vision-language model Hunyuan-Vision-1.5 supports “thinking-on-image”, enabling deep multimodal understanding and reasoning on images, video frames, diagrams, or spatial data. -
5
Hunyuan-Vision-1.5
Tencent
HunyuanVision is a cutting-edge vision-language model developed by Tencent’s Hunyuan team. It uses a mamba-transformer hybrid architecture to deliver strong performance and efficient inference in multimodal reasoning tasks. The version Hunyuan-Vision-1.5 is designed for “thinking on images,” meaning it not only understands vision+language content, but can perform deeper reasoning that involves manipulating or reflecting on image inputs, such as cropping, zooming, pointing, box drawing, or drawing on the image to acquire additional knowledge. It supports a variety of vision tasks (image + video recognition, OCR, diagram understanding), visual reasoning, and even 3D spatial comprehension, all in a unified multilingual framework. The model is built to work seamlessly across languages and tasks and is intended to be open sourced (including checkpoints, technical report, inference support) to encourage the community to experiment and adopt. Starting Price: Free -
6
HunyuanCustom
Tencent
HunyuanCustom is a multi-modal customized video generation framework that emphasizes subject consistency while supporting image, audio, video, and text conditions. Built upon HunyuanVideo, it introduces a text-image fusion module based on LLaVA for enhanced multi-modal understanding, along with an image ID enhancement module that leverages temporal concatenation to reinforce identity features across frames. To enable audio- and video-conditioned generation, it further proposes modality-specific condition injection mechanisms, an AudioNet module that achieves hierarchical alignment via spatial cross-attention, and a video-driven injection module that integrates latent-compressed conditional video through a patchify-based feature-alignment network. Extensive experiments on single- and multi-subject scenarios demonstrate that HunyuanCustom significantly outperforms state-of-the-art open and closed source methods in terms of ID consistency, realism, and text-video alignment. -
7
Text2Mesh
Text2Mesh
Text2Mesh produces color and geometric details over a variety of source meshes, driven by a target text prompt. Its stylization results coherently blend unique and ostensibly unrelated combinations of text, capturing both global semantics and part-aware attributes. The framework stylizes a 3D mesh by predicting color and local geometric details that conform to a target text prompt. It considers a disentangled representation of a 3D object using a fixed mesh input (content) coupled with a learned neural network, termed a neural style field network. To modify style, it obtains a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP. Text2Mesh requires neither a pre-trained generative model nor a specialized 3D mesh dataset. It can handle low-quality meshes (non-manifold, with boundaries, etc.) of arbitrary genus, and does not require UV parameterization. -
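The CLIP-based scoring idea at the heart of Text2Mesh can be illustrated with a minimal sketch: embed a rendered view of the stylized mesh and the style prompt with CLIP, then compare them with cosine similarity. This is not the authors' code; the saved render filename and the example prompt are placeholders, and the real pipeline uses a differentiable renderer so the score can be optimized.

```python
# Conceptual sketch (not the Text2Mesh implementation): score how well a
# rendered view of a stylized mesh matches a style prompt using CLIP.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder: in Text2Mesh this image would come from a differentiable
# render of the stylized mesh; here we just load a saved render.
image = preprocess(Image.open("rendered_view.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a colorful crochet candle"]).to(device)  # example style prompt

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)

score = torch.cosine_similarity(image_feat, text_feat).item()
print(f"CLIP style similarity: {score:.3f}")
```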
8
Hunyuan3D 2.0
Tencent
Tencent Hunyuan 3D is an AI-powered platform developed by Tencent that specializes in generating 3D content. Leveraging advanced artificial intelligence technology, the platform allows users to create realistic and dynamic 3D models and animations efficiently. It is designed for industries such as gaming, virtual reality, and digital media, offering a streamlined solution for high-quality 3D asset creation. -
9
SAM 3D
Meta
SAM 3D is a pair of advanced foundation models designed to convert a single standard RGB image into a high-fidelity 3D reconstruction of either objects or human bodies. It comprises SAM 3D Objects, which recovers full 3D geometry, texture, and layout of objects within real-world scenes, handling clutter, occlusions, and diverse lighting, and SAM 3D Body, which produces animatable human mesh models with detailed pose and shape, built on the “Meta Momentum Human Rig” (MHR) format. It is engineered to generalize across in-the-wild images without further training or finetuning: you upload an image, prompt the model by selecting the object or person, and it outputs a downloadable asset ready for use in 3D applications. SAM 3D emphasizes open vocabulary reconstruction (any object category), multi-view consistency, occlusion reasoning, and a massive new dataset of over one million annotated real-world images, enabling its robustness. Starting Price: Free -
10
AudioLM
Google
AudioLM is a pure audio language model that generates high-fidelity, long-term coherent speech and piano music by learning from raw audio alone, without requiring any text transcripts or symbolic representations. It represents audio hierarchically with two types of discrete tokens: semantic tokens, extracted from a self-supervised model, capture phonetic or melodic structure and global context, while acoustic tokens from a neural codec preserve speaker characteristics and fine waveform detail. Three chained Transformer stages predict first the semantic tokens for high-level structure, then coarse and finally fine acoustic tokens for detailed synthesis. The resulting pipeline allows AudioLM to condition on a few seconds of input audio and produce seamless continuations that retain voice identity, prosody, and recording conditions in speech, or melody, harmony, and rhythm in music. Human evaluations show that synthetic continuations are nearly indistinguishable from real recordings. -
11
Hunyuan Motion 1.0
Tencent Hunyuan
Hunyuan Motion (also known as HY-Motion 1.0) is a state-of-the-art text-to-3D motion generation AI model that uses a billion-parameter Diffusion Transformer with flow matching to turn natural language prompts into high-quality, skeleton-based 3D character animation in seconds. It understands descriptive text in English and Chinese and produces smooth, physically plausible motion sequences that integrate seamlessly into standard 3D animation pipelines by exporting to skeleton formats such as SMPL or SMPLH and common formats like FBX or BVH for use in Blender, Unity, Unreal Engine, Maya, and other tools. The model’s three-stage training pipeline (large-scale pre-training on thousands of hours of motion data, fine-tuning on curated sequences, and reinforcement learning from human feedback) enhances its ability to follow complex instructions and generate realistic, temporally coherent motion. -
12
Tencent Yuanbao
Tencent
Tencent Yuanbao is an AI-powered assistant that has quickly become popular in China, leveraging advanced large language models, including Tencent's proprietary Hunyuan model, and integrating with DeepSeek. The application excels in areas like Chinese language processing, logical reasoning, and efficient task execution. Yuanbao's popularity has surged in recent months, even surpassing competitors such as DeepSeek to top the Apple App Store download charts in China. A key driver of its growth is its deep integration into the Tencent ecosystem, particularly within WeChat, further enhancing its accessibility and functionality. This rapid rise highlights Tencent's growing ambition in the competitive AI assistant market. -
13
HunyuanVideo-Avatar
Tencent-Hunyuan
HunyuanVideo-Avatar animates any input avatar image into high-dynamic, emotion-controllable video driven by simple audio conditions. It is a multimodal diffusion transformer (MM-DiT)-based model capable of generating dynamic, emotion-controllable, multi-character dialogue videos. It accepts multi-style avatar inputs (photorealistic, cartoon, 3D-rendered, anthropomorphic) at arbitrary scales from portrait to full body. It provides a character image injection module that ensures strong character consistency while enabling dynamic motion; an Audio Emotion Module (AEM) that extracts emotional cues from a reference image to enable fine-grained emotion control over the generated video; and a Face-Aware Audio Adapter (FAA) that isolates audio influence to specific face regions via latent-level masking, supporting independent audio-driven animation in multi-character scenarios. Starting Price: Free -
14
Ferret
Apple
Ferret is an end-to-end MLLM that accepts any-form referring and grounds anything in response. Ferret Model - a hybrid region representation plus a spatial-aware visual sampler enable fine-grained and open-vocabulary referring and grounding in an MLLM. GRIT Dataset (~1.1M) - a large-scale, hierarchical, robust ground-and-refer instruction tuning dataset. Ferret-Bench - a multimodal evaluation benchmark that jointly requires referring/grounding, semantics, knowledge, and reasoning. Starting Price: Free -
15
Imagen 3
Google
Imagen 3 is the next evolution of Google's cutting-edge text-to-image AI generation technology. Building on the strengths of its predecessors, Imagen 3 offers significant advancements in image fidelity, resolution, and semantic alignment with user prompts. By employing enhanced diffusion models and more sophisticated natural language understanding, it can produce hyper-realistic, high-resolution images with intricate textures, vivid colors, and precise object interactions. Imagen 3 also introduces better handling of complex prompts, including abstract concepts and multi-object scenes, while reducing artifacts and improving coherence. With its powerful capabilities, Imagen 3 is poised to revolutionize creative industries, from advertising and design to gaming and entertainment, by providing artists, developers, and creators with an intuitive tool for visual storytelling and ideation. -
16
Cohere
Cohere
Cohere is an enterprise AI platform that enables developers and businesses to build powerful language-based applications. Specializing in large language models (LLMs), Cohere provides solutions for text generation, summarization, and semantic search. Their model offerings include the Command family for high-performance language tasks and Aya Expanse for multilingual applications across 23 languages. Focused on security and customization, Cohere allows flexible deployment across major cloud providers, private cloud environments, or on-premises setups to meet diverse enterprise needs. The company collaborates with industry leaders like Oracle and Salesforce to integrate generative AI into business applications, improving automation and customer engagement. Additionally, Cohere For AI, their research lab, advances machine learning through open-source projects and a global research community. Starting Price: Free -
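As a rough idea of how the platform is used, here is a minimal sketch with the Cohere Python SDK covering generation with a Command-family model and embeddings for semantic search; the API key and model identifiers are assumptions, so check Cohere's documentation for current names.

```python
# Minimal sketch using the Cohere Python SDK (pip install cohere).
# The API key and model names below are placeholders/assumptions.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Text generation with a Command-family model (model id assumed).
chat = co.chat(model="command-r",
               message="Summarize the benefits of semantic search in two sentences.")
print(chat.text)

# Embeddings for semantic search (model id and input_type assumed).
emb = co.embed(
    model="embed-english-v3.0",
    texts=["What is a semantic layer?", "How do I reset my password?"],
    input_type="search_document",
)
print(len(emb.embeddings), "vectors of dimension", len(emb.embeddings[0]))
```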
17
Marengo
TwelveLabs
Marengo is a multimodal video foundation model that transforms video, audio, image, and text inputs into unified embeddings, enabling powerful “any-to-any” search, retrieval, classification, and analysis across vast video and multimedia libraries. It integrates visual frames (with spatial and temporal dynamics), audio (speech, ambient sound, music), and textual content (subtitles, overlays, metadata) to create a rich, multidimensional representation of each media item. With this embedding architecture, Marengo supports robust tasks such as search (text-to-video, image-to-video, video-to-audio, etc.), semantic content discovery, anomaly detection, hybrid search, clustering, and similarity-based recommendation. The latest versions introduce multi-vector embeddings, separating representations for appearance, motion, and audio/text features, which significantly improve precision and context awareness, especially for complex or long-form content. Starting Price: $0.042 per minute -
18
Niantic Spatial
Niantic Spatial
Niantic Spatial is an advanced geospatial AI platform that bridges the physical and digital worlds through real-time spatial intelligence. Its core technologies—Reconstruct, Localize, and Understand—enable the creation of digital twins, precise centimeter-level positioning, and semantic understanding of the world at every 3D point. The platform captures and processes high-quality aerial and ground data to deliver accurate, context-aware insights for people and machines. Designed for enterprises and developers, Niantic Spatial powers applications in logistics, collaboration, and immersive experiences. From autonomous navigation to AR-based interactions, it transforms how humans and systems perceive and engage with their surroundings. Built on cutting-edge AI and large-scale mapping, Niantic Spatial makes the real world machine-readable. -
19
Happy Oyster
Alibaba
Happy Oyster is an open-ended AI “world model” platform designed for real-time world creation and interaction, enabling users to generate, explore, and continuously evolve immersive 3D environments from simple prompts. Instead of producing a fixed output, it operates as a living system that responds dynamically to user input, allowing scenes to update in real time as instructions are given through text, voice, or images. It supports multimodal interaction and maintains consistent physical logic, including lighting, gravity, motion, and scene continuity, so that generated environments behave like coherent, persistent worlds rather than isolated clips. It introduces two core modes: Directing, where users actively control scenes, adjust camera angles, guide characters, and shape narratives as they unfold; and Wandering, where users can freely explore an infinitely extendable world in a first-person perspective, moving beyond initial frames. Starting Price: Free -
20
word2vec
Google
Word2Vec is a neural network-based technique for learning word embeddings, developed by researchers at Google. It transforms words into continuous vector representations in a multi-dimensional space, capturing semantic relationships based on context. Word2Vec uses two main architectures: Skip-gram, which predicts surrounding words given a target word, and Continuous Bag-of-Words (CBOW), which predicts a target word based on surrounding words. By training on large text corpora, Word2Vec generates word embeddings where similar words are positioned closely, enabling tasks like semantic similarity, analogy solving, and text clustering. The model was influential in advancing NLP by introducing efficient training techniques such as hierarchical softmax and negative sampling. Though newer embedding models like BERT and Transformer-based methods have surpassed it in complexity and performance, Word2Vec remains a foundational method in natural language processing and machine learning research. Starting Price: Free -
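A minimal sketch of the Skip-gram idea, using the Gensim library (also listed at entry 42 below) on a deliberately tiny toy corpus; real embeddings need large corpora to be meaningful.

```python
# Toy example: train a Skip-gram Word2Vec model with Gensim and query
# nearest neighbours.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimensionality
    window=3,         # context window size
    min_count=1,      # keep every word in this toy corpus
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    negative=5,       # negative sampling
)

print(model.wv["king"][:5])                 # first 5 dimensions of one vector
print(model.wv.most_similar("king", topn=3))
```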
21
ReCap Pro
Autodesk
Reality capture software connecting the physical world to the digital. Use ReCap™ Pro 3D scanning software to create 3D models from imported photographs and laser scans. Deliver a point cloud or mesh in support of BIM processes. Collaborate across teams with design based on reality. ReCap Photo, a service included with ReCap Pro, processes drone photography to create 3D representations of current site conditions, objects, and more. It also supports the creation of point clouds, meshes, and ortho photos. Use solutions created with the ReCap Pro Software Development Kit (SDK) to quickly get reality data into Autodesk design and construction tools. Compare the scan view (RealView) and overhead map view side-by-side. Use the compass widget to set the XY axis for the user coordinate system in the overhead view. Use high-precision GPS technology to avoid costly prep work in setting ground control points and get survey-grade accuracy from photo reconstruction. Starting Price: $26 per month -
22
Seedream 4.0
ByteDance
Seedream 4.0 is a next-generation multimodal AI image generation and editing model that unifies text-to-image creation and text-guided image editing within a single architecture, delivering professional-grade visuals up to 4K resolution with exceptional fidelity and speed. It is built around an efficient diffusion transformer and variational autoencoder design that interprets text prompts and reference images to produce highly detailed, consistent outputs while handling complex semantics, lighting, and structure reliably. It also offers batch generation, multi-reference support, and precise control over edits such as style, background, or object changes without degrading the rest of the scene. Seedream 4.0 demonstrates industry-leading prompt understanding, aesthetic quality, and structural stability across generation and editing tasks, outperforming earlier versions and rival models in benchmarks for prompt adherence and visual coherence. -
23
RDFox
Oxford Semantic Technologies
The world's most performant knowledge graph and semantic reasoning engine. Founded by three professors at the University of Oxford, Oxford Semantic Technologies emerged from extensive research into Knowledge Representation and Reasoning (KRR), out of which came the most powerful knowledge graph and semantic reasoning engine on the market today, RDFox. As an AI reasoning engine, RDFox mirrors human reasoning principles. With unrivaled reasoning capabilities, relying on accuracy, truth, and explainability, it empowers the next generation of AI applications. By inferring new knowledge exclusively from factual data, RDFox ensures results are firmly grounded in reality. RDFox's incremental reasoning applies the consequences of its rules to the database in real time as data is added, changed, or removed, without needing a restart. Only the relevant information is updated, without reanalyzing the entire data set. Starting Price: Free -
24
Lapentor
Lapentor
Lapentor.com is a platform for creating immersive panoramic experiences. With a user-friendly interface, it empowers users to create captivating 360-degree content effortlessly. Customize hotspots, integrate multimedia elements, and seamlessly embed panoramas into websites or share them across social media. Lapentor.com fosters a vibrant community of panoramic enthusiasts, offering support and inspiration. Versatile and engaging, it caters to photographers, real estate agents, and educators alike, providing the tools and support needed to bring panoramic visions to life. Starting Price: $25 per month -
25
GloVe
Stanford NLP
GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm developed by the Stanford NLP Group to obtain vector representations for words. It constructs word embeddings by analyzing global word-word co-occurrence statistics from a given corpus, resulting in vector spaces where the geometric relationships reflect semantic similarities and differences among words. A notable feature of GloVe is its ability to capture linear substructures within the word vector space, enabling vector arithmetic to express relationships. The model is trained on the non-zero entries of a global word-word co-occurrence matrix, which records how frequently pairs of words appear together in a corpus. This approach efficiently leverages statistical information by focusing on significant co-occurrences, leading to meaningful word representations. Pre-trained word vectors are available for various corpora, including Wikipedia 2014. Starting Price: Free -
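The linear substructure can be demonstrated with a small sketch that loads a pre-trained GloVe text file (one word per line followed by its float components) and solves the classic analogy by vector arithmetic; the filename assumes the Stanford "glove.6B" download.

```python
# Sketch: load pre-trained GloVe vectors and solve an analogy with NumPy.
# The filename is assumed from the Stanford NLP "glove.6B" distribution.
import numpy as np

embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

def nearest(vec, exclude, topn=3):
    """Return the words whose vectors are most cosine-similar to vec."""
    sims = {
        w: float(np.dot(vec, v) / (np.linalg.norm(vec) * np.linalg.norm(v)))
        for w, v in embeddings.items() if w not in exclude
    }
    return sorted(sims, key=sims.get, reverse=True)[:topn]

# king - man + woman should land near "queen"
query = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))
```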
26
ProxyMesh
ProxyMesh
ProxyMesh helps web scrapers avoid IP bans and rate limits to crawl data quickly and easily at an affordable price. Since 2011, ProxyMesh has been providing elite anonymous rotating IP address proxy servers to thousands of customers. We strive to provide the highest quality affordable proxies designed specifically for web scraping. ProxyMesh works with the HTTP proxy protocol, so your software can already work with us. You don't need to download anything. Our proxies maintain over 99% uptime while handling hundreds of terabytes of data every month. ProxyMesh proxies provide elite level 1 anonymity, where all identifying headers are removed, so that your requests cannot be traced back to you. And each request you make with our rotating IP proxy servers goes through a randomly chosen outgoing IP address, further enhancing your anonymity. Each of our rotating proxy server locations around the world has 10 outgoing IP addresses that get rotated every 12 hours. Starting Price: $10/month -
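Because the service speaks the standard HTTP proxy protocol, routing an existing scraper through it is a one-line configuration change. A minimal sketch with the requests library is below; the hostname, port, and credentials are placeholders, so substitute the endpoint shown in your ProxyMesh dashboard.

```python
# Sketch: route a request through a rotating HTTP proxy endpoint.
# Hostname, port, and credentials below are placeholders, not real values.
import requests

PROXY = "http://USERNAME:PASSWORD@us-wa.proxymesh.com:31280"  # placeholder endpoint
proxies = {"http": PROXY, "https": PROXY}

resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(resp.json())  # shows the outgoing IP chosen by the rotating proxy
```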
27
MetaMate
MetaMate
MetaMate is an open source semantic service bus that provides a unified API for accessing diverse data sources, including APIs, blockchains, websites, and peer-to-peer networks. By mapping vendor-specific data representations onto an abstract schema graph, MetaMate enables seamless interaction with various services. Its community-driven approach allows contributors to add new types and fields, ensuring the system evolves with real-world data. The platform's type system is derived from widely adopted data transmission technologies such as GraphQL, gRPC, Thrift, and OpenAPI, facilitating compatibility across different protocols. MetaMate enforces backward compatibility programmatically, ensuring that services and applications built on it remain functional over time. Additionally, its command-line interface can generate slim, typed SDKs tailored to specific project needs, covering only the desired subset of the overall schema graph. Starting Price: Free -
28
WaveSpeedAI
WaveSpeedAI
WaveSpeedAI is a high-performance generative media platform built to dramatically accelerate image, video, and audio creation by combining cutting-edge multimodal models with an ultra-fast inference engine. It supports a wide array of creative workflows, from text-to-video and image-to-video to text-to-image, voice generation, and 3D asset creation, through a unified API designed for scale and speed. The platform integrates top-tier foundation models such as WAN 2.1/2.2, Seedream, FLUX, and HunyuanVideo, and provides streamlined access to a vast model library. Users benefit from blazing-fast generation times, real-time throughput, and enterprise-grade reliability while retaining high-quality output. WaveSpeedAI emphasizes “fast, vast, efficient” performance: fast generation of creative assets, access to a wide-ranging set of state-of-the-art models, and cost-efficient execution without sacrificing quality. -
29
ContextCapture
Bentley Systems
Create 3D models from simple photographs and/or point clouds. Reality modeling is the process of capturing the physical reality of an infrastructure asset, creating a representation of it, and maintaining it through continuous surveys. Bentley's reality modeling software, ContextCapture, provides you with real-world digital context in the form of a 3D reality mesh. A reality mesh is a 3D model of real-world conditions that contains large amounts of triangles and image data. Each digital component can be automatically recognized and/or geospatially referenced, providing you with an intuitive and immersive way to navigate, find, view, and query your asset information. You can use reality meshes in many engineering, maintenance, or GIS workflows to provide precise real-world digital context for design, construction, and operations decisions. Input comes from overlapping photos captured by drones and ground-level imagery, supplemented by laser scans where needed. -
30
Synexa
Synexa
Synexa AI enables users to deploy AI models with a single line of code, offering a simple, fast, and stable solution. It supports various functionalities, including image and video generation, image restoration, image captioning, model fine-tuning, and speech generation. Synexa provides access to over 100 production-ready AI models, such as FLUX Pro, Ideogram v2, and Hunyuan Video, with new models added weekly and zero setup required. Synexa's optimized inference engine delivers up to 4x faster performance on diffusion models, achieving sub-second generation times with FLUX and other popular models. Developers can integrate AI capabilities in minutes using intuitive SDKs and comprehensive API documentation, with support for Python, JavaScript, and REST API. Synexa offers enterprise-grade GPU infrastructure with A100s and H100s across three continents, ensuring sub-100ms latency with smart routing and a 99.9% uptime guarantee. Starting Price: $0.0125 per image -
31
SeedEdit 3.0
ByteDance
SeedEdit is a generative AI image editing model from ByteDance’s Seed team that enables text-guided, high-quality image modification by applying natural language instructions to change specific parts of an image while maintaining consistency in the rest of the scene. Built on advanced diffusion and multimodal learning techniques, later versions like SeedEdit 3.0 improve on earlier releases with enhanced fidelity, accurate instruction following, and the ability to edit at high resolution (including up to 4K outputs) while preserving original subjects, backgrounds, and fine visual details. It supports common edit tasks such as portrait retouching, background replacement, object removal, lighting and perspective changes, and stylistic transformations without manual masking or tools, and achieves higher usability and visual quality than previous models by balancing between reconstruction and regeneration of images. -
32
Seaweed
ByteDance
Seaweed is a foundational AI model for video generation developed by ByteDance. It utilizes a diffusion transformer architecture with approximately 7 billion parameters, trained on a compute equivalent to 1,000 H100 GPUs. Seaweed learns world representations from vast multi-modal data, including video, image, and text, enabling it to create videos of various resolutions, aspect ratios, and durations from text descriptions. It excels at generating lifelike human characters exhibiting diverse actions, gestures, and emotions, as well as a wide variety of landscapes with intricate detail and dynamic composition. Seaweed offers enhanced controls, allowing users to generate videos from images by providing an initial frame to guide consistent motion and style throughout the video. It can also condition on both the first and last frames to create transition videos, and be fine-tuned to generate videos based on reference images. -
33
Composer 1
Cursor
Composer is Cursor’s custom-built agentic AI model optimized specifically for software engineering tasks and designed to power fast, interactive coding assistance directly within the Cursor IDE, a VS Code-derived editor enhanced with intelligent automation. It is a mixture-of-experts model trained with reinforcement learning (RL) on real-world coding problems across large codebases, so it can produce high-speed, context-aware responses, from code edits and planning to answers that understand project structure, tools, and conventions, with generation speeds roughly four times faster than similar models in benchmarks. Composer is specialized for development workflows, leveraging long-context understanding, semantic search, and limited tool access (like file editing and terminal commands) so it can solve complex engineering requests with efficient and practical outputs. Starting Price: $20 per month -
34
PanoramaStudio
Tobias Hüllmandel Software
PanoramaStudio creates seamless 360-degree and wide-angle panoramic images. This program combines the simple creation of perfect panoramic images within a few steps with ambitious postprocessing features for advanced users. Clear and simple user interface, large work space. Automatic alignment of the images. Seamless blending into a panoramic image. Manual postprocessing of all steps possible. Automatic focal length detection, automatic correction of lens distortions. Automatic exposure correction. Interactive panoramas can be connected to virtual tours using hotspots. Filters for additional image editing. Export your panoramas in various image formats, as screensavers, and as interactive 3D panoramas or zoom images for websites. Print panoramas in poster size on multiple pages. Save panoramas as a multi-layered image for professional post-processing. Starting Price: $39.95 one-time payment -
35
Veo 3.1
Google
Veo 3.1 builds on the capabilities of the previous model to enable longer and more versatile AI-generated videos. With this version, users can create multi-shot clips guided by multiple prompts, generate sequences from three reference images, and use frames in video workflows that transition between a start and end image, both with native, synchronized audio. The scene extension feature allows extension of a final second of a clip by up to a full minute of newly generated visuals and sound. Veo 3.1 supports editing of lighting and shadow parameters to improve realism and scene consistency, and offers advanced object removal that reconstructs backgrounds to remove unwanted items from generated footage. These enhancements make Veo 3.1 sharper in prompt-adherence, more cinematic in presentation, and broader in scale compared to shorter-clip models. Developers can access Veo 3.1 via the Gemini API or through the tool Flow, targeting professional video workflows. -
36
GLM-Image
Z.ai
GLM-Image is a next-generation, open source image generation model developed by Z.ai, designed to combine deep language understanding with high-fidelity visual synthesis. Unlike traditional diffusion-only models, it uses a hybrid architecture that integrates an autoregressive language model with a diffusion decoder, enabling it to first reason about the structure, meaning, and relationships within a prompt before generating the image itself. This approach allows GLM-Image to excel in scenarios that require precise semantic control, such as generating infographics, presentation slides, posters, and diagrams with accurate embedded text and complex layouts. With a total of around 16 billion parameters, the model achieves strong performance in rendering readable, correctly placed text within images, an area where many image models struggle, while maintaining detailed visual quality and consistency. -
37
Genie 3
Google DeepMind
Genie 3 is DeepMind’s next-generation, general-purpose world model capable of generating richly interactive 3D environments in real time at 24 frames per second and 720p resolution that remain consistent for several minutes. Prompted by text input, the system constructs dynamic virtual worlds where users (or embodied agents) can navigate and interact with natural phenomena from multiple perspectives, like first-person or isometric. A standout feature is its emergent long-horizon visual memory: Genie 3 maintains environmental consistency over extended durations, preserving off-screen elements and spatial coherence across revisits. It also supports “promptable world events,” enabling users to modify scenes, such as changing weather or introducing new objects, on the fly. Designed to support embodied agent research, Genie 3 seamlessly integrates with agents like SIMA, facilitating goal-based navigation and complex task accomplishment. -
38
Imagen 2
Google
Imagen 2 is a state-of-the-art AI-powered text-to-image generation model developed by Google Research. It leverages advanced diffusion models and large-scale language understanding to produce highly detailed, photorealistic images from natural language prompts. Imagen 2 builds on its predecessor, Imagen, with improved resolution, finer texture details, and enhanced semantic coherence, allowing for more accurate visual representations of complex and abstract concepts. Its unique blend of vision and language models enables it to handle a wide range of artistic, conceptual, and realistic image styles. This breakthrough technology has broad applications in fields like content creation, design, and entertainment, pushing the boundaries of creative AI. -
39
E5 Text Embeddings
Microsoft
E5 Text Embeddings, developed by Microsoft, are advanced models designed to convert textual data into meaningful vector representations, enhancing tasks like semantic search and information retrieval. These models are trained using weakly-supervised contrastive learning on a vast dataset of over one billion text pairs, enabling them to capture intricate semantic relationships across multiple languages. The E5 family includes models of varying sizes—small, base, and large—offering a balance between computational efficiency and embedding quality. Additionally, multilingual versions of these models have been fine-tuned to support diverse languages, ensuring broad applicability in global contexts. Comprehensive evaluations demonstrate that E5 models achieve performance on par with state-of-the-art, English-only models of similar sizes. Starting Price: Free -
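A minimal sketch of using an E5 checkpoint from the Hugging Face Hub for retrieval-style similarity, assuming the commonly published "intfloat/e5-base-v2" model id and the "query:"/"passage:" prefix convention that E5 models expect; check the model card for the exact names.

```python
# Sketch: embed a query and a passage with an E5 checkpoint (model id assumed),
# mean-pool the token embeddings, and compare by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")  # assumed model id
model = AutoModel.from_pretrained("intfloat/e5-base-v2")

texts = [
    "query: how do word embeddings capture meaning?",
    "passage: Embedding models map text to dense vectors so that semantically similar texts are close together.",
]

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# Mean-pool token embeddings, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1)
emb = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
emb = F.normalize(emb, dim=-1)

print("cosine similarity:", float(emb[0] @ emb[1]))
```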
40
Traffic Director
Google
Toil-free traffic management for your service mesh. Service mesh is a powerful abstraction that's become increasingly popular to deliver microservices and modern applications. In a service mesh, the service mesh data plane, with service proxies like Envoy, moves the traffic around and the service mesh control plane provides policy, configuration, and intelligence to these service proxies. Traffic Director is GCP's fully managed traffic control plane for service mesh. With Traffic Director, you can easily deploy global load balancing across clusters and VM instances in multiple regions, offload health checking from service proxies, and configure sophisticated traffic control policies. Traffic Director uses open xDSv2 APIs to communicate with the service proxies in the data plane, which ensures that you are not locked into a proprietary interface.
-
41
Gemini Embedding 2
Google
Gemini Embedding models, including the newer Gemini Embedding 2, are part of Google’s Gemini AI ecosystem and are designed to convert text, phrases, sentences, and code into numerical vector representations that capture their semantic meaning. Unlike generative models that produce new content, the embedding model transforms input data into dense vectors that represent meaning in a mathematical format, allowing computers to compare and analyze information based on conceptual similarity rather than exact wording. These embeddings enable applications such as semantic search, recommendation systems, document retrieval, clustering, classification, and retrieval-augmented generation pipelines. The model can process input in more than 100 languages and supports up to 2048 tokens per request, allowing it to embed longer pieces of text or code while maintaining strong contextual understanding. Starting Price: Free -
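A minimal sketch of requesting an embedding through the google-generativeai Python package; the API key is a placeholder and the model id shown is an assumption (Google's docs list identifiers such as text-embedding-004, and the newer Gemini embedding model may use a different id), so verify the name against the current documentation.

```python
# Sketch using the google-generativeai package (pip install google-generativeai).
# The API key and model id below are placeholders/assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

result = genai.embed_content(
    model="models/text-embedding-004",   # assumed model id
    content="Service meshes route traffic between microservices.",
    task_type="retrieval_document",
)

vector = result["embedding"]
print(len(vector), vector[:5])
```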
42
Gensim
Radim Řehůřek
Gensim is a free, open source Python library designed for unsupervised topic modeling and natural language processing, focusing on large-scale semantic modeling. It enables the training of models like Word2Vec, FastText, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA), facilitating the representation of documents as semantic vectors and the discovery of semantically related documents. Gensim is optimized for performance with highly efficient implementations in Python and Cython, allowing it to process arbitrarily large corpora using data streaming and incremental algorithms without loading the entire dataset into RAM. It is platform-independent, running on Linux, Windows, and macOS, and is licensed under the GNU LGPL, promoting both personal and commercial use. The library is widely adopted, with thousands of companies utilizing it daily, over 2,600 academic citations, and more than 1 million downloads per week. Starting Price: Free -
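Since the Word2Vec sketch at entry 20 above already shows Gensim's embedding API, here is a minimal topic-modeling sketch with its LDA implementation on a toy set of tokenized documents.

```python
# Minimal LDA sketch with Gensim: build a dictionary and bag-of-words corpus
# from tokenized documents, then fit a small topic model.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["graph", "minors", "survey", "trees"],
    ["human", "machine", "interface", "survey"],
    ["trees", "graph", "paths", "minors"],
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```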
43
ActiViz
Kitware
ActiViz is a 3D visualization library for .NET C# and Unity, enabling developers to integrate advanced 3D visualization into their applications seamlessly. Built around the open source Visualization Toolkit (VTK), ActiViz supports a wide array of visualization algorithms, including scalar, vector, tensor, texture, and volumetric methods. It also offers advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation. ActiViz allows for the rapid development of production-ready, interactive 3D applications within the .NET environment and provides support for Windows Presentation Foundation (WPF). Integration with Unity software is also possible, expanding its applicability to game development and interactive simulations. Recent improvements in ActiViz 9.4 include support for multiple .NET versions from .NET Framework 4.0 to .NET 8, and curved planar reformation to create panoramic views. Starting Price: Free -
44
Qwen-Image
Alibaba
Qwen-Image is a multimodal diffusion transformer (MMDiT) foundation model offering state-of-the-art image generation, text rendering, editing, and understanding. It excels at complex text integration, seamlessly embedding alphabetic and logographic scripts into visuals with typographic fidelity, and supports diverse artistic styles from photorealism to impressionism, anime, and minimalist design. Beyond creation, it enables advanced image editing operations such as style transfer, object insertion or removal, detail enhancement, in-image text editing, and human pose manipulation through intuitive prompts. Its built-in vision understanding tasks, including object detection, semantic segmentation, depth and edge estimation, novel view synthesis, and super-resolution, extend its capabilities into intelligent visual comprehension. Qwen-Image is accessible via popular libraries like Hugging Face Diffusers and integrates prompt-enhancement tools for multilingual support. Starting Price: Free -
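A minimal sketch of text-to-image generation through Hugging Face Diffusers; the repository id, dtype, GPU assumption, and step count are illustrative only, so consult the Qwen-Image model card for the supported pipeline class and recommended settings.

```python
# Sketch: generate an image with Hugging Face Diffusers.
# The Hub repository id and settings below are assumptions, not confirmed values.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",          # assumed Hub repository id
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")          # assumes a CUDA-capable GPU

image = pipe(
    prompt='A storefront sign that reads "OPEN 24 HOURS", photorealistic, evening light',
    num_inference_steps=30,
).images[0]
image.save("qwen_image_demo.png")
```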
45
Codd AI
Codd AI
Codd AI solves one of the biggest problems in analytics: making data truly business-ready. Instead of teams spending weeks manually mapping schemas, building models, and defining metrics, Codd uses generative AI to automatically create a context-aware semantic layer that aligns technical data with your business language. That means business users can ask questions in plain English and get accurate, governed answers instantly—through BI tools, conversational AI, or any endpoint. With governance and auditability built in, Codd makes analytics faster, clearer, and more trustworthy. Codd AI ingests both technical metadata from your database and business rules and logic, using AI to auto-generate the most comprehensive semantic layer. This semantic layer is embedded in an intelligent query agent to power natural language (NLP) conversational analytics or traditional BI tools. Starting Price: $25k per year -
46
Magic3D
Magic3D
Magic3D can create high-quality 3D textured mesh models from input text prompts. It utilizes a coarse-to-fine strategy leveraging both low- and high-resolution diffusion priors for learning the 3D representation of the target content. Magic3D synthesizes 3D content with 8× higher-resolution supervision than DreamFusion while also being 2× faster. Given a coarse model generated with a base text prompt, users can modify parts of the text in the prompt and then fine-tune the NeRF and 3D mesh models to obtain an edited high-resolution 3D mesh. Together with image conditioning techniques and a prompt-based editing approach, this gives users new ways to control 3D synthesis, opening up avenues to various creative applications. -
47
Mirage 2
Dynamics Lab
Mirage 2 is an AI-driven Generative World Engine that lets anyone instantly transform images or descriptions into fully playable, interactive game environments directly in the browser. Upload sketches, concept art, photos, or prompts, like “Ghibli-style village” or “Paris street scene”, and Mirage 2 builds immersive worlds you can explore in real time. The experience isn’t pre-scripted: you can modify your world mid-play using natural-language chat, evolving settings dynamically, from a cyberpunk city to a rainforest or a mountaintop castle, all with minimal latency (around 200 ms) on a single consumer GPU. Mirage 2 supports smooth rendering, real-time prompt control, and extended gameplay stretches beyond ten minutes. It outpaces earlier world-model systems by offering true general-domain generation, no upper limit on styles or genres, as well as seamless world adaptation and sharing features. -
48
Amazon Titan
Amazon
Amazon Titan is a series of advanced foundation models (FMs) from AWS, designed to enhance generative AI applications with high performance and flexibility. Built on AWS's 25 years of AI and machine learning experience, Titan models support a range of use cases such as text generation, summarization, semantic search, and image generation. Titan models are optimized for responsible AI use, incorporating built-in safety features and fine-tuning capabilities. They can be customized with your own data through Retrieval Augmented Generation (RAG) to improve accuracy and relevance, making them ideal for both general-purpose and specialized AI tasks. -
49
Animant
Animant
Introducing a tool that blends your imagination and the world around you to create engaging experiences. Animant was designed with AR at the center, so you can visualize interactive 3D experiences within your real world and bring your real world into a virtual one. Create a detailed 3D scan of any object with your camera. Import scans into your scene, or export them for other apps. From external lighting to physics support, your scenes can feel like a natural extension of your world. Captions let you add words to the bottom or over your scene with markdown formatting. Animant can even read your captions aloud as part of your storyline. Create a texture from a photo and apply it to an object, or take panoramic photos of your world and set them as your scene's environment. Starting Price: $5.99 per month -
50
Panoramic
Panoramic
Panoramic is an enterprise SaaS company that provides the world’s most successful brands with the tools they need to ingest and model marketing data into meaningful insights. Its team of data scientists and marketing analysts works with marketers to build a customized internal data platform used across the organization for data analysis, benchmarking, internal collaboration, and more. Its platform shines a spotlight on the insights agile marketers care about most giving them the confidence to make strategic decisions in support of key objectives. Founded in 2018, Panoramic is headquartered in Los Angeles with offices in New York, San Francisco, Washington D.C., Montreal, London, Bratislava, Prague, Santiago, and Manila. Learn more at panoramicHQ.com.