Showing 98 open source projects for "text to video"

View related business solutions
  • Outbound sales software Icon
    Outbound sales software

    Unified cloud-based platform for dialing, emailing, appointment scheduling, lead management and much more.

    Adversus is an outbound dialing solution that helps you streamline your call strategies, automate manual processes, and provide valuable insights to improve your outbound workflows and efficiency.
    Learn More
  • MicroStation by Bentley Systems is the trusted computer-aided design (CAD) software built specifically for infrastructure design. Icon
    MicroStation by Bentley Systems is the trusted computer-aided design (CAD) software built specifically for infrastructure design.

    Microstation enables architects, engineers, and designers to create precise 2D and 3D drawings that bring complex projects to life.

    MicroStation is the only computer-aided design software for infrastructure design, helping architects and engineers like you bring their vision to life, present their designs to their clients, and deliver their projects to the community.
    Learn More
  • 1
    Video-subtitle-extractor

    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    Video hard subtitle extraction, generate srt file. There is no need to apply for a third-party API, and text recognition can be implemented locally. A deep learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.
    Downloads: 54 This Week
    Last Update:
    See Project
  • 2
    Video-subtitle-remover (VSR)

    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    Video Subtitle Remover is an AI-based application designed to remove hardcoded subtitles from videos and generate new files without the embedded text. Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. ...
    Downloads: 135 This Week
    Last Update:
    See Project
  • 3
    Video Diffusion - Pytorch

    Video Diffusion - Pytorch

    Implementation of Video Diffusion Models

    ...Any new developments for text-to-video synthesis will be centralized at Imagen-pytorch. For conditioning on text, they derived text embeddings by first passing the tokenized text through BERT-large. You can also directly pass in the descriptions of the video as strings, if you plan on using BERT-base for text conditioning. This repository also contains a handy Trainer class for training on a folder of gifs.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Loan management software that makes it easy. Icon
    Loan management software that makes it easy.

    Ideal for lending professionals who are looking for a feature rich loan management system

    Bryt Software is ideal for lending professionals who are looking for a feature rich loan management system that is intuitive and easy to use. We are 100% cloud-based, software as a service. We believe in providing our customers with fair and honest pricing. Our monthly fees are based on your number of users and we have a minimal implementation charge.
    Learn More
  • 5
    Make-A-Video - Pytorch (wip)

    Make-A-Video - Pytorch (wip)

    Implementation of Make-A-Video, new SOTA text to video generator

    Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch. They combine pseudo-3d convolutions (axial convolutions) and temporal attention and show much better temporal fusion. The pseudo-3d convolutions isn't a new concept. It has been explored before in other contexts, say for protein contact prediction as "dimensional hybrid residual networks".
    Downloads: 9 This Week
    Last Update:
    See Project
  • 6
    Wan2.2

    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    ...The model is trained on significantly larger datasets than its predecessor, greatly enhancing motion complexity, semantic understanding, and aesthetic diversity. Wan2.2 also open-sources a 5-billion parameter high-compression VAE-based hybrid text-image-to-video (TI2V) model that supports 720P video generation at 24fps on consumer-grade GPUs like the RTX 4090. It supports multiple video generation tasks including text-to-video.
    Downloads: 180 This Week
    Last Update:
    See Project
  • 7
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. ...
    Downloads: 79 This Week
    Last Update:
    See Project
  • 8
    CogVideo

    CogVideo

    Text and image to video generation: CogVideoX and CogVideo

    CogVideo is an open-source family of advanced video generation models that can create videos from text, images, or existing video inputs. Built on large-scale Transformer and diffusion architectures, it enables multimodal generation across text-to-video, image-to-video, and video continuation tasks. The latest CogVideoX models offer higher resolution outputs, longer video durations, and improved controllability through prompt engineering. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 9
    Phenaki - Pytorch

    Phenaki - Pytorch

    Implementation of Phenaki Video, which uses Mask GIT

    ...This repository will also endeavor to allow the researcher to train on text-to-image and then text-to-video. Similarly, for unconditional training, the researcher should be able to first train on images and then fine tune on video.
    Downloads: 4 This Week
    Last Update:
    See Project
  • The AI workplace management platform Icon
    The AI workplace management platform

    Plan smart spaces, connect teams, manage assets, and get insights with the leading AI-powered operating system for the built world.

    By combining AI workflows, predictive intelligence, and automated insights, OfficeSpace gives leaders a complete view of how their spaces are used and how people work. Facilities, IT, HR, and Real Estate teams use OfficeSpace to optimize space utilization, enhance employee experience, and reduce portfolio costs with precision.
    Learn More
  • 10
    AutoCut

    AutoCut

    Cut videos with a text editor

    AutoCut is an innovative tool that lets users edit and cut videos using a text-centric workflow instead of a traditional video editor. AutoCut automatically generates subtitles or transcripts for uploaded videos, and users can simply edit the text file to select the segments of the video they want to keep. This approach transforms video editing into a textual editing task, greatly lowering the barrier to editing for users who find traditional video editors complex or unintuitive. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Memvid

    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 66 This Week
    Last Update:
    See Project
  • 12
    LTX-2.3

    LTX-2.3

    Official Python inference and LoRA trainer package

    LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while...
    Downloads: 154 This Week
    Last Update:
    See Project
  • 13
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license, with a pyproject.toml and uv-based workflow that makes installation and execution reproducible. Configuration is handled through JSON files that tell MCP clients how to launch the server (typically via uvx minimax-mcp) and which environment variables to use for the API key, host, and output directory. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    Open-Sora

    Open-Sora

    Open-Sora: Democratizing Efficient Video Production for All

    Open-Sora is an open-source initiative aimed at democratizing high-quality video production. It offers a user-friendly platform that simplifies the complexities of video generation, making advanced video techniques accessible to everyone. The project embraces open-source principles, fostering creativity and innovation in content creation. Open-Sora provides tools, models, and resources to create high-quality videos, aiming to lower the entry barrier for video production and support diverse...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 15
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 37 This Week
    Last Update:
    See Project
  • 17
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 18
    BlogWizard

    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 21
    VideoCaptioner

    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps. After transcription, large language models are used to intelligently restructure subtitles into natural sentences, correct wording, and improve readability for viewers. ...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 22
    stt

    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 23
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 24
    OpenMontage

    OpenMontage

    World's first open-source, agentic video production system

    OpenMontage is an open-source, agent-driven video production system that transforms AI coding assistants into fully automated multimedia creation pipelines. Instead of focusing on a single capability such as text-to-video generation, it treats video production as a structured, multi-stage workflow that mirrors how a real production team operates, including research, scripting, asset generation, editing, and final rendering.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 25
    yt-fts

    yt-fts

    Search all of YouTube from the command line

    yt-fts, short for YouTube Full Text Search, is an open-source command-line tool that enables users to search the spoken content of YouTube videos by indexing their subtitles. The program automatically downloads subtitles from a specified YouTube channel using the yt-dlp utility and stores them in a local SQLite database. Once indexed, users can perform full-text searches across all transcripts to quickly locate keywords or phrases mentioned within the videos.
    Downloads: 16 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB