Showing 15 open source projects for "ffmpeg"

View related business solutions
  • Iris Powered By Generali - Iris puts your customer in control of their identity. Icon
    Iris Powered By Generali - Iris puts your customer in control of their identity.

    Increase customer and employee retention by offering Onwatch identity protection today.

    Iris Identity Protection API sends identity monitoring and alerts data into your existing digital environment – an ideal solution for businesses that are looking to offer their customers identity protection services without having to build a new product or app from scratch.
    Learn More
  • Data management solutions for confident marketing Icon
    Data management solutions for confident marketing

    For companies wanting a complete Data Management solution that is native to Salesforce

    Verify, deduplicate, manipulate, and assign records automatically to keep your CRM data accurate, complete, and ready for business.
    Learn More
  • 1
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    ...It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. It separates client-side media handling from backend AI processing, reducing data exposure while still enabling transcription and document generation. AI-Media2Doc supports flexible customization through prompts, allowing users to tailor output styles based on their needs. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 3
    JavaCV

    JavaCV

    Java interface to OpenCV, FFmpeg, and more

    JavaCV uses wrappers from the JavaCPP Presets of commonly used libraries by researchers in the field of computer vision (OpenCV, FFmpeg, libdc1394, FlyCapture, Spinnaker, OpenKinect, librealsense, CL PS3 Eye Driver, videoInput, ARToolKitPlus, flandmark, Leptonica, and Tesseract) and provides utility classes to make their functionality easier to use on the Java platform, including Android. JavaCV also comes with hardware accelerated full-screen image display (CanvasFrame and GLCanvasFrame), easy-to-use methods to execute code in parallel on multiple cores (Parallel), user-friendly geometric and color calibration of cameras and projectors (GeometricCalibrator, ProCamGeometricCalibrator, ProCamColorCalibrator), detection and matching of feature points (ObjectFinder), a set of classes that implement direct image alignment of projector-camera systems (mainly GNImageAligner, ProjectiveTransformer, ProjectiveColorTransformer, ProCamTransformer, and ReflectanceInitializer), and more.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    ChatTTS webUI & API

    ChatTTS webUI & API

    A simple native web interface that uses ChatTTS to synthesize text

    ChatTTS-ui is a local web interface and API wrapper around the ChatTTS speech synthesis system, designed to make advanced TTS models easy to use from a browser. It runs a small backend server (Python + Torch + ffmpeg) and exposes a simple webpage where you can type text, adjust parameters, and generate audio. The project supports Chinese, English, and mixed text with digits and control symbols, making it suitable for bilingual content and numerically heavy text like announcements or prompts. From version 0.96 onward, ffmpeg installation is required for deployment, and previous CSV/PT voice tables are no longer valid, so users instead work with updated “voice value” parameters. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Agentic AI SRE built for Engineering and DevOps teams. Icon
    Agentic AI SRE built for Engineering and DevOps teams.

    No More Time Lost to Troubleshooting

    NeuBird AI's agentic AI SRE delivers autonomous incident resolution, helping team cut MTTR up to 90% and reclaim engineering hours lost to troubleshooting.
    Learn More
  • 5
    StemRoller

    StemRoller

    Isolate vocals, drums, bass, and other instrumental stems from songs

    StemRoller is the first free app that enables you to separate vocal and instrumental stems from any song with a single click! StemRoller uses Facebook's state-of-the-art Demucs algorithm for demixing songs and integrates search results from YouTube. Simply type the name/artist of any song into the search bar and click the Split button that appears in the results! You'll need to wait several minutes for splitting to complete. Once stems have been extracted, you'll see an Open button next to...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 6
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets...
    Downloads: 31 This Week
    Last Update:
    See Project
  • 7
    ChatGPT Telegram Bot

    ChatGPT Telegram Bot

    A Telegram bot that integrates with OpenAI's official ChatGPT APIs

    A Telegram bot that integrates with OpenAI's official ChatGPT, DALL·E and Whisper APIs to provide answers. Ready to use with minimal configuration required.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    ...Under the hood, easyVoice uses a modern stack with Vue 3 and Element Plus on the front end, Node.js and Express on the back end, and TTS engines such as Microsoft Azure TTS and OpenAI-compatible APIs, orchestrated through ffmpeg.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 4 This Week
    Last Update:
    See Project
  • The AI workplace management platform Icon
    The AI workplace management platform

    Plan smart spaces, connect teams, manage assets, and get insights with the leading AI-powered operating system for the built world.

    By combining AI workflows, predictive intelligence, and automated insights, OfficeSpace gives leaders a complete view of how their spaces are used and how people work. Facilities, IT, HR, and Real Estate teams use OfficeSpace to optimize space utilization, enhance employee experience, and reduce portfolio costs with precision.
    Learn More
  • 10
    Warlock-Studio

    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 11
    MyBox

    MyBox

    Easy Tools of PDF, Image, File, Network, Data, and Medias

    javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contain packages need not java env nor installation. Jar packages need Java 16 or higher.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DCVGAN

    DCVGAN

    DCVGAN: Depth Conditional Video Generation, ICIP 2019.

    This paper proposes a new GAN architecture for video generation with depth videos and color videos. The proposed model explicitly uses the information of depth in a video sequence as additional information for a GAN-based video generation scheme to make the model understands scene dynamics more accurately. The model uses pairs of color video and depth video for training and generates a video using the two steps. Generate the depth video to model the scene dynamics based on the geometrical...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    3D ResNets for Action Recognition

    3D ResNets for Action Recognition

    3D ResNets for Action Recognition (CVPR 2018)

    We uploaded the pretrained models described in this paper including ResNet-50 pretrained on the combined dataset with Kinetics-700 and Moments in Time. We significantly updated our scripts. If you want to use older versions to reproduce our CVPR2018 paper, you should use the scripts in the CVPR2018 branch.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    JAVT - Just Another Voice Transformer

    JAVT - Just Another Voice Transformer

    Just Another Speech Recognition and Text to Speech software.

    JAVT or Just Another Voice Transformer (formerly, it is called Just Another Video Transcriber) is a Speech Recognition software that also support text to Speech and simple media conversion. JAVT allows you to convert from video files to audio wav file using ffmpeg, and then transcribe the audio file to text using either Microsoft SAPI or CMU Sphinx. You can also open a text file and allow JAVT to read it out for you through text to speech conversion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    voicecommand

    voicecommand

    Run Bash commands using Google voice recognition

    This simple pygtk application uses ffmpeg and arecord; to record sound Google's unofficial text to speech service; to convert sound to text The python subprocess module to run the text as a shell command. The text to speech service used by this application is unofficial, and this program should therefore be considered a complete hack.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB