Showing 9 open source projects for "tamil speech recognition"

View related business solutions
  • Empowering Companies To Excel In Safety Data Sheet Compliance Icon
    Empowering Companies To Excel In Safety Data Sheet Compliance

    For any organization using chemicals that require Safety Data Sheets

    Effortless setup and maintenance: Simplified management and seamless online access to safety data sheets for your team
    Learn More
  • Bitdefender Ultimate Small Business Security Icon
    Bitdefender Ultimate Small Business Security

    Protect the big future of your small business

    Get exceptional protection against all digital threats for your business and employees.
    Learn More
  • 1
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 2
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Xorbits Inference

    Xorbits Inference

    Replace OpenAI GPT with another LLM in your app

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • We help you deliver Virtual and Hybrid Events using our Award Winning end-to-end Event Management Platform Icon
    We help you deliver Virtual and Hybrid Events using our Award Winning end-to-end Event Management Platform

    Designed by event planners for event planners, the EventsAIR platform gives you the ability to manage your event, conference, meeting or function with

    EventsAIR have been anticipating and responding to the ever-changing event industry needs for over 30 years, providing innovative solutions that empower event organizers to create successful events around the globe.
    Learn More
  • 5
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    ...Very strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning) and often outperforming or matching single-modality models at a similar scale. Real-time streaming responses, including natural speech synthesis (text-to-speech) and chunked inputs for low latency interaction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Streamer-Sales

    Streamer-Sales

    LLM Large Model of Selling Anchor

    ...By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make purchasing decisions. The system integrates multiple AI technologies including retrieval-augmented generation to incorporate product knowledge, speech synthesis to convert generated scripts into voice output, and digital human generation to create virtual hosts. It also supports automatic speech recognition and agent-based tools that can retrieve additional information such as logistics or product details during live sessions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Fun Audio Chat

    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system supports dynamic audio input and output, meaning it can handle different voices, tones, and conversational contexts without forcing users into typed interactions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Discover the power of eDiscovery for law firms. Icon
    Discover the power of eDiscovery for law firms.

    Streamline your legal processes and ensure compliance with our eDiscovery company.

    DWR eDiscovery allows legal professionals to process, analyze, review, and produce documents that are relevant to litigation and other legal disclosure obligations. Our tools allow easy ingestion and analysis of client and opposing party documents using a comprehensive set of document review features including AI search, keyword search, keyword highlighting, metadata filtering, marking documents, privilege log management, redactions, and a range of analysis tools to help users best understand their document corpus.
    Learn More
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB