Showing 3 open source projects for "foo_input_dvda.fb2k-component"

View related business solutions
  • Marketing automation for any business | ActiveCampaign Icon
    Marketing automation for any business | ActiveCampaign

    Your team of AI agents handles email, SMS, WhatsApp and more for you

    Active Intelligence revolutionizes how you work. You guide direction while AI handles execution, acts on insights, and shows you the path forward. It's how marketing should be.
    Learn More
  • The AI coach for teams, built on validated assessments. Icon
    The AI coach for teams, built on validated assessments.

    Cloverleaf is an assessment-backed AI Coach that fully understands your people and the context of their workday.

    Give managers and teams proactive, contextual coaching to lead effectively, communicate clearly, and navigate real work situations as they happen.
    Learn More
  • 1
    DeepSeek VL2

    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    ...“What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. The repository includes evaluation results (e.g. image/text alignment scores, common VL benchmarks), configuration files, and model weights (where permitted). While the internal architecture details are not fully documented publicly, the repo suggests that VL2 introduces enhancements over prior vision-language models (e.g. better scaling, cross-modal attention, more robust alignment) to improve grounding and multimodal understanding.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. Through its architecture, Step-Audio supports multilingual interaction, dialects, emotional tones (joy, sadness, etc.), and even more creative speech styles (like rap or singing), while allowing dynamic control over speech characteristics. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    PokeeResearch-7B

    PokeeResearch-7B

    Pokee Deep Research Model Open Source Repo

    ...The repository includes evaluation results on multi-step QA and research benchmarks, illustrating how web-time context boosts accuracy. Because the system is modular, you can swap the search component, reader, or policy to fit private deployments or different data domains. It’s aimed at developers who want a transparent, hackable research agent they can run locally or wire into existing workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB