Showing 15 open source projects for "text analysis linguistic"

  • 1
    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    ...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...
    Downloads: 1 This Week
  • 2
    InternVL

    A Pioneering Open-Source Alternative to GPT-4o

    InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a...
    Downloads: 0 This Week
  • 3
    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 3 This Week
  • 4
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. ...
    Downloads: 1 This Week
  • 5
    WeClone

    One-stop solution for creating your digital avatar from chat history

    WeClone is an open source AI project designed to replicate a person’s conversational style and personality by training models on chat history data. The system analyzes message patterns, linguistic style, and contextual behavior in order to generate responses that resemble the original user’s communication style. It is intended primarily as an experimental exploration of digital personality modeling and conversational AI personalization. By processing large volumes of conversation data,...
    Downloads: 1 This Week
  • 6
    TAME LLM

    Traditional Mandarin LLMs for Taiwan

    TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. ...
    Downloads: 0 This Week
  • 7
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    ...The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
  • 8
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.) and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. ...
    Downloads: 0 This Week
  • 9
    DocETL

    A system for agentic LLM-powered data processing and ETL

    ...The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.
    Downloads: 3 This Week
  • 10
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and to output text. An instruction-tuned version, Qwen-Audio-Chat, supports multi-round conversational interaction, audio + text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 different audio tasks and achieves strong performance across multiple benchmarks...
    Downloads: 1 This Week
  • 11
    Scikit-LLM

    Seamlessly integrate LLMs into scikit-learn

    Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. At the moment, the majority of the Scikit-LLM estimators are compatible only with some of the OpenAI models, so a user-provided OpenAI API key is required. Additionally, Scikit-LLM ensures that the obtained response contains a valid label. If it does not, a label is selected randomly (label probabilities are proportional to label occurrences in the training set). ...
    Downloads: 3 This Week
  • 12
    LongBench

    LongBench v2 and LongBench (ACL '25 & '24)

    ...Traditional language model benchmarks typically evaluate tasks involving relatively short inputs, which does not reflect many real-world applications such as analyzing large documents or entire code repositories. LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.
    Downloads: 0 This Week
  • 13
    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    ...The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. The core concept of the framework is the use of semantic operators, which extend traditional relational database operations to support reasoning over text and other unstructured data. These operators allow tasks such as semantic filtering, ranking, clustering, and summarization to be expressed directly within data processing pipelines. ...
    Downloads: 3 This Week
  • 14
    Ailice

    AIlice is a fully autonomous, general-purpose AI agent

    AIlice is an open-source autonomous AI agent framework built to function as a general-purpose assistant that can plan, decompose, and execute complex tasks through a structured multi-agent architecture. The project presents itself as a standalone assistant powered by open-source language models, with an internal design that treats user requests almost like executable programs rather than simple chat prompts. Its core IACT architecture allows the system to break large goals into smaller...
    Downloads: 0 This Week
  • 15
    YAYI

    Repo for YaYi Chinese LLMs based on LLaMA 2 & BLOOM

    YAYI is an open-source large language model project developed to provide a multilingual conversational AI system capable of performing a wide variety of natural language processing tasks. The model is trained on diverse datasets covering multiple languages and domains so that it can support applications ranging from dialogue systems to text analysis and knowledge retrieval. The architecture is based on transformer-style language models optimized for conversational understanding and generation. In addition to producing coherent responses, the system is designed to handle tasks such as summarization, translation, question answering, and text classification. The repository provides model checkpoints, training resources, and inference tools that allow developers to deploy the model in their own applications. ...
    Downloads: 0 This Week