Search Results for "text analysis linguistic" - Page 2

Showing 146 open source projects for "text analysis linguistic"

  • 1
    WeClone

    One-stop solution for creating your digital avatar from chat history

    WeClone is an open source AI project designed to replicate a person’s conversational style and personality by training models on chat history data. The system analyzes message patterns, linguistic style, and contextual behavior in order to generate responses that resemble the original user’s communication style. It is intended primarily as an experimental exploration of digital personality modeling and conversational AI personalization. By processing large volumes of conversation data,...
    Downloads: 0 This Week
  • 2
    NetworkX

    Network analysis in Python

    NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Data structures for graphs, digraphs, and multigraphs. Many standard graph algorithms. Network structure and analysis measures. Generators for classic graphs, random graphs, and synthetic networks. Nodes can be "anything" (e.g., text, images, XML records). Edges can hold arbitrary data (e.g., weights, time-series). Open source 3-clause BSD license. Well tested with over 90% code coverage. Additional benefits from Python include fast prototyping, ease of teaching, and multi-platform support. ...
    Downloads: 5 This Week
  • 3
    DeepAnalyze

    Autonomous LLM agent for end-to-end data science workflows

    DeepAnalyze is an open source project that introduces an agentic large language model designed to perform autonomous data science tasks from start to finish. It is built to handle the entire data science pipeline, including data preparation, analysis, modeling, visualization, and report generation without requiring continuous human guidance. DeepAnalyze is capable of conducting open-ended data research across multiple data formats such as structured tables, semi-structured files, and unstructured text, enabling flexible and comprehensive analysis workflows. It integrates execution-based reasoning by generating and running code as part of its analysis process, allowing it to iteratively refine results and produce more accurate outputs. ...
    Downloads: 2 This Week
  • 4
    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    ...We provide high-speed, ultra-lightweight models as well as cutting-edge technology. We provide production-ready streaming ASR and streaming TTS systems. Our frontend includes Text Normalization and Grapheme-to-Phoneme conversion (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
    Downloads: 0 This Week
  • 5
    Sweetviz

    Visualize and compare datasets, target values and associations

    Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to enable quick analysis of target characteristics, training vs. testing data, and other such data characterization tasks. It shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features. ...
    Downloads: 2 This Week
  • 6
    RAG Anything

    RAG-Anything: All-in-One RAG Framework

    ...The system uses a multi-stage pipeline (e.g., document parsing, content analysis, knowledge graph construction, intelligent retrieval) so queries can navigate across modalities with deeper understanding and relevance.
    Downloads: 2 This Week
  • 7
    NLP

    Open source NLP guide with models, methods, and real use cases

    NLP is an open source introductory resource for natural language processing, presented as a continuously updated book hosted on GitHub. It explains how machines process and understand human language, combining theory with practical examples. It covers core NLP concepts such as text representation, feature extraction, and model evaluation, alongside hands-on implementations using tools like Word2Vec, TF-IDF, and FastText. It also introduces topic modeling with LDA, keyword extraction techniques, and document similarity methods. NLP extends into real-world applications, including sentiment analysis and text classification, helping readers connect concepts to use cases. ...
    Downloads: 0 This Week
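    As a rough illustration of the TF-IDF weighting mentioned in the description above, here is a minimal sketch in plain Python. It is not taken from the guide itself: the function name `tf_idf`, the sample documents, and the unsmoothed formula (term frequency times log of inverse document frequency) are assumptions chosen for brevity, and libraries often apply smoothing or normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute unsmoothed TF-IDF weights for a list of tokenized documents.

    Hypothetical helper for illustration; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency scaled by log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = tf_idf(docs)
```

    In this toy corpus, "cat" occurs in only one document, so it ends up with a higher weight in the first document than the more frequent but less distinctive "the".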
  • 8
    TAME LLM

    Traditional Mandarin LLMs for Taiwan

    TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. ...
    Downloads: 0 This Week
  • 9
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    ...The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
  • 10
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. ...
    Downloads: 0 This Week
  • 11
    Python Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models for NER

    NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. ...
    Downloads: 0 This Week
  • 12
    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline. Internally, it leverages pretrained language models and speech encoders, with a multimodal adapter that integrates both modalities for inference and training. ...
    Downloads: 2 This Week
  • 13
    flair

    A very simple framework for state-of-the-art NLP

    ...Developed by Humboldt University of Berlin and friends. A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. ...
    Downloads: 0 This Week
  • 14
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. ...
    Downloads: 2 This Week
  • 15

    Tokenized Text Aligner

    Aligns tokens in two versions of a text with differing tokenization.

    This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization. ...
    Downloads: 0 This Week
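    The diff-based alignment described above can be sketched with Python's standard `difflib` module, which the description links to. This is only an illustrative sketch, not the tool's actual code: the function name `align_tokens`, the grouping format, and the sample sentences are assumptions.

```python
from difflib import SequenceMatcher


def align_tokens(a, b):
    """Align two tokenizations of the same text.

    Hypothetical helper: returns a list of (indices_in_a, indices_in_b) pairs.
    Identical tokens are paired one-to-one; differing stretches are grouped.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    groups = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching runs have equal length, so pair tokens individually.
            groups.extend(([i], [j]) for i, j in zip(range(i1, i2), range(j1, j2)))
        else:
            # replace / insert / delete: keep the whole differing span together.
            groups.append((list(range(i1, i2)), list(range(j1, j2))))
    return groups


# Made-up example: a clitic split in one tokenization but not in the other.
old = ["I", "do", "n't", "know", "."]
new = ["I", "don't", "know", "."]
for a_idx, b_idx in align_tokens(old, new):
    print([old[i] for i in a_idx], "<->", [new[j] for j in b_idx])
```

    Grouping a differing span as one many-to-many unit lets annotations attached to "do" and "n't" be carried over to "don't" without guessing at sub-token boundaries.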
  • 16
    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered Python library for automatic data analysis and profiling. It is designed to make data analysis, monitoring, and sensitive data detection easy by detecting patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. Loading data: with a single command, the library automatically formats and loads files into a DataFrame. Profiling the data: the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 0 This Week
  • 17
    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
  • 18
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run either as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.
    Downloads: 4 This Week
  • 19
    DocETL

    A system for agentic LLM-powered data processing and ETL

    ...The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.
    Downloads: 2 This Week
  • 20
    Instagram OSINT Tool

    Instagram OSINT tool for gathering profile data and public posts

    InstagramOSINT is an open source intelligence (OSINT) tool designed to collect publicly accessible information from Instagram profiles. It retrieves details that are not always easily visible when browsing an Instagram account normally, allowing investigators, researchers, and developers to gather structured data about a target profile. It works by scraping publicly available profile information and extracting metadata from Instagram pages using Python. It collects various attributes such as...
    Downloads: 25 This Week
  • 21
    Biomni

    Biomni: a general-purpose biomedical AI agent

    ...It integrates retrieval-augmented generation with code-based execution, allowing it to access external knowledge, process data, and generate testable hypotheses in scientific workflows. The system is built to support researchers by automating repetitive and time-consuming tasks such as literature review, data analysis, and experimental design. Biomni operates within a comprehensive environment that includes tools, APIs, and datasets, enabling it to execute multi-step research processes rather than just generating text responses. It supports integration with multiple AI models, allowing flexibility in selecting the most appropriate model for specific tasks.
    Downloads: 0 This Week
  • 22
    Code-Graph-RAG

    The ultimate RAG for your monorepo

    ...It uses Tree-sitter to parse source code into abstract syntax trees, extracting relationships between functions, classes, and modules to build a graph-based representation of the entire codebase. This structured approach enables more accurate and context-aware querying compared to traditional text-based search methods, allowing users to ask natural language questions about code structure and functionality. The system integrates with graph databases such as Memgraph to store and manage relationships, enabling efficient querying and visualization of complex dependencies. It also supports AI-driven query translation, converting natural language into graph queries for deeper analysis and interaction.
    Downloads: 15 This Week
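    To make the idea of extracting function relationships from a syntax tree concrete, here is a small sketch. Note that it deliberately swaps in Python's built-in `ast` module instead of Tree-sitter and a plain dictionary instead of a graph database, so it is not how Code-Graph-RAG itself works; the function name `call_graph` and the sample source are assumptions for illustration.

```python
import ast


def call_graph(source):
    """Map each top-level function to the sorted names it calls directly.

    Hypothetical helper: only simple calls such as f(x) are recorded;
    method calls such as obj.f(x) are ignored in this sketch.
    """
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            graph[node.name] = sorted(callees)
    return graph


# Made-up module source used only as sample input.
source = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''
print(call_graph(source))
```

    Each key/value pair corresponds to outgoing "calls" edges from one function node, which is the kind of relationship a graph store could then index and query.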
  • 23
    Gitingest

    Create prompt-friendly codebase digests from any Git repository URL

    Gitingest is a developer utility that converts an entire Git repository into a structured, prompt-friendly text digest suitable for use with large language models. It analyzes a repository and produces a consolidated textual representation that includes the file structure and code content in an organized format. This makes it easier to provide meaningful code context when working with AI systems that require compact, readable inputs. Developers can generate these digests from either a local...
    Downloads: 2 This Week
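    The notion of a consolidated text digest (file structure followed by file contents) can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not Gitingest's own implementation or output format: the function name `build_digest`, the suffix filter, and the section markers are all hypothetical.

```python
from pathlib import Path


def build_digest(root, suffixes=(".py", ".md", ".txt")):
    """Return one text blob: a file listing, then each file's contents.

    Hypothetical helper; only files with the given suffixes are included,
    and anything under a .git directory is skipped.
    """
    root = Path(root)
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix in suffixes
        and ".git" not in p.relative_to(root).parts
    )
    lines = ["Directory structure:"]
    lines += [f"  {p.relative_to(root).as_posix()}" for p in files]
    for p in files:
        lines.append("")
        lines.append(f"=== {p.relative_to(root).as_posix()} ===")
        # Undecodable bytes are replaced rather than raising an error.
        lines.append(p.read_text(encoding="utf-8", errors="replace"))
    return "\n".join(lines)
```

    Calling `build_digest(".")` on a local checkout would return a single string that can be pasted into a prompt; filtering by suffix keeps binary files out of the digest.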
  • 24
    spaCy models

    Models for the spaCy Natural Language Processing (NLP) library

    spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...
    Downloads: 9 This Week
  • 25
    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 1 This Week