Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Large Language Models (LLM)
Search Results

Search Results for "text analysis linguistic"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 15
Mac 15
Windows 15
More...
BSD 8
ChromeOS 8

Category

Artificial Intelligence 15

License

OSI-Approved Open Source 15

Programming Language

Python 15

Showing 15 open source projects for "text analysis linguistic"

View related business solutions

Large Language Models (LLM) Python Clear Filters & Widen Search

anny is an all-in-one platform for managing hybrid workplaces and shared resources.
For Businesses looking for a flexible solution for internal and external bookings

Enable your employees to easily book desks, meeting rooms, parking spots, equipment, and more – all in one place. With flexible rules and group permissions, you stay in full control of who can access what.

Learn More
Next-generation security awareness training. Built for AI email phishing, vishing, smishing, and deepfakes.
Track your GenAI risk, run multichannel deepfake simulations, and engage employees with incredible security training.

Assess how your company's digital footprint can be leveraged by cybercriminals. Identify the most at-risk individuals using thousands of public data points and take steps to proactively defend them.

Learn More
1

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 1 This Week

Last Update: 2026-03-05
See Project
2

InternVL

A Pioneering Open-Source Alternative to GPT-4o

InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
3

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...

Downloads: 3 This Week

Last Update: 2026-03-23
See Project
4

Qwen-VL

Chat & pretrained large vision language model

Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. ...

Downloads: 1 This Week

Last Update: 2025-09-23
See Project
Managed File Transfer Software
Products to help you get data where it needs to go—securely and efficiently.

For too many businesses, complex file transfer needs make it difficult to create, manage and support data flows to and from internal and external systems. Progress® MOVEit® empowers enterprises to take control of their file transfer workflows with solutions that help secure, simplify and centralize data exchanges throughout the organization.

Learn More
5

WeClone

One-stop solution for creating your digital avatar from chat history

WeClone is an open source AI project designed to replicate a person’s conversational style and personality by training models on chat history data. The system analyzes message patterns, linguistic style, and contextual behavior in order to generate responses that resemble the original user’s communication style. It is intended primarily as an experimental exploration of digital personality modeling and conversational AI personalization. By processing large volumes of conversation data,...

Downloads: 1 This Week

Last Update: 2026-03-04
See Project
6

TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
7

ML Ferret

Refer and Ground Anything Anywhere at Any Granularity

...The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
8

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. ...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
9

DocETL

A system for agentic LLM-powered data processing and ETL

...The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.

Downloads: 3 This Week

Last Update: 2026-03-05
See Project
Cloudbrink Personal SASE service
For companies looking for low maintenance, secure, high performance connectivity for hybrid and remote workers

Cloudbrink’s Personal SASE is a high-performance connectivity and security service that delivers a lightning-fast, in-office experience to the modern hybrid workforce anywhere. Combining high-performance ZTNA with Automated Moving Target Defense (AMTD), and Personal SD-WAN all connections are ultra-secure.

Learn More
10

Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio. It uses multi-task training over many different audio tasks (30+), and achieves strong multi-benchmarks performance...

Downloads: 1 This Week

Last Update: 2025-09-23
See Project
11

Scikit-LLM

Seamlessly integrate LLMs into scikit-learn

Seamlessly integrate powerful language models like ChatGPT into sci-kit-learn for enhanced text analysis tasks. At the moment the majority of the Scikit-LLM estimators are only compatible with some of the OpenAI models. Hence, a user-provided OpenAI API key is required. Additionally, Scikit-LLM will ensure that the obtained response contains a valid label. If this is not the case, a label will be selected randomly (label probabilities are proportional to label occurrences in the training set). ...

Downloads: 3 This Week

Last Update: 2026-01-21
See Project
12

LongBench

LongBench v2 and LongBench (ACL 25'&24')

...Traditional language model benchmarks typically evaluate tasks involving relatively short inputs, which does not reflect many real-world applications such as analyzing large documents or entire code repositories. LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
13

LOTUS

AI-Powered Data Processing: Use LOTUS to process all of your datasets

...The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. The core concept of the framework is the use of semantic operators, which extend traditional relational database operations to support reasoning over text and other unstructured data. These operators allow tasks such as semantic filtering, ranking, clustering, and summarization to be expressed directly within data processing pipelines. ...

Downloads: 3 This Week

Last Update: 2026-03-06
See Project
14

Ailice

AIlice is a fully autonomous, general-purpose AI agent

AIlice is an open-source autonomous AI agent framework built to function as a general-purpose assistant that can plan, decompose, and execute complex tasks through a structured multi-agent architecture. The project presents itself as a standalone assistant powered by open-source language models, with an internal design that treats user requests almost like executable programs rather than simple chat prompts. Its core IACT architecture allows the system to break large goals into smaller...

Downloads: 0 This Week

Last Update: 2026-03-15
See Project
15

YAYI

Repo for YaYi Chinese LLMs based on LlaMA2 & BLOOM

YAYI is an open-source large language model project developed to provide a multilingual conversational AI system capable of performing a wide variety of natural language processing tasks. The model is trained on diverse datasets covering multiple languages and domains so that it can support applications ranging from dialogue systems to text analysis and knowledge retrieval. The architecture is based on transformer-style language models optimized for conversational understanding and generation. In addition to producing coherent responses, the system is designed to handle tasks such as summarization, translation, question answering, and text classification. The repository provides model checkpoints, training resources, and inference tools that allow developers to deploy the model in their own applications. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project

Previous
You're on page 1
Next

Related Searches

nvidia

forensic audio analysis

text-to-speech

speech to speech translation

audio voice

Related Categories

Artificial Intelligence

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Privacy Choices Advertise