parallel corpus free download

Showing 12 open source projects for "parallel corpus"

1

Step3-VL-10B

Multimodal model achieving SOTA performance

Step3-VL-10B is an open-source multimodal foundation model developed by StepFun AI that pushes the boundaries of what compact models can achieve by combining visual and language understanding in a single architecture. Despite having only about 10 billion parameters, it delivers performance that rivals or even surpasses much larger models (10×–20× larger) on a wide range of multimodal benchmarks covering reasoning, perception, and complex tasks, positioning it as one of the most powerful...

Downloads: 0 This Week

Last Update: 2026-01-22
See Project
2

PADIC

A multilingual Parallel Arabic DIalectal Corpus

PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built within the National Research Project "TORJMAN", led by the Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC comprises six dialects (two Algerian dialects, from the cities of Algiers and Annaba, plus Palestinian, Syrian, Tunisian, and Moroccan) along with Modern Standard Arabic (MSA).

Downloads: 2 This Week

Last Update: 2017-05-26
See Project
3

English-Vietnamese Bilingual Corpus

The English-Vietnamese Bilingual Corpus (EVBCorpus) is a collection of English and Vietnamese parallel translations and bitexts.

Downloads: 2 This Week

Last Update: 2016-09-01
See Project
4

Osman Arabic Text Readability

Open Source tool for Arabic text readability

We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool lets users calculate readability for Arabic text, with or without diacritics. It provides methods to split text into words and sentences; to count syllables, Faseeh letters, and hard and complex words; and to add diacritics (vocalise text). This makes the tool useful for researchers and educators working with Arabic text....

Downloads: 0 This Week

Last Update: 2016-11-17
See Project
5

Transml

Phrase-based Statistical Machine Translation system for English Language

...Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated from statistical models whose parameters are derived from the analysis of bilingual text corpora. SMT is a corpus-based approach: training an SMT system requires a massive parallel corpus.

Downloads: 0 This Week

Last Update: 2013-10-03
See Project
6

Uplug corpus tools

Various tools for creating annotated parallel corpora, including pre-trained tagging and parsing models for various languages, sentence alignment tools, and word alignment tools. Uplug also includes a web-based interface for interactive sentence and word alignment, and scripts for indexing and querying parallel corpora using the Corpus Workbench (CWB). Download 'uplug-main' first and then add other packages.

Downloads: 0 This Week

Last Update: 2013-04-29
See Project
7

Khmer Automatic Translation

Khmer-English-Khmer Automatic Translation

The project aims to develop a high-quality hybrid English-Khmer-English automatic translation system built on a parallel corpus, based on statistical analysis and enhanced with part-of-speech analysis.

Downloads: 0 This Week

Last Update: 2013-05-29
See Project
8

English-Khmer S. Machine Translation

English-Khmer Automatic Statistical Machine Translation (SMT)

The Automatic Machine Translation from English to Khmer project is the first effort in the natural language processing field to translate English into the Khmer (Cambodian) language. The project uses Domy CE, an open source SMT toolkit, for training on a parallel corpus, and web technologies such as Python, Apache2, HTML, XML, and XSLT to develop the web-based application. It is developed by Ms. Kim Sokphyrum (DU) and Ms. Suos Samak (Jamia), under the supervision of Mr. Javier Sola, a Program Manager at Open Institute (OI), Cambodia; Dr. Vasudha Bhatnagar, an Assistant Professor and Head of Computer Science at the University of Delhi (DU), New Delhi, India; and Dr. ...

3 Reviews

Downloads: 0 This Week

Last Update: 2016-07-23
See Project
9

CRFSharp

CRFSharp is a .NET (C#) implementation of Conditional Random Fields

...CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It encodes model parameters using L-BFGS. It also makes significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. When training on a corpus, CRF# can make full use of multi-core CPUs while using very little memory compared with CRF++, and its memory use grows slowly and smoothly as the training corpus and the number of tags increase. With multi-threaded processing, CRF# is now better suited than CRF++ to training on large data sets and tag sets. ...

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
10

Sanchay

Sanchay is a collection of tools and APIs for language researchers. It includes implementations of some NLP algorithms, flexible APIs, several user-friendly annotation interfaces, and the Sanchay Query Language for language resources.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project
11

Quick Parallel Search

This project builds a parallel text search system whose main emphasis is finding the provided query in the shortest possible time. Queries are searched within the GigaWord corpus.

Downloads: 0 This Week

Last Update: 2014-04-14
See Project
12

GigaChat 3 Ultra

High-performance MoE model with MLA, MTP, and multilingual reasoning

...It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its training corpus incorporates ten languages, enriched with books, academic sources, code datasets, mathematical tasks, and more than 5.5 trillion tokens of high-quality synthetic data. This combination significantly boosts reasoning, coding, and multilingual performance across modern benchmarks. Designed for high-performance deployment, GigaChat 3 Ultra supports major inference engines and offers optimized BF16 and FP8 execution paths for cluster-grade hardware.

Downloads: 0 This Week

Last Update: 2025-12-03
See Project