parallel corpus free download

5 projects for "parallel corpus" with 1 filter applied:

BSD Clear Filters & Widen Search

AI-Powered Identity Governance
For IT Teams and MSPs in need of a solution to simplify, optimize and secure their SaaS, file, and device management operations

Define governance policies, manage access, and optimize licenses with unified visibility across every identity, app, and file.

Learn More
End-To-End Document Management Software
UnForm is ideal for businesses focusing on distribution, manufacturing ERP solutions, and general accounting.

UnForm® is a platform-independent software product that creates, delivers, stores and retrieves graphically enhanced documents from ERP application printing. A complete, end-to-end document management solution, UnForm interfaces at the point of printing to produce documents in various formats for printing and electronic delivery.

Learn More
1

Step3-VL-10B

Multimodal model achieving SOTA performance

Step3-VL-10B is an open-source multimodal foundation model developed by StepFun AI that pushes the boundaries of what compact models can achieve by combining visual and language understanding in a single architecture. Despite having only about 10 billion parameters, it delivers performance that rivals or even surpasses much larger models (10×–20× larger) on a wide range of multimodal benchmarks covering reasoning, perception, and complex tasks, positioning it as one of the most powerful...

Downloads: 0 This Week

Last Update: 2026-01-22
See Project
2

Uplug corpus tools

Various tools for creating annotated parallel corpora including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools. Uplug also includes a web-based interface for interactive sentence and word alignment and scripts for indexing and querying parallel corpora using the Corpus Work Bench CWB. Download 'uplug-main' first and then add other packages.

Downloads: 0 This Week

Last Update: 2013-04-29
See Project
3

CRFSharp

CRFSharp is a .NET(C#) implementation of Conditional Random Field

...CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. ...

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
4

Sanchay

Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project
PeerGFS PEER Software - File Sharing and Collaboration
One Solution to Simplify File Management and Orchestration Across Edge, Data Center, and Cloud Storage

PeerGFS is a software-only solution developed to solve file management/file replication challenges in multi-site, multi-platform, and hybrid multi-cloud environments.

Learn More
5

GigaChat 3 Ultra

High-performance MoE model with MLA, MTP, and multilingual reasoning

...It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its training corpus incorporates ten languages, enriched with books, academic sources, code datasets, mathematical tasks, and more than 5.5 trillion tokens of high-quality synthetic data. This combination significantly boosts reasoning, coding, and multilingual performance across modern benchmarks. Designed for high-performance deployment, GigaChat 3 Ultra supports major inference engines and offers optimized BF16 and FP8 execution paths for cluster-grade hardware.

Downloads: 0 This Week

Last Update: 2025-12-03
See Project