4 projects for "parallel corpus" with 1 filter applied:

  • 1
    Step3-VL-10B

    Multimodal model achieving SOTA performance

    Step3-VL-10B is an open-source multimodal foundation model developed by StepFun AI that pushes the boundaries of what compact models can achieve by combining visual and language understanding in a single architecture. Despite having only about 10 billion parameters, it delivers performance that rivals or even surpasses much larger models (10×–20× larger) on a wide range of multimodal benchmarks covering reasoning, perception, and complex tasks, positioning it as one of the most powerful...
  • 2
    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ...CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It estimates model parameters with L-BFGS. It also includes significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. When training on a corpus, CRF# makes full use of multi-core CPUs while using very little memory, and memory consumption grows smoothly and slowly as the amount of training data and the number of tags increase. With multi-threaded processing, CRF# is now better suited than CRF++ to training on large datasets with many tags. ...
  • 3
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
  • 4
    GigaChat 3 Ultra

    High-performance MoE model with MLA, MTP, and multilingual reasoning

    ...It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its training corpus incorporates ten languages, enriched with books, academic sources, code datasets, mathematical tasks, and more than 5.5 trillion tokens of high-quality synthetic data. This combination significantly boosts reasoning, coding, and multilingual performance across modern benchmarks. Designed for high-performance deployment, GigaChat 3 Ultra supports major inference engines and offers optimized BF16 and FP8 execution paths for cluster-grade hardware.