Best Open Source Desktop Operating Systems Linguistics Software 2026

Presage

the intelligent predictive text entry platform

Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic predictive algorithms. Presage's predictive capabilities are implemented by predictive plugins. Predictive plugins use services provided by the platform to implement multiple prediction techniques.
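The merging step described above can be illustrated with a minimal sketch. It assumes weighted linear interpolation, one common way to combine probabilities from several predictors. It is not Presage's actual API or combination method; the function name, the predictor names ("ngram", "recency"), the weights, and the probabilities are all hypothetical.

```python
from collections import defaultdict


def merge_predictions(predictions, weights, top_n=3):
    """Combine per-predictor word probabilities by weighted linear interpolation.

    predictions: {predictor_name: {word: probability}}  (hypothetical shape)
    weights:     {predictor_name: non-negative weight}
    """
    # Normalize the weights of the predictors that actually contributed.
    total_weight = sum(weights[name] for name in predictions)
    combined = defaultdict(float)
    for name, word_probs in predictions.items():
        share = weights[name] / total_weight
        for word, prob in word_probs.items():
            combined[word] += share * prob
    # Highest combined probability first; ties broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy data: two hypothetical predictors scoring the next word.
predictions = {
    "ngram": {"morning": 0.6, "night": 0.3, "grief": 0.1},
    "recency": {"night": 0.7, "morning": 0.3},
}
weights = {"ngram": 0.75, "recency": 0.25}
suggestions = merge_predictions(predictions, weights)
```

A word proposed by only one predictor still receives a score, but a word supported by several predictors ranks higher, which is the practical benefit of treating the predictors as redundant information sources.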

3 Reviews

Downloads: 223 This Week

Last Update: 2018-10-11

XBNF Neurotranslator compiler

(X)BNF simple and clever translation grammar compiler

XBNF Neurotranslator is a powerful extended BNF grammar language that makes translations easy to write and offers many features for handling different kinds of situations. This project hosts binaries for common architectures, C++ sources, tests, and support tickets. No installation is needed; just get the binary for your architecture: see [Files] > binary.{version}. Library of smart sample grammars: https://sourceforge.net/projects/xbnf/ Docker image that embeds the Linux 64-bit binary and the library: https://hub.docker.com/r/damolab/neurotranslator/ Docker image with the GNU C++ toolchain to build the xbnf command: https://hub.docker.com/r/damolab/neurotranslator-compil French blog dedicated to XBNF: https://damolab.zapto.org/xbnf/

1 Review

Downloads: 46 This Week

Last Update: 5 days ago

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192. (2) For the Khaleej-2004 corpus: M. Abbas, K. Smaili (2005), Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria. More useful references: https://sites.google.com/site/mouradabbas9/corpora

Downloads: 28 This Week

Last Update: 2019-03-05

Korean Analyzer Rhino

Parsing Korean words by morpheme and part-of-speech

RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on the Korean Modern Tagged Corpus (on the scale of 12 million phrases), which was created by the Korean government, so it covers many cases of stems and endings. Its newly developed Dynamic Dictionary Technology lets words react to their context; in effect, it is a programmed database. For more information, see the files in the help folder.

Downloads: 14 This Week

Last Update: 2020-10-11

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site: http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki: http://wiki.tei-c.org/index.php/TXM.

Downloads: 13 This Week

Last Update: 2024-12-09

MGIZA++

MGIZA++ has moved to GitHub: https://github.com/moses-smt/mgiza

Downloads: 1 This Week

Last Update: 2014-11-13

AsiEs

AsiEs stands for Asistente de Escritura (writing assistant). It provides word prediction and autocomplete for fast writing. Designed for people who have difficulty typing on a keyboard, it improves writing speed by saving the user up to 50% of the keystrokes and helps avoid spelling errors. Made by Fundación Teletón Uruguay (http://www.teleton.org.uy/home/).

Downloads: 0 This Week

Last Update: 2015-06-17

Bermuda Text-to-Speech

This project includes basic NLP and DSP techniques for Text-to-Speech

See the TTS demo at: http://rslp.racai.ro/index.php?page=tts This project is written entirely in Java and includes a set of tools and methods designed to enable multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian, but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools, check out http://nlptools.racai.ro.

Downloads: 0 This Week

Last Update: 2014-03-24

Vtgrep

Vtgrep stands for Visual Tree Grep and is a GUI for tgrep. It allows the user to build graphical representations of tree structures and then translates them into tgrep syntax. It also provides search functionality, as well as search and result logging.

Downloads: 0 This Week

Last Update: 2014-03-16

mwetoolkit

THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as a PhD thesis, but the project remains active (see the SVN logs). Up-to-date documentation and details about the tool can be found on the mwetoolkit website: http://mwetoolkit.sourceforge.net/

1 Review

Downloads: 0 This Week

Last Update: 2019-05-01

nippon writter

Write with a Japanese-French dictionary - kanji, hiragana, katakana

Japanese-language writing program with: - built-in French-Japanese dictionary of 20,000 words. - automatic Romaji-Hiragana and Romaji-Katakana converter. - database provided for writing in Kanji. - polished interface; copy and paste whole expressions with a single click. - help modules for beginners (particles, conjugator, common expressions, etc.).
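As an illustration of how a Romaji-Hiragana converter of the kind listed above can work, here is a minimal sketch based on greedy longest-match lookup. It is not taken from nippon writter; the function name and the deliberately partial syllable table are assumptions made for the example.

```python
# Deliberately partial table (assumption): a real converter needs the full
# syllabary, voiced sounds, small kana, and long-vowel handling.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}


def romaji_to_hiragana(text):
    """Convert lowercase romaji to hiragana by greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "shi" wins over "s" + "hi".
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


sushi = romaji_to_hiragana("sushi")
neko = romaji_to_hiragana("neko")
```

Greedy matching always reads "n" followed by a vowel as a single syllable ("ni", not "n" + "i"), so a production converter would need an explicit separator such as an apostrophe for the other reading.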

Downloads: 0 This Week

Last Update: 2012-07-13

qseg

A (UTF-8) Chinese word segmentation program based on an n-gram language model. It is written in C++ and provides multi-threading and high throughput.
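To show the general idea behind language-model-based segmentation, the sketch below uses a unigram model (the simplest n-gram case) with dynamic programming to pick the most probable split. It is not qseg's code or algorithm; the function name, the fallback probability for unknown characters, and the toy probabilities are assumptions chosen for the example.

```python
import math


def segment(text, unigram_probs, max_word_len=4, unknown_prob=1e-8):
    """Return the most probable segmentation of text under a unigram model."""
    n = len(text)
    # best_score[i] is the best log-probability of any segmentation of text[:i].
    best_score = [-math.inf] * (n + 1)
    best_score[0] = 0.0
    back = [0] * (n + 1)  # back[i] is where the last word of that split starts
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in unigram_probs:
                prob = unigram_probs[word]
            elif len(word) == 1:
                prob = unknown_prob  # unseen single character (assumed fallback)
            else:
                continue  # unseen multi-character strings are not candidates
            score = best_score[start] + math.log(prob)
            if score > best_score[end]:
                best_score[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy probabilities (assumption), not estimated from a real corpus.
toy_probs = {"研究": 0.02, "研究生": 0.005, "生命": 0.02, "命": 0.001, "起源": 0.01}
words = segment("研究生命起源", toy_probs)
```

With these toy numbers the split 研究 / 生命 / 起源 scores higher than 研究生 / 命 / 起源, which shows how the model resolves an overlap ambiguity. A bigram or higher-order model would condition each word on its predecessors instead of scoring words independently.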

Downloads: 0 This Week

Last Update: 2015-03-31

tiny-hyphenator

C++ Library to hyphenate a text

Use this class to achieve efficient and mostly exact word hyphenation. Currently the only supported language is German, but it can very easily be adapted to support other languages. The library spans ~370 lines of C++ code. At the moment there is no real documentation; please refer to the comments in the code, which should be quite helpful. A sample is also included. This library is not based on any algorithm published on the internet or elsewhere, and it works without a dictionary. It is written in C++ and tested under GCC 4.7, 4.8 and 4.9. [Use it in any way you want, including commercial work, but please include my name and a link to this page.]

Downloads: 0 This Week

Last Update: 2015-12-01
