"Jaba" Chinese word segmentation, do the best Python Chinese word segmentation component. Four word segmentation modes are supported. Precise mode, which tries to cut the sentence most precisely, suitable for text analysis. Full mode, scans all the words that can be formed into words in the sentence, the speed is very fast, but the ambiguity cannot be resolved. The search engine mode, on the basis of the precise mode, divides the long words again to improve the recall rate, which is suitable for word segmentation in search engines. The paddle mode uses the PaddlePaddle deep learning framework to train the sequence labeling (bidirectional GRU) network model to achieve word segmentation. Also supports part-of-speech tagging. To use paddle mode, you need to install paddlepaddle-tiny, pip install paddlepaddle-tiny==1.6.1. Currently paddle mode supports jieba v0.40 and above. For versions below jieba v0.40, please upgrade jieba, pip install jieba --upgrade.

Features

  • Although jieba can recognize new words on its own, adding your own words ensures higher accuracy
  • Developers can specify custom dictionaries to include words that are not in the built-in jieba dictionary
  • Dictionaries can be modified dynamically in the program
  • Keyword extraction based on the TextRank algorithm
  • The Inverse Document Frequency (IDF) text corpus used for keyword extraction can be switched to the path of a custom corpus
  • Dynamic programming is used to find the maximum probability path
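The maximum probability path mentioned in the last item can be sketched in plain Python. This is a toy version under stated assumptions, not jieba's actual code: the TOY_FREQ dictionary and its counts are invented for illustration, and the function names are hypothetical. The idea is to build a graph of every dictionary word that starts at each position, then work from right to left to pick the split with the highest total log-probability.

```python
import math

# Hypothetical word counts, invented for this example.
TOY_FREQ = {"研究": 50, "研究生": 20, "生命": 40, "命": 5, "生": 5, "的": 100, "起源": 30}


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        # Fall back to the single character so every position has an edge.
        dag[i] = ends or [i]
    return dag


def max_prob_segment(sentence, freq):
    """Return the segmentation with the highest summed log-probability."""
    n = len(sentence)
    dag = build_dag(sentence, freq)
    log_total = math.log(sum(freq.values()))
    # route[i] = (best log-probability of sentence[i:], end index of the first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            # Unknown single characters are treated as having a count of 1.
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    # Follow the stored end indices from left to right to recover the words.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


if __name__ == "__main__":
    print(max_prob_segment("研究生命的起源", TOY_FREQ))
```

With these toy counts, the split beginning with "研究" scores higher than the one beginning with "研究生", because the latter forces the low-frequency single character "命" into the path.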

License

MIT License

Additional Project Details

Operating Systems

Linux, Windows

Programming Language

Python

Related Categories

Python Word Processors, Python Languages Software, Python Deep Learning Frameworks

Registered

2022-02-18