IK Analyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IK Analyzer has gone through four major versions. It began as a Chinese word segmentation component built primarily for the open-source Lucene project, combining dictionary-based segmentation with grammar analysis algorithms. Starting with version 3.0, IK became a general-purpose word segmentation component for Java, independent of Lucene, while still providing a default optimized implementation for Lucene. The 2012 version added a simple ambiguity elimination algorithm, marking IK's evolution from pure dictionary-based segmentation toward simulated semantic segmentation.

Features

  • Uses a distinctive "forward iterative finest-grained segmentation algorithm" and supports two segmentation modes: fine-grained and smart
  • In the 2012 version, smart mode supports simple ambiguity resolution and merges numerals with their quantifiers in the output
  • Uses a multi-subprocessor analysis mode that handles English letters, numbers, Chinese vocabulary, and more, and is compatible with Korean and Japanese characters
  • Optimized dictionary storage for a smaller memory footprint
  • Supports user-defined dictionary extensions; since the 2012 version, dictionary entries may mix Chinese, English, and numbers
  • Provides a simple ambiguity elimination algorithm for word segmentation
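To illustrate the difference between the two segmentation modes listed above, here is a minimal Java sketch. It is a toy under stated assumptions, not IK Analyzer's actual implementation: the class name `SegmentationSketch`, its methods, and the tiny in-memory dictionary are hypothetical. The "fine-grained" pass emits every dictionary word found while scanning forward from each position (overlaps allowed). The "smart" pass is approximated with plain greedy forward maximum matching. IK's real smart mode additionally performs ambiguity arbitration and quantifier merging, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SegmentationSketch {
    // Hypothetical toy dictionary; a real IK dictionary is loaded from dictionary files.
    static final Set<String> DICT = Set.of(
            "中华", "中华人民", "中华人民共和国", "华人", "人民", "共和", "共和国", "成立");

    // Length of the longest dictionary entry bounds how far each scan looks ahead.
    static int maxWordLength(Set<String> dict) {
        return dict.stream().mapToInt(String::length).max().orElse(1);
    }

    // Fine-grained mode (sketch): scan forward from every position and emit every
    // dictionary word that starts there, so overlapping words are all kept.
    static List<String> fineGrained(String text, Set<String> dict) {
        int maxLen = maxWordLength(dict);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + maxLen); j++) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // Smart mode (simplified stand-in): greedy forward maximum matching keeps only
    // the longest dictionary word at each position, so tokens never overlap.
    // Characters not covered by any dictionary entry are emitted one at a time.
    static List<String> smart(String text, Set<String> dict) {
        int maxLen = maxWordLength(dict);
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fallback: a single character
            for (int j = Math.min(text.length(), i + maxLen); j > i + 1; j--) {
                if (dict.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "中华人民共和国成立";
        System.out.println("fine-grained: " + fineGrained(text, DICT));
        System.out.println("smart:        " + smart(text, DICT));
    }
}
```

With this toy dictionary, the fine-grained pass over "中华人民共和国成立" yields [中华, 中华人民, 中华人民共和国, 华人, 人民, 共和, 共和国, 成立], while the smart pass yields [中华人民共和国, 成立]. Fine-grained output favors recall at index time, and smart output favors compact, non-overlapping tokens. The Elasticsearch plugin exposes this same trade-off as its `ik_max_word` and `ik_smart` analyzers.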

License

Apache License V2.0

IK Analysis for Elasticsearch Web Site

Additional Project Details

Operating Systems

Windows

Programming Language

Java

Related Categories

Java Browser Extensions and Plugins, Java Languages Software

Registered

2021-05-17