Real-time Automatic Word Segmentation for User-generated Text

Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim

For readability, and possibly for disambiguation, appropriate word segmentation is recommended for written text. In this paper, we propose a real-time assistive technology that utilizes automatic word segmentation. The language investigated is Korean, a head-final language whose characters are morpho-syllabic blocks of many different forms. The training scheme is fully neural network-based and straightforward. In addition, we show how the proposed system can be utilized for real-time, web-based revision of user-generated text. Through qualitative and quantitative comparisons with widely used text-processing toolkits, we show the reliability of the proposed system and how well it handles conversation-style and non-canonical texts. The demonstration is available online.
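
Korean word segmentation (spacing) is commonly framed as character-level binary tagging. Each syllable block gets a label saying whether a space follows it, and a model predicts those labels for unspaced input. The abstract does not state the labeling scheme used here, so the Python sketch below is an assumption. It covers only the conversion between spaced text and per-character labels, not the authors' neural network. The function names `to_labels` and `from_labels` are hypothetical, and only the ASCII space character is treated as a word boundary.

```python
def to_labels(text: str) -> tuple[str, list[int]]:
    """Strip spaces; label each remaining character 1 if a space followed it, else 0.

    Assumed tagging scheme for illustration only; only " " counts as a boundary.
    """
    chars: list[str] = []
    labels: list[int] = []
    for ch in text.strip():
        if ch == " ":
            # Mark the preceding character as word-final (repeated spaces collapse).
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def from_labels(chars: str, labels: list[int]) -> str:
    """Reinsert a space after every character whose label is 1 (e.g. model output)."""
    out: list[str] = []
    for ch, lab in zip(chars, labels):
        out.append(ch)
        if lab == 1:
            out.append(" ")
    return "".join(out).rstrip()


if __name__ == "__main__":
    # The same unspaced string admits two readings that differ only in spacing.
    for sentence in ("아버지가 방에 들어가신다", "아버지 가방에 들어가신다"):
        chars, labels = to_labels(sentence)
        print(chars, labels, from_labels(chars, labels))
```

The two example sentences share the unspaced string 아버지가방에들어가신다 but mean "Father enters the room" and "Father enters the bag," respectively. They differ only in their label sequences, which illustrates why segmentation can matter for disambiguation as well as readability.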
