Challenges in Persian Electronic Text Analysis

Behrang QasemiZadeh, Saeed Rahimi, Mehdi Safaee Ghalati

Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. Farsi uses a unified Arabic script as its writing system. In this paper we briefly introduce the writing standards of Farsi and highlight the problems one faces when analyzing Farsi electronic texts, especially the transcription and encoding issues that arise during the development of Farsi corpora. The points mentioned may sound simple, but they are crucial when developing and processing written corpora of Farsi.
