Sunday, May 5, 2024


Natural Language Processing app gives Khasi language a technology boost

The application was developed by Dr Medari Janai Tham. NLP is the application of computational techniques to analyse and synthesise human language, both speech and text.

GUWAHATI: A researcher at Assam Don Bosco University (ADBU) has developed a Natural Language Processing (NLP) application for the computation of the Khasi language.

The application was developed by Dr Medari Janai Tham. NLP is the application of computational techniques for the analysis and synthesis of human language, both speech and text. The development of a corpus, which is a collection of machine-readable text sampled to be representative of a particular language, is an essential step in building NLP systems for that language.

Such corpora exist for languages such as English, German, Chinese, Hindi, Bengali and Punjabi. However, not all of these corpora are easily accessible. In English, the most widely used corpus is the British National Corpus (BNC), which is popular among researchers due to its accessibility.

For Khasi, no such publicly available corpus existed, which is why it is referred to as a resource-poor language where NLP is concerned. A major contribution to this field has been made with the release of an annotated Khasi corpus titled “Tham Khasi Annotated Corpus”, which is freely accessible through the European Language Resources Association (ELRA).

The corpus is manually tagged using a BIS (Bureau of Indian Standards) PoS (Parts-of-Speech) tagset formulated for Khasi, so that its tagging is standardised with that of other Indian languages.

The corpus was developed by Tham, who was awarded a PhD by the Department of Computer Science and Engineering, ADBU, for her thesis ‘Shallow Parsing for Khasi’, completed under the supervision of Prof. Pushpak Bhattacharyya of IIT Bombay.

The details of the corpus, including the annotation scheme and the development of the Khasi NLP tools, are described in research papers published as part of her PhD and available at www.grammarkhasi.in, which is also the companion website of the book “Ka Grammar Khasi Da Ka Jingdro” by the same author, published by Macmillan Education, India.

Father Stephen Mavely, vice-chancellor at Assam Don Bosco University, expressed his joy regarding the development.

“Northeast is a land of vibrant people and many such untapped cultural resources. Don Bosco has been investing in the people of Northeast India since the inception of their journey in 1922,” Mavely said.

The scholar's other contributions include the BIS Khasi tagset, a hybrid Khasi PoS tagger, an HMM Khasi PoS tagger, an NLTK Khasi PoS tagger, an HMM Khasi shallow parser, a Khasi shallow parser using a bi-directional gated recurrent unit, and a seminar report on ‘Towards Standardization of Khasi language for Computational Purposes’, available on the above-mentioned website.
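For readers curious how an HMM (Hidden Markov Model) PoS tagger works in principle, the following is a minimal sketch in Python. It is not Dr Tham's implementation. The training sentences are English stand-ins, and the tags DET, NOUN and VERB are placeholders rather than entries from the BIS Khasi tagset. The function names `train_hmm` and `viterbi` are also illustrative. The idea is to count, from a tagged corpus, how often one tag follows another and how often each tag produces each word, and then to pick the most probable tag sequence for a new sentence.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence

# Toy tagged data (assumption): English stand-ins with placeholder tags,
# NOT drawn from the Tham Khasi Annotated Corpus.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability, so unseen events are not impossible."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable (word, tag) sequence under the counted model."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return list(zip(words, path))

model = train_hmm(TRAIN)
print(viterbi(["a", "cat", "barks"], *model))
# prints [('a', 'DET'), ('cat', 'NOUN'), ('barks', 'VERB')]
```

A tagger trained on a real annotated corpus would follow the same pattern with far more sentences and the full tagset. The add-one smoothing used here is only one simple way to handle words the model has not seen.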

Some of the NLP tools for Khasi are available online at www.medaritham.pythonanywhere.com, where users and researchers can enter any Khasi sentence and check the output of the taggers and parser.
