Detecting and correcting real-word errors in Tamil sentences

dc.contributor.author Sakuntharaj, R.
dc.contributor.author Mahesan, S.
dc.date.accessioned 2025-05-20T04:03:55Z
dc.date.available 2025-05-20T04:03:55Z
dc.date.issued 2019-12-01
dc.identifier.issn 2536-8400
dc.identifier.uri http://drr.vau.ac.lk/handle/123456789/1190
dc.description.abstract Spell checkers deal with two types of errors: non-word errors and real-word errors. Non-word errors fall into two categories: either the word itself is invalid, or the word is valid but not present in the lexicon. A real-word error is a valid word that is inappropriate in the context of the sentence. This paper proposes an approach to correcting real-word errors in Tamil. A bigram probability model, built from a 3 GB corpus of Tamil text, is used to determine whether a valid word is appropriate in the context of the sentence. If it is not, the word is marked as a real-word error, the minimum edit distance technique is used to find lexically similar words, and the appropriateness of each such word is measured with a word-level n-gram language model. A hash table keyed by word length speeds up the search for lexically similar words: only words of length m-1 to m+1 are considered, where m is the length of the word found to be 'inappropriate'. Test results show that the suggestions generated by the system are more than 98% accurate, as judged by a scholar of Tamil. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna en_US
dc.subject Tamil en_US
dc.subject real-word error en_US
dc.subject bigram en_US
dc.subject minimum edit distance en_US
dc.subject error correction en_US
dc.title Detecting and correcting real-word errors in Tamil sentences en_US
dc.type Journal article en_US
dc.identifier.doi http://doi.org/10.4038/rjs.v9i2.43 en_US
dc.identifier.journal RUHUNA JOURNAL OF SCIENCE en_US
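
The pipeline described in the abstract (bigram check, length-keyed hash table, minimum edit distance, re-scoring of candidates) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation. The class name RealWordChecker, the parameters threshold and max_dist, and the toy corpus are all assumptions made for this example. The corpus is in English only for readability. Word length is taken as Python's len(), which counts code points, so a Tamil implementation would likely need to count letters (grapheme clusters) instead. Using the top suggestion as left context for the following word is also an assumption.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


class RealWordChecker:
    """Hypothetical helper: flags valid words that never follow their left neighbour."""

    def __init__(self, sentences, threshold=0.0, max_dist=2):
        self.threshold = threshold   # assumed cut-off for "inappropriate"
        self.max_dist = max_dist     # assumed bound on lexical similarity
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.by_length = defaultdict(set)   # hash table keyed by word length
        for sentence in sentences:
            words = sentence.split()
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
            for w in words:
                self.by_length[len(w)].add(w)

    def bigram_prob(self, prev, word):
        """Maximum-likelihood estimate of P(word | prev), without smoothing."""
        count = self.unigrams[prev]
        return self.bigrams[(prev, word)] / count if count else 0.0

    def candidates(self, word):
        """Lexically similar words, searched only in the buckets m-1, m, m+1."""
        m = len(word)
        pool = set().union(*(self.by_length.get(n, set()) for n in (m - 1, m, m + 1)))
        return {c for c in pool
                if c != word and edit_distance(word, c) <= self.max_dist}

    def check(self, sentence):
        """Return (flagged_word, ranked_suggestions) pairs for one sentence."""
        words = sentence.split()
        report = []
        for i in range(1, len(words)):   # the first word has no left context
            prev, word = words[i - 1], words[i]
            if word in self.unigrams and self.bigram_prob(prev, word) <= self.threshold:
                scored = [(self.bigram_prob(prev, c), c) for c in self.candidates(word)]
                ranked = [c for p, c in sorted(scored, reverse=True) if p > self.threshold]
                report.append((word, ranked))
                if ranked:
                    words[i] = ranked[0]   # assumption: continue with the top suggestion
        return report


corpus = [
    "i want a piece of cake",
    "we want peace in the world",
    "a piece of paper",
    "peace in our time",
]
checker = RealWordChecker(corpus)
print(checker.check("i want a peace of cake"))   # [('peace', ['piece'])]
```

Restricting the search to three length buckets means the edit distance is computed only against words whose length differs by at most one from the flagged word, which is the speed-up the abstract attributes to the hash table. A model built from a large corpus would normally also need smoothing and a non-zero threshold, neither of which is shown here.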

