| dc.contributor.author | Sakuntharaj, R. | |
| dc.contributor.author | Mahesan, S. | |
| dc.date.accessioned | 2025-05-20T04:03:55Z | |
| dc.date.available | 2025-05-20T04:03:55Z | |
| dc.date.issued | 2019-12-01 | |
| dc.identifier.issn | 2536-8400 | |
| dc.identifier.uri | http://drr.vau.ac.lk/handle/123456789/1190 | |
| dc.description.abstract | Spell checkers deal with two types of errors: non-word errors and real-word errors. Non-word errors fall into two categories: either the word itself is invalid, or the word is valid but absent from the lexicon in use. A real-word error occurs when a word is valid but inappropriate in the context of the sentence. This paper proposes an approach to detecting and correcting real-word errors in Tamil. A bigram probability model, constructed from a 3 GB corpus of Tamil text, is used to determine whether a valid word is appropriate in the context of its sentence. If it is not, the word is marked as a real-word error, the minimum edit distance technique is used to find lexically similar words, and the appropriateness of each such word is measured with a word-level n-gram language probability model. To speed up the search for lexically similar words, a hash table keyed by word length is used: only words of length m-1 to m+1 are considered, where m is the length of the word found to be ‘inappropriate’. Test results show that the suggestions generated by the system are more than 98% accurate, as validated by a Tamil scholar. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Faculty of Science, University of Ruhuna | en_US |
| dc.subject | Tamil | en_US |
| dc.subject | real-word error | en_US |
| dc.subject | bigram | en_US |
| dc.subject | minimum edit distance | en_US |
| dc.subject | error correction | en_US |
| dc.title | Detecting and correcting real-word errors in Tamil sentences | en_US |
| dc.type | Journal article | en_US |
| dc.identifier.doi | http://doi.org/10.4038/rjs.v9i2.43 | en_US |
| dc.identifier.journal | RUHUNA JOURNAL OF SCIENCE | en_US |
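The pipeline described in the abstract (bigram check, length-keyed hash table over lengths m-1 to m+1, minimum edit distance, probabilistic ranking) can be illustrated with a small Python sketch. This is not the authors' implementation. Several details are assumptions: a word is flagged when its bigram with the preceding word never occurs in the corpus, probabilities use add-one smoothing, candidates must lie within edit distance 1, the same bigram model ranks the candidates, and word length is counted in Unicode code points (for Tamil, letters made of a consonant plus a vowel sign span more than one code point). The function names and the English toy corpus are hypothetical stand-ins for the 3 GB Tamil corpus.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def build_length_index(lexicon):
    """Hash table keyed by word length (in code points) -> words of that length."""
    index = defaultdict(list)
    for word in lexicon:
        index[len(word)].append(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (or match)
        prev = curr
    return prev[-1]


class BigramModel:
    """Word bigram counts; add-one smoothing is an assumption, not from the paper."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = [START] + sentence
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        """Smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def seen(self, prev, word):
        return self.bigrams[(prev, word)] > 0


def suggest(sentence, model, length_index, max_dist=1):
    """Map each flagged position to candidates ranked by P(candidate | previous word)."""
    tokens = [START] + sentence
    suggestions = {}
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        if model.seen(prev, word):
            continue  # attested bigram: treat the word as appropriate
        m = len(word)
        # Only look up words of length m-1, m and m+1 in the hash table.
        candidates = [
            cand
            for length in (m - 1, m, m + 1)
            for cand in length_index.get(length, [])
            if cand != word and edit_distance(word, cand) <= max_dist
        ]
        candidates.sort(key=lambda c: (-model.prob(prev, c), c))
        suggestions[i - 1] = candidates  # index into the original sentence
    return suggestions


# Toy English stand-in data (hypothetical; the paper uses a Tamil corpus).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "cat", "ate", "the", "rat"],
    ["a", "hat", "on", "the", "mat"],
]
lexicon = {w for s in corpus for w in s}
model = BigramModel(corpus)
index = build_length_index(lexicon)

print(suggest(["the", "cat", "sat", "on", "the", "hat"], model, index))
```

In the usage line, "hat" is a valid word, but the bigram ("the", "hat") never occurs in the toy corpus, so it is flagged; the length index restricts the search to words of length 2 to 4, and the candidates within edit distance 1 are ranked by their smoothed bigram probability after "the". Restricting the search by length is safe for a distance bound of 1, because an insertion or deletion changes the length by exactly one.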