Abstract:
This paper presents a legal NLP system that maps free-text traffic violation narratives to the
applicable offences and penalties under Sri Lanka’s Motor Traffic Act. We digitize the principal Act and its
amendments into a structured corpus and evaluate a progression of retrieval and reasoning methods: TF-IDF,
BM25, SBERT (with a lexical–semantic hybrid), a compact local TinyLLM, and an OpenAI LLM integrated
with Retrieval-Augmented Generation (RAG). A staged methodology first validates the pipeline on
42 on-the-spot fine offences and then scales to the full consolidated Act. Using 300 expert-validated scenarios
(multi-offence, up to three penalties per case), we require exact section/subsection/paragraph/subparagraph
matches for correctness. The implemented system outputs, for each detected offence, the exact Motor Traffic
Act citation and the corresponding prescribed penalty/fine as defined in the Act. The OpenAI RAG approach
achieves 94.00% overall accuracy and 100.00% partial accuracy, substantially outperforming TinyLLM
(68.33% overall), SBERT (33.67%), BM25 (26.67%), and TF-IDF (0.67%). These results indicate that dense
retrieval coupled with grounded generation handles paraphrase, multi-offence narratives, and subtle context
better than sparse baselines. We enforce ethical safeguards: evidence-linked outputs, confidence scoring, and
abstention under uncertainty to support transparent, auditable use. We discuss validity threats (synthetic
narratives, label robustness), guardrails (citation-linked outputs, abstention), and deployment aspects (temporal
indexing, bilingual support), showing that legal RAG can deliver deployment-grade accuracy for traffic
enforcement in a low-resource jurisdiction.
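
The exact-match criterion described above can be illustrated with a minimal scoring sketch. It rests on assumptions the abstract does not spell out: "overall" accuracy is taken here to mean that the predicted set of citations equals the gold set for a scenario, and "partial" accuracy to mean that at least one gold citation was predicted. The `Citation` type, the function names, and the citation values are hypothetical and are not taken from the system or the Act.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Citation(NamedTuple):
    """Hypothetical citation key; every level must match for a citation to count."""
    section: str
    subsection: str = ""
    paragraph: str = ""
    subparagraph: str = ""


def score_case(predicted: Iterable[Citation], gold: Iterable[Citation]) -> Tuple[bool, bool]:
    """Return (exact, partial) for one scenario.

    Assumption: exact = predicted set equals gold set;
    partial = at least one gold citation was predicted.
    """
    pred_set, gold_set = set(predicted), set(gold)
    exact = pred_set == gold_set
    partial = len(pred_set & gold_set) > 0
    return exact, partial


def accuracy(cases: List[Tuple[List[Citation], List[Citation]]]) -> Tuple[float, float]:
    """Return (overall %, partial %) over a list of (predicted, gold) pairs."""
    results = [score_case(pred, gold) for pred, gold in cases]
    n = len(results)
    overall = 100.0 * sum(exact for exact, _ in results) / n
    partial = 100.0 * sum(part for _, part in results) / n
    return overall, partial


if __name__ == "__main__":
    # Placeholder citations for illustration only, not real provisions.
    a = Citation("10", "1", "a")
    b = Citation("20")
    cases = [([a], [a]), ([a], [a, b])]
    print(accuracy(cases))
```

Under this reading, a prediction that names the right section but the wrong paragraph counts as a miss, because the whole citation tuple must match.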