| dc.description.abstract |
String matching is a fundamental computational problem with broad applications in bioinformat
ics, natural language processing, cybersecurity, and large-scale text analytics. While advanced algorithms
such as Knuth–Morris–Pratt (KMP) and Boyer–Moore achieve faster performance through preprocessing
and efficient skipping strategies, the Na¨ıve String-Matching Algorithm remains attractive for its concep
tual simplicity, implementation ease, and adaptability across diverse datasets. However, its quadratic time
complexity (O(mn)) constrains performance for large-scale data processing. To address this limitation,
this research investigates parallelization techniques to enhance the Na¨ıve algorithm, introducing a novel
bidirectional parallel variant named “Binamatch.” The Binamatch algorithm was implemented using Open
Multi-Processing (OpenMP) to leverage thread-level parallelism on multi-core architectures. Unlike the tra
ditional unidirectional comparison, Binamatch performs a bidirectional search, scanning from both ends of
the pattern simultaneously to reduce redundant comparisons. The algorithm was evaluated against sequen
tial and parallel implementations of Na¨ıve, Knuth–Morris–Pratt (KMP), and Boyer–Moore algorithms across
various datasets, including Deoxyribonucleic Acid (DNA) sequences, ASCII text, and symbolic patterns. Ex
periments conducted on 16-core and 32-core systems with chunk sizes ranging from 64 to 1024 measured
scalability, execution time, and efficiency. Results revealed that although the bidirectional strategy theoret
ically improved comparison locality, Binamatch underperformed in practice, executing 10–1000 times slower
than baseline algorithms due to high synchronization overhead, cache inefficiency, and unbalanced workload
distribution. Despite this outcome, the research contributes a rigorous benchmarking framework and empir
ical insights into the challenges of parallelizing simple algorithms on shared-memory systems. To enhance
future applicability, a targeted optimization plan is proposed: adaptive workload partitioning to minimize
idle threads, cache-aware scheduling to improve memory locality, and hybrid execution combining sequential
preprocessing with selective parallel matching. These refinements are expected to reduce synchronization
costs and enhance scalability. Overall, this study underscores that while the Na¨ıve algorithm can be paral
lelized, performance benefits depend critically on architectural optimization, offering valuable direction for
developing efficient hybrid models in modern parallel string-matching research. |
en_US |