Extract and Analyze the Performance Data from Research Papers using Generative AI

Abeysekara, W.A.D.S.S.; Yasotha, R.

URI: http://drr.vau.ac.lk/handle/123456789/2030

Date: 2026

Abstract:

This study explores the automated extraction and analysis of tabular data from research papers to streamline researchers’ workflows. It integrates Generative AI and Optical Character Recognition (OCR) in an end-to-end pipeline applied to over 1,200 open-access Artificial Intelligence and Machine Learning papers. PDF files were first converted into high-resolution images, after which tables were detected using a fine-tuned YOLOv8 model. Text in the detected tables was extracted using Tesseract OCR, and performance-related data was filtered and analyzed using Retrieval-Augmented Generation methods. The analysis identified top-performing models, such as BERT and CODEX, and widely used datasets, including SQuAD and GSM8K, enabling automated meta-analysis. The results demonstrate the scalability and effectiveness of combining computer vision and NLP for high-quality data extraction. Models such as Llama3-8B and deepseek-r1:8b 0528-quwen3-q8_0 provided domain-specific insights. The study also suggests that table detection can be improved without relying on keyword searches by leveraging advanced AI, NLP, and ensemble learning techniques.
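To make the final stage of such a pipeline more concrete, the sketch below shows one possible way to turn the OCR text of a single detected table into performance records and pick the best-scoring model per metric. It is not the authors' implementation: it substitutes a simple header-matching rule for the Retrieval-Augmented Generation step described in the abstract, and the names METRIC_NAMES, parse_table, extract_scores, and best_per_metric are hypothetical. It also assumes that OCR output separates table cells with tabs or runs of two or more spaces, and that the first row is the header and the first column holds the model name.

```python
import re

# Hypothetical set of column headers treated as performance metrics.
# This list is an assumption for illustration, not taken from the study.
METRIC_NAMES = {"accuracy", "acc", "f1", "bleu", "em", "pass@1"}


def parse_table(ocr_text):
    """Split OCR text into rows of cells.

    Assumes cells are separated by a tab or by two or more spaces.
    Blank lines are skipped.
    """
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def extract_scores(rows):
    """Return (model, metric, score) tuples from a parsed table.

    The first row is treated as the header and the first column as the
    model name. Only cells that are plain numbers (optionally followed
    by '%') under a recognised metric header are kept.
    """
    if not rows:
        return []
    header = [cell.lower() for cell in rows[0]]
    metric_cols = [i for i, name in enumerate(header) if name in METRIC_NAMES]
    records = []
    for row in rows[1:]:
        for i in metric_cols:
            if i < len(row):
                match = re.fullmatch(r"(\d+(?:\.\d+)?)%?", row[i])
                if match:
                    records.append((row[0], header[i], float(match.group(1))))
    return records


def best_per_metric(records):
    """Map each metric to the (model, score) pair with the highest score."""
    best = {}
    for model, metric, score in records:
        if metric not in best or score > best[metric][1]:
            best[metric] = (model, score)
    return best
```

In use, the text returned by the OCR step for each detected table would be passed to parse_table, then to extract_scores, and the records from many papers could be pooled before calling best_per_metric. A header-matching rule of this kind is brittle against OCR noise, which is one reason a retrieval-based or model-based filter, as used in the study, may be preferable.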