Securing Large Language Models: Investigating Prompt Injection Attacks and Remediation Tactics


dc.contributor.author Zahry, M.Z.L.
dc.date.accessioned 2024-11-22T08:15:15Z
dc.date.available 2024-11-22T08:15:15Z
dc.date.issued 2024-10-30
dc.identifier.uri http://drr.vau.ac.lk/handle/123456789/1080
dc.description.abstract The rapid advancement of Large Language Models (LLMs) has brought about remarkable capabilities in natural language processing, but it has also exposed vulnerabilities such as prompt injection attacks, which pose significant security threats. This research investigates the effectiveness of prompt injection attacks on LLMs, focusing on role-based scenarios, and explores potential remediation tactics to mitigate these risks. The primary objective is to test the impact of direct prompt injection attacks and identify mitigations. To address this, we developed a dataset containing both benign and malicious prompts and evaluated the responses of four LLMs: Gemini, ChatGPT, Perplexity, and a quantized Llama 2 model. Our methodology involved testing these models’ behaviours and implementing a system that applies sentiment analysis to filter harmful outputs. The results indicate that Gemini and Perplexity exhibited significant vulnerability, often generating harmful or manipulative content. ChatGPT-4 and quantized Llama 2 demonstrated moderate resistance, producing safer alternatives but still failing in some cases. To mitigate harmful content, a response filtering system based on sentiment analysis was implemented. This successfully flagged and neutralised harmful outputs by replacing them with neutral responses when sentiment scores fell below a predetermined threshold. Llama 2 served as the baseline for the mitigation experiments, and sentiment analysis revealed that its responses improved significantly after applying these mitigation techniques, with compound sentiment scores increasing from 0.5453 to 0.8345, reflecting a notable reduction in harmful content. These findings highlight the need for defence strategies, such as real-time sentiment monitoring, to enhance the security of LLMs against prompt injection attacks. This research suggests the need for ongoing refinement of mitigation tactics as LLMs continue to evolve, with potential applications in improving the security of AI-driven systems across various domains. en_US
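
The response-filtering step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a VADER-style analyser (suggested by the reported compound scores), and the threshold value and neutral fallback message are illustrative placeholders, since the paper's actual settings are not given here.

    # Hedged sketch of a sentiment-based output filter for LLM responses.
    # Assumptions: VADER compound score, an arbitrary threshold (0.05), and
    # a placeholder neutral reply; none of these are taken from the paper.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # lexicon needed by VADER

    SAFE_THRESHOLD = 0.05  # assumed cut-off on the compound score (-1 to 1)
    NEUTRAL_REPLY = "I'm sorry, I can't help with that request."

    analyzer = SentimentIntensityAnalyzer()

    def filter_response(llm_output: str) -> str:
        """Pass the LLM output through unchanged if its compound sentiment
        score meets the threshold; otherwise replace it with a neutral reply."""
        compound = analyzer.polarity_scores(llm_output)["compound"]
        return llm_output if compound >= SAFE_THRESHOLD else NEUTRAL_REPLY

    # Example usage: a hostile response is flagged and neutralised,
    # while a benign response is returned as-is.
    print(filter_response("You are worthless and deserve to suffer."))
    print(filter_response("Here is a helpful summary of your document."))

In this arrangement the filter sits between the model and the user, so any response whose compound sentiment falls below the chosen threshold is swapped for the neutral reply rather than shown directly, mirroring the flag-and-neutralise behaviour the abstract reports.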
dc.language.iso en en_US
dc.publisher Faculty of Applied Science, University of Vavuniya en_US
dc.subject AI security en_US
dc.subject Large language models en_US
dc.subject Malicious prompts en_US
dc.subject Prompt injection attacks en_US
dc.subject Role-based prompts en_US
dc.subject Sentiment analysis en_US
dc.title Securing Large Language Models: Investigating Prompt Injection Attacks and Remediation Tactics en_US
dc.type Conference paper en_US
dc.identifier.proceedings The 5th Faculty Annual Research Session - "Exploring Science for Humanity" en_US

