Prompt Engineering or Fine-Tuning? A Case Study on Phishing Detection with Large Language Models.

Bibliographic Details
Title:	Prompt Engineering or Fine-Tuning? A Case Study on Phishing Detection with Large Language Models.
Authors:	Trad, Fouad; Chehab, Ali
Source:	Machine Learning & Knowledge Extraction; Mar2024, Vol. 6 Issue 1, p367-384, 18p
Subject Terms:	LANGUAGE models, UNIFORM Resource Locators, PHISHING, MACHINE learning, ENGINEERING
Abstract:	Large Language Models (LLMs) are reshaping the landscape of Machine Learning (ML) application development. The emergence of versatile LLMs capable of undertaking a wide array of tasks has reduced the need for intensive human involvement in training and maintaining ML models. Despite these advances, a pivotal question remains: can these generalized models eliminate the need for task-specific models? This study addresses this question by comparing the effectiveness of LLMs in detecting phishing URLs when used with prompt-engineering techniques versus when fine-tuned. Specifically, we explore multiple prompt-engineering strategies for phishing URL detection and apply them to two chat models, GPT-3.5-turbo and Claude 2. With this approach, the best result was an F1-score of 92.74% on a test set of 1000 samples. We then fine-tune a range of base LLMs developed primarily for text generation, including GPT-2, Bloom, Baby LLaMA, and DistilGPT-2, exclusively for phishing URL detection. Fine-tuning yielded the peak performance, an F1-score of 97.29% and an AUC of 99.56% on the same test set, outperforming existing state-of-the-art methods. These results show that while LLMs used through prompt engineering can speed up application development and achieve decent performance, they are not as effective as dedicated, task-specific LLMs. [ABSTRACT FROM AUTHOR]
	Copyright of Machine Learning & Knowledge Extraction is the property of MDPI and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Complementary Index
Description
ISSN:	2504-4990
DOI:	10.3390/make6010018