LogPrécis: Unleashing Language Models for Automated Malicious Log Analysis

Bibliographic details
Title: LogPrécis: Unleashing Language Models for Automated Malicious Log Analysis
Authors: Boffa, Matteo; Valentim, Rodolfo Vieira; Vassio, Luca; Giordano, Danilo; Drago, Idilio; Mellia, Marco; Houidi, Zied Ben
Source: Computers & Security, 2024, 103805, ISSN 0167-4048
Publication year: 2023
Collection: Computer Science
Subject terms: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Description: The collection of security-related logs holds the key to understanding attack behaviors and diagnosing vulnerabilities. Still, their analysis remains a daunting challenge. Recently, Language Models (LMs) have demonstrated unmatched potential in understanding natural and programming languages. The question arises whether and how LMs could also be useful for security experts, since their logs contain intrinsically confused and obfuscated information. In this paper, we systematically study how to benefit from the state of the art in LMs to automatically analyze text-like Unix shell attack logs. We present a thorough design methodology that leads to LogPrécis. It receives raw shell sessions as input and automatically identifies and assigns the attacker tactic to each portion of the session, i.e., it unveils the sequence of the attacker's goals. We demonstrate LogPrécis's capability to support the analysis of two large datasets containing about 400,000 unique Unix shell attacks. LogPrécis reduces them to about 3,000 fingerprints, each grouping sessions with the same sequence of tactics. The abstraction it provides lets the analyst better understand attacks, identify fingerprints, detect novelty, link similar attacks, and track families and mutations. Overall, LogPrécis, released as open source, paves the way for better and more responsive defense against cyberattacks.
Comment: 18 pages, Computers & Security (https://www.sciencedirect.com/science/article/pii/S0167404824001068), code available at https://github.com/SmartData-Polito/logprecis, models available at https://huggingface.co/SmartDataPolito
Document type: Working Paper
DOI: 10.1016/j.cose.2024.103805
Access URL: http://arxiv.org/abs/2307.08309
Accession number: edsarx.2307.08309
Database: arXiv
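
Illustrative note: the abstract describes fingerprints as groups of sessions that share the same sequence of tactics. A minimal Python sketch of that grouping idea follows. It is not the authors' implementation: the session IDs, the tactic labels, the helper names `fingerprint` and `group_by_fingerprint`, and the choice to collapse consecutive repeated tactics are all assumptions made for illustration, and the per-portion labels are hand-written here rather than produced by a language model.

```python
from collections import defaultdict
from itertools import groupby


def fingerprint(tactics):
    """Return the ordered sequence of tactics as a hashable tuple.

    Assumption of this sketch: consecutive repeats of the same tactic
    are collapsed, so only the order of distinct phases matters.
    """
    return tuple(tactic for tactic, _ in groupby(tactics))


def group_by_fingerprint(sessions):
    """Map each fingerprint to the list of session IDs that share it."""
    groups = defaultdict(list)
    for session_id, tactics in sessions.items():
        groups[fingerprint(tactics)].append(session_id)
    return dict(groups)


# Hypothetical input: one tactic label per portion of each shell session.
sessions = {
    "s1": ["Discovery", "Discovery", "Execution", "Persistence"],
    "s2": ["Discovery", "Execution", "Execution", "Persistence"],
    "s3": ["Execution", "Impact"],
}

groups = group_by_fingerprint(sessions)
for fp, ids in groups.items():
    print(" > ".join(fp), ids)
```

In this toy input, sessions s1 and s2 differ in how many portions each tactic spans but share one fingerprint, while s3 forms its own group. This is the kind of reduction the abstract reports at scale, from about 400,000 sessions to about 3,000 fingerprints.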