CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model

Bibliographic Details
Title: CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model
Authors: Juan Lopez, Michael Hempel, Kalyan S. Perumalla, Kimia Ameri, Hamid Sharif
Source: Journal of Cybersecurity and Privacy, Vol 1, Iss 4, Pp 615-637 (2021)
Journal of Cybersecurity and Privacy
Volume 1
Issue 4
Pages 615-637
Publication Information: MDPI AG, 2021.
Publication Year: 2021
Subject Terms: Artificial neural network, cybersecurity, Computer science, Random seed, Initialization, transfer learning, Computer security, computer.software_genre, CYVET, classification, Robustness (computer science), Feature (machine learning), T1-995, Language model, natural language processing, F1 score, computer, Classifier (UML), Technology (General), BERT
Description: We introduce CyBERT, a cybersecurity feature claims classifier based on bidirectional encoder representations from transformers and a key component in our semi-automated cybersecurity vetting for industrial control systems (ICS). To train CyBERT, we created a corpus of labeled sequences from ICS device documentation collected across a wide range of vendors and devices. This corpus provides the foundation for fine-tuning BERT’s language model, including a prediction-guided relabeling process. We propose an approach to obtain optimal hyperparameters, including the learning rate, the number of dense layers, and their configuration, to increase the accuracy of our classifier. Fine-tuning all hyperparameters of the resulting model led to an increase in classification accuracy from 76% obtained with BertForSequenceClassification’s original architecture to 94.4% obtained with CyBERT. Furthermore, we evaluated CyBERT for the impact of randomness in the initialization, training, and data-sampling phases. CyBERT demonstrated a standard deviation of ±0.6% during validation across 100 random seed values. Finally, we also compared the performance of CyBERT to other well-established language models including GPT2, ULMFiT, and ELMo, as well as neural network models such as CNN, LSTM, and BiLSTM. The results showed that CyBERT outperforms these models on the validation accuracy and the F1 score, validating CyBERT’s robustness and accuracy as a cybersecurity feature claims classifier.
File Description: application/pdf
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3fbb1bdc9cece078a3b3b74f9dd0ae2f
https://www.mdpi.com/2624-800X/1/4/31
Rights: OPEN
Accession Number: edsair.doi.dedup.....3fbb1bdc9cece078a3b3b74f9dd0ae2f
Database: OpenAIRE