Academic Journal

A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model.

Bibliographic Details
Title: A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model.
Authors: Jian Feng, Ying Zhang, Yuqiang Qiao
Source: Journal of Computing & Information Technology; 2020, Vol. 28, Issue 1, pp. 19-31, 13 pp.
Subject Terms: Websites, Phishing, Semantic Web, Hierarchical clustering (Cluster analysis), Semantics, Natural languages
Abstract: Detecting phishing web pages is a challenging task. Existing DOM-based (Document Object Model) detection methods for phishing web pages mainly aim to capture structural characteristics, but they ignore the overall representation of a web page and the semantic information that HTML tags may carry. This paper treats the DOM as a natural language and uses the Doc2Vec model to learn its structural semantics automatically for phishing detection. First, the DOM structure of each collected web page is parsed to construct a DOM tree. The Doc2Vec model then vectorizes the DOM tree, and the semantic similarity between web pages is measured by the distance between their DOM vectors. Finally, hierarchical clustering is applied to group the web pages. Experiments show that the proposed method achieves higher recall and precision in phishing classification than a DOM-based structural clustering method and a TF-IDF-based semantic clustering method. The results show that applying Paragraph Vector to the DOM in a linguistic approach is effective. [ABSTRACT FROM AUTHOR]
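The pipeline summarized in the abstract (parse the DOM, turn the tree into a token sequence, embed each page, compare pages by vector distance, cluster hierarchically) can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. Each page is reduced to its start tags in document order (a pre-order walk of the DOM tree), and, as an assumed stand-in for a trained Doc2Vec paragraph vector, each page is represented by sparse counts of tags and adjacent tag pairs. A real setup would train a paragraph-vector model instead, for example gensim's `Doc2Vec`. Cosine distance and average-linkage agglomerative clustering with a cut-off threshold are also assumed choices. The function names, the 0.5 threshold, and the sample pages are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TagSequenceParser(HTMLParser):
    """Records start-tag names in document order, i.e. a pre-order walk of the DOM tree."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)


def dom_tokens(html):
    """Turn an HTML string into a 'sentence' of tag tokens (text content is ignored)."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


def embed(tokens):
    """ASSUMPTION: stand-in for a trained Doc2Vec vector, not the paper's model.

    Sparse counts of tags plus adjacent tag pairs, so some local structure survives.
    """
    features = Counter(tokens)
    features.update(f"{a}>{b}" for a, b in zip(tokens, tokens[1:]))
    return features


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse Counter vectors."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage_clusters(vectors, threshold):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters
    (mean pairwise distance) until the closest pair is farther apart than threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_d, best_x, best_y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_d > threshold:
            break
        clusters[best_x].extend(clusters[best_y])
        del clusters[best_y]  # best_y > best_x, so best_x's index is unaffected
    return clusters


# Hypothetical sample pages: two near-identical login forms and one unrelated blog page.
pages = {
    "bank_login": "<html><body><form><input><input><button>Sign in</button></form></body></html>",
    "bank_clone": "<html><body><form><input><input><button>Log in</button></form></body></html>",
    "blog": "<html><body><div><p>Hi</p><p>Bye</p><a href='#'>more</a></div></body></html>",
}
names = list(pages)
vectors = [embed(dom_tokens(pages[name])) for name in names]
groups = average_linkage_clusters(vectors, threshold=0.5)
print([[names[i] for i in group] for group in groups])
# prints [['bank_login', 'bank_clone'], ['blog']]
```

Because the clustering step only consumes pairwise distances, replacing `embed` and `cosine_distance` with a trained paragraph-vector model and a dense-vector distance would leave `average_linkage_clusters` unchanged.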
Copyright of Journal of Computing & Information Technology is the property of CIT. Journal of Computing & Information Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Supplemental Index
Description
ISSN: 1330-1136
DOI: 10.20532/cit.2020.1004899