XtraLibD: Detecting Irrelevant Third-Party libraries in Java and Python Applications

Bibliographic Details
Title: XtraLibD: Detecting Irrelevant Third-Party libraries in Java and Python Applications
Authors: Kapur, Ritu, Rao, Poojith U, Dewan, Agrim, Sodhi, Balwinder
Source: Springer's Communications in Computer and Information Science, vol 1556. Springer, Cham. Extended version of paper published in Evaluation of Novel Approaches to Software Engineering. ENASE 2021
Publication Year: 2022
Collection: Computer Science
Subject Terms: Computer Science - Software Engineering
Description: Software development comprises the use of multiple Third-Party Libraries (TPLs). However, the irrelevant libraries present in a software application's distributable often lead to excessive consumption of resources such as CPU cycles, memory, and mobile-device battery. Therefore, the identification and removal of unused TPLs present in an application are desirable. We present a rapid, storage-efficient, obfuscation-resilient method to detect the irrelevant TPLs in Java and Python applications. Our approach's novel aspects are: i) Computing a vector representation of a .class file using a model that we call Lib2Vec. The Lib2Vec model is trained using the Paragraph Vector Algorithm. ii) Before using it for training the Lib2Vec models, a .class file is converted to a normalized form via semantics-preserving transformations. iii) An eXtra Library Detector (XtraLibD), developed and tested with 27 different language-specific Lib2Vec models. These models were trained using different parameters and >30,000 .class and >478,000 .py files taken from >100 different Java libraries and 43,711 Python libraries available at MavenCentral.com and Pypi.com, respectively. XtraLibD achieves an accuracy of 99.48% with an F1 score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar, and LibD, with an accuracy improvement of 74.5%, 30.33%, and 14.1%, respectively. Compared with LibD, XtraLibD achieves a response time improvement of 61.37% and a storage reduction of 87.93% (99.85% over JIngredient). Our program artifacts are available at https://www.doi.org/10.5281/zenodo.5179747.
Comment: 25 pages, 5 figures, 4 tables, Book Chapter of Springer's Communications in Computer and Information Science, vol 1556. Springer, Cham. Extended version of paper published in Evaluation of Novel Approaches to Software Engineering. ENASE 2021
Document Type: Working Paper
DOI: 10.1007/978-3-030-96648-5_7
Access URL: http://arxiv.org/abs/2202.10776
Accession Number: edsarx.2202.10776
Database: arXiv
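
The description says that each file is mapped to a vector and that library detection is built on those vectors. The sketch below illustrates that general idea only. It assumes the file vectors have already been produced by some embedding model, and it compares them with cosine similarity against a hypothetical reference set using a hypothetical threshold. The function names, the threshold value, and the toy vectors are illustrative assumptions and are not taken from the XtraLibD artifacts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_libraries(app_vectors, reference_vectors, threshold=0.9):
    """Return the names of reference libraries that have at least one
    file vector similar to some file vector of the application.

    app_vectors: list of vectors, one per file in the application.
    reference_vectors: dict mapping a library name to a list of vectors.
    threshold: hypothetical similarity cut-off (an assumption, not a
               value reported in the description).
    """
    detected = set()
    for vec in app_vectors:
        for lib_name, lib_vecs in reference_vectors.items():
            if any(cosine_similarity(vec, lv) >= threshold for lv in lib_vecs):
                detected.add(lib_name)
    return detected


# Toy data: three-dimensional vectors standing in for learned embeddings.
reference = {
    "lib_a": [[1.0, 0.0, 0.0]],
    "lib_b": [[0.0, 1.0, 0.0]],
}
app = [
    [0.9, 0.1, 0.0],  # close to lib_a's vector
    [0.0, 0.0, 1.0],  # close to neither library
]
found = detect_libraries(app, reference)
print(sorted(found))
```

Deciding whether a detected library is actually unused by the application is a separate step that this sketch does not cover.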