Academic Journal

An End-to-End Named Entity Recognition Platform for Vietnamese Real Estate Advertisement Posts and Analytical Applications

Bibliographic Details
Title: An End-to-End Named Entity Recognition Platform for Vietnamese Real Estate Advertisement Posts and Analytical Applications
Authors: Binh T. Nguyen, Tung Tran Nguyen Doan, Son Thanh Huynh, Khanh Quoc Tran, An Trong Nguyen, An Tran-Hoai Le, Anh Minh Tran, Nhi Ho, Trung T. Nguyen, Dang T. Huynh
Source: IEEE Access, Vol 10, Pp 87681-87697 (2022)
Publication Information: IEEE, 2022.
Publication Year: 2022
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms: Information extraction, information retrieval and text mining, NLP applications, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: The volume and complexity of publicly available real estate data have been growing rapidly. As a result, information extraction and processing have become increasingly challenging and essential for many PropTech (Property Technology) companies worldwide. The challenges are even more pronounced for languages other than English, such as Vietnamese, where few studies in this field have been conducted. This paper presents an end-to-end framework for Vietnamese real estate advertisement posts that automatically collects posts from different data sources, extracts useful information, and stores the computed data in appropriate data warehouses and data marts. The aggregated data can then be served for descriptive and predictive analytics. The extraction step combines two models: Noise Filtering and Named Entity Recognition (NER). These models process the initial input data and extract the useful information. The experimental results show that $\text{PhoBERT}_{large}$ achieves the best performance compared to other approaches. Furthermore, the corresponding F1 scores of the Noise Filtering module and the NER module are 0.8697 and 0.8996, respectively. Finally, we use Superset to implement analytic dashboards that visualize the predicted results and support further analysis and management processes.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/9846984/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2022.3195496
Access URL: https://doaj.org/article/7de9011ea16144db9e0f9a4038993f16
Accession Number: edsdoj.7de9011ea16144db9e0f9a4038993f16
Database: Directory of Open Access Journals