Résumé Parsing as Hierarchical Sequence Labeling: An Empirical Study

Bibliographic Details
Title: Résumé Parsing as Hierarchical Sequence Labeling: An Empirical Study
Authors: Retyk, Federico; Fabregat, Hermenegildo; Aizpuru, Juan; Taglio, Mariana; Zbib, Rabih
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, I.2.7
Description: Extracting information from résumés is typically formulated as a two-stage problem, where the document is first segmented into sections and then each section is processed individually to extract the target entities. Instead, we cast the whole problem as sequence labeling at two levels (lines and tokens) and study model architectures for solving both tasks simultaneously. We build high-quality résumé parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish. Based on these corpora, we present experimental results that demonstrate the effectiveness of the proposed models for the information extraction task, outperforming approaches introduced in previous work. We conduct an ablation study of the proposed architectures. We also analyze both model performance and resource efficiency, and describe the trade-offs for model deployment in the context of a production environment.
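To make the two-level formulation concrete, here is a minimal sketch of the label structure only, not of the paper's models: each line carries one line-level section label, and each token on that line carries a token-level entity tag. Decoding both levels together yields entities already grouped by section. The BIO tagging scheme, the label names (`Experience`, `Education`, `JobTitle`, `Company`, `Degree`), and the names `Line`, `bio_spans`, and `extract_entities` are hypothetical assumptions for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One résumé line (hypothetical schema): a line-level label plus token-level tags."""
    tokens: list[str]
    section: str      # line-level label, e.g. "Experience" (assumed label set)
    tags: list[str]   # token-level BIO tags, one per token (assumed tagging scheme)


def bio_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Decode BIO tags into (entity_type, text) spans.

    Lenient decoding: an I- tag that does not continue the open span starts a new span.
    """
    spans = []
    current = None  # (entity_type, [tokens]) of the currently open span, if any
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(token)
        elif tag.startswith(("B-", "I-")):
            if current is not None:
                spans.append(current)
            current = (tag[2:], [token])
        else:  # "O" closes any open span
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [(etype, " ".join(toks)) for etype, toks in spans]


def extract_entities(lines: list[Line]) -> list[tuple[str, str, str]]:
    """Combine both label levels into (section, entity_type, text) triples."""
    result = []
    for line in lines:
        if len(line.tokens) != len(line.tags):
            raise ValueError("each token needs exactly one tag")
        for etype, text in bio_spans(line.tokens, line.tags):
            result.append((line.section, etype, text))
    return result


# Toy example with made-up data.
resume = [
    Line(["Work", "Experience"], "Experience", ["O", "O"]),
    Line(["Data", "Scientist", "at", "Acme", "Corp"], "Experience",
         ["B-JobTitle", "I-JobTitle", "O", "B-Company", "I-Company"]),
    Line(["MSc", "Computer", "Science"], "Education",
         ["B-Degree", "I-Degree", "I-Degree"]),
]
print(extract_entities(resume))
# [('Experience', 'JobTitle', 'Data Scientist'), ('Experience', 'Company', 'Acme Corp'),
#  ('Education', 'Degree', 'MSc Computer Science')]
```

In a two-stage pipeline, the section labels would be predicted first and then used to select a per-section extractor. In the single-pass formulation sketched here, a model would predict both `section` and `tags` for the whole document, and decoding is the same either way.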
Comment: RecSys in HR'23: The 3rd Workshop on Recommender Systems for Human Resources, in conjunction with the 17th ACM Conference on Recommender Systems, September 18–22, 2023, Singapore, Singapore
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2309.07015
Accession Number: edsarx.2309.07015
Database: arXiv