Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge

Bibliographic details
Title: Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
Authors: Alexandre Durand-Salmon, Philippe Jolivet, Claire Gardent, Anna Liednikova
Contributors: Natural Language Processing: representations, inference and semantics (SYNALP), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); Gardent, Claire
Source: Proceedings of the 28th Conference on Computational Linguistics (COLING), Dec 2020, Barcelona, Spain
Publication info: HAL CCSD, 2020.
Publication year: 2020
Subject terms: Training set, Computer science, Engineering and technology, Environmental sciences, [INFO] Computer Science [cs], Machine learning, Natural sciences, Paraphrase, Domain (software engineering), Electrical engineering, electronic engineering, information engineering, Key (cryptography), Artificial intelligence & image processing, Quality (business), Artificial intelligence, Dialog box, Earth and related environmental sciences
Description: International audience; A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora, which we apply to create doctor-patient interaction data. We use this data to learn both a generation model and a hybrid classification/retrieval model, and find that the generation model consistently outperforms the hybrid model. We show that our data creation method has several advantages. Not only does it allow for the semi-automatic creation of large quantities of training data, it also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.
File description: application/pdf
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1abcfd5b643439feb1a0ef39511d3f29
https://hal.science/hal-03020294/document
Rights: OPEN
Accession number: edsair.doi.dedup.....1abcfd5b643439feb1a0ef39511d3f29
Database: OpenAIRE