How to detect novelty in textual data streams? A comparative study of existing methods

Bibliographic Details
Title: How to detect novelty in textual data streams? A comparative study of existing methods
Authors: Christophe, Clément; Velcin, Julien; Cugliari, Jairo; Suignard, Philippe; Boumghar, Manel
Publication Year: 2019
Collection: Computer Science; Statistics
Subject Terms: Computer Science - Machine Learning, Computer Science - Information Retrieval, Statistics - Machine Learning
Description: Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams. We define a few tasks to solve and compare several state-of-the-art methods. The simulation framework allows us to evaluate their performance on a limited set of scenarios and to test their sensitivity to some parameters. Finally, we experiment with the same methods on different kinds of novelty in the New York Times Annotated Dataset.
Comment: 16 pages
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1909.05099
Accession Number: edsarx.1909.05099
Database: arXiv
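The description says the authors build a simulation framework in which they control how novelty occurs in a stream of documents. The sketch below is not that framework and is not one of the benchmarked methods. It is a minimal, hypothetical Python illustration of the general idea, assuming word-level novelty that shows up as brand-new tokens after a chosen point in the stream, paired with a simple unseen-word baseline detector. The function names (`simulate_stream`, `unseen_word_scores`, `precision_recall`), the vocabulary size, the 0.2 threshold, and the 50-document warm-up are all illustrative assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical sketch: not the authors' simulation framework or any benchmarked method.


def simulate_stream(
    n_docs: int = 200,
    doc_len: int = 30,
    novelty_start: int = 100,
    novelty_rate: float = 0.3,
    seed: int = 0,
) -> Tuple[List[List[str]], List[bool]]:
    """Generate a stream of tokenized documents with controlled novelty.

    Every document samples tokens from a fixed 'known' vocabulary. From index
    `novelty_start` onward, each document is made novel with probability
    `novelty_rate`: a third of its tokens are replaced by words that have never
    appeared before (assumed word-level novelty). Returns documents and labels.
    """
    rng = random.Random(seed)
    known_vocab = [f"w{i}" for i in range(50)]
    docs: List[List[str]] = []
    labels: List[bool] = []
    for t in range(n_docs):
        is_novel = t >= novelty_start and rng.random() < novelty_rate
        words = [rng.choice(known_vocab) for _ in range(doc_len)]
        if is_novel:
            # Each replaced position gets a token unique to this document and position.
            for j in rng.sample(range(doc_len), doc_len // 3):
                words[j] = f"new{t}_{j}"
        docs.append(words)
        labels.append(is_novel)
    return docs, labels


def unseen_word_scores(docs: List[List[str]]) -> List[float]:
    """Score each document by the fraction of its tokens absent from all earlier documents."""
    seen: set = set()
    scores: List[float] = []
    for words in docs:
        unseen = sum(1 for w in words if w not in seen)
        scores.append(unseen / len(words) if words else 0.0)
        seen.update(words)  # the history grows only after the document is scored
    return scores


def precision_recall(labels: List[bool], preds: List[bool]) -> Tuple[float, float]:
    """Precision and recall of boolean novelty predictions against ground truth."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    docs, labels = simulate_stream()
    scores = unseen_word_scores(docs)
    preds = [s > 0.2 for s in scores]  # illustrative threshold
    warmup = 50  # skip the cold-start period, where every word is still unseen
    p, r = precision_recall(labels[warmup:], preds[warmup:])
    print(f"precision={p:.2f} recall={r:.2f}")
```

The warm-up matters because the first documents consist entirely of words the detector has never seen, so an unseen-word score flags them as novel even though the simulator labels them as known. Drawing novel tokens from a shared vocabulary instead of unique ones would make the task harder, since later novel documents would reuse words that are already in the history.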