Detection of Common Subtrees with Identical Label Distribution

التفاصيل البيبلوغرافية
العنوان: Detection of Common Subtrees with Identical Label Distribution
المؤلفون: Azaïs, Romain, Ingels, Florian
سنة النشر: 2023
المجموعة: Computer Science
Statistics
مصطلحات موضوعية: Computer Science - Data Structures and Algorithms, Statistics - Machine Learning
الوصف: Frequent pattern mining is a relevant method to analyse structured data, like sequences, trees or graphs. It consists in identifying characteristic substructures of a dataset. This paper deals with a new type of patterns for tree data: common subtrees with identical label distribution. Their detection is far from obvious since the underlying isomorphism problem is graph isomorphism complete. An elaborated search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation times and analysis of real datasets from the literature. Compared to other substructures like topological subtrees and labelled subtrees for which the isomorphism problem is linear, the patterns found provide a more parsimonious representation of the data.
Comment: 40 pages
نوع الوثيقة: Working Paper
DOI: 10.1016/j.tcs.2023.114366
URL الوصول: http://arxiv.org/abs/2307.13068
رقم الأكسشن: edsarx.2307.13068
قاعدة البيانات: arXiv
الوصف
DOI:10.1016/j.tcs.2023.114366