VersaMatch : ontology matching with weak supervision

Bibliographic details
Title: VersaMatch : ontology matching with weak supervision
Authors: Jonathan Fürst, Mauricio Fadel Argerich, Bin Cheng
Publication information: ACM, 2023.
Publication year: 2023
Subject terms: Ontology matching, Machine learning, General Engineering, Data integration, Weak supervision, 006: Special computer methods
Description: Ontology matching is crucial to data integration for cross-silo data sharing and has mainly been addressed with heuristic and machine learning (ML) methods. While heuristic methods are often inflexible and hard to extend to new domains, ML methods rely on substantial amounts of labeled training data that are hard to obtain. To overcome these limitations, we propose VersaMatch, a flexible, weakly supervised ontology matching system. VersaMatch employs various weak supervision sources, such as heuristic rules, pattern matching, and external knowledge bases, to produce labels from a large amount of unlabeled data for training a discriminative ML model. For prediction, VersaMatch develops a novel ensemble model that combines the weak supervision sources with the discriminative model to support generalization while retaining high precision. Our ensemble method boosts end-model performance by 4 points compared to a traditional weak-supervision baseline. In addition, compared to state-of-the-art ontology matchers, VersaMatch achieves an overall 4-point improvement in F1 score across 26 ontology combinations from different domains. For recently released, in-the-wild datasets, VersaMatch beats the next-best matchers by 9 points in F1. Furthermore, its core weak-supervision logic can easily be improved by adding more knowledge sources and collecting more unlabeled data for training.
File description: application/pdf
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::83887b877652ae9cdbf1573eb764b706
https://hdl.handle.net/11475/27771
Rights: OPEN
Accession number: edsair.doi.dedup.....83887b877652ae9cdbf1573eb764b706
Database: OpenAIRE
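
Illustrative note: the description mentions weak supervision sources (heuristic rules, pattern matching, external knowledge bases) that produce labels from unlabeled data. The sketch below shows that general idea in Python. It is not the authors' implementation: every function name, the tiny synonym table, and the unit list are hypothetical, and a plain majority vote stands in for whatever label aggregation VersaMatch actually uses.

```python
from collections import Counter

# Label values; ABSTAIN means a source has no opinion on the pair.
MATCH, NO_MATCH, ABSTAIN = 1, 0, -1


def normalize(name):
    """Lowercase a property name and drop non-alphanumeric characters."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def lf_exact_name(a, b):
    """Heuristic rule: identical normalized names suggest a match."""
    return MATCH if normalize(a) == normalize(b) else ABSTAIN


# Hypothetical synonym table standing in for an external knowledge base.
SYNONYMS = {
    frozenset({"temperature", "temp"}),
    frozenset({"latitude", "lat"}),
}


def lf_knowledge_base(a, b):
    """Knowledge-base lookup: known synonym pairs suggest a match."""
    pair = frozenset({normalize(a), normalize(b)})
    return MATCH if pair in SYNONYMS else ABSTAIN


# Hypothetical unit suffixes used by the pattern-matching source.
UNITS = ("celsius", "percent", "meters")


def unit_suffix(name):
    """Return the unit suffix of a name, or None if it has none."""
    norm = normalize(name)
    return next((u for u in UNITS if norm.endswith(u)), None)


def lf_unit_pattern(a, b):
    """Pattern matching: conflicting unit suffixes suggest a non-match."""
    ua, ub = unit_suffix(a), unit_suffix(b)
    if ua and ub and ua != ub:
        return NO_MATCH
    return ABSTAIN


LABELING_FUNCTIONS = [lf_exact_name, lf_knowledge_base, lf_unit_pattern]


def weak_label(a, b):
    """Majority vote over non-abstaining sources; ties and silence abstain."""
    votes = [lf(a, b) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    pairs = [
        ("Temperature", "temp"),
        ("air_temp_celsius", "humidity_percent"),
        ("speed", "color"),
    ]
    for a, b in pairs:
        print(a, b, weak_label(a, b))
```

Pairs that receive a non-abstain label could then serve as training data for a discriminative classifier, which is the role the description assigns to the generated labels.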