Distance Functions and Normalization Under Stream Scenarios

Bibliographic details
Title: Distance Functions and Normalization Under Stream Scenarios
Authors: Barboza, Eduardo V. L., de Almeida, Paulo R. Lisboa, Britto Jr, Alceu de Souza, Cruz, Rafael M. O.
Publication year: 2023
Collection: Computer Science
Subject terms: Computer Science - Machine Learning
Description: Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies obtained with eight well-known distance functions on data streams under three settings: without normalization, normalized using the statistics of the first batch of data received, and normalized using the statistics of the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, together with the Canberra distance, can be a good combination when no information about the data stream is known beforehand.
Comment: Paper accepted to the 2023 International Joint Conference on Neural Networks
Document type: Working Paper
Access URL: http://arxiv.org/abs/2307.00106
Accession number: edsarx.2307.00106
Database: arXiv
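
Illustrative sketch: the three normalization settings and the Canberra distance named in the description can be illustrated with a short Python example. This is not the authors' code. It assumes min-max scaling as the normalization (the description mentions minimum/maximum values only as an example of feature properties), all function names are hypothetical, and under the previous-batch setting the very first batch is scaled with its own statistics because no earlier batch exists.

```python
import numpy as np


def canberra(x, y):
    """Canberra distance: sum_i |x_i - y_i| / (|x_i| + |y_i|).

    Terms where both coordinates are zero (0/0) are counted as 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return float(terms.sum())


def minmax_fit(batch):
    """Return per-feature (min, max) of one batch (rows = instances)."""
    batch = np.asarray(batch, dtype=float)
    return batch.min(axis=0), batch.max(axis=0)


def minmax_apply(batch, stats):
    """Scale a batch with previously computed (min, max) statistics.

    Values may fall outside [0, 1] when the stream drifts beyond the
    range seen in the batch the statistics came from.
    """
    lo, hi = stats
    span = np.where(hi - lo == 0, 1.0, hi - lo)  # avoid division by zero
    return (np.asarray(batch, dtype=float) - lo) / span


def normalize_stream(batches, mode):
    """Normalize a sequence of batches.

    mode = "none":     return the raw batches
    mode = "first":    always use the statistics of the first batch
    mode = "previous": use the statistics of the batch received just before
                       (assumption: the first batch uses its own statistics)
    """
    stats = None
    out = []
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if mode == "none":
            out.append(batch)
            continue
        if stats is None:
            stats = minmax_fit(batch)
        out.append(minmax_apply(batch, stats))
        if mode == "previous":
            stats = minmax_fit(batch)  # statistics for the next batch
    return out


if __name__ == "__main__":
    stream = [[[0, 0], [10, 20]], [[5, 40]], [[6, 41]]]
    for mode in ("none", "first", "previous"):
        print(mode, [b.tolist() for b in normalize_stream(stream, mode)])
    print(canberra([1, 2], [3, 2]))
```

Because each Canberra term is divided by the magnitudes of the two values being compared, the distance is less dominated by large-range features than, for example, an unscaled Euclidean distance, which is one plausible reading of why it pairs well with unnormalized streams.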