Academic Journal

Automatic Lifestate Identification and Clustering

Bibliographic Details
Title: Automatic Lifestate Identification and Clustering
Authors: Sam Smith, Gavin Smith, John Harvey
Source: International Journal of Population Data Science, Vol 8, Iss 3 (2023)
Publication Information: Swansea University, 2023.
Publication Year: 2023
Collection: LCC:Demography. Population. Vital events
Subject Terms: Demography. Population. Vital events, HB848-3697
Description: Introduction & Background: Summarising high-dimensional time series data across multiple entities is an increasingly prevalent problem because mass data collection has become routine in most domains. We propose a method of automatically summarising high-dimensional data. Objectives & Approach: Summarization in such a context is both with regard to a reduction of the high-dimensional observations and large number of temporal points. While numerous methods to segment and/or summarise time series exist, the properties often do not align with the needs of consumers of the summaries or require the unrealistic setting of parameters. Addressing this, we define a set of broad properties that lead to high utility in a broad class of domains, which are determined by an information theoretic notion of optimality. Intuitively these properties reflect the summarization of such data into lifestates where (1) the number of possible lifestates is limited and shared across entities to allow interpretation and comparison and (2) the number of lifestate-transitions is jointly controlled to provide a parameterless, optimal summarization of both the high sample and temporal dimensionality. Relevance to Digital Footprints: Example data include: regular survey collection, consumer purchasing history from transactional data (where the number of possible items to choose from is high), or other repeatedly sampled digital data. Within the Digital Footprints domain, concise descriptions of high-dimensional data (summarizations) are extremely important. For example, lifestates within health records could be identified and used to find critical patterns in the decline or recovery of patients. Conclusions & Implications: This work aims to find segmentations that optimally trade off the number of states and segments that humans must then interpret, while still capturing salient state changes. Building on prior work, we propose a model with complexity controlled by normalised maximum likelihood (NML). In short, the proposed model generates automated summarizations that are both optimally concise and informationally rich, according to information theory, a branch of mathematics.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2399-4908
Relation: https://ijpds.org/article/view/2274; https://doaj.org/toc/2399-4908
DOI: 10.23889/ijpds.v8i3.2274
Access URL: https://doaj.org/article/9310d8cc3ea643c0b64fab3c0bcac13b
Accession Number: edsdoj.9310d8cc3ea643c0b64fab3c0bcac13b
Database: Directory of Open Access Journals