A deep semantic search method for random tweets

Bibliographic details
Title: A deep semantic search method for random tweets
Authors: Mark Liptrott, Isa Inuwa-Dutse, Ioannis Korkontzelos
Source: Online Social Networks and Media
Publication year: 2019
Subject terms: Information retrieval; Computer Networks and Communications; Computer science; Communication; Deep learning; Semantic search; Networking & telecommunications; Engineering and technology; Duplicate content; Convolutional neural network; Scalability; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Social media; Artificial intelligence; Cluster analysis; Information Systems; Linear search
Description: Contemporary social media platforms enable users to act as both producers and consumers of content, leading to the generation of enormous amounts of data. While this ability is empowering, it also poses many challenges for efficient searches for relevant information. Many search approaches have been proposed in the literature. However, searching for information on Twitter is particularly challenging due to both the inconsistency in writing styles and the high generation rate of spurious and duplicate content. The quest for instant and efficient data processing to retrieve relevant information renders many existing techniques ineffective when applied to Twitter. We present a multilevel approach based on state-of-the-art deep learning methods and a novel scalable windowing approach for pairwise-similarity search (SWAPS) to improve search efficiency. SWAPS optimises searches using a strategic balancing criterion to assess the trade-off between accuracy and search speed, thereby circumventing sequential search problems. Moreover, we propose a deep search strategy that establishes a relationship between the status of a tweet and its longevity, measured in terms of engagement lifespan since posting. Deep search utilises a convolutional neural network to extract textual n-gram features and combines them with meta-features from the tweet to train a fully connected network on a vast number of tweets. This approach differs from existing ones by recognising the relationship between the status of a tweet and its engagement lifespan to ensure a better understanding of the compositional semantics in tweets. The results highlight interesting symmetrical properties with respect to similarity distribution and duration. We evaluate our approach on various benchmark datasets and demonstrate the efficacy and applicability of the method. Problems such as event detection, clustering and advertising, among others, can utilise this approach to detect items of interest effectively.
ISSN: 2468-6964
DOI: 10.1016/j.osnem.2019.07.002
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa2ce4263d8870cefc07252112f67d1e
Rights: OPEN
Accession number: edsair.doi.dedup.....fa2ce4263d8870cefc07252112f67d1e
Database: OpenAIRE
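
Illustrative note: the description mentions a windowing approach to pairwise-similarity search that trades accuracy against search speed. The record does not give the SWAPS algorithm or its balancing criterion, so the following is only a minimal sketch of the general idea, under stated assumptions: Jaccard similarity over lower-cased whitespace tokens stands in for whatever similarity measure the authors use, and the names `jaccard`, `windowed_similar_pairs`, `window` and `threshold` are hypothetical. Each tweet is compared only with the `window` tweets that precede it, rather than with every other tweet.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def windowed_similar_pairs(
    tweets: List[str], window: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (earlier_index, later_index, score) for similar tweet pairs.

    Assumption: each tweet is compared only with the `window` tweets
    immediately before it, so the cost is O(n * window) instead of O(n^2).
    This is a generic sliding window, not the authors' SWAPS criterion.
    """
    # Hypothetical tokenisation: lower-case and split on whitespace.
    tokens = [set(t.lower().split()) for t in tweets]
    pairs = []
    for i in range(len(tokens)):
        for j in range(max(0, i - window), i):
            score = jaccard(tokens[j], tokens[i])
            if score >= threshold:
                pairs.append((j, i, score))
    return pairs


tweets = [
    "breaking news flood in the city centre",
    "flood in the city centre breaking news",
    "coffee time with friends",
    "breaking news flood in the city centre",
]

# A narrow window is cheaper but misses duplicates that are far apart.
narrow = windowed_similar_pairs(tweets, window=1, threshold=0.8)
# A wider window finds more pairs at the cost of more comparisons.
wide = windowed_similar_pairs(tweets, window=3, threshold=0.8)
print(narrow)
print(wide)
```

In this toy input the narrow window finds only the adjacent duplicate pair, while the wider window also links the last tweet back to the first two. This shows the kind of accuracy-versus-speed trade-off the description refers to, without claiming to reproduce how SWAPS resolves it.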