Showing 1 - 10 of 294 results for search '"Zubiaga, Arkaitz"', query time: 2.68s
  1. 1
    Report

    Description: Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has mainly been done using text-based approaches; however, recent work has produced a model (CT-TN) that leverages information about a user's social network to help predict their stance, outperforming certain cross-target text-based approaches. Unfortunately, the data required for such graph-based approaches is not always available. This paper proposes two novel tools for Stance Detection: the Ego Network Model (ENM) and the Signed Ego Network Model (SENM). These models are founded in anthropological and psychological studies and have been used within the context of social network analysis and related tasks (e.g., link prediction). Stance Detection predictions obtained using these features achieve a level of accuracy similar to the graph-based features used by CT-TN while requiring less, and more easily obtainable, data. In addition, the performances of the inner and outer circles of the ENM, representing stronger and weaker social ties, respectively, are compared. Surprisingly, the outer circles, which contain more numerous but less intimate connections, are more useful for predicting stance.
    Comment: Accepted at ASONAM 2024

  2. 2
    Report

    Subject Terms: Computer Science - Computation and Language

    Description: The rapid dissemination of information through social media and the Internet has posed a significant challenge for fact-checking, notably in identifying check-worthy claims that fact-checkers should pay attention to, i.e. filtering claims needing fact-checking from a large pool of sentences. This challenge has stressed the need to focus on determining the priority of claims, specifically which claims are worth fact-checking. Despite advancements in this area in recent years, the application of large language models (LLMs), such as GPT, has only recently drawn attention in studies. However, many open-source LLMs remain underexplored. Therefore, this study investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions. Further, we propose a two-step data pruning approach to automatically identify high-quality training data instances for effective learning. The efficiency of our approach is demonstrated through evaluations on the English language dataset as part of the check-worthiness estimation task of CheckThat! 2024. Further, the experiments conducted with data pruning demonstrate that competitive performance can be achieved with only about 44% of the training data. Our team ranked first in the check-worthiness estimation task in the English language.

  3. 3
    Report

    Description: Political leaning can be defined as the inclination of an individual towards certain political orientations that align with their personal beliefs. Political leaning inference has traditionally been framed as a binary classification problem, namely, to distinguish between left vs. right or conservative vs. liberal. Furthermore, although some recent work considers political leaning inference in a multi-party multi-region framework, that work is limited to the application of social interaction data. In order to address these shortcomings, in this study we propose Hybrid Text-Interaction Modeling (HTIM), a framework that enables hybrid modeling fusing text and interactions from social media to accurately identify the political leaning of users in a multi-party multi-region framework. Access to textual and interaction-based data not only allows us to compare these data sources but also avoids reliance on specific data types. We show that, while state-of-the-art text-based representations on their own are not able to improve over interaction-based representations, a combination of text-based and interaction-based modeling using HTIM considerably improves the performance across the three regions, an improvement that is more prominent when we focus on the most challenging cases involving users who are less engaged in politics.

  4. 4
    Report

    Authors: A, Bharathi, Zubiaga, Arkaitz

    Subject Terms: Computer Science - Computation and Language

    Description: Stance detection has been widely studied as the task of determining if a social media post is positive, negative or neutral towards a specific issue, such as support towards vaccines. Research in stance detection has, however, often been limited to a single language and, where more than one language has been studied, research has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This paper makes the first such effort by introducing a novel approach to zero-shot cross-lingual stance detection, Multilingual Translation-Augmented BERT (MTAB), aiming to enhance the performance of a cross-lingual classifier in the absence of explicit training data for target languages. Our technique employs translation augmentation to improve zero-shot performance and pairs it with adversarial learning to further boost model efficacy. Through experiments on datasets labeled for stance towards vaccines in four languages (English, German, French and Italian), we demonstrate the effectiveness of our proposed approach, showcasing improved results in comparison to a strong baseline model as well as ablated versions of our model. Our experiments demonstrate the contribution of the model components, not least the translation-augmented data as well as the adversarial learning component, to the improved performance of the model. We have made our source code accessible on GitHub.

  5. 5
    Report

    Description: Stance detection, as the task of determining the viewpoint of a social media post towards a target as 'favor' or 'against', has been understudied in the challenging yet realistic scenario where there is limited labeled data for a certain target. Our work advances research in few-shot stance detection by introducing SocialPET, a socially informed approach to leveraging language models for the task. Our proposed approach builds on the Pattern Exploiting Training (PET) technique, which addresses classification tasks as cloze questions through the use of language models. To enhance the approach with social awareness, we exploit the social network structure surrounding social media posts. We prove the effectiveness of SocialPET on two stance datasets, Multi-target and P-Stance, outperforming competitive stance detection models as well as the base model, PET, where the labeled instances for the target under study are as few as 100. When we delve into the results, we observe that SocialPET is comparatively strong in identifying instances of the 'against' class, where baseline models underperform.

  6. 6
    Report

    Authors: Yi, Peiling, Zubiaga, Arkaitz

    Subject Terms: Computer Science - Computation and Language

    Description: Swear words are a common proxy to collect datasets with cyberbullying incidents. Our focus is on measuring and mitigating biases derived from spurious associations between swear words and incidents occurring as a result of such data collection strategies. After demonstrating and quantifying these biases, we introduce ID-XCB, the first data-independent debiasing technique that combines adversarial training, bias constraints and a debiasing fine-tuning approach aimed at alleviating model attention to bias-inducing words without impacting overall model performance. We explore ID-XCB on two popular session-based cyberbullying datasets along with comprehensive ablation and generalisation studies. We show that ID-XCB learns robust cyberbullying detection capabilities while mitigating biases, outperforming state-of-the-art debiasing methods in both performance and bias mitigation. Our quantitative and qualitative analyses demonstrate its generalisability to unseen data.

  7. 7
    Report

    Authors: Zeng, Xia, Zubiaga, Arkaitz

    Description: Claim verification is an essential step in the automated fact-checking pipeline which assesses the veracity of a claim against a piece of evidence. In this work, we explore the potential of few-shot claim verification, where only very limited data is available for supervision. We propose MAPLE (Micro Analysis of Pairwise Language Evolution), a pioneering approach that explores the alignment between a claim and its evidence with a small seq2seq model and a novel semantic measure. Its innovative utilization of the micro language evolution path leverages unlabelled pairwise data to facilitate claim verification while imposing low demand on data annotations and computing resources. MAPLE demonstrates significant performance improvements over SOTA baselines SEED, PET and LLaMA 2 across three fact-checking datasets: FEVER, Climate FEVER, and SciFact. Data and code are available here: https://github.com/XiaZeng0223/MAPLE
    Comment: Accepted by EACL Findings 2024

  8. 8
    Report

    Subject Terms: Computer Science - Computation and Language

    Description: Advances in machine learning and symbolic approaches have underscored their strengths and weaknesses in Natural Language Processing (NLP). While machine learning approaches are powerful in identifying patterns in data, they often fall short in learning commonsense and the factual knowledge required for NLP tasks. Meanwhile, symbolic methods excel in representing knowledge-rich data. However, they struggle to adapt to dynamic data and to generalize the knowledge. Bridging these two paradigms through hybrid approaches enables the alleviation of weaknesses in both while preserving their strengths. Recent studies extol the virtues of this union, showcasing promising results in a wide range of NLP tasks. In this paper, we present an overview of hybrid approaches used for NLP. Specifically, we delve into the state-of-the-art hybrid approaches used for a broad spectrum of NLP tasks requiring natural language understanding, generation, and reasoning. Furthermore, we discuss the existing resources available for hybrid approaches for NLP along with the challenges and future directions, offering a roadmap for future research avenues.
    Comment: Revised according to review comments

  9. 9
    Report

    Subject Terms: Computer Science - Computation and Language

    Description: Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey focuses on the former, by discussing existing efforts towards detecting claims needing fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction where existing methods are still far from matching human performance due to the profoundly challenging nature of the issue. In particular, the dissemination of information across multiple social platforms, articulated in multiple languages and modalities, demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of existing multilingual claim detection research. We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem: verifiability, priority, and similarity. Further, we present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
    Comment: Accepted revision

  10. 10
    Report

    Description: An ability to infer the political leaning of social media users can help in gathering opinion polls, thereby leading to a better understanding of public opinion. While there has been a body of research attempting to infer the political leaning of social media users, this has typically been simplified as a binary classification problem (e.g. left vs. right) and has been limited to a single location, leading to a dearth of investigation into more complex, multiclass classification and its generalizability to different locations, particularly those with multi-party systems. Our work makes the first such effort by studying political leaning inference in three of the UK's nations (Scotland, Wales and Northern Ireland), each of which has a different political landscape composed of multiple parties. To do so, we collect and release a dataset comprising users labelled by their political leaning as well as interactions with one another. We investigate the ability to predict the political leaning of users by leveraging these interactions in challenging scenarios such as few-shot learning, where training data is scarce, as well as assessing the applicability to users with different levels of political engagement. We show that interactions in the form of retweets between users can be a very powerful feature to enable political leaning inference, leading to consistent and robust results across different regions with multi-party systems. However, we also see that there is room for improvement in predicting the political leaning of users who are less engaged in politics.