نتائج البحث - "Kumar, Sonal"

يعرض 11 - 20 نتائج من 408 نتيجة بحث عن '"Kumar, Sonal"', وقت الاستعلام: 1.42s تنقيح النتائج

النتائج لكل صفحة

فرز بـ

عرض العنوان عرض مختصر العرض المفصل

تحديد الصفحة | بالمحدد:

تحديد النتيجة رقم 11
11

تقرير

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

المؤلفون: Ghosh, Sreyan, Seth, Ashish, Kumar, Sonal, Tyagi, Utkarsh, Evuru, Chandra Kiran, S, Ramaneswaran, Sakshi, S., Nieto, Oriol, Duraiswami, Ramani, Manocha, Dinesh

مصطلحات موضوعية: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing

الوصف: A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute-binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audios have the same acoustic events but with different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance, thereby struggling with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audios. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities.
Comment: ICLR 2024

URL الوصول: http://arxiv.org/abs/2310.08753

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 12
12

تقرير

RECAP: Retrieval-Augmented Audio Captioning

المؤلفون: Ghosh, Sreyan, Kumar, Sonal, Evuru, Chandra Kiran Reddy, Duraiswami, Ramani, Manocha, Dinesh

مصطلحات موضوعية: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Sound

الوصف: We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. Additionally, our proposed method can transfer to any domain without the need for any additional fine-tuning. To generate a caption for an audio sample, we leverage an audio-text model CLAP to retrieve captions similar to it from a replaceable datastore, which are then used to construct a prompt. Next, we feed this prompt to a GPT-2 decoder and introduce cross-attention layers between the CLAP encoder and GPT-2 to condition the audio for caption generation. Experiments on two benchmark datasets, Clotho and AudioCaps, show that RECAP achieves competitive performance in in-domain settings and significant improvements in out-of-domain settings. Additionally, due to its capability to exploit a large text-captions-only datastore in a training-free fashion, RECAP shows unique capabilities of captioning novel audio events never seen during training and compositional audios with multiple events. To promote research in this space, we also release 150,000+ new weakly labeled captions for AudioSet, AudioCaps, and Clotho.
Comment: ICASSP 2024. Code and data: https://github.com/Sreyan88/RECAP

URL الوصول: http://arxiv.org/abs/2309.09836

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 13
13

تقرير

ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations

المؤلفون: Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, Tyagi, Utkarsh, Singh, Sakshi, Chowdhury, Sanjoy, Manocha, Dinesh

مصطلحات موضوعية: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language

الوصف: Neural image classifiers can often learn to make predictions by overly relying on non-predictive features that are spuriously correlated with the class labels in the training data. This leads to poor performance in real-world atypical scenarios where such features are absent. This paper presents ASPIRE (Language-guided Data Augmentation for SPurIous correlation REmoval), a simple yet effective solution for supplementing the training dataset with images without spurious features, for robust learning against spurious correlations via better generalization. ASPIRE, guided by language at various steps, can generate non-spurious images without requiring any group labeling or existing non-spurious images in the training set. Precisely, we employ LLMs to first extract foreground and background features from textual descriptions of an image, followed by advanced language-guided image editing to discover the features that are spuriously correlated with the class label. Finally, we personalize a text-to-image generation model using the edited images to generate diverse in-domain images without spurious features. ASPIRE is complementary to all prior robust training methods in literature, and we demonstrate its effectiveness across 4 datasets and 9 baselines and show that ASPIRE improves the worst-group classification accuracy of prior methods by 1% - 38%. We also contribute a novel test set for the challenging Hard ImageNet dataset.
Comment: ACL 2024 Findings. Code: https://github.com/Sreyan88/ASPIRE

URL الوصول: http://arxiv.org/abs/2308.10103

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 14
14

تقرير

ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER

المؤلفون: Ghosh, Sreyan, Tyagi, Utkarsh, Suri, Manan, Kumar, Sonal, Ramaneswaran, S, Manocha, Dinesh

مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval

الوصف: Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from and often generates incoherent augmentations by placing complex named entities in the wrong context. ACLM builds on BART and is optimized on a novel text reconstruction or denoising task - we use selective masking (aided by attention maps) to retain the named entities and certain keywords in the input sentence that provide contextually relevant additional knowledge or hints about the named entities. Compared with other data augmentation strategies, ACLM can generate more diverse and coherent augmentations preserving the true word sense of complex entities in the sentence. We demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER across various low-resource settings. ACLM outperforms all our neural baselines by a significant margin (1%-36%). In addition, we demonstrate the application of ACLM to other domains that suffer from data scarcity (e.g., biomedical). In practice, ACLM generates more effective and factual augmentations for these domains than prior methods. Code: https://github.com/Sreyan88/ACLM
Comment: ACL 2023 Main Conference

URL الوصول: http://arxiv.org/abs/2306.00928

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 15
15

تقرير

BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER

المؤلفون: Ghosh, Sreyan, Tyagi, Utkarsh, Kumar, Sonal, Manocha, Dinesh

مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval

الوصف: Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities from biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data due to the highly specialized and expert knowledge required for annotation. Though data augmentation has shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER. In this paper, we present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation. Post training, we perform conditional generation and generate diverse augmentations conditioning BioAug on selectively corrupted text similar to the training stage. We demonstrate the effectiveness of BioAug on 5 benchmark BioNER datasets and show that BioAug outperforms all our baselines by a significant margin (1.5%-21.5% absolute improvement) and is able to generate augmentations that are both more factual and diverse. Code: https://github.com/Sreyan88/BioAug.
Comment: SIGIR 2023

URL الوصول: http://arxiv.org/abs/2305.10647

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 16
16

تقرير

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

المؤلفون: Ghosh, Sreyan, Suri, Manan, Chiniya, Purva, Tyagi, Utkarsh, Kumar, Sonal, Manocha, Dinesh

مصطلحات موضوعية: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Social and Information Networks

الوصف: The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded language. In this paper, we present CoSyn, a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. CoSyn introduces novel ways to encode these external contexts and employs a novel context interaction mechanism that clearly captures the interplay between them, making independent assessments of the amounts of information to be retrieved from these noisy contexts. Additionally, it carries out all these operations in the hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn on 6 hate speech datasets and show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
Comment: Accepted to EMNLP 2023 Main Conference. Code: https://github.com/Sreyan88/CoSyn

URL الوصول: http://arxiv.org/abs/2303.03387

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 17
17

تقرير

A novel multimodal dynamic fusion network for disfluency detection in spoken utterances

المؤلفون: Ghosh, Sreyan, Tyagi, Utkarsh, Kumar, Sonal, Suri, Manan, Shah, Rajiv Ratn

مصطلحات موضوعية: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing

الوصف: Disfluency, though originating from human spoken utterances, is primarily studied as a uni-modal text-based Natural Language Processing (NLP) task. Based on early-fusion and self-attention-based multimodal interaction between text and acoustic modalities, in this paper, we propose a novel multimodal architecture for disfluency detection from individual utterances. Our architecture leverages a multimodal dynamic fusion network that adds minimal parameters over an existing text encoder commonly used in prior art to leverage the prosodic and acoustic cues hidden in speech. Through experiments, we show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior unimodal and multimodal systems in literature by a significant margin. In addition, we make a thorough qualitative analysis and show that, unlike text-only systems, which suffer from spurious correlations in the data, our system overcomes this problem through additional cues from speech signals. We make all our codes publicly available on GitHub.
Comment: Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2203.16794

URL الوصول: http://arxiv.org/abs/2211.14700

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 18
18

دورية أكاديمية

Long-term clinical outcomes of patients with COVID-19 and chronic liver disease: US multicenter COLD study.

المؤلفون: Aby, Elizabeth S, Moafa, Ghady, Latt, Nyan, Sultan, Mohammad T, Cacioppo, Paula A, Kumar, Sonal, Chung, Raymond T, Bloom, Patricia P, Gustafson, Jenna, Daidone, Michael, Reinus, Zoe, Debes, Jose D, Sandhu, Sunny, Sohal, Aalam, Khalid, Sameeha, Roytman, Marina, Catana, Andreea Maria, Wegermann, Kara, Carr, Rotonya M, Saiman, Yedidya, Kassab, Ihab, Chen, Vincent L, Rabiee, Atoosa, Rosenberg, Carly, Nguyen, Veronica, Gainey, Christina, Zhou, Kali, Chavin, Kenneth, Lizaola-Mayo, Blanca C, Chascsa, David M, Varelas, Lee, Moghe, Akshata, Dhanasekaran, Renumathy

المصدر: Hepatology communications. 7(1)

مصطلحات موضوعية: Humans, Liver Diseases, Hospitalization, Adult, Middle Aged, Female, Male, COVID-19, COVID-19 Vaccines, Post-Acute COVID-19 Syndrome, Digestive Diseases, Vaccine Related, Liver Disease, Clinical Research, Management of diseases and conditions, 7.1 Individual care needs, Good Health and Well Being

الوصف: BackgroundCOVID-19 is associated with higher morbidity and mortality in patients with chronic liver diseases (CLDs). However, our understanding of the long-term outcomes of COVID-19 in patients with CLD is limited.MethodsWe conducted a multicenter, observational cohort study of adult patients with CLD who were diagnosed with COVID-19 before May 30, 2020, to determine long-term clinical outcomes. We used a control group of patients with CLD confirmed negative for COVID-19.ResultsWe followed 666 patients with CLD (median age 58 years, 52.8% male) for a median of 384 (interquartile range: 31-462) days. The long-term mortality was 8.1%; with 3.6% experiencing delayed COVID-19-related mortality. Compared to a propensity-matched control group of patients with CLD without COVID-19 (n=1332), patients with CLD with COVID-19 had worse long-term survival [p

وصف الملف: application/pdf

URL الوصول: https://escholarship.org/uc/item/09b9v2tw

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 19
19

دورية أكاديمية

Thermal Performance Analysis of Twin Pulsating Turbulent Jets Impinging over Intermittently Moving Plate

المؤلفون: Kumar, Sonal^{Aff1, IDs1336902308302w_cor1}, Halder, Pabitra

المصدر: Arabian Journal for Science and Engineering. :1-22

Full Text Finder

أضف إلى المفضلة

محفوظ في:
تحديد النتيجة رقم 20
20

تقرير

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

المؤلفون: Ghosh, Sreyan, Kumar, Sonal, Singla, Yaman Kumar, Shah, Rajiv Ratn, Umesh, S.

مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

الوصف: Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear sequences in text, thus ignoring the structured information in text which is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts from spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior-art by a significant margin. We make all our codes publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification)

URL الوصول: http://arxiv.org/abs/2203.16028

أضف إلى المفضلة

محفوظ في: