LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Bibliographic details
Title: LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
Authors: Helff, Lukas; Friedrich, Felix; Brack, Manuel; Kersting, Kristian; Schramowski, Patrick
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Description: We introduce LlavaGuard, a family of VLM-based safeguard models, offering a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's new responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, our introduced customizable taxonomy categories enable the context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters demonstrating state-of-the-art performance, with even the smallest models outperforming baselines like GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
Comment: Project page at https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html
Document type: Working Paper
Access URL: http://arxiv.org/abs/2406.05113
Accession number: edsarx.2406.05113
Database: arXiv