ShieldGemma 2: Robust and Tractable Image Content Moderation

Zeng, Wenjun; Kurniawan, Dana; Mullins, Ryan; Liu, Yuchi; Saha, Tamoghna; Ike-Njoku, Dirichi; Gu, Jindong; Song, Yiwen; Xu, Cai; Zhou, Jingjing; Joshi, Aparna; Dheep, Shravan; Malek, Mani; Palangi, Hamid; Baek, Joon; Pereira, Rick; Narasimhan, Karthik

arXiv:2504.01081 (cs)

[Submitted on 1 Apr 2025 (v1), last revised 8 Apr 2025 (this version, v2)]

Abstract: We introduce ShieldGemma 2, a 4B parameter image content moderation model built on Gemma 3. The model provides robust safety risk predictions across three key harm categories: Sexually Explicit, Violence & Gore, and Dangerous Content. It covers both synthetic images (e.g. the output of any image generation model) and natural images (e.g. any image input to a Vision-Language Model). We evaluated the model on both internal and external benchmarks to demonstrate state-of-the-art performance compared to LlavaGuard (Helff et al., 2024), GPT-4o mini (Hurst et al., 2024), and the base Gemma 3 model (Gemma Team, 2025) based on our policies. Additionally, we present a novel adversarial data generation pipeline that enables controlled, diverse, and robust image generation. ShieldGemma 2 provides an open image moderation tool to advance multimodal safety and responsible AI development.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Image and Video Processing (eess.IV)
Cite as:	arXiv:2504.01081 [cs.CV]
	(or arXiv:2504.01081v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.01081

Submission history

From: Wenjun Zeng
[v1] Tue, 1 Apr 2025 18:00:20 UTC (5,836 KB)
[v2] Tue, 8 Apr 2025 18:38:04 UTC (5,836 KB)
