Welcome to the official website of MMU-RAG: the Massive Multi-Modal User-Centric Retrieval-Augmented Generation Benchmark. This competition invites researchers and developers to build RAG systems that perform in real-world conditions.
Participants will tackle real-user queries, retrieve from web-scale corpora, and generate high-quality responses in both text and video formats.
MMU-RAG features two tracks:
- Text-to-Text
- Text-to-Video
Submissions are evaluated using a blend of:
- Automatic metrics
- LLM-as-a-judge evaluations
- Real-time human feedback through our interactive RAG-Arena platform
Whether you’re working on retrieval strategies, generation quality, or multimodal generation, this is your opportunity to benchmark your system in a setting that reflects actual user needs.
Ready to take on the challenge? Explore the tracks, datasets, and submission guidelines to get started.
Tracks
Track A: Text-to-Text
This track reflects the standard text-to-text RAG application. Participants are expected to submit systems that retrieve from a text corpus and generate text responses given a text query. Participants may use our provided corpora, or augment their generations with any other background corpora or (proprietary) search APIs, provided that all external resources are clearly documented in their submissions.
Validation queries will be available for download soon.
Track B: Text-to-Video
Participants are expected to submit systems that retrieve from a text corpus and generate video responses given a text query that is predetermined to benefit from a video response. Participants may use our provided corpora, or augment their generations with any other background corpora or (proprietary) search APIs, provided that all external resources are clearly documented in their submissions.
Validation queries will be available for download soon.
Rules
The goal of the competition is to create a RAG system that is robust to real-user queries. We provide starter code for a simple RAG system, comprising three components (a minimal sketch follows the list below):
- An embedding model
- A retriever
- A generation model
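For reference, here is a minimal sketch of such a three-component pipeline in Python. It assumes the sentence-transformers, faiss, and transformers packages are installed; the model names, toy corpus, and prompt format are illustrative placeholders, not the official starter code.

```python
# Minimal three-component RAG pipeline: embedding model, retriever, generator.
# Model names and the toy corpus below are illustrative, not the official starter code.
import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "Python is a widely used general-purpose programming language.",
]

# 1. Embedding model: encode the corpus into dense vectors.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = embedder.encode(corpus, normalize_embeddings=True)

# 2. Retriever: exact inner-product search (cosine similarity on normalized vectors).
index = faiss.IndexFlatIP(corpus_embeddings.shape[1])
index.add(corpus_embeddings)

def retrieve(query: str, k: int = 2) -> list[str]:
    query_embedding = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(query_embedding, k)
    return [corpus[i] for i in ids[0]]

# 3. Generation model: condition a small open LLM on the retrieved passages.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    output = generator(prompt, max_new_tokens=100, return_full_text=False)
    return output[0]["generated_text"]

print(answer("Where is the Eiffel Tower?"))
```

Each component can be swapped independently, e.g., a stronger embedding model, an approximate-nearest-neighbor index over a web-scale corpus, or a larger generator.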
Data and Models
Participants may use any open- or closed-source data or models.
However:
- Systems that include closed-source and/or proprietary components will be ranked on a separate leaderboard from fully open-source systems.
- All retrieval methods, generation models, corpora, and external tools or APIs must be clearly specified.
- For systems with closed-source components, please provide as much detail as possible for fair evaluation.
- Systems consisting solely of calls to a commercial API (e.g., GPT-4o, Claude Sonnet) are not allowed.
System Restrictions
- All outputs must be model-generated.
- To ensure full replicability of submitted systems, human-verified outputs and any other form of human intervention are not allowed.
Organizers
- Luo Qi Chan, DSO National Laboratories / Carnegie Mellon University
- Tevin Wang, Carnegie Mellon University
- Shuting Wang, Renmin University of China / Carnegie Mellon University
- Zhihan Zhang, Carnegie Mellon University
- Alfredo Gomez, Carnegie Mellon University
- Prahaladh Chandrahasan, Carnegie Mellon University
- Lan Yan, Carnegie Mellon University
- Andy Tang, Carnegie Mellon University
- Zimeng (Chris) Qiu, Amazon AGI
- Morteza Ziyadi, Amazon AGI
- Sherry Wu, Carnegie Mellon University
- Mona Diab, Carnegie Mellon University
- Akari Asai, University of Washington
- Chenyan Xiong, Carnegie Mellon University