MMU-RAG


NeurIPS 2025 Competition

Welcome to the official website of MMU-RAG: the Massive Multi-Modal User-Centric Retrieval-Augmented Generation Benchmark. This competition invites researchers and developers to build RAG systems that perform well in real-world conditions.


Update 06 Oct 2025

As we approach the competition deadline, we’d like to share a few important updates and reminders to help you prepare your final submissions.

🗓️ Submission Deadline

Submissions will close on October 15 (23:59 AoE). Please make sure all materials are uploaded before the deadline.

📂 Test Dataset for Static Evaluation

For participants taking part only in the static evaluation, the test-release dataset is now available at the following links:

  1. Text-to-Text Test Set (For static evaluation)
  2. Text-to-Video Test Set (For static evaluation)

Please follow the submission instructions in the documentation and generate responses only for the queries in these test sets.
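For the static evaluation, the expected workflow is to load the released queries, run your system on each one, and save the outputs for upload. Below is a minimal Python sketch of that loop; the file names and JSON fields (query_id, query, response) are placeholders of our own, so please follow the exact output schema given in the submission instructions.

```python
import json

# Hypothetical file and field names for illustration only;
# use the exact schema from the official submission instructions.
TEST_FILE = "text_to_text_test.jsonl"
OUTPUT_FILE = "submission.jsonl"

def run_my_rag_system(query: str) -> str:
    """Placeholder for your own retrieval + generation pipeline."""
    raise NotImplementedError

with open(TEST_FILE, encoding="utf-8") as fin, \
     open(OUTPUT_FILE, "w", encoding="utf-8") as fout:
    for line in fin:
        example = json.loads(line)
        answer = run_my_rag_system(example["query"])
        # One JSON object per line; keep the query id so outputs can be matched.
        fout.write(json.dumps({"query_id": example["query_id"],
                               "response": answer}) + "\n")
```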

📝 Short System Paper

We ask each participating team to prepare a short paper describing their system and methodology. Please use the NeurIPS short-paper format (2–4 pages).

This write-up will serve as part of the competition record and allow others to learn from your approach.

Submission details for the paper will be shared shortly after the system submission deadline.

🗣️ Workshop & Presentations

We’re excited to announce that the MMU-RAG Workshop will take place at NeurIPS 2025 on 📅 Sunday, December 7, from 3–6 PM PST. Selected teams will be invited to present their work during the session. If you are interested in presenting (either in person or virtually), please indicate your interest when submitting your final materials.

Thank you again for being part of MMU-RAG! We’re looking forward to seeing your submissions and showcasing your work at NeurIPS.


Participants will tackle real-user queries, retrieve from web-scale corpora, and generate high-quality responses in text and/or video formats.

MMU-RAG features two tracks:

  1. Text-to-Text
  2. Text-to-Video

Submissions are evaluated using a blend of:

  • Automatic metrics
  • LLM-as-a-judge evaluations
  • Real-time human feedback through our interactive RAG-Arena platform

Evaluation Methods and Metrics

Overview of our static evaluation methods and their corresponding metrics.

Text-to-Text track
  • Automatic: ROUGE-L, BERTScore (see the scoring sketch below)
  • LLM-as-a-Judge: Semantic Similarity, Coverage, Factuality, Citation Quality
  • Human: Likert ratings

Text-to-Video track
  • Automatic: Subject Consistency, Background Consistency, Motion Smoothness, Dynamic Degree, Aesthetic Quality, Imaging Quality (from VBench)
  • Human: Likert ratings on Relevance, Precision, Recall, Usefulness
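For the text-to-text automatic metrics, you can sanity-check your outputs locally against reference answers using the open-source rouge-score and bert-score packages. This is a minimal sketch of our own, assuming you have prediction/reference pairs in memory; the organizers' official evaluation scripts may configure these metrics differently.

```python
# pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

predictions = ["The Eiffel Tower is located in Paris, France."]
references = ["The Eiffel Tower stands in Paris."]

# ROUGE-L: longest-common-subsequence overlap between prediction and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = [scorer.score(ref, pred)["rougeL"].fmeasure
           for pred, ref in zip(predictions, references)]

# BERTScore: semantic similarity computed from contextual token embeddings.
_, _, f1 = bert_score(predictions, references, lang="en")

print(f"ROUGE-L F1:   {sum(rouge_l) / len(rouge_l):.3f}")
print(f"BERTScore F1: {f1.mean().item():.3f}")
```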

Whether you’re advancing retrieval strategies, generation quality, or multimodal outputs, this is your opportunity to benchmark your system in a setting that reflects actual user needs.


Timeline

Aug 1: Competition launch & dataset release

Two exciting tracks, both with provided corpora, APIs, and starter code. You may also use external resources or APIs for retrieval as long as they are clearly documented in your submission.

Text-to-Text (details)
  • Standard text-to-text RAG: build systems that retrieve from a text corpus and generate text responses to text queries (a minimal pipeline sketch follows this list).
  • Deep research systems are welcome, e.g. multi-hop retrieval, structured reasoning, integration with external tools or knowledge bases.

Text-to-Video (details)
  • A more novel task: given text queries that benefit from video outputs (e.g. “how to peel a banana”), retrieve from a text corpus and generate video responses.
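To make the text-to-text track concrete, here is a minimal retrieve-then-generate sketch of our own: BM25 retrieval with the rank_bm25 package over a toy corpus, plus a stub where your generator (an LLM call, a deep-research agent, etc.) would plug in. The provided starter code is the authoritative reference; this is only an illustration under those assumptions.

```python
# pip install rank-bm25
from rank_bm25 import BM25Okapi

# Toy corpus standing in for the provided web-scale corpus / retrieval API.
corpus = [
    "The Great Wall of China is over 13,000 miles long.",
    "Bananas are easiest to peel from the bottom end.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

def generate_answer(query: str, passages: list[str]) -> str:
    """Stub: replace with your LLM / deep-research generator of choice."""
    context = " ".join(passages)
    return f"(answer to '{query}' grounded in: {context})"

def rag_pipeline(query: str, k: int = 2) -> str:
    # Retrieve the top-k passages for the query, then generate a grounded answer.
    top_passages = bm25.get_top_n(query.lower().split(), corpus, n=k)
    return generate_answer(query, top_passages)

print(rag_pipeline("how to peel a banana"))
```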

Aug 1 - Oct 24: ACTION REQUIRED: Register to get necessary resources

  • Go to Getting Started page to see:
    • Our competition rules
    • Instructions on registration (required)
    • Detailed instructions for the two tracks

Aug 1 - Oct 24: ACTION REQUIRED: Competition Submission

Deadline extended to October 24, 2025 (23:59 AoE) for both Text-to-Text and Text-to-Video tracks.

  • Step-by-step instructions for the text-to-text and text-to-video tracks.
  • Submission options preview (applicable for both tracks):
Static Evaluation (Non-Cash Prizes)
  • Run your system on the public validation set
  • Submit outputs (.jsonl or video folder) via Google Drive
  • Eligible for honorable mentions and website features

Full System Submission (Cash Prizes)
  • Package your RAG system as a Docker image (a minimal serving sketch follows this list)
  • Submit via AWS ECR for live + static evaluation
  • Eligible for leaderboard rankings and cash prizes
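For the full system submission, your RAG system has to run as a self-contained service inside the Docker image. As a rough illustration of that shape, here is a minimal FastAPI wrapper; the route name, request/response fields, and port are assumptions of ours, so follow the interface specified in the official track instructions when packaging your image for AWS ECR.

```python
# pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    # Hypothetical request schema; follow the official interface spec.
    query: str

class QueryResponse(BaseModel):
    response: str

@app.post("/generate", response_model=QueryResponse)
def generate(request: QueryRequest) -> QueryResponse:
    # Replace with your actual retrieval + generation pipeline.
    answer = f"(placeholder answer for: {request.query})"
    return QueryResponse(response=answer)

# Run locally (and as the container entrypoint) with, e.g.:
#   uvicorn server:app --host 0.0.0.0 --port 8000
```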

Oct 24 - Nov: Organizers Running Evaluations

  • Submissions will be evaluated using a blend of:
    • Automatic metrics
    • LLM-as-a-judge evaluations
    • Real-time user feedback from our RAG-Arena

Action required: All participants are required to submit a report detailing their system, methods, and results. Top-performing and innovative teams will be invited to present their work at our associated NeurIPS 2025 workshop. Further details on the report format and submission deadlines will be announced soon.

Dec 6-7: MMU-RAG Workshop at NeurIPS 2025

  • Presentations by selected teams
  • Winners and runners-up announced

Prizes

We’re excited to offer both monetary prizes and academic exposure opportunities to recognize outstanding submissions.

💰 Prize Pool

Thanks to the support of Amazon, MMU-RAG offers a $10,000 prize pool in AWS credits. Prizes will be awarded to top-performing teams across both tracks.

🎤 Present at NeurIPS

Top teams will also be invited to present their systems during the MMU-RAG competition session at NeurIPS 2025. This is a unique opportunity to share your work with the community.

🥇 Eligibility

Prize eligibility requires full system reproducibility and clear documentation of all components. Only participants in the Full System Submission option are eligible for cash prizes.


Contact Us

For any questions or clarifications, email the organizers directly at: mmu-rag@andrew.cmu.edu

Organizers

  • Luo Qi Chan, DSO National Laboratories / Carnegie Mellon University
  • Tevin Wang, Carnegie Mellon University
  • Shuting Wang, Renmin University of China / Carnegie Mellon University
  • Zhihan Zhang, Carnegie Mellon University
  • Alfredo Gomez, Carnegie Mellon University
  • Prahaladh Chandrahasan, Carnegie Mellon University
  • Lan Yan, Carnegie Mellon University
  • Andy Tang, Carnegie Mellon University
  • Zimeng (Chris) Qiu, Amazon AGI
  • Morteza Ziyadi, Amazon AGI
  • Sherry Wu, Carnegie Mellon University
  • Mona Diab, Carnegie Mellon University
  • Akari Asai, University of Washington
  • Chenyan Xiong, Carnegie Mellon University