Getting Started

Welcome to the MMU-RAG competition! This page contains all the rules, resources, and APIs you need to get started.

Rules

The goal of the competition is to create a RAG system that is robust to real-user queries. We provide starter code for a simple RAG system, comprising three components:

An embedding model
A retriever
A generation model
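
To make the three components concrete, here is a minimal, self-contained sketch of how they fit together. It is not the competition starter code: the bag-of-words "embedding", the cosine-similarity retriever, and the template "generator" are toy stand-ins, and all function names (embed, retrieve, generate, answer) are hypothetical choices made for this illustration.

```python
import math
import re
from collections import Counter

# Toy stand-ins for the three components. NOT the competition starter code;
# every name here is an assumption made for illustration.

def embed(text: str) -> Counter:
    """Embedding model stand-in: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retriever: rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def generate(query: str, passages: list[str]) -> str:
    """Generation model stand-in: a real system would prompt an LLM here."""
    context = " ".join(passages)
    return f"Question: {query}\nContext: {context}"

def answer(query: str, corpus: list[str], k: int = 2) -> str:
    """Full pipeline: embed and retrieve, then generate from the passages."""
    return generate(query, retrieve(query, corpus, k))

corpus = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "France borders Spain and Italy.",
]

if __name__ == "__main__":
    print(answer("What is the capital of France?", corpus, k=1))
```

In a real submission each stand-in would be swapped for an actual model or index; the point of the sketch is only the data flow from query to retrieved passages to generated output.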

Data and Models

To encourage diversity in submissions, participants may use any open- or closed-source data or models.

However:

Systems that include closed-source and/or proprietary components will be clearly tagged on the leaderboards.
All retrieval methods, generation models, corpora, and external tools or APIs must be clearly specified.
For systems with closed-source components, please provide as much detail as possible to allow fair evaluation.
Commercial API-only deep research systems (e.g., Perplexity Sonar API, OpenAI Deep Research API) are not allowed.

Component	Open Source	Closed Source
Embedding / Retrieval Modules	Open-weight models: available for public download and use (e.g., on Hugging Face or in a public GitHub repository).	Model weights are unavailable for public download and use.
Retrieval Corpus	Fixed corpora that are available for public download and use.	Search implemented using a commercial and/or proprietary API, often giving access to the whole internet or an equivalent.
Generation Modules	Open-weight models: available for public download and use (e.g., on Hugging Face or in a public GitHub repository).	Model weights are unavailable for public download and use.
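
As an illustration of how the table above maps onto the tagging rule, the sketch below records each component of a system and derives a leaderboard tag from it. This is an assumption-laden example, not an official submission format: the Component fields, the leaderboard_tag function, the tag strings, and the component names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One declared part of a system (hypothetical schema, not an official format)."""
    role: str          # e.g. "embedding", "retrieval corpus", "generation"
    name: str          # placeholder identifier for the model, corpus, or API
    open_source: bool  # True if weights or corpus are publicly downloadable

def leaderboard_tag(components: list[Component]) -> str:
    """Return "closed-source" if any component is not open, else "open-source"."""
    if all(c.open_source for c in components):
        return "open-source"
    return "closed-source"

# Placeholder component names; none refers to a real model or service.
system = [
    Component("embedding", "example-open-embedder", open_source=True),
    Component("retrieval corpus", "example-fixed-corpus", open_source=True),
    Component("generation", "example-proprietary-llm-api", open_source=False),
]

if __name__ == "__main__":
    print(leaderboard_tag(system))
```

A single closed-source component is enough to change the tag, which mirrors the rule that systems including closed-source and/or proprietary components are tagged on the leaderboards.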

Please feel free to consult the organizing team if any clarifications are required.

System Restrictions

All outputs must be model-generated.
Human-verified outputs or any form of human intervention are not allowed, to ensure full replicability of the systems.

Registration and Resources

Registration

To participate in any track, you must register your team by filling out this short form: Register Your Team.

Once your team is registered, the organizers will contact you at your registered email address (preferably Gmail) and assign you the following items:

Team ID
ECR Repository ARN
AWS ECR access keys
S3 bucket name and region
Port number on which your API must run
ClueWeb22 API key request instructions (if you want API key access)

Competition Tracks

Choose your track to get started with detailed instructions, starter code, and submission guidelines:

Text-to-Text Track - Build RAG or Deep Research systems for text generation
Text-to-Video Track - Build RAG systems that retrieve relevant information and generate videos from text queries

Each track page contains:

Starter code and templates
API access details
Submission requirements
Technical specifications