Text-to-Text Track

Welcome to the MMU-RAG competition! This page contains everything you need to start building and submitting your system for the text-to-text track. This track accepts both RAG and Deep Research systems.


Resources

Once a team is registered, the organizers will contact you at your registered email address (preferably Gmail) and assign the following items:

  1. Team ID
  2. ECR Repository ARN
  3. AWS ECR access keys
  4. Port Number where the API needs to run
  4. ClueWeb22 API key (if requested)
    • Participants can also request the ClueWeb22 API key later in the competition!

Validation Set

To support the development and debugging of your models, we are releasing a validation set for the Text-to-Text track.

The validation set consists of a small number of example queries and their gold text references. This set is meant to help teams test their pipelines and ensure compatibility with our evaluation format before final submission.

Note: Validation sets are not used in the final evaluation and are safe to use for model tuning and format debugging.

Each line in the validation set is a JSON object with the following fields:

  • query (string): The user’s information-seeking question.
  • reference (string): A gold reference response that accurately and completely answers the query. This can be used for tuning or evaluation with automatic metrics such as ROUGE or BERTScore.
  • iid (string): A unique instance identifier. This can be used to track and match outputs in your system.
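
For example, a line in the validation file might look like the following (the values are illustrative, not taken from the actual set):

{"query": "What causes the aurora borealis?", "reference": "The aurora borealis is produced when charged particles from the solar wind are guided by Earth's magnetic field into the upper atmosphere, where they excite atmospheric gases and cause them to emit light.", "iid": "val-0001"}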

Starter code

We provide a modular starter code template to help you build your RAG system efficiently. The codebase is structured with separate components for each stage of the pipeline, making it easy to experiment and iterate.

Starter Code GitHub Repository: https://github.com/AGI-LTI/MMU-RAG-Starter

The starter code provides a complete RAG pipeline framework with the following modular components (a usage sketch follows the list):

1. Pipeline Orchestrator (pipeline.py)

  • Main entry point that coordinates all RAG components
  • Handles configuration loading and pipeline execution flow
  • Manages the complete query-to-answer workflow

2. Data Processing Pipeline

  • Loader (loader.py): Load documents from various file formats using load_corpus(path)
  • Cleaner (cleaner.py): Preprocess and normalize text content using clean_text()
  • Tokenizer (tokenizer.py): Convert text to tokens using HuggingFace models
  • Chunker (chunker.py): Split documents into overlapping chunks using chunk_tokens()
  • Indexer (indexer.py): Build FAISS vector index for semantic search using build_index()

3. Query Processing Components

  • Retriever (retriever.py): Search the index and retrieve relevant chunks using retrieve()
  • Generator (generator.py): Generate answers using retrieved context and language models

4. Testing & Validation

  • Local Test Runner (local_test.py): Comprehensive test runner to validate both /run and /evaluate endpoints
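
To illustrate how these components fit together, here is a minimal sketch of an end-to-end run. Only the module and function names listed above come from the starter code; the exact signatures and return types are assumptions, so check the repository for the real interfaces.

# Minimal sketch of the starter pipeline flow. Function names follow the
# module descriptions above; signatures are assumptions -- consult the
# MMU-RAG-Starter repository for the actual interfaces.
from loader import load_corpus
from cleaner import clean_text
from chunker import chunk_tokens
from indexer import build_index
from retriever import retrieve

docs = load_corpus("data/corpus")            # load raw documents
texts = [clean_text(doc) for doc in docs]    # normalize text
# chunk_tokens() is shown taking cleaned text directly; in the repository
# it may instead operate on token IDs produced by tokenizer.py.
chunks = [chunk for text in texts for chunk in chunk_tokens(text)]
index = build_index(chunks)                  # FAISS vector index
hits = retrieve(index, "example question")   # semantic search over chunks
# generator.py then produces the final answer from the retrieved chunks.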

FineWeb Search API

GET https://clueweb22.us/fineweb/search

Description: This endpoint searches the FineWeb dataset. You may use it without an API key for temporary testing while awaiting ClueWeb22 API key approval.

Parameters:

  • query (string): The search query
  • k (integer): The number of documents to return

Response Format:

{
  "results": [Base64-encoded JSON documents]
}
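
A minimal Python sketch of calling this endpoint and decoding the results (assumes the requests package is installed; the schema of each decoded document is not specified here):

import base64
import json
import requests

resp = requests.get(
    "https://clueweb22.us/fineweb/search",
    params={"query": "quantum computing", "k": 5},
)
resp.raise_for_status()

for encoded in resp.json()["results"]:
    # Each result is a Base64-encoded JSON document.
    doc = json.loads(base64.b64decode(encoded))
    print(doc)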

ClueWeb-22 Search API Access

Base URL: https://clueweb22.us/search

Authentication

All requests must include an API key:

x-api-key: <YOUR_RETRIEVER_API_KEY>

Your API key will be sent to you after your ClueWeb application is approved.

HTTP Request

GET https://clueweb22.us/search

Query Parameters:

Name     Type      Required   Description
query    string    yes        The search query string
k        integer   yes        Number of documents to return
cw22_a   boolean   no         Use ClueWeb22-A instead of the default ClueWeb22-B
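
A minimal sketch of an authenticated request (assumes the requests package; the response schema is not documented above, so it is printed raw here):

import requests

resp = requests.get(
    "https://clueweb22.us/search",
    headers={"x-api-key": "<YOUR_RETRIEVER_API_KEY>"},
    # add "cw22_a": "true" to query ClueWeb22-A instead of the default ClueWeb22-B
    params={"query": "quantum computing", "k": 5},
)
resp.raise_for_status()
print(resp.text)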

Submission Requirements and Formats

We require your submission to fulfill the following requirements for us to perform static and dynamic evaluation:

  1. Implement a static /evaluate endpoint
  2. Implement the /run streaming API
  3. Submit a Dockerfile

Detailed instructions are provided below.

Static /evaluate endpoint

Your system must implement a static /evaluate endpoint that accepts validation queries and returns responses. This will be used to generate a .jsonl submission file.

Required Endpoint
POST /evaluate
Content-Type: application/json
Request Format
{
  "query": "string",
  "iid": "string"
}
Response Format
{
  "query_id": "string", // same as iid from the request
  "generated_response": "string" // your system's generated answer
}
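
A minimal FastAPI sketch of this endpoint (FastAPI matches the Dockerfile template later on this page; answer_query is a hypothetical stand-in for your own pipeline):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EvaluateRequest(BaseModel):
    query: str
    iid: str

def answer_query(query: str) -> str:
    # Placeholder: replace with your retrieval + generation pipeline.
    return f"Answer to: {query}"

@app.post("/evaluate")
def evaluate(req: EvaluateRequest):
    return {
        "query_id": req.iid,  # same as iid from the request
        "generated_response": answer_query(req.query),
    }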

Streaming API

Your system must implement a streaming API that follows our standardized response format, so that we can integrate your system into our RAG-Arena live evaluation system.

Required Endpoint

Your service must expose the following endpoint:

POST /run
Request Format

Content-Type: application/json

Request Body
{
  "question": "string"
}
  • question (required, string): The research question/query from the user
Example Request
{
  "question": "What are the latest developments in quantum computing?"
}
Response Format
  • Content-Type: text/event-stream (preferred) or text/plain

  • Response Structure: Server-Sent Events (SSE) format where each line starts with data: followed by JSON:

data: {"intermediate_steps": "...", "final_report": "...", "is_intermediate": true, "complete": false}
data: {"intermediate_steps": "...", "final_report": "...", "is_intermediate": false, "complete": true}
Required JSON Response Fields

Each JSON object in the stream must contain these fields:

Core Fields (Required)
Field               Type           Description
intermediate_steps  string | null  The thinking/reasoning process, search queries, retrieved information, etc. Use "|||---|||" as the separator between steps
final_report        string | null  The final answer content being generated
is_intermediate     boolean        true when showing the thinking process, false when generating the final answer
complete            boolean        true on the final message to signal completion
Optional Fields
Field      Type    Description
citations  array   List of citation URLs (see format below)
error      string  Error message if something goes wrong (stops the stream)
Citation Format (Optional)

Citations are optional. They are displayed in the frontend as numbered clickable links: [1], [2], [3], etc. The numbering is automatic, based on array order.

Format: Array of URL strings

{
  "citations": [
    "https://example.com/article1",
    "https://example.com/article2"
  ]
}

Note: Citations always appear as [1], [2], [3] regardless of URL content. Each number is a clickable link to the corresponding URL.

Streaming Response Pattern

Your service should follow this behavioral pattern (see the sketch after this list):

1. Thinking Phase
  • Start with is_intermediate: true
  • Populate intermediate_steps with research process
  • Set final_report: null
  • Set complete: false
2. Answer Generation Phase
  • Switch to is_intermediate: false
  • Start populating final_report with answer content
  • Keep accumulated intermediate_steps
  • Set complete: false
3. Completion
  • Send final message with complete: true
  • Include final complete answer in final_report
  • Include citations if available
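
A minimal sketch of this pattern using FastAPI's StreamingResponse (the intermediate steps, answer text, and citation URL are placeholders for your own pipeline's output):

import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class RunRequest(BaseModel):
    question: str

def sse(payload: dict) -> str:
    # One SSE event: a "data:" line carrying a JSON payload.
    return f"data: {json.dumps(payload)}\n\n"

@app.post("/run")
def run(req: RunRequest):
    def stream():
        steps = "Searching the index...|||---|||Reading retrieved documents..."
        # 1. Thinking phase
        yield sse({"intermediate_steps": steps, "final_report": None,
                   "is_intermediate": True, "complete": False})
        # 2. Answer generation phase
        yield sse({"intermediate_steps": steps, "final_report": "Partial answer...",
                   "is_intermediate": False, "complete": False})
        # 3. Completion, with optional citations
        yield sse({"intermediate_steps": steps, "final_report": "Complete answer.",
                   "is_intermediate": False, "complete": True,
                   "citations": ["https://example.com/article1"]})
    return StreamingResponse(stream(), media_type="text/event-stream")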
Error Handling

If your service encounters an error, send an error response and stop the stream:

{
  "error": "Description of what went wrong",
  "complete": true
}

Dockerizing Your System

Your service must be containerized for deployment. Create a Dockerfile in your service directory.

A basic Dockerfile template is listed below:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# FastAPI (ASGI)
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
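
To build and run the container locally (the image name is illustrative):

docker build -t my-rag-service .
docker run -p 8000:8000 my-rag-service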

Testing Your Implementation

You can test your service independently by sending POST requests to /run:

curl -X POST "http://your-service-url/run" \
  -H "Content-Type: application/json" \
  -d '{"question": "Test question"}'

Verify that:

  • Response streams in the correct SSE format
  • All required fields are present
  • is_intermediate transitions from true to false
  • Final message has complete: true
  • Intermediate steps use |||---||| separators
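
A small Python sketch that automates these checks against a locally running service (the URL is a placeholder):

import json
import requests

resp = requests.post(
    "http://localhost:8000/run",
    json={"question": "Test question"},
    stream=True,
)
resp.raise_for_status()

# Collect every SSE event from the stream.
events = []
for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data: "):
        events.append(json.loads(line[len("data: "):]))

required = {"intermediate_steps", "final_report", "is_intermediate", "complete"}
assert events, "no SSE events received"
assert all(required <= event.keys() for event in events), "missing required fields"
assert events[-1]["complete"] is True, "final message must set complete: true"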

Notes

  • The main application handles user session management, database logging, and frontend integration
  • Your service only needs to focus on generating high-quality research responses
  • The system supports both streaming and non-streaming implementations, but streaming is preferred for better user experience

Submission Options: