MCP Server Setup and Usage
Overview
The Maven Mail MCP Server provides AI assistants with access to Apache Maven mailing list archives through the Model Context Protocol (MCP). This enables natural language queries about Maven development discussions, decisions, and technical issues.
Prerequisites
- Docker Desktop (or Podman) with Docker Compose V2
- Claude Code or Claude Desktop (MCP client)
- Sufficient disk space: about 1.5GB for the archives, plus room for Elasticsearch data
Quick Start
1. Start All Services
docker compose up -d
This starts:
- Elasticsearch, for storing and searching emails
- Mail MCP server (Streamable HTTP on port 58080)
- Scheduler, for automatic hourly updates of the dev@ and users@ lists
2. Initial Data Retrieval
Before using the MCP server, retrieve and index the mailing list archives:
# Retrieve dev@ archives (July 2002-present, ~750MB)
for year in $(seq 2002 $(date +%Y)); do
  for month in $(seq -w 1 12); do
    docker compose exec scheduler retrieve-mbox --date ${year}-${month} --output-dir /app/data/dev || true
  done
done

# Retrieve users@ archives (Nov 2002-present, ~800MB)
for year in $(seq 2002 $(date +%Y)); do
  for month in $(seq -w 1 12); do
    docker compose exec scheduler retrieve-mbox --date ${year}-${month} --list users@maven.apache.org --output-dir /app/data/users || true
  done
done

# Index all downloaded mbox files
docker compose exec scheduler index-mbox --directory /app/data/dev/ --list dev@maven.apache.org
docker compose exec scheduler index-mbox --directory /app/data/users/ --list users@maven.apache.org
Note: The initial retrieval (~1.5GB total) takes a while but only needs to be done once. After that, the scheduler container updates both lists' archives automatically every hour.
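The shell loops above request every month from January 2002 through December of the current year and rely on `|| true` to skip months that predate the archive or have not happened yet. As an alternative, here is a minimal Python sketch that computes only the months inside an archive's range (dev@ from 2002-07, users@ from 2002-11, per the comments above), which could then drive the same `retrieve-mbox` calls. The helper name `archive_months` is hypothetical and not part of the project.

```python
from datetime import date
from typing import List, Optional


def archive_months(start: str, today: Optional[date] = None) -> List[str]:
    """Return "YYYY-MM" strings from `start` (inclusive) through the month of `today`."""
    today = today or date.today()
    year, month = (int(part) for part in start.split("-"))
    months = []
    while (year, month) <= (today.year, today.month):
        months.append(f"{year:04d}-{month:02d}")
        # Step forward one month, rolling over to January after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


if __name__ == "__main__":
    # Print (not run) the dev@ retrieval commands for every month since July 2002.
    for ym in archive_months("2002-07"):
        print(f"docker compose exec scheduler retrieve-mbox --date {ym} --output-dir /app/data/dev")
```

Passing `"2002-11"` instead produces the month list for users@.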
3. Verify Services
# Check services are running
docker compose ps
# Check mail-mcp health
curl http://localhost:58080/health
# Check Elasticsearch has data
curl http://localhost:59200/maven-dev/_count
curl http://localhost:59200/maven-users/_count
See Data Management for additional data management options.
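The `_count` checks above can also be scripted, for example as a readiness check before connecting an MCP client. A minimal sketch using only the Python standard library, assuming Elasticsearch is reachable at http://localhost:59200 as in the curl commands. Elasticsearch's `_count` API returns a JSON object with a `count` field. The function name `index_count` is hypothetical, and the `opener` parameter exists only so the function can be exercised without a live cluster.

```python
import json
import urllib.request
from typing import Callable


def index_count(
    index: str,
    base_url: str = "http://localhost:59200",
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> int:
    """Return the number of documents in `index` using Elasticsearch's _count API."""
    with opener(f"{base_url}/{index}/_count", timeout=timeout) as response:
        # The response body looks like {"count": N, "_shards": {...}}.
        return int(json.load(response)["count"])
```

Against the running stack, `index_count("maven-users")` should report the same number as the second curl command; if Elasticsearch is not reachable, `urlopen` raises `urllib.error.URLError`.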
Configuration
Environment Variables
Configure the server using environment variables or a .env file:
# Elasticsearch connection
MAIL_MCP_ELASTICSEARCH_URL=http://localhost:59200
MAIL_MCP_ELASTICSEARCH_INDEX_PREFIX=maven
# Archive URL resolution (default: true)
# When enabled, automatically looks up Pony Mail permalink IDs
# for messages and caches them in Elasticsearch
MAIL_MCP_RESOLVE_ARCHIVE_URLS=true
# Logging
MAIL_MCP_LOG_LEVEL=INFO
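How the server itself parses these variables is not shown here. As a hypothetical sketch only, a loader that reads the MAIL_MCP_* variables from a mapping such as `os.environ`, falls back to the values listed above, and treats the archive-URL flag as a boolean could look like this. `Settings` and `load_settings` are illustrative names, not part of mail_mcp.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    elasticsearch_url: str
    index_prefix: str
    resolve_archive_urls: bool
    log_level: str


def _as_bool(value: str) -> bool:
    # Accept common spellings of "true"; anything else counts as false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader: read MAIL_MCP_* variables, falling back to the values shown above."""
    return Settings(
        elasticsearch_url=env.get("MAIL_MCP_ELASTICSEARCH_URL", "http://localhost:59200"),
        index_prefix=env.get("MAIL_MCP_ELASTICSEARCH_INDEX_PREFIX", "maven"),
        resolve_archive_urls=_as_bool(env.get("MAIL_MCP_RESOLVE_ARCHIVE_URLS", "true")),
        log_level=env.get("MAIL_MCP_LOG_LEVEL", "INFO").upper(),
    )
```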
Available Tools
The MCP server provides four tools for querying mailing list archives:
search_emails
Full-text search across email archives.
Parameters:
- query (required): Search query (searches subject and body)
- list_name (optional): Mailing list to search (default: dev@maven.apache.org)
- from_date (optional): Start date filter (ISO format: YYYY-MM-DD)
- to_date (optional): End date filter (ISO format: YYYY-MM-DD)
- has_jira (optional): Filter for emails with JIRA references
- has_vote (optional): Filter for emails with votes
- size (optional): Maximum results (default: 10, max: 100)
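MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. Claude Code or Claude Desktop normally builds this request for you. The sketch below only shows what a `search_emails` call with the parameters above could look like on the wire. The helper names are hypothetical, and the local check on `size` (1 to 100) is an assumption layered on the documented maximum. The server performs its own validation.

```python
import json
from typing import Any, Dict


def tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request as used by MCP clients."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_emails_request(request_id: int, query: str, **filters: Any) -> Dict[str, Any]:
    """Hypothetical helper: omitted filters are left out so the server defaults apply."""
    size = filters.get("size", 10)
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    return tool_call(request_id, "search_emails", {"query": query, **filters})


if __name__ == "__main__":
    request = search_emails_request(
        1, "dependency resolution", from_date="2024-01-01", to_date="2024-12-31", size=20
    )
    print(json.dumps(request, indent=2))
```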
Example queries:
- "Search for emails about Maven 4.0 release"
- "Find discussions about dependency resolution from 2024"
- "Show votes about the build cache feature"
get_message
Retrieve a specific email message by Message-ID.
Parameters:
- message_id (required): Message-ID to retrieve
- list_name (optional): Mailing list name (default: dev@maven.apache.org)
Example:
"Show me the email with ID <abc123@example.com>"
get_thread
Retrieve an entire email thread containing a specific message.
Parameters:
- message_id (required): Message-ID of any message in the thread
- list_name (optional): Mailing list name (default: dev@maven.apache.org)
- max_messages (optional): Maximum messages to retrieve (default: 50)
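Because get_thread accepts the Message-ID of any message in a thread, the messages must somehow be grouped by conversation. One common approach, not necessarily the one this server uses, is to follow the standard `References` and `In-Reply-To` headers back to the earliest known ancestor. A minimal sketch with Python's `email` package; `thread_root` and `group_threads` are hypothetical helpers.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, List


def thread_root(msg: Message) -> str:
    """Return the Message-ID that starts msg's thread, based on its reply headers."""
    # References lists ancestors oldest-first, so its first entry is the thread root.
    references = (msg.get("References") or "").split()
    if references:
        return references[0]
    # Without References, fall back to the direct parent, then to the message itself.
    in_reply_to = (msg.get("In-Reply-To") or "").strip()
    return in_reply_to or (msg.get("Message-ID") or "").strip()


def group_threads(messages: List[Message]) -> Dict[str, List[str]]:
    """Map each thread root to the Message-IDs that belong to it, in input order."""
    threads: Dict[str, List[str]] = {}
    for msg in messages:
        message_id = (msg.get("Message-ID") or "").strip()
        threads.setdefault(thread_root(msg), []).append(message_id)
    return threads
```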
Example queries:
- "Get the full thread for message <abc123@example.com>"
- "Show me the discussion thread about MNG-1234"
find_references
Find emails referencing a specific JIRA issue or GitHub PR.
Parameters:
- reference (required): Reference to search for (e.g., "MNG-1234" or "567")
- reference_type (optional): Type of reference ("jira" or "github_pr", default: "jira")
- list_name (optional): Mailing list name (default: dev@maven.apache.org)
- size (optional): Maximum results (default: 20, max: 100)
Example queries:
- "Find all discussions about MNG-1234"
- "Show emails mentioning GitHub PR #567"
Usage Examples
Once the server is configured in your MCP client (Claude Code or Claude Desktop), you can query the archives in natural language:
Search Examples
- "What were the main discussions about Maven 4.0 in 2024?"
- "Find recent decisions about the build cache"
- "Show me emails where people voted on the wrapper feature"
- "Search for discussions about dependency resolution performance"
Data and Metadata
The MCP server extracts the following metadata from each email:
- JIRA references: Automatic extraction of Maven JIRA issues (MNG-1234, etc.)
- GitHub references: PR numbers and commit SHAs
- Version numbers: Mentioned versions (4.0.0, 3.9.0-alpha-1, etc.)
- Decision indicators: Keywords like "decided", "consensus", "approved"
- Vote detection: Identifies [VOTE] threads and +1/-1 votes
- Quote filtering: Separates quoted content from original contributions
- Archive URLs: Direct links to emails at https://lists.apache.org (when cached)
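The exact extraction rules are not spelled out on this page. As a rough illustration under our own assumptions, the regular expressions below first drop quoted lines (those starting with ">"), then pick out JIRA keys, GitHub PR references written as "PR #567", dotted version numbers with an optional qualifier, decision keywords, and +1/0/-1 vote lines. All patterns and names are hypothetical. In particular, the JIRA pattern only matches keys beginning with M, such as MNG, so other project keys would need extra patterns.

```python
import re
from dataclasses import dataclass, field
from typing import List

JIRA_KEY = re.compile(r"\bM[A-Z]+-\d+\b")  # assumption: keys like MNG-1234
GITHUB_PR = re.compile(r"\bPR\s*#(\d+)\b", re.IGNORECASE)
VERSION = re.compile(r"\b\d+\.\d+\.\d+(?:-[A-Za-z]+(?:-\d+)?)?\b")  # 4.0.0, 3.9.0-alpha-1
VOTE = re.compile(r"^\s*([+-][01]|0)(?![\d.])", re.MULTILINE)  # +1, -1, +0, -0, 0
DECISION_WORDS = ("decided", "consensus", "approved")


@dataclass
class EmailMetadata:
    jira_keys: List[str] = field(default_factory=list)
    pull_requests: List[int] = field(default_factory=list)
    versions: List[str] = field(default_factory=list)
    votes: List[str] = field(default_factory=list)
    is_vote_thread: bool = False
    has_decision: bool = False


def strip_quotes(body: str) -> str:
    """Drop quoted lines so only the author's own contribution remains."""
    return "\n".join(line for line in body.splitlines() if not line.lstrip().startswith(">"))


def extract_metadata(subject: str, body: str) -> EmailMetadata:
    text = strip_quotes(body)
    lowered = text.lower()
    return EmailMetadata(
        jira_keys=sorted(set(JIRA_KEY.findall(text))),
        pull_requests=sorted({int(n) for n in GITHUB_PR.findall(text)}),
        versions=sorted(set(VERSION.findall(text))),
        votes=VOTE.findall(text),
        is_vote_thread=subject.lstrip().upper().startswith("[VOTE]"),
        has_decision=any(word in lowered for word in DECISION_WORDS),
    )
```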
Archive URLs
When available, search results include an Archive: field with a direct link to the email at https://lists.apache.org.
This allows viewing the original email in the Apache Pony Mail web interface.
Automatic Resolution (Default)
By default, archive URLs are resolved automatically when using the get_message tool.
When you retrieve a specific message:
- The system checks if an archive URL is already cached
- If not cached, it queries the Pony Mail API to look up the permalink ID
- The result is cached in Elasticsearch for future use
- The archive URL is displayed in the response
This behavior is controlled by the MAIL_MCP_RESOLVE_ARCHIVE_URLS environment variable (default: true).
Set to false to disable automatic resolution.
Note: Search results (search_emails, find_references, etc.) only display cached archive URLs. Use get_message on a specific email to trigger automatic resolution.
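The automatic resolution steps above amount to a read-through cache. A minimal sketch, assuming a plain dict in place of the Elasticsearch cache and a caller-supplied `lookup` function standing in for the Pony Mail API query. Neither `archive_url` nor `lookup` is the server's actual API. Passing `resolve=False` mirrors MAIL_MCP_RESOLVE_ARCHIVE_URLS=false, and it is also how the search tools behave: they show only what is already cached.

```python
from typing import Callable, MutableMapping, Optional


def archive_url(
    message_id: str,
    cache: MutableMapping[str, str],
    lookup: Callable[[str], Optional[str]],
    resolve: bool = True,
) -> Optional[str]:
    """Return the archive URL for message_id, calling `lookup` only on a cache miss."""
    # Step 1: a cached URL is returned without any network call.
    if message_id in cache:
        return cache[message_id]
    # With resolution disabled, uncached messages simply have no archive URL.
    if not resolve:
        return None
    # Step 2: ask the archive for the permalink.
    url = lookup(message_id)
    # Step 3: remember successful lookups for later tool calls.
    if url is not None:
        cache[message_id] = url
    return url
```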
Manual Resolution
If automatic resolution is disabled, you can still populate the cache programmatically:
import asyncio
from datetime import datetime

from mail_mcp.ponymail import PonymailResolver
from mail_mcp.storage.elasticsearch import ElasticsearchClient


async def cache_archive_url():
    es = ElasticsearchClient()
    await es.connect()
    try:
        resolver = PonymailResolver(es)
        url = await resolver.resolve_url(
            message_id='<message-id@example.org>',
            list_name='dev@maven.apache.org',
            date=datetime(2024, 10, 1),  # approximate date
            subject='Email subject for search',
        )
        print(f'Archive URL: {url}')
    finally:
        # Close the Elasticsearch connection even if the lookup fails
        await es.close()


asyncio.run(cache_archive_url())
Once cached, the Archive URL will appear in all MCP tool outputs for that message.
See ADR-0003 for technical details on the URL resolution strategy.
Troubleshooting
Server Won’t Connect
Check that Docker services are running:
# Check service status
docker compose ps
# Check mail-mcp logs
docker compose logs mail-mcp
# Check health endpoint
curl http://localhost:58080/health
Common issues:
- Docker services not running: Start them with docker compose up -d
- Port conflict: Ensure port 58080 is not in use by another application
- Elasticsearch not healthy: Wait for Elasticsearch to finish starting
No Results Found
- Check whether data is indexed: curl http://localhost:59200/maven-dev/_count
- Verify the index name: the default is maven-dev for dev@maven.apache.org
- Index data if missing: docker compose exec scheduler index-mbox --directory /app/data/dev/ --list dev@maven.apache.org
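The index names used on this page (maven-dev, maven-users) appear to follow a `<prefix>-<list local part>` pattern built from MAIL_MCP_ELASTICSEARCH_INDEX_PREFIX. Assuming that pattern holds, a small hypothetical helper shows which index to check for a given list. It is not part of mail_mcp.

```python
def index_name(list_name: str, prefix: str = "maven") -> str:
    """Hypothetical: derive the index for a list, e.g. dev@maven.apache.org -> maven-dev."""
    local_part, _, _domain = list_name.partition("@")
    if not local_part:
        raise ValueError(f"not a mailing list address: {list_name!r}")
    return f"{prefix}-{local_part}"
```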
Connection Errors
- Verify all services are running: docker compose ps
- Verify Elasticsearch is accessible: curl http://localhost:59200/_cluster/health
- Check mail-mcp logs for errors: docker compose logs --tail=50 mail-mcp
Session Errors After Server Restart
If you receive HTTP 404 errors with message "Invalid or expired session ID" after restarting the server (e.g., docker compose restart mail-mcp), this is expected behavior.
Cause: The Streamable HTTP transport uses in-memory session tracking.
When the server restarts, all session state is lost.
Clients that send a stale mcp-session-id header will receive a 404 error.
Solution: MCP clients should handle 404 responses by re-initializing their session:
- Detect the 404 "Invalid or expired session ID" response
- Discard the stale session ID
- Send a new initialize request without the mcp-session-id header
- Continue with the new session
Note: This is standard MCP protocol behavior. Well-implemented clients should recover from session loss automatically.
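The recovery steps above can be sketched as a thin wrapper around whatever transport a client uses. Here the transport is a hypothetical `send(payload, session_id)` callable returning `(status, headers, body)`. A real Streamable HTTP client would POST the JSON-RPC message and read the new session ID from the `Mcp-Session-Id` header of the response to `initialize`. For brevity, the sketch omits the `notifications/initialized` message that the MCP lifecycle expects after `initialize`. `SessionClient` is an illustrative name, not part of any MCP SDK.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Any]  # (HTTP status, headers, JSON body)


class SessionClient:
    """Hypothetical MCP client wrapper that re-initializes once after a 404 session error."""

    def __init__(self, send: Callable[[dict, Optional[str]], Response], init_params: dict):
        self._send = send
        self._init_params = init_params
        self.session_id: Optional[str] = None
        self._next_id = 0

    def _request(self, method: str, params: dict) -> dict:
        self._next_id += 1
        return {"jsonrpc": "2.0", "id": self._next_id, "method": method, "params": params}

    def initialize(self) -> None:
        # A fresh initialize request is sent without any session ID.
        status, headers, _ = self._send(self._request("initialize", self._init_params), None)
        if status != 200:
            raise RuntimeError(f"initialize failed with HTTP {status}")
        # HTTP header names are case-insensitive, so normalize before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        self.session_id = lowered.get("mcp-session-id")
        # A complete client would send notifications/initialized here.

    def call(self, method: str, params: dict) -> Any:
        if self.session_id is None:
            self.initialize()
        payload = self._request(method, params)
        status, _, body = self._send(payload, self.session_id)
        if status == 404:
            # The server lost our session (e.g. after a restart): drop the stale ID,
            # start a new session, and retry the original request once.
            self.session_id = None
            self.initialize()
            status, _, body = self._send(payload, self.session_id)
        if status != 200:
            raise RuntimeError(f"{method} failed with HTTP {status}")
        return body
```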
Performance Considerations
- Batch size: 100 documents per indexing batch by default
- Search limits: 10 results by default, at most 100 per query
- Thread limits: 50 messages by default, configurable via max_messages
- Cache: Elasticsearch handles caching automatically
For large-scale deployments, see Performance Optimization.
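One common way to index documents in batches is Elasticsearch's `_bulk` API. It accepts newline-delimited JSON: an action line followed by the document source, with a required trailing newline. This page does not say whether index-mbox works exactly this way, so treat the sketch below as an assumption. It uses the 100-document default from the list above, and the `message_id` field used as the document `_id` is hypothetical.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 100) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents (100 mirrors the documented default)."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def bulk_body(index: str, batch: List[dict]) -> str:
    """Render one batch as an NDJSON _bulk body: an action line, then the source line."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["message_id"]}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```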
Security Notes
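- The MCP server runs locally and connects to a local Elasticsearch instance
- Apart from the initial data retrieval, outbound network access is limited to the scheduler's hourly archive updates and, when MAIL_MCP_RESOLVE_ARCHIVE_URLS is true, Pony Mail API lookups for archive URLs
- Communication between the MCP client and the server stays on localhost
- No authentication is currently implemented; the server is intended for local-only access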