Roadmap and Implementation Status
Current Status
🚧 Early Development Phase
The project currently provides mbox retrieval, metadata extraction, Elasticsearch indexing, and a working MCP server, and is building toward the fuller feature set described under Planned Features.
Completed Features
Infrastructure
✅ Python project structure with Poetry
✅ Docker Compose configuration with Elasticsearch
✅ Configuration management with environment variables
✅ Structured logging with structlog
Data Retrieval
✅ CLI tool for downloading mbox files (retrieve-mbox)
✅ Support for any Apache mailing list
✅ Proper error handling and atomic file writes
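Atomic file writes are usually done by writing to a temporary file in the destination directory and renaming it into place once the data is flushed. An interrupted download then never leaves a truncated mbox behind. The sketch below shows that pattern. `atomic_write_bytes` is a hypothetical helper and is not necessarily how `retrieve-mbox` implements it.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, target)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The temporary file must live in the same directory because `os.replace` cannot atomically move a file across filesystems.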
Recently Completed
Metadata Extraction ✅
✅ JIRA reference extraction (MNG-1234, MRESOLVER-567, etc.)
✅ GitHub references (PR numbers, commit SHAs)
✅ Version number detection
✅ Decision indicators (votes, keywords such as "decided" and "consensus")
✅ Quote detection and filtering
✅ Effective body extraction (quotes removed)
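Most of these extractors come down to regular expressions plus some false-positive filtering. The sketch below shows one possible shape for them. The patterns, the `KNOWN_JIRA_PROJECTS` allowlist, and the `Metadata` / `extract_metadata` names are illustrative assumptions and not the project's actual code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical allowlist: a bare [A-Z]+-\d+ pattern would also match "UTF-8".
KNOWN_JIRA_PROJECTS = {"MNG", "MRESOLVER", "MDEP"}

JIRA_RE = re.compile(r"\b([A-Z][A-Z0-9]+)-(\d+)\b")
PR_URL_RE = re.compile(r"github\.com/([\w.-]+/[\w.-]+)/pull/(\d+)")
# 7-40 lowercase hex chars with at least one digit and one letter (skips plain words/numbers).
SHA_RE = re.compile(r"\b(?=[0-9a-f]*[0-9])(?=[0-9a-f]*[a-f])[0-9a-f]{7,40}\b")
# e.g. 3.9.6, 4.0.0-alpha-8, 3.9.7-SNAPSHOT
VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)?(?:-[A-Za-z0-9]+)*\b")
# Apache-style votes: +1, -1, +0, -0 not glued to a preceding word character.
VOTE_RE = re.compile(r"(?<!\w)[+-][01](?!\d)")
DECISION_WORDS_RE = re.compile(r"\b(decided|consensus)\b", re.IGNORECASE)


@dataclass
class Metadata:
    jira_refs: list[str] = field(default_factory=list)
    github_prs: list[str] = field(default_factory=list)
    commit_shas: list[str] = field(default_factory=list)
    versions: list[str] = field(default_factory=list)
    votes: list[str] = field(default_factory=list)
    has_decision_keyword: bool = False


def _unique(items):
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(items))


def extract_metadata(body: str) -> Metadata:
    jira = [f"{p}-{n}" for p, n in JIRA_RE.findall(body) if p in KNOWN_JIRA_PROJECTS]
    prs = [f"{repo}#{num}" for repo, num in PR_URL_RE.findall(body)]
    return Metadata(
        jira_refs=_unique(jira),
        github_prs=_unique(prs),
        commit_shas=_unique(SHA_RE.findall(body)),
        versions=_unique(VERSION_RE.findall(body)),
        votes=VOTE_RE.findall(body),  # keep duplicates so votes can be tallied
        has_decision_keyword=bool(DECISION_WORDS_RE.search(body)),
    )
```

These heuristics trade recall for precision. For example, an IP address would still be partially read as a version number, so a real extractor needs more context rules.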
Indexing Pipeline ✅
✅ Bulk indexing with progress tracking
✅ CLI tool for indexing single files or directories
✅ Configurable batch sizes
✅ Error recovery and statistics
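Configurable batches and per-batch error recovery fit together roughly as sketched below. `send_batch` is a hypothetical callback that indexes one batch and returns its success count; in practice it might wrap `elasticsearch.helpers.bulk`. `IndexStats`, `index_messages`, and the `mail` index name are assumptions as well.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator


@dataclass
class IndexStats:
    indexed: int = 0
    failed: int = 0
    batches: int = 0


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` items (itertools.batched needs Python 3.12+)."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def index_messages(
    messages: Iterable[dict],
    send_batch: Callable[[list[dict]], int],
    index: str = "mail",
    batch_size: int = 500,
) -> IndexStats:
    """Convert parsed messages to bulk actions and ship them batch by batch.

    A batch that raises is counted as failed and skipped, so one bad batch
    does not abort the whole run.
    """
    stats = IndexStats()
    # Using the Message-ID as _id makes re-indexing the same mbox idempotent.
    actions = (
        {"_index": index, "_id": msg["message_id"], "_source": msg} for msg in messages
    )
    for batch in batched(actions, batch_size):
        stats.batches += 1
        try:
            ok = send_batch(batch)
        except Exception:
            stats.failed += len(batch)
            continue
        stats.indexed += ok
        stats.failed += len(batch) - ok
    return stats
```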
MCP Server Implementation ✅
✅ FastMCP-based server with stdio transport
✅ Four MCP tools implemented:
* search_emails - Full-text search with filters
* get_message - Retrieve message by ID
* get_thread - Reconstruct email threads
* find_references - Find JIRA/GitHub references
✅ Claude Desktop integration ready
✅ Comprehensive documentation
Planned Features
Phase 1: Enhanced Threading
- Improved thread reconstruction
  * Build hierarchical thread trees from References/In-Reply-To headers
  * Handle missing messages gracefully
  * Fall back to subject-based threading
  * Thread visualization and navigation
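A simplified take on the classic JWZ threading approach could implement the first three items. It tries In-Reply-To first, then walks References from the nearest to the oldest ancestor, skipping any ancestor missing from the archive. Only when no header resolves does it fall back to a normalized subject. This is a sketch under assumptions, not the planned implementation. The dict keys, `normalize_subject`, and `build_threads` are hypothetical.

```python
import re
from collections import defaultdict

# Strips any run of leading "Re:", "Fwd:"/"Fw:", "AW:" prefixes and [tag] markers.
_PREFIX_RE = re.compile(r"^(?:\s*(?:re|fwd?|aw)\s*:\s*|\s*\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _PREFIX_RE.sub("", subject or "").strip().lower()


def build_threads(messages: list[dict]) -> dict[str, list[str]]:
    """Return a parent -> children map; root messages are listed under "".

    Each message is a dict with hypothetical keys message_id, in_reply_to,
    references (ids, oldest first) and subject, ordered by date so that the
    subject fallback attaches to the earliest message with that subject.
    """
    known = {m["message_id"] for m in messages}
    first_by_subject: dict[str, str] = {}
    children: dict[str, list[str]] = defaultdict(list)

    for m in messages:
        mid = m["message_id"]
        # Try In-Reply-To, then References from nearest to oldest ancestor.
        candidates = list(reversed(m.get("references") or []))
        if m.get("in_reply_to"):
            candidates.insert(0, m["in_reply_to"])
        # Skip ids that are not in the archive (missing messages).
        parent = next((c for c in candidates if c in known and c != mid), None)

        subj = normalize_subject(m.get("subject", ""))
        if parent is None and subj in first_by_subject:
            # Fallback: no usable headers, so thread by normalized subject.
            parent = first_by_subject[subj]
        first_by_subject.setdefault(subj, mid)

        children[parent or ""].append(mid)
    return dict(children)
```

Subject fallback is a trade-off. Two unrelated threads that share a subject, such as repeated release votes, would be merged, which is exactly the open "Thread fallback" question.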
Phase 2: Additional MCP Tools
- Enhanced search capabilities
  * get_decisions - Dedicated tool for finding decision-related discussions
  * search_by_contributor - Find emails by specific contributors
  * get_statistics - Get mailing list statistics and trends
- Resource definitions
  * Message resources for direct access
  * Thread resources for conversation context
  * List resources for mailing list metadata
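A `get_statistics` tool would most likely rely on Elasticsearch aggregations. This also covers the temporal queries and aggregations planned for Phase 3. The request body below is a hedged sketch. `statistics_query` and the field names `mailing_list`, `date`, and `from_address` are assumptions about the index, not its real schema.

```python
from typing import Optional


def statistics_query(
    list_name: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    top_n: int = 10,
) -> dict:
    """Build an Elasticsearch search body with per-month counts and top senders."""
    filters: list[dict] = [{"term": {"mailing_list": list_name}}]
    if start or end:
        date_range = {}
        if start:
            date_range["gte"] = start
        if end:
            date_range["lte"] = end
        filters.append({"range": {"date": date_range}})
    return {
        "size": 0,  # only aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "messages_per_month": {
                "date_histogram": {"field": "date", "calendar_interval": "month"}
            },
            "top_contributors": {"terms": {"field": "from_address", "size": top_n}},
        },
    }
```

The aggregated fields would need `keyword` or `date` mappings, since `terms` aggregations do not work on analyzed `text` fields by default.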
Phase 3: Enhanced Search
- Advanced features
  * Temporal queries (date ranges, trends over time)
  * Aggregations (top contributors, active periods)
  * Similarity search (find related discussions)
  * Cross-list search (search multiple mailing lists)
- Quote handling
  * Separate storage of quoted vs. new content
  * Quote percentage calculation
  * Focus search on original content only
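Separate storage of quoted and new content, plus a quote percentage, can both come from one line-based split. This sketch assumes plain-text bodies with `>`-prefixed quotes and English "On ... wrote:" attribution lines. Top-posted, HTML, or non-English mail would need more rules. `split_quotes` and `quote_percentage` are hypothetical names.

```python
import re

# Lines starting with ">" (optionally indented) are treated as quoted.
_QUOTE_LINE_RE = re.compile(r"^\s*>")
# Typical attribution line introducing a quote, e.g. "On Mon, Jan 1, 2024, Jane wrote:".
_ATTRIBUTION_RE = re.compile(r"^On .+ wrote:\s*$")


def split_quotes(body: str) -> tuple[str, str]:
    """Return (new_content, quoted_content) for a plain-text email body."""
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        if _QUOTE_LINE_RE.match(line):
            quoted_lines.append(line)
        elif _ATTRIBUTION_RE.match(line.strip()):
            continue  # attribution lines belong to neither part
        else:
            new_lines.append(line)
    return "\n".join(new_lines).strip(), "\n".join(quoted_lines).strip()


def quote_percentage(body: str) -> float:
    """Share of non-blank, non-attribution lines that are quoted (0.0 to 100.0)."""
    lines = [
        line for line in body.splitlines()
        if line.strip() and not _ATTRIBUTION_RE.match(line.strip())
    ]
    if not lines:
        return 0.0
    quoted = sum(1 for line in lines if _QUOTE_LINE_RE.match(line))
    return 100.0 * quoted / len(lines)
```

Indexing the new content in its own field would let searches target original text only.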
Phase 4: Additional Features
- Performance optimization
  * Caching layer for frequently accessed data
  * Pagination for large result sets
  * Query optimization
- Data enrichment
  * Link emails to JIRA issues (external API)
  * Link emails to GitHub PRs (external API)
  * Contributor identity resolution
- Monitoring
  * Metrics export (Prometheus)
  * Health check endpoints
  * Performance dashboards
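One plausible design for the caching layer is a time-bounded LRU in front of expensive lookups such as thread reconstruction. `TTLCache` below is a hypothetical single-threaded sketch. A real server might instead use an existing caching library or lean on Elasticsearch's shard request cache.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class TTLCache:
    """LRU cache whose entries also expire `ttl` seconds after being stored."""

    def __init__(self, maxsize: int = 256, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.maxsize, self.ttl, self.clock = maxsize, ttl, clock
        self._data: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self._data.move_to_end(key)  # mark as most recently used
            return entry[1]
        # Missing or expired: recompute and store with a fresh expiry time.
        value = compute()
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value
```

Injecting `clock` keeps expiry testable without sleeping.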
Implementation Phases (from ADR-0002)
Phase 1: Core Infrastructure ✅
✅ Set up Python project structure with Poetry
✅ Implement Elasticsearch client wrapper
✅ Define index schema and mappings
✅ Basic mbox/email parsing
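In an index mapping, `keyword` fields serve exact filters and aggregations, while `text` fields are analyzed for full-text search. The mapping below is an illustrative sketch. Every field name is an assumption, not the project's actual schema. The single-shard, zero-replica settings suit only a single-node development cluster.

```python
# Hypothetical index body; field names are illustrative only.
EMAIL_INDEX = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "message_id": {"type": "keyword"},
            "in_reply_to": {"type": "keyword"},
            "references": {"type": "keyword"},  # multi-valued; no special mapping needed
            "mailing_list": {"type": "keyword"},
            "from_address": {"type": "keyword"},
            # Analyzed for search, plus an exact-match sub-field for sorting/grouping.
            "subject": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "date": {"type": "date"},
            "body": {"type": "text"},
            "body_effective": {"type": "text"},  # body with quoted lines removed
            "jira_refs": {"type": "keyword"},
            "versions": {"type": "keyword"},
            "has_decision_keyword": {"type": "boolean"},
        }
    },
}
```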
Phase 2: Data Ingestion ✅
✅ Implement full mbox parser
✅ Metadata extraction (decision indicators, references)
✅ Quote detection and filtering
⏳ Advanced thread reconstruction logic
✅ Bulk indexing to Elasticsearch
Phase 3: MCP Server ✅
✅ Implement MCP server with the official SDK (FastMCP)
✅ Define and implement MCP tools
✅ Tool parameter validation
✅ Error handling and logging
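FastMCP derives a tool's input schema from the function's type hints, so basic type checking comes for free. Domain rules still need explicit code. The sketch below shows such checks for a search tool. `SearchParams`, `validate_search_params`, and the 1 to 100 limit are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SearchParams:
    """Hypothetical validated arguments for a search_emails-style tool."""
    query: str
    limit: int = 20
    date_from: Optional[date] = None
    date_to: Optional[date] = None


def validate_search_params(query: str, limit: int = 20,
                           date_from: Optional[str] = None,
                           date_to: Optional[str] = None) -> SearchParams:
    if not query or not query.strip():
        raise ValueError("query must not be empty")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    # date.fromisoformat raises ValueError on malformed input such as "2024-13-01".
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("date_from must not be after date_to")
    return SearchParams(query.strip(), limit, start, end)
```

Raising `ValueError` with a readable message lets the server's error handling report a clear reason back to the client.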
Open Questions
From ADRs and development process:
- Vector search: When should semantic search capabilities be implemented?
- Multi-list strategy: How can cross-list queries be handled efficiently?
- Thread fallback: Should threading fall back to subjects when headers are missing?
- Update strategy: Pull-based (periodic) or push-based (webhooks)?
- Citation handling: How aggressively should quoted content be filtered?
- Cross-posting: Should messages sent to multiple lists be deduplicated?
See Architecture Decision Records for detailed discussions.
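One possible answer to the cross-posting question keeps a single document per Message-ID and records every list the message appeared on. This assumes Message-ID is a reliable identity across lists. The sketch below is only meant to frame the trade-off, not a decided approach. `merge_cross_posts` and its dict keys are hypothetical.

```python
from typing import Iterable


def merge_cross_posts(messages: Iterable[dict]) -> list[dict]:
    """Collapse copies of one message into a single dict with a mailing_lists field.

    Each input dict has hypothetical keys message_id and mailing_list.
    Messages without a Message-ID cannot be matched and are kept as-is.
    """
    merged: dict[str, dict] = {}
    without_id: list[dict] = []
    for msg in messages:
        # Message-IDs are compared verbatim (apart from whitespace), not lowercased.
        mid = (msg.get("message_id") or "").strip()
        if not mid:
            without_id.append(msg)
            continue
        if mid not in merged:
            base = {k: v for k, v in msg.items() if k != "mailing_list"}
            merged[mid] = {**base, "mailing_lists": []}
        lists = merged[mid]["mailing_lists"]
        if msg["mailing_list"] not in lists:
            lists.append(msg["mailing_list"])
    return list(merged.values()) + without_id
```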
Contributing
Want to help implement features? See Development Guide for setup and contribution guidelines.
Priority areas for contribution:
- Metadata extraction implementations
- Test coverage
- Documentation improvements
- Performance optimization
- Additional MCP tools