An enhanced fork of Emergence AI's emergence_simple_fast repository, featuring adaptive retrieval with MMR + Top-P in place of traditional fixed Top-K approaches. This implementation provides more adaptive, chunk-size- and dataset-independent retrieval for long-term memory evaluation across multiple benchmark datasets.
- MMR + Top-P Adaptive Retrieval: Replaces fixed Top-K with Maximum Marginal Relevance (λ=0.65) + Top-P nucleus sampling (p=0.7), sketched below
- Token-aware Selection: Intelligent token budgeting with 15k→10k token limits across pipeline stages
- Multi-stage Filtering: Initial retrieval → MMR diversity → Top-P quality refinement
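For intuition, here is a minimal sketch of what the MMR and Top-P passes could look like. The function names and the assumption of unit-normalized embeddings are illustrative, not the repository's actual API:

```python
import numpy as np

def mmr_select(query_emb, cand_embs, k, lam=0.65):
    """Greedy Maximum Marginal Relevance: balance relevance to the query
    against redundancy with items already selected (lambda = 0.65)."""
    relevance = cand_embs @ query_emb  # cosine sims if embeddings are unit-normalized
    selected, remaining = [], list(range(len(cand_embs)))
    while remaining and len(selected) < k:
        if selected:
            redundancy = (cand_embs[remaining] @ cand_embs[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

def top_p_filter(scores, p=0.7):
    """Nucleus-style filtering: keep the smallest prefix of the ranked list
    whose softmax-normalized score mass reaches p."""
    scores = np.asarray(scores)
    order = np.argsort(scores)[::-1]
    probs = np.exp(scores[order] - scores[order].max())
    probs /= probs.sum()
    cutoff = int(np.searchsorted(np.cumsum(probs), p)) + 1
    return order[:cutoff]
```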
- LongMemEval (LME): Original factual memory evaluation dataset
- MSC: Multi-Session Chat conversation benchmark
- LoCoMo: Long Conversation Memory benchmarks
- Comprehensive retrieval statistics and efficiency analysis
- Multi-panel visualizations showing token usage, K-distribution, and accuracy patterns
- Question-type performance breakdown with statistical insights
Our adaptive MMR + Top-P approach achieves strong performance across all datasets:
- MSC: 95.4% accuracy (vs 94% baseline)
- LoCoMo: 60.4% accuracy (vs 51.2% baseline)
- LongMemEval: 77.4% accuracy (vs 76.8% baseline)
The pipeline progressively refines from 100 initial candidates → 80 diverse items (MMR) → 60 high-quality items (Top-P) → 50 final selections, optimizing both relevance and diversity while respecting token constraints.
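A minimal sketch of that budgeted refinement, assuming candidates arrive already reranked at each stage (the MMR and Top-P passes themselves are sketched above); `count_tokens` here is a crude stand-in for a real tokenizer:

```python
def count_tokens(text: str) -> int:
    # Whitespace proxy for a real tokenizer -- an assumption for illustration.
    return len(text.split())

def truncate_to_budget(items, budget):
    """Keep ranked items in order until the stage's token budget is exhausted."""
    kept, used = [], 0
    for item in items:
        cost = count_tokens(item["text"])
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept

def refine(candidates):
    """Stage sizes and budgets from the description above:
    100 initial -> 80 (MMR, 15k tokens) -> 60 (Top-P, 10k tokens) -> 50 final."""
    diverse = truncate_to_budget(candidates[:80], budget=15_000)  # after MMR rerank
    quality = truncate_to_budget(diverse[:60], budget=10_000)     # after Top-P filter
    return quality[:50]
```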
Comprehensive analysis showing:
- Token efficiency: 50-80% reduction from initial to final retrieval
- K-distribution: Adaptive selection vs fixed Top-K
- Question-type performance: Accuracy patterns across different memory tasks
- Efficiency vs Accuracy: Correlation between retrieval optimization and performance
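A multi-panel dashboard along these lines can be sketched with matplotlib (illustrative only; the inputs are assumed to be precomputed per-question statistics, while the repository generates its own dashboard as retrieval_analysis_*.png):

```python
import matplotlib.pyplot as plt

def plot_dashboard(token_usage, k_values, accuracy_by_type, out_path):
    """2x2 analysis dashboard from precomputed per-question stats."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    axes[0, 0].plot(token_usage)
    axes[0, 0].set_title("Token usage per question")
    axes[0, 1].hist(k_values, bins=20)
    axes[0, 1].set_title("Adaptive K distribution")
    axes[1, 0].bar(list(accuracy_by_type), list(accuracy_by_type.values()))
    axes[1, 0].set_title("Accuracy by question type")
    axes[1, 1].scatter(k_values, token_usage)
    axes[1, 1].set_title("Retrieval size vs. token usage")
    fig.tight_layout()
    fig.savefig(out_path)
```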
- Original approach (main_original.py): ~$1.40 USD for the full LongMemEval dataset
- Enhanced approach (main.py): ~$1.99 USD for the full LongMemEval dataset
The 42% cost increase reflects the more sophisticated multi-stage retrieval process and comprehensive analysis features. Consider using --num_samples for testing or budget-conscious runs.
```bash
git clone https://github.com/[your-username]/emergence_simple_fast.git
cd emergence_simple_fast
pip install -r requirements.txt
export OPENAI_API_KEY=your-api-key-here
```

Enhanced Adaptive Retrieval (Recommended):

```bash
# LongMemEval with adaptive MMR + Top-P and stratified sampling
python main.py lme --adaptive-k --num_samples 50 --stratified

# MSC with adaptive retrieval
python main.py msc --adaptive-k --num_samples 20

# LoCoMo with standard fixed Top-K and stratified sampling
python main.py locomo --num_samples 15 --stratified
```

Original Method (for comparison):

```bash
python main_original.py
```

```bash
# Fixed Top-K (original approach)
python main.py lme --num_samples 100

# Adaptive MMR + Top-P (enhanced approach)
python main.py lme --adaptive-k --num_samples 100

# LME with question type filtering
python main.py lme --question-types single-session-user multi-session --adaptive-k

# Full dataset processing (cost: ~$2 USD)
python main.py lme --load-all --adaptive-k

# Resume interrupted runs
python main.py lme --resume results/results_lme_20240801_142037.json

# Detailed logging and analysis
python main.py lme --adaptive-k --log_level DEBUG --num_samples 20

# Custom output directory
python main.py lme --adaptive-k --output_dir my_results/
```

- Embedding Generation: all-MiniLM-L6-v2 sentence transformer
- Adaptive Retrieval: MMR-based diversity + Top-P quality filtering
- Fact Extraction: GPT-4o-mini structured fact extraction
- Multiple Choice QA: Fact + context integration for answer selection
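The fact-extraction call to gpt-4o-mini might look roughly like this (a sketch; the prompt wording and output handling are assumptions, not the repository's actual prompt):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_facts(context: str) -> str:
    """Ask gpt-4o-mini to pull discrete facts out of retrieved context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract the discrete facts from the conversation "
                        "below as a numbered list. Be concise and literal."},
            {"role": "user", "content": context},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```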
```
Initial Semantic Search (100 candidates)
        ↓
MMR Diversity Filtering (λ=0.65, 15k tokens)
        ↓
Top-P Quality Refinement (p=0.7, 10k tokens)
        ↓
Final Context Selection (~50 items)
```
- LME: 42 default Top-K, supports question type filtering
- MSC: 15 default Top-K, conversation comprehension tasks
- LoCoMo: 50 default Top-K, long conversation memory
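These per-dataset defaults could be captured in a simple config map (a sketch; the key names are assumptions and the actual repository may organize them differently):

```python
# Per-dataset retrieval defaults taken from the list above
DATASET_DEFAULTS = {
    "lme":    {"default_top_k": 42, "question_type_filter": True},
    "msc":    {"default_top_k": 15, "question_type_filter": False},
    "locomo": {"default_top_k": 50, "question_type_filter": False},
}
```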
- Token usage efficiency across pipeline stages
- K-value distribution patterns
- Question-type performance breakdown
- Accuracy vs retrieval efficiency correlation
- Per-question retrieval statistics
- Pipeline constraint analysis (token limits vs thresholds)
- Timing and cost tracking
- Resumable processing with intermediate saves
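The resume behavior can be approximated with per-question intermediate saves (a sketch, assuming a question "id" key and a results dict keyed by id; the repository's actual file schema may differ):

```python
import json
import os

def run_with_resume(questions, answer_fn, results_path):
    """Process questions one at a time, saving after each so an
    interrupted run can be resumed later."""
    results = {}
    if os.path.exists(results_path):
        with open(results_path) as f:
            results = json.load(f)  # skip questions already answered
    for q in questions:
        if q["id"] in results:
            continue
        results[q["id"]] = answer_fn(q)
        with open(results_path, "w") as f:
            json.dump(results, f, indent=2)  # intermediate save
    return results
```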
Core Dependencies:
- sentence-transformers: Embedding generation
- openai: GPT API integration
- datasets: HuggingFace dataset loading
- pandas, seaborn, matplotlib: Analysis and visualization
- torch: PyTorch backend for embeddings
Full requirements: See requirements.txt for complete dependency list.
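For reference, embedding generation with all-MiniLM-L6-v2 follows the standard sentence-transformers pattern:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
# normalize_embeddings=True lets dot products double as cosine similarities
embeddings = model.encode(["chunk one", "chunk two"], normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```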
Results are automatically organized in the results/ directory:
```
results/
├── results_lme_20240801_142037.json            # Question results & accuracy
├── timing_lme_20240801_142037.json             # Performance timing data
├── k_distribution_lme_20240801_142037.json     # Retrieval statistics
├── retrieval_analysis_lme_20240801_142037.png  # Visualization dashboard
└── run_lme_20240801_142037.log                 # Detailed execution logs
```
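Finished runs can be inspected by loading the JSON artifacts directly (the exact schema inside each file is produced by main.py and not documented here):

```python
import json

with open("results/results_lme_20240801_142037.json") as f:
    results = json.load(f)
with open("results/k_distribution_lme_20240801_142037.json") as f:
    k_stats = json.load(f)

print(type(results), type(k_stats))  # inspect top-level structure first
```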
- Diversity: MMR prevents redundant context selection
- Quality: Top-P ensures high-relevance final candidates
- Adaptivity: Token-aware selection adapts to content density
- Robustness: Less sensitive to dataset-specific chunk sizes
- Multiple Choice: Direct answer selection (no AI judge needed)
- Cross-dataset: Unified interface across LME, MSC, and LoCoMo
- Comprehensive Metrics: Accuracy, efficiency, and token utilization analysis
This work builds upon the original emergence_simple_fast repository by Emergence AI.
This project maintains the same license as the original emergence_simple_fast repository.