Skip to content

life-itself/markdown-to-wordpress

Repository files navigation

Markdown to WordPress Migration Tool

A clean, secure, and modular ETL pipeline for migrating markdown content to WordPress. Features comprehensive testing, security best practices, and step-by-step processing.

✨ Features

  • 🔐 Secure: Tokens in .env, never committed to git
  • 🧪 Well-Tested: Comprehensive unit tests for all components
  • 📦 Modular: ETL pipeline with isolated, testable steps
  • 🔄 Resumable: Each step can be run independently
  • 📊 Detailed Reports: Complete analysis and progress tracking

🚀 Quick Start

1. Setup Environment

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env

2. Configure WordPress.com

Get your OAuth token (see WORDPRESS_SETUP.md) and edit .env:

WORDPRESS_SITE_DOMAIN=yoursite.wordpress.com
WORDPRESS_OAUTH_TOKEN=your-token-here

3. Test Connection

python etl/test_connection.py

4. Run Migration

# Run all implemented steps
python etl/run_pipeline.py --input path/to/your/content

# Run specific step
python etl/run_pipeline.py --steps 0-image-processing

# Test a step
python etl/run_pipeline.py --test 0-image-processing

📁 ETL Pipeline Structure

Each step is isolated with clear inputs and outputs:

etl/
├── 0-image-processing/  ✅ COMPLETE
├── 1-prepare-content/   ✅ COMPLETE
├── 2-decide-mappings/   🚧 To Do
├── 3-upload-media/      ✅ COMPLETE
├── 4-rewrite-links/     🚧 To Do
├── 5-create-content/    ✅ COMPLETE
└── 6-verify/           🚧 To Do

✅ Step 0: Image Processing

Purpose: Analyze images and create SEO-friendly rename dictionary

Features:

  • Scans all markdown files for image references
  • Creates SEO-friendly filenames: post-slug-feature.jpg, post-slug-01.jpg
  • Detects orphaned images not used anywhere
  • Handles featured images vs inline images
  • Generates comprehensive usage reports

CLI Commands:

# Basic usage
cd etl/0-image-processing
python main.py --input-dir /path/to/content --output-dir output

# Full migration example
python main.py --input-dir ../../lifeitself.org/content --output-dir ./full-migration-output

# Run tests
python -m pytest main_test.py -v

Output:

  • image_rename_dict.json - Mapping of old to new filenames
  • image_usage_report.json - Detailed usage analysis
  • orphaned_images.json - List of unused images

✅ Step 1: Prepare Content

Purpose: Convert markdown files to WordPress-ready JSON format

Features:

  • Extracts YAML frontmatter and markdown content
  • Converts markdown to HTML
  • Processes wiki-style links [[]]
  • Updates image references using rename dictionary
  • Generates WordPress-compatible data structure

CLI Commands:

# Basic usage
cd etl/1-prepare-content
python main.py --input-dir /path/to/content --output-dir output --rename-dict ../0-image-processing/output/image_rename_dict.json

# Full migration example
python main.py --input-dir ../../lifeitself.org/content --output-dir ./full-migration-output --rename-dict ../0-image-processing/full-migration-output/image_rename_dict.json

# Run tests
python -m pytest main_test.py -v

Output:

  • prepared_content.json - All posts ready for WordPress

✅ Step 3: Upload Media

Purpose: Upload images to WordPress Media Library

Features:

  • Uses WordPress REST API with authentication
  • Uploads images with SEO-friendly names
  • Handles deduplication (checks if media already exists)
  • Implements rate limiting and error handling
  • Creates mapping of local paths to WordPress media IDs/URLs

CLI Commands:

# Basic usage
cd etl/3-upload-media
python main.py --input-dir /path/to/content --output-dir output --rename-dict ../0-image-processing/output/image_rename_dict.json

# Full migration example
python main.py --input-dir ../../lifeitself.org/content --output-dir ./full-migration-output --rename-dict ../0-image-processing/full-migration-output/image_rename_dict.json

# Run tests
python -m pytest main_test.py -v

Output:

  • media_upload_map.json - WordPress media IDs and URLs
  • upload_errors.json - Any failed uploads

✅ Step 5: Create Content

Purpose: Create WordPress posts/pages from prepared content

Features:

  • Creates posts with proper metadata
  • Sets featured images using media IDs
  • Assigns categories and tags
  • Handles idempotent updates (won't duplicate)
  • Supports both posts and pages

CLI Commands:

# Basic usage
cd etl/5-create-content
python main.py --input-dir /path/to/content --output-dir output --content-file ../1-prepare-content/output/prepared_content.json --media-map ../3-upload-media/output/media_upload_map.json

# Full batch migration (recommended for large sites)
python batch_create.py --content-file ../1-prepare-content/full-migration-output/prepared_content.json --media-map ../3-upload-media/full-migration-output/media_upload_map.json --output-dir ./full-migration-output

# Custom batch size
python batch_create.py --batch-size 20

# Run tests
python -m pytest main_test.py -v

Output:

  • content_creation_results.json - WordPress post IDs and URLs
  • creation_errors.json - Any failed posts

🧪 Testing

All components have comprehensive unit tests:

# Test specific step
python etl/run_pipeline.py --test 0-image-processing

# Test from step directory
cd etl/0-image-processing && python -m pytest tests/ -v

🔐 Security Features

  • No hardcoded credentials - everything in .env
  • Comprehensive .gitignore - prevents token leaks
  • Token validation - connection testing before migration
  • Safe defaults - all posts created as drafts

📊 Sample Results

With lifeitself.org content:

  • Images analyzed: Detects featured vs inline images
  • Rename mapping: hero-image.jpgmy-blog-post-feature.jpg
  • Orphaned detection: Finds unused image files
  • Processing time: < 1 second for hundreds of files
  • Test coverage: 100% (13/13 tests passing)

🔧 Development

Adding New Steps

  1. Create step directory: etl/N-step-name/
  2. Add main.py with processing logic
  3. Create tests/test_*.py with comprehensive tests
  4. Document inputs/outputs in README.md
  5. Update pipeline runner

Project Philosophy

  • Security First: Never commit sensitive data
  • Test Everything: No untested code
  • Clear Separation: Each step has single responsibility
  • Real Data: Test with actual content from lifeitself.org
  • Easy Setup: One-command development environment

📚 Documentation

Core Documentation

Migration Reports

Migration Steps

🎯 Next Steps

  1. Implement Step 1: Content preparation and frontmatter standardization
  2. Add Integration Tests: End-to-end pipeline testing
  3. WordPress Upload: Media and content creation steps
  4. Link Rewriting: Internal link resolution
  5. Verification: Migration success validation

Ready to migrate? Start with the WordPress Setup Guide 🚀

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •