A powerful web content archiving solution with advanced features for content extraction, organization, and RSS feed management.
Requires Python 3.9.
- Python 3.9
- NVIDIA GPU with CUDA (optional, for Text To Speech (TTS) acceleration)
conda create --name nameofenvironment python=3.9
After you have created the environment, install the required packages.
pip install -r requirements.txt
python webpage2markdown3.py
- Launch the program
- Enter a URL in the converter tab
- Click "Extract" or press Ctrl+E
- Preview the content
- Add tags (Ctrl+T)
- Save as markdown (Ctrl+S)
- Secure URL validation and content extraction
- Clean reader view conversion to markdown
- Real-time content preview
- Image downloading and management
- Browsing history with back/forward navigation
- Content caching for improved performance
- Support for favorite marking
- Built-in RSS feed reader with categorization
- Multiple default feed categories (Technology, Wikipedia, AI)
- Custom feed category management
- Feed validation and testing
- Feed content caching
- Easy clipping from feeds to markdown converter
- Tag-based organization system
- Hierarchical tag support
- YAML metadata format compatible with Obsidian
- Automatic metadata generation (title, source, date, tags)
- Custom save location management
- Built-in TTS support using VCTK model
- Multiple voice options
- GPU acceleration support
- Preview voice feature
- Read article functionality
- URL validation and attack pattern detection
- Secure HTTPS handling
- Input sanitization
- Domain validation
- Image content validation
- Switch to the News Reader tab
- Select a category and feed
- Click "Refresh" to load articles
- Use "Clip" to send articles to converter
- Manage feeds through the "Add Feed" button
- Organize categories in Settings
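The feed validation and URL validation features listed above can be illustrated with a minimal sketch. It is not the application's actual code: the function names `is_plausible_url` and `parse_rss_titles` are hypothetical, only RSS 2.0 is handled (not Atom), and only the Python standard library is used.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def is_plausible_url(url):
    """Accept only http(s) URLs that include a host name (illustrative check)."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


def parse_rss_titles(xml_text, limit=10):
    """Return up to `limit` item titles from an RSS 2.0 document.

    Returns None when the text is not well-formed XML or the root
    element is not <rss>, which serves as a basic feed validity test.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != "rss":
        return None
    titles = [
        item.findtext("title", default="").strip()
        for item in root.iter("item")
    ]
    return titles[:limit]
```

The `limit` parameter stands in for the feed display limit configured in Settings. For untrusted feeds, note that the Python documentation warns that `xml.etree.ElementTree` is not hardened against maliciously constructed XML, so a hardened parser may be preferable in practice.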
Access settings through:
- Menu bar → Settings → Preferences
- Keyboard shortcut: Ctrl+,
Configure:
- Save location
- TTS options
- Image handling
- RSS feed categories
- Feed display limits
- Ctrl+E: Extract content
- Ctrl+S: Save as markdown
- Ctrl+T: Manage tags
- Ctrl+,: Open settings
Files are saved to ~/Documents/saved_articles/ by default, with:
- Markdown files in root directory
- Images in images/ subdirectory
- Cache in ~/.webpage_converter/cache/
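As a rough illustration of the default layout described above, the directories could be derived and created with `pathlib`. The helper names `default_layout` and `ensure_layout` are hypothetical and are not taken from the application; the home directory is passed in explicitly so the sketch can be tried against a temporary folder.

```python
from pathlib import Path


def default_layout(home):
    """Map each storage role to its default directory under `home`."""
    save_dir = Path(home) / "Documents" / "saved_articles"
    return {
        "articles": save_dir,                 # markdown files (root directory)
        "images": save_dir / "images",        # downloaded images
        "cache": Path(home) / ".webpage_converter" / "cache",
    }


def ensure_layout(layout):
    """Create every directory in the layout if it does not exist yet."""
    for path in layout.values():
        path.mkdir(parents=True, exist_ok=True)
```

Calling `ensure_layout(default_layout(Path.home()))` would create the folders in the real home directory.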
Files are saved with YAML frontmatter:
---
date: YYYY-MM-DD
time: HH:MM
source: URL
favorite: true/false
tags:
- #tag1
- #tag2
---
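A minimal sketch of how a frontmatter block in the layout shown above might be generated. The function name `build_frontmatter` and its parameters are assumptions made for illustration, not the application's API.

```python
from datetime import datetime


def build_frontmatter(source, tags, favorite=False, now=None):
    """Return a frontmatter string in the layout shown above."""
    now = now or datetime.now()
    lines = [
        "---",
        f"date: {now:%Y-%m-%d}",
        f"time: {now:%H:%M}",
        f"source: {source}",
        f"favorite: {'true' if favorite else 'false'}",
        "tags:",
    ]
    # Normalize tags so each one is written with exactly one leading '#'.
    lines += [f"- #{tag.lstrip('#')}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

The sketch reproduces the tag lines literally as `- #tag`. A strict YAML parser treats an unquoted `#` that follows whitespace as the start of a comment, so tools that parse the block as plain YAML may need the tags quoted.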
Version: 3.0.0
Last Updated: December 2024
Author: Alex Towery
- RSS feed management
- Content caching system
- TTS integration
- History navigation
- Tag system improvements
- Obsidian-compatible metadata
- Enhanced MIME-type validation
- Nested tag hierarchies
- Advanced feed filtering
- Offline reading mode
- Custom CSS support
Copyright (c) 2024 Alex Towery. All rights reserved.
This software is provided "as is" without warranty. See LICENSE for full details.
- PyQt6: GPL v3
- requests: Apache 2.0
- readability-lxml: Apache 2.0
- html2text: GPL v3
- PyQt6-WebEngine: GPL v3
- Beautiful Soup 4: MIT
- Pillow: HPND
- TTS: MPL 2.0
For issues and feature requests, please contact the author or submit through the project's issue tracker.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
Please ensure your contributions follow the existing code style and include appropriate tests.
- 3.0.0 (December 2024)
  - Added RSS feed reader with category management
  - Implemented content caching system
  - Added TTS integration with GPU acceleration
  - Added browsing history with navigation
  - Enhanced tag system with Obsidian compatibility
  - Added favorite marking feature
  - Improved feed validation and management
- 2.0.0 (June 2024)
  - Added tag-based organization system
  - Implemented image handling and storage
  - Added YAML metadata support
  - Enhanced security features
  - Improved URL validation
- 1.0.0 (January 2024)
  - Initial release
  - Basic webpage to markdown conversion
  - Preview functionality
  - Simple save system
- Modern, clean interface design
- URL input with navigation controls
- Live markdown preview
- Tag management system
- Favorite marking option
- TTS controls
- Category and feed management
- Article previews with clipping
- Feed validation tools
- Custom category organization
- TTS configuration with voice selection
- Save location management
- Image handling options
- RSS feed category management
- Cache configuration
- Enhanced tag organization
- Multiple tag selection
- Tag validation rules
- Obsidian-compatible format
- Category creation and editing
- Feed validation tools
- Custom feed organization
- Easy feed addition
Note: Screenshots are updated with each major release to reflect current functionality.
Special thanks to:
- The PyQt team for the robust GUI framework
- The TTS project for text-to-speech capabilities
- The Readability project for content extraction
- The Beautiful Soup team for HTML parsing
- The broader open-source community for their invaluable contributions
- All users who have provided feedback and suggestions
Alex Towery
- GitHub: [profile-link]
- Email: [contact-email]
- Website: [website-url]