My-researcher Web Research Tool

A Python tool that acts as a "human in the loop" agent for AI Chats to gather relevant information from the internet for research purposes. This tool uses Google Search and Claude Haiku to intelligently search for, evaluate, and collect relevant sources based on your research questions.

Features

Intelligent search query generation using Claude Haiku
Automatic evaluation of source relevance
Iterative searching based on initial findings
Support for HTML and PDF content extraction
Comprehensive research summary generation
Export of findings in a format ready for Claude Web

Project Structure

web_research_tool/
├── __init__.py          # Package initialization
├── main.py              # Command-line interface
├── web_research_tool.py # Main orchestration class
├── config.py            # Configuration handling
├── models.py            # Data models (SearchQuery, Source)
├── search.py            # Google search functionality
├── content_extraction.py # Content extraction from URLs
├── source_evaluation.py # Source relevance evaluation
├── query_generation.py  # Initial and follow-up query generation
├── summarization.py     # Result summarization functionality
└── output.py            # Output formatting and file handling

Installation

Prerequisites

Python 3.7+
Google Custom Search API key and Search Engine ID
Anthropic API key (for Claude Haiku)

Setup

Clone this repository:

git clone https://github.com/galfthan/my-researcher
cd my-researcher

Install required dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:

# On Windows
set GOOGLE_API_KEY=your_google_api_key
set GOOGLE_CSE_ID=your_google_cse_id
set ANTHROPIC_API_KEY=your_anthropic_api_key

# On Linux/macOS
export GOOGLE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export ANTHROPIC_API_KEY=your_anthropic_api_key

Google Custom Search Setup

Create a Google Custom Search Engine at https://programmablesearchengine.google.com/
Enable "Search the entire web" option
Get your CSE ID from the setup page
Create a Google Cloud project and enable the Custom Search API
Create an API key for your project

Usage

Command Line Interface

Run the tool with a YAML file containing your research request:

 python -m web_research_tool.main --input research_request.yaml --output research_results

Or provide the YAML through standard input:

python -m web_research_tool.main

Then paste your YAML and press Ctrl+D (or Ctrl+Z on Windows) to finish input.

YAML Research Request Format

Your research request should be structured like this:

topic: "The main research topic or question"
objective: "The specific objective of this research"
depth: "shallow | moderate | deep"  # How deep should the research go
format: "academic | casual | technical"  # What style of sources to prioritize
time_sensitivity: "recent | any"  # Whether to prioritize recent sources
domains:  # Specific domains to focus on or exclude
  include:
    - "example.com"
    - "trusteddomain.org"
  exclude:
    - "untrustedsource.com"
context: "Additional context about the research question"
expected_output: "What specifically you want to know or learn"

Workflow

Start a conversation with an AI chat service and discuss your research topic
Ask it to help you craft a YAML research request based on your discussion
Run the research tool with the YAML request
Upload the research summary and source files to the conversation
Continue your discussion, now informed by the research results

Output

The tool generates:

A research summary in Markdown format
Text files containing the extracted content from each source
A directory structure that includes all findings

Limitations

The tool is limited by Google's Search API quota (100 queries per day for free tier)
PDF extraction may not work perfectly for all PDFs, especially scanned documents

Troubleshooting

API Key Issues: Ensure all environment variables are set correctly
PDF Extraction Errors: Try installing additional dependencies: pip install PyMuPDF
Rate Limiting: If you encounter rate limits, add a delay parameter: --delay 5

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
web_research_tool		web_research_tool
LICENSE		LICENSE
config-helper.py		config-helper.py
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

My-researcher Web Research Tool

Features

Project Structure

Installation

Prerequisites

Setup

Google Custom Search Setup

Usage

Command Line Interface

YAML Research Request Format

Workflow

Output

Limitations

Troubleshooting

Contributing

License

About

Releases

Packages

Languages

License

galfthan/my-researcher

Folders and files

Latest commit

History

Repository files navigation

My-researcher Web Research Tool

Features

Project Structure

Installation

Prerequisites

Setup

Google Custom Search Setup

Usage

Command Line Interface

YAML Research Request Format

Workflow

Output

Limitations

Troubleshooting

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages