store crawled research results in a folder, log research topic/follow-up questions+learnings in output file #105

L3Gaunt · 2025-02-20T04:13:23Z

This PR stores each accessed web page in a downloaded-urls/ subdirectory, so a user can validate the report against its sources and build up a knowledge base to consult for more detail. Filenames are derived from URLs sanitized with sanitize-url, plus a timestamp recording when the page was accessed. Each file contains the title, description, URL, accessed-at timestamp, and the markdown content returned by firecrawl.
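To make the file layout concrete, here is a rough sketch of how one stored page could be assembled. It is not the code from this PR: the `CrawledPage` interface, the `renderPageFile` function, and the header labels are hypothetical names chosen to mirror the fields listed above, and the real format may differ.

```typescript
// Hypothetical shape of one crawled page. The field names are assumptions
// that mirror the fields listed in the PR description.
interface CrawledPage {
  url: string;
  title: string;
  description: string;
  markdown: string;
}

// Builds the text written to disk for one page: a small metadata header,
// a blank line, then the markdown body.
function renderPageFile(page: CrawledPage, accessedAt: Date): string {
  return [
    `Title: ${page.title}`,
    `Description: ${page.description}`,
    `URL: ${page.url}`,
    `Accessed at: ${accessedAt.toISOString()}`,
    '',
    page.markdown,
  ].join('\n');
}
```

Keeping the metadata at the top of the file means the source URL and access time stay visible when someone opens the file to check a claim in the report.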

In the near future, I want to implement storing the log of queries, research and learnings as well, so that a user can judge the quality of the research process for themselves and give feedback.


L3Gaunt · 2025-02-21T23:03:43Z

I changed things as follows:

- The accessed-at date is no longer put into the filenames of downloaded URLs. I think overwriting a web page with its newer version is usually the desired behavior; someone who really wants version tracking should add git to their knowledge base. In edge cases, though, the mapping of URLs to filenames is no longer 1-to-1.
- The final report now includes the download locations of the files we get.
- The output.md file now contains a timestamp, the initial and follow-up questions, and the final learnings. I want to add intermediate learnings too. I think having the option to supervise and judge the quality of what the tool did during the process is important for quality control, and someone who doesn't want to see it can always just scroll past it.
- Folder and filename are now joined with path.join (so it should work on Windows now...?)
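A minimal sketch of the filename and path handling described in the list above, assuming a hand-rolled sanitizer in place of the one the PR actually uses. `DOWNLOAD_DIR`, `urlToFilename`, and `urlToFilePath` are hypothetical names, and the replacement rules are an assumption made for illustration.

```typescript
import * as path from 'node:path';

// Hypothetical output folder, named after the subdirectory in the PR description.
const DOWNLOAD_DIR = 'downloaded-urls';

// Hand-rolled stand-in for the URL sanitizer (an assumption, not the PR's code):
// drops the scheme, then replaces every run of characters other than letters,
// digits, '.', and '-' with a single underscore.
function urlToFilename(url: string): string {
  const withoutScheme = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, '');
  const safe = withoutScheme
    .replace(/[^a-zA-Z0-9.-]+/g, '_')
    .replace(/^_+|_+$/g, '');
  return `${safe}.md`;
}

// path.join inserts the platform's separator, so the same call yields
// a backslash-separated path on Windows and a slash-separated one elsewhere.
function urlToFilePath(url: string): string {
  return path.join(DOWNLOAD_DIR, urlToFilename(url));
}
```

With this particular sanitizer, `https://example.com/a/b` and `https://example.com/a?b` both map to `example.com_a_b.md`, which is the kind of edge case where the URL-to-filename mapping stops being 1-to-1 and a later download overwrites an earlier one.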

Feel free to cherry-pick what you like.

L3Gaunt added 6 commits February 19, 2025 22:06

store crawled research results in a folder

c20391c

Don't put accessed date into filename anymore

b366806

refactor: filename path from url is in its own function now

fabc472

fix: closing brackets

9eb4c88

feat: logging of final urls and learnings

0a8703a

fix/feat: now path+filename joining should work in windows too, and the report contains the file download path

e04332b

chore: .gitignore now ignores downloaded files

41220e5

L3Gaunt changed the title ~~store crawled research results in a folder~~ store crawled research results in a folder, log questions+learnings in output file Feb 22, 2025

L3Gaunt changed the title ~~store crawled research results in a folder, log questions+learnings in output file~~ store crawled research results in a folder, log research topic/follow-up questions+learnings in output file Feb 24, 2025
