Releases: adbar/courlan

courlan-0.9.0

07 Mar 12:28
  • hardening of filters and URL parsing (#14)
  • normalize punycode to unicode
  • methods added to UrlStore: get_crawl_delay(), print_unvisited_urls()
  • UrlStore now triggers exit code 1 when interrupted
  • argument added to extract_links(): no_filter
  • code refactoring: simplifications

Full Changelog: v0.8.3...v0.9.0
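
The punycode-to-unicode normalization mentioned in 0.9.0 can be illustrated with the Python standard library alone. This is a minimal sketch and not courlan's own code: `punycode_to_unicode` is a hypothetical helper name, and it relies on the built-in `idna` codec, which may behave differently from the library in edge cases.

```python
def punycode_to_unicode(hostname: str) -> str:
    """Decode punycode ("xn--") labels of a hostname into unicode.

    Hypothetical helper for illustration only; not part of courlan.
    """
    try:
        # The built-in "idna" codec decodes each dot-separated label.
        return hostname.encode("ascii").decode("idna")
    except UnicodeError:
        # Non-ASCII or malformed input: return it unchanged.
        return hostname


print(punycode_to_unicode("xn--mnchen-3ya.de"))  # münchen.de
print(punycode_to_unicode("example.org"))        # example.org
```

Hostnames without punycode labels pass through unchanged, so the helper can be applied to every hostname without checking for the `xn--` prefix first.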

courlan-0.8.3

28 Jul 16:53
  • fixed bug in domain name extraction
  • uniform logging parameters

Full Changelog: v0.8.2...v0.8.3

courlan-0.8.2

26 Jul 16:22
  • full type hinting
  • maintenance: code linted

Full Changelog: v0.8.1...v0.8.2

courlan-0.8.1

11 Jul 12:04
  • add type annotations and check with mypy
  • url_filter() function moved from Trafilatura
  • code style: use black

courlan-0.8.0

30 Jun 12:14
  • performance optimizations
  • fast track for domain extraction (extract_domain(url, fast=True)), now taking subdomains into account

Full Changelog: v0.7.2...v0.8.0
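
As a rough illustration of what keeping subdomains during domain extraction can look like, here is a standard-library sketch. It is not courlan's implementation, and the value returned by `extract_domain(url, fast=True)` may differ; `hostname_with_subdomains` is a hypothetical name used only for this example.

```python
from urllib.parse import urlsplit


def hostname_with_subdomains(url: str) -> str:
    """Return the full hostname of a URL, subdomains included.

    Hypothetical helper for illustration only; not part of courlan.
    Returns an empty string if no hostname can be found.
    """
    # urlsplit lowercases the hostname and drops port and credentials.
    return urlsplit(url).hostname or ""


print(hostname_with_subdomains("https://blog.example.org/post?id=1"))
print(hostname_with_subdomains("http://www.example.org:8080/"))
```

The sketch returns the hostname as it appears in the URL, without consulting a list of public suffixes to reduce it to the registered domain.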

courlan-0.7.2

17 May 16:25
  • UrlStore: threading lock and convenience functions added

courlan-0.7.1

29 Mar 16:14
  • bug in sampling fixed
  • UrlStore: validation by default

Full Changelog: v0.7.0...v0.7.1

courlan-0.7.0

21 Mar 14:54
  • UrlStore class added: data store containing URLs with relevant information
  • code cleaning and maintenance (bugs, simplification)

Full Changelog: v0.6.0...v0.7.0

courlan-0.6.0

11 Nov 17:39
  • reviewed code base: simplicity and execution speed
  • dropped support for Python 3.5

courlan-0.5.0

13 Oct 14:59
  • more complex language heuristics, use langcodes
  • extended blacklists and whitelists
  • more precise filters and more efficient code
  • support for Python 3.10