Releases: adbar/courlan

courlan-0.4.0

25 May 17:35
  • URL manipulation tools added: extract parts, fix relative URLs
  • filters added: language, navigation and crawls
  • more robust link handling and extraction
  • removed support for Python 3.4
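
The URL manipulation items above (extracting parts, fixing relative URLs) can be illustrated with the standard library alone. This is a minimal sketch, not courlan's own code: the helper names `split_host_and_path` and `resolve_relative` are hypothetical and chosen only for this example.

```python
from urllib.parse import urljoin, urlsplit


def split_host_and_path(url):
    """Hypothetical helper: split an absolute URL into (scheme://host, path)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    host = f"{parts.scheme}://{parts.netloc}"
    # Keep the query string with the path; fall back to "/" for bare hosts.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, path


def resolve_relative(base_url, link):
    """Hypothetical helper: turn a relative link into an absolute URL."""
    # urljoin leaves already-absolute links untouched.
    return urljoin(base_url, link)
```

Resolving against the page URL rather than the bare host matters for links such as `post.html`, which are relative to the current directory and not to the site root.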

courlan-0.3.1

19 Feb 17:28
  • improved filter precision

courlan-0.3.0

04 Jan 12:17
  • reduced dependencies: replaced requests with bare urllib3, and tldextract with tld on Python 3.6 and later
  • better path and fragment normalization

courlan-0.2.3

20 Oct 14:56
  • Python 3.9 compatibility
  • Simplified imports
  • Bug fixes

courlan-0.2.2

21 Sep 14:41
  • English and German language filters
  • Function to detect external links
  • Support for domain blacklisting
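
To make the last two items concrete, here is a rough standard-library approximation of external-link detection and domain blacklisting. It is a sketch under stated assumptions, not the library's implementation: `is_external_link` and `drop_blacklisted` are hypothetical names, and hosts are compared literally (after stripping a leading `www.`), so subdomains count as external here.

```python
from urllib.parse import urlsplit


def _hostname(url):
    """Lower-cased host name without a leading 'www.' (empty if missing)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_external_link(url, reference_url):
    """Hypothetical helper: True if both URLs point to different hosts."""
    return _hostname(url) != _hostname(reference_url)


def drop_blacklisted(urls, blacklist):
    """Hypothetical helper: remove URLs whose host is in the blacklist."""
    blocked = {domain.lower() for domain in blacklist}
    return [url for url in urls if _hostname(url) not in blocked]
```

A production-grade check would normally compare registered domains rather than raw host names, which requires a public-suffix-aware library.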

courlan-0.2.1

02 Sep 13:48
  • Less aggressive strict filters
  • CLI bug fixed

courlan-0.2.0

01 Sep 17:25
  • Cleaner and more efficient filtering
  • Helper functions to scrub, clean and normalize
  • Removed two dependencies by making more extensive use of urllib.parse
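
As an illustration of what normalization built on `urllib.parse` might look like, the sketch below lower-cases the scheme and host, sorts query parameters and drops the fragment. The function name `normalize` and the exact set of rules are assumptions made for this example; the release notes do not specify which steps the library applies.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Hypothetical normalizer: one canonical form for trivially equal URLs."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Note: this lower-cases the whole netloc, including any userinfo.
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    # Sort parameters so that ?b=2&a=1 and ?a=1&b=2 compare equal.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)
    # The fragment is dropped: it is not sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

Sorting the query string is a design choice that helps deduplication but can change meaning on the rare sites where parameter order matters.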

courlan-0.1.0

27 Aug 17:43
  • Cleaning and filtering targeting non-spam HTML pages with primarily text
  • URL validation
  • Sampling by domain name
  • Command-line interface (CLI) and Python tool
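
Sampling by domain name can be pictured as grouping URLs by host and drawing a bounded number from each group. The following is only a sketch of that idea; `sample_by_domain` and its parameters are hypothetical and do not reflect the tool's actual interface.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def sample_by_domain(urls, per_domain, seed=None):
    """Hypothetical helper: keep at most `per_domain` random URLs per host."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname or ""].append(url)

    sampled = []
    for host in sorted(groups):
        candidates = groups[host]
        # Small hosts are kept whole; large ones are cut down to the limit.
        k = min(per_domain, len(candidates))
        sampled.extend(rng.sample(candidates, k))
    return sampled
```

Capping each host keeps a few very large sites from dominating a crawl list.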