prepare version release
adbar committed Sep 1, 2020
1 parent 45c68a3 commit 194a08c
Showing 4 changed files with 23 additions and 11 deletions.
7 changes: 7 additions & 0 deletions HISTORY.md
@@ -1,6 +1,13 @@
## History / Changelog


### 0.2.0

- Cleaner and more efficient filtering
- Helper functions to scrub, clean and normalize
- Removed two dependencies with more extensive usage of urllib.parse


### 0.1.0

- Cleaning and filtering targeting non-spam HTML pages with primarily text
12 changes: 8 additions & 4 deletions README.rst
@@ -1,5 +1,5 @@
coURLan: clean, filter and sample URLs
======================================
coURLan: Clean, filter, normalize, and sample URLs
==================================================


.. image:: https://img.shields.io/pypi/v/courlan.svg
@@ -72,9 +72,10 @@ All operations chained:
.. code-block:: python
>>> from courlan.core import check_url
>>> check_url('https://github.com/adbar/courlan') # returns url and domain name
# returns url and domain name
>>> check_url('https://github.com/adbar/courlan')
('https://github.com/adbar/courlan', 'github.com')
# noisy query parameters are removed
# noisy query parameters can be removed
>>> check_url('https://httpbin.org/redirect-to?url=http%3A%2F%2Fexample.org', strict=True)
('https://httpbin.org/redirect-to', 'httpbin.org')
# Check for redirects (HEAD request)
@@ -107,6 +108,9 @@ Basic normalization only:
>>> my_url = normalize_url(urlparse(my_url))
# passing URL strings directly also works
>>> my_url = normalize_url(my_url)
# remove unnecessary components and re-order query elements
>>> normalize_url('http://test.net/foo.html?utm_source=twitter&post=abc&page=2#fragment', strict=True)
'http://test.net/foo.html?page=2&post=abc'
Basic URL validation only:
14 changes: 7 additions & 7 deletions setup.py
@@ -1,5 +1,5 @@
"""
URL manipulation tools
URL filter and manipulation tools
http://github.com/adbar/courlan
"""

@@ -20,13 +20,12 @@ def readme():

setup(
name='courlan',
version='0.1.0',
description='Clean, filter and sample URLs',
version='0.2.0',
description='Clean, filter, normalize, and sample URLs',
long_description=readme(),
classifiers=[
# As from http://pypi.python.org/pypi?%3Aaction=list_classifiers
'Development Status :: 2 - Pre-Alpha',
#'Development Status :: 3 - Alpha',
'Development Status :: 3 - Alpha',
#'Development Status :: 4 - Beta',
#'Development Status :: 5 - Production/Stable',
#'Development Status :: 6 - Mature',
@@ -48,9 +47,10 @@ def readme():
'Programming Language :: Python :: 3.8',
'Topic :: Internet :: WWW/HTTP',
'Topic :: Scientific/Engineering :: Information Analysis',
'Topic :: Text Processing :: Filters',
],
keywords=['urls', 'url-parsing', 'url-manipulation', 'preprocessing', 'validation'],
url='http://github.com/adbar/urltools',
keywords=['urls', 'url-parsing', 'url-manipulation', 'preprocessing', 'validation', 'webcrawling'],
url='http://github.com/adbar/courlan',
author='Adrien Barbaresi',
author_email='[email protected]',
license='GPLv3+',
1 change: 1 addition & 0 deletions tests/unit_tests.py
@@ -125,3 +125,4 @@ def test_examples():
assert clean_url('HTTPS://WWW.DWDS.DE:80/') == 'https://www.dwds.de'
assert validate_url('http://1234') == (False, None)
assert validate_url('http://www.example.org/')[0] is True
assert normalize_url('http://test.net/foo.html?utm_source=twitter&post=abc&page=2#fragment', strict=True) == 'http://test.net/foo.html?page=2&post=abc'
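
Note on the new strict normalization test: the assertion above expects the tracking parameter and the fragment to be dropped and the remaining query elements to be re-ordered. The following is only a rough sketch of how that behaviour could be reproduced with `urllib.parse` alone; it is not courlan's implementation. The function name `normalize_sketch` and the `TRACKING_PREFIXES` tuple are hypothetical, and keeping the fragment in non-strict mode is an assumption made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical prefix list: the parameters courlan actually filters
# are not visible in this commit.
TRACKING_PREFIXES = ('utm_',)


def normalize_sketch(url, strict=False):
    """Illustrative stand-in for normalize_url, built on urllib.parse only."""
    parts = urlsplit(url)
    # Split the query string into (key, value) pairs.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if strict:
        # Drop parameters whose key starts with a tracking prefix.
        pairs = [(k, v) for k, v in pairs if not k.startswith(TRACKING_PREFIXES)]
    # Sort the pairs so that equivalent URLs share one query order.
    query = urlencode(sorted(pairs))
    # Assumption: the fragment is discarded in strict mode only.
    fragment = '' if strict else parts.fragment
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, fragment))


print(normalize_sketch(
    'http://test.net/foo.html?utm_source=twitter&post=abc&page=2#fragment',
    strict=True,
))
```

With the URL from the test, the sketch prints `http://test.net/foo.html?page=2&post=abc`, which matches the string the assertion expects.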
