Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling empty segments in urls according to RFC3986 #1026

Merged
merged 38 commits into from
Jul 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
2b58097
Add (failing) test for trailing slash when joining with ""
youtux Jun 9, 2024
a4ae068
Add back trailing slash when joining with ""
youtux Jun 9, 2024
6eeabc1
Add another test for path joining to ""
youtux Jun 9, 2024
d6de640
join path with empty segments
commonism Jun 13, 2024
7a07521
refactor - rename segments in _make_child to match the wording in rfc…
commonism Jun 14, 2024
8a9ad79
tests - extend tests for empty segments
commonism Jun 14, 2024
9e412ae
adding CHANGES/
commonism Jun 18, 2024
b7bcddf
CHANGES - spelling
commonism Jun 18, 2024
38892cd
Update tests/test_url.py
commonism Jun 20, 2024
0a56a8e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 20, 2024
3bc0c09
not overuse :=
commonism Jun 21, 2024
24b696f
using codecov-action@v3
commonism Jun 27, 2024
64ccea3
[email protected] && fail_ci_if_error=true
commonism Jun 27, 2024
be46800
Update CHANGES/1026.bugfix.rst
commonism Jun 28, 2024
a9548f4
Update yarl/_url.py
commonism Jun 28, 2024
50b2642
requested changes to CHANGES & ci
commonism Jun 28, 2024
8816bb2
keep using codecov @v4
commonism Jun 29, 2024
0c35ed3
changes - formatting & spelling
commonism Jun 29, 2024
9d3dfdb
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
10e965f
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
e9d8e85
CHANGES - reference methods
commonism Jul 1, 2024
b766d32
CHANGES - URL.joinpath
commonism Jul 1, 2024
bd0eb94
CHANGES - include module in reference
commonism Jul 1, 2024
25978a2
CHANGES - shorten method ref
commonism Jul 1, 2024
1c3b9ce
docs - == != =
commonism Jul 1, 2024
84ee421
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
fa41f77
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
9a1e48b
Update docs/api.rst
commonism Jul 1, 2024
572dad4
Update yarl/_url.py
commonism Jul 1, 2024
b265e58
CHANGES - doc /
commonism Jul 1, 2024
f6ab864
CHANGES - ref __truediv__
commonism Jul 1, 2024
b06a09d
Update CHANGES/1026.doc.rst
commonism Jul 1, 2024
0026b6d
Update CHANGES.rst
commonism Jul 1, 2024
583acb1
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
474926d
Update CHANGES.rst
commonism Jul 1, 2024
04d2f55
Update CHANGES/1026.doc.rst
webknjaz Jul 1, 2024
c521e33
tests - remove chained comparision
commonism Jul 3, 2024
cc1b9fa
Merge branch 'master' into youtux-fix-join-trailing-slash
commonism Jul 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ Bug fixes
---------

- Stopped dropping trailing slashes in :py:meth:`~yarl.URL.joinpath` -- by :user:`gmacon`. (:issue:`862`, :issue:`866`)
- Started accepting string subclasses in ``__truediv__()`` operations (``URL / segment``) -- by :user:`mjpieters`. (:issue:`871`, :issue:`884`)
- Started accepting string subclasses in :meth:`~yarl.URL.__truediv__` operations (``URL / segment``) -- by :user:`mjpieters`. (:issue:`871`, :issue:`884`)
- Fixed the human representation of URLs with square brackets in usernames and passwords -- by :user:`mjpieters`. (:issue:`876`, :issue:`882`)
- Updated type hints to include ``URL.missing_port()``, ``URL.__bytes__()``
and the ``encoding`` argument to :py:meth:`~yarl.URL.joinpath`
Expand Down Expand Up @@ -174,7 +174,7 @@ Contributor-facing changes
Bugfixes
--------

- Fix regression with ``__truediv__`` and absolute URLs with empty paths causing the raw path to lack the leading ``/``.
- Fix regression with :meth:`~yarl.URL.__truediv__` and absolute URLs with empty paths causing the raw path to lack the leading ``/``.
(`#854 <https://github.com/aio-libs/yarl/issues/854>`_)


Expand All @@ -196,7 +196,7 @@ Features
--------

- Added ``URL.joinpath(*elements)``, to create a new URL appending multiple path elements. (`#704 <https://github.com/aio-libs/yarl/issues/704>`_)
- Made ``URL.__truediv__()`` return ``NotImplemented`` if called with an
- Made :meth:`URL.__truediv__() <yarl.URL.__truediv__>` return ``NotImplemented`` if called with an
unsupported type — by :user:`michaeljpeters`.
(`#832 <https://github.com/aio-libs/yarl/issues/832>`_)

Expand Down
21 changes: 21 additions & 0 deletions CHANGES/1026.bugfix.rst
commonism marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
Joining URLs with empty segments has been changed
to match :rfc:`3986`.

Previously empty segments would be removed from path,
breaking use-cases such as

.. code-block:: python

URL("https://web.archive.org/web/") / "https://github.com/"

Now :meth:`/ operation <yarl.URL.__truediv__>` and :meth:`URL.joinpath() <yarl.URL.joinpath>`
keep empty segments, but do not introduce new empty segments.
e.g.

.. code-block:: python

URL("https://example.org/") / ""

does not introduce an empty segment.

-- by :user:`commonism` and :user:`youtux`
2 changes: 2 additions & 0 deletions CHANGES/1026.doc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
The pre-existing :meth:`/ magic method <yarl.URL.__truediv__>`
has been documented in the API reference -- by :user:`commonism`.
14 changes: 14 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -844,6 +844,20 @@ The path is encoded if needed.

.. versionadded:: 1.9

.. method:: URL.__truediv__(url)
commonism marked this conversation as resolved.
Show resolved Hide resolved

Shortcut for :meth:`URL.joinpath` with a single element and ``encoded=False``.

.. doctest::
commonism marked this conversation as resolved.
Show resolved Hide resolved

>>> url = URL('http://example.com/path?arg#frag') / 'to'
>>> url
URL('http://example.com/path/to')
>>> url.parts
('/', 'path', 'to')

.. versionadded:: 0.9

.. method:: URL.join(url)

Construct a full (“absolute”) URL by combining a “base URL”
Expand Down
45 changes: 45 additions & 0 deletions tests/test_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -799,13 +799,25 @@
pytest.param(
"/path/", ("to",), "http://example.com/path/to", id="path-with-slash"
),
pytest.param(
"/path", ("",), "http://example.com/path/", id="path-add-trailing-slash"
),
pytest.param(
"/path?a=1#frag",
("to",),
"http://example.com/path/to",
id="cleanup-query-and-fragment",
),
pytest.param("", ("path/",), "http://example.com/path/", id="trailing-slash"),
pytest.param(
"",
(
"path",
"",
),
"http://example.com/path/",
id="trailing-slash-empty-string",
),
pytest.param(
"", ("path/", "to/"), "http://example.com/path/to/", id="duplicate-slash"
),
Expand All @@ -827,6 +839,39 @@
assert str(url.joinpath(*to_join)) == expected


@pytest.mark.parametrize(
"base,to_join,expected",
[
pytest.param("path", "a", "path/a", id="default_default"),
pytest.param("path", "./a", "path/a", id="default_relative"),
pytest.param("path/", "a", "path/a", id="empty-segment_default"),
pytest.param("path/", "./a", "path/a", id="empty-segment_relative"),
pytest.param("path", ".//a", "path//a", id="default_empty-segment"),
pytest.param("path/", ".//a", "path//a", id="empty-segment_empty_segment"),
pytest.param("path//", "a", "path//a", id="empty-segments_default"),
pytest.param("path//", "./a", "path//a", id="empty-segments_relative"),
pytest.param("path//", ".//a", "path///a", id="empty-segments_empty-segment"),
pytest.param("path", "a/", "path/a/", id="default_trailing-empty-segment"),
pytest.param("path", "a//", "path/a//", id="default_trailing-empty-segments"),
pytest.param("path", "a//b", "path/a//b", id="default_embedded-empty-segment"),
],
)
def test_joinpath_empty_segments(base, to_join, expected):
url = URL(f"http://example.com/{base}")

Check warning on line 860 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L859-L860

Added lines #L859 - L860 were not covered by tests
assert (
f"http://example.com/{expected}" == str(url.joinpath(to_join))
and str(url / to_join) == f"http://example.com/{expected}"

Check warning on line 863 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L862-L863

Added lines #L862 - L863 were not covered by tests
)


def test_joinpath_single_empty_segments():

Check warning on line 867 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L867

Added line #L867 was not covered by tests
"""joining standalone empty segments does not create empty segments"""
a = URL("/1//2///3")
assert a.parts == ("/", "1", "", "2", "", "", "3")
b = URL("scheme://host").joinpath(*a.parts[1:])
assert b.path == "/1/2/3"

Check warning on line 872 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L869-L872

Added lines #L869 - L872 were not covered by tests


@pytest.mark.parametrize(
"url,to_join,expected",
[
Expand Down
44 changes: 25 additions & 19 deletions yarl/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -713,28 +713,34 @@
"Path in a URL with authority should start with a slash ('/') if set"
)

def _make_child(self, segments, encoded=False):
"""add segments to self._val.path, accounting for absolute vs relative paths"""
# keep the trailing slash if the last segment ends with /
parsed = [""] if segments and segments[-1][-1:] == "/" else []
for seg in reversed(segments):
if not seg:
continue
if seg[0] == "/":
def _make_child(self, paths, encoded=False):
"""
add paths to self._val.path, accounting for absolute vs relative paths,
keep existing, but do not create new, empty segments
"""
parsed = []
for idx, path in enumerate(reversed(paths)):
# empty segment of last is not removed
last = idx == 0
if path and path[0] == "/":
raise ValueError(
f"Appending path {seg!r} starting from slash is forbidden"
)
seg = seg if encoded else self._PATH_QUOTER(seg)
if "/" in seg:
parsed += (
sub for sub in reversed(seg.split("/")) if sub and sub != "."
f"Appending path {path!r} starting from slash is forbidden"

Check warning on line 727 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L727

Added line #L727 was not covered by tests
)
elif seg != ".":
parsed.append(seg)
path = path if encoded else self._PATH_QUOTER(path)
segments = [
segment for segment in reversed(path.split("/")) if segment != "."

Check warning on line 731 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L731

Added line #L731 was not covered by tests
]
if not segments:
continue
# remove trailing empty segment for all but the last path
segment_slice_start = int(not last and segments[0] == "")
parsed += segments[segment_slice_start:]
parsed.reverse()
old_path = self._val.path
if old_path:
parsed = [*old_path.rstrip("/").split("/"), *parsed]

if self._val.path and (old_path_segments := self._val.path.split("/")):
old_path_cutoff = -1 if old_path_segments[-1] == "" else None
parsed = [*old_path_segments[:old_path_cutoff], *parsed]

if self.is_absolute():
parsed = _normalize_path_segments(parsed)
if parsed and parsed[0] != "":
Expand Down
Loading