Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling empty segments in urls according to RFC3986 #1026

Merged
merged 38 commits into from
Jul 3, 2024
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
2b58097
Add (failing) test for trailing slash when joining with ""
youtux Jun 9, 2024
a4ae068
Add back trailing slash when joining with ""
youtux Jun 9, 2024
6eeabc1
Add another test for path joining to ""
youtux Jun 9, 2024
d6de640
join path with empty segments
commonism Jun 13, 2024
7a07521
refactor - rename segments in _make_child to match the wording in rfc…
commonism Jun 14, 2024
8a9ad79
tests - extend tests for empty segments
commonism Jun 14, 2024
9e412ae
adding CHANGES/
commonism Jun 18, 2024
b7bcddf
CHANGES - spelling
commonism Jun 18, 2024
38892cd
Update tests/test_url.py
commonism Jun 20, 2024
0a56a8e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 20, 2024
3bc0c09
not overuse :=
commonism Jun 21, 2024
24b696f
using codecov-action@v3
commonism Jun 27, 2024
64ccea3
[email protected] && fail_ci_if_error=true
commonism Jun 27, 2024
be46800
Update CHANGES/1026.bugfix.rst
commonism Jun 28, 2024
a9548f4
Update yarl/_url.py
commonism Jun 28, 2024
50b2642
requested changes to CHANGES & ci
commonism Jun 28, 2024
8816bb2
keep using codecov @v4
commonism Jun 29, 2024
0c35ed3
changes - formatting & spelling
commonism Jun 29, 2024
9d3dfdb
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
10e965f
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
e9d8e85
CHANGES - reference methods
commonism Jul 1, 2024
b766d32
CHANGES - URL.joinpath
commonism Jul 1, 2024
bd0eb94
CHANGES - include module in reference
commonism Jul 1, 2024
25978a2
CHANGES - shorten method ref
commonism Jul 1, 2024
1c3b9ce
docs - == != =
commonism Jul 1, 2024
84ee421
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
fa41f77
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
9a1e48b
Update docs/api.rst
commonism Jul 1, 2024
572dad4
Update yarl/_url.py
commonism Jul 1, 2024
b265e58
CHANGES - doc /
commonism Jul 1, 2024
f6ab864
CHANGES - ref __truediv__
commonism Jul 1, 2024
b06a09d
Update CHANGES/1026.doc.rst
commonism Jul 1, 2024
0026b6d
Update CHANGES.rst
commonism Jul 1, 2024
583acb1
Update CHANGES/1026.bugfix.rst
commonism Jul 1, 2024
474926d
Update CHANGES.rst
commonism Jul 1, 2024
04d2f55
Update CHANGES/1026.doc.rst
webknjaz Jul 1, 2024
c521e33
tests - remove chained comparision
commonism Jul 3, 2024
cc1b9fa
Merge branch 'master' into youtux-fix-join-trailing-slash
commonism Jul 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions CHANGES/1026.bugfix.rst
commonism marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
Joining URLs with empty segments has been changed
to match :rfc:`3986`.

Previously empty segments would be removed from path,
breaking use-cases such as

.. code-block:: python

URL("https://web.archive.org/web/") / "https://github.com/"

Now :meth:`~yarl.URL.__truediv__` and :meth:`~yarl.URL.joinpath` operation
commonism marked this conversation as resolved.
Show resolved Hide resolved
keep empty segments, but do not introduce new empty segments.
e.g.

.. code-block:: python

URL("https://example.org/") / ""

does not introduce an empty segment.
-- by :user:`commonism` and :user:`youtux`.
commonism marked this conversation as resolved.
Show resolved Hide resolved
16 changes: 16 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -844,6 +844,20 @@ The path is encoded if needed.

.. versionadded:: 1.9

.. method:: URL.__truediv__(url)
commonism marked this conversation as resolved.
Show resolved Hide resolved

Shortcut for :meth:`URL.joinpath` with a single element and ``encoded=False``.

.. doctest::
commonism marked this conversation as resolved.
Show resolved Hide resolved

>>> url = URL('http://example.com/path?arg#frag') / 'to'
>>> url
URL('http://example.com/path/to')
>>> url.parts
('/', 'path', 'to')

.. versionadded:: 0.9

.. method:: URL.join(url)

Construct a full (“absolute”) URL by combining a “base URL”
Expand All @@ -870,6 +884,8 @@ The path is encoded if needed.
>>> base.join(URL('//python.org/page.html'))
URL('http://python.org/page.html')



commonism marked this conversation as resolved.
Show resolved Hide resolved
Human readable representation
-----------------------------

Expand Down
46 changes: 46 additions & 0 deletions tests/test_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -799,13 +799,25 @@
pytest.param(
"/path/", ("to",), "http://example.com/path/to", id="path-with-slash"
),
pytest.param(
"/path", ("",), "http://example.com/path/", id="path-add-trailing-slash"
),
pytest.param(
"/path?a=1#frag",
("to",),
"http://example.com/path/to",
id="cleanup-query-and-fragment",
),
pytest.param("", ("path/",), "http://example.com/path/", id="trailing-slash"),
pytest.param(
"",
(
"path",
"",
),
"http://example.com/path/",
id="trailing-slash-empty-string",
),
pytest.param(
"", ("path/", "to/"), "http://example.com/path/to/", id="duplicate-slash"
),
Expand All @@ -827,6 +839,40 @@
assert str(url.joinpath(*to_join)) == expected


@pytest.mark.parametrize(
"base,to_join,expected",
[
pytest.param("path", "a", "path/a", id="default_default"),
pytest.param("path", "./a", "path/a", id="default_relative"),
pytest.param("path/", "a", "path/a", id="empty-segment_default"),
pytest.param("path/", "./a", "path/a", id="empty-segment_relative"),
pytest.param("path", ".//a", "path//a", id="default_empty-segment"),
pytest.param("path/", ".//a", "path//a", id="empty-segment_empty_segment"),
pytest.param("path//", "a", "path//a", id="empty-segments_default"),
pytest.param("path//", "./a", "path//a", id="empty-segments_relative"),
pytest.param("path//", ".//a", "path///a", id="empty-segments_empty-segment"),
pytest.param("path", "a/", "path/a/", id="default_trailing-empty-segment"),
pytest.param("path", "a//", "path/a//", id="default_trailing-empty-segments"),
pytest.param("path", "a//b", "path/a//b", id="default_embedded-empty-segment"),
],
)
def test_joinpath_empty_segments(base, to_join, expected):
url = URL(f"http://example.com/{base}")

Check warning on line 860 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L859-L860

Added lines #L859 - L860 were not covered by tests
assert (
str(url.joinpath(to_join))
== str(url / to_join)
== f"http://example.com/{expected}"

Check warning on line 864 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L862-L864

Added lines #L862 - L864 were not covered by tests
commonism marked this conversation as resolved.
Show resolved Hide resolved
)


def test_joinpath_single_empty_segments():

Check warning on line 868 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L868

Added line #L868 was not covered by tests
"""joining standalone empty segments does not create empty segments"""
a = URL("/1//2///3")
assert a.parts == ("/", "1", "", "2", "", "", "3")
b = URL("scheme://host").joinpath(*a.parts[1:])
assert b.path == "/1/2/3"

Check warning on line 873 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L870-L873

Added lines #L870 - L873 were not covered by tests


@pytest.mark.parametrize(
"url,to_join,expected",
[
Expand Down
43 changes: 24 additions & 19 deletions yarl/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -713,28 +713,33 @@
"Path in a URL with authority should start with a slash ('/') if set"
)

def _make_child(self, segments, encoded=False):
"""add segments to self._val.path, accounting for absolute vs relative paths"""
# keep the trailing slash if the last segment ends with /
parsed = [""] if segments and segments[-1][-1:] == "/" else []
for seg in reversed(segments):
if not seg:
continue
if seg[0] == "/":
def _make_child(self, paths, encoded=False):
"""
add paths to self._val.path, accounting for absolute vs relative paths,
keep existing, but do not create new, empty segments
"""
parsed = []
for idx, path in enumerate(reversed(paths)):
# empty segment of last is not removed
last = idx == 0
if path and path[0] == "/":
raise ValueError(
f"Appending path {seg!r} starting from slash is forbidden"
)
seg = seg if encoded else self._PATH_QUOTER(seg)
if "/" in seg:
parsed += (
sub for sub in reversed(seg.split("/")) if sub and sub != "."
f"Appending path {path!r} starting from slash is forbidden"

Check warning on line 727 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L727

Added line #L727 was not covered by tests
)
elif seg != ".":
parsed.append(seg)
path = path if encoded else self._PATH_QUOTER(path)
segments = [
segment for segment in reversed(path.split("/")) if segment != "."

Check warning on line 731 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L731

Added line #L731 was not covered by tests
]
if not segments:
continue
# remove trailing empty segment for all but the last path
parsed += segments[1:] if not last and segments[0] == "" else segments
commonism marked this conversation as resolved.
Show resolved Hide resolved
parsed.reverse()
old_path = self._val.path
if old_path:
parsed = [*old_path.rstrip("/").split("/"), *parsed]

if self._val.path and (old_path_segments := self._val.path.split("/")):
old_path_cutoff = -1 if old_path_segments[-1] == "" else None
parsed = [*old_path_segments[:old_path_cutoff], *parsed]

if self.is_absolute():
parsed = _normalize_path_segments(parsed)
if parsed and parsed[0] != "":
Expand Down
Loading