Skip to content

Commit 2f7fd61

Browse files
committed
fix: reject template patterns causing O(n²) regex backtracking
The [^?#]* match pattern for {+var} and {#var} overlaps with every other operator's character class. When a trailing literal fails to match, the regex engine backtracks through O(n) split points with O(n) rescanning each, yielding quadratic time. A 64KB payload (the default max_uri_length) against {+prefix}{/path*}/END consumed ~25s CPU per request. Two conditions trigger the quadratic case, now both rejected at parse time: 1. {+var} immediately adjacent to any expression ({+a}{b}, {+a}{/b*}) 2. Two {+var}/{#var} anywhere in the template, even with a literal between them ({+a}/x/{+b}) since [^?#]* matches the literal too What remains permitted: - {+path} at end of template (the flagship use case) - {+path}.txt or {+path}/edit (literal suffix, linear backtracking) - {+path}{?v,page} (query expressions stripped before pattern build) - {+a}/sep/{b} (literal + bounded expression, disambiguated) The _check_adjacent_explodes function is generalized to _check_ambiguous_adjacency covering both explode adjacency and the new reserved-expansion constraints.
1 parent 60d12e1 commit 2f7fd61

File tree

2 files changed

+121
-31
lines changed

2 files changed

+121
-31
lines changed

src/mcp/shared/uri_template.py

Lines changed: 70 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,15 @@
1515
Matching is not specified by RFC 6570. A few templates can expand to
1616
URIs that ``match()`` cannot unambiguously reverse:
1717
18-
* Multi-variable reserved expressions like ``{+x,y}`` use a comma as
19-
separator but also permit commas *inside* values (commas are in the
20-
reserved set). ``match("a,b,c")`` cannot know which comma is the
21-
separator. The matcher takes the last comma as the split point; if
22-
your values contain commas, prefer separate expressions (``{+x}/{+y}``)
23-
or a different operator.
18+
* Reserved/fragment expressions (``{+var}``, ``{#var}``) are restricted
19+
to positions that avoid quadratic-time backtracking: at most one per
20+
template, and not immediately adjacent to another expression. The
21+
``[^?#]*`` pattern overlaps with every other operator's character
22+
class; a failing match against ``{+a}{b}`` or ``{+a}/x/{+b}``
23+
backtracks O(n²). Use a literal separator before a bounded
24+
expression (``{+a}/sep/{b}``) or put the reserved expression last
25+
(``file://docs/{+path}``). Trailing ``{?...}``/``{&...}`` query
26+
expressions are always fine since they're matched via ``parse_qs``.
2427
2528
* Reserved expansion ``{+var}`` leaves ``?`` and ``#`` unencoded, but
2629
the match pattern stops at those characters so that templates like
@@ -615,7 +618,7 @@ def _parse(template: str, *, max_expressions: int) -> tuple[list[_Part], list[Va
615618
Raises:
616619
InvalidUriTemplate: On unclosed braces, too many expressions, or
617620
any error surfaced by :func:`_parse_expression` or
618-
:func:`_check_adjacent_explodes`.
621+
:func:`_check_ambiguous_adjacency`.
619622
"""
620623
parts: list[_Part] = []
621624
variables: list[Variable] = []
@@ -659,7 +662,7 @@ def _parse(template: str, *, max_expressions: int) -> tuple[list[_Part], list[Va
659662
# Advance past the closing brace.
660663
i = end + 1
661664

662-
_check_adjacent_explodes(template, parts)
665+
_check_ambiguous_adjacency(template, parts)
663666
_check_duplicate_variables(template, variables)
664667
return parts, variables
665668

@@ -752,36 +755,73 @@ def _check_duplicate_variables(template: str, variables: list[Variable]) -> None
752755
seen.add(var.name)
753756

754757

755-
def _check_adjacent_explodes(template: str, parts: list[_Part]) -> None:
756-
"""Reject templates with adjacent explode variables.
758+
def _check_ambiguous_adjacency(template: str, parts: list[_Part]) -> None:
759+
"""Reject templates where adjacent expressions would cause ambiguous or quadratic matching.
757760
758-
Patterns like ``{/a*}{/b*}`` are ambiguous for matching: given
759-
``/x/y/z``, the split between ``a`` and ``b`` is undetermined.
760-
Different operators (``{/a*}{.b*}``) do not help in general because
761-
the first operator's character class often includes the second's
762-
separator, so the first explode greedily consumes both. We reject
763-
all adjacent explodes at parse time rather than picking an arbitrary
764-
resolution. A literal between them (``{/a*}/x{/b*}``) still
765-
disambiguates.
761+
Two patterns are rejected:
762+
763+
1. Adjacent explode variables (``{/a*}{/b*}``): the split between
764+
``a`` and ``b`` in ``/x/y/z`` is undetermined. Different
765+
operators don't help since character classes overlap.
766+
767+
2. Reserved/fragment expansion in a position that causes quadratic
768+
backtracking. The ``[^?#]*`` pattern for ``+`` and ``#``
769+
overlaps with every other operator's character class, so when a
770+
trailing match fails the engine backtracks through O(n) split
771+
points. Two conditions trigger this:
772+
773+
- ``{+var}`` immediately adjacent to any expression
774+
(``{+a}{b}``, ``{+a}{/b*}``)
775+
- Two ``{+var}``/``{#var}`` anywhere in the path, even with a
776+
literal between them (``{+a}/x/{+b}``) — the literal does not
777+
disambiguate since ``[^?#]*`` matches it too
778+
779+
A 64KB payload against either can consume tens of seconds of CPU.
780+
781+
Trailing ``{?...}``/``{&...}`` expressions are handled via
782+
``parse_qs`` outside the path regex, so they do not count against
783+
any check.
766784
767785
Raises:
768-
InvalidUriTemplate: If two explode variables appear with no
769-
literal or non-explode variable between them.
786+
InvalidUriTemplate: If any pattern is detected.
770787
"""
771788
prev_explode = False
789+
prev_reserved = False
790+
seen_reserved = False
772791
for part in parts:
773792
if isinstance(part, str):
774-
# Literal text breaks any adjacency.
793+
# A literal breaks immediate adjacency but does not reset
794+
# the seen-reserved count: [^?#]* matches most literals.
775795
prev_explode = False
796+
prev_reserved = False
776797
continue
777798
for var in part.variables:
778-
if var.explode:
779-
if prev_explode:
780-
raise InvalidUriTemplate(
781-
"Adjacent explode expressions are ambiguous for matching and not supported",
782-
template=template,
783-
)
784-
prev_explode = True
785-
else:
786-
# A non-explode variable also breaks adjacency.
799+
# ?/& are stripped before pattern building and never reach
800+
# the path regex.
801+
if var.operator in ("?", "&"):
787802
prev_explode = False
803+
prev_reserved = False
804+
continue
805+
806+
if prev_reserved:
807+
raise InvalidUriTemplate(
808+
"{+var} or {#var} immediately followed by another expression "
809+
"causes quadratic-time matching; separate them with a literal",
810+
template=template,
811+
)
812+
if var.operator in ("+", "#") and seen_reserved:
813+
raise InvalidUriTemplate(
814+
"Multiple {+var} or {#var} expressions in one template cause "
815+
"quadratic-time matching even with literals between them",
816+
template=template,
817+
)
818+
if var.explode and prev_explode:
819+
raise InvalidUriTemplate(
820+
"Adjacent explode expressions are ambiguous for matching and not supported",
821+
template=template,
822+
)
823+
824+
prev_explode = var.explode
825+
prev_reserved = var.operator in ("+", "#")
826+
if prev_reserved:
827+
seen_reserved = True

tests/shared/test_uri_template.py

Lines changed: 51 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -163,6 +163,55 @@ def test_parse_rejects_adjacent_explodes(template: str):
163163
UriTemplate.parse(template)
164164

165165

166+
@pytest.mark.parametrize(
167+
"template",
168+
[
169+
# {+var} immediately adjacent to any expression
170+
"{+a}{b}",
171+
"{+a}{/b}",
172+
"{+a}{/b*}",
173+
"{+a}{.b}",
174+
"{+a}{;b}",
175+
"{#a}{b}",
176+
"{+a,b}", # multi-var in one expression: same adjacency
177+
"prefix/{+path}{.ext}", # literal before doesn't help
178+
# Two {+var}/{#var} anywhere, even with literals between
179+
"{+a}/x/{+b}",
180+
"{+a},{+b}",
181+
"{#a}/x/{+b}",
182+
"{+a}.foo.{#b}",
183+
],
184+
)
185+
def test_parse_rejects_reserved_quadratic_patterns(template: str):
186+
# These patterns cause O(n²) regex backtracking when a trailing
187+
# literal fails to match. Rejecting at parse time eliminates the
188+
# ReDoS vector at the source.
189+
with pytest.raises(InvalidUriTemplate, match="quadratic"):
190+
UriTemplate.parse(template)
191+
192+
193+
@pytest.mark.parametrize(
194+
"template",
195+
[
196+
"file://docs/{+path}", # + at end of template
197+
"file://{+path}.txt", # + followed by literal only
198+
"file://{+path}/edit", # + followed by literal only
199+
"api/{+path}{?v,page}", # + followed by query (stripped before regex)
200+
"api/{+path}{&next}", # + followed by query-continuation
201+
"page{#section}", # # at end
202+
"{a}{+b}", # + preceded by expression is fine; only following matters
203+
"{+a}/sep/{b}", # literal + bounded expression after: linear
204+
"{+a},{b}", # same: literal disambiguates when second is bounded
205+
],
206+
)
207+
def test_parse_allows_reserved_in_safe_positions(template: str):
208+
# These do not exhibit quadratic backtracking: end-of-template,
209+
# literal + bounded expression, or trailing query expression
210+
# (handled by parse_qs outside the path regex).
211+
t = UriTemplate.parse(template)
212+
assert t is not None
213+
214+
166215
@pytest.mark.parametrize(
167216
"template",
168217
["{x}/{x}", "{x,x}", "{a}{b}{a}", "{+x}/foo/{x}"],
@@ -306,7 +355,8 @@ def test_frozen():
306355
("?a=1{&b}", {"b": "2"}, "?a=1&b=2"),
307356
# Multi-var in one expression
308357
("{x,y}", {"x": "1", "y": "2"}, "1,2"),
309-
("{+x,y}", {"x": "a/b", "y": "c/d"}, "a/b,c/d"),
358+
# {+x,y} is rejected at parse time (quadratic backtracking +
359+
# inherent ambiguity). Use {+x}/{+y} with a literal separator.
310360
# Sequence values, non-explode (comma-join)
311361
("{/list}", {"list": ["a", "b", "c"]}, "/a,b,c"),
312362
("{?list}", {"list": ["a", "b"]}, "?list=a,b"),

0 commit comments

Comments
 (0)