Add a flag to specify URL path pattern #95

## What problem does this solve?

Currently, when afdocs processes `llms.txt` to get the list of pages to check, it strips the `.md` extension from each entry and uses the remaining path as the URL, on the assumption that the site uses "clean" URL paths with no extension. That isn't always the case: some docs sites still use the older but still perfectly valid `filename.ext` pattern. For those sites, the checker ends up with whatever the server happens to do with an extensionless path (a 404, a redirect, or something else), which can skew the results of the check in ways that wouldn't reflect what a real agent would see.

**Example:**
`https://www.example.com/filename.md` in `llms.txt` is converted to `https://www.example.com/filename` by afdocs, but the server uses real filenames and expects `https://www.example.com/filename.html`. The exact result depends on how that particular server handles the extensionless path, but it will quite likely be a 404.
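
To make the mismatch concrete, here is a minimal Python sketch of the conversion described above. It is an illustration only, not afdocs's actual code; the function name `strip_md_extension` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_md_extension(url: str) -> str:
    """Drop a trailing .md from the URL path, leaving query and fragment alone."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".md"):
        path = path[: -len(".md")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


listed_in_llms_txt = "https://www.example.com/filename.md"
url_that_gets_checked = strip_md_extension(listed_in_llms_txt)
url_the_server_serves = "https://www.example.com/filename.html"

print(url_that_gets_checked)
# True when the checker requests a URL the server does not actually serve
print(url_that_gets_checked != url_the_server_serves)
```

On a site that serves `filename.html`, the checked URL and the served URL differ, so the result reflects the server's fallback behavior rather than the real page.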

## What would you like to see?

Ideally the checker would automatically detect which URL pattern a site uses, but a simpler solution is probably a flag that lets the user specify how paths should be processed.
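
As a rough sketch of what such a flag could look like (in Python, purely for illustration): the flag name `--url-pattern` and the pattern names `clean`, `html`, and `keep` are hypothetical and do not exist in afdocs today.

```python
import argparse
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern names mapped to the suffix that replaces ".md".
SUFFIX_BY_PATTERN = {
    "clean": "",      # /filename       (the behavior described in this issue)
    "html": ".html",  # /filename.html  (servers that use real filenames)
    "keep": ".md",    # /filename.md    (use the llms.txt URL unchanged)
}


def rewrite_url(url: str, pattern: str) -> str:
    """Rewrite a trailing .md in the URL path according to the chosen pattern."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".md"):
        path = path[: -len(".md")] + SUFFIX_BY_PATTERN[pattern]
    return urlunsplit(parts._replace(path=path))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a URL path pattern flag")
    parser.add_argument(
        "--url-pattern",  # hypothetical flag name
        choices=sorted(SUFFIX_BY_PATTERN),
        default="clean",  # default keeps the behavior described in this issue
        help="how .md paths from llms.txt are turned into page URLs",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--url-pattern", "html"])
print(rewrite_url("https://www.example.com/filename.md", args.url_pattern))
```

Defaulting to `clean` would leave existing runs unchanged, while sites that serve `filename.html` could opt in explicitly.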

## Alternatives considered

Can't really think of one.

