Centralize extraction of URL parameters #2042

Open
TheTechromancer opened this issue Nov 27, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

@TheTechromancer
Collaborator

When a URL event is created, we should always save the GET parameters (before they're stripped off) in the event so that we can later speculate/excavate them. This will allow us to delete a lot of code in excavate, since right now we are extracting URL parameters in multiple places.
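
As a rough illustration of the proposal (not BBOT's actual implementation), the sketch below separates a URL's GET parameters from the URL before they are stripped, so both can be kept. The name `split_url_params` is hypothetical; only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl


def split_url_params(url):
    """Return (stripped_url, params) for a URL.

    Hypothetical helper: `params` is a list of (name, value) pairs so that
    repeated parameter names and their order are preserved.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so that "?debug=" is not silently dropped
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Rebuild the URL without its query string and fragment
    stripped = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return stripped, params
```

A caller could attach `params` to the event alongside the stripped URL, leaving it to later modules to decide whether to use them.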

@liquidsec
Collaborator

liquidsec commented Jan 16, 2025

This comes back to the idea that URL extraction and parameter extraction could somehow be merged. I fervently maintain that they cannot and should not.

Although I could come up with a giant list of reasons, the primary one comes down to all the places you can get parameters that have nothing to do with URLs. Think an input tag on a form where there's no action attribute. No URL to be found. So if all of those situations force you to do parameter extraction separately anyway, trying to force them together in the cases where you could would actually add a significant amount of complication.

The other big reason: when you don't want parameter extraction at all, because it doesn't matter for a particular scan, it's really nice to have a clean separation so you can just shut it off. If the two were merged in any significant way, they'd be so entangled that you'd never truly be able to separate them.

So - hopefully we can mind meld on that issue and converge our vision there.

I have been trying to clean up excavate some (#2181) but I think there are really 2 separate discussion points.

  1. Readability of excavate:

To that point, I think we need to move the excavate submodules into their own folder, like I do with the lightfuzz submodules.

  2. Multiple places producing parameters:

I think this is alleviated somewhat with the aforementioned refactor/cleanup PR, but given the diversity of the scenarios covered there:

  • Extracting parameters from the HTTP_RESPONSE body
  • Extracting parameters from the initial TARGET / scan configuration settings
  • Extracting parameters from headers (like Set-Cookie or Location)

These are just fundamentally different things that don't lend themselves to being mashed together, again, without actually adding complexity.
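
To make the difference concrete, here is a minimal sketch of the header case alone, assuming one raw header value per call. The function names are hypothetical and this is not how excavate does it; the point is only that a Set-Cookie value and a Location value need different parsers even though both yield name/value pairs.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, parse_qsl


def params_from_set_cookie(header_value):
    """Cookie names/values from one Set-Cookie header value (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    # Attributes such as Path or HttpOnly live on the morsel, not as separate keys
    return [(name, morsel.value) for name, morsel in cookie.items()]


def params_from_location(header_value):
    """GET parameters from a Location header's query string (hypothetical helper)."""
    return parse_qsl(urlsplit(header_value).query, keep_blank_values=True)
```

Note that neither function knows which URL the parameter belongs to; for a relative Location value or a cookie, that context still has to come from the parent event.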

I think the solution there also comes down to breaking apart excavate.py and moving some of this stuff away - perhaps we need a parameter helper file?

@TheTechromancer
Collaborator Author

Is there ever a case where we emit a URL's getparams but not the URL itself?

@liquidsec
Collaborator

Plenty of cases where we are harvesting a parameter with no new url information, or only partial new URL information (like maybe just the path) and have to combine that with the existing URL from the parent event.

All of the logic to handle that has to exist separately from URL parsing. And it's extra overhead that you really don't want to incur if you aren't dealing with parameters.

There are situations where we need information from 3 places to make a parameter and properly associate it with the correct URL:

  • Parent Event URL
  • Form action
  • input tag value

Without the context from all three, you are going to get it wrong. Remember, forms can be submitted to different URLs than the parent, or to themselves.
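
A minimal sketch of how those three pieces combine, using only the standard library. The class name `FormParamParser` and its output tuple are assumptions made for illustration, not BBOT's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParamParser(HTMLParser):
    """Collect (target_url, method, input_name, input_value) for each named input tag."""

    def __init__(self, parent_url):
        super().__init__()
        self.parent_url = parent_url
        self.current_target = parent_url
        self.current_method = "GET"
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A missing or empty action means the form submits to the page itself
            action = attrs.get("action") or ""
            self.current_target = urljoin(self.parent_url, action)
            self.current_method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and attrs.get("name"):
            self.found.append(
                (self.current_target, self.current_method, attrs["name"], attrs.get("value") or "")
            )

    def handle_endtag(self, tag):
        if tag == "form":
            # Inputs outside any form fall back to the parent URL
            self.current_target = self.parent_url
            self.current_method = "GET"
```

Feeding it a page with `parser = FormParamParser(parent_url); parser.feed(html)` leaves the results in `parser.found`. An input inside a form with an action is tied to the resolved action URL, while an input with no form or no action is tied to the parent URL, which is the case where there is no URL in the markup to extract at all.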
