Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: corrupt data in routes() response due to healthchecker data #11844

Merged
merged 16 commits into from
Dec 27, 2024

Conversation

Revolyssup
Copy link
Contributor

@Revolyssup Revolyssup commented Dec 19, 2024

Description

Fixes # (issue)
Currently there is an intermittent bug. Below are the steps to reproduce:
Reproduction steps

  1. Create a route and enable health check
curl http://127.0.0.1:9180/apisix/admin/routes/1 -H 'X-API-KEY: wqGBivhkYbFXiMdcMEAOMZanCGnLSbWE' -X PUT -d '
{
    "uri": "/*",
    "upstream": {
         "nodes": {
            "httpbin.org:80": 1,
            "mockbin.org:80": 1
        },
        "type": "roundrobin",
        "checks": {
            "active": {
                "timeout": 5,
                "type": "tcp",
                "healthy": {
                    "interval": 2,
                    "successes": 1
                },                
                "unhealthy": {
                    "interval": 2,
                    "successes": 1
                }
            }
        }
    }
}'
  1. access routes to trigger health checks
curl http://127.0.0.1:9080/get
  1. When accessing the control api, 500 will be returned and the error in the problem description will be generated.

Checklist

This error is intermittent and multiple tries might be needed to reproduce.
On debugging it was found that the cause of issue was http_router.user_routes table being polluted by adding healthchecks leaving unexpected data which while trying to parse, the error was generated.

As you can see in the generated logs, user_route returned healthchecker data.
Screenshot from 2024-12-19 15-11-00
Screenshot from 2024-12-19 15-10-51

Fix

Remove healthcheck related objects before sending to user.

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. bug Something isn't working labels Dec 19, 2024
@Revolyssup Revolyssup marked this pull request as draft December 20, 2024 08:06
@Revolyssup Revolyssup marked this pull request as ready for review December 22, 2024 17:33
@Revolyssup Revolyssup changed the title fix: corrupt data in routes() due to user requeset fix: corrupt data in routes() response due to healthchecker data Dec 22, 2024
@membphis
Copy link
Member

we can not update the watched data from etcd, it is dangerous

@membphis membphis closed this Dec 23, 2024
@Revolyssup Revolyssup reopened this Dec 23, 2024
@Revolyssup
Copy link
Contributor Author

Revolyssup commented Dec 23, 2024

we can not update the watched data from etcd, it is dangerous

The original data is not being modified. It is already deepcopied. But to make it easier to review I have refactored to move the modification logic just below the deepcopy function call so that it's clear that the original data is not being modified.

Comment on lines 276 to 278
if new_route.clean_handlers then
new_route.clean_handlers = {}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just also set to nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some existing test cases grep logs and expect this to {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can fix the test case as well.
Checking the logs has no meaning for the control api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Dec 25, 2024
Copy link
Member

@nic-chen nic-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test cases to ensure error is gone

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Dec 26, 2024
@Revolyssup
Copy link
Contributor Author

please add test cases to ensure error is gone

added

@Revolyssup Revolyssup merged commit fe8f9bf into apache:master Dec 27, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants