
The placement-rule generated by placement-rule-in-SQL may unexpectedly add Region peer to tiflash compute node #58633

Open
JaySon-Huang opened this issue Dec 31, 2024 · 5 comments · May be fixed by #58637

Comments

@JaySon-Huang
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Reported from https://asktug.com/t/topic/1037727
The detailed root cause is described in pingcap/tiflash#9750

Reproduce steps:

  1. Deploy a cluster with tikv and disaggregated tiflash (1 tiflash compute node and 1 tiflash storage node)
  2. Load the TPC-H dataset into the cluster and create a tiflash replica
  3. Create a placement policy
CREATE PLACEMENT POLICY evict_sata_dw CONSTRAINTS="[-disk=sata-new, -disk=dw-ssd]" SURVIVAL_PREFERENCES="[host]";
ALTER TABLE test.region PLACEMENT POLICY=evict_sata_dw;
-- or alter policy globally
-- ALTER RANGE global PLACEMENT POLICY evict_sata_dw;

2. What did you expect to see? (Required)

The placement policy is applied to the table and the tidb cluster runs normally

3. What did you see instead (Required)

The tiflash compute node crashes, as described in the tiflash issue pingcap/tiflash#9750

4. What is your TiDB version? (Required)

v7.5.4

@JaySon-Huang JaySon-Huang added the type/bug The issue is confirmed as a bug. label Dec 31, 2024
@JaySon-Huang
Contributor Author

Ref pingcap/tiflash#9750 (comment)

Root cause

When creating a placement policy through tidb, tidb generates a placement rule that contains an "engine" notIn "tiflash" constraint (#22065).
However, under the tiflash disaggregated compute-and-storage architecture, the compute node registers its store with the label {"engine": "tiflash_compute"}.

The logic in PD for choosing a store for a rule (pkg/schedule/placement/label_constraint.go @ pd) is:

  1. If the store's label.key is equal to "engine" (or "exclusive", or starts with '$'), and the rule's constraints do not contain a constraint with the same label.key, then do not schedule the peer to this store.
  2. Otherwise, further check whether the rule's constraints match the store's labels. If they match, the peer can be scheduled to this store.

So PD may pick the tiflash compute node as the target store to place the Region peer (see the sketch below).
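
To make the interaction concrete, here is a minimal, self-contained Go sketch of the two-step check described above. This is not PD's actual code from label_constraint.go; the type and field names are illustrative, and the constraints mirror the evict_sata_dw policy from the reproduce steps plus the implicit "engine" notIn "tiflash" that tidb appends.

package main

import (
	"fmt"
	"strings"
)

// Constraint is a simplified label constraint: Key Op Values (op is "in" or "notIn").
type Constraint struct {
	Key    string
	Op     string
	Values []string
}

// Store is a simplified TiKV/TiFlash store with its labels.
type Store struct {
	Name   string
	Labels map[string]string
}

// ruleMatchesStore mimics the two-step check described above:
//  1. a special label key ("engine", "exclusive", or a key starting with '$')
//     excludes the store unless the rule itself mentions that key;
//  2. otherwise every constraint is evaluated against the store's labels.
func ruleMatchesStore(constraints []Constraint, store Store) bool {
	mentioned := map[string]bool{}
	for _, c := range constraints {
		mentioned[c.Key] = true
	}
	// Step 1: engine isolation only applies when the rule does not mention the key.
	for key := range store.Labels {
		special := key == "engine" || key == "exclusive" || strings.HasPrefix(key, "$")
		if special && !mentioned[key] {
			return false
		}
	}
	// Step 2: evaluate each constraint against the store's labels.
	for _, c := range constraints {
		value := store.Labels[c.Key]
		inList := false
		for _, v := range c.Values {
			if v == value {
				inList = true
			}
		}
		if (c.Op == "in" && !inList) || (c.Op == "notIn" && inList) {
			return false
		}
	}
	return true
}

func main() {
	computeNode := Store{
		Name:   "tiflash-compute-0",
		Labels: map[string]string{"engine": "tiflash_compute"},
	}

	// Rule generated for the user's policy: tidb appends "engine notIn tiflash"
	// in addition to the user's own disk constraints.
	generated := []Constraint{
		{Key: "engine", Op: "notIn", Values: []string{"tiflash"}},
		{Key: "disk", Op: "notIn", Values: []string{"sata-new"}},
		{Key: "disk", Op: "notIn", Values: []string{"dw-ssd"}},
	}

	// "tiflash_compute" is not in ["tiflash"], and the rule mentions "engine",
	// so the special isolation in step 1 is skipped and the compute node matches.
	fmt.Println(ruleMatchesStore(generated, computeNode)) // true -> a Region peer may be placed here
}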

Workaround

As a workaround, users can explicitly exclude tiflash and tiflash_compute in their rules until this behavior is fixed. For example:

CREATE PLACEMENT POLICY evict_sata_dw CONSTRAINTS="[-engine=tiflash, -engine=tiflash_compute, -disk=sata-new, -disk=dw-ssd]" SURVIVAL_PREFERENCES="[host]";
ALTER TABLE test.region PLACEMENT POLICY=evict_sata_dw;
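
With -engine=tiflash and -engine=tiflash_compute spelled out, the generated rule rejects both kinds of tiflash stores by their engine label in step 2 above, instead of relying on the implicit engine isolation that step 1 skips once the rule mentions the engine key.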

@JaySon-Huang
Contributor Author

/assign

@JaySon-Huang
Contributor Author

/label severity/major


ti-chi-bot bot commented Dec 31, 2024

@JaySon-Huang: The label(s) severity/major cannot be applied. These labels are supported: fuzz/sqlancer, fuzz/comp, challenge-program, compatibility-breaker, first-time-contributor, contribution, good first issue, correctness, duplicate, proposal, security, needs-more-info, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, needs-cherry-pick-release-8.1, needs-cherry-pick-release-8.5, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-8.1, affects-8.4, affects-8.5, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5, may-affects-8.1, may-affects-8.5.

In response to this:

/label severity/major

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@JaySon-Huang
Contributor Author

/severity major
