Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

监控SQL慢 #3593

Open
LinWanCen opened this issue Dec 18, 2024 · 4 comments
Open

监控SQL慢 #3593

LinWanCen opened this issue Dec 18, 2024 · 4 comments

Comments

@LinWanCen
Copy link

LinWanCen commented Dec 18, 2024

https://github.com/xuxueli/xxl-job/blob/39b92f9db608753650f2285bf70b0279800a560c/xxl-job-admin/src/main/resources/mybatis-mapper/XxlJobLogMapper.xml#L230C1-L240C11

这个SQL在我们自动化部署环境平均执行时间16秒,最高执行时间32秒,看代码每10秒一次,
建议添加组合索引,优化 SQL 以便命中索引:

  • 去掉 ORDER BY id ASC 避免 MYSQL 走主键索引遍历全表 Using where
	<select id="findFailJobLogIds" resultType="long" >
		SELECT id FROM `xxl_job_log`
		WHERE !(
			(trigger_code in (0, 200) and handle_code = 0)
			OR
			(handle_code = 200)
		)
		AND `alarm_status` = 0
		ORDER BY id ASC
		LIMIT #{pagesize}
	</select>

索引:
  KEY `I_trigger_time` (`trigger_time`),
  KEY `I_handle_code` (`handle_code`)

数据量:
handle_code 有索引
500 337115
200 193482
0 30351

trigger_code 无索引
200 552070
500 5768
0 2430

alarm_status 无索引
2 342881
0 217975
-1 2
@xmabul
Copy link

xmabul commented Dec 30, 2024

@LinWanCen 不用改代码,直接增加联合索引idx_alarmstatus_handlecode (alarm_status,handle_code),再看看扫描行多少

@LinWanCen
Copy link
Author

我们试过不管用,于是根据数据只有200、500、0的情况优化了SQL

@xmabul
Copy link

xmabul commented Dec 31, 2024

对我们是有效,慢SQL瞬间没了。
之所以你们不管用,看了你们数据分布,handle_code<>200超一半(执行状态失败),是不是得考虑任务有异常。

这是我们的数据分布
+-------------+---------+
| handle_code | cnt     |
+-------------+---------+
|           0 |  294438 |
|         200 | 2416270 |
|         500 |     410 |
+-------------+---------+
+--------------+---------+
| alarm_status | cnt     |
+--------------+---------+
|            0 | 2416276 |
|            2 |  294848 |
+--------------+---------+
+--------------+---------+
| trigger_code | cnt     |
+--------------+---------+
|          500 |  294571 |
|          200 | 2416606 |
+--------------+---------+

@LinWanCen
Copy link
Author

我们正在分析为什么同样的代码只有一套环境出现了有个任务没用返回执行结果的情况,其他环境都有,看完客户端代码应该没问题,可能接受端更新表哪里出问题了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants