digoal
2021-09-18
PostgreSQL, pglog
Parsing csvlog to generate database reports
https://mydbanotebook.org/post/pglog/
https://gitlab.com/l_avrot/pglog
The recommended parameter settings are as follows:
Parameter | Setting | Comment
---|---|---
log_destination | csvlog | Mandatory
logging_collector | on | Mandatory for csv logging
log_min_duration_statement | 250ms | For a transactional workload; for a Business Intelligence workload, 1s or 5s can be fine
log_autovacuum_min_duration | 0 | To see all autovacuum operations occurring
log_checkpoints | on | To see when checkpoints occur
log_connections | on | To see the rhythm of connections
log_disconnections | on | To see how long sessions last
log_lock_waits | on | To see when lock waits occur
log_temp_files | 0 | To see if PostgreSQL is spilling to disk
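As a sketch, the settings above can also be applied with ALTER SYSTEM (the values are the suggestions from the table; logging_collector requires a restart, the rest only a reload):
alter system set log_destination = 'csvlog';
alter system set logging_collector = on;                -- requires a restart
alter system set log_min_duration_statement = '250ms';  -- 1s or 5s for BI workloads
alter system set log_autovacuum_min_duration = 0;
alter system set log_checkpoints = on;
alter system set log_connections = on;
alter system set log_disconnections = on;
alter system set log_lock_waits = on;
alter system set log_temp_files = 0;
select pg_reload_conf();  -- then restart for logging_collector to take effect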
Usage: create the log table from the csvlog file or directory you want to analyze
create extension pglog cascade;
call pglog.create_logtable_from_file (file, directory, version[, schema][, materialized]);
call pglog.create_logtable_from_dir (directory, version[, schema][, materialized]);
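A hypothetical invocation (the directory path and version value are placeholders; the exact argument order and types follow the procedure signatures above):
call pglog.create_logtable_from_dir('/var/log/postgresql/csvlog', '13');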
These procedures build the pglog table; the extension then provides the following reports:
- an error report
- a detailed error analysis for each level of error
- an autovacuum report
- a checkpoints report
- an hourly checkpoints report
- a tempfile usage report
- a tempfile queries report
- a top n slow queries report
Based on the pglog partitioned table, you can generate dashboards.
The error_report view queries the partitioned pglog table for a log level analysis.
Example:
pglog=# select * from error_report;
error_severity | nb | percentage
----------------+---------+------------
FATAL | 8 | 0.00
ERROR | 106 | 0.00
WARNING | 3 | 0.00
LOG | 6055776 | 100.00
(4 rows)
The views error_analyze_(warning|error|fatal|panic) display a more detailed
analysis of the given log level. For the lower severity levels, I recommend
adding limit n to your query.
Example:
select * from error_analyze_fatal;
message | count | min | max
-----------------------------------------------------+-------+----------------------------+----------------------------
terminating connection due to administrator command | 7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00
connection to client lost | 2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00
(2 rows)
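For the noisier levels, a limit keeps the output manageable; for instance:
select * from error_analyze_error limit 10;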
The autovacuum_report view displays the number of autovacuum and autoanalyze
operations per table. You might want to add a limit n clause to your query so
you don't display all your tables in the result.
Example:
pglog=# select * from autovacuum_report limit 5;
schema_name | table_name | vacuum_count | analyze_count
-------------+---------------+--------------+---------------
pg_catalog | pg_statistic | 24 | 0
pg_catalog | pg_attribute | 15 | 14
pglog | postgresqlwed | 8 | 7
pg_catalog | pg_class | 7 | 7
pg_catalog | pg_type | 6 | 6
(5 rows)
The checkpoints_stats view gives you statistics about your checkpoints, such as
how often they are launched and how long they take to complete.
Example:
pglog=# select * from checkpoints_stats;
avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
---------------------+------------------------+-------------------------+----------------------------
06:01:55.491981 | 00:15:00.809 | 00:00:32.447288 | 00:00:06.6675
(1 row)
It's rare that your database has a flat workload; it usually works in peaks.
The checkpoints_stats_by_hour view gives you checkpoint statistics by hour. Once the peak hours
are identified, I strongly suggest crafting a query that computes the same stats
per minute during the peak times.
Example:
pglog=# select * from checkpoints_stats_by_hour;
date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
-----------+---------------------+------------------------+-------------------------+----------------------------
6 | 03:31:43.989333 | 00:04:59.941 | 00:00:06.375 | 00:00:04.431
7 | 22:36:05.281917 | 00:15:00.809 | 00:00:17.00225 | 00:00:03.916
8 | 00:21:22.5615 | 00:15:01.661 | 00:00:17.9445 | 00:00:09.3765
9 | 00:39:21.674167 | 00:30:17.2415 | 00:00:33.430833 | 00:00:11.2735
10 | 00:16:09.9175 | 00:16:09.9175 | 00:00:06.032 | 00:00:06.032
11 | 00:37:25.941667 | 00:12:22.328 | 00:01:40.044667 | 00:02:29.37
12 | 01:11:27.7846 | 00:40:00.957 | 00:01:05.7842 | 00:00:16.016
13 | 00:15:19.55725 | 00:07:56.6295 | 00:01:18.24225 | 00:01:21.253
15 | 04:13:58.037 | 04:13:58.037 | 00:02:30.109 | 00:02:30.109
16 | 02:58:23.038333 | 00:25:00.427 | 00:00:05.165667 | 00:00:05.012
17 | 00:15:00.34025 | 00:07:30.346 | 00:00:04.27425 | 00:00:03.195
19 | 01:02:31.026 | 01:02:31.026 | 00:00:06.4115 | 00:00:06.4115
20 | 00:09:59.441 | 00:09:59.441 | 00:00:06.241 | 00:00:06.241
(13 rows)
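A sketch of the per-minute drill-down suggested above, assuming the pglog table keeps the standard csvlog columns log_time and message, and taking hour 9 as the identified peak hour:
with ckpt as (
  select date_trunc('minute', log_time) as minute,
         substring(message from 'total=([0-9.]+) s')::numeric as total_s
  from pglog.pglog
  where message like 'checkpoint complete%'
    and extract(hour from log_time) = 9
)
select minute,
       count(*) as checkpoints,
       avg(total_s) as avg_duration_s,
       percentile_disc(0.5) within group (order by total_s) as median_duration_s
from ckpt
group by minute
order by minute;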
Tempfiles are crucial when it comes to performance because when tempfiles are
used, it means Postgres is spilling to disk. The tempfile_report statistics might
help you adjust work_mem to a better value for your system.
Example:
pglog=# select * from tempfile_report ;
min | max | global_average | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100%
---------+--------+----------------+---------+---------+---------+---------+-------+-------+-------+--------+--------+--------
8192 kB | 374 GB | 57 GB | 5720 MB | 5720 MB | 5720 MB | 6000 MB | 32 GB | 33 GB | 36 GB | 142 GB | 144 GB | 374 GB
(1 row)
It happens that most tempfiles are used by one or two specific queries. In that
case, you can set work_mem to a higher value right before those queries at
the session level, as sketched after the example below.
Example:
pglog=# select * from tempfile_queries limit 2 \gx
-[ RECORD 1 ]--------------+----------------------------------------------------------------------
frequency | 33
query_tempfilesize_median | 142 GB
query_tempfilesize_average | 142 GB
total_size | 4689 GB
query | create materialized view pglog.top_n_queries as ( +
| with queries as ( +
| select +
| split_part(message, ?) +
| ) +
| select query, +
| count(1), +
| avg(duration) as average, +
| percentile_disc(0.5) within group (order by duration) as median+
| from queries +
| group by query +
| order by average desc +
| )
-[ RECORD 2 ]--------------+----------------------------------------------------------------------
frequency | 154
query_tempfilesize_median | 6000 MB
query_tempfilesize_average | 30 GB
total_size | 4655 GB
query | call create_mylogs (?,true);
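A minimal sketch of the session-level bump (the 512MB value is only an illustration; size it according to the tempfile statistics above):
set work_mem = '512MB';   -- only affects the current session
-- run the tempfile-heavy query here
reset work_mem;           -- back to the configured default
Inside a transaction, set local work_mem = '512MB'; limits the change to that transaction only.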
The top_n_slowest_queries view gives the top n slowest queries in the partitioned
pglog table. Don't forget to add a limit n when you query the view unless you
want to see all your slow queries.
Example:
pglog=# select *
from top_n_slowest_queries
limit 5;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?
count | 1
average | 26384.688000000000
median | 26384.688
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog inner join (values ? ? )) as severity(level,name) on pglog.pglog.error_severity = severity.name, total group by error_severity, severity.level, total order by severity.level;
count | 1
average | 24090.557000000000
median | 24090.557
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog, total group by error_severity, total order by percentage desc;
count | 2
average | 23530.949500000000
median | 23210.202
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?
count | 173009
average | 2.5784063256824789
median | 1.666
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?
count | 173009
average | 1.9884613054812177
median | 0.593
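Average duration can hide queries that are individually fast but run very often. A sketch that reorders by cumulative time, using the count and average columns shown above (assuming durations are in milliseconds, as logged by log_min_duration_statement):
select query, count, round(count * average) as total_ms
from top_n_slowest_queries
order by total_ms desc
limit 5;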