
PostgreSQL pglog extension: analyzing PG csvlog to produce various reports

Author

digoal

Date

2021-09-18

Tags

PostgreSQL, pglog


Background

Parse csvlog files to produce database reports.

https://mydbanotebook.org/post/pglog/

https://gitlab.com/l_avrot/pglog

The suggested parameter configuration is as follows (a postgresql.conf sketch follows the table):

| Parameter | Setting | Comment |
|---|---|---|
| log_destination | csvlog | Mandatory |
| logging_collector | on | Mandatory for csv logging |
| log_min_duration_statement | 250ms | For a transactional workload; for a Business Intelligence workload, 1s or 5s can be fine |
| log_autovacuum_min_duration | 0 | To see all autovacuum operations occurring |
| log_checkpoints | on | To see when checkpoints occur |
| log_connections | on | To see the rhythm of connections |
| log_disconnections | on | To see how long sessions last |
| log_lock_waits | on | To see when lock waits occur |
| log_temp_files | 0 | To see if PostgreSQL is spilling to disk |
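
As a minimal sketch, these settings could be placed in postgresql.conf like this (the duration thresholds are the table's suggestions and should be tuned to your workload):

log_destination = 'csvlog'            # mandatory for pglog  
logging_collector = on                # mandatory for csvlog  
log_min_duration_statement = 250ms    # transactional workload; 1s-5s for BI  
log_autovacuum_min_duration = 0       # log every autovacuum operation  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_lock_waits = on  
log_temp_files = 0                    # log every temp file created  

Most of these take effect on a reload (select pg_reload_conf(); or pg_ctl reload); changing logging_collector requires a server restart.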

Usage: create the log table from the csvlog file or directory to be analyzed

create extension pglog cascade;  
  
call pglog.create_logtable_from_file (file, directory, version[, schema][, materialized]);  
call pglog.create_logtable_from_dir (directory, version[, schema][, materialized]);  
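
For instance, building the log table from a single csvlog file or from a whole log directory might look like this (the file name and directory are hypothetical, and passing the server's major version number as the version argument is an assumption):

call pglog.create_logtable_from_file('postgresql-2021-09-18_000000.csv', '/pgdata/log', 13);  
call pglog.create_logtable_from_dir('/pgdata/log', 13);  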

Report categories

  • an error report
  • a detailed error analysis for each level of error
  • an autovacuum report
  • a checkpoints report
  • an hourly checkpoints report
  • a tempfile usage report
  • a tempfile queries report
  • a top n slow queries report

Examples

Based on the pglog partitioned table, you can generate dashboards.

Error report

This view queries the partitioned table pglog for a log-level analysis.
Example:

pglog=# select * from error_report;  
 error_severity |   nb    | percentage   
----------------+---------+------------  
 FATAL          |       8 |       0.00  
 ERROR          |     106 |       0.00  
 WARNING        |       3 |       0.00  
 LOG            | 6055776 |     100.00  
(4 rows)  

Detail error analysis

The views error_analyze_(warning|error|fatal|panic) display a more detailed
analysis of the given log level. For lower severity levels, I recommend
adding limit n to your query (see the sketch after the example below).
Example:

select * from error_analyze_fatal;  
                       message                       | count |            min             |            max               
-----------------------------------------------------+-------+----------------------------+----------------------------  
 terminating connection due to administrator command |     7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00  
 connection to client lost                           |     2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00  
(2 rows)  
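
For a noisier level such as error, the same pattern with a limit might look like this (a hypothetical usage sketch; output omitted):

select * from error_analyze_error limit 10;  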

Autovacuum report

This view displays the number of autovacuum and autoanalyze operations per
table. You might want to add a limit n clause to your query so you don't
display all your tables in the result.
Example:

pglog=# select * from autovacuum_report limit 5;  
 schema_name |  table_name   | vacuum_count | analyze_count   
-------------+---------------+--------------+---------------  
 pg_catalog  | pg_statistic  |           24 |             0  
 pg_catalog  | pg_attribute  |           15 |            14  
 pglog       | postgresqlwed |            8 |             7  
 pg_catalog  | pg_class      |            7 |             7  
 pg_catalog  | pg_type       |            6 |             6  
(5 rows)  

Checkpoints

This view gives you statistics about your checkpoints, such as how often
they are launched and how long they take to complete.
Example:

pglog=# select * from checkpoints_stats;  
 avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration   
---------------------+------------------------+-------------------------+----------------------------  
 06:01:55.491981     | 00:15:00.809           | 00:00:32.447288         | 00:00:06.6675  
(1 row)  

Checkpoints by hour

It's rare for a database to have a flat workload; activity usually comes in
peaks. This view gives you checkpoint statistics by hour. Once the peak hours
are identified, I strongly suggest crafting a query that computes the same
stats per minute during the peak times (a sketch follows the example below).
Example:

pglog=# select * from checkpoints_stats_by_hour;  
 date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration   
-----------+---------------------+------------------------+-------------------------+----------------------------  
         6 | 03:31:43.989333     | 00:04:59.941           | 00:00:06.375            | 00:00:04.431  
         7 | 22:36:05.281917     | 00:15:00.809           | 00:00:17.00225          | 00:00:03.916  
         8 | 00:21:22.5615       | 00:15:01.661           | 00:00:17.9445           | 00:00:09.3765  
         9 | 00:39:21.674167     | 00:30:17.2415          | 00:00:33.430833         | 00:00:11.2735  
        10 | 00:16:09.9175       | 00:16:09.9175          | 00:00:06.032            | 00:00:06.032  
        11 | 00:37:25.941667     | 00:12:22.328           | 00:01:40.044667         | 00:02:29.37  
        12 | 01:11:27.7846       | 00:40:00.957           | 00:01:05.7842           | 00:00:16.016  
        13 | 00:15:19.55725      | 00:07:56.6295          | 00:01:18.24225          | 00:01:21.253  
        15 | 04:13:58.037        | 04:13:58.037           | 00:02:30.109            | 00:02:30.109  
        16 | 02:58:23.038333     | 00:25:00.427           | 00:00:05.165667         | 00:00:05.012  
        17 | 00:15:00.34025      | 00:07:30.346           | 00:00:04.27425          | 00:00:03.195  
        19 | 01:02:31.026        | 01:02:31.026           | 00:00:06.4115           | 00:00:06.4115  
        20 | 00:09:59.441        | 00:09:59.441           | 00:00:06.241            | 00:00:06.241  
(13 rows)  
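
A per-minute drill-down can be sketched directly against the partitioned table. This assumes the standard csvlog columns (log_time, message) and a hypothetical 08:00-10:00 peak window; it only counts completions per minute rather than recomputing the full frequency/duration statistics:

select date_trunc('minute', log_time) as minute,  
       count(*) as checkpoints_completed  
from pglog.pglog  
where message like 'checkpoint complete%'  
  and log_time::time between '08:00' and '10:00'  
group by 1  
order by 1;  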

Tempfile report

Tempfiles are crucial when it comes to performance: when tempfiles are used,
Postgres is spilling work to disk. These statistics might help you adjust
work_mem to a better value for your system (see the sketch after the example).
Example:

pglog=# select * from tempfile_report ;  
   min   |  max   | global_average |   10%   |   20%   |   30%   |   40%   |  50%  |  60%  |  70%  |  80%   |  90%   |  100%    
---------+--------+----------------+---------+---------+---------+---------+-------+-------+-------+--------+--------+--------  
 8192 kB | 374 GB | 57 GB          | 5720 MB | 5720 MB | 5720 MB | 6000 MB | 32 GB | 33 GB | 36 GB | 142 GB | 144 GB | 374 GB  
(1 row)  
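
If the distribution suggests work_mem is systematically undersized, one option is to raise it instance-wide; the value below is purely illustrative, and remember that work_mem can be allocated per sort/hash operation per backend:

alter system set work_mem = '64MB';   -- illustrative value  
select pg_reload_conf();              -- work_mem changes apply without a restart  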

Tempfile queries

It often happens that most tempfiles are used by one or two specific queries.
In that case, you can set work_mem to a higher value right before those
queries, at the session level (see the sketch after the example).
Example:

pglog=# select * from tempfile_queries limit 2 \gx  
-[ RECORD 1 ]--------------+----------------------------------------------------------------------  
frequency                  | 33  
query_tempfilesize_median  | 142 GB  
query_tempfilesize_average | 142 GB  
total_size                 | 4689 GB  
query                      | create materialized view pglog.top_n_queries as (                    +  
                           |       with queries as (                                              +  
                           |       select                                                         +  
                           |         split_part(message, ?)                                       +  
                           |     )                                                                +  
                           |     select query,                                                    +  
                           |       count(1),                                                      +  
                           |       avg(duration) as average,                                      +  
                           |       percentile_disc(0.5) within group (order by duration) as median+  
                           |     from queries                                                     +  
                           |     group by query                                                   +  
                           |     order by average desc                                            +  
                           |     )  
-[ RECORD 2 ]--------------+----------------------------------------------------------------------  
frequency                  | 154  
query_tempfilesize_median  | 6000 MB  
query_tempfilesize_average | 30 GB  
total_size                 | 4655 GB  
query                      | call create_mylogs (?,true);  
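
A minimal sketch of the session-level approach, wrapping only the tempfile-heavy statement (the value is hypothetical; set local reverts automatically at transaction end):

begin;  
set local work_mem = '2GB';          -- hypothetical value  
-- run the tempfile-heavy query here  
commit;  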

Top n slow queries

This view gives the top n slowest queries in the partitioned table
pglog. Don't forget to add a limit n when you query the view, unless you
want to see all your slow queries.
Example:

pglog=# select *  
from top_n_slowest_queries  
limit 5;  
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?   
count   | 1  
average | 26384.688000000000  
median  | 26384.688  
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog inner join (values ? ? )) as severity(level,name) on pglog.pglog.error_severity = severity.name, total group by error_severity, severity.level, total order by severity.level;  
count   | 1  
average | 24090.557000000000  
median  | 24090.557  
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog, total group by error_severity, total order by percentage desc;  
count   | 2  
average | 23530.949500000000  
median  | 23210.202  
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?   
count   | 173009  
average | 2.5784063256824789  
median  | 1.666  
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?   
count   | 173009  
average | 1.9884613054812177  
median  | 0.593  
