Overhaul Monitoring Dashboard #32

Open: wants to merge 17 commits into base branch rel1_3

@mbanck (Contributor) commented Aug 30, 2023

Grafana dashboard updates

  • Add settings panels at the top
  • Change Top Statements by Total Time table to use chosen time range
  • Add legend to Statements panels and adjust height accordingly
  • Update Checkpoints per Minute panel
  • Add Deadlocks panel to Transactions and Locks
  • Add Temp Files panel
  • Add Checkpoint traffic to WAL traffic panel
  • Update Connections panels
  • Update Table Scans panels to report SeqScan tuples and the SeqScan to Index Scan tuple ratio
  • Add Select table/index scans as Read Stats panel
  • Misc layout and display fixes/improvements

sql-exporter query updates

  • Add checksum_failures to pg_stat_database metrics
  • Add cluster query for server version and start time
  • Add additional settings: some memory-based settings (shared_buffers, work_mem, maintenance_work_mem, effective_cache_size and max_wal_size), as well as some notable settings one might want to expose, e.g. in Grafana dashboards, such as data_checksums, jit (v11+), max_worker_processes, random_page_cost, seq_page_cost and checkpoint_timeout.
  • Add buffers to checkpoints query
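
PostgreSQL's pg_settings view reports these memory-based settings as a numeric `setting` plus a `unit` column. For example, shared_buffers and effective_cache_size are reported in 8kB blocks, work_mem and maintenance_work_mem in kB, and max_wal_size in MB on recent versions. Before showing such a value as a memory size in a dashboard panel, it may be normalized to bytes. The following is a minimal sketch of that conversion. The function name `setting_to_bytes` is hypothetical, and converting to bytes on the client side is an assumption for illustration. It is not necessarily how this PR's sql-exporter queries do it.

```python
import re

# Byte multipliers for the memory units that pg_settings.unit can report.
_UNIT_BYTES = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def setting_to_bytes(setting: str, unit: str) -> int:
    """Convert a pg_settings (setting, unit) pair, e.g. ("16384", "8kB"), to bytes.

    Hypothetical helper: units such as "8kB" combine a block count with a
    base unit, so both parts are multiplied in.
    """
    match = re.fullmatch(r"(\d*)(B|kB|MB|GB|TB)", unit)
    if match is None:
        # Time-based units such as "ms" or "s" are not memory sizes.
        raise ValueError(f"not a memory unit: {unit!r}")
    count = int(match.group(1)) if match.group(1) else 1
    return int(setting) * count * _UNIT_BYTES[match.group(2)]
```

With the stock defaults, shared_buffers is reported as ("16384", "8kB") and work_mem as ("4096", "kB"). These convert to 134217728 bytes (128 MB) and 4194304 bytes (4 MB) respectively.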

monitoring_upgrades

This adds two rows of stat panels at the top that show key settings and values
of the instance:

 * server version, instance uptime, number of cores, total memory
 * max connections
 * memory/WAL-related settings
 * data checksums/jit settings
 * number of checksum failures and OOM kills

Update the Checkpoints per Minute panel:

 * make (regular) timed checkpoints green and requested ones yellow
 * mention that requested checkpoints can also be WAL- or backup-based
 * hardcode the rate interval to 90s, which seems to display well for the most
   common checkpoint_timeout values
 * change y-axis intervals to integers

Also, exclude the (new) checkpoint buffers metric from the Checkpoints per Minute panel.
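
The Checkpoints per Minute panel presumably plots cumulative counters, such as checkpoints_timed and checkpoints_req from pg_stat_bgwriter (in PostgreSQL versions before 17), as a rate over the hardcoded 90s window. Roughly speaking, a Prometheus-style rate takes the counter increase within the window, treats any drop as a counter reset, and divides by the elapsed time. Multiplying by 60 then gives checkpoints per minute. The following is a simplified sketch under those assumptions. The function name and the (timestamp, value) sample format are hypothetical. Unlike Prometheus' rate(), it does not extrapolate to the window edges.

```python
from typing import Sequence, Tuple


def per_minute_rate(samples: Sequence[Tuple[float, float]], window_s: float = 90.0) -> float:
    """Approximate counter increments per minute over the trailing window.

    samples: (unix_timestamp, counter_value) pairs sorted by timestamp.
    Simplification: no extrapolation to the window edges.
    """
    if not samples:
        return 0.0
    end = samples[-1][0]
    in_window = [(t, v) for t, v in samples if t >= end - window_s]
    if len(in_window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop means the counter was reset (e.g. a server restart or
        # pg_stat_reset_shared); count the new value from zero in that case.
        increase += cur - prev if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed * 60.0
```

Consider samples taken every 15s over 90s in which the counter goes up by one. The sketch yields about 0.67 checkpoints per minute. A wider window makes such rarely incrementing counters display less erratically, at the cost of reacting more slowly to changes.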

This changes the Sequential/Index Scans panels (which reported the total number
of seq/index scans per table) to the more useful Sequential Scan Tuples (i.e.,
how many rows were scanned per table) and Seq/Index Tuple Ratio (the ratio of
sequential to index scan tuples per table).
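
To illustrate the Seq/Index Tuple Ratio, the following sketch assumes that the panel divides the rows read by sequential scans by the rows fetched via index scans. It further assumes these are the seq_tup_read and idx_tup_fetch columns of pg_stat_user_tables, taken as increases over the chosen time range. The exact metric names and PromQL used in the dashboard are not shown here, so the function and the snapshot format are hypothetical. Counter resets are ignored for brevity.

```python
from typing import Dict, Optional, Tuple

# Per-table counters as (seq_tup_read, idx_tup_fetch), e.g. read from
# pg_stat_user_tables at the start and end of the chosen time range.
Snapshot = Dict[str, Tuple[int, int]]


def seq_idx_tuple_ratio(before: Snapshot, after: Snapshot) -> Dict[str, Optional[float]]:
    """Ratio of sequentially scanned tuples to index-fetched tuples per table.

    None means no tuples were fetched via an index in the interval, so the
    ratio is undefined (the table was only read by sequential scans, if at all).
    """
    ratios: Dict[str, Optional[float]] = {}
    for table, (seq_end, idx_end) in after.items():
        # Tables missing from the first snapshot are treated as starting at zero.
        seq_start, idx_start = before.get(table, (0, 0))
        seq_delta = seq_end - seq_start
        idx_delta = idx_end - idx_start
        ratios[table] = seq_delta / idx_delta if idx_delta > 0 else None
    return ratios
```

A high ratio points to tables whose rows are mostly read sequentially, which is the kind of candidate for a missing index that the old scan-count panels made harder to spot.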

This changes the area fill from 5 to 1 to be more in line with the other
Database-related panels. For the Connections by State panel, it also adds a
table-based legend, only shows connection states that are present in the chosen
time range, and overrides the colors to better reflect each state.
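
The following sketch illustrates the idea of showing only those connection states that actually occur in the chosen time range. It assumes that each sample is the list of pg_stat_activity.state values at one point in time. Rows without a state, such as those of some background processes, are skipped. The dashboard itself does this through its query and Grafana settings, which are not reproduced here; this hypothetical helper only shows the filtering logic.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def connections_by_state(samples: Iterable[List[Optional[str]]]) -> Dict[str, List[int]]:
    """Count connections per state for each sample, keeping only states that appear.

    Each sample is a list of pg_stat_activity.state values (None is skipped).
    States that never occur in any sample get no series at all.
    """
    per_sample = [Counter(s for s in sample if s is not None) for sample in samples]
    present = sorted(set().union(*per_sample))
    # Counter returns 0 for a state that is absent from one particular sample.
    return {state: [counts[state] for counts in per_sample] for state in present}
```

Take a range in which only active and idle connections occur. The result then contains series for just those two states, and states such as "idle in transaction" are left out of the legend.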

2 participants