
add prometheus metrics #514


Draft · wants to merge 12 commits into develop

Conversation

@ASuciuX (Contributor) commented Apr 1, 2025


Ordinals

Health Metrics

  • update_chain_tip_distance
  • record_processing_error

Performance Metrics

  • record_block_processing_time
  • record_inscription_parsing_time
  • record_ordinal_computation_time
  • record_db_write_time

Volumetric Metrics

  • record_inscriptions_in_block
  • record_brc20_operations_in_block

BRC-20 Specific Metrics

  • record_brc20_deploy
  • record_brc20_mint
  • record_brc20_transfer
  • record_brc20_transfer_send

Runes

Health Metrics

  • update_chain_tip_distance
  • record_processing_error

Performance Metrics

  • record_block_processing_time
  • record_rune_parsing_time
  • record_rune_computation_time
  • record_db_write_time

Volumetric Metrics

  • record_runes_in_block
  • record_rune_operations_in_block

Runes Specific Metrics

  • record_rune_etching
  • record_rune_mint
  • record_rune_transfer
  • record_rune_burn


github-actions bot commented Apr 1, 2025

Vercel deployment URL: https://bitcoin-indexer-ajnyrbzi5-hirosystems.vercel.app 🚀


codecov bot commented Apr 1, 2025

@ASuciuX changed the title from "add(ordinals+brc20): prometheus metrics" to "add prometheus metrics" on Apr 2, 2025
@CharlieC3 (Member) left a comment

Thanks for this! Looking forward to exposing this data. Left some questions/comments.

Comment on lines +20 to +21
pub rune_parsing_time: Histogram,
pub rune_computation_time: Histogram,
Member

If I understand this correctly, parsing and computation describe two phases involved in processing a rune, is that correct? If so, are there any other phases we should track?

@ASuciuX (Contributor, Author) commented Apr 4, 2025

Yes, that's correct. I'm looking into it and will note any other phases there might be. The db_write_time is also part of this; I added the rune_ prefix to that as well.
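
For any additional phases, the recording pattern would be the same; a small sketch, assuming a Histogram has already been registered for the phase (the helper and the example call in the comment are made up):

use std::time::Instant;

// Hypothetical helper: run one phase of rune processing and record its duration.
fn timed_phase<T>(histogram: &prometheus::Histogram, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    histogram.observe(start.elapsed().as_millis() as f64);
    result
}

// e.g. let parsed = timed_phase(&metrics.rune_parsing_time, || parse_block(&block));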

Comment on lines 78 to 92
// Update cache size metric
prometheus.metrics_update_cache_size(cache_l2.len() as u64);

// Memory usage metric - include all caches
let mut total_memory = cache_l2.len() as f64 * 0.1; // L2 cache estimate
total_memory += cache_l1.len() as f64 * 0.05; // L1 cache estimate
if brc20_cache.is_some() {
    // Add BRC20 cache memory estimate based on config's lru_cache_size
    let lru_size = config
        .ordinals_brc20_config()
        .map(|c| c.lru_cache_size)
        .unwrap_or(0);
    total_memory += lru_size as f64 * 0.02; // Estimate based on configured cache size
}
prometheus.metrics_update_memory_usage(total_memory);
@ASuciuX (Contributor, Author)

Is this the best way to track memory/cache, or are there other resources we should be looking at for metrics?

Collaborator

Hmm, I don't think this would be a good way, because you'll only be measuring memory on this thread, IIUC. There should be a way to sample the VM's memory use, but that should already be covered by the Grafana graphs (is that right, @CharlieC3?)
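
If process-level memory were still wanted from inside the indexer rather than from the Grafana dashboards, one option on Linux is to sample the process RSS from /proc; a rough sketch, not part of this PR:

use std::fs;

// Read the process resident set size (RSS) in bytes from /proc/self/statm.
// Linux-only; returns None on other platforms or on parse failure.
fn process_rss_bytes() -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    // The second field is resident pages; multiply by the page size (4096 on
    // most Linux systems; a real implementation should query it via sysconf).
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some(resident_pages * 4096)
}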

Comment on lines 77 to 78
// TODO: is there no way to have processing errors as we have on ordinals?
// if so, delete the processing errors metrics
@ASuciuX (Contributor, Author)

TBD before the review is finalized
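
If Runes does end up surfacing processing errors, a counter with a small, bounded error_type label set would keep cardinality low; a hypothetical sketch (metric and label names are illustrative):

use prometheus::{IntCounterVec, Opts, Registry};

// Register an error counter labelled by a small, fixed set of error kinds.
fn register_error_counter(registry: &Registry) -> prometheus::Result<IntCounterVec> {
    let errors = IntCounterVec::new(
        Opts::new(
            "runes_processing_errors_total",
            "Total number of rune processing errors, by kind",
        ),
        &["error_type"],
    )?;
    registry.register(Box::new(errors.clone()))?;
    Ok(errors)
}

// Usage: errors.with_label_values(&["db_write"]).inc();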

Comment on lines 75 to 76
prometheus
    .metrics_record_block_processing_time(process_start_time.elapsed().as_millis() as f64);
Collaborator

This measurement should occur once per block so we can measure it both by block (with a label for block height or something) and as a histogram to check stats on overall block processing

@ASuciuX (Contributor, Author)

I reviewed the Prometheus documentation, and it seems that using a label for each block_height isn't recommended, as it would create a new time series for every block, leading to high-cardinality issues. Unless I'm misunderstanding something, this approach isn't feasible for Prometheus:

Each labelset is an additional time series that has RAM, CPU, disk, and network costs. Usually the overhead is negligible, but in scenarios with lots of metrics and hundreds of labelsets across hundreds of servers, this can add up quickly.

As a general guideline, try to keep the cardinality of your metrics below 10, and for metrics that exceed that, aim to limit them to a handful across your whole system. The vast majority of your metrics should have no labels.

If you have a metric that has a cardinality over 100 or the potential to grow that large, investigate alternate solutions such as reducing the number of dimensions or moving the analysis away from monitoring and to a general-purpose processing system.
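
Given that constraint, an unlabelled histogram with explicit buckets still yields per-block statistics (count, sum, and quantiles via histogram_quantile) without creating a series per block height; a sketch using the Rust prometheus crate, with bucket values that are assumptions to be tuned:

use prometheus::{exponential_buckets, Histogram, HistogramOpts, Registry};

fn register_block_processing_histogram(registry: &Registry) -> prometheus::Result<Histogram> {
    let opts = HistogramOpts::new(
        "block_processing_time_ms",
        "Time spent processing a single block, in milliseconds",
    )
    // ~10 ms to ~80 s across 14 buckets; adjust to observed block processing times.
    .buckets(exponential_buckets(10.0, 2.0, 14)?);
    let histogram = Histogram::with_opts(opts)?;
    registry.register(Box::new(histogram.clone()))?;
    Ok(histogram)
}

// Once per block:
// histogram.observe(process_start_time.elapsed().as_millis() as f64);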

import { FastifyInstance, FastifyReply, FastifyRequest } from 'fastify';
import { ApiMetrics } from '../../metrics/metrics';

export function registerMetricsMiddleware(fastify: FastifyInstance, metrics: ApiMetrics) {
Collaborator

Hey Alin, apologies, I should've explained this yesterday. The prom-client already takes care of per-request timing measurements and all kinds of default stats, so you shouldn't worry about adding these to your metrics.

The same goes for DB queries, because those get measured by another tool we use called PgHero.

You should only focus on making sure the prometheus metrics are being applied correctly to this API and the Runes API via the use of prom-client.

Development

Successfully merging this pull request may close these issues.

expand prometheus metrics to cover all indexers