Commit 68ccb2e

docs: Address PR open-policy-agent#7929 review feedback
- Fix misleading 'aggregated' terminology - use 'instance-level' instead
- Remove per-query metrics section from monitoring.md, add cross-references
- Focus metrics documentation on commonly used regex and http.send built-ins
- Add missing counter_rego_builtin_regex_interquery_value_cache_hits metric
- Move admonition to after example in REST API documentation
- Simplify and reduce scope of metrics documentation per reviewer guidance

Signed-off-by: Anivar A Aravind <[email protected]>
1 parent 675a456 commit 68ccb2e

3 files changed: +39 -112 lines changed


docs/docs/monitoring.md

Lines changed: 7 additions & 21 deletions
@@ -90,42 +90,28 @@ When Prometheus is enabled in the status plugin (see [Configuration](./configura
 
 OPA provides two ways to access performance metrics:
 
-1. **System-wide metrics** via the `/metrics` Prometheus endpoint - Aggregated metrics across all OPA operations
+1. **System-wide metrics** via the `/metrics` Prometheus endpoint - Instance-level metrics across all OPA operations
 2. **Per-query metrics** via API responses with `?metrics=true` - Metrics for individual query executions
 
-These serve different purposes: system metrics for monitoring and alerting, per-query metrics for debugging and optimization.
+These serve different purposes: system metrics for OPA instance monitoring and alerting, per-query metrics for debugging and optimization.
 
 
 ## Accessing Metrics
 
 ### System-Wide Metrics (Prometheus Endpoint)
 
-Access aggregated metrics across all OPA operations:
+Access instance-level metrics across all OPA operations:
 
 - **URL**: `http://localhost:8181/metrics` (default configuration)
 - **Method**: HTTP GET
 - **Format**: Prometheus text format
-- **Contents**: All counters, timers, histograms, Go runtime metrics
+- **Contents**: Instance-level counters, timers, histograms, Go runtime metrics
 - **Use case**: Monitoring dashboards, alerting, performance trends
 
-### Per-Query Metrics
-
-Get metrics for individual policy evaluations:
-
-- **Method**: Add `?metrics=true` parameter to API requests
-- **Supported APIs**: `/v1/data`, `/v0/data`, `/v1/query`, `/v1/compile`
-- **Contents**: Query-specific parse, compile, and eval timers
-- **Use case**: Debugging slow queries, performance optimization
-
-Example:
-```http
-POST /v1/data/example?metrics=true HTTP/1.1
-```
-
-For details on interpreting per-query metrics, see [REST API Performance Metrics](./rest-api#performance-metrics).
-
-### Other Metric Sources
+### Additional Resources
 
+- **Per-query metrics**: See [REST API Performance Metrics](./rest-api#performance-metrics) for debugging individual queries
+- **Policy performance**: See [Policy Performance](./policy-performance#performance-metrics) for optimization guidance
 - **Status API**: Includes subset of metrics in status reports
 - **Decision logs**: Can include request-level metrics when configured
 - **CLI tools**: `opa eval --metrics` and `opa bench --metrics`
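
Editor's sketch (not part of the commit): the monitoring.md text in this diff says the system-wide endpoint answers HTTP GET at `http://localhost:8181/metrics` in Prometheus text format. A minimal way to read it from Python might look like the following. The helper names `parse_prometheus_text` and `fetch_metrics` are hypothetical, and the parser is deliberately simplified: it keeps the last value per series and ignores timestamps and type metadata.

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into {series: value}.

    Simplified on purpose: comment lines (# HELP / # TYPE) are skipped,
    optional timestamps are ignored, and the series key keeps its label
    set verbatim, e.g. 'name{code="200"}'.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so cut at the closing brace.
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        samples[series] = float(fields[0])
    return samples


def fetch_metrics(url="http://localhost:8181/metrics", timeout=5.0):
    """GET the endpoint (default URL taken from the doc text) and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` assumes an OPA server is listening on the default address. For dashboards and alerting, a real Prometheus scrape configuration is the more usual route; a script like this is only useful for quick inspection.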

docs/docs/policy-performance.md

Lines changed: 30 additions & 89 deletions
@@ -971,117 +971,58 @@ Users are recommended to do performance testing to determine the optimal configu
 
 ## Performance Metrics
 
-OPA exposes metrics for each phase of policy evaluation:
+OPA exposes metrics for policy evaluation performance. These are available through:
 
-- **System-wide metrics** are available at the `/metrics` Prometheus endpoint
-- **Per-query metrics** are returned with individual API responses when `?metrics=true` is specified
+- **System-wide metrics** at the `/metrics` Prometheus endpoint
+- **Per-query metrics** with individual API responses when `?metrics=true` is specified
 
-See [Monitoring](./monitoring#metrics-overview) for the distinction between these metric types.
+See [Monitoring](./monitoring#metrics-overview) for more details.
 
-### Query Evaluation Metrics
+### Common Built-in Function Metrics
 
-Query evaluation phases:
-
-- `timer_rego_query_parse_ns` - Time spent parsing the query string into AST
-- `timer_rego_query_compile_ns` - Time spent compiling the query for evaluation
-- `timer_rego_query_eval_ns` - Time spent executing the compiled query
-
-Compilation time often dominates in complex policies.
-
-### Module and Policy Metrics
-
-Policy compilation and parsing:
-
-- `timer_rego_module_parse_ns` - Time to parse policy modules from source
-- `timer_rego_module_compile_ns` - Time to compile parsed modules into evaluation form
-- `timer_rego_data_parse_ns` - Time to parse data documents
-- `timer_rego_input_parse_ns` - Time to parse input documents
-
-Module compilation runs once at load time. Slow compilation impacts bundle updates.
-
-### File and Bundle Loading
-
-Policy and data loading:
-
-- `timer_rego_load_files_ns` - Time to load policy files from disk
-- `timer_rego_load_bundles_ns` - Time to load and activate bundles
-- `timer_bundle_request_ns` - Time spent downloading bundles
-
-### Compilation and Partial Evaluation Metrics
-
-Compilation and partial evaluation:
-
-- `timer_rego_partial_eval_ns` - Total partial evaluation time
-- `timer_compile_prep_partial_ns` - Time preparing for partial evaluation
-- `timer_compile_eval_constraints_ns` - Time evaluating constraints
-- `timer_compile_translate_queries_ns` - Time translating queries
-- `timer_compile_extract_annotations_unknowns_ns` - Time extracting unknowns
-- `timer_compile_extract_annotations_mask_ns` - Time extracting masks
-- `timer_compile_eval_mask_rule_ns` - Time evaluating mask rules
-- `timer_compile_stage_check_imports_ns` - Time checking imports
-- `counter_compile_stage_comprehension_index_build` - Comprehension indices built
-
-High partial evaluation times indicate optimization opportunities.
-
-### Evaluation Operation Metrics
-
-Evaluation operations produce both timer and histogram metrics:
+#### HTTP Built-ins
 
-**Timers** (measure total time):
-- `timer_eval_op_plug_ns` - Time spent in plugging operations
-- `timer_eval_op_resolve_ns` - Time resolving references
-- `timer_eval_op_rule_index_ns` - Time spent in rule indexing
-- `timer_eval_op_builtin_call_ns` - Time spent calling built-in functions
-- `timer_partial_op_save_unify_ns` - Time saving unification in partial eval
-- `timer_partial_op_save_set_contains_ns` - Time for set contains in partial eval
-- `timer_partial_op_save_set_contains_rec_ns` - Time for recursive set contains
-- `timer_partial_op_copy_propagation_ns` - Time for copy propagation optimization
+`http.send` metrics help identify I/O bottlenecks:
 
-**Histograms** (track time distribution):
-- `histogram_eval_op_plug` - Distribution of plugging operation times
-- `histogram_eval_op_resolve` - Distribution of reference resolution times
-- `histogram_eval_op_rule_index` - Distribution of rule indexing times
-- `histogram_eval_op_builtin_call` - Distribution of built-in function call times
-- `histogram_partial_op_save_unify` - Distribution of unification save times
-- `histogram_partial_op_save_set_contains` - Distribution of set contains times
-- `histogram_partial_op_save_set_contains_rec` - Distribution of recursive set contains times
-- `histogram_partial_op_copy_propagation` - Distribution of copy propagation times
+- `timer_rego_builtin_http_send_ns` - Total time spent in http.send calls
+- `counter_rego_builtin_http_send_interquery_cache_hits` - Inter-query cache hits
+- `counter_rego_builtin_http_send_network_requests` - Actual network requests made
 
-Histograms show percentiles: 50%, 75%, 90%, 95%, 99%, 99.9%, 99.99%.
+High cache hit ratios indicate effective caching and reduced network overhead.
 
-### Built-in Function Metrics
+#### Regex Built-ins
 
-#### HTTP Built-ins
+Regex operation metrics help optimize pattern matching:
 
-`http.send` metrics:
+- `timer_rego_builtin_regex_interquery_ns` - Time spent in regex operations
+- `counter_rego_builtin_regex_interquery_cache_hits` - Regex pattern cache hits
+- `counter_rego_builtin_regex_interquery_value_cache_hits` - Regex value cache hits
 
-- `timer_rego_builtin_http_send_ns` - Total time spent in http.send calls
-- `counter_rego_builtin_http_send_interquery_cache_hits` - Inter-query cache hits
-- `counter_rego_builtin_http_send_network_requests` - Actual network requests made
+Effective regex caching improves performance when the same patterns are used repeatedly.
 
-High cache hit ratios indicate effective caching.
+### Core Query Metrics
 
-#### External Data Resolution
+Basic query evaluation phases:
 
-External data resolution:
+- `timer_rego_query_parse_ns` - Time parsing the query string
+- `timer_rego_query_compile_ns` - Time compiling the query
+- `timer_rego_query_eval_ns` - Time executing the compiled query
 
-- `timer_rego_external_resolve_ns` - Time resolving external data references
+Compilation time often dominates in complex policies.
 
-### SDK and Server Metrics
+### High-Level Metrics
 
-High-level evaluation:
+Server-level metrics for overall performance:
 
 - `timer_server_handler_ns` - Total request handler execution time
-- `timer_sdk_decision_eval_ns` - SDK decision evaluation time
 - `counter_server_query_cache_hit` - Server-level query cache hits
 
-### Using Metrics
+### Using Metrics for Optimization
 
-1. Compare parse, compile, and eval times to find slow phases
-2. High operation counts indicate complex queries
-3. Low cache hit rates suggest tuning opportunities
-4. High `http.send` counts indicate I/O bottlenecks
-5. Bundle activation times show deployment latency
+1. **Query phases**: Compare parse, compile, and eval times to identify bottlenecks
+2. **Cache effectiveness**: Low cache hit rates suggest tuning opportunities
+3. **I/O bottlenecks**: High `http.send` network request counts indicate caching issues
+4. **Pattern matching**: Monitor regex cache hits for frequently used patterns
 
 Access metrics via:
 - REST API: Add `?metrics=true` to policy evaluation requests
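
Editor's sketch (not part of the commit): the "Using Metrics for Optimization" steps added above can be tried against a per-query `metrics` object roughly as follows. The function names are hypothetical. The hit-ratio formula assumes every `http.send` call is counted either as an inter-query cache hit or as a network request, which the diff does not state. The metric keys are the ones listed in the diff, and `?metrics=true` on `/v1/data` is the documented way to request them.

```python
import json
import urllib.request

# Timer names as listed under "Core Query Metrics" in the diff.
PHASE_TIMERS = (
    "timer_rego_query_parse_ns",
    "timer_rego_query_compile_ns",
    "timer_rego_query_eval_ns",
)


def query_with_metrics(base_url, path, input_doc, timeout=5.0):
    """POST to /v1/data/<path>?metrics=true and return the metrics object."""
    url = f"{base_url.rstrip('/')}/v1/data/{path.strip('/')}?metrics=true"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("metrics", {})


def phase_shares(metrics):
    """Fraction of parse+compile+eval time spent in each reported phase."""
    present = {k: metrics[k] for k in PHASE_TIMERS if k in metrics}
    total = sum(present.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in present.items()}


def http_send_hit_ratio(metrics):
    """Cache hits / (hits + network requests); None if http.send was unused.

    Assumption: hits and network requests partition all http.send calls.
    """
    hits = metrics.get("counter_rego_builtin_http_send_interquery_cache_hits", 0)
    misses = metrics.get("counter_rego_builtin_http_send_network_requests", 0)
    calls = hits + misses
    return hits / calls if calls else None
```

For example, `phase_shares(query_with_metrics("http://localhost:8181", "example", {}))` would show which phase dominates a single request, assuming a local OPA server with a policy under `data.example`. A single sample is noisy, so comparing several requests is more informative.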

docs/docs/rest-api.md

Lines changed: 2 additions & 2 deletions
@@ -2223,8 +2223,6 @@ restarts, a **Redo** Trace Event is emitted.
 
 ## Performance Metrics
 
-**Note**: These are per-query metrics returned inline with API responses. For system-wide aggregated metrics, see the `/metrics` Prometheus endpoint described in [Monitoring](./monitoring#prometheus).
-
 OPA can report detailed performance metrics at runtime. Performance metrics can
 be requested on individual API calls and are returned inline with the API
 response. To enable performance metric collection on an API call, specify the
@@ -2265,6 +2263,8 @@ Content-Type: application/json
 }
 ```
 
+> **Note**: These are per-query metrics returned inline with API responses. For system-wide instance metrics, see the `/metrics` Prometheus endpoint described in [Monitoring](./monitoring#prometheus).
+
 OPA provides the following query performance metrics:
 
 ### Core Query Metrics
