Commit 6c020fe

Refactor documentation for HTTPArchive metrics: clarify metric types, usage guidelines, and SQL patterns
1 parent 2d4aac9 commit 6c020fe

2 files changed

+143
-82
lines changed

.agents/skills/add-httparchive-metric-report/SKILL.md

Lines changed: 26 additions & 78 deletions
@@ -5,100 +5,48 @@ description: Add new metrics to HTTPArchive reports config. USE FOR adding perfo
55

66
# Adding Metrics to HTTPArchive Reports
77

8-
## Documentation Reference
8+
## Documentation
99

10-
See [reports.md](../../../reports.md) for complete architecture, troubleshooting, and configuration details.
10+
**See [reports.md](../../../reports.md)** for the complete guide, including:
11+
- Architecture and processing details
12+
- Quick Decision Guide table
13+
- Required SQL patterns checklist
14+
- SQL pattern reference (adoption, percentiles, binning)
15+
- Complete examples
16+
- Troubleshooting
1117

12-
## Quick Implementation
18+
## Quick Start
1319

14-
Add metrics to `includes/reports.js` in the `config._metrics` object. The system automatically generates reports across all lenses (all, top1k, wordpress, etc.).
20+
1. Open `includes/reports.js`, find `config._metrics` (line ~42)
21+
2. Choose type: **Timeseries** (adoption/percentiles) or **Histogram** (distributions)
22+
3. Add metric with required patterns: `date`, `is_root_page`, `${params.lens.sql}`, `${params.devRankFilter}`, `${ctx.ref('crawl', 'pages')}`, `GROUP BY client`
23+
4. Run `get_errors` to verify
1524

16-
## Metric Type Selection
25+
## Key Rules
1726

18-
| Type | Use For | Don't Use For |
19-
|------|---------|---------------|
20-
| **Timeseries** | Percentiles, adoption rates, trends, **boolean/presence metrics** | N/A (most versatile) |
21-
| **Histogram** | Continuous value distributions (page weight, load times) | Boolean/binary (only 2 states) |
27+
- **Boolean/adoption metrics**: Timeseries ONLY (histogram meaningless for 2 states)
28+
- **Continuous metrics**: Both histogram + timeseries
29+
- **Use safe functions**: `SAFE_DIVIDE()`, `SAFE.BOOL()` for custom metrics
30+
- **Filter zeros**: Add `AND metric > 0` to the `WHERE` clause when computing percentiles
2231

23-
**Key Rule:** Always use timeseries for boolean/adoption metrics; histogram only for continuous distributions.
24-
25-
## Required SQL Patterns
26-
27-
Every metric MUST include:
28-
- `date = '${params.date}'`
29-
- `AND is_root_page`
30-
- `${params.lens.sql}`
31-
- `${params.devRankFilter}`
32-
- `${ctx.ref('crawl', 'pages')}`
33-
- `GROUP BY client ORDER BY client`
34-
35-
## Quick Patterns
36-
37-
### Timeseries - Adoption/Percentage
38-
```sql
39-
ROUND(SAFE_DIVIDE(COUNTIF(condition), COUNT(0)) * 100, 2) AS pct_pages
40-
```
41-
42-
### Timeseries - Percentiles
43-
```sql
44-
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(101)] / 1024, 2) AS p10,
45-
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(251)] / 1024, 2) AS p25,
46-
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(501)] / 1024, 2) AS p50,
47-
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(751)] / 1024, 2) AS p75,
48-
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(901)] / 1024, 2) AS p90
49-
-- Add: AND FLOAT64(metric) > 0 in WHERE for continuous metrics
50-
```
51-
52-
### Histogram - Distribution Bins
53-
```sql
54-
-- Core binning pattern in innermost subquery:
55-
CAST(FLOOR(FLOAT64(metric) / bin_size) * bin_size AS INT64) AS bin,
56-
COUNT(0) AS volume
57-
-- Wrap with pdf: volume / SUM(volume) OVER (PARTITION BY client)
58-
-- Wrap with cdf: SUM(pdf) OVER (PARTITION BY client ORDER BY bin)
59-
```
60-
61-
## Examples
32+
## Minimal Example
6233

6334
```javascript
64-
llmsTxtAdoption: {
35+
metricName: {
6536
SQL: [
6637
{
67-
type: 'timeseries',
38+
type: 'timeseries', // or 'histogram'
6839
query: DataformTemplateBuilder.create((ctx, params) => `
69-
SELECT
70-
client,
71-
ROUND(SAFE_DIVIDE(
72-
COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
73-
COUNT(0)
74-
) * 100, 2) AS pct_pages
40+
SELECT client, /* your calculations */
7541
FROM ${ctx.ref('crawl', 'pages')}
76-
WHERE
77-
date = '${params.date}'
78-
AND is_root_page
79-
${params.lens.sql}
80-
${params.devRankFilter}
81-
GROUP BY client
82-
ORDER BY client
42+
WHERE date = '${params.date}' AND is_root_page
43+
${params.lens.sql} ${params.devRankFilter}
44+
GROUP BY client ORDER BY client
8345
`)
8446
}
8547
]
8648
}
8749
```
8850

89-
See [reports.md](../../../reports.md) for complete histogram + timeseries examples.
90-
91-
## Implementation
92-
93-
1. Open `includes/reports.js`, locate `config._metrics` (line ~42)
94-
2. Add metric before closing `}` of `_metrics`
95-
3. Use patterns above for timeseries/histogram structure
96-
4. Include all required SQL patterns
97-
5. Run `get_errors` to verify
98-
99-
## Key Notes
100-
101-
- **Continuous metrics:** Add `AND metric > 0` before percentile calculations
102-
- **Custom metrics:** Use `SAFE.BOOL()` and `SAFE_DIVIDE()` for safety
103-
- **Auto-processing:** Metrics run across all lenses automatically
51+
See [reports.md](../../../reports.md) for complete patterns and examples.
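
To see how the placeholders in the minimal example expand, the template callback can be rendered with stand-in objects. This is only a sketch: the `ctx` and `params` objects below are hypothetical stubs, the `devRankFilter` value is invented for illustration, and `buildQuery` merely mimics the callback shape passed to `DataformTemplateBuilder.create()`.

```javascript
// Stand-ins for the objects the real pipeline would supply.
// Their shapes and values are assumptions made for this sketch.
const ctx = {
  ref: (schema, table) => '`' + schema + '.' + table + '`'
};
const params = {
  date: '2024-01-01',
  lens: { sql: '' },                  // assumed: a lens with no extra condition
  devRankFilter: 'AND rank <= 10000'  // hypothetical dev sampling clause
};

// Same shape as the callback in the minimal example above.
const buildQuery = (ctx, params) => `
SELECT client, COUNT(0) AS pages
FROM ${ctx.ref('crawl', 'pages')}
WHERE date = '${params.date}' AND is_root_page
  ${params.lens.sql} ${params.devRankFilter}
GROUP BY client ORDER BY client
`;

const sql = buildQuery(ctx, params);
console.log(sql);
```

Because `${params.lens.sql}` and `${params.devRankFilter}` sit directly after `AND is_root_page` with no connective, each value presumably has to be either empty or a complete `AND ...` clause.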
10452

reports.md

Lines changed: 117 additions & 4 deletions
@@ -30,14 +30,25 @@ The system supports two types of SQL queries:
3030
- **Output**: Contains `bin`, `volume`, `pdf`, `cdf` columns
3131
- **Use case**: Page weight distributions, performance metric distributions
3232
- **Export path**: `reports/{date_folder}/{metric_id}_test.json`
33+
- **⚠️ Do NOT use for**: Boolean/binary metrics (present/not present), because two states do not form a meaningful distribution
3334

3435
#### 2. Timeseries
3536

3637
- **Purpose**: Trend analysis over time
3738
- **Output**: Contains percentile data (p10, p25, p50, p75, p90) with timestamps
38-
- **Use case**: Performance trends, adoption over time
39+
- **Use case**: Performance trends, adoption over time, **boolean/adoption metrics**
3940
- **Export path**: `reports/{metric_id}_test.json`
4041

42+
### Quick Decision Guide
43+
44+
| Metric Type | Use Timeseries | Use Histogram | Use Both |
45+
|-------------|----------------|---------------|----------|
46+
| Boolean/Adoption (present/not present) | ✅ Always | ❌ Never | |
47+
| Percentage/Rate | ✅ Yes | ❌ Rarely useful | |
48+
| Continuous values (bytes, time, count) | ✅ For percentiles | ✅ For distribution | ✅ Often |
49+
50+
**Key Rule**: Always use timeseries for boolean/adoption metrics; histogram only for continuous distributions.
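
The decision guide can also be written as a small lookup, which may help when reviewing a new metric. This is an illustrative sketch: `sqlTypesFor` and the kind labels `'boolean'`, `'percentage'`, and `'continuous'` are hypothetical names, not part of `includes/reports.js`.

```javascript
// Hypothetical helper that encodes the Quick Decision Guide table.
function sqlTypesFor(kind) {
  switch (kind) {
    case 'boolean':     // present / not present
    case 'percentage':  // rates and ratios
      return ['timeseries'];
    case 'continuous':  // bytes, time, counts
      return ['histogram', 'timeseries'];
    default:
      throw new Error('Unknown metric kind: ' + kind);
  }
}

console.log(sqlTypesFor('boolean'));    // [ 'timeseries' ]
console.log(sqlTypesFor('continuous')); // [ 'histogram', 'timeseries' ]
```

The table marks histogram plus timeseries as "often" appropriate for continuous values, so the last case is a default rather than a requirement.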
51+
4152
### Lenses (Data Filters)
4253

4354
Lenses allow filtering data by different criteria:
@@ -59,7 +70,15 @@ Lenses allow filtering data by different criteria:
5970

6071
## How to Add a New Report
6172

62-
### Step 1: Define Your Metric
73+
### Step 1: Choose Your Metric Type
74+
75+
Determine which SQL type(s) to use based on your metric:
76+
77+
- **Boolean/Adoption metrics** (e.g., feature presence, file exists): Use timeseries only
78+
- **Continuous metrics** (e.g., page weight, load time): Use both histogram and timeseries
79+
- **Percentages/Rates**: Use timeseries only
80+
81+
### Step 2: Define Your Metric
6382

6483
Add your metric to the `_metrics` object in `includes/reports.js`:
6584

@@ -169,13 +188,62 @@ Your SQL template receives these parameters:
169188
- `timestamp` - Unix timestamp in milliseconds
170189
- `p10`, `p25`, `p50`, `p75`, `p90` - Percentile values
171190

191+
### Required SQL Patterns
192+
193+
Every metric query **MUST** include these patterns:
194+
195+
```sql
196+
WHERE
197+
date = '${params.date}' -- Date filter
198+
AND is_root_page -- Root page filter
199+
${params.lens.sql} -- Lens filtering
200+
${params.devRankFilter} -- Dev environment sampling
201+
-- In the FROM clause, use:
202+
${ctx.ref('crawl', 'pages')} -- Proper table reference
203+
GROUP BY client
204+
ORDER BY client
205+
```
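
One way to check a query against this list before committing is a plain substring scan over the template source. The sketch below is hypothetical: `REQUIRED_PATTERNS` and `findMissingPatterns` are illustrative names rather than existing project code, and the scan looks at the raw template text before any interpolation happens.

```javascript
// Hypothetical lint helper; not part of the HTTPArchive codebase.
// Plain (non-template) strings keep the ${...} placeholders literal.
const REQUIRED_PATTERNS = [
  "date = '${params.date}'",
  'is_root_page',
  '${params.lens.sql}',
  '${params.devRankFilter}',
  "${ctx.ref('crawl', 'pages')}",
  'GROUP BY client',
  'ORDER BY client'
];

// Returns the required patterns that do not appear in the template source.
function findMissingPatterns(templateSource) {
  // Collapse whitespace so line breaks inside the query do not matter.
  const normalized = templateSource.replace(/\s+/g, ' ');
  return REQUIRED_PATTERNS.filter((pattern) => !normalized.includes(pattern));
}

const sample = [
  'SELECT client, COUNT(0) AS pages',
  "FROM ${ctx.ref('crawl', 'pages')}",
  "WHERE date = '${params.date}' AND is_root_page",
  '${params.lens.sql}',
  '${params.devRankFilter}',
  'GROUP BY client'
].join('\n');

console.log(findMissingPatterns(sample)); // [ 'ORDER BY client' ]
```

A substring scan cannot tell whether a pattern sits in the right clause, so it complements rather than replaces running the query.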
206+
207+
### SQL Pattern Reference
208+
209+
#### Adoption/Percentage Metrics (Timeseries)
210+
211+
```sql
212+
ROUND(SAFE_DIVIDE(
213+
COUNTIF(condition),
214+
COUNT(0)
215+
) * 100, 2) AS pct_pages
216+
```
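
The arithmetic behind this pattern can be followed in plain JavaScript. The sketch below is an illustration under assumptions: `safeDivide` and `pctPages` are hypothetical helpers that imitate the SQL expression for a single client, with each array element standing for the evaluated condition on one page.

```javascript
// Imitates SAFE_DIVIDE: yields null instead of failing on a zero denominator.
function safeDivide(numerator, denominator) {
  return denominator === 0 ? null : numerator / denominator;
}

// flags: one entry per page; true when the condition holds.
// null or false entries are not counted, as with COUNTIF.
function pctPages(flags) {
  const matching = flags.filter((flag) => flag === true).length; // COUNTIF(condition)
  const ratio = safeDivide(matching, flags.length);              // COUNT(0) counts every row
  // Round to two decimals, like ROUND(x * 100, 2) for non-negative x.
  return ratio === null ? null : Math.round(ratio * 10000) / 100;
}

console.log(pctPages([true, false, false, true, null])); // 40
console.log(pctPages([true, false, false]));             // 33.33
console.log(pctPages([]));                               // null
```

The empty-input case is why `SAFE_DIVIDE()` is recommended: a lens or date with no matching pages produces `NULL` rather than a query error.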
217+
218+
#### Percentile Distributions (Timeseries)
219+
220+
```sql
221+
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(101)] / 1024, 2) AS p10,
222+
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(251)] / 1024, 2) AS p25,
223+
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(501)] / 1024, 2) AS p50,
224+
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(751)] / 1024, 2) AS p75,
225+
ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(901)] / 1024, 2) AS p90
226+
-- Important: Add WHERE condition: AND FLOAT64(metric) > 0 for continuous metrics
227+
```
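
To make the `OFFSET` indices concrete: `APPROX_QUANTILES(x, 1001)` returns 1002 boundary values, and the offsets above select positions near the 10th, 25th, 50th, 75th, and 90th percentiles. The sketch below uses an exact nearest-rank calculation as a stand-in for BigQuery's approximate algorithm; `quantileBoundaries` and `percentilesKb` are hypothetical names, and the input is assumed to be byte sizes for a single client.

```javascript
// Exact stand-in for APPROX_QUANTILES(values, buckets):
// returns buckets + 1 boundaries, from the minimum to the maximum.
function quantileBoundaries(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const boundaries = [];
  for (let i = 0; i <= buckets; i++) {
    const index = Math.round((i / buckets) * (sorted.length - 1));
    boundaries.push(sorted[index]);
  }
  return boundaries;
}

// Applies the zero filter, picks the documented offsets, converts bytes to KB.
function percentilesKb(bytesPerPage) {
  const positive = bytesPerPage.filter((bytes) => bytes > 0); // AND metric > 0
  const q = quantileBoundaries(positive, 1001);
  const toKb = (bytes) => Math.round((bytes / 1024) * 100) / 100;
  return {
    p10: toKb(q[101]),
    p25: toKb(q[251]),
    p50: toKb(q[501]),
    p75: toKb(q[751]),
    p90: toKb(q[901])
  };
}

// 1002 pages of 1 KB, 2 KB, ... 1002 KB, plus one zero-byte page that is filtered out.
const pages = [0];
for (let kb = 1; kb <= 1002; kb++) pages.push(kb * 1024);

const result = percentilesKb(pages);
console.log(result.p10, result.p50, result.p90); // 102 502 902
```

Without the zero filter, the zero-byte page would become the lowest boundary and shift every offset down by one position.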
228+
229+
#### Distribution Binning (Histogram)
230+
231+
```sql
232+
-- Innermost subquery:
233+
CAST(FLOOR(FLOAT64(metric) / bin_size) * bin_size AS INT64) AS bin,
234+
COUNT(0) AS volume
235+
-- Wrap with: volume / SUM(volume) OVER (PARTITION BY client) AS pdf
236+
-- Wrap with: SUM(pdf) OVER (PARTITION BY client ORDER BY bin) AS cdf
237+
```
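
The three layers of the histogram query (bin and volume, then `pdf`, then `cdf`) can be traced on a small array. This is a sketch for a single client, so the `PARTITION BY client` step is omitted; `histogram` is a hypothetical helper, not project code.

```javascript
// Builds rows with the same columns the histogram query outputs.
function histogram(values, binSize) {
  // Innermost step: FLOOR(value / bin_size) * bin_size AS bin, COUNT(0) AS volume
  const volumes = new Map();
  for (const value of values) {
    const bin = Math.floor(value / binSize) * binSize;
    volumes.set(bin, (volumes.get(bin) || 0) + 1);
  }

  const total = values.length; // SUM(volume) over the whole client
  let runningCdf = 0;

  return [...volumes.entries()]
    .sort((a, b) => a[0] - b[0]) // ORDER BY bin
    .map(([bin, volume]) => {
      const pdf = volume / total; // share of pages in this bin
      runningCdf += pdf;          // cumulative share up to this bin
      return { bin, volume, pdf, cdf: runningCdf };
    });
}

const rows = histogram([10, 40, 60, 120, 130, 140, 260, 90], 50);
console.log(rows);
// bins 0, 50, 100, 250 with volumes 2, 2, 3, 1
// and cdf values 0.25, 0.5, 0.875, 1
```

Bins with no pages (150 and 200 here) simply do not appear in the output, which matches what a `GROUP BY bin` would produce.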
238+
172239
### Best Practices
173240

174241
1. **Filter root pages**: Always include `AND is_root_page` unless you specifically need all pages
175242
2. **Handle null values**: Use appropriate null checks and filtering
176243
3. **Use consistent binning**: For histograms, use logical bin sizes (e.g., 100KB increments for page weight)
177244
4. **Optimize performance**: Use appropriate WHERE clauses and avoid expensive operations
178245
5. **Test with dev filters**: Your queries should work with the development rank filter
246+
6. **Use safe functions**: `SAFE.BOOL()` for custom metrics, `SAFE_DIVIDE()` for percentages
179247

180248
## Lenses
181249

@@ -256,7 +324,46 @@ const EXPORT_CONFIG = {
256324

257325
## Examples
258326

259-
### Adding a JavaScript Bundle Size Metric
327+
### Example 1: Adding an Adoption Metric (Boolean/Presence)
328+
329+
For metrics that track whether a feature/file exists (present or not present), use **timeseries only**:
330+
331+
```javascript
332+
llmsTxtAdoption: {
333+
SQL: [
334+
{
335+
type: 'timeseries',
336+
query: DataformTemplateBuilder.create((ctx, params) => `
337+
SELECT
338+
client,
339+
ROUND(SAFE_DIVIDE(
340+
COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
341+
COUNT(0)
342+
) * 100, 2) AS pct_pages
343+
FROM ${ctx.ref('crawl', 'pages')}
344+
WHERE
345+
date = '${params.date}'
346+
AND is_root_page
347+
${params.lens.sql}
348+
${params.devRankFilter}
349+
GROUP BY client
350+
ORDER BY client
351+
`)
352+
}
353+
]
354+
}
355+
```
356+
357+
**Key points:**
358+
359+
- Uses `SAFE_DIVIDE()` to avoid division by zero
360+
- Uses `SAFE.BOOL()` for accessing custom_metrics that may not exist
361+
- Returns `pct_pages` as the adoption percentage
362+
- No histogram - boolean metrics don't have meaningful distributions
363+
364+
### Example 2: Adding a Continuous Metric (Histogram + Timeseries)
365+
366+
For metrics with continuous values (bytes, time, count), use both histogram and timeseries:
260367

261368
```javascript
262369
jsBytes: {
@@ -332,4 +439,10 @@ jsBytes: {
332439
}
333440
```
334441

335-
This would automatically generate reports for JavaScript bundle sizes across all lenses and the configured date range.
442+
**Key points:**
443+
444+
- Histogram shows distribution across bins (50KB increments)
445+
- Timeseries shows percentiles over time (p10, p25, p50, p75, p90)
446+
- Both queries filter out zero values: `AND INT64(summary.bytesJS) > 0`
447+
- Uses nested CTEs for clear structure
448+
- Automatically generates reports for JavaScript bundle sizes across all lenses and the configured date range
