@@ -30,14 +30,25 @@ The system supports two types of SQL queries:
 - **Output**: Contains `bin`, `volume`, `pdf`, `cdf` columns
 - **Use case**: Page weight distributions, performance metric distributions
 - **Export path**: `reports/{date_folder}/{metric_id}_test.json`
+- **⚠️ Do NOT use for**: Boolean/binary metrics (present/not present) - only two states don't create meaningful distributions
 
 #### 2. Timeseries
 
 - **Purpose**: Trend analysis over time
 - **Output**: Contains percentile data (p10, p25, p50, p75, p90) with timestamps
-- **Use case**: Performance trends, adoption over time
+- **Use case**: Performance trends, adoption over time, **boolean/adoption metrics**
 - **Export path**: `reports/{metric_id}_test.json`
 
+### Quick Decision Guide
+
+| Metric Type | Use Timeseries | Use Histogram | Use Both |
+|-------------|----------------|---------------|----------|
+| Boolean/Adoption (present/not present) | ✅ Always | ❌ Never | ❌ |
+| Percentage/Rate | ✅ Yes | ❌ Rarely useful | ❌ |
+| Continuous values (bytes, time, count) | ✅ For percentiles | ✅ For distribution | ✅ Often |
+
+**Key Rule**: Always use timeseries for boolean/adoption metrics; histogram only for continuous distributions.
+
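The decision guide can be read as a simple lookup. The sketch below is illustrative only: `chooseSqlTypes` and the kind names `boolean`, `percentage`, and `continuous` are hypothetical and are not part of `includes/reports.js`.

```javascript
// Hypothetical helper: maps a metric kind to the SQL types the guide suggests.
const SQL_TYPES_BY_KIND = new Map([
  ['boolean', ['timeseries']],                 // present / not present
  ['percentage', ['timeseries']],              // rates and percentages
  ['continuous', ['histogram', 'timeseries']], // bytes, time, count
]);

function chooseSqlTypes(kind) {
  const types = SQL_TYPES_BY_KIND.get(kind);
  if (!types) {
    throw new Error(`Unknown metric kind: ${kind}`);
  }
  return [...types]; // return a copy so callers cannot mutate the table
}
```

For example, `chooseSqlTypes('continuous')` returns both `histogram` and `timeseries`, while `chooseSqlTypes('boolean')` returns `timeseries` only.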
 ### Lenses (Data Filters)
 
 Lenses allow filtering data by different criteria:
@@ -59,7 +70,15 @@ Lenses allow filtering data by different criteria:
 
 ## How to Add a New Report
 
-### Step 1: Define Your Metric
+### Step 1: Choose Your Metric Type
+
+Determine which SQL type(s) to use based on your metric:
+
+- **Boolean/Adoption metrics** (e.g., feature presence, file exists): Use timeseries only
+- **Continuous metrics** (e.g., page weight, load time): Use both histogram and timeseries
+- **Percentages/Rates**: Use timeseries only
+
+### Step 2: Define Your Metric
 
 Add your metric to the `_metrics` object in `includes/reports.js`:
 
@@ -169,13 +188,62 @@ Your SQL template receives these parameters:
 - `timestamp` - Unix timestamp in milliseconds
 - `p10`, `p25`, `p50`, `p75`, `p90` - Percentile values
 
+### Required SQL Patterns
+
+Every metric query **MUST** include these patterns:
+
+```sql
+-- Skeleton shared by every metric query (SELECT list omitted)
+FROM ${ctx.ref('crawl', 'pages')}  -- Proper table reference
+WHERE
+  date = '${params.date}'          -- Date filter
+  AND is_root_page                 -- Root page filter
+  ${params.lens.sql}               -- Lens filtering
+  ${params.devRankFilter}          -- Dev environment sampling
+GROUP BY client
+ORDER BY client
+```
+
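A quick sanity check for these patterns could look like the sketch below. `missingRequiredPatterns` is a hypothetical helper, not something the project provides, and it assumes it is given the rendered SQL string. It only looks for the literal fragments; it cannot confirm that the lens or dev rank filters were applied, since those expand to arbitrary SQL.

```javascript
// Hypothetical lint helper: reports which required fragments are absent
// from a rendered query string.
const REQUIRED_FRAGMENTS = [
  { name: 'date filter', pattern: /\bdate\s*=\s*'/ },
  { name: 'root page filter', pattern: /\bis_root_page\b/ },
  { name: 'GROUP BY client', pattern: /\bGROUP\s+BY\s+client\b/i },
  { name: 'ORDER BY client', pattern: /\bORDER\s+BY\s+client\b/i },
];

function missingRequiredPatterns(renderedSql) {
  return REQUIRED_FRAGMENTS
    .filter(({ pattern }) => !pattern.test(renderedSql))
    .map(({ name }) => name);
}
```

An empty array means every fragment was found; any returned names point at the parts of the skeleton that still need to be added.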
+### SQL Pattern Reference
+
+#### Adoption/Percentage Metrics (Timeseries)
+
+```sql
+ROUND(SAFE_DIVIDE(
+  COUNTIF(condition),
+  COUNT(0)
+) * 100, 2) AS pct_pages
+```
+
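To make the arithmetic concrete, here is a rough JavaScript equivalent of the expression above. `adoptionPct` and the `valid` field are hypothetical names used only for illustration; the real computation runs in SQL.

```javascript
// Mirrors ROUND(SAFE_DIVIDE(COUNTIF(condition), COUNT(0)) * 100, 2).
function adoptionPct(pages, condition) {
  const total = pages.length;
  if (total === 0) {
    return null; // SAFE_DIVIDE yields NULL instead of failing on division by zero
  }
  const matching = pages.filter(condition).length; // COUNTIF(condition)
  return Math.round((matching / total) * 100 * 100) / 100; // two decimal places
}

const pages = [{ valid: true }, { valid: false }, { valid: true }];
const pct = adoptionPct(pages, (page) => page.valid === true);
```

Two of the three pages match here, so `pct` is 66.67. An empty input returns `null` rather than throwing.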
+#### Percentile Distributions (Timeseries)
+
+```sql
+ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(101)] / 1024, 2) AS p10,
+ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(251)] / 1024, 2) AS p25,
+ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(501)] / 1024, 2) AS p50,
+ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(751)] / 1024, 2) AS p75,
+ROUND(APPROX_QUANTILES(FLOAT64(metric), 1001)[OFFSET(901)] / 1024, 2) AS p90
+-- Important: Add WHERE condition: AND FLOAT64(metric) > 0 for continuous metrics
+```
+
+#### Distribution Binning (Histogram)
+
+```sql
+-- Innermost subquery:
+CAST(FLOOR(FLOAT64(metric) / bin_size) * bin_size AS INT64) AS bin,
+COUNT(0) AS volume
+-- Wrap with: volume / SUM(volume) OVER (PARTITION BY client) AS pdf
+-- Wrap with: SUM(pdf) OVER (PARTITION BY client ORDER BY bin) AS cdf
+```
+
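The binning and the two window functions can be traced with a small in-memory sketch. `histogram` is a hypothetical function written for illustration; it handles a single client only (no `PARTITION BY client`) and assumes non-positive values are dropped, in line with the `> 0` filter recommended for continuous metrics.

```javascript
// Computes bin, volume, pdf and cdf rows for one client's values.
function histogram(values, binSize) {
  const volumes = new Map();
  for (const value of values) {
    if (!(value > 0)) continue; // skip zero, negative, and NaN values
    const bin = Math.floor(value / binSize) * binSize; // FLOOR(metric / bin_size) * bin_size
    volumes.set(bin, (volumes.get(bin) || 0) + 1);     // COUNT(0) AS volume
  }
  const total = [...volumes.values()].reduce((sum, v) => sum + v, 0);
  let cdf = 0;
  return [...volumes.keys()]
    .sort((a, b) => a - b) // ORDER BY bin
    .map((bin) => {
      const volume = volumes.get(bin);
      const pdf = volume / total; // share of pages in this bin
      cdf += pdf;                 // running total of pdf
      return { bin, volume, pdf, cdf };
    });
}

const rows = histogram([10, 20, 60, 120, 0], 50);
```

With a bin size of 50, the values 10 and 20 fall into bin 0, 60 into bin 50, and 120 into bin 100; the zero is discarded, and the final row's `cdf` reaches 1.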
 ### Best Practices
 
 1. **Filter root pages**: Always include `AND is_root_page` unless you specifically need all pages
 2. **Handle null values**: Use appropriate null checks and filtering
 3. **Use consistent binning**: For histograms, use logical bin sizes (e.g., 100KB increments for page weight)
 4. **Optimize performance**: Use appropriate WHERE clauses and avoid expensive operations
 5. **Test with dev filters**: Your queries should work with the development rank filter
+6. **Use safe functions**: `SAFE.BOOL()` for custom metrics, `SAFE_DIVIDE()` for percentages
 
 ## Lenses
 
@@ -256,7 +324,46 @@ const EXPORT_CONFIG = {
 
 ## Examples
 
-### Adding a JavaScript Bundle Size Metric
+### Example 1: Adding an Adoption Metric (Boolean/Presence)
+
+For metrics that track whether a feature/file exists (present or not present), use **timeseries only**:
+
+```javascript
+llmsTxtAdoption: {
+  SQL: [
+    {
+      type: 'timeseries',
+      query: DataformTemplateBuilder.create((ctx, params) => `
+        SELECT
+          client,
+          ROUND(SAFE_DIVIDE(
+            COUNTIF(SAFE.BOOL(custom_metrics.other.llms_txt_validation.valid)),
+            COUNT(0)
+          ) * 100, 2) AS pct_pages
+        FROM ${ctx.ref('crawl', 'pages')}
+        WHERE
+          date = '${params.date}'
+          AND is_root_page
+          ${params.lens.sql}
+          ${params.devRankFilter}
+        GROUP BY client
+        ORDER BY client
+      `)
+    }
+  ]
+}
+```
+
+**Key points:**
+
+- Uses `SAFE_DIVIDE()` to avoid division by zero
+- Uses `SAFE.BOOL()` so that a missing or non-boolean custom metric value yields NULL instead of an error
+- Returns `pct_pages` as the adoption percentage
+- No histogram - boolean metrics don't have meaningful distributions
+
+### Example 2: Adding a Continuous Metric (Histogram + Timeseries)
+
+For metrics with continuous values (bytes, time, count), use both histogram and timeseries:
 
 ```javascript
 jsBytes: {
@@ -332,4 +439,10 @@ jsBytes: {
 }
 ```
 
-This would automatically generate reports for JavaScript bundle sizes across all lenses and the configured date range.
+**Key points:**
+
+- Histogram shows distribution across bins (50KB increments)
+- Timeseries shows percentiles over time (p10, p25, p50, p75, p90)
+- Both queries filter out zero values: `AND INT64(summary.bytesJS) > 0`
+- Uses nested CTEs for clear structure
+- Automatically generates reports for JavaScript bundle sizes across all lenses and the configured date range