Skip to content

Add same-metrics dataset for time series UI test scenarios#7

Open
kpatticha wants to merge 12 commits intosimianhacker:mainfrom
kpatticha:add-same-metric-dataset
Open

Add same-metrics dataset for time series UI test scenarios#7
kpatticha wants to merge 12 commits intosimianhacker:mainfrom
kpatticha:add-same-metric-dataset

Conversation

@kpatticha
Copy link
Copy Markdown
Contributor

@kpatticha kpatticha commented Feb 26, 2026

Add same-metrics and edge-cases datasets for time series UI test scenarios

Summary

Introduces two new datasets for testing the Metrics experience in Discover:

  1. same-metrics -- 9 scenario pairs (15 data streams) where each pair shares the metric name request_duration but differs along one axis (dimensions, ES metric type, field type, histogram variant, or unit).
  2. edge-cases -- 6 data streams that each undergo a mid-backfill rollover to change their mapping, simulating real-world metric type, field type, or unit updates over time.

Build first

npm run build

Dataset: same-metrics

  • CLI: ./forge --dataset same-metrics [--interval 1m] [--backfill now-1h] [--purge]
  • --count is ignored; all 15 streams are always populated.
# Scenario Stream A Stream B What differs
1 Identical (baseline) same-metric-identical-a same-metric-identical-b Nothing (gauge, double, ms, host.name)
2 Different dimension keys same-metric-different-dims-host-region same-metric-different-dims-host-env dimension_keys (2 vs 3, sharing host.name + region)
3 Different ES metric type same-metric-different-estype-gauge same-metric-different-estype-counter metric type + field type
4 Different ES metric type (histogram) same-metric-different-estype-histogram same-metric-different-estype-gauge (shared) metric type
5 Different field type same-metric-different-fieldtype-long same-metric-different-fieldtype-double field type
6 Different histogram (tdigest vs exponential) same-metric-different-histogram-tdigest same-metric-different-histogram-exponential field type
7 Different histogram (tdigest vs legacy) same-metric-different-histogram-tdigest (shared) same-metric-different-histogram-legacy field type
8 Different unit (null vs ms) same-metric-different-unit-null same-metric-different-unit-ms unit
9 Different unit (ms vs s) same-metric-different-unit-ms (shared) same-metric-different-unit-seconds unit

Shared metric: metric.name = request_duration, field request_duration. Each document includes test.scenario and test.data_stream for filtering in Discover.

Dataset: edge-cases

  • CLI: ./forge --dataset edge-cases [--interval 1m] [--purge]
  • --count and --backfill are ignored; the dataset generates a fixed 10-minute window (5 min per phase).

Each stream starts with one mapping (phase 1), then the index template is updated, the data stream is rolled over, and data continues with the new mapping (phase 2). Each stream uses its stream name as both metric.name and the metric field name.

Stream Phase 1 Phase 2 (after rollover)
edge-case-gauge-to-counter gauge / long counter / double
edge-case-histogram-to-gauge histogram (legacy) gauge / double
edge-case-histogram-to-tdigest histogram (legacy) tdigest
edge-case-tdigest-to-exponential tdigest exponential_histogram
edge-case-unit-null-to-ms unit: null unit: ms
edge-case-unit-ms-to-s unit: ms unit: s

Other changes

  • Extracted shared histogram value builders (buildHistogramValue, buildTdigestValue, buildExponentialHistogramValue) into src/formatters/histogram-helpers.ts, used by both datasets.
  • Made runBackfillStream and runRealTimeStream in BaseSimulator protected with optional start/end params so subclasses can implement multi-phase backfill.

This PR was prepared with AI assistance.

Summary by CodeRabbit

  • New Features

    • Added two datasets: "same-metrics" (9 scenario pairs / 15 streams) and "edge-cases" (6 streams with a mid-backfill rollover and fixed 10‑minute window).
  • Enhancements

    • New simulators with per-stream backfill/real-time flows, template setup and rollover, and support for histogram/tdigest/exponential-histogram metric representations.
  • Documentation

    • README usage sections, example commands, dataset/stream tables, and field guidance added.
  • Chores

    • README formatting, CLI/help wording, Docker Compose quoting, and minor example corrections.

- Introduced a new dataset for generating time series test scenarios with the same metric in two streams, supporting different units and dimensions.
- Implemented the SameMetricsSimulator, SameMetricsMetricsGenerator, and SameMetricsFormatter to handle the new dataset.
- Added configuration and template generation for same metrics data streams.
- Updated README.md to include details about the new dataset and usage examples.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds two new datasets, "same-metrics" and "edge-cases": types, config/metrics generators, formatters, template builders, simulators, histogram/value helpers, CLI integration, README updates, and visibility changes to base simulator backfill/real-time methods.

Changes

Cohort / File(s) Summary
Documentation
README.md
Adds usage/docs for same-metrics and edge-cases, CLI examples, scenario/rollover tables, minor Markdown/YAML formatting fixes, example value correction, and ensures EOF newline.
CLI / Entry
src/index.ts
Registers same-metrics and edge-cases, expands dataset validation and purge logic to handle dataset-specific datastream ID sets, instantiates new simulators, and updates logging/tracing span handling.
Histogram / formatter helpers
src/formatters/histogram-helpers.ts
New histogram/tdigest/exponential-histogram builders and formatValueForFieldType dispatcher used by formatters.
Same-metrics types & formatter
src/types/same-metrics-types.ts, src/formatters/same-metrics-formatter.ts
Adds SameMetrics type definitions and formatter that builds SameMetricsDocument with typed metric value, @timestamp, metric.name, test.*, and optional dimensions.
Same-metrics templates & generators
src/simulators/same-metrics-template-builder.ts, src/simulators/same-metrics-config-generator.ts, src/simulators/same-metrics-metrics-generator.ts
Adds STREAM_TEMPLATE_CONFIGS, DATA_STREAM_IDS, index template builder, config generator, and deterministic per-minute metrics generator for each data stream.
Same-metrics simulator
src/simulators/same-metrics-simulator.ts
New simulator wiring config/metrics generators and formatter, builds templates (putIndexTemplate), computes index names and create ops, and instruments template setup with spans and error handling.
Edge-cases types & formatter
src/types/edge-cases-types.ts, src/formatters/edge-cases-formatter.ts
Adds EdgeCases types and formatter that selects phase-specific field typing, formats metric value per field type, and merges dimensions.
Edge-cases templates & helpers
src/simulators/edge-cases-template-builder.ts
Adds two-phase stream configs (EDGE_CASE_STREAMS), EDGE_CASE_STREAM_IDS, getStreamConfig/getPhaseConfig, and phase-aware buildIndexTemplate producing time-bounded templates.
Edge-cases generators & config
src/simulators/edge-cases-config-generator.ts, src/simulators/edge-cases-metrics-generator.ts
Adds EdgeCasesConfigGenerator and EdgeCasesMetricsGenerator (midpoint-aware) to generate phase-aware metrics and optional dimensions.
Edge-cases simulator
src/simulators/edge-cases-simulator.ts
New simulator implementing two-phase template setup, rollover/rollout flow, midpoint coordination, per-phase backfills, template creation, and optional real-time run with spans and error propagation.
Base simulator visibility
src/simulators/base-simulator.ts
Made runBackfillStream/runRealTimeStream protected; runBackfillStream now accepts optional startTime/endTime and uses provided window when present.
Other simulators (request shape fix)
src/simulators/histograms-simulator.ts, src/simulators/weather-simulator.ts
Changed indices.putIndexTemplate calls to spread the index template into the request body (payload shape adjustment).
Small edits / formatting
(various)
Minor CLI string/quote adjustments, README code-block spacing, Docker Compose command formatting normalization, and other cosmetic tweaks.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/User
    participant Simulator as Simulator (SameMetrics / EdgeCases)
    participant ConfigGen as ConfigGenerator
    participant MetricsGen as MetricsGenerator
    participant Formatter as Formatter
    participant ES as Elasticsearch
    participant Tracer as Tracer

    CLI->>Simulator: start(dataset, options)
    Simulator->>Tracer: start span (setup)
    Simulator->>ConfigGen: generateConfig(entityId)
    ConfigGen-->>Simulator: Config
    loop per-timestamp / per-phase
      Simulator->>MetricsGen: generateMetrics(config, timestamp)
      MetricsGen-->>Simulator: Metrics
      Simulator->>Formatter: formatMetrics(metrics)
      Formatter-->>Simulator: FormatterResult[documents]
      Simulator->>ES: putIndexTemplate / rollover (per stream)
      ES-->>Simulator: ack
      Simulator->>ES: bulk index documents
      ES-->>Simulator: ack
    end
    Simulator->>Tracer: end span
    Simulator-->>CLI: stop/complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hopped through configs, templates, and streams,
I counted each minute and shaped histogram dreams,
I split phases, rolled indices, and set midpoints with cheer,
Two datasets now hum — same-metrics and edge-cases appear,
Hop, hop — new scenarios ready to steer!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a new 'same-metrics' dataset for time series UI test scenarios.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/types/same-metrics-types.ts (1)

11-15: Consider making dimension fields optional for flexibility.

As noted in the config generator, SameMetricsDimensions is used with partial data (e.g., only host.name for some streams). Making these fields optional would align with actual usage:

 export interface SameMetricsDimensions {
-  'host.name': string;
-  region: string;
-  environment: string;
+  'host.name'?: string;
+  region?: string;
+  environment?: string;
 }

This mirrors the optional fields already defined in SameMetricsDocument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types/same-metrics-types.ts` around lines 11 - 15, Make the three
properties on the SameMetricsDimensions interface optional to match actual usage
and the existing SameMetricsDocument shape: update the SameMetricsDimensions
declaration so 'host.name', region, and environment are optional (e.g., mark
them as optional properties) to allow partial dimension objects when only some
fields (like host.name) are present.
src/formatters/same-metrics-formatter.ts (1)

18-22: Type-safe dimension assignment possible.

Since SameMetricsDimensions has fixed keys (host.name, region, environment) that match the optional fields in SameMetricsDocument, you could avoid the double cast by spreading or directly assigning:

-    if (metrics.dimensions) {
-      for (const [key, value] of Object.entries(metrics.dimensions)) {
-        (doc as unknown as Record<string, unknown>)[key] = value;
-      }
-    }
+    if (metrics.dimensions) {
+      Object.assign(doc, metrics.dimensions);
+    }

The current implementation works correctly, so this is purely a readability refinement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/formatters/same-metrics-formatter.ts` around lines 18 - 22, The loop that
assigns metrics.dimensions into the document uses a double-cast and can be
simplified for type-safety and readability: replace the manual for-loop that
iterates over metrics.dimensions (in same-metrics-formatter.ts) with a direct,
type-safe assignment or spread so SameMetricsDimensions (keys: "host.name",
"region", "environment") are merged into the SameMetricsDocument without
casting; specifically, merge metrics.dimensions into the existing doc object
(e.g., Object.assign(doc, metrics.dimensions) or doc = { ...doc,
...metrics.dimensions }) ensuring types align with
SameMetricsDocument/SameMetricsDimensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/simulators/same-metrics-config-generator.ts`:
- Around line 25-31: The function buildDimensionsFromNames constructs a partial
dimensions object but unsafely casts it to SameMetricsDimensions; change the
types or the casting so callers can't assume all fields exist: either update
SameMetricsDimensions to make fields optional (host.name, region, environment)
and adjust SameMetricsConfig.dimensions accordingly, or keep
SameMetricsDimensions strict and change buildDimensionsFromNames to return
Partial<SameMetricsDimensions> (and update any consumer types/usages). Locate
buildDimensionsFromNames, DEFAULT_DIMENSION_VALUES, and
SameMetricsConfig/SameMetricsDimensions in the types/diff and ensure the
returned type and downstream type annotations are updated consistently to avoid
the unsafe cast.

In `@src/simulators/same-metrics-simulator.ts`:
- Around line 104-107: Replace the invalid "body" parameter passed to the
Elasticsearch client's indices.putIndexTemplate calls with the flattened
"template" parameter expected by `@elastic/elasticsearch` v9.0.2; locate the
putIndexTemplate invocations in same-metrics-simulator (the call using
dataStream and template), and make the argument { name: dataStream, template:
template } instead of { name: dataStream, body: template }, and apply the same
change to the identical putIndexTemplate calls in weather-simulator.ts and
histograms-simulator.ts so all simulators use the "template" parameter.

---

Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 18-22: The loop that assigns metrics.dimensions into the document
uses a double-cast and can be simplified for type-safety and readability:
replace the manual for-loop that iterates over metrics.dimensions (in
same-metrics-formatter.ts) with a direct, type-safe assignment or spread so
SameMetricsDimensions (keys: "host.name", "region", "environment") are merged
into the SameMetricsDocument without casting; specifically, merge
metrics.dimensions into the existing doc object (e.g., Object.assign(doc,
metrics.dimensions) or doc = { ...doc, ...metrics.dimensions }) ensuring types
align with SameMetricsDocument/SameMetricsDimensions.

In `@src/types/same-metrics-types.ts`:
- Around line 11-15: Make the three properties on the SameMetricsDimensions
interface optional to match actual usage and the existing SameMetricsDocument
shape: update the SameMetricsDimensions declaration so 'host.name', region, and
environment are optional (e.g., mark them as optional properties) to allow
partial dimension objects when only some fields (like host.name) are present.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bd1080 and 33df8e4.

📒 Files selected for processing (8)
  • README.md
  • src/formatters/same-metrics-formatter.ts
  • src/index.ts
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-metrics-generator.ts
  • src/simulators/same-metrics-simulator.ts
  • src/simulators/same-metrics-template-builder.ts
  • src/types/same-metrics-types.ts

Comment on lines +104 to +107
await this.elasticsearchClient.indices.putIndexTemplate({
name: dataStream,
body: template
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat package.json | jq '.dependencies["@elastic/elasticsearch"], .devDependencies["@elastic/elasticsearch"]'

Repository: simianhacker/simian-forge

Length of output: 82


🏁 Script executed:

rg -n "putIndexTemplate" --type ts -C 3

Repository: simianhacker/simian-forge

Length of output: 1642


🌐 Web query:

@elastic/elasticsearch 9.0.2 putIndexTemplate API documentation

💡 Result:

For @elastic/elasticsearch v9.0.2, the Node.js client method is:

  • client.indices.putIndexTemplate({ name, index_patterns, composed_of, template, data_stream, priority, version, _meta, ... }) (creates/updates a composable index template). [1]

Official docs:

  • JavaScript client API referenceclient.indices.putIndexTemplate (arguments + behavior). [1]
  • Elasticsearch v9 REST API → “Create or update an index template” (PUT /_index_template/{name}) with request/response examples. [2]

Example (Node.js):

await client.indices.putIndexTemplate({
  name: 'template_1',
  index_patterns: ['template*'],
  priority: 1,
  template: { settings: { number_of_shards: 2 } }
})

[1][2]


Use template parameter instead of body for v9.0.2 client compatibility.

The @elastic/elasticsearch v9.0.2 client uses flattened parameters. The body parameter is not valid; use template instead:

await this.elasticsearchClient.indices.putIndexTemplate({
  name: dataStream,
  template: template
});

This same issue appears in weather-simulator.ts (lines 82-85) and histograms-simulator.ts (lines 104-107) and should be fixed consistently across all simulators.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/same-metrics-simulator.ts` around lines 104 - 107, Replace the
invalid "body" parameter passed to the Elasticsearch client's
indices.putIndexTemplate calls with the flattened "template" parameter expected
by `@elastic/elasticsearch` v9.0.2; locate the putIndexTemplate invocations in
same-metrics-simulator (the call using dataStream and template), and make the
argument { name: dataStream, template: template } instead of { name: dataStream,
body: template }, and apply the same change to the identical putIndexTemplate
calls in weather-simulator.ts and histograms-simulator.ts so all simulators use
the "template" parameter.

@kpatticha kpatticha marked this pull request as draft March 10, 2026 15:26
@kpatticha
Copy link
Copy Markdown
Contributor Author

Moving it back to draft so I can refine the data and create a more realistic environment (ex now I just generate total_cost for all indices :D )

- Introduced a new dataset for generating time series data with edge cases, simulating real-world scenarios where metric types, field types, or units change mid-backfill.
- Implemented EdgeCasesSimulator, EdgeCasesMetricsGenerator, and EdgeCasesFormatter to manage the new dataset.
- Added configuration and template generation for edge case data streams.
- Updated README.md to include details about the new edge cases dataset and usage examples.
@kpatticha kpatticha marked this pull request as ready for review March 26, 2026 16:05
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/formatters/same-metrics-formatter.ts (1)

15-16: Inconsistent error handling: formatter silently defaults while template builder throws.

When metrics.dataStream is not found in STREAM_TEMPLATE_CONFIGS, this code silently falls back to "double". However, buildIndexTemplate in same-metrics-template-builder.ts (lines 145-147) throws an error for unknown data streams. This asymmetry could mask bugs if a new stream is added to the config but not properly exported.

Consider making the formatter fail explicitly for unknown streams:

♻️ Proposed fix for consistency
   formatMetrics(
     metrics: SameMetricsMetrics,
   ): FormatterResult<SameMetricsDocument>[] {
     const streamConfig = STREAM_TEMPLATE_CONFIGS[metrics.dataStream];
-    const fieldType = streamConfig?.metricFieldType ?? "double";
+    if (!streamConfig) {
+      throw new Error(`Unknown same-metrics data stream: ${metrics.dataStream}`);
+    }
+    const fieldType = streamConfig.metricFieldType;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/formatters/same-metrics-formatter.ts` around lines 15 - 16, The formatter
currently defaults to "double" when STREAM_TEMPLATE_CONFIGS[metrics.dataStream]
is missing, creating an asymmetry with buildIndexTemplate which throws for
unknown streams; update the formatter to check streamConfig (from
STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and throw a clear Error if
undefined (including the metrics.dataStream value) instead of silently using the
fallback, then derive fieldType from streamConfig.metricFieldType only after the
existence check; reference STREAM_TEMPLATE_CONFIGS, metrics.dataStream,
streamConfig, fieldType and the buildIndexTemplate behavior to keep the handling
consistent.
src/simulators/edge-cases-config-generator.ts (1)

5-15: Silent fallback to empty string may cause issues if dimensions expand.

Currently SHARED_DIMENSIONS only contains "host.name", which is covered by DEFAULT_DIMENSION_VALUES. However, if SHARED_DIMENSIONS is later extended with additional dimension names not present in DEFAULT_DIMENSION_VALUES, they will silently receive empty string values, which could produce invalid or confusing test data.

Consider logging a warning or throwing an error for unmapped dimensions:

♻️ Proposed defensive check
 function buildDimensionsFromNames(dimNames: string[]): EdgeCasesDimensions {
   const result: Record<string, string> = {};
   for (const name of dimNames) {
+    if (!(name in DEFAULT_DIMENSION_VALUES)) {
+      console.warn(`No default value for dimension "${name}", using empty string`);
+    }
     result[name] = DEFAULT_DIMENSION_VALUES[name] ?? "";
   }
   return result;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-config-generator.ts` around lines 5 - 15, The
buildDimensionsFromNames function silently uses "" for any dimNames not in
DEFAULT_DIMENSION_VALUES; update it to detect unmapped names (using
DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear warning so
missing defaults don't produce invalid test data; modify
buildDimensionsFromNames to iterate dimNames, and for each name if
DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw
(choose consistent behavior for the module) with a message referencing the
missing dimension name and SHARED_DIMENSIONS context, otherwise assign the
default value from DEFAULT_DIMENSION_VALUES.
src/simulators/edge-cases-template-builder.ts (1)

131-186: Consider whether test.scenario field is needed for parity with same-metrics.

The same-metrics formatter includes both test.scenario and test.data_stream fields (as documented in README Line 339), but this template only defines test.data_stream. While the README (Line 368) indicates metric.name and test.data_stream are sufficient for filtering edge cases, you may want to confirm this asymmetry is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-template-builder.ts` around lines 131 - 186, The
index template built by buildIndexTemplate currently defines "test.data_stream"
in the mappings but omits "test.scenario", causing asymmetry with the
same-metrics formatter; add a "test.scenario" property to the mappings
(properties object) with the same mapping as "test.data_stream" (e.g., { type:
"keyword" }) so edge-case documents have parity with same-metrics, and ensure
any consumers filtering on test.scenario will work consistently with the
existing metric.name and test.data_stream fields.
src/simulators/edge-cases-simulator.ts (1)

32-37: Unsafe double cast through unknown bypasses type checking.

The cast this.metricsGenerator as unknown as EdgeCasesMetricsGenerator loses compile-time safety. If the base class or createMetricsGenerator() changes, this would silently break.

Consider storing the generator instance before calling super() or adding a type guard.

Proposed refactor
 export class EdgeCasesSimulator extends BaseSimulator<
   EdgeCasesConfig,
   EdgeCasesMetrics,
   EdgeCasesDocument
 > {
   private formatter: EdgeCasesFormatter;
-  private edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator;
+  private readonly edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator;

   constructor(options: BaseSimulatorOptions) {
+    const metricsGen = new EdgeCasesMetricsGenerator();
     super(options);
     this.formatter = new EdgeCasesFormatter();
-    this.edgeCasesMetricsGenerator =
-      this.metricsGenerator as unknown as EdgeCasesMetricsGenerator;
+    this.edgeCasesMetricsGenerator = metricsGen;
   }

+  protected createMetricsGenerator(): MetricsGenerator<
+    EdgeCasesConfig,
+    EdgeCasesMetrics
+  > {
+    // Return the instance we already created, or create a new one if called before constructor completes
+    return this.edgeCasesMetricsGenerator ?? new EdgeCasesMetricsGenerator();
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-simulator.ts` around lines 32 - 37, The constructor
is using an unsafe double-cast this.metricsGenerator as unknown as
EdgeCasesMetricsGenerator which bypasses type checking; fix by ensuring a
properly typed metrics generator is created or validated before assignment:
either accept a typed generator in options or call/create the generator (via
createMetricsGenerator or similar) and store it in a local variable, perform a
runtime type guard/assertion that it implements EdgeCasesMetricsGenerator, then
assign that validated instance to this.edgeCasesMetricsGenerator; update the
constructor (and any factory that produces metricsGenerator) to avoid the
unknown cast and use a safe typed conversion or guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 195: Update the --dataset help text to include the missing "edge-cases"
option so the list matches the rest of the README and the code; specifically
edit the help string that currently ends with "same-metrics (default:
\"hosts\")" to read something like "same-metrics, edge-cases (default:
\"hosts\")" so the flag documentation includes the edge-cases dataset.

In `@src/formatters/histogram-helpers.ts`:
- Around line 27-46: The exponential histogram builder
buildExponentialHistogramValue computes base as Math.abs(value % 1000) which can
be 0 for value === 0 or multiples of 1000; change the base calculation to
include the same fallback used in buildHistogramValue/buildTdigestValue (e.g.,
use Math.abs(value % 1000) || 1) so sum, min, and max are never zero and the
histogram isn't degenerate.

In `@src/simulators/edge-cases-metrics-generator.ts`:
- Around line 9-13: counterValueForStream currently always reads
getStreamConfig(dataStream).phase1.baseValue so phase2 metrics will be wrong if
base values differ; change counterValueForStream to accept a phase parameter
(e.g., phase: 'phase1' | 'phase2') and obtain the base value via
getPhaseConfig(dataStream, phase).baseValue (or similar helper) instead of
hardcoding phase1, and then update the call site(s) in generateMetrics to pass
the current phase into counterValueForStream so the correct phase-specific
baseValue is used.

In `@src/types/same-metrics-types.ts`:
- Around line 34-41: SameMetricsDocument's request_duration is typed too
narrowly as number; update its type to allow the histogram/tdigest shapes
returned by formatValueForFieldType used by SameMetricsFormatter (e.g. change
request_duration to a union like number | Histogram | TDigest or a more
permissive unknown/object union) so callers of SameMetricsFormatter and any
consumers of SameMetricsDocument accept histogram objects returned by
formatValueForFieldType.

---

Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 15-16: The formatter currently defaults to "double" when
STREAM_TEMPLATE_CONFIGS[metrics.dataStream] is missing, creating an asymmetry
with buildIndexTemplate which throws for unknown streams; update the formatter
to check streamConfig (from STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and
throw a clear Error if undefined (including the metrics.dataStream value)
instead of silently using the fallback, then derive fieldType from
streamConfig.metricFieldType only after the existence check; reference
STREAM_TEMPLATE_CONFIGS, metrics.dataStream, streamConfig, fieldType and the
buildIndexTemplate behavior to keep the handling consistent.

In `@src/simulators/edge-cases-config-generator.ts`:
- Around line 5-15: The buildDimensionsFromNames function silently uses "" for
any dimNames not in DEFAULT_DIMENSION_VALUES; update it to detect unmapped names
(using DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear
warning so missing defaults don't produce invalid test data; modify
buildDimensionsFromNames to iterate dimNames, and for each name if
DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw
(choose consistent behavior for the module) with a message referencing the
missing dimension name and SHARED_DIMENSIONS context, otherwise assign the
default value from DEFAULT_DIMENSION_VALUES.

In `@src/simulators/edge-cases-simulator.ts`:
- Around line 32-37: The constructor is using an unsafe double-cast
this.metricsGenerator as unknown as EdgeCasesMetricsGenerator which bypasses
type checking; fix by ensuring a properly typed metrics generator is created or
validated before assignment: either accept a typed generator in options or
call/create the generator (via createMetricsGenerator or similar) and store it
in a local variable, perform a runtime type guard/assertion that it implements
EdgeCasesMetricsGenerator, then assign that validated instance to
this.edgeCasesMetricsGenerator; update the constructor (and any factory that
produces metricsGenerator) to avoid the unknown cast and use a safe typed
conversion or guard.

In `@src/simulators/edge-cases-template-builder.ts`:
- Around line 131-186: The index template built by buildIndexTemplate currently
defines "test.data_stream" in the mappings but omits "test.scenario", causing
asymmetry with the same-metrics formatter; add a "test.scenario" property to the
mappings (properties object) with the same mapping as "test.data_stream" (e.g.,
{ type: "keyword" }) so edge-case documents have parity with same-metrics, and
ensure any consumers filtering on test.scenario will work consistently with the
existing metric.name and test.data_stream fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06cd9dd4-dbdf-45dc-8925-1e1dceff9c07

📥 Commits

Reviewing files that changed from the base of the PR and between 33df8e4 and cbfeedd.

📒 Files selected for processing (16)
  • README.md
  • src/formatters/edge-cases-formatter.ts
  • src/formatters/histogram-helpers.ts
  • src/formatters/same-metrics-formatter.ts
  • src/index.ts
  • src/simulators/base-simulator.ts
  • src/simulators/edge-cases-config-generator.ts
  • src/simulators/edge-cases-metrics-generator.ts
  • src/simulators/edge-cases-simulator.ts
  • src/simulators/edge-cases-template-builder.ts
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-metrics-generator.ts
  • src/simulators/same-metrics-simulator.ts
  • src/simulators/same-metrics-template-builder.ts
  • src/types/edge-cases-types.ts
  • src/types/same-metrics-types.ts
✅ Files skipped from review due to trivial changes (1)
  • src/types/edge-cases-types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-simulator.ts

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 311-344: The README incorrectly states that `--backfill` is
ignored for the same-metrics dataset; update the documentation to say that
SameMetricsSimulator's setupElasticsearchTemplates() does use the `--backfill`
flag, while EdgeCasesSimulator ignores both `--count` and `--backfill` and
always generates the fixed 10-minute (5m per phase) window; mention that
`--count` is ignored for same-metrics (hardcoded 15 streams) and explicitly call
out the two simulators by name (SameMetricsSimulator and EdgeCasesSimulator) and
the method setupElasticsearchTemplates() so readers know which behavior applies
to which dataset.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4ab3ec5a-e90d-497e-a4cb-fe839ef71e4b

📥 Commits

Reviewing files that changed from the base of the PR and between cbfeedd and 358c167.

📒 Files selected for processing (4)
  • README.md
  • src/formatters/histogram-helpers.ts
  • src/formatters/same-metrics-formatter.ts
  • src/types/same-metrics-types.ts
✅ Files skipped from review due to trivial changes (2)
  • src/formatters/histogram-helpers.ts
  • src/types/same-metrics-types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/formatters/same-metrics-formatter.ts

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

35-39: ⚠️ Potential issue | 🟡 Minor

Update stale README sections to match the newly added dataset modules.

Overview/usage now documents same-metrics and edge-cases, but Key Features and Project Structure still describe the old module surface. This creates internal docs drift for contributors (e.g., missing new simulators/formatters/helpers).

📝 Suggested doc patch
- - **Multiple Datasets**: Host metrics, weather station data, and unique metrics generation
+ - **Multiple Datasets**: Host metrics, weather station data, unique metrics, histogram samples, same-metrics scenarios, and edge-case rollover scenarios
 src/
 ├── index.ts                      # Main CLI entry point
 ├── tracing.ts                   # OpenTelemetry tracing setup
 ├── types/
 │   ├── host-types.ts            # Host and metrics type definitions
 │   ├── machine-types.ts         # Cloud machine type specifications
 │   └── weather-types.ts         # Weather station type definitions
 ├── simulators/
 │   ├── host-simulator.ts        # Host metrics simulator orchestrator
 │   ├── host-generator.ts        # Host configuration generator
 │   ├── metrics-generator.ts     # Host metrics generation
 │   ├── weather-simulator.ts     # Weather station simulator orchestrator
 │   ├── weather-generator.ts     # Weather station configuration generator
-│   └── weather-metrics-generator.ts # Weather metrics generation
+│   ├── weather-metrics-generator.ts # Weather metrics generation
+│   ├── same-metrics-simulator.ts    # Same-metrics scenario simulator
+│   └── edge-cases-simulator.ts      # Edge-case rollover simulator
 └── formatters/
     ├── base-formatter.ts        # Common formatter functionality
     ├── otel-formatter.ts        # OpenTelemetry format converter
     ├── elastic-formatter.ts     # Elastic Metricbeat format converter
-    └── fieldsense-formatter.ts  # FieldSense weather format converter
+    ├── fieldsense-formatter.ts  # FieldSense weather format converter
+    ├── same-metrics-formatter.ts # Same-metrics formatter
+    ├── edge-cases-formatter.ts   # Edge-cases formatter
+    └── histogram-helpers.ts      # Shared histogram builders

Also applies to: 598-618

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 35 - 39, The README's "Key Features" and "Project
Structure" sections are stale and must be updated to include the newly added
dataset modules and helpers; edit those sections to list the new modules
"same-metrics" and "edge-cases" and ensure references to simulators, formatters,
and helpers (e.g., dataset modules, simulator functions, formatter adapters) are
present and consistent with the Overview/usage. Replace or augment the old
bullet points (Multiple Datasets, Histogram Field Types, Format Support,
Cardinality Testing, Realistic Data Generation) with updated entries that
mention the new dataset modules and any new formatters/simulators, and ensure
the Project Structure tree shows the new folders/files so contributors can find
the simulators/formatters/helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@README.md`:
- Around line 35-39: The README's "Key Features" and "Project Structure"
sections are stale and must be updated to include the newly added dataset
modules and helpers; edit those sections to list the new modules "same-metrics"
and "edge-cases" and ensure references to simulators, formatters, and helpers
(e.g., dataset modules, simulator functions, formatter adapters) are present and
consistent with the Overview/usage. Replace or augment the old bullet points
(Multiple Datasets, Histogram Field Types, Format Support, Cardinality Testing,
Realistic Data Generation) with updated entries that mention the new dataset
modules and any new formatters/simulators, and ensure the Project Structure tree
shows the new folders/files so contributors can find the
simulators/formatters/helpers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 217c889d-119f-4866-a28c-eea0fd870048

📥 Commits

Reviewing files that changed from the base of the PR and between 358c167 and 622d56d.

📒 Files selected for processing (2)
  • README.md
  • src/simulators/edge-cases-metrics-generator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/simulators/edge-cases-metrics-generator.ts

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/simulators/edge-cases-simulator.ts`:
- Line 98: The isRunning flag is set to true before phase 1 but never cleared on
the noRealtime path, on startup errors, or after a one-shot run ends; update the
startup routine where this.isRunning is set (the method that sets this.isRunning
= true before phase 1) to ensure you clear this.isRunning = false in all
early-exit paths: the noRealtime branch, any catch/error handlers during
startup, and after completing a one-shot run. Locate and update the same logic
referenced around lines 129-145 (the subsequent phase/cleanup code) to perform
the same reset so the instance accurately reports running state after failures
or one-shot completion.
- Around line 32-37: EdgeCasesSimulator relies on auto-generated IDs so retries
cause duplicate documents; disable retries for this deterministic fixture by
turning off retry logic in the simulator. In the EdgeCasesSimulator constructor
(and any simulator-level request execution path used by this class, e.g.,
methods named createSamples, performBulkInsert, or whatever calls the shared
metricsGenerator/operation executor), set the retry configuration to 0/false
(for example set options.retries = 0 or pass retry: false into the bulk/create
calls) or override the executor used by this.metricsGenerator to never retry;
alternatively, if you prefer idempotent writes instead, derive stable _id values
from document content before calling create.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92e5eb26-2622-4d96-b490-ee5872bc1db2

📥 Commits

Reviewing files that changed from the base of the PR and between 622d56d and 7681887.

📒 Files selected for processing (4)
  • src/simulators/edge-cases-simulator.ts
  • src/simulators/histograms-simulator.ts
  • src/simulators/same-metrics-simulator.ts
  • src/simulators/weather-simulator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/simulators/same-metrics-simulator.ts

async start(): Promise<boolean> {
return tracer.startActiveSpan("start", async (span) => {
try {
this.isRunning = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clear isRunning when startup ends without a live stream.

The flag is set before phase 1, but the noRealtime and error paths never reset it. After a one-shot run or failed startup the instance still reports itself as running.

🛠️ Proposed fix
         if (!this.noRealtime) {
           await this.runRealTimeStream();
+          this.isRunning = false;
           span.setStatus({ code: 1 });
           return false;
         } else {
+          this.isRunning = false;
           console.log(
             `${this.getSimulatorName()} backfill completed. Real-time generation disabled.`,
           );
           span.setStatus({ code: 1 });
           return true;
         }
       } catch (error) {
+        this.isRunning = false;
         console.error(`Error in ${this.getSimulatorName()}:`, error);

Also applies to: 129-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-simulator.ts` at line 98, The isRunning flag is set
to true before phase 1 but never cleared on the noRealtime path, on startup
errors, or after a one-shot run ends; update the startup routine where
this.isRunning is set (the method that sets this.isRunning = true before phase
1) to ensure you clear this.isRunning = false in all early-exit paths: the
noRealtime branch, any catch/error handlers during startup, and after completing
a one-shot run. Locate and update the same logic referenced around lines 129-145
(the subsequent phase/cleanup code) to perform the same reset so the instance
accurately reports running state after failures or one-shot completion.

Copy link
Copy Markdown

@lucaslopezf lucaslopezf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a quick look to keep consistency with the existing patterns, but I didn’t do a deep review honestly. I tested it locally and it works well, so LGTM. Just one nit comment

Comment on lines 2 to +3

import { Command } from 'commander';
import { initializeTracing } from './tracing';
import { HostSimulator } from './simulators/host-simulator';
import { WeatherSimulator } from './simulators/weather-simulator';
import { UniqueMetricsSimulator } from './simulators/unique-metrics-simulator';
import { HistogramsSimulator } from './simulators/histograms-simulator';
import { simianLogger } from './logger';
import { trace } from '@opentelemetry/api';
import { Client } from '@elastic/elasticsearch';
import { createElasticsearchClient } from './utils/elasticsearch-client';

const tracer = trace.getTracer('simian-forge');

async function purgeDataStreams(elasticsearchUrl: string, elasticsearchAuth: string, elasticsearchApiKey: string, dataset: string, format?: string, count?: number): Promise<void> {
return tracer.startActiveSpan('purgeDataStreams', async (span) => {
import { Command } from "commander";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid formatting issues in the future, we could add a .prettierrc file and a pre-commit hook, or something similar. Just like we do in Kibana, what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Let's do it in a separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants