Add same-metrics dataset for time series UI test scenarios#7
Add same-metrics dataset for time series UI test scenarios#7kpatticha wants to merge 12 commits intosimianhacker:mainfrom
Conversation
- Introduced a new dataset for generating time series test scenarios with the same metric in two streams, supporting different units and dimensions. - Implemented the SameMetricsSimulator, SameMetricsMetricsGenerator, and SameMetricsFormatter to handle the new dataset. - Added configuration and template generation for same metrics data streams. - Updated README.md to include details about the new dataset and usage examples.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds two new datasets, "same-metrics" and "edge-cases": types, config/metrics generators, formatters, template builders, simulators, histogram/value helpers, CLI integration, README updates, and visibility changes to base simulator backfill/real-time methods. Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI/User
participant Simulator as Simulator (SameMetrics / EdgeCases)
participant ConfigGen as ConfigGenerator
participant MetricsGen as MetricsGenerator
participant Formatter as Formatter
participant ES as Elasticsearch
participant Tracer as Tracer
CLI->>Simulator: start(dataset, options)
Simulator->>Tracer: start span (setup)
Simulator->>ConfigGen: generateConfig(entityId)
ConfigGen-->>Simulator: Config
loop per-timestamp / per-phase
Simulator->>MetricsGen: generateMetrics(config, timestamp)
MetricsGen-->>Simulator: Metrics
Simulator->>Formatter: formatMetrics(metrics)
Formatter-->>Simulator: FormatterResult[documents]
Simulator->>ES: putIndexTemplate / rollover (per stream)
ES-->>Simulator: ack
Simulator->>ES: bulk index documents
ES-->>Simulator: ack
end
Simulator->>Tracer: end span
Simulator-->>CLI: stop/complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/types/same-metrics-types.ts (1)
11-15: Consider making dimension fields optional for flexibility.As noted in the config generator,
SameMetricsDimensionsis used with partial data (e.g., onlyhost.namefor some streams). Making these fields optional would align with actual usage:export interface SameMetricsDimensions { - 'host.name': string; - region: string; - environment: string; + 'host.name'?: string; + region?: string; + environment?: string; }This mirrors the optional fields already defined in
SameMetricsDocument.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/types/same-metrics-types.ts` around lines 11 - 15, Make the three properties on the SameMetricsDimensions interface optional to match actual usage and the existing SameMetricsDocument shape: update the SameMetricsDimensions declaration so 'host.name', region, and environment are optional (e.g., mark them as optional properties) to allow partial dimension objects when only some fields (like host.name) are present.src/formatters/same-metrics-formatter.ts (1)
18-22: Type-safe dimension assignment possible.Since
SameMetricsDimensionshas fixed keys (host.name,region,environment) that match the optional fields inSameMetricsDocument, you could avoid the double cast by spreading or directly assigning:- if (metrics.dimensions) { - for (const [key, value] of Object.entries(metrics.dimensions)) { - (doc as unknown as Record<string, unknown>)[key] = value; - } - } + if (metrics.dimensions) { + Object.assign(doc, metrics.dimensions); + }The current implementation works correctly, so this is purely a readability refinement.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/formatters/same-metrics-formatter.ts` around lines 18 - 22, The loop that assigns metrics.dimensions into the document uses a double-cast and can be simplified for type-safety and readability: replace the manual for-loop that iterates over metrics.dimensions (in same-metrics-formatter.ts) with a direct, type-safe assignment or spread so SameMetricsDimensions (keys: "host.name", "region", "environment") are merged into the SameMetricsDocument without casting; specifically, merge metrics.dimensions into the existing doc object (e.g., Object.assign(doc, metrics.dimensions) or doc = { ...doc, ...metrics.dimensions }) ensuring types align with SameMetricsDocument/SameMetricsDimensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/simulators/same-metrics-config-generator.ts`:
- Around line 25-31: The function buildDimensionsFromNames constructs a partial
dimensions object but unsafely casts it to SameMetricsDimensions; change the
types or the casting so callers can't assume all fields exist: either update
SameMetricsDimensions to make fields optional (host.name, region, environment)
and adjust SameMetricsConfig.dimensions accordingly, or keep
SameMetricsDimensions strict and change buildDimensionsFromNames to return
Partial<SameMetricsDimensions> (and update any consumer types/usages). Locate
buildDimensionsFromNames, DEFAULT_DIMENSION_VALUES, and
SameMetricsConfig/SameMetricsDimensions in the types/diff and ensure the
returned type and downstream type annotations are updated consistently to avoid
the unsafe cast.
In `@src/simulators/same-metrics-simulator.ts`:
- Around line 104-107: Replace the invalid "body" parameter passed to the
Elasticsearch client's indices.putIndexTemplate calls with the flattened
"template" parameter expected by `@elastic/elasticsearch` v9.0.2; locate the
putIndexTemplate invocations in same-metrics-simulator (the call using
dataStream and template), and make the argument { name: dataStream, template:
template } instead of { name: dataStream, body: template }, and apply the same
change to the identical putIndexTemplate calls in weather-simulator.ts and
histograms-simulator.ts so all simulators use the "template" parameter.
---
Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 18-22: The loop that assigns metrics.dimensions into the document
uses a double-cast and can be simplified for type-safety and readability:
replace the manual for-loop that iterates over metrics.dimensions (in
same-metrics-formatter.ts) with a direct, type-safe assignment or spread so
SameMetricsDimensions (keys: "host.name", "region", "environment") are merged
into the SameMetricsDocument without casting; specifically, merge
metrics.dimensions into the existing doc object (e.g., Object.assign(doc,
metrics.dimensions) or doc = { ...doc, ...metrics.dimensions }) ensuring types
align with SameMetricsDocument/SameMetricsDimensions.
In `@src/types/same-metrics-types.ts`:
- Around line 11-15: Make the three properties on the SameMetricsDimensions
interface optional to match actual usage and the existing SameMetricsDocument
shape: update the SameMetricsDimensions declaration so 'host.name', region, and
environment are optional (e.g., mark them as optional properties) to allow
partial dimension objects when only some fields (like host.name) are present.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
README.mdsrc/formatters/same-metrics-formatter.tssrc/index.tssrc/simulators/same-metrics-config-generator.tssrc/simulators/same-metrics-metrics-generator.tssrc/simulators/same-metrics-simulator.tssrc/simulators/same-metrics-template-builder.tssrc/types/same-metrics-types.ts
| await this.elasticsearchClient.indices.putIndexTemplate({ | ||
| name: dataStream, | ||
| body: template | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat package.json | jq '.dependencies["@elastic/elasticsearch"], .devDependencies["@elastic/elasticsearch"]'Repository: simianhacker/simian-forge
Length of output: 82
🏁 Script executed:
rg -n "putIndexTemplate" --type ts -C 3Repository: simianhacker/simian-forge
Length of output: 1642
🌐 Web query:
@elastic/elasticsearch 9.0.2 putIndexTemplate API documentation
💡 Result:
For @elastic/elasticsearch v9.0.2, the Node.js client method is:
client.indices.putIndexTemplate({ name, index_patterns, composed_of, template, data_stream, priority, version, _meta, ... })(creates/updates a composable index template). [1]
Official docs:
- JavaScript client API reference →
client.indices.putIndexTemplate(arguments + behavior). [1] - Elasticsearch v9 REST API → “Create or update an index template” (
PUT /_index_template/{name}) with request/response examples. [2]
Example (Node.js):
await client.indices.putIndexTemplate({
name: 'template_1',
index_patterns: ['template*'],
priority: 1,
template: { settings: { number_of_shards: 2 } }
})[1][2]
Use template parameter instead of body for v9.0.2 client compatibility.
The @elastic/elasticsearch v9.0.2 client uses flattened parameters. The body parameter is not valid; use template instead:
await this.elasticsearchClient.indices.putIndexTemplate({
name: dataStream,
template: template
});This same issue appears in weather-simulator.ts (lines 82-85) and histograms-simulator.ts (lines 104-107) and should be fixed consistently across all simulators.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/simulators/same-metrics-simulator.ts` around lines 104 - 107, Replace the
invalid "body" parameter passed to the Elasticsearch client's
indices.putIndexTemplate calls with the flattened "template" parameter expected
by `@elastic/elasticsearch` v9.0.2; locate the putIndexTemplate invocations in
same-metrics-simulator (the call using dataStream and template), and make the
argument { name: dataStream, template: template } instead of { name: dataStream,
body: template }, and apply the same change to the identical putIndexTemplate
calls in weather-simulator.ts and histograms-simulator.ts so all simulators use
the "template" parameter.
|
Moving it back to draft so I can refine the data and create a more realistic environment (ex now I just generate total_cost for all indices :D ) |
- Introduced a new dataset for generating time series data with edge cases, simulating real-world scenarios where metric types, field types, or units change mid-backfill. - Implemented EdgeCasesSimulator, EdgeCasesMetricsGenerator, and EdgeCasesFormatter to manage the new dataset. - Added configuration and template generation for edge case data streams. - Updated README.md to include details about the new edge cases dataset and usage examples.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
src/formatters/same-metrics-formatter.ts (1)
15-16: Inconsistent error handling: formatter silently defaults while template builder throws.When
metrics.dataStreamis not found inSTREAM_TEMPLATE_CONFIGS, this code silently falls back to"double". However,buildIndexTemplateinsame-metrics-template-builder.ts(lines 145-147) throws an error for unknown data streams. This asymmetry could mask bugs if a new stream is added to the config but not properly exported.Consider making the formatter fail explicitly for unknown streams:
♻️ Proposed fix for consistency
formatMetrics( metrics: SameMetricsMetrics, ): FormatterResult<SameMetricsDocument>[] { const streamConfig = STREAM_TEMPLATE_CONFIGS[metrics.dataStream]; - const fieldType = streamConfig?.metricFieldType ?? "double"; + if (!streamConfig) { + throw new Error(`Unknown same-metrics data stream: ${metrics.dataStream}`); + } + const fieldType = streamConfig.metricFieldType;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/formatters/same-metrics-formatter.ts` around lines 15 - 16, The formatter currently defaults to "double" when STREAM_TEMPLATE_CONFIGS[metrics.dataStream] is missing, creating an asymmetry with buildIndexTemplate which throws for unknown streams; update the formatter to check streamConfig (from STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and throw a clear Error if undefined (including the metrics.dataStream value) instead of silently using the fallback, then derive fieldType from streamConfig.metricFieldType only after the existence check; reference STREAM_TEMPLATE_CONFIGS, metrics.dataStream, streamConfig, fieldType and the buildIndexTemplate behavior to keep the handling consistent.src/simulators/edge-cases-config-generator.ts (1)
5-15: Silent fallback to empty string may cause issues if dimensions expand.Currently
SHARED_DIMENSIONSonly contains"host.name", which is covered byDEFAULT_DIMENSION_VALUES. However, ifSHARED_DIMENSIONSis later extended with additional dimension names not present inDEFAULT_DIMENSION_VALUES, they will silently receive empty string values, which could produce invalid or confusing test data.Consider logging a warning or throwing an error for unmapped dimensions:
♻️ Proposed defensive check
function buildDimensionsFromNames(dimNames: string[]): EdgeCasesDimensions { const result: Record<string, string> = {}; for (const name of dimNames) { + if (!(name in DEFAULT_DIMENSION_VALUES)) { + console.warn(`No default value for dimension "${name}", using empty string`); + } result[name] = DEFAULT_DIMENSION_VALUES[name] ?? ""; } return result; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulators/edge-cases-config-generator.ts` around lines 5 - 15, The buildDimensionsFromNames function silently uses "" for any dimNames not in DEFAULT_DIMENSION_VALUES; update it to detect unmapped names (using DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear warning so missing defaults don't produce invalid test data; modify buildDimensionsFromNames to iterate dimNames, and for each name if DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw (choose consistent behavior for the module) with a message referencing the missing dimension name and SHARED_DIMENSIONS context, otherwise assign the default value from DEFAULT_DIMENSION_VALUES.src/simulators/edge-cases-template-builder.ts (1)
131-186: Consider whethertest.scenariofield is needed for parity with same-metrics.The
same-metricsformatter includes bothtest.scenarioandtest.data_streamfields (as documented in README Line 339), but this template only definestest.data_stream. While the README (Line 368) indicatesmetric.nameandtest.data_streamare sufficient for filtering edge cases, you may want to confirm this asymmetry is intentional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulators/edge-cases-template-builder.ts` around lines 131 - 186, The index template built by buildIndexTemplate currently defines "test.data_stream" in the mappings but omits "test.scenario", causing asymmetry with the same-metrics formatter; add a "test.scenario" property to the mappings (properties object) with the same mapping as "test.data_stream" (e.g., { type: "keyword" }) so edge-case documents have parity with same-metrics, and ensure any consumers filtering on test.scenario will work consistently with the existing metric.name and test.data_stream fields.src/simulators/edge-cases-simulator.ts (1)
32-37: Unsafe double cast throughunknownbypasses type checking.The cast
this.metricsGenerator as unknown as EdgeCasesMetricsGeneratorloses compile-time safety. If the base class orcreateMetricsGenerator()changes, this would silently break.Consider storing the generator instance before calling
super()or adding a type guard.Proposed refactor
export class EdgeCasesSimulator extends BaseSimulator< EdgeCasesConfig, EdgeCasesMetrics, EdgeCasesDocument > { private formatter: EdgeCasesFormatter; - private edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator; + private readonly edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator; constructor(options: BaseSimulatorOptions) { + const metricsGen = new EdgeCasesMetricsGenerator(); super(options); this.formatter = new EdgeCasesFormatter(); - this.edgeCasesMetricsGenerator = - this.metricsGenerator as unknown as EdgeCasesMetricsGenerator; + this.edgeCasesMetricsGenerator = metricsGen; } + protected createMetricsGenerator(): MetricsGenerator< + EdgeCasesConfig, + EdgeCasesMetrics + > { + // Return the instance we already created, or create a new one if called before constructor completes + return this.edgeCasesMetricsGenerator ?? new EdgeCasesMetricsGenerator(); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulators/edge-cases-simulator.ts` around lines 32 - 37, The constructor is using an unsafe double-cast this.metricsGenerator as unknown as EdgeCasesMetricsGenerator which bypasses type checking; fix by ensuring a properly typed metrics generator is created or validated before assignment: either accept a typed generator in options or call/create the generator (via createMetricsGenerator or similar) and store it in a local variable, perform a runtime type guard/assertion that it implements EdgeCasesMetricsGenerator, then assign that validated instance to this.edgeCasesMetricsGenerator; update the constructor (and any factory that produces metricsGenerator) to avoid the unknown cast and use a safe typed conversion or guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 195: Update the --dataset help text to include the missing "edge-cases"
option so the list matches the rest of the README and the code; specifically
edit the help string that currently ends with "same-metrics (default:
\"hosts\")" to read something like "same-metrics, edge-cases (default:
\"hosts\")" so the flag documentation includes the edge-cases dataset.
In `@src/formatters/histogram-helpers.ts`:
- Around line 27-46: The exponential histogram builder
buildExponentialHistogramValue computes base as Math.abs(value % 1000) which can
be 0 for value === 0 or multiples of 1000; change the base calculation to
include the same fallback used in buildHistogramValue/buildTdigestValue (e.g.,
use Math.abs(value % 1000) || 1) so sum, min, and max are never zero and the
histogram isn't degenerate.
In `@src/simulators/edge-cases-metrics-generator.ts`:
- Around line 9-13: counterValueForStream currently always reads
getStreamConfig(dataStream).phase1.baseValue so phase2 metrics will be wrong if
base values differ; change counterValueForStream to accept a phase parameter
(e.g., phase: 'phase1' | 'phase2') and obtain the base value via
getPhaseConfig(dataStream, phase).baseValue (or similar helper) instead of
hardcoding phase1, and then update the call site(s) in generateMetrics to pass
the current phase into counterValueForStream so the correct phase-specific
baseValue is used.
In `@src/types/same-metrics-types.ts`:
- Around line 34-41: SameMetricsDocument's request_duration is typed too
narrowly as number; update its type to allow the histogram/tdigest shapes
returned by formatValueForFieldType used by SameMetricsFormatter (e.g. change
request_duration to a union like number | Histogram | TDigest or a more
permissive unknown/object union) so callers of SameMetricsFormatter and any
consumers of SameMetricsDocument accept histogram objects returned by
formatValueForFieldType.
---
Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 15-16: The formatter currently defaults to "double" when
STREAM_TEMPLATE_CONFIGS[metrics.dataStream] is missing, creating an asymmetry
with buildIndexTemplate which throws for unknown streams; update the formatter
to check streamConfig (from STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and
throw a clear Error if undefined (including the metrics.dataStream value)
instead of silently using the fallback, then derive fieldType from
streamConfig.metricFieldType only after the existence check; reference
STREAM_TEMPLATE_CONFIGS, metrics.dataStream, streamConfig, fieldType and the
buildIndexTemplate behavior to keep the handling consistent.
In `@src/simulators/edge-cases-config-generator.ts`:
- Around line 5-15: The buildDimensionsFromNames function silently uses "" for
any dimNames not in DEFAULT_DIMENSION_VALUES; update it to detect unmapped names
(using DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear
warning so missing defaults don't produce invalid test data; modify
buildDimensionsFromNames to iterate dimNames, and for each name if
DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw
(choose consistent behavior for the module) with a message referencing the
missing dimension name and SHARED_DIMENSIONS context, otherwise assign the
default value from DEFAULT_DIMENSION_VALUES.
In `@src/simulators/edge-cases-simulator.ts`:
- Around line 32-37: The constructor is using an unsafe double-cast
this.metricsGenerator as unknown as EdgeCasesMetricsGenerator which bypasses
type checking; fix by ensuring a properly typed metrics generator is created or
validated before assignment: either accept a typed generator in options or
call/create the generator (via createMetricsGenerator or similar) and store it
in a local variable, perform a runtime type guard/assertion that it implements
EdgeCasesMetricsGenerator, then assign that validated instance to
this.edgeCasesMetricsGenerator; update the constructor (and any factory that
produces metricsGenerator) to avoid the unknown cast and use a safe typed
conversion or guard.
In `@src/simulators/edge-cases-template-builder.ts`:
- Around line 131-186: The index template built by buildIndexTemplate currently
defines "test.data_stream" in the mappings but omits "test.scenario", causing
asymmetry with the same-metrics formatter; add a "test.scenario" property to the
mappings (properties object) with the same mapping as "test.data_stream" (e.g.,
{ type: "keyword" }) so edge-case documents have parity with same-metrics, and
ensure any consumers filtering on test.scenario will work consistently with the
existing metric.name and test.data_stream fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 06cd9dd4-dbdf-45dc-8925-1e1dceff9c07
📒 Files selected for processing (16)
README.mdsrc/formatters/edge-cases-formatter.tssrc/formatters/histogram-helpers.tssrc/formatters/same-metrics-formatter.tssrc/index.tssrc/simulators/base-simulator.tssrc/simulators/edge-cases-config-generator.tssrc/simulators/edge-cases-metrics-generator.tssrc/simulators/edge-cases-simulator.tssrc/simulators/edge-cases-template-builder.tssrc/simulators/same-metrics-config-generator.tssrc/simulators/same-metrics-metrics-generator.tssrc/simulators/same-metrics-simulator.tssrc/simulators/same-metrics-template-builder.tssrc/types/edge-cases-types.tssrc/types/same-metrics-types.ts
✅ Files skipped from review due to trivial changes (1)
- src/types/edge-cases-types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/simulators/same-metrics-config-generator.ts
- src/simulators/same-metrics-simulator.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 311-344: The README incorrectly states that `--backfill` is
ignored for the same-metrics dataset; update the documentation to say that
SameMetricsSimulator's setupElasticsearchTemplates() does use the `--backfill`
flag, while EdgeCasesSimulator ignores both `--count` and `--backfill` and
always generates the fixed 10-minute (5m per phase) window; mention that
`--count` is ignored for same-metrics (hardcoded 15 streams) and explicitly call
out the two simulators by name (SameMetricsSimulator and EdgeCasesSimulator) and
the method setupElasticsearchTemplates() so readers know which behavior applies
to which dataset.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4ab3ec5a-e90d-497e-a4cb-fe839ef71e4b
📒 Files selected for processing (4)
README.mdsrc/formatters/histogram-helpers.tssrc/formatters/same-metrics-formatter.tssrc/types/same-metrics-types.ts
✅ Files skipped from review due to trivial changes (2)
- src/formatters/histogram-helpers.ts
- src/types/same-metrics-types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/formatters/same-metrics-formatter.ts
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
35-39:⚠️ Potential issue | 🟡 MinorUpdate stale README sections to match the newly added dataset modules.
Overview/usagenow documentssame-metricsandedge-cases, butKey FeaturesandProject Structurestill describe the old module surface. This creates internal docs drift for contributors (e.g., missing new simulators/formatters/helpers).📝 Suggested doc patch
- - **Multiple Datasets**: Host metrics, weather station data, and unique metrics generation + - **Multiple Datasets**: Host metrics, weather station data, unique metrics, histogram samples, same-metrics scenarios, and edge-case rollover scenariossrc/ ├── index.ts # Main CLI entry point ├── tracing.ts # OpenTelemetry tracing setup ├── types/ │ ├── host-types.ts # Host and metrics type definitions │ ├── machine-types.ts # Cloud machine type specifications │ └── weather-types.ts # Weather station type definitions ├── simulators/ │ ├── host-simulator.ts # Host metrics simulator orchestrator │ ├── host-generator.ts # Host configuration generator │ ├── metrics-generator.ts # Host metrics generation │ ├── weather-simulator.ts # Weather station simulator orchestrator │ ├── weather-generator.ts # Weather station configuration generator -│ └── weather-metrics-generator.ts # Weather metrics generation +│ ├── weather-metrics-generator.ts # Weather metrics generation +│ ├── same-metrics-simulator.ts # Same-metrics scenario simulator +│ └── edge-cases-simulator.ts # Edge-case rollover simulator └── formatters/ ├── base-formatter.ts # Common formatter functionality ├── otel-formatter.ts # OpenTelemetry format converter ├── elastic-formatter.ts # Elastic Metricbeat format converter - └── fieldsense-formatter.ts # FieldSense weather format converter + ├── fieldsense-formatter.ts # FieldSense weather format converter + ├── same-metrics-formatter.ts # Same-metrics formatter + ├── edge-cases-formatter.ts # Edge-cases formatter + └── histogram-helpers.ts # Shared histogram buildersAlso applies to: 598-618
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 35 - 39, The README's "Key Features" and "Project Structure" sections are stale and must be updated to include the newly added dataset modules and helpers; edit those sections to list the new modules "same-metrics" and "edge-cases" and ensure references to simulators, formatters, and helpers (e.g., dataset modules, simulator functions, formatter adapters) are present and consistent with the Overview/usage. Replace or augment the old bullet points (Multiple Datasets, Histogram Field Types, Format Support, Cardinality Testing, Realistic Data Generation) with updated entries that mention the new dataset modules and any new formatters/simulators, and ensure the Project Structure tree shows the new folders/files so contributors can find the simulators/formatters/helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@README.md`:
- Around line 35-39: The README's "Key Features" and "Project Structure"
sections are stale and must be updated to include the newly added dataset
modules and helpers; edit those sections to list the new modules "same-metrics"
and "edge-cases" and ensure references to simulators, formatters, and helpers
(e.g., dataset modules, simulator functions, formatter adapters) are present and
consistent with the Overview/usage. Replace or augment the old bullet points
(Multiple Datasets, Histogram Field Types, Format Support, Cardinality Testing,
Realistic Data Generation) with updated entries that mention the new dataset
modules and any new formatters/simulators, and ensure the Project Structure tree
shows the new folders/files so contributors can find the
simulators/formatters/helpers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 217c889d-119f-4866-a28c-eea0fd870048
📒 Files selected for processing (2)
README.mdsrc/simulators/edge-cases-metrics-generator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/simulators/edge-cases-metrics-generator.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/simulators/edge-cases-simulator.ts`:
- Line 98: The isRunning flag is set to true before phase 1 but never cleared on
the noRealtime path, on startup errors, or after a one-shot run ends; update the
startup routine where this.isRunning is set (the method that sets this.isRunning
= true before phase 1) to ensure you clear this.isRunning = false in all
early-exit paths: the noRealtime branch, any catch/error handlers during
startup, and after completing a one-shot run. Locate and update the same logic
referenced around lines 129-145 (the subsequent phase/cleanup code) to perform
the same reset so the instance accurately reports running state after failures
or one-shot completion.
- Around line 32-37: EdgeCasesSimulator relies on auto-generated IDs so retries
cause duplicate documents; disable retries for this deterministic fixture by
turning off retry logic in the simulator. In the EdgeCasesSimulator constructor
(and any simulator-level request execution path used by this class, e.g.,
methods named createSamples, performBulkInsert, or whatever calls the shared
metricsGenerator/operation executor), set the retry configuration to 0/false
(for example set options.retries = 0 or pass retry: false into the bulk/create
calls) or override the executor used by this.metricsGenerator to never retry;
alternatively, if you prefer idempotent writes instead, derive stable _id values
from document content before calling create.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 92e5eb26-2622-4d96-b490-ee5872bc1db2
📒 Files selected for processing (4)
src/simulators/edge-cases-simulator.tssrc/simulators/histograms-simulator.tssrc/simulators/same-metrics-simulator.tssrc/simulators/weather-simulator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/simulators/same-metrics-simulator.ts
| async start(): Promise<boolean> { | ||
| return tracer.startActiveSpan("start", async (span) => { | ||
| try { | ||
| this.isRunning = true; |
There was a problem hiding this comment.
Clear isRunning when startup ends without a live stream.
The flag is set before phase 1, but the noRealtime and error paths never reset it. After a one-shot run or failed startup the instance still reports itself as running.
🛠️ Proposed fix
if (!this.noRealtime) {
await this.runRealTimeStream();
+ this.isRunning = false;
span.setStatus({ code: 1 });
return false;
} else {
+ this.isRunning = false;
console.log(
`${this.getSimulatorName()} backfill completed. Real-time generation disabled.`,
);
span.setStatus({ code: 1 });
return true;
}
} catch (error) {
+ this.isRunning = false;
console.error(`Error in ${this.getSimulatorName()}:`, error);Also applies to: 129-145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/simulators/edge-cases-simulator.ts` at line 98, The isRunning flag is set
to true before phase 1 but never cleared on the noRealtime path, on startup
errors, or after a one-shot run ends; update the startup routine where
this.isRunning is set (the method that sets this.isRunning = true before phase
1) to ensure you clear this.isRunning = false in all early-exit paths: the
noRealtime branch, any catch/error handlers during startup, and after completing
a one-shot run. Locate and update the same logic referenced around lines 129-145
(the subsequent phase/cleanup code) to perform the same reset so the instance
accurately reports running state after failures or one-shot completion.
lucaslopezf
left a comment
There was a problem hiding this comment.
I gave it a quick look to keep consistency with the existing patterns, but I didn’t do a deep review honestly. I tested it locally and it works well, so LGTM. Just one nit comment
|
|
||
| import { Command } from 'commander'; | ||
| import { initializeTracing } from './tracing'; | ||
| import { HostSimulator } from './simulators/host-simulator'; | ||
| import { WeatherSimulator } from './simulators/weather-simulator'; | ||
| import { UniqueMetricsSimulator } from './simulators/unique-metrics-simulator'; | ||
| import { HistogramsSimulator } from './simulators/histograms-simulator'; | ||
| import { simianLogger } from './logger'; | ||
| import { trace } from '@opentelemetry/api'; | ||
| import { Client } from '@elastic/elasticsearch'; | ||
| import { createElasticsearchClient } from './utils/elasticsearch-client'; | ||
|
|
||
| const tracer = trace.getTracer('simian-forge'); | ||
|
|
||
| async function purgeDataStreams(elasticsearchUrl: string, elasticsearchAuth: string, elasticsearchApiKey: string, dataset: string, format?: string, count?: number): Promise<void> { | ||
| return tracer.startActiveSpan('purgeDataStreams', async (span) => { | ||
| import { Command } from "commander"; |
There was a problem hiding this comment.
To avoid formatting issues in the future, we could add a .prettierrc file and a pre-commit hook, or something similar. Just like we do in Kibana, what do you think?
There was a problem hiding this comment.
makes sense. Let's do it in a separate PR
Add same-metrics and edge-cases datasets for time series UI test scenarios
Summary
Introduces two new datasets for testing the Metrics experience in Discover:
same-metrics-- 9 scenario pairs (15 data streams) where each pair shares the metric namerequest_durationbut differs along one axis (dimensions, ES metric type, field type, histogram variant, or unit).edge-cases-- 6 data streams that each undergo a mid-backfill rollover to change their mapping, simulating real-world metric type, field type, or unit updates over time.Build first
npm run buildDataset:
same-metrics./forge --dataset same-metrics [--interval 1m] [--backfill now-1h] [--purge]--countis ignored; all 15 streams are always populated.same-metric-identical-asame-metric-identical-bsame-metric-different-dims-host-regionsame-metric-different-dims-host-envsame-metric-different-estype-gaugesame-metric-different-estype-countersame-metric-different-estype-histogramsame-metric-different-estype-gauge(shared)same-metric-different-fieldtype-longsame-metric-different-fieldtype-doublesame-metric-different-histogram-tdigestsame-metric-different-histogram-exponentialsame-metric-different-histogram-tdigest(shared)same-metric-different-histogram-legacysame-metric-different-unit-nullsame-metric-different-unit-mssame-metric-different-unit-ms(shared)same-metric-different-unit-secondsShared metric:
metric.name=request_duration, fieldrequest_duration. Each document includestest.scenarioandtest.data_streamfor filtering in Discover.Dataset:
edge-cases./forge --dataset edge-cases [--interval 1m] [--purge]--countand--backfillare ignored; the dataset generates a fixed 10-minute window (5 min per phase).Each stream starts with one mapping (phase 1), then the index template is updated, the data stream is rolled over, and data continues with the new mapping (phase 2). Each stream uses its stream name as both
metric.nameand the metric field name.edge-case-gauge-to-counteredge-case-histogram-to-gaugeedge-case-histogram-to-tdigestedge-case-tdigest-to-exponentialedge-case-unit-null-to-msedge-case-unit-ms-to-sOther changes
buildHistogramValue,buildTdigestValue,buildExponentialHistogramValue) intosrc/formatters/histogram-helpers.ts, used by both datasets.runBackfillStreamandrunRealTimeStreaminBaseSimulatorprotected with optional start/end params so subclasses can implement multi-phase backfill.This PR was prepared with AI assistance.
Summary by CodeRabbit
New Features
Enhancements
Documentation
Chores