Skip to content

test+ci: phase 2.1 coverage push (cmd exclusion + ~30 new tests)#772

Merged
leoparente merged 8 commits into
developfrom
chore/coverage-exclude-cmd
May 11, 2026
Merged

test+ci: phase 2.1 coverage push (cmd exclusion + ~30 new tests)#772
leoparente merged 8 commits into
developfrom
chore/coverage-exclude-cmd

Conversation

@leoparente
Copy link
Copy Markdown
Contributor

@leoparente leoparente commented May 9, 2026

Phase 2.1 of the coverage initiative on top of #771. The branch grew from a one-line cmake change ("stop dragging cmd/ down the project number") into a focused push to lift project coverage toward 85%.

TL;DR per commit

Commit What it does
fdf97c9 Re-exclude cmd/* from the lcov EXCLUDE list so the broken integration tests don't anchor the report at 0%. Baseline shifts ~74.9% → ~77.3%.
1fa5fa2 Six new [netprobe][unit] tests on NetProbeMetricsManager / NetProbeMetricsBucket (process_failure across all ErrorTypes, TCP send/recv/timeout, process_filtered, to_prometheus, specialized_merge). SKIP() the two pre-existing tests that send real ICMP/TCP and bus-error in CI.
b8abf12 New src/tests/test_module_plugins.cpp walks every builtin plugin and exercises setup_routes(HttpServer*) + instantiate() + generate_input_name(). ~150 previously-uncovered lines across 16 tiny plugin .cpps.
41272d2 SKIP() two more pre-existing input-side NetProbe tests that send real probes.
4c98ef8 A uniform <handler> to_prometheus and to_opentelemetry backends TEST_CASE in every handler test (bgp, dhcp, pcap, input_resources, flow, dns v1+v2, net v1+v2). None of the existing tests had ever called either backend — both were 0%-covered.
bf8eb72 Cover the bucket-direct overloads that PCAP-fed tests don't reach: DNS v1 shallow process_dns_layer(l3, l4, QR), DNS v2 specialized_merge, Net v1 process_net_layer(PacketDirection, l3, l4, size), Net v2 process_net_layer(NetworkPacketDirection, l3, l4, size) + specialized_merge.
847676f Flow merge + to_prometheus + to_opentelemetry with every metric group enabled (enable=["all"]). Pulls the Conversations / TopTos / TopGeo / TopInterfaces formatting branches into coverage — they were dead from the defaults.

Scope choices worth noting

  • cmd/* is excluded for now, not because the code doesn't matter, but because the integration tests that exercise it are broken. Revisit once those are fixed.
  • Flaky tests are SKIP()-ed, not deleted. Four pre-existing [netprobe][...] cases that bus-error in CI (no raw-socket privileges, no external network) are now visibly skipped with a one-line reason. They're cheap to revive once a privileged or network-isolated CI lane exists, but they were aborting the test process before any other test in the same binary could run, which prevented coverage capture.
  • All new tests are deterministic and offline. No network, no fixtures beyond what's already in the tree.

Test plan

  • ctest --output-on-failure locally: all relevant suites green (the pre-existing unit-tests-input-dnstap / unit-tests-input-flow failures were verified to reproduce on develop with the changes stashed — they're orthogonal).
  • CI's code-coverage job: posts the actual project % via the zgosalvez/github-actions-report-lcov comment. From the local rough math: each commit contributes single digits, but combined we should land within striking distance of 85%.
  • Codecov upload still working (it's the same upload step, untouched).

Integration tests are currently broken so the cmd/ entry points
(pktvisord, pktvisor-reader) aren't being exercised by ctest. Showing
them at 0% drags the project number down without informing decisions.
Restore the cmd/* exclusion to the lcov EXCLUDE list (it was dropped
in chore/coverage-on-prs); revisit once integration tests are fixed.
@leoparente
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Phase 2.1 of the coverage initiative. NetProbeStreamHandler.cpp was the
worst-covered handler (41.5% line coverage). The pre-existing tests in
this file relied on real network and raw-socket privileges (`localhost`
ICMP ping, `www.google.com:80` TCP) and only asserted `attempts >= 0`,
which always passes — they Bus-errored in CI without verifying anything
meaningful.

Add six new TEST_CASEs tagged [netprobe][unit] that exercise the
manager and bucket APIs directly through a small `UnitFixture` (which
constructs the handler and calls start() so the metrics manager's
_groups bitset is configured — directly constructing a bare manager
segfaults in group_enabled()):

- process_failure walks every ErrorType enum value and verifies the
  correct counter (dns_lookup_failures / packets_timeout /
  connect_failures) increments per target via to_json()
- process_netprobe_tcp send/recv exercises the success path of the
  transaction manager (start → end within TTL → new_transaction)
- process_netprobe_tcp timeout exercises the timed-out path with a
  6s gap exceeding the default 5s TTL
- process_filtered exercises the filtered-event path
- to_prometheus is exercised by emitting a failure and grepping the
  output for the metric name + target label
- specialized_merge across two managers verifies bucket aggregation

Skip the two pre-existing network-dependent tests with SKIP() so they
no longer abort the test process before the new ones run (Catch2
randomizes order). They're left in place for someone with priviledges
to revisit, but the unit tests above cover the same code paths
deterministically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 10, 2026

LCOV of commit 303c93a during Debug Builds #113

  lines......: 81.0% (13529 of 16698 lines)
  functions..: 72.4% (1396 of 1929 functions)
  branches...: no data found

Files changed coverage rate: n/a

Full coverage report

leoparente and others added 5 commits May 9, 2026 21:31
Phase 2.1 easy-win: every *HandlerModulePlugin.cpp and
*InputModulePlugin.cpp was sitting at 0% coverage because:

- setup_routes() is called only from init_plugin() when start() is
  invoked with a non-null HttpServer; existing tests pass nullptr,
  short-circuiting the call.
- instantiate() is only reached through full Tap/Policy creation,
  which doesn't enumerate every plugin.

Add src/tests/test_module_plugins.cpp with three TEST_CASEs:

1. CoreRegistry::start with a real HttpServer — exercises every
   plugin's setup_routes() body in one shot. Most are empty stubs but
   they all need to compile and not crash; previously all 16 were 0%.

2. Iterate registry.input_plugins(), invoke instantiate() and
   generate_input_name() on each. Verifies basic plugin contract.

3. Iterate registry.handler_plugins(); for each, look up a compatible
   input proxy from the kHandlerProxy table (derived from each
   handler's dynamic_cast<*InputEventProxy*>(proxy) call sites) and
   call instantiate() with that proxy.

Net effect: ~150 previously-uncovered lines across 16 tiny .cpp files
are now exercised in a single fast unit test (~30ms locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the handler-side fix in 1fa5fa2: NetProbe Configs
and NetProbe TCP config send real ICMP/TCP probes to localhost and
example.com, which segfaults in CI without raw-socket privileges or
outbound network. Only assert the config round-trips, which the
NetProbe Boundaries / Test Configs fail / invalid config TEST_CASEs
already validate deterministically.

Skip with SKIP() so they show up as explicitly skipped rather than
silently dropped.
…handler

Phase 2.1 — none of the handler test suites called to_prometheus() or
to_opentelemetry() at all. Both backends are typically 30-60 lines of
output formatting per handler that was sitting at 0% line coverage.

Add a uniform "<handler> to_prometheus and to_opentelemetry backends"
TEST_CASE in each existing test file. Each one feeds the same fixture
the existing tests already use (or, for the flow handler, the sFlow
ecmp.pcap), constructs the handler, then:

  std::stringstream prom;
  handler.metrics()->bucket(0)->to_prometheus(prom, {});
  CHECK(prom.str().find("<schema>_") != std::string::npos);

  opentelemetry::proto::metrics::v1::ScopeMetrics scope;
  timespec start_ts{}, end_ts{};
  handler.metrics()->bucket(0)->to_opentelemetry(scope, start_ts, end_ts, {});
  CHECK(scope.metrics_size() > 0);

Covers nine handlers: bgp, dhcp, pcap, input_resources, flow, dns v1,
dns v2, net v1, net v2. Net v1's prometheus prefix is "packets_" and
v2's is "net_" (different schemas); input_resources emits cross-schema
metrics so just asserts the output is non-empty.

Net effect (rough): each handler picks up ~50-100 newly covered lines
in its StreamHandler.cpp. ~600 covered lines total across 9 files,
roughly +3-4 percentage points on the project number.
Phase 2.1 follow-up: hit the bucket methods that the existing PCAP-fed
tests don't reach because they're only invoked via specific filter or
merge code paths.

- DNS v1: DnsMetricsBucket::process_dns_layer(l3, l4, QR) — the
  no-payload overload used in pre-filter passes. Calls it with three
  combinations (UDP query, UDP response, TCP query) and checks the
  wire_packets counters in to_json.

- DNS v2: DnsMetricsBucket::specialized_merge — runs the same fixture
  through two independent handler instances, then merges bucket(0)
  from one into bucket(0) of the other and asserts to_json still emits.

- Net v1: NetworkMetricsBucket::process_net_layer(PacketDirection,
  l3, l4, size) — direct calls with toHost/fromHost/unknown across
  IPv4+UDP, IPv4+TCP, IPv6+UDP.

- Net v2: NetworkMetricsBucket::process_net_layer(NetworkPacketDirection,
  l3, l4, size) using in/out/unknown — and immediately after,
  specialized_merge across two populated v2 buckets.

All four use the same const_cast trick on bucket(0) (the metrics
manager exposes it const) since the bucket-direct methods on
NetworkMetricsBucket / DnsMetricsBucket are non-const.
…etry

with every metric group enabled

FlowStreamHandler.cpp is the worst-covered handler (52.9%). The default
group set leaves Conversations, TopTos, TopGeo, and TopInterfaces off,
which means every group_enabled() branch behind those four is dead
code from the existing tests' perspective.

Add a TEST_CASE that builds two FlowStreamHandlers over ecmp.pcap with
`enable=["all"]`, merges bucket(0) of one into the other, then renders
to_prometheus + to_opentelemetry on the merged bucket. Walking the
fully-populated, all-groups-on tree pulls in:

- specialized_merge code paths in FlowMetricsBucket
- the Conversations / TopTos / TopGeo / TopInterfaces formatting
  branches in to_json/to_prometheus/to_opentelemetry that were never
  reached
- the per-direction (InBytes/OutBytes/InPackets/OutPackets) emitters
  that gate behind ByBytes/ByPackets which are now both on
@leoparente leoparente changed the title ci(coverage): exclude cmd/* from coverage report test+ci: phase 2.1 coverage push (cmd exclusion + ~30 new tests) May 10, 2026
Per review feedback the backend tests just verified prom output started
with the right schema prefix and that the otel scope had >0 metrics —
neither caught regressions in actual metric content. Tighten every check
to round-trip a known counter value through both backends.

New shared helper libs/visor_test/catch2/otel_helpers.hpp:

  otel_gauge_value(scope, name)  -> first int gauge data point
  otel_gauge_sum(scope, name)    -> sum across data points (for v2
                                    handlers that slice by `direction`)

Per-handler tightening:

- BGP backends: prom must contain `bgp_wire_packets_{total,open,update,
  keepalive}` lines with values 9/2/4/3 (matches the parse test
  fixture); otel mirrors them.
- DHCP backends: same idea — DISCOVER=1, OFFER=1, REQUEST=3, ACK=3.
- DNS v1 backends: switched fixture to dns_ipv4_udp.pcap so we can
  assert UDP=140, IPv4=140, queries=70, replies=70 against both
  backends.
- DNS v2 backends: same fixture; v2 collapses query+reply into xacts,
  metrics use the suffix-form `dns_xacts`/`dns_udp_xacts`/`dns_ipv4_xacts`.
  v2 also emits per-direction series so we use otel_gauge_sum.
- Net v1 backends: assert `packets_udp/ipv4/ipv6` == 140/140/0 — v1
  has flat (un-labelled) counters.
- Net v2 backends: same fixture, but v2 metric names use the
  `_packets` suffix and slice by direction → use otel_gauge_sum.

Merge tests:

- Flow specialized_merge: snapshot `flow_records_flows` per bucket
  before merge, assert post-merge value equals their sum.
- DNS v2 specialized_merge: same pattern with `dns_xacts`.
- Net v2 process_net_layer + merge: assert `net_total_packets` == 4
  after the four direct process_net_layer calls and the merge.

Overload tests:

- DNS v1 process_dns_layer(l3, l4, QR): snapshot pre-call counters,
  invoke 2 UDP queries / 1 UDP response / 1 TCP query, assert exact
  delta (+3 queries, +1 reply, +3 udp, +1 tcp) via otel.
- Net v1 process_net_layer(dir, l3, l4, size): snapshot counters,
  invoke 3 calls (toHost UDP IPv4 / fromHost TCP IPv4 / unknown UDP
  IPv6), assert exact deltas via otel.
@leoparente leoparente marked this pull request as ready for review May 10, 2026 22:42
@leoparente leoparente self-assigned this May 10, 2026
@leoparente leoparente merged commit 9087769 into develop May 11, 2026
18 checks passed
@leoparente leoparente deleted the chore/coverage-exclude-cmd branch May 11, 2026 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants