Merc 6849 llo benchmarking on staging mainnet don #481
base: main
Conversation
• Moved Ajv schema compilation out of the hot path
• Introduced `schema-cache.ts` with a `WeakMap<object, ValidateFunction>`
• `InputParameters.validateInput` now:
  – strips top-level nulls (matches legacy “missing field” semantics)
  – runs a pre-compiled Ajv validator before per-param logic

* **schema-cache.ts**
  – Creates a single process-wide Ajv instance
  – Caches compiled validators in a WeakMap keyed by the schema object
  – Falls back to on-the-fly compile for boolean schemas (true/false)
* **input-params.ts**
  – Builds the JSON Schema once in the constructor (`definitionToJsonSchema`)
  – Fetches/compiles the validator via `getValidator`
  – Adds fast upfront Ajv validation (`validateFn`)
  – Sanitises `null` values before validation to preserve previous behaviour
  – Replaces ad-hoc runtime checks with brace-wrapped `if` blocks
  – Tightens types (`ValidateFunction`, `ErrorObject`) and removes all `any`
  – Fixes ESLint/TS warnings (curly, index-signature, unused directives)
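Roughly what the caching utility looks like, as a minimal sketch assuming Ajv v8 and the names used above (`getValidator`, boolean-schema fallback); the real `schema-cache.ts` may differ in options and details:

```ts
import Ajv, { SchemaObject, ValidateFunction } from 'ajv'

// Single process-wide Ajv instance (options here are assumptions)
const ajv = new Ajv({ coerceTypes: true })

// Compiled validators cached per schema object; a WeakMap lets an entry be
// garbage-collected once the schema object itself is no longer referenced.
const validatorCache = new WeakMap<object, ValidateFunction>()

export const getValidator = (schema: SchemaObject | boolean): ValidateFunction => {
  // Boolean schemas (true/false) cannot be WeakMap keys, so compile them on the fly
  if (typeof schema === 'boolean') {
    return ajv.compile(schema)
  }
  let validator = validatorCache.get(schema)
  if (!validator) {
    validator = ajv.compile(schema)
    validatorCache.set(schema, validator)
  }
  return validator
}
```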
…e logic
• Introduced persistent HTTP and HTTPS agents with keep-alive enabled, limiting socket churn and improving latency.
• Simplified and clarified queue processing logs and error handling logic.
• Moved keep-alive agent initialization into the Requester constructor to ensure global Axios defaults are set.

* **Requester.ts**
  – Adds persistent HTTP(S) agents with configurable max sockets (`MAX_PARALLEL_HTTP_SOCKETS`, defaulting to 128).
  – Simplifies logging to clearly indicate queue actions (enqueue, overflow, dequeue, retries).
  – Streamlines queue processing logic, preserving request coalescing behavior.
* **Performance**
  – Reduces overhead from socket creation/teardown, improving throughput for concurrent outbound HTTP requests.
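A rough sketch of the keep-alive setup described above, assuming the agents are applied as global Axios defaults; `MAX_PARALLEL_HTTP_SOCKETS` is the env var named in the commit, everything else is illustrative:

```ts
import http from 'http'
import https from 'https'
import axios from 'axios'

const maxSockets = Number(process.env['MAX_PARALLEL_HTTP_SOCKETS'] ?? 128)

// Persistent agents reuse TCP connections across requests instead of opening
// and tearing down a socket for every outbound call.
const httpAgent = new http.Agent({ keepAlive: true, maxSockets })
const httpsAgent = new https.Agent({ keepAlive: true, maxSockets })

// Setting Axios defaults means every request made through Axios picks up the
// agents without per-call configuration.
axios.defaults.httpAgent = httpAgent
axios.defaults.httpsAgent = httpsAgent
```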
… overhead
• Replaced SHA-1 + Base64 with FarmHash’s 64-bit `fingerprint64` converted to a 16-char hex string, eliminating intermediate `Buffer` allocations.
• Cuts hash computation time, reduces event-loop blocking, and lowers GC churn under high QPS.
• Maintains deterministic, fixed-length key segments and preserves the existing JSON-stringify lowercasing logic.
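A minimal sketch of the hashing change, assuming the `farmhash` npm package (whose 64-bit functions return the hash as a decimal string); the exact hex conversion shown here is an assumption:

```ts
import farmhash from 'farmhash'

// fingerprint64 returns the unsigned 64-bit hash as a decimal string; convert
// it to a fixed-length 16-char hex segment without intermediate Buffers.
export const hashCacheKey = (cacheKey: string): string => {
  const digest = farmhash.fingerprint64(cacheKey)
  return BigInt(digest).toString(16).padStart(16, '0')
}
```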
…ustify shutdown
• Refactored background execution to spawn a dedicated loop for every (endpoint × transport) pair, improving isolation and resilience.
• Timers are now managed in a Map keyed by "endpoint:transport", ensuring precise control and cleanup.
• Enhanced shutdown logic: all timers are reliably cleared when the HTTP server closes, preventing resource leaks.
• Improved error and timeout handling: background loops always survive exceptions, maintaining continuous operation.
• Metrics are now labeled per (endpoint, transport) for more granular observability.

* **background-executor.ts**
  – Replaces single-loop logic with per-transport loop spawning.
  – Adds robust timer management and graceful shutdown via a centralized stopAll routine.
  – Refactors metric collection to use consistent, cached label handles.
  – Ensures backgroundExecute errors and timeouts do not interrupt future executions.
* **Test Coverage**
  – All existing unit tests updated and passing.
  – Test expectations adjusted for new scheduling granularity.
* **Performance & Reliability**
  – Reduces risk of cross-transport interference and timer leaks.
  – More maintainable and observable background execution model.
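An illustrative sketch of the per-(endpoint × transport) scheduling described above; this is not the actual `background-executor.ts`, and metrics and timeout handling are omitted:

```ts
type LoopKey = `${string}:${string}` // "endpoint:transport"

const timers = new Map<LoopKey, NodeJS.Timeout>()
let stopped = false

export const spawnLoop = (
  endpointName: string,
  transportName: string,
  backgroundExecute: () => Promise<void>,
  intervalMs: number,
): void => {
  const key: LoopKey = `${endpointName}:${transportName}`
  const tick = async (): Promise<void> => {
    try {
      await backgroundExecute() // errors must never kill the loop
    } catch {
      // log and continue; the next iteration is still scheduled below
    }
    if (!stopped) {
      timers.set(key, setTimeout(tick, intervalMs))
    }
  }
  timers.set(key, setTimeout(tick, 0))
}

// Called when the HTTP server closes: clear every timer so nothing leaks.
export const stopAll = (): void => {
  stopped = true
  for (const timer of timers.values()) {
    clearTimeout(timer)
  }
  timers.clear()
}
```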
• Introduced `fast-serialize.ts` module for optimized JSON serialization of adapter responses, aiming to reduce CPU/GC overhead.
• Added `FAST_SERIALIZATION_ENABLED` configuration option (default: true) to toggle the feature.
• Integrated custom serialization into Fastify's reply pipeline, conditional on the new setting.
• Added `ea_response_serialization_duration_seconds` Prometheus histogram to track 'fast' vs. 'standard' serialization performance.
• Implemented comprehensive unit tests for the new fast serialization logic, covering correctness and various data structures.
• Added API-level tests to verify custom serializer integration with Fastify.
• Achieved 100% test coverage for the schema validation caching utility (`src/validation/schema-cache.ts`) with new test cases.

* **src/util/fast-serialize.ts**
  – New module providing `serializeResponse`, `serializeSuccessResponse`, `serializeErrorResponse`, and `escapeString`.
* **src/index.ts**
  – Modified `buildRestApi` to use the custom serializer via `reply.serializer()` when `FAST_SERIALIZATION_ENABLED` is true.
* **src/config/index.ts**
  – Added `FAST_SERIALIZATION_ENABLED` setting.
* **src/metrics/index.ts**
  – Added `ea_response_serialization_duration_seconds` metric.
* **test/util/fast-serialize.test.ts**
  – New file with unit tests for `serializeResponse` and its helpers.
* **test/api-serialization.test.ts**
  – New file testing Fastify integration for serialization.
* **test/validation/schema-cache.test.ts**
  – New file with tests for `getValidator` caching and validation rules.
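A hedged sketch of how the custom serializer can hook into Fastify's reply pipeline; `serializeResponse` and `FAST_SERIALIZATION_ENABLED` come from the file list above, while the route and import path are purely illustrative:

```ts
import Fastify from 'fastify'
import { serializeResponse } from './util/fast-serialize' // path assumed from the file list above

const app = Fastify()
const fastSerializationEnabled = process.env['FAST_SERIALIZATION_ENABLED'] !== 'false'

app.get('/', async (_req, reply) => {
  const response = { statusCode: 200, result: 42, data: { result: 42 } }
  if (fastSerializationEnabled) {
    // reply.serializer() replaces JSON.stringify for this reply only
    reply.serializer((payload) => serializeResponse(payload))
  }
  return response
})
```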
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: David de Kloet <[email protected]>
Potential avenues for improvement:
- The Redis cache implementation that EAs use has a local in-memory cache, a layered cache so the adapter doesn't have to read from Redis all the time. This has evolved since the first implementations, but you can see that it returns directly if the values are found in memory. Some things here:
  - Looking at the metrics, however, the adapter still makes a lot of GET requests to Redis, and I would expect far fewer if the in-memory cache were working as intended. Fixing this would be a huge win for timeliness of requests.
  - Redis is also used as a "subscription set", the layer between read and write that lets the adapter know which values to fetch. This should also have an in-memory layer (see the sketch below), because the ZADD ops that add to this set also run at hundreds per second and most of them are unnecessary: entries only fall out of the set after a while, probably minutes. This should be another huge win.
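Something along these lines, just to make the subscription-set point concrete (ioredis-style client assumed; the TTL value and names are placeholders):

```ts
import Redis from 'ioredis'

const redis = new Redis()

// Remember what we ZADDed recently so the Redis round trip can be skipped
// while the local entry is still fresh. The TTL should stay well below the
// subscription set's own expiry.
const locallyAdded = new Map<string, number>() // member -> local expiry (ms)
const LOCAL_TTL_MS = 60_000

export const addToSubscriptionSet = async (setName: string, member: string): Promise<void> => {
  const now = Date.now()
  const expiresAt = locallyAdded.get(member)
  if (expiresAt !== undefined && expiresAt > now) {
    return // added recently, skip the ZADD entirely
  }
  await redis.zadd(setName, now, member)
  locallyAdded.set(member, now + LOCAL_TTL_MS)
}
```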
// Checks if an individual transport has a backgroundExecute function, and executes it if it does
const callBackgroundExecute = (
// Spawn one loop per (endpoint × transport)
This was already setting up a separate one per background execute, so I don't think this should have a meaningful impact. Maybe the recursive vs. separate timeout could, but in theory it should not have an effect.
logger.debug(`Clearing timeout for endpoint "${endpointName}"`)
timeoutsMap[endpointName].unref()
clearTimeout(timeoutsMap[endpointName])
export function callBackgroundExecutes(adapter: Adapter, apiShutdownPromise?: Promise<void>): void {
All of this would be on the writer side and not executed very frequently; I would advise separating this change from the first attempts, to de-risk.
shasum.update(cacheKey)
return shasum.digest('base64')
}
const digest = farmhash.fingerprint64(cacheKey)
This indeed could save time, as it's one of the things that is executed quite frequently and on reads. These cache keys are at the core of the framework, serving as the identifier for requests. A couple of things to note here:
- This is an improvement to make for sure, but it may not affect adapters, since hashing is only applied when the key is too long.
- The original cache key is just the stringified request object. That is most likely where most of the effort will go; profiling should be able to tell what's going on. (Rough shape of the logic in the sketch below.)
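For context, roughly the shape of the logic I mean (the threshold name and value are placeholders; the fallback shown is the current SHA-1 + Base64 path this PR replaces):

```ts
import { createHash } from 'crypto'

const MAX_CACHE_KEY_LENGTH = 256 // placeholder threshold

export const calculateCacheKey = (data: unknown): string => {
  const key = JSON.stringify(data) // the "original" cache key is just the stringified request
  if (key.length <= MAX_CACHE_KEY_LENGTH) {
    return key // short keys are used as-is, so hashing never runs for them
  }
  // only keys that are too long get hashed (currently SHA-1 -> Base64)
  return createHash('sha1').update(key).digest('base64')
}
```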
If we're doing Streams-specific features: at the cost of larger payload sizes, this calculation could be done once at the job level and the cache key sent directly as a parameter in the payload, bypassing all of this calculation entirely and speeding up reads, but putting the onus of keeping these keys correct on job production.
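Purely hypothetical illustration of what I mean (the field and function names are made up):

```ts
interface StreamsRequest {
  data: Record<string, unknown>
  cacheKey?: string // precomputed by the job producer and shipped in the payload
}

// If the job already sent a key, trust it and skip stringify/hash entirely;
// otherwise fall back to the current calculation.
export const resolveCacheKey = (
  req: StreamsRequest,
  compute: (data: Record<string, unknown>) => string,
): string => req.cacheKey ?? compute(req.data)
```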
* @param response - The response object to serialize
* @returns JSON string representation of the response
*/
export function serializeResponse<T extends ResponseGenerics>(
I'd be really curious to A/B test this feature: JSON.stringify, even though slow, is backed by a C implementation IIRC, but for small payloads the string concatenation is probably faster? It seems somewhat brittle to me though; it's good to try, but the approach should be stricter if it turns out to give significant benefits.
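Something like this quick micro-benchmark would already tell us a lot (payload shape and iteration count are arbitrary; swap in `serializeResponse` for the B side):

```ts
const payload = {
  statusCode: 200,
  result: 123.45,
  data: { result: 123.45 },
  timestamps: { providerDataReceivedUnixMs: Date.now() },
}

const bench = (label: string, fn: () => string, iterations = 100_000): void => {
  const start = process.hrtime.bigint()
  for (let i = 0; i < iterations; i++) {
    fn()
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms for ${iterations} iterations`)
}

bench('JSON.stringify', () => JSON.stringify(payload))
// bench('fast-serialize', () => serializeResponse(payload))
```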
* It additionally serves to coalesce requests by utilizing a more complex queue structure:
* - ignores duplicate items via a provided key
* - doesn't use the request itself because it's common for those to have things like timestamps/nonces
* Manages outbound HTTP requests with queueing, rate limiting, and connection reuse.
This one I would honestly remove entirely. It adds complexity but is only useful for outbound HTTP requests, and we're not really making that many. Most LL adapters use a streaming protocol anyway, so this should not affect performance at all.
/* -------------------------------------------------------------------------- */
/*                                 TYPE HELPERS                                */
/* -------------------------------------------------------------------------- */
I would DEFINITELY not remove all these comments haha, I added them with a lot of love and I'm sure our AI overlords appreciate them, even if they hallucinate reasons to remove them.
/* New: compile schema once */
this.schema = definitionToJsonSchema(this.definition)
this.validateFn = getValidator(this.schema)
We tried ajv fast too, but at the time the parameters were very complex and it proved impossible to do the migrations properly, so it was quickly rejected. Still, it's a big time sink on read request execution, so any speedup helps a lot and is worth a try. I would first consider the approach where validation is skipped entirely for Mercury jobs though, as we avoid all of this altogether. Paired with the suggestion to precalculate the cache keys, requests could be resolved in just 2 lines and be blazing fast.
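Roughly the kind of fast path I'm imagining (names are hypothetical; assumes the precomputed cache key from the earlier comment and no input validation for these endpoints):

```ts
interface FastReadRequest {
  cacheKey: string // precomputed upstream, so no stringify/hash/validation here
}

export const handleFastRead = async (
  req: FastReadRequest,
  cache: { get: (key: string) => Promise<unknown> },
) => {
  const cached = await cache.get(req.cacheKey)
  return cached ?? { statusCode: 504, errorMessage: 'No data available yet' }
}
```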
Another thing to consider is parallelizing multiple read instances and a write instance in a single Dockerfile; it should be doable with some unusual constructs and would not require any changes to the interfaces.