[EVAL] AI-generated Gson 1.6 instrumentation (blind test) #10940
Draft
jordan-wong wants to merge 1 commit into master from
…ate)

Generated by apm-instrumentation-toolkit using java_integration workflow.

This is a BLIND TEST run - gson was deleted from repo before generation. Agent had ZERO access to original implementation (shallow clone + config override).

**Generation Metrics:**
- Runtime: 425.3s (7.1 minutes)
- Agent turns: 96
- Cost: $3.29

**Layer 1 Validation:** ✅ ALL PASS
- compileJava: ✅ PASS
- spotlessCheck: ✅ PASS
- codenarcTest: ✅ PASS
- muzzle: ✅ PASS
- test: ✅ PASS
- latestDepTest: ✅ PASS

**Key Innovations:**
- NEW: GsonHelper abstraction class for CallDepthThreadLocalMap
- Broader method matchers (catches all toJson/fromJson overloads)
- Cleaner code structure with consistent naming

**Contamination Check:** ✅ ZERO
- Verified agent logs show no `git show` commands
- All file paths show /tmp/dd-trace-java-gson-clean/
- Agent used jackson-core and hystrix as references (both exist in clean clone)

**Evaluation:** See eval-comparison/ directory for comprehensive analysis

🤖 Generated with apm-instrumentation-toolkit
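For reviewers unfamiliar with the call-depth pattern that the GsonHelper item refers to: the generated helper itself is not reproduced on this page, so the following is only a minimal, self-contained sketch of the general idea. `CallDepthSketch` and `GsonHelperSketch` are hypothetical names, and the per-thread counter is a plain `ThreadLocal` stand-in, not the actual `CallDepthThreadLocalMap` class. It assumes the point of the helper is to let only the outermost `toJson`/`fromJson` call on a thread open a span, while nested overload-to-overload calls are ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-thread call-depth counter (not the dd-trace-java class). */
final class CallDepthSketch {
  // One counter map per thread, keyed by the class that owns the guard.
  private static final ThreadLocal<Map<Class<?>, Integer>> DEPTHS =
      ThreadLocal.withInitial(HashMap::new);

  private CallDepthSketch() {}

  /** Increments the depth for key and returns the depth seen before the increment. */
  static int increment(Class<?> key) {
    Map<Class<?>, Integer> depths = DEPTHS.get();
    int previous = depths.getOrDefault(key, 0);
    depths.put(key, previous + 1);
    return previous;
  }

  /** Clears the depth for key; meant to be called by the outermost caller on exit. */
  static void reset(Class<?> key) {
    DEPTHS.get().remove(key);
  }
}

/** Hypothetical helper that hides the counter behind enter/exit calls. */
final class GsonHelperSketch {
  private GsonHelperSketch() {}

  /** Returns true only for the outermost call on the current thread. */
  static boolean enter() {
    return CallDepthSketch.increment(GsonHelperSketch.class) == 0;
  }

  /** Must be called only by the caller for which enter() returned true. */
  static void exit() {
    CallDepthSketch.reset(GsonHelperSketch.class);
  }
}
```

In an advice class, the method-enter hook would start a span only when `enter()` returns true, and the method-exit hook would finish that span and call `exit()` only in that same case. This is what makes broad matchers over all `toJson`/`fromJson` overloads safe, since overloads that delegate to one another would otherwise produce nested duplicate spans.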
Benchmarks
Startup
Parameters
See matching parameters
Summary
Found 1 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 10 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053300
Total [baseline] (10.928 s) : 0, 10927723
Agent [candidate] (1.058 s) : 0, 1057761
Total [candidate] (11.007 s) : 0, 11007414
section appsec
Agent [baseline] (1.246 s) : 0, 1245997
Total [baseline] (11.12 s) : 0, 11119572
Agent [candidate] (1.256 s) : 0, 1256060
Total [candidate] (11.259 s) : 0, 11258982
section iast
Agent [baseline] (1.23 s) : 0, 1229703
Total [baseline] (11.262 s) : 0, 11262227
Agent [candidate] (1.234 s) : 0, 1233566
Total [candidate] (11.376 s) : 0, 11376452
section profiling
Agent [baseline] (1.183 s) : 0, 1182876
Total [baseline] (10.963 s) : 0, 10962994
Agent [candidate] (1.199 s) : 0, 1199409
Total [candidate] (11.055 s) : 0, 11054585
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.21 ms) : 0, 1210
crashtracking [candidate] (1.219 ms) : 0, 1219
BytebuddyAgent [baseline] (626.936 ms) : 0, 626936
BytebuddyAgent [candidate] (629.134 ms) : 0, 629134
AgentMeter [baseline] (29.243 ms) : 0, 29243
AgentMeter [candidate] (29.358 ms) : 0, 29358
GlobalTracer [baseline] (255.94 ms) : 0, 255940
GlobalTracer [candidate] (257.109 ms) : 0, 257109
AppSec [baseline] (31.598 ms) : 0, 31598
AppSec [candidate] (31.768 ms) : 0, 31768
Debugger [baseline] (60.43 ms) : 0, 60430
Debugger [candidate] (60.33 ms) : 0, 60330
Remote Config [baseline] (590.817 µs) : 0, 591
Remote Config [candidate] (590.862 µs) : 0, 591
Telemetry [baseline] (7.989 ms) : 0, 7989
Telemetry [candidate] (8.068 ms) : 0, 8068
Flare Poller [baseline] (3.56 ms) : 0, 3560
Flare Poller [candidate] (4.307 ms) : 0, 4307
section appsec
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (658.283 ms) : 0, 658283
BytebuddyAgent [candidate] (661.683 ms) : 0, 661683
AgentMeter [baseline] (12.105 ms) : 0, 12105
AgentMeter [candidate] (12.304 ms) : 0, 12304
GlobalTracer [baseline] (257.853 ms) : 0, 257853
GlobalTracer [candidate] (260.959 ms) : 0, 260959
IAST [baseline] (24.142 ms) : 0, 24142
IAST [candidate] (24.657 ms) : 0, 24657
AppSec [baseline] (177.599 ms) : 0, 177599
AppSec [candidate] (179.484 ms) : 0, 179484
Debugger [baseline] (65.93 ms) : 0, 65930
Debugger [candidate] (66.779 ms) : 0, 66779
Remote Config [baseline] (631.667 µs) : 0, 632
Remote Config [candidate] (624.83 µs) : 0, 625
Telemetry [baseline] (8.365 ms) : 0, 8365
Telemetry [candidate] (8.416 ms) : 0, 8416
Flare Poller [baseline] (3.623 ms) : 0, 3623
Flare Poller [candidate] (3.657 ms) : 0, 3657
section iast
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (796.847 ms) : 0, 796847
BytebuddyAgent [candidate] (800.204 ms) : 0, 800204
AgentMeter [baseline] (11.422 ms) : 0, 11422
AgentMeter [candidate] (11.606 ms) : 0, 11606
GlobalTracer [baseline] (247.635 ms) : 0, 247635
GlobalTracer [candidate] (247.98 ms) : 0, 247980
IAST [baseline] (25.431 ms) : 0, 25431
IAST [candidate] (25.433 ms) : 0, 25433
AppSec [baseline] (26.665 ms) : 0, 26665
AppSec [candidate] (26.683 ms) : 0, 26683
Debugger [baseline] (70.52 ms) : 0, 70520
Debugger [candidate] (68.561 ms) : 0, 68561
Remote Config [baseline] (538.034 µs) : 0, 538
Remote Config [candidate] (515.251 µs) : 0, 515
Telemetry [baseline] (9.825 ms) : 0, 9825
Telemetry [candidate] (11.276 ms) : 0, 11276
Flare Poller [baseline] (3.479 ms) : 0, 3479
Flare Poller [candidate] (3.949 ms) : 0, 3949
section profiling
crashtracking [baseline] (1.17 ms) : 0, 1170
crashtracking [candidate] (1.188 ms) : 0, 1188
BytebuddyAgent [baseline] (682.794 ms) : 0, 682794
BytebuddyAgent [candidate] (692.943 ms) : 0, 692943
AgentMeter [baseline] (8.986 ms) : 0, 8986
AgentMeter [candidate] (9.102 ms) : 0, 9102
GlobalTracer [baseline] (215.459 ms) : 0, 215459
GlobalTracer [candidate] (218.223 ms) : 0, 218223
AppSec [baseline] (32.086 ms) : 0, 32086
AppSec [candidate] (32.703 ms) : 0, 32703
Debugger [baseline] (64.47 ms) : 0, 64470
Debugger [candidate] (66.623 ms) : 0, 66623
Remote Config [baseline] (564.797 µs) : 0, 565
Remote Config [candidate] (586.107 µs) : 0, 586
Telemetry [baseline] (8.48 ms) : 0, 8480
Telemetry [candidate] (7.828 ms) : 0, 7828
Flare Poller [baseline] (4.21 ms) : 0, 4210
Flare Poller [candidate] (3.551 ms) : 0, 3551
ProfilingAgent [baseline] (93.724 ms) : 0, 93724
ProfilingAgent [candidate] (94.876 ms) : 0, 94876
Profiling [baseline] (94.285 ms) : 0, 94285
Profiling [candidate] (95.442 ms) : 0, 95442
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.061 s) : 0, 1061263
Total [baseline] (8.836 s) : 0, 8835661
Agent [candidate] (1.058 s) : 0, 1058319
Total [candidate] (8.838 s) : 0, 8837746
section iast
Agent [baseline] (1.222 s) : 0, 1222093
Total [baseline] (9.527 s) : 0, 9527343
Agent [candidate] (1.226 s) : 0, 1225838
Total [candidate] (9.539 s) : 0, 9539038
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.23 ms) : 0, 1230
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (632.895 ms) : 0, 632895
BytebuddyAgent [candidate] (629.962 ms) : 0, 629962
AgentMeter [baseline] (29.574 ms) : 0, 29574
AgentMeter [candidate] (29.349 ms) : 0, 29349
GlobalTracer [baseline] (257.364 ms) : 0, 257364
GlobalTracer [candidate] (257.18 ms) : 0, 257180
AppSec [baseline] (31.632 ms) : 0, 31632
AppSec [candidate] (31.756 ms) : 0, 31756
Debugger [baseline] (59.611 ms) : 0, 59611
Debugger [candidate] (59.599 ms) : 0, 59599
Remote Config [baseline] (585.298 µs) : 0, 585
Remote Config [candidate] (592.191 µs) : 0, 592
Telemetry [baseline] (8.034 ms) : 0, 8034
Telemetry [candidate] (8.163 ms) : 0, 8163
Flare Poller [baseline] (4.249 ms) : 0, 4249
Flare Poller [candidate] (4.36 ms) : 0, 4360
section iast
crashtracking [baseline] (1.213 ms) : 0, 1213
crashtracking [candidate] (1.233 ms) : 0, 1233
BytebuddyAgent [baseline] (792.974 ms) : 0, 792974
BytebuddyAgent [candidate] (795.263 ms) : 0, 795263
AgentMeter [baseline] (11.383 ms) : 0, 11383
AgentMeter [candidate] (11.358 ms) : 0, 11358
GlobalTracer [baseline] (245.929 ms) : 0, 245929
GlobalTracer [candidate] (247.186 ms) : 0, 247186
IAST [baseline] (25.28 ms) : 0, 25280
IAST [candidate] (25.379 ms) : 0, 25379
AppSec [baseline] (26.429 ms) : 0, 26429
AppSec [candidate] (26.508 ms) : 0, 26508
Debugger [baseline] (67.166 ms) : 0, 67166
Debugger [candidate] (67.077 ms) : 0, 67077
Remote Config [baseline] (523.851 µs) : 0, 524
Remote Config [candidate] (529.501 µs) : 0, 530
Telemetry [baseline] (11.175 ms) : 0, 11175
Telemetry [candidate] (11.249 ms) : 0, 11249
Flare Poller [baseline] (3.994 ms) : 0, 3994
Flare Poller [candidate] (3.958 ms) : 0, 3958
Load
Parameters
See matching parameters
Summary
Found 4 performance improvements and 1 performance regressions! Performance is the same for 16 metrics, 15 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (19.136 ms) : 18944, 19328
. : milestone, 19136,
appsec (18.913 ms) : 18721, 19105
. : milestone, 18913,
code_origins (17.642 ms) : 17468, 17815
. : milestone, 17642,
iast (17.807 ms) : 17630, 17983
. : milestone, 17807,
profiling (18.568 ms) : 18383, 18754
. : milestone, 18568,
tracing (17.73 ms) : 17554, 17905
. : milestone, 17730,
section candidate
no_agent (18.065 ms) : 17880, 18250
. : milestone, 18065,
appsec (19.922 ms) : 19715, 20129
. : milestone, 19922,
code_origins (17.657 ms) : 17483, 17831
. : milestone, 17657,
iast (18.068 ms) : 17888, 18248
. : milestone, 18068,
profiling (18.512 ms) : 18331, 18694
. : milestone, 18512,
tracing (17.532 ms) : 17356, 17708
. : milestone, 17532,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (1.182 ms) : 1170, 1194
. : milestone, 1182,
iast (3.121 ms) : 3080, 3162
. : milestone, 3121,
iast_FULL (6.096 ms) : 6033, 6159
. : milestone, 6096,
iast_GLOBAL (3.782 ms) : 3719, 3846
. : milestone, 3782,
profiling (2.321 ms) : 2297, 2345
. : milestone, 2321,
tracing (1.774 ms) : 1760, 1789
. : milestone, 1774,
section candidate
no_agent (1.17 ms) : 1159, 1181
. : milestone, 1170,
iast (3.207 ms) : 3164, 3249
. : milestone, 3207,
iast_FULL (5.867 ms) : 5808, 5927
. : milestone, 5867,
iast_GLOBAL (3.585 ms) : 3526, 3644
. : milestone, 3585,
profiling (1.981 ms) : 1964, 1999
. : milestone, 1981,
tracing (1.788 ms) : 1774, 1803
. : milestone, 1788,
Dacapo
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (14.847 s) : 14847000, 14847000
. : milestone, 14847000,
appsec (14.814 s) : 14814000, 14814000
. : milestone, 14814000,
iast (18.905 s) : 18905000, 18905000
. : milestone, 18905000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
. : milestone, 17785000,
profiling (15.011 s) : 15011000, 15011000
. : milestone, 15011000,
tracing (14.98 s) : 14980000, 14980000
. : milestone, 14980000,
section candidate
no_agent (15.516 s) : 15516000, 15516000
. : milestone, 15516000,
appsec (14.521 s) : 14521000, 14521000
. : milestone, 14521000,
iast (17.835 s) : 17835000, 17835000
. : milestone, 17835000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
. : milestone, 17785000,
profiling (15.387 s) : 15387000, 15387000
. : milestone, 15387000,
tracing (14.812 s) : 14812000, 14812000
. : milestone, 14812000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (1.482 ms) : 1470, 1493
. : milestone, 1482,
appsec (3.79 ms) : 3570, 4009
. : milestone, 3790,
iast (2.261 ms) : 2192, 2330
. : milestone, 2261,
iast_GLOBAL (2.309 ms) : 2240, 2379
. : milestone, 2309,
profiling (2.115 ms) : 2059, 2172
. : milestone, 2115,
tracing (2.085 ms) : 2031, 2139
. : milestone, 2085,
section candidate
no_agent (1.479 ms) : 1468, 1491
. : milestone, 1479,
appsec (3.816 ms) : 3594, 4037
. : milestone, 3816,
iast (2.267 ms) : 2198, 2335
. : milestone, 2267,
iast_GLOBAL (2.312 ms) : 2242, 2381
. : milestone, 2312,
profiling (2.093 ms) : 2038, 2147
. : milestone, 2093,
tracing (2.08 ms) : 2027, 2134
. : milestone, 2080,
Summary
AI-generated instrumentation for Gson 1.6 using the apm-instrumentation-toolkit. This is a blind test evaluation - the original implementation was deleted before generation to ensure zero contamination.
🎯 Evaluation Context
📊 Generation Metrics
✅ Layer 1 Validation (Automated)
All checks passed:
💡 Key Innovations
📉 Known Regressions vs Original
📚 Comprehensive Analysis
See `eval-comparison/` directory in apm-instrumentation-toolkit for detailed evaluation.
🎓 Evaluation Outcome
Overall Score: Generated: 7.8/10 | Original: 7.5/10
Recommendation: Adopt with modifications - restore span metadata and add ClassLoader matcher.
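One way to read the ClassLoader matcher part of the recommendation: instrumentation should be skipped for class loaders that cannot see Gson at all. The sketch below illustrates only the underlying check, as a standalone approximation. `ClassPresenceSketch` and `hasClassNamed` are hypothetical names chosen for illustration, and using `com.google.gson.Gson` as the marker class is an assumption, not something this PR specifies.

```java
/** Hypothetical, standalone approximation of a "class loader can see class X" check. */
final class ClassPresenceSketch {
  private ClassPresenceSketch() {}

  /**
   * Returns true if the loader can locate the .class resource for className.
   * Looking up the resource avoids loading (and initializing) the class itself.
   */
  static boolean hasClassNamed(ClassLoader loader, String className) {
    String resource = className.replace('.', '/') + ".class";
    // A null loader conventionally means the bootstrap loader; fall back to the system loader.
    ClassLoader effective = loader != null ? loader : ClassLoader.getSystemClassLoader();
    return effective.getResource(resource) != null;
  }
}
```

A matcher built on such a check would be evaluated once per class loader, for example `hasClassNamed(loader, "com.google.gson.Gson")`, so that applications without Gson on the classpath pay no matching cost for this integration.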
🤖 Generated with apm-instrumentation-toolkit | Run #4 (Blind Test)