[Bug]: Error converting traces to OTLP (\uFFFD) #2589

Open
joffrey-bion opened this issue Jan 16, 2025 · 14 comments

@joffrey-bion

What happened?

I'm running jaeger-all-in-one.

When importing some OTLP traces, I get Error converting traces to OTLP:

[Image: screenshot of the error]

When checking server logs, I see the following:

{
  "level": "error",
  "ts": 1737043807.5091002,
  "caller": "app/http_handler.go:494",
  "msg": "HTTP handler, Internal Server Error",
  "error": "cannot unmarshal OTLP : readUint32: unexpected character: \ufffd, error found in #10 byte of ...|esCount\":-1}],\"links|..., bigger context ...|AR to the classpath\"}}],\"droppedAttributesCount\":-1}],\"links\":[],\"status\":{\"code\":2},\"flags\":257},{\"|...",
  "stacktrace": "github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:494\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).transformOTLP\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:195\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:538\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.WithRouteTag.func1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:215\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:177\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:65\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.responseHeadersHandler.func4\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:19\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\tgithub.com/gorilla/[email protected]/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2220\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/[email 
protected]/recovery.go:80\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:3210\nnet/http.(*conn).serve\n\tnet/http/server.go:2092"
}

The file itself doesn't contain this \uFFFD replacement char, so I'm guessing there might be some encoding issue between the UI and the backend.

Steps to reproduce

  1. Launch jaeger all-in-one and go to the Upload tab
  2. Upload opentelemetry_traces.json (renamed to .json because GitHub doesn't like .jsonl) or opentelemetry_traces.json (2)
  3. Notice the error message in the UI and server logs

Expected behavior

The traces are imported successfully

Relevant log output

Same server log as shown under "What happened?" above.

Screenshot

No response

Additional context

No response

Jaeger backend version

1.62.0

SDK

opentelemetry-java with kotlin bindings

Pipeline

Jaeger all-in-one

Storage backend

Jaeger all-in-one

Operating system

Windows

Deployment model

Local Jaeger all-in-one

Deployment configs

@yurishkuro
Member

The same error happens when submitting directly to the server

$ curl -X POST -H "Content-Type: application/json" -d @ot.json http://localhost:16686/api/transform

If you want to investigate yourself, I would recommend bisecting the payload to reduce it to a single failing span.
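
The bisection suggested above can be sketched as follows. This is only an illustration under stated assumptions: `find_failing_item` and `stand_in_fails` are hypothetical names, and the predicate is a local stand-in. In practice `fails` would rebuild a payload around the candidate spans, POST it to `/api/transform` as in the curl command above, and report whether the server rejected it.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_failing_item(items: Sequence[T], fails: Callable[[List[T]], bool]) -> T:
    """Bisect `items` down to one element that still makes `fails` return True.

    Assumes the failure is caused by individual items, so that a batch fails
    exactly when it contains at least one bad item.
    """
    candidates = list(items)
    if not fails(candidates):
        raise ValueError("the full batch does not reproduce the failure")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left = candidates[:mid]
        # Keep whichever half still reproduces the failure.
        candidates = left if fails(left) else candidates[mid:]
    return candidates[0]


def stand_in_fails(spans: List[dict]) -> bool:
    """Hypothetical stand-in for 'the server rejects this batch'."""
    return any(span.get("droppedAttributesCount", 0) < 0 for span in spans)


spans = [{"name": f"span-{i}", "droppedAttributesCount": 0} for i in range(7)]
spans[4]["droppedAttributesCount"] = -1
print(find_failing_item(spans, stand_in_fails)["name"])  # prints span-4
```

Each round halves the batch, so a payload with a few thousand spans needs only about a dozen uploads to isolate one offender.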

@MAX-786

MAX-786 commented Jan 22, 2025

Hi @joffrey-bion, after going through both of the JSON files you provided and the error logs, I found:

  • there's a key 'droppedAttributesCount' with a negative value, i.e. '-1', which seems to break the Jaeger parser
  • when I googled this Unicode character, it turned out to be the replacement character (\uFFFD). I am no Jaeger expert, but I think the parser is throwing it because it doesn't expect a negative value for that key in the JSON.

So to fix that, I did one of two things:

  • either remove the traces with the 'droppedAttributesCount' key (there are 2 traces)
  • or change its value to >= 0 (e.g. 0).

After clearing the cache, reloading, and re-uploading the JSON, Jaeger parses it.

Disclaimer: I am still getting used to how Jaeger works, so I don't know why you have a negative value for "droppedAttributesCount" in the first place, nor what changing that value will affect. From what I read here, I think
Jaeger's OTLP parser expects unsigned integers for fields like droppedAttributesCount

I don't want to point you in the wrong direction, so I hope someone from the maintainers team can check whether I am getting this right.
@yurishkuro is this correct? Or is it something that needs to be fixed in Jaeger?
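
To locate such values before uploading, a small scan along these lines may help. It is a sketch, not part of Jaeger: `find_bad_counts` is a hypothetical helper, and it assumes the counters of interest are the JSON keys matching `dropped*Count` (such as `droppedAttributesCount`) and that they are encoded as plain JSON numbers. String-encoded numbers would also be flagged by this simple check.

```python
from typing import Any, Iterator, Tuple

UINT32_MAX = 2**32 - 1


def _is_uint32(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= UINT32_MAX


def find_bad_counts(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every dropped*Count field that is not a valid uint32."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.startswith("dropped") and key.endswith("Count"):
                if not _is_uint32(value):
                    yield child, value
            else:
                yield from find_bad_counts(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_bad_counts(item, f"{path}[{index}]")


# Made-up document shaped like the fragment quoted in the error log.
doc = {"resourceSpans": [{"scopeSpans": [{"spans": [
    {"name": "ok", "droppedAttributesCount": 0},
    {"name": "bad", "events": [{"droppedAttributesCount": -1}]},
]}]}]}

for location, value in find_bad_counts(doc):
    print(location, value)
```

For a `.jsonl` file, the same function can be applied to each line after `json.loads`. Reporting the path makes it easier to decide whether to drop the span or correct the value at the producer.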

@yurishkuro
Member

yurishkuro commented Jan 22, 2025

@MAX-786 thanks for your investigation. It is spot-on. I checked the attached JSON file and see this:

"droppedAttributesCount":-1

Meanwhile, the OTLP format defines the value of this field as uint32
https://github.com/open-telemetry/opentelemetry-proto/blob/ffade295895a2be5a6e7931eb0cda1c72bc4c0f6/opentelemetry/proto/trace/v1/trace.proto#L237

So the provided JSON files are malformed, and are correctly rejected (albeit the error message could've been much better - it does mention readUint32 but could've mentioned the field name).

I am closing this issue as wontfix.

@yurishkuro closed this as not planned on Jan 22, 2025
@joffrey-bion
Author

Thank you for the help, @MAX-786. Yeah I had also figured out the replacement character, and the location of the -1 (the 10th byte of the snippet in the log). But I don't get why the parser would replace a valid character like - or 1 by the replacement character. This is normally meant for encoding errors.

You're right that it's possible that some deserialization process just decides -1 is so outrageous that it needs to be converted to the replacement character 😆 but that would be a very weird way to fail as opposed to a simple error mentioning the bounds.

I'm using the opentelemetry-java library to generate these traces, so it could be a bug there if it produces negative values when it shouldn't.
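
The byte position mentioned above can be checked against the snippet from the log, as in the short sketch below. The second helper illustrates one possible explanation for the \ufffd, which is an assumption and not confirmed anywhere in this thread: if an error message contains a byte that is not valid UTF-8, that byte is typically rendered as U+FFFD once the text is decoded or serialized. Both function names are hypothetical.

```python
def byte_at(snippet: str, position: int) -> str:
    """Return the 1-based `position`-th byte of `snippet` as text."""
    return snippet.encode("utf-8")[position - 1 : position].decode("utf-8")


def render_invalid_byte(value: int) -> str:
    """Decode a single raw byte as UTF-8, substituting U+FFFD if it is invalid."""
    return bytes([value]).decode("utf-8", errors="replace")


# Snippet copied from the "error found in #10 byte of ..." part of the log.
snippet = 'esCount":-1}],"links'
print(byte_at(snippet, 10))               # prints -
print(render_invalid_byte(0xFF) == "\ufffd")  # prints True
```

So the reported offset does point at the minus sign, and the replacement character in the message need not mean the uploaded file was mis-encoded.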

@MAX-786

MAX-786 commented Jan 22, 2025

About 'error message could be better':
I am all for that, because it will really help in many situations. Right now we have to watch the Docker logs for that (which is not a bad thing IMHO), but something more detailed in the UI would still help.
I guess we can open a small issue for this.

@joffrey-bion
Author

joffrey-bion commented Jan 22, 2025

@yurishkuro I still believe this issue warrants better error handling, but I guess it is on the Jaeger server side, not Jaeger UI. Do you know where I should open such issue?

@yurishkuro
Member

What better error handling do you expect here? Nothing crashes, you get an error in the UI when trying to import a malformed file.

@MAX-786

MAX-786 commented Jan 22, 2025

What better error handling do you expect here? Nothing crashes, you get an error in the UI when trying to import a malformed file.

Yep, I think he phrased it wrong (hopefully) and meant that it could show a better error message, as you already said above.

@yurishkuro
Member

Well there are two things here:

  1. The UI just says "error converting OTLP traces". Since there is a more detailed (even though unhelpful) error message in the logs I think it would make sense for it to be reflected in the UI. That is a reasonable fix we can make.
  2. Improving the underlying error message itself (instead of going about \ufffd) - here there's not much we can do because the error is coming from a JSON parser, and quite likely from gogoproto library, which is no longer maintained.

@joffrey-bion
Author

What better error handling do you expect here? Nothing crashes

Internal server error is a crash as far as I can tell (on server side, but still). It would be best if the server handled the error properly, and then offered an error message that the UI could display instead of the generic failure.
IIRC I was getting an HTTP 500 (to be confirmed), so I guess it's fair for the UI to just show a generic error, but the core of the issue is that this shouldn't warrant a 500. Probably a 400 instead.

Improving the underlying error message itself (instead of going about \ufffd) - here there's not much we can do because the error is coming from a JSON parser, and quite likely from gogoproto library, which is no longer maintained

I think there is a lot that can be done here. Catching the error, parsing the error message, changing to a maintained library, etc. I'm not saying this is simple, but I don't think it's fair to say that not much can be done.
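
The distinction between a client error and a server error discussed above can be sketched like this. It is not Jaeger's handler (which is written in Go); `MalformedPayload`, `parse_payload`, `handle_transform`, and the `convert` callback are all hypothetical names used only to show the pattern of catching a parse failure and answering 400 instead of 500.

```python
import json
from typing import Any, Callable, Dict, Tuple


class MalformedPayload(ValueError):
    """Raised when the client's payload is invalid (maps to HTTP 400)."""


def parse_payload(body: bytes) -> Dict[str, Any]:
    try:
        doc = json.loads(body)
    except ValueError as err:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        raise MalformedPayload(f"invalid JSON: {err}") from err
    if not isinstance(doc, dict):
        raise MalformedPayload("top-level value must be a JSON object")
    return doc


def handle_transform(
    body: bytes, convert: Callable[[Dict[str, Any]], Any]
) -> Tuple[int, Dict[str, Any]]:
    """Return (status, response): 400 for bad input, 500 only for unexpected failures."""
    try:
        result = convert(parse_payload(body))
    except MalformedPayload as err:
        return 400, {"data": None, "errors": [{"code": 400, "msg": str(err)}]}
    except Exception as err:  # anything else is a genuine server-side problem
        return 500, {"data": None, "errors": [{"code": 500, "msg": str(err)}]}
    return 200, {"data": result, "errors": None}


status, response = handle_transform(b'{"resourceSpans": [', lambda doc: doc)
print(status, response["errors"][0]["msg"])
```

The point of the separate exception type is that validation code can raise it with a field-specific message, while unrelated bugs still surface as 500.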

@yurishkuro
Member

I checked - the server already returns the best error it can muster given the underlying library's response: {"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":500,"msg":"cannot unmarshal OTLP : readUint32: unexpected character: \ufffd, error found in #10 byte of ...|esCount\":-1}],\"links|..., bigger context ...|AR to the classpath\"}}],\"droppedAttributesCount\":-1}],\"links\":[],\"status\":{\"code\":2},\"flags\":257},{\"|..."}]}

I agree, it should be a 400 error, not 500. But the UI is completely ignoring the error message; that can be fixed.
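
Surfacing the server's message on the client side mostly means reading the `errors[].msg` entries of the response shown above. The sketch below does this in Python purely for illustration (the Jaeger UI itself is not written in Python); `error_messages` is a hypothetical helper, and the only thing it assumes about the response is the structure quoted in this comment.

```python
import json
from typing import List


def error_messages(response_body: str) -> List[str]:
    """Extract the `msg` of each entry in the response's `errors` array.

    Returns an empty list when the body is not JSON or has no errors.
    """
    try:
        doc = json.loads(response_body)
    except ValueError:
        return []
    if not isinstance(doc, dict):
        return []
    errors = doc.get("errors")
    if not isinstance(errors, list):
        return []
    return [str(e["msg"]) for e in errors if isinstance(e, dict) and "msg" in e]


# Shortened version of the response body quoted above.
body = json.dumps({
    "data": None, "total": 0, "limit": 0, "offset": 0,
    "errors": [{"code": 500, "msg": "cannot unmarshal OTLP : readUint32: unexpected character"}],
})
for message in error_messages(body):
    print(message)
```

Falling back to a generic message only when this list is empty would keep the current behavior for responses that carry no details.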

@yurishkuro
Member

Booked a related issue: jaegertracing/jaeger#6598

@joffrey-bion
Author

Thanks for opening the issue on server side!

Regarding the error in the frontend, I agree it would be great to have the error from the server shown somehow. That would go a long way already.

If some day the error message from the server gets better, it's even greater ☺️
