Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient and undertyped to_json public data structures #3364

Closed
sernst opened this issue Jul 2, 2023 · 1 comment
Closed

Inefficient and undertyped to_json public data structures #3364

sernst opened this issue Jul 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@sernst
Copy link

sernst commented Jul 2, 2023

Issue opened in response to this discussion:

#3346 (comment)

where nested json serde behaviors are less than ideal for converting to a public data structure as used in the ConsoleExporter.

Issues with the current implementation are:

  1. The serialization process is inefficient due to a potentially significant number of serialization and deserialization processes rather than converting the entire nested object to a serializable format and then serializing to JSON once.
  2. repr() are used in places where they could/would produce invalid JSON objects, which breaks the contract of a to_json public method.
  3. The to_json output is used directly by ConsoleExporter among other locations, which makes it a public data structure. However, no type definitions or other contracts exist to provide consistency with the public, structured outputs.
@sernst sernst added the bug Something isn't working label Jul 2, 2023
sernst added a commit to sernst/opentelemetry-python that referenced this issue Jul 2, 2023
Introduce `to_dict` to the objects included in the existing
JSON serialization process for `ReadableSpan`, `MetricsData`,
`LogRecord`, and `Resource` objects. This includes adding
`to_dict` to objects that are included within the serialized
data structures of these objects. In places where `repr()`
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:
1. Dictionary typing was included for all the `to_dict`
   methods for clarity in subsequent usage.
2. `DataT` and `DataPointT` were did not include the
   exponential histogram types in point.py, and so those
   were added with new `to_json` and `to_dict` methods as
   well for consistency. It appears that the exponential
   types were added later and including them in the types
   might have been overlooked. Please let me know if that
   is a misunderstanding on my part.
3. OrderedDict was removed in a number of places
   associated with the existing `to_json` functionality
   given its redundancy for Python 3.7+ compatibility.
   I was assuming this was legacy code for previous
   compatibility, but please let me know if that's not
   the case as well.
4. `to_dict` was added to objects like `SpanContext`,
   `Link`, and `Event` that were previously being
   serialized by static methods within the `ReadableSpan`
   class and accessing private/protected members.
   This simplified the serialization in the `ReadableSpan`
   class and those methods were removed. However, once
   again, let me know if there was a larger purpose to
   those I could not find.

Finally, I used `to_dict` as the method names here to be
consistent with other related usages. For example,
`dataclasses.asdict()`. But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define `to_dict` functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define `to_dictionary`
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
`to_dictionary` instead let me know and I will adjust.

Fixes open-telemetry#3364
sernst added a commit to sernst/opentelemetry-python that referenced this issue Jul 2, 2023
Introduce `to_dict` to the objects included in the existing
JSON serialization process for `ReadableSpan`, `MetricsData`,
`LogRecord`, and `Resource` objects. This includes adding
`to_dict` to objects that are included within the serialized
data structures of these objects. In places where `repr()`
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:
1. Dictionary typing was included for all the `to_dict`
   methods for clarity in subsequent usage.
2. `DataT` and `DataPointT` were did not include the
   exponential histogram types in point.py, and so those
   were added with new `to_json` and `to_dict` methods as
   well for consistency. It appears that the exponential
   types were added later and including them in the types
   might have been overlooked. Please let me know if that
   is a misunderstanding on my part.
3. OrderedDict was removed in a number of places
   associated with the existing `to_json` functionality
   given its redundancy for Python 3.7+ compatibility.
   I was assuming this was legacy code for previous
   compatibility, but please let me know if that's not
   the case as well.
4. `to_dict` was added to objects like `SpanContext`,
   `Link`, and `Event` that were previously being
   serialized by static methods within the `ReadableSpan`
   class and accessing private/protected members.
   This simplified the serialization in the `ReadableSpan`
   class and those methods were removed. However, once
   again, let me know if there was a larger purpose to
   those I could not find.

Finally, I used `to_dict` as the method names here to be
consistent with other related usages. For example,
`dataclasses.asdict()`. But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define `to_dict` functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define `to_dictionary`
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
`to_dictionary` instead let me know and I will adjust.

Fixes open-telemetry#3364
sernst added a commit to sernst/opentelemetry-python that referenced this issue Jul 2, 2023
Introduce `to_dict` to the objects included in the existing
JSON serialization process for `ReadableSpan`, `MetricsData`,
`LogRecord`, and `Resource` objects. This includes adding
`to_dict` to objects that are included within the serialized
data structures of these objects. In places where `repr()`
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:
1. Dictionary typing was included for all the `to_dict`
   methods for clarity in subsequent usage.
2. `DataT` and `DataPointT` were did not include the
   exponential histogram types in point.py, and so those
   were added with new `to_json` and `to_dict` methods as
   well for consistency. It appears that the exponential
   types were added later and including them in the types
   might have been overlooked. Please let me know if that
   is a misunderstanding on my part.
3. OrderedDict was removed in a number of places
   associated with the existing `to_json` functionality
   given its redundancy for Python 3.7+ compatibility.
   I was assuming this was legacy code for previous
   compatibility, but please let me know if that's not
   the case as well.
4. `to_dict` was added to objects like `SpanContext`,
   `Link`, and `Event` that were previously being
   serialized by static methods within the `ReadableSpan`
   class and accessing private/protected members.
   This simplified the serialization in the `ReadableSpan`
   class and those methods were removed. However, once
   again, let me know if there was a larger purpose to
   those I could not find.

Finally, I used `to_dict` as the method names here to be
consistent with other related usages. For example,
`dataclasses.asdict()`. But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define `to_dict` functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define `to_dictionary`
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
`to_dictionary` instead let me know and I will adjust.

Fixes open-telemetry#3364
@aabmass
Copy link
Member

aabmass commented Jul 10, 2023

Thanks for opening this issue. Points 1 and 2 make sense to me, I agree we should improve those.

3. The to_json output is used directly by ConsoleExporter among other locations, which makes it a public data structure. However, no type definitions or other contracts exist to provide consistency with the public, structured outputs.

The console exporter not intended to be a public data structure, or to be parseable even. We just use JSON because it's convenient, but other languages use different representations. There are a lot more details in open-telemetry/opentelemetry-specification#3565, I'd recommend chiming in there.

As a workaround for anyone reading this, if you'd like to log out json lines of OTLP protobufs, you can use write your own small exporter using https://github.com/open-telemetry/opentelemetry-python/tree/main/exporter/opentelemetry-exporter-otlp-proto-common as a util. This should be more performant and I think the only reason we don't do this for ConsoleExporters is to avoid pulling the protobuf dependency in.

ocelotl pushed a commit to sernst/opentelemetry-python that referenced this issue Feb 2, 2024
Introduce `to_dict` to the objects included in the existing
JSON serialization process for `ReadableSpan`, `MetricsData`,
`LogRecord`, and `Resource` objects. This includes adding
`to_dict` to objects that are included within the serialized
data structures of these objects. In places where `repr()`
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:
1. Dictionary typing was included for all the `to_dict`
   methods for clarity in subsequent usage.
2. `DataT` and `DataPointT` were did not include the
   exponential histogram types in point.py, and so those
   were added with new `to_json` and `to_dict` methods as
   well for consistency. It appears that the exponential
   types were added later and including them in the types
   might have been overlooked. Please let me know if that
   is a misunderstanding on my part.
3. OrderedDict was removed in a number of places
   associated with the existing `to_json` functionality
   given its redundancy for Python 3.7+ compatibility.
   I was assuming this was legacy code for previous
   compatibility, but please let me know if that's not
   the case as well.
4. `to_dict` was added to objects like `SpanContext`,
   `Link`, and `Event` that were previously being
   serialized by static methods within the `ReadableSpan`
   class and accessing private/protected members.
   This simplified the serialization in the `ReadableSpan`
   class and those methods were removed. However, once
   again, let me know if there was a larger purpose to
   those I could not find.

Finally, I used `to_dict` as the method names here to be
consistent with other related usages. For example,
`dataclasses.asdict()`. But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define `to_dict` functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define `to_dictionary`
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
`to_dictionary` instead let me know and I will adjust.

Fixes open-telemetry#3364
@sernst sernst closed this as completed Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants