Python segfault on empty dict as struct in protobuf 5 #19624

Open
mjvankampen opened this issue Dec 12, 2024 · 3 comments

mjvankampen commented Dec 12, 2024

What version of protobuf and what language are you using?
Version: 5.29.1 with protoc 29.1
Language: Python

What operating system (Linux, Windows, ...) and version?
macOS 14.6.1

What runtime / compiler are you using (e.g., python version or gcc version)
Python 3.12

What did you do?
Steps to reproduce the behavior:

syntax = "proto3";

import "google/protobuf/struct.proto";

package test;

// TestMessage demonstrates Struct handling
message TestMessage {
    // Regular fields
    string id = 1;
    
    // Struct field that can hold arbitrary JSON-like data
    google.protobuf.Struct data = 2;
}

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# NO CHECKED-IN PROTOBUF GENCODE
# source: test.pb
# Protobuf Python Version: 5.29.1
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    5,
    29,
    1,
    '',
    'test.pb'
)
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


from google.protobuf import struct_pb2 as google_dot_protobuf_dot_struct__pb2


DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x07test.pb\x12\x04test\x1a\x1cgoogle/protobuf/struct.proto\"@\n\x0bTestMessage\x12\n\n\x02id\x18\x01 \x01(\t\x12%\n\x04\x64\x61ta\x18\x02 \x01(\x0b\x32\x17.google.protobuf.Structb\x06proto3')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'test.pb_pb2', _globals)
if not _descriptor._USE_C_DESCRIPTORS:
  DESCRIPTOR._loaded_options = None
  _globals['_TESTMESSAGE']._serialized_start=47
  _globals['_TESTMESSAGE']._serialized_end=111
# @@protoc_insertion_point(module_scope)

from test_message_pb2 import TestMessage
import unittest

class TestStructBehavior(unittest.TestCase):
    def test_empty_struct(self):
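        # Construct a message whose Struct field is initialized from an empty dict
        msg = TestMessage(data={})

        # The membership test below is what segfaults in protobuf v5. A segfault
        # kills the interpreter, so the except clause never gets a chance to run.
        try:
            "key" in msg.data
        except Exception as e:
            print(f"Error accessing empty struct: {e}")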

if __name__ == '__main__':
    unittest.main()
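
Since a segfault takes down the whole interpreter, one way to keep a test runner alive while checking for the crash is to run the failing snippet in a child process and inspect its exit status. The following is a minimal sketch, not part of the original report: `run_isolated`, `crashed` and `REPRO` are hypothetical names, and it assumes a POSIX system, where a negative return code means the child was killed by a signal. Passing `-X faulthandler` asks CPython to dump the Python-level traceback to stderr before the process dies.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> tuple[int, str]:
    """Run `code` in a fresh interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        # -X faulthandler prints the Python traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def crashed(returncode: int) -> bool:
    """On POSIX, a negative return code means the child died from a signal."""
    return returncode < 0


# Hypothetical snippet mirroring the test above. It assumes test_message_pb2
# is importable from the working directory.
REPRO = (
    "from test_message_pb2 import TestMessage\n"
    "msg = TestMessage(data={})\n"
    "print('key' in msg.data)\n"
)
```

Calling `run_isolated(REPRO)` and passing the return code to `crashed` would then report the segfault as an ordinary test failure instead of aborting the run.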

What did you expect to see
No segfault: either a warning, or the same behaviour as when no empty dict is passed (as in protobuf 4).
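
Until the crash is fixed, a possible stopgap on the caller's side is to leave the field out when the dict is empty, which matches the "do not add an empty dict" case described above. This is only a sketch under that assumption: `drop_empty_dicts` is a hypothetical helper, not a protobuf API, and it has not been verified against the affected versions.

```python
from typing import Any


def drop_empty_dicts(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `kwargs` without entries whose value is an empty dict."""
    return {
        name: value
        for name, value in kwargs.items()
        if not (isinstance(value, dict) and not value)
    }
```

The message would then be built as `TestMessage(**drop_empty_dicts({"id": "a", "data": payload}))`. Note that this leaves `data` unset rather than set to an empty Struct, so code that checks field presence would see a difference.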

What did you see instead?
A segfault in upb

lldb -- python test.py
(lldb) target create "python"
Current executable set to '/Users/m.vankampen/protobuftest/.venv/bin/python' (arm64).
(lldb) settings set -- target.run-args  "test.py"
(lldb) run
Process 70666 launched: '/Users/m.vankampen/protobuftest/.venv/bin/python' (arm64)
Process 70666 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001002de948 _message.abi3.so`upb_Map_Get + 28
_message.abi3.so`upb_Map_Get:
->  0x1002de948 <+28>: ldrsb  x8, [x0]
    0x1002de94c <+32>: ldrsb  x20, [x0, #0x1]
    0x1002de950 <+36>: cmp    w8, #0x0
    0x1002de954 <+40>: mov    x9, sp
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001002de948 _message.abi3.so`upb_Map_Get + 28
    frame #1: 0x00000001002c3388 _message.abi3.so`PyUpb_Message_Contains + 184
    frame #2: 0x0000000101935bfc libpython3.12.dylib`method_vectorcall_O.llvm.15475255770341856349 + 120
    frame #3: 0x000000010195573c libpython3.12.dylib`slot_sq_contains + 188
    frame #4: 0x0000000101730c24 libpython3.12.dylib`_PyEval_EvalFrameDefault + 117476
    frame #5: 0x00000001019317f0 libpython3.12.dylib`method_vectorcall.llvm.6069578723340407369 + 176
    frame #6: 0x000000010173df04 libpython3.12.dylib`_PyEval_EvalFrameDefault + 171460
    frame #7: 0x000000010174f684 libpython3.12.dylib`_PyObject_Call_Prepend + 152
    frame #8: 0x000000010174f118 libpython3.12.dylib`slot_tp_call + 116
    frame #9: 0x000000010173a654 libpython3.12.dylib`_PyEval_EvalFrameDefault + 156948
    frame #10: 0x00000001019317f0 libpython3.12.dylib`method_vectorcall.llvm.6069578723340407369 + 176
    frame #11: 0x000000010173df04 libpython3.12.dylib`_PyEval_EvalFrameDefault + 171460
    frame #12: 0x000000010174f684 libpython3.12.dylib`_PyObject_Call_Prepend + 152
    frame #13: 0x000000010174f118 libpython3.12.dylib`slot_tp_call + 116
    frame #14: 0x000000010173a654 libpython3.12.dylib`_PyEval_EvalFrameDefault + 156948
    frame #15: 0x00000001019317f0 libpython3.12.dylib`method_vectorcall.llvm.6069578723340407369 + 176
    frame #16: 0x000000010173df04 libpython3.12.dylib`_PyEval_EvalFrameDefault + 171460
    frame #17: 0x000000010174f684 libpython3.12.dylib`_PyObject_Call_Prepend + 152
    frame #18: 0x000000010174f118 libpython3.12.dylib`slot_tp_call + 116
    frame #19: 0x000000010173a654 libpython3.12.dylib`_PyEval_EvalFrameDefault + 156948
    frame #20: 0x000000010195231c libpython3.12.dylib`slot_tp_init + 300
    frame #21: 0x00000001019503dc libpython3.12.dylib`type_call + 148
    frame #22: 0x000000010173a654 libpython3.12.dylib`_PyEval_EvalFrameDefault + 156948
    frame #23: 0x00000001017df780 libpython3.12.dylib`PyEval_EvalCode + 220
    frame #24: 0x00000001017df5d4 libpython3.12.dylib`run_mod.llvm.105002074040386257 + 284
    frame #25: 0x000000010185aaf8 libpython3.12.dylib`pyrun_file + 156
    frame #26: 0x000000010185a238 libpython3.12.dylib`_PyRun_SimpleFileObject + 268
    frame #27: 0x0000000101854cbc libpython3.12.dylib`_PyRun_AnyFileObject + 80
    frame #28: 0x0000000101853994 libpython3.12.dylib`pymain_run_file_obj + 164
    frame #29: 0x0000000101852ff8 libpython3.12.dylib`pymain_run_file + 72
    frame #30: 0x0000000101851294 libpython3.12.dylib`Py_RunMain + 1124
    frame #31: 0x00000001018311a8 libpython3.12.dylib`pymain_main + 456
    frame #32: 0x0000000101830fd4 libpython3.12.dylib`Py_BytesMain + 40
    frame #33: 0x00000001951ef154 dyld`start + 2476


mjvankampen added the untriaged label Dec 12, 2024
@mjvankampen
Author

While I think this would be a suitable fix, I can't reproduce the crash in a unit test yet.

[image]

@mjvankampen
Author

For example, a test like the one below succeeds.

  def testEmptyDict(self):
    # in operator for empty initialized struct
    msg = well_known_types_test_pb2.WKTMessage(optional_struct={})
    self.assertFalse('key' in msg.optional_struct)
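
When a crash shows up in a standalone script but not in a unit test, one thing that may be worth ruling out is that the two run against different protobuf versions or backends. The sketch below prints both; `protobuf_backend` is a hypothetical helper name, while `api_implementation.Type()` is the internal call the Python runtime uses to report its backend (for example `upb`, `cpp` or `python`).

```python
def protobuf_backend() -> str:
    """Describe the installed protobuf runtime as 'version / backend'."""
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        # Keeps the helper usable on machines without protobuf installed.
        return "protobuf not installed"
    return f"{google.protobuf.__version__} / {api_implementation.Type()}"


print(protobuf_backend())
```

Running this in both the failing environment and the test environment would show whether they are actually exercising the same code path.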

shaod2 added the python and upb labels and removed the untriaged label Dec 12, 2024
@mjvankampen
Author

Adding optimization flags makes no difference either.
