Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String comparison with equal-to operator does not work for c strings, in type_support_common.cpp #703

Closed
henriksod opened this issue Jul 29, 2023 · 19 comments

Comments

@henriksod
Copy link

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • source
  • Version or commit hash:
    • 6.2.3
  • DDS implementation:
    • rmw_fastrtps_dynamic_cpp

Steps to reproduce issue

This problem was discovered in and affected this PR mvukov/rules_ros2#148

Got this error:

'Unknown typesupport identifier, at external/ros2_rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_registry.cpp:93'

Added some prints here:

type_support_ptr TypeSupportRegistry::get_message_type_support(
  const rosidl_message_type_support_t * ros_type_support)
{
  auto creator_fun = [&ros_type_support]() -> type_support_ptr
    {
      std::cerr << ros_type_support->typesupport_identifier << "\n";
      std::cerr << "c "
        << using_introspection_c_typesupport(ros_type_support->typesupport_identifier)
        << " " << rosidl_typesupport_introspection_c__identifier <<  "\n";
      std::cerr << "cpp "
        << using_introspection_cpp_typesupport(ros_type_support->typesupport_identifier)
        << " " << rosidl_typesupport_introspection_cpp::typesupport_identifier << "\n";
      if (using_introspection_c_typesupport(ros_type_support->typesupport_identifier)) {
        auto members = static_cast<const rosidl_typesupport_introspection_c__MessageMembers *>(
          ros_type_support->data);
        return new MessageTypeSupport_c(members, ros_type_support);
      } else if (using_introspection_cpp_typesupport(ros_type_support->typesupport_identifier)) {
        auto members = static_cast<const rosidl_typesupport_introspection_cpp::MessageMembers *>(
          ros_type_support->data);
        return new MessageTypeSupport_cpp(members, ros_type_support);
      }
      RMW_SET_ERROR_MSG("Unknown typesupport identifier");
      return nullptr;
    };

  return get_type_support(ros_type_support, message_types_, creator_fun);
}

Outcome:

rosidl_typesupport_introspection_cpp
c 0 rosidl_typesupport_introspection_c
cpp 1 rosidl_typesupport_introspection_cpp
rosidl_typesupport_introspection_c
c 0 rosidl_typesupport_introspection_c
cpp 0 rosidl_typesupport_introspection_cpp

Expected behavior

rosidl_typesupport_introspection_c == rosidl_typesupport_introspection_c

Actual behavior

rosidl_typesupport_introspection_c != rosidl_typesupport_introspection_c

Implementation considerations

Use strcmp:

bool using_introspection_c_typesupport(const char * typesupport_identifier)
{
  return !strcmp(typesupport_identifier, rosidl_typesupport_introspection_c__identifier);
}

or std::string().compare

bool using_introspection_c_typesupport(const char * typesupport_identifier) {
  return !std::string(typesupport_identifier)
    .compare(rosidl_typesupport_introspection_c__identifier);
 }

Update both these lines:

return typesupport_identifier == rosidl_typesupport_introspection_c__identifier;

@iuhilnehc-ynos
Copy link
Contributor

cpp 1 rosidl_typesupport_introspection_cpp   

It returns 1(true), so I think it's OK.

Please notice that it's a trick to compare the char* address directly for performance. The typesupport_identifier(type is const char*) is initialized with https://github.com/ros2/rosidl/blob/7cbb1161e8d3fa9aff29a481bce0e6c2b56715b5/rosidl_typesupport_introspection_cpp/resource/msg__type_support.cpp.em#L251, it's better to compare them directly, right?

@iuhilnehc-ynos
Copy link
Contributor

The typesupport_identifier in rosidl_typesupport_introspection_c is similar to above, which is initialized at https://github.com/ros2/rosidl/blob/7cbb1161e8d3fa9aff29a481bce0e6c2b56715b5/rosidl_typesupport_introspection_c/resource/msg__type_support.c.em#L287 and will be assigned with https://github.com/ros2/rosidl/blob/7cbb1161e8d3fa9aff29a481bce0e6c2b56715b5/rosidl_typesupport_introspection_c/resource/msg__type_support.c.em#L311. Could you provide more detailed steps to reproduce this issue? A test case might be better.

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

cpp 1 rosidl_typesupport_introspection_cpp   

It returns 1(true), so I think it's OK.

Please notice that it's a trick to compare the char* address directly for performance. The typesupport_identifier(type is const char*) is initialized with https://github.com/ros2/rosidl/blob/7cbb1161e8d3fa9aff29a481bce0e6c2b56715b5/rosidl_typesupport_introspection_cpp/resource/msg__type_support.cpp.em#L251, it's better to compare them directly, right?

The second step fails:

rosidl_typesupport_introspection_c
c 0 rosidl_typesupport_introspection_c
cpp 0 rosidl_typesupport_introspection_cpp

Seems that they are not pointing to the same address in memory.

It is rosidl_typesupport_introspection_c that is not working, seems like rosidl_typesupport_introspection_cpp works fine.

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

Fails in this test, using rmw_fastrtps_dynamic_cpp as middleware. We had the same issue with rmw_cyclonedds:

TestGenericPublisher
// Copyright 2023 Milan Vukov
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <thread>

#include "gtest/gtest.h"
#include "rclcpp/rclcpp.hpp"

// Inspired by an example in https://github.com/ros2/rclcpp/issues/2146
// For this to work, a rclcpp patch from
// https://github.com/mvukov/rules_ros2/pull/117 is required.
TEST(TestGenericPublisher,
     WhenPublisherDeletedInThread_EnsureCleanNodeCleanup) {
  auto node = rclcpp::Node::make_shared("pub_node");
  auto publisher =
      node->create_generic_publisher("pub_topic", "std_msgs/String", 10);

  auto reset_publisher = std::async(std::launch::async, [&node, &publisher]() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    publisher.reset();

    // New: Create a timer that gets immediately out of scope (should never
    // fire) to trigger cleanup of the publisher.
    node->create_wall_timer(std::chrono::seconds(1), []() {});
  });

  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  executor.spin_until_future_complete(reset_publisher);
}

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  testing::InitGoogleTest(&argc, argv);
  const int result = RUN_ALL_TESTS();
  if (!rclcpp::shutdown() || result) {
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}

@iuhilnehc-ynos
Copy link
Contributor

I am afraid that I can't reproduce this issue by rolling in my machine.

diff --git a/rclcpp/test/rclcpp/test_generic_pubsub.cpp b/rclcpp/test/rclcpp/test_generic_pubsub.cpp
index f4cef0b7..56b74089 100644
--- a/rclcpp/test/rclcpp/test_generic_pubsub.cpp
+++ b/rclcpp/test/rclcpp/test_generic_pubsub.cpp
@@ -263,3 +263,24 @@ TEST_F(RclcppGenericNodeFixture, generic_publisher_uses_qos)
   // It normally takes < 20ms, 5s chosen as "a very long time"
   ASSERT_TRUE(wait_for(connected, 5s));
 }
+
+
+TEST_F(RclcppGenericNodeFixture,
+     WhenPublisherDeletedInThread_EnsureCleanNodeCleanup) {
+  auto node = rclcpp::Node::make_shared("pub_node");
+  auto publisher =
+      node->create_generic_publisher("pub_topic", "std_msgs/String", 10);
+
+  auto reset_publisher = std::async(std::launch::async, [&node, &publisher]() {
+    std::this_thread::sleep_for(std::chrono::milliseconds(200));
+    publisher.reset();
+
+    // New: Create a timer that gets immediately out of scope (should never
+    // fire) to trigger cleanup of the publisher.
+    node->create_wall_timer(std::chrono::seconds(1), []() {});
+  });
+
+  rclcpp::executors::SingleThreadedExecutor executor;
+  executor.add_node(node);
+  executor.spin_until_future_complete(reset_publisher);
+}
\ No newline at end of file

test log:

chenlh rclcpp $ RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp ./test_generic_pubsub__rmw_fastrtps_dynamic_cpp --gtest_filter="*.WhenPublisherDeletedInThread_EnsureCleanNodeCleanup"
Running main() from gmock_main.cc
Note: Google Test filter = *.WhenPublisherDeletedInThread_EnsureCleanNodeCleanup
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from RclcppGenericNodeFixture
[ RUN      ] RclcppGenericNodeFixture.WhenPublisherDeletedInThread_EnsureCleanNodeCleanup
[       OK ] RclcppGenericNodeFixture.WhenPublisherDeletedInThread_EnsureCleanNodeCleanup (2291 ms)
[----------] 1 test from RclcppGenericNodeFixture (2291 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2295 ms total)
[  PASSED  ] 1 test.

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

This line looks interesting?

https://github.com/ros2/rosidl/blob/7cbb1161e8d3fa9aff29a481bce0e6c2b56715b5/rosidl_typesupport_introspection_c/resource/msg__type_support.c.em#L287

Could be a compiler difference?
gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
-std=c++17

@iuhilnehc-ynos
Copy link
Contributor

This line looks interesting?

Yeah, I noticed that, but it's not a big deal. Because it will be assigned with rosidl_typesupport_introspection_c__identifier

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

It is being called two times:

type_support_ptr TypeSupportRegistry::get_message_type_support(
  const rosidl_message_type_support_t * ros_type_support)
{
  auto creator_fun = [&ros_type_support]() -> type_support_ptr
    {
      std::cerr << "Got: " << &ros_type_support->typesupport_identifier
                << " " << ros_type_support->typesupport_identifier << "\n";
      std::cerr << "Expected C: " << &rosidl_typesupport_introspection_c__identifier
                << " " << rosidl_typesupport_introspection_c__identifier << "\n";
      std::cerr << "Expected CPP: " << &rosidl_typesupport_introspection_cpp::typesupport_identifier
                << " " << rosidl_typesupport_introspection_cpp::typesupport_identifier << "\n";
      if (using_introspection_c_typesupport(ros_type_support->typesupport_identifier)) {
        ...
Got: 0x7f23f96d0d20 rosidl_typesupport_introspection_cpp
Expected C: 0x7f23f96d0f58 rosidl_typesupport_introspection_c
Expected CPP: 0x7f23f96d0f60 rosidl_typesupport_introspection_cpp
Got: 0x55e8fd9cf0d0 rosidl_typesupport_introspection_c
Expected C: 0x7f23f96d0f58 rosidl_typesupport_introspection_c
Expected CPP: 0x7f23f96d0f60 rosidl_typesupport_introspection_cpp

Somehow there is a memory address mismatch

@iuhilnehc-ynos
Copy link
Contributor

I think you need to output (void*)ros_type_support->typesupport_identifier instead of &ros_type_support->typesupport_identifier, and update the two others as well.

@henriksod
Copy link
Author

Yields same result

Got: 0x7f42074c78c8 rosidl_typesupport_introspection_cpp
Expected C: 0x7f42074c78a0 rosidl_typesupport_introspection_c
Expected CPP: 0x7f42074c78c8 rosidl_typesupport_introspection_cpp
Got: 0x563a0a646b28 rosidl_typesupport_introspection_c
Expected C: 0x7f42074c78a0 rosidl_typesupport_introspection_c
Expected CPP: 0x7f42074c78c8 rosidl_typesupport_introspection_cpp

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

Doing some more digging,

I noticed that here strcmp is used:
https://github.com/ros2/rosidl/blob/cf3b637605c8c1dc0b1266ca0090963e9186c7dd/rosidl_runtime_c/src/message_type_support.c#L37

Shouldn't it use == as well then? What is the performance gain in practice, is this function in rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_common.cpp called a lot? I suppose a ros network does not have > 10000 nodes, thus I think the performance gain is neglectable.

@clalancette
Copy link
Contributor

We've had this conversation in the past, though I cannot find the issue right at the moment. The short answer is that we have two unanswered questions here:

  1. Is the expectation that the pointers are the same correct?
  2. What is the performance impact of switching to a strcmp here?

If we can determine the answer to both of those, then we can make a decision about which way to go.

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

Proof:

bool
using_introspection_c_typesupport(const char * typesupport_identifier)
{
  bool equality = typesupport_identifier == rosidl_typesupport_introspection_c__identifier;
  return typesupport_identifier == rosidl_typesupport_introspection_c__identifier;
}
Thread 1 "generic_publish" hit Breakpoint 1, using_introspection_c_typesupport (typesupport_identifier=0x5555558dab08 "rosidl_typesupport_introspection_c") at external/ros2_rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_common.cpp:29
29      in external/ros2_rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_common.cpp
(gdb) p typesupport_identifier
$8 = 0x5555558dab08 "rosidl_typesupport_introspection_c"
(gdb) p rosidl_typesupport_introspection_c__identifier
$9 = 0x5555558dab08 "rosidl_typesupport_introspection_c"
(gdb) p equality
$10 = false  <-----------------

Pointers are correct, but checking equality with equal-to operator results in FALSE.

@clalancette
Copy link
Contributor

So the question is: what is different about this situation? Because we don't normally see this problem.

@henriksod
Copy link
Author

henriksod commented Jul 31, 2023

#include <cstdint>

bool
using_introspection_c_typesupport(const char * typesupport_identifier)
{
  auto a = reinterpret_cast<std::uintptr_t>(typesupport_identifier);
  auto b = reinterpret_cast<std::uintptr_t>(rosidl_typesupport_introspection_c__identifier);
  bool equality = a == b;
  return a == b;
}
Thread 1 "generic_publish" hit Breakpoint 1, using_introspection_c_typesupport (typesupport_identifier=0x5555558dab08 "rosidl_typesupport_introspection_c") at external/ros2_rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_common.cpp:35
35      in external/ros2_rmw_fastrtps/rmw_fastrtps_dynamic_cpp/src/type_support_common.cpp
(gdb) p typesupport_identifier
$1 = 0x5555558dab08 "rosidl_typesupport_introspection_c"
(gdb) p rosidl_typesupport_introspection_c__identifier
$2 = 0x5555558dab08 "rosidl_typesupport_introspection_c"
(gdb) p equality
$3 = false
(gdb) p a
$4 = 93824995928840
(gdb) p b
$5 = 140737344691008

values resulting from cast to std::uintptr_t are different. typesupport_identifier somehow sits at another address.

(gdb) x/s 0x5555558dab08
0x5555558dab08: "rosidl_typesupport_introspection_c"
(gdb) x/s 0x7ffff76fdb40
0x7ffff76fdb40: "rosidl_typesupport_introspection_c"

@henriksod
Copy link
Author

generic_publisher_tests.tar.gz

If anyone is interested to investigate further, I think I have reached my capability...

Invoke gdb with:

AMENT_PREFIX_PATH="generic_publisher_tests_launch_ament_setup" gdb generic_publisher_tests_impl

@iuhilnehc-ynos
Copy link
Contributor

iuhilnehc-ynos commented Aug 1, 2023

It seems it's the build issue(link phase) of rules_ros2.

Please see the Bazel build parameter file for rules_ros2.

  • bazel-out/k8-dbg/bin/external/ros2_common_interfaces/std_msgs/libplugin.so-2.params
-Wl,--start-lib
bazel-out/k8-dbg/bin/external/ros2_rosidl/_objs/rosidl_typesupport_introspection_c/identifier.pic.o
-Wl,--end-lib
-Wl,--start-lib
bazel-out/k8-dbg/bin/external/ros2_rosidl/_objs/rosidl_typesupport_introspection_cpp/identifier.pic.o
-Wl,--end-lib
  • bazel-out/k8-dbg/bin/ros2/test/generic_publisher_tests_impl-2.params
-Wl,--start-lib
bazel-out/k8-dbg/bin/external/ros2_rosidl/_objs/rosidl_typesupport_introspection_c/identifier.pic.o
-Wl,--end-lib
-Wl,--start-lib
bazel-out/k8-dbg/bin/external/ros2_rosidl/_objs/rosidl_typesupport_introspection_cpp/identifier.pic.o
-Wl,--end-lib

A similar demo as below
test-global-variable.zip

$ ./test.sh 
0x55ad13ad4078 aaaaaaaaaaa
lib: 0x55ad13ad4078 aaaaaaaaaaa
lib: 0x7f122cd01000 aaaaaaaaaaa

@iuhilnehc-ynos
Copy link
Contributor

It's better to build the rosidl_typesupport_introspection_c and rosidl_typesupport_introspection_cpp with shared libraries in rules_ros2. I am not familiar with Bazel configuration, so I have no plans or capabilities to fix them.

@henriksod
Copy link
Author

henriksod commented Aug 1, 2023

Yes, forcing dynamic linking of the test binary did indeed work!

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants