[shared-data][3.3.7] cn crash when insert new data just after process restarted with hdfs storage volume #54493

Open
lukoou3 opened this issue Dec 30, 2024 · 1 comment
Labels: type/bug (Something isn't working)


lukoou3 commented Dec 30, 2024

On 3.3.7 in shared-data mode with HDFS as the storage backend, the CN crashes when data is inserted into a previously created table after the CN is restarted.

On 3.2.13 or 3.1.16 the problem does not occur.

Steps to reproduce the behavior (Required)

Single-machine environment: one FE node, one CN node, shared-data mode, with HDFS as the storage backend.

Create a table:

create table if not exists test.test_object_statistics
(
    vsys_id int,
    object_uuid string,
    __time datetime not null,
    object_type string,
    insert_time datetime default current_timestamp,
    in_bytes bigint,
    out_bytes bigint,
    bytes bigint
)
duplicate key(vsys_id, object_uuid, __time)
partition by date_trunc('day', __time)
distributed by random
properties (
    'replication_num' = '1',
    'partition_live_number' = '90',
    "datacache.enable" = "true",
    "datacache.partition_duration" = "30 DAY"
);

Insert data:

INSERT INTO test.test_object_statistics(vsys_id, __time) values (2, '2024-12-26 07:21:31'), (3, '2024-12-26 07:21:32');

Restart the CN node.

The table can still be queried, but inserting data crashes the CN.

Insert data again:

INSERT INTO test.test_object_statistics(vsys_id, __time) values (2, '2024-12-26 07:21:31'), (3, '2024-12-26 07:21:32');

Expected behavior (Required)

The process keeps running, and previously created tables remain usable.

Real behavior (Required)

The CN crashes.

Output in cn.out when the CN crashes:

3.3.7 RELEASE (build 00177de)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 331777788
tracker:jemalloc_metadata consumption: 57657840
tracker:jemalloc_fragmentation consumption: 2838172
tracker:query_pool consumption: 1357424
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 70517
tracker:tablet_metadata consumption: 2695
tracker:rowset_metadata consumption: 0
tracker:segment_metadata consumption: 9840
tracker:column_metadata consumption: 57982
tracker:tablet_schema consumption: 2695
tracker:segment_zonemap consumption: 4278
tracker:short_key_index consumption: 1635
tracker:column_zonemap_index consumption: 7038
tracker:ordinal_index consumption: 10688
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:jit_cache consumption: 0
tracker:update consumption: 0
tracker:chunk_allocator consumption: 0
tracker:passthrough consumption: 0
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1735205247 (unix time) try "date -d @1735205247" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 41817 (TID 0x7fb57f2ea700) from PID 0; stack trace: ***
    @     0x7fb7d19f01cb __pthread_once_slow
    @          0x7bbf8a0 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7fb7d287cddd os::Linux::chained_handler(int, siginfo*, void*)
    @     0x7fb7d2882b1f JVM_handle_linux_signal
    @     0x7fb7d2874418 signalHandler(int, siginfo*, void*)
    @     0x7fb7d19f95f0 (/usr/lib64/libpthread-2.17.so+0xf5ef)

3.3.7 RELEASE (build 00177de)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 357269858
tracker:jemalloc_metadata consumption: 57895968
tracker:jemalloc_fragmentation consumption: 2319770
tracker:query_pool consumption: 1356936
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 0
tracker:metadata consumption: 133304
tracker:tablet_metadata consumption: 4735
tracker:rowset_metadata consumption: 0
tracker:segment_metadata consumption: 15793
tracker:column_metadata consumption: 112776
tracker:tablet_schema consumption: 4735
tracker:segment_zonemap consumption: 7302
tracker:short_key_index consumption: 1351
tracker:column_zonemap_index consumption: 13296
tracker:ordinal_index consumption: 27400
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:jit_cache consumption: 0
tracker:update consumption: 0
tracker:chunk_allocator consumption: 0
tracker:passthrough consumption: 0
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 260528
tracker:replication consumption: 0
*** Aborted at 1735205656 (unix time) try "date -d @1735205656" if you are using GNU date ***
PC: @          0x744e7f0 methodIdFromClass
*** SIGSEGV (@0x0) received by PID 46502 (TID 0x7f882ac2a700) from PID 0; stack trace: ***
    @     0x7f8a81f781cb __pthread_once_slow
    @          0x7bbf8a0 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f8a82e04ddd os::Linux::chained_handler(int, siginfo*, void*)
    @     0x7f8a82e0ab1f JVM_handle_linux_signal
    @     0x7f8a82dfc418 signalHandler(int, siginfo*, void*)
    @     0x7f8a81f815f0 (/usr/lib64/libpthread-2.17.so+0xf5ef)
    @          0x744e7f0 methodIdFromClass
    @          0x744eddc constructNewObjectOfJclass
    @          0x744f031 constructNewObjectOfCachedClass
    @          0x7451b3f hdfsBuilderConnect
    @          0x729ad68 staros::starlet::fslib::HdfsFileSystem::initialize(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::…
    @          0x7254d5b staros::starlet::fslib::FileSystemFactoryImpl::new_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::c…
    @          0x7254ec0 staros::starlet::fslib::FileSystemFactory::new_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_…
    @          0x727a2ce staros::starlet::fslib::CacheFileSystem::initialize(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11:…
    @          0x7254d5b staros::starlet::fslib::FileSystemFactoryImpl::new_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::c…
    @          0x7254ec0 staros::starlet::fslib::FileSystemFactory::new_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_…
    @          0x6ad4808 starrocks::StarOSWorker::new_shared_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char…
    @          0x6ad79fb starrocks::StarOSWorker::build_filesystem_from_shard_info(staros::starlet::ShardInfo const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::a…
    @          0x6ad9974 starrocks::StarOSWorker::get_shard_filesystem(unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std:…
    @          0x5ed84d5 starrocks::StarletFileSystem::iterate_dir(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<bool (std::basic_string_view<char, std::char_traits<char> >)> const&)
    @          0x62b3bbc starrocks::lake::TabletManager::list_tablet_metadata(long)
    @          0x62b4098 starrocks::lake::TabletManager::get_tablet_data_size(long, long*)
    @          0x62b64d6 starrocks::lake::TabletManager::add_in_writing_data_size(long, long)
    @          0x626a97a starrocks::lake::DeltaWriterImpl::check_immutable()
    @          0x626abf5 starrocks::lake::DeltaWriter::check_immutable()
    @          0x6258308 starrocks::lake::AsyncDeltaWriter::check_immutable()
    @          0x373488e starrocks::LakeTabletsChannel::open(starrocks::PTabletWriterOpenRequest const&, starrocks::PTabletWriterOpenResult*, std::shared_ptr<starrocks::OlapTableSchemaParam>, bool)
    @          0x36d9637 starrocks::LoadChannel::open(brpc::Controller*, starrocks::PTabletWriterOpenRequest const&, starrocks::PTabletWriterOpenResult*, google::protobuf::Closure*)
    @          0x36d3288 starrocks::LoadChannelMgr::open(brpc::Controller*, starrocks::PTabletWriterOpenRequest const&, starrocks::PTabletWriterOpenResult*, google::protobuf::Closure*)
    @          0x7e4c264 brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
    @          0x7d78607 brpc::ProcessInputMessage(void*)
    @          0x7d79985 brpc::InputMessenger::OnNewMessages(brpc::Socket*)

StarRocks version (Required)

3.3.7

On 3.2.13 or 3.1.16 the crash does not occur, but every version prints HDFS file-not-found errors like the ones below; the process keeps running normally. (A generic sketch of that tolerant handling follows the log.)

hdfsOpenFile(/starrocks/storage/2d052fdc-b9cc-42e4-aca0-5424e6afd567/db10005/12003/12007/meta/0000000000002EE8_00000000000003D0.meta): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
RemoteException: File does not exist: /starrocks/storage/2d052fdc-b9cc-42e4-aca0-5424e6afd567/db10005/12003/12007/meta/0000000000002EE8_00000000000003D0.meta
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:460)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
java.io.FileNotFoundException: File does not exist: /starrocks/storage/2d052fdc-b9cc-42e4-aca0-5424e6afd567/db10005/12003/12007/meta/0000000000002EE8_00000000000003D0.meta
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:460)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)

        at java.base/jdk.internal.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:933)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:920)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:909)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1076)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:342)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:355)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /starrocks/storage/2d052fdc-b9cc-42e4-aca0-5424e6afd567/db10005/12003/12007/meta/0000000000002EE8_00000000000003D0.meta
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:460)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584)
        at org.apache.hadoop.ipc.Client.call(Client.java:1529)
        at org.apache.hadoop.ipc.Client.call(Client.java:1426)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
        at com.sun.proxy.$Proxy24.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$getBlockLocations$0(ClientNamenodeProtocolTranslatorPB.java:340)
        at org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:340)
        at jdk.internal.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
        at com.sun.proxy.$Proxy25.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:931)
        ... 7 more
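
For contrast, the tolerant pre-3.3 behavior above amounts to logging the missing file and carrying on. Below is a minimal, self-contained C++17 sketch of that pattern, using std::filesystem as a generic stand-in for the Starlet filesystem calls seen in the traces; the name list_tablet_metadata is borrowed for illustration only, and this is not StarRocks code:

#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>
#include <vector>

// Illustrative stand-in for listing a tablet's metadata files. A missing
// metadata directory is logged and treated as "no metadata" instead of
// being allowed to take the process down.
std::vector<std::string> list_tablet_metadata(const std::filesystem::path& meta_dir) {
    std::vector<std::string> files;
    std::error_code ec;
    std::filesystem::directory_iterator it(meta_dir, ec), end;
    if (ec) {
        // e.g. the FileNotFoundException case above: report and keep running.
        std::cerr << "iterate_dir(" << meta_dir << ") failed: " << ec.message() << "\n";
        return files;
    }
    for (; it != end; ++it) {
        files.push_back(it->path().filename().string());
    }
    return files;
}

int main() {
    // A nonexistent directory yields an empty list plus a logged error.
    for (const auto& name : list_tablet_metadata("/no/such/meta")) {
        std::cout << name << "\n";
    }
    return 0;
}
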
@lukoou3 lukoou3 added the type/bug Something isn't working label Dec 30, 2024
@kevincai kevincai self-assigned this Dec 30, 2024
@kevincai kevincai changed the title from the original Chinese summary ("On 3.3.7 in shared-data mode with HDFS as the storage backend, the CN crashes when inserting into a previously created table after a CN restart") to [shared-data][3.3.7] cn crash when insert new data just after process restarted with hdfs storage volume Dec 30, 2024
@kevincai (Contributor) commented:

Reproduced in v3.3.7. Backtrace of the offending thread:

Thread 629 (Thread 0x7f4555117640 (LWP 16504)):
#0  methodIdFromClass (cls=cls@entry=0x7f45dc41b240, className=className@entry=0x30ddb50 "org/apache/hadoop/conf/Configuration", methName=methName@entry=0x2cb6967 "<init>", methSignature=methSignature@entry=0x2cf1470 "()V", methType=methType@entry=INSTANCE, env=env@entry=0x7f4494579290, out=0x7f44945785b0) at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:307
#1  0x0000000009923ccc in constructNewObjectOfJclass (env=env@entry=0x7f4494579290, out=out@entry=0x7f4494578700, cls=0x7f45dc41b240, className=className@entry=0x30ddb50 "org/apache/hadoop/conf/Configuration", ctorSignature=ctorSignature@entry=0x2cf1470 "()V", args=args@entry=0x7f44945785f0) at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:245
#2  0x0000000009923f21 in constructNewObjectOfCachedClass (env=env@entry=0x7f4494579290, out=out@entry=0x7f4494578700, cachedJavaClass=cachedJavaClass@entry=JC_CONFIGURATION, ctorSignature=ctorSignature@entry=0x2cf1470 "()V") at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:285
#3  0x00000000099271df in hdfsBuilderConnect (bld=0x7f490fa15cb0) at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c:711
#4  0x0000000009761408 in staros::starlet::fslib::HdfsFileSystem::initialize (this=0x7f4499f6d0c0, conf=...) at ./src/fslib/hdfs/hdfs_fs.cc:481
#5  0x00000000096cd2be in staros::starlet::fslib::FileSystemFactoryImpl::new_filesystem (this=0xf7778e0 <staros::starlet::fslib::FileSystemFactoryImpl::instance()::s_instance>, url=..., conf=std::unordered_map with 5 elements = {...}) at ./src/fslib/file_system.cc:210
#6  0x00000000096cd484 in staros::starlet::fslib::FileSystemFactory::new_filesystem (url="hdfs://172.26.92.212:8020/testvol/7d2a14a5-62fd-435f-9ed9-d51e7ca824ae/db10005/10007/10006", conf=std::unordered_map with 5 elements = {...}) at ./src/fslib/file_system.cc:262
#7  0x0000000009726d68 in staros::starlet::fslib::CacheFileSystem::initialize (this=0x7f45c9be6670, conf=std::unordered_map with 6 elements = {...}) at ./src/fslib/cachefs/cache_fs.cc:126
#8  0x00000000096cd2be in staros::starlet::fslib::FileSystemFactoryImpl::new_filesystem (this=0xf7778e0 <staros::starlet::fslib::FileSystemFactoryImpl::instance()::s_instance>, url=..., conf=std::unordered_map with 6 elements = {...}) at ./src/fslib/file_system.cc:210
#9  0x00000000096cd484 in staros::starlet::fslib::FileSystemFactory::new_filesystem (url="cachefs://", conf=std::unordered_map with 6 elements = {...}) at ./src/fslib/file_system.cc:262
#10 0x00000000088d0af6 in starrocks::StarOSWorker::new_shared_filesystem (this=this@entry=0x7f49158d49d0, scheme=..., conf=std::unordered_map with 6 elements = {...}) at be/src/service/staros_worker.cpp:296
#11 0x00000000088d3f5f in starrocks::StarOSWorker::build_filesystem_from_shard_info (this=this@entry=0x7f49158d49d0, info=..., conf=std::unordered_map with 0 elements) at /usr/include/c++/11/string_view:137
#12 0x00000000088d6bb0 in starrocks::StarOSWorker::get_shard_filesystem (this=0x7f49158d49d0, id=<optimized out>, conf=std::unordered_map with 0 elements) at be/src/service/staros_worker.cpp:194
#13 0x000000000532a536 in starrocks::StarletFileSystem::get_shard_filesystem (shard_id=<optimized out>, this=0x7f4544562f00) at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#14 starrocks::StarletFileSystem::iterate_dir(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<bool (std::basic_string_view<char, std::char_traits<char> >)> const&) (this=this@entry=0x7f4544562f00, dir="staros://10009/meta", cb=...) at be/src/fs/fs_starlet.cpp:360
#15 0x000000000670bee7 in starrocks::lake::TabletManager::list_tablet_metadata (this=0x7f4915b26060, tablet_id=tablet_id@entry=10009) at be/src/storage/lake/tablet_manager.cpp:318
#16 0x000000000670c3cb in starrocks::lake::TabletManager::get_tablet_data_size (this=this@entry=0x7f4915b26060, tablet_id=tablet_id@entry=10009, version_hint=version_hint@entry=0x0) at be/src/storage/lake/tablet_manager.cpp:457
#17 0x000000000670e9c8 in starrocks::lake::TabletManager::add_in_writing_data_size (this=0x7f4915b26060, tablet_id=<optimized out>, size=size@entry=0) at be/src/storage/lake/tablet_manager.cpp:722
#18 0x000000000670eb9d in starrocks::lake::TabletManager::in_writing_data_size (this=<optimized out>, tablet_id=<optimized out>) at be/src/storage/lake/tablet_manager.cpp:709
#19 0x000000000882313a in starrocks::lake::DeltaWriterImpl::check_immutable (this=0x7f490faca300) at be/src/storage/lake/delta_writer.cpp:211
#20 0x0000000008823438 in starrocks::lake::DeltaWriter::check_immutable (this=<optimized out>) at be/src/storage/lake/delta_writer.cpp:732
#21 0x00000000088a5339 in starrocks::lake::AsyncDeltaWriterImpl::check_immutable (this=<optimized out>) at be/src/storage/lake/async_delta_writer.cpp:69
#22 starrocks::lake::AsyncDeltaWriter::check_immutable (this=<optimized out>) at be/src/storage/lake/async_delta_writer.cpp:331
#23 0x00000000088a1bd9 in starrocks::LakeTabletsChannel::open (this=this@entry=0x7f4499f52010, params=..., result=result@entry=0x7f490fab5120, schema=std::shared_ptr<starrocks::OlapTableSchemaParam> (use count 3, weak count 0) = {...}, is_incremental=is_incremental@entry=false) at /usr/include/c++/11/bits/unique_ptr.h:173
#24 0x0000000008849058 in starrocks::LoadChannel::open (this=this@entry=0x7f4499f43e00, cntl=cntl@entry=0x7f4499efb000, request=..., response=response@entry=0x7f490fab5120, done=done@entry=0x7f490fab5170) at be/src/runtime/load_channel.cpp:137
#25 0x00000000088428a0 in starrocks::LoadChannelMgr::open (this=0x7f48f0767ee0, cntl=0x7f4499efb000, request=..., response=0x7f490fab5120, done=0x7f490fab5170) at be/src/common/closure_guard.h:60
#26 0x000000000a458e93 in brpc::policy::ProcessRpcRequest (msg_base=0x7f454460cbc0) at /usr/include/c++/11/bits/unique_ptr.h:185
#27 0x000000000a37ea6b in brpc::ProcessInputMessage (void_arg=<optimized out>) at src/brpc/input_messenger.cpp:173
#28 0x000000000a37feb4 in brpc::InputMessenger::OnNewMessages (m=0x7f452eb6f980) at src/brpc/input_messenger.cpp:397
#29 0x000000000a3bc7d2 in brpc::Socket::ProcessEvent (arg=0x7f452eb6f980) at src/brpc/socket.cpp:1201
#30 0x000000000a333967 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at src/bthread/task_group.cpp:305
#31 0x000000000a31cf81 in bthread_make_fcontext ()
#32 0x0000000000000000 in ?? ()

Quoted from https://github.com/apache/brpc/blob/master/docs/en/server.md#pthread-mode

pthread mode
User code(client-side done, server-side CallMethod) runs in bthreads with 1MB stacksize by default. But some of them cannot run in bthreads:
- JNI code checks stack layout and cannot be run in bthreads.
- The user code extensively use pthread-local to pass session-level data across functions. If there's a synchronous RPC call or function calls that may block bthread, the resumed bthread may land on a different pthread which does not have the pthread-local data that users expect to have. As a contrast, although tcmalloc uses pthread(or LWP)-local as well, the code inside has nothing to do with bthread, which is safe.
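
In other words, hdfsBuilderConnect (a libhdfs JNI call) is being executed directly on a brpc bthread in the load path above (frame #30 is bthread::TaskGroup::task_runner). Below is a minimal, self-contained C++ sketch of the workaround pattern the brpc doc implies: dispatch stack-sensitive work to a dedicated real pthread and wait on a future. The function names are hypothetical, and this only illustrates the technique, not the actual fix:

#include <future>
#include <iostream>
#include <string>
#include <thread>

// Stand-in for a JNI-backed call such as hdfsBuilderConnect(). JNI inspects
// the calling thread's stack, so it must not run on a 1MB bthread stack.
// (do_jni_connect is a hypothetical placeholder, not a real libhdfs API.)
std::string do_jni_connect(const std::string& namenode) {
    return "connected to " + namenode;
}

// Run the JNI call on a dedicated std::thread (a full pthread with a normal
// stack) and block the caller until it finishes. A production version would
// reuse a long-lived worker pool rather than spawn a thread per call.
std::string connect_on_pthread(const std::string& namenode) {
    std::packaged_task<std::string()> task([&] { return do_jni_connect(namenode); });
    std::future<std::string> fut = task.get_future();
    std::thread worker(std::move(task));
    worker.join();
    return fut.get();
}

int main() {
    std::cout << connect_on_pthread("hdfs://namenode:8020") << std::endl;
    return 0;
}
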
