How to resolve source lines from a user crash using gdb

Magnus Edenhill edited this page Feb 3, 2021 · 1 revision

How to resolve source lines from a user gdb backtrace

User's gdb backtrace

We have this gdb output from a user:

(user-gdb)
Thread 1 (Thread 0x7fb303fff700 (LWP 56461)):
#0  0x00007fb3966e178a in rd_kafka_txn_handle_TxnOffsetCommit () from librdkafka.so.1
#1  0x00007fb39666b4fb in rd_kafka_buf_callback () from librdkafka.so.1
#2  0x00007fb396675a2b in rd_kafka_op_handle_std () from librdkafka.so.1
#3  0x00007fb396675aa8 in rd_kafka_op_handle () from librdkafka.so.1
#4  0x00007fb39666fa63 in rd_kafka_q_serve () from librdkafka.so.1
#5  0x00007fb3966395dc in rd_kafka_thread_main () from librdkafka.so.1
#6  0x00007fb3966affd7 in _thrd_wrapper_function () from librdkafka.so.1
#7  0x00007fb38af0bdd5 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007fb389cdd02d in clone () from /usr/lib64/libc.so.6

There are no source line references, however, because their librdkafka.so.1 comes from a confluent-platform librdkafka1...rpm package, which is stripped of debug symbols.

We do know that they are using the CP librdkafka v1.5.3 RPM packages, though, and each such release also has a librdkafka-debuginfo..rpm package containing the debug symbols.

Extracting RPMs

$ ls
librdkafka1-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm
librdkafka-1.5.3_confluent6.1.0-0.1.0.el7.src.rpm
librdkafka-debuginfo-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm
librdkafka-devel-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm

Extract librdkafka1..*rpm:

$ rpm2cpio librdkafka1-*rpm | cpio -idmv

Extract librdkafka-debuginfo..*rpm:

$ rpm2cpio librdkafka-debuginfo*rpm | cpio -idmv

We now have the stripped librdkafka.so.1 and its unstripped counterpart as well as the librdkafka source code:

$ find . -name 'librdkafka.so.1*' -or -name rdkafka.h
./usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src/rdkafka.h
./usr/lib/debug/usr/lib64/librdkafka.so.1.debug
./usr/lib64/librdkafka.so.1
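
Before relying on the symbols, it can be worth checking that the stripped library and the debug file come from the same build. The sketch below compares their GNU build-ids as printed by readelf -n. The helper names extract_build_id and same_build_id are made up for this page, and it assumes binutils is installed and that both files carry a build-id note:

```shell
# Print the GNU build-id found in `readelf -n` output read from stdin.
extract_build_id() {
    awk '/Build ID:/ { print $3; exit }'
}

# Succeed only if both ELF files have a build-id and the two are identical.
same_build_id() {
    a=$(readelf -n "$1" | extract_build_id)
    b=$(readelf -n "$2" | extract_build_id)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, with the paths from the listing above:
#   same_build_id ./usr/lib64/librdkafka.so.1 \
#       ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug && echo "build-ids match"
```

If the two ids differ, the addresses resolved later would point at the wrong source lines.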

Find the debug symbol

Start up gdb using the debuginfo lib:

$ gdb ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug
..
(gdb)

Verify that the library is loaded by looking up the function that crashed:

(gdb) x/i rd_kafka_txn_handle_TxnOffsetCommit
  0xca740 <rd_kafka_txn_handle_TxnOffsetCommit>:	add    %al,(%rax)

Since we're running gdb on the shared library itself rather than on an application, the library is loaded at address 0x0. We can see this with:

(gdb) info target
Symbols from "./usr/lib/debug/usr/lib64/librdkafka.so.1.debug".
Local exec file:
	`./usr/lib/debug/usr/lib64/librdkafka.so.1.debug', file type elf64-x86-64.
warning: Cannot find section for the entry point of ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug.
	Entry point: 0x13510
	0x0000000000000200 - 0x0000000000000224 is .note.gnu.build-id
	0x0000000000000228 - 0x0000000000000bd8 is .gnu.hash
	0x0000000000000bd8 - 0x0000000000003f20 is .dynsym
	0x0000000000003f20 - 0x0000000000006d42 is .dynstr
	0x0000000000006d42 - 0x0000000000007188 is .gnu.version
	0x0000000000007188 - 0x0000000000007318 is .gnu.version_r
	0x0000000000007318 - 0x0000000000010210 is .rela.dyn
	0x0000000000010210 - 0x0000000000012088 is .rela.plt
	0x0000000000012088 - 0x00000000000120a2 is .init
	0x00000000000120b0 - 0x0000000000013510 is .plt
	0x0000000000013510 - 0x00000000001884ab is .text
	0x00000000001884ac - 0x00000000001884b5 is .fini
	0x00000000001884c0 - 0x00000000001be800 is .rodata
	0x00000000001be800 - 0x00000000001c3274 is .eh_frame_hdr
	0x00000000001c3278 - 0x00000000001e07dc is .eh_frame
	0x00000000003e0d70 - 0x00000000003e0dc0 is .tdata
	0x00000000003e0dc0 - 0x00000000003e4538 is .tbss
	0x00000000003e0dc0 - 0x00000000003e0dc8 is .init_array
	0x00000000003e0dc8 - 0x00000000003e0dd0 is .fini_array
	0x00000000003e0dd0 - 0x00000000003e0dd8 is .jcr
	0x00000000003e0de0 - 0x00000000003fccb0 is .data.rel.ro
	0x00000000003fccb0 - 0x00000000003fcf10 is .dynamic
	0x00000000003fcf10 - 0x00000000003fcff8 is .got
	0x00000000003fd000 - 0x00000000003fda40 is .got.plt
	0x00000000003fda40 - 0x00000000003fde30 is .data
	0x00000000003fde40 - 0x0000000000401f30 is .bss

We see that the .text section (where the function code resides) starts at 0x13510, which is also the entry point.
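
The same section address can probably be read without starting gdb by parsing the section headers from readelf -S -W. This is only a sketch: section_addr is a hypothetical helper, and it assumes the usual wide-format column order (name, type, address):

```shell
# Print the virtual address of the named section, parsed from
# `readelf -S -W` output read from stdin.
section_addr() {
    awk -v name="$1" '{
        # The address is the second field after the section name.
        for (i = 1; i < NF - 1; i++)
            if ($i == name) { print "0x" $(i + 2); exit }
    }'
}

# Usage:
#   readelf -S -W ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug | section_addr .text
```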

Resolve the user's crash address

So now we have the offset, relative to the start of the shared library, of the first instruction in rd_kafka_txn_handle_TxnOffsetCommit(): 0xca740.

And we have the absolute address of the crash location from the user's gdb output: 0x00007fb3966e178a.

But we don't know at what address librdkafka.so.1 was loaded/mapped in the user's application.

There are two ways to find out:

1) User provides the load address

Have the user issue info shared in gdb; it prints the load address of every shared library. This is by far the simplest approach.

(user-gdb) info shared
From                To                  Syms Read   Shared Object Library
...lots of other stuff...
0x00007fb39662a510  0x00007fb39679f4ab  Yes (*)     librdkafka.so.1
...

Perfect, librdkafka's .text section was loaded at 0x00007fb39662a510, and if we subtract that from the crash address 0x00007fb3966e178a, we get:

(gdb) x/a 0x00007fb3966e178a-0x00007fb39662a510
0xb727a <rd_kafka_CreateTopicsResponse_parse+4810>:	0x0

But that's not where the crash is: it's supposed to be in rd_kafka_txn_handle_TxnOffsetCommit(), not rd_kafka_CreateTopicsResponse_parse(). The load address from info shared refers to the start of the .text section, not the start of the library, so we need to add the .text section's offset (0x13510):

(gdb) x/a 0x00007fb3966e178a-0x00007fb39662a510+0x13510
0xca78a <rd_kafka_txn_handle_TxnOffsetCommit+74>:	0x0
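
The arithmetic above can also be done in the shell. This is a minimal sketch; resolve_offset is a name invented here, and the three arguments are the values already shown on this page:

```shell
# Translate an address from the user's process into an offset in our local
# debug file: subtract where .text was loaded, add where .text starts in the file.
resolve_offset() {
    # $1 = address from the user's backtrace
    # $2 = "From" address of the library in the user's `info shared`
    # $3 = start of .text from our local `info target`
    printf '0x%x\n' $(( $1 - $2 + $3 ))
}

resolve_offset 0x00007fb3966e178a 0x00007fb39662a510 0x13510   # prints 0xca78a
```

The result is 0x4a (74) bytes past the function's start at 0xca740, which agrees with the +74 gdb prints.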

That looks much better: 0xca78a is 0x4a (74) bytes past the function start at 0xca740 that we looked up earlier. Now let's get the source line for that address: skip the next section and jump to Inspect the source.

2) Manually resolve load address

By looking at the backtraces we can compare relative offsets between known functions in the user's gdb output and our debuginfo gdb and derive a base offset where the library was probably loaded.

TBD.

Inspect the source

Now that we have the address of the crash in our local librdkafka gdb session, we can look at the source line for the crash:

(gdb) list *(0x00007fb3966e178a-0x00007fb39662a510+0x13510)
0xca78a is in rd_kafka_txn_handle_TxnOffsetCommit (rdkafka_txnmgr.c:1382).
1377	        rd_kafka_topic_partition_list_t *partitions = NULL;
1378	        char errstr[512];
1379
1380	        *errstr = '\0';
1381
1382	        if (err != RD_KAFKA_RESP_ERR__DESTROY &&
1383	            !rd_kafka_q_ready(rko->rko_replyq.q))
1384	                err = RD_KAFKA_RESP_ERR__OUTDATED;
1385
1386	        if (err)

It crashed at line 1382. Since the build was optimized (not -O0), line numbers don't match exactly, but we know it is that if-statement. We also know that the err != .. check can't crash, so it is either the rko->.. dereference or something in rd_kafka_q_ready(), if that function is inlined (which it is).

Since we don't have access to the core file itself, this is as far as we can go: we can't inspect the memory of rko.
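
The remaining librdkafka frames in the user's backtrace can be translated the same way. The sketch below assumes the backtrace was saved to a file (backtrace.txt is a hypothetical name) in the exact format shown at the top of this page; frames_to_offsets is likewise a made-up helper:

```shell
# Read gdb backtrace lines on stdin and print "function offset" for every
# frame that belongs to librdkafka.so.1.
# $1 = library base in the user's process: the `info shared` "From" address
#      minus the .text offset (0x00007fb39662a510 - 0x13510).
frames_to_offsets() {
    base=$1
    while read -r frame addr _in func rest; do
        case "$rest" in
            *librdkafka.so.1*) printf '%s 0x%x\n' "$func" $(( $addr - $base )) ;;
        esac
    done
}

# Usage:
#   base=$(printf '0x%x' $(( 0x00007fb39662a510 - 0x13510 )))
#   frames_to_offsets "$base" < backtrace.txt
#   # then, in the local gdb session: list *<offset>
```

For frames other than #0 the address is a return address, pointing at the instruction after the call, so the resolved line may be the one following the call site.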

Verify source location

Also verify that the source files are indeed loaded from the debuginfo rpm package we extracted, so we know they match the address.

(gdb) info source
Current source file is rdkafka_txnmgr.c
Compilation directory is /usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src
Located in /home/me/Downloads/dd/usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src/rdkafka_txnmgr.c

Perfect! That's where we extracted the debuginfo rpm.