Valgrind reports "possibly lost" when using static Regex #1205

Closed
wyfo opened this issue Jun 19, 2024 · 7 comments

wyfo commented Jun 19, 2024

What version of regex are you using?

regex = "1.10.5"

Describe the bug at a high level.

Valgrind reports "possibly lost" when using static Regex.

What are the steps to reproduce the behavior?

use regex::Regex;

static mut REGEX: Option<Regex> = None;

fn main() {
    unsafe {
        REGEX = Regex::new(r"").ok();
        REGEX.as_ref().unwrap().captures("");
    }
}

What is the actual behavior?

Here is valgrind command and report:

valgrind --leak-check=full --num-callers=50 target/debug/regex-leak
==17154== Memcheck, a memory error detector
==17154== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==17154== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==17154== Command: target/debug/regex-leak
==17154== 
==17154== 
==17154== HEAP SUMMARY:
==17154==     in use at exit: 7,266 bytes in 51 blocks
==17154==   total heap usage: 122 allocs, 71 frees, 13,048 bytes allocated
==17154== 
==17154== 108 bytes in 1 blocks are possibly lost in loss record 41 of 51
==17154==    at 0x4885250: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==17154==    by 0x1B4B67: alloc (alloc.rs:98)
==17154==    by 0x1B4B67: alloc::alloc::Global::alloc_impl (alloc.rs:181)
==17154==    by 0x1B557B: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
==17154==    by 0x1AE187: hashbrown::raw::alloc::inner::do_alloc (alloc.rs:15)
==17154==    by 0x1B6D77: hashbrown::raw::RawTableInner::new_uninitialized (mod.rs:1754)
==17154==    by 0x1B70FB: hashbrown::raw::RawTableInner::fallible_with_capacity (mod.rs:1792)
==17154==    by 0x1B5F13: hashbrown::raw::RawTableInner::prepare_resize (mod.rs:2871)
==17154==    by 0x1B9167: resize_inner<alloc::alloc::Global> (mod.rs:3067)
==17154==    by 0x1B9167: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2957)
==17154==    by 0x1B9167: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1235)
==17154==    by 0x1BA85F: hashbrown::raw::RawTable<T,A>::reserve (mod.rs:1183)
==17154==    by 0x1B9A6B: hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot (mod.rs:1417)
==17154==    by 0x189B6F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==17154==    by 0x17BEAB: std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:1105)
==17154==    by 0x1CA35B: regex_automata::hybrid::dfa::Lazy::add_state (dfa.rs:2309)
==17154==    by 0x1CC05B: regex_automata::hybrid::dfa::Lazy::init_cache (dfa.rs:2534)
==17154==    by 0x1C85EF: regex_automata::hybrid::dfa::Cache::new (dfa.rs:1891)
==17154==    by 0x22914F: regex_automata::hybrid::regex::Cache::new (regex.rs:613)
==17154==    by 0x228B2B: regex_automata::hybrid::regex::Regex::create_cache (regex.rs:192)
==17154==    by 0x1A5DCF: regex_automata::meta::wrappers::HybridCache::new::{{closure}} (wrappers.rs:788)
==17154==    by 0x1BDB03: core::option::Option<T>::map (option.rs:1072)
==17154==    by 0x1A5D9B: regex_automata::meta::wrappers::HybridCache::new (wrappers.rs:788)
==17154==    by 0x1A5183: regex_automata::meta::wrappers::Hybrid::create_cache (wrappers.rs:541)
==17154==    by 0x192E43: <regex_automata::meta::strategy::Core as regex_automata::meta::strategy::Strategy>::create_cache (strategy.rs:679)
==17154==    by 0x18B70F: regex_automata::meta::regex::Builder::build_many_from_hir::{{closure}} (regex.rs:3556)
==17154==    by 0x14D73B: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call (boxed.rs:2021)
==17154==    by 0x149223: regex_automata::util::pool::inner::Pool<T,F>::get_slow (pool.rs:568)
==17154==    by 0x1490DF: regex_automata::util::pool::inner::Pool<T,F>::get (pool.rs:533)
==17154==    by 0x14C01F: regex_automata::util::pool::Pool<T,F>::get (pool.rs:182)
==17154==    by 0x149D6B: regex_automata::meta::regex::Regex::search_slots (regex.rs:1134)
==17154==    by 0x149EC3: regex_automata::meta::regex::Regex::search_captures (regex.rs:1065)
==17154==    by 0x14B453: regex::regex::string::Regex::captures_at (string.rs:1151)
==17154==    by 0x14B5B3: regex::regex::string::Regex::captures (string.rs:356)
==17154==    by 0x14ACC3: regex_leak::main (main.rs:8)
==17154==    by 0x14B73B: core::ops::function::FnOnce::call_once (function.rs:250)
==17154==    by 0x14D7F7: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:154)
==17154==    by 0x14AFEB: std::rt::lang_start::{{closure}} (rt.rs:167)
==17154==    by 0x344577: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==17154==    by 0x344577: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:552)
==17154==    by 0x344577: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:516)
==17154==    by 0x344577: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:142)
==17154==    by 0x344577: {closure#2} (rt.rs:148)
==17154==    by 0x344577: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:552)
==17154==    by 0x344577: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:516)
==17154==    by 0x344577: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:142)
==17154==    by 0x344577: std::rt::lang_start_internal (rt.rs:148)
==17154==    by 0x14AFBB: std::rt::lang_start (rt.rs:166)
==17154==    by 0x14AD03: main (in /app/target/debug/regex-leak)
==17154== 
==17154== 108 bytes in 1 blocks are possibly lost in loss record 42 of 51
==17154==    at 0x4885250: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==17154==    by 0x1B4B67: alloc (alloc.rs:98)
==17154==    by 0x1B4B67: alloc::alloc::Global::alloc_impl (alloc.rs:181)
==17154==    by 0x1B557B: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
==17154==    by 0x1AE187: hashbrown::raw::alloc::inner::do_alloc (alloc.rs:15)
==17154==    by 0x1B6D77: hashbrown::raw::RawTableInner::new_uninitialized (mod.rs:1754)
==17154==    by 0x1B70FB: hashbrown::raw::RawTableInner::fallible_with_capacity (mod.rs:1792)
==17154==    by 0x1B5F13: hashbrown::raw::RawTableInner::prepare_resize (mod.rs:2871)
==17154==    by 0x1B9167: resize_inner<alloc::alloc::Global> (mod.rs:3067)
==17154==    by 0x1B9167: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2957)
==17154==    by 0x1B9167: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1235)
==17154==    by 0x1BA85F: hashbrown::raw::RawTable<T,A>::reserve (mod.rs:1183)
==17154==    by 0x1B9A6B: hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot (mod.rs:1417)
==17154==    by 0x189B6F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==17154==    by 0x17BEAB: std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:1105)
==17154==    by 0x1CA35B: regex_automata::hybrid::dfa::Lazy::add_state (dfa.rs:2309)
==17154==    by 0x1CC05B: regex_automata::hybrid::dfa::Lazy::init_cache (dfa.rs:2534)
==17154==    by 0x1C85EF: regex_automata::hybrid::dfa::Cache::new (dfa.rs:1891)
==17154==    by 0x229187: regex_automata::hybrid::regex::Cache::new (regex.rs:614)
==17154==    by 0x228B2B: regex_automata::hybrid::regex::Regex::create_cache (regex.rs:192)
==17154==    by 0x1A5DCF: regex_automata::meta::wrappers::HybridCache::new::{{closure}} (wrappers.rs:788)
==17154==    by 0x1BDB03: core::option::Option<T>::map (option.rs:1072)
==17154==    by 0x1A5D9B: regex_automata::meta::wrappers::HybridCache::new (wrappers.rs:788)
==17154==    by 0x1A5183: regex_automata::meta::wrappers::Hybrid::create_cache (wrappers.rs:541)
==17154==    by 0x192E43: <regex_automata::meta::strategy::Core as regex_automata::meta::strategy::Strategy>::create_cache (strategy.rs:679)
==17154==    by 0x18B70F: regex_automata::meta::regex::Builder::build_many_from_hir::{{closure}} (regex.rs:3556)
==17154==    by 0x14D73B: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call (boxed.rs:2021)
==17154==    by 0x149223: regex_automata::util::pool::inner::Pool<T,F>::get_slow (pool.rs:568)
==17154==    by 0x1490DF: regex_automata::util::pool::inner::Pool<T,F>::get (pool.rs:533)
==17154==    by 0x14C01F: regex_automata::util::pool::Pool<T,F>::get (pool.rs:182)
==17154==    by 0x149D6B: regex_automata::meta::regex::Regex::search_slots (regex.rs:1134)
==17154==    by 0x149EC3: regex_automata::meta::regex::Regex::search_captures (regex.rs:1065)
==17154==    by 0x14B453: regex::regex::string::Regex::captures_at (string.rs:1151)
==17154==    by 0x14B5B3: regex::regex::string::Regex::captures (string.rs:356)
==17154==    by 0x14ACC3: regex_leak::main (main.rs:8)
==17154==    by 0x14B73B: core::ops::function::FnOnce::call_once (function.rs:250)
==17154==    by 0x14D7F7: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:154)
==17154==    by 0x14AFEB: std::rt::lang_start::{{closure}} (rt.rs:167)
==17154==    by 0x344577: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==17154==    by 0x344577: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:552)
==17154==    by 0x344577: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:516)
==17154==    by 0x344577: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:142)
==17154==    by 0x344577: {closure#2} (rt.rs:148)
==17154==    by 0x344577: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:552)
==17154==    by 0x344577: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:516)
==17154==    by 0x344577: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:142)
==17154==    by 0x344577: std::rt::lang_start_internal (rt.rs:148)
==17154==    by 0x14AFBB: std::rt::lang_start (rt.rs:166)
==17154==    by 0x14AD03: main (in /app/target/debug/regex-leak)
==17154== 
==17154== LEAK SUMMARY:
==17154==    definitely lost: 0 bytes in 0 blocks
==17154==    indirectly lost: 0 bytes in 0 blocks
==17154==      possibly lost: 216 bytes in 2 blocks
==17154==    still reachable: 7,050 bytes in 49 blocks
==17154==         suppressed: 0 bytes in 0 blocks
==17154== Reachable blocks (those to which a pointer was found) are not shown.
==17154== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17154== 
==17154== For lists of detected and suppressed errors, rerun with: -s
==17154== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

What is the expected behavior?

I expect "still reachable" blocks, but I don't understand why these blocks are reported as "possibly lost".

@BurntSushi
Member

Yeah, so? You're sticking the Regex in a global mutable variable. Its destructor will never run.
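
To illustrate the point about destructors, here is a minimal sketch that uses only the standard library. `Noisy`, `LOCAL_DROPS`, `STATIC_DROPS`, and `GLOBAL` are hypothetical names, and `OnceLock` stands in for the `static mut Option<Regex>` from the report. The same reasoning should apply to a `Regex` stored in a static, assuming nothing takes it out and drops it before the process exits.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counters, one per scenario, so each can be checked on its own.
static LOCAL_DROPS: AtomicUsize = AtomicUsize::new(0);
static STATIC_DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a heap-owning value such as a compiled regex.
struct Noisy {
    drops: &'static AtomicUsize,
    _buf: Vec<u8>, // owns a heap allocation that is freed only on drop
}

impl Drop for Noisy {
    fn drop(&mut self) {
        // Record that the destructor ran.
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

// A value stored in a static lives until process exit; Rust does not run
// destructors for statics, so the heap buffer is still allocated at exit.
static GLOBAL: OnceLock<Noisy> = OnceLock::new();

/// Creates a `Noisy` in an inner scope and returns how many drops were seen.
fn local_value_is_dropped() -> usize {
    {
        let _n = Noisy { drops: &LOCAL_DROPS, _buf: vec![0; 32] };
    } // `_n` goes out of scope here, so `Drop::drop` runs
    LOCAL_DROPS.load(Ordering::SeqCst)
}

/// Stores a `Noisy` in the static and returns how many drops were seen.
fn static_value_is_not_dropped() -> usize {
    GLOBAL.get_or_init(|| Noisy { drops: &STATIC_DROPS, _buf: vec![0; 32] });
    STATIC_DROPS.load(Ordering::SeqCst)
}
```

Calling `local_value_is_dropped()` once reports a single drop, while `static_value_is_not_dropped()` reports none, and none will be recorded later either. Whatever the static owns on the heap is therefore still allocated when a leak checker inspects the process at exit.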

BurntSushi closed this as not planned on Jun 19, 2024
@wyfo
Author

wyfo commented Jun 19, 2024

Sorry, I misclicked and submitted without the full text. I've edited the description.
I know the destructor is never run, but usually (at least in my limited experience) that results in "still reachable" and not in "possibly lost".
Quoting the Valgrind documentation:

"possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes. Use --show-possibly-lost=no if you don't want to see these reports.

"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.

"possibly lost" may thus indicate a real bug, which is why Valgrind's default behavior is to treat such blocks as errors.

My question is then: do these possibly lost blocks fall into the category of "unusual things with pointers that could cause them to point into the middle of an allocated block", or might this be a non-trivial memory leak?

@wyfo
Author

wyfo commented Jun 19, 2024

To give some context, I'm hitting this possible leak when using tracing_subscriber::EnvFilter, which uses Lazy<Regex> internally: https://github.com/tokio-rs/tracing/blob/master/tracing-subscriber/src/filter/env/directive.rs#L123.

About the still reachable/possibly lost distinction: tracing uses a static subscriber, and that results in a "still reachable" report (see tokio-rs/tracing#2069), which is totally fine. However, when using EnvFilter the leak changes category, "because" of this Lazy<Regex>.

wyfo changed the title from "Valgrind reports "possible lost" when using static Regex" to "Valgrind reports "possibly lost" when using static Regex" on Jun 19, 2024
@wyfo
Author

wyfo commented Jun 19, 2024

One more possibly useful detail: the number and size of the possibly lost blocks change with the regex.
For example:

  • Regex::new(r"") -> possibly lost: 216 bytes in 2 blocks
  • Regex::new(r"\w") -> possibly lost: 416 bytes in 2 blocks
  • Regex::new(r"(?P<name>\w)") -> possibly lost: 524 bytes in 3 blocks

@wyfo
Author

wyfo commented Jun 19, 2024

OK, I've understood that this falls under the "doing unusual things with pointers" case; sorry for the bother. It seems I'll have to add a Valgrind suppression.
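
For readers who land here with the same question: the traces above show that both blocks come from a hashbrown table allocation made while building the lazy DFA cache. A plausible explanation, stated here as an assumption rather than something confirmed in this thread, is that the only live pointer to such a block points into its middle rather than at its start. The sketch below reproduces that shape with the standard library only. `START`, `INTERIOR`, `BLOCK_LEN`, `OFFSET`, and both function names are hypothetical.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical globals: each keeps the only long-lived pointer to one block.
static START: AtomicPtr<u8> = AtomicPtr::new(std::ptr::null_mut());
static INTERIOR: AtomicPtr<u8> = AtomicPtr::new(std::ptr::null_mut());

const BLOCK_LEN: usize = 64; // size of each leaked block (arbitrary)
const OFFSET: usize = 16; // how far into the block the interior pointer lands

/// Leaks a block and keeps a pointer to its first byte in a static.
/// A leak checker can find a start-pointer to this block at exit.
fn leak_with_start_pointer() -> *mut u8 {
    let block: Box<[u8]> = vec![0u8; BLOCK_LEN].into_boxed_slice();
    let start = Box::into_raw(block) as *mut u8;
    START.store(start, Ordering::SeqCst);
    start
}

/// Leaks a block and keeps only a pointer into its middle in a static.
/// The returned start address is for inspection; it is not stored anywhere.
fn leak_with_interior_pointer() -> *mut u8 {
    let block: Box<[u8]> = vec![0u8; BLOCK_LEN].into_boxed_slice();
    let start = Box::into_raw(block) as *mut u8;
    // SAFETY: OFFSET < BLOCK_LEN, so the result stays inside the allocation.
    let middle = unsafe { start.add(OFFSET) };
    INTERIOR.store(middle, Ordering::SeqCst);
    start
}
```

Neither block is ever freed, so both are leaks in the same sense. The difference is only in what remains at exit: a pointer to the start of the first block, and a pointer into the middle of the second. Going by the documentation quoted earlier, the first shape corresponds to "still reachable" and the second to "possibly lost".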

@BurntSushi
Member

My prior is that valgrind reports false positives, and that its behavior at least partially depends on the allocator being used. So I need more evidence.

Otherwise, I don't really see anything wrong here. Like yes... you have a leak because you aren't running a regex's destructor. And yes, it changes with different patterns because different patterns require different amounts of heap... I'm not sure why you would expect anything different.

@BurntSushi
Member

Also, regex internally doesn't really do anything that would cause leaks in the first place. Rust is itself doing a lot of leak checking already.
