unnamed threads don't appear to be freed, at least not in a timely manner. #396

Open
jolivier23 opened this issue Feb 13, 2024 · 7 comments

@jolivier23 (Contributor) commented Feb 13, 2024

I'm using v1.1. To illustrate, I made this patch to the hello_world example. If I run with a large number of ULTs, memory grows very fast; since it is only executing hello_world, I would expect a much smaller footprint. Note that I also changed the stack size. I'm not sure whether that has anything to do with the problem, but it matches a memory-growth issue we see in DAOS.

```diff
index 822b10f..8b988be 100644
--- a/examples/hello_world/hello_world.c
+++ b/examples/hello_world/hello_world.c
@@ -81,19 +81,23 @@ int main(int argc, char **argv)
     for (i = 0; i < num_xstreams; i++) {
         ABT_xstream_get_main_pools(xstreams[i], 1, &pools[i]);
     }
+ABT_thread_attr attr;
+ABT_thread_attr_create(&attr);
+
+ABT_thread_attr_set_stacksize(attr, 128*1024);
 
     /* Create ULTs. */
     for (i = 0; i < num_threads; i++) {
+	int rc1;
         int pool_id = i % num_xstreams;
+
         thread_args[i].tid = i;
         ABT_thread_create(pools[pool_id], hello_world, &thread_args[i],
-                          ABT_THREAD_ATTR_NULL, &threads[i]);
+                          attr, NULL);
     }
 
     /* Join and free ULTs. */
-    for (i = 0; i < num_threads; i++) {
-        ABT_thread_free(&threads[i]);
-    }
+    ABT_thread_attr_free(&attr);
 
     /* Join and free secondary execution streams. */
     for (i = 1; i < num_xstreams; i++) {
```
@jolivier23 (Contributor Author)
This may just be a documentation issue, if ABT_thread_exit is required to free the ULT.

@jolivier23 (Contributor Author)
Hmm, if I add an ABT_thread_yield in the thread creation loop, it is much better. So I suppose it's not a leak, but it may indicate a peak-memory-usage issue.
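For concreteness, a minimal sketch of that workaround as a further patch to the creation loop in the example above (illustrative, not an exact hunk):

```diff
     /* Create ULTs. */
     for (i = 0; i < num_threads; i++) {
         int pool_id = i % num_xstreams;
         thread_args[i].tid = i;
         ABT_thread_create(pools[pool_id], hello_world, &thread_args[i],
                           attr, NULL);
+        /* Let the scheduler run so finished unnamed ULTs (and their
+         * stacks) can be reclaimed, bounding peak memory. */
+        ABT_thread_yield();
     }
```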

@carns (Contributor) commented Feb 14, 2024

@jolivier23 just curious, does the memory consumption change if you set the ABT_MEM_MAX_NUM_STACKS environment variable?

FWIW the default value of this parameter is 65536, but in Margo we set it to a much lower default value (8) to prevent Argobots from accumulating too many unused stacks for potential reuse. Although now that I look back at it, I wonder whether that value has other side effects we need to consider.

At any rate, I'm curious if that's a factor in the behavior here.
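For reference, the variable is read from the environment at startup; a minimal sketch, where the invocation of the example binary is hypothetical (name and arguments are placeholders):

```shell
# Cap the number of free ULT stacks Argobots keeps cached for reuse.
# "8" mirrors the lower default Margo sets, per the comment above.
export ABT_MEM_MAX_NUM_STACKS=8

# Then run the program under test, e.g. (hypothetical invocation):
#   ./hello_world -e 4 -n 100000
echo "ABT_MEM_MAX_NUM_STACKS=$ABT_MEM_MAX_NUM_STACKS"
```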

@jolivier23 (Contributor Author)
@carns For the example above, it makes no difference what I set that to. We can try in DAOS and see if it makes any difference.

I'm trying to understand the behavior of this particular example. If I add ABT_thread_yield, the problem essentially goes away in that test case. My assumption is that the primary ULT is creating the threads to execute on the other xstreams and is the party responsible for freeing them but never gets a chance to do so until the ABT_thread_yield or some other yield point.

If this is the case, perhaps we should be at least checking a condition on create and potentially cleaning up old threads before creating new ones?

It seems like this may be a red herring for the DAOS problem but seems like an issue nonetheless.

@cdavis28
@jolivier23 Found the code that frees memory for unnamed threads automatically; no call to ABT_thread_exit() is required.

```c
ABTD_atomic_release_store_int(&p_thread->state,
```

@jolivier23 (Contributor Author) commented Feb 14, 2024

@carns In looking at the code, it looks like the max is 1024 now:

```c
#define ABTD_MEM_MAX_NUM_STACKS 1024
...
const uint32_t default_mem_max_stacks =
    ABTU_min_uint32(ABTD_MEM_MAX_TOTAL_STACK_SIZE /
                        p_global->thread_stacksize,
                    ABTD_MEM_MAX_NUM_STACKS);
```

65536 could probably explain how much memory is used during rebuild in DAOS, but 1024 does not.

@carns (Contributor) commented Feb 19, 2024

> @carns In looking at the code, it looks like the max is 1024 now
> […]
> 65536 could probably explain how much memory is used during rebuild in DAOS, but 1024 does not.

Ah, right. I forgot it was tuned down. Re: your other comment I'm not sure who's responsible for the free (though I do know it eventually happens; we've had similar patterns with no runaway memory usage), so I'm not sure what's needed to encourage it to happen faster...
