
Added a stress test for memory allocation (snmalloc) #566

Merged

merged 1 commit into master from memory-allocation-stress-test on Mar 27, 2024

Conversation

NirjharRoyiitk (Contributor)

This is a very basic stress test for the memory allocator.

Description of the test:

The test creates 64 threads of 3 types:

  1. Large threads allocate buffers of size 1GB to 4GB and scan them page-wise.
  2. Medium threads allocate buffers of size 8MB to 128MB and scan them with an interval of 1MB.
  3. Small threads allocate buffers of size 1KB to 512KB and scan them with an interval of 1KB.

After scanning its buffer, every thread frees it.
At the end, it calculates the time taken to perform all of the operations (allocate, scan, and free).
The test periodically prints the average operation time of each thread. A simplified sketch of one worker iteration is shown below.
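
A simplified sketch of one worker iteration (not the exact code in this PR; the helper name, byte-sized buffer, and parameters are illustrative, and scan_interval is assumed to be non-zero):

    use std::time::{Duration, Instant};

    // Allocate `size` bytes, touch one byte every `scan_interval` bytes so the
    // allocator actually has to back the pages, free the buffer, and return how
    // long the whole allocate/scan/free cycle took.
    fn worker_iteration(size: usize, scan_interval: usize) -> Duration {
        let start = Instant::now();
        let mut buf = vec![0u8; size];
        let mut offset = 0;
        while offset < buf.len() {
            buf[offset] = buf[offset].wrapping_add(1);
            offset += scan_interval;
        }
        drop(buf); // free the buffer before stopping the clock
        start.elapsed()
    }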

arai-fortanix previously approved these changes Mar 15, 2024
@@ -0,0 +1,9 @@
[package]
name = "mem-alloc-test"
version = "0.1.0"
Contributor

Why 0.1.0? Can this be 1.0?

@@ -0,0 +1,152 @@
use core::arch::global_asm;
use std::{thread};
Contributor

Did you run cargo fmt on this crate? I think the preferred style when there's only one use on a line is to not put in the { }.

enum SizeType {
LARGE_MEM,
MEDIUM_MEM,
LOW_MEM,
Contributor

Should this be SMALL_MEM instead of LOW? The term "low memory" usually means memory below a certain physical address.

Contributor

Also, having MEM in all of these identifiers is kind of redundant, because enumeration identifiers are scoped within their enumeration in Rust. Maybe call the enum MemSize and the enumeration members can be called simply Large, Medium, and Small? These will always be used in the code as something like MemSize::Large. That seems pretty clear.

I think the Rust convention is that enumeration variants should be named in CamelCase, not all caps.
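
For reference, the suggested shape would be something like (illustrative sketch):

    enum MemSize {
        Large,
        Medium,
        Small,
    }
    // used at call sites as e.g. MemSize::Large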

LOW_MEM,
}

fn calculateAndPrintStat(shared_mutex_clone: Arc<Mutex<i32>>,
Contributor

I think the Rust convention (or at least the one we use) is for function names to be written in snake_case().

Contributor

I think the Arc<Mutex<i32>> and Arc<Mutex<u128>> types already document that these are shared, and I think the clone part is unnecessary, so maybe these variables can just be called count and duration? (But see below where I think maybe it makes more sense to put both variables in a struct where they can be protected by a single lock.)
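
A rough sketch of the single-struct, single-lock idea (the Stats name and fields are illustrative, not code from this PR):

    use std::sync::{Arc, Mutex};
    use std::time::Duration;

    struct Stats {
        count: u64,
        total_duration: Duration,
    }

    // One lock protects both fields, so they can never get out of sync.
    fn record(stats: &Arc<Mutex<Stats>>, duration: Duration) {
        let mut s = stats.lock().unwrap();
        s.count += 1;
        s.total_duration += duration;
    }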


fn calculateAndPrintStat(shared_mutex_clone: Arc<Mutex<i32>>,
shared_mutex_dur_clone: Arc<Mutex<u128>>,
duration: Duration, tid: i32, memtypestr: &str) {
Contributor

Why is memtypestr a string when we have an enumeration for the size?

*/

let duration_ms = duration.as_nanos();
let mut data = shared_mutex_clone.lock().unwrap();


Contributor

An area for possible future improvement: when looking at latency, it's often useful to gather more detailed statistics than just the average, such as standard deviation, minimum and maximum time, and p95/p99/p99.9 latency. Keeping the maximum latency low can be as important as (or sometimes more important than) keeping the average low. There are probably crates that make collecting these types of stats easy.
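
For instance, percentiles can be computed from the raw samples without any extra crates (a rough nearest-rank sketch that assumes samples is non-empty; a dedicated histogram crate would scale better):

    use std::time::Duration;

    // Nearest-rank percentile over a set of latency samples, e.g. p = 99.0 for p99.
    fn percentile(samples: &mut [Duration], p: f64) -> Duration {
        samples.sort();
        let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
        samples[rank.clamp(1, samples.len()) - 1]
    }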


fn Get_Random_Num(start: usize, end: usize) -> usize {
let mut rng = rand::thread_rng();
let random_number: usize = rng.gen_range(start..=end);
Contributor

Minor: I think you can just write rng.gen_range(start..=end) here. (Note: no semicolon, and you don't need the let random_number: usize = or the return. The last expression in the function is automatically used as the return value. The preferred way in Rust is not to use an explicit return unless you're returning from the middle of a function.)
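
In other words, the whole function could reduce to something like (sketch, using the snake_case name suggested elsewhere in this review):

    use rand::Rng;

    fn get_random_num(start: usize, end: usize) -> usize {
        rand::thread_rng().gen_range(start..=end)
    }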

let mut ran_num = 0;
let mut size = 0;
let mut scan_interval = page_size;
let memtypestr;
Contributor

The Rust style for this would be to have each arm of the match produce the values you need (as a tuple, since this is producing multiple values). So something like:

let (scan_interval, size, memtypestr) = match memtype {
    SizeType::LARGE_MEM => {
        (page_size, Get_Random_Num(1, 4) * to_gb, String::from("LARGE"))
    }
    SizeType::MEDIUM_MEM => {
        // ... and so on for the other arms
    }
    // ...
};

Basically in Rust, things like match and if can evaluate to a value, so you can do things like let x = if ... or let y = match .... This form is generally preferred over assigning a variable in each arm of the match or if. Rust also has native support for tuples.

Also, I think there's a better way of getting a string from the enumeration names than having explicit code here that does it. In this particular case, it looks like we only care about the strings when we're printing the output. In that case, we can just use the enumeration value everywhere. You can add #[derive(Debug)] on the line before your enum SizeType definition. This tells the Rust compiler to automatically generate code for a debug representation of the type. In the case of enums, this is just the name of the enum variant. When you print the results, if you print using "{:?}" instead of just "{}", it will print using the debug representation. So something like fn myfun(var: SizeType) { println!("{:?}", var); } will just do what you want, without having to write explicit conversions to string for your enumeration.

Contributor

If you do need the string representation at run-time for other reasons, then the Rust-y way to do it would be to add a to_str() method to the enumeration, so the conversion is tied to the enumeration type. Rust doesn't have classes and objects, but you can add functions to enums and structs that operate somewhat like methods on classes in an object-oriented language.
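
A sketch of that approach, assuming the enum has been renamed to MemSize with Large/Medium/Small variants as suggested above:

    impl MemSize {
        fn to_str(&self) -> &'static str {
            match self {
                MemSize::Large => "LARGE",
                MemSize::Medium => "MEDIUM",
                MemSize::Small => "SMALL",
            }
        }
    }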


fn worker_thread(tid: i32, shared_mutex_clone: Arc<Mutex<i32>>,
shared_mutex_dur_clone: Arc<Mutex<u128>>, memtype: SizeType) {
loop {


while !*started {
started = cvar.wait(started).unwrap();
}
drop(started);
Contributor

I'm not sure if an explicit drop() is needed here. Usually locks are scoped, so they'll be automatically dropped when they leave scope, but maybe there's a reason why the drop is needed with the condition variable.

let (lock, cvar) = &*pair_clone;
let mut started = lock.lock().unwrap();
*started = true;
drop(started);
Contributor

I don't think it's wrong to drop the lock before notifying the condition variable, but I think it's safe to do the notify with the lock still held and then just let the lock be dropped when the function returns.
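
Something along these lines (a sketch of the same signalling step, keeping the names from the PR):

    use std::sync::{Arc, Condvar, Mutex};

    fn signal_start(pair_clone: Arc<(Mutex<bool>, Condvar)>) {
        let (lock, cvar) = &*pair_clone;
        let mut started = lock.lock().unwrap();
        *started = true;
        cvar.notify_all(); // notifying while the lock is held is fine
        // `started` (the MutexGuard) is dropped here when it goes out of scope
    }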

fn spawn_threads(thread_count: i32) {
let mut handles = vec![];
let shared_variable = Arc::new(Mutex::new(Counters{
avg_duration_large_thread:0.0,
Contributor

Minor: I think the preferred formatting for these is like:

avg_duration_large_thread: 0.0,

(with no space before the : and one space after it). I think this is one of the things that cargo fmt will format for you.

data.avg_duration_medium_thread,
data.avg_duration_large_thread,
data.global_average);

Contributor

Trivial: A couple of unnecessary blank lines here.

{
println!("NUM_THREADS,LATENCY_SMALL_THREADS,LATENCY_MEDIUM_THREADS,LATENCY_LARGE_THREADS,GLOBAL_AVERAGE");
let mut num_processors = get_num_processors();
num_processors = 4;
Contributor

num_processors is being hard-coded to 4, even though we called get_num_processors() on the previous line. Was this intentional?

Contributor Author

No it isn't. I will remove this.

arai-fortanix previously approved these changes Mar 21, 2024
Contributor

@arai-fortanix left a comment

New changes look good to me.

}

fn get_num_processors() -> usize {
num_cpus::get()
Contributor

This won't work in SGX as it will attempt to open files from within the enclave or issue syscalls. You'll need to create a service in the enclave runner if you want to do something like that. See this example

@@ -0,0 +1,9 @@
[package]
Contributor

This crate isn't tested in CI. Can you add it?

Contributor Author

How do I add it to CI?

@NirjharRoyiitk force-pushed the memory-allocation-stress-test branch 2 times, most recently from be76ac4 to 3885d29 on March 22, 2024 11:53
@arai-fortanix self-requested a review March 26, 2024 17:11
arai-fortanix previously approved these changes Mar 26, 2024
@NirjharRoyiitk force-pushed the memory-allocation-stress-test branch from 3e3c56f to 1c317e2 on March 27, 2024 12:41
@@ -99,3 +99,5 @@ jobs:
- name: Generate API docs
run: ./doc/generate-api-docs.sh

-name: Run memory allocator stress test
- cd ./examples/mem-alloc-test && cargo run
Contributor

This should be:

    - name: Run memory allocator stress test
      run: cd ./examples/mem-alloc-test && cargo run


const MAX_INDEX_CHECKS_PER_BUFFER: usize = 32;

fn calculate_and_print_stat(
Contributor

This function name is out of date. It doesn't print anything anymore

rng.gen_range(start..=end)
}

fn wait_per_thread(pair_clone: Arc<(Mutex<bool>, Condvar)>) {
Contributor

You can simplify this by using a Barrier.
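
A minimal sketch of a Barrier-based start signal (thread count and the work closure are placeholders):

    use std::sync::{Arc, Barrier};
    use std::thread;

    fn spawn_workers(thread_count: usize) {
        let barrier = Arc::new(Barrier::new(thread_count));
        let handles: Vec<_> = (0..thread_count)
            .map(|_| {
                let barrier = Arc::clone(&barrier);
                thread::spawn(move || {
                    barrier.wait(); // all threads start their work at the same time
                    // ... allocation work goes here ...
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap();
        }
    }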

)
}
MemSize::Small => {
/* buffer size will be from 1KB to 512KB */
Contributor

For a heap allocator 1KB is still big. Can this start from 1B instead? Corner cases may only pop up with such tiny chunks

fn traverse_buffer(buf: &mut Vec<i32>, scan_interval: usize) {
let mut ptr = 0;
let num_indices_checks = get_random_num(1, MAX_INDEX_CHECKS_PER_BUFFER);
for i in 1..=num_indices_checks {
Contributor

The time this takes will be very heavily impacted by the value you get from the random number generator. If you want to be able to compare performance results, this can't be based on a random value.

);
}

fn get_num_processors() -> usize {
Contributor

Can you get rid of this function? It's confusing as it does not return the number of processors on the machine


[dependencies]
rand = "0.8.4"
num_cpus = "1.14.0"
Contributor

This is no longer required

[package.metadata.fortanix-sgx]
# heap size (in bytes), the default heap size is 0x2000000.
heap-size=0x4000000
debug=false
Contributor

There's no need to specify debug=false here

let start_time = Instant::now();

/* Create an array of x GB where x is a random number between 1 to 4 */
let mut large_vector = Vec::with_capacity(size);
Contributor

I think you intend to allocate a Vec that can hold size bytes. But Vec::with_capacity assumes it receives the number of elements the Vec may store. In this case large_vector will be of type Vec<i32>, so each element already takes 4 bytes, not 1. So you're allocating at least 4 times as much memory.
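
To illustrate the element-count vs. byte-count difference (sketch, not the PR code):

    fn main() {
        let size = 1 << 30; // intended: 1 GiB
        // with_capacity counts elements, not bytes:
        let as_i32: Vec<i32> = Vec::with_capacity(size); // reserves size * 4 bytes = 4 GiB
        let as_bytes: Vec<u8> = Vec::with_capacity(size); // reserves size bytes = 1 GiB
        drop((as_i32, as_bytes));
    }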

Contributor

Vec is ill-suited here. There may be some internal optimizations that you are not aware of, and these may even differ between versions of the standard library. In principle, Vec could even simply record that you intend a Vec with size elements initialized to 0, without actually allocating that much memory. For this very specific use case, I think you could interact directly with the memory allocator.
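
A hedged sketch of what talking to the allocator directly could look like (alignment, helper name, and the size > 0 / scan_interval > 0 assumptions are illustrative):

    use std::alloc::{alloc, dealloc, Layout};

    // Allocate `size` bytes straight from the global allocator, touch one byte
    // every `scan_interval` bytes, then free the block.
    fn alloc_scan_free(size: usize, scan_interval: usize) {
        let layout = Layout::from_size_align(size, 4096).expect("bad layout");
        unsafe {
            let ptr = alloc(layout);
            assert!(!ptr.is_null(), "allocation failed");
            let mut offset = 0;
            while offset < size {
                ptr.add(offset).write(1);
                offset += scan_interval;
            }
            dealloc(ptr, layout);
        }
    }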

* then n threads, then n/2 and so on.
*/
fn start_tests() {
println!("NUM_THREADS,LATENCY_SMALL_THREADS,LATENCY_MEDIUM_THREADS,LATENCY_LARGE_THREADS,GLOBAL_AVERAGE");
Contributor

This doesn't match the output created.

Contributor Author

I am trying to generate a CSV-format file, and this is just the heading.

@NirjharRoyiitk added this pull request to the merge queue Mar 27, 2024
Merged via the queue into master with commit d9e438f Mar 27, 2024
1 check passed