-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a stress test for memory allocation (snmalloc) #566
Conversation
examples/mem-alloc-test/Cargo.toml
Outdated
@@ -0,0 +1,9 @@ | |||
[package] | |||
name = "mem-alloc-test" | |||
version = "0.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 0.1.0? Can this be 1.0?
examples/mem-alloc-test/src/main.rs
Outdated
@@ -0,0 +1,152 @@ | |||
use core::arch::global_asm; | |||
use std::{thread}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run cargo fmt
on this crate? I think the preferred style when there's only on use
on a line is to not put in the { }
.
examples/mem-alloc-test/src/main.rs
Outdated
enum SizeType { | ||
LARGE_MEM, | ||
MEDIUM_MEM, | ||
LOW_MEM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be SMALL_MEM
instead of LOW
? The term "low memory" usually means memory below a certain physical address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having MEM
in all of these identifiers is kind of redundant, because enumeration identifiers are scoped within their enumeration in Rust. Maybe call the enum MemSize
and the enumeration members can be called simply Large
, Medium
, and Small
? These will always be used in the code as something like MemSize::Large
. That seems pretty clear.
I think the Rust convention is that enumeration variants should be named in CamelCase, not all caps.
examples/mem-alloc-test/src/main.rs
Outdated
LOW_MEM, | ||
} | ||
|
||
fn calculateAndPrintStat(shared_mutex_clone: Arc<Mutex<i32>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the rust convention (or at least the one we use) is for function names to be written in snake_case()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Arc<Mutex<i32>>
and Arc<Mutex<u128>>
types already document that these are shared, and I think the clone
part is unnecessary, so maybe these variables can just be called count
and duration
? (But see below where I think maybe it makes more sense to put both variables in a struct where they can be protected by a single lock.)
examples/mem-alloc-test/src/main.rs
Outdated
|
||
fn calculateAndPrintStat(shared_mutex_clone: Arc<Mutex<i32>>, | ||
shared_mutex_dur_clone: Arc<Mutex<u128>>, | ||
duration: Duration, tid: i32, memtypestr: &str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is memtypestr
a string when we have an enumeration for the size?
*/ | ||
|
||
let duration_ms = duration.as_nanos(); | ||
let mut data = shared_mutex_clone.lock().unwrap(); |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An area for possible future improvement: when looking at latency, it's often useful to gather more detailed statistics than just average, such as standard deviation, minimum and maximum time, and p95/p99/p99.9 latency. Keeping the maximum latency low can be as important (or sometimes more important) than the average. There are probably crates that make collecting these types of stats easy.
examples/mem-alloc-test/src/main.rs
Outdated
|
||
fn Get_Random_Num(start: usize, end: usize) -> usize { | ||
let mut rng = rand::thread_rng(); | ||
let random_number: usize = rng.gen_range(start..=end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I think you can just write rng.gen_range(start..=end)
here. (Note: no semicolon, and you don't need the let random_number: usize =
or the return
. The last statement in the function is automatically used as the return value. The preferred way in Rust is not to use an explicit return
unless you're returning from the middle of a function.
examples/mem-alloc-test/src/main.rs
Outdated
let mut ran_num = 0; | ||
let mut size = 0; | ||
let mut scan_interval = page_size; | ||
let memtypestr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rust style for this would be to have each arm of the match
produce the values you need (as a tuple, since this is producing multiple values). So something like:
let (scan_interval, size, memtypestr) = match memtype {
SizeType::LARGE_MEM => {
(Get_Random_Num(1, 4) * to_gb, String::from("LARGE"))
},
SizeType::MEDIUM_MEM => {
}
...
};
Basically in Rust, things like match
and if
can evaluate to a value, so you can do things like let x = if ...
or let y = match ...
. This form is generally preferred over assigning a variable in each arm of the match
or if
. Rust also has native support for tuples.
Also, I think there's a better way of getting a string from the enumeration names than having explicit code here that does it. In this particular case, it looks like we only care about the strings when we're printing the output. In that case, we can just use the enumeration value everyewhere. You can add #[derive(Debug)]
on the line before your enum SizeType
definition. This tells the Rust compiler to automatically generate code for a debug representation of the type. In the case of enums, this is just the name of the Enum variant. When you print the results, if you print using "{:?}"
instead of just `"{}", it will print using the debug representation. So something like fun myfun(var: SizeType) { println!("{:?}", var); }
will just do what you want, without having to write explicit conversions to string for your enumeration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do need the string representation at run-time for other reasons, then the Rust-y way to do it would be to add a to_str()
method to the enumeration, so the conversion is tied to the enumeration type. Rust doesn't have classes and objects, but you can add functions to enums and structs that operate somewhat like methods on classes in object-oriented language.
examples/mem-alloc-test/src/main.rs
Outdated
|
||
fn worker_thread(tid: i32, shared_mutex_clone: Arc<Mutex<i32>>, | ||
shared_mutex_dur_clone: Arc<Mutex<u128>>, memtype: SizeType) { | ||
loop { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
c776874
to
b0025a9
Compare
while !*started { | ||
started = cvar.wait(started).unwrap(); | ||
} | ||
drop(started); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if an explicit drop()
is needed here. Usually locks are scoped, so they'll be automatically dropped when they leave scope, but maybe there's a reason why the drop is needed with the condition variable.
examples/mem-alloc-test/src/main.rs
Outdated
let (lock, cvar) = &*pair_clone; | ||
let mut started = lock.lock().unwrap(); | ||
*started = true; | ||
drop(started); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's wrong to drop the lock before notifying the condition variable, but I think it's safe to do the notify with the lock still held and then just let the lock be dropped when the function returns.
examples/mem-alloc-test/src/main.rs
Outdated
fn spawn_threads(thread_count: i32) { | ||
let mut handles = vec![]; | ||
let shared_variable = Arc::new(Mutex::new(Counters{ | ||
avg_duration_large_thread:0.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I think the preferred formatting for these is like:
avg_duration_large_thread: 0.0,
(with no space before the :
but one space after the :
. I think this is one of things that cargo fmt
will format for you.
examples/mem-alloc-test/src/main.rs
Outdated
data.avg_duration_medium_thread, | ||
data.avg_duration_large_thread, | ||
data.global_average); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivial: A couple of unnecessary blank lines here.
examples/mem-alloc-test/src/main.rs
Outdated
{ | ||
println!("NUM_THREADS,LATENCY_SMALL_THREADS,LATENCY_MEDIUM_THREADS,LATENCY_LARGE_THREADS,GLOBAL_AVERAGE"); | ||
let mut num_processors = get_num_processors(); | ||
num_processors = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_processors
is being hard-coded to 4, even though we called get_num_processors()
on the previous line. Was this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it isn't. I will remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New changes look good to me.
examples/mem-alloc-test/src/main.rs
Outdated
} | ||
|
||
fn get_num_processors() -> usize { | ||
num_cpus::get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work in SGX as it will attempt to open files from within the enclave or issue syscalls. You'll need to create a service in the enclave runner if you want to do something like that. See this example
@@ -0,0 +1,9 @@ | |||
[package] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This crate isn't tested in CI. Can you add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I add to CI?
be76ac4
to
3885d29
Compare
3885d29
to
3e3c56f
Compare
3e3c56f
to
1c317e2
Compare
.github/workflows/build.yml
Outdated
@@ -99,3 +99,5 @@ jobs: | |||
- name: Generate API docs | |||
run: ./doc/generate-api-docs.sh | |||
|
|||
-name: Run memory allocator stress test | |||
- cd ./examples/mem-alloc-test && cargo run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
- name: Run memory allocator stress test
run: cd ./examples/mem-alloc-test && cargo run
|
||
const MAX_INDEX_CHECKS_PER_BUFFER: usize = 32; | ||
|
||
fn calculate_and_print_stat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function name is out of date. It doesn't print anything anymore
rng.gen_range(start..=end) | ||
} | ||
|
||
fn wait_per_thread(pair_clone: Arc<(Mutex<bool>, Condvar)>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply this by using a Barrier
) | ||
} | ||
MemSize::Small => { | ||
/* buffer size will be from 1KB to 512KB */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a heap allocator 1KB is still big. Can this start from 1B instead? Corner cases may only pop up with such tiny chunks
fn traverse_buffer(buf: &mut Vec<i32>, scan_interval: usize) { | ||
let mut ptr = 0; | ||
let num_indices_checks = get_random_num(1, MAX_INDEX_CHECKS_PER_BUFFER); | ||
for i in 1..=num_indices_checks { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time this takes will be very heavily be impacted by the value you get from the random number generator. If you want to be able to compare performance results, this can't be based on a random value.
); | ||
} | ||
|
||
fn get_num_processors() -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get rid of this function? It's confusing as it does not return the number of processors on the machine
|
||
[dependencies] | ||
rand = "0.8.4" | ||
num_cpus = "1.14.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer required
[package.metadata.fortanix-sgx] | ||
# heap size (in bytes), the default heap size is 0x2000000. | ||
heap-size=0x4000000 | ||
debug=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to specify debug=false
here
let start_time = Instant::now(); | ||
|
||
/* Create an array of x GB where x is a random number between 1 to 4 */ | ||
let mut large_vector = Vec::with_capacity(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you intent to allocate a Vec
that can hold size
bytes. But Vec::with_capacity
assumes it receives the number of elements the Vec
may store. In this case large_vector
will be of type Vec<i32>
, so each element already takes 4 bytes, not 1. So you're at least allocating 4 times as much memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Vec
is ill-suited here. There may be some internal optimizations that you are not aware of, and that may even differ between versions of the standard library. In principle Vec
could even simply record that you intent a Vec
with size
elements initialized on 0
, without actually allocating that much memory. For this very specific use case, I think you could directly interact with the memory allocator.
* then n threads, then n/2 and so on. | ||
*/ | ||
fn start_tests() { | ||
println!("NUM_THREADS,LATENCY_SMALL_THREADS,LATENCY_MEDIUM_THREADS,LATENCY_LARGE_THREADS,GLOBAL_AVERAGE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match with the output created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to generate a csv format file and this is just the heading
This is a very basic stress test for memory allocator
Description of the test:
This thread creates 64 threads of 3 types:
After scanning the buffers every thread frees the buffer.
In the end it calculates the time taken to do all the operations (list from step 1 to step 3).
The test periodically prints the average operation time of each thread