Benchmark for the queue #5765
LGTM, thanks for adding this!
If you're on Linux, you might find https://github.com/aclements/perflock interesting for running benchmarks as well.
// iterate to an arbitrary element
ind := 0
for ; ind < 5; ind++ {
	_ = slice[ind]
Just curious; wondering if storing this in a package-level variable (without having to use it) has any effect. I'd guess not, but again I'm not putting my hand in the fire.
You're referring to the whole slice here? Not sure I follow. Note that I call b.ResetTimer() before the benchmark loop.
No, literally just doing somePackageLvlVariable = slice[ind] instead.
I'm guessing the compiler can completely eliminate the statement here.
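For reference, the suggestion above could be tried with something like the following sketch. It is not the PR's benchmark: sink, makeSlice, readDiscard and readSink are hypothetical names, and the slice size of 10 and the read count of 5 are assumptions taken loosely from the quoted snippet. Whether the compiler actually drops the discarded read depends on the Go version and on what it can prove about the bounds check.

```go
package main

import (
	"strconv"
	"testing"
)

// sink is a hypothetical package-level variable. Assigning to it gives each
// read an observable effect, so the compiler cannot treat it as dead code.
var sink string

// makeSlice builds a slice of n strings ("0", "1", ...), an assumed stand-in
// for the queue contents.
func makeSlice(n int) []string {
	s := make([]string, n)
	for i := range s {
		s[i] = strconv.Itoa(i)
	}
	return s
}

// readDiscard mirrors the reviewed loop: each value is read and thrown away.
func readDiscard(s []string, n int) {
	for i := 0; i < n; i++ {
		_ = s[i]
	}
}

// readSink performs the same reads but stores each value in sink.
func readSink(s []string, n int) {
	for i := 0; i < n; i++ {
		sink = s[i]
	}
}

func BenchmarkReadDiscard(b *testing.B) {
	s := makeSlice(10)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		readDiscard(s, 5)
	}
}

func BenchmarkReadSink(b *testing.B) {
	s := makeSlice(10)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		readSink(s, 5)
	}
}
```

Saved in a _test.go file, both variants can be compared with go test -bench=Read. Any difference between them would be an upper bound on what the discarded read contributes.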
Gotcha! Yeah, it's possible that the compiler removes this. However, I wouldn't expect it to change the benchmark results in the slightest, as indexing into an array is cheap. I tried it and got the following results:
BenchmarkQueue/slice_10_elements-8 261364618 4.596 ns/op 0 B/op 0 allocs/op
BenchmarkQueue/list_10_elements-8 38173401 32.00 ns/op 56 B/op 1 allocs/op
BenchmarkQueue/slice_100_elements-8 81230875 14.80 ns/op 0 B/op 0 allocs/op
BenchmarkQueue/list_100_elements-8 37829430 31.11 ns/op 56 B/op 1 allocs/op
BenchmarkQueue/slice_300_elements-8 39452000 30.43 ns/op 0 B/op 0 allocs/op
BenchmarkQueue/list_300_elements-8 38020203 31.12 ns/op 56 B/op 1 allocs/op
BenchmarkQueue/slice_1000_elements-8 12476460 96.10 ns/op 0 B/op 0 allocs/op
BenchmarkQueue/list_1000_elements-8 38070512 31.05 ns/op 56 B/op 1 allocs/op
BenchmarkQueue/slice_10000_elements-8 1000000 1026 ns/op 0 B/op 0 allocs/op
BenchmarkQueue/list_10000_elements-8 34904437 34.29 ns/op 56 B/op 1 allocs/op
These don't change the conclusion.
I noticed that I'm using int instead of string though, which may change the conclusion. I'll quickly change this so the benchmark matches the implementation more closely.
I don't think it's worth spending more time here though, as it is definitely not a bottleneck.
I didn't expect the benchmark to change, was just curious to see if there was any impact between runs of the same slice_* case.
In any case, let's go ahead and merge this.
PR Description
As discussed in this thread, the worker_pool queue could use a linked list instead of a slice. I was curious about performance and wrote a quick benchmark, which shows that as long as there are fewer than 100 elements in the queue, the slice implementation is faster. Given our use case, it's unlikely we'll have 100+ elements, and the queue size is capped at 1024, where the performance of a slice is still going to be good enough (~1 microsecond).