I made this repo to demonstrate how the parallel library works, but the parallel version takes longer to run than the single-core version. According to ThreadScope, the threads are rarely all running at the same time, and I'm pretty sure it's a problem with the parallel library.
Edit: It seems that if I split the password list into chunks, the parallel version does run faster than the single-core one, but it still doesn't use all cores very efficiently. This might just be a problem with my code.
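For anyone who wants to try the chunking idea from the edit, here is a minimal sketch of it. It is not the code from the repo: `toyHash`, `hashAll`, the candidate list, and the chunk size of 1000 are all made up for illustration, and it assumes the `parallel` package is installed. The idea is that one spark per password is usually too little work to pay for the scheduling overhead, so `parListChunk` creates one spark per chunk instead.

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)
import Data.Char (ord)
import Data.List (foldl')

-- Hypothetical stand-in for an expensive hash; not the repo's real function.
toyHash :: String -> Int
toyHash = foldl' (\acc c -> (acc * 31 + ord c) `mod` 1000003) 7

-- Hash every candidate, creating one spark per chunk of `chunkSize`
-- candidates. `rdeepseq` forces each result fully inside its spark,
-- so the work really happens in parallel rather than lazily later on.
hashAll :: Int -> [String] -> [Int]
hashAll chunkSize candidates =
  map toyHash candidates `using` parListChunk chunkSize rdeepseq

main :: IO ()
main = do
  -- Made-up candidate list standing in for the password list.
  let candidates = [ "password" ++ show n | n <- [1 :: Int .. 200000] ]
      target     = toyHash "password123456"
      hits       = [ c | (c, h) <- zip candidates (hashAll 1000 candidates)
                       , h == target ]
  -- Print the first candidate whose hash matches the target, if any.
  print (take 1 hits)
```

To actually get more than one core, this needs to be built with `ghc -O2 -threaded -rtsopts` and run with `+RTS -N`. Adding `-s` to the RTS options prints spark statistics, which is a quick way to check whether the sparks are being converted or thrown away. The right chunk size probably depends on how expensive the real hash is, so 1000 is only a starting guess.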