Add Multi-thread Support #2594
Conversation
Force-pushed from 7e07e20 to e5012f4
Force-pushed from e5012f4 to 0f2f8c0
Hi Gerardo, I've revised this PR according to what we discussed in mail. Wait, we found another bottleneck that can be improved; please give me another day.
Force-pushed from 0a13956 to 35f6f49
Hi Gerardo, we found that processing the status code is time-consuming, and achieved a 1.2x speedup. And, as you advised, we extracted it. This PR is ready for review now. 😊
This is fantastic news! I tested it on my end, and there's a significant improvement when using 2 threads with the default CHUNK_SIZE. We're making great progress. I also ran it with 4+ threads, and it didn't crash. Before any merging, there are a few things I'd like to address:
Thanks again!
Sort the getopt short_options list.
Move count_process() out of parse_line() to avoid a data race.
Set conf.* flags in parse_specifier(); set glog->lp.ts = logitem->dt in parse_line().
In get_ii32, get_si32, inc_ii32, inc_si32, and count_process.
Prevent a data race in init_log_item(), and reduce the workload of unnecessary localtime_r() calls; this improvement gives a 1.15x overall speedup.
Avoid a data race on GLog.
In process_log(), use array lookup instead of string compare; this speeds up process_log() by 1.38x, for a 1.28x overall speedup.
Force-pushed from 35f6f49 to f4ca02d
This is fantastic! Thank you for the effort put into this; it's a significant enhancement to the parsing process. Merged.
whoops, I rushed the merge. Looks like something's broken when piping in data. e.g.,
Wow! Great, great, and great news! I've been waiting for this for a long time. Well... I didn't come here to criticize anything, friends. So @Sea-n, are these EC2 instances single- or multi-processor? So, 64K lines from...
Well... but all this may be in vain... So... I always thought about implementing this via a thread pool, 1-to-1 for each... Anyway... let's get to where I can really help you, @allinurl and @Sea-n.
This would certainly give us the real impact of threads on the process, and which panels... Sirs, are you interested in this?
hey @0bi-w6n-K3nobi good to hear from you. Thanks so much for laying all that out! The analysis you provided on potential bottlenecks, caching, and threading is super insightful. I'm definitely interested in seeing the results if you're able to run that comprehensive test matrix you proposed. Having real data on how different configs of processors, threads, panels, and chunk sizes actually perform would be really valuable for further optimizations. If you can put together that testing and share the findings, that would be amazing. No rush at all!
This PR achieves a 2.88x speedup with 4 threads, and a 3.65x speedup compared to the current version. Benchmark results are as follows:
(1 GB log, 5.6 M lines, M2 Air, average of 5 runs)
Method
The simple control flow is as follows:
Since parse_specifier() takes much time here and doesn't have data races with the others, we parallelize here first.

Major speedup causes

Parallelize parse_log()
Parsing logs takes about 70% of execution time, so we parallelize this part, and let process_log() run along with parse_log() to achieve a 3.0x speedup on a 5-thread machine.
Use GLog->start_time instead of localtime_r()
When solving the data race in init_log_item(), we introduced start_time in GLog, and found out that parse_specifier() (which is executed about 10 times per line, depending on the log format) also called this function every time. Copying struct tm instead of converting a Unix timestamp to {year, month, day, hour, ...} saved a lot of time. This contributed a 1.15x speedup for the serial version.
Change GLogItem->status from char * to int
When we measured the execution time of each module, we found that the "Status Code" module takes 30% of it. The reason is that finding the {"404" => "404 - Not Found"} mapping with foreach() and string compares consumes too much time. So we changed status_code to an integer, and use status_codes[404] for lookup. This speeds up process_log() by 1.38x, and gives a 1.28x overall speedup when parse_log() is the bottleneck.

Details
Instead of the --multi-thread-parsing option we talked about before, using -j, --jobs (same as make) might be more reasonable for users. The default option conf.jobs=1 will not create new threads to parse/process the log.

We split it into 20 commits:
- read_lines() for with/without GETLINE
- pre_process_log() to parse_line()
- process_log() from parse_line()
- process_log() from read_line()
- read_lines()
- read_lines()
- __sync_bool_compare_and_swap()
- __sync_add_and_fetch()
- localtime_r() per GLogItem
- init_log_item()
- char * to int
- char * to const char * for parser

With every commit:
Future Work

- Parallelize process_log() by separating modules into different threads
- parse_logs() and process_logs()
- Choose CHUNK_SIZE more scientifically

Fixes #377