avoid buffering the output twice when writing to lepton file #113

Open
wants to merge 3 commits into
base: main

mcroomp
Collaborator

@mcroomp mcroomp commented Nov 15, 2024

Track the carry overflow so that we can write bytes to the output stream immediately, without having to go back and adjust them for a carry later.
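For readers unfamiliar with the technique, here is a minimal sketch of deferred carry handling in an arithmetic coder's byte output. It is not the code from this PR: `CarryWriter`, `put`, `finish`, and `demo` are hypothetical names, and the sketch assumes each output byte receives at most one carry. The idea is to hold back only the last byte below 0xFF plus a count of the 0xFF bytes after it, since those are the only bytes a later carry can still change. Everything older goes straight to the stream.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not from this PR): forwards bytes to `out` as soon
/// as a later carry can no longer change them.
struct CarryWriter<W: Write> {
    out: W,
    /// Last unwritten byte below 0xFF; it is the byte that absorbs a carry.
    held: Option<u8>,
    /// Number of 0xFF bytes queued after `held`; a carry turns them into 0x00.
    pending_ff: usize,
}

impl<W: Write> CarryWriter<W> {
    fn new(out: W) -> Self {
        CarryWriter { out, held: None, pending_ff: 0 }
    }

    /// Write the held byte and the queued 0xFF run unchanged.
    fn flush_pending(&mut self) -> io::Result<()> {
        if let Some(b) = self.held.take() {
            self.out.write_all(&[b])?;
        }
        for _ in 0..self.pending_ff {
            self.out.write_all(&[0xFF])?;
        }
        self.pending_ff = 0;
        Ok(())
    }

    /// Push the next coder byte. `carry` says that producing this byte
    /// overflowed into the bytes emitted before it.
    fn put(&mut self, byte: u8, carry: bool) -> io::Result<()> {
        if carry {
            // Assumption: a carry always finds a held byte below 0xFF.
            let held = self.held.take().ok_or_else(|| {
                io::Error::new(io::ErrorKind::InvalidData, "carry with no held byte")
            })?;
            self.out.write_all(&[held + 1])?;
            // The carry ripples through the 0xFF run, leaving 0x00 bytes.
            for _ in 0..self.pending_ff {
                self.out.write_all(&[0x00])?;
            }
            self.pending_ff = 0;
        }
        if byte == 0xFF {
            // Could still be flipped by a later carry, so only count it.
            self.pending_ff += 1;
        } else {
            // A byte below 0xFF stops any carry, so older bytes are final.
            self.flush_pending()?;
            self.held = Some(byte);
        }
        Ok(())
    }

    /// Write whatever is still held back and return the inner writer.
    fn finish(mut self) -> io::Result<W> {
        self.flush_pending()?;
        Ok(self.out)
    }
}

/// Small demonstration: the carry on 0x05 bumps 0x12 and clears the 0xFF run.
fn demo() -> io::Result<Vec<u8>> {
    let mut w = CarryWriter::new(Vec::new());
    w.put(0x12, false)?;
    w.put(0xFF, false)?;
    w.put(0xFF, false)?;
    w.put(0x05, true)?;
    w.put(0x07, false)?;
    w.finish()
}
```

Calling `demo()` yields the bytes `13 00 00 05 07`. The state is one byte and one counter regardless of stream length, so no second output buffer is needed just to patch carries.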

@Melirius
Collaborator

The idea is very nice, but performance is worse than main (about 3% more cycles and wall time in the runs below):

main 82d6547

2024-11-15T21:03:34.699Z INFO  [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2024-11-15T21:03:35.173Z INFO  [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2024-11-15T21:03:36.610Z INFO  [lepton_jpeg::structs::lepton_file_writer] worker threads 9632ms of CPU time in 1436ms of wall time
2024-11-15T21:03:36.610Z INFO  [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2024-11-15T21:03:38.216Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324076 bytes (compression = 28.0%)
2024-11-15T21:03:38.216Z INFO  [lepton_jpeg_util] Main thread CPU: 3517ms, Worker thread CPU: 20710 ms, walltime: 3517 ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':

       871 003 064      cache-references                                                        (41,51%)
        86 439 870      cache-misses                     #    9,92% of all cache refs           (41,52%)
    15 795 148 123      cycles                                                                  (41,72%)
       834 608 725      ic_fetch_stall.ic_stall_back_pressure                                        (41,87%)
     1 014 898 692      stalled-cycles-frontend          #    6,43% frontend cycles idle        (42,14%)
    39 107 914 014      instructions                     #    2,48  insn per cycle            
                                                  #    0,03  stalled cycles per insn     (42,43%)
     4 396 207 542      branch-instructions                                                     (42,50%)
       162 547 993      branch-misses                    #    3,70% of all branches             (42,32%)
     5 434 637 214      ic_fetch_stall.ic_stall_any                                             (42,14%)
        45 374 631      ic_fetch_stall.ic_stall_dq_empty                                        (41,99%)
        72 770 962      l2_cache_misses_from_ic_miss                                            (41,75%)
     2 149 017 778      l2_latency.l2_cycles_waiting_on_fills                                        (41,47%)
           184 005      faults                                                                
                 1      migrations                                                            

       3,551824055 seconds time elapsed

       3,224073000 seconds user
       0,324906000 seconds sys

this PR

2024-11-16T11:07:51.300Z INFO  [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2024-11-16T11:07:51.773Z INFO  [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2024-11-16T11:07:53.262Z INFO  [lepton_jpeg::structs::lepton_file_writer] worker threads 9925ms of CPU time in 1488ms of wall time
2024-11-16T11:07:53.262Z INFO  [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2024-11-16T11:07:54.920Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324076 bytes (compression = 28.0%)
2024-11-16T11:07:54.920Z INFO  [lepton_jpeg_util] Main thread CPU: 3620ms, Worker thread CPU: 21327 ms, walltime: 3620 ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':

       865 085 762      cache-references                                                        (41,79%)
        81 144 742      cache-misses                     #    9,38% of all cache refs           (42,09%)
    16 326 000 065      cycles                                                                  (42,24%)
       968 571 728      ic_fetch_stall.ic_stall_back_pressure                                        (42,49%)
     1 047 639 608      stalled-cycles-frontend          #    6,42% frontend cycles idle        (42,35%)
    39 674 257 535      instructions                     #    2,43  insn per cycle            
                                                  #    0,03  stalled cycles per insn     (42,22%)
     4 364 055 943      branch-instructions                                                     (42,12%)
       165 140 691      branch-misses                    #    3,78% of all branches             (41,80%)
     5 544 312 995      ic_fetch_stall.ic_stall_any                                             (41,69%)
        39 725 098      ic_fetch_stall.ic_stall_dq_empty                                        (41,43%)
        66 843 172      l2_cache_misses_from_ic_miss                                            (41,31%)
     2 165 667 711      l2_latency.l2_cycles_waiting_on_fills                                        (41,54%)
           183 994      faults                                                                
                 1      migrations                                                            

       3,654640632 seconds time elapsed

       3,323893000 seconds user
       0,328890000 seconds sys

@Melirius
Collaborator

c5b6938 helps, but it is still slower than main (about 2% more cycles and wall time):

2024-11-20T20:56:19.974Z INFO  [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2024-11-20T20:56:20.453Z INFO  [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2024-11-20T20:56:21.944Z INFO  [lepton_jpeg::structs::lepton_file_writer] worker threads 9985ms of CPU time in 1490ms of wall time
2024-11-20T20:56:21.944Z INFO  [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2024-11-20T20:56:23.567Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324076 bytes (compression = 28.0%)
2024-11-20T20:56:23.567Z INFO  [lepton_jpeg_util] Main thread CPU: 3593ms, Worker thread CPU: 21155 ms, walltime: 3593 ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':

       836 854 716      cache-references                                                        (41,97%)
        75 722 326      cache-misses                     #    9,05% of all cache refs           (41,79%)
    16 120 935 933      cycles                                                                  (41,65%)
       956 121 210      ic_fetch_stall.ic_stall_back_pressure                                        (41,42%)
     1 106 567 900      stalled-cycles-frontend          #    6,86% frontend cycles idle        (41,53%)
    39 302 320 657      instructions                     #    2,44  insn per cycle            
                                                  #    0,03  stalled cycles per insn     (41,52%)
     4 328 449 995      branch-instructions                                                     (41,74%)
       166 308 719      branch-misses                    #    3,84% of all branches             (42,04%)
     5 614 369 841      ic_fetch_stall.ic_stall_any                                             (42,26%)
        42 227 531      ic_fetch_stall.ic_stall_dq_empty                                        (42,33%)
        62 449 857      l2_cache_misses_from_ic_miss                                            (42,46%)
     2 284 954 295      l2_latency.l2_cycles_waiting_on_fills                                        (42,25%)
           183 975      faults                                                                
                 1      migrations                                                            

       3,628224562 seconds time elapsed

       3,283399000 seconds user
       0,342937000 seconds sys
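
To put the three runs side by side, the relative cost can be computed from the `cycles` counters reported above. This is only a rough sketch: `percent_change` and `report` are illustrative names, and the counters were multiplexed (about 42% sampling), so the results are estimates rather than exact measurements.

```rust
/// Cycle counts copied from the three `perf stat` runs in this thread.
const MAIN_CYCLES: u64 = 15_795_148_123;
const PR_CYCLES: u64 = 16_326_000_065;
const C5B6938_CYCLES: u64 = 16_120_935_933;

/// Relative change of `candidate` versus `baseline`, in percent.
fn percent_change(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 - baseline as f64) / baseline as f64 * 100.0
}

/// One-line summary of both comparisons against main.
fn report() -> String {
    format!(
        "this PR: {:+.2}% cycles, c5b6938: {:+.2}% cycles",
        percent_change(MAIN_CYCLES, PR_CYCLES),
        percent_change(MAIN_CYCLES, C5B6938_CYCLES),
    )
}
```

`report()` returns `this PR: +3.36% cycles, c5b6938: +2.06% cycles`, which matches the elapsed-time differences of roughly 2.9% and 2.2%.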
