audio: move more code to DRAM #9880

Open: wants to merge 6 commits into main

lyakh
Collaborator

lyakh commented Mar 10, 2025

DAI, host and copier code is always built into the base firmware image, never as loadable modules. Move a large part of its non-performance-critical code to DRAM.
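In C firmware this kind of split is usually expressed with an attribute macro on non-performance-critical functions, so that the linker can place them in a separate section which the memory map assigns to DRAM. A minimal sketch, assuming a hypothetical `SKETCH_COLD` macro and a hypothetical `.cold.text` section name (SOF's real `__cold` definition and linker script may differ); `CONFIG_COLD_STORE_EXECUTE_DRAM` is the Kconfig option named later in this thread, defaulted here so the snippet compiles on its own:

```c
#include <assert.h>

/* Assumption: normally provided by Kconfig; defaulted for a standalone build. */
#ifndef CONFIG_COLD_STORE_EXECUTE_DRAM
#define CONFIG_COLD_STORE_EXECUTE_DRAM 1
#endif

#if CONFIG_COLD_STORE_EXECUTE_DRAM && defined(__ELF__)
/* Hypothetical: emit cold code into its own ELF section; a firmware linker
 * script would then map ".cold.text" to DRAM instead of SRAM. */
#define SKETCH_COLD __attribute__((cold, noinline, section(".cold.text")))
#elif CONFIG_COLD_STORE_EXECUTE_DRAM
/* Non-ELF hosts: keep only the "rarely executed" optimization hint. */
#define SKETCH_COLD __attribute__((cold, noinline))
#else
/* Option disabled: everything stays in the default (SRAM) text section. */
#define SKETCH_COLD
#endif

struct copier_state {
	long frames;
};

/* Setup path, called from IPC context: acceptable to execute from DRAM. */
SKETCH_COLD int sketch_copier_init(struct copier_state *st)
{
	st->frames = 0;
	return 0;
}

/* Per-period processing path: left unmarked so it stays in SRAM. */
int sketch_copier_process(struct copier_state *st, int frames)
{
	assert(frames >= 0);
	st->frames += frames;
	return frames;
}
```

Only the placement differs between the two functions: their behaviour is identical, but with the option enabled the init path would run from slower DRAM while the processing path stays in SRAM.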

@lyakh
Collaborator Author

lyakh commented Mar 10, 2025

CI:

Member

lgirdwood left a comment

@lyakh what are the SRAM savings with this PR ?
@jsarha can you run your before/after script with load here. Thanks !

@lyakh
Collaborator Author

lyakh commented Mar 12, 2025

@lgirdwood it depends on the configuration, i.e. which modules you build. E.g. https://github.com/thesofproject/sof/actions/runs/13813419824/job/38640360345?pr=9880#step:8:833 shows that the cold module has 0x74fc bytes of text and 0x4d964 bytes of data; without cold placement all of that would stay in the base firmware and occupy SRAM. The largest part of it is SRC coefficients. There is one caveat: in a purely SRAM build all coefficients are in SRAM, whereas in a DRAM build coefficients are copied on demand, so the saving isn't complete. The best case is pass-through, where IIRC nothing (or extremely little) is copied. However, if several SRCs run in parallel with different conversions, the respective tables are all copied, so the saving is smaller.
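The on-demand copying described above can be modelled as a small reference-counted cache: coefficient tables stay in DRAM, and a table is copied into SRAM only when an SRC instance first needs that conversion. This is a sketch under stated assumptions: the table contents and sizes, `coef_get()`/`coef_put()` and the use of `malloc()` as a stand-in for an SRAM allocation are all hypothetical, chosen only to show why pass-through costs (almost) nothing while several parallel conversions reduce the saving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical coefficient tables "stored in DRAM"; sizes are illustrative. */
static const short coef_a[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
static const short coef_b[] = { 9, 10, 11, 12 };
static const short coef_c[] = { 13, 14, 15, 16, 17, 18 };

struct coef_table {
	const short *dram; /* master copy, always available in DRAM */
	size_t bytes;
	short *sram;       /* SRAM copy, NULL until some SRC needs it */
	int users;         /* SRC instances currently using the SRAM copy */
};

static struct coef_table tables[] = {
	{ coef_a, sizeof(coef_a), NULL, 0 },
	{ coef_b, sizeof(coef_b), NULL, 0 },
	{ coef_c, sizeof(coef_c), NULL, 0 },
};

#define N_TABLES (sizeof(tables) / sizeof(tables[0]))

static size_t sram_in_use; /* bytes of coefficients currently held in SRAM */

/* Return an SRAM copy of table idx, copying it from DRAM on first use. */
const short *coef_get(size_t idx)
{
	struct coef_table *t;

	assert(idx < N_TABLES);
	t = &tables[idx];
	if (!t->sram) {
		t->sram = malloc(t->bytes); /* stands in for an SRAM allocation */
		if (!t->sram)
			return NULL;
		memcpy(t->sram, t->dram, t->bytes);
		sram_in_use += t->bytes;
	}
	t->users++;
	return t->sram;
}

/* Drop a reference; release the SRAM copy when the last user is gone. */
void coef_put(size_t idx)
{
	struct coef_table *t;

	assert(idx < N_TABLES);
	t = &tables[idx];
	assert(t->users > 0);
	if (--t->users == 0) {
		free(t->sram);
		t->sram = NULL;
		sram_in_use -= t->bytes;
	}
}

size_t coef_sram_bytes(void)
{
	return sram_in_use;
}
```

Two SRC instances doing the same conversion share one SRAM copy, while each additional distinct conversion adds its whole table to SRAM usage.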

@lgirdwood
Member

Just the build-time info is enough, i.e. what the increase in DRAM usage is for text/data.

@lyakh
Collaborator Author

lyakh commented Mar 13, 2025

@lgirdwood right, then 0x74fc bytes text + 0x4d964 bytes read-only data
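In decimal, 0x74fc is 29,948 bytes of text and 0x4d964 is 317,796 bytes of read-only data, i.e. 347,744 bytes (about 339.6 KiB) in total. The arithmetic as a tiny C check; the macro and function names are illustrative only:

```c
#include <assert.h>
#include <stdio.h>

/* Cold-module sizes quoted above from the CI build log. */
#define COLD_TEXT_BYTES   0x74fcUL
#define COLD_RODATA_BYTES 0x4d964UL

/* Compile-time check of the decimal values quoted in the text. */
_Static_assert(COLD_TEXT_BYTES == 29948UL, "text size");
_Static_assert(COLD_RODATA_BYTES == 317796UL, "rodata size");

/* Total bytes of the cold module, i.e. the additional DRAM usage. */
unsigned long cold_total_bytes(void)
{
	return COLD_TEXT_BYTES + COLD_RODATA_BYTES;
}

void print_cold_sizes(void)
{
	unsigned long total = cold_total_bytes();

	printf("text:   %lu bytes\n", COLD_TEXT_BYTES);
	printf("rodata: %lu bytes\n", COLD_RODATA_BYTES);
	printf("total:  %lu bytes (%.1f KiB)\n", total, total / 1024.0);
}
```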

@lyakh
Collaborator Author

lyakh commented Mar 13, 2025

CI:

@lgirdwood
Member

@lyakh are you able to run @jsarha script here ?

lyakh added 6 commits March 18, 2025 15:28
Fix several return codes and a preprocessor conditional, and simplify a
function.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Clarify which methods in struct comp_ops can and which cannot be
marked as __cold.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Base firmware code is never executed on hot paths, so mark all of it as
"cold".

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Add __cold qualifiers to IPC context functions.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Add __cold qualifiers and debugging tests to IPC context functions.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Add __cold qualifiers and debugging tests to non-performance-critical
code.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
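The commits above separate callbacks that only run in IPC context (safe to mark `__cold` and execute from DRAM) from callbacks on the audio processing path (which must stay in SRAM), and add debugging tests that cold code is not reached from the hot path. The sketch below illustrates that idea with a hypothetical ops table, a hypothetical `in_ll_context` flag and a hypothetical `ASSERT_CAN_BE_COLD()` check; it is not SOF's `struct comp_ops` or its actual debug test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cold qualifier (no DRAM placement here). */
#define SKETCH_COLD __attribute__((cold, noinline))

/* Hypothetical flag, set while the low-latency (LL) audio scheduler runs. */
static bool in_ll_context;

/* Hypothetical debug test: cold code must never be reached from LL context. */
#define ASSERT_CAN_BE_COLD() assert(!in_ll_context)

struct sketch_state {
	int peer;
	long frames;
};

/* Hypothetical ops table, loosely modelled on the idea behind comp_ops. */
struct sketch_ops {
	int (*create)(struct sketch_state *st);           /* IPC context: may be cold */
	int (*bind)(struct sketch_state *st, int peer);   /* IPC context: may be cold */
	int (*copy)(struct sketch_state *st, int frames); /* LL context: keep hot */
};

static SKETCH_COLD int sketch_create(struct sketch_state *st)
{
	ASSERT_CAN_BE_COLD();
	st->peer = -1;
	st->frames = 0;
	return 0;
}

static SKETCH_COLD int sketch_bind(struct sketch_state *st, int peer)
{
	ASSERT_CAN_BE_COLD();
	st->peer = peer;
	return 0;
}

/* Hot callback: no cold qualifier, no cold check. */
static int sketch_copy(struct sketch_state *st, int frames)
{
	st->frames += frames;
	return frames;
}

const struct sketch_ops demo_ops = {
	.create = sketch_create,
	.bind = sketch_bind,
	.copy = sketch_copy,
};

/* Hypothetical LL tick: only the hot callback runs with the flag set. */
int sketch_ll_tick(struct sketch_state *st, int frames)
{
	int ret;

	in_ll_context = true;
	ret = demo_ops.copy(st, frames);
	in_ll_context = false;
	return ret;
}
```

If a cold callback such as `bind` were ever called from inside `sketch_ll_tick()`, the assertion would fire in a debug build, which is the kind of mistake such tests aim to catch before DRAM-resident code ends up on a hot path.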
@lgirdwood
Member

SOFCI TEST

@lgirdwood
Member

PTL results were missing, retest.

@jsarha
Contributor

jsarha commented Mar 24, 2025

IPC response time comparison report on 24th Mar 2025 for PR 9880

The test consists of two SW configurations and two different load
conditions, i.e. 4 different tests. All the tests were run on an LNL
SDW RVP in the Espoo lab with the nocodec topology. Each test was 10
playback runs of 2 seconds each to hw:0,2 (Port 2) at 44.1 kHz, with a
3 second sleep in between.

The SW configurations are SOF main (1ef4e60) built with
CONFIG_COLD_STORE_EXECUTE_DRAM=n and the latest PR 9880 (00b572b) built
with CONFIG_COLD_STORE_EXECUTE_DRAM=y. The two load conditions were no
load and Prime95 running with 8 threads and large FFTs.

1. Short summary of total FW handling times for all messages per playback:
1.1 Playback tests, no load
1.1.1 Playback tests, no load, main branch 

IPC totals fw   min 3358 us     max 5037 us     average 4125 us of 10

1.1.2 Playback tests, no load, pr9880

IPC totals fw   min 5381 us     max 6184 us     average 5775 us of 10

1.2 Playback tests, Prime95 load
1.2.1 Playback tests, Prime95 load, main branch

IPC totals fw   min 4789 us     max 5479 us     average 5083 us of 10

1.2.2 Playback tests, Prime95 load, pr9880

IPC totals fw   min 5500 us     max 7063 us     average 6174 us of 10
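The relative cost of the PR can be read off the summary averages: with no load the total FW handling time per playback grows from 4125 us to 5775 us (+1650 us, +40%), and under Prime95 load from 5083 us to 6174 us (+1091 us, about +21%). The small helper below only recomputes these figures from the reported numbers; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Average "IPC totals fw" per playback, in microseconds, from the summary. */
struct ipc_total {
	const char *load;
	int main_us; /* SOF main, CONFIG_COLD_STORE_EXECUTE_DRAM=n */
	int pr_us;   /* PR 9880, CONFIG_COLD_STORE_EXECUTE_DRAM=y */
};

static const struct ipc_total totals[] = {
	{ "no load", 4125, 5775 },
	{ "Prime95", 5083, 6174 },
};

/* Relative increase of the PR build over main, in percent. */
double increase_pct(const struct ipc_total *t)
{
	return 100.0 * (t->pr_us - t->main_us) / t->main_us;
}

void print_totals(void)
{
	for (size_t i = 0; i < sizeof(totals) / sizeof(totals[0]); i++)
		printf("%-8s main %d us, pr %d us: +%d us (+%.1f%%)\n",
		       totals[i].load, totals[i].main_us, totals[i].pr_us,
		       totals[i].pr_us - totals[i].main_us,
		       increase_pct(&totals[i]));
}
```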


2. Full results

2.1 Playback tests, no load

2.1.1 Playback tests, no load, main branch 

host-copier.2.playback fw init  min 76 us       max 77 us       average 76 us of 10
gain.5.1 fw init        min 49 us       max 51 us       average 49 us of 10
gain.5.1 fw conf        min 23 us       max 23 us       average 23 us of 10
src.5.1 fw init         min 33 us       max 43 us       average 37 us of 10
mixin.5.1 fw init       min 38 us       max 44 us       average 39 us of 10
mixout.6.1 fw init      min 31 us       max 38 us       average 32 us of 10
gain.6.1 fw init        min 37 us       max 45 us       average 37 us of 10
gain.6.1 fw conf        min 19 us       max 20 us       average 19 us of 10
dai-copier.SSP.NoCodec-2.playback fw init       min 157 us      max 166 us      average 159 us of 10

pipeline.5: host-copier.2.playback, gain.5.1, src.5.1, mixin.5.1, 
pipeline.5 create fw    min 18 us       max 23 us       average 18 us of 10
pipeline.5 1.PAUSED fw  min 26 us       max 31 us       average 28 us of 10
pipeline.5 RUNNING fw   min 823 us      max 1177 us     average 939 us of 10
pipeline.5 PAUSED fw    min 319 us      max 774 us      average 433 us of 10
pipeline.5 delete fw    min 185 us      max 190 us      average 186 us of 10

pipeline.6: mixout.6.1, gain.6.1, dai-copier.SSP.NoCodec-2.playback, 
pipeline.6 create fw    min 27 us       max 27 us       average 27 us of 10
pipeline.6 1.PAUSED fw  min 22 us       max 22 us       average 22 us of 10
pipeline.6 RUNNING fw   min 327 us      max 1302 us     average 800 us of 10
pipeline.6 PAUSED fw    min 361 us      max 997 us      average 501 us of 10
pipeline.6 delete fw    min 173 us      max 183 us      average 175 us of 10

host-copier.2.playback>gain.5.1 bind fw         min 43 us       max 45 us       average 44 us of 10
gain.5.1>src.5.1        bind fw         min 38 us       max 46 us       average 40 us of 10
src.5.1>mixin.5.1       bind fw         min 40 us       max 47 us       average 44 us of 10
mixin.5.1>mixout.6.1    bind fw         min 40 us       max 48 us       average 43 us of 10
mixin.5.1>mixout.6.1    unbind fw       min 24 us       max 24 us       average 24 us of 10
mixout.6.1>gain.6.1     bind fw         min 45 us       max 52 us       average 49 us of 10
gain.6.1>dai-copier.SSP.NoCodec-2.playback      bind fw         min 40 us       max 50 us       average 43 us of 10

pipes 6 5: RESET fw     min 230 us      max 231 us      average 230 us of 10

IPC totals fw   min 3358 us     max 5037 us     average 4125 us of 10


2.1.2 Playback tests, no load, pr9880

host-copier.2.playback fw init  min 269 us      max 314 us      average 298 us of 10
gain.5.1 fw init        min 103 us      max 124 us      average 113 us of 10
gain.5.1 fw conf        min 23 us       max 23 us       average 23 us of 10
src.5.1 fw init         min 117 us      max 151 us      average 137 us of 10
mixin.5.1 fw init       min 93 us       max 111 us      average 105 us of 10
mixout.6.1 fw init      min 88 us       max 112 us      average 97 us of 10
gain.6.1 fw init        min 93 us       max 117 us      average 107 us of 10
gain.6.1 fw conf        min 20 us       max 20 us       average 20 us of 10
dai-copier.SSP.NoCodec-2.playback fw init       min 451 us      max 508 us      average 485 us of 10

pipeline.5: host-copier.2.playback, gain.5.1, src.5.1, mixin.5.1, 
pipeline.5 create fw    min 18 us       max 23 us       average 18 us of 10
pipeline.5 1.PAUSED fw  min 27 us       max 84 us       average 34 us of 10
pipeline.5 RUNNING fw   min 1637 us     max 1989 us     average 1829 us of 10
pipeline.5 PAUSED fw    min 297 us      max 377 us      average 329 us of 10
pipeline.5 delete fw    min 235 us      max 265 us      average 255 us of 10

pipeline.6: mixout.6.1, gain.6.1, dai-copier.SSP.NoCodec-2.playback, 
pipeline.6 create fw    min 27 us       max 33 us       average 27 us of 10
pipeline.6 1.PAUSED fw  min 22 us       max 22 us       average 22 us of 10
pipeline.6 RUNNING fw   min 307 us      max 1186 us     average 722 us of 10
pipeline.6 PAUSED fw    min 319 us      max 433 us      average 366 us of 10
pipeline.6 delete fw    min 202 us      max 212 us      average 207 us of 10

host-copier.2.playback>gain.5.1 bind fw         min 53 us       max 58 us       average 54 us of 10
gain.5.1>src.5.1        bind fw         min 40 us       max 46 us       average 43 us of 10
src.5.1>mixin.5.1       bind fw         min 38 us       max 46 us       average 42 us of 10
mixin.5.1>mixout.6.1    bind fw         min 41 us       max 47 us       average 44 us of 10
mixin.5.1>mixout.6.1    unbind fw       min 23 us       max 23 us       average 23 us of 10
mixout.6.1>gain.6.1     bind fw         min 44 us       max 59 us       average 49 us of 10
gain.6.1>dai-copier.SSP.NoCodec-2.playback      bind fw         min 52 us       max 66 us       average 57 us of 10

pipes 6 5: RESET fw     min 254 us      max 275 us      average 260 us of 10

IPC totals fw   min 5381 us     max 6184 us     average 5775 us of 10
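Re-adding the per-message averages of 2.1.1 and 2.1.2 hints at where the extra time goes with no load: the seven module init messages together grow from 429 us to 1342 us (+913 us), and the pipeline.5 RUNNING trigger from 939 us to 1829 us (+890 us). Together these exceed the 1650 us growth of the total, because several other messages (for example both PAUSED transitions) got faster. The helper below only sums the reported averages; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Average times (us), no load: main branch vs. PR 9880. */
struct row {
	const char *name;
	int main_us;
	int pr_us;
};

/* "fw init" rows from sections 2.1.1 and 2.1.2. */
static const struct row init_rows[] = {
	{ "host-copier.2.playback", 76, 298 },
	{ "gain.5.1", 49, 113 },
	{ "src.5.1", 37, 137 },
	{ "mixin.5.1", 39, 105 },
	{ "mixout.6.1", 32, 97 },
	{ "gain.6.1", 37, 107 },
	{ "dai-copier.SSP.NoCodec-2.playback", 159, 485 },
};

static const struct row p5_running = { "pipeline.5 RUNNING", 939, 1829 };

#define N_INIT (sizeof(init_rows) / sizeof(init_rows[0]))

/* Sum the main column (use_pr == 0) or the PR column of the init rows. */
int init_sum(int use_pr)
{
	int sum = 0;

	for (size_t i = 0; i < N_INIT; i++)
		sum += use_pr ? init_rows[i].pr_us : init_rows[i].main_us;
	return sum;
}

void print_breakdown(void)
{
	for (size_t i = 0; i < N_INIT; i++)
		printf("%-34s %4d -> %4d us (x%.1f)\n", init_rows[i].name,
		       init_rows[i].main_us, init_rows[i].pr_us,
		       (double)init_rows[i].pr_us / init_rows[i].main_us);
	printf("%-34s %4d -> %4d us (+%d us)\n", "all module inits",
	       init_sum(0), init_sum(1), init_sum(1) - init_sum(0));
	printf("%-34s %4d -> %4d us (+%d us)\n", p5_running.name,
	       p5_running.main_us, p5_running.pr_us,
	       p5_running.pr_us - p5_running.main_us);
}
```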


2.2 Playback tests, Prime95 load

2.2.1 Playback tests, Prime95 load, main branch

host-copier.2.playback fw init  min 62 us       max 77 us       average 68 us of 10
gain.5.1 fw init        min 40 us       max 52 us       average 45 us of 10
gain.5.1 fw conf        min 23 us       max 23 us       average 23 us of 10
src.5.1 fw init         min 32 us       max 38 us       average 33 us of 10
mixin.5.1 fw init       min 34 us       max 39 us       average 35 us of 10
mixout.6.1 fw init      min 31 us       max 35 us       average 32 us of 10
gain.6.1 fw init        min 37 us       max 43 us       average 37 us of 10
gain.6.1 fw conf        min 19 us       max 19 us       average 19 us of 10
dai-copier.SSP.NoCodec-2.playback fw init       min 152 us      max 160 us      average 155 us of 10

pipeline.5: host-copier.2.playback, gain.5.1, src.5.1, mixin.5.1, 
pipeline.5 create fw    min 18 us       max 19 us       average 18 us of 10
pipeline.5 1.PAUSED fw  min 23 us       max 31 us       average 26 us of 10
pipeline.5 RUNNING fw   min 1192 us     max 1321 us     average 1223 us of 10
pipeline.5 PAUSED fw    min 982 us      max 996 us      average 988 us of 10
pipeline.5 delete fw    min 185 us      max 192 us      average 188 us of 10

pipeline.6: mixout.6.1, gain.6.1, dai-copier.SSP.NoCodec-2.playback, 
pipeline.6 create fw    min 19 us       max 27 us       average 23 us of 10
pipeline.6 1.PAUSED fw  min 21 us       max 22 us       average 21 us of 10
pipeline.6 RUNNING fw   min 483 us      max 1124 us     average 782 us of 10
pipeline.6 PAUSED fw    min 367 us      max 1025 us     average 670 us of 10
pipeline.6 delete fw    min 173 us      max 183 us      average 174 us of 10

host-copier.2.playback>gain.5.1 bind fw         min 43 us       max 44 us       average 43 us of 10
gain.5.1>src.5.1        bind fw         min 38 us       max 41 us       average 39 us of 10
src.5.1>mixin.5.1       bind fw         min 38 us       max 40 us       average 39 us of 10
mixin.5.1>mixout.6.1    bind fw         min 40 us       max 42 us       average 41 us of 10
mixin.5.1>mixout.6.1    unbind fw       min 24 us       max 24 us       average 24 us of 10
mixout.6.1>gain.6.1     bind fw         min 45 us       max 51 us       average 50 us of 10
gain.6.1>dai-copier.SSP.NoCodec-2.playback      bind fw         min 40 us       max 48 us       average 41 us of 10

pipes 6 5: RESET fw     min 231 us      max 242 us      average 235 us of 10

IPC totals fw   min 4789 us     max 5479 us     average 5083 us of 10


2.2.2 Playback tests, Prime95 load, pr9880

host-copier.2.playback fw init  min 208 us      max 277 us      average 243 us of 10
gain.5.1 fw init        min 58 us       max 110 us      average 74 us of 10
gain.5.1 fw conf        min 23 us       max 23 us       average 23 us of 10
src.5.1 fw init         min 65 us       max 140 us      average 83 us of 10
mixin.5.1 fw init       min 53 us       max 105 us      average 68 us of 10
mixout.6.1 fw init      min 46 us       max 90 us       average 64 us of 10
gain.6.1 fw init        min 55 us       max 115 us      average 70 us of 10
gain.6.1 fw conf        min 20 us       max 20 us       average 20 us of 10
dai-copier.SSP.NoCodec-2.playback fw init       min 350 us      max 467 us      average 384 us of 10

pipeline.5: host-copier.2.playback, gain.5.1, src.5.1, mixin.5.1, 
pipeline.5 create fw    min 18 us       max 18 us       average 18 us of 10
pipeline.5 1.PAUSED fw  min 25 us       max 32 us       average 28 us of 10
pipeline.5 RUNNING fw   min 1090 us     max 2201 us     average 1608 us of 10
pipeline.5 PAUSED fw    min 980 us      max 993 us      average 988 us of 10
pipeline.5 delete fw    min 210 us      max 253 us      average 235 us of 10

pipeline.6: mixout.6.1, gain.6.1, dai-copier.SSP.NoCodec-2.playback, 
pipeline.6 create fw    min 18 us       max 27 us       average 26 us of 10
pipeline.6 1.PAUSED fw  min 22 us       max 22 us       average 22 us of 10
pipeline.6 RUNNING fw   min 501 us      max 1193 us     average 742 us of 10
pipeline.6 PAUSED fw    min 680 us      max 715 us      average 708 us of 10
pipeline.6 delete fw    min 200 us      max 208 us      average 202 us of 10

host-copier.2.playback>gain.5.1 bind fw         min 48 us       max 55 us       average 49 us of 10
gain.5.1>src.5.1        bind fw         min 38 us       max 40 us       average 39 us of 10
src.5.1>mixin.5.1       bind fw         min 38 us       max 42 us       average 39 us of 10
mixin.5.1>mixout.6.1    bind fw         min 41 us       max 43 us       average 41 us of 10
mixin.5.1>mixout.6.1    unbind fw       min 23 us       max 23 us       average 23 us of 10
mixout.6.1>gain.6.1     bind fw         min 45 us       max 51 us       average 49 us of 10
gain.6.1>dai-copier.SSP.NoCodec-2.playback      bind fw         min 46 us       max 52 us       average 47 us of 10

pipes 6 5: RESET fw     min 254 us      max 285 us      average 272 us of 10

IPC totals fw   min 5500 us     max 7063 us     average 6174 us of 10

Member

lgirdwood left a comment

@lyakh should we hold this until the IPC cold Kconfig is enabled ?

Member

abonislawski left a comment

Please just retest on PTL CI, now with cold enabled.

@lyakh
Collaborator Author

lyakh commented Mar 26, 2025

@lgirdwood yes, that would make sense, ideally also until after DRAM debugging is merged.
