Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a hook to limit scanning of the entire PCIe space for pre-silicon effectiveness #418

Open
cdwilde opened this issue Jan 13, 2025 · 7 comments

Comments

@cdwilde
Copy link

cdwilde commented Jan 13, 2025

In pre-silicon, the S/BSA setup and tests loop over all Bus devices and functions in multiple parts of the code:
BSA:
test_pool/exerciser/operating_system/test_os_e015.c
test_pool/pcie/operating_system/test_os_p002.c
test_pool/pcie/operating_system/test_os_p004.c
test_pool/pcie/operating_system/test_os_p005.c
val/common/src/acs_pcie.c
val/common/src/acs_peripherals.c
SBSA:
test_pool/pcie/operating_system/test_p003.c
test_pool/pmu/operating_system/test_pmu007.c

This is extremely costly in a pre-silicon environment. Changing the upper MAX value for each bus, device, and function and not always helpful either, especially if the active devices are non-contiguous. I would like to ask that we add a PAL hook to allow the user to cut down this space.

As an example (for acs_pcie.c):
/* Derive ecam specific information */
seg_num = (uint32_t)val_pcie_get_info(PCIE_INFO_SEGMENT, ecam_index);
start_bus = (uint32_t)val_pcie_get_info(PCIE_INFO_START_BUS, ecam_index);
end_bus = (uint32_t)val_pcie_get_info(PCIE_INFO_END_BUS, ecam_index);

  /* Iterate over all buses, devices and functions in this ecam */
  for (bus_index = start_bus; bus_index <= end_bus; bus_index++)
  {
      for (dev_index = 0; dev_index < PCIE_MAX_DEV; dev_index++)
      {
          for (func_index = 0; func_index < PCIE_MAX_FUNC; func_index++)
          {
              /* Form bdf using seg, bus, device, function numbers */
              bdf = PCIE_CREATE_BDF(seg_num, bus_index, dev_index, func_index);

Add these lines here to allow the user to skip the rest of this loop iteration:
/* Skip non-sequential BDFs to save time */
if (pal_pcie_check_device_valid(bdf))
continue;

Hopefully, pal_pcie_check_device_valid is a good function to use. As far as I can tell, it is only called in acs_pcie.c and that later call would no longer be needed. I would put buses, devices, and functions that I want skipped in this function and it does the job of not wasting inordinate amounts of time in pre-silicon.

@Sujana-M
Copy link
Contributor

Hi @cdwilde,

For running ACS at Pre-silicon, we have optimisations to ensure the user can change the values of the maximum bus, device and function according to their requirement.

  1. To modify the values of the max bus, device and function for pre-silicon, request you to modify the defines at lines 254-256 in the platform config file at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L254. On modifying thee parameters, the respective max, device and function will be taken in the test
  2. In the above tests and files mentioned, the max bus number is taken from the pcie info table created. The details of which are taken from the ECAM space config details configured at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L237.

Please let us know if you need more information.

Thanks,
ACS Team.

@cdwilde
Copy link
Author

cdwilde commented Jan 17, 2025

This is insufficient. We may have a space where the Buses are not consecutive (Think 31, 63, 91, ... up to 255). Setting the MAX BUS is insufficient as we would need the entire range of 255, but only every 32 or so. The way you have this implemented requires that you go through the entire bus range for only a handful of valid buses. What I am proposing will allow the user to have control instead of relying on max values. In this case, we might say in a pal function:
if bus % 32:
return VALID
else:
return INVALID

@Sujana-M
Copy link
Contributor

Hi @cdwilde,

In scenarios where the buses are not consecutive, you can modify the number of ECAM to more than 1, with the same ECAM base address and make the corresponding changes of start and end bus. Please find the reference changes at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L269. The corresponding change to also be done at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/src/platform_cfg_fvp.c#L256 and https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/src/platform_cfg_fvp.c#L273.

This will ensure reduced time for non-consecutive bus ranges.

In addition, the tests mentioned

  1. test_pool/pcie/operating_system/test_os_p002.c and test_pool/pcie/operating_system/test_p003.c
    : For complete coverage of the rules, the entire ECAM space of all devices within the start to end bus of the ECAM need to be verified.
  2. test_pool/pcie/operating_system/test_os_p004.c and test_pool/pcie/operating_system/test_os_p005.c: These tests traverse through the devices and function of the bus range only between the secondary and subordinate bus. As part of the enumeration, the secondary and subordinate buses programmed on the RP will only be until the busses connected to the RP.
  3. val/common/src/acs_pcie.c and val/common/src/acs_peripherals.c: By making the changes mentioned above in the platform config file, the time taken for this will be reduced.

Please let us know if this helps.

Thanks,
ACS Team

@cdwilde
Copy link
Author

cdwilde commented Jan 31, 2025

Thanks for this idea. I am out of the office until mid next week but will give it a try when I get back.

@cdwilde
Copy link
Author

cdwilde commented Feb 5, 2025

I'm back. The other issue I run into is that modifying MAX_FUNC and MAX_DEV causes address calculation problems since these values are used all over the place for address calculation. For example, in acs_pcie.c:
/* There are 8 functions / device, 32 devices / Bus and each has a 4KB config space */
cfg_addr = (bus * PCIE_MAX_DEV * PCIE_MAX_FUNC * 4096) +
(dev * PCIE_MAX_FUNC * 4096) + (func * 4096);

I would need a way to tell the system that MAX_BUS, MAX_DEV, and MAX_FUNC are the full values for address calculations. I would then need a way to divide up the space to only iterate over certain addresses. I don't see a way to do this with the current implementation. Am I missing something here?

@Sujana-M
Copy link
Contributor

Sujana-M commented Feb 7, 2025

Hi @cdwilde,

Hoping that the scenario of enumeration for non-consecutive bus is taken care by modifying the number of ECAM to more than 1 as mentioned in the previous comment. This modification would have reduced the enumeration/BDF scan time considerably.

For PCIE_MAX_DEV and PCIE_MAX_FUNC, These macros used in the tests, apart from configuration address calculation, are part of the test verification, where the entire Device and function space need to be scanned. Request you to please let us know how much time is it taking for scanning upto PCIE_MAX_DEV and PCIE_MAX_FUNC, given that the non-consecutive bus ranges are taken care.

If the BSA test "test_pool/pcie/operating_system/test_os_p002.c" and SBSA test "test_pool/pcie/operating_system/test_p003.c" are taking long time, [as the entire BDF range to be tested for complete coverage], these tests can be commented out and run, if the tests have passed in the first run.

Thanks,
ACS Team

@cdwilde
Copy link
Author

cdwilde commented Feb 7, 2025

When I went to modify PCIE_MAX_DEV and PCIE_MAX_FUNC, my address calculations started failing all over the place. I can try to reduce only the buses and see if that runs in a reasonable time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants