Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime_cxxmodules] Enable on AArch64 #16401

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

hahnjo
Copy link
Member

@hahnjo hahnjo commented Sep 10, 2024

It was disabled in commit a67863d ("Disable modules on aarch64 due to ODR violation") in 2019. I cannot reproduce these problems on lxplus-arm, so try to turn it back on.

@hahnjo hahnjo self-assigned this Sep 10, 2024
@hahnjo hahnjo requested a review from bellenot as a code owner September 10, 2024 12:49
@hahnjo hahnjo marked this pull request as draft September 10, 2024 12:49
Copy link
Member

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link

github-actions bot commented Sep 10, 2024

Test Results

    18 files      18 suites   4d 2h 21m 3s ⏱️
 2 662 tests  2 662 ✅ 0 💤 0 ❌
46 198 runs  46 198 ✅ 0 💤 0 ❌

Results for commit b012169.

♻️ This comment has been updated with latest results.

@dpiparo
Copy link
Member

dpiparo commented Sep 27, 2024

I propose to merge it once we have the ARM nodes online to be able to test immediately, would this be ok?

@hahnjo
Copy link
Member Author

hahnjo commented Sep 27, 2024

Yes, I'm waiting for the AArch64 node in our CI so we can test there, and then (after) I'd still like to ask CMS to run their tests.

@hahnjo hahnjo added the clean build Ask CI to do non-incremental build on PR label Oct 10, 2024
@hahnjo hahnjo force-pushed the runtime_cxxmodules-aarch64 branch from 632b085 to 3320cc2 Compare October 10, 2024 10:19
@hahnjo hahnjo marked this pull request as ready for review October 11, 2024 08:46
@hahnjo
Copy link
Member Author

hahnjo commented Oct 11, 2024

@aandvalenzuela @smuzaffar if you have some cycles, can you test this change with CMSSW on AArch64? This should align the configurations with x86_64 to also enable runtime_cxxmodules by default

@smuzaffar
Copy link
Contributor

cms-sw#212 is running cmssw aarch64 tests

@smuzaffar
Copy link
Contributor

Hi, most of cmssw tests passed but for few relvals we get runtime errors like [a]

[a] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7b6638/42123/runTheMatrix-results/140.063_RunZeroBias2022D/step3_RunZeroBias2022D.log

cling JIT session error: In graph cling-module-926-jitted-objectbuffer, section .text._ZNK4reco10HitPattern23numberOfLostTrackerHitsENS0_11HitCategoryE: relocation target "_ZN4reco10HitPattern16missingHitFilterEt" at address 0x4000968500f0 is out of range of Page21 fixup at 0x4001a7270114 (_ZNK4reco10HitPattern23numberOfLostTrackerHitsENS0_11HitCategoryE, 0x4001a727010c + 0x8)
----- Begin Fatal Exception 11-Oct-2024 15:08:51 CEST-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing  Event run: 357735 lumi: 53 event: 87840020 stream: 0
   [1] Running path 'dqmoffline_1_step'
   [2] Prefetching for module NanoAODDQM/'nanoDQM'
   [3] Prefetching for module SimplePATTauFlatTableProducer/'boostedTauTable'
   [4] Prefetching for module PATObjectCrossLinker/'linkedObjects'
   [5] Prefetching for module PATMuonRefSelector/'finalMuons'
   [6] Prefetching for module PATMuonUserDataEmbedder/'slimmedMuonsWithUserData'
   [7] Calling method for module EvaluateMuonMVAID/'muonMVAID'
   Additional Info:
      [a] Fatal Root Error: @SUB=TClingCallFunc::make_wrapper
Failed to compile
  ==== SOURCE BEGIN ====
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wformat-security"
__attribute__((used)) __attribute__((annotate("__cling__ptrcheck(off)")))
extern "C" void __cf_365(void* obj, int nargs, void** args, void* ret)
{
   if (ret) {
      new (ret) (double) (((const reco::TrackBase*)obj)->validFraction());
      return;
   }
   else {
      (void)(((const reco::TrackBase*)obj)->validFraction());
      return;
   }
}
#pragma clang diagnostic pop
  ==== SOURCE END ====

----- End Fatal Exception -------------------------------------------------
Another exception was caught while trying to clean up files after the primary fatal exception.


@hahnjo
Copy link
Member Author

hahnjo commented Oct 14, 2024

Hi, most of cmssw tests passed but for few relvals we get runtime errors [...]

Thanks for testing! This needs debugging (likely after CHEP)...

@hahnjo hahnjo marked this pull request as draft November 29, 2024 15:02
@hahnjo hahnjo force-pushed the runtime_cxxmodules-aarch64 branch from 3320cc2 to 473c7b2 Compare December 10, 2024 10:03
@hahnjo
Copy link
Member Author

hahnjo commented Dec 10, 2024

Revisiting this PR before the end of the year: Thanks to the clear error message and some guess work, I managed to reproduce the issue in a standalone ROOT session:

root [0] struct A { static void f() {} void take_f(void (*fp)()) { fp(); } void pass_f() { take_f(f); } void call_f() { f(); } };
root [1] A::f()
root [2] #include <sys/mman.h>
root [3] for (int i = 0; i < 1024 * 1024; i++) { mmap(nullptr, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); }
root [4] A a;
root [5] a.call_f()
root [6] a.pass_f()
cling JIT session error: In graph cling-module-16-jitted-objectbuffer, section .text._ZN11__cling_N501A6pass_fEv: relocation target "_ZN11__cling_N501A1fEv" at address 0xffffa18a4034 is out of range of Page21 fixup at 0xfffd88b60044 (_ZN11__cling_N501A6pass_fEv, 0xfffd88b6003c + 0x8)

Currently still investigating why this happens with runtime_cxxmodules but not without...

@hahnjo hahnjo force-pushed the runtime_cxxmodules-aarch64 branch from 473c7b2 to 3c3c8c1 Compare December 11, 2024 08:40
@hahnjo hahnjo marked this pull request as ready for review December 11, 2024 08:40
@hahnjo
Copy link
Member Author

hahnjo commented Dec 11, 2024

Alright, the issue with the reproducer in #16401 (comment) is understood and fixed. Let's hope that this is also fixes the CMS relvals - @aandvalenzuela @smuzaffar could you maybe run the tests again? Thanks in advance for all your help!

@smuzaffar
Copy link
Contributor

Alright, the issue with the reproducer in #16401 (comment) is understood and fixed. Let's hope that this is also fixes the CMS relvals - @aandvalenzuela @smuzaffar could you maybe run the tests again? Thanks in advance for all your help!

cmssw tests started via cms-sw#215

In case of weak definitions, JITLink might merge symbols because we
currently add all llvm::Module's into a single JITDylib. For merges
that late in the pipeline, we also have to prevent optimizations that
take advantage of "localness" to avoid out-of-range relocations for
example on AArch64.
It was disabled in commit a67863d ("Disable modules on aarch64 due
to ODR violation") in 2019. I cannot reproduce these problems on
lxplus-arm, so try to turn it back on.
@smuzaffar
Copy link
Contributor

cmssw tests for aarch64 look good

@hahnjo
Copy link
Member Author

hahnjo commented Dec 11, 2024

cmssw tests for aarch64 look good

Fantastic, thank you! Unfortunately our macOS nodes are not happy, so I'll need to push a slightly fixed version. I don't think we strictly need to test with full CMSSW once more...

@hahnjo hahnjo force-pushed the runtime_cxxmodules-aarch64 branch from 3c3c8c1 to b012169 Compare December 11, 2024 17:59
@hahnjo
Copy link
Member Author

hahnjo commented Dec 12, 2024

@vgvassilev FYI you already approved this three months ago, and now that the issue in CMS is fixed I plan to land this soon

@vgvassilev
Copy link
Member

Go ahead.

@hahnjo hahnjo merged commit 495c30d into root-project:master Dec 12, 2024
21 checks passed
@hahnjo hahnjo deleted the runtime_cxxmodules-aarch64 branch December 13, 2024 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clean build Ask CI to do non-incremental build on PR in:C++ modules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants