Skip to content

MC: Restructure MCFragment as a fixed part and a variable tail #148544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Jul 13, 2025

Refactor the fragment representation of push rax; jmp foo; nop; jmp foo,
previously encoded as
MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo),

to

MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)

Changes:

  • Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
  • The new MCFragment contains a fixed-size content (similar to previous
    MCDataFragment) and an optional variable-size tail.
  • The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
    FT_DwarfFrame, with plans to extend to other fragment types.
    dyn_cast/isa should be avoided for the converted fragment subclasses.
  • In setVarFixups, source fixup offsets are relative to the variable part's start.
    Stored fixup (in FixupStorage) offsets are relative to the fixed part's start.
    A lot of code does getFragmentOffset(Frag) + Fixup.getOffset(),
    expecting the fixup offset to be relative to the fixed part's start.
  • HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
    associated instruction for a fixup. We have to add a const MCFragment & parameter.
  • In MCObjectStreamer, extend absoluteSymbolDiff to apply to
    FT_Relaxable as otherwise there would be many more FT_DwarfFrame
    fragments in -g compilations.

https://llvm-compile-time-tracker.com/compare.php?from=267b136359d8448c73432b4f3ceeefbf4c35e00b&to=ef026be147babfd2b8f4ed4d875deadcbbbb2bbb&stat=instructions:u

stage2-O0-g instructins:u geomeon (-0.12%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Member

llvmbot commented Jul 13, 2025

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-hexagon

@llvm/pr-subscribers-backend-aarch64

Author: Fangrui Song (MaskRay)

Changes

Refactor the fragment representation of push rax; jmp foo; nop; jmp foo,
previously encoded as
MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo),

to

MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)

Changes:

  • Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
  • The new MCFragment contains a fixed-size content (similar to previous
    MCDataFragment) and an optional variable-size tail.
  • The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
    FT_DwarfFrame, with plans to extend to other fragment types.
    dyn_cast/isa should be avoided for the converted fragment subclasses.
  • In setVarFixups, source fixup offsets are relative to the variable part's start.
    Stored fixup (in FixupStorage) offsets are relative to the fixed part's start.
    A lot of code does getFragmentOffset(Frag) + Fixup.getOffset(),
    expecting the fixup offset to be relative to the fixed part's start.
  • HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
    associated instruction for a fixup. We have to add a const MCFragment &amp; parameter.
  • In MCObjectStreamer, extend absoluteSymbolDiff to apply to
    FT_Relaxable as otherwise there would be many more FT_DwarfFrame
    fragments in -g compilations.

https://llvm-compile-time-tracker.com/compare.php?from=267b136359d8448c73432b4f3ceeefbf4c35e00b&amp;to=ef026be147babfd2b8f4ed4d875deadcbbbb2bbb&amp;stat=instructions:u

stage2-O0-g instructins:u geomeon (-0.12%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &amp;| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i&lt;=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &amp;| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i&lt;=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396

Patch is 79.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148544.diff

28 Files Affected:

  • (modified) llvm/include/llvm/MC/MCAsmBackend.h (+6-10)
  • (modified) llvm/include/llvm/MC/MCAssembler.h (+7-11)
  • (modified) llvm/include/llvm/MC/MCCodeView.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCContext.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCELFStreamer.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCObjectStreamer.h (+3-3)
  • (modified) llvm/include/llvm/MC/MCSection.h (+155-154)
  • (modified) llvm/lib/MC/MCAsmBackend.cpp (+4-5)
  • (modified) llvm/lib/MC/MCAssembler.cpp (+67-62)
  • (modified) llvm/lib/MC/MCELFStreamer.cpp (+2-2)
  • (modified) llvm/lib/MC/MCExpr.cpp (+4-4)
  • (modified) llvm/lib/MC/MCFragment.cpp (+53-40)
  • (modified) llvm/lib/MC/MCObjectStreamer.cpp (+40-29)
  • (modified) llvm/lib/MC/MCSection.cpp (+30-4)
  • (modified) llvm/lib/MC/WasmObjectWriter.cpp (+2-4)
  • (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp (+3-5)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+2-1)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.h (+2-1)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp (+5-6)
  • (modified) llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp (+2-1)
  • (modified) llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.h (+2-1)
  • (modified) llvm/lib/Target/Hexagon/MCTargetDesc/HexagonAsmBackend.cpp (+7-8)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp (+16-17)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h (+3-5)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp (+22-23)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h (+5-6)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+27-31)
  • (modified) llvm/test/MC/ELF/mc-dump.s (+14-8)
diff --git a/llvm/include/llvm/MC/MCAsmBackend.h b/llvm/include/llvm/MC/MCAsmBackend.h
index 6b81bdba25e67..9a4862baaeb3a 100644
--- a/llvm/include/llvm/MC/MCAsmBackend.h
+++ b/llvm/include/llvm/MC/MCAsmBackend.h
@@ -19,11 +19,8 @@
 namespace llvm {
 
 class MCAlignFragment;
-class MCDwarfCallFrameFragment;
-class MCDwarfLineAddrFragment;
 class MCFragment;
 class MCLEBFragment;
-class MCRelaxableFragment;
 class MCSymbol;
 class MCAssembler;
 class MCContext;
@@ -157,8 +154,9 @@ class LLVM_ABI MCAsmBackend {
 
   /// Target specific predicate for whether a given fixup requires the
   /// associated instruction to be relaxed.
-  virtual bool fixupNeedsRelaxationAdvanced(const MCFixup &, const MCValue &,
-                                            uint64_t, bool Resolved) const;
+  virtual bool fixupNeedsRelaxationAdvanced(const MCFragment &, const MCFixup &,
+                                            const MCValue &, uint64_t,
+                                            bool Resolved) const;
 
   /// Simple predicate for targets where !Resolved implies requiring relaxation
   virtual bool fixupNeedsRelaxation(const MCFixup &Fixup,
@@ -179,18 +177,16 @@ class LLVM_ABI MCAsmBackend {
   }
 
   // Defined by linker relaxation targets.
-  virtual bool relaxDwarfLineAddr(MCDwarfLineAddrFragment &DF,
-                                  bool &WasRelaxed) const {
+  virtual bool relaxDwarfLineAddr(MCFragment &, bool &WasRelaxed) const {
     return false;
   }
-  virtual bool relaxDwarfCFA(MCDwarfCallFrameFragment &DF,
-                             bool &WasRelaxed) const {
+  virtual bool relaxDwarfCFA(MCFragment &, bool &WasRelaxed) const {
     return false;
   }
 
   // Defined by linker relaxation targets to possibly emit LEB128 relocations
   // and set Value at the relocated location.
-  virtual std::pair<bool, bool> relaxLEB128(MCLEBFragment &LF,
+  virtual std::pair<bool, bool> relaxLEB128(MCFragment &,
                                             int64_t &Value) const {
     return std::make_pair(false, false);
   }
diff --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index 1015992cedf29..858596ff8757c 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -34,13 +34,10 @@ namespace llvm {
 class MCBoundaryAlignFragment;
 class MCCVDefRangeFragment;
 class MCCVInlineLineTableFragment;
-class MCDwarfCallFrameFragment;
-class MCDwarfLineAddrFragment;
-class MCEncodedFragment;
+class MCFragment;
 class MCFixup;
 class MCLEBFragment;
 class MCPseudoProbeAddrFragment;
-class MCRelaxableFragment;
 class MCSymbolRefExpr;
 class raw_ostream;
 class MCAsmBackend;
@@ -107,7 +104,7 @@ class MCAssembler {
 
   /// Check whether a fixup can be satisfied, or whether it needs to be relaxed
   /// (increased in size, in order to hold its value correctly).
-  bool fixupNeedsRelaxation(const MCRelaxableFragment &, const MCFixup &) const;
+  bool fixupNeedsRelaxation(const MCFragment &, const MCFixup &) const;
 
   void layoutSection(MCSection &Sec);
   /// Perform one layout iteration and return the index of the first stable
@@ -116,11 +113,11 @@ class MCAssembler {
 
   /// Perform relaxation on a single fragment.
   bool relaxFragment(MCFragment &F);
-  bool relaxInstruction(MCRelaxableFragment &IF);
-  bool relaxLEB(MCLEBFragment &IF);
+  bool relaxInstruction(MCFragment &F);
+  bool relaxLEB(MCFragment &F);
   bool relaxBoundaryAlign(MCBoundaryAlignFragment &BF);
-  bool relaxDwarfLineAddr(MCDwarfLineAddrFragment &DF);
-  bool relaxDwarfCallFrameFragment(MCDwarfCallFrameFragment &DF);
+  bool relaxDwarfLineAddr(MCFragment &F);
+  bool relaxDwarfCallFrameFragment(MCFragment &F);
   bool relaxCVInlineLineTable(MCCVInlineLineTableFragment &DF);
   bool relaxCVDefRange(MCCVDefRangeFragment &DF);
   bool relaxFill(MCFillFragment &F);
@@ -228,8 +225,7 @@ class MCAssembler {
 
   /// Write the necessary bundle padding to \p OS.
   /// Expects a fragment \p F containing instructions and its size \p FSize.
-  LLVM_ABI void writeFragmentPadding(raw_ostream &OS,
-                                     const MCEncodedFragment &F,
+  LLVM_ABI void writeFragmentPadding(raw_ostream &OS, const MCFragment &F,
                                      uint64_t FSize) const;
 
   LLVM_ABI void reportError(SMLoc L, const Twine &Msg) const;
diff --git a/llvm/include/llvm/MC/MCCodeView.h b/llvm/include/llvm/MC/MCCodeView.h
index 88f84a2462841..9cde44c71baff 100644
--- a/llvm/include/llvm/MC/MCCodeView.h
+++ b/llvm/include/llvm/MC/MCCodeView.h
@@ -26,7 +26,6 @@ namespace llvm {
 class MCAssembler;
 class MCCVDefRangeFragment;
 class MCCVInlineLineTableFragment;
-class MCDataFragment;
 class MCFragment;
 class MCSection;
 class MCSymbol;
@@ -231,7 +230,7 @@ class CodeViewContext {
   StringMap<unsigned> StringTable;
 
   /// The fragment that ultimately holds our strings.
-  MCDataFragment *StrTabFragment = nullptr;
+  MCFragment *StrTabFragment = nullptr;
   SmallVector<char, 0> StrTab = {'\0'};
 
   /// Get a string table offset.
diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 5a8ec17dae1cc..c137f6184a9a7 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -47,7 +47,6 @@ namespace llvm {
 
 class CodeViewContext;
 class MCAsmInfo;
-class MCDataFragment;
 class MCInst;
 class MCLabel;
 class MCObjectFileInfo;
@@ -334,7 +333,7 @@ class MCContext {
   void reportCommon(SMLoc Loc,
                     std::function<void(SMDiagnostic &, const SourceMgr *)>);
 
-  MCDataFragment *allocInitialFragment(MCSection &Sec);
+  MCFragment *allocInitialFragment(MCSection &Sec);
 
   MCSymbolTableEntry &getSymbolTableEntry(StringRef Name);
 
diff --git a/llvm/include/llvm/MC/MCELFStreamer.h b/llvm/include/llvm/MC/MCELFStreamer.h
index aba87cb19b21c..5affe206a3535 100644
--- a/llvm/include/llvm/MC/MCELFStreamer.h
+++ b/llvm/include/llvm/MC/MCELFStreamer.h
@@ -17,7 +17,6 @@ namespace llvm {
 
 class ELFObjectWriter;
 class MCContext;
-class MCDataFragment;
 class MCFragment;
 class MCObjectWriter;
 class MCSection;
@@ -51,7 +50,7 @@ class MCELFStreamer : public MCObjectStreamer {
   void initSections(bool NoExecStack, const MCSubtargetInfo &STI) override;
   void changeSection(MCSection *Section, uint32_t Subsection = 0) override;
   void emitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;
-  void emitLabelAtPos(MCSymbol *Symbol, SMLoc Loc, MCDataFragment &F,
+  void emitLabelAtPos(MCSymbol *Symbol, SMLoc Loc, MCFragment &F,
                       uint64_t Offset) override;
   void emitWeakReference(MCSymbol *Alias, const MCSymbol *Target) override;
   bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;
diff --git a/llvm/include/llvm/MC/MCObjectStreamer.h b/llvm/include/llvm/MC/MCObjectStreamer.h
index 58c18af406158..f8635fb989417 100644
--- a/llvm/include/llvm/MC/MCObjectStreamer.h
+++ b/llvm/include/llvm/MC/MCObjectStreamer.h
@@ -43,8 +43,8 @@ class MCObjectStreamer : public MCStreamer {
   struct PendingMCFixup {
     const MCSymbol *Sym;
     MCFixup Fixup;
-    MCDataFragment *DF;
-    PendingMCFixup(const MCSymbol *McSym, MCDataFragment *F, MCFixup McFixup)
+    MCFragment *DF;
+    PendingMCFixup(const MCSymbol *McSym, MCFragment *F, MCFixup McFixup)
         : Sym(McSym), Fixup(McFixup), DF(F) {}
   };
   SmallVector<PendingMCFixup, 2> PendingFixups;
@@ -95,7 +95,7 @@ class MCObjectStreamer : public MCStreamer {
   /// fragment is not a data fragment.
   /// Optionally a \p STI can be passed in so that a new fragment is created
   /// if the Subtarget differs from the current fragment.
-  MCDataFragment *getOrCreateDataFragment(const MCSubtargetInfo* STI = nullptr);
+  MCFragment *getOrCreateDataFragment(const MCSubtargetInfo *STI = nullptr);
 
 protected:
   bool changeSectionImpl(MCSection *Section, uint32_t Subsection);
diff --git a/llvm/include/llvm/MC/MCSection.h b/llvm/include/llvm/MC/MCSection.h
index 16d57a7772a40..e018e98d66151 100644
--- a/llvm/include/llvm/MC/MCSection.h
+++ b/llvm/include/llvm/MC/MCSection.h
@@ -46,8 +46,6 @@ class LLVM_ABI MCSection {
   friend MCAssembler;
   friend MCObjectStreamer;
   friend class MCFragment;
-  friend class MCEncodedFragment;
-  friend class MCRelaxableFragment;
   static constexpr unsigned NonUniqueID = ~0U;
 
   enum SectionVariant {
@@ -265,55 +263,55 @@ class MCFragment {
   /// MCRelaxableFragment: x86-specific
   bool AllowAutoPadding : 1;
 
-  LLVM_ABI MCFragment(FragmentType Kind, bool HasInstructions);
-
-public:
-  MCFragment() = delete;
-  MCFragment(const MCFragment &) = delete;
-  MCFragment &operator=(const MCFragment &) = delete;
-
-  MCFragment *getNext() const { return Next; }
-
-  FragmentType getKind() const { return Kind; }
-
-  MCSection *getParent() const { return Parent; }
-  void setParent(MCSection *Value) { Parent = Value; }
-
-  LLVM_ABI const MCSymbol *getAtom() const;
-
-  unsigned getLayoutOrder() const { return LayoutOrder; }
-  void setLayoutOrder(unsigned Value) { LayoutOrder = Value; }
-
-  /// Does this fragment have instructions emitted into it? By default
-  /// this is false, but specific fragment types may set it to true.
-  bool hasInstructions() const { return HasInstructions; }
-
-  bool isLinkerRelaxable() const { return LinkerRelaxable; }
-  void setLinkerRelaxable() { LinkerRelaxable = true; }
-
-  LLVM_ABI void dump() const;
-};
-
-/// Interface implemented by fragments that contain encoded instructions and/or
-/// data.
-class MCEncodedFragment : public MCFragment {
   uint8_t BundlePadding = 0;
+
   uint32_t ContentStart = 0;
   uint32_t ContentEnd = 0;
   uint32_t FixupStart = 0;
   uint32_t FixupEnd = 0;
 
-protected:
-  MCEncodedFragment(MCFragment::FragmentType FType, bool HasInstructions)
-      : MCFragment(FType, HasInstructions) {}
+  uint32_t VarContentStart = 0;
+  uint32_t VarContentEnd = 0;
+  uint32_t VarFixupStart = 0;
+  uint32_t VarFixupEnd = 0;
 
-  /// The MCSubtargetInfo in effect when the instruction was encoded.
-  /// It must be non-null for instructions.
   const MCSubtargetInfo *STI = nullptr;
 
+  // Optional variable-size tail used by various fragment types.
+  union Tail {
+    struct {
+      uint32_t Opcode;
+      uint32_t Flags;
+      uint32_t OperandStart;
+      uint32_t OperandSize;
+    } relax;
+    struct {
+      // True if this is a sleb128, false if uleb128.
+      bool IsSigned;
+      // The value this fragment should contain.
+      const MCExpr *Value;
+    } leb;
+    struct {
+      const MCExpr *AddrDelta;
+    } dwarf_frame;
+    struct {
+      // The expression for the difference of the two symbols that make up the
+      // address delta between two .loc dwarf directives.
+      const MCExpr *AddrDelta;
+      // The value of the difference between the two line numbers between two
+      // .loc dwarf directives.
+      int64_t LineDelta;
+    } dwarf;
+  } u{};
+
 public:
-  static bool classof(const MCFragment *F) {
-    MCFragment::FragmentType Kind = F->getKind();
+  LLVM_ABI MCFragment(FragmentType Kind = MCFragment::FT_Data,
+                      bool HasInstructions = false);
+  MCFragment(const MCFragment &) = delete;
+  MCFragment &operator=(const MCFragment &) = delete;
+
+  bool isEncoded() const {
+    MCFragment::FragmentType Kind = getKind();
     switch (Kind) {
     default:
       return false;
@@ -329,20 +327,23 @@ class MCEncodedFragment : public MCFragment {
     }
   }
 
-  /// Should this fragment be placed at the end of an aligned bundle?
-  bool alignToBundleEnd() const { return AlignToBundleEnd; }
-  void setAlignToBundleEnd(bool V) { AlignToBundleEnd = V; }
+  MCFragment *getNext() const { return Next; }
 
-  /// Get the padding size that must be inserted before this fragment.
-  /// Used for bundling. By default, no padding is inserted.
-  /// Note that padding size is restricted to 8 bits. This is an optimization
-  /// to reduce the amount of space used for each fragment. In practice, larger
-  /// padding should never be required.
-  uint8_t getBundlePadding() const { return BundlePadding; }
+  FragmentType getKind() const { return Kind; }
 
-  /// Set the padding size for this fragment. By default it's a no-op,
-  /// and only some fragments have a meaningful implementation.
-  void setBundlePadding(uint8_t N) { BundlePadding = N; }
+  MCSection *getParent() const { return Parent; }
+  void setParent(MCSection *Value) { Parent = Value; }
+
+  LLVM_ABI const MCSymbol *getAtom() const;
+
+  unsigned getLayoutOrder() const { return LayoutOrder; }
+  void setLayoutOrder(unsigned Value) { LayoutOrder = Value; }
+
+  /// Does this fragment have instructions emitted into it? By default
+  /// this is false, but specific fragment types may set it to true.
+  bool hasInstructions() const { return HasInstructions; }
+
+  LLVM_ABI void dump() const;
 
   /// Retrieve the MCSubTargetInfo in effect when the instruction was encoded.
   /// Guaranteed to be non-null if hasInstructions() == true
@@ -355,9 +356,27 @@ class MCEncodedFragment : public MCFragment {
     this->STI = &STI;
   }
 
+  bool isLinkerRelaxable() const { return LinkerRelaxable; }
+  void setLinkerRelaxable() { LinkerRelaxable = true; }
+
   bool getAllowAutoPadding() const { return AllowAutoPadding; }
   void setAllowAutoPadding(bool V) { AllowAutoPadding = V; }
 
+  /// Should this fragment be placed at the end of an aligned bundle?
+  bool alignToBundleEnd() const { return AlignToBundleEnd; }
+  void setAlignToBundleEnd(bool V) { AlignToBundleEnd = V; }
+
+  /// Get the padding size that must be inserted before this fragment.
+  /// Used for bundling. By default, no padding is inserted.
+  /// Note that padding size is restricted to 8 bits. This is an optimization
+  /// to reduce the amount of space used for each fragment. In practice, larger
+  /// padding should never be required.
+  uint8_t getBundlePadding() const { return BundlePadding; }
+
+  /// Set the padding size for this fragment. By default it's a no-op,
+  /// and only some fragments have a meaningful implementation.
+  void setBundlePadding(uint8_t N) { BundlePadding = N; }
+
   // Content-related functions manage parent's storage using ContentStart and
   // ContentSize.
   void clearContents() { ContentEnd = ContentStart; }
@@ -394,7 +413,24 @@ class MCEncodedFragment : public MCFragment {
         .slice(ContentStart, ContentEnd - ContentStart);
   }
 
-  // Fixup-related functions manage parent's storage using FixupStart and
+  void setVarContents(ArrayRef<char> Contents);
+  void clearVarContents() { setVarContents({}); }
+  MutableArrayRef<char> getVarContents() {
+    return MutableArrayRef(getParent()->ContentStorage)
+        .slice(VarContentStart, VarContentEnd - VarContentStart);
+  }
+  ArrayRef<char> getVarContents() const {
+    return ArrayRef(getParent()->ContentStorage)
+        .slice(VarContentStart, VarContentEnd - VarContentStart);
+  }
+
+  size_t getFixedSize() const { return ContentEnd - ContentStart; }
+  size_t getVarSize() const { return VarContentEnd - VarContentStart; }
+  size_t getSize() const {
+    return ContentEnd - ContentStart + (VarContentEnd - VarContentStart);
+  }
+
+  //== Fixup-related functions manage parent's storage using FixupStart and
   // FixupSize.
   void clearFixups() { FixupEnd = FixupStart; }
   LLVM_ABI void addFixup(MCFixup Fixup);
@@ -409,65 +445,91 @@ class MCEncodedFragment : public MCFragment {
         .slice(FixupStart, FixupEnd - FixupStart);
   }
 
-  size_t getSize() const { return ContentEnd - ContentStart; }
-};
-
-/// Fragment for data and encoded instructions.
-///
-class MCDataFragment : public MCEncodedFragment {
-public:
-  MCDataFragment() : MCEncodedFragment(FT_Data, false) {}
-
-  static bool classof(const MCFragment *F) {
-    return F->getKind() == MCFragment::FT_Data;
+  // Source fixup offsets are relative to the variable part's start.
+  // Stored fixup offsets are relative to the fixed part's start.
+  void setVarFixups(ArrayRef<MCFixup> Fixups);
+  void clearVarFixups() { setVarFixups({}); }
+  MutableArrayRef<MCFixup> getVarFixups() {
+    return MutableArrayRef(getParent()->FixupStorage)
+        .slice(VarFixupStart, VarFixupEnd - VarFixupStart);
   }
-};
-
-/// A relaxable fragment holds on to its MCInst, since it may need to be
-/// relaxed during the assembler layout and relaxation stage.
-///
-class MCRelaxableFragment : public MCEncodedFragment {
-  uint32_t Opcode = 0;
-  uint32_t Flags = 0;
-  uint32_t OperandStart = 0;
-  uint32_t OperandSize = 0;
-
-public:
-  MCRelaxableFragment(const MCSubtargetInfo &STI)
-      : MCEncodedFragment(FT_Relaxable, true) {
-    this->STI = &STI;
+  ArrayRef<MCFixup> getVarFixups() const {
+    return ArrayRef(getParent()->FixupStorage)
+        .slice(VarFixupStart, VarFixupEnd - VarFixupStart);
   }
 
-  unsigned getOpcode() const { return Opcode; }
+  //== FT_Relaxable functions
+  unsigned getOpcode() const {
+    assert(Kind == FT_Relaxable);
+    return u.relax.Opcode;
+  }
   ArrayRef<MCOperand> getOperands() const {
+    assert(Kind == FT_Relaxable);
     return MutableArrayRef(getParent()->MCOperandStorage)
-        .slice(OperandStart, OperandSize);
+        .slice(u.relax.OperandStart, u.relax.OperandSize);
   }
   MCInst getInst() const {
+    assert(Kind == FT_Relaxable);
     MCInst Inst;
-    Inst.setOpcode(Opcode);
-    Inst.setFlags(Flags);
+    Inst.setOpcode(u.relax.Opcode);
+    Inst.setFlags(u.relax.Flags);
     Inst.setOperands(ArrayRef(getParent()->MCOperandStorage)
-                         .slice(OperandStart, OperandSize));
+                         .slice(u.relax.OperandStart, u.relax.OperandSize));
     return Inst;
   }
   void setInst(const MCInst &Inst) {
-    Opcode = Inst.getOpcode();
-    Flags = Inst.getFlags();
+    assert(Kind == FT_Relaxable);
+    u.relax.Opcode = Inst.getOpcode();
+    u.relax.Flags = Inst.getFlags();
     auto &S = getParent()->MCOperandStorage;
-    if (Inst.getNumOperands() > OperandSize) {
-      OperandStart = S.size();
+    if (Inst.getNumOperands() > u.relax.OperandSize) {
+      u.relax.OperandStart = S.size();
       S.resize_for_overwrite(S.size() + Inst.getNumOperands());
     }
-    OperandSize = Inst.getNumOperands();
-    llvm::copy(Inst, S.begin() + OperandStart);
+    u.relax.OperandSize = Inst.getNumOperands();
+    llvm::copy(Inst, S.begin() + u.relax.OperandStart);
   }
 
-  static bool classof(const MCFragment *F) {
-    return F->getKind() == MCFragment::FT_Relaxable;
+  //== FT_LEB functions
+  const MCExpr &getLEBValue() const {
+    assert(Kind == FT_LEB);
+    return *u.leb.Value;
   }
+  void setLEBValue(const MCExpr *Expr) { u.leb.Value = Expr; }
+  bool isLEBSigned() const { return u.leb.IsSigned; }
+  void setLEBSigned(bool S) { u.leb.IsSigned = S; }
+
+  //== FT_DwarfFrame functions
+  const MCExpr &getAddrDelta() const {
+    assert(Kind == FT_Dwarf || Kind == FT_DwarfFrame);
+    return *u.dwarf.AddrDelta;
+  }
+  void setAddrDelta(const MCExpr *E) {
+    assert(Kind == FT_Dwarf || Kind == FT_DwarfFrame);
+    u.dwarf.AddrDelta = E;
+  }
+  int64_t getLineDelta() const {
+    assert(Kind == FT_Dwarf);
+    return u.dwarf.LineDelta;
+  }
+  void setLineDelta(int64_t LineDelta) {
+    assert(Kind == FT_Dwarf);
+    u.dwarf.LineDelta = LineDelta;
+  }
+};
+
+/// Interface implemented by fragments that contain encoded instructions and/or
+/// data.
+class MCEncodedFragment : public MCFragment {
+protected:
+  MCEncodedFragment(MCFragment::FragmentType FType, bool HasInstructions)
+      : MCFragment(FType, HasInstructions) {}
 };
 
+// TODO Delete
+using MCDataFragment = MCFragment;
+using MCRelaxableFragment = MCFragment;
+
 class MCAlignFragment : public MCFragment {
   /// The alignment to ensure, in bytes.
   Align Alignment;
@@ -602,67 +664,6 @@ class MCOrgFragment : public MCFragment {
   }
 };
 
-class MCLEBFragment final : public MCEncodedFragment {
-  /// True if this is a sleb128, false if uleb128.
-  bool IsSigned;
-
-  /// The value this fragment should contain.
-  const MCExpr *Value;
-
-public:
-  MCLEBFragment(const MCExpr &Value, bool IsSigned)
-      : MCEncodedFragment(FT_LEB, false), IsSigned(IsSigned), Value(&Value) {}
-
-  const MCExpr &getValue() const { return *Value; }
-  void setValue(const MCExpr *Expr) { Value = Expr; }
-
-  bool isSigned() const { return IsSigned; }
-
-  static bool classof(const MCFragment *F) {
-    ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 13, 2025

@llvm/pr-subscribers-backend-webassembly

Author: Fangrui Song (MaskRay)

Changes

Refactor the fragment representation of push rax; jmp foo; nop; jmp foo,
previously encoded as
MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo),

to

MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)

Changes:

  • Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
  • The new MCFragment contains a fixed-size content (similar to previous
    MCDataFragment) and an optional variable-size tail.
  • The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
    FT_DwarfFrame, with plans to extend to other fragment types.
    dyn_cast/isa should be avoided for the converted fragment subclasses.
  • In setVarFixups, source fixup offsets are relative to the variable part's start.
    Stored fixup (in FixupStorage) offsets are relative to the fixed part's start.
    A lot of code does getFragmentOffset(Frag) + Fixup.getOffset(),
    expecting the fixup offset to be relative to the fixed part's start.
  • HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
    associated instruction for a fixup. We have to add a const MCFragment &amp; parameter.
  • In MCObjectStreamer, extend absoluteSymbolDiff to apply to
    FT_Relaxable as otherwise there would be many more FT_DwarfFrame
    fragments in -g compilations.

https://llvm-compile-time-tracker.com/compare.php?from=267b136359d8448c73432b4f3ceeefbf4c35e00b&amp;to=ef026be147babfd2b8f4ed4d875deadcbbbb2bbb&amp;stat=instructions:u

stage2-O0-g instructins:u geomeon (-0.12%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &amp;| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i&lt;=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &amp;| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i&lt;=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396

Patch is 79.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148544.diff

28 Files Affected:

  • (modified) llvm/include/llvm/MC/MCAsmBackend.h (+6-10)
  • (modified) llvm/include/llvm/MC/MCAssembler.h (+7-11)
  • (modified) llvm/include/llvm/MC/MCCodeView.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCContext.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCELFStreamer.h (+1-2)
  • (modified) llvm/include/llvm/MC/MCObjectStreamer.h (+3-3)
  • (modified) llvm/include/llvm/MC/MCSection.h (+155-154)
  • (modified) llvm/lib/MC/MCAsmBackend.cpp (+4-5)
  • (modified) llvm/lib/MC/MCAssembler.cpp (+67-62)
  • (modified) llvm/lib/MC/MCELFStreamer.cpp (+2-2)
  • (modified) llvm/lib/MC/MCExpr.cpp (+4-4)
  • (modified) llvm/lib/MC/MCFragment.cpp (+53-40)
  • (modified) llvm/lib/MC/MCObjectStreamer.cpp (+40-29)
  • (modified) llvm/lib/MC/MCSection.cpp (+30-4)
  • (modified) llvm/lib/MC/WasmObjectWriter.cpp (+2-4)
  • (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp (+3-5)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+2-1)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.h (+2-1)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp (+5-6)
  • (modified) llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp (+2-1)
  • (modified) llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.h (+2-1)
  • (modified) llvm/lib/Target/Hexagon/MCTargetDesc/HexagonAsmBackend.cpp (+7-8)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.cpp (+16-17)
  • (modified) llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchAsmBackend.h (+3-5)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp (+22-23)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h (+5-6)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+27-31)
  • (modified) llvm/test/MC/ELF/mc-dump.s (+14-8)
diff --git a/llvm/include/llvm/MC/MCAsmBackend.h b/llvm/include/llvm/MC/MCAsmBackend.h
index 6b81bdba25e67..9a4862baaeb3a 100644
--- a/llvm/include/llvm/MC/MCAsmBackend.h
+++ b/llvm/include/llvm/MC/MCAsmBackend.h
@@ -19,11 +19,8 @@
 namespace llvm {
 
 class MCAlignFragment;
-class MCDwarfCallFrameFragment;
-class MCDwarfLineAddrFragment;
 class MCFragment;
 class MCLEBFragment;
-class MCRelaxableFragment;
 class MCSymbol;
 class MCAssembler;
 class MCContext;
@@ -157,8 +154,9 @@ class LLVM_ABI MCAsmBackend {
 
   /// Target specific predicate for whether a given fixup requires the
   /// associated instruction to be relaxed.
-  virtual bool fixupNeedsRelaxationAdvanced(const MCFixup &, const MCValue &,
-                                            uint64_t, bool Resolved) const;
+  virtual bool fixupNeedsRelaxationAdvanced(const MCFragment &, const MCFixup &,
+                                            const MCValue &, uint64_t,
+                                            bool Resolved) const;
 
   /// Simple predicate for targets where !Resolved implies requiring relaxation
   virtual bool fixupNeedsRelaxation(const MCFixup &Fixup,
@@ -179,18 +177,16 @@ class LLVM_ABI MCAsmBackend {
   }
 
   // Defined by linker relaxation targets.
-  virtual bool relaxDwarfLineAddr(MCDwarfLineAddrFragment &DF,
-                                  bool &WasRelaxed) const {
+  virtual bool relaxDwarfLineAddr(MCFragment &, bool &WasRelaxed) const {
     return false;
   }
-  virtual bool relaxDwarfCFA(MCDwarfCallFrameFragment &DF,
-                             bool &WasRelaxed) const {
+  virtual bool relaxDwarfCFA(MCFragment &, bool &WasRelaxed) const {
     return false;
   }
 
   // Defined by linker relaxation targets to possibly emit LEB128 relocations
   // and set Value at the relocated location.
-  virtual std::pair<bool, bool> relaxLEB128(MCLEBFragment &LF,
+  virtual std::pair<bool, bool> relaxLEB128(MCFragment &,
                                             int64_t &Value) const {
     return std::make_pair(false, false);
   }
diff --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index 1015992cedf29..858596ff8757c 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -34,13 +34,10 @@ namespace llvm {
 class MCBoundaryAlignFragment;
 class MCCVDefRangeFragment;
 class MCCVInlineLineTableFragment;
-class MCDwarfCallFrameFragment;
-class MCDwarfLineAddrFragment;
-class MCEncodedFragment;
+class MCFragment;
 class MCFixup;
 class MCLEBFragment;
 class MCPseudoProbeAddrFragment;
-class MCRelaxableFragment;
 class MCSymbolRefExpr;
 class raw_ostream;
 class MCAsmBackend;
@@ -107,7 +104,7 @@ class MCAssembler {
 
   /// Check whether a fixup can be satisfied, or whether it needs to be relaxed
   /// (increased in size, in order to hold its value correctly).
-  bool fixupNeedsRelaxation(const MCRelaxableFragment &, const MCFixup &) const;
+  bool fixupNeedsRelaxation(const MCFragment &, const MCFixup &) const;
 
   void layoutSection(MCSection &Sec);
   /// Perform one layout iteration and return the index of the first stable
@@ -116,11 +113,11 @@ class MCAssembler {
 
   /// Perform relaxation on a single fragment.
   bool relaxFragment(MCFragment &F);
-  bool relaxInstruction(MCRelaxableFragment &IF);
-  bool relaxLEB(MCLEBFragment &IF);
+  bool relaxInstruction(MCFragment &F);
+  bool relaxLEB(MCFragment &F);
   bool relaxBoundaryAlign(MCBoundaryAlignFragment &BF);
-  bool relaxDwarfLineAddr(MCDwarfLineAddrFragment &DF);
-  bool relaxDwarfCallFrameFragment(MCDwarfCallFrameFragment &DF);
+  bool relaxDwarfLineAddr(MCFragment &F);
+  bool relaxDwarfCallFrameFragment(MCFragment &F);
   bool relaxCVInlineLineTable(MCCVInlineLineTableFragment &DF);
   bool relaxCVDefRange(MCCVDefRangeFragment &DF);
   bool relaxFill(MCFillFragment &F);
@@ -228,8 +225,7 @@ class MCAssembler {
 
   /// Write the necessary bundle padding to \p OS.
   /// Expects a fragment \p F containing instructions and its size \p FSize.
-  LLVM_ABI void writeFragmentPadding(raw_ostream &OS,
-                                     const MCEncodedFragment &F,
+  LLVM_ABI void writeFragmentPadding(raw_ostream &OS, const MCFragment &F,
                                      uint64_t FSize) const;
 
   LLVM_ABI void reportError(SMLoc L, const Twine &Msg) const;
diff --git a/llvm/include/llvm/MC/MCCodeView.h b/llvm/include/llvm/MC/MCCodeView.h
index 88f84a2462841..9cde44c71baff 100644
--- a/llvm/include/llvm/MC/MCCodeView.h
+++ b/llvm/include/llvm/MC/MCCodeView.h
@@ -26,7 +26,6 @@ namespace llvm {
 class MCAssembler;
 class MCCVDefRangeFragment;
 class MCCVInlineLineTableFragment;
-class MCDataFragment;
 class MCFragment;
 class MCSection;
 class MCSymbol;
@@ -231,7 +230,7 @@ class CodeViewContext {
   StringMap<unsigned> StringTable;
 
   /// The fragment that ultimately holds our strings.
-  MCDataFragment *StrTabFragment = nullptr;
+  MCFragment *StrTabFragment = nullptr;
   SmallVector<char, 0> StrTab = {'\0'};
 
   /// Get a string table offset.
diff --git a/llvm/include/llvm/MC/MCContext.h b/llvm/include/llvm/MC/MCContext.h
index 5a8ec17dae1cc..c137f6184a9a7 100644
--- a/llvm/include/llvm/MC/MCContext.h
+++ b/llvm/include/llvm/MC/MCContext.h
@@ -47,7 +47,6 @@ namespace llvm {
 
 class CodeViewContext;
 class MCAsmInfo;
-class MCDataFragment;
 class MCInst;
 class MCLabel;
 class MCObjectFileInfo;
@@ -334,7 +333,7 @@ class MCContext {
   void reportCommon(SMLoc Loc,
                     std::function<void(SMDiagnostic &, const SourceMgr *)>);
 
-  MCDataFragment *allocInitialFragment(MCSection &Sec);
+  MCFragment *allocInitialFragment(MCSection &Sec);
 
   MCSymbolTableEntry &getSymbolTableEntry(StringRef Name);
 
diff --git a/llvm/include/llvm/MC/MCELFStreamer.h b/llvm/include/llvm/MC/MCELFStreamer.h
index aba87cb19b21c..5affe206a3535 100644
--- a/llvm/include/llvm/MC/MCELFStreamer.h
+++ b/llvm/include/llvm/MC/MCELFStreamer.h
@@ -17,7 +17,6 @@ namespace llvm {
 
 class ELFObjectWriter;
 class MCContext;
-class MCDataFragment;
 class MCFragment;
 class MCObjectWriter;
 class MCSection;
@@ -51,7 +50,7 @@ class MCELFStreamer : public MCObjectStreamer {
   void initSections(bool NoExecStack, const MCSubtargetInfo &STI) override;
   void changeSection(MCSection *Section, uint32_t Subsection = 0) override;
   void emitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;
-  void emitLabelAtPos(MCSymbol *Symbol, SMLoc Loc, MCDataFragment &F,
+  void emitLabelAtPos(MCSymbol *Symbol, SMLoc Loc, MCFragment &F,
                       uint64_t Offset) override;
   void emitWeakReference(MCSymbol *Alias, const MCSymbol *Target) override;
   bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;
diff --git a/llvm/include/llvm/MC/MCObjectStreamer.h b/llvm/include/llvm/MC/MCObjectStreamer.h
index 58c18af406158..f8635fb989417 100644
--- a/llvm/include/llvm/MC/MCObjectStreamer.h
+++ b/llvm/include/llvm/MC/MCObjectStreamer.h
@@ -43,8 +43,8 @@ class MCObjectStreamer : public MCStreamer {
   struct PendingMCFixup {
     const MCSymbol *Sym;
     MCFixup Fixup;
-    MCDataFragment *DF;
-    PendingMCFixup(const MCSymbol *McSym, MCDataFragment *F, MCFixup McFixup)
+    MCFragment *DF;
+    PendingMCFixup(const MCSymbol *McSym, MCFragment *F, MCFixup McFixup)
         : Sym(McSym), Fixup(McFixup), DF(F) {}
   };
   SmallVector<PendingMCFixup, 2> PendingFixups;
@@ -95,7 +95,7 @@ class MCObjectStreamer : public MCStreamer {
   /// fragment is not a data fragment.
   /// Optionally a \p STI can be passed in so that a new fragment is created
   /// if the Subtarget differs from the current fragment.
-  MCDataFragment *getOrCreateDataFragment(const MCSubtargetInfo* STI = nullptr);
+  MCFragment *getOrCreateDataFragment(const MCSubtargetInfo *STI = nullptr);
 
 protected:
   bool changeSectionImpl(MCSection *Section, uint32_t Subsection);
diff --git a/llvm/include/llvm/MC/MCSection.h b/llvm/include/llvm/MC/MCSection.h
index 16d57a7772a40..e018e98d66151 100644
--- a/llvm/include/llvm/MC/MCSection.h
+++ b/llvm/include/llvm/MC/MCSection.h
@@ -46,8 +46,6 @@ class LLVM_ABI MCSection {
   friend MCAssembler;
   friend MCObjectStreamer;
   friend class MCFragment;
-  friend class MCEncodedFragment;
-  friend class MCRelaxableFragment;
   static constexpr unsigned NonUniqueID = ~0U;
 
   enum SectionVariant {
@@ -265,55 +263,55 @@ class MCFragment {
   /// MCRelaxableFragment: x86-specific
   bool AllowAutoPadding : 1;
 
-  LLVM_ABI MCFragment(FragmentType Kind, bool HasInstructions);
-
-public:
-  MCFragment() = delete;
-  MCFragment(const MCFragment &) = delete;
-  MCFragment &operator=(const MCFragment &) = delete;
-
-  MCFragment *getNext() const { return Next; }
-
-  FragmentType getKind() const { return Kind; }
-
-  MCSection *getParent() const { return Parent; }
-  void setParent(MCSection *Value) { Parent = Value; }
-
-  LLVM_ABI const MCSymbol *getAtom() const;
-
-  unsigned getLayoutOrder() const { return LayoutOrder; }
-  void setLayoutOrder(unsigned Value) { LayoutOrder = Value; }
-
-  /// Does this fragment have instructions emitted into it? By default
-  /// this is false, but specific fragment types may set it to true.
-  bool hasInstructions() const { return HasInstructions; }
-
-  bool isLinkerRelaxable() const { return LinkerRelaxable; }
-  void setLinkerRelaxable() { LinkerRelaxable = true; }
-
-  LLVM_ABI void dump() const;
-};
-
-/// Interface implemented by fragments that contain encoded instructions and/or
-/// data.
-class MCEncodedFragment : public MCFragment {
   uint8_t BundlePadding = 0;
+
   uint32_t ContentStart = 0;
   uint32_t ContentEnd = 0;
   uint32_t FixupStart = 0;
   uint32_t FixupEnd = 0;
 
-protected:
-  MCEncodedFragment(MCFragment::FragmentType FType, bool HasInstructions)
-      : MCFragment(FType, HasInstructions) {}
+  uint32_t VarContentStart = 0;
+  uint32_t VarContentEnd = 0;
+  uint32_t VarFixupStart = 0;
+  uint32_t VarFixupEnd = 0;
 
-  /// The MCSubtargetInfo in effect when the instruction was encoded.
-  /// It must be non-null for instructions.
   const MCSubtargetInfo *STI = nullptr;
 
+  // Optional variable-size tail used by various fragment types.
+  union Tail {
+    struct {
+      uint32_t Opcode;
+      uint32_t Flags;
+      uint32_t OperandStart;
+      uint32_t OperandSize;
+    } relax;
+    struct {
+      // True if this is a sleb128, false if uleb128.
+      bool IsSigned;
+      // The value this fragment should contain.
+      const MCExpr *Value;
+    } leb;
+    struct {
+      const MCExpr *AddrDelta;
+    } dwarf_frame;
+    struct {
+      // The expression for the difference of the two symbols that make up the
+      // address delta between two .loc dwarf directives.
+      const MCExpr *AddrDelta;
+      // The value of the difference between the two line numbers between two
+      // .loc dwarf directives.
+      int64_t LineDelta;
+    } dwarf;
+  } u{};
+
 public:
-  static bool classof(const MCFragment *F) {
-    MCFragment::FragmentType Kind = F->getKind();
+  LLVM_ABI MCFragment(FragmentType Kind = MCFragment::FT_Data,
+                      bool HasInstructions = false);
+  MCFragment(const MCFragment &) = delete;
+  MCFragment &operator=(const MCFragment &) = delete;
+
+  bool isEncoded() const {
+    MCFragment::FragmentType Kind = getKind();
     switch (Kind) {
     default:
       return false;
@@ -329,20 +327,23 @@ class MCEncodedFragment : public MCFragment {
     }
   }
 
-  /// Should this fragment be placed at the end of an aligned bundle?
-  bool alignToBundleEnd() const { return AlignToBundleEnd; }
-  void setAlignToBundleEnd(bool V) { AlignToBundleEnd = V; }
+  MCFragment *getNext() const { return Next; }
 
-  /// Get the padding size that must be inserted before this fragment.
-  /// Used for bundling. By default, no padding is inserted.
-  /// Note that padding size is restricted to 8 bits. This is an optimization
-  /// to reduce the amount of space used for each fragment. In practice, larger
-  /// padding should never be required.
-  uint8_t getBundlePadding() const { return BundlePadding; }
+  FragmentType getKind() const { return Kind; }
 
-  /// Set the padding size for this fragment. By default it's a no-op,
-  /// and only some fragments have a meaningful implementation.
-  void setBundlePadding(uint8_t N) { BundlePadding = N; }
+  MCSection *getParent() const { return Parent; }
+  void setParent(MCSection *Value) { Parent = Value; }
+
+  LLVM_ABI const MCSymbol *getAtom() const;
+
+  unsigned getLayoutOrder() const { return LayoutOrder; }
+  void setLayoutOrder(unsigned Value) { LayoutOrder = Value; }
+
+  /// Does this fragment have instructions emitted into it? By default
+  /// this is false, but specific fragment types may set it to true.
+  bool hasInstructions() const { return HasInstructions; }
+
+  LLVM_ABI void dump() const;
 
   /// Retrieve the MCSubTargetInfo in effect when the instruction was encoded.
   /// Guaranteed to be non-null if hasInstructions() == true
@@ -355,9 +356,27 @@ class MCEncodedFragment : public MCFragment {
     this->STI = &STI;
   }
 
+  bool isLinkerRelaxable() const { return LinkerRelaxable; }
+  void setLinkerRelaxable() { LinkerRelaxable = true; }
+
   bool getAllowAutoPadding() const { return AllowAutoPadding; }
   void setAllowAutoPadding(bool V) { AllowAutoPadding = V; }
 
+  /// Should this fragment be placed at the end of an aligned bundle?
+  bool alignToBundleEnd() const { return AlignToBundleEnd; }
+  void setAlignToBundleEnd(bool V) { AlignToBundleEnd = V; }
+
+  /// Get the padding size that must be inserted before this fragment.
+  /// Used for bundling. By default, no padding is inserted.
+  /// Note that padding size is restricted to 8 bits. This is an optimization
+  /// to reduce the amount of space used for each fragment. In practice, larger
+  /// padding should never be required.
+  uint8_t getBundlePadding() const { return BundlePadding; }
+
+  /// Set the padding size for this fragment. By default it's a no-op,
+  /// and only some fragments have a meaningful implementation.
+  void setBundlePadding(uint8_t N) { BundlePadding = N; }
+
   // Content-related functions manage parent's storage using ContentStart and
   // ContentSize.
   void clearContents() { ContentEnd = ContentStart; }
@@ -394,7 +413,24 @@ class MCEncodedFragment : public MCFragment {
         .slice(ContentStart, ContentEnd - ContentStart);
   }
 
-  // Fixup-related functions manage parent's storage using FixupStart and
+  void setVarContents(ArrayRef<char> Contents);
+  void clearVarContents() { setVarContents({}); }
+  MutableArrayRef<char> getVarContents() {
+    return MutableArrayRef(getParent()->ContentStorage)
+        .slice(VarContentStart, VarContentEnd - VarContentStart);
+  }
+  ArrayRef<char> getVarContents() const {
+    return ArrayRef(getParent()->ContentStorage)
+        .slice(VarContentStart, VarContentEnd - VarContentStart);
+  }
+
+  size_t getFixedSize() const { return ContentEnd - ContentStart; }
+  size_t getVarSize() const { return VarContentEnd - VarContentStart; }
+  size_t getSize() const {
+    return ContentEnd - ContentStart + (VarContentEnd - VarContentStart);
+  }
+
+  //== Fixup-related functions manage parent's storage using FixupStart and
   // FixupSize.
   void clearFixups() { FixupEnd = FixupStart; }
   LLVM_ABI void addFixup(MCFixup Fixup);
@@ -409,65 +445,91 @@ class MCEncodedFragment : public MCFragment {
         .slice(FixupStart, FixupEnd - FixupStart);
   }
 
-  size_t getSize() const { return ContentEnd - ContentStart; }
-};
-
-/// Fragment for data and encoded instructions.
-///
-class MCDataFragment : public MCEncodedFragment {
-public:
-  MCDataFragment() : MCEncodedFragment(FT_Data, false) {}
-
-  static bool classof(const MCFragment *F) {
-    return F->getKind() == MCFragment::FT_Data;
+  // Source fixup offsets are relative to the variable part's start.
+  // Stored fixup offsets are relative to the fixed part's start.
+  void setVarFixups(ArrayRef<MCFixup> Fixups);
+  void clearVarFixups() { setVarFixups({}); }
+  MutableArrayRef<MCFixup> getVarFixups() {
+    return MutableArrayRef(getParent()->FixupStorage)
+        .slice(VarFixupStart, VarFixupEnd - VarFixupStart);
   }
-};
-
-/// A relaxable fragment holds on to its MCInst, since it may need to be
-/// relaxed during the assembler layout and relaxation stage.
-///
-class MCRelaxableFragment : public MCEncodedFragment {
-  uint32_t Opcode = 0;
-  uint32_t Flags = 0;
-  uint32_t OperandStart = 0;
-  uint32_t OperandSize = 0;
-
-public:
-  MCRelaxableFragment(const MCSubtargetInfo &STI)
-      : MCEncodedFragment(FT_Relaxable, true) {
-    this->STI = &STI;
+  ArrayRef<MCFixup> getVarFixups() const {
+    return ArrayRef(getParent()->FixupStorage)
+        .slice(VarFixupStart, VarFixupEnd - VarFixupStart);
   }
 
-  unsigned getOpcode() const { return Opcode; }
+  //== FT_Relaxable functions
+  unsigned getOpcode() const {
+    assert(Kind == FT_Relaxable);
+    return u.relax.Opcode;
+  }
   ArrayRef<MCOperand> getOperands() const {
+    assert(Kind == FT_Relaxable);
     return MutableArrayRef(getParent()->MCOperandStorage)
-        .slice(OperandStart, OperandSize);
+        .slice(u.relax.OperandStart, u.relax.OperandSize);
   }
   MCInst getInst() const {
+    assert(Kind == FT_Relaxable);
     MCInst Inst;
-    Inst.setOpcode(Opcode);
-    Inst.setFlags(Flags);
+    Inst.setOpcode(u.relax.Opcode);
+    Inst.setFlags(u.relax.Flags);
     Inst.setOperands(ArrayRef(getParent()->MCOperandStorage)
-                         .slice(OperandStart, OperandSize));
+                         .slice(u.relax.OperandStart, u.relax.OperandSize));
     return Inst;
   }
   void setInst(const MCInst &Inst) {
-    Opcode = Inst.getOpcode();
-    Flags = Inst.getFlags();
+    assert(Kind == FT_Relaxable);
+    u.relax.Opcode = Inst.getOpcode();
+    u.relax.Flags = Inst.getFlags();
     auto &S = getParent()->MCOperandStorage;
-    if (Inst.getNumOperands() > OperandSize) {
-      OperandStart = S.size();
+    if (Inst.getNumOperands() > u.relax.OperandSize) {
+      u.relax.OperandStart = S.size();
       S.resize_for_overwrite(S.size() + Inst.getNumOperands());
     }
-    OperandSize = Inst.getNumOperands();
-    llvm::copy(Inst, S.begin() + OperandStart);
+    u.relax.OperandSize = Inst.getNumOperands();
+    llvm::copy(Inst, S.begin() + u.relax.OperandStart);
   }
 
-  static bool classof(const MCFragment *F) {
-    return F->getKind() == MCFragment::FT_Relaxable;
+  //== FT_LEB functions
+  const MCExpr &getLEBValue() const {
+    assert(Kind == FT_LEB);
+    return *u.leb.Value;
   }
+  void setLEBValue(const MCExpr *Expr) { u.leb.Value = Expr; }
+  bool isLEBSigned() const { return u.leb.IsSigned; }
+  void setLEBSigned(bool S) { u.leb.IsSigned = S; }
+
+  //== FT_DwarfFrame functions
+  const MCExpr &getAddrDelta() const {
+    assert(Kind == FT_Dwarf || Kind == FT_DwarfFrame);
+    return *u.dwarf.AddrDelta;
+  }
+  void setAddrDelta(const MCExpr *E) {
+    assert(Kind == FT_Dwarf || Kind == FT_DwarfFrame);
+    u.dwarf.AddrDelta = E;
+  }
+  int64_t getLineDelta() const {
+    assert(Kind == FT_Dwarf);
+    return u.dwarf.LineDelta;
+  }
+  void setLineDelta(int64_t LineDelta) {
+    assert(Kind == FT_Dwarf);
+    u.dwarf.LineDelta = LineDelta;
+  }
+};
+
+/// Interface implemented by fragments that contain encoded instructions and/or
+/// data.
+class MCEncodedFragment : public MCFragment {
+protected:
+  MCEncodedFragment(MCFragment::FragmentType FType, bool HasInstructions)
+      : MCFragment(FType, HasInstructions) {}
 };
 
+// TODO Delete
+using MCDataFragment = MCFragment;
+using MCRelaxableFragment = MCFragment;
+
 class MCAlignFragment : public MCFragment {
   /// The alignment to ensure, in bytes.
   Align Alignment;
@@ -602,67 +664,6 @@ class MCOrgFragment : public MCFragment {
   }
 };
 
-class MCLEBFragment final : public MCEncodedFragment {
-  /// True if this is a sleb128, false if uleb128.
-  bool IsSigned;
-
-  /// The value this fragment should contain.
-  const MCExpr *Value;
-
-public:
-  MCLEBFragment(const MCExpr &Value, bool IsSigned)
-      : MCEncodedFragment(FT_LEB, false), IsSigned(IsSigned), Value(&Value) {}
-
-  const MCExpr &getValue() const { return *Value; }
-  void setValue(const MCExpr *Expr) { Value = Expr; }
-
-  bool isSigned() const { return IsSigned; }
-
-  static bool classof(const MCFragment *F) {
-    ...
[truncated]

Copy link
Contributor

@aengelke aengelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction looks good, will review more closely tomorrow. However, could you elaborate on why the fixed+variable part are not stored together? This seems to complicate things when accessing them. If growth during relaxation is a concern, we could allocate some padding afterwards to reduce the likelihood of moving the entire contents.

uint32_t VarContentStart = 0;
uint32_t VarContentEnd = 0;
uint32_t VarFixupStart = 0;
uint32_t VarFixupEnd = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this design, this seems to complicate things. Why not a single, contiguous region for all fixups and store just VarContentStart and VarFixupStart? This would avoid hacks like in MCAssembler::layout.

return F->getKind() == MCFragment::FT_Data;
// Source fixup offsets are relative to the variable part's start.
// Stored fixup offsets are relative to the fixed part's start.
void setVarFixups(ArrayRef<MCFixup> Fixups);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change in semantics also feels like something someone's going to trip on.. but it's better than duplicating the transform code at call sites, so ok.

MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Jul 13, 2025
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,

to

```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```

Changes:

* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
  MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
  FT_DwarfFrame, with plans to extend to other fragment types.
  dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
  Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
  A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
  expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
  associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
  FT_Relaxable as otherwise there would be many more FT_DwarfFrame
  fragments in -g compilations.

https://llvm-compile-time-tracker.com/compare.php?from=267b136359d8448c73432b4f3ceeefbf4c35e00b&to=ef026be147babfd2b8f4ed4d875deadcbbbb2bbb&stat=instructions:u

```
stage2-O0-g instructins:u geomeon (-0.12%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```

```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```

Pull Request: llvm#148544
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Jul 14, 2025
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,

to

```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```

Changes:

* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
  MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
  FT_DwarfFrame, with plans to extend to other fragment types.
  dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
  Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
  A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
  expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
  associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
  FT_Relaxable as otherwise there would be many more FT_DwarfFrame
  fragments in -g compilations.

https://llvm-compile-time-tracker.com/compare.php?from=267b136359d8448c73432b4f3ceeefbf4c35e00b&to=ef026be147babfd2b8f4ed4d875deadcbbbb2bbb&stat=instructions:u

```
stage2-O0-g instructins:u geomeon (-0.12%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```

```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```

Pull Request: llvm#148544
@MaskRay
Copy link
Member Author

MaskRay commented Jul 14, 2025

Direction looks good, will review more closely tomorrow. However, could you elaborate on why the fixed+variable part are not stored together? This seems to complicate things when accessing them. If growth during relaxation is a concern, we could allocate some padding afterwards to reduce the likelihood of moving the entire contents.

Thanks for taking a look.

Growth of the variable part during relaxation is a valid concern.
To store both the fixed and variable parts together, we need a vector-like data structure (begin, size, capacity) for the variable part, while maintaining the four variables related to Contents:

ContentStart
ContentEnd (= VarStart)
VarEnd
VarEndOfStorage

The current approach also uses four variables.

To prevent issues, such as a 2-byte x86 instruction relaxing to 5 bytes requires copying 100-byte fixed part, we must track the maximum variable size for each fragment type and target-specific span-dependent instruction.

The padding can be 10 bytes, or perhaps longer.
x86 is the only user of MCInst::Flags span-dependent instruction ({evex} as we have discovered in #147229).
(While I am not familiar with x86 APX, I suspect that we should encode the prefix into the fixed part (MCDataFragment), leaving the opcode/operands to MCRelaxableFragment.)

We need the max variable size information for each fragment type and each target-specific FT_Relaxable, otherwise there is a risk that relaxing a 2-byte x86 instruction to 5-byte might copy 100-byte fixed part.


Although the current storage scheme could likely be optimized, it offers sufficient flexibility.
MCFragment users rely on the {set,get}Var{Contents,Fixups} APIs, allowing the internal implementation to be swapped out for a more efficient scheme if one is identified later.

Created using spr 1.3.5-bogner
@dschuff
Copy link
Member

dschuff commented Jul 14, 2025

I haven't actually reviewed this PR, but since it popped up in my notifications and it might be relevant to MCFragment refactoring, I wanted to let you know that we've finished the Native Client deprecation in Chromium, and removing NaCl support from LLVM is on my TODO list. So this might mean that we can also remove bundle alignment support from the assembler (assuming there are no users other than NaCl?) and some things could be simplified.

@MaskRay
Copy link
Member Author

MaskRay commented Jul 15, 2025

I haven't actually reviewed this PR, but since it popped up in my notifications and it might be relevant to MCFragment refactoring, I wanted to let you know that we've finished the Native Client deprecation in Chromium, and removing NaCl support from LLVM is on my TODO list. So this might mean that we can also remove bundle alignment support from the assembler (assuming there are no users other than NaCl?) and some things could be simplified.

Thanks! NaCL is the only user of bundle alignment. It should significantly simplify MCFragment. Thank you for putting the NaCL removal on your TODO list!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants