GeoFLARE: added GALE_FA, an alternate attention to GALE, for GeoTransolver#1405
dakhare-creator wants to merge 20 commits into NVIDIA:main from
Conversation
Greptile Overview
Greptile Summary
This PR adds GAFLARE (Geometry-Aware FLARE), an alternative attention mechanism to GALE for the GeoTransolver model. Key Changes:
Critical Issues Found:
Minor Issues:
Important Files Changed
self.use_te = False  # te will disable FlashAttention for different size of q and k
self.scale = 1.  # dim_head**-0.5
use_te parameter ignored - hardcoded to False on line 121, making the use_te constructor parameter ineffective.
Suggested change:
-    self.use_te = False  # te will disable FlashAttention for different size of q and k
-    self.scale = 1.  # dim_head**-0.5
     super().__init__()
+    self.use_te = use_te and TE_AVAILABLE
raise ValueError() added.
    **kwargs,
):
    self.use_te = False  # te will disable FlashAttention for different size of q and k
    self.scale = 1.  # dim_head**-0.5
self.scale is set to 1.0, but the commented-out code suggests it should be dim_head**-0.5 for proper attention scaling. The current implementation may affect attention quality.
Suggested change:
-    self.scale = 1.  # dim_head**-0.5
+    self.scale = dim_head**-0.5
Updated to:
self.scale = 1.  # FLARE scale is 1.0
The FLARE reference code uses scale 1.0.
context = torch.randn(2, 8, 64, 32).to(device)
gaflare = GAFLARE(dim=256, heads=8, dim_head=32, context_dim=32).to(device)
outputs = gaflare(x, context)
print(outputs[0].shape)  # (no newline at end of file)
Missing newline at end of file.
Suggested change: add a trailing newline after the final print(outputs[0].shape) line.
:class:`GALE` : Origional GeoTransolver GALE attention class.
:class:`GALE_block` : Transformer block using GAFLARE attention.
Typo: "Origional" should be "Original"
Suggested change:
-    :class:`GALE` : Origional GeoTransolver GALE attention class.
-    :class:`GALE_block` : Transformer block using GAFLARE attention.
+    :class:`GALE` : Original GeoTransolver GALE attention class.
+    :class:`GAFLARE_block` : Transformer block using GAFLARE attention.
Origional -> Original
Additional Comments (1)
ATTENTION_CLASSES = {
"GALE": GALE,
"GAFLARE": GAFLARE,
}
if attention_type not in ATTENTION_CLASSES:
raise ValueError(f"Unknown attention_type: {attention_type}")
self.Attn = ATTENTION_CLASSES[attention_type](
hidden_dim,
heads=num_heads,
# ... rest of arguments
)
coreyjadams left a comment:
Hi @dakhare-creator - thanks for opening this. Overall it seems reasonable. Good to put it in experimental. I made a couple comments in the code, but also:
- it'd be unusual for us to publish a "GeoFlare" model without also making "Flare" available. Can you add standard FLARE attention to the experimental/nn folder so we can do that too?
- We can also probably easily create a flare.py itself for the model.
- How much overlap in the math and attention blocks is there between PhysicsAttention and FLARE Attention? PhysicsAttention can be used on 2d and 3d data, could this? I strongly suspect yes. Can we make this modular so we can reuse as much as possible between attention layers?
Once we're ready, let's also invite the FLARE authors to take a look.
    # GALE attention layer
-   self.Attn = GALE(
+   if attention_type in globals():
+       self.Attn = globals()[attention_type](
Let's avoid using globals() like this; I think it's not needed. You can just use a match to get the attention type.

Updated with match.
if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = (torch.randn(2, 100, 256).to(device),)
    context = torch.randn(2, 8, 64, 32).to(device)
    gaflare = GAFLARE(dim=256, heads=8, dim_head=32, context_dim=32).to(device)
    outputs = gaflare(x, context)
    print(outputs[0].shape)
Let's remove this from the model code.
TE_AVAILABLE = check_version_spec("transformer_engine", "0.1.0", hard_fail=False)
if TE_AVAILABLE:
    import transformer_engine.pytorch as te
There is a newer, simpler syntax we can use here with OptionalImport.
Added the following import to gaflare.py and gale.py:
from physicsnemo.core.version_check import OptionalImport
te = OptionalImport("transformer_engine.pytorch", "0.1.0")
import transformer_engine.pytorch as te

class GAFLARE(nn.Module):
Can we revisit this name?

Yes, for sure! Please let me know if you have any suggestions.
update main
update geoflare
See Also
--------
:class:`GALE` : Original GeoTransolver GALE attention class.
:class:`GALE_block` : Transformer block using GAFLARE attention.
Incorrect cross-reference

GALE_block calls GAFLARE, and GALE is the alternative attention mechanism.
Should we change it to:
See Also
--------
:class:`GALE` : Original GeoTransolver GALE attention class.
:class:`GALE_block` : Transformer block **that calls GALE or GAFLARE** attention.
# Linear projections for cross-attention
self.cross_q = linear_layer(dim_head, dim_head)
self.cross_k = linear_layer(context_dim, dim_head)
self.cross_v = linear_layer(context_dim, dim_head)
If context_dim=0, these layers are silently created and will fail later (or produce garbage) if a non-empty context tensor is passed. A better design would skip creating the cross-attention layers entirely when context_dim=0. Or you can explicitly raise an error:
if context is not None and self.context_dim == 0:
raise ValueError(...)
Added if context_dim > 0: to skip creating the cross-attention layers.
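A plain-Python sketch of that guard, combining both suggestions (the class, attribute names, and the tuple placeholders standing in for nn.Linear are illustrative, not the real implementation):

```python
class CrossAttnConfig:
    """Sketch: only create cross-attention projections when context_dim > 0,
    and fail loudly if a context is passed without them."""

    def __init__(self, dim_head, context_dim=0):
        self.context_dim = context_dim
        if context_dim > 0:
            # placeholders standing in for nn.Linear(in_features, out_features)
            self.cross_q = ("Linear", dim_head, dim_head)
            self.cross_k = ("Linear", context_dim, dim_head)
            self.cross_v = ("Linear", context_dim, dim_head)

    def forward(self, x, context=None):
        if context is not None and self.context_dim == 0:
            raise ValueError(
                "context passed but context_dim=0; construct with context_dim > 0"
            )
        return x  # attention math elided in this sketch

cfg = CrossAttnConfig(dim_head=32, context_dim=0)
print(hasattr(cfg, "cross_k"))  # False
```

With context_dim=0 the cross-attention projections never exist, so a stray context tensor raises immediately instead of producing garbage downstream.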
use_te: bool = True,
context_dim: int = 0,
):
self.use_te = False  # te will disable FlashAttention for different size of q and k
I suggest we explicitly raise an error when use_te=True, instead of silently setting it to False.
Added:
if use_te:
    raise ValueError(
        "GAFLARE does not support Transformer Engine backend. "
        "Use use_te=False; TE disables FlashAttention for differing q/k sizes in FLARE attention."
    )
vpuri3 left a comment:
Hi @dakhare-creator, I've left several comments in the code. Implementing FLARE directly on a geometry problem can be error-prone. I'd recommend first validating FLARE on one of the paper's benchmark problems:
https://github.com/vpuri3/FLARE.py/blob/master/pdebench/models/flare.py
and then exposing the relevant layers to the geometry solvers interface.
        "Use use_te=False; TE disables FlashAttention for differing q/k sizes in FLARE attention."
    )
self.use_te = use_te
self.scale = 1.  # FLARE scale is 1.0
While 1 is the correct attention scale for dim_head = 8, it may not scale well to larger head dims. I'd recommend either exposing this setting to the user or setting it automatically, e.g.:
self.scale = 1 if self.dim_head <= 8 else (self.dim_head ** -0.5)
Thanks for the recommendation. Added!
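The suggested rule can be written as a small helper (a sketch; the function name is illustrative and the dim_head <= 8 threshold follows the reviewer's comment above):

```python
def flare_scale(dim_head: int) -> float:
    # Scale 1.0 matches the FLARE reference code for small head dims;
    # fall back to the standard 1/sqrt(d) attention scaling for larger heads.
    return 1.0 if dim_head <= 8 else dim_head ** -0.5

print(flare_scale(8), flare_scale(64))  # 1.0 0.125
```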
dim : int
    Input dimension of the features.
heads : int, optional
    Number of attention heads. Default is 8.
dim_head : int, optional
    Dimension of each attention head. Default is 64.
Is it the case that dim = heads * dim_head? If so, maybe an assert or warning would be good in case the user passes in an inconsistent setting.
In this implementation, dim = heads * dim_head is not the case.
We explicitly define inner_dim = heads * dim_head and the dataflow looks like
dim -> inner_dim -> [heads, dim_head] -> inner_dim -> dim
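A quick sketch of that dataflow in terms of feature dimensions (pure Python, no torch; the stage names and the dim/heads/dim_head values are illustrative):

```python
# Trace the feature-dimension dataflow described above:
# dim -> inner_dim -> [heads, dim_head] -> inner_dim -> dim
dim, heads, dim_head = 256, 8, 64
inner_dim = heads * dim_head  # 512 here: inner_dim need not equal dim

stages = [
    ("input features", dim),
    ("qkv projection", inner_dim),
    ("split into heads", (heads, dim_head)),
    ("merge heads", inner_dim),
    ("output projection", dim),
]
for name, width in stages:
    print(name, width)
```

Since the output projection maps inner_dim back to dim, the layer composes cleanly even when heads * dim_head differs from the model width.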
# Use PyTorch's scaled dot-product attention
z = [F.scaled_dot_product_attention(_G, _k, _v, scale=1.0) for _G, _k, _v in zip(G, k, v)]
self_attention = [F.scaled_dot_product_attention(_k, _G, _z, scale=1.0) for _k, _G, _z in zip(k, G, z)]
self.self_k = linear_layer(dim_head, dim_head)
self.self_v = linear_layer(dim_head, dim_head)
I've found that using a more expressive projection here really helps performance on PDE problems.
The tradeoff here is described in Appendix F under heading "Tradeoff between query dynamics and key/value expressivity" in the paper: https://arxiv.org/pdf/2508.12594.
For PDE problems, I've found that replacing FFN type layers (C -> 4C -> GeLU -> C) with deeper but narrower MLPs can help because the mapping is often smoother / more “function-approximation-like,” and gains come from expressive feature transforms more than from content-addressable routing/memorization.
Here's the full model definition I used in the experiments in the paper:
https://github.com/vpuri3/FLARE.py/blob/master/pdebench/models/flare.py
I understand that deep KV projections would increase parameter counts. To compensate for that, we have validated that FLARE performs at par with other models at smaller hidden sizes (C=64 for FLARE outperforms C=128 for transolver).
Thank you for the suggestion. I’ll test this and include it in the next PR. I’ve also opened an issue to track the improvement: #1440
    ) for _cross_attention in cross_attention]
else:
    cross_attention = [F.scaled_dot_product_attention(_q, k, v, scale=1.0) for _q in q]
Cross attention with FLARE has not been fully fleshed out. My plan for cross attention is outlined on page 71 (5.1.2 Aim 1(b): conditioning mechanism for dynamic PDE surrogates) of this document:
https://drive.google.com/file/d/1SNDjQ0gMSZmv0jg49S-risEoDiwE63aY/view?usp=sharing
Thank you for the suggestion. I’ll test this and include it in the next PR. I’ve also opened an issue to track the improvement: #1440
/blossom-ci
Hi @dakhare-creator - with the geometry encoded FLARE, does this also enable standard FLARE attention and model?
Hi @coreyjadams, yes - if the context is None, the attention mechanism will be standard FLARE.
class FLARE(nn.Module):
    r"""FLARE: Fast Low-rank Attention Routing Engine attention layer.

    Adopted:
    - FLARE attention: Fast Low-rank Attention Routing Engine
      paper: https://arxiv.org/abs/2508.12594
This should probably go into experimental/nn/flare_attention.py?
self.Attn = FLARE(
    hidden_dim,
    spatial_shape=spatial_shape,
    heads=num_heads,
    dim_head=hidden_dim // num_heads,
    dropout=dropout,
    slice_num=slice_num,
    n_global_queries=slice_num,
    use_te=use_te,
    plus=plus,
Does FLARE support multiple dimensions? How does it handle irregular vs. regular data?
No, the FLARE implemented here is designed only for point clouds and does not work for regular data. Maybe we should consider raising an error for structured data input.
Correct - FLARE cannot make use of any connectivity information. If you feed a graph/mesh to FLARE, you have to feed only the unordered point cloud.
from physicsnemo.core.meta import ModelMetaData
from physicsnemo.core.module import Module
from physicsnemo.core.version_check import check_version_spec
from physicsnemo.experimental.models.transolver.flare import FLARE
transolver can't import from experimental.
Make flare.py an experimental model instead?
PhysicsNeMo Pull Request
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.