
Clarification on L1 Norm Regularization in Paper vs. Code Implementation #476

Open
MasayukiNagai opened this issue Oct 4, 2024 · 0 comments

Comments

@MasayukiNagai

Hi, I'm a bit confused about the L1 norm as defined in the paper vs. how it's implemented in the code. In the paper, the L1 norm seems to be defined from the magnitudes of the activations, but in the code the regularization penalizes the input-output scaling of each edge, computed as the ratio of the standard deviation of the outputs to that of the inputs. Could someone help clarify this? Am I missing something, or is this a deliberate change in the implementation?

[Screenshot 2024-10-04: the paper's definition of the L1 norm used for regularization]
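For reference, my reading of the paper's definition (notation adapted from the paper, so please correct me if I'm misreading it) is roughly:

$$|\phi|_1 \equiv \frac{1}{N_p}\sum_{s=1}^{N_p}\left|\phi\!\left(x^{(s)}\right)\right|, \qquad |\Phi|_1 \equiv \sum_{i=1}^{n_\text{in}}\sum_{j=1}^{n_\text{out}} |\phi_{i,j}|_1$$

i.e., a mean of activation magnitudes per edge summed over the layer, not a ratio of standard deviations.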

Here's the code snippet where the L1 norm seems to be computed:

# MultKAN.py: forward, line 785~
x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)

if self.save_act:
    # "range" here is the standard deviation over the batch, per edge
    input_range = torch.std(preacts, dim=0) + 0.1
    output_range_spline = torch.std(postacts_numerical, dim=0)  # for training, only penalize the spline part

    self.acts_scale_spline.append(output_range_spline / input_range)

# MultKAN.py: reg, line 1294~
if reg_metric == 'edge_forward_spline_n':
    acts_scale = self.acts_scale_spline

# inside the per-layer loop over i:
vec = acts_scale[i]
l1 = torch.sum(vec)
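To make the difference concrete, here is a minimal sketch of the two quantities as I understand them (tensor shapes and variable names are my own, not from the repo):

import torch

# assumed shapes: (batch, out_dim, in_dim) for one layer's pre/post activations
preacts = torch.randn(128, 5, 3)
postacts = torch.randn(128, 5, 3)

# paper-style L1 (my reading): mean |activation| per edge, summed over all edges
l1_paper = torch.mean(torch.abs(postacts), dim=0).sum()

# code-style 'edge_forward_spline_n': std(output) / std(input) per edge, summed over edges
input_range = torch.std(preacts, dim=0) + 0.1
output_range = torch.std(postacts, dim=0)
l1_code = torch.sum(output_range / input_range)

The first measures the magnitude of the activations themselves, while the second looks more like a sensitivity/scaling measure between inputs and outputs, which is why I'm unsure whether the two are meant to be equivalent.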