
Clarification on L1 Norm Regularization in Paper vs. Code Implementation #476

Open
MasayukiNagai opened this issue Oct 4, 2024 · 0 comments

Comments

@MasayukiNagai

Hi, I'm a bit confused about the L1 norm as defined in the paper vs. how it's implemented in the code. In the paper, the L1 norm seems to be defined from the magnitudes of the activations, but in the code the regularization penalizes the input-output scaling of each edge, computed as the ratio of the standard deviation of the outputs to that of the inputs. Could someone help clarify this? Am I missing something, or is this a deliberate change in the implementation?

[Screenshot 2024-10-04: the paper's definition of the L1 norm used for regularization]
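For reference, my reading of the paper's definition (notation adapted from the paper, so please correct me if I'm misreading it) is roughly:

$$|\phi|_1 \equiv \frac{1}{N_p}\sum_{s=1}^{N_p}\left|\phi\!\left(x^{(s)}\right)\right|, \qquad |\Phi|_1 \equiv \sum_{i=1}^{n_\text{in}}\sum_{j=1}^{n_\text{out}} |\phi_{i,j}|_1$$

i.e., a mean of activation magnitudes per edge summed over the layer, not a ratio of standard deviations.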

Here's the code snippet where the L1 norm seems to be computed:

# MultKAN.py: forward, line 785~
x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)

if self.save_act:
    # "range" here is the standard deviation over the batch, per edge
    input_range = torch.std(preacts, dim=0) + 0.1
    output_range_spline = torch.std(postacts_numerical, dim=0)  # for training, only penalize the spline part

    self.acts_scale_spline.append(output_range_spline / input_range)

# MultKAN.py: reg, line 1294~
if reg_metric == 'edge_forward_spline_n':
    acts_scale = self.acts_scale_spline

# inside the per-layer loop over i:
vec = acts_scale[i]
l1 = torch.sum(vec)
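To make the difference concrete, here is a minimal sketch of the two quantities as I understand them (tensor shapes and variable names are my own, not from the repo):

import torch

# assumed shapes: (batch, out_dim, in_dim) for one layer's pre/post activations
preacts = torch.randn(128, 5, 3)
postacts = torch.randn(128, 5, 3)

# paper-style L1 (my reading): mean |activation| per edge, summed over all edges
l1_paper = torch.mean(torch.abs(postacts), dim=0).sum()

# code-style 'edge_forward_spline_n': std(output) / std(input) per edge, summed over edges
input_range = torch.std(preacts, dim=0) + 0.1
output_range = torch.std(postacts, dim=0)
l1_code = torch.sum(output_range / input_range)

The first measures the magnitude of the activations themselves, while the second looks more like a sensitivity/scaling measure between inputs and outputs, which is why I'm unsure whether the two are meant to be equivalent.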