Hi, I'm a bit confused about the L1 norm as defined in the paper vs. how it's implemented in the code. In the paper, the L1 norm is defined from the magnitudes of the activations, but in the code the regularization instead penalizes an input-output scaling, computed as the ratio of the standard deviation of each edge's outputs to that of its inputs. Could someone help clarify this? Am I missing something, or is this a deliberate change in the implementation?
Here's the code snippet where the L1 norm seems to be computed:
# MultKAN.py: forward, line 785~
x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
if self.save_act:
    input_range = torch.std(preacts, dim=0) + 0.1
    output_range_spline = torch.std(postacts_numerical, dim=0)  # for training, only penalize the spline part
    self.acts_scale_spline.append(output_range_spline / input_range)

# MultKAN.py: reg, line 1294~
if reg_metric == 'edge_forward_spline_n':
    acts_scale = self.acts_scale_spline
# ... later, inside the per-layer loop over i:
vec = acts_scale[i]
l1 = torch.sum(vec)
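To make the difference concrete, here is a minimal sketch of the two quantities side by side. The tensor shapes and variable names are assumptions for illustration, not taken from pykan; the paper defines |phi|_1 as the average magnitude of an activation over its inputs, while the code above builds a std-based input-output scale per edge.

import torch

batch, out_dim, in_dim = 32, 3, 2
preacts = torch.randn(batch, out_dim, in_dim)    # edge inputs x (assumed shape)
postacts = torch.randn(batch, out_dim, in_dim)   # edge outputs phi(x) (assumed shape)

# Paper: |phi|_1 = average of |phi(x)| over the batch of inputs,
# i.e. a direct magnitude-based L1 norm per edge.
l1_paper = postacts.abs().mean(dim=0)            # (out_dim, in_dim)

# Code ('edge_forward_spline_n'): std of outputs divided by std of inputs
# (with the +0.1 floor on the input range, as in the snippet above).
l1_code = postacts.std(dim=0) / (preacts.std(dim=0) + 0.1)

# In both cases the per-edge values are summed per layer to form the penalty.
print(l1_paper.sum(), l1_code.sum())

Note that the std-based version is insensitive to a constant offset in phi(x) and normalizes by the spread of the inputs, whereas the paper's definition penalizes raw output magnitude directly, so the two can rank edges differently.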