
layer 40 / logits all nan #133

Open
darkacorn opened this issue Dec 11, 2024 · 4 comments

Comments

@darkacorn

It's weird. I'm trying to do abliteration, and later fine-tuning, on this model,

but it's acting rather differently from glm4-chat; my setup works fine for chat.

What is the core difference except the 16k+ audio tokens and the 4 special ones, and why are we not getting any logits back?

@darkacorn (Author)

```
import torch
import einops
import lovely_tensors as lt
import gc

# LanguageModel is the nnsight wrapper around the Hugging Face model
from nnsight import LanguageModel

lt.monkey_patch()
model = LanguageModel("/home/o_0/aiart/GLM-4-Voice/remove-refusals-with-transformers/glm-4-voice-9b", device_map="auto", torch_dtype=torch.float16)

print(model)
```

This responds as expected:

```
ChatGLMForConditionalGeneration(
  (transformer): ChatGLMModel(
    (embedding): Embedding(
      (word_embeddings): Embedding(168960, 4096)
    )
    (rotary_pos_emb): RotaryEmbedding()
    (encoder): GLMTransformer(
      (layers): ModuleList(
        (0-39): 40 x GLMBlock(
          (input_layernorm): RMSNorm()
          (self_attention): SelfAttention(
            (query_key_value): Linear(in_features=4096, out_features=4608, bias=True)
            (core_attention): SdpaAttention(
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (dense): Linear(in_features=4096, out_features=4096, bias=False)
          )
          (post_attention_layernorm): RMSNorm()
          (mlp): MLP(
            (dense_h_to_4h): Linear(in_features=4096, out_features=27392, bias=False)
            (dense_4h_to_h): Linear(in_features=13696, out_features=4096, bias=False)
          )
        )
      )
      (final_layernorm): RMSNorm()
    )
    (output_layer): Linear(in_features=4096, out_features=168960, bias=False)
  )
  (generator): WrapperModule()
)
```
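
Before suspecting the forward pass, one quick sanity check is to scan the loaded fp16 weights themselves for NaN/Inf values. This is a minimal sketch, not something from the thread; it assumes the nnsight wrapper forwards `named_parameters()` to the underlying Hugging Face module (if it does not, run it on the raw `transformers` model instead).

```
import torch

def find_bad_params(module):
    """Return the names of parameters that contain NaN or Inf values."""
    bad = []
    for name, param in module.named_parameters():
        if torch.isnan(param).any() or torch.isinf(param).any():
            bad.append(name)
    return bad

# Hypothetical usage against the model loaded above:
# print(find_bad_params(model) or "no NaN/Inf weights found")
```

If the weights are clean, the NaNs are being produced at runtime (for example an fp16 overflow inside a layer) rather than coming from the checkpoint itself.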

darkacorn reopened this Dec 11, 2024
@darkacorn (Author)

```
with model.trace() as tracer:
    with tracer.invoke("hello world") as invoker:
        output = model.output.save()

# First, inspect the output shape and values
logits = output[0][:, -3, :]
print("Logits stats:", {
    "shape": logits.shape,
    "has_nan": torch.isnan(logits).any().item(),
    "max": logits.max().item() if not torch.isnan(logits).all() else "all NaN",
    "min": logits.min().item() if not torch.isnan(logits).all() else "all NaN"
})

# Filter out NaN values before taking argmax
if not torch.isnan(logits).all():
    masked_logits = torch.nan_to_num(logits, float('-inf'))  # Replace NaN with -inf
    token_id = masked_logits.argmax()
    decoded_token = model.tokenizer.decode([token_id], skip_special_tokens=False, clean_up_tokenization_spaces=False)
    print(f"Token ID: {token_id}")
    print(f"Decoded token: {decoded_token}")
else:
    print("All values are NaN - cannot determine token")
```

```
Logits stats: {'shape': torch.Size([1, 168960]), 'has_nan': True, 'max': 'all NaN', 'min': 'all NaN'}
All values are NaN - cannot determine token
```
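
Since the title points at layer 40, it can help to pin down exactly where the NaNs first appear. The sketch below follows the same nnsight trace/invoke style as the snippet above and walks the module path `transformer.encoder.layers` from the printed architecture; the assumptions that the layer envoys are indexable like the underlying modules and that each `GLMBlock` returns `(hidden_states, kv_cache)` are mine, so treat this as a debugging starting point rather than a verified recipe.

```
import torch

NUM_LAYERS = 40  # from the printed architecture: (0-39): 40 x GLMBlock

# Save every block's output during a single forward pass
saved = []
with model.trace() as tracer:
    with tracer.invoke("hello world") as invoker:
        for i in range(NUM_LAYERS):
            saved.append(model.transformer.encoder.layers[i].output.save())

# Report the first layer whose activations contain NaNs
for i, out in enumerate(saved):
    hidden = out[0]  # assumed (hidden_states, kv_cache) tuple; indexed like the logits snippet above
    if torch.isnan(hidden).any():
        print(f"NaN activations first appear at layer {i}")
        break
else:
    print("no NaN found in any layer output")
```

If a specific block is the first to go NaN, that points at an overflow inside that block under fp16 rather than a problem with the inputs or the output head.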

@darkacorn (Author)

This is the weirdest thing I've ever seen in a model. Can the author or someone from THUDM comment on why no usable logits are coming back?

@SodaWithoutSparkles

> This is the weirdest thing I've ever seen in a model. Can the author or someone from THUDM comment on why no usable logits are coming back?

I also find it very strange; why can't we get the logits?
