
layer 40 / logits all nan #133

Open
darkacorn opened this issue Dec 11, 2024 · 4 comments

Comments

@darkacorn

It's weird. I'm trying to do abliteration, and later fine-tuning, on this model,

but it's acting rather differently from glm4-chat; my setup works fine for chat.

What is the core difference except the 16k+ audio tokens and the 4 special ones, and why are we not getting any logits back?

@darkacorn (Author)

```
import torch
import einops
import lovely_tensors as lt
import gc

# LanguageModel is the nnsight wrapper around the Hugging Face model
from nnsight import LanguageModel

lt.monkey_patch()
model = LanguageModel("/home/o_0/aiart/GLM-4-Voice/remove-refusals-with-transformers/glm-4-voice-9b", device_map="auto", torch_dtype=torch.float16)

print(model)
```

This responds as expected:

```
ChatGLMForConditionalGeneration(
  (transformer): ChatGLMModel(
    (embedding): Embedding(
      (word_embeddings): Embedding(168960, 4096)
    )
    (rotary_pos_emb): RotaryEmbedding()
    (encoder): GLMTransformer(
      (layers): ModuleList(
        (0-39): 40 x GLMBlock(
          (input_layernorm): RMSNorm()
          (self_attention): SelfAttention(
            (query_key_value): Linear(in_features=4096, out_features=4608, bias=True)
            (core_attention): SdpaAttention(
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (dense): Linear(in_features=4096, out_features=4096, bias=False)
          )
          (post_attention_layernorm): RMSNorm()
          (mlp): MLP(
            (dense_h_to_4h): Linear(in_features=4096, out_features=27392, bias=False)
            (dense_4h_to_h): Linear(in_features=13696, out_features=4096, bias=False)
          )
        )
      )
      (final_layernorm): RMSNorm()
    )
    (output_layer): Linear(in_features=4096, out_features=168960, bias=False)
  )
  (generator): WrapperModule()
)
```
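
Before suspecting the forward pass, one quick sanity check is to scan the loaded fp16 weights themselves for NaN/Inf values. This is a minimal sketch, not something from the thread; it assumes the nnsight wrapper forwards `named_parameters()` to the underlying Hugging Face module (if it does not, run it on the raw `transformers` model instead).

```
import torch

def find_bad_params(module):
    """Return the names of parameters that contain NaN or Inf values."""
    bad = []
    for name, param in module.named_parameters():
        if torch.isnan(param).any() or torch.isinf(param).any():
            bad.append(name)
    return bad

# Hypothetical usage against the model loaded above:
# print(find_bad_params(model) or "no NaN/Inf weights found")
```

If the weights are clean, the NaNs are being produced at runtime (for example an fp16 overflow inside a layer) rather than coming from the checkpoint itself.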

darkacorn reopened this Dec 11, 2024
@darkacorn (Author)

```
with model.trace() as tracer:
    with tracer.invoke("hello world") as invoker:
        output = model.output.save()

# First, inspect the output shape and values
logits = output[0][:, -3, :]
print("Logits stats:", {
    "shape": logits.shape,
    "has_nan": torch.isnan(logits).any().item(),
    "max": logits.max().item() if not torch.isnan(logits).all() else "all NaN",
    "min": logits.min().item() if not torch.isnan(logits).all() else "all NaN"
})

# Filter out NaN values before taking argmax
if not torch.isnan(logits).all():
    masked_logits = torch.nan_to_num(logits, float('-inf'))  # Replace NaN with -inf
    token_id = masked_logits.argmax()
    decoded_token = model.tokenizer.decode([token_id], skip_special_tokens=False, clean_up_tokenization_spaces=False)
    print(f"Token ID: {token_id}")
    print(f"Decoded token: {decoded_token}")
else:
    print("All values are NaN - cannot determine token")
```

```
Logits stats: {'shape': torch.Size([1, 168960]), 'has_nan': True, 'max': 'all NaN', 'min': 'all NaN'}
All values are NaN - cannot determine token
```
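
Since the title points at layer 40, it can help to pin down exactly where the NaNs first appear. The sketch below follows the same nnsight trace/invoke style as the snippet above and walks the module path `transformer.encoder.layers` from the printed architecture; the assumptions that the layer envoys are indexable like the underlying modules and that each `GLMBlock` returns `(hidden_states, kv_cache)` are mine, so treat this as a debugging starting point rather than a verified recipe.

```
import torch

NUM_LAYERS = 40  # from the printed architecture: (0-39): 40 x GLMBlock

# Save every block's output during a single forward pass
saved = []
with model.trace() as tracer:
    with tracer.invoke("hello world") as invoker:
        for i in range(NUM_LAYERS):
            saved.append(model.transformer.encoder.layers[i].output.save())

# Report the first layer whose activations contain NaNs
for i, out in enumerate(saved):
    hidden = out[0]  # assumed (hidden_states, kv_cache) tuple; indexed like the logits snippet above
    if torch.isnan(hidden).any():
        print(f"NaN activations first appear at layer {i}")
        break
else:
    print("no NaN found in any layer output")
```

If a specific block is the first to go NaN, that points at an overflow inside that block under fp16 rather than a problem with the inputs or the output head.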

@darkacorn (Author)

This is the weirdest thing I've ever seen in a model. Can the author or someone from THUDM comment on why no usable logits are coming back?

@SodaWithoutSparkles

> This is the weirdest thing I've ever seen in a model. Can the author or someone from THUDM comment on why no usable logits are coming back?

I also find it very strange; why can't we get the logits?
