Update S0-L18.md
qiyanjun committed Mar 18, 2024
1 parent a223212 commit c2501cc
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions _contents/S0-L18.md
@@ -35,6 +35,10 @@ Scaling Policy [5].

## More Readings:

#### Knowledge Conflicts for LLMs: A Survey
+ https://arxiv.org/abs/2403.08319
+ This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs.

#### Transformer Debugger
+ https://github.com/openai/transformer-debugger
+ Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders. TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
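
The idea of intervening in the forward pass to explain "token A instead of token B" can be illustrated with a toy sketch. This is not TDB's API or code: the one-hidden-layer network, the hand-set weights, and the helper names `forward`, `logit_diff`, and `neuron_effects` are all hypothetical, chosen only to show how zero-ablating a component and re-measuring a logit difference attributes a behavior to that component.

```python
import numpy as np

def forward(x, W_in, W_out, ablate=None):
    """Toy one-hidden-layer ReLU network (hypothetical, not a real LM).

    If `ablate` is an index, that hidden neuron is zeroed before the
    output projection, which is the intervention in the forward pass.
    """
    h = np.maximum(0.0, x @ W_in)
    if ablate is not None:
        h = h.copy()
        h[ablate] = 0.0
    return h @ W_out  # logits over a tiny vocabulary

def logit_diff(logits, tok_a, tok_b):
    """How strongly the model prefers token A over token B."""
    return logits[tok_a] - logits[tok_b]

def neuron_effects(x, W_in, W_out, tok_a, tok_b):
    """Effect of each hidden neuron on the A-vs-B preference.

    effect[i] = baseline logit difference minus the logit difference
    with neuron i ablated. Positive means the neuron pushes toward A.
    """
    base = logit_diff(forward(x, W_in, W_out), tok_a, tok_b)
    effects = np.array([
        base - logit_diff(forward(x, W_in, W_out, ablate=i), tok_a, tok_b)
        for i in range(W_in.shape[1])
    ])
    return base, effects

# Hand-set weights (assumed for illustration): 2 inputs, 3 hidden neurons, 3 tokens.
x = np.array([1.0, 2.0])
W_in = np.array([[1.0, 0.0, -1.0],
                 [0.0, 1.0,  0.0]])
W_out = np.array([[3.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])

base, effects = neuron_effects(x, W_in, W_out, tok_a=0, tok_b=1)
top_neuron = int(np.argmax(np.abs(effects)))
print(f"baseline logit diff (A - B): {base}")
print(f"per-neuron effects: {effects}")
print(f"most influential neuron: {top_neuron}")
```

In this toy setup, neuron 0 pushes toward token A, neuron 1 pushes toward token B, and neuron 2 is inactive for this input, so ablating it changes nothing. Per the description above, TDB applies the same kind of question to neurons, attention heads, and autoencoder latents in a real model.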