@mehulbafnaa mehulbafnaa commented Jun 21, 2025

closes #325

Add Min-P Sampling Strategy (MinPSampling)

This PR introduces a new sampling strategy, Min-P Sampling, based on the recent paper [Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs](https://arxiv.org/abs/2407.01082). Min-P sampling addresses some limitations of existing strategies like Top-k and Top-p, providing improved control over diversity and reducing repetitive token generation.
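For readers who have not seen the method, the truncation rule from the cited paper can be written as below. The symbols are chosen here for illustration ($p_{\text{base}}$ corresponds to the sampler's threshold parameter); this states the general rule, not necessarily the exact form used in this PR's code.

```latex
\begin{align*}
p_{\max} &= \max_{v \in \mathcal{V}} P(v \mid x_{<t}) \\
\mathcal{V}_{\min} &= \{\, v \in \mathcal{V} : P(v \mid x_{<t}) \ge p_{\text{base}} \cdot p_{\max} \,\} \\
P'(v \mid x_{<t}) &=
  \begin{cases}
    \dfrac{P(v \mid x_{<t})}{\sum_{u \in \mathcal{V}_{\min}} P(u \mid x_{<t})} & v \in \mathcal{V}_{\min} \\[2ex]
    0 & \text{otherwise}
  \end{cases}
\end{align*}
```

Because the cutoff scales with $p_{\max}$, the candidate set shrinks when the model is confident and widens when the distribution is flat.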

Motivation

  • Enhanced Sampling Quality: The paper reports better diversity and quality for Min-P sampling than for Top-k and Top-p, particularly at higher temperatures.
  • Reduced Repetition: It effectively reduces repetitive outputs common with other sampling methods.

Implementation Highlights

  • Adds MinPSampling class under gemma.gm.text.
  • Consistent API with existing sampling methods (Greedy, TopkSampling, ToppSampling).
  • Minimally invasive changes to the existing codebase.

Performance

  • Benchmarks confirm comparable decoding speed to Top-p sampling with improved diversity.
  • Low overhead; opt-in functionality doesn't affect existing samplers.

Testing

  • Unit tests included to verify correct behavior and edge cases.
  • Added custom unit tests for additional coverage and robustness.
  • All CI checks pass.

Usage Example

from gemma.gm.text import MinPSampling

sampler = MinPSampling(p=0.95)
tokens = sampler.get_next_tokens(logits, rng)
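To make the behavior behind `get_next_tokens` concrete for reviewers, here is a minimal JAX sketch of the min-p filtering step. `MinPSketch` is a hypothetical stand-in written for this illustration, not the `MinPSampling` class added by the PR; its `temperature` field, the default `p=0.05`, and the assumed logits shape `[..., vocab_size]` are assumptions.

```python
import dataclasses

import jax
import jax.numpy as jnp


@dataclasses.dataclass(frozen=True)
class MinPSketch:
  """Hypothetical illustration of min-p sampling (not the PR's class)."""

  p: float = 0.05  # Relative threshold in (0, 1]; assumed default.
  temperature: float = 1.0  # Assumed extra knob, not taken from the PR.

  def get_next_tokens(self, logits, rng):
    # logits: float array of shape [..., vocab_size] (assumed layout).
    logits = logits / self.temperature
    probs = jax.nn.softmax(logits, axis=-1)
    # Scale the cutoff by the probability of the most likely token.
    threshold = self.p * jnp.max(probs, axis=-1, keepdims=True)
    # Drop tokens below the cutoff; the top token always survives.
    masked = jnp.where(probs >= threshold, logits, -jnp.inf)
    # categorical() samples from the softmax of the surviving logits,
    # which is the renormalized distribution over the kept tokens.
    return jax.random.categorical(rng, masked, axis=-1)


if __name__ == "__main__":
  rng = jax.random.PRNGKey(0)
  logits = jnp.log(jnp.array([[0.6, 0.3, 0.05, 0.05]]))
  # With p=0.4 the cutoff is 0.4 * 0.6 = 0.24, so only tokens 0 and 1 remain.
  print(MinPSketch(p=0.4).get_next_tokens(logits, rng))
```

Under this rule a threshold close to 1 keeps only tokens nearly as likely as the top one, so sampling approaches greedy decoding, while small thresholds leave more of the tail available.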

Notes

  • This PR intentionally excludes updates to documentation files (README.md and other docs), as requested.

@mehulbafnaa mehulbafnaa marked this pull request as ready for review June 21, 2025 03:18
@mehulbafnaa

@Conchylicultor Please review the PR.


Development

Successfully merging this pull request may close these issues.

Feature Request: Integrate Min-P Sampling into Gemma