
fix: remove unnecessary char casts for aarch64 compatibility #130

Open
yurekami wants to merge 2 commits into deepseek-ai:main from yurekami:fix/remove-char-cast-aarch64

@yurekami

Summary

This PR fixes Issue #93 by removing the unnecessary (char) casts on the device indices passed to CUDAGuard, which caused compilation warnings on aarch64.

Problem

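On aarch64, plain char is unsigned by default, while c10::DeviceIndex (the return type of get_device()) is int8_t, a signed type. Casting the device index to char and then handing it to CUDAGuard, whose constructor takes a c10::DeviceIndex, therefore converts between types with different value ranges, and the compiler reports a narrowing conversion: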

warning: narrowing conversion of '...' from 'c10::DeviceIndex' {aka 'signed char'} to 'char'
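The signedness of plain char is implementation-defined, which is easy to check in isolation. The sketch below is a standalone illustration, not code from this PR: it uses a local alias `DeviceIndex = std::int8_t` in place of c10::DeviceIndex (so no PyTorch headers are needed), and the helpers `after_char_cast` and `round_trip` are hypothetical. Building it on x86-64 with GCC's `-funsigned-char` flag should mimic the aarch64 default.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for c10::DeviceIndex, assumed to be int8_t as the
// "{aka 'signed char'}" in the warning suggests. No PyTorch headers needed.
using DeviceIndex = std::int8_t;

// True on x86-64 Linux; false on aarch64 Linux, where plain char is unsigned.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Value a device index takes after the old (char) cast, widened to int so it
// can be inspected. On an unsigned-char target, -1 becomes 255.
int after_char_cast(DeviceIndex idx) {
    return static_cast<int>(static_cast<char>(idx));
}

// The full DeviceIndex -> char -> DeviceIndex round trip the old code performed.
// Converting back to a signed type wraps modulo 2^8 on GCC and Clang
// (and is guaranteed to since C++20), so the original value is recovered.
DeviceIndex round_trip(DeviceIndex idx) {
    return static_cast<DeviceIndex>(static_cast<char>(idx));
}
```

On an unsigned-char target `after_char_cast(-1)` yields 255, yet `round_trip(-1)` still returns -1, which is consistent with the PR's statement that removing the casts does not change runtime behavior.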

Solution

Remove the explicit (char) casts. They were never needed: at::cuda::CUDAGuard accepts a c10::DeviceIndex directly, which is exactly what get_device() returns.
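A minimal sketch of the before/after call site, assuming the guards are brace-initialized (which is what turns the char conversion into a narrowing diagnostic). `FakeCUDAGuard` and `FakeTensor` are hypothetical stand-ins for at::cuda::CUDAGuard and a tensor's get_device(), so the example builds without PyTorch; `DeviceIndex` is again a local alias for c10::DeviceIndex.

```cpp
#include <cassert>
#include <cstdint>

using DeviceIndex = std::int8_t;  // stand-in for c10::DeviceIndex

// Hypothetical stand-in for at::cuda::CUDAGuard: its constructor takes a
// DeviceIndex, like the real guard's. It only records the index.
struct FakeCUDAGuard {
    explicit FakeCUDAGuard(DeviceIndex device_index) : index(device_index) {}
    DeviceIndex index;
};

// Hypothetical stand-in for a tensor exposing get_device().
struct FakeTensor {
    DeviceIndex device;
    DeviceIndex get_device() const { return device; }
};

DeviceIndex guarded_device(const FakeTensor& q) {
    // Before: FakeCUDAGuard device_guard{(char)q.get_device()};
    //   Brace-initializing a DeviceIndex from a plain char narrows wherever
    //   char is unsigned (e.g. aarch64).
    // After: the DeviceIndex is passed through unchanged; no conversion occurs.
    FakeCUDAGuard device_guard{q.get_device()};
    return device_guard.index;
}
```

Because the guard's parameter type and get_device()'s return type are identical, the call site involves no conversion at all and compiles the same way whether plain char is signed or unsigned.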

Changes

Location    Before                          After
Line 153    (char)seqlens_k.get_device()    seqlens_k.get_device()
Line 263    (char)q.get_device()            q.get_device()
Line 422    (char)q.get_device()            q.get_device()

Benefits

  • Eliminates compilation warnings on aarch64
  • Cleaner code without unnecessary type casts
  • Behavior no longer depends on whether plain char is signed on the target platform
  • No functional changes

Test plan

  • Code compiles without warnings on aarch64
  • No changes to runtime behavior (casts were unnecessary)
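The test plan relies on building on aarch64. As an optional extra check (an assumption, not something this PR adds), the narrowing rule can be probed at compile time on any host with a C++17 SFINAE trait. The trait name `brace_init_from` is hypothetical, and `DeviceIndex` is again a local stand-in for c10::DeviceIndex.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <utility>

using DeviceIndex = std::int8_t;  // stand-in for c10::DeviceIndex

// brace_init_from<To, From>::value is true when To{from} is well-formed.
// A narrowing conversion makes the braced expression ill-formed, which
// discards the partial specialization and leaves the false_type primary.
template <class To, class From, class = void>
struct brace_init_from : std::false_type {};

template <class To, class From>
struct brace_init_from<To, From, std::void_t<decltype(To{std::declval<From>()})>>
    : std::true_type {};

// Passing get_device()'s result straight through never narrows.
constexpr bool direct_ok = brace_init_from<DeviceIndex, DeviceIndex>::value;

// Going through plain char narrows exactly when char is unsigned.
constexpr bool via_char_ok = brace_init_from<DeviceIndex, char>::value;

static_assert(direct_ok, "DeviceIndex -> DeviceIndex must not narrow");
static_assert(via_char_ok == std::numeric_limits<char>::is_signed,
              "char -> DeviceIndex narrows only when char is unsigned");
```

Compiled normally on x86-64, `via_char_ok` is true; adding `-funsigned-char` makes it false, which is the configuration the removed casts tripped over, while `direct_ok` holds either way.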

Fixes #93

🤖 Generated with Claude Code

yurekami and others added 2 commits December 25, 2025 22:25
Add support for the old parameter name `num_heads_per_head_k` as a
deprecated alias for `num_q_tokens_per_head_k` in `get_mla_metadata()`.

This maintains backwards compatibility for existing code that uses the
old parameter name, while emitting a deprecation warning to encourage
migration to the new name.

Changes:
- Add `num_heads_per_head_k` as keyword-only deprecated parameter
- Emit DeprecationWarning when deprecated name is used
- Raise clear error if both old and new names are specified
- Use sentinel value to properly detect unset required parameters
- Preserve positional argument compatibility

Fixes deepseek-ai#108

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove explicit (char) casts from CUDAGuard device index parameters.
These casts caused narrowing conversion warnings on aarch64 where
char is unsigned by default, conflicting with the signed c10::DeviceIndex
return type of get_device().

The casts were unnecessary since at::cuda::CUDAGuard accepts
c10::DeviceIndex directly, which is the return type of get_device().

Fixes deepseek-ai#93

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Char datatype assumption

1 participant