[NPU] initial support of asym_int4_rtn #12484

Merged 21 commits on Dec 5, 2024

rnwang04 (Contributor) commented Dec 3, 2024

Description

1. Why the change?

https://github.com/analytics-zoo/nano/issues/1766#issuecomment-2516410003
Works with https://github.com/intel-analytics/llm.cpp/pull/704 and https://github.com/intel-analytics/llm.cpp/pull/705

2. User API changes

Adds support for a new precision: asym_int4

3. Summary of the change

  • Support new precision asym_int4
  • Support running asym_int4 on CPU to verify the correctness
  • Support running asym_int4 on NPU with optimize_model=False
  • Support running asym_int4 on NPU with optimize_model=True (mp version) for Qwen2-7B
  • Support new convert in convert_llm_for_deploy for Qwen2-7B with asym_int4
  • Support running asym_int4 on NPU with optimize_model=True (cpp version) for Qwen2-7B
  • Support running asym_int4 GW on NPU with optimize_model=True (cpp version) for Qwen2-7B
  • Support running asym_int4 split LM head
  • Verified save & load of asym_int4 Qwen2-7B
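
For readers unfamiliar with the term, the sketch below shows what asymmetric INT4 round-to-nearest (RTN) quantization generally means: each group of weights gets a scale and a minimum (zero offset), and every weight is rounded to the nearest of 16 levels. This is an illustration under assumptions, not the code from this PR. The function names `quantize_asym_int4_rtn` and `dequantize_asym_int4` are hypothetical, and the storage layout (one scale and one minimum per group, in the style suggested by the `q4_1_rtn` branch name) is an assumption.

```python
import math
from typing import List, Tuple


def quantize_asym_int4_rtn(weights: List[float]) -> Tuple[List[int], float, float]:
    """Quantize one group of floats to unsigned 4-bit codes in 0..15.

    Hypothetical helper: returns (codes, scale, w_min) such that
    w is approximately code * scale + w_min.
    """
    w_min, w_max = min(weights), max(weights)
    # 4 bits give 16 levels, i.e. 15 steps between the group min and max.
    scale = (w_max - w_min) / 15.0
    if scale == 0.0:
        # Constant group: every value maps to code 0 and is restored from w_min.
        return [0] * len(weights), 1.0, w_min
    codes = []
    for w in weights:
        # Round to nearest (ties round up), then clamp into the 4-bit range.
        q = int(math.floor((w - w_min) / scale + 0.5))
        codes.append(min(15, max(0, q)))
    return codes, scale, w_min


def dequantize_asym_int4(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale + w_min for c in codes]


if __name__ == "__main__":
    group = [-1.0, -0.3, 0.0, 0.4, 2.0]
    codes, scale, w_min = quantize_asym_int4_rtn(group)
    restored = dequantize_asym_int4(codes, scale, w_min)
    print(codes, scale, w_min)
    print(restored)
```

Unlike a symmetric scheme, the range here is anchored at the group minimum rather than centered on zero, so groups whose values are skewed to one side use all 16 levels. The per-weight reconstruction error is bounded by half a step (`scale / 2`), which gives a simple CPU-side correctness check.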

4. How to test?

  • Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.

rnwang04 marked this pull request as draft December 3, 2024 07:39
rnwang04 marked this pull request as ready for review December 4, 2024 07:45
rnwang04 changed the title from "initial support of asym_int4_rtn" to "[NPU] initial support of asym_int4_rtn" Dec 4, 2024
rnwang04 requested a review from jason-dai December 4, 2024 07:46
rnwang04 force-pushed the q4_1_rtn branch 2 times, most recently from b46ac52 to 67ab50e December 5, 2024 08:51
rnwang04 requested a review from plusbang December 5, 2024 08:51
rnwang04 merged commit 49ab897 into intel-analytics:main Dec 5, 2024
1 check passed
rnwang04 deleted the q4_1_rtn branch December 5, 2024 09:40

2 participants