[Tests]: Adding dummy causal models for testing in regular CI run #427


Open
wants to merge 30 commits into main

Conversation

@abukhoy (Contributor) commented May 29, 2025

Purpose of this PR:

This update aims to reduce test execution time for causal language model inference. Previously, tests ran against full-scale models trimmed to one or two layers, which was still inefficient and time-consuming. This PR also refactors the CLI API tests so they can run independently, and removes redundant conftest code.

What’s Changed:

Introduced dummy models with significantly smaller configurations by adjusting parameters such as max_position_embeddings, num_hidden_layers, num_attention_heads, hidden_size, intermediate_size, vocab_size, and additional_params.
These lightweight models are used exclusively for testing, ensuring faster execution without compromising test coverage.

CLI testing is now split into two test scripts: one for export, compile, and execute, and another for the infer CLI API.

Note: This optimization is applied only to causal language models.
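For illustration, a dummy model of this kind can be instantiated directly from a down-sized config. The minimal sketch below uses Hugging Face transformers with the TinyLlama values quoted later in this thread; the construction itself is an assumption, not the PR's actual code.

from transformers import AutoModelForCausalLM, LlamaConfig

# Down-sized Llama config; values mirror the dummy TinyLlama row reviewed below.
tiny_config = LlamaConfig(
    max_position_embeddings=128,
    num_hidden_layers=1,
    num_attention_heads=2,
    hidden_size=64,
    intermediate_size=256,
    vocab_size=32000,
    num_key_value_heads=1,
)

# Randomly initialized tiny model: cheap to build and fast enough for CI.
model = AutoModelForCausalLM.from_config(tiny_config)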

abukhoy added 9 commits May 30, 2025 06:34
"hpcai-tech/grok-1",
]

test_dummy_model_configs = [
Contributor:

Can we move this outside this file? Maybe we can maintain a CSV file for better readability.

Contributor Author (abukhoy):

Yes, I've made a JSON file for the dummy configs.
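For reference, consuming such a JSON file could look like the following minimal sketch; the file name and key layout here are assumptions, not necessarily the PR's actual schema.

import json

from transformers import AutoConfig, PretrainedConfig

# Assumed layout: one entry per model name, mapping config fields to tiny values.
with open("tests/transformers/models/dummy_model_configs.json") as f:
    dummy_configs = json.load(f)

def build_dummy_config(model_name: str) -> PretrainedConfig:
    overrides = dummy_configs[model_name]  # e.g. num_hidden_layers, hidden_size, ...
    return AutoConfig.from_pretrained(model_name, **overrides)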

"hpcai-tech/grok-1",
]

test_dummy_model_configs = [
# model_name, model_type, max_position_embeddings, num_hidden_layers, num_attention_heads, hidden_size, intermediate_size, vocab_size, additional_params
("TinyLlama/TinyLlama-1.1B-Chat-v1.0", "llama", 128, 1, 2, 64, 256, 32000, {"num_key_value_heads": 1}),
Contributor:

Are we following any criteria for selecting these configs?

Contributor Author (abukhoy):

No particular criteria, but some of the models constrain certain config params to specific values.


if model_hf is None:
model_hf, _ = load_causal_lm_model(model_config)
model_hf_cb = copy.deepcopy(model_hf)
Contributor:

Why do we need this?

Contributor Author (abukhoy):

Because later, for CB testing, we don't need to call load_causal_lm_model again, and it reduces the code for dummy model execution. If you don't want this, please let me know.
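A minimal sketch of that reuse pattern as a module-scoped pytest fixture; the fixture wrapping and the model_config fixture are assumptions, while load_causal_lm_model is the helper visible in the snippet above.

import copy

import pytest

@pytest.fixture(scope="module")
def hf_models(model_config):  # model_config: assumed upstream fixture
    # Load the HF model once, then hand out a deep copy for CB testing so
    # neither test can mutate the other's weights or cached state.
    model_hf, _ = load_causal_lm_model(model_config)  # helper from the test module
    return model_hf, copy.deepcopy(model_hf)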

@pytest.mark.cli
@pytest.mark.parametrize("config", configs)
def test_export_compile_execute_qnn_fb(mocker, config):
# testing export -> compile -> infer with full_batch_size in QNN enviroment
Contributor:

Typo in "enviroment"

@pytest.mark.qnn
@pytest.mark.cli
@pytest.mark.parametrize("config", configs)
def test_export_compile_execute_qnn(mocker, config):
Contributor:

  1. Both test_export_compile_execute_qnn and test_export_compile_execute_qnn_fb currently have the same configs, right? Ideally, test_export_compile_execute_qnn should be given BS and test_export_compile_execute_qnn_fb should be given FBS (see the sketch after this list).
  2. Rename test_export_compile_execute_qnn_fb -> test_export_compile_execute_qnn_fbs for better readability.
  3. Typo in 'enviroment'.
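A sketch of what point 1 suggests, with the two tests driving different batching knobs. run_cli_pipeline and the parameter values are hypothetical placeholders; configs is the list from the test module.

import pytest

@pytest.mark.qnn
@pytest.mark.cli
@pytest.mark.parametrize("config", configs)
def test_export_compile_execute_qnn(mocker, config):
    # Plain QNN path: exercise regular batching via batch_size (BS).
    run_cli_pipeline(mocker, config, batch_size=2)  # hypothetical helper

@pytest.mark.qnn
@pytest.mark.cli
@pytest.mark.parametrize("config", configs)
def test_export_compile_execute_qnn_fbs(mocker, config):
    # FBS variant: exercise continuous batching via full_batch_size.
    run_cli_pipeline(mocker, config, full_batch_size=4)  # hypothetical helper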

)
check_infer(mocker=mocker, generation_len=20, **local_config)
Contributor:

Can we have a VLM QNN test as well?

Contributor Author (abukhoy):

I tried adding this test but got an error, so I removed it from the test suite.

mxfp6=ms.mxfp6,
mxint8=ms.mxint8,
full_batch_size=ms.full_batch_size,
enable_qnn=ms.enable_qnn,
image_url=kwargs["image_url"],
)
Contributor:

How can we make sure infer is running as expected? Please include proper assertions to check that export, compile, and generation are running properly.

Contributor Author (abukhoy):

Added the checks and assertions that were possible.
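The added assertions presumably take roughly this shape (a hedged sketch; argument names and messages are assumptions):

import os

def check_pipeline_artifacts(onnx_path: str, qpc_dir: str, generated_texts: list) -> None:
    # Verify each stage of export -> compile -> infer left its artifact behind.
    assert os.path.isfile(onnx_path), "export did not produce an ONNX file"
    assert os.path.isdir(qpc_dir), "compile did not produce a QPC directory"
    assert len(generated_texts) > 0, "infer did not generate any output text"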

filtered_configs = [(config, name) for name, config in model_config_dict.items() if name in selected_model_names]
config_objects = [item for item in filtered_configs]
model_names = [name for _, name in filtered_configs]
return config_objects, model_names
Contributor:

This code loops over filtered_configs multiple times to extract config_objects and model_names, which is redundant. Can we reduce the number of loops?
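One possible single-pass version of the quoted snippet; the function wrapper and its name are assumed to keep the sketch self-contained.

def filter_configs(model_config_dict, selected_model_names):
    # Build both outputs in one loop instead of three comprehensions.
    config_objects, model_names = [], []
    for name, config in model_config_dict.items():
        if name in selected_model_names:
            config_objects.append((config, name))
            model_names.append(name)
    return config_objects, model_names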


data = []

for i in range(12):
Contributor:

Please avoid using magic numbers; we should define a constant for 12.

Contributor Author (abukhoy):

This is model specific, so we cannot generalize it now. The 12 is the number of layers of the model, the gpt2 model in this case. Actually, we are creating a custom IO file here, and it will no longer be needed once the compile CLI API is backed by the high-level (HL) API.
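Even while it stays model specific, the count can still be named instead of left as a bare 12 (the constant name is an assumption):

# gpt2 (small) has 12 transformer blocks; per the explanation above, the
# custom IO file gets one entry per layer.
GPT2_NUM_HIDDEN_LAYERS = 12

data = []
for layer_idx in range(GPT2_NUM_HIDDEN_LAYERS):
    ...  # build the per-layer custom IO entries here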

mxint8=model_setup.mxint8,
full_batch_size=model_setup.full_batch_size,
enable_qnn=model_setup.enable_qnn,
)
Contributor:

Yes, we should add a check for the custom IO file too.
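A hedged sketch of such a check; the file name custom_io.yaml and its location are assumptions:

import os

def check_custom_io_file(qpc_base_path: str) -> None:
    # Assert the generated custom IO file exists where compile expects it.
    custom_io_path = os.path.join(qpc_base_path, "custom_io.yaml")
    assert os.path.isfile(custom_io_path), "custom IO file was not generated"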

@@ -5,13 +5,15 @@
#
# -----------------------------------------------------------------------------

import copy
import json
import os
Contributor:

Can you add the changes from https://github.com/quic/efficient-transformers/pull/314/files#diff-fcc05ce2e83110153682cf6f488ad15825018072c3efd5e48266119cebe4e77a in this PR? It seems we somehow missed the swiftkv unit test in our repo. Since you are already making changes in this file, we can add those changes too.

Contributor Author (abukhoy):

I will add it then.
