
Catch invalid broadcasts in pointwise reduce fusion #3659

Merged: 8 commits into develop, Nov 29, 2024
Conversation

shivadbhavsar (Contributor)

No description provided.

@shivadbhavsar shivadbhavsar added the bugfix Fixes a bug found in the code. label Nov 26, 2024
@shivadbhavsar shivadbhavsar self-assigned this Nov 26, 2024
auto bstrides = b->get_shape().strides();

return std::all_of(
reduce_axes.begin(), reduce_axes.end(), [&](auto a) { return bstrides.at(a) == 0; });
Collaborator:
I think you need to check if the non-reduce axes are not being broadcasted as well.
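A minimal sketch of what the combined check could look like, assuming the helper keeps the is_valid_broadcast name used in this PR; it ignores the subtlety that a non-reduce axis of extent 1 can legitimately have stride 0, so treat it as an illustration rather than the merged code:

// Sketch only: a broadcast feeding the reduce is valid when every reduce
// axis is broadcast (stride 0) and no non-reduce axis is broadcast.
static bool is_valid_broadcast(instruction_ref b, const std::vector<std::size_t>& reduce_axes)
{
    auto bstrides = b->get_shape().strides();
    for(std::size_t axis = 0; axis < bstrides.size(); axis++)
    {
        bool is_reduce_axis = contains(reduce_axes, axis);
        bool is_broadcast   = bstrides.at(axis) == 0;
        if(is_reduce_axis != is_broadcast)
            return false;
    }
    return true;
}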

auto axes = reduce->get_operator().to_value().at("axes").to_vector<size_t>();
auto broadcast = r.instructions["broadcast"];
auto fbroadcast = r.instructions["final_broadcast"];
if(not(is_valid_broadcast(broadcast, axes) and is_valid_broadcast(fbroadcast, axes)))
Collaborator:
You should just check "broadcast" and not "final_broadcast" (which is the broadcast after the contiguous, if there is one).

-    match::find_matches(
-        mpm, find_reduce_pointwise{}, find_pointwise_reduce{}, find_reduce_reduce{});
+    match::find_matches(mpm, find_reduce_pointwise{}, find_pointwise_reduce{});
+    match::find_matches(mpm, find_reduce_reduce{});
Collaborator:
Why is this moved out?

Contributor (author):
Because when it hits the broadcast condition, it's not fusing the other fused_reduce in the input.

Collaborator:
We might need to add this to the matcher then.

auto broadcast = r.instructions["broadcast"];
if(not is_valid_broadcast(broadcast, axes))
return;
}
Collaborator:
This check needs to be applied to all matchers.
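For illustration, the guard could be repeated at the top of each matcher's apply along these lines (a sketch that assumes each matcher binds its reduce instruction as "reduce"; this is not the merged code):

// Sketch: the same early-out in find_reduce_pointwise, find_pointwise_reduce,
// and find_reduce_reduce before any fusion is attempted.
void apply(module_pass_manager& mpm, const match::matcher_result& r) const
{
    auto reduce = r.instructions.at("reduce");
    auto axes   = reduce->get_operator().to_value().at("axes").to_vector<size_t>();
    if(contains(r.instructions, "broadcast") and
       not is_valid_broadcast(r.instructions.at("broadcast"), axes))
        return;
    // ... the matcher's existing fusion logic continues here ...
}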

codecov bot commented Nov 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.20%. Comparing base (162b008) to head (4433e9b).
Report is 5 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3659      +/-   ##
===========================================
+ Coverage    92.18%   92.20%   +0.02%     
===========================================
  Files          513      513              
  Lines        21596    21653      +57     
===========================================
+ Hits         19908    19965      +57     
  Misses        1688     1688              


pfultz2 (Collaborator) commented Nov 26, 2024

So we could create a matcher like this to check the axes:

template<class M>
static auto match_broadcast_axes(M m)
{
    return make_basic_fun_matcher(
        // The explicit return type keeps both return statements consistent.
        [=](matcher_context& ctx, instruction_ref ins) -> optional<instruction_ref> {
            optional<instruction_ref> result = m.match(ctx, ins);
            if(contains(ctx.instructions, "broadcast"))
            {
                // Reject the match when the bound broadcast covers axes
                // that the reduce does not.
                auto axes      = ins->get_operator().to_value().at("axes").to_vector<size_t>();
                auto broadcast = ctx.instructions.at("broadcast");
                if(not is_valid_broadcast(broadcast, axes))
                    return nullopt;
            }
            return result;
        });
}

And then we could change match_broadcastable_input to use it like this:

static auto match_broadcastable_input(const std::string& op, const std::string& name)
{
    auto match_op                 = match::name(op)(used_once_except_broadcast()).bind(name);
    auto match_op_input           = any_input(match_op, match::used_once());
    auto broadcast_match_op_input = any_input(match_broadcast(match_op), match::used_once());
    return match::any_of(match_op_input, match_broadcast_axes(broadcast_match_op_input));
}

I don't know if there is a better way of doing this.

pfultz2 (Collaborator) commented Nov 27, 2024

Maybe a verify test could be added for this as well.

shivadbhavsar (Contributor, author) commented:

> Maybe a verify test could be added for this as well.

How can I force the verify test to run with MIGRAPHX_DISABLE_LAYERNORM_FUSION=1? Or will CI do that at some point?

migraphx-bot (Collaborator) commented:

| Test | Batch | Rate new (4433e9) | Rate old (162b00) | Diff |
|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,256.58 | 3,257.53 | -0.03% |
| torchvision-resnet50_fp16 | 64 | 6,988.90 | 6,981.74 | 0.10% |
| torchvision-densenet121 | 32 | 2,434.33 | 2,435.44 | -0.05% |
| torchvision-densenet121_fp16 | 32 | 4,056.93 | 4,088.61 | -0.77% |
| torchvision-inceptionv3 | 32 | 1,629.29 | 1,629.88 | -0.04% |
| torchvision-inceptionv3_fp16 | 32 | 2,742.90 | 2,745.06 | -0.08% |
| cadene-inceptionv4 | 16 | 764.40 | 764.50 | -0.01% |
| cadene-resnext64x4 | 16 | 810.75 | 810.97 | -0.03% |
| slim-mobilenet | 64 | 7,390.45 | 7,464.92 | -1.00% |
| slim-nasnetalarge | 64 | 208.49 | 208.49 | 0.00% |
| slim-resnet50v2 | 64 | 3,443.34 | 3,440.82 | 0.07% |
| bert-mrpc-onnx | 8 | 1,149.61 | 1,145.86 | 0.33% |
| bert-mrpc-tf | 1 | 468.61 | 468.54 | 0.01% |
| pytorch-examples-wlang-gru | 1 | 417.22 | 418.01 | -0.19% |
| pytorch-examples-wlang-lstm | 1 | 408.92 | 408.51 | 0.10% |
| torchvision-resnet50_1 | 1 | 771.66 | 778.29 | -0.85% |
| cadene-dpn92_1 | 1 | 396.33 | 396.81 | -0.12% |
| cadene-resnext101_1 | 1 | 382.10 | 382.45 | -0.09% |
| onnx-taau-downsample | 1 | 345.54 | 345.96 | -0.12% |
| dlrm-criteoterabyte | 1 | 33.33 | 33.33 | -0.01% |
| dlrm-criteoterabyte_fp16 | 1 | 52.71 | 52.75 | -0.07% |
| agentmodel | 1 | 8,127.97 | 8,309.93 | -2.19% |
| unet_fp16 | 2 | 58.86 | 58.83 | 0.06% |
| resnet50v1_fp16 | 1 | 956.61 | 942.38 | 1.51% |
| resnet50v1_int8 | 1 | 1,014.63 | 1,025.97 | -1.11% |
| bert_base_cased_fp16 | 64 | 1,170.12 | 1,170.17 | -0.00% |
| bert_large_uncased_fp16 | 32 | 363.32 | 363.14 | 0.05% |
| bert_large_fp16 | 1 | 198.84 | 198.79 | 0.02% |
| distilgpt2_fp16 | 16 | 2,200.20 | 2,200.72 | -0.02% |
| yolov5s | 1 | 532.38 | 532.21 | 0.03% |
| tinyllama | 1 | 43.43 | 43.63 | -0.44% |
| vicuna-fastchat | 1 | 173.03 | 173.33 | -0.17% |
| whisper-tiny-encoder | 1 | 417.77 | 417.76 | 0.00% |
| whisper-tiny-decoder | 1 | 435.04 | 428.46 | 1.53% |

This build is OK for merge ✅

migraphx-bot (Collaborator) commented:

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
✅ yolov5s: PASSED: MIGraphX meets tolerance
✅ tinyllama: PASSED: MIGraphX meets tolerance
✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

pfultz2 (Collaborator) commented Nov 27, 2024

> How can I force the verify test to run with MIGRAPHX_DISABLE_LAYERNORM_FUSION=1? Or will CI do that at some point?

Well, I am hoping to remove layernorm fusion soon in #3465. For now the verify tests can run with layernorm fusion enabled, and when it gets removed in #3465, this test will check that it works correctly.

causten merged commit 241e24e into develop on Nov 29, 2024 (43 of 44 checks passed).
causten deleted the fix_reduce_fusion branch on Nov 29, 2024 at 15:56.