
Official Review #1

Open
micronet-challenge-submissions opened this issue Oct 23, 2019 · 26 comments

@micronet-challenge-submissions (Collaborator)

Hello! Thanks so much for your entry!

When I try to run eval, I get errors loading the weight_path. It looks like you have a local path hardcoded into the script there. Is that file available somewhere in this repo that I'm not seeing? Or is it not necessary?

Trevor

@micronet-challenge-submissions (Collaborator, Author)

Ping. Please let us know about this issue as soon as you can!

Trevor

@tilmto (Owner) commented Oct 25, 2019

Sorry for the mistake. I forgot to change that local path. I have corrected it to the right one, so you can try to run eval now. Please contact me through my email [email protected] if you encounter any other issues. Thank you very much!

@micronet-challenge-submissions (Collaborator, Author)

Thanks for the fix! I've successfully validated your model accuracy. A few questions about your scoring:

  1. Is the reason you're calling reduce_mean for the parameter counts that the parameters in different channels can have different bit-widths?

params += (model_info[key][0]['expand']*tf.reduce_mean(self.quant_info[new_key]['expand']['weight'])

  2. Why are only the projection parameters divided by 32 here?

+ model_info[key][0]['project']*tf.reduce_mean(self.quant_info[new_key]['project']['weight'])) / 32

  3. For all of your conv/matmul operations, it looks like you're counting both multiplication and addition as being performed in reduced precision:

flops += model_info[key][1]['total'] * 8 / 32

However, from your code it appears that you're performing "fake quantization" and rounding the input weights and activations to each layer before performing these operations in FP32. With this scheme, the additions should be counted as occurring in full precision, because the results of the multiplications will be FP32 and those FP32 values will then be summed without rounding to the reduced-precision format.

  4. Swish activation functions should be counted as four operations (see example here). Also, you round the input operand to reduced precision, but all operations after the first in the swish should be counted as full precision. In summary, swish should be counted as four operations, only one of which is reduced precision (a small sketch of this accounting follows this list).

  5. Why is your FLOP count scaled by 2 here:

self.flops = flops*2 + flops_swish
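As a rough illustration of the accounting in point 4 (a sketch only; swish_op_counts is a hypothetical helper, not something in your scoring script), assuming swish is evaluated as x / (1 + exp(-x)):

def swish_op_counts(num_activations):
    # Four elementwise ops per activation: negate, exp, add 1, divide.
    # Only the negation sees the rounded (reduced-precision) input; the exp,
    # add, and divide all operate on full-precision FP32 intermediates.
    reduced_precision_ops = 1 * num_activations
    full_precision_ops = 3 * num_activations
    return reduced_precision_ops, full_precision_ops

print(swish_op_counts(1000))  # (1000, 3000) for a layer with 1000 activations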

Thanks!
Trevor

@tilmto (Owner) commented Oct 25, 2019

  1. Yes.
  2. I think you misread the brackets there; all of the parameters are divided by 32.
  3. The result of the multiplication is still a quantized integer because all of the weights and activations are quantized to integers. Compared with the original FP32 + FP32 case, I think my way of calculating the metric makes sense. Besides, the latency and energy cost of addition is much lower than that of multiplication.
  4. Swish is net*sigmoid(net). I think the multiplication can be fused into the sigmoid due to the numerator 1, i.e. net/(1+exp(-net)). Here net is quantized and 1 is quantized; the only difference is the exponential part, which would not make a difference if quantized. I once emailed you about this function, and I think I misunderstood that all of the operations could be directly counted as quantized operations, since this would not make a difference to the accuracy.
  5. Because the FLOPs I calculate before this step are actually MACs, i.e. multiply-accumulates. I think multiplication and addition need to be counted separately, right?

Please contact me if you have more questions, thanks!

@micronet-challenge-submissions (Collaborator, Author)

  1. Sounds good.

  2. Ah, you're correct. Thanks!

  3. The cost of addition vs. multiplication depends on the numerical format, which our rules are agnostic of. For simplicity, we count both as equal cost. The output of the multiplication being a quantized integer value would be true if it can be represented in the 23-bit mantissa of an FP32 value. However, the rounding procedure that you apply prior to each operation re-scales the quantized values after rounding and clamping them:

descrete_input_data = tf.div(net, scale_node, name="discrete_data")

This scaling can cause the outputs of your multiplications to be non-representable as reduced-precision integer values. These floating-point values are then all summed together, which does not properly model the error introduced by performing reduced-precision accumulation. According to the competition rules, these additions should be counted as FP32. Please update your scoring accordingly.

  4. You only round the input to the operation, so the negation can be counted as reduced precision. The exponential will then output a full 32-bit value, which you do not round to demonstrate that it does not affect model quality. You then add 1 to a 32-bit value, which counts as a 32-bit operation, and divide a quantized value by a 32-bit value, which also counts as a 32-bit operation. Please update your scoring to reflect this as well.

  5. They do need to be counted separately, thanks for clarifying!

Thanks!
Trevor

@tilmto (Owner) commented Oct 25, 2019

  1. I think you can refer to this paper for why the output can still be low-precision integers. It's a common way of calculating FLOPs in many low-precision inference papers. Please correct me if I'm wrong.
  2. Can I re-upload a new checkpoint with a quantized-exp swish? In my experiments, the precision of this part does not matter. It's totally OK if this is not allowed by your regulations.

Thank you for your careful check!

@micronet-challenge-submissions (Collaborator, Author)

  1. The output can definitely still be low-precision integers, but when we're emulating reduced-precision arithmetic in FP32, as we are here, you need to be careful about rounding so that the evaluation procedure appropriately models true reduced-precision arithmetic. The rules of the competition are designed to take this into account, and in your case there are no explicit rounding steps or checks, prior to the additions being performed, that verify the necessary conditions are met.

  2. To be fair to the other competitors, we will stick with the checkpoint that you submitted prior to the deadline. It's very cool that you verified this works, though, and if you want to upload it anyway we'd be interested to see it!

Thanks for your responses! If you can update your score with these two changes it looks like everything else checks out!

Trevor

@tilmto (Owner) commented Oct 25, 2019

  1. I still haven't got your point. I think Equation 7 in the above paper can be emulated accurately by our quantization procedure. This method is exactly the same as TensorFlow's official fake quant node and has been verified to achieve the same accuracy when converted to TFLite format and executed on mobile devices. You can double-check it. When you mention "explicit rounding", do you mean that after calculating each partial sum in one convolution operation, we need to round it to an integer before summing them up into a new element of the output activation?
  2. Got it!

Sorry for the late response, I was in a meeting.

@micronet-challenge-submissions (Collaborator, Author)

Here's an example: for simplicity, in the quant_info.json file under "Conv", I found an example channel where the weights and activations are both 8-bit. In your evaluation script, I dumped the scales and biases used for this channel during evaluation:

w_bits = a_bits = 8
w_scale = 23.5443649
w_bias = -107
a_scale = 3.59408283
a_bias = -1

We have weight value w and activation value x. During evaluation and prior to the convolution, we compute:

w' = clip(round(w * w_scale) - w_bias, 0, 255) / w_scale
x' = clip(round(x * a_scale) - a_bias, 0, 255) / a_scale

If w has value 1.23, w' will have value:
clip((round(1.23*23.5443649) - -107), 0, 255) / 23.5443649
= 5.776

If x has value 3.21, x' will have value:
clip((round(3.21*3.59408283) - -1), 0, 255) / 3.59408283
= 3.617

When we multiply these two values, we get the value 20.891792. This is not an integer value. When we sum a number of these together, it will be done with full FP32 additions. We don't know what the difference in quality is between this and true integer accumulation, but if your model missed even a single additional image it would no longer meet the accuracy threshold. According to the competition rules, these additions should be counted as full 32-bit additions.
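For reference, here is a minimal sketch reproducing this arithmetic in plain Python (fake_quant is an illustrative helper, not your evaluation code; the values are the ones dumped above):

def fake_quant(x, scale, bias, bits=8):
    q = min(max(round(x * scale) - bias, 0.0), 2.0 ** bits - 1.0)  # round and clip to [0, 255]
    return q / scale                                               # re-scale back to floating point

w_prime = fake_quant(1.23, 23.5443649, -107)  # ~5.776
x_prime = fake_quant(3.21, 3.59408283, -1)    # ~3.617
print(w_prime * x_prime)                      # ~20.89, not an integer; the FP32 accumulator
                                              # then sums such values directly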

@tilmto (Owner) commented Oct 25, 2019

Yes, it's just a simulation, so the values are not true integers. I think we have a misunderstanding here. Are you familiar with this quantization-aware training paper, especially the derivation of Equation 7, and with how the TensorFlow framework inserts fake quant nodes when creating the training graph? This is just a simulation of Equation 7, so the weights/activations are not true integers, but they successfully simulate the quantization effect just as Equation 7 does, which is well supported by TFLite and mobile devices. It's a common simulation tool in many related papers; you can refer to the implementations of papers such as WAGE and Scalable 8-bit training.

@tilmto (Owner) commented Oct 25, 2019

If you have doubts about the gap between fake quant nodes and TFLite deployment on mobile devices, I think both our results and many results from your company have shown no difference. As for the metric/accuracy tradeoff you mentioned, I can definitely improve the accuracy by increasing the metric a little; I just wanted to give you the checkpoint with the best metric.
Thanks for your patience!

@micronet-challenge-submissions (Collaborator, Author)

We understand that this is standard procedure for evaluating the performance of quantized models. It's also standard procedure to use higher-precision accumulators when performing actual quantized inference. From the QaT paper you linked: "Accumulating products of uint8 values requires a 32-bit accumulator".

The competition rules are designed around this system. The additions in this case are considered to be 32-bit, and should be counted as such. Please update your score to reflect this.

@tilmto (Owner) commented Oct 26, 2019

"Accumulating products of uint8 values requires a 32-bit accumulator" is just a way of hardware implementation, doesn't mean the addition is performed between 2 32-bit numbers. You can also use different hardware design like chunked-based accumulation. And for the QaT paper, their precision is 8-bit, which is much higher than us so a 32-bit accumulator is required. What we focus on is algorithm part, and there are many hardware tricks target at this like https://arxiv.org/abs/1901.06588.

We have quantized the weights and activations to 2.94 bits and 4.87 bits on average. Our metric will increase a lot due to this special rule, since most of the savings in FLOPs cannot be taken into account. In this accounting the FLOPs of "addition" are 8 times those of "multiplication", dominating the total metric, which does not make sense at all since multiplication is actually much more expensive than addition. Then the metric can never reflect the true performance of a method on hardware.

I wonder about the other teams' metrics. I believe the teams with low metrics all use quantization, so do they all need to count additions in that way? I hope you can discuss this further with your team and come up with a fair solution.

Thanks a lot for the long discussion!

@celinerice (Collaborator) commented Oct 26, 2019

Hi Trevor,

Thanks a lot for your questions and the long discussion with us! As this challenge emphasizes that "Our goal is for our scoring metric to be indicative of performance that could be achieved in hardware", we would like to argue that 1) using a 32-bit accumulator for all additions and, at the same time, 2) treating the computational cost of a 32-bit accumulator the same as that of a 32-bit multiplier greatly overcounts the models' computational cost. We would greatly appreciate it if you could check our justification below:

  1. As we know, a 1-bit full adder is a canonical building block of arithmetic units, and it is commonly used as a measure of computational cost in both the machine learning and hardware communities. Specifically, multiplying two N-bit numbers requires N^2 full adders while adding two N-bit numbers requires N full adders (N^2/N = 32 when N = 32!), and Eq. 3 in this ICML 2017 paper formulates the required number of full adders for a D-dimensional dot product between activations and weights. Hopefully you agree that your way of 1) assigning 32-bit accumulators to the additions in our model and 2) using the cost of 32-bit multipliers for these 32-bit accumulators can greatly deviate from the model's actual computational cost. I understand that the most accurate way is to quantify the total number of full adders. For example, assuming that we use 32-bit accumulators for all of the additions in our model, the corresponding computational cost of a 32-bit addition is similar to that of a 6-bit multiplication (to be more precise, 5.65bit*5.65bit; see the arithmetic sketch after this list).

  2. The 32-bit accumulator is indeed one potential hardware design choice for the additions in our model; however, such a worst-case design is typically adopted only for ease of design when the addition cost is negligible. When the addition cost is not trivial, it is more common to use adder trees, in which the number of addition blocks halves at each successive adder depth while their width increases by one bit, culminating in the final multi-bit output (see Fig. 6 in this reference). With such a commonly used adder tree design, additions would need about 10 bits on average in our model and thus no longer be the bottleneck of convolutions, which is also consistent with the commonly recognized observation that "multiplications greatly dominate the computational cost of DNN convolutions" (see reference 1 and reference 2).
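For reference, a quick arithmetic check of point 1 under the full-adder cost model above (a sketch of the reasoning only, not a statement of the official scoring rules):

import math

full_adders_per_32bit_add = 32  # cost model: an N-bit add uses ~N full adders,
                                # an N-bit x N-bit multiply uses ~N^2 full adders
equivalent_multiplier_bits = math.sqrt(full_adders_per_32bit_add)
print(equivalent_multiplier_bits)  # ~5.66, i.e. one 32-bit add costs about as much
                                   # as a 5.65-bit x 5.65-bit multiply in this model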

We believe that the above justifications hold for all participants’ models in this challenge. We look forward to your comments.

Yingyan

@micronet-challenge-submissions (Collaborator, Author)

Yonggan & Yingyan,

Apologies for the delay; we have a lot of entries to get through. :)

We agree that the scoring system does not accurately reflect the relative cost of integer multiplications and additions. However, the competition is not limited to integer arithmetic. As we've explained, we decided to keep all operations at the same cost to avoid making the first iteration of the competition too complex.

We allow entries to count their additions as if the minimum number of bits necessary were used, provided they accurately model these additions in their evaluation code. There are two conditions for meeting this requirement:

  1. The number of bits needed to represent the results of multiplications and additions exactly must be less than or equal to 23. This ensures that the values can be exactly represented in the 23-bit IEEE FP32 mantissa. For a given linear operation where the inputs have bit-widths A and B and the reduction dimension is of size K, this means that A + B + log2(K) <= 23. This needs to be verified for every neuron/channel in the model (a small check is sketched after this list).
  2. Weights and activations need to be input into linear operations in their integer form, such that their values are exactly represented in the mantissa. This means that re-scaling to FP32 must occur after the linear operation.
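For concreteness, a minimal sketch of how condition (1) could be checked (the bit-widths and reduction size below are hypothetical examples, not values taken from your quant_info.json):

import math

def products_exact_in_fp32(a_bits, b_bits, reduction_size):
    # The accumulated sum of A-bit x B-bit products over a reduction of size K
    # stays exactly representable in the 23-bit FP32 mantissa only if
    # A + B + log2(K) <= 23.
    return a_bits + b_bits + math.log2(reduction_size) <= 23

print(products_exact_in_fp32(8, 8, 3 * 3 * 32))  # False: 8 + 8 + ~8.2 > 23
print(products_exact_in_fp32(4, 5, 3 * 3 * 32))  # True:  4 + 5 + ~8.2 <= 23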

While you could check condition 1, condition 2 requires code changes and unfortunately we can't allow you to change your entry as it wouldn't be fair to the other competitors.

Trevor

@tilmto (Owner) commented Oct 29, 2019

Hi Trevor,

I think the second point you mentioned needs discussion; it only targets the traditional quantization method k*[w/k], where k is the minimal value in the low-precision representation. As I have mentioned, a quantization-range-based quantization method, i.e. one normalized by the min/max values of the full-precision weights, will always use full-precision inputs to the convolution during the simulation process, but the quantization effect has already been fused in beforehand, so it correctly simulates Equation 7 (the forward process on hardware) in the QaT paper, which definitely takes integer inputs.

Currently, many popular low-precision training methods aiming to simulate integer-input convolutions use this quantization-range-based quantization method, such as WAGE and Scalable 8-bit training. This method is also adopted by TensorFlow's official quantization approach, which is supported by TFLite on mobile devices. These facts and the derivation of (1)-(7) in the QaT paper both show the correctness of this simulation method. I hope you can treat traditional quantization and quantization-range-based quantization equally.

Also, the accumulator implementation we mentioned above works for all fixed-point numbers, not just integers, and it is common sense that addition operations shouldn't be the bottleneck of a neural network.

To make the discussion more efficient, I want to know what your main question is. Our main point is that our simulation method (also the QaT method) can correctly simulate an integer-input convolution on real hardware. So do you have doubts about the quantization-range-based quantization method in the QaT paper itself, or are you only questioning our implementation?

Thanks a lot, and I look forward to your reply.

Yonggan

@micronet-challenge-submissions (Collaborator, Author)

I'm not sure what you mean by (2) targeting only quantization methods of the form k*[w/k]. In your approach, which uses both the min and max, you convert the value to an integer in the range [0, 2**nbits - 1] here:

net = tf.clip_by_value(net - quant_bias, clip_value_min=0., clip_value_max=tf.reshape(2.**bits-1., [1,1,1,-1])) + quant_bias

and then scale it back from the integer representation to the rounded floating-point representation here:

descrete_input_data = tf.div(net, scale_node, name="discrete_data")

If you waited to apply the re-scaling until after the linear operation, as (2) suggests, and your model satisfied point (1), then the simulation of fixed-point arithmetic would be exact. We would thus accept that scoring under the rules of the competition. However, you scale back to the FP representation prior to executing the kernel, so the arithmetic is approximate. Under the rules of the competition this is acceptable for counting the multiplications as reduced precision, but because the inputs to each addition inside the kernel are not rounded prior to executing the addition instructions, we do not allow those additions to be counted as reduced precision.
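For illustration, here is a toy NumPy sketch of the distinction (the integer codes are hypothetical and this is not taken from your evaluation script; the scales are the ones dumped earlier):

import numpy as np

w_scale, x_scale = 23.5443649, 3.59408283
q_w = np.array([136.0, 12.0, 200.0])  # integer codes, exactly representable in FP32
q_x = np.array([13.0, 7.0, 1.0])

# Re-scaling applied after the linear operation: the accumulation runs over exact
# integers, so (given condition 1) it models fixed-point arithmetic exactly.
exact = np.dot(q_w, q_x) / (w_scale * x_scale)

# Re-scaling applied before the linear operation: the products are non-integer
# FP32 values and the additions inside the kernel are full FP32.
approx = np.dot(q_w / w_scale, q_x / x_scale)

print(exact, approx)  # numerically close here, but the second form does not model
                      # the error of true reduced-precision accumulation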

Perhaps this more rigorous standard of proof is a shortcoming of the current rules and we should revisit it in future iterations of the competition. We'd be happy to discuss this with you further. However, for practical purposes we need to complete scoring of the entries by the end of this week and your current score does not comply with the rules. We would appreciate it if you would update your score.

@tilmto (Owner) commented Oct 29, 2019

Hi Trevor,

I think I see your question now. You may have some misunderstanding of the hardware execution process for quantization-range-based quantization. We can refer to the QaT paper to explain this more clearly.

Equation 7 there is the calculation of a convolution during inference on the hardware after applying quantization-range-based quantization; you can see that q1 and q2 are the quantized inputs (weights and activations).
[Equation 7 from the QaT paper]
This is derived from Equation 3, where the operands are S(q - z), full-precision numbers, which is exactly the input in our implementation.
[Equation 3 from the QaT paper]
The only difference between these two equations is M = S1*S2/S3, which the paper claims is maintained with 30 bits of precision. So we can conveniently simulate the hardware situation in Equation 7 based on Equation 3, and that's exactly the core idea of the QaT paper. That's also why TensorFlow uses this method even for TFLite deployment on mobile devices.

You can double check the derivation in the paper to confirm it. Thanks!

Yonggan

tilmto closed this as completed Oct 29, 2019
tilmto reopened this Oct 29, 2019
@micronet-challenge-submissions (Collaborator, Author)

Apologies, I think the confusion stems from the fact that the methodology I mentioned applies to quantization approaches where the zero-point for both weights and activations is 0. You can see in equation 4 from QaT that the product decomposes into what I'm describing, with the scale applied after the inner product. Thus this is not an option for your approach.
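As a small illustration of that decomposition (a sketch with hypothetical symmetric quantization, i.e. zero-points of 0 for both operands, and toy values):

import numpy as np

S1, S2 = 0.05, 0.1
q1 = np.array([3, -7, 12], dtype=np.int64)  # integer weight codes
q2 = np.array([5, 2, -1], dtype=np.int64)   # integer activation codes

r_scaled_after = (S1 * S2) * np.dot(q1, q2)  # inner product stays in the integer domain;
                                             # the scale is applied once, afterwards
r_reference = np.dot(S1 * q1.astype(float), S2 * q2.astype(float))
print(np.isclose(r_scaled_after, r_reference))  # True: with zero zero-points the scales factor out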

The issue with your evaluation approach is not the factorization of the quantized computation; it's the use of floating point to simulate fixed-point computation. We allow this under the circumstances described in the rules, but your model does not meet these criteria, given that you do not round prior to performing the additions inside the convolution kernels. We ask that you please update your score to reflect this.

@celinerice (Collaborator) commented Oct 29, 2019 via email

@micronet-challenge-submissions (Collaborator, Author)

Thank you for the discussion as well.

You are certainly correct that assuming 32 bits for all additions will overestimate the cost of your model. Unfortunately, there isn't anything we can do about this at this point. Your evaluation procedure doesn't comply with the requirements of the competition rules, and we can't change the rules or make an exception while being fair to the other competitors.

Trevor

@celinerice (Collaborator) commented Oct 30, 2019 via email

@micronet-challenge-submissions (Collaborator, Author)

Yingyan & Yonggan,

We're hoping to finalize the results tomorrow so that we can release them early next week. Could you advise whether or not you plan to update your score?

Trevor

@tilmto (Owner) commented Nov 1, 2019

Hi Trevor,

Sorry for the late update. I'm actually still running experiments to verify the correctness of our simulation, but I will provide the metric under your rules first. I added the FLOPs of the swish part and changed the FLOPs of the additions to full precision. Here are our metric and each part's contribution to the final metric:

Final Metric: 0.46779
Params: 0.07742 (0.534M)
Flops of Multiplication: 0.04878 (57.072M)
Flops of Addition: 0.32974 (385.796M, which heavily dominates the metric)
Flops of Swish: 0.01184 (13.855M)

Thank you very much for the reminder!

Yonggan

@micronet-challenge-submissions (Collaborator, Author)

Thanks Yonggan!

Trevor
