Commits


Vincent Wang authored and GitHub committed 1128882bfd2
Quantize Bias for Conv/Gemm on Quantized Model (#22889)

Some quantized models leave the bias of Conv/Gemm nodes in float rather than quantizing it. This PR creates a sub-graph that quantizes the bias of such Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zero point = 0. This is done only when the bias is an initializer, so that ConstantFolding will fold the sub-graph into a real quantized int32 bias initializer during the next round of graph optimization.
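A minimal sketch of the bias-quantization math the sub-graph implements, using NumPy for illustration; the helper name quantize_bias and the example values are hypothetical, and this is not the actual ONNX Runtime graph construction:

```python
import numpy as np

def quantize_bias(bias_fp32, input_scale, weight_scale):
    """Quantize a float bias to int32 with scale = input_scale * weight_scale, zero point = 0.

    Illustrative only; in the PR this computation is expressed as a sub-graph
    that ConstantFolding later folds into an int32 initializer.
    """
    bias_scale = input_scale * weight_scale              # per-tensor or per-channel scale
    q = np.round(bias_fp32 / bias_scale)                 # zero point is 0, so no offset is added
    info = np.iinfo(np.int32)
    return np.clip(q, info.min, info.max).astype(np.int32)

# Example: per-channel weight scales broadcast against a 1-D bias (hypothetical values)
bias = np.array([0.25, -1.5, 3.0], dtype=np.float32)
x_scale = np.float32(0.02)
w_scale = np.array([0.1, 0.05, 0.2], dtype=np.float32)
print(quantize_bias(bias, x_scale, w_scale))
```

Restricting the rewrite to bias initializers is what makes the approach cheap: since the inputs to the sub-graph are all constants, constant folding can replace it with a single precomputed int32 initializer rather than leaving extra nodes in the graph at runtime.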