Commits


Young Jin Kim authored and GitHub committed e9057d2e498
ZCode FastFormers changes (#5827)

* Add FBGEMM submodule
* Add FBGEMM-based per-channel quantization (per-tensor vs. per-column granularity is sketched in the first example below)
* Add missing logic for pre-layernorm transformer model fusion
* Add support for the structured pruning architecture (FastFormers)
* Fix the Windows build
* Add a default behavior when head_size is not present, for backward compatibility (see the second sketch below)
* Remove FBGEMM and default to tensor-wise quantization; column-wise quantization will be enabled later
* Fix some unit test errors
* Fix a Windows compile error and unit test errors
* Delete the option removed from upstream
* Address review comments and fix a merge error
* Remove commented-out code
* Add non-zero zero-point (zp) support (see the third sketch below)
* Support A and B scales with any dimensions
* Fix build breaks
* Fix a warning in MSVC
* Fix a bug where original float value names were not checked before treating a value as missing
* Clean up head-size handling
* Clean up Python tools
* Enable per-column quantization
* Fix a quantized-weight cleanup bug
* A few rounds of code clean-up
* Change an option name
* Update a default value
* Rename option and parameter names
* Change a missed argument name (follow-up to the rename)
* Add tests for the quantization options for Attention and MatMul

Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
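Several items above concern quantization granularity: the PR starts from FBGEMM-based per-channel quantization, falls back to tensor-wise quantization when FBGEMM is removed, and later enables per-column quantization. A minimal NumPy sketch of the difference, assuming symmetric int8 quantization of a MatMul weight; the function names here are illustrative, not ONNX Runtime's API:

```python
import numpy as np

def quantize_per_tensor(w: np.ndarray):
    """Symmetric int8 quantization with one scale for the whole tensor."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_per_column(w: np.ndarray):
    """Symmetric int8 quantization with one scale per column, so a single
    outlier column no longer widens the scale (and the rounding error)
    for every other column."""
    scale = np.maximum(np.max(np.abs(w), axis=0) / 127.0, 1e-8)  # shape: (cols,)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 16).astype(np.float32)
q_t, s_t = quantize_per_tensor(w)
q_c, s_c = quantize_per_column(w)
# Dequantization broadcasts: a scalar scale per tensor, a (cols,) vector per column.
print("per-tensor error:", np.abs(q_t * s_t - w).max())
print("per-column error:", np.abs(q_c * s_c - w).max())
```

The "Support A and B scales with any dimensions" item generalizes the same idea: at dequantization time the scale is simply broadcast against the quantized tensor, so a scalar, a per-column vector, or any other broadcastable shape can serve as the scale for either MatMul input.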
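The head_size item follows from structured pruning: FastFormers-style pruning removes whole attention heads, so after pruning the Q/K/V width no longer has to equal the input hidden size divided by the original head count, and the head size must be carried explicitly. Older, unpruned models without that value fall back to the previous derivation. A hedged sketch of such a fallback; resolve_head_size is a hypothetical helper, not the actual ONNX Runtime code:

```python
from typing import Optional

def resolve_head_size(hidden_size: int, num_heads: int,
                      head_size: Optional[int] = None) -> int:
    """Backward-compatible default: use an explicit head_size when the model
    provides one (e.g. a head-pruned FastFormers model); otherwise derive it
    from hidden_size as older, unpruned models always did."""
    if head_size is not None:
        return head_size
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads "
                         "when head_size is absent")
    return hidden_size // num_heads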
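The non-zero zero-point item refers to affine (asymmetric) quantization, q = round(x / scale) + zp, where zp shifts the integer grid so an asymmetric float range can use all of the available levels; a symmetric scheme effectively pins zp to 0. A minimal uint8 sketch, again illustrative rather than the shipped kernels:

```python
import numpy as np

def quantize_affine_u8(x: np.ndarray):
    """Affine uint8 quantization with a generally non-zero zero point."""
    lo = min(float(x.min()), 0.0)            # keep 0.0 exactly representable
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zp = int(round(-lo / scale))             # non-zero whenever lo < 0
    q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return q, scale, zp

def dequantize_affine(q: np.ndarray, scale: float, zp: int) -> np.ndarray:
    return (q.astype(np.float32) - float(zp)) * scale

x = np.random.uniform(-0.5, 3.0, size=8).astype(np.float32)  # asymmetric range
q, scale, zp = quantize_affine_u8(x)
print(zp, np.abs(dequantize_affine(q, scale, zp) - x).max())
```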