Public / onnxruntime / 4819fbf31c3

Chen Fu authored and GitHub committed 4819fbf31c3, 31 Oct 2023

Augment blockwise quantization (#18101)

### Description
Augment blockwise 4-bit quantization (plain CPU implementation).

### Motivation and Context

Allow column-wise or row-wise blocks. Experiments show that row-wise
quantization of LLM weight matrices achieves better precision.

Added tests for quantization and dequantization code.
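
### Illustrative sketch

To make the block orientation concrete, here is a minimal pure-Python sketch of blockwise 4-bit quantization and dequantization. It is not the code from this commit. All function names are hypothetical, the min/max affine scheme (one scale and one offset per block) is an assumption, and so is the reading of "row-wise" as a block of consecutive elements within one row and "column-wise" as a block of consecutive elements within one column.

```python
def quantize_block(values):
    """Map a list of floats to 4-bit codes (0..15) plus a per-block scale and offset.

    Assumed scheme: affine min/max quantization; not taken from the commit.
    """
    lo, hi = min(values), max(values)
    # A constant block has zero range; any non-zero scale works in that case.
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Reconstruct approximate floats from 4-bit codes."""
    return [c * scale + lo for c in codes]


def split_blocks(matrix, block_size, columnwise):
    """Yield (indices, values) for each block of at most block_size elements.

    columnwise=True  -> a block runs down one column.
    columnwise=False -> a block runs along one row.
    """
    rows, cols = len(matrix), len(matrix[0])
    if columnwise:
        for c in range(cols):
            for r0 in range(0, rows, block_size):
                idx = [(r, c) for r in range(r0, min(r0 + block_size, rows))]
                yield idx, [matrix[r][cc] for r, cc in idx]
    else:
        for r in range(rows):
            for c0 in range(0, cols, block_size):
                idx = [(r, c) for c in range(c0, min(c0 + block_size, cols))]
                yield idx, [matrix[rr][c] for rr, c in idx]


def roundtrip(matrix, block_size, columnwise):
    """Quantize then dequantize every block; return the reconstructed matrix."""
    out = [[0.0] * len(matrix[0]) for _ in matrix]
    for idx, values in split_blocks(matrix, block_size, columnwise):
        codes, scale, lo = quantize_block(values)
        for (r, c), v in zip(idx, dequantize_block(codes, scale, lo)):
            out[r][c] = v
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy matrix whose rows differ in magnitude by factors of 10.
weights = [[s * j for j in range(4)] for s in (0.01, 0.1, 1.0, 10.0)]
row_err = max_abs_error(weights, roundtrip(weights, 4, columnwise=False))
col_err = max_abs_error(weights, roundtrip(weights, 4, columnwise=True))
print("row-wise error:", row_err)
print("column-wise error:", col_err)
```

The toy matrix is constructed so that each row has a narrow value range while each column spans several orders of magnitude, so row-wise blocks reconstruct it more accurately here. That is a property of this contrived input, not a reproduction of the experiments mentioned above.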