Commits


Chen Fu authored and GitHub committed 4819fbf31c3
Augment blockwise quantization (#18101)

### Description
Augment blockwise 4-bit quantization (plain CPU implementation).

### Motivation and Context
Allow column-wise or row-wise blocks. Experiments show that row-wise quantization of LLM weight matrices achieves better precision. Added tests for the quantization and dequantization code.
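
To illustrate the difference between column-wise and row-wise blocks, here is a minimal pure-Python sketch. It is not the code from this commit: the function names, the symmetric signed 4-bit scheme (codes in [-8, 7], one scale per block), and the reading of "column-wise" as a block running down one column are all assumptions made for illustration.

```python
def block_key(r, c, block_size, columnwise):
    # Assumption: a column-wise block spans `block_size` consecutive rows
    # within one column; a row-wise block spans `block_size` consecutive
    # columns within one row.
    return (r // block_size, c) if columnwise else (r, c // block_size)


def quantize_blockwise(matrix, block_size, columnwise):
    """Quantize a 2-D list of floats to signed 4-bit codes with one scale per block."""
    rows, cols = len(matrix), len(matrix[0])

    # Pass 1: find the largest magnitude in each block.
    max_abs = {}
    for r in range(rows):
        for c in range(cols):
            key = block_key(r, c, block_size, columnwise)
            max_abs[key] = max(max_abs.get(key, 0.0), abs(matrix[r][c]))

    # Map the largest magnitude in a block to code 7; all-zero blocks get scale 1.
    scales = {k: (m / 7.0 if m > 0 else 1.0) for k, m in max_abs.items()}

    # Pass 2: round each value to the nearest code and clamp to the 4-bit range.
    codes = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = block_key(r, c, block_size, columnwise)
            q = round(matrix[r][c] / scales[key])
            codes[r][c] = max(-8, min(7, q))
    return codes, scales


def dequantize_blockwise(codes, scales, block_size, columnwise):
    """Reconstruct approximate floats by multiplying each code by its block scale."""
    rows, cols = len(codes), len(codes[0])
    return [
        [codes[r][c] * scales[block_key(r, c, block_size, columnwise)]
         for c in range(cols)]
        for r in range(rows)
    ]
```

In this sketch the same matrix yields different block groupings, and therefore different scales, depending on the `columnwise` flag. Which layout gives lower error depends on how magnitudes are distributed across the rows and columns of the weights.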