Public / onnxruntime / 50bda44a705

Commits

kunal-vaishnavi authored and GitHub committed 50bda44a70502 Oct 2024
Fix equation in MatMulNBits op spec (#22253)

### Description
This PR fixes an equation in the MatMulNBits op spec. The old formula is
stated as

```
[CeilDiv((N * n_blocks_per_col + 1) * bits, 8)]
```

but it should be stated as

```
[N * CeilDiv(n_blocks_per_col * bits, 8)]
```

or as

```
[N * FloorDiv((n_blocks_per_col + 1) * bits, 8)]
```

### Motivation and Context
For models such as ChatGLM where the column size is odd, the division
math can be off. For example:


![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0)

With the old equation, the projections are calculated as follows.

```
# Down projection
B = 4,096 x 107 x 64
zero_points = 221,184
N = 4,096
n_blocks_per_col = 107
 
4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184

# Up projection
B = 13,696 x 32 x 64
zero_points = 219,136
N = 13,696
n_blocks_per_col = 32
 
13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832
```

With the new equation, the projections are calculated as follows.

```
# Down projection
B = 4,096 x 107 x 64
zero_points = 221,184
N = 4,096
n_blocks_per_col = 107
 
4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184

# Up projection
B = 13,696 x 32 x 64
zero_points= 219,136
N = 13,696
n_blocks_per_col = 32
 
13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136
```