Commits


Tianlei Wu authored and GitHub committed 99349e58d71
dump tensor statistics (#15761)

Dump statistics of the input and/or output tensors of each node. This can help find out why a model outputs NaN.

To use this tool, add `--cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1` when building the onnxruntime package. Then set the following environment variables before running a model with onnxruntime:

```
export ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_STATISTICS_DATA=1
```

The statistics are then appended after the dump of the input and output tensors.

One possible cause of NaN output from an FP16 or mixed-precision model is a value that exceeds the FP16 range (the maximum finite FP16 value is 65504). When an FP32 model has a value > 65504 in a node output, that value becomes INF once the node is converted to FP16. In this case, keep the related nodes in FP32 to avoid the issue. You can dump tensor statistics of the FP32 model to find such candidate nodes.
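As a rough illustration of the last step, the sketch below flags nodes whose FP32 value range would overflow in FP16. It does not parse the tool's dump output: the `node_stats` mapping, the node names, and the helper `fp16_overflow_candidates` are all hypothetical, and the min/max values are assumed to have been copied by hand from the dumped statistics. It compares magnitudes, since a large negative value overflows to -INF in the same way.

```python
import struct

# Largest finite IEEE 754 half-precision value (bit pattern 0x7BFF).
FP16_MAX = struct.unpack("<e", bytes.fromhex("ff7b"))[0]  # 65504.0


def fp16_overflow_candidates(node_stats):
    """Return names of nodes whose FP32 value range exceeds the FP16 limit.

    node_stats maps a node name to a (min, max) pair. This layout is an
    assumption for illustration; it is not the format the dump tool prints.
    """
    return [
        name
        for name, (lo, hi) in node_stats.items()
        if max(abs(lo), abs(hi)) > FP16_MAX
    ]


# Hypothetical statistics copied by hand from an FP32 run.
example_stats = {
    "MatMul_1": (-12.5, 840.0),
    "Add_7": (-3.0, 70000.0),
    "Mul_9": (-131072.0, 4.0),
}
print(fp16_overflow_candidates(example_stats))
```

With these made-up numbers, `Add_7` and `Mul_9` would be the nodes to consider keeping in FP32.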