12.4 TensorRT 实用工具

前言

工程化部署是一个复杂的任务,涉及的环节众多,因此需要有足够好的工具来检测、分析,NVIDIA也提供了一系列工具用于分析、调试、优化部署环节。本节就介绍两个实用工具,nsight system 和 polygraphy。

nsight system可分析cpu和gpu的性能,可找出应用程序的瓶颈。

polygraphy可在各种框架中运行和调试深度学习模型,用于分析模型转换间的瓶颈。

nsight system

NVIDIA Nsight Systems是一个系统分析工具,它可以分析CPU和GPU的利用率、内存占用、数据输送量等各种性能指标,找出应用程序的瓶颈所在。用户文档

安装

打开官网,选择对应的操作系统、版本进行下载 。

  • NsightSystems-2023.3.1.92-3314722.msi,双击安装,一路默认
  • 将目录添加到环境变量:C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\target-windows-x64
  • 将gui工作目录页添加到环境变量:C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\host-windows-x64

运行

nsys包括命令行工具与UI界面,这里采用UI界面演示。

  • 命令行工具是C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\target-windows-x64\nsys.exe
  • UI界面是C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\host-windows-x64\nsys-ui.exe

nsys运行逻辑是从nsys端启动任务,nsys会自动监控任务的性能。

第一步:启动nsys。在cmd中输入nsys-ui,或者到安装目录下双击nsys-ui.exe。

第二步:创建project,配置要运行的程序。在这里运行本章配套代码01_trt_resnet50_cuda.py。具体操作如下图所示

nsys-init

第三步:查看统计信息

nsys-results

nsight system是一个强大的软件,但具体如何有效使用,以及如何更细粒度、更接近底层的去分析耗时,请大家参照官方文档以及需求来学习。

polygraphy

polygraphy是TensorRT生态中重要的debug调试工具,它可以

  • 使用多种后端运行推理计算,包括 TensorRT, onnxruntime, TensorFlow;
  • 比较不同后端的逐层计算结果;
  • 由模型文件生成 TensorRT 引擎并序列化为.plan;
  • 查看模型网络的逐层信息;
  • 修改 Onnx 模型,如提取子图,计算图化简;
  • 分析 Onnx 转 TensorRT 失败原因,将原计算图中可以 / 不可以转 TensorRT 的子图分割保存;
  • 隔离 TensorRT 中错误的tactic;

常用的几个功能是:

  • 检验 TensorRT 上计算结果正确性 /精度
  • 找出计算错误 / 精度不足的层
  • 进行简单的计算图优化

安装

pip install nvidia-pyindex
pip install polygraphy

验证

polygraphy依托于虚拟环境运行,因此需要激活相应的虚拟环境,然后执行 polygraphy -h

polygraphy有七种模式,分别是 {run,convert,inspect,surgeon,template,debug,data},具体含义参见文档

(pt112) C:\Users\yts32>polygraphy -h
usage: polygraphy [-h] [-v] {run,convert,inspect,check,surgeon,template,debug,data} ...

Polygraphy: A Deep Learning Debugging Toolkit

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Tools:
  {run,convert,inspect,check,surgeon,template,debug,data}
    run                 Run inference and compare results across backends.
    convert             Convert models to other formats.
    inspect             View information about various types of files.
    check               Check and validate various aspects of a model
    surgeon             Modify ONNX models.
    template            [EXPERIMENTAL] Generate template files.
    debug               [EXPERIMENTAL] Debug a wide variety of model issues.
    data                Manipulate input and output data generated by other Polygraphy subtools.

案例1:运行onnx及trt模型

polygraphy run resnet50_bs_1.onnx --onnxrt      

polygraphy run resnet50_bs_1.engine --trt --input-shapes 'input:[1,3,224,224]' --verbose

得到如下运行日志,表明两个框架推理运行成功:

......
| Completed 1 iteration(s) in 1958 ms | Average inference time: 1958 ms.
......
| Completed 1 iteration(s) in 120.1 ms | Average inference time: 120.1 ms.

案例2:对比onnx与trt输出结果(常用)

polygraphy还可以充当trtexec的功能,可以实现onnx导出trt模型,并且进行逐层结果对比。

其中atol表示绝对误差,rtol表示相对误差。

polygraphy run  resnet50_bs_1.onnx --onnxrt --trt ^
--save-engine=resnet50_bs_1_fp32_polygraphy.engine ^
--onnx-outputs mark all --trt-outputs mark all ^
--input-shapes "input:[1,3,224,224]" ^
--atol 1e-3 --rtol 1e-3 --verbose > onnx-trt-compare.log

输出的日志如下:

对于每一个网络层会输出onnx、trt的直方图,绝对误差直方图,相对误差直方图

最后会统计所有网络层符合设置的超参数atol, rtol的百分比,本案例中 Pass Rate: 100.0%

[I]     Comparing Output: 'input.4' (dtype=float32, shape=(1, 64, 112, 112)) with 'input.4' (dtype=float32, shape=(1, 64, 112, 112))
[I]         Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         onnxrt-runner-N0-08/20/23-23:08:36: input.4 | Stats: mean=0.20451, std-dev=0.31748, var=0.10079, median=0.19194, min=-1.3369 at (0, 33, 13, 104), max=2.0691 at (0, 33, 64, 77), avg-magnitude=0.29563
[V]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-1.34 , -0.996) |         29 | 
                (-0.996, -0.656) |      11765 | #
                (-0.656, -0.315) |      31237 | ###
                (-0.315, 0.0255) |     140832 | #############
                (0.0255, 0.366 ) |     411249 | ########################################
                (0.366 , 0.707 ) |     166138 | ################
                (0.707 , 1.05  ) |      40956 | ###
                (1.05  , 1.39  ) |        551 | 
                (1.39  , 1.73  ) |         54 | 
                (1.73  , 2.07  ) |          5 | 
[I]         trt-runner-N0-08/20/23-23:08:36: input.4 | Stats: mean=0.20451, std-dev=0.31748, var=0.10079, median=0.19194, min=-1.3369 at (0, 33, 13, 104), max=2.0691 at (0, 33, 64, 77), avg-magnitude=0.29563
[V]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-1.34 , -0.996) |         29 | 
                (-0.996, -0.656) |      11765 | #
                (-0.656, -0.315) |      31237 | ###
                (-0.315, 0.0255) |     140832 | #############
                (0.0255, 0.366 ) |     411249 | ########################################
                (0.366 , 0.707 ) |     166138 | ################
                (0.707 , 1.05  ) |      40956 | ###
                (1.05  , 1.39  ) |        551 | 
                (1.39  , 1.73  ) |         54 | 
                (1.73  , 2.07  ) |          5 | 
[I]         Error Metrics: input.4
[I]             Minimum Required Tolerance: elemwise error | [abs=8.3447e-07] OR [rel=0.037037] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=2.6075e-08, std-dev=3.2558e-08, var=1.06e-15, median=1.4901e-08, min=0 at (0, 0, 0, 3), max=8.3447e-07 at (0, 33, 86, 43), avg-magnitude=2.6075e-08
[V]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 8.34e-08) |     757931 | ########################################
                    (8.34e-08, 1.67e-07) |      40058 | ##
                    (1.67e-07, 2.5e-07 ) |       4334 | 
                    (2.5e-07 , 3.34e-07) |        249 | 
                    (3.34e-07, 4.17e-07) |        181 | 
                    (4.17e-07, 5.01e-07) |         53 | 
                    (5.01e-07, 5.84e-07) |          0 | 
                    (5.84e-07, 6.68e-07) |          8 | 
                    (6.68e-07, 7.51e-07) |          1 | 
                    (7.51e-07, 8.34e-07) |          1 | 
[I]             Relative Difference | Stats: mean=6.039e-07, std-dev=5.4597e-05, var=2.9809e-09, median=8.7838e-08, min=0 at (0, 0, 0, 3), max=0.037037 at (0, 4, 15, 12), avg-magnitude=6.039e-07
[V]                 ---- Histogram ----
                    Bin Range          |  Num Elems | Visualization
                    (0      , 0.0037 ) |     802806 | ########################################
                    (0.0037 , 0.00741) |          7 | 
                    (0.00741, 0.0111 ) |          1 | 
                    (0.0111 , 0.0148 ) |          0 | 
                    (0.0148 , 0.0185 ) |          0 | 
                    (0.0185 , 0.0222 ) |          0 | 
                    (0.0222 , 0.0259 ) |          1 | 
                    (0.0259 , 0.0296 ) |          0 | 
                    (0.0296 , 0.0333 ) |          0 | 
                    (0.0333 , 0.037  ) |          1 | 
[I]         PASSED | Output: 'input.4' | Difference is within tolerance (rel=0.001, abs=0.001)
[I]     PASSED | All outputs matched | Outputs: ['input.4', 'onnx::MaxPool_323', 'input.8', 'input.16', 'onnx::Conv_327', 'input.24', 'onnx::Conv_330', 'onnx::Add_505', 'onnx::Add_508', 'onnx::Relu_335', 'input.36', 'input.44', 'onnx::Conv_339', 'input.52', 'onnx::Conv_342', 'onnx::Add_517', 'onnx::Relu_345', 'input.60', 'input.68', 'onnx::Conv_349', 'input.76', 'onnx::Conv_352', 'onnx::Add_526', 'onnx::Relu_355', 'input.84', 'input.92', 'onnx::Conv_359', 'input.100', 'onnx::Conv_362', 'onnx::Add_535', 'onnx::Add_538', 'onnx::Relu_367', 'input.112', 'input.120', 'onnx::Conv_371', 'input.128', 'onnx::Conv_374', 'onnx::Add_547', 'onnx::Relu_377', 'input.136', 'input.144', 'onnx::Conv_381', 'input.152', 'onnx::Conv_384', 'onnx::Add_556', 'onnx::Relu_387', 'input.160', 'input.168', 'onnx::Conv_391', 'input.176', 'onnx::Conv_394', 'onnx::Add_565', 'onnx::Relu_397', 'input.184', 'input.192', 'onnx::Conv_401', 'input.200', 'onnx::Conv_404', 'onnx::Add_574', 'onnx::Add_577', 'onnx::Relu_409', 'input.212', 'input.220', 'onnx::Conv_413', 'input.228', 'onnx::Conv_416', 'onnx::Add_586', 'onnx::Relu_419', 'input.236', 'input.244', 'onnx::Conv_423', 'input.252', 'onnx::Conv_426', 'onnx::Add_595', 'onnx::Relu_429', 'input.260', 'input.268', 'onnx::Conv_433', 'input.276', 'onnx::Conv_436', 'onnx::Add_604', 'onnx::Relu_439', 'input.284', 'input.292', 'onnx::Conv_443', 'input.300', 'onnx::Conv_446', 'onnx::Add_613', 'onnx::Relu_449', 'input.308', 'input.316', 'onnx::Conv_453', 'input.324', 'onnx::Conv_456', 'onnx::Add_622', 'onnx::Relu_459', 'input.332', 'input.340', 'onnx::Conv_463', 'input.348', 'onnx::Conv_466', 'onnx::Add_631', 'onnx::Add_634', 'onnx::Relu_471', 'input.360', 'input.368', 'onnx::Conv_475', 'input.376', 'onnx::Conv_478', 'onnx::Add_643', 'onnx::Relu_481', 'input.384', 'input.392', 'onnx::Conv_485', 'input.400', 'onnx::Conv_488', 'onnx::Add_652', 'onnx::Relu_491', 'input.408', 'onnx::Flatten_493', 'onnx::Gemm_494', 'output']


[I] Accuracy Summary | onnxrt-runner-N0-08/20/23-23:08:36 vs. trt-runner-N0-08/20/23-23:08:36 | Passed: 1/1 iterations | Pass Rate: 100.0%

更多使用案例推荐阅读github cookbook

小结

本节介绍了nsight system和polygraphy的应用,在模型部署全流程中,可以深入挖掘的还有很多,推荐查看TensorRT的GitHub下的tools目录

Copyright © TingsongYu 2021 all right reserved,powered by Gitbook文件修订时间: 2024年04月26日21:48:10

results matching ""

    No results matching ""