
Conversation

megemini
Contributor

@megemini megemini commented Oct 13, 2024

Create A Good Pull Request

Fix the check of the Paddle version.

Depends on PR: #1061

Please keep the following text at the end of the PR description, and after submitting the PR, check off the items below according to the actual situation.

Please check the following steps before merging this pull request

  • Python code style verification
  • Review all the code diff by yourself
  • All models (TensorFlow/Caffe/ONNX/PyTorch) testing passed
  • Details about your pull request, related issues

If this PR adds new model support, please update model_zoo.md and add the model to our test model zoo (@wjj19950828)

  • New Model Supported
  • No New Model Supported

@megemini
Contributor Author

Update 20241017

  • test_benchmark: generate a result.txt marked as Failed before testing each model (see the first sketch after this list)

    Since result.txt is generated in pd_infer.py, and that file fails immediately on import if it starts with from paddle import fluid, no result.txt would ever be produced; the CI then finds no result.txt at the end and treats the test as passed. Therefore, a Failed result.txt needs to be generated before each test runs.

  • test_autoscan: use logger instead of logging to record logs (see the second sketch after this list)

    Python's logging module is effectively a singleton, and in a complex project its configuration (e.g. the log level) is easily redefined. After this upgrade, import paddle modifies logging, so logs no longer reach the log file and the test results can no longer be recorded. Logging is therefore switched to a dedicated logger.
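
A minimal sketch of the first point, with hypothetical paths and a hypothetical helper name: pre-write a "Failed" marker into result.txt so that a crash inside pd_infer.py (e.g. on from paddle import fluid) still leaves a readable failure record for the CI.

from pathlib import Path

def prepare_failed_result(model_dir: str, model_name: str) -> None:
    # Pre-populate result.txt with a failure marker; pd_infer.py overwrites it
    # with the real result only if it runs to completion.
    result_file = Path(model_dir) / "result.txt"
    result_file.write_text(f"{model_name} Failed\n")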
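
A minimal sketch of the second point, assuming a per-module logger with its own FileHandler (the logger name and log path are illustrative): the handler and level live on this dedicated logger, so later changes to the root logging configuration, e.g. by import paddle, do not silence it.

import logging

logger = logging.getLogger("x2paddle_autoscan")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep records away from the (possibly reconfigured) root logger

handler = logging.FileHandler("result.log")  # hypothetical log path
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("test_auto_scan_abs.py: Run Successfully!")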

Also, the X2paddle-autoscantest CI seems to be stuck? Testing in my local docker shows no problem:

This is the ONNX run:

[screenshot]

Log:


===================onnx==================
The num of test_file is: 18
test_auto_scan_abs.py:Run Successfully!
test_auto_scan_averagepool_10.py:Run Successfully!
test_auto_scan_averagepool_7.py:Run Successfully!
test_auto_scan_compare_ops.py:Run Successfully!
test_auto_scan_conv2d.py:Run Successfully!
test_auto_scan_elementwise_ops.py:Run Successfully!
test_auto_scan_equal.py:Run Successfully!
test_auto_scan_hardsigmoid.py:Run Successfully!
test_auto_scan_isinf.py:Run Successfully!
test_auto_scan_isnan.py:Run Successfully!
test_auto_scan_logical_ops.py:Run Successfully!
test_auto_scan_mod.py:Run Successfully!
test_auto_scan_nonzero.py:Run Successfully!
test_auto_scan_reduce_ops.py:Run Successfully!
test_auto_scan_sum_7.py:Run Successfully!
test_auto_scan_sum_8.py:Run Successfully!
test_auto_scan_unsqueeze_13.py:Run Successfully!
test_auto_scan_unsqueeze_7.py:Run Successfully!

This is the PyTorch run:

[screenshot]

The logs/tmp.log in there is just something I used for my own debugging.

Log:


===================torch-===================
The num of test_file is: 7
test_auto_scan_amax.py:Run Successfully!
test_auto_scan_conv1d.py:Run Successfully!
test_auto_scan_conv2d.py:Run Successfully!
test_auto_scan_conv3d.py:Run Successfully!
test_auto_scan_conv_transpose1d.py:Run Successfully!
test_auto_scan_conv_transpose2d.py:Run Successfully!
test_auto_scan_topk.py:Run Successfully!

@luotao1 Could you take a look? Is this a CI issue or something else?

@megemini
Contributor Author

Could you check the environment with pip freeze > XXX.txt? On my side it is:

absl-py==2.1.0
aiohappyeyeballs==2.4.3
aiohttp==3.10.9
aiosignal==1.3.1
allure-pytest==2.13.5
allure-python-commons==2.13.5
anyio==4.0.0
appdirs==1.4.4
astor==0.8.1
astroid==2.9.3
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.1.0
audioread==3.0.1
autograd==1.4
Babel==2.13.1
bce-python-sdk==0.8.92
blinker==1.7.0
certifi==2019.11.28
cffi==1.16.0
cfgv==3.4.0
chardet==3.0.4
clang-format==13.0.0
click==8.1.7
cloudpickle==3.0.0
coloredlogs==15.0.1
contourpy==1.2.0
coverage==5.5
cpplint==1.6.0
cryptography==41.0.5
cycler==0.12.1
dbus-python==1.2.16
decorator==5.1.1
Deprecated==1.2.14
distlib==0.3.7
distro==1.8.0
distro-info==0.23+ubuntu1.1
docker-pycreds==0.4.0
easyocr==1.7.2
exceptiongroup==1.1.3
filelock==3.13.1
Flask==3.0.0
flask-babel==4.0.0
flatbuffers==24.3.25
fonttools==4.44.0
frozenlist==1.4.1
fsspec==2024.9.0
future==0.18.3
gast==0.6.0
gitdb==4.0.11
GitPython==3.1.40
google-pasta==0.2.0
GPUtil==1.4.0
grpcio==1.66.2
gym==0.26.2
gym-notices==0.0.8
h11==0.14.0
h5py==3.12.1
httpcore==1.0.1
httpx==0.25.1
huggingface-hub==0.25.1
humanfriendly==10.0
hypothesis==6.88.3
identify==2.5.31
idna==2.8
imageio==2.35.1
importlib-metadata==6.8.0
importlib-resources==6.1.1
inflect==7.4.0
iniconfig==2.0.0
ipykernel==4.6.0
ipython==5.3.0
isort==5.12.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
jupyter_client==8.6.0
jupyter_core==5.5.0
keras==3.6.0
kiwisolver==1.4.5
kornia==0.5.11
kornia_rs==0.1.5
lazy-object-proxy==1.9.0
lazy_loader==0.4
libclang==18.1.1
librosa==0.10.2.post1
lightning-utilities==0.11.7
llvmlite==0.41.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.1
mccabe==0.6.1
mdurl==0.1.2
ml-dtypes==0.3.2
mock==5.1.0
more-itertools==10.5.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
namex==0.0.8
networkx==3.2.1
ninja==1.11.1.1
nodeenv==1.8.0
nose==1.3.7
numba==0.58.1
numpy==1.26.1
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.7.0.84
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.3.0.86
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.5.86
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.19.3
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.77
nvidia-nvtx-cu11==11.8.86
nvidia-nvtx-cu12==12.1.105
onnx==1.17.0
onnxruntime==1.19.2
opencv-python==4.6.0.66
opencv-python-headless==4.10.0.84
opt-einsum==3.3.0
optree==0.13.0
packaging==23.2
paddle2onnx==1.1.0
paddlepaddle-gpu==3.0.0b1
pandas==2.1.2
parameterized==0.9.0
pathtools==0.1.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.1.0
platformdirs==3.11.0
pluggy==1.3.0
pooch==1.8.0
pre-commit==2.17.0
prettytable==3.9.0
prompt-toolkit==1.0.18
propcache==0.2.0
protobuf==3.20.2
psutil==5.9.6
ptyprocess==0.7.0
pyclipper==1.3.0.post5
pycparser==2.21
pycrypto==2.6.1
pycryptodome==3.19.0
pygame==2.5.2
PyGithub==2.1.1
Pygments==2.16.1
PyGObject==3.36.0
PyJWT==2.8.0
pylint==2.12.0
PyNaCl==1.5.0
pynvml==11.5.3
pyparsing==3.1.1
pypinyin==0.53.0
pytest==7.4.3
python-apt==2.0.1+ubuntu0.20.4.1
python-bidi==0.6.0
python-dateutil==2.8.2
pytorch-lightning==2.4.0
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
rarfile==4.1
regex==2024.9.11
requests==2.22.0
requests-unixsocket==0.2.0
resampy==0.4.2
rich==13.9.2
safetensors==0.4.5
scikit-image==0.24.0
scikit-learn==1.3.2
scipy==1.11.3
sentry-sdk==1.34.0
setproctitle==1.3.3
shapely==2.0.6
simplegeneric==0.8.1
six==1.14.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soundfile==0.12.1
soxr==0.5.0.post1
sympy==1.13.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow==2.16.1
tensorflow-io-gcs-filesystem==0.37.1
termcolor==2.5.0
threadpoolctl==3.2.0
tifffile==2024.8.30
timm==1.0.9
tokenizers==0.20.0
toml==0.10.2
tomli==2.0.1
torch==2.4.1
torchaudio==2.4.1
torchmetrics==1.4.2
torchvision==0.19.1
tornado==6.3.3
tqdm==4.66.5
traitlets==5.13.0
transformers==4.45.2
treelib==1.7.0
triton==3.0.0
typeguard==4.3.0
typing_extensions==4.12.2
tzdata==2023.3
ubelt==1.3.3
unattended-upgrades==0.1
Unidecode==1.3.8
urllib3==2.0.7
virtualenv==20.26.6
visualdl==2.5.3
wandb==0.15.12
wcwidth==0.2.9
Werkzeug==3.0.1
wget==3.2
wrapt==1.13.3
-e git+ssh://[email protected]/megemini/X2Paddle.git@017d3d718488c8a22e12fe9724a3ed844962f648#egg=x2paddle
xdoctest==1.1.1
XlsxWriter==3.0.9
yarl==1.14.0
zipp==3.17.0

@luotao1 luotao1 added the contributor External developers label Oct 17, 2024
@luotao1
Collaborator

luotao1 commented Oct 17, 2024

absl-py==2.1.0
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
allure-pytest==2.13.5
allure-python-commons==2.13.5
anyio==4.0.0
appdirs==1.4.4
astor==0.8.1
astroid==2.9.3
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.1.0
audioread==3.0.1
autograd==1.4
Babel==2.13.1
bce-python-sdk==0.8.92
blinker==1.7.0
certifi==2019.11.28
cffi==1.16.0
cfgv==3.4.0
chardet==3.0.4
clang-format==13.0.0
click==8.1.7
cloudpickle==3.0.0
coloredlogs==15.0.1
contourpy==1.2.0
coverage==5.5
cpplint==1.6.0
cryptography==41.0.5
cycler==0.12.1
dbus-python==1.2.16
decorator==5.1.1
Deprecated==1.2.14
distlib==0.3.7
distro==1.8.0
distro-info==0.23+ubuntu1.1
docker-pycreds==0.4.0
easyocr==1.7.2
exceptiongroup==1.1.3
filelock==3.13.1
Flask==3.0.0
flask-babel==4.0.0
flatbuffers==24.3.25
fonttools==4.44.0
frozenlist==1.4.1
fsspec==2024.9.0
future==0.18.3
gast==0.6.0
gitdb==4.0.11
GitPython==3.1.40
google-pasta==0.2.0
GPUtil==1.4.0
grpcio==1.67.0
gym==0.26.2
gym-notices==0.0.8
h11==0.14.0
h5py==3.12.1
httpcore==1.0.1
httpx==0.25.1
huggingface-hub==0.25.2
humanfriendly==10.0
hypothesis==6.88.3
identify==2.5.31
idna==2.8
imageio==2.36.0
importlib-metadata==6.8.0
importlib-resources==6.1.1
iniconfig==2.0.0
ipykernel==4.6.0
ipython==5.3.0
isort==5.12.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
jupyter_client==8.6.0
jupyter_core==5.5.0
keras==3.6.0
kiwisolver==1.4.5
kornia==0.5.11
lazy-object-proxy==1.9.0
lazy_loader==0.4
libclang==18.1.1
librosa==0.8.1
lightning-utilities==0.11.8
llvmlite==0.41.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.1
mccabe==0.6.1
mdurl==0.1.2
ml-dtypes==0.3.2
mock==5.1.0
mpmath==1.3.0
multidict==6.1.0
namex==0.0.8
networkx==3.2.1
ninja==1.11.1.1
nodeenv==1.8.0
nose==1.3.7
numba==0.58.1
numpy==1.26.1
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.7.0.84
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.3.0.86
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.5.86
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.19.3
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.77
nvidia-nvtx-cu11==11.8.86
nvidia-nvtx-cu12==12.1.105
onnx==1.17.0
onnxruntime==1.19.2
opencv-python==4.6.0.66
opencv-python-headless==4.10.0.84
opt-einsum==3.3.0
optree==0.13.0
packaging==23.2
paddle2onnx==1.1.0
paddlepaddle-gpu==3.0.0b1
pandas==2.1.2
parameterized==0.9.0
pathtools==0.1.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.1.0
platformdirs==3.11.0
pluggy==1.3.0
pooch==1.8.0
pre-commit==2.17.0
prettytable==3.9.0
prompt-toolkit==1.0.18
propcache==0.2.0
protobuf==3.20.2
psutil==5.9.6
ptyprocess==0.7.0
pyclipper==1.3.0.post5
pycparser==2.21
pycrypto==2.6.1
pycryptodome==3.19.0
pygame==2.5.2
PyGithub==2.1.1
Pygments==2.16.1
PyGObject==3.36.0
PyJWT==2.8.0
pylint==2.12.0
PyNaCl==1.5.0
pynvml==11.5.3
pyparsing==3.1.1
pytest==7.4.3
python-apt==2.0.1+ubuntu0.20.4.1
python-bidi==0.6.3
python-dateutil==2.8.2
pytorch-lightning==2.4.0
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
rarfile==4.1
regex==2024.9.11
requests==2.22.0
requests-unixsocket==0.2.0
resampy==0.4.2
rich==13.9.2
safetensors==0.4.5
scikit-image==0.24.0
scikit-learn==1.3.2
scipy==1.11.3
sentry-sdk==1.34.0
setproctitle==1.3.3
shapely==2.0.6
simplegeneric==0.8.1
six==1.14.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soundfile==0.12.1
sympy==1.13.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow==2.16.1
tensorflow-io-gcs-filesystem==0.37.1
termcolor==2.5.0
threadpoolctl==3.2.0
tifffile==2024.8.30
timm==1.0.10
tokenizers==0.20.1
toml==0.10.2
tomli==2.0.1
torch==2.4.1
torchaudio==2.4.1
torchmetrics==1.4.3
torchvision==0.19.1
tornado==6.3.3
tqdm==4.66.5
traitlets==5.13.0
transformers==4.45.2
treelib==1.7.0
triton==3.0.0
typing_extensions==4.8.0
tzdata==2023.3
ubelt==1.3.3
unattended-upgrades==0.1
urllib3==2.0.7
virtualenv==20.26.6
visualdl==2.5.3
wandb==0.15.12
wcwidth==0.2.9
Werkzeug==3.0.1
wget==3.2
wrapt==1.13.3
xdoctest==1.1.1
XlsxWriter==3.0.9
yarl==1.15.3
zipp==3.17.0

@megemini
Contributor Author

megemini commented Oct 17, 2024

I debugged with logger; in the log:

[screenshot]

The call to create_predictor in X2Paddle/test_autoscan/onnx/onnxbase.py hangs.

[screenshot]

As you can see, the first log line is printed, but predictor = create_predictor(config) hangs, and the subsequent logger.info(">>> predictor.get_input_names...") is never reached.

This from paddle.inference import create_predictor is imported from paddle.base.core, so the interface should be exposed from C++. That is as far as I have localized it for now; it really feels related to the environment...
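
A minimal sketch of the instrumentation described above (the logger name, model paths, and messages are illustrative): log immediately before and after create_predictor(config) to confirm where execution stops.

import logging
from paddle.inference import Config, create_predictor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("x2paddle_autoscan")

config = Config("model.pdmodel", "model.pdiparams")  # placeholder paths, set up as in onnxbase.py
config.enable_use_gpu(200, 0)

logger.info(">>> creating predictor ...")          # this line shows up in the log
predictor = create_predictor(config)               # execution hangs here in CI
logger.info(">>> predictor.get_input_names ...")   # never reached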

[screenshot]

[screenshot]

@megemini
Contributor Author

megemini commented Oct 18, 2024

Update 20241018

It is now basically confirmed that the hang happens when create_predictor is called, and it appears to be related to the environment. The reason I attribute it to the environment is an experiment I ran:

The code that causes the hang, extracted on its own:

import paddle
from paddle.inference import create_predictor, PrecisionType
from paddle.inference import Config
import os

if __name__ == '__main__':
    # paddle_model_path = os.path.join(
    #     self.pwd, self.name, self.name + '_' + str(ver) +
    #     '_paddle/inference_model/model.pdmodel')
    # paddle_param_path = os.path.join(
    #     self.pwd, self.name, self.name + '_' + str(ver) +
    #     '_paddle/inference_model/model.pdiparams')

    paddle_model_path = '/home/shun/Documents/Projects/tmp/Abs_7_paddle/inference_model/model.pdmodel'
    paddle_param_path = '/home/shun/Documents/Projects/tmp/Abs_7_paddle/inference_model/model.pdiparams'

    config = Config()
    config.set_prog_file(paddle_model_path)
    if os.path.exists(paddle_param_path):
        config.set_params_file(paddle_param_path)

    # initial GPU memory(M), device ID
    config.enable_use_gpu(200, 0)
    # optimize graph and fuse op
    config.switch_ir_optim(False)
    config.enable_memory_optim()
    # disable feed, fetch OP, needed by zero_copy_run
    config.switch_use_feed_fetch_ops(False)
    config.disable_glog_info()
    pass_builder = config.pass_builder()

    predictor = create_predictor(config)
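    # NOTE: in the CI environment, execution hangs inside the create_predictor call
    # above and the final print below is never reached; locally (Paddle dev and
    # 2.4.2.post117) it completes normally.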

    print('-'*30, 'OK')

The model above, '/home/shun/Documents/Projects/tmp/Abs_7_paddle/inference_model/model.pdmodel', is generated by test_autoscan/onnx/test_auto_scan_abs.py.

Note: if you run that script directly, the model is deleted when the run finishes, so you need to set a breakpoint and copy the model out partway through.

Running this code in my local paddle dev environment, as well as under paddle 2.4.2.post117, passes in both cases:

[screenshot]

The first line is the dev environment; the one after it is the 2.4.2.post117 environment.

Combined with the create_predictor issue PaddlePaddle/Paddle#57139 mentioned earlier, it can be inferred that paddle hangs when loading an inference model in certain environments.

In the current CI I set self.run_dynamic = True, i.e. paddle loads the previously exported non-inference model and compares it against ONNX or PyTorch; with that, the CI passes (a sketch of this path follows below).
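
A minimal sketch of the dynamic-model comparison path (the helper name is hypothetical and the tolerance is illustrative): load the previously exported dygraph model with paddle.jit.load and compare its outputs against the ONNX/PyTorch reference, instead of building an inference predictor.

import numpy as np
import paddle

def run_dynamic_check(model_prefix: str, input_data: np.ndarray,
                      expected: np.ndarray, atol: float = 1e-5) -> None:
    # Load the exported (non-inference) model and run it in eval mode.
    layer = paddle.jit.load(model_prefix)
    layer.eval()
    result = layer(paddle.to_tensor(input_data))
    # Compare against the reference outputs produced by ONNX/PyTorch.
    np.testing.assert_allclose(result.numpy(), expected, atol=atol)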

[screenshot]

The results and test counts match my local runs.

Environment-dependent problems like this are genuinely hard to pin down; it may be worth checking whether the underlying code picks up some dependency at build time.

@luotao1


Also, the reason test_benchmark did not hit this: first, possibly because I have so far validated everything locally, so no problem showed up; second, possibly because test_benchmark consists almost entirely of dynamic-model tests, so it is not affected.

@luotao1 luotao1 merged commit 4236c2a into PaddlePaddle:develop Oct 18, 2024
4 checks passed
@luotao1
Collaborator

luotao1 commented Oct 18, 2024

@gzydsmz Please help push forward the resolution of the issue in #1064 (comment).

@vivienfanghuagood

Remove config.disable_glog_info(), then export GLOG_v=3 and look at the detailed log.
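
A minimal sketch of this debugging setup (the model paths are placeholders; setting GLOG_v via os.environ before paddle is imported is assumed here to be equivalent to exporting it in the shell before launching the script):

import os
os.environ["GLOG_v"] = "3"  # equivalent to `export GLOG_v=3` in the shell

from paddle.inference import Config, create_predictor

config = Config()
config.set_prog_file("model.pdmodel")      # placeholder path
config.set_params_file("model.pdiparams")  # placeholder path
config.enable_use_gpu(200, 0)
# note: config.disable_glog_info() is intentionally not called here,
# so the C++-side glog output remains visible.
predictor = create_predictor(config)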

@megemini
Contributor Author

Remove config.disable_glog_info(), then export GLOG_v=3 and look at the detailed log.

I will open a separate PR, #1077, to track this issue.

However, the CI seems to be stuck in the queue right now; I will let you know once there are debugging results.

Thanks!
