This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Detect encoding of clamav process output and decode accordingly #188

Open: wants to merge 2 commits into base: master


@tohoku tohoku commented Jan 26, 2022

Issue #142 reports the following error, which occurs intermittently:

[ERROR] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 276: invalid continuation byte
Traceback (most recent call last):
  File "/var/task/scan.py", line 236, in lambda_handler
    scan_result, scan_signature = clamav.scan_file(file_path)
  File "/var/task/clamav.py", line 195, in scan_file
    output = av_proc.communicate()[0].decode()

Challenge: av_proc.communicate() returns the output of the clamscan call as bytes, but that output is sometimes encoded in a charset other than UTF-8, the default used by decode().
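For illustration, the failure can be reproduced without ClamAV. This is a minimal sketch: the byte string below is a made-up stand-in for process output that contains a Latin-1 encoded file name, not real clamscan output.

```python
# Hypothetical stand-in for process output: the file name contains the
# Latin-1 byte 0xe0, which is not valid UTF-8 in this position.
raw = b"/tmp/caf\xe0.txt: OK\n"

try:
    raw.decode()  # decode() defaults to UTF-8
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The same bytes decode cleanly once the right charset is used.
text = raw.decode("latin-1")
print(error)
print(text)
```

The byte 0xe0 starts a multi-byte sequence in UTF-8, so a following byte that is not a continuation byte triggers the same "invalid continuation byte" error as in the traceback.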

As a workaround, this PR uses chardet to detect the charset of the output and decodes it accordingly.
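A rough sketch of that idea follows. It is not the PR's actual diff: the helper name `decode_process_output` and the Latin-1 fallback are assumptions made for illustration. The only chardet call used is `chardet.detect()`, which returns a dict whose `"encoding"` entry may be `None`.

```python
def decode_process_output(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode subprocess output, guessing the charset if it is not UTF-8.

    Hypothetical helper for illustration; the fallback charset is an assumption.
    """
    # Fast path: most output is plain UTF-8 (or ASCII).
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Ask chardet for a guess; tolerate it being unavailable or undecided.
    try:
        import chardet  # third-party package
        encoding = chardet.detect(raw).get("encoding")
    except ImportError:
        encoding = None

    # errors="replace" keeps the scan from crashing on a wrong guess.
    try:
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:
        # chardet named a codec this Python build does not know.
        return raw.decode(fallback, errors="replace")
```

In `scan_file`, the call `av_proc.communicate()[0].decode()` would then become something like `decode_process_output(av_proc.communicate()[0])`. Charset detection is a heuristic, so the decoded text may still contain wrong characters in file names even though it no longer raises.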

Issue #183 mentions removing the -a flag from the clamscan call as a possible workaround, but I have not tried this.


CLAassistant commented Jan 26, 2022

CLA assistant check
All committers have signed the CLA.

@tohoku tohoku marked this pull request as ready for review January 26, 2022 04:42