I wanted to use urlwatch to wait for the next bulletins, but when a bulletin is not available yet, meteofrance.fr redirects to an HTML page, leading to an exception:
Traceback (most recent call last):
  File "urlwatch/handler.py", line 120, in process
    data = FilterBase.process(filter_kind, subfilter, self, data)
  File "urlwatch/filters.py", line 188, in process
    return filtercls(state.job, state).filter(data, subfilter)
  File "urlwatch/filters.py", line 399, in filter
    return '\n\n'.join(pdftotext.PDF(io.BytesIO(data), password=subfilter.get('password', '')))
pdftotext.Error: poppler error creating document
Maybe the pdf2text filter needs some kind of error-handling option. What do you think?
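To make the idea concrete, here is a rough sketch of what such an option could do, not how urlwatch actually works. The names `looks_like_pdf`, `pdf_to_text_or_fallback`, and the `fallback` parameter are hypothetical, and `convert` stands in for whatever performs the real conversion (for example a wrapper around `pdftotext.PDF`). The 1024-byte search window is a lenient assumption on my part.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF header marker appears near the start of data.

    Assumption: searching the first 1024 bytes is lenient enough to accept
    files with a little leading junk while still rejecting HTML pages.
    """
    return PDF_MAGIC in data[:1024]


def pdf_to_text_or_fallback(data: bytes, convert, fallback: str = "") -> str:
    """Convert data with `convert` only if it looks like a PDF.

    `convert` is any callable taking bytes and returning str (hypothetical
    stand-in for the real PDF-to-text step). Non-PDF input, such as the HTML
    page served after a redirect, yields `fallback` instead of an exception.
    """
    if not looks_like_pdf(data):
        return fallback
    return convert(data)
```

With something like this, the HTML "data unavailable" page would produce the fallback text instead of the poppler error, and the job would only report a change once a real PDF shows up.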
No, they don't return a proper 404. In fact, it really depends on the request:
$ curl -I https://donneespubliques.meteofrance.fr/donnees_libres/bulletins/BCM/202207.pdf
HTTP/1.1 404 Not Found
...
$ curl -i https://donneespubliques.meteofrance.fr/donnees_libres/bulletins/BCM/202207.pdf
HTTP/1.1 302 Found
Date: Tue, 19 Apr 2022 20:08:54 GMT
Location: https://donneespubliques.meteofrance.fr/?fond=donnee_indisponible
...
And in urlwatch, the job's retrieve function hits the second case, with the redirect followed to the HTML page:
(Pdb) p response
<Response [200]>
(Pdb) p response.url
'https://donneespubliques.meteofrance.fr/?fond=donnee_indisponible'
(Pdb) p response.history
[<Response [302]>]
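Given that pdb output, one way to spot this situation before filtering is to look at the redirect history and the content type. This is only a sketch: `was_redirected_away` and `is_pdf_response` are hypothetical helpers, not urlwatch functions. They rely only on the `history`, `url`, and `headers` attributes that a `requests` response exposes, and the stand-in object below mirrors the session above. The `text/html` header value is my assumption, since the headers were not shown.

```python
from types import SimpleNamespace


def was_redirected_away(response, requested_url: str) -> bool:
    """True if at least one redirect was followed and the final URL differs."""
    return bool(response.history) and response.url != requested_url


def is_pdf_response(response) -> bool:
    """True if the Content-Type header (ignoring parameters) is application/pdf."""
    content_type = response.headers.get("Content-Type", "")
    return content_type.split(";")[0].strip().lower() == "application/pdf"


# Stand-in for the response seen in the pdb session (no network access needed).
requested = "https://donneespubliques.meteofrance.fr/donnees_libres/bulletins/BCM/202207.pdf"
response = SimpleNamespace(
    status_code=200,
    url="https://donneespubliques.meteofrance.fr/?fond=donnee_indisponible",
    history=[SimpleNamespace(status_code=302)],
    headers={"Content-Type": "text/html; charset=utf-8"},  # assumed value
)

if was_redirected_away(response, requested) or not is_pdf_response(response):
    print("bulletin not available yet, skipping PDF conversion")
```

Checking the final URL alone would be fragile if the site changes its fallback page, so combining it with the content type seems safer.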