Parsing Issue using http URL (while with the https URL is OK) #111
Comments
The problem is that for the http URL, the server only returns an HTTP 302 redirect response pointing to the https URL, with an empty body. The parser would therefore have to follow the redirect and send another request to the https location. However, the default simple Java client does not follow redirects from http to https automatically, by design. That's why it just parses the empty body and you get an empty result. If you need to handle the redirects automatically, you would have to create your own |
Hi @radkovo, I see your point and I have just implemented the solution you suggested. Still, redirects are very common on modern web pages, since the majority of websites implement an HTTP connection, so I was wondering whether it would be possible to include a simple FollowRedirectNetworkProcessor implementing the NetworkProcessor interface. The library documentation could then point out that this kind of issue can occur and offer the new network processor, which follows redirects, as a possible solution. `public class FollowRedirectNetworkProcessor implements NetworkProcessor {
}` |
Many thanks for the implementation, it seems reasonable. Would you mind creating a pull request containing this in order to preserve your credits? A note about redirects may be included in README as well. |
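For readers landing on this thread, here is a minimal sketch of what such a redirect-following processor might look like. It is not the implementation posted above, whose body is not preserved here. The `NetworkProcessor` interface declared in the snippet is a local stand-in that assumes the library's interface has a single `fetch(URL)` method returning an `InputStream`; the class name `FollowRedirectSketch`, the helper methods, and the redirect limit of 5 are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Stand-in for the library's interface; its exact shape is an assumption.
interface NetworkProcessor {
    InputStream fetch(URL url) throws IOException;
}

// Hypothetical sketch: follows redirects manually, including http -> https.
class FollowRedirectSketch implements NetworkProcessor {
    private static final int MAX_REDIRECTS = 5; // arbitrary limit to avoid loops

    // True for the status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a (possibly relative) Location value against the current URL.
    static String resolveLocation(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    @Override
    public InputStream fetch(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            URLConnection con = current.openConnection();
            if (!(con instanceof HttpURLConnection)) {
                return con.getInputStream(); // e.g. file: URLs, nothing to follow
            }
            HttpURLConnection http = (HttpURLConnection) con;
            // Handle redirects ourselves so cross-protocol hops are not skipped.
            http.setInstanceFollowRedirects(false);
            int status = http.getResponseCode();
            if (!isRedirect(status)) {
                return http.getInputStream();
            }
            String location = http.getHeaderField("Location");
            http.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header: " + current);
            }
            try {
                current = new URL(resolveLocation(current.toString(), location));
            } catch (IllegalArgumentException e) {
                throw new IOException("Invalid redirect target: " + location, e);
            }
        }
        throw new IOException("Too many redirects starting at " + url);
    }
}
```

With the real library, the stand-in interface would be dropped in favour of the library's own `NetworkProcessor`, and an instance would presumably be handed to whichever `CSSFactory.parse` variant accepts a network processor; check the library's API for the exact signature.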
I'm using the 4.0.0-SNAPSHOT version.
I'm trying to parse a CSS file over HTTP, but the library fails to parse it: no errors are reported, yet the StyleSheet size is 0.
StyleSheet parse = CSSFactory.parse(new URL("http://www.comune.tarquinia.vt.it/it-it/bundles/css?v=JDDcjL-Xc5tWJJFBqj9KRTIVQ84T6Y95cTlC9yQn3kQ1"), "UTF-8");
parse.size() --> 0
On the other hand, by using the HTTPS version of the same CSS the library is able to parse it:
StyleSheet parse = CSSFactory.parse(new URL("https://www.comune.tarquinia.vt.it/it-it/bundles/css?v=JDDcjL-Xc5tWJJFBqj9KRTIVQ84T6Y95cTlC9yQn3kQ1"), "UTF-8");
parse.size() --> 889
Best regards and thank you for your work.