Parsing Issue using http URL (while with the https URL is OK) #111

Open
MarcoManca opened this issue Jan 29, 2021 · 3 comments

Comments

@MarcoManca

I'm using version 4.0.0-SNAPSHOT.
When I try to parse a stylesheet over plain HTTP, the library does not parse it: no error is reported, but the resulting StyleSheet has size 0.

StyleSheet parse = CSSFactory.parse(new URL("http://www.comune.tarquinia.vt.it/it-it/bundles/css?v=JDDcjL-Xc5tWJJFBqj9KRTIVQ84T6Y95cTlC9yQn3kQ1"), "UTF-8");

parse.size() --> 0

With the HTTPS version of the same URL, on the other hand, the library parses the stylesheet correctly:

StyleSheet parse = CSSFactory.parse(new URL("https://www.comune.tarquinia.vt.it/it-it/bundles/css?v=JDDcjL-Xc5tWJJFBqj9KRTIVQ84T6Y95cTlC9yQn3kQ1"), "UTF-8");

parse.size() --> 889

Best regards and thank you for your work.

@radkovo
Owner

radkovo commented Jan 30, 2021

The problem is that for the http URL, the server returns only an HTTP 302 redirect to the https URL, with an empty body. The parser would therefore have to follow the redirect and send a second request to the https location. However, by design, the default simple Java client does not automatically follow redirects from http to https. That's why it just parses the empty body and you get an empty result.

If you need redirects to be handled automatically, you would have to create your own NetworkProcessor implementation (see DefaultNetworkProcessor for inspiration) and register it with CSSFactory.setNetworkProcessor() before calling parse().
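A custom NetworkProcessor first has to recognize the case described above: a redirect whose target uses a different protocol than the original request. The following is only a minimal sketch of that check, using nothing but the JDK. The class name `RedirectCheck` and the method `isCrossProtocol` are hypothetical and not part of the library, and the example URLs are placeholders.

```java
import java.net.URI;

// Hypothetical helper, not part of the library: decides whether a redirect
// changes protocol (e.g. http -> https), which is the case that the
// automatic redirect handling in HttpURLConnection leaves to the caller.
class RedirectCheck {

    // 'requested' is the absolute URL that was fetched; 'location' is the value
    // of the Location response header, which may be absolute or relative.
    static boolean isCrossProtocol(String requested, String location) {
        URI base = URI.create(requested);
        // Resolve relative Location values against the requested URL.
        URI target = base.resolve(location);
        return !base.getScheme().equalsIgnoreCase(target.getScheme());
    }

    public static void main(String[] args) {
        // http -> https: the protocol changes, so the redirect must be followed manually.
        System.out.println(isCrossProtocol(
                "http://www.example.com/style.css",
                "https://www.example.com/style.css"));
        // Relative Location on the same host: the protocol is unchanged.
        System.out.println(isCrossProtocol(
                "http://www.example.com/style.css",
                "/css/style.css"));
    }
}
```

When the check returns true, the processor has to open a new connection to the resolved target itself instead of relying on the built-in redirect handling.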

@MarcoManca
Author

Hi @radkovo, I see your point, and I have just implemented the solution you suggested.
It works and solves the problem I reported.

That said, redirects are very common on modern web pages, since most web sites redirect plain HTTP requests; so I was wondering whether the library could include a simple FollowRedirectNetworkProcessor that implements the NetworkProcessor interface. The documentation could then mention that this kind of issue can occur and point to the new network processor, which follows redirects, as the solution.

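`import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

import cz.vutbr.web.css.NetworkProcessor; // package name assumed; adjust to your library version

public class FollowRedirectNetworkProcessor implements NetworkProcessor {

    /** Upper bound on redirect hops, to avoid redirect loops. */
    private static final int MAX_REDIRECTS = 5;

    @Override
    public InputStream fetch(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            URLConnection raw = current.openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return raw.getInputStream(); // non-HTTP URL (e.g. file:), nothing to redirect
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            // Redirects that change protocol (http -> https) are never followed
            // automatically, so handle every redirect manually.
            conn.setInstanceFollowRedirects(false);

            // normally, 3xx is redirect
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_MOVED_TEMP
                    || status == HttpURLConnection.HTTP_MOVED_PERM
                    || status == HttpURLConnection.HTTP_SEE_OTHER) {
                // get redirect url from "Location" header field
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header: " + current);
                }
                // resolve a possibly relative Location and open the new connection in the next pass
                current = new URL(current, location);
                continue;
            }

            // Check the encoding of the final response, after any redirects.
            InputStream is = conn.getInputStream();
            if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
                is = new GZIPInputStream(is);
            }
            return is;
        }
        throw new IOException("Too many redirects for " + url);
    }

}`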

@radkovo
Owner

radkovo commented Feb 3, 2021

Many thanks for the implementation, it looks reasonable. Would you mind creating a pull request with it, so that you are credited for the contribution? A note about redirects could be added to the README as well.
