Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

okhttp protocol: HTTP request header lacks protocol name and version #775

Conversation

sebastian-nagel
Copy link
Contributor

The verbatim HTTP request header saved in metadata by the okhttp protocol implementation does not contain the protocol name and version in the request message. The request message should be (instead of GET /path ):

GET /path HTTP/1.1

See also commoncrawl/news-crawl#34.

This PR adds

  • the HTTP protocol name and version to the request message
  • normalizes the protocol name for HTTP/2 (okhttp's internal name is "H2")

Two notes:

  • HTTP/2 is enabled by default but requires Java 9 upwards, e.g. when running with OpenJDK 11 request/response headers are now:
$> /usr/lib/jvm/java-11-openjdk-amd64/bin/java ... com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol \
   -c ... https://www.google.com/
...
_request.headers_: GET / HTTP/2
...

_response.headers_: HTTP/2 200 
...
  • request and response protocol version may differ. Verified using wget:
$> wget -d https://www.kp.ru/daily/27061/4129507/
...
---request begin---
GET /daily/27061/4129507/ HTTP/1.1
...
---response begin---
HTTP/1.0 200 OK
...

…tation:

- HTTP request header: add the HTTP protocol version to the request message
   "GET /path HTTP/1.1" (instead of "GET /path")
- normalize "HTTP/2" protocol version
@jnioche jnioche added this to the 1.16 milestone Dec 13, 2019
@jnioche jnioche merged commit b2d66dd into apache:master Dec 13, 2019
@jnioche
Copy link
Contributor

jnioche commented Dec 13, 2019

fab, thanks @sebastian-nagel

@sebastian-nagel sebastian-nagel deleted the http-request-header-message-add-protocol-version branch December 13, 2019 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants