Bad desktop data dumps for Jan. #74
Comments
December 2016 also looks anomalous: a sudden, dramatic drop in overall weight vs. the previous month (if only it were true!).
We had an issue with the requests database where the primary key ran out of 32-bit numbers - doh. It should be fixed for the 2/1 crawl, and we're looking at backfilling the December and January crawl stats from the HARs in BigQuery.
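For anyone wondering how a primary key "runs out", here is a rough sketch of the arithmetic. The thread does not say whether the column was signed or unsigned, so both limits are shown; the function name and the sample id are made up for illustration and are not taken from the HTTP Archive schema.

```python
# Illustrative only: the ceilings of 32-bit integer primary keys.
# Whether the requests table used a signed or unsigned column is an
# assumption; the thread only says "32-bit".
INT32_SIGNED_MAX = 2**31 - 1    # 2147483647
INT32_UNSIGNED_MAX = 2**32 - 1  # 4294967295
INT64_SIGNED_MAX = 2**63 - 1    # ceiling after widening to 64 bits


def remaining_ids(current_max_id: int, limit: int) -> int:
    """Return how many more ids can be issued before the key overflows."""
    return max(limit - current_max_id, 0)


# Hypothetical table whose highest id is 2 billion.
print(remaining_ids(2_000_000_000, INT32_SIGNED_MAX))
print(remaining_ids(INT32_SIGNED_MAX, INT32_SIGNED_MAX))
```

Once the remaining count reaches zero, new rows can no longer be given a fresh id, which would fit the malformed stats reported here. Widening the column to 64 bits is the usual remedy, though the thread does not say which fix was applied.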
Thanks, Patrick! I was looking for evidence of responsive images in WordPress 4.4 hopefully pulling down the average size as it rolls out.
How come the errors for |
I need to learn more about the BigQuery -> MySQL pipeline, but I hope to get this fixed soon.
See also this comment from #116:
|
Which links specifically? The desktop links to the archived dumps on archive.org are all working for me, for example:
http://www.archive.org/download/httparchive_downloads_Jan_1_2019/httparchive_Jan_1_2019_pages.gz
Is the problem that an automated script is trying to use the pre-archive location, which moved once the archiving completed? For the pipeline, would it be easier if a copy of the dumps were also archived to the cloud storage bucket?
Sorry, this is an old issue from 2017 that I updated. I was triaging old issues.
Whoops. My bad. Dumps from 2 years ago? If the links don't work they're gone.
…________________________________
From: Rick Viscomi <[email protected]>
Sent: Monday, February 4, 2019 6:56 PM
To: HTTPArchive/legacy.httparchive.org
Cc: Patrick Meenan; Mention
Subject: Re: [HTTPArchive/legacy.httparchive.org] Bad desktop data dumps for Jan. (#74)
Sorry this is an old issue from 2017 that I updated. Was triaging old issues.
Both desktop data dumps for this month (2017-01-01 and 2017-01-15) are showing malformed data. Known issue?
http://httparchive.org/interesting.php