Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Google Takeout Location History #1160

Merged
merged 26 commits into from
Aug 28, 2023
Merged

Add support for Google Takeout Location History #1160

merged 26 commits into from
Aug 28, 2023

Conversation

postmaxin
Copy link
Contributor

This adds support for parsing the location history provided by Google Takeout. This makes it possible to convert a month's, year's, or your entire location history at once instead of having to export one day's history at a time on Google Maps.

There is documentation in the xmldoc folder with some examples.

It works great, but there are one, possibly two, other additions I'd like to make in following diffs:

  1. Add the ability to select date ranges. For example, I went on a road trip that started in the last week of February, but right now I can only select tracks at the resolution of a whole month.

  2. Add support for the higher-resolution "roadSegment" blocks in newer "activitySegment" blocks. This will have to be optional as you have to query the Google Maps API to get lat/long, which costs money. However, if you're willing to pay for it, you can get great detail out of it. They look like this:

       "roadSegment": [{
          "placeId": "ChIJBdXmuuWaK4gRoiClERhZcy8",
          "duration": "76s"
        }, {
          "placeId": "ChIJ4ekNlOWaK4gRxu-moMRFo60",
          "duration": "58s"
        }, {
          "placeId": "ChIJ10qEheWaK4gRQ1T24wEuuN4",
          "duration": "58s"
        }, {
          "placeId": "ChIJe7NWfu-aK4gRFHBB6P0PMe4",
          "duration": "58s"
[...]

This adds support for parsing the location history provided by Google Takeout.
This makes it possible to convert a month's, year's, or your entire location
history at once instead of having to export one day's history at a time on
Google Maps.

There is documentation in the `xmldoc` folder with some examples.

It works great, but there are one, possibly two, other additions I'd like to
make in following diffs:

1. Add the ability to select date ranges. For example, I went on a road trip
   that started in the last week of February, but right now I can only select
	 tracks at the resolution of a whole month.

2. Add support for the higher-resolution "roadSegment" blocks in newer
   "activitySegment" blocks. This will have to be optional as you have to
	 query the Google Maps API to get lat/long, which costs money. However,
	 if you're willing to pay for it, you can get great detail out of it.
	 They look like this:

```
       "roadSegment": [{
          "placeId": "ChIJBdXmuuWaK4gRoiClERhZcy8",
          "duration": "76s"
        }, {
          "placeId": "ChIJ4ekNlOWaK4gRxu-moMRFo60",
          "duration": "58s"
        }, {
          "placeId": "ChIJ10qEheWaK4gRQ1T24wEuuN4",
          "duration": "58s"
        }, {
          "placeId": "ChIJe7NWfu-aK4gRFHBB6P0PMe4",
          "duration": "58s"
[...]
```
@postmaxin
Copy link
Contributor Author

I am seeing some errors from the builds now. It's worth noting that these are on Ubuntu focal and jammy and I was developing on kinetic:

+ xmllint --noout --relaxng http://docbook.org/xml/5.0/rng/docbook.rng xmldoc/readme.xml
xmldoc/readme.xml:19: element para: Relax-NG validity error : Did not expect element para there
xmldoc/readme.xml:2: element section: Relax-NG validity error : Did not expect element section there
xmldoc/readme.xml:2: element section: Relax-NG validity error : Element chapter has extra content: section
xmldoc/readme.xml:11: element info: Relax-NG validity error : Element book has extra content: info
xmldoc/readme.xml fails to validate

I couldn't get xmllint to run at all, so I was just guessing on the doc formatting. I'll make a focal instance and try to repro.

googletimeline.cc: In member function ‘int GoogleTimelineFormat::add_activity_segment(const QJsonObject&)’:
googletimeline.cc:262:41: error: conversion from ‘QJsonValue’ to non-scalar type ‘const QJsonValueRef’ requested
  262 |     for (const QJsonValueRef pointRef : points) {
      |                                         ^~~~~~
googletimeline.cc:278:40: error: conversion from ‘QJsonValue’ to non-scalar type ‘const QJsonValueRef’ requested
  278 |     for (const QJsonValueRef pointRef: points) {
      |                                        ^~~~~~

This does not happen in kinetic. Could be a qt5/qt6 thing? Either way, I'll try to repro/fix on focal as well.

@postmaxin
Copy link
Contributor Author

OK, it looks like this PR has made it past the computers (and I've learned a lot about how this projects' testing works). I eagerly await human input. ☺

@tsteven4
Copy link
Collaborator

Here are some questions about this pr. If we continue I will have comments about the implementation.

Is there enough interest in this format to justify it's inclusion?
Is there documentation available for this format? If so please reference it.
In the provided test case numerous sparse tracks are created, including some without any points. Is there any value in creating tracks with no points?
Wouldn't it be better to detect (QJsonObject::contains(), QJsonValue::Undefined) points without coordinates instead of relying on QJsonValue::toInt default values to create null islands and deleting them later?
/* as waypointPath does not include timestamps, set every * point along the path to the start of the activity */
is problematic. As an example, it results in kml balloons where the average speed is not bounded by the minimum and maximum speeds. Wouldn't it be better to leave the default QDateTime, which is invalid, as the creation time?
The provided test case exercises gpsbabel, but fails to judge the results (e.g. compare ${REFERENCE}/x ${TMPDIR}/y)

@postmaxin
Copy link
Contributor Author

Is there enough interest in this format to justify it's inclusion?

There are plenty of docs online about how to download your location history from Google Takeout, so I'd assume someone other than I will find it useful.

I am also doing some work on the prettymaps project to allow people to paint GPX tracks on it (marceloprates/prettymaps#64) and will be writing a how-to on how to combine the two.

In general. I think anybody who gets sick of having to download their KML from google timeline a day at a time will be interested in this format. It might be worth referencing it in the KML format docs as well.

Is there documentation available for this format? If so please reference it.

No, there is not. Google just gives you a big wad of JSON to figure out for yourself.

In the provided test case numerous sparse tracks are created, including some without any points. Is there any value in creating tracks with no points?

Right now it is copying the data exactly as it appears in the JSON, except filtering out anything with a lon/lat of 0/0. Remember this is the raw data that google has collected about your location history over the years. I'll have it filter out empty tracks as well.

Wouldn't it be better to detect (QJsonObject::contains(), QJsonValue::Undefined) points without coordinates instead of relying on QJsonValue::toInt default values to create null islands and deleting them later?

 /* as waypointPath does not include timestamps, set every * point along the path to the start of the activity */

is problematic. As an example, it results in kml balloons where the average speed is not bounded by the minimum and maximum speeds. Wouldn't it be better to leave the default QDateTime, which is invalid, as the creation time?

I'll have a look at at that too.

The provided test case exercises gpsbabel, but fails to judge the results (e.g. compare ${REFERENCE}/x ${TMPDIR}/y)

I'll have it make sure a few tracks and points actually exist in the output. I meant to do that after adding date range support, but I wanted to get a PR out to you all before this PR got any longer.

googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
googletakeout.cc Outdated Show resolved Hide resolved
@postmaxin
Copy link
Contributor Author

Wouldn't it be better to detect (QJsonObject::contains(), QJsonValue::Undefined) points without coordinates instead of relying on QJsonValue::toInt default values to create null islands and deleting them later?

In my complete Google history, I have 984 of these events in 2014, 715 of them 2015, then zero from 2016-2021, then one in 2022. There was a bug that Google (mostly) fixed.

The way I have it now, the logic to pull it out only exists in one place, and I'm using rte_name and creation_time in the warning. It could technically be written to be very slightly more CPU efficient... but these should also be the vast minority of records from many years ago,

@tsteven4
Copy link
Collaborator

please quiet expected messages with something like

 if (global_opts.debug_level > 0)  {
  ...
}

By default, debug_level will be 0. You can change it on the command line with -D n.

* add `googletakeout.h` to HEADERS
* convert most `#define`s to static members
* don't pollute global namespace
* rename logging functions
* convert some methods to static functions for clang-tidy
* `title_case` loop by reference instead of by index (clang-tidy)
* better use of debugging levels
* drop dead simplifiedRawPath code for now
* drop empty tracks
* default constructor for GoogleTakeoutInputStream
* add some tests that check the content of the data

Also, I found some unofficial documentation about the location history
format and added that (thanks @CarlosBergillos !)
@postmaxin
Copy link
Contributor Author

please address the comment above about quieting the status messages.

Can you explain this further? I thought all of my status messages were behind a Warning(), fatal(), or Debug().

@postmaxin
Copy link
Contributor Author

I believe "Robert Lipe, [email protected]" will want to be listed as an owner on the copyright. ask @GPSBabelDeveloper.

Robert -- what is your stance here? I wrote this and licensed it under the license of your choosing. Do you want to claim authorship?

rather than go through all the trouble of xml_slice and test_takeout_xpath() can we just do a text compare, using the compare function defined in testo, of ${TMPDIR}/googletakeout.gpx with a reference file you add? While this occasionally results in irrelevant compare failures this method has served us well for many years. It alleviates the need for other dependencies and the maintenance of the judgment code you proposed.

I resisted that because it means I am testing the GPX output, not the Google Takeout input. For example, if we discover a bug in this code in the future, we won't be able to sanely write a regression test that proves the bug is fixed with any sort of context about the bug.

@tsteven4
Copy link
Collaborator

please address the comment above about quieting the status messages.

Can you explain this further? I thought all of my status messages were behind a Warning(), fatal(), or Debug().

Those functions output the message when called regardless of any state. That is fine for real warnings and fatal errors. But for detailed status or debug messages the caller needs to decide when to output the message. For example

gpsbabel/main.cc

Lines 277 to 280 in 4386f99

if (global_opts.debug_level > 0) {
Warning().noquote() << QStringLiteral("%1: reader %2 took %3 seconds.")
.arg(MYNAME, ivecs.fmtname, QString::number(timer.elapsed()/1000.0, 'f', 3));
}

You can see your test, and "gpsbabel -i googletakeout -f reference/googletakeout" results 341 messages being sent to standard error.

@postmaxin
Copy link
Contributor Author

Can you explain this further? I thought all of my status messages were behind a Warning(), fatal(), or Debug().
Those functions output the message when called regardless of any state.

Oh. I completely misunderstood them. I thought "Warning" only applied when we were set to be more verbose than log level "Info", etc. I'll fix that.

@postmaxin
Copy link
Contributor Author

Those functions output the message when called regardless of any state.

Why does Debug() take an integer, then?

@tsteven4
Copy link
Collaborator

I resisted that because it means I am testing the GPX output, not the Google Takeout input. For example, if we discover a bug in this code in the future, we won't be able to sanely write a regression test that proves the bug is fixed with any sort of context about the bug.

I agree it's an imperfect methodology, but it has served us well for almost two decades. It occasionally requires us to update a reference file which contains expected output. It is unfortunate that we mixed actual reference files with files containing expected output. In reality the most common difficulty has been cross platform rounding issues.

In the case you cite, we would expect the creator of the bug fix to either alter an existing reference file of expected output, or create a new test that fails without the fix and passes with the fix.

@tsteven4
Copy link
Collaborator

Why does Debug() take an integer, then?

The rarely used function Debug() interprets its parameter as an indent level.

…s makes a difference, but give Robert co-copyright.
@postmaxin
Copy link
Contributor Author

I agree it's an imperfect methodology, but it has served us well for almost two decades.

Okay. You want to be on the cutting edge of C++, which I think is awesome. If you are not interested in me improving our testability, I'll make the test a simple file comparison.

In the case you cite, we would expect the creator of the bug fix to either alter an existing reference file of expected output, or create a new test that fails without the fix and passes with the fix.

It would be easier to add comments in the test code about the reason for a test if it was not just comparing flat files. But that's fine. I can update the PR to just compare flat files.

@tsteven4
Copy link
Collaborator

please address the comment above about quieting the status messages.

Can you explain this further? I thought all of my status messages were behind a Warning(), fatal(), or Debug().

Robert -- what is your stance here?

I believe the answer is related to defending our license terms which Robert has had to do in the past. I seem to remember this gets complicated if the owner cannot be found (like many of our past contributors). Note that the owner of the copyright is listed, not the author of the code. Listing multiple owners may be acceptable.

@postmaxin
Copy link
Contributor Author

The rarely used function Debug() interprets its parameter as an indent level.

Thanks. I have not had the opportunity to work in C++ for at least 7 years now and I came in with incorrect assumptions.

@tsteven4
Copy link
Collaborator

In regard to
revert constructor change from ...
Sorry. The mystery is how that worked in Qt6. The answer is an undocumented constructor:
https://github.com/qt/qtbase/blob/d3b5353380797f3b67599ccebc5dc916057681e5/src/corelib/tools/qlist.h#L295-L298
I should have used a braced init list so the documented constructor QList::QList(std::initializer_list args) came into play.

 GoogleTakeoutInputStream(const QString& source) : sources({source}) {}

@postmaxin
Copy link
Contributor Author

Hey,
What else is needed to get this approved? I want to build on top of this code, so I want to avoid having a stack of diffs that may require cascading changes.

@tsteven4
Copy link
Collaborator

Robert and I have been communicating and we are about there. If you will humor me here is a patch with one fix and some trivia.

  • title_case can be static. this is the only substantial change.
  • regenerate includes using include-what-you-use and a custom mapping file. This is imperfect, but it is the best we have.
  • reorder .h file. we are trying to organize similar to https://google.github.io/styleguide/cppguide.html#Declaration_Order. It makes new classes easier for us to understand to have the similarly organized.

If you can apply this hopefully it will head off cascading changes.

thanks
patch2.txt

@postmaxin
Copy link
Contributor Author

regenerate includes using include-what-you-use and a custom mapping file. This is imperfect, but it is the best we have.

Not a fan of this one, because it unnecessarily increases the verbosity, but it's fine. Thank you for helping me get to the finish line!

@tsteven4 tsteven4 merged commit fb04802 into GPSBabel:master Aug 28, 2023
20 checks passed
@tsteven4
Copy link
Collaborator

Thanks.

@postmaxin postmaxin deleted the takeout2 branch August 28, 2023 22:31
@GPSBabelDeveloper
Copy link
Collaborator

GPSBabelDeveloper commented Aug 28, 2023 via email

robertlipe pushed a commit that referenced this pull request Aug 12, 2024
* Add support for Google Takeout Location History

This adds support for parsing the location history provided by Google Takeout.
This makes it possible to convert a month's, year's, or your entire location
history at once instead of having to export one day's history at a time on
Google Maps.

There is documentation in the `xmldoc` folder with some examples.

It works great, but there are one, possibly two, other additions I'd like to
make in following diffs:

1. Add the ability to select date ranges. For example, I went on a road trip
   that started in the last week of February, but right now I can only select
	 tracks at the resolution of a whole month.

2. Add support for the higher-resolution "roadSegment" blocks in newer
   "activitySegment" blocks. This will have to be optional as you have to
	 query the Google Maps API to get lat/long, which costs money. However,
	 if you're willing to pay for it, you can get great detail out of it.
	 They look like this:

```
       "roadSegment": [{
          "placeId": "ChIJBdXmuuWaK4gRoiClERhZcy8",
          "duration": "76s"
        }, {
          "placeId": "ChIJ4ekNlOWaK4gRxu-moMRFo60",
          "duration": "58s"
        }, {
          "placeId": "ChIJ10qEheWaK4gRQ1T24wEuuN4",
          "duration": "58s"
        }, {
          "placeId": "ChIJe7NWfu-aK4gRFHBB6P0PMe4",
          "duration": "58s"
[...]
```

* rename variable

* this TODO is done

* now builds on focal, fix documentation

* patch -p2 < ~/testo.log

* use `&&` instead of `and` for old compilers

* don't leak objects when we skip events at exact lat/lon 0/0

* use a reference (even though the compiler would have optimized that away)

* rename to googletakeout

* fix docs

* Address code review requests

* add `googletakeout.h` to HEADERS
* convert most `#define`s to static members
* don't pollute global namespace
* rename logging functions
* convert some methods to static functions for clang-tidy
* `title_case` loop by reference instead of by index (clang-tidy)
* better use of debugging levels
* drop dead simplifiedRawPath code for now
* drop empty tracks
* default constructor for GoogleTakeoutInputStream
* add some tests that check the content of the data

Also, I found some unofficial documentation about the location history
format and added that (thanks @CarlosBergillos !)

* apply @tsteven4 's patch. thank you!!

* add license

* fix logging

* As I assigned the correct license to my code, I am skeptical that this makes a difference, but give Robert co-copyright.

* make the test more boring

* rm xml_slice

* revert constructor change from @tsteven4 's patch

* apply @tsteven4 's latest constructor

* apply @tsteven4 's latest patch -- `cmake --build . --target check` passes

---------

Co-authored-by: tsteven4 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants