Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real-time speed #3

Open
laurentg opened this issue Apr 14, 2015 · 4 comments
Open

Real-time speed #3

laurentg opened this issue Apr 14, 2015 · 4 comments

Comments

@laurentg
Copy link

Unless I'm mistaken, there is no "real-time current speed" information in the format. Shouldn't we have this information on the format too?

We could also think on a predicted speed(s), if available.

@kpwebb
Copy link
Contributor

kpwebb commented Apr 17, 2015

Ok! I've updated the docs and proto buff message spec to handle both baseline and current conditions. Curious to hear your thoughts.

@laurentg
Copy link
Author

On the BaselineStats message: are the hourOfWeek... fields a list of speeds for each 1h bin?

Why not having a dedicated message for each bin, such as this:

message SpeedQuartiles {
    required float averageSpeed = 1;
    optional float topQuartileSpeed = 2;
    optional float bottomQuartileSpeed = 3;
}
message BaselineStats {
    required SegmentDefinition segment = 1;
    required float averageSpeed = 2;
    repeated BinStats hourOfWeek = 3;
}

Just a small note: if we have both topQuartile and bottomQuartile, should we not have the averageSpeed be a medianSpeed for consistency (aka middleQuartile) ? Mathematically, you can have the topQuartile lower than the average (although you would need a few spurrious rockets in the samples). And for a routing point of view, basing computations on the median speed could make more sense than the average (although both would be rather similar anyway, the median being probably a bit lower), but I'm not an expert on this.

@laurentg
Copy link
Author

About current condition, should the currentWindowStartTimestamp/ currentWindowEndTimestamp should be marked as "required" ? There could be an option to skip them in the request, as for some usage they could be unnecessary (thinking of route planners again).

Also, providing both start/end 64 bits timestamps require lots of bandwith, maybe only a start plus offset; and/or a 32 bits seconds timestamp, or even shorter, two offsets to a global message "current" timestamp.

@abyrd
Copy link

abyrd commented May 6, 2015

Agreed that if we are including quartiles, we should include the median rather than the mean.

I see the column-store format (arrays containing all measurements for all hours of the week) as mainly a space optimization. If we're not aiming for maximum compactness then it is indeed clearer to have one "struct" (compound protobuf message) per hour, with all the quantile fields present in each of those messages.

Here we have a dichotomy between historical data over all hours of the week, versus current data for only the segments where we have observations. This implies that data consumers will always handle all prediction and interpolation, which is unlikely, so there does need to be a way to express full-dataset predicted results where the inference/interpolation is done by the producer.

From what I've read "required" fields in Protobuf are essentially deprecated as they have been found to cause unexpected fragility, so I'd be in favor of just avoiding required fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants