- Introduction
- Stored Data
- Preprocessing
- Computing Quickest/Shortest Path
- Efficiency and Feasibility
- References
- Conclusion
Google Maps is a web mapping service that works on a global scale. Since the data and the scale at which it has to operate are tremendously large, it requires very fast calculations and processing with no room for errors. The challenges and tasks in building Google Maps are the following:
- Show the User Interface of Google Maps according to the chosen layout (default, satellite, terrain) with the names of all places and roads. (The data of all road and place names and their locations across the globe can be taken from sources like OpenStreetMap, but Google Maps has its own database.)
- The 'Travel From' tab is autofilled by selecting the "Your Location" option, which requires 'Google location services' to be turned ON; using the GPS location of the device, its latitude and longitude are received and stored. Alternatively, if it is filled by typing the name of the initial location, that name is searched in the data and its corresponding latitude and longitude are found through "Geocoding" (translation of an address to latitude, longitude). The same Geocoding process is adopted for the 'Travel To' tab, representing the final destination's name.
- Now that I have the initial and final location details, search for them in the Graph that represents the map and store them.
- Single out all the routes that connect the two locations. This can be done by initiating the search from both sides, i.e., Travel To and Travel From. It is really hard to determine how, when, and where the two searches will meet at one middle location on the route, because this is highly case-dependent. All the roads stretching from both locations need to be checked for whether they can meet at a common location, which is a lot of work for the concerned algorithm, as many roads may stretch from a single node (location).
- Now that I have all the connecting routes, I can compute the quickest path using the concerned algorithm. It considers many parameters to judge a road's (edge's) weight, like traffic (the number of devices on that road right now), the length of the road, feasibility of travel, etc., and then scores the road accordingly. The quickest path is the one with the lowest score.
- As the algorithm used is "greedy", the possible loss can be countered by calculating the second quickest path (the Runner Up) and then comparing the two to rule out the worst case. This should be enough; there is no need to consider a third route and so on, as it would be useless and would hurt efficiency. Google Maps suggests two paths, and the Runner Up is sometimes beneficial, for example on long journeys as a No Toll route, or when you want to travel a different road and there is not much time difference between the two.
- The roads of the solution routes are now identified within the map and displayed on the User Interface with proper colour coding representing traffic conditions: blue (solution route roads), green (throughout the global map, no delay), yellow (slight or usual traffic), orange (medium to heavy traffic), red (traffic delay), red-black (stopped traffic). Historical data of that road and 'Crowdsourcing' are used to predict traffic conditions.
Storage space on the device is acquired when the Google Maps app is installed; this is used to download the map of the globe that is shown as the User Interface even when Mobile Data is OFF, i.e., the user is offline. Other than this, there are two types of data that Google Maps needs to store:
- One is associated with the user (device) and comes into existence when the Google Maps application is used. This data is stored locally on the device as Location History, which can be viewed in the "Your Timeline" feature of Google Maps. Also, small bits of information like the device's GPS coordinates and speed are continuously sent anonymously to the server database; this is the basis of the real-time traffic calculations and also contributes to the 'usual/default' data used to provide services in case Google location services are turned OFF.
- The other is the database for pre-processing and the User Interface, which is permanent and is updated periodically with the help of the collected GPS data, Location History and other user data, plus new updates from all the resources. As there are many sources from which Maps gets its data, each pack of data has to be engineered into the right format and then combined with other data packs to work together. Huge amounts of data come from numerous sources across the globe, namely governments, local transport departments, road sensors, partner companies that collect traffic data for study, satellite images, contributions via MapMaker, Street View, location services, etc. Today, memory and storage are no longer the issues they were until the last century; the issue now is time. With time, we need faster and faster solutions. Hence, the real task lies in organising that huge bunch of data to extract the most out of it: wisely choosing data structures and attributes so that no unnecessary storage, searching, or iteration is done and there is minimum room for errors.
- Size of data, and how to store it?
Combining satellite, aerial and street-level images, Google Maps has over "20 petabytes" of data. A lot of storage space is required, but Google uses its own invention to handle such amounts of data: Google Maps stores its data in tables in the Bigtable database. Bigtable is used for other Google services as well and is a compressed, high-performance, prototypical example of a 'wide column store' data storage system. On the user's device, the data is stored as Location History in the Your Timeline feature, as said before. This is used to improve the user experience.
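As a rough illustration of the wide column store idea (a toy sketch, not Bigtable's actual API; the row key and column families here are invented):

    # Hypothetical wide-column-store layout: each row key maps to column
    # families, each holding named cell values. NOT the real Bigtable API.
    table = {
        "road:NH44:segment:1042": {                              # row key
            "geometry": {"polyline": [(28.61, 77.20), (28.62, 77.21)]},
            "traffic":  {"speed_kmph": 42, "updated_at": 1700000000},
            "meta":     {"name": "NH 44", "lanes": 4},
        },
    }

    def read_cell(table, row_key, family, column):
        # A single-cell lookup, analogous to a point read in a wide column store
        return table.get(row_key, {}).get(family, {}).get(column)

    print(read_cell(table, "road:NH44:segment:1042", "traffic", "speed_kmph"))  # 42

Grouping related columns into families lets a query read only the families it needs (e.g., just "traffic") instead of the whole row.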
- The data structures involved:
The map is stored as a graph; hence, the whole map is transformed into a "Graph" data structure. Every intersection of roads on the map is a "vertex" in the graph. Each edge connecting two vertices in the graph represents a road. Every road on the map has a name or number, like 11 or 148B, while roads lower in the hierarchy are named after the towns they connect. So each road (edge) holds a value, a "hash value", that is associated with information about the road, like its name, and is stored with a pointer indicating that associated information. Every edge has a weight (score) that represents the amount of time any vehicle takes to travel that road.
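A minimal sketch of how such a graph could be held in memory, assuming an adjacency-list layout (the road names and weights are made up):

    # Adjacency-list graph: vertex -> list of (neighbour, edge) pairs.
    # Each edge keeps its weight (travel time) plus a key into road_info,
    # mirroring the "hash value -> road details" idea described above.
    road_info = {
        "11":   {"name": "Route 11",   "lanes": 2},
        "148B": {"name": "Route 148B", "lanes": 1},
    }

    graph = {
        "A": [("B", {"weight": 5.0, "road": "11"})],
        "B": [("A", {"weight": 5.0, "road": "11"}),
              ("C", {"weight": 2.5, "road": "148B"})],
        "C": [("B", {"weight": 2.5, "road": "148B"})],
    }

Keeping the bulky road details in a separate table means the graph traversed during routing stays small, and the details are fetched only when a road is actually part of an answer.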
Definitely, a lot of preprocessing goes on to achieve instantaneous results. Fast algorithms require processing the map data before any routes are computed. I use the "Contraction Hierarchies" method to speed up the calculation of the quickest route. This method creates shortcuts during the pre-processing phase which are then used during a query to skip over some vertices. Road networks are highly hierarchical: some edges and vertices are more important than others because they are more travelled, connect two cities, or are not dead-end roads (like highways). Once a road is constructed, it changes infrequently; hence, we have a lot of time to pre-compute the distance between the two cities it connects before queries are to be addressed. Once computed, this saved "shortcut" with its edge weight can be used for all queries that include that road. Hence, there is no need to calculate the same thing again until there is an update regarding that road. This speeds up the process many times over and reduces the number of vertices to be examined. Most of the run-time during the pre-processing phase goes into computing the order in which the vertices are to be contracted.
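A highly simplified sketch of contracting a single vertex; real implementations use witness searches and a carefully computed contraction order, so this only illustrates the shortcut idea:

    def contract_vertex(graph, v):
        # graph: dict vertex -> dict neighbour -> weight (undirected, simplified).
        # When v is removed, add a shortcut u-w for each pair of its neighbours
        # if going through v beats any existing direct edge between them.
        neighbours = graph[v]
        shortcuts = []
        for u, w_uv in neighbours.items():
            for w, w_vw in neighbours.items():
                if u == w:
                    continue
                via = w_uv + w_vw                      # cost of going u -> v -> w
                if via < graph[u].get(w, float('inf')):
                    graph[u][w] = via                  # insert the shortcut edge
                    shortcuts.append((u, w, via))
        for u in list(neighbours):
            del graph[u][v]                            # detach v from the graph
        del graph[v]
        return shortcuts

At query time, a search that runs only "upward" in the contraction order can skip every contracted vertex and ride the shortcuts instead, which is where the speed-up comes from.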
Mostly we need to go from one city to another without any exit in between. But there may be queries where this is not the case. Certain queries demand preferences like highways or ferries, etc. Live updates about traffic, accidents, or road blockages, and checks on whether a road is feasible/allowed for the user's means of transportation, also have to be taken care of. All of these are considered during the 'Customisation Phase'. Considering all these facts, if there is any need to recompute a shortcut based on new edge weights, it is done in this phase, i.e., between the pre-processing and query phases.
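A toy sketch of this customisation step, recomputing an edge weight from live inputs (all parameter names are my assumptions, not Google's):

    def customise_edge_weight(base_time, traffic_factor=1.0, closed=False,
                              allowed_for_mode=True):
        # base_time: free-flow travel time from pre-processing.
        # traffic_factor: live multiplier from crowdsourced speeds (>= 1.0).
        # closed / allowed_for_mode: road blockage or transport-mode restrictions.
        if closed or not allowed_for_mode:
            return float('inf')      # effectively removes the edge for this query
        return base_time * traffic_factor

    # Example: a 10-minute road under moderate congestion
    print(customise_edge_weight(10.0, traffic_factor=1.6))   # -> 16.0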
Time is precious. Hence, we always need the "quickest path" rather than the "shortest path", which may consume more time depending on traffic, the condition of the road, the means of transportation, and many other parameters. Hence, here I try to optimize travel time.
- Algorithms used:
Since Google Maps solves a very complicated day-to-day problem, finding the quickest navigation route from any place to any other place according to our means of transport, it is expected to use more than one algorithm to deliver its multiple services. Each algorithm addresses a specific issue, whether it be finding the quickest path, understanding queries, processing GPS signals, or dealing with complex geometry.
Algorithms used in Google Maps:
- Algorithms of computational geometry to organize the map data and retrieve it efficiently.
- Algorithms to draw maps (e.g. project latitude and longitude coordinates, place names for streets, cities, businesses, parks, etc.).
- Algorithms to understand queries from users.
- Algorithms to process GPS signals.
- Algorithms to perform Geocoding, converting addresses to points (or polygons) on a map.
- Bi-directional search: the route is computed both forward from the origin and backward from the destination. This reduces the area searched, but has complications in determining when the two searches have met at a point on an optimum or near-optimum route (a minimal sketch follows this list).
- Shortcuts (Contraction Hierarchies).
- A* algorithm (Dijkstra combined with heuristic no. 3, discussed later).
- Algorithms to perform "Reverse Geocoding", converting points to addresses.
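As promised above, a minimal sketch of bidirectional Dijkstra on an undirected graph; the stopping rule is the standard "settled on both sides" test, and this is an illustration rather than Google's implementation:

    import heapq

    def bidirectional_dijkstra(graph, source, target):
        # graph: dict vertex -> list of (neighbour, weight); assumed undirected.
        # Searches forward from source and backward from target; `best` tracks
        # the cheapest meeting point seen so far.
        dist = [{source: 0.0}, {target: 0.0}]          # forward / backward distances
        done = [set(), set()]                          # settled vertices per side
        pq = [[(0.0, source)], [(0.0, target)]]
        best = float('inf')
        while pq[0] and pq[1]:
            side = 0 if pq[0][0][0] <= pq[1][0][0] else 1
            d, u = heapq.heappop(pq[side])
            if d > dist[side].get(u, float('inf')):
                continue                               # stale heap entry
            done[side].add(u)
            if u in done[1 - side]:                    # settled by both searches
                break
            if u in dist[1 - side]:
                best = min(best, d + dist[1 - side][u])
            for v, w in graph[u]:
                alt = d + w
                if alt < dist[side].get(v, float('inf')):
                    dist[side][v] = alt
                    heapq.heappush(pq[side], (alt, v))
                    if v in dist[1 - side]:            # candidate meeting point
                        best = min(best, alt + dist[1 - side][v])
        return best

On road networks this roughly halves the explored area: each search expands a disc of about half the radius a single search from the source would need.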
- Can you comment on the computation time without any heuristic?
Without any heuristic, the required performance and efficiency can never be attained. Using Dijkstra alone would slow the query time so much that it would be impossible for the application to run under real-world time constraints: there are millions of vertices and edges, which makes using Dijkstra alone totally impractical. Hence, heuristics are always required for the application to answer in real time and be ready for the next query.
- Mention the heuristics to speed up the computation. Time/Accuracy trade-off?
As discussed above, the support of some approximations has to be taken, because abnormally fast solutions are desired with such a big database. Hence there is a trade-off between time and accuracy: one has to be sacrificed to gain the other. In the case of Dijkstra, accuracy is 100% assured but there is a huge time delay; since we are optimising time, some assumptions and approximations added to the Dijkstra algorithm give a huge leap in computation time.
Dijkstra's algorithm, written here as runnable Python rather than pseudo-code, is as follows:
    import heapq

    def dijkstra(graph, source):
        # graph: dict mapping vertex -> list of (neighbour, edge_length) pairs
        dist = {v: float('inf') for v in graph}   # unknown distances start at infinity
        prev = {v: None for v in graph}           # previous vertex in optimal path
        dist[source] = 0.0                        # distance from source to source
        pq = [(0.0, source)]                      # min-heap of (tentative dist, vertex)
        while pq:
            d, u = heapq.heappop(pq)              # vertex with minimum dist first
            if d > dist[u]:
                continue                          # stale entry; u already improved
            for v, length in graph[u]:
                alt = d + length
                if alt < dist[v]:                 # a shorter path to v has been found
                    dist[v] = alt
                    prev[v] = u
                    heapq.heappush(pq, (alt, v))
        return dist, prev
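For example, on a toy graph (vertex names and weights are made up):

    # Toy graph: vertex -> list of (neighbour, travel_time) pairs
    toy = {
        "A": [("B", 5.0), ("C", 1.0)],
        "B": [("D", 1.0)],
        "C": [("B", 2.0), ("D", 6.0)],
        "D": [],
    }
    dist, prev = dijkstra(toy, "A")
    print(dist["D"])   # 4.0, via A -> C -> B -> D
    print(prev["D"])   # 'B'; following prev back to 'A' recovers the route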
This looks and works absolutely fine, and I'll be using it as the basis for the A* algorithm, but with some heuristics added, since our database is so large that we need very fast computations to deliver results instantaneously. Heuristics used throughout the design:
- Geocoding's search, or any other search, uses a dictionary approach in my view. This means every level of any address is arranged alphabetically: continents, then countries, then states, then districts, then cities, then towns and villages are sorted at each level (up to whichever level is applicable for the address). This reduces the area to be searched, as in a dictionary (a minimal sketch follows this list).
- Top-down heuristics (Contraction Hierarchies) are the best approach in my view, but they need more pre-processing time, which is okay, as we have enough time for pre-processing, as said before. This classifies vertices which are part of many shortest paths as more important than those which are needed for only a few shortest paths. To compute a nested dissection, the graph is recursively separated into two halves.
- A* heuristic: I calculate the slope of the line connecting the start and end with the help of the latitudes and longitudes of the locations. Then, for each vertex throughout the search, I add the tangent of the angle (the slope of the line connecting that vertex and the end) to the edge weight of the road. This not only helps me avoid a lot of scattered, non-relevant paths but also ensures that I always move towards the end (a sketch follows this list).
- Will you be able to reroute quickly if the user takes a different turn?
Yes; code has to be inserted that checks whether the user is on the route calculated from the device's initial location. Live tracking is done continuously through the GPS service, since bits of information are sent as stated above. Each road is a series of location points whose (latitude, longitude) combinations comprise the road. Hence, during live tracking, if the device happens to have a location geocode different from those on the pre-calculated route, the algorithm repeats, but this time the road on which the device is currently moving is the starting one. And when a location is detected which is on the pre-calculated route, there is no need to continue the algorithm further; we just append the pre-calculated route from that location to the final destination.
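A rough sketch of that rerouting logic (the tolerance value and function names are my assumptions):

    def needs_reroute(current_pos, route_points, tolerance_deg=0.0005):
        # route_points: list of (lat, lon) pairs comprising the planned route.
        # If the live GPS fix is not close to any point on the route, reroute.
        lat, lon = current_pos
        return all(abs(lat - p[0]) > tolerance_deg or abs(lon - p[1]) > tolerance_deg
                   for p in route_points)

    def reroute(current_vertex, destination, old_route, shortest_path):
        # shortest_path(a, b) -> list of vertices; assumed available, e.g. the
        # A* sketch above. Search from the current position; as soon as the new
        # path touches the old route, reuse the pre-calculated remainder.
        old_index = {v: j for j, v in enumerate(old_route)}
        new_route = shortest_path(current_vertex, destination)
        for i, v in enumerate(new_route):
            if v in old_index:
                return new_route[:i] + old_route[old_index[v]:]
        return new_route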
- Theoretically, how would it perform?
Theoretically, it is possible that it can perform the way it is expected to, but obviously this is far from what is actually running as Google Maps. Performance issues are always there when something has to work in non-ideal circumstances, but they can be resolved to an extent with improvements. I can't predict all the problems it may face while running, but theoretically this design looks good to go.
- Can it work on the scale of the globe?
Until this is actually working on a platform, it is not possible for me to say whether it can work on a global scale or not, but definitely a lot of improvements and transformations will be required before it can perform such a complex task. It will require huge efforts to improve the efficiency. Handling such a large amount of data about the whole landscape, and addressing interactive queries from across the globe, will not be possible without abnormally fast solutions and, obviously, approximations. Besides, there are multiple issues that can only be understood and resolved once they appear on the platform. Hence, this design may not perform as expected, but it can be improved to achieve wonderful results.
- What sources did I use to come up with this solution?
I did Google searches for: [1] which algorithm Google Maps uses; [2] how Google Maps stores data; [3] the size of the data possessed by Google Maps; [4] Contraction Hierarchies; [5] nested dissection; [6] Bigtable.
[0] I experimented with different cases in the Google Maps app to test the theory.
Overall, the Google Maps service is very complicated, and a lot of problems and complications arise when something is executed in real-world circumstances. It is remarkable that Maps works in the real world with such high accuracy and user satisfaction. But this success is the result of excellent efforts through the 15 years since its launch, tirelessly working on the User Experience, and of the remarkable years of its making, shaping an idea into reality. Loads of experiments and feedback have transformed Google Maps through the years, and it is expected that its algorithms and functioning have been transformed every year to enhance the User Experience. The above design is of course not the one on which Google Maps is actually based (the true structure will always remain a mystery) and definitely can't match the level of 15 years of hard work and effort that hundreds or perhaps thousands of people have put in to make it legendary, but this is a small effort to understand what may happen behind the scenes with just a click. Not only Google Maps, but every such service is amazing; it is hard to appreciate how much it takes to make anything possible with all the complications of the real world sitting outside. I really appreciate everyone's efforts, and I think things like Google Maps have revolutionized the possibilities of technology in the real world.