- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 376
Int vs BigInt
Developers,
This topic came up off list and I thought this should be part of a more public discussion, so here are my thoughts on it. I think both the ideas below are worth pursuing but are huge tasks that are currently not funded and I'm concerned that we should not tackle them on a piecemeal, ad-hoc basis.
Problem:
postgis loads data with a gid column that is typed bigint. OSM data has node and edge ids that can not be held in an integer field. Today, all pgrouting code assumes ids are integer and larger numbers overflow the field causing problems.
So the obvious answer is to update the code to be bigint compatible.
The issues related to doing this are as follows:
- 
pgrouting has the bad habit of using ids as indexes into arrays which is a problem. The code partially minimizes the indexing issue by located the minimum id and subtracting that from all other ids shifting the range to eliminate the leading empty ids, but if the (max_id - min_id) is large and the ids are sparsely allocated across the range we still need to allocate (max_id - min_id + 1) elements in our storage structures. For example if you have two ids '2' and '1,000,002' the code allocates 1,000,001 elements. This can obviously be fixed with some additional coding. 
- 
the code would/might need to be changed to such that the boost structures can support bigint values and this will double the amount of storage needed for ids. There is also the possibility that if the point above does renumbering and we limit the number of ids in any given request to unsigned integer that the code might not need changes and we would only limit the size of the problem we can solve and max unsigned integer sized problem is still larger than anyone would want to wait for a solution. 
- 
this would also imply that the output data types we use would get converted to bigint also. 
- 
If we make this change, we need to do it to all our algorithms so we keep the commands consistent and compatible with one another. This would involve updating all of the follow algorithms in pgrouting: - apsp_johnson
- apsp_warshall
- astar
- bd_astar
- bd_dijkstra
- common
- dijkstra
- driving_distance
- kdijkstra
- ksp
- shooting_star ** deprecated, will be deleted **
- trsp
- tsp
- vrp_basic
- vrpdptw
 
While this does not have anything to do with this specific problem, we need to consider any major reworking of the code like this would involve with the fact for historical evolution of the code base we have copied source code from one algorithm to another so we have a lot of duplicated code. It would make a lot of sense at this point to refactor some of this code into common set of reusable C++ base classes. To put this into context all of the follow code is in someway based off of the original dijkstra code:
- astar
- bd_astar
- bd_dijkstra
- dijkstra
- driving_distance
- kdijkstra
- ksp
So every time we make a change to one of these algorithms, there is a good chance that the problem exists in one or more of the other ones. We have been struggling with this problem for some time.
I don't want to blowup this problem into something that is so big it will never get done, but I do think it is important that we think about these issues and that we put together a plan that takes these issues into account in the planning of these and other potential changes.