Handle Integers correctly #20

Open
rafaqz opened this issue Dec 19, 2022 · 7 comments

@rafaqz
Collaborator

rafaqz commented Dec 19, 2022

From the dbase spec

I | Long | 4 bytes.        Leftmost bit used to indicate sign, 0 negative.

A zero sign bit means a negative number, i.e. not a regular Julia/C Int32 at all:

julia> bitstring(Int32(-1))
"11111111111111111111111111111111"
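
For comparison, here is a rough sketch of what the bits would look like under a literal reading of that spec line (sign-magnitude, leftmost bit 0 = negative). `signmag_bits` is a hypothetical helper written only for illustration, and this reading of the spec is itself an assumption:

```julia
# Hypothetical helper, not part of any package: encode an integer as
# sign-magnitude with an inverted sign bit (1 = non-negative, 0 = negative),
# which is one literal reading of the spec line quoted above.
function signmag_bits(x::Integer)
    mag = abs(Int64(x))
    mag <= 0x7fffffff || throw(ArgumentError("magnitude does not fit in 31 bits"))
    sign = x < 0 ? 0x00000000 : 0x80000000   # both literals are UInt32
    return bitstring(sign | UInt32(mag))
end

signmag_bits(-1)       # "00000000000000000000000000000001"
signmag_bits(1)        # "10000000000000000000000000000001"
bitstring(Int32(-1))   # "11111111111111111111111111111111" (two's complement)
```
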
@visr
Collaborator

visr commented Dec 19, 2022

How odd. So the sign is reversed, and the field is also 32 bits, whereas we currently read it as a 64-bit Int:

elseif fld == 'I' || fld == '+'
    rt = Int

Are integers in practice mostly encoded as type N without decimals, which would explain why this hasn't really been an issue before? Regardless, it would be good to fix.

@rafaqz
Collaborator Author

rafaqz commented Dec 19, 2022

Yeah this is a pretty weird format.

I think so, yes. Most .dbf files use string-encoded numbers like 'N' for everything, so we just haven't noticed yet. We'll need to find some test data that has the column types missing from the current tests.
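
To make the 'N' case concrete: such a field holds the number as padded ASCII text rather than binary, so the sign-bit question never comes up. A minimal sketch, assuming a hypothetical helper `parse_numeric` (not the package's actual code) and assuming a blank field should map to `missing`:

```julia
# Hypothetical helper for illustration only: parse the raw bytes of an 'N'
# field. `ndec` is the number of decimals declared for the field.
function parse_numeric(raw::Vector{UInt8}, ndec::Integer)
    s = strip(String(copy(raw)))     # copy, because String() empties its input
    isempty(s) && return missing     # assumption: a blank field means missing
    return ndec == 0 ? parse(Int, s) : parse(Float64, s)
end

parse_numeric(Vector{UInt8}("       42"), 0)   # 42
parse_numeric(Vector{UInt8}("     3.50"), 2)   # 3.5
parse_numeric(Vector{UInt8}("         "), 0)   # missing
```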

@rafaqz
Collaborator Author

rafaqz commented Dec 19, 2022

According to this, I is only used by Visual FoxPro anyway:

http://www.independent-software.com/dbase-dbf-dbt-file-format.html

And Microsoft ODBC doesn't use it at all:
https://learn.microsoft.com/en-us/sql/odbc/microsoft/dbase-data-types?view=sql-server-ver16
Maybe we can just not handle I at all.

Hmm, maybe it's not so clear what ODBC uses.
The dBase 7 spec seems to be what this package was built from? But probably most files are III or V?

@visr
Collaborator

visr commented Dec 19, 2022

dc0cafb mentions dBase III+ / xBase. Most of that code is still the same as far as dBase support goes. Later I used the references under https://github.com/JuliaData/DBFTables.jl#format-description-resources, of which the .dk site mentions:

> Note that this structure is valid for Xbase - and dBASE v. III - 5. Later versions of dBASE has a different layout, like dBASE 7

So I wouldn't say this package is based on v7, but on older versions, which seem to be more commonly used with shapefiles.

@rafaqz
Collaborator Author

rafaqz commented Dec 19, 2022

OK, so it's dBase III with some types mixed in from later versions and FoxPro.

This Python package has another breakdown of the versions:
https://github.com/ethanfurman/dbf/tree/master/dbf

Maybe for correctness and simplicity we should only support dBase III?

I still don't understand the I type. It seems to be meant as a sign-magnitude long int where a zero sign bit means negative, but most packages in other languages seem to just read it as a regular two's-complement 32-bit integer. If everyone does it wrong then we're fine, right??
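
To show how far apart the two readings are, here is a rough sketch that decodes the same 4 bytes both ways. Both function names are hypothetical, and the little-endian byte order is an assumption made only for illustration:

```julia
# Assumption: the 4 field bytes are little-endian. Neither function is taken
# from an existing package.

# What most readers reportedly do: a plain two's-complement Int32.
as_twos(raw::Vector{UInt8}) = ltoh(reinterpret(Int32, raw)[1])

# A literal reading of the spec line: sign-magnitude, leftmost bit 0 = negative.
function as_signmag(raw::Vector{UInt8})
    u = ltoh(reinterpret(UInt32, raw)[1])
    mag = Int32(u & 0x7fffffff)              # lower 31 bits are the magnitude
    return (u & 0x80000000) == 0 ? -mag : mag
end

raw = UInt8[0x01, 0x00, 0x00, 0x00]
as_twos(raw)      # 1
as_signmag(raw)   # -1
```

Under these assumptions the two decoders agree only on all-zero bytes, so a file written with one convention and read with the other would give wrong values for every non-zero entry. If writers and readers both use two's complement, files still round-trip.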

@visr
Collaborator

visr commented Dec 19, 2022

> OK, so it's dBase III with some types mixed in from later versions and FoxPro.

That is basically what people call xBase, it seems. This is what Wikipedia says:

> xBase is a name applied to clones of the dBase, typically dBASE III+–V. Most xBase programs either use the format directly or uses a derived format with custom extensions.

So far my approach hasn't been to implement a spec, but to add what is needed based on real-world data.

@rafaqz
Collaborator Author

rafaqz commented Dec 19, 2022

Yes, the blog post linked above for the .NET version seems to say the same thing: the spec for early versions is unclear. It's hilarious how widely this is used in GIS given that there is no concrete spec.

2 participants