Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored the types to match the most recent Apache Arrow version #39

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mharju
Copy link

@mharju mharju commented Nov 13, 2019

Hi!

Really love what you have done!

I managed to compile this now with OS X (that is still a bit messy so I'll have to refactor that a bit before maybe posting a PR about that) and also updated the code to work with the most recent Apache Arrow library.

@cldellow
Copy link
Owner

Hey Mikko, thanks for the contribution!

Looks like the PR failed to build in Travis: https://travis-ci.org/cldellow/sqlite-parquet-vtable/builds/611367321

I think that's expected, since the makefile is still linking against 0.9.0.

I tried bumping it to 0.15.1 and building on a clean Ubuntu 16.04 c5n.2xlarge instance. But it failed: https://gist.github.com/cldellow/4634a192c1f08399975a2b730edee044 Looks like something to do with libz.a in parquet-cpp.

Before I spend some time digging into that, maybe you can save me some time :) Did you have to change something locally to get it to build?

It's possible that if you were only building on OS X you didn't hit this issue - maybe your build was using your shared libraries. The Linux build currently builds static libraries so that the versioning story is straightforward for end users.

@cldellow
Copy link
Owner

I tweaked the Makefile to use 0.15.1 (in the upgrade-deps branch -- see diff).

Now I get new build failures :) You can see them at https://gist.github.com/cldellow/ab127e57e2da4d48b2b517fa9e706296

When you have a moment, can you let me know if you have any ideas on what the issue might be?

@mharju
Copy link
Author

mharju commented Nov 14, 2019

Hmm, that seems odd – maybe we need an explicit include for parquet/types.h ..?

@mharju
Copy link
Author

mharju commented Nov 14, 2019

It's possible that if you were only building on OS X you didn't hit this issue - maybe your build was using your shared libraries. The Linux build currently builds static libraries so that the versioning story is straightforward for end users.

Yes – I'm actually building a dynamic library for OS X:s needs but I think the error you get on Ubuntu seems to be related to a missing header or something similar – not regards to linking?

@cldellow
Copy link
Owner

Arg, yes, I think you're right that's it to do with the headers. I had tweaked the includes list to update the location of some of the compression libraries, and that resulted in picking up a different parquet/types.h file.

If I fixed that, then I get:

g++  -c -o parquet_cursor.o /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc -I /home/ubuntu/sqlite-parquet-vtable/build/linux/../../sqlite -I /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src -I /home/ubuntu/sqlite-parquet-vtable/build/linux/parquet-cpp/arrow_ep-prefix/src/arrow_ep/cpp/src -I /home/ubuntu/sqlite-parquet-vtable/build/linux/parquet-cpp/src -O3 -std=c++11 -Wall -fPIC -g
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc: In member function ‘void ParquetCursor::ensureColumn(int)’:
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:780:37: error: invalid conversion from ‘long long int*’ to ‘parquet::TypedScanner<parquet::PhysicalType<(parquet::Type::type)2u> >::T* {aka long int*}’ [-fpermissive]
           s->NextValue(&rv, &wasNull);
                                     ^
In file included from /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/api/reader.h:23:0,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_table.h:6,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.h:5,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:1:
/home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/column_scanner.h:145:8: note:   initializing argument 1 of ‘bool parquet::TypedScanner<DType>::NextValue(parquet::TypedScanner<DType>::T*, bool*) [with DType = parquet::PhysicalType<(parquet::Type::type)2u>; parquet::TypedScanner<DType>::T = long int]’
   bool NextValue(T* val, bool* is_null) {
        ^
/home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:865:46: error: invalid conversion from ‘long long int*’ to ‘parquet::TypedScanner<parquet::PhysicalType<(parquet::Type::type)2u> >::T* {aka long int*}’ [-fpermissive]
         hadValue = s->NextValue(&rv, &wasNull);
                                              ^
In file included from /home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/api/reader.h:23:0,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_table.h:6,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.h:5,
                 from /home/ubuntu/sqlite-parquet-vtable/build/linux/../../parquet/parquet_cursor.cc:1:
/home/ubuntu/sqlite-parquet-vtable/build/linux/arrow/cpp/src/parquet/column_scanner.h:145:8: note:   initializing argument 1 of ‘bool parquet::TypedScanner<DType>::NextValue(parquet::TypedScanner<DType>::T*, bool*) [with DType = parquet::PhysicalType<(parquet::Type::type)2u>; parquet::TypedScanner<DType>::T = long int]’
   bool NextValue(T* val, bool* is_null) {
        ^
Makefile:68: recipe for target 'parquet_cursor.o' failed
make: *** [parquet_cursor.o] Error 1

This appears to be related to the long long changes - long term, we can do some platform specific ifdef, I guess. Although perhaps there's a third, more portable option that we should be using?

For now, I just commented that change to make progress on getting it to build on Linux. At last, it built, but when trying to load the resulting shared library to run the tests, one gets:

Error: build/linux/libparquet.so: undefined symbol: _ZN5arrow16PrimitiveBuilderINS_9Int32TypeEE4InitEl

which decodes to:

ubuntu@ip-172-31-88-73:~/sqlite-parquet-vtable$ c++filt -n _ZN5arrow16PrimitiveBuilderINS_9Int32TypeEE4InitEl
arrow::PrimitiveBuilder<arrow::Int32Type>::Init(long)

which makes me think my includes are still messed up, specifically the one providing the template definitions.

This is probably a trivial bug for most people. :) But for me, I haven't programmed professionally in C++ for a long time, so I think I'll need to set aside at least a few hours to make progress. Hopefully in a week or so I'll have time available to make progress.

cldellow pushed a commit that referenced this pull request Nov 14, 2019
See comment at #39 (comment)
in #39 for log messages.

I think I need to refactor the `-I` directives -- I think when I updated them to
fix finding compression library headers, I brought in an internal set of arrow
includes, perhaps?
@mharju
Copy link
Author

mharju commented Nov 15, 2019

This appears to be related to the long long changes - long term, we can do some platform specific ifdef, I guess. Although perhaps there's a third, more portable option that we should be using?

Yes, this sounds good. Do you want me to give it a stab or do you have time to do it yourself?

This is probably a trivial bug for most people. :) But for me, I haven't programmed professionally in C++ for a long time, so I think I'll need to set aside at least a few hours to make progress. Hopefully in a week or so I'll have time available to make progress.

I feel you – the last time I've done C++ professionally in the beginning of the millennium so a lot has changed since :D

@cldellow
Copy link
Owner

Re long long - that'd be great! It may be as simple as using int64_t?

@mharju
Copy link
Author

mharju commented Nov 22, 2019

I added long_t typedef'd to int64_t, is that an overkill?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants