
StreamReader not supporting Date values #45090

Open
poppgs opened this issue Dec 20, 2024 · 0 comments


poppgs commented Dec 20, 2024

Describe the bug, including details regarding any error messages, version, and platform.

I have been given a Parquet file with the following schema:

Name: string
Ident: string
Date: date32[day]
EventTimestamp: timestamp[us]

I dumped out the schema using a simple python script:

#! /usr/bin/env python3
import pyarrow as pa
import pyarrow.parquet as pq
import sys

schema = pq.read_schema(sys.argv[1])
print(schema)

I'm trying to figure out which C++ type corresponds to date32[day]. I've looked all over, and it seems the underlying data type is int32_t, but if I try to read the column into an int32_t, I get:


terminate called after throwing an instance of 'parquet::ParquetException'
  what():  Column converted type mismatch.  Column 'Date' has converted type 'DATE' not 'INT_32'

I've also tried reading it into a std::string, but then I get this error:

terminate called after throwing an instance of 'parquet::ParquetException'
  what():  Column physical type mismatch.  Column 'Date' has physical type 'INT32' not 'BYTE_ARRAY'
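On the type question itself: both Arrow's date32[day] and the Parquet DATE annotation store a signed 32-bit count of days since the Unix epoch (1970-01-01), so int32_t is the right storage type once the raw value is in hand. The sketch below assumes you already have that day count by some other route (it does not use StreamReader) and turns it into a calendar date with Howard Hinnant's public-domain civil_from_days algorithm. CivilDate and date32_to_civil are hypothetical helper names, not part of the Arrow or Parquet API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a proleptic Gregorian calendar date.
struct CivilDate {
    int64_t year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

// Convert a date32 value (days since 1970-01-01) to year/month/day.
// Based on Howard Hinnant's civil_from_days; 64-bit arithmetic means
// no int32_t input can overflow the intermediate values.
CivilDate date32_to_civil(int32_t days_since_epoch) {
    // Shift so that day 0 is 0000-03-01 (leap day falls at the end of the year).
    const int64_t z = static_cast<int64_t>(days_since_epoch) + 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;         // 400-year era
    const unsigned doe = static_cast<unsigned>(z - era * 146097);    // day of era [0, 146096]
    const unsigned yoe =
        (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;       // year of era [0, 399]
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);    // March-based day of year [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                         // March-based month [0, 11]

    CivilDate out;
    out.day = doy - (153 * mp + 2) / 5 + 1;
    out.month = mp < 10 ? mp + 3 : mp - 9;
    // January and February belong to the following civil year.
    out.year = static_cast<int64_t>(yoe) + era * 400 + (out.month <= 2 ? 1 : 0);
    assert(out.month >= 1 && out.month <= 12 && out.day >= 1 && out.day <= 31);
    return out;
}
```

For example, date32_to_civil(0) yields 1970-01-01 and date32_to_civil(20077) yields 2024-12-20.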

Here's my code, which is almost verbatim the example from the Parquet documentation page. Nothing I've tried lets me read the DATE column:


#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

#include "arrow/io/file.h"
#include "parquet/exception.h"
#include "parquet/stream_reader.h"

int main(int argc, char **argv)
{
  if (argc < 2)
  {
    std::cout << "Usage: " << argv[0] << " <input filename>" << std::endl;
    return 2;
  }
  const char *input_filename = argv[1];
  std::cout << "Input filename is " << input_filename << std::endl;

  // Open the file; PARQUET_ASSIGN_OR_THROW throws on a failed arrow::Result.
  std::shared_ptr<arrow::io::ReadableFile> infile;
  PARQUET_ASSIGN_OR_THROW(
      infile,
      arrow::io::ReadableFile::Open(input_filename));

  parquet::StreamReader os{parquet::ParquetFileReader::Open(infile)};

  std::string name;
  std::string ident;
  int32_t date;              // 'Date' column, date32[day]: this read is what throws
  std::string ts_event_utc;  // 'EventTimestamp' column, timestamp[us]

  // Read one row per iteration until the stream is exhausted.
  while (!os.eof())
  {
    os >> name >> ident >> date >> ts_event_utc >> parquet::EndRow;
  }
  return 0;
}

Looking at the code in stream_reader.cc, I see no reference to a DATE type, either in the converted-type exceptions at the top of the file or in the >> operators below.
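To make the two failures concrete, here is a simplified, self-contained model of the kind of check the two error messages suggest StreamReader performs before each >>: compare the column's physical type first, then its converted type, tolerating a small table of known mismatches. The enums, ColumnInfo, check_column, accepts, and the table contents are hypothetical stand-ins written for illustration; they are not the real parquet::StreamReader code.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for Parquet physical and converted types.
enum class Physical { INT32, INT64, BYTE_ARRAY };
enum class Converted { NONE, INT_32, DATE, UTF8 };

struct ColumnInfo {
    std::string name;
    Physical physical;
    Converted converted;
};

// Tolerated (type the reader expects, type the column actually has) pairs.
using ExceptionTable = std::set<std::pair<Converted, Converted>>;

// Throws std::runtime_error on a mismatch. The physical type is checked first,
// then the converted type, which is the order the two error messages imply.
void check_column(const ColumnInfo& col, Physical want_physical,
                  Converted want_converted, const ExceptionTable& allowed) {
    if (col.physical != want_physical) {
        throw std::runtime_error("physical type mismatch in column '" + col.name + "'");
    }
    if (col.converted != want_converted &&
        allowed.count(std::make_pair(want_converted, col.converted)) == 0) {
        throw std::runtime_error("converted type mismatch in column '" + col.name + "'");
    }
}

// Returns true if check_column accepts the column, false if it throws.
bool accepts(const ColumnInfo& col, Physical want_physical,
             Converted want_converted, const ExceptionTable& allowed) {
    try {
        check_column(col, want_physical, want_converted, allowed);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this model, reading the Date column as a string fails at the first check (INT32 vs BYTE_ARRAY) and reading it as int32_t fails at the second (DATE vs INT_32), matching the two messages above; inserting the pair {INT_32, DATE} into the table is what would let the int32_t read through.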

Component(s)

C++
