- [PARQUET-331] - Merge script doesn't surface stderr from failed sub processes
- [PARQUET-336] - ArrayIndexOutOfBounds in checkDeltaByteArrayProblem
- [PARQUET-337] - binary fields inside map/set/list are not handled in parquet-scrooge
- [PARQUET-338] - Readme references wrong format of pull request title
- [PARQUET-279] - Check empty struct in the CompatibilityChecker util
- [PARQUET-339] - Add Alex Levenson to KEYS file
- PARQUET-151 - NullPointerException in parquet.hadoop.ParquetFileWriter.mergeFooters
- PARQUET-152 - Encoding issue with fixed length byte arrays
- PARQUET-164 - Warn when parquet memory manager kicks in
- PARQUET-199 - Add a callback when the MemoryManager adjusts row group size
- PARQUET-201 - Column with OriginalType INT_8 failed at filtering
- PARQUET-227 - Parquet thrift can write unions that have 0 or more than 1 set value
- PARQUET-246 - ArrayIndexOutOfBoundsException with Parquet write version v2
- PARQUET-251 - Binary column statistics error when reusing byte[] among rows
- PARQUET-252 - parquet scrooge support should support nested container type
- PARQUET-254 - Wrong exception message for unsupported INT96 type
- PARQUET-269 - Restore scrooge-maven-plugin to 3.17.0 or greater
- PARQUET-284 - Should use ConcurrentHashMap instead of HashMap in ParquetMetadataConverter
- PARQUET-285 - Implement nested types write rules in parquet-avro
- PARQUET-287 - Projecting unions in thrift causes TExceptions in deserialization
- PARQUET-296 - Set master branch version back to 1.8.0-SNAPSHOT
- PARQUET-297 - created_by in file meta data doesn't contain parquet library version
- PARQUET-314 - Fix broken equals implementation(s)
- PARQUET-316 - Run.sh is broken in parquet-benchmarks
- PARQUET-317 - writeMetaDataFile crashes when a relative root Path is used
- PARQUET-320 - Restore semver checks
- PARQUET-324 - row count incorrect if data file has more than 2^31 rows
- PARQUET-325 - Do not target row group sizes if padding is set to 0
- PARQUET-329 - ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY was removed (incompatible change)
- PARQUET-175 - Allow setting of a custom protobuf class when reading parquet file using parquet-protobuf.
- PARQUET-223 - Add Map and List builders
- PARQUET-245 - Travis CI runs tests even if build fails
- PARQUET-248 - Simplify ParquetWriters's constructors
- PARQUET-253 - AvroSchemaConverter has confusing Javadoc
- PARQUET-259 - Support Travis CI in parquet-cpp
- PARQUET-264 - Update README docs for graduation
- PARQUET-266 - Add support for lists of primitives to Pig schema converter
- PARQUET-272 - Updates docs description to match data model
- PARQUET-274 - Updates URLs to link against the apache user instead of Parquet on github
- PARQUET-276 - Updates CONTRIBUTING file with new repo info
- PARQUET-286 - Avro object model should use Utf8
- PARQUET-288 - Add dictionary support to Avro converters
- PARQUET-289 - Allow object models to extend the ParquetReader builders
- PARQUET-290 - Add Avro data model to the reader builder
- PARQUET-306 - Improve alignment between row groups and HDFS blocks
- PARQUET-308 - Add accessor to ParquetWriter to get current data size
- PARQUET-309 - Remove unnecessary compile dependency on parquet-generator
- PARQUET-321 - Set the HDFS padding default to 8MB
- PARQUET-327 - Show statistics in the dump output
- PARQUET-229 - Make an alternate, stricter thrift column projection API
- PARQUET-243 - Add avro-reflect support
- PARQUET-262 - When 1.7.0 is released, restore semver plugin config
- PARQUET-292 - Release Parquet 1.8.0
- PARQUET-23 - Rename to org.apache.
- PARQUET-3 - tool to merge pull requests based on Spark
- PARQUET-4 - Use LRU caching for footers in ParquetInputFormat.
- PARQUET-8 - [parquet-scrooge] mvn eclipse:eclipse fails on parquet-scrooge
- PARQUET-9 - InternalParquetRecordReader will not read multiple blocks when filtering
- PARQUET-18 - Cannot read dictionary-encoded pages with all null values
- PARQUET-19 - NPE when an empty file is included in a Hive query that uses CombineHiveInputFormat
- PARQUET-21 - Fix reference to 'github-apache' in dev docs
- PARQUET-56 - Added an accessor for the Long column type in example Group
- PARQUET-62 - DictionaryValuesWriter dictionaries are corrupted by user changes.
- PARQUET-63 - Fixed-length columns cannot be dictionary encoded.
- PARQUET-66 - InternalParquetRecordWriter int overflow causes unnecessary memory check warning
- PARQUET-69 - Add committer doc and REVIEWERS files
- PARQUET-70 - PARQUET #36: Pig Schema Storage to UDFContext
- PARQUET-75 - String decode using 'new String' is slow
- PARQUET-80 - upgrade semver plugin version to 0.9.27
- PARQUET-82 - ColumnChunkPageWriteStore assumes pages are smaller than Integer.MAX_VALUE
- PARQUET-88 - Fix pre-version enforcement.
- PARQUET-94 - ParquetScroogeScheme constructor ignores klass argument
- PARQUET-96 - parquet.example.data.Group is missing some methods
- PARQUET-97 - ProtoParquetReader builder factory method not static
- PARQUET-101 - Exception when reading data with parquet.task.side.metadata=false
- PARQUET-104 - Parquet writes empty Rowgroup at the end of the file
- PARQUET-106 - Relax InputSplit Protections
- PARQUET-107 - Add option to disable summary metadata aggregation after MR jobs
- PARQUET-114 - Sample NanoTime class serializes and deserializes Timestamp incorrectly
- PARQUET-122 - make parquet.task.side.metadata=true by default
- PARQUET-124 - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
- PARQUET-132 - AvroParquetInputFormat should use a parameterized type
- PARQUET-135 - Input location is not getting set for the getStatistics in ParquetLoader when using two different loaders within a Pig script.
- PARQUET-136 - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
- PARQUET-142 - parquet-tools doesn't filter _SUCCESS file
- PARQUET-145 - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
- PARQUET-150 - Merge script requires ':' in PR names
- PARQUET-157 - Divide by zero in logging code
- PARQUET-159 - parquet-hadoop tests fail to compile
- PARQUET-162 - ParquetThrift should throw when unrecognized columns are passed to the column projection API
- PARQUET-168 - Wrong command line option description in parquet-tools
- PARQUET-173 - StatisticsFilter doesn't handle And properly
- PARQUET-174 - Fix Java6 compatibility
- PARQUET-176 - Parquet fails to parse schema contains '\r'
- PARQUET-180 - Parquet-thrift compile issue with 0.9.2.
- PARQUET-184 - Add release scripts and documentation
- PARQUET-186 - Poor performance in SnappyCodec because of string concat in tight loop
- PARQUET-187 - parquet-scrooge doesn't compile under 2.11
- PARQUET-188 - Parquet writes columns out of order (compared to the schema)
- PARQUET-189 - Support building parquet with thrift 0.9.0
- PARQUET-196 - parquet-tools command to get rowcount & size
- PARQUET-197 - parquet-cascading and the mapred API do not create metadata file
- PARQUET-202 - Typo in the connection info in the pom prevents publishing an RC
- PARQUET-207 - ParquetInputSplit end calculation bug
- PARQUET-208 - revert PARQUET-197
- PARQUET-214 - Avro: Regression caused by schema handling
- PARQUET-215 - Parquet Thrift should discard records with unrecognized union members
- PARQUET-216 - Decrease the default page size to 64k
- PARQUET-217 - Memory Manager's min allocation heuristic is not valid for schemas with many columns
- PARQUET-232 - minor compilation issue
- PARQUET-234 - Restore ParquetInputSplit methods from 1.5.0
- PARQUET-235 - Fix compatibility of parquet.metadata with 1.5.0
- PARQUET-236 - Check parquet-scrooge compatibility
- PARQUET-237 - Check ParquetWriter constructor compatibility with 1.5.0
- PARQUET-239 - Make AvroParquetReader#builder() static
- PARQUET-242 - AvroReadSupport.setAvroDataSupplier is broken
- PARQUET-2 - Adding Type Persuasion for Primitive Types
- PARQUET-25 - Pushdown predicates only work with hardcoded arguments
- PARQUET-52 - Improve the encoding fall back mechanism for Parquet 2.0
- PARQUET-57 - Make dev commit script easier to use
- PARQUET-61 - Avoid fixing protocol events when no required field is missing
- PARQUET-74 - Use thread local decoder cache in Binary toStringUsingUTF8()
- PARQUET-79 - Add thrift streaming API to read metadata
- PARQUET-84 - Add an option to read the rowgroup metadata on the task side.
- PARQUET-87 - Better and unified API for projection pushdown on cascading scheme
- PARQUET-89 - All Parquet CI tests should be run against hadoop-2
- PARQUET-92 - Parallel Footer Read Control
- PARQUET-105 - Refactor and Document Parquet Tools
- PARQUET-108 - Parquet Memory Management in Java
- PARQUET-115 - Pass a filter object to user defined predicate in filter2 api
- PARQUET-116 - Pass a filter object to user defined predicate in filter2 api
- PARQUET-117 - implement the new page format for Parquet 2.0
- PARQUET-119 - add data_encodings to ColumnMetaData to enable dictionary based predicate push down
- PARQUET-121 - Allow Parquet to build with Java 8
- PARQUET-128 - Optimize the parquet RecordReader implementation when: A. filter predicate is pushed down, B. filter predicate is pushed down on a flat schema
- PARQUET-133 - Upgrade snappy-java to 1.1.1.6
- PARQUET-134 - Enhance ParquetWriter with file creation flag
- PARQUET-140 - Allow clients to control the GenericData object that is used to read Avro records
- PARQUET-141 - improve parquet scrooge integration
- PARQUET-160 - Simplify CapacityByteArrayOutputStream
- PARQUET-165 - A benchmark module for Parquet would be nice
- PARQUET-177 - MemoryManager ensure minimum Column Chunk size
- PARQUET-181 - Scrooge Write Support
- PARQUET-191 - Avro schema conversion incorrectly converts maps with nullable values.
- PARQUET-192 - Avro maps drop null values
- PARQUET-193 - Avro: Implement read compatibility rules for nested types
- PARQUET-203 - Consolidate PathFilter for hidden files
- PARQUET-204 - Directory support for parquet-schema
- PARQUET-210 - JSON output for parquet-cat
- PARQUET-22 - Parquet #13: Backport of HIVE-6938
- PARQUET-49 - Create a new filter API that supports filtering groups of records based on their statistics
- PARQUET-64 - Add new logical types to parquet-column
- PARQUET-123 - Add dictionary support to AvroIndexedRecordReader
- PARQUET-198 - parquet-cascading Add Parquet Avro Scheme
- PARQUET-50 - Remove items from semver blacklist
- PARQUET-139 - Avoid reading file footers in parquet-avro InputFormat
- PARQUET-190 - Fix an inconsistent Javadoc comment of ReadSupport.prepareForRead
- PARQUET-230 - Add build instructions to the README
- ISSUE 399: Fixed resetting stats after writePage bug, unit testing of readFooter
- ISSUE 397: Fixed issue with column pruning when using requested schema
- ISSUE 389: Added padding for requested columns not found in file schema
- ISSUE 392: Value stats fixes
- ISSUE 338: Added statistics to Parquet pages and rowGroups
- ISSUE 351: Fix bug #350, fixed length argument out of order.
- ISSUE 378: configure semver to enforce semantic versioning
- ISSUE 355: Add support for DECIMAL type annotation.
- ISSUE 336: protobuf dependency version changed from 2.4.1 to 2.5.0
- ISSUE 337: issue #324, move ParquetStringInspector to org.apache.hadoop.hive.serde...
- ISSUE 381: fix metadata concurrency problem
- ISSUE 359: Expose values in SimpleRecord
- ISSUE 335: issue #290, hive map conversion to parquet schema
- ISSUE 365: generate splits by min max size, and align to HDFS block when possible
- ISSUE 353: Fix bug: optional enum field causing ScroogeSchemaConverter to fail
- ISSUE 362: Fix output bug during parquet-dump command
- ISSUE 366: do not call schema converter to generate projected schema when projection is not set
- ISSUE 367: make ParquetFileWriter throw IOException in invalid state case
- ISSUE 352: Parquet thrift storer
- ISSUE 349: fix header bug
- ISSUE 344: select * from parquet hive table containing map columns runs into exception. Issue #341.
- ISSUE 347: set reading length in ThriftBytesWriteSupport to avoid potential OOM cau...
- ISSUE 346: stop using strings and b64 for compressed input splits
- ISSUE 345: set cascading version to 2.5.3
- ISSUE 342: compress kv pairs in ParquetInputSplits
- ISSUE 333: Compress schemas in split
- ISSUE 329: fix filesystem resolution
- ISSUE 320: Spelling fix
- ISSUE 319: oauth based authentication; fix grep change
- ISSUE 310: Merge parquet tools
- ISSUE 314: Fix avro schema conv for arrays of optional type for #312.
- ISSUE 311: Avro null default values bug
- ISSUE 316: Update poms to use thrift.executable property.
- ISSUE 285: [CASCADING] Provide the sink implementation for ParquetTupleScheme
- ISSUE 264: Native Protocol Buffer support
- ISSUE 293: Int96 support
- ISSUE 313: Add hadoop Configuration to Avro and Thrift writers (#295).
- ISSUE 262: Scrooge schema converter and projection pushdown in Scrooge
- ISSUE 297: Ports HIVE-5783 to the parquet-hive module
- ISSUE 303: Avro read schema aliases
- ISSUE 299: Fill in default values for new fields in the Avro read schema
- ISSUE 298: Bugfix: reordered thrift fields causing nulls to be written
- ISSUE 289: first use current thread's classloader to load a class, if current threa...
- ISSUE 292: Added ParquetWriter() that takes an instance of Hadoop's Configuration.
- ISSUE 282: Avro default read schema
- ISSUE 280: style: junit.framework to org.junit
- ISSUE 270: Make ParquetInputSplit extend FileSplit
- ISSUE 271: fix bug: last enum index throws DecodingSchemaMismatchException
- ISSUE 268: fixes #265: add semver validation checks to non-bundle builds
- ISSUE 269: Bumps parquet-jackson parent version
- ISSUE 260: Shade jackson only once for all parquet modules
- ISSUE 267: handler only handles ignored field, exception during will be thrown as Sk...
- ISSUE 266: upgrade parquet-mr to elephant-bird 4.4
- ISSUE 258: Optimize scan
- ISSUE 259: add delta length byte arrays and delta byte arrays encodings
- ISSUE 249: make summary files read in parallel; improve memory footprint of metadata; avoid unnecessary seek
- ISSUE 257: Create parquet-hadoop-bundle which will eventually replace parquet-hive-bundle
- ISSUE 253: Delta Binary Packing for Int
- ISSUE 254: Add writer version flag to parquet and make initial changes for supported parquet 2.0 encodings
- ISSUE 256: Resolves issue #251 by doing additional checks if Hive returns "Unknown" as a version
- ISSUE 252: refactor error handler for BufferedProtocolReadToWrite to be non-static
- ISSUE 250: pretty_print_json_for_compatibility_checker
- ISSUE 243: add parquet cascading integration documentation
- ISSUE 248: More Hadoop 2 compatibility fixes
- ISSUE 247: fix bug: when field index is greater than zero
- ISSUE 244: Feature/error handler
- ISSUE 187: Plumb OriginalType
- ISSUE 245: integrate parquet format 2.0
- ISSUE 242: upgrade elephant-bird version to 4.3
- ISSUE 240: fix loader cache
- ISSUE 233: use latest stable release of cascading: 2.5.1
- ISSUE 241: Update reference to 0.10 in Hive012Binding javadoc
- ISSUE 239: Fix hive map and array inspectors with null containers
- ISSUE 234: optimize chunk scan; fix compressed size
- ISSUE 237: Handle codec not found
- ISSUE 238: fix pom version caused by bad merge
- ISSUE 235: Write pig metadata only when pig is available
- ISSUE 227: Breaks parquet-hive up into several submodules, creating infrastructure ...
- ISSUE 229: add changelog tool
- ISSUE 236: Make cascading a provided dependency
- ISSUE 228: enable globing files for parquetTupleScheme, refactor unit tests and rem...
- ISSUE 224: Changing read and write methods in ParquetInputSplit so that they can de...
- ISSUE 223: refactor encoded values changes and test that resetDictionary works
- ISSUE 222: fix bug: set raw data size to 0 after reset
- ISSUE 221: make pig, hadoop and log4j jars provided
- ISSUE 220: parquet-hive should ship an uber jar
- ISSUE 213: group parquet-format version in one property
- ISSUE 215: Fix Binary.equals().
- ISSUE 210: ParquetWriter ignores enable dictionary and validating flags.
- ISSUE 202: Fix requested schema when recreating splits in hive
- ISSUE 208: Improve dictionary fall back
- ISSUE 207: Fix offset
- ISSUE 206: Create a "Powered by" page
- ISSUE 204: ParquetLoader.inputFormatCache as WeakHashMap
- ISSUE 203: add null check for EnumWriteProtocol
- ISSUE 205: use cascading 2.2.0
- ISSUE 199: simplify TupleWriteSupport constructor
- ISSUE 164: Dictionary changes
- ISSUE 196: Fixes to the Hive SerDe
- ISSUE 197: RLE decoder reading past the end of the stream
- ISSUE 188: Added ability to define arbitrary predicate functions
- ISSUE 194: refactor serde to remove some unnecessary boxing and include dictionary awareness
- ISSUE 190: NPE in DictionaryValuesWriter.
- ISSUE 191: Add compatibility checker for ThriftStruct to check for backward compatibility of two thrift structs
- ISSUE 186: add parquet-pig-bundle
- ISSUE 184: Update ParquetReader to take Configuration as a constructor argument.
- ISSUE 183: Disable the time read counter check in DeprecatedInputFormatTest.
- ISSUE 182: Fix a maven warning about a missing version number.
- ISSUE 181: FIXED_LEN_BYTE_ARRAY support
- ISSUE 180: Support writing Avro records with maps with Utf8 keys
- ISSUE 179: Added Or/Not logical filters for column predicates
- ISSUE 172: Add sink support for parquet.cascading.ParquetTBaseScheme
- ISSUE 169: Support avro records with empty maps and arrays
- ISSUE 162: Avro schema with empty arrays and maps
- ISSUE 175: fix problem with projection pushdown in parquetloader
- ISSUE 174: improve readability by renaming variables
- ISSUE 173: make numbers in log messages easy to read in InternalParquetRecordWriter
- ISSUE 171: add unit test for parquet-scrooge
- ISSUE 165: distinguish recoverable exception in BufferedProtocolReadToWrite
- ISSUE 166: support projection when required fields in thrift class are not projected
- ISSUE 167: fix OOM error due to bad estimation
- ISSUE 154: improve thrift error message
- ISSUE 161: support schema evolution
- ISSUE 160: Resource leak in parquet.hadoop.ParquetFileReader.readFooter(Configurati...
- ISSUE 163: remove debugging code from hot path
- ISSUE 155: Manual pushdown for thrift read support
- ISSUE 159: Counter for mapred
- ISSUE 156: Fix site
- ISSUE 153: Fix projection required field
- ISSUE 150: add thrift validation on read
- ISSUE 149: changing default block size to 128MB
- ISSUE 146: Fix and add unit tests for Hive nested types
- ISSUE 145: add getStatistics method to parquetloader
- ISSUE 144: Map key fields should allow other types than strings
- ISSUE 143: Fix empty encoding col metadata
- ISSUE 142: Fix total size row group
- ISSUE 141: add parquet counters for benchmark
- ISSUE 140: Implemented partial schema for GroupReadSupport
- ISSUE 138: fix bug of wrong column metadata size
- ISSUE 137: ParquetMetadataConverter bug
- ISSUE 133: Update plugin versions for maven aether migration - fixes #125
- ISSUE 130: Schema validation should not validate the root element's name
- ISSUE 127: Adding dictionary encoding for non-string types. #99
- ISSUE 125: Unable to build
- ISSUE 124: Fix Short and Byte types in Hive SerDe.
- ISSUE 123: Fix Snappy compressor in parquet-hadoop.
- ISSUE 120: Fix RLE bug with partial literal groups at end of stream.
- ISSUE 118: Refactor column reader
- ISSUE 115: Map key fields should allow other types than strings
- ISSUE 103: Map key fields should allow other types than strings
- ISSUE 99: Dictionary encoding for non-string types (float, double, int, long, boolean)
- ISSUE 47: Add tests for parquet-scrooge and parquet-cascading
- ISSUE 126: Unit tests for parquet cascading
- ISSUE 121: fix wrong RecordConverter for ParquetTBaseScheme
- ISSUE 119: fix compatibility with thrift; remove unused dependency