diff --git a/CHANGELOG b/CHANGELOG
new file mode 100644
index 0000000000..37846bb528
--- /dev/null
+++ b/CHANGELOG
@@ -0,0 +1,65 @@
+
+GOBBLIN 0.6.0
+--------------
+
+NEW FEATURES
+
+* [Compaction] Added M/R compaction/de-duping for hourly data
+* [Compaction] Added late data handling for hourly and daily M/R compaction: https://github.com/linkedin/gobblin/wiki/Compaction#handling-late-records; added support for triggering M/R compaction if late data exceeds a threshold
+* [I/O] Added support for using Hive SerDes through HiveWritableHdfsDataWriter
+* [I/O] Added the concept of data partitioning to writers: https://github.com/linkedin/gobblin/wiki/Partitioned-Writers
+* [Runtime] Added CliLocalJobLauncher for launching single jobs from the command line.
+* [Converters] Added AvroSchemaFieldRemover that can remove specific fields from a (possibly recursive) Avro schema.
+* [DQ] Added new row-level policies RecordTimestampLowerBoundPolicy and AvroRecordTimestampLowerBoundPolicy for checking if a record timestamp is too far in the past.
+* [Kafka] Added schema registry API to KafkaAvroExtractor which enables support for various Kafka schema registry implementations (e.g. Confluent's schema registry).
+* [Build/Release] Added build instrumentation to publish artifacts to Maven Central
+
+BUG FIXES
+
+* [Retention management] Trash now correctly handles deletes of files that already exist in trash.
+* [Kafka] Fixed an issue that may cause Kafka adapter to miss data if the fork fails.
+
+OTHER IMPROVEMENTS
+
+* [Runtime] Added metrics for job executions
+* [Metrics] Added a root metric context to keep track of GC of metrics and metric contexts and make sure those are properly reported
+* [Compaction] Improve topic isolation in MRCompactor
+* [Build/release] Java version compatibility raised to Java 7.
+* [Runtime] Deprecated COMMIT_ON_PARTIAL_SUCCESS and added a new policy for successful extracts
+* [Retention management] Async trash implementation for parallel deletions.
+* [Metrics] Added tracking events emission when data gets published
+* [Retention management] Added support for parallel execution to the dataset cleaner
+* [Runtime] Update job execution info in the execution history store upon every task completion
+
+INCUBATION
+
+Note: these are new features which are under active development and may be subject to significant changes.
+
+* [gobblin-ce] Adding support for Gobblin Continuous Execution on Yarn
+* [distcp-ng] Started work on bulk transfer (file copies) using Gobblin
+* [distcp-ng] Added a light-weight Hadoop FileSystem implementation for file transfer from SFTP
+* [gobblin-config] Added API for dataset driven
+
+EXTERNAL CONTRIBUTIONS
+
+We would like to thank all our external contributors for helping improve Gobblin.
+
+* kadaan, joel.baranick:
+  - Separate publisher filesystem from writer filesystem
+  - Support for generating Idea projects with the correct language level (Java 7)
+  - Fixed yarn conf path in gobblin-yarn.sh
+* mwol (Maurice Wolter)
+  - Implemented new class AvroCombineFileSplit which stores the Avro schema for each split, determined by the corresponding input file.
+* cheleb (NOUGUIER Olivier)
+  - Add support for maven install
+* dvenkateshappa
+  - bugfix to RestApiExtractor.java
+  - Added an excluded-column list, which can be used for Salesforce configurations with a huge list of columns.
+* klyr (Julien Barbot)
+  - bugfix to gobblin-mapreduce.sh
+* gheo21
+  - Bumped kafka dependency to 2.11
+* ahollenbach (Andrew Hollenbach)
+  - configuration improvements for standalone mode
+* lbendig (Lorand Bendig)
+  - fixed a bug in DatasetState creation