From 3ab15950ad904fea84dde32bfd705c2b79a656dd Mon Sep 17 00:00:00 2001 From: Sahil Takiar Date: Thu, 12 May 2016 11:22:22 -0700 Subject: [PATCH] Release 0.7.0 --- CHANGELOG | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+) diff --git a/CHANGELOG b/CHANGELOG index 852bb0d849..f12ee88d49 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,138 @@ +GOBBLIN 0.7.0 +------------- + +Created Date: 05/11/2016 + +## NEW FEATURES + +* [Hive Registration] [PR 651] Hive registration initial commit +* [Runtime] [PR 674] Lifecycle Events for JobListeners +* [Hive Registration] [PR 684] Add inline Hive registration to Gobblin job +* [SFTP] [PR 686] Modified the SFTP extractor to also use password for connecting +* [Hive Registration] [PR 701] Reg compacted datasets in Hive +* [Retention] [PR 716] Use configClient to configure retention jobs +* [Hive Distcp] [PR 728] Hive dataset implementation for distcp. +* [Hive Distcp] [PR 749] Hivesource copyentity +* [Hive Distcp] [PR 757] Hive distcp: check target metastore to perform table syncs. +* [Hive Registration] [PR 773] Refactoring Hive registration to allow query-based approach +* [Config Management] [PR 774] Add HDFS config deployment tool +* [Avro to ORC] [PR 780] Flatten Avro Schema to make it optimal for ORC +* [Hive Distcp] [PR 801] Implemented Hive registration steps in Hive distcp. +* [Hive Registration] [PR 803] Add snapshot Hive registration policy +* [YARN] [PR 828] Add zookeeper based job lock for gobblin yarn +* [Kafka] [PR 835] Add kafka simple json source +* [Metrics] [PR 863] Metric reporters (Graphite, InfluxDB) +* [JDBC Writer] [PR 893] JDBC Writer +* [Config Management] [PR 928] Substitution of system and env variable in config management +* [Core] [PR 942] Allow disabling state store. +* [Avro to ORC] [PR 972] Avro2orc Source/Converter/Extractor/Publisher + +## BUG FIXES + +* [Distcp] [PR 645] Fix parent directory creation in distcp-ng +* [Admin Dashboard] [PR 646] Downgraded jetty version to be java 7 compatible +* [Admin Dashboard] [PR 648] Excluded old version of servlet-api artifact from Hadoop 2 dependencies +* [State Store] [PR 655] Fix hanging StateStoreCleaner +* [Publisher] [PR 657] Issue #561 - fix for BaseDataPublisher to mark WorkingState correctly +* [Core] [PR 661] Change ParallelRunner.close to wait for all futures to finish +* [Core] [PR 663] ParallelRunner catches exceptions correctly and has failure policies. +* [Build] [PR 664] Fix broken Gobblin version resolution ( fixes #662 ) +* [Build] [PR 665] Gobblin-compaction tarball doesn't contain gobblin-compaction.jar +* [Core] [PR 670] Fixing FindBugs warnings +* [Core] [PR 676] Ensure that parallel runner waits for the underlying tasks to finish +* [Core] [PR 677] Fix race condition in FsStateStore +* [Compaction] [PR 680] Fix a ConcurrentModificationException in MRCompactor +* [Admin Dashboard] [PR 681] Fixed off by one issue when listing the job executions in Admin UI +* [Config Management] [PR 682] various bug fixes when integrate test with hdfs store +* [Core] [PR 690] Add missing jar to MR runner script +* [Distcp] [PR 691] Fix permissions for directories in distcp. +* [Core] [PR 700] Add missing jars to gobblin mapreduce runner, sort. +* [Core] [PR 706] Fixing CliOptions config file fs +* [Core] [PR 722] Fixing FindBugs warnings in gobblin-compaction +* [Build] [PR 743] Fixing skipTestGroup option +* [Build] [PR 775] Fix javadoc warnings by only adding linksOffline to projects that the current project depends on. +* [Core] [PR 797] Fixing Fork + Task Retry Logic #776 +* [Distcp] [PR 884] Fix issue with replicating owner and permission of system directories in distcp +* [Data Management] [PR 887] Fix NPE in DateTimeDatasetVersionFinder +* [Data Management] [PR 888] Fix NPE in datasetversion finder +* [Core] [PR 903] The underlying Avro CodecFactory only matches lowercase codecs, so we should make sure they are lowercase before trying to find one +* [Compaction] [PR 952] Unified way to execute Hive and MR-based compaction jobs +* [Core] [PR 958] Fix parallelization of renameRecursively in PathUtils. +* [YARN] [PR 962] Cleanup the helix job when closing the GobblinHelixJobLauncher + +## IMPROVEMENTS + +* [Distcp] [PR 647] Add option to set group for distcp-ng +* [Build] [PR 650] Javadoc task should pick up system proxy settings +* [Distcp] [PR 669] Parallelized copy listing generation in distcp. +* [Data Management] [PR 671] Added ConfigurableCleanableDatasetFinder. Renamed some CleanableDatasets for clarification +* [Admin Dashboard] [PR 687] Enable AdminUI when running gobblin under yarn +* [Job Exec History] [PR 688] Added a log line when starting to write job execution history +* [Build] [PR 694] Adding throttled upload of sonatype packages +* [Metrics] [PR 698] Log which custom metric reporter class is wired up +* [Documentation] [PR 704] Remove @link tags from @see javadoc tags +* [Job History Store] [PR 705] Improve database history store performance +* [YARN] [PR 708] Fixed the file mode of the gobblin-yarn.sh script to match the other scripts. +* [Core] [PR 713] Don't send an email on shutdown when email notifications are disabled. +* [Admin Dashboard] [PR 717] More flexible Admin configuration +* [Core] [PR 727] Modified to add a configuration to skip previous run during FileBasedExtraction for full load +* [Core] [PR 733] Add ability to configure the encryption_key_loc filesystem +* [Build] [PR 737] Better travis scripts which support test error reporting +* [Core] [PR 741] Fix #740 for FsStateStore.createAlias and removing usage of FileUtil.copy +* [Core] [PR 759] Allow downloading other filetypes in FileBasedExtractor +* [Data Management] [PR 760] Per dataset retention blacklist +* [Retention] [PR 764] Ensure that jobs cleanup correctly +* [Core] [PR 766] Create GZIPFileDownloader.java +* [YARN] [PR 768] Switch LogCopier from ScheduledExecutorService to HashedWheelTimer +* [Core] [PR 772] Upgrading and re-enabling Findbugs +* [Kafka] [PR 777] Adding Parallelization to WorkUnit Creation in KafkaSource +* [Documentation] [PR 788] Initial commit for mkdocs and readthedocs integration +* [Kafka] [PR 789] Parallize late data copy +* [Config Management] [PR 794] Read current version of config store from metadata file +* [Build] [PR 799] (#798) Adding JaCoCo and Coveralls support for code coverage analysis +* [Core] [PR 808] (#792, #802) Adding ApplicationLauncher to manage app services, including GobblinMetrics lifecyle +* [Data Management] [PR 812] Make generic version, version finder, version selection policy +* [Hive Registration] [PR 815] Improve Hive registration performance +* [Core] [PR 829] Adds support to `HadoopUtils` for overwriting files +* [Build] [PR 832] (#830) excluding hive-exec from gobblin-compaction +* [YARN] [PR 834] Enable the maximum log file size for Gobblin Yarn LogCopier to be configured +* [Compaction] [PR 847] Change default value of compaction.job.avro.single.input.schema to true +* [Distcp] [PR 849] Distcp partition filter and kerberos authentication +* [Kafka] [PR 856] Clean up KafkaSource +* [Core] [PR 872] Change BoundedBlockingRecordQueue to be backed by ArrayBlockingQueue +* [Distcp] [PR 873] Implement simulate mode in distcp. +* [Distcp] [PR 877] Stream datasets to distcp. +* [Hive Distcp] [PR 878] Distcp on Hive supports predicates for fast partition skips, and supports copying full directories recursively +* [Hive Registration] [PR 885] Add locking to Hive registration +* [Distcp] [PR 886] Purge distcp persist directory at the beginning of publish phase. +* [Distcp] [PR 889] Avro schema modification in distcp is executed only for URLs in the origin schema and authority +* [Hive Distcp] [PR 890] Dynamic partition filtering for distcp Hive. +* [Hive Registration] [PR 894] Enable multiple db and table names in Hive registration +* [Core] [PR 897] Make it possible to disable publishing in job by specifying empty job data publisher +* [Core] [PR 902] Make it possible to specify empty job data publisher +* [Distcp] [PR 906] Maximum size for distcp CopyContext cache. +* [Retention] [PR 908] Add typesafe support to glob version finder for audit retention +* [Core] [PR 913] Job state stored in distributed cache in MR mode. +* [Data Management] [PR 926] Make NewestKSelectionPolicy use Java Generics instead of FileSystemDatasetVersion +* [Core] [PR 932] Separate jobstate from taskstate and datasetstate +* [Documentation] [PR 937] Add documentation for topic specific partitioning configuration +* [Hive Distcp] [PR 940] Distcp hive registration metadata +* [Hive Distcp] [PR 941] Delete empty parent directories on Hive de-registration. Optimize deregistration +* [Distcp] [PR 944] Bin pack distcp-ng work units. +* [Data Management] [PR 947] Make VersionSelectionPolicy to work with any DatasetVersion +* [Distcp] [PR 949] Parallelize renameRecursively for distcp. +* [Hive Distcp] [PR 950] Add delete methods when deregistering Hive partitions in distcp. +* [Data Management] [PR 951] Moving NonNewestKSelectionPolicy logic to NewestKSelectionPolicy +* [Hive Distcp] [PR 953] Added instrumentation to Hive copy. +* [Config Management] [PR 956] Make the default store for SimpleHDFSConfigStoreFactory configurable +* [Hive Distcp] [PR 959] Remove checksum from HiveDistcp copy listing. +* [Hive Distcp] [PR 960] Accelerate path diff in HiveCopyEntityHelper by reusing FileStatus. +* [Distcp] [PR 966] Set max work units per multiworkunit for distcp. +* [Core] [PR 970] Fixing rest of findbugs warnings, and setting findbugs to fail the build on new warnings +* [Distcp] [PR 971] Distcp ng handle directory structure copy +* [Core] [PR 974] Deprecating and removing support for Hadoop versions other than 2.x.x +* [Hive Distcp] [PR 975] Added whitelist and blacklist capabilities to HiveDatasetFinder. + GOBBLIN 0.6.2 =============