Skip to content

Commit

Permalink
Release 0.10.0 (#1820)
Browse files Browse the repository at this point in the history
  • Loading branch information
autumnust authored and ibuenros committed May 5, 2017
1 parent 13a21ad commit 67f0d12
Showing 1 changed file with 95 additions and 0 deletions.
95 changes: 95 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,98 @@
GOBBLIN 0.10.0
-------------
###Created Date:05/01/2017

## HIGHLIGHTS

* Gobblin-as-a-Service: Global orchestrator with REST API for submitting logical flow specifications. Logical flow specifications compile down to physical pipeline specs (Gobblin Jobs) that can run on one or more heterogeneous Gobblin deployments.
* Gobblin Throttling: Library and service to enforce global usage policies by various processes or applications. For example, Gobblin throttling allows limiting the aggregate QPS to a single Database of all MR applications.
* Gobblin Stream Mode: This release introduces support for running streaming ingestion pipelines that include all the standard Gobblin pipeline capabilities (converters, forks etc). Streaming sources (Kafka) and sinks (Kafka, Couchbase, Eventhub) are included.
* Gobblin compliance: Including functionality for purging datasets, Gobblin Compliance module allows for data purging to meet regulatory compliance requirements. (https://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-Compliance/)
* New Writers: Couchbase (PR 1433), EventHub (PR 1537).
* New Sources: Azure Data Lake (PR 1764)

## NEW FEATURES

* [Source] [PR 1764]Added Azure Data Lake source.
* [Source] [PR 1762]Added Salesforce daily-based dynamic partitioning.
* [Source] [PR 1742]Enabled `QueryBasedExtractor` to retry from first iterator.
* [Core] [PR 1772]Supported shorter dataset state store name to handle overlong dataURN.
* [Core] [PR 1678]Introduced `GlobalMetadata` into data pipelines and updated corresponding Gobblin components.
* [Core] [PR 1709]Introduced custom Task interface and execution to Gobblin.
* [Core] [PR 1727]Introduced `MRTask` inherited from Task interface that runs an MR job.
* [Core] [PR 1457]Added token-based extractor.
* [Core] [PR 1463]MySQL Database as state store.
* [Core] [PR 1524]Zookeeper and HelixPropertyStore as state store.
* [Core] [PR 1662]Added compression and encryption support to `SimpleDataWriter`.
* [Cluster] [PR 1524]Scripts to launch Gobblin in standalone cluster mode.
* [Encryption] [PR 1616]Added encryption support by introducing a StreamCodec objects that encode/decode bytestreams flowing through it.
* [Encryption] [PR 1690]Added `gobblin-crypto` module containing encryption-related interfaces for gobblin.
* [Extractor] [PR 1518]Implemented Streaming extractor for stream source.
* [Distcp] [PR 1735]Enabled updating existing hive table for distcp, instead of deleting originally existed one.
* [Hive-Registration] [PR 1722]Added runtime table properties into Hive Registration.
* [Writer] [PR 1537]Implemented Eventhub synchronized data writer.
* [Writer] [PR 1819]Implemented asynchronized HTTP Writer.


## IMPROVEMENTS

* [Build] [PR 1817]Light distribution package building.
* [Cluster] [PR 1599]Supported multiple Helix controllers for Gobblin standalone cluster manager for high availability.
* [Cluster] [PR 1613]Support Helix 0.6.7.
* [Cluster] [PR 1592]Added `ScheduledJobConfigurationManager` in gobblin-cluster to periodically consume from Kafka for new JobSpecs
* [Compaction] [PR 1760]Implemented general Gobblin-built-in compaction using customized gobblin task.
* [Converters] [PR 1780]Support `.gzip` extension for UnGzipConverter.
* [Converters] [PR 1701]Set streamcodec in encrypting converter explicitly.
* [Converter] [PR 1612]Implemented converter that samples records based on configured sampling ratio.
* [Core] [PR 1739]Reduced memory usage when loading by adding `commonProps` to `FsStateStore`.
* [Core] [PR 1741]Removed fork branch index, task ID and job ID from task metrics.
* [Core] [PR 1649]Enabled events emission when `LimiterExtractorDecorator` failed to retrieve the record.
* [Core] [PR 1702]Implemented writer-side partitioner based on incoming set of records' `WorkUnitState`.
* [Core] [PR 1518] [PR 1596]Enhanced Watermark components for streaming.
* [Core] [PR 1534]Implemented converter to convert `.pull` files into `.conf` file using the corresponding template.
* [Core] [PR 1505]Enabled creation and access `WorkUnits` and `TaskStates` through `StateStore` interface.
* [Copy-Replication] [PR 1728]Added logic of `AbortOnSingleDatasetFailure` in distcp.
* [Metric] [PR 1782]Added Pinot-based completeness check verifier.
* [Publisher] [PR 1702]Enable collecting partition information and publish metadata files in each partition directory by default setting.
* [Source] [PR 1666] [PR 1733]Implemented source-side partitioner for `QueryBasedSource`, allowing user-specified partitions.
* [Runtime] [PR 1552]Optimized tasks execution in single branch by removing unnecessary data structure used in fork.
* [Runtime] [PR 1791]Support state persistence for partial commit.
* [Writer] [PR 1265]Replace DatePartitionedDailyAvroSource with configurable partitioning.


## BUGS FIXES

* [Core] [PR 1724]Fixed hanging embedded Gobblin when initialization fails.
* [Core] [PR 1736]Fixes of contention on shared object `SimpleDateFormat` among all pull jobs start simultaneously in multi-threads context.
* [Core] [PR 1665]Fixed threadpool leak in HttpWriter.
* [Hive-Registration] [PR 1635]Fix NullPointerException when Deserializer is not properly initialized.
* [Metastore] [PR 986]Fixed `gobblin.metastore.DatabaseJobHistoryStore`'s vulnerability regarding to SQL injection.
* [Runtime] [PR 1801]Fixed `JobScheduler` failed when "jobconf.fullyQualifiedPath" is not set.
* [Runtime] [PR 1624]Fix speculative run for `SimpleDataWriter`.
* [Source] [PR 1756]Enabled `UncheckedExecutionException` catching in HiveSource.


## EXTERNAL CONTRIBUTIONS

* enjoyear
- Fixed multi-threading bug in TimestampWatermark.(PR 1736)
- Maintained and fixed google-related source issues. (PR 1771, PR 1765, PR 1742, PR 1628)
* kadaan
- Fixed `JobScheduler` failed when "jobconf.fullyQualifiedPath" is not set. (PR 1801)
- Optimized tasks execution in single branch by removing unnecessary data structure used in fork. (PR 1552)
* erwa
- Revert Hive version to 1.0.1, add AvroSerDe handling in HiveMetaStoreUtils.getDeserializer. (PR 1643)
- Fix NullPointerException when Deserializer is not properly initialized. (PR 1635)
* howu
- Refactor RestApiConnector and RestApiExtractor. (PR 1708)
- Update constructor of FlowConfigClient and FlowStatusClient. (PR 1734)
* jinhyukchang
- Added support for Azure Data Lake(ADL) as a source (PR 1764)
- Added abortOnSingleDatasetFailure to CopyConfiguration. (PR 1728)
* wosiu
- Fix speculative run for SimpleDataWriter. (PR 1624)


GOBBLIN 0.9.0
-------------

Expand Down

0 comments on commit 67f0d12

Please sign in to comment.