Change Log

All notable changes to Fili will be documented here. Changes are accumulated as new paragraphs at the top of the current major version. Each change has a link to the pull request that makes the change and to the issue that triggered the pull request if there was one.

Added:

Changed:

Deprecated:

Fixed:

Removed:

  • Removed deprecated code references

    • Renamed keys from BardLoggingFilter properties off deprecated reference class (this was an artifact from a bad rename)
  • Removed older deprecated code

    • Removed constructors and getters with clean replacements
    • Stripped the remaining UI/NonUI code
    • Cleaned up old schema classes and methods
    • Removed orphaned metadata response data factory
    • Removed pre-theta sketch code
    • Removed deprecated min/max aggregations
    • Removed loader code for metrics that don't include dimension dictionary
    • Removed KeyValueStoreDimension

Known Issues:

Current

v0.9.137 - 2018/04/13

0.9 Highlights

Fili Security Added!

Released a security module for Fili data security filters. Created ChainingRequestMapper, and a set of mappers for gatekeeping on security roles and whitelisting dimension filters.

Added by @michael-mclawhorn in yahoo#405

DataApiRequestFactory layer

Downstream projects now have more flexibility to construct DataApiRequest by using injectableFactory. An additional constructor for DataApiRequestImpl unpacks the config resources bundle to make it easier to override dictionaries.

Added by @michael-mclawhorn in yahoo#603

Make Field Accessor PostAggregation able to reference post aggregations in addition to aggregations

Druid allows post aggregation trees to reference columns that are themselves post aggregation trees, but it does not protect against ordering problems. A field accessor can now reference another query expression, which makes it possible to send such a query. Using this capability carries some risk.

Added by @michael-mclawhorn in yahoo#543

Etag Cache

Druid versions released after February 23rd, 2017 support HTTP ETags. When a Druid query includes an If-None-Match header, Druid computes a hash as the etag, such that each unique response has a corresponding unique etag, and includes the etag in the response header. If the If-None-Match header carries the etag of an earlier response to the query, Druid checks whether that etag still matches the response. If it does, Druid returns an HTTP Status 304 Not Modified response to indicate that the response is unchanged and matches the etag in the request header. Otherwise, Druid executes the query and responds normally, with a new etag attached to the response header.

This new feature was designed by @garyluoex. For more info, see @garyluoex's design at yahoo#255
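The caller's side of this exchange can be sketched roughly as follows. This is a minimal illustration and not Fili's actual implementation: the class `EtagCacheSketch`, its nested `DruidResponse` type, and the method names are all hypothetical, and the HTTP call itself is left out so the block stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical etag-validated cache; these are not Fili's actual classes. */
final class EtagCacheSketch {

    /** Minimal stand-in for a Druid HTTP response (status, ETag header, body). */
    static final class DruidResponse {
        final int status;
        final String etag;
        final String body;

        DruidResponse(int status, String etag, String body) {
            this.status = status;
            this.etag = etag;
            this.body = body;
        }
    }

    // Maps a query key to the last full (non-304) response seen for that query.
    private final Map<String, DruidResponse> cache = new HashMap<>();

    /** Value to send as If-None-Match, if this query has a cached response. */
    Optional<String> ifNoneMatch(String queryKey) {
        return Optional.ofNullable(cache.get(queryKey)).map(cached -> cached.etag);
    }

    /** Returns the body to serve, reusing the cached body when Druid answers 304. */
    String resolve(String queryKey, DruidResponse response) {
        if (response.status == 304) {
            DruidResponse cached = cache.get(queryKey);
            if (cached == null) {
                throw new IllegalStateException("304 received without a cached response");
            }
            return cached.body;
        }
        // Any other response carries a fresh body; remember it if it has an etag.
        if (response.etag != null) {
            cache.put(queryKey, response);
        }
        return response.body;
    }
}
```

A caller would send the value from `ifNoneMatch` with the query, then pass whatever Druid returns to `resolve`.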

More robust Lucene Search Provider and Key Value Store

Lucene Search Provider can now re-open in a bug-free way and close more cleanly.

Added by @garyluoex in yahoo#551 and yahoo#521

Extraction Function on selector filter

Fili now accommodates the deprecation of ExtractionFilter in Druid by using a selector filter with an extraction function instead. An extraction function was added to dimensional filters; it defaults to the extraction function on the dimension, if one exists.

Added by @garyluoex in yahoo#617

More controllable RequestLog

Exposes the LogInfo objects stored in the RequestLog via RequestLog::retrieveAll, making it easier for customers to implement their own scheme for logging the RequestLog.

Added by @archolewa in yahoo#574

Druid lookup metadata load status check

Fili now supports checking Druid lookup status as one of its health checks, making it easy to identify any failed lookups.

Added by @QubitPi in yahoo#620

Add ability to use custom rate limiting schemes

While backward compatibility is guaranteed, Fili now allows users to rate limit (with a new rate limiter) based on criteria other than the default.

Added by @efronbs in yahoo#591

Support Time Format Extraction Function in Fili

Druid's TimeFormatExtractionFunction has been added to Fili, so API users can use it when interacting with Druid through Fili.

Added by @QubitPi in yahoo#611

Dimension load strategy indicator

To let clients know whether a dimension's values are browsable and searchable, a storage strategy metadata field has been added to dimensions. A browsable and searchable dimension is denoted by LOADED, and one that is not by NONE. This is useful for UIs backed by Fili when they send dimension-related queries.

Added by @michael-mclawhorn, @garyluoex and @QubitPi in yahoo#575, yahoo#589, yahoo#558, yahoo#578

Query Split Logging

Include metrics in logging to allow for better evaluation of the impact of caching for split queries. Previously there was only a binary flag (BardQueryInfo.cached), which was inconsistently set for split queries. Three new metrics have been added:

  1. Number of split queries satisfied by cache
  2. Number of split queries actually sent to the fact store (not satisfied by cache)
  3. Number of weight-checked queries

Added by @QubitPi in yahoo#537
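As a rough illustration of how the three counts could be used to evaluate caching, the sketch below tracks them and derives a cache hit ratio for split queries. The class `SplitQueryStatsSketch` and its method names are hypothetical and are not part of Fili's BardQueryInfo.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical counters mirroring the three logged metrics; not Fili's API. */
final class SplitQueryStatsSketch {
    private final AtomicInteger cacheHits = new AtomicInteger();
    private final AtomicInteger factQueries = new AtomicInteger();
    private final AtomicInteger weightChecks = new AtomicInteger();

    /** A split query was satisfied by the cache. */
    void recordCacheHit() {
        cacheHits.incrementAndGet();
    }

    /** A split query was actually sent to the fact store. */
    void recordFactQuery() {
        factQueries.incrementAndGet();
    }

    /** A weight-check query was issued. */
    void recordWeightCheck() {
        weightChecks.incrementAndGet();
    }

    int weightCheckCount() {
        return weightChecks.get();
    }

    /** Fraction of split queries served from cache, or 0.0 if none were recorded. */
    double cacheHitRatio() {
        int hits = cacheHits.get();
        int total = hits + factQueries.get();
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Unlike a single boolean flag, separate counts make a partially cached request visible: three cached splits and one fact-store split give a ratio of 0.75.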

Configurable Metric Long Name

Logical metrics are now more configurable: in addition to the metric name, the metric long name, description, etc. can be configured. A MetricInstance is now created from a LogicalMetricInfo, which contains all of these fields in addition to the metric name.

Added by @QubitPi in yahoo#492

Search provider can hot-swap index and key value store can hot-swap store location

LuceneSearchProvider can hot-swap its index by moving the old index directory to a different location, moving the new index into a new directory with the old name, and deleting the old index directory from the file system. KeyValueStore now also supports hot-swapping the key value store location.

Added by @QubitPi in yahoo#522
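The directory shuffle described for the index can be sketched with plain `java.nio.file` calls. This is only an illustration under stated assumptions, not the LuceneSearchProvider code: the class `IndexHotSwapSketch`, the `hotSwap` method, and the `.retired` suffix are hypothetical, both directories are assumed to live on the same file system, and closing and re-opening Lucene readers is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical directory swap; not the actual LuceneSearchProvider logic. */
final class IndexHotSwapSketch {

    /** Replaces the directory at liveIndex with the contents of newIndex. */
    static void hotSwap(Path liveIndex, Path newIndex) throws IOException {
        // 1. Move the old index directory to a different location.
        Path retired = liveIndex.resolveSibling(liveIndex.getFileName() + ".retired");
        Files.move(liveIndex, retired);

        // 2. Move the new index into a directory with the old name.
        //    Moving a non-empty directory only works within one file system.
        Files.move(newIndex, liveIndex);

        // 3. Delete the old index directory, children before parents.
        try (Stream<Path> paths = Files.walk(retired)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Moving the old directory aside before deleting it keeps the window in which the index path is missing short, since a rename within one file system is typically much cheaper than a recursive delete.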

Uptime Status Metric

A metric showing how long Fili has been running is available.

Added by @mpardesh in yahoo#518

Consolidate UI & Non-UI broker configurations

ui_druid_broker and non_ui_druid_broker are no longer used separately. Instead, a single druid_broker replaces the two. For backwards compatibility, Fili checks whether druid_broker is set. If it is not, Fili falls back to non_ui_druid_broker and then to ui_druid_broker.

Added by @mpardesh in yahoo#489
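The fallback order can be illustrated with a small lookup over a plain map. This is a sketch only: the class `BrokerConfigSketch` and the method `resolveBroker` are hypothetical, and the keys are the short names used in this entry, which may differ from the full property names in a real configuration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical broker lookup; not Fili's actual configuration code. */
final class BrokerConfigSketch {

    /** Returns the first configured broker URL, preferring druid_broker. */
    static Optional<String> resolveBroker(Map<String, String> config) {
        return Stream.of("druid_broker", "non_ui_druid_broker", "ui_druid_broker")
                .map(config::get)
                // Treat missing and empty values as "not set".
                .filter(value -> value != null && !value.isEmpty())
                .findFirst();
    }
}
```

With only ui_druid_broker set, the lookup returns that value; once druid_broker is set, it takes precedence over both legacy keys.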

Credits

Thanks to everyone who contributed to this release!

  • @michael-mclawhorn Michael Mclawhorn
  • @garyluoex Gary Luo
  • @archolewa Andrew Cholewa
  • @QubitPi Jiaqi Liu
  • @asifmansoora Asif Mansoor Amanullah
  • @efronbs Ben Efron
  • @deepakb91 Deepak Babu
  • @tarrantzhang Tarrant Zhang
  • @kevinhinterlong Kevin Hinterlong
  • @mpardesh Monica Pardeshi
  • @colemanProjects Neelan Coleman
  • @onlinecco
  • @dejan2609 Dejan Stojadinović

Added:

Changed:

Deprecated:

Fixed:

Known Issues:

Removed:

v0.8.69 - 2017/06/06

The main changes in this version are changes to the Table and Schema structure, including a major refactoring of Physical Table. The concept of Availability was split off from Physical Table, allowing Fili to better reason about availability of columns in Data Sources in ways that it couldn't easily do before, like in the case of Unions. As part of this refactor, Fili also gains 1st-class support for queries using the Union data source.

Full description of changes to Tables, Schemas, Physical Tables, Availability, PartialDataHandler, etc. tbd

This was a long and winding journey this cycle, so the changelog is not nearly as tight as we'd like (hopefully we'll come back and consolidate it for this release), but all of the changes are in there. Along the way, we also addressed a number of other small concerns. Here are some of the highlights beyond the main changes around Physical Tables:

Fixes:

  • Unicode characters are now properly sent back to Druid
  • Druid client now follows redirects

New Capabilities & Enhancements:

  • Can sort on dateTime
  • Can use Druid query response for final verification of response partiality
  • Class Scanner Spec can discover dependencies, making its dynamic equality testing easier to use
  • There's an example application that shows how to slurp configuration from an existing Druid instance
  • Druid queries return a Future instead of void, allowing for blocking requests if needed (though use sparingly!)
  • Support for extensions defining new Druid query types

Performance upgrades:

  • Lazy DruidFilters
  • Assorted log level reductions
  • Lucene "total results" 50% speedup

Deprecations:

  • DataSource::getDataSources no longer makes sense, since UnionDataSource only supports 1 table now
  • BaseTableLoader::loadPhysicalTable. Use loadPhysicalTablesWithDependency instead
  • LogicalMetricColumn isn't really a needed concept

Removals:

  • PartialDataHandler::findMissingRequestTimeGrainIntervals
  • permissive_column_availability_enabled feature flag, since the new Availability infrastructure now handles this
  • Lots of things on PhysicalTable, since that system was majorly overhauled
  • SegmentMetadataLoader, which had been deprecated for a while and relies on no longer supported Druid features

Added:

Changed:

Deprecated:

Fixed:

Removed:

  • Refactor Physical Table Definition and Update Table Loader

    • Removed deprecated PhysicalTableDefinition constructor that takes a ZonelessTimeGrain. Use ZonedTimeGrain instead
    • Removed BaseTableLoader::buildPhysicalTable. Table building logic has been moved to PhysicalTableDefinition
  • Move UnionDataSource to support only single tables

    • DataSource no longer accepts Set<Table> in a constructor
  • CompositePhysicalTable Core Components Refactor

    • Removed deprecated method PartialDataHandler::findMissingRequestTimeGrainIntervals
    • Removed permissive_column_availability_enabled feature flag support and corresponding functionality in PartialDataHandler. Permissive availability is instead handled via table configuration, and continued usage of the configuration field generates a warning when Fili starts.
    • Removed getIntersectSubintervalsForColumns and getUnionSubintervalsForColumns from PartialDataHandler. Availability now handles these responsibilities.
    • Removed getIntervalsByColumnName, resetColumns and hasLogicalMapping methods in PhysicalTable. These methods were either part of the availability infrastructure, which changed completely, or the responsibilities have moved to PhysicalTableSchema (in the case of hasLogicalMapping).
    • Removed PartialDataHandler::getAvailability. Availability (on the PhysicalTables) has taken its place.
    • Removed SegmentMetadataLoader because the endpoint this relied on had been deprecated in Druid. Use the DataSourceMetadataLoader instead.
      • Removed SegmentMetadataLoaderHealthCheck as well.
  • Major refactor for availability and schemas and tables

    • Removed ZonedSchema (all methods moved to child class ResultSetSchema)
    • PhysicalTable no longer supports mutable availability
      • Removed addColumn, removeColumn, getWorkingIntervals, and commit
      • Other mutators no longer exist, availability is immutable
      • Removed getAvailableIntervals. Availability::getAvailableIntervals replaces it.
    • Removed DruidResponseParser::buildSchema. That logic has moved to the ResultSetSchema constructor.
    • Removed redundant buildLogicalTable methods from BaseTableLoader

v0.7.37 - 2017/04/04

This patch is to back-port a fix for getting Druid to handle international / UTF character sets correctly. It is included in the v0.8.x stable releases.

Fixed:

v0.7.36 - 2017/01/30

This release is a mix of fixes, upgrades, and interface clean-up. The general themes for the changes are around metric configuration, logging and timing, and adding support for tagging dimension fields. Here are some of the highlights, but take a look in the lower sections for more details.

Fixes:

  • Deadlock in LuceneSearchProvider
  • CORS support when using the RoleBasedAuthFilter

New Capabilities & Enhancements:

  • Dimension field tagging
  • Controls around max size of Druid response to cache
  • Logging and timing enhancements

Deprecations / Removals:

  • RequestLog::switchTiming is deprecated because it is difficult to use correctly
  • Metric configuration has a number of deprecations as part of the effort to make configuration easier and less complex

Changes:

  • There was a major overhaul of Fili's dependencies to upgrade their versions

Added:

Changed:

Deprecated:

Fixed:

Removed:

v0.6.29 - 2016/11/16

This release is focused on general stability, with a number of bugs fixed, and also adds a few small new capabilities and enhancements. Here are some of the highlights, but take a look in the lower sections for more details.

Fixes:

  • Dimension keys are now properly case-sensitive
    • Because this is a breaking change, the fix has been wrapped in a feature flag. For now, this defaults to the existing broken behavior, but this will change in a future version, and eventually the fix will be permanent.
  • all-grain queries are no longer split
  • Closed a race condition in the LuceneSearchProvider where readers would get an error if an update was in progress
  • Correctly interpreting List-type configs from the Environment tier as a true List
  • Stopped recording synchronous requests in the ApiJobStore, which is only intended to hold async requests

New Capabilities & Enhancements:

  • Customizable logging format
  • X-Request-Id header support, letting clients set a request ID that will be included in the Druid query
  • Support for Druid's In filter
  • Native support for building DimensionRows from AVRO files
  • Ability to set headers on Druid requests, letting Fili talk to a secure Druid
  • Better error messaging when things go wrong
  • Better ability to use custom Druid query types

Added:

Changed:

Deprecated:

Fixed:

v0.1.x - 2016/09/23

This release focuses on stabilization, especially of the Query Time Lookup (QTL) capabilities, and the Async API and Jobs resource. Here are the highlights of what's in this release:

  • A bugfix for the DruidDimensionLoader
  • A new default DimensionLoader
  • A bunch more tests and test upgrades
  • Filtering and pagination on the Jobs resource
  • A userId field for default Job resource representations
  • Package cleanup for the jobs-related classes

Added:

Deprecated:

Changed:

  • Removed physicalName lookup for metrics in TableUtils::getColumnNames to remove spurious warnings
    • Metrics are not mapped like dimensions are. Dimensions are aliased per physical table and metrics are aliased per logical table.
    • A logical metric is mapped to one or many physical metrics, so using the same lookup logic for dimensions and metrics doesn't make sense.

Jobs:

  • HashPreResponseStore moved to test root directory.

    • The HashPreResponseStore is really intended only for testing, and does not have capabilities (i.e. TTL) that are needed for production.
  • The TestBinderFactory now uses the TestAsynchronousWorkflowsBuilder

    • This allows the asynchronous functional tests to add countdown latches to the workflows where necessary, allowing for thread-safe tests.
  • Removed JobsApiRequest::handleBroadcastChannelNotification

    • That logic does not really belong in the JobsApiRequest (which is responsible for modeling a response, not processing it), and has been consolidated into the JobsServlet.
  • ISSUE-17 Added pagination parameters to PreResponse

    • Updated JobsServlet::handlePreResponseWithError to update ResultSet object with pagination parameters
  • Enrich jobs endpoint with filtering functionality

    • The default job payload generated by DefaultJobPayloadBuilder now has a userId
  • Removed timing component in JobsApiRequestSpec

    • Rather than setting an async timeout and then sleeping, the JobsApiRequestSpec test "handleBroadcastChannelNotification returns an empty Observable if a timeout occurs before the notification is received" now verifies that the returned Observable terminates without sending any messages.
  • Reorganizes asynchronous package structure

    • The jobs package is renamed to async and split into the following subpackages:
      • broadcastchannels - Everything dealing with broadcast channels
      • jobs - Everything related to jobs, broken into subpackages
        • jobrows - Everything related to the content of the job metadata
        • payloads - Everything related to building the version of the job metadata to send to the user
        • stores - Everything related to the databases for job data
      • preresponses - Everything related to PreResponses, broken into subpackages
        • stores - Everything related to the databases for PreResponse data
      • workflows - Everything related to the asynchronous workflow

Query Time Lookup (QTL)

  • QueryTimeLookup Functionality Testing

    • AbstractBinderFactory now uses TypeAwareDimensionLoader instead of KeyValueStoreDimensionLoader
  • Fix Dimension Serialization Problem with Nested Queries

    • Modified DimensionToDefaultDimensionSpec serializer to serialize Dimension to apiName if it's not in the innermost query
    • Added Util::hasInnerQuery helper in serializer package to determine whether a query is the innermost query
    • Added tests for DimensionToDefaultDimensionSpec

General:

Fixed: