-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-2167] Allow filtering of Hive datasets by underlying HDFS folder location #4069
Changes from 6 commits
9897f4f
f556f27
e1ed1a6
d4e60be
e736bab
da1902d
1906286
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -215,6 +215,32 @@ public void testDatasetConfig() throws Exception { | |
|
||
} | ||
|
||
@Test | ||
public void testHiveTableFolderFilter() throws Exception { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can be renamed to |
||
List<HiveDatasetFinder.DbAndTable> dbAndTables = Lists.newArrayList(); | ||
dbAndTables.add(new HiveDatasetFinder.DbAndTable("db1", "table1")); | ||
HiveMetastoreClientPool pool = getTestPool(dbAndTables); | ||
|
||
Properties properties = new Properties(); | ||
properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST, ""); | ||
// Try a regex with multiple groups | ||
properties.put(HiveDatasetFinder.TABLE_FOLDER_ALLOWLIST_FILTER, "(/tmp/|a).*"); | ||
|
||
HiveDatasetFinder finder = new TestHiveDatasetFinder(FileSystem.getLocal(new Configuration()), properties, pool); | ||
List<HiveDataset> datasets = Lists.newArrayList(finder.getDatasetsIterator()); | ||
|
||
Assert.assertEquals(datasets.size(), 1); | ||
|
||
properties.put(HiveDatasetFinder.HIVE_DATASET_PREFIX + "." + WhitelistBlacklist.WHITELIST, ""); | ||
// The table located at /tmp/test should be filtered | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it would be helpful to call out that the dataset table is created at path |
||
properties.put(HiveDatasetFinder.TABLE_FOLDER_ALLOWLIST_FILTER, "/a/b"); | ||
|
||
finder = new TestHiveDatasetFinder(FileSystem.getLocal(new Configuration()), properties, pool); | ||
datasets = Lists.newArrayList(finder.getDatasetsIterator()); | ||
|
||
Assert.assertEquals(datasets.size(), 0); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can also add a test for the case where the regex is empty or null |
||
} | ||
|
||
private HiveMetastoreClientPool getTestPool(List<HiveDatasetFinder.DbAndTable> dbAndTables) throws Exception { | ||
|
||
SetMultimap<String, String> entities = HashMultimap.create(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pattern.compile(regex.get()) is called every time the method shouldAllowTableLocation is executed when the regex is present. Compiling a regex every time can be inefficient, we can compile the pattern once and store it, eg.:
private static Pattern compiledAllowlistPattern = regex.map(Pattern::compile).orElse(null);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compiled it as part of the class field instead, good callout