Feature/one splits file to rule them all #1409
…om/NationalSecurityAgency/datawave into feature/OneSplitsFileToRuleThemAll
Review thread (resolved): warehouse/ingest-core/src/main/java/datawave/ingest/config/BaseHdfsFileCacheUtil.java
Review thread (resolved, outdated): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
Conflicts:
    warehouse/ingest-core/src/main/java/datawave/ingest/config/BaseHdfsFileCacheUtil.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/ShardedTableMapFile.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/MultiTableRangePartitioner.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/job/ShardedTableMapFileTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/MultiTableRRRangePartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/MultiTableRangePartitionerTest.java
Review thread (resolved): warehouse/ingest-core/src/main/java/datawave/ingest/config/BaseHdfsFileCacheUtil.java
Review thread (resolved): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/SplitsFile.java
Review thread (resolved, outdated): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
Review thread (resolved, outdated): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
Review thread (resolved, outdated): ...ngest-core/src/main/java/datawave/ingest/mapreduce/partition/MultiTableRangePartitioner.java
Conflicts:
    warehouse/ingest-core/src/main/java/datawave/ingest/config/BaseHdfsFileCacheUtil.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/IngestJob.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/MultiRFileOutputFormatter.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/NonShardedSplitsFile.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/ShardedTableMapFile.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/BalancedShardPartitioner.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/MultiTableRRRangePartitioner.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/MultiTableRangePartitioner.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/SplitBasedHashPartitioner.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/TabletLocationHashPartitioner.java
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/partition/TabletLocationNamePartitioner.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/job/MultiRFileOutputFormatterTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/job/ShardedTableMapFileTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/job/TableSplitsCacheTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/BalancedShardPartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/MultiTableRRRangePartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/MultiTableRangePartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/SplitBasedHashPartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/TabletLocationHashPartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/TabletLocationNamePartitionerTest.java
    warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/TestShardGenerator.java
Conflicts:
    warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/ShardedTableMapFile.java
Add splits file to distributed cache, as NSPF used to.
…formance in testing
* Dedup location string objects
* Avoid need for LinkedHashMap
* Ensure maps and lists are unmodifiable
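The three changes in this commit can be illustrated with a small Java sketch. It is hypothetical, not the PR's code, and the class and method names are invented. It assumes split points and tablet locations arrive as parallel lists, routes each location through a HashMap so that equal location strings share one instance, keeps entries in split order with a TreeMap (one assumed way to avoid depending on LinkedHashMap insertion order), and hands callers an unmodifiable view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the PR's code: builds a split -> location map in which
// equal location strings share one instance and callers get a read-only view.
final class LocationDedupSketch {

    private LocationDedupSketch() {}

    static Map<String, String> buildSplitLocations(List<String> splits, List<String> locations) {
        if (splits.size() != locations.size()) {
            throw new IllegalArgumentException("splits and locations must have the same size");
        }
        // One canonical String per distinct location; many splits usually share a tablet server.
        Map<String, String> canonical = new HashMap<>();
        // Assumption: a TreeMap keeps entries in split order without relying on
        // a LinkedHashMap's insertion order.
        Map<String, String> bySplit = new TreeMap<>();
        for (int i = 0; i < splits.size(); i++) {
            // Returns the previously stored instance if an equal location was seen before.
            String location = canonical.computeIfAbsent(locations.get(i), l -> l);
            bySplit.put(splits.get(i), location);
        }
        // Callers cannot mutate the shared structure.
        return Collections.unmodifiableMap(bySplit);
    }
}
```

With two distinct-but-equal location strings for splits "a" and "b", the returned map holds the same String object for both, so the duplicate can be garbage-collected.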
…all as possible.
Shown to be over 50% smaller than the equivalent hash map.
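The commit above reports a structure over 50% smaller than the equivalent hash map, but the message does not say how it was built. One common way to get that kind of saving, shown here as an assumption rather than as what the PR did, is to replace the map with two parallel sorted arrays and look keys up by binary search. The class name CompactSplitLocations is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: a read-only split -> location lookup backed by two
// parallel arrays instead of a HashMap. Assumes the caller supplies sorted splits.
final class CompactSplitLocations {

    private final String[] splits;    // strictly increasing split points
    private final String[] locations; // locations[i] hosts splits[i]

    CompactSplitLocations(String[] sortedSplits, String[] locations) {
        if (sortedSplits.length != locations.length) {
            throw new IllegalArgumentException("splits and locations must have the same length");
        }
        for (int i = 1; i < sortedSplits.length; i++) {
            if (sortedSplits[i - 1].compareTo(sortedSplits[i]) >= 0) {
                throw new IllegalArgumentException("splits must be strictly increasing");
            }
        }
        // Defensive copies keep the structure immutable after construction.
        this.splits = sortedSplits.clone();
        this.locations = locations.clone();
    }

    // Returns the location for an exact split, or null if absent; O(log n).
    String locationOf(String split) {
        int idx = Arrays.binarySearch(splits, split);
        return idx >= 0 ? locations[idx] : null;
    }

    int size() {
        return splits.length;
    }
}
```

The trade-off is O(log n) lookups instead of expected O(1), in exchange for no per-entry node objects, no empty hash buckets, and sorted order for free.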
…itsFileToRuleThemAll
Review thread (resolved, outdated): warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/TestShardGenerator.java
Review thread (resolved, outdated): warehouse/ingest-core/src/test/java/datawave/ingest/mapreduce/partition/TestShardGenerator.java
Review thread (resolved, outdated): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/IngestJob.java
Review thread (resolved): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/IngestJob.java
Review thread (resolved, outdated): warehouse/ingest-core/src/main/java/datawave/ingest/mapreduce/job/TableSplitsCache.java
While you're in here, you might want to fix the test data in this file and in the corresponding unit tests. Those "splits" are actually memory locations from something (I can't remember what); they should be some kind of Base64. The only reason they don't fail on encode/decode is that Apache Commons Base64 will happily accept and emit bad data. See https://github.com/NationalSecurityAgency/datawave/pull/2480/files for an example of how they should look.
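A minimal sketch of the kind of check the reviewer is asking for, assuming the test splits are stored as Base64 text. The JDK's java.util.Base64 decoder throws IllegalArgumentException on characters outside the Base64 alphabet, so it rejects a value such as [B@1b6d3586, a hypothetical example of Java's default toString() output for a byte array and one plausible reading of "memory locations". The helper names are invented for illustration; generating values by encoding real row bytes yields well-formed test data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helpers for producing and validating Base64 split test data
// with the strict JDK codec.
final class SplitTestData {

    private SplitTestData() {}

    // Encodes a row as standard Base64 for use in a test splits file.
    static String encodeSplit(String row) {
        return Base64.getEncoder().encodeToString(row.getBytes(StandardCharsets.UTF_8));
    }

    // True only if the value decodes with the strict decoder and re-encodes to
    // exactly the same text (this also rejects unpadded or non-canonical input).
    static boolean isValidBase64(String value) {
        try {
            byte[] decoded = Base64.getDecoder().decode(value);
            return Base64.getEncoder().encodeToString(decoded).equals(value);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```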
Merge NonShardedSplitsFile and ShardedTableMapFile. Use TableSplitsCache instead of reaching out to Accumulo in every reducer.
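The idea of reading splits from a cached file once, rather than having every reducer query Accumulo, can be sketched as a lazily loaded, per-JVM cache. This is a hypothetical illustration, not TableSplitsCache's actual API: the class name, the method names, and the tab-separated "table, split" file format are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not TableSplitsCache's API. Assumed file format:
// one "table<TAB>split" pair per line.
final class SplitsFileCacheSketch {

    // Shared by every task in this JVM; loaded at most once (double-checked locking).
    private static volatile Map<String, List<String>> cached;

    private SplitsFileCacheSketch() {}

    static Map<String, List<String>> get(Path splitsFile) {
        Map<String, List<String>> result = cached;
        if (result == null) {
            synchronized (SplitsFileCacheSketch.class) {
                result = cached;
                if (result == null) {
                    try {
                        result = parse(Files.readAllLines(splitsFile, StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    cached = result;
                }
            }
        }
        return result;
    }

    // Groups splits by table, sorts each table's splits, and freezes the result.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> byTable = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("malformed splits line: " + line);
            }
            byTable.computeIfAbsent(line.substring(0, tab), t -> new ArrayList<>())
                    .add(line.substring(tab + 1));
        }
        byTable.replaceAll((table, splits) -> {
            Collections.sort(splits);
            return Collections.unmodifiableList(splits);
        });
        return Collections.unmodifiableMap(byTable);
    }
}
```

Because the cache is static, the first caller's path wins; a real implementation would more likely key the cache on the file path or job configuration.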
Splits file generation now takes the needs of the configured partitioner into account.
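One reason a partitioner depends on the splits file at all: a range-style partitioner, such as the MultiTableRangePartitioner touched by this PR, sends each row to the tablet whose range contains it. The following generic sketch of that lookup assumes Accumulo's convention that each split point is the inclusive end row of its tablet. It is not datawave's implementation, and the names are hypothetical.

```java
import java.util.Arrays;

// Generic sketch of range partitioning over sorted split points; not datawave's
// MultiTableRangePartitioner. Split i is the inclusive end row of tablet i, so
// n splits define n + 1 tablets.
// Assumes ASCII rows: String.compareTo orders by UTF-16 code units, whereas
// Accumulo orders rows by unsigned bytes.
final class RangePartitionSketch {

    private RangePartitionSketch() {}

    static int tabletIndex(String[] sortedSplits, String row) {
        int idx = Arrays.binarySearch(sortedSplits, row);
        // A match means the row equals a split, i.e. the last row of tablet idx.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the insertion
        // point is the index of the tablet whose range contains the row.
        return idx >= 0 ? idx : -idx - 1;
    }

    static int partition(String[] sortedSplits, String row, int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        return tabletIndex(sortedSplits, row) % numReducers;
    }
}
```

With splits {"m", "t"}, rows "a" and "m" land in tablet 0, "n" in tablet 1, and "z" in tablet 2.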