-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema to table location #134
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -235,7 +235,8 @@ public void runRetention( | |||
} | |||
|
|||
private Path getTrashPath(String path, String filePath, String trashDir) { | |||
return new Path(filePath.replace(path, new Path(path, trashDir).toString())); | |||
return new Path( | |||
filePath.replace(new Path(path).toString(), new Path(path, trashDir).toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we need this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the schema, the string path
was file:///tmp/...
. However, we were trying to replace it with something like file:/tmp/...
, because the Path
type only allows for single forward slashes. So, I converted it to a Path
to standardize the format to file:/tmp/...
so they would match.
log.info("Detected {} orphan files", orphanFiles.size()); | ||
for (String of : orphanFiles) { | ||
List<String> orphanFilesWithoutMetadata = | ||
orphanFiles.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why metadata.json files filtered out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that the tests were meant for locations without schema: See this slack message from Stas
@@ -60,10 +60,6 @@ public String allocateTableLocation( | |||
String databaseId, String tableId, String tableUUID, String tableCreator) { | |||
Preconditions.checkArgument(databaseId != null, "Database ID cannot be null"); | |||
Preconditions.checkArgument(tableId != null, "Table ID cannot be null"); | |||
Preconditions.checkArgument(tableUUID != null, "Table UUID cannot be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change seem regressive, its valid to ensure tableUUID is not null/ storageproperties is correctly configured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was a bit confused here. I removed it because it was failing this test because the STAGED_TABLE_DTO
did not have a UUID
; there is probably a better solution here but I wasn't sure what was best
* @return the table location | ||
*/ | ||
@Override | ||
public String allocateTableLocation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we ought to be careful with this. The reason we didn't make this change earlier was to ensure backward compatibilty. I would hold off checking this in until we keep the behavior compatible in li-openhouse. Could you make a change there to not call super.allocateTableStorage
?
* href="https://github.com/linkedin/openhouse/issues/121"> | ||
*/ | ||
@Override | ||
public String allocateTableLocation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have to be careful with this change as well. tables-test-fixtures uses non-schemed paths. Please validate tables-test-fixtures continue to work after this change cc: @jiang95-dev
storageManager.getDefaultStorage().getClient().getRootPrefix(), | ||
databaseID, | ||
tableId + "-" + tableUUID); | ||
String endpoint = storageManager.getDefaultStorage().getClient().getEndpoint(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to make this change yet. constructTablePath is only getting used in CTAS (and only for hdfs). non-hdfs storages need much more things to correctly support ctas: BDP-23723
@@ -144,8 +145,10 @@ private void validatePathOfProvidedRequest( | |||
extractFromTblPropsIfExists(databaseId + "." + tableId, tableProperties, TBL_RAW_KEY); | |||
|
|||
java.nio.file.Path previousPath = | |||
InternalRepositoryUtils.constructTablePath( | |||
storageManager, dbIdFromProps, tblIdFromProps, tableUUIDProperty); | |||
Paths.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets avoid making change to this class. CTAS support is not present for non-hdfs storages, and this needs to be solved thoughtfully: BDP-23723
see doesPathExist(..) UnsupportedOperartionException
@HotSushi Thanks for the comments. I originally started the PR because the ADLS storage was failing without schema in the table location. I added changes for |
yes |
Thanks @kmcclenn. You can close this pr if
|
Summary
Issue Changed table locations to include the scheme.
Changes
Changed both the implementation of creating the table location (and dealing with its ramifications) and the tests that test the table location. Allows all table locations to include the schema (eg.
hdfs://tmp/...
instead oftmp/...
). Furthermore, added this fix for the oldconstructTablePath
method and the newallocateTableStorage
method. Removed outdated overriding methods for Local and HDFS storage that don't include the scheme.Testing Done
Changed tests to reflect the table locations now containing the schema.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.