Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17351: Decompose filestore "get file" API #3047

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

gerlowskija
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-17351

Description

Solr's filestore "get file" API actually covers 4 or 5 distinct operations. Depending on provided query parameters, it can:

  • return filestore entry metadata (when meta=true is specified)
  • instruct the receiving Solr node to pull a file from another node's filestore and cache it locally (when getFrom=someOtherNode is specified)
  • instruct the receiving Solr node to push its cached copy of a file out to all other Solr nodes (when sync=true is specified)
  • return raw file contents
  • return JSON-ified file contents

This makes the code for this endpoint somewhat complex. It also makes it hard to model and parse the response on the client side.

Solution

This PR splits up the "get file" endpoint into a number of different APIs. Specifically:

  • metadata-fetching has been moved out to the endpoint, GET/api/cluster/filestore/metadata/some/path.txt
  • Filestore commands such as pushing/pulling files are now available at: POST /api/cluster/filestore/commands
  • Support for "JSON-ified" file data has been dropped in this PR (but will be retained but deprecated in the eventual 9.x backport)

These divisions allow us to generate SolrRequest/SolrResponse classes representing these APIs, meaning that SolrJ users no longer need to use GenericSolrRequest/GenericSolrResponse.

Tests

Some rewriting in TestDistribFileStore. Existing package and filestore tests continue to pass.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Identical to the pre-existing`/cluster/files/a/b.txt?meta=true`, but in
its own endpoint.

Does NOT remove the existing endpoint or switch tests over to use it,
but we will want to do this eventually on `main`.
Identical to the pre-existing`/node/files/a/b.txt`, but in its own
endpoint.  Does NOT remove the existing endpoint or switch tests over
to use it.  Those can be handled in a separate commit to enable this one
to be easily backported later to 9x if desired.

This also doesn't enable codegen for the new API, though I think the
blockers are sufficiently cleared up now to enable that if desired.
Still no modifications to use the API internally, or removal of the old
endpoint, in order to ease backporting.
This should probably be pulled into a separate PR/branch, and modified
to address all relevant endpoints and other loose ends of SOLR-17562.
TestDistribFileStore fails with the changes in this commit.  The changes
themselves are fine, but it fails because of a pre-existing *ahem*
feature in SolrClient where SolrParams are encoded as form-params
instead of query-params in some circumstances.

I could probably hack around this for the moment (e.g. by using a
builder setter to ensure 'getFrom' is sent as a true query param).

But I've decided to halt progress on this branch and break some other
pieces out into their own PRs, while I send some questions out to the
community on this front.
Ensures v2 POST query-params aren't put in a 'form' body.
@gerlowskija
Copy link
Contributor Author

Still TODO: ref-guide updates, additional testing.

@gerlowskija gerlowskija changed the title Solr 17351 break up getfile api main SOLR-17351: Decompose filestore "get file" API Jan 20, 2025
@epugh
Copy link
Contributor

epugh commented Jan 20, 2025

Does this impact #3031 ? Since I am using cc.getFileStore I don't think so, but wanted to check.

@gerlowskija
Copy link
Contributor Author

That PR uses the Java interface, and not the HTTP APIs directly, so it should be fine afaict.

@POST
@Operation(
summary =
"Pushes a file to other nodes, or pulls a file from other nodes in the Solr cluster.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the fact that one thing does a push or pull suggest a issue to fix? https://en.wikipedia.org/wiki/List_of_Doctor_Dolittle_characters#Pushmi-Pullyu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Syncs a file via either pushing or pulling across the nodes in the Solr cluster." ???

String getFrom,
@Parameter(
description =
"If true, triggers syncing for this file across all nodes in the filestore")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's own end point? Or, do we somehow eliminate the need for this?

it -> {
try {
InputStream inputStream = it.getInputStream();
if (inputStream != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this ever be null? wouldn't you have an exception instead?

return dirListingResponse;
}

// TODO Modify the filestore implementation itself to return this object, so conversion isn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i also ran into this in the schema designer

@@ -194,6 +301,48 @@ public SolrJerseyResponse deleteFile(String filePath, Boolean localDelete) {
return response;
}

@Override
@PermissionName(PermissionNameProvider.Name.FILESTORE_WRITE_PERM)
public SolrJerseyResponse executeFileStoreCommand(String path, String getFrom, Boolean sync) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we trying to get rid of the "command" pattern in our new APIs?

} catch (IOException e) {
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
}
ClusterFileStore.syncToAllNodes(fileStore, path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been a bit confused between NodeFileStore, ClusterFileStore, DistribFileStore ;-).

.collect(Collectors.toList());
directoryListingResponse.files = Collections.singletonMap(path, directoryContents);
return directoryListingResponse;
if (type == FileStore.FileType.NOFILE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels overly finicky logic.. why is it so hard to decide when to get metadata? What are we missing that doens't just make this simple. A thing has a meta thing. That should be the only way.

throw new SolrException(
SolrException.ErrorCode.SERVER_ERROR, "Error reading file " + pathCopy);
// User wants to get the "raw" file
// TODO Should we be trying to json-ify otherwise "raw" files in this way? It seems like a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case FOR this? Do we have one right now?

@@ -173,7 +173,7 @@ public void uninstall(String packageName, String version)
String.format(Locale.ROOT, "/package/%s/%s/%s", packageName, version, "manifest.json"));
for (String filePath : filesToDelete) {
DistribFileStore.deleteZKFileEntry(zkClient, filePath);
String path = "/api/cluster/files" + filePath;
String path = "/api/cluster/filestore/files" + filePath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the more verbose path I think... Can a filestore have anything under than files? If not, what about just "/api/cluster/filestore"

@@ -115,7 +115,7 @@ openssl dgst -sha1 -sign my_key.pem runtimelibs.jar | openssl enc -base64 | sed
+
[source, bash]
----
curl --data-binary @runtimelibs.jar -X PUT http://localhost:8983/api/cluster/files/mypkg/1.0/myplugins.jar?sig=<signature-of-jar>
curl --data-binary @runtimelibs.jar -X PUT http://localhost:8983/api/cluster/filestore/files/mypkg/1.0/myplugins.jar?sig=<signature-of-jar>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe drops the files?

try {
fileStore.syncToAllNodes(path);
} catch (IOException e) {
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know if we can blame the user

ByteBuffer filedata = null;
try {
final var fileRequest = new FileStoreApi.GetFile(path);
final var client = coreContainer.getSolrClientCache().getHttpSolrClient(baseUrl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use the SolrClientCache for anything other than streaming expressions. I'm stamping out such usages here: https://issues.apache.org/jira/browse/SOLR-17630
I'm particularly surprised to see you removed a use of the new requestWithBaseUrl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants