
feat(api): Add prototype for long running API calls for Archived Recordings #698

Draft · wants to merge 9 commits into main
Conversation

@Josh-Matsuoka (Contributor) commented Oct 29, 2024

Hi,

This adds a prototype for long running API operations (first step of #286).

The workflow proceeds as described in the issue:

  • Cryostat receives a PATCH request to archive a recording
  • Basic validation is done (check that the target is valid and that the recording exists)
  • Following this, an ArchiveRequest is created that tracks a job ID and the requested recording
  • The ArchiveRequest is then passed on to the ArchiveRequestGenerator (which works similarly to the InterruptibleReportsGenerator)
  • The ArchiveRequestGenerator performs the archiving on a separate thread, firing a notification to the web client on success or failure that includes the job ID (see the sketch after the TODOs below)

TODO: Web Client needs to be adjusted to account for the new notifications, refreshing the tables when it receives them.
TODO: Implement the same framework for Grafana uploads of active/archived recordings, as well as report generation for active/archived recordings
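
For illustration, here is a minimal sketch of the flow described above, assuming Quarkus CDI and the Vert.x EventBus for the web-client notifications. ActiveRecording and RecordingHelper are existing Cryostat types; the method names, notification categories, and executor wiring here are assumptions, not this PR's actual code.

```java
import java.util.concurrent.ExecutorService;

import io.vertx.core.eventbus.EventBus;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class ArchiveRequestGenerator {

    // pairs the job ID with the recording to be archived
    public record ArchiveRequest(String jobId, ActiveRecording recording) {}

    @Inject ExecutorService executor; // worker pool, so the HTTP thread returns immediately
    @Inject RecordingHelper recordingHelper;
    @Inject EventBus bus; // emits WebSocket notifications to the web client

    public String performArchive(ArchiveRequest request) {
        executor.submit(
                () -> {
                    try {
                        // the actual archiving work happens off the request thread
                        recordingHelper.archiveRecording(request.recording()); // hypothetical call
                        bus.publish("ArchiveRecordingSuccess", request.jobId());
                    } catch (Exception e) {
                        bus.publish("ArchiveRecordingFailure", request.jobId());
                    }
                });
        return request.jobId(); // handed straight back to the client
    }
}
```

The client correlates the eventual success/failure notification with its original request by matching the job ID it received in the initial response.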

@mergify mergify bot added the safe-to-test label Oct 29, 2024
@Josh-Matsuoka Josh-Matsuoka added the feat New feature or request label Oct 29, 2024
@Josh-Matsuoka (Contributor, Author) commented:

@andrewazores which other endpoints do you think could use a framework like this? What other operations are likely to block the client for a long time?

@andrewazores (Member) commented Oct 30, 2024

Archiving an active recording is the obvious long-running case: with the current architecture, the client can end up waiting a long time for Cryostat to respond. Another similar case right now is uploading recordings to the jfr-datasource for viewing in Grafana. Both the Grafana feature and the report generation feature run into this for active as well as archived recordings.

In the future, features like taking a heap dump should also fit into this same long-running task architecture.

@andrewazores (Member) reviewed the following code:

```java
@Inject private RecordingHelper recordingHelper;
@Inject ReportsService reportsService;

private Map<String, Map<String, AnalysisResult>> jobResults;
```

I would prefer to avoid adding something like this that hangs on to the report generation results in memory. The report generation system already has tiered in-memory and S3-backed caching, so it'd be best to lean on that.

For archiving recordings, or for uploading recording files to jfr-datasource, it's a simpler design because there is no response payload to send back to the client - the client just needs to know that the job has been completed and whether it completed successfully.

For report generation, where it takes time and the client not only needs to know that it has completed and whether it succeeded, but also needs a response payload (the report result), I think an API that behaves something like Map.computeIfAbsent() would be best. I'm thinking that the ReportsService interface should gain methods for checking whether a result is already present for a given key. The default implementation would trivially return false, and the two caching tier implementations would check their respective caches and delegates.
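
A minimal sketch of what that could look like, assuming a hypothetical keyExists method name and using AnalysisResult as a stand-in for the real report result type; the actual caching tiers and their delegation will differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public interface ReportsService {
    Map<String, AnalysisResult> reportFor(String key);

    // The default (non-caching) implementation trivially reports no cached result.
    default boolean keyExists(String key) {
        return false;
    }
}

// One caching tier: consult its own cache first, then fall through to its delegate.
class MemoryCachingReportsService implements ReportsService {
    private final Map<String, Map<String, AnalysisResult>> cache = new ConcurrentHashMap<>();
    private final ReportsService delegate;

    MemoryCachingReportsService(ReportsService delegate) {
        this.delegate = delegate;
    }

    @Override
    public Map<String, AnalysisResult> reportFor(String key) {
        // Map.computeIfAbsent semantics: return the cached result or compute it once
        return cache.computeIfAbsent(key, delegate::reportFor);
    }

    @Override
    public boolean keyExists(String key) {
        return cache.containsKey(key) || delegate.keyExists(key);
    }
}
```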

[diagram attached in the original comment]

HTTP 200 responses would contain the Map<String, AnalysisResult> as the response body for the client to use. HTTP 202 responses would contain a Job UUID in the response body, plus perhaps a Location header telling the client the URL where to find the result when it's ready (1), and that's it. Later, the completion notification containing the Job UUID would be emitted. When the client receives the notification containing that UUID, it sends a follow-up GET to the Location header URL; this time the result is ready, so the response is an HTTP 200 with the expected data. This way, the existing tiered caching infrastructure is reused and no new endpoints are introduced, only different response status behaviours on the existing endpoints.

(1) right now, the URL would be the same URL that the request was sent to initially, ex. /api/v4/reports/{encodedKey}
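
To illustrate, here is a sketch of the 200-vs-202 behaviour on the existing endpoint. The path comes from footnote (1), but keyExists, ReportJobGenerator, and submitReportJob are hypothetical names used for illustration only:

```java
import java.util.UUID;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.core.UriInfo;

@Path("/api/v4/reports/{encodedKey}")
public class Reports {

    @Inject ReportsService reportsService;
    @Inject ReportJobGenerator generator; // hypothetical analogue of ArchiveRequestGenerator

    @GET
    public Response get(@PathParam("encodedKey") String key, @Context UriInfo uriInfo) {
        if (reportsService.keyExists(key)) {
            // result already cached: HTTP 200 with the Map<String, AnalysisResult> payload
            return Response.ok(reportsService.reportFor(key)).build();
        }
        // not cached yet: start the job, HTTP 202 with the job UUID and a Location
        // header pointing back at this same URL for the follow-up GET
        String jobId = UUID.randomUUID().toString();
        generator.submitReportJob(jobId, key); // hypothetical job submission
        return Response.accepted(jobId).location(uriInfo.getRequestUri()).build();
    }
}
```

The appeal of this design is that the client's retry target is the same URL it already knows, so no new endpoints or client-side routing are needed.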
