Replies: 4 comments 14 replies
-
Idea: Denormalize (enrich) the data in SQL.We can use the submission collector as a basis for the submission list fetch. Consider this (currently used) code: $collector = Repo::submission()->getCollector()
->filterByContextIds([$context->getId()]); // In a way, this is already an access policy check
... add filters ...
$submissions = $collector->getMany();
return $response->withJson([
'itemsMax' => $collector->limit(null)->offset(null)->getCount(),
'items' => some_method_to_map_to_json($submissions)
); This is good enough for the general-purpose API (and single-submission front end). However, it's not rich enough for the submissions list, which needs e.g. review assignment and editorial assignment information. We could augment the data without re-implementing the code, with SQL like the following:
Typical results:
(The syntax is unfortunately slightly different for the two DBMSs -- this can be finessed by a Laravel querybuilder macro, I think. There are better functions in more recent releases of MariaDB/MySQL than Characteristics:
In PHP in the $collector = Repo::submission()->getCollector()
->filterByContextIds([$context->getId()]) // In a way, this is already an access policy check
->augmentWithReviewerData()
->augmentWithStageAssignmentData()
... (I think we could also apply this idea to our localizations -- so that we can return the full set of localized data to the UI, rather than using multiple joins to get the "primary" and "current" locale data. But that's another story.) |
Beta Was this translation helpful? Give feedback.
-
Idea: Use query builders to implement access policies in SQL.Building on Idea: Denormalize (enrich) the data in SQL, use querybuilders to build access control in SQL (in place of the current access policy toolset, which is aging poorly). This would be usable in both aggregate (submission lists) and individual (single submission workflow) cases using a single implementation. It might also prove a general replacement for our current access policy toolkit. For simple cases, this is clear: PHP checks that require objects to be fetched and instantiated first get replaced by collector calls. Old patterns (used e.g. in access policies): $reviewAssignment = $reviewAssignmentDao->getById($reviewId);
// Only supports one submission; hard to know where in the policy set it's stored in the authorized context!
$submission = $this->getAuthorizedContextObject(Application::ASSOC_TYPE_SUBMISSION);
if ($reviewAssignment->getSubmissionId() != $submission->getId()) {
return AuthorizationPolicy::AUTHORIZATION_DENY;
} New pattern that might get a list of reviews by composing collectors: $submissonCollector = Repo::submissions()->getCollector()
->filterByContextIds([$context->getId()])
->filterByAssignedUserIds([$user->getId()]);
$reviewAssignmentCollector = Repo::reviewAssignments()->getCollector()
->filterBySubmissionIds($submissionCollector);
// The $reviewAssignmentCollector can now be used to get information about one or many
// review assignments that are assigned to a specific user, in a specific context, without needing
// to execute a large series of queries to build up data. In fact, using this object would cause only
// a single query to be executed if e.g. `getMany` is called. To be determined:
|
Beta Was this translation helpful? Give feedback.
-
I would like to mention interesting middle ground approach (between 1 query and 500 sql queries), which I learnt while building graphql APIs and found it quite easy and efficient. Initial JS implementation came from facebook - https://github.com/graphql/dataloader , but there are implementations in all languages - here is php https://github.com/overblog/dataloader-php To briefly explain it, lets say that you need submissions, submission-> review_assignments and submission->review_assignments->user to build your detailed response. You can achieve that with 3 queries.
Dataloader is than responsible to match correct ids. So it has to be correct, but these are often very simple loops and in practice couple of such dataloaders satisfies such dependency use cases. So what I imagine in our codebase that in the Submission mapping class, when you calculate stages you could ask Dataloader for review_assignments for given submission_id. Dataloader is than responsible to collect all these submission ids within one sync event loop. And making one query after the sync event loop and resolving promise afterwards, which allows to carry on. So it requires using promises to get this working. In Node.js its all you use.. so very native pattern. I am not sure whether in php code base it would be desirable pattern. In short terms - it would be really just batching the sql requests that are coming from the mappers. Obviously building one super sql query will be always fastest, but this approach still scales well using basic sql queries, which returns always the same object shapes. Happy to do deeper dive what it could look like in our code base if there would be interest.. |
Beta Was this translation helpful? Give feedback.
-
@asmecher Just to follow up our discussion. |
Beta Was this translation helpful? Give feedback.
-
The new submission lists toolset (#7495) includes information in the submission lists about submissions, editorial assignments, individual review assignments, etc. This mixes different "depths" of data much more freely than our submission lists have done before. Building these new lists will be a heavyweight process and loading everything from the database one at a time will not be scalable:
Compounding the problem is that aggregate information like the submission lists will have to reimplement access control: we already have access policies for each submission ("can I view the submission's workflow") but we will need the same for aggregate data ("should I include this submission's review information in the list"). This will get more complicated as we enrich the submission lists to include more data that was previously only available on the individual submission level. Our access policies are not currently written with lists in mind -- there is only one "submission" authorized context object per request, for example.
We might be able to solve several of these problems at once.
Beta Was this translation helpful? Give feedback.
All reactions