QOL - Extended Queries #291

jkeon · 2023-11-08T16:42:05Z

Vanilla Unity EntityQuery objects are powerful for getting the subset of data you want, but they are limited to the structural aspect of the archetype: which components an entity has, not the values stored in them.

Example: Get all Containers that are full

Container Entity
- Quantity Component - 10 units
- MaxQuantity Component - 10 units
- IsFull Component - Tag

From a query perspective, finding anything with these components is going to be super fast and return only the entities that are containers and are full.

However, from a practical perspective, this forces some inefficient practices.
Whenever we add to the Quantity and hit the max, we need to add the IsFull tag, which is a structural change.
We could change the IsFull tag to a bool component that all Containers have, but at that point we might as well get rid of it altogether and just compare the Quantity to the MaxQuantity to determine whether a container is full.

That is efficient in terms of structural changes and storage, but it offloads the mental load and the filtering code in every job onto the developer who only wants to operate on full containers.
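A minimal sketch of the value-based alternative, written as plain C# rather than against the Unity Entities API: Quantity and MaxQuantity here are stand-in structs assumed to wrap a single int, and ContainerRules is a hypothetical helper. Fullness is computed from the values, so filling a container never causes a structural change.

```csharp
using System;

// Stand-ins for the Quantity and MaxQuantity components (assumed to wrap a single int).
public struct Quantity { public int Value; }
public struct MaxQuantity { public int Value; }

public static class ContainerRules
{
    // Fullness is derived from the values instead of being stored as an IsFull tag,
    // so crossing the threshold requires no structural change.
    public static bool IsFull(in Quantity quantity, in MaxQuantity max) => quantity.Value >= max.Value;

    // Adds units, clamped to the maximum; the "full" state simply falls out of the data.
    public static void Add(ref Quantity quantity, in MaxQuantity max, int amount)
    {
        quantity.Value = Math.Min(quantity.Value + amount, max.Value);
    }
}
```

The trade-off is the one described above: every job that only wants full containers now has to repeat this comparison itself.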

With Entities 1.0 on Unity 2022.2+, the job scheduler has cut a lot of its overhead. See: https://blog.unity.com/engine-platform/improving-job-system-performance-2022-2-part-2

That makes an architecture that schedules many more, smaller Burst-compiled jobs reasonable.

This could allow us to write our own extension methods for scheduling jobs that provide quality-of-life queries with less boilerplate.

LINQ Style Syntax

Relationships are hard in ECS. The Flecs author estimates it will take him 2 years to build them out properly. https://ajmmertens.medium.com/a-roadmap-to-entity-relationships-5b1d11ebb4eb
If we had relationships in Unity today, the query system could express this, but we don't, so we have to make do with pre-jobs.

With Unity's Burst and deferred NativeList, we should be able to string together micro jobs that whittle the archetype set down to the set we actually care about for our heavy-lifting jobs.

DoSomethingWithFullContainersJob.Schedule(
    m_ContainersQuery
        .FilterOn<IsContainerFull>()
        .ToResult<Entity>());

The end-user job code would then look like:

//DoSomethingWithFullContainersJob struct

//Could have this version, or could pass the collection and let you iterate through it yourself
public void Execute(Entity fullContainerEntity)
{
    //fullContainerEntity is guaranteed to be a container entity that is full
}
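To make the FilterOn/ToResult chain concrete, here is a minimal single-threaded model in plain C#. It is not Unity's API: ContainerRow and QueryPipeline are hypothetical, an int stands in for Entity, and each FilterOn call is one pass that compacts the working set, the way a pre-job would write its survivors into a deferred NativeList.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type standing in for one entity's data from m_ContainersQuery.
public readonly struct ContainerRow
{
    public readonly int Entity;       // stand-in for Unity's Entity handle
    public readonly int Quantity;
    public readonly int MaxQuantity;
    public ContainerRow(int entity, int quantity, int maxQuantity)
    {
        Entity = entity; Quantity = quantity; MaxQuantity = maxQuantity;
    }
}

// Hypothetical fluent wrapper: each FilterOn call is one "pre-job" pass that
// compacts the working set; ToResult projects the survivors into the result list.
public sealed class QueryPipeline<TRow>
{
    private readonly List<TRow> m_Rows;

    public QueryPipeline(IEnumerable<TRow> rows) { m_Rows = new List<TRow>(rows); }

    public QueryPipeline<TRow> FilterOn(Func<TRow, bool> predicate)
    {
        var kept = new List<TRow>(m_Rows.Count);
        foreach (var row in m_Rows)
            if (predicate(row)) kept.Add(row);
        return new QueryPipeline<TRow>(kept);
    }

    public List<TResult> ToResult<TResult>(Func<TRow, TResult> select)
    {
        var result = new List<TResult>(m_Rows.Count);
        foreach (var row in m_Rows) result.Add(select(row));
        return result;
    }
}
```

Each stage allocates a new list here; in a Burst version, that would correspond to one TempJob allocation per pre-job.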

Options

FilterOn

Probably the first thing you'd want so you can narrow down the collection.
Creates an instance of the filter job type, which runs over the archetype query and pulls in whatever components it needs from the query.
In the above example, IsContainerFull would read the Quantity and MaxQuantity components from the archetype query.
The IsContainerFull logic would check whether Quantity >= MaxQuantity and return true or false to decide whether the entity makes it through to the next stage.
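The FilterOn contract could look something like the sketch below, assuming a hypothetical IEntityFilter<TData> interface and a ContainerData struct standing in for the Quantity and MaxQuantity reads. IsContainerFull implements the Quantity >= MaxQuantity check. Constraining the filter type to a struct lets the runtime generate specialized code for each filter instead of calling through a delegate, which is the style Burst-compiled code generally needs.

```csharp
using System.Collections.Generic;

// Hypothetical filter contract: a filter reads the data it needs and
// decides whether the entity passes through to the next stage.
public interface IEntityFilter<TData>
{
    bool Include(in TData data);
}

// Stand-in for the Quantity and MaxQuantity values read from the archetype query.
public readonly struct ContainerData
{
    public readonly int Quantity;
    public readonly int MaxQuantity;
    public ContainerData(int quantity, int maxQuantity) { Quantity = quantity; MaxQuantity = maxQuantity; }
}

// The IsContainerFull logic from the proposal: Quantity >= MaxQuantity.
public struct IsContainerFull : IEntityFilter<ContainerData>
{
    public bool Include(in ContainerData data) => data.Quantity >= data.MaxQuantity;
}

public static class FilterPass
{
    // Returns the indices of the rows that pass the filter.
    public static List<int> Run<TFilter>(IReadOnlyList<ContainerData> rows, TFilter filter)
        where TFilter : struct, IEntityFilter<ContainerData>
    {
        var passed = new List<int>();
        for (int i = 0; i < rows.Count; i++)
            if (filter.Include(rows[i])) passed.Add(i);
        return passed;
    }
}
```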

SortOn

Pretty simple: allow for a sort pass job.

ToResult

The last part. We don't care about Quantity or MaxQuantity; we only cared whether the Container was full. What we care about in our job is the Entity.
Our logic job, DoSomethingWithFullContainersJob, will get triggered and run on this final result list of Entity objects.
If we cared about something else, we could write a custom ToResult job struct that pulls in whatever is needed to generate the result struct.
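A custom ToResult stage could follow the same struct-plus-interface pattern. In this plain C# sketch, IResultSelector, FullContainerResult, SelectEntityAndOwner and the OwnerId field are all hypothetical, and ints stand in for Entity handles.

```csharp
using System.Collections.Generic;

// Stand-in for the data a ToResult stage can read for each surviving entity.
public readonly struct ContainerRow
{
    public readonly int Entity;      // stand-in for Unity's Entity handle
    public readonly int OwnerId;     // hypothetical extra component
    public readonly int Quantity;
    public ContainerRow(int entity, int ownerId, int quantity)
    {
        Entity = entity; OwnerId = ownerId; Quantity = quantity;
    }
}

// Hypothetical result struct carrying only what the logic job needs.
public readonly struct FullContainerResult
{
    public readonly int Entity;
    public readonly int OwnerId;
    public FullContainerResult(int entity, int ownerId) { Entity = entity; OwnerId = ownerId; }
}

// Hypothetical contract for a custom ToResult job struct.
public interface IResultSelector<TRow, TResult>
{
    TResult Select(in TRow row);
}

public struct SelectEntityAndOwner : IResultSelector<ContainerRow, FullContainerResult>
{
    public FullContainerResult Select(in ContainerRow row) => new FullContainerResult(row.Entity, row.OwnerId);
}

public static class ResultPass
{
    // Projects every surviving row into the result struct the logic job consumes.
    public static List<TResult> Run<TRow, TResult, TSelector>(IReadOnlyList<TRow> survivors, TSelector selector)
        where TSelector : struct, IResultSelector<TRow, TResult>
    {
        var results = new List<TResult>(survivors.Count);
        for (int i = 0; i < survivors.Count; i++)
            results.Add(selector.Select(survivors[i]));
        return results;
    }
}
```

Note the call site: C# cannot infer TResult from a generic constraint, so a call has to spell out all three type arguments, for example `ResultPass.Run<ContainerRow, FullContainerResult, SelectEntityAndOwner>(survivors, new SelectEntityAndOwner())`. That is the same kind of generic verbosity that makes a pure-generics implementation unattractive.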

Implementation

We may need Source Generators to build up the pre-jobs and ensure we're pulling in the right component lookups to build the queries.

We might be able to get away with generics instead, but we could end up with a monstrosity like this:

DoSomethingWithFullContainersJob.Schedule(
    m_ContainersQuery
        .FilterOn<IsContainerFull<Quantity, MaxQuantity, Entity>, Quantity, MaxQuantity, Entity>()
        .ToResult<Entity>());

We would want to cache the results and only re-run the pre-jobs when we need to.

For example, we have a job that runs near the start of the frame on full containers.

We don't know which ones are full, so our pre-jobs get scheduled to build up the filtered collection.
We then run our logic job that works on full containers.
Then, mid frame, we have another job that also works on full containers, for something different.
No containers had their Quantity or MaxQuantity written to, so the cached result from the beginning of the frame is still valid. No pre-jobs needed.
Then something runs that does update Quantity.
Then, near the end of the frame, we have yet another job that works on full containers, for something different again.
We need to run the pre-jobs again.
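That scenario could be modeled with a version counter, roughly in the spirit of Unity's per-chunk change versions. This is a plain C# sketch: CachedFilterResult, MarkWritten and Acquire are hypothetical names, and the filter is passed in as a delegate standing in for scheduling the pre-jobs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache for a filtered result. WriteVersion is bumped by anything that
// writes Quantity or MaxQuantity (roughly like a per-component change version).
public sealed class CachedFilterResult
{
    private readonly Func<List<int>> m_RunPreJobs;  // rebuilds the filtered list
    private List<int> m_Cached;
    private uint m_CachedAtVersion;
    private bool m_HasValue;

    public uint WriteVersion { get; private set; }
    public int RebuildCount { get; private set; }

    public CachedFilterResult(Func<List<int>> runPreJobs) { m_RunPreJobs = runPreJobs; }

    // Called by any system that writes the components the filter depends on.
    public void MarkWritten() => WriteVersion++;

    // Returns the cached list, re-running the pre-jobs only if a dependency was written.
    public List<int> Acquire()
    {
        if (!m_HasValue || m_CachedAtVersion != WriteVersion)
        {
            m_Cached = m_RunPreJobs();
            m_CachedAtVersion = WriteVersion;
            m_HasValue = true;
            RebuildCount++;
        }
        return m_Cached;
    }
}
```

One caveat if this were built on Unity's change versions: as far as I know, they are bumped whenever a system takes write access to a component, not only when a value actually changes, so a system that merely requests read-write access to Quantity would still invalidate the cache.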

This approach "just works", but you could run into issues where you're running the pre-jobs far more often than you need to. We should issue a warning, or give developers some other way to know, that they MIGHT want to look at their system ordering to make better use of the cached results.
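The warning could be as simple as counting rebuilds per query per frame. PreJobRebuildMonitor, its threshold and the query-name strings are hypothetical; Console.WriteLine stands in for whatever logging the project uses (in Unity, most likely Debug.LogWarning).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-frame counter that flags queries whose pre-jobs are re-run often,
// hinting that system ordering could make better use of cached results.
public sealed class PreJobRebuildMonitor
{
    private readonly int m_Threshold;
    private readonly Dictionary<string, int> m_RebuildsThisFrame = new Dictionary<string, int>();
    public List<string> Warnings { get; } = new List<string>();

    public PreJobRebuildMonitor(int threshold) { m_Threshold = threshold; }

    public void RecordRebuild(string queryName)
    {
        m_RebuildsThisFrame.TryGetValue(queryName, out int count);
        count++;
        m_RebuildsThisFrame[queryName] = count;
        // Warn once, at the moment the threshold is first exceeded this frame.
        if (count == m_Threshold + 1)
        {
            string message = $"'{queryName}' re-ran its pre-jobs {count} times this frame; " +
                             "consider reordering systems to reuse cached results.";
            Warnings.Add(message);
            Console.WriteLine(message);
        }
    }

    // Resets the per-frame counts; call once at the end of each frame.
    public void EndFrame() => m_RebuildsThisFrame.Clear();
}
```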


mbaker3 · 2023-11-08T17:41:01Z

This is an interesting idea but I have some concerns. I've been working in this space quite a bit lately :)

Storing Results

I get that filtering your working set is cumbersome, but do we really want to pay the allocation cost of a NativeList (Allocator.TempJob) and its associated bucket allocations?

Unless the comparison is really expensive, you're probably better off just skipping over entities that don't meet your filter in the job itself.
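For comparison, the skip-in-the-job approach needs no intermediate collection at all. A plain C# sketch, with a hypothetical Container struct standing in for the components and TimesProcessed as a placeholder for the real work:

```csharp
using System.Collections.Generic;

// Stand-in for per-entity component data (Quantity and MaxQuantity).
public struct Container { public int Quantity; public int MaxQuantity; public int TimesProcessed; }

public static class DoSomethingWithFullContainers
{
    // Runs the filter inline: entities that are not full are skipped with `continue`,
    // so no intermediate list of matches is allocated.
    public static int Execute(Container[] containers)
    {
        int processed = 0;
        for (int i = 0; i < containers.Length; i++)
        {
            ref Container c = ref containers[i];
            if (c.Quantity < c.MaxQuantity) continue; // not full: skip
            c.TimesProcessed++;                       // placeholder for the real work
            processed++;
        }
        return processed;
    }
}
```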

To reduce boilerplate, we may be better served by a custom job type that takes in filters and automatically does the skipping for you. I may already be close to a solution for this with the custom enumerator in our project's QuestCollection, which takes in a filter. We can improve it further by making it chunk-aware, striding over whole chunks when filtering on the existence of a component (Tag) or the value of a SharedComponent.
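A chunk-aware filtering enumerator might look roughly like the following. This is not the QuestCollection code; Chunk and FilteredEnumerator are hypothetical stand-ins, with a bool HasTag representing a chunk-level fact such as a tag component's presence or a SharedComponent value. Chunks that fail that test are skipped as a whole, and only elements in matching chunks go through the per-element filter.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for an archetype chunk: a block of quantities plus a chunk-level fact
// (e.g. whether the archetype has some tag component).
public sealed class Chunk
{
    public readonly int[] Quantities;
    public readonly bool HasTag;
    public Chunk(int[] quantities, bool hasTag) { Quantities = quantities; HasTag = hasTag; }
}

// Hypothetical filtering enumerator: chunks failing the chunk-level test are skipped
// without touching their elements; elements in matching chunks are filtered one by one.
public struct FilteredEnumerator
{
    private readonly IReadOnlyList<Chunk> m_Chunks;
    private readonly Func<int, bool> m_Filter;
    private int m_Chunk;
    private int m_Index;

    public FilteredEnumerator(IReadOnlyList<Chunk> chunks, Func<int, bool> filter)
    {
        m_Chunks = chunks; m_Filter = filter; m_Chunk = 0; m_Index = -1;
    }

    public FilteredEnumerator GetEnumerator() => this; // enables foreach

    public int Current => m_Chunks[m_Chunk].Quantities[m_Index];

    public bool MoveNext()
    {
        while (m_Chunk < m_Chunks.Count)
        {
            Chunk chunk = m_Chunks[m_Chunk];
            if (chunk.HasTag)
            {
                while (++m_Index < chunk.Quantities.Length)
                    if (m_Filter(chunk.Quantities[m_Index])) return true;
            }
            m_Chunk++;     // stride to the next chunk
            m_Index = -1;
        }
        return false;
    }
}
```

Because the enumerator is a struct exposing GetEnumerator, MoveNext and Current, it works with foreach without allocating an enumerator object, although the filter delegate itself is still an allocation in this sketch.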

Sort

Sort worries me because you essentially turn your data set into a series of random lookups after you sort it.

If you're just looking for the highest value or the highest X values, you can track those as you iterate instead. That's a lot of boilerplate today, but probably something we could create some convenience patterns for.
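Tracking the top values during iteration can be done with a small buffer kept in descending order, so nothing beyond k slots is ever sorted. A plain C# sketch with a hypothetical TopK helper over ints:

```csharp
using System;

public static class TopK
{
    // One pass over the data, keeping the k highest values in a small buffer
    // sorted in descending order; the full data set is never sorted.
    public static int[] Highest(int[] values, int k)
    {
        var top = new int[Math.Min(Math.Max(k, 0), values.Length)];
        if (top.Length == 0) return top;
        int count = 0;
        foreach (int v in values)
        {
            // Buffer full and v is no better than the smallest kept value: skip it.
            if (count == top.Length && v <= top[count - 1]) continue;
            // Start at the first free slot, or overwrite the smallest kept value.
            int pos = count < top.Length ? count : top.Length - 1;
            // Shift smaller values right to keep the buffer in descending order.
            while (pos > 0 && top[pos - 1] < v)
            {
                top[pos] = top[pos - 1];
                pos--;
            }
            top[pos] = v;
            if (count < top.Length) count++;
        }
        return top;
    }
}
```

For example, `TopK.Highest(new[] { 5, 1, 9, 3, 7 }, 3)` yields `{ 9, 7, 5 }`.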

If you truly do need all of your results sorted (e.g. for some UI presentation), you're best served by doing the sorting after your filtering and any other processing or data collection you want to perform. The native collections already have async and sync sort functions to help you with this.
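Sorting after filtering, in sketch form: the scan stays linear over all containers, and only the surviving rows are sorted at the end. ContainerSummary and FullContainerReport are hypothetical, and List<T>.Sort stands in for the native collections' sort functions.

```csharp
using System.Collections.Generic;

// Hypothetical summary row collected for each full container.
public readonly struct ContainerSummary
{
    public readonly int Entity;
    public readonly int Quantity;
    public ContainerSummary(int entity, int quantity) { Entity = entity; Quantity = quantity; }
}

public static class FullContainerReport
{
    // Filter and collect first, then sort only the (usually much smaller) result set,
    // so the main pass over the data stays a linear scan.
    public static List<ContainerSummary> Build(int[] quantities, int[] maxQuantities)
    {
        var results = new List<ContainerSummary>();
        for (int entity = 0; entity < quantities.Length; entity++)
            if (quantities[entity] >= maxQuantities[entity])
                results.Add(new ContainerSummary(entity, quantities[entity]));

        // Sort by quantity, highest first, e.g. for a UI list.
        results.Sort((a, b) => b.Quantity.CompareTo(a.Quantity));
        return results;
    }
}
```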

Cached Results and Invalidation

I think we're close to this today. LazyReactiveValue is most of the way there; it just needs to support asynchronously scheduling its refresh when an Acquire is requested on the data. Very doable.
