Create Universal Query Language #400

Conversation

@PawelPeczek-Roboflow (Collaborator) commented May 16, 2024

Description

In this PR I ship an extension to the core blocks in the form of a universal query / operation language (UQL).

The idea is the following:

  • we define a JSON-based language to express operations on the different types of data that we support (for example, for sv.Detections we support filtering, offsetting and extracting detection metadata - like size or location), as well as evaluation of unary and binary operators that are useful for conditional logic
  • the definition is extensible - to add a new operation we only need to create a pydantic entity to host its definition, implement the operation, and register it in the UQL execution engine (see the sketch after this list)
  • definitions are self-describing - the entities defining operations expose JSON manifests, including custom fields expressing the types of operations; together this should allow building a relatively generic UI for different kinds of blocks using UQL and running a simple type-inference procedure on the UI side to very effectively narrow down the operations that users can apply in a specific context - hopefully making the whole thing quite intuitive
  • we do not make changes to the workflows EE, apart from a quality-of-life improvement: selectors embedded in dicts inside block manifests (so far we supported only singular references and lists of references; now we would like to be able to provide Dict[str, reference] as input, which is very useful for parametrising the UQL execution engine - say you want to filter detections, but the confidence threshold is provided at runtime rather than hardcoded in the definition)
  • I do not imply that all blocks should use UQL - this is an optional, breadth-wise extension for workflows - but I think it will be beneficial, as:
    • we would be able to have quite generic builders for tons of operations in the UI
    • we would be able to quickly extend UQL
    • we would eliminate the need for multiple custom glue-code blocks and python-code blocks (to some extent)
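
To make the extension path concrete, here is a minimal sketch of what adding one operation could look like. The names DetectionsFilter, execute_detections_filter and REGISTERED_OPERATIONS are hypothetical - they only mirror the pattern visible in the diff (a pydantic entity with a Literal type discriminator, a separate implementation function, and a registration step), not the actual code shipped in this PR:

from typing import Literal

import supervision as sv
from pydantic import BaseModel, ConfigDict


class OperationDefinition(BaseModel):
    """Base class for all UQL operation definitions."""


class DetectionsFilter(OperationDefinition):
    # self-describing manifest - a UI can read the JSON schema to build a generic editor
    model_config = ConfigDict(
        json_schema_extra={"description": "Filters out detections below a confidence threshold"},
    )
    type: Literal["DetectionsFilter"]
    confidence_threshold: float


def execute_detections_filter(
    detections: sv.Detections, definition: DetectionsFilter
) -> sv.Detections:
    # implementation is kept separate from the pydantic definition
    return detections[detections.confidence >= definition.confidence_threshold]


# registration - the UQL execution engine dispatches on the `type` discriminator
REGISTERED_OPERATIONS = {
    "DetectionsFilter": (DetectionsFilter, execute_detections_filter),
}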

Downsides of the idea:

  • there needs to be some intelligence in the UI to create builders for operations and to know how to behave in the context of specific blocks (for instance, a condition block would expose all parameters of the UQL EE in its manifest definition, whereas a detections filter block would implicitly iterate over the predictions batch provided via the predictions parameter)
  • the JSON definition is not easy for a human to read - but it is easy to construct and parse programmatically, which is more important in this case. I recommend not pushing towards a "verbose" UQL, as that would add a significant amount of work on both the parsing side and the query-construction side - both sides would need to implement a "parser" of some kind

Here are the examples:

As you see, they all operate on very similar definitions, and in general this extension brings a lot of expressive power.
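
For illustration, here is a hypothetical query of this shape, written as a Python dict - only the "(Number) >" operator and the ExtractDetectionProperty name come from the actual diff; the remaining keys and type names are assumptions made for the example:

# a block manifest field could carry a UQL operations chain such as this one;
# the Dict[str, reference] selectors mentioned above would let `value` point at
# a runtime input instead of the hardcoded 0.5
filter_definition = {
    "type": "DetectionsFilter",
    "filter_operation": {
        "type": "BinaryStatement",
        "left_operand": {"type": "ExtractDetectionProperty", "property_name": "confidence"},
        "comparator": {"type": "(Number) >"},
        "right_operand": {"type": "StaticOperand", "value": 0.5},
    },
}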

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

  • NOT READY TO MERGE

Any specific deployment considerations

  • We need sv.Detections as the main data format to proceed with this PR (requires merging Sv detections in core blocks #392)
  • we need to push the changes started here further - the AL data collector needs a new sink and we need to make the operation set better and more robust

Docs

  • Docs updated? What were the changes:

@PawelPeczek-Roboflow changed the base branch from main to feature/sv_detections_in_workflows May 20, 2024 12:57
inference/core/interfaces/http/orjson_utils.py - review thread resolved (outdated)
inference/core/utils/image_utils.py - review thread resolved (outdated)
class ExtractDetectionProperty(OperationDefinition):
model_config = ConfigDict(
json_schema_extra={
"description": "Extracts property from single detection",
Contributor:

If this is to handle sv.Detections then we might want to think about whether we want to promote operating on single detection instances. It is possible to get individual detections from an sv.Detections object like this:

for i in range(len(detections)):
    yield detections[i]

Though the above will create a copy of the data, which might be costly to execute.

type: Literal["!="]


class NumerGreater(BinaryOperator):
Contributor:

Numer -> Number

Collaborator Author:

👍

type: Literal["(Number) >"]


class NumerGreaterEqual(BinaryOperator):
Contributor:

Numer -> Number

Collaborator Author:

👍

["barcode-detection" for _ in range(len(detections))]
)
detections["data"] = (
np.array(extracted_data) if len(extracted_data) > 0 else np.empty(0)
Contributor:

extracted_data is defined as list, so we can simplify to detections["data"] = np.array(extracted_data) if extracted_data else np.empty(0)

Also:

>>> np.array([])
array([], dtype=float64)
>>> np.empty(0)
array([], dtype=float64)

So we could further simplify to detections["data"] = np.array(extracted_data)

Collaborator Author:

👍

)
detections["data"] = (
np.array(extracted_data) if len(extracted_data) > 0 else np.empty(0)
)
Contributor:

Can be simplified to detections["data"] = np.array(extracted_data)

Collaborator Author:

👍

detections[DETECTION_ID_KEY] = np.array([uuid4() for _ in range(len(detections))])
detections[PREDICTION_TYPE_KEY] = np.array(
["qrcode-detection" for _ in range(len(detections))]
)
Contributor:

Loop can be avoided: detections[PREDICTION_TYPE_KEY] = np.array(["qrcode-detection"] * len(detections))

Collaborator Author:

👍

)

LONG_DESCRIPTION = """
"""
Contributor:

empty

Collaborator Author:

👍

@@ -15,6 +19,9 @@ def construct_workflow_output(
workflow_outputs: List[JsonField],
execution_cache: ExecutionCache,
) -> Dict[str, List[Any]]:
# TODO: figure out if we needed generic "to-root coordinates" transformation?
# Maybe output constructor should never touch this? Maybe it's better to
# have steps transforming to specific coordinates systems?
Contributor:

Should we extract this to a separate task? It seems we are already handling this by calling sv_detections_to_root_coordinates below.

Collaborator Author:

issue submitted

@PawelPeczek-Roboflow changed the base branch from feature/sv_detections_in_workflows to main June 3, 2024 13:52
@PawelPeczek-Roboflow changed the base branch from main to feature/sv_detections_in_workflows June 3, 2024 16:13
@PawelPeczek-Roboflow marked this pull request as ready for review June 3, 2024 16:13
@PawelPeczek-Roboflow merged commit 15d821e into feature/sv_detections_in_workflows Jun 3, 2024
49 checks passed
@PawelPeczek-Roboflow deleted the feature/create_universal_query_operation_language branch June 3, 2024 16:14