Merge branch 'main' into DSEGOG-321-delete-records

MRichards99 committed Aug 14, 2024
2 parents ce27ee0 + 01cd923 commit b94b1fd
Showing 53 changed files with 3,364 additions and 77 deletions.
85 changes: 85 additions & 0 deletions FUNCTIONS.md
@@ -0,0 +1,85 @@
# Functions

In OperationsGateway, "functions" is a feature by which users can define their own outputs from a record's channel data, using predefined builtin functions and common mathematical operators. This document describes the feature from a developer's perspective, explaining the relevant concepts and how to develop the functionality further as needed.

## Terminology
Various terms are used, perhaps not always consistently, in the codebase. Some are used exclusively in the context of functions, whilst others appear more widely across the codebase.
- `function`: General term for a single combination of `name` and `expression` defined by the user, to be evaluated on each `record` currently loaded in the UI. It may depend on one or more of the following:
  - `constant`: Explicit numerical value not dependent on the `record` in question, for example `0.1` or `123`.
  - `variable`: Non-numeric string describing a value that **is** dependent on the record, either:
    - The name of a `channel`.
    - The `name` of another `function`, which has already been evaluated for this record.
  - `operand`: Symbol representing a mathematical operation, one of `+`, `-`, `*`, `/` and `**`.
  - `builtin`: A predefined function which is applied to user-defined input. This may be simple and just use a `numpy` implementation, such as `np.mean`, or more complex and require custom functionality, such as determining the `background` of a signal.
- `name`: String identifying a `function`, so that it can be used as an input to other user-defined `function`s.
- `expression`: String defining the operations to be applied to each record, following the supported syntax.
- `return type` or `data type`: Channels can be "scalar" (`float`), "waveform" (two `np.ndarray`s for x and y) or "image" (one 2D `np.ndarray`). Similarly, the output of any `function`, and of any intermediate `operand` or `builtin` step, will be one of these types.
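
To make the terminology concrete, a hypothetical user-defined function (both the `name` and the channel name `CHANNEL_A` are made up for illustration) might be:

```
name:       "corrected_signal"
expression: "(mean(CHANNEL_A) - 0.1) ** 2"
```

Here `CHANNEL_A` is a `variable`, `0.1` is a `constant`, `-` and `**` are `operand`s, `mean` is a `builtin`, and `corrected_signal` is the `name` under which other functions could reference the result.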

## Lark
To implement functions, we use the [Lark](https://github.com/lark-parser/lark) library. As a starting point, the [JSON parser tutorial](https://github.com/lark-parser/lark/blob/master/docs/json_tutorial.md) introduces the crucial concepts for building a parser and `Transformer`.

### Parser
To parse an `expression`, a suitable grammar must be defined. This is done in [`parser.py`](operationsgateway_api/src/functions/parser.py), and the same grammar is used for all [transformations](#transformers).

The grammar uses the basic Lark concepts covered in the JSON tutorial. "Rules" such as `operation` have multiple possible patterns to match against, which are themselves rules, such as `addition`. These can refer back to more generic rules, such as `term`. Ultimately, each rule is expanded until it can be expressed in terms of "terminals", which are either literal strings (e.g. `"+"` for addition) or regular expressions (`CNAME` and `SIGNED_NUMBER` are predefined patterns we import). For a more in-depth discussion of grammar, please refer to the Lark documentation.

To expand the grammar, additional patterns can be added here. For example, other `operation`s such as floor division would need to be defined under that rule as:
```
?operation : subtraction
| addition
| multiplication
| division
| exponentiation
| floor_division
floor_division: term "//" term
```

Once the grammar is defined, the parser converts an expression string into a `ParseTree` of `Token`s. However, in our use case we do not display or pass the expression around in this format, so this intermediate format can mostly be ignored.
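
As a minimal, standalone sketch of how a Lark parser is built and used, assuming a toy grammar and start rule invented for illustration (the real grammar in `parser.py` is richer):

```python
from lark import Lark

# Toy grammar, loosely mirroring the style of the real one in parser.py.
TOY_GRAMMAR = """
    ?expression : term
                | addition
    addition    : term "+" term
    ?term       : constant
                | variable
    constant    : SIGNED_NUMBER
    variable    : CNAME

    %import common.CNAME
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(TOY_GRAMMAR, start="expression")
tree = parser.parse("CHANNEL_A + 0.1")  # CHANNEL_A is a made-up channel name
print(tree.pretty())  # prints the ParseTree: an addition with variable and constant children
```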

### Transformers
Once an `expression` is parsed, we need to transform it into something useful. What that output is can vary, but in all cases the grammar (and so the parser) is the same. Currently there are three transformers (see their docstrings for implementation details):
- [`TypeTransformer`](operationsgateway_api/src/functions/type_transformer.py)
- [`VariableTransformer`](operationsgateway_api/src/functions/variable_transformer.py)
- [`ExpressionTransformer`](operationsgateway_api/src/functions/expression_transformer.py)

The primary feature of all of these is their callback functions. When transforming the tree, Lark looks for a method on the transformer whose name matches the rule being transformed (i.e. one of the rules from the parser). If found, it is called with the **list** of child `Token`s as its argument. So for any `operation`, there will be two `Token`s: the `term`s to the left and right of the actual operator (e.g. `"+"`).
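
Continuing the toy grammar from the previous sketch (still illustrative, not the real transformer classes), a minimal `Transformer` shows how callbacks receive the list of child `Token`s and return a value for the parent rule:

```python
from lark import Transformer


class ToyEvaluator(Transformer):
    """Illustrative only: evaluates the toy grammar against made-up channel data."""

    def __init__(self, channel_values: dict):
        super().__init__()
        self.channel_values = channel_values  # stand-in for a real record's data

    def constant(self, tokens):
        (value,) = tokens  # one Token: the matched SIGNED_NUMBER
        return float(value)

    def variable(self, tokens):
        (name,) = tokens  # one Token: the matched CNAME
        return self.channel_values[str(name)]

    def addition(self, tokens):
        left, right = tokens  # the two already-transformed terms
        return left + right


# Reusing `parser` from the previous sketch
result = ToyEvaluator({"CHANNEL_A": 4.0}).transform(parser.parse("CHANNEL_A + 0.1"))
print(result)  # 4.1
```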

This allows us, for example, to define the callback function `variable` on each transformer which, respectively:
- Looks up the dtype of the channel in the manifest, to ensure it is only passed to builtins which accept its type
- Builds a set of the channel names which feature in the `expression`
- Returns the numerical value of that channel for the given record, to be used in the final calculation

To extend this functionality, using floor division as an example, one would need to add a callback to `TypeTransformer`, to identify it as an operation with requirements on which types can be used together, **and** to `ExpressionTransformer`, to actually perform the division. It would not be necessary to add anything to `VariableTransformer`, as a new operation is not relevant to channel name identification.

## Builtins
One of the main features of "functions" is the ability for users to apply predefined, non-trivial analysis to a channel. Since these are more complex, they are defined separately from the Lark classes.

Each new builtin should extend the abstract [`Builtin`](operationsgateway_api/src/functions/builtins/builtin.py) class, which defines the basic properties that identify it and, crucially, the implementation details in the `evaluate` function.

A reference should also be added to `builtins_dict` on [`Builtins`](operationsgateway_api/src/functions/builtins/builtins.py). This is called by the Lark `Transformer`s when needed, and does a lookup of the function name before calling `evaluate` on the correct class.
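
As a purely hypothetical illustration of this pattern (no such class exists in the codebase), a `median` builtin might look roughly like this:

```python
import numpy as np

from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Median(Builtin):
    input_types = {"waveform", "image"}
    output_type = "scalar"
    symbol = "median"
    token = {
        "symbol": symbol,
        "name": "Median",
        "details": "Calculate the median of a waveform's y values or an image's pixels.",
    }

    @staticmethod
    def evaluate(argument: "WaveformVariable | np.ndarray") -> float:
        if isinstance(argument, WaveformVariable):
            return np.median(argument.y)
        elif isinstance(argument, np.ndarray):
            return np.median(argument)
        else:
            # The check will fail and raise a TypeError with a helpful message
            Median.evaluation_type_check(argument)
```

It would then be registered by adding `Median.symbol: Median` to `builtins_dict`, so the `Transformer`s can resolve the name `median` at type-check and evaluation time.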

It should be noted that the distinction between purely `numpy` functions and `builtin` functions is somewhat arbitrary. It would be possible to represent all `builtin` functions directly on the Lark classes like the former; however, this introduces a lot of complexity in those classes, and means the type checking and evaluation for a given builtin would be split across different `Transformer`s. Likewise, all the `numpy` functions could be refactored into their own classes; however, given their (relative) simplicity and the fact that they are unlikely to be modified regularly, this has not (yet) been done.

## Data representation
Of the three data types, "scalar" and "image" are already well represented (by `float` and `np.ndarray` respectively), and support operations such as addition and multiplication out of the box. "Waveform", however, is not. It is therefore necessary to define how this data type behaves when these operations are applied. If more data types are developed in the future, or custom behaviour is needed for "scalar" or "image", then the pattern of `WaveformVariable` should be extended to those use cases.

### WaveformVariable
The [`WaveformVariable`](operationsgateway_api/src/functions/waveform_variable.py) class achieves this by defining dunder methods such as `__add__`, `__sub__` and so on. Note that for commutative operations like addition, the definition of `__radd__` is trivial, but the reflected method needs to be defined explicitly for non-commutative operations like subtraction. Generally speaking, these operations are applied to the y axis of the data while the x axis is persisted, so the output remains a `WaveformVariable`.
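
`waveform_variable.py` is not shown in this view, so the following is only an assumed, cut-down sketch of the pattern being described; the `WaveformVariableSketch` name and simplified method bodies are illustrative, not the real implementation:

```python
import numpy as np


class WaveformVariableSketch:
    """Cut-down illustration of the pattern; not the real WaveformVariable."""

    def __init__(self, x: np.ndarray, y: np.ndarray) -> None:
        self.x = x
        self.y = y

    def __add__(self, other):
        # Operate on y, persist x, so the result is still a waveform
        return WaveformVariableSketch(x=self.x, y=self.y + other)

    # Commutative, so right-hand addition can simply reuse __add__
    __radd__ = __add__

    def __sub__(self, other):
        return WaveformVariableSketch(x=self.x, y=self.y - other)

    def __rsub__(self, other):
        # Non-commutative: "other - waveform" must be defined explicitly
        return WaveformVariableSketch(x=self.x, y=other - self.y)

    def mean(self, **kwargs) -> float:
        # np.mean(waveform) looks up and calls this method (extra numpy keyword
        # arguments are accepted and ignored here), discarding the x axis
        return float(np.mean(self.y))
```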

As we use `numpy` for much of the implementation, we also define methods for `min`, `mean` etc. These are called by the corresponding `numpy` functions, so that the syntax for a `WaveformVariable`:
```python
>>> import numpy as np
>>> from operationsgateway_api.src.functions.waveform_variable import WaveformVariable
>>> np.mean(WaveformVariable(x=np.array([1,2,3]), y=np.array([1,4,9])))
4.666666666666667
```
is the same as that of "image" or "scalar":
```python
>>> np.mean(1)
1.0
```
In this case, the output is typically a "scalar"; the applied function reduces the dimensionality of the input. In the process, the x axis data is discarded.

It may be necessary to define additional builtin functions (e.g. `np.median` as `median`) or operators (e.g. `//` as `__floordiv__`), in which case these will also need to be defined here so that they can be applied to data of the type "waveform".

6 changes: 6 additions & 0 deletions operationsgateway_api/src/exceptions.py
@@ -111,6 +111,12 @@ def __init__(self, msg="Error during handling of experiments", *args, **kwargs):
        self.status_code = 500


class FunctionParseError(ApiError):
    def __init__(self, msg="Problem with function syntax", *args, **kwargs):
        super().__init__(msg, *args, **kwargs)
        self.status_code = 400


class ExportError(ApiError):
    def __init__(self, msg="Error during creation of export file", *args, **kwargs):
        super().__init__(msg, *args, **kwargs)
15 changes: 15 additions & 0 deletions operationsgateway_api/src/functions/__init__.py
@@ -0,0 +1,15 @@
from operationsgateway_api.src.functions.builtins.tokens import TOKENS
from operationsgateway_api.src.functions.expression_transformer import (
    ExpressionTransformer,
)
from operationsgateway_api.src.functions.type_transformer import TypeTransformer
from operationsgateway_api.src.functions.variable_transformer import VariableTransformer
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable

__all__ = (
    ExpressionTransformer,
    TOKENS,
    TypeTransformer,
    VariableTransformer,
    WaveformVariable,
)
51 changes: 51 additions & 0 deletions operationsgateway_api/src/functions/builtins/background.py
@@ -0,0 +1,51 @@
import numpy as np

from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Background(Builtin):
    input_types = {"waveform", "image"}
    output_type = "scalar"
    symbol = "background"
    token = {
        "symbol": symbol,
        "name": "Background",
        "details": (
            "Calculate the background of a waveform or image. Errors if scalar "
            "provided. "
            "Implementation (waveform): First, applies smoothing by taking "
            "weighted nearest and next-nearest neighbour contributions to y "
            "values whose difference from their neighbours is more than 0.2 "
            "times the total range in y. The first 25 and last 25 y values in "
            "the signal are averaged to give an estimate of the background. "
            "Implementation (image): The average pixel value in the 10 by 10 region "
            "in the top left of the image is returned."
        ),
    }

    @staticmethod
    def evaluate(argument: "WaveformVariable | np.ndarray") -> float:
        """
        Waveform:
        First, applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y.
        The first 25 and last 25 y values in the signal are averaged to give an
        estimate of the background.
        Image:
        The average pixel value in the 10 by 10 region in the top left of the image
        is returned.
        """
        if isinstance(argument, WaveformVariable):
            Builtin.smooth(argument.y)
            if len(argument.y) < 50:
                return np.mean(argument.y)
            else:
                return (np.mean(argument.y[:25]) + np.mean(argument.y[-25:])) / 2
        elif isinstance(argument, np.ndarray):
            return np.mean(argument[:10, :10])
        else:
            # The check will fail and raise a TypeError
            Background.evaluation_type_check(argument)
147 changes: 147 additions & 0 deletions operationsgateway_api/src/functions/builtins/builtin.py
@@ -0,0 +1,147 @@
from abc import ABC, abstractmethod

import numpy as np

from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Builtin(ABC):
    @property
    @abstractmethod
    def input_types(self) -> "set[str]":
        """The channel types accepted by this builtin"""

    @property
    @abstractmethod
    def output_type(self) -> str:
        """The channel type output by this builtin"""

    @property
    @abstractmethod
    def symbol(self) -> str:
        """The symbol used to represent this builtin in an expression"""

    @property
    @abstractmethod
    def token(self) -> "dict[str, str]":
        """`dict` containing all help and implementation details"""

    @staticmethod
    @abstractmethod
    def evaluate(
        argument: "float | WaveformVariable | np.ndarray",
    ) -> "float | WaveformVariable | np.ndarray":
        """Actually evaluate the builtin on a single numeric argument"""

    @staticmethod
    def centroid(image: np.ndarray, axis: int) -> int:
        """
        Calculates the centre of mass
        """
        sums = np.sum(image, axis=axis)
        weighted_sums = sums * np.arange(len(sums))
        centre_of_mass = np.sum(weighted_sums) / np.sum(sums)
        return int(centre_of_mass)

    @staticmethod
    def calculate_fwhm(waveform: WaveformVariable) -> "tuple[float, float]":
        """
        First, applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y. The maximum y value is then
        identified along with the x positions bounding the FWHM.
        """
        y = waveform.y
        Builtin.smooth(y)

        y -= np.min(y)
        max_y = np.max(y)
        half_max_left_i = half_max_right_i = max_i = np.argmax(y)
        low_values_left = np.where(y[:max_i] <= max_y / 2)[0]
        low_values_right = np.where(y[max_i:] <= max_y / 2)[0]

        if len(low_values_left):
            half_max_left_i = low_values_left[-1]

        if len(low_values_right):
            half_max_right_i = low_values_right[0] + max_i

        return waveform.x[half_max_left_i], waveform.x[half_max_right_i]

    @staticmethod
    def smooth(y: np.ndarray) -> None:
        """
        Applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y.
        Note that this modifies y in place.
        """
        diff_tolerance = (np.max(y) - np.min(y)) * 0.2
        diff = np.diff(y)

        length = len(y)
        if length >= 3:
            if (
                abs(diff[0]) > diff_tolerance
                and abs(diff[0] + diff[1]) > diff_tolerance
            ):
                y[0] += (3 * diff[0] + diff[1]) / 6

            for i in range(1, length - 1):
                if abs(diff[i - 1]) > diff_tolerance and abs(diff[i]) > diff_tolerance:
                    correction = -2 * diff[i - 1] + 2 * diff[i]
                    divisor = 7
                    if i > 1:
                        correction -= diff[i - 2] + diff[i - 1]
                        divisor += 1
                    if i < length - 2:
                        correction += diff[i] + diff[i + 1]
                        divisor += 1

                    y[i] += correction / divisor

            if (
                abs(diff[-1]) > diff_tolerance
                and abs(diff[-1] + diff[-2]) > diff_tolerance
            ):
                y[-1] -= (3 * diff[-1] + diff[-2]) / 6

    @classmethod
    def type_check(cls, argument_type: str) -> None:
        """Raises a TypeError with a human readable message detailing the
        provided and acceptable types to this function.
        Args:
            argument (str): Argument provided.
        Raises:
            TypeError: Formatted with input and acceptable types.
        """
        if argument_type not in cls.input_types:
            raise TypeError(
                f"'{cls.symbol}' accepts {cls.input_types} type(s), "
                f"'{argument_type}' provided",
            )

    @classmethod
    def evaluation_type_check(
        cls,
        argument: "float | WaveformVariable | np.ndarray",
    ) -> None:
        """Raises a TypeError with a human readable message detailing the
        provided and acceptable types to this function from an actual value
        provided at evaluation.
        Args:
            argument (float | WaveformVariable | np.ndarray): Argument provided.
        Raises:
            TypeError: Formatted with input and acceptable types.
        """
        types_dict = {
            float: "scalar",
            WaveformVariable: "waveform",
            np.ndarray: "image",
        }
        cls.type_check(types_dict.get(type(argument), type(argument).__name__))
55 changes: 55 additions & 0 deletions operationsgateway_api/src/functions/builtins/builtins.py
@@ -0,0 +1,55 @@
from operationsgateway_api.src.functions.builtins.background import Background
from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.builtins.centre import Centre
from operationsgateway_api.src.functions.builtins.centroid_x import CentroidX
from operationsgateway_api.src.functions.builtins.centroid_y import CentroidY
from operationsgateway_api.src.functions.builtins.falling import Falling
from operationsgateway_api.src.functions.builtins.fwhm import FWHM
from operationsgateway_api.src.functions.builtins.fwhm_x import FWHMX
from operationsgateway_api.src.functions.builtins.fwhm_y import FWHMY
from operationsgateway_api.src.functions.builtins.integrate import Integrate
from operationsgateway_api.src.functions.builtins.rising import Rising


class Builtins:
    builtins_dict: "dict[str, Builtin]" = {
        Background.symbol: Background,
        Centre.symbol: Centre,
        CentroidX.symbol: CentroidX,
        CentroidY.symbol: CentroidY,
        Falling.symbol: Falling,
        FWHM.symbol: FWHM,
        FWHMX.symbol: FWHMX,
        FWHMY.symbol: FWHMY,
        Integrate.symbol: Integrate,
        Rising.symbol: Rising,
    }
    tokens = [b.token for b in builtins_dict.values()]

    @staticmethod
    def get_builtin(builtin_name: str) -> Builtin:
        try:
            return Builtins.builtins_dict[builtin_name]
        except KeyError as e:
            raise AttributeError(
                f"'{builtin_name}' is not a recognised builtin function name",
            ) from e

    @staticmethod
    def type_check(tokens: list, is_evaluation: bool = False) -> str:
        builtin_name, argument = tokens
        builtin = Builtins.get_builtin(builtin_name)

        if is_evaluation:
            builtin.evaluation_type_check(argument)
        else:
            builtin.type_check(argument)

        return builtin.output_type

    @staticmethod
    def evaluate(tokens: list) -> str:
        Builtins.type_check(tokens, True)
        builtin_name, argument = tokens
        builtin = Builtins.get_builtin(builtin_name)
        return builtin.evaluate(argument)
