Merge branch 'main' into DSEGOG-321-delete-records

MRichards99 committed Aug 14, 2024
2 parents ce27ee0 + 01cd923 commit b94b1fd
Showing 53 changed files with 3,364 additions and 77 deletions.
85 changes: 85 additions & 0 deletions FUNCTIONS.md
@@ -0,0 +1,85 @@
# Functions

In OperationsGateway, "functions" is a feature by which users can define their own outputs from a record's channel data, using predefined builtin functions and common mathematical operators. This document describes the feature from a developer's perspective, explaining the relevant concepts and how to develop the functionality further as needed.

## Terminology
Various terms are used, perhaps not always consistently, in the codebase. Some are used exclusively in the context of functions, whilst others appear more widely across the codebase.
- `function`: General term for a single combination of `name` and `expression` defined by the user, to be evaluated on each `record` currently loaded in the UI. It may depend on one or more of the following:
  - `constant`: Explicit numerical value not dependent on the `record` in question, for example `0.1` or `123`.
  - `variable`: Non-numeric string describing a value that **is** dependent on the record, either:
    - The name of a `channel`.
    - The `name` of another `function`, which has already been evaluated for this record.
  - `operand`: Symbol representing a mathematical operation, one of `+`, `-`, `*`, `/` and `**`.
  - `builtin`: A predefined function which is applied to user-defined input. This may be simple and just use a `numpy` implementation, such as `np.mean`, or more complex and require custom functionality, such as determining the `background` of a signal.
- `name`: String identifying a `function`, so that it can be used as an input to other user-defined `function`s.
- `expression`: String defining the operations to be applied to each record, following the supported syntax.
- `return type` or `data type`: Channels can be "scalar" (`float`), "waveform" (two `np.ndarray`s for x and y) or "image" (one 2D `np.ndarray`). Similarly, the output of any `function`, and of any intermediate `operand` or `builtin` step, will be one of these types.
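
To make the terminology concrete, a hypothetical user-defined function (both the `name` and the channel name `CHANNEL_A` are made up for illustration) might be:

```
name:       "corrected_signal"
expression: "(mean(CHANNEL_A) - 0.1) ** 2"
```

Here `CHANNEL_A` is a `variable`, `0.1` is a `constant`, `-` and `**` are `operand`s, `mean` is a `builtin`, and `corrected_signal` is the `name` under which other functions could reference the result.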

## Lark
To implement functions, we use the [Lark](https://github.com/lark-parser/lark) library. As a starting point, the [JSON parser tutorial](https://github.com/lark-parser/lark/blob/master/docs/json_tutorial.md) introduces the crucial concepts for building a parser and `Transformer`.

### Parser
To parse an `expression`, a suitable grammar must be defined. This is done in [`parser.py`](operationsgateway_api/src/functions/parser.py), and the same grammar is used for all [transformations](#transformers).

The grammar uses the basic Lark concepts covered in the JSON tutorial. "Rules" such as `operation` have multiple possible patterns to match against, which are themselves rules, such as `addition`. These can refer back to more generic rules, such as `term`. Ultimately, each rule is expanded until it can be expressed in terms of "terminals", which are either literal strings (e.g. `"+"` for addition) or regular expressions (`CNAME` and `SIGNED_NUMBER` are predefined patterns we import). For a more in-depth discussion of grammar, please refer to the Lark documentation.

To expand the grammar, additional patterns can be added here. For example, other `operation`s such as floor division would need to be defined under that rule as:
```
?operation : subtraction
| addition
| multiplication
| division
| exponentiation
| floor_division
floor_division: term "//" term
```

Once the grammar is defined, the parser converts an expression string into a `ParseTree` of `Token`s. However, in our use case we do not display or pass the expression around in this format, so this intermediate format can mostly be ignored.
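
As a minimal, standalone sketch of how a Lark parser is built and used, assuming a toy grammar and start rule invented for illustration (the real grammar in `parser.py` is richer):

```python
from lark import Lark

# Toy grammar, loosely mirroring the style of the real one in parser.py.
TOY_GRAMMAR = """
    ?expression : term
                | addition
    addition    : term "+" term
    ?term       : constant
                | variable
    constant    : SIGNED_NUMBER
    variable    : CNAME

    %import common.CNAME
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(TOY_GRAMMAR, start="expression")
tree = parser.parse("CHANNEL_A + 0.1")  # CHANNEL_A is a made-up channel name
print(tree.pretty())  # prints the ParseTree: an addition with variable and constant children
```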

### Transformers
Once an `expression` is parsed, we need to transform it into something useful. What that output is can vary, but in all cases the grammar (and so the parser) is the same. Currently there are three transformers (see their docstrings for implementation details):
- [`TypeTransformer`](operationsgateway_api/src/functions/type_transformer.py)
- [`VariableTransformer`](operationsgateway_api/src/functions/variable_transformer.py)
- [`ExpressionTransformer`](operationsgateway_api/src/functions/expression_transformer.py)

The primary feature of all of these is their callback functions. When transforming the tree, Lark looks for a method on the transformer whose name matches the rule being transformed (i.e. one of the rules from the parser). If found, it is called with the **list** of child `Token`s as its argument. So for any `operation`, there will be two `Token`s: the `term`s to the left and right of the actual operator (e.g. `"+"`).
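
Continuing the toy grammar from the previous sketch (still illustrative, not the real transformer classes), a minimal `Transformer` shows how callbacks receive the list of child `Token`s and return a value for the parent rule:

```python
from lark import Transformer


class ToyEvaluator(Transformer):
    """Illustrative only: evaluates the toy grammar against made-up channel data."""

    def __init__(self, channel_values: dict):
        super().__init__()
        self.channel_values = channel_values  # stand-in for a real record's data

    def constant(self, tokens):
        (value,) = tokens  # one Token: the matched SIGNED_NUMBER
        return float(value)

    def variable(self, tokens):
        (name,) = tokens  # one Token: the matched CNAME
        return self.channel_values[str(name)]

    def addition(self, tokens):
        left, right = tokens  # the two already-transformed terms
        return left + right


# Reusing `parser` from the previous sketch
result = ToyEvaluator({"CHANNEL_A": 4.0}).transform(parser.parse("CHANNEL_A + 0.1"))
print(result)  # 4.1
```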

This allows us, for example, to define the callback function `variable` on each transformer which, respectively:
- Looks up the dtype of the channel in the manifest, to ensure it is only passed to builtins which accept its type
- Builds a set of the channel names which feature in the `expression`
- Returns the numerical value of that channel for the given record, to be used in the final calculation

To extend this functionality, using floor division as an example, one would need to add a callback to `TypeTransformer`, to identify it as an operation with requirements on which types can be used together, **and** to `ExpressionTransformer`, to actually perform the division. It would not be necessary to add anything to `VariableTransformer`, as a new operation is not relevant to channel name identification.

## Builtins
One of the main features of "functions" is the ability for users to apply predefined, non-trivial analysis to a channel. Since these are more complex, they are defined separately from the Lark classes.

Each new builtin should extend the abstract [`Builtin`](operationsgateway_api/src/functions/builtins/builtin.py) class, which defines the basic properties that identify it and, crucially, the implementation details in the `evaluate` function.

A reference should also be added to `builtins_dict` on [`Builtins`](operationsgateway_api/src/functions/builtins/builtins.py). This is called by the Lark `Transformer`s when needed, and does a lookup of the function name before calling `evaluate` on the correct class.
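
As a purely hypothetical illustration of this pattern (no such class exists in the codebase), a `median` builtin might look roughly like this:

```python
import numpy as np

from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Median(Builtin):
    input_types = {"waveform", "image"}
    output_type = "scalar"
    symbol = "median"
    token = {
        "symbol": symbol,
        "name": "Median",
        "details": "Calculate the median of a waveform's y values or an image's pixels.",
    }

    @staticmethod
    def evaluate(argument: "WaveformVariable | np.ndarray") -> float:
        if isinstance(argument, WaveformVariable):
            return np.median(argument.y)
        elif isinstance(argument, np.ndarray):
            return np.median(argument)
        else:
            # The check will fail and raise a TypeError with a helpful message
            Median.evaluation_type_check(argument)
```

It would then be registered by adding `Median.symbol: Median` to `builtins_dict`, so the `Transformer`s can resolve the name `median` at type-check and evaluation time.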

It should be noted that the distinction between purely `numpy` functions and `builtin` functions is somewhat arbitrary. It would be possible to represent all `builtin` functions directly on the Lark classes like the former; however, this introduces a lot of complexity in those classes, and means the type checking and evaluation for a given builtin would be split across different `Transformer`s. Likewise, all the `numpy` functions could be refactored into their own classes; however, given their (relative) simplicity and the fact that they are unlikely to be modified regularly, this has not (yet) been done.

## Data representation
Of the three data types, "scalar" and "image" are already well represented (by `float` and `np.ndarray` respectively), and support operations such as addition and multiplication out of the box. "Waveform", however, is not. It is therefore necessary to define how this data type behaves when these operations are applied. If more data types are developed in the future, or custom behaviour is needed for "scalar" or "image", then the pattern of `WaveformVariable` should be extended to those use cases.

### WaveformVariable
The [`WaveformVariable`](operationsgateway_api/src/functions/waveform_variable.py) class achieves this by defining dunder methods such as `__add__`, `__sub__` and so on. Note that for commutative operations like addition, the definition of `__radd__` is trivial, but the reflected method needs to be defined explicitly for non-commutative operations like subtraction. Generally speaking, these operations are applied to the y axis of the data while the x axis is persisted, so the output remains a `WaveformVariable`.
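
`waveform_variable.py` is not shown in this view, so the following is only an assumed, cut-down sketch of the pattern being described; the `WaveformVariableSketch` name and simplified method bodies are illustrative, not the real implementation:

```python
import numpy as np


class WaveformVariableSketch:
    """Cut-down illustration of the pattern; not the real WaveformVariable."""

    def __init__(self, x: np.ndarray, y: np.ndarray) -> None:
        self.x = x
        self.y = y

    def __add__(self, other):
        # Operate on y, persist x, so the result is still a waveform
        return WaveformVariableSketch(x=self.x, y=self.y + other)

    # Commutative, so right-hand addition can simply reuse __add__
    __radd__ = __add__

    def __sub__(self, other):
        return WaveformVariableSketch(x=self.x, y=self.y - other)

    def __rsub__(self, other):
        # Non-commutative: "other - waveform" must be defined explicitly
        return WaveformVariableSketch(x=self.x, y=other - self.y)

    def mean(self, **kwargs) -> float:
        # np.mean(waveform) looks up and calls this method (extra numpy keyword
        # arguments are accepted and ignored here), discarding the x axis
        return float(np.mean(self.y))
```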

As we use `numpy` for much of the implementation, we also define methods for `min`, `mean` etc. These are called by the corresponding `numpy` functions, so that the syntax for a `WaveformVariable`:
```python
>>> import numpy as np
>>> from operationsgateway_api.src.functions.waveform_variable import WaveformVariable
>>> np.mean(WaveformVariable(x=np.array([1,2,3]), y=np.array([1,4,9])))
4.666666666666667
```
is the same as that of "image" or "scalar":
```python
>>> np.mean(1)
1.0
```
In this case, the output is typically a "scalar"; the applied function reduces the dimensionality of the input. In the process, the x axis data is discarded.

It may be necessary to define additional builtin functions (e.g. `np.median` as `median`) or operators (e.g. `//` as `__floordiv__`), in which case these will also need to be defined here so that they can be applied to data of the type "waveform".

6 changes: 6 additions & 0 deletions operationsgateway_api/src/exceptions.py
@@ -111,6 +111,12 @@ def __init__(self, msg="Error during handling of experiments", *args, **kwargs):
        self.status_code = 500


class FunctionParseError(ApiError):
    def __init__(self, msg="Problem with function syntax", *args, **kwargs):
        super().__init__(msg, *args, **kwargs)
        self.status_code = 400


class ExportError(ApiError):
    def __init__(self, msg="Error during creation of export file", *args, **kwargs):
        super().__init__(msg, *args, **kwargs)
15 changes: 15 additions & 0 deletions operationsgateway_api/src/functions/__init__.py
@@ -0,0 +1,15 @@
from operationsgateway_api.src.functions.builtins.tokens import TOKENS
from operationsgateway_api.src.functions.expression_transformer import (
    ExpressionTransformer,
)
from operationsgateway_api.src.functions.type_transformer import TypeTransformer
from operationsgateway_api.src.functions.variable_transformer import VariableTransformer
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable

__all__ = (
    ExpressionTransformer,
    TOKENS,
    TypeTransformer,
    VariableTransformer,
    WaveformVariable,
)
51 changes: 51 additions & 0 deletions operationsgateway_api/src/functions/builtins/background.py
@@ -0,0 +1,51 @@
import numpy as np

from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Background(Builtin):
    input_types = {"waveform", "image"}
    output_type = "scalar"
    symbol = "background"
    token = {
        "symbol": symbol,
        "name": "Background",
        "details": (
            "Calculate the background of a waveform or image. Errors if scalar "
            "provided. "
            "Implementation (waveform): First, applies smoothing by taking "
            "weighted nearest and next-nearest neighbour contributions to y "
            "values whose difference from their neighbours is more than 0.2 "
            "times the total range in y. The first 25 and last 25 y values in "
            "the signal are averaged to give an estimate of the background. "
            "Implementation (image): The average pixel value in the 10 by 10 region "
            "in the top left of the image is returned."
        ),
    }

    @staticmethod
    def evaluate(argument: "WaveformVariable | np.ndarray") -> float:
        """
        Waveform:
        First, applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y.
        The first 25 and last 25 y values in the signal are averaged to give an
        estimate of the background.
        Image:
        The average pixel value in the 10 by 10 region in the top left of the image
        is returned.
        """
        if isinstance(argument, WaveformVariable):
            Builtin.smooth(argument.y)
            if len(argument.y) < 50:
                return np.mean(argument.y)
            else:
                return (np.mean(argument.y[:25]) + np.mean(argument.y[-25:])) / 2
        elif isinstance(argument, np.ndarray):
            return np.mean(argument[:10, :10])
        else:
            # The check will fail and raise a TypeError
            Background.evaluation_type_check(argument)
147 changes: 147 additions & 0 deletions operationsgateway_api/src/functions/builtins/builtin.py
@@ -0,0 +1,147 @@
from abc import ABC, abstractmethod

import numpy as np

from operationsgateway_api.src.functions.waveform_variable import WaveformVariable


class Builtin(ABC):
    @property
    @abstractmethod
    def input_types(self) -> "set[str]":
        """The channel types accepted by this builtin"""

    @property
    @abstractmethod
    def output_type(self) -> str:
        """The channel type output by this builtin"""

    @property
    @abstractmethod
    def symbol(self) -> str:
        """The symbol used to represent this builtin in an expression"""

    @property
    @abstractmethod
    def token(self) -> "dict[str, str]":
        """`dict` containing all help and implementation details"""

    @staticmethod
    @abstractmethod
    def evaluate(
        argument: "float | WaveformVariable | np.ndarray",
    ) -> "float | WaveformVariable | np.ndarray":
        """Actually evaluate the builtin on a single numeric argument"""

    @staticmethod
    def centroid(image: np.ndarray, axis: int) -> int:
        """
        Calculates the centre of mass
        """
        sums = np.sum(image, axis=axis)
        weighted_sums = sums * np.arange(len(sums))
        centre_of_mass = np.sum(weighted_sums) / np.sum(sums)
        return int(centre_of_mass)

    @staticmethod
    def calculate_fwhm(waveform: WaveformVariable) -> "tuple[float, float]":
        """
        First, applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y. The maximum y value is then
        identified along with the x positions bounding the FWHM.
        """
        y = waveform.y
        Builtin.smooth(y)

        y -= np.min(y)
        max_y = np.max(y)
        half_max_left_i = half_max_right_i = max_i = np.argmax(y)
        low_values_left = np.where(y[:max_i] <= max_y / 2)[0]
        low_values_right = np.where(y[max_i:] <= max_y / 2)[0]

        if len(low_values_left):
            half_max_left_i = low_values_left[-1]

        if len(low_values_right):
            half_max_right_i = low_values_right[0] + max_i

        return waveform.x[half_max_left_i], waveform.x[half_max_right_i]

    @staticmethod
    def smooth(y: np.ndarray) -> None:
        """
        Applies smoothing by taking weighted nearest and next-nearest
        neighbour contributions to y values whose difference from their neighbours
        is more than 0.2 times the total range in y.
        Note that this modifies y in place.
        """
        diff_tolerance = (np.max(y) - np.min(y)) * 0.2
        diff = np.diff(y)

        length = len(y)
        if length >= 3:
            if (
                abs(diff[0]) > diff_tolerance
                and abs(diff[0] + diff[1]) > diff_tolerance
            ):
                y[0] += (3 * diff[0] + diff[1]) / 6

            for i in range(1, length - 1):
                if abs(diff[i - 1]) > diff_tolerance and abs(diff[i]) > diff_tolerance:
                    correction = -2 * diff[i - 1] + 2 * diff[i]
                    divisor = 7
                    if i > 1:
                        correction -= diff[i - 2] + diff[i - 1]
                        divisor += 1
                    if i < length - 2:
                        correction += diff[i] + diff[i + 1]
                        divisor += 1

                    y[i] += correction / divisor

            if (
                abs(diff[-1]) > diff_tolerance
                and abs(diff[-1] + diff[-2]) > diff_tolerance
            ):
                y[-1] -= (3 * diff[-1] + diff[-2]) / 6

    @classmethod
    def type_check(cls, argument_type: str) -> None:
        """Raises a TypeError with a human readable message detailing the
        provided and acceptable types to this function.
        Args:
            argument (str): Argument provided.
        Raises:
            TypeError: Formatted with input and acceptable types.
        """
        if argument_type not in cls.input_types:
            raise TypeError(
                f"'{cls.symbol}' accepts {cls.input_types} type(s), "
                f"'{argument_type}' provided",
            )

    @classmethod
    def evaluation_type_check(
        cls,
        argument: "float | WaveformVariable | np.ndarray",
    ) -> None:
        """Raises a TypeError with a human readable message detailing the
        provided and acceptable types to this function from an actual value
        provided at evaluation.
        Args:
            argument (float | WaveformVariable | np.ndarray): Argument provided.
        Raises:
            TypeError: Formatted with input and acceptable types.
        """
        types_dict = {
            float: "scalar",
            WaveformVariable: "waveform",
            np.ndarray: "image",
        }
        cls.type_check(types_dict.get(type(argument), type(argument).__name__))
55 changes: 55 additions & 0 deletions operationsgateway_api/src/functions/builtins/builtins.py
@@ -0,0 +1,55 @@
from operationsgateway_api.src.functions.builtins.background import Background
from operationsgateway_api.src.functions.builtins.builtin import Builtin
from operationsgateway_api.src.functions.builtins.centre import Centre
from operationsgateway_api.src.functions.builtins.centroid_x import CentroidX
from operationsgateway_api.src.functions.builtins.centroid_y import CentroidY
from operationsgateway_api.src.functions.builtins.falling import Falling
from operationsgateway_api.src.functions.builtins.fwhm import FWHM
from operationsgateway_api.src.functions.builtins.fwhm_x import FWHMX
from operationsgateway_api.src.functions.builtins.fwhm_y import FWHMY
from operationsgateway_api.src.functions.builtins.integrate import Integrate
from operationsgateway_api.src.functions.builtins.rising import Rising


class Builtins:
    builtins_dict: "dict[str, Builtin]" = {
        Background.symbol: Background,
        Centre.symbol: Centre,
        CentroidX.symbol: CentroidX,
        CentroidY.symbol: CentroidY,
        Falling.symbol: Falling,
        FWHM.symbol: FWHM,
        FWHMX.symbol: FWHMX,
        FWHMY.symbol: FWHMY,
        Integrate.symbol: Integrate,
        Rising.symbol: Rising,
    }
    tokens = [b.token for b in builtins_dict.values()]

    @staticmethod
    def get_builtin(builtin_name: str) -> Builtin:
        try:
            return Builtins.builtins_dict[builtin_name]
        except KeyError as e:
            raise AttributeError(
                f"'{builtin_name}' is not a recognised builtin function name",
            ) from e

    @staticmethod
    def type_check(tokens: list, is_evaluation: bool = False) -> str:
        builtin_name, argument = tokens
        builtin = Builtins.get_builtin(builtin_name)

        if is_evaluation:
            builtin.evaluation_type_check(argument)
        else:
            builtin.type_check(argument)

        return builtin.output_type

    @staticmethod
    def evaluate(tokens: list) -> str:
        Builtins.type_check(tokens, True)
        builtin_name, argument = tokens
        builtin = Builtins.get_builtin(builtin_name)
        return builtin.evaluate(argument)
