[Feature Request]: AnomalyDetection with non-numeric features #35841

@charlespnh

Description

What would you like to happen?

Beam's anomaly detection module currently seems to support only scalar numeric features, e.g. beam.Row(f1=0.1, f2=0.2, ...). See https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/anomaly/detectors/pyod_adapter.py#L80

For use cases with vector embeddings, e.g. beam.Row(embedding=[0.1, 0.2, ...]), the pipeline fails with the following error:

np_batch.append(np.fromiter(row, dtype=np.float64))
TypeError: float() argument must be a string or a real number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
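
Until list-valued fields are supported, one possible workaround is to flatten each row into scalar floats before it reaches the detector. The sketch below is only an illustration under stated assumptions: `flatten_features` is a hypothetical helper, not a Beam API, and a `namedtuple` stands in for `beam.Row` so the snippet runs without Apache Beam installed.

```python
from collections import namedtuple
from collections.abc import Iterable

# Hypothetical stand-in for beam.Row, used only so this sketch is self-contained.
Row = namedtuple("Row", ["score", "embedding"])


def flatten_features(row):
    """Expand list-valued fields into individual floats, keeping field order.

    Hypothetical helper: not part of apache_beam.ml.anomaly.
    """
    flat = []
    for value in row:
        # Treat any non-string iterable (list, tuple, ...) as a vector field.
        if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
            flat.extend(float(v) for v in value)
        else:
            flat.append(float(value))
    return flat


row = Row(score=0.5, embedding=[0.1, 0.2, 0.3])
features = flatten_features(row)
print(features)  # [0.5, 0.1, 0.2, 0.3]
```

The resulting flat list contains only scalars, so a call such as np.fromiter(features, dtype=np.float64) should no longer hit the TypeError above. This assumes every row carries an embedding of the same length; otherwise the batched array would be ragged.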

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
