Open
Description
What would you like to happen?
Beam's anomaly detection module currently appears to support only scalar numeric features, e.g. beam.Row(f1=0.1, f2=0.2, ...); see https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/anomaly/detectors/pyod_adapter.py#L80
For use cases with vector embeddings, e.g. beam.Row(embedding=[0.1, 0.2, ...]), the pipeline fails with the following error:
np_batch.append(np.fromiter(row, dtype=np.float64))
TypeError: float() argument must be a string or a real number, not 'list'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
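The failure can be reproduced with NumPy alone, which may help when discussing a fix. The sketch below is only an illustration under stated assumptions: plain tuples stand in for the field values of a beam.Row, and `flatten_row_values` is a hypothetical helper, not part of Beam or of the PyOD adapter. It shows one possible way to expand a list-valued field into a flat float64 vector before batching.

```python
import numpy as np


def flatten_row_values(values):
    """Hypothetical helper (not a Beam API): expand sequence-valued
    fields into scalars and return one flat float64 vector."""
    flat = []
    for v in values:
        if isinstance(v, (list, tuple, np.ndarray)):
            # Embedding-style field: append each component as a scalar.
            flat.extend(np.asarray(v, dtype=np.float64).ravel().tolist())
        else:
            # Scalar field: convert directly.
            flat.append(float(v))
    return np.array(flat, dtype=np.float64)


# Assumption: tuples stand in for the values of beam.Row(f1=0.1, f2=0.2)
# and beam.Row(embedding=[0.1, 0.2, 0.3]).
numeric_row = (0.1, 0.2)
embedding_row = ([0.1, 0.2, 0.3],)

# Scalar fields convert without trouble.
ok = np.fromiter(numeric_row, dtype=np.float64)

# A list-valued field makes np.fromiter raise, as in the traceback above.
try:
    np.fromiter(embedding_row, dtype=np.float64)
    fromiter_failed = False
except (TypeError, ValueError):
    fromiter_failed = True

# Flattening first yields a vector that can be stacked into a batch.
flattened = flatten_row_values(embedding_row)
batch = np.stack([flattened, flatten_row_values(([0.4, 0.5, 0.6],))])
```

Whether flattening belongs inside the adapter or in a user-side preprocessing step is an open design question; this sketch only demonstrates that the conversion itself is straightforward.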
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner