
CSVRecord failing when field/header is missing #5

Open
@enriquedelpino


Hi all,

We at Graph.Build have been using your project for quite a long time. We are working on a use case where a mapping transformation for a CSV file expects to find a specific field (defined by a column header), but in practice we cannot assume that every CSV file being transformed is complete. A column might be missing, because that column is optional.

In these scenarios, we would like to carry on with the transformation, even if the result is only partial. Under the current code, however, the following lines in the `get` method of `CSVRecord` cause the whole transformation to break:

```java
if (!this.data.containsKey(toDatabaseCase)) {
    throw new IllegalArgumentException(String.format("Mapping for %s not found, expected one of %s", toDatabaseCase, data.keySet()));
}
```
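To make the failure mode concrete, here is a minimal, self-contained reproduction. It assumes the record's data can be modelled as a plain `Map<String, String>` from header to cell value; `StrictLookup` and its `transform` loop are hypothetical names used only for illustration, while the case-folding and the exception mirror the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the current (strict) CSVRecord lookup.
class StrictLookup {

    static List<Object> get(Map<String, String> data, String reference) {
        // Try the header in upper case, then lower case, then as given
        String key;
        if (data.containsKey(reference.toUpperCase())) {
            key = reference.toUpperCase();
        } else if (data.containsKey(reference.toLowerCase())) {
            key = reference.toLowerCase();
        } else {
            key = reference;
        }
        if (!data.containsKey(key)) {
            // Current behavior: a missing header throws and aborts the caller
            throw new IllegalArgumentException(String.format(
                "Mapping for %s not found, expected one of %s", key, data.keySet()));
        }
        String value = data.get(key);
        if (value == null) {
            return List.of();
        }
        return List.of(value);
    }

    // Hypothetical "transformation": collect the values of every reference in order
    static List<Object> transform(Map<String, String> row, List<String> references) {
        List<Object> out = new ArrayList<>();
        for (String ref : references) {
            out.addAll(get(row, ref)); // throws on the first missing column
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("id", "1", "name", "Alice"); // no "email" column
        try {
            transform(row, List.of("id", "name", "email"));
        } catch (IllegalArgumentException e) {
            System.out.println("Transformation aborted: " + e.getMessage());
        }
    }
}
```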

We believe the `get` method should be more resilient: it should log the missing field but still return an empty list, so that the transformation keeps running. For example:

```java
@Override
public List<Object> get(String reference) {
    // Resolve the header: try upper case, then lower case, then the reference as given
    String toDatabaseCase;
    if (this.data.containsKey(reference.toUpperCase())) {
        toDatabaseCase = reference.toUpperCase();
    } else if (this.data.containsKey(reference.toLowerCase())) {
        toDatabaseCase = reference.toLowerCase();
    } else {
        toDatabaseCase = reference;
    }
    // Missing column: log it and return no value instead of aborting the transformation
    if (!this.data.containsKey(toDatabaseCase)) {
        logger.warn(String.format("Mapping for %s not found, expected one of %s", toDatabaseCase, data.keySet()));
        return List.of();
    }
    String obj = this.data.get(toDatabaseCase);
    // Empty cell: no value
    if (obj == null) {
        return List.of();
    }
    return List.of(obj);
}
```

I've been comparing the current definitions of the other Record implementations (JSON, XML, Excel, etc.) with the older class definitions, for instance in your old RML Mapper v5.0. It seems this same error was widespread among the other record types, but it has been addressed in the new dataio project. This leads me to believe that my proposal aligns with your general approach in the other record types to handling a field that is missing from the input file.

What are your thoughts on this?

Kind Regards,
Enrique
