We need to enhance the current data harvesting system to support non-federal data sources. This involves updating the harvest_source table to include additional fields that differentiate between federal and non-federal data sources, or alternatively, using the existing schema_type field to manage this distinction.
How to reproduce
When harvesting non-federal data sources, such as NYC Data.json, validation errors occur, preventing all records from being harvested.
Expected behavior
The bureauCode field should not be validated when processing non-federal data.
Actual behavior
validation error: <ValidationError: "'bureauCode' is a required property">
Sketch
Modify the harvest_source table to include a field indicating whether the source is federal or non-federal.
Update the harvester process to support the processing of non-federal data.
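A minimal sketch of the second option from the description, where the existing schema_type field drives which fields are required. The `HarvestSource` class, the `"federal"` / `"non-federal"` values, and the trimmed field lists are all hypothetical stand-ins for illustration, not the actual datagov-harvester model or schema:

```python
from dataclasses import dataclass

# Hypothetical schema_type values and heavily trimmed required-field lists.
# The real lists would come from the DCAT-US JSON Schemas.
REQUIRED_BY_SCHEMA_TYPE = {
    "federal": ("title", "description", "bureauCode"),
    "non-federal": ("title", "description"),
}


@dataclass
class HarvestSource:
    """Illustrative stand-in for a row of the harvest_source table."""
    name: str
    schema_type: str


def missing_required(source: HarvestSource, record: dict) -> list:
    """Return the required fields that are absent from a record,
    using the rule set selected by the source's schema_type."""
    required = REQUIRED_BY_SCHEMA_TYPE[source.schema_type]
    return [field for field in required if field not in record]


record = {"title": "Street Tree Census", "description": "Tree data."}
nyc = HarvestSource(name="nyc-open-data", schema_type="non-federal")
agency = HarvestSource(name="some-agency", schema_type="federal")

print(missing_required(nyc, record))     # no bureauCode complaint
print(missing_required(agency, record))  # bureauCode still enforced
```

With this shape, a record lacking bureauCode passes for a non-federal source while the same record is still rejected for a federal one.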
We have a non-federal schema, but the JSON Schema version it uses is super old. @rshewitt went through the process of upgrading the federal-v1.1 schema, and the upgraded version is in datagov-harvester. I compared the federal and non-federal schemas in the old version; the differences mostly consist of the federal version allowing REDACTED values, plus some namespace convention changes that are mostly unnecessary.
I would propose starting from the updated federal version and making fields optional, using best judgement based on the DCAT-US spec documentation. For example, while bureauCode is listed as "required" in the summary area, the details say it is required "Yes, for United States Federal Government agencies". So for non-federal validation, it should be optional.
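The proposal above could be sketched roughly as follows: copy the federal schema and drop federal-only fields from its `required` list. The schema dict here is a tiny hypothetical stand-in for the real federal-v1.1 schema, and `relax_required` is an illustrative helper, not something that exists in datagov-harvester. Only bureauCode is relaxed because it is the one field confirmed in this thread; any others would need checking against the spec.

```python
import copy

# Hypothetical, heavily trimmed stand-in for the federal dataset schema.
FEDERAL_DATASET_SCHEMA = {
    "type": "object",
    "required": ["title", "description", "bureauCode"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "bureauCode": {"type": "array", "items": {"type": "string"}},
    },
}

# Fields the spec requires only of federal agencies (assumption: just the
# one discussed in this issue).
FEDERAL_ONLY_FIELDS = {"bureauCode"}


def relax_required(schema: dict, optional_fields: set) -> dict:
    """Return a copy of `schema` with `optional_fields` removed from the
    top-level `required` list. Property definitions are kept, so a value
    that is supplied is still validated against its type."""
    relaxed = copy.deepcopy(schema)
    relaxed["required"] = [
        field for field in schema.get("required", [])
        if field not in optional_fields
    ]
    return relaxed


NON_FEDERAL_DATASET_SCHEMA = relax_required(
    FEDERAL_DATASET_SCHEMA, FEDERAL_ONLY_FIELDS
)
print(NON_FEDERAL_DATASET_SCHEMA["required"])
```

Deriving the non-federal schema this way keeps a single source of truth, at the cost of handling only top-level `required` entries; nested or conditional requirements would need separate treatment.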