@didipayson commented Sep 30, 2025

Add support for nested arrays in Databricks to the ODCS v3 importer

  • Tests pass
    Ran 'datacontract test' on an ODCS YAML file for an actual Databricks table with fields such as:
    - name: field1
      logicalType: array
      physicalType: array
      items:
        physicalType: array
        logicalType: array
        items:
          physicalType: array
          logicalType: array
          items:
            physicalType: int
            logicalType: integer
    - name: field2
      logicalType: array
      physicalType: array
      items:
        logicalType: array
        physicalType: array
        items:
          logicalType: string
          physicalType: string
  • ruff format
  • README.md updated (if relevant)
  • CHANGELOG.md entry added

Notes:

  • Also added support for arrays of objects. I had no actual Databricks table with a field of this type to test against, so I only checked that the type built by the importer looks like: ARRAY<STRUCT<id:string,zip:string>>
  • If the item's logical type does not map to a Data Contract type, the importer defaults to string.
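The behavior described in these notes can be pictured as a small recursive function: each array level wraps the type of its items, an object becomes a STRUCT of its properties, and an unmapped logical type falls back to string. This is a minimal sketch, not the datacontract-cli implementation; the function name to_databricks_type and the LOGICAL_TO_DATABRICKS table are hypothetical, and the real importer may also take physicalType into account.

```python
from typing import Any, Dict

# Hypothetical logicalType -> Databricks type mapping (not the datacontract-cli table).
LOGICAL_TO_DATABRICKS: Dict[str, str] = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
    "date": "date",
}


def to_databricks_type(prop: Dict[str, Any]) -> str:
    """Build a Databricks type string for an ODCS property, recursing into nested items."""
    logical = prop.get("logicalType")
    if logical == "array":
        # Each array level wraps its items' type, so arbitrary nesting depth is preserved.
        return f"ARRAY<{to_databricks_type(prop['items'])}>"
    if logical == "object":
        # Each property becomes a name:type pair inside the STRUCT.
        fields = ",".join(
            f"{p['name']}:{to_databricks_type(p)}" for p in prop.get("properties", [])
        )
        return f"STRUCT<{fields}>"
    # Unmapped or missing logical types default to string, as stated in the PR notes.
    return LOGICAL_TO_DATABRICKS.get(logical, "string")


field1 = {
    "logicalType": "array",
    "items": {"logicalType": "array", "items": {"logicalType": "array", "items": {"logicalType": "integer"}}},
}
records = {
    "logicalType": "array",
    "items": {
        "logicalType": "object",
        "properties": [{"name": "id", "logicalType": "string"}, {"name": "zip", "logicalType": "string"}],
    },
}
print(to_databricks_type(field1))   # ARRAY<ARRAY<ARRAY<int>>>
print(to_databricks_type(records))  # ARRAY<STRUCT<id:string,zip:string>>
```

Recursing on items rather than special-casing two or three levels is what lets the same code handle array<array<string>> and array<array<array<int>>> alike.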

@didipayson changed the title from "Odcs v3 importer databricks nested arrays support" to "odcs v3 importer databricks nested arrays support" on Sep 30, 2025
@jochenchrist (Contributor)

Do you really have an array within an array?

- name: field2
  logicalType: array
  physicalType: array
  items:
    logicalType: array
    physicalType: array
    items:
      logicalType: string
      physicalType: string

If you expect a structure like this:

ARRAY<STRUCT<id:string,zip:string>>

      - name: some_array_of_records
        logicalType: array
        items:
          logicalType: object
          properties:
            - name: id
              logicalType: string
            - name: zip
              logicalType: string

@didipayson (Author) commented Oct 17, 2025

Quoting @jochenchrist: "Do you really have an array within an array?"

Yes, it is an array within an array: the data_type of the Databricks table column is array<array<string>>.

We have another column whose data_type has three array levels, array<array<array<int>>>, i.e. an array within an array within an array.
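These column types can also be read in the other direction: peeling off one array<...> level at a time from the Databricks data_type yields the nested items structure used for field1 and field2 in the PR description. A minimal sketch, assuming a hypothetical helper to_odcs_property and a hypothetical primitive-type map; it only handles arrays of primitives and is not part of datacontract-cli.

```python
from typing import Any, Dict

# Hypothetical Databricks -> ODCS logicalType mapping for primitive types.
DATABRICKS_TO_LOGICAL: Dict[str, str] = {
    "int": "integer",
    "bigint": "integer",
    "string": "string",
    "double": "number",
    "boolean": "boolean",
}


def to_odcs_property(data_type: str) -> Dict[str, Any]:
    """Expand a Databricks array type such as array<array<string>> into nested ODCS items.

    Only arrays of primitive types are handled; struct and map types are out of scope here.
    """
    t = data_type.strip()
    if t.lower().startswith("array<") and t.endswith(">"):
        # Strip one array<...> level and recurse on the element type.
        inner = t[len("array<"):-1]
        return {"logicalType": "array", "physicalType": "array", "items": to_odcs_property(inner)}
    physical = t.lower()
    # Unknown primitives fall back to string, mirroring the default described in the PR notes.
    return {"logicalType": DATABRICKS_TO_LOGICAL.get(physical, "string"), "physicalType": physical}


def array_depth(prop: Dict[str, Any]) -> int:
    """Count how many array levels wrap the innermost item."""
    depth = 0
    while prop.get("logicalType") == "array":
        depth += 1
        prop = prop["items"]
    return depth


print(array_depth(to_odcs_property("array<array<string>>")))      # 2
print(array_depth(to_odcs_property("array<array<array<int>>>")))  # 3
```

For array<array<string>> the result is the same shape as field2, and for array<array<array<int>>> the innermost item is physicalType int with logicalType integer, as in field1.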
