Exception during null-handling on TupleValue #230

Open
fee-mendes opened this issue Jan 23, 2025 · 1 comment

fee-mendes commented Jan 23, 2025

Consider the following schema and table to migrate:

CREATE TYPE new_social.post_reply (
    parent text,
    root text
);

CREATE TYPE new_social.embedding_blob (
    cid text,
    mime_type text,
    size bigint
);

CREATE TYPE new_social.embed_external (
    description text,
    thumb frozen<embedding_blob>,
    title text,
    uri text
);

CREATE TYPE new_social.embed_media (
    kind text,
    alt text,
    blob frozen<embedding_blob>,
    aspect_ratio frozen<tuple<int, int>> -- Problem is here
);

CREATE TYPE new_social.embeddings (
    media frozen<set<frozen<embed_media>>>,
    external frozen<embed_external>,
    record text
);

CREATE TABLE new_social.post (
    author text,
    created_at timestamp,
    content text,
    embed embeddings,
    id text,
    labels set<text>,
    language set<text>,
    reply post_reply,
    tags set<text>,
    PRIMARY KEY (author, created_at)
);

Spark infers the following DataFrame schema from this table:

25/01/23 13:43:36 INFO Scylla: root
 |-- author: string (nullable = false)
 |-- created_at: timestamp (nullable = false)
 |-- content: string (nullable = true)
 |-- embed: struct (nullable = true)
 |    |-- media: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- kind: string (nullable = true)
 |    |    |    |-- alt: string (nullable = true)
 |    |    |    |-- blob: struct (nullable = true)
 |    |    |    |    |-- cid: string (nullable = true)
 |    |    |    |    |-- mime_type: string (nullable = true)
 |    |    |    |    |-- size: long (nullable = true)
 |    |    |    |-- aspect_ratio: struct (nullable = true)
 |    |    |    |    |-- 0: integer (nullable = true)
 |    |    |    |    |-- 1: integer (nullable = true)
 |    |-- external: struct (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- thumb: struct (nullable = true)
 |    |    |    |-- cid: string (nullable = true)
 |    |    |    |-- mime_type: string (nullable = true)
 |    |    |    |-- size: long (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- uri: string (nullable = true)
 |    |-- record: string (nullable = true)
 |-- id: string (nullable = true)
 |-- labels: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- language: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- reply: struct (nullable = true)
 |    |-- parent: string (nullable = true)
 |    |-- root: string (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
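
In the inferred schema above, the tuple appears as a struct whose fields are named by position (`0`, `1`). The following is a minimal Python sketch of that naming. It is a model of what the log shows, not the connector's code; `tuple_to_struct_fields` and the `CQL_TO_SPARK` table are hypothetical and cover only the types used in this schema.

```python
from typing import List, Tuple

# Assumed mapping, limited to the CQL types that appear in this issue's schema.
CQL_TO_SPARK = {
    "int": "integer",
    "bigint": "long",
    "text": "string",
    "timestamp": "timestamp",
}


def tuple_to_struct_fields(component_types: List[str]) -> List[Tuple[str, str]]:
    """Name each tuple component by its position, as seen in the printed schema."""
    return [(str(index), CQL_TO_SPARK[cql_type])
            for index, cql_type in enumerate(component_types)]


# frozen<tuple<int, int>> from the embed_media type
aspect_ratio_fields = tuple_to_struct_fields(["int", "int"])
```

Here `aspect_ratio_fields` holds one `(name, type)` pair per tuple position, matching the two `integer` fields under `aspect_ratio` in the log.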

Because aspect_ratio is a frozen<tuple<int, int>>, it can legitimately be null in some rows. When it is, Spark aborts the migration task with:

25/01/23 13:43:39 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) (172.19.0.201 executor 0): com.datastax.spark.connector.types.TypeConversionException: Cannot convert object null to com.datastax.spark.connector.TupleValue.
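
The failure mode can be modelled without Spark. The following is a minimal Python sketch, not the connector's actual code: `to_tuple_value` is a hypothetical stand-in for a converter that rejects null, `TypeConversionError` stands in for the exception in the log, and `null_safe` is a hypothetical wrapper showing the kind of guard that would let a null tuple pass through unchanged.

```python
from typing import Any, Callable, Optional, Tuple


class TypeConversionError(Exception):
    """Illustrative stand-in for the TypeConversionException seen in the log."""


def to_tuple_value(obj: Any) -> Tuple[Any, ...]:
    # Hypothetical strict converter: mirrors the reported behaviour by
    # refusing None instead of treating it as a null column value.
    if obj is None:
        raise TypeConversionError("Cannot convert object None to TupleValue.")
    return tuple(obj)


def null_safe(convert: Callable[[Any], Any]) -> Callable[[Any], Optional[Any]]:
    # Wrap a converter so that None bypasses it and is returned as-is.
    def wrapped(obj: Any) -> Optional[Any]:
        return None if obj is None else convert(obj)
    return wrapped


safe_to_tuple_value = null_safe(to_tuple_value)
```

With the wrapper in place, `safe_to_tuple_value(None)` returns `None`, while non-null input such as `[16, 9]` is still converted by the underlying function.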

fee-mendes (Member, Author) commented

Ugly workaround: use a frozen<map<text, int>> in place of the tuple (or a set, though in my case a map makes more sense).
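
A minimal Python sketch of the data reshaping this workaround implies. The key names `width` and `height` are assumptions for illustration (the issue does not say which keys were used), and `aspect_ratio_to_map` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple


def aspect_ratio_to_map(ratio: Optional[Tuple[int, int]]) -> Optional[Dict[str, int]]:
    """Reshape a (width, height) tuple into a text -> int map; None stays None."""
    if ratio is None:
        # A missing aspect ratio is kept as a null value rather than converted.
        return None
    width, height = ratio
    # Assumed key names; any stable pair of text keys would serve.
    return {"width": width, "height": height}
```

For example, `aspect_ratio_to_map((16, 9))` yields a two-entry map, and a null input stays null, so rows without an aspect ratio never reach the tuple conversion.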
