Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot bulk insert from one local table to another local table if one has an identity #7887

Open
nick-oehmen opened this issue Feb 5, 2025 · 3 comments

Comments

@nick-oehmen
Copy link

Hello! While implementing Citus distributed and reference tables, we encountered an issue inserting data from one local table to another when the target local table has an identity. Repro steps below:

Citus version: Citus 12.1.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

--create a test schema
create schema test;

-- add our first local table
create table test.local1(
    id text not null primary key
);

-- add our reference table
create table test.reference1(
    id int not null primary key,
    reference_col1 text not null
);

-- apply the reference table type
select create_reference_table('test.reference1');

-- add our second local table
create table test.local2(
    id int not null generated always as identity,
    local1fk text not null,
    reference1fk int not null,
    constraint loc1fk foreign key (local1fk) references test.local1(id),
    constraint reference1fk foreign key (reference1fk) references test.reference1(id),
    constraint testlocpk primary key (id)
);

-- create a distributed table
create table test.distributed1(
    id int not null generated always as identity,
    distro_key int not null,
    constraint testdistro1pk primary key (id, distro_key)
);

-- apply distribution on the distro key
select create_distributed_table('test.distributed1', 'distro_key');

-- add our first local table values
insert into test.local1(id) values ('aaaaa'), ('bbbbb'), ('ccccc');

-- add our reference data
insert into test.reference1(id, reference_col1) values (1, 'test'), (2, 'test2'), (3, 'test3');

-- add data to our distributed table
insert into test.distributed1(distro_key) values (1), (1), (1), (2), (2), (3);

-- we confirm that we have two local tables and one reference table
select
    a.table_name,
    b.citus_table_type
from information_schema.tables a
    left join citus_tables b on b.table_name::text = concat(a.table_schema::text, '.', a.table_name::text)
where a.table_schema = 'test'
	and a.table_type = 'BASE TABLE'
order by a.table_name

-- bulk insert from our first local table into our second
insert into test.local2(local1fk, reference1fk)
select id, 1 from test.local1

-- why does this fail?  invalid string enlargement request size: -4

At first I assumed this might be an unsupported operation, but I tried it again in the exact same scenario in which the local table has a serial PK rather than an identity, and did not receive the error. As a result, I suspect this is a bug related to support of identities.

Successful test with serial instead:

--create a test schema
create schema testserial;

-- add our first local table
create table testserial.local1(
    id text not null primary key
);

-- add our reference table
create table testserial.reference1(
    id int not null primary key,
    reference_col1 text not null
);

-- apply the reference table type
select create_reference_table('testserial.reference1');

-- add our second local table
create table testserial.local2(
    id bigserial,
    local1fk text not null,
    reference1fk int not null,
    constraint loc1fk foreign key (local1fk) references testserial.local1(id),
    constraint reference1fk foreign key (reference1fk) references testserial.reference1(id),
    constraint testlocpk primary key (id)
);

-- create a distributed table
create table testserial.distributed1(
    id bigserial,
    distro_key int not null,
    constraint testdistro1pk primary key (id, distro_key)
);

-- apply distribution on the distro key
select create_distributed_table('testserial.distributed1', 'distro_key');

-- add our first local table values
insert into testserial.local1(id) values ('aaaaa'), ('bbbbb'), ('ccccc');

-- add our reference data
insert into testserial.reference1(id, reference_col1) values (1, 'test'), (2, 'test2'), (3, 'test3');

-- add data to our distributed table
insert into testserial.distributed1(distro_key) values (1), (1), (1), (2), (2), (3);

-- we confirm that we have two local tables and one reference table
select
    a.table_name,
    b.citus_table_type
from information_schema.tables a
    left join citus_tables b on b.table_name::text = concat(a.table_schema::text, '.', a.table_name::text)
where a.table_schema = 'testserial'
	and a.table_type = 'BASE TABLE'
order by a.table_name

-- bulk insert from our first local table into our second
insert into testserial.local2(local1fk, reference1fk)
select id, 1 from test.local1

-- why did this work, but an identity did not?
@colm-mchugh
Copy link
Contributor

@nick-oehmen we need to investigate and fix the error you are hitting in your first test, but as a workaround could you omit the reference1fk constraint from the CREATE TABLE test.local2 statement and add it after test.local2 has been populated with the INSERT from test.local1 ?

@nick-oehmen
Copy link
Author

Thank you for the response @colm-mchugh! The example I provided is a pretty simple one just to demonstrate the issue, the actual issue is occurring in a much larger DB as we're testing implementing distribution prior to pushing to the live production DB. The bulk insert operations occur throughout the DB in stored procedures and there's no way to prevent them from running after distribution is implemented, if anything this may just delay our implementation of table distribution until this issue is fixed. Do you happen to know a timeline for resolution?

@colm-mchugh
Copy link
Contributor

Do you happen to know a timeline for resolution?

We will aim to fix this in Citus 13.0.1, so approx. 6-8 weeks from now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants