Skip to content

Batch update with some invalid values fails for all rows #89

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
vadz opened this issue Jan 28, 2025 · 9 comments
Open

Batch update with some invalid values fails for all rows #89

vadz opened this issue Jan 28, 2025 · 9 comments

Comments

@vadz
Copy link

vadz commented Jan 28, 2025

Suppose there is the following table in the database:

create table t(x integer check (x<100));

and consider a program which does something like the following (simplified):

constexpr int ARRAY_SIZE = 2;
SQLSetStmtAttr(h, SQL_ATTR_PARAMSET_SIZE, (void *)ARRAY_SIZE, 0);

SQLINTEGER values[ARRAY_SIZE] = { 1, 101 }; // First value is valid, second one is not.
SQLLEN inds[ARRAY_SIZE];
SQLBindParameter(h, 1, SQL_PARAM_INPUT, SQL_C_INTEGER, SQL_INTEGER, 0, 0, values, 0, inds);

SQLUSMALLINT status[ARRAY_SIZE];
SQLSetStmtAttr(h, SQL_ATTR_PARAM_STATUS_PTR, status, 0);

SQLPrepare(h, "insert into t(x) values(?)", SQL_NTS);
SQLRETURN rc = SQLExecute(h);

When using SQL Server with either Microsoft ODBC driver or FreeTDS, rc is SQL_SUCCESS_WITH_INFO and status array is filled with { SQL_PARAM_SUCCESS, SQL_PARAM_ERROR } which is nice because it allows the application to determine which row(s) contained values that resulted in an error.

When using PostgreSQL ODBC driver (v13.02, but from examining Git history it doesn't look like there have been any changes here even in the latest version), rc is SQL_ERROR and status contains SQL_PARAM_ERROR for both elements, which doesn't provide any useful information.

Is this behaviour intentional and, if not, would it be possible to change it to be more consistent with other ODBC drivers and, also, more useful?

@davecramer
Copy link
Contributor

So if I understand correctly, you would like the insert for the first to succeed and the second to fail.
Since the insert for both is a single transaction when the first fails the second fails as postgres does not commit the transaction.
This is by design.

see https://www.postgresql.org/docs/current/tutorial-transactions.html for more details

Dave

@vadz
Copy link
Author

vadz commented Feb 1, 2025

Not quite, I'm perfectly fine with the insert failing and actually think this is more correct than doing a partial insert as some other databases/ODBC drivers do. But it would be great to have some information about which rows resulted in an error and this is something that SQL_ATTR_PARAM_STATUS_PTR is supposed to provide, but this driver returns SQL_PARAM_ERROR in it even for the first row, which would have succeeded.

I.e. the problem is not (so much) returning SQL_ERROR instead of SQL_SUCCESS_WITH_INFO (although this is inconsistent with the other ODBC drivers I've tried), but not providing any useful information in the status array.

@davecramer
Copy link
Contributor

Unfortunately the server does not provide that information. It doesn't know that the first one succeeded since it was all done in one insert.

insert into testtab1 values (1,'blah');
ERROR:  duplicate key value violates unique constraint "testtab1_pkey"
DETAIL:  Key (id)=(1) already exists.

While we do get the error message, we don't know if it is the 1st or 2nd value that failed

@vadz
Copy link
Author

vadz commented Feb 2, 2025

Just trying to understand: how does the driver manage to send multiple parameters to the server? From looking at the code it seems like it does PQsendQuery() for each row in a loop, but I must be missing something...

@davecramer
Copy link
Contributor

Can you show me the code you are thinking it is using.
Without looking at the code I would think it does

insert into testtab1 (id) values (1),(2);

@vadz
Copy link
Author

vadz commented Feb 2, 2025

Can you show me the code you are thinking it is using.

I've been looking at the part of PGAPI_Execute() between

next_param_row:
and

psqlodbc/execute.c

Lines 1241 to 1244 in 1fbc3b2

if (!exec_end)
{
goto next_param_row;
}
but I could easily be wrong as I mostly just looked at the label name. I probably should understand how to enable tracing in this code to see what it's really doing during tun-time...

Without looking at the code I would think it does

insert into testtab1 (id) values (1),(2);

Interesting, wouldn't this run into some kind of limitation (number of parameters or total length of the command) for even relatively small amounts of data?

@davecramer
Copy link
Contributor

it does, but not as much as one would think. It will handle quite a few values.

@davecramer
Copy link
Contributor

I would just enable logging on the server to see how it does this.

@vadz
Copy link
Author

vadz commented Feb 5, 2025

I would just enable logging on the server to see how it does this.

Good point, thanks, this is indeed even simpler. And what I see is that the driver (at least 13.02 version from Debian Bookworm) just sends multiple semicolon separated statements to the server in a single request, e.g. for my test I see the following

LOG:  00000: statement: insert into soci_test(val) values('1');insert into soci_test(val) values('a')
ERROR:  22P02: invalid input syntax for integer: "a" at character 74

If the "at character N" part is always present (but I'm not at all sure about this), it looks like it would be possible to find the row which resulted in the error by remembering the positions of the statement boundaries for each row.

But now I also wonder if this is really the most efficient way to do bulk inserts/updates with Postgres, at the very least repeating the entire query many times like this should consume much more bandwidth than preparing it once/reusing it it for all rows?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants