-
I tried the same. Does this mean DuckDB always loads the full result set into memory instead of streaming it?
-
There are functions in the C API to use a streaming approach instead of materialising the entire result set. In the CLI, we likely don't use them (or their C++ equivalents), and the same goes for go-duckdb. I wouldn't mind switching to a streaming approach, but I expect it to require some work on how best to use the (partially) deprecated functionality in the C API, possibly including some PRs to duckdb/duckdb reworking parts of it.
-
Hi, I am trying to limit the number of rows a query can return without modifying the SQL code.
It seemed as simple as calling `rows.Close()` after a certain number of rows. But it seems like internally either DuckDB or the driver loads the whole result set into memory.
I am running the following example script:
The code behaves correctly for small datasets. But my dataset has 50 million rows, and the process simply crashes with an OOM error. Interestingly, it doesn't OOM if I only select a single column.
It feels like there must be a way to safeguard against this without adding `LIMIT` to the SQL. I am not even sure which part of the system is loading all the data into memory. Do you have any ideas what I am doing wrong? Thank you.