Update tables.jl (#20)
Addressed the write Vector size issue (17): the size of the first row was used as the per-row multiplier for the buffer, causing failures whenever subsequent rows were larger (with at least one more than 5% larger than the first).

Adjusted the strategy to first find the maximum row size and use that as the buffer multiplier.

A more sensible approach would be to sum the row sizes and allocate an exactly sized buffer vector, but I am unaware of the potential downstream impacts.
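
To make the three sizing strategies from the message concrete, here is a minimal sketch. It is not code from this repository: the helper names and the sample row sizes are hypothetical, and `sizes` merely stands in for the result of calling `nbytes(schtyp, row)` on every row.

```julia
# Hypothetical helpers for illustration only; none of these exist in the package.
# `sizes` holds one made-up byte count per row.

# Previous strategy: first row's size times the row count, plus a 5% cushion.
firstrow_buflen(sizes) = trunc(Int, length(sizes) * first(sizes) * 1.05)

# Strategy in this commit: largest row's size times the row count, plus 5%.
maxrow_buflen(sizes) = trunc(Int, length(sizes) * maximum(sizes) * 1.05)

# Alternative floated in the message: the exact total, with no cushion.
exact_buflen(sizes) = sum(sizes)

sizes = [10, 10, 40, 10]          # the third row is much larger than the first
needed = sum(sizes)               # bytes actually required: 70

println(firstrow_buflen(sizes))   # 42, smaller than the 70 bytes needed
println(maxrow_buflen(sizes))     # 168, always enough but over-allocated
println(exact_buflen(sizes))      # 70
```

The maximum-based length can never be smaller than the sum, because every row is at most as large as the largest one. The cost is over-allocation when a single row is an outlier, which is the trade-off the exact-sum alternative would avoid.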
djholiver authored Apr 20, 2024
1 parent 7c8e4b1 commit 7265f67
Showing 1 changed file with 13 additions and 7 deletions.
20 changes: 13 additions & 7 deletions src/tables.jl
@@ -80,23 +80,29 @@ function writewithschema(io, parts, rows, st, sch, dictrow, compress, kw)
             pos = 0
         else
             row, rowst = rowsstate
-            # calc nbytes on first row, then allocate bytes
+            # calc nbytes on all rows to find max, then allocate bytes
             bytesperrow = nbytes(schtyp, row)
+            while true
+                rowsstate = iterate(rows, rowst)
+                rowsstate === nothing && break
+                row, rowst = rowsstate
+                nb = nbytes(schtyp, row)
+                if nb > bytesperrow
+                    bytesperrow = nb
+                end
+            end
+            rowsstate = iterate(rows)
+            row, rowst = rowsstate
             blen = trunc(Int, nrow * bytesperrow * 1.05) # add 5% cushion
             bytes = Vector{UInt8}(undef, blen)
             n = 1
+            nb = nbytes(schtyp, row)
             while true
                 pos = writevalue(Binary(), schtyp, row, bytes, pos, blen, kw)
                 rowsstate = iterate(rows, rowst)
                 rowsstate === nothing && break
                 row, rowst = rowsstate
                 nb = nbytes(schtyp, row)
-                if (pos + nb - 1) > blen
-                    # unlikely, but need to resize our buffer
-                    rowslefttowrite = nrow - n
-                    avgbytesperrowwritten = div(bytesperrow, n)
-                    resize!(bytes, pos + (rowslefttowrite * avgbytesperrowwritten))
-                end
                 bytesperrow += nb
                 n += 1
             end