Debugging the whole ecosystem

I was recently exporting a large number of records to CSV, some 130k rows. Luckily, each row consisted of only three columns.

When testing the output, I noticed that only 65535 rows had been written. Not a coincidence: 65535 is the highest value an unsigned 16-bit integer can hold.
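
A quick irb check of that claim:

2**16 - 1 # => 65535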

I checked the docs for Ruby's IO#write and noted (via the docs for IO#write_nonblock) that writes may be buffered. So I stepped through the loop, from the working case into the failing one:

require 'pry' # needed for the binding.pry breakpoint

File.open(path, 'wb') do |f|
  records.each.with_index do |record, index|
    # drop into pry just before the point where rows stopped appearing
    binding.pry if index >= 65534
    f.write(record.csv_row)
  end
end

From there I could see that f.write failed to alter the size of the file at path. Manually calling f.flush && f.write(record.csv_row) did make a difference to the file size.
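
Roughly the kind of check that shows the buffering at work (a sketch, reusing f, path and record from the loop above):

File.size(path)         # on-disk size lags behind what has been written
f.write(record.csv_row) # lands in Ruby's userspace buffer first
File.size(path)         # unchanged
f.flush                 # hands the buffered bytes over to the OS
File.size(path)         # now reflects the extra row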

So I inserted an IO#flush call into the output loop, every 100 rows:

File.open(path, 'wb') do |f|
  records.each.with_index do |record, index|
    f.write(record.csv_row)
    f.flush if index % 100 == 0 # push the buffer out to disk periodically
  end
end

Still no difference in the output, though. Loading it into Apple Pages, I was still limited to 65535 rows. On a hunch, I opened the file in vim. Lo and behold! 130k lines ready and waiting for me. Going back, I removed the IO#flush call, regenerated the file, and got the same result: Pages reports 65535 rows, vim reports 130k lines.
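
In hindsight, the quickest sanity check is a line count that doesn't go through a spreadsheet at all: wc -l from the shell, or the Ruby equivalent:

puts File.foreach(path).count # counts physical lines, no application row limits involved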

So, moral of the story… don't rely on pretty software for large data sets? Trust but verify? Command line wins?

James Cowlishaw @Cowlibob