lukaszkrzyz 13 Jun 2026 14:24

Exporting a couple million rows to a file from a Laravel command. I have seen three approaches recommended: chunk, chunkById, and cursor. They all claim to be the memory-safe choice, and I have been burned before by chunk skipping rows when the underlying data changes mid-iteration. I want to pick one deliberately and know the failure mode I am accepting.

What actually differs between chunkById and cursor in memory behavior and in correctness under concurrent writes, and which do you reach for on a big export?

Replies (5)
marcoviola 13 Jun 2026 14:54

chunk paginates with limit and offset under the hood, so if rows are inserted or deleted while you iterate, the offsets shift and you can skip or double-process rows. chunkById fixes that by paging with a where clause on id greater than the last seen id, which is stable under inserts and deletes because it does not rely on offset. It is still not a snapshot of the table, but it will not skip rows just because earlier rows moved. For an export where data may change during the run, chunkById is the correct default. Never use plain chunk on a mutating table.
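A minimal sketch of the difference, in plain PHP with an array standing in for the table (no Laravel or database involved). The helper names `pageByOffset` and `pageAfterId` are made up for illustration; they only mimic the shape of the queries that chunk and chunkById issue.

```php
<?php
// Simulation only: an id => value array stands in for the table.

/** Offset paging: skip ($page - 1) * $size rows in id order, take $size. */
function pageByOffset(array $table, int $page, int $size): array
{
    ksort($table); // order by id
    return array_slice($table, ($page - 1) * $size, $size, true);
}

/** Keyset paging: rows with id > $lastId in id order, take $size. */
function pageAfterId(array $table, int $lastId, int $size): array
{
    ksort($table); // order by id
    $rows = array_filter($table, fn (int $id) => $id > $lastId, ARRAY_FILTER_USE_KEY);
    return array_slice($rows, 0, $size, true);
}

// Offset run: row 1 is deleted after the first page has been read.
$table = array_fill_keys(range(1, 6), 'row');
$offsetSeen = [];
for ($page = 1; ($rows = pageByOffset($table, $page, 2)) !== []; $page++) {
    array_push($offsetSeen, ...array_keys($rows));
    unset($table[1]); // concurrent delete of an already-exported row
}

// Keyset run: same table, same concurrent delete.
$table = array_fill_keys(range(1, 6), 'row');
$keysetSeen = [];
$lastId = 0;
while (($rows = pageAfterId($table, $lastId, 2)) !== []) {
    array_push($keysetSeen, ...array_keys($rows));
    $lastId = array_key_last($rows); // remember where this page ended
    unset($table[1]);
}

echo implode(',', $offsetSeen), PHP_EOL; // 1,2,4,5,6  (row 3 was skipped)
echo implode(',', $keysetSeen), PHP_EOL; // 1,2,3,4,5,6
```

Deleting one already-read row shifts every later row up by one position, so the offset run jumps over row 3. The keyset run only asks for ids above the last one it saw, so the shift does not matter.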

ivan_morozov 13 Jun 2026 16:24

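cursor is a different mechanism: it runs a single query and uses a PHP generator to yield one model at a time, so only one Eloquent model is hydrated at any moment. That keeps memory far below a plain get(), but not necessarily flat: with PDO's default buffered mode on MySQL the driver still holds the raw result set inside the PHP process, so a large enough result can still exhaust memory unless buffering is turned off. The whole result set is also streamed from the database over one connection held for the entire export. So cursor is good for model memory, but it ties up a connection for the full duration and does not give you the restartability that id-based paging does.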
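To make the generator part concrete, here is a small sketch in plain PHP. `FakeStatement` and `cursorOver` are hypothetical stand-ins, not Laravel or PDO classes; the sketch only models the PHP side (one row handed over per iteration) and says nothing about what a real driver buffers underneath.

```php
<?php
// Hypothetical stand-in for a driver statement handle.
final class FakeStatement
{
    public int $fetched = 0; // how many rows have been pulled so far
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    /** Returns the next row, or null when the result set is exhausted. */
    public function fetch(): ?array
    {
        if ($this->fetched >= count($this->rows)) {
            return null;
        }
        return $this->rows[$this->fetched++];
    }
}

/** Yields rows one at a time; nothing is fetched until the caller iterates. */
function cursorOver(FakeStatement $statement): Generator
{
    while (($row = $statement->fetch()) !== null) {
        yield $row;
    }
}

$statement = new FakeStatement([['id' => 1], ['id' => 2], ['id' => 3]]);
$rows = cursorOver($statement);

$fetchedBeforeIteration = $statement->fetched; // generator body has not run yet
$first = $rows->current();                     // runs the body up to the first yield
$fetchedAfterFirst = $statement->fetched;

echo $fetchedBeforeIteration, ' ', $fetchedAfterFirst, PHP_EOL; // 0 1
```

The caller only ever holds the row it is currently looking at, which is where the low model memory comes from. There is also no natural point to resume from if the loop dies halfway.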

roman_ch 13 Jun 2026 17:54

The restartability point is underrated. With chunkById you know the last id you processed, so if the export dies at row 1.4 million you resume from there. With cursor a failure means starting over. For a multi-million row export that can run for a long time, the ability to checkpoint the last id and resume is often worth more than the marginal memory difference. We log the last id after every chunk for exactly this.
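A rough sketch of that checkpoint-and-resume loop, again in plain PHP with arrays standing in for the table and for the stored checkpoint. `runExport`, `pageAfterId`, the `last_id` key, and the `$failAfterChunks` crash switch are all invented for the illustration.

```php
<?php
// Simulation only: arrays stand in for the table and for the checkpoint storage.

/** Keyset page: rows with id > $lastId in id order, at most $size of them. */
function pageAfterId(array $table, int $lastId, int $size): array
{
    ksort($table);
    $rows = array_filter($table, fn (int $id) => $id > $lastId, ARRAY_FILTER_USE_KEY);
    return array_slice($rows, 0, $size, true);
}

/**
 * Exports from the checkpoint onward and returns the ids written in this run.
 * $failAfterChunks simulates a crash after that many chunks.
 */
function runExport(array $table, array &$state, int $size, ?int $failAfterChunks = null): array
{
    $written = [];
    $chunks = 0;
    while (($rows = pageAfterId($table, $state['last_id'], $size)) !== []) {
        if ($failAfterChunks !== null && $chunks === $failAfterChunks) {
            return $written; // simulated crash: checkpoint stays where it was
        }
        array_push($written, ...array_keys($rows));
        $state['last_id'] = array_key_last($rows); // checkpoint after the chunk is written
        $chunks++;
    }
    return $written;
}

$table = array_fill_keys(range(1, 10), 'row');
$state = ['last_id' => 0];

$firstRun = runExport($table, $state, 3, 2); // dies after two chunks
$secondRun = runExport($table, $state, 3);   // picks up from the checkpoint

echo implode(',', $firstRun), PHP_EOL;  // 1,2,3,4,5,6
echo implode(',', $secondRun), PHP_EOL; // 7,8,9,10
```

The checkpoint is advanced only after a chunk has been written, so a crash between the write and the checkpoint would repeat that one chunk on resume rather than lose it.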

lukaszkrzyz 13 Jun 2026 19:24

Restartability decides it for me: this export runs long enough that starting over after a mid-run failure is unacceptable. chunkById it is, with the last processed id checkpointed to a small state row. Good to have the chunk-skips-rows behavior confirmed as an offset problem rather than something I was doing wrong. That bug cost me a weekend once.

simondev 13 Jun 2026 21:04

One tip for combining these: use chunkById for the stable paging, but inside each chunk write to the output stream immediately rather than buffering, and unset the chunk or let it go out of scope so the models are freed. People pick chunkById and then accidentally accumulate the rows into one big array to write at the end, reintroducing the memory problem they were avoiding. Stream out per chunk and memory stays flat while the run stays resumable.
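Something like the following shape, sketched in plain PHP. `fakeChunks` is a hypothetical stand-in for the batches a chunkById callback would receive, and `streamChunksToCsv` is an illustrative name; the only real library calls are PHP's own stream and CSV functions.

```php
<?php
/** Hypothetical source of batches: yields arrays of [id, name] rows, $size at a time. */
function fakeChunks(int $total, int $size): Generator
{
    for ($start = 1; $start <= $total; $start += $size) {
        $chunk = [];
        for ($id = $start; $id < $start + $size && $id <= $total; $id++) {
            $chunk[] = [$id, "name-$id"];
        }
        yield $chunk;
    }
}

/**
 * Writes each row to $handle as soon as it is seen and returns the row count.
 * $handle is an open, writable stream resource.
 */
function streamChunksToCsv(iterable $chunks, $handle): int
{
    $count = 0;
    foreach ($chunks as $chunk) {
        foreach ($chunk as $row) {
            // Delimiter, enclosure and escape are passed explicitly; '' disables the escape character.
            fputcsv($handle, $row, ',', '"', '');
            $count++;
        }
        // $chunk is replaced on the next iteration, so the previous rows can be freed.
    }
    return $count;
}

$handle = fopen('php://memory', 'w+'); // a real export would open a file instead
$written = streamChunksToCsv(fakeChunks(7, 3), $handle);
rewind($handle);
$csv = stream_get_contents($handle);
fclose($handle);

echo $written, PHP_EOL; // 7
```

The anti-pattern is the same loop with `$all[] = $row` in place of the fputcsv call and a single write at the end: paging stays stable, but every row ends up held in memory at once.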
