We often need to iterate over the database rows in our migrations. When dealing with millions of records, basic iteration techniques doesn’t work well because each loaded object is consuming system memory and it issues at least one database query per object to load.
Solution 1
The problem with this version is that it will load all 4000216 objects into memory, all memory will be consumed and it will start to use disk swap and it will take hours to complete.
We can optimize it a little bit by specifying select parameter in our query:
Solution 2
Version 2 still loads all objects in memory but selects only id field.
We need to avoid loading all objects in the memory, we’ll iterate over collection and we’ll load only current object.
Solution 3
Version 3 it’s OK, but it can be speed up by loading objects in batch not just one by one.
Solution 4
Examining the log:
We have loaded 100 objects with one query. Solution 4 seems to be the best solution to iterate over large data sets as it uses less memory with fewer SQL requests.
Update:
Mitchell proposed a better solution to use ActiveRecord’sfind_in_batches method. DHH commited this feature on February 23, 2009 that permits iterating over large data sets in batches:
Solution 5
No comments:
Post a Comment