I have a large collection in Mongo. Around 1.7 billion records that take up around 5TB of storage space. I no longer need to keep this data indefinitely so I'm looking at options for getting rid of most of the data, preferably based on "createdAt".
I'm wondering what to expect if I add a TTL index so that records are kept for a month at most. I currently have the following index:
{"v" : 1,"key" : {"createdAt" : 1},"name" : "createdAt_1","ns" : "someNS.SomeCollection","background" : true}
How quickly would mongo be able to delete all that data? From what I've read, the ttl process runs every 60 seconds. How much data does it delete each time around?
Best Answer
Adding a TTL index to a collection that large can seriously impact performance. If you need to keep querying the collection while the TTL is being introduced, consider initially creating the TTL index with an expiry so long that no documents are actually expired yet. Once an index has a TTL, you can later adjust how long documents are kept (expireAfterSeconds can be changed with the collMod command).
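As a rough sketch of that idea, the helper below builds the collMod command document for the createdAt index from the question. The helper name buildTtlCollMod and the 20-year starting value are made up for illustration. It also assumes MongoDB 5.1 or newer, where collMod can add expireAfterSeconds to an existing single-field index; on older versions the existing index would probably have to be dropped and recreated with the option.

```javascript
// Sketch only: builds the collMod command document that sets or changes
// expireAfterSeconds on an existing single-field index.
// Assumption: MongoDB 5.1+ if the index is not a TTL index yet.
function buildTtlCollMod(collection, field, expireAfterSeconds) {
  if (!Number.isInteger(expireAfterSeconds) || expireAfterSeconds < 0) {
    throw new RangeError("expireAfterSeconds must be a non-negative integer");
  }
  return {
    collMod: collection,
    index: { keyPattern: { [field]: 1 }, expireAfterSeconds },
  };
}

const DAY = 24 * 60 * 60; // seconds in a day

// Hypothetical starting value: 20 years, assumed to predate every document.
const safeStart = buildTtlCollMod("SomeCollection", "createdAt", 20 * 365 * DAY);

// Eventual target from the question: keep roughly one month.
const target = buildTtlCollMod("SomeCollection", "createdAt", 30 * DAY);

// In the shell, against the someNS database:
//   db.runCommand(safeStart)  // nothing expires yet
//   db.runCommand(target)     // later, once the backlog has been cleared
```

Building the command as a plain object keeps the sketch checkable outside the shell; only the commented db.runCommand calls touch the server.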
Once you've created that index, you can either manually run queries to delete the old data until you're close to up to date and can set the final TTL, or tighten the TTL gradually (lowering the expiry step by step) so that you control the performance impact.
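The manual route could look something like the sketch below: split the backlog into successive createdAt cutoffs and delete one slice at a time, pausing between slices. The helper names, the dates and the one-second pause are all hypothetical; the right slice size depends on how much load the server can take.

```javascript
// Sketch: successive cutoffs from the oldest document up to the final cutoff,
// so each delete only touches a bounded slice of time.
function cutoffSteps(oldest, finalCutoff, stepMs) {
  if (stepMs <= 0) throw new RangeError("stepMs must be positive");
  const steps = [];
  for (let t = oldest.getTime() + stepMs; t < finalCutoff.getTime(); t += stepMs) {
    steps.push(new Date(t));
  }
  steps.push(new Date(finalCutoff.getTime())); // always end exactly at the final cutoff
  return steps;
}

// Filter for one slice; it can use the existing createdAt index.
function deleteFilter(cutoff) {
  return { createdAt: { $lt: cutoff } };
}

// In the shell (hypothetical dates, one-day slices):
//   const steps = cutoffSteps(new Date("2015-01-01"), new Date("2019-06-01"), 24 * 3600 * 1000);
//   for (const c of steps) {
//     const res = db.SomeCollection.deleteMany(deleteFilter(c));
//     print(c.toISOString(), res.deletedCount);
//     sleep(1000); // pause between slices to limit load
//   }
```

The same list of cutoffs could instead drive a series of TTL adjustments, if you prefer to let the TTL thread do the deleting.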
(Source: advice from mLab on adding a TTL to a 1TB collection. If you don't need to maintain access to the data while removing old documents, you can ignore this advice entirely.)
Timing of the Delete Operation
When you build a TTL index in the background, the TTL thread can begin deleting documents while the index is building. If you build a TTL index in the foreground, MongoDB begins removing expired documents as soon as the index finishes building.
The TTL index does not guarantee that expired data will be deleted immediately upon expiration. There may be a delay between the time a document expires and the time that MongoDB removes the document from the database.
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may exist for some time beyond the 60 second period between runs of the background task.
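Since the docs give no fixed number for how much is deleted per run, one way to find out is to measure it. The server status output includes TTL counters (metrics.ttl.passes and metrics.ttl.deletedDocuments), and the sketch below compares two samples of them. The helper name ttlRate and the five-minute sampling window are assumptions for illustration.

```javascript
// Sketch: estimate TTL deletion throughput from two samples of
// db.serverStatus().metrics.ttl taken some time apart.
function ttlRate(before, after) {
  // Number() is used because the shell may return 64-bit integer wrappers.
  const passes = Number(after.passes) - Number(before.passes);
  const deleted = Number(after.deletedDocuments) - Number(before.deletedDocuments);
  return { passes, deleted, perPass: passes > 0 ? deleted / passes : null };
}

// In the shell:
//   const a = db.serverStatus().metrics.ttl;
//   sleep(5 * 60 * 1000); // hypothetical five-minute window
//   const b = db.serverStatus().metrics.ttl;
//   printjson(ttlRate(a, b));
```

The counters are server-wide, so the numbers include any other TTL indexes on the same mongod.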