Part 5: Deduplication in Practice

To Dedup or Not to Dedup

Compression almost always offers a win-win situation. The overhead of compression is contained (not storage-wide) and, except where there is heavy read/modify/write, it will not impose unjustified overhead. It is only when we have heavy writes on compressed datasets that we need to benchmark and decide which compression algorithm we should use, if any (a minimal benchmarking sketch appears at the end of this section).

Deduplication is more demanding and a weightier decision to make. When there is no benefit from deduplication, the overhead will bring system performance to a grinding halt. When the benefit from deduplication is minimal, the overhead is not only higher memory usage and lower performance: the DDT will also be using disk space in the dozens, if not hundreds, of GBs. This means that there is a minimum dedup gain below which the overhead will simply negate the benefit, even if our hardware is powerful enough. It is best to have a good idea of the data to be stored, and to do some research, before enabling dedup by default.

Regardless of the data, there are four scenarios that I can think of where deduplication makes sense:

- Duplicates are between multiple datasets.
- Duplicates are spread over millions of small files.
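
For the compression benchmark mentioned above, here is a minimal sketch using standard OpenZFS commands. The pool name (tank), the test dataset names, and the sample path are placeholders, and zstd support assumes OpenZFS 2.0 or later:

```sh
# Sketch only: pool/dataset names and the sample path are assumptions;
# adjust to your environment.
zfs create -o compression=lz4    tank/ctest-lz4
zfs create -o compression=gzip-6 tank/ctest-gzip6
zfs create -o compression=zstd   tank/ctest-zstd   # OpenZFS 2.0+ only

# Copy a representative sample of the real data into each dataset,
# timing the writes to gauge the CPU overhead:
time cp -a /data/sample/. /tank/ctest-lz4/

# Then compare how well each algorithm did:
zfs get compressratio tank/ctest-lz4 tank/ctest-gzip6 tank/ctest-zstd
```

In practice lz4 is usually the safe default; gzip and zstd trade more CPU for a better ratio, which is exactly the trade-off the timing above is meant to expose.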
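
For the dedup decision itself, OpenZFS can estimate the gain on existing data before anything is enabled. A sketch, again assuming a pool named tank:

```sh
# Simulate dedup on existing data; nothing is written to the pool.
# This can run for a long time and use a lot of RAM on large pools.
zdb -S tank
# The summary line ends with an estimated "dedup = N.NN" ratio.
# A ratio near 1.00 means dedup would buy almost nothing, so the
# DDT overhead would be pure cost.

# If dedup is already enabled somewhere, inspect the live DDT:
zpool status -D tank
```

Note that dedup is a per-dataset property (zfs set dedup=on tank/somedataset), so it can be confined to the datasets that actually share duplicates rather than switched on across the board.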
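
The DDT sizes quoted above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 320 bytes of memory per unique block, a commonly cited figure that varies between implementations and releases; it also shows how quickly the entry count grows when the data consists of millions of small files:

```sh
# Rough DDT sizing: entries = unique data / block size,
# footprint = entries * ~320 bytes (assumed per-entry cost).
BYTES_PER_ENTRY=320
DATA_BYTES=$((1024**4))                  # 1 TiB of unique data
for BLOCK in $((128*1024)) $((8*1024)); do
    ENTRIES=$((DATA_BYTES / BLOCK))
    echo "block=${BLOCK}B entries=${ENTRIES} ddt=$((ENTRIES * BYTES_PER_ENTRY / 1024**2)) MiB"
done
# 128 KiB blocks: ~8.4M entries -> ~2.5 GiB of DDT
# 8 KiB blocks:  ~134M entries  -> ~40 GiB of DDT
```

Scaling this to tens of TiB of small-block data is how the DDT reaches the dozens-to-hundreds-of-GBs range mentioned earlier.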