On the necessity of hot and cold data identification to reduce the write amplification in flash-based SSDs
Faculty of Sciences. Mathematics and Computer Science
Modeling Of Systems and Internet Communication (MOSAIC)
Performance evaluation. - Amsterdam
82(2014), p. 1-14
University of Antwerp
The write performance and life span of a solid state drive are greatly influenced by the garbage collection algorithm. This algorithm selects the data blocks to be erased, which can subsequently be used for storing new data. Any valid data left on a selected block needs to be written elsewhere before the block can be erased, and these extra writes contribute to the so-called write amplification. As not all of the data on a solid state drive is accessed equally often, data identification techniques have been proposed that distinguish the more frequently accessed data, called hot, from the less frequently accessed data, termed cold. These data identification techniques have been shown to be quite effective in reducing the write amplification, essentially by using different blocks to store the hot and cold data, but they also add to the complexity of the device. Write approaches that use different blocks for writes triggered by the operating system and writes triggered by the garbage collection algorithm have also been proposed. These approaches do not require a data identification technique and thus simplify the design of the device, while also reducing the write amplification. In this paper we compare the performance of such a write approach with write approaches that do rely on data identification, using both mean field models and simulation experiments. The main finding is that the added gain of identifying hot and cold data is quite limited, especially as the hot data gets hotter. Moreover, the write approaches relying on hot and cold data identification may even become inferior if either the fraction of data labeled hot is not ideally chosen or if the probability of having false positives or negatives when identifying data is substantial (e.g. 5%).
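To make the notion of write amplification concrete, the following is a minimal simulation sketch and not the model or simulator used in the paper. It assumes a greedy garbage collector (the victim is the full block with the fewest valid pages), a two-class hot/cold workload with 0 < hot_fraction < 1, and a drive that is first filled sequentially. All names and parameter values (`write_amplification`, `hot_fraction`, `hot_write_prob`, `separate_gc_frontier`, and so on) are hypothetical choices made for illustration. Write amplification is measured as physical page writes divided by host page writes.

```python
import random


def write_amplification(num_blocks=64, pages_per_block=16, utilization=0.8,
                        hot_fraction=0.2, hot_write_prob=0.8,
                        separate_gc_frontier=False,
                        warmup_writes=20_000, host_writes=50_000, seed=0):
    """Toy SSD model (illustrative assumptions only); returns physical/host writes."""
    rng = random.Random(seed)
    num_logical = int(utilization * num_blocks * pages_per_block)
    num_hot = max(1, int(hot_fraction * num_logical))
    location = [None] * num_logical             # logical page -> (block, slot)
    contents = [[] for _ in range(num_blocks)]  # block -> logical pages, in write order
    valid = [0] * num_blocks                    # number of valid pages per block
    free = list(range(num_blocks))              # erased blocks
    frontier = {"host": free.pop()}             # block currently receiving writes
    if separate_gc_frontier:
        frontier["gc"] = free.pop()             # dedicated block for relocated pages
    physical_writes = 0

    def program(lpn, kind):
        """Write one page to the frontier for `kind`, invalidating the old copy."""
        nonlocal physical_writes
        key = kind if kind in frontier else "host"
        if len(contents[frontier[key]]) == pages_per_block:
            frontier[key] = free.pop()          # frontier is full: open an erased block
        block = frontier[key]
        if location[lpn] is not None:
            valid[location[lpn][0]] -= 1
        location[lpn] = (block, len(contents[block]))
        contents[block].append(lpn)
        valid[block] += 1
        physical_writes += 1

    def collect():
        """Greedy garbage collection: erase the full block with the fewest valid pages."""
        active = set(frontier.values())
        victim = min((b for b in range(num_blocks)
                      if b not in active and len(contents[b]) == pages_per_block),
                     key=lambda b: valid[b])
        live = [lpn for slot, lpn in enumerate(contents[victim])
                if location[lpn] == (victim, slot)]
        for lpn in live:                        # relocations cause the amplification
            program(lpn, "gc")
        contents[victim] = []
        valid[victim] = 0
        free.append(victim)

    def host_write(lpn):
        while len(free) < 2:                    # keep a small reserve of erased blocks
            collect()
        program(lpn, "host")

    def random_lpn():
        """Hot pages receive a fraction hot_write_prob of all host writes."""
        if rng.random() < hot_write_prob:
            return rng.randrange(num_hot)
        return rng.randrange(num_hot, num_logical)

    for lpn in range(num_logical):              # fill the drive once
        host_write(lpn)
    for _ in range(warmup_writes):              # let the drive approach steady state
        host_write(random_lpn())
    baseline = physical_writes
    for _ in range(host_writes):
        host_write(random_lpn())
    return (physical_writes - baseline) / host_writes


if __name__ == "__main__":
    for flag in (False, True):
        print("separate GC frontier:", flag, "WA:", write_amplification(separate_gc_frontier=flag))
```

Setting `separate_gc_frontier=True` sends relocated pages to their own block, which is a rough stand-in for the write approaches described above that need no hot/cold identification. The numbers such a toy model produces depend heavily on the assumed parameters and should not be read as reproducing the paper's results.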