Basic Principles of SSDs for Distributed Storage

SSD (Solid State Drive) performance and capacity keep breaking new ground while prices keep falling. SSDs have developed rapidly and are now a very popular storage medium in commercial servers and high-performance storage services. As a developer, you need to understand the basic principles of SSDs so that your software can play to their strengths and avoid their weaknesses. This article is based on the references listed at the end.

The SSD was born in the 1970s. The earliest SSDs used RAM as the storage medium, but RAM loses its data on power failure and was very expensive. Later, Flash-based SSDs appeared. Flash keeps its data after power-down, so Flash SSDs slowly replaced RAM SSDs, although HDDs already occupied most of the market by then. At the beginning of this century, continuous progress in manufacturing processes brought SSDs significant development. HDD process and technology have since struggled to achieve a breakthrough, while SSD performance and capacity are still advancing. I believe that in the near future SSDs will replace HDDs in online storage and become the mainstream device for software-defined storage (SDS).

An SSD is mainly composed of an SSD controller, a Flash storage array, optional on-board DRAM, and a host interface (SATA, SAS, PCIe, etc.).

The basic storage unit of Flash is the floating-gate transistor. Flash is divided into NOR and NAND types according to the manufacturing process. NAND offers large capacity, is read and written by Page, and is well suited to data storage, so the Flash used in storage SSDs is essentially always NAND.

Flash works much like a field-effect transistor (FET): a voltage controls whether the channel between source and drain conducts.

A write (program) applies a positive voltage to the control gate, which pulls electrons through the insulating layer into the floating gate. A write can only add electrons, never remove them from the floating gate, so data must be erased before it can be overwritten.

An erase does the opposite: a positive voltage is applied to the substrate, pulling the electrons out of the floating gate.

A read applies a voltage to the control gate and checks whether the source-drain channel conducts. This reveals whether the floating gate holds charge, and therefore whether the memory cell stores a 1 or a 0.

image quoted from https://wantssd.com/

As covered in the second section (flash basics), an SSD generally uses NAND-Flash internally as its storage medium. Its logical structure is as follows:

 

An SSD generally contains multiple NAND-Flash chips. Each NAND-Flash contains multiple Blocks, and each Block contains multiple Pages. Due to the nature of NAND, access is in Page units: every read or write covers at least one Page. Typically, each Page is 4K or 8K in size. Another property of NAND is that a Page can be read and written but not overwritten in place. To overwrite a Page, its contents must first be erased. Because erasing requires a high voltage, it is done in Block units. So when no free Page is left, the SSD must find a Block with no valid content, erase it, and then pick a free Page in it to write. In theory NAND could be designed to erase by byte, but NAND capacity is generally very large and byte-level erasure would be slow and inefficient, so it is designed to erase by Block.
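The Page and Block rules above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `NandBlock` class, the `PageState` names, and the four-Page Block size are hypothetical and do not correspond to any real controller firmware.

```python
from enum import Enum


class PageState(Enum):
    ERASED = "erased"      # free, can be programmed once
    VALID = "valid"        # holds live data
    INVALID = "invalid"    # holds stale data, waits for a Block erase


class NandBlock:
    """Toy model of one NAND Block: program per Page, erase per Block."""

    def __init__(self, pages_per_block=4):
        self.states = [PageState.ERASED] * pages_per_block
        self.data = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page, payload):
        # A Page can only be programmed while it is in the erased state.
        if self.states[page] is not PageState.ERASED:
            raise ValueError("page must be erased before it is written again")
        self.states[page] = PageState.VALID
        self.data[page] = payload

    def read(self, page):
        return self.data[page]

    def erase(self):
        # Erase always covers the whole Block, never a single Page.
        self.states = [PageState.ERASED] * len(self.states)
        self.data = [None] * len(self.data)
        self.erase_count += 1


block = NandBlock()
block.program(0, b"v1")
try:
    block.program(0, b"v2")      # in-place overwrite is rejected
    overwrite_ok = True
except ValueError:
    overwrite_ok = False
block.erase()                    # clears every Page in the Block
block.program(0, b"v2")          # now the Page can be written again
```

Note that the erase wipes every Page in the Block, which is why a real SSD has to move valid data elsewhere before erasing.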

The SSD also maintains a mapping table from logical addresses to physical addresses. On each read or write, the physical address is obtained directly by looking up the logical address, which eliminates the seek time and rotational latency of traditional mechanical disks.

From the principles of NAND-Flash, the main differences between an SSD and an HDD are:

In a sequential read test, the data only has to be located once, and after that the test is one long bulk read. Here the performance gap between HDD and SSD comes mainly from read throughput: HDDs reach about 200 MB/s, while ordinary SSDs are roughly twice as fast.

In a random read test, every read must first locate the data and then read it. Locating data on an HDD takes a long time, generally a few milliseconds to more than ten milliseconds, far more than on an SSD (generally about 0.1 ms). Random read and write tests therefore mainly measure how fast each device locates data, and here the SSD performs much better than the HDD.
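To put those positioning times in perspective, here is a rough back-of-the-envelope calculation. It assumes one outstanding request at a time and ignores transfer time. The 10 ms HDD figure is an assumed value from the range quoted above, and `max_random_iops` is a hypothetical helper name.

```python
def max_random_iops(positioning_ms, transfer_ms=0.0):
    """Upper bound on serial random reads per second at queue depth 1."""
    return 1000.0 / (positioning_ms + transfer_ms)


hdd_iops = max_random_iops(10.0)   # assumed 10 ms seek plus rotation
ssd_iops = max_random_iops(0.1)    # about 0.1 ms, as quoted above
speedup = ssd_iops / hdd_iops      # ratio of the two positioning times
```

Under these assumptions the HDD tops out at roughly 100 random reads per second and the SSD at roughly 10,000. Real devices serve many requests in parallel, so measured numbers will differ.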

SSD writes fall into two types, new writes and overwrites, and they are processed differently.

   

If there are many overwrites, many invalid Pages are generated, similar to disk fragmentation, and the SSD’s GC (garbage collection) mechanism is needed to reclaim this space.
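The difference between a new write and an overwrite can be illustrated with a minimal mapping-table sketch. `SimpleFTL`, its field names, and the eight-Page capacity are hypothetical, and a real flash translation layer is far more involved.

```python
class SimpleFTL:
    """Toy flash translation layer: a logical-to-physical Page map."""

    def __init__(self, total_pages=8):
        self.l2p = {}                          # logical page -> physical page
        self.state = ["free"] * total_pages    # state of each physical page
        self.data = [None] * total_pages
        self.next_free = 0                     # pages are handed out in order

    def write(self, lpn, payload):
        if self.next_free >= len(self.state):
            raise RuntimeError("no free page left; GC would have to run")
        old = self.l2p.get(lpn)
        if old is not None:
            # Overwrite: the old physical page cannot be rewritten in place,
            # so it is only marked invalid and left for GC.
            self.state[old] = "invalid"
        new = self.next_free
        self.next_free += 1
        self.state[new] = "valid"
        self.data[new] = payload
        self.l2p[lpn] = new
        return new

    def read(self, lpn):
        # A read is a direct table lookup, with no seek involved.
        return self.data[self.l2p[lpn]]


ftl = SimpleFTL()
ftl.write(0, "a0")      # new write: logical page 0 -> physical page 0
ftl.write(1, "b0")      # new write: logical page 1 -> physical page 1
ftl.write(0, "a1")      # overwrite: lands on physical page 2, page 0 turns invalid
invalid_pages = ftl.state.count("invalid")
```

Each overwrite consumes a fresh Page and leaves an invalid one behind, which is the space that GC later has to reclaim.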

Before discussing the GC mechanism, let’s look at Over-Provisioning: the SSD physically has more storage space than the capacity it exposes for writing. For example, an SSD may have 128G of actual space but only 120G of usable capacity. Why is Over-Provisioning needed? Consider the following example:

 

As shown in the image above, suppose the system has only two Blocks and two invalid Pages are left at the end. To write a new Page, the NAND principle requires the invalid Pages to be erased before they can be written again. Erase granularity is the Block, so the valid data of the current Block must first be copied to a new Block. If no additional space is available, the erase cannot be performed and the two invalid Pages can never be reused. The SSD therefore has to provide extra space, that is, Over-Provisioning, to keep GC working.
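The two-Block situation can be sketched in code to show why a spare Block is needed. This is a simplified illustration, not the algorithm of any particular SSD: the `collect` function, the greedy victim choice (most invalid Pages), and the four-Page Blocks are all assumptions.

```python
def collect(blocks, spare):
    """Greedy GC pass: reclaim the Block with the most invalid Pages.

    blocks: list of Blocks, each a list of Page states
            ("valid", "invalid" or "free").
    spare:  an all-free Block reserved for GC (the over-provisioned space).
    Returns (new_blocks, new_spare, pages_copied).
    """
    if any(p != "free" for p in spare):
        raise RuntimeError("GC needs an erased spare Block to copy into")
    victim_idx = max(range(len(blocks)),
                     key=lambda i: blocks[i].count("invalid"))
    victim = blocks[victim_idx]
    valid = victim.count("valid")
    # Copy the valid Pages into the spare Block; the rest of it stays free.
    refilled = ["valid"] * valid + ["free"] * (len(spare) - valid)
    new_blocks = list(blocks)
    new_blocks[victim_idx] = refilled
    # Erasing the victim turns it into the next spare Block.
    new_spare = ["free"] * len(victim)
    return new_blocks, new_spare, valid


blocks = [
    ["valid", "valid", "invalid", "valid"],
    ["valid", "invalid", "valid", "valid"],
]
spare = ["free"] * 4                      # over-provisioned Block
blocks, spare, copied = collect(blocks, spare)

# One host Page can now be written, but flash also absorbed the copies.
write_amplification = (copied + 1) / 1
```

In this example, freeing a single Page costs three Page copies plus one Block erase. The extra flash writes caused by such copies are what write amplification refers to. Without the spare Block, `collect` has nowhere to copy to and fails.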

The GC process is as follows:

The SSD’s GC mechanism poses two problems:

Frequent GC on some Blocks causes those components to reach their erase/write limit sooner than others. A Wear-Leveling algorithm is therefore required to even out the erase counts across components, prolonging the life of the SSD.
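A very simple way to picture Wear-Leveling is to always reuse the Block with the fewest erases so far. The sketch below compares that greedy policy with a naive one that keeps reusing the same Block. It is only an illustration: `simulate` is a hypothetical helper, and production algorithms also have to handle Blocks holding rarely changing data.

```python
def simulate(num_blocks, erases, leveled):
    """Return per-Block erase counts after a number of erase cycles."""
    counts = [0] * num_blocks
    for _ in range(erases):
        if leveled:
            # Greedy leveling: pick the Block with the fewest erases so far.
            target = min(range(num_blocks), key=lambda b: counts[b])
        else:
            # Naive policy: always reuse the same Block.
            target = 0
        counts[target] += 1
    return counts


naive = simulate(4, 100, leveled=False)
leveled = simulate(4, 100, leveled=True)
naive_spread = max(naive) - min(naive)        # gap between hottest and coldest Block
leveled_spread = max(leveled) - min(leveled)
```

With the naive policy a single Block absorbs every erase, while the greedy policy spreads the same number of erases evenly over all four Blocks.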

The Trim command, also known as Disable Delete Notify, was developed by Microsoft together with major SSD vendors and is part of the ATA8-ACS specification.

Trim (Discard) mainly improves GC efficiency and reduces write amplification. Its biggest role is to clear, ahead of time, invalid data that is waiting to be deleted. An SSD write involves read, erase, and write steps, and if the erase step has already been done, the SSD can deliver its full performance. A large part of why SSD performance drops is that too much invalid data has accumulated, so the controller has to clear it before each write, which limits performance.

When a file is deleted, the file system simply marks the location of that data as available in its logical data table rather than really deleting the data on disk. A system using mechanical hard disks does not need to tell the storage device about the deletion at all, because it can overwrite the stale data directly at any time. An SSD, however, only realizes that previously written data has been deleted when the system writes new data to that location. If the SSD runs GC before then, GC treats the already-deleted data as valid and migrates it to other Blocks, which is unnecessary work.

Without Trim, the SSD cannot know in advance that those ‘deleted’ data pages are invalid. It only learns that the data can be erased when the system asks to write to the same location, so it cannot optimize at the most appropriate time. This hurts both GC efficiency (and indirectly performance) and the life of the SSD.

Trim (Discard) support must be implemented not only by the SSD but by every layer of the data path, including the file system and the RAID controller card. To use this feature with a file system, mount it with the discard option. If you manage the raw SSD device yourself, issue the BLKDISCARD command through the ioctl function.
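For the raw-device case, a minimal Linux-only sketch of the BLKDISCARD ioctl is shown below. The request number `0x1277` is what `_IO(0x12, 119)` from `<linux/fs.h>` evaluates to on x86 and ARM, and other architectures may encode it differently. The helper names `discard_arg` and `discard_range` are hypothetical. The call destroys the data in the given range, so treat this as an illustration rather than a ready-made tool.

```python
import fcntl
import os
import struct

# From <linux/fs.h>: #define BLKDISCARD _IO(0x12, 119).
# 0x1277 is the value on x86 and ARM; some architectures encode _IO differently.
BLKDISCARD = 0x1277


def discard_arg(offset, length):
    """Pack the uint64_t range[2] = {offset, length} argument, in bytes."""
    return struct.pack("=QQ", offset, length)


def discard_range(device_path, offset, length):
    """Tell the device that [offset, offset + length) holds no live data.

    WARNING: this destroys the data in that range. It needs write access
    to the block device, and the range should be sector-aligned.
    """
    fd = os.open(device_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, discard_arg(offset, length))
    finally:
        os.close(fd)
```

A call would look like `discard_range("/dev/sdX", 0, 1 << 20)`, where `/dev/sdX` is a placeholder for a device whose contents you can afford to lose.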

Before analyzing Bit-Error, recall the flash basics section above. Bit-Error is a silent error for disks. The factors that cause NAND-Error are:

The types of errors caused by different factors also differ:

The longer the retention time, the more electrons leak from the Flash floating gate and the higher the bit error rate, so the NAND-Error mechanism mainly aims to reduce Retention-Error.

When a Page of NAND is read, a positive voltage is applied to the control gates of the unselected Pages in the Block to keep the unselected MOS transistors conducting. Applying this positive voltage to a transistor’s control gate so frequently may pull electrons into the floating gate, causing a slight Program. The threshold-voltage distribution then shifts to the right, resulting in a Bit-Error. Note that Read-Disturb only affects other Pages in the same Block.

 

After erasing, all bits are 1. Writing a 1 does not require a Program, while writing a 0 does. As shown in the image below, the green Cells are written with 0 and need a Program, while the red Cells are written with 1 and do not. We call the green Cells Program Cells and the red Cells Stressed Cells. When a Page is written, a positive voltage (20V in the figure below) is applied to the control gates on the WordLine. The strings containing Program Cells are grounded, and the strings containing cells that do not need a Program receive a positive voltage (10V in the figure). The end result is that the Stressed Cells are also slightly programmed. Unlike Read-Disturb, Program-Disturb affects not only other Pages in the same Block but also the Page being written. In both cases an undesired slight Program flips the bit, but the damage is not permanent: after an erase, the Block can be used again.

 

For details, refer to the classic Dual-pool algorithm for efficient Wear-Leveling.

 

Edited at 15:47