DDR3 Debugging Essentials Guide

This article is the author's original work. If you reprint it, please credit the author and source!

I used to be just as clueless about DDR3 as anyone. Because of project and work needs, I got the chance to study DDR3 in depth, and along the way I spent plenty of time groping in the dark: searching for material, setting up projects, debugging errors, and so on. I still wouldn't call myself proficient, only past the basics. The purpose of this article is simply to help beginners like me take fewer detours. Think of it as tossing out a brick to attract jade, and I hope the experts will offer plenty of advice. Thanks in advance! To keep things from getting too dry, I'll loosen up the tone now and then, and I hope you'll bear with me. These are hand-typed notes after all, hey hey. Now, on to the main topic~

As a beginner, I think it's best to work on three things: gathering and studying material, building and debugging a project, and keeping daily summaries and records.

Some will ask what material to study. You've probably guessed I'll say the manual. Obviously: everyone knows you get nowhere without the manual. But it isn't only the manual. If this is your first contact with DDR3, it's best to download a few related academic papers. Don't know where to find them?! Search Baidu. I searched high and low myself (I'm at Xidian, and papers can be downloaded free through the school, hey hey). Simply put, a few related papers will help you understand the relevant structure and flow. Look for material on whatever you're actually debugging. I was silly: I was working on DDR3 but downloaded several DDR2 papers. They're still worth learning from, of course, but it was frustrating. And most important of all is the official datasheet. Without it, you want to get DDR3 working? Go wash up and sleep~ As for the experimental platform, I won't say much. If conditions are limited or your pocket is shy, software simulation works too. What, you look down on simulation? No money means no studying? I gritted my teeth through a week of simulation, only to discover that the lab had a board all along and I hadn't known. So I asked a teacher to get it, and finally got hands-on with real hardware~~~ OK, I digress. Once you have a board, you also need its schematic. With those materials in hand, at least you have something to work with!

Next comes building and debugging the project. Online advice for beginners says: read the manual first, build an interface, run it, and understand the example routine. Yes, this step must be done, and done seriously! But you may find that your final design is not as complicated as the example… I'll elaborate later. Project debugging is a tedious and arduous process. I won't belabor its importance here; if you think it's unimportant, it's time to go to sleep again.

Finally, there are daily summaries and records. They're like a diary, but not quite: these are semi-public, and if your girlfriend wants to read them, will you refuse? ~~~ Just kidding. I've always thought it's a good habit to record the mistakes you make each day and summarize their causes and solutions. Over time that becomes a treasure! If you're not used to it, you may not know what to record at first. I was the same: I just wanted to write something, and writing taught me what to write. Now I feel uncomfortable if I don't write. It's a disease, but one that needn't be treated. Simply put, record experimental phenomena, daily simulation results, analysis summaries, and so on. They're useful to you, and if your boss asks for a report, you already have material, right? Grab a pen or start typing~~~ Follow my left hand and right hand…

Now it's time to roll up our sleeves for the real work. No running away!

1. Material preparation and learning

First, I'll briefly describe my setup, and then the DDR3 principles involved. In a technical document, this part is a must! I use a Xilinx Virtex-6 FPGA and Micron DDR3. I read a few papers; I won't list their titles, so look them up yourself. Below I give a brief introduction to DDR3, covering only what's relevant to the project. For everything else, consult the references.

Oh well, let me post a figure. Who told me to be so diligent and caring~~~ The key points are marked in it, so take a look. I won't say more about the documents; read them yourself. Below are a few other keywords that are very useful!


DDR (Double Data Rate) SDRAM is double-data-rate synchronous dynamic random-access memory. Data is sampled on both the rising and the falling edges of the clock. Compared with sampling only on the rising edge, this is equivalent to doubling the sampling clock frequency. DDR3 SDRAM greatly improves system performance while reducing power consumption, and its "fly-by" topology and dynamic on-die termination (ODT) have a significant positive effect on signal integrity.


1, Logical bank

The storage units that make up DDR3 are organized into logical banks. Within a bank, you first specify a row and then a column to locate the desired storage location precisely. This is the basic principle of DDR3 addressing. DDR3 uses an 8-bank design.
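To make the row-then-column idea concrete, here is a minimal Python sketch that splits a flat address into row, bank, and column fields. The field widths (13 row bits, 3 bank bits for the 8 banks, 10 column bits) and the row | bank | column ordering are assumptions for illustration only. Take the real widths from your device datasheet and the field order from your controller configuration.

```python
def split_address(addr: int, row_bits: int = 13, bank_bits: int = 3, col_bits: int = 10):
    """Split a flat address into (row, bank, column), assuming a row | bank | column layout."""
    col = addr & ((1 << col_bits) - 1)                               # lowest bits: column
    bank = (addr >> col_bits) & ((1 << bank_bits) - 1)              # middle bits: one of 8 banks
    row = (addr >> (col_bits + bank_bits)) & ((1 << row_bits) - 1)  # top bits: row
    return row, bank, col


def join_address(row: int, bank: int, col: int, bank_bits: int = 3, col_bits: int = 10) -> int:
    """Inverse of split_address for the same assumed layout."""
    return (row << (bank_bits + col_bits)) | (bank << col_bits) | col


print(split_address(join_address(row=5, bank=3, col=17)))  # (5, 3, 17)
```

Consecutive addresses in this layout stay in the same row, which means sequential traffic keeps hitting an already-open row.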

2, Physical bank

This is a term for the memory subsystem, not for the memory chip itself. On a PC, the Northbridge chip controls data exchange between memory and the CPU. To transfer data efficiently, the Northbridge makes the memory bus as wide as the CPU data bus. This width is called a physical bank (also known as a rank), and it is currently 64 bits. If each memory chip is 8 bits wide, eight chips must be connected in parallel to reach the 64-bit width a rank requires.


3, Row activate command


Before a read or write can access data in a DDR3 bank, the row containing that data must first be activated. Once activated, the row stays open until a precharge command is sent to DDR3. The row activate command is issued together with the bank address and the corresponding row address. After activation, the column address is sent along with the specific read or write command. Because these two are issued at the same time, column addressing is generally just called a read/write command. The interval between row activation and the issue of the read/write command is defined as tRCD, an important DDR timing parameter. tRCD is generally expressed in clock cycles; for example, tRCD = 3 means a delay of 3 clock cycles.
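Datasheets usually specify tRCD in nanoseconds, while the controller counts clock cycles, so the value must be rounded up to whole cycles. The sketch below does that in integer picoseconds to avoid floating-point rounding. The 400 MHz clock matches the master clock used later in this article. The tRCD value of 13.125 ns is only an assumed example, so read yours from the datasheet for your speed grade.

```python
def cycles_from_ps(t_ps: int, tck_ps: int) -> int:
    """Round a timing parameter up to whole clock cycles (ceiling division on integers)."""
    return -(-t_ps // tck_ps)


TCK_PS = 2500    # 400 MHz memory clock -> 2.5 ns period
TRCD_PS = 13125  # assumed example: tRCD = 13.125 ns

print(cycles_from_ps(TRCD_PS, TCK_PS))  # 6
```

Rounding down would violate the minimum delay, which is why the ceiling is used.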


4, Read/write commands


After a bank's row activate command has executed, DDR3 can accept read/write commands to that row. When a read/write command is issued, pin A10 determines whether auto-precharge is enabled. If it is, the row is automatically precharged at the end of the read/write burst. Otherwise, the row stays open and the control logic can continue to read from and write to it.
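The role of A10 is easier to see with a small command decoder. The sketch below maps the CS#, RAS#, CAS#, and WE# pin levels to commands, following the DDR3 command truth table as I understand it. Please check it against the truth table in your own datasheet. It then uses A10 to pick the auto-precharge variants (RDA/WRA) and the precharge-all variant (PREA). The command names are the usual datasheet abbreviations. The function itself is only an illustration.

```python
# (RAS#, CAS#, WE#) with CS# low; 0 = low, 1 = high.
COMMANDS = {
    (0, 1, 1): "ACT",  # row activate
    (1, 0, 1): "RD",   # read
    (1, 0, 0): "WR",   # write
    (0, 1, 0): "PRE",  # precharge
    (0, 0, 1): "REF",  # refresh
    (0, 0, 0): "MRS",  # mode register set
    (1, 1, 1): "NOP",
}


def decode(cs_n: int, ras_n: int, cas_n: int, we_n: int, a10: int) -> str:
    """Decode one command slot; A10 selects auto-precharge (RD/WR) or all banks (PRE)."""
    if cs_n:
        return "DESELECT"
    name = COMMANDS.get((ras_n, cas_n, we_n), "OTHER")  # e.g. ZQ calibration falls here
    if name in ("RD", "WR") and a10:
        return name + "A"  # RDA / WRA: row closes automatically after the burst
    if name == "PRE" and a10:
        return "PREA"      # precharge all banks
    return name


print(decode(0, 1, 0, 1, a10=1))  # RDA
```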

5, Data mask


DDR3 uses a data mask (DM, the counterpart of the DQM pin on earlier SDRAM) to mask unwanted data. With DM, the DDR3 controller can mark the validity of the I/O data in units of bytes. Note that in DDR3 the mask applies to writes only: a byte that is sampled with DM high during a write is simply not stored. Read data is always driven in full, so any unwanted bytes in a read must be discarded by the controller's own logic.
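Here is a minimal sketch of what byte masking does to memory contents during a write. It assumes one mask bit per byte, where 1 means "do not write this byte", which matches the DM-high-means-masked behavior described above. The function name is hypothetical.

```python
def apply_write_mask(old: bytes, new: bytes, mask: int) -> bytes:
    """Merge a masked write into existing contents; mask bit i = 1 keeps the old byte i."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    return bytes(o if (mask >> i) & 1 else n for i, (o, n) in enumerate(zip(old, new)))


# Bytes 0 and 2 are masked, so only bytes 1 and 3 take the new values.
print(apply_write_mask(b"\x00\x00\x00\x00", b"\x11\x22\x33\x44", 0b0101).hex())  # 00220044
```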


6, Precharge command


After a read completes, the DDR3 sense amplifiers must be freed so that other rows in the same bank can be addressed and transfer data. To do this, the DDR3 chip executes a precharge command, which closes the current row. Precharge writes the data of the open row back to its storage cells and resets the row address. A10 determines whether a single bank or all banks are precharged. After a precharge command, a certain time must pass before an activate command may open a new row. This interval is called tRP.
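Activate, read/write, and precharge fit together as a per-bank state: a bank is either idle (precharged) or has exactly one open row. The following sketch tracks that state and returns the command sequence an access needs. A row hit needs only the column command. A closed bank needs ACT first. A different open row needs PRE, then ACT. The class is illustrative only: reads stand in for both reads and writes, and the tRP/tRCD waits are noted in comments but not timed.

```python
class Bank:
    """Track the open row of one bank and derive the commands an access requires."""

    def __init__(self):
        self.open_row = None  # None: bank is precharged (idle)

    def access(self, row: int, auto_precharge: bool = False) -> list:
        cmds = []
        if self.open_row is not None and self.open_row != row:
            cmds.append("PRE")  # row miss: close the open row, then wait tRP
            self.open_row = None
        if self.open_row is None:
            cmds.append("ACT")  # open the target row, then wait tRCD
            self.open_row = row
        cmds.append("RDA" if auto_precharge else "RD")
        if auto_precharge:
            self.open_row = None  # A10 = 1: the row closes by itself after the burst
        return cmds


bank = Bank()
print(bank.access(5))  # ['ACT', 'RD']
print(bank.access(5))  # ['RD']            row hit
print(bank.access(7))  # ['PRE', 'ACT', 'RD']  row miss
```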


7, Refresh operation


DDR3 requires continual refresh operations to retain the data in its storage cells. Precharge does restore the data of the open row in one or all banks. However, precharge is tied to access operations: it does not guarantee that the whole memory is covered, and its timing is not fixed. Refresh, by contrast, runs on a fixed schedule that covers every row in turn, so all data in the storage cells is retained. Also unlike precharge, a refresh operates on the same row in all banks, whereas the open rows in different banks at precharge time are not necessarily the same.


There are two types of refresh: auto refresh and self refresh. During an auto refresh, all banks stop working, and DDR3 returns to normal operation only after the refresh completes. In other words, while a refresh is in progress, all other commands must wait. Self refresh is mainly used to retain data in the low-power sleep state. The device then no longer runs on the high-speed system clock but refreshes according to its internal clock.
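The cost of auto refresh can be estimated from two datasheet numbers. The sketch assumes the common DDR3 figures of a 64 ms retention window covered by 8192 REF commands; check your datasheet, since these depend on the temperature range. It computes the average refresh interval (tREFI) and expresses it in cycles of the 400 MHz clock used in this project.

```python
RETENTION_PS = 64 * 10**9  # assumed: 64 ms retention window, in picoseconds
REFRESH_COMMANDS = 8192    # assumed: REF commands needed per retention window
TCK_PS = 2500              # 400 MHz memory clock

trefi_ps = RETENTION_PS // REFRESH_COMMANDS  # average spacing between REF commands
trefi_cycles = trefi_ps // TCK_PS

print(trefi_ps / 1e6, "us", trefi_cycles, "cycles")  # 7.8125 us 3125 cycles
```

So, under these assumptions, the controller must slip in a refresh roughly every 3125 memory clocks. This is one reason sustained throughput is a little below the raw bus rate.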


8, Burst length/type


A burst is a run of consecutive data accesses to adjacent storage units in the same row. The number of data transfers (beats) in one access is the burst length. Because data moves on both clock edges, a BL8 burst occupies 4 clock cycles. The burst length is set in mode register MR0. It can be fixed at BC4 or BL8, or set to on-the-fly mode, in which A12 selects 4 or 8 on each read/write command. For a burst transfer, you only need to specify the starting column address and the burst length. DDR3 then automatically accesses the corresponding number of following storage units in turn, so the controller does not have to keep supplying column addresses. In this mode, only the first data item has a latency of several cycles; after that, one data item arrives on every clock edge.
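A quick calculation ties burst length to the address arithmetic used later in this article. The sketch assumes the 32-bit data bus of this platform (two x16 devices in parallel) and a column address that counts bus-width words. Under those assumptions, each BL8 burst moves 256 bits and advances the address by 8.

```python
def burst_geometry(bus_width_bits: int, burst_length: int):
    """Bits per burst, column-address step per burst, and data-transfer clock cycles."""
    bits_per_burst = bus_width_bits * burst_length  # one bus-width word per beat
    address_step = burst_length                     # one column address per beat
    clock_cycles = burst_length // 2                # two beats per clock (both edges);
                                                    # BC4 still keeps BL8 command spacing (tCCD)
    return bits_per_burst, address_step, clock_cycles


print(burst_geometry(32, 8))  # (256, 8, 4)
print(burst_geometry(32, 4))  # (128, 4, 2)
```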


9, Mode registers


The mode registers set the working mode of DDR3 SDRAM. There are four of them: MR0, MR1, MR2, and MR3. MR0 sets the basic operating mode of DDR3, including burst length, burst type, CAS latency, and DLL reset. MR1 holds DLL enable, output driver strength, Rtt_Nom, additive latency, write leveling enable, and so on. MR2 holds the self-refresh-related settings, the Rtt_WR termination, and the CAS write latency. MR3 controls the MPR (multipurpose register).
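In practice the MIG core programs these registers during initialization, but seeing the bits helps when reading a datasheet or a ChipScope capture. The sketch below encodes a few MR0 fields using the DDR3 layout as I understand it: A1:A0 burst length (00 fixed BL8, 01 on-the-fly, 10 fixed BC4), A3 burst type, A6:A4 CAS latency (CL - 4 for CL 5 to 11, with A2 = 0), and A8 DLL reset. Please verify the field positions against your Micron datasheet. Write recovery and the other fields are deliberately left at 0 here, which is not a valid setting for real hardware.

```python
BL_FIELD = {"BL8": 0b00, "OTF": 0b01, "BC4": 0b10}  # MR0 A1:A0


def encode_mr0(burst: str, interleaved: bool, cas_latency: int, dll_reset: bool) -> int:
    """Build the MR0 address-bus value for a few fields (BA2:BA0 = 000 selects MR0).

    Write recovery (A11:A9) and precharge power-down (A12) are left at 0 in this
    sketch; a real MR0 must set write recovery explicitly.
    """
    if not 5 <= cas_latency <= 11:
        raise ValueError("this sketch only encodes CL 5..11")
    value = BL_FIELD[burst]              # A1:A0 burst length
    value |= int(interleaved) << 3       # A3 read burst type: 0 sequential, 1 interleaved
    value |= (cas_latency - 4) << 4      # A6:A4 CAS latency (A2 stays 0 for CL 5..11)
    value |= int(dll_reset) << 8         # A8 DLL reset
    return value


print(hex(encode_mr0("BL8", interleaved=False, cas_latency=6, dll_reset=True)))  # 0x120
```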


Xilinx provides the MIG (Memory Interface Generator) tool for high-speed memory interface solutions. The DDR3 SDRAM controller that MIG generates for Virtex-6 and other devices lets users quickly connect the FPGA's internal control logic to the external memory through a user interface. The DDR3 SDRAM memory interface consists of a user interface module, a memory controller, and a physical layer module, plus the local (user-side) interface and the physical-layer interface.


A few important terms bear repeating: tRCD (RAS-to-CAS delay); CL (CAS latency, which applies to reads; the read latency is RL = AL + CL); CWL (CAS write latency; the write latency is WL = AL + CWL); AL (additive latency); BL (burst length; DDR3 uses an 8n prefetch, so BL is nominally 8); and BC (burst chop, a 4-beat mode added in DDR3).
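The latency relationships above can be checked with a few lines of arithmetic. The example values (tRCD = 6, CL = 6, CWL = 5, all in clock cycles) are assumptions for illustration. The last function shows that, with posted CAS, a nonzero AL lets the READ go out earlier, but the total time from ACT to the first read data stays the same.

```python
def read_latency(al: int, cl: int) -> int:
    return al + cl  # RL = AL + CL


def write_latency(al: int, cwl: int) -> int:
    return al + cwl  # WL = AL + CWL


def act_to_read_data(trcd: int, al: int, cl: int) -> int:
    """Cycles from ACT to the first read data, issuing READ as early as AL allows."""
    read_issue = max(trcd - al, 1)  # posted CAS: READ may go out AL cycles early, but not with ACT
    return read_issue + read_latency(al, cl)


print(act_to_read_data(trcd=6, al=0, cl=6))  # 12
print(act_to_read_data(trcd=6, al=5, cl=6))  # 12: READ issued earlier, data at the same time
```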


The reference material also includes a DDR3 operation flow (state) diagram. Given the length of this article, I won't go through it; please study it yourself.


2. Project setup and debugging


First, create the IP core

In CORE Generator, if you have well-defined parameters, generate the core according to them. If you don't have the chip information but do have a predecessor's test project, that works too: find the datasheet.txt file in the project directory and open it. It contains the configuration of the previously generated core. Of course, when you generate a core yourself, the same kind of file is produced, with your configuration in it. If you have nothing at all and just want to run a simulation, that's fine too! Accept the defaults all the way, or configure it however you like; it's not illegal anyway.


After generation, the real work begins. If you're a veteran, I needn't say more. If you're a novice, will you follow the simulation procedure in the manual? Never mind, just follow along with me. MIG stores its output in a folder named <component name>, which contains three folders:

docs: the documentation for the design.

example_design: usable for both simulation and hardware testing. Besides the controller design, it includes synthesizable test files that generate read/write commands, plus compare logic that checks that the read data matches the written data.

user_design: contains only the controller design, so users can write their own application logic (test files) as needed.


Want to analyze it directly? Sure! Add all the files under ipcore_dir\ddr3\example_design\rtl to your new project, and add the simulation files in the ipcore_dir\ddr3\example_design\sim directory.

Of course, if you're used to working with the memory models, you can follow the instructions in the datasheet. I was a bit lazy and just added the files directly. Next, some simple configuration of the top-level parameters will make the simulation faster: set SIM_BYPASS_INIT_CAL to FAST. Did you click run? Why not?! Let it run. In the example_design there is a traffic generator. Simply put, it generates data and then receives the data back for you. Users can edit, delete, or modify this part, and that is the difference from user_design. As the name suggests, user_design is the user's own design, so its traffic generator is user-defined; everything else is the same. I won't analyze the example_design simulation in detail; see for yourself. I'd rather talk about something practical!


The example_design is only there to help you understand how DDR3 works. In real applications you should use the user_design and modify it. Specific modifications and other engineering details are covered in the summary section. Below is some analysis of the example_design project:

Data produced by the traffic generator passes through the user interface, the memory controller, and the physical layer to the external DDR3 memory; receiving data works the other way around. For what each file does, see the manual. In any case, I didn't change any code in memc_ui_top; I only changed parameters at the top level.


Below is the implementation block diagram of the user_design. All we have to do is build our own user_design.

You may ask: with no foundation at all, how do I build it? The manual, again.

The manual has several timing diagrams (BL = 4 and others) that show the role of each signal and the input/output relationships. I won't go through the cause and effect; the manual describes every signal line, and one look will make it clear. Our control logic is written following these timing diagrams. The one above is an example, for reference only.


3. Daily summary and records


This last part is not as dry as the first two. It is mainly my daily summaries, covering the concrete implementation and simulation analysis, plus the problems I ran into and how I solved them. This is where the real value is. I'll try to be detailed, and I hope you'll read it carefully.


Parameter settings:


Input clock: 125 MHz (from an external crystal oscillator connected to the chip; it is multiplied and divided for internal use).


Master clock: 400 MHz (obtained by multiplying the input clock).


Burst length: 8 (4 was used for a while; the reason is explained below).


The block diagram below illustrates the relationships between bit widths and clocks.

Take the transmit (write) path as an example. User-side data first passes through a FIFO, which handles the clock-domain crossing. The FIFO write clock is 100 MHz with 256-bit data. The FIFO read clock is 200 MHz with 128-bit data. The master clock is 400 MHz, and the user clock is obtained by dividing it. We can check that the data rates balance: 200 MHz × 128 bit = 400 MHz × 2 × 32 bit, where the factor 2 accounts for transfers on both clock edges.
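The same balance check, extended to the 256-bit/100 MHz user side, fits in a few lines. All three numbers come from the parameters above. This is peak throughput only; refresh and row switches reduce the real figure.

```python
def bandwidth_bps(clock_hz: int, width_bits: int, edges: int = 1) -> int:
    """Peak throughput of a synchronous interface in bits per second."""
    return clock_hz * width_bits * edges


user_write = bandwidth_bps(100_000_000, 256)       # FIFO write side
user_read = bandwidth_bps(200_000_000, 128)        # FIFO read side / controller user clock
ddr_bus = bandwidth_bps(400_000_000, 32, edges=2)  # DDR3 bus, data on both clock edges

print(user_write, user_read, ddr_bus)  # 25600000000 25600000000 25600000000
assert user_write == user_read == ddr_bus
```

If one side came out smaller, that FIFO would fill or drain steadily, which is why the widths and clocks are chosen to match.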


Next, a brief description of the project. At first, data comes from a traffic generator, traffic_generator. It starts by writing to DDR3: each time, the data increments by one and the address by 8. The burst length is 8, and it seems the address must step by 8; I tried 4 and it didn't work. When the address reaches 4000, the write phase ends and reading begins, again stepping the address by 8 up to 4000. Because reads and writes cross clock domains, both data and addresses go through FIFOs (basic settings, so I won't go into detail). That makes four FIFOs: write data, write address, read data, and read address, two of which can reuse the same FIFO IP core. Reading and writing these four FIFOs is both the focus and the difficulty of the design. I can only give the general idea; the rest is up to you to work out.


First, consider the write-data and write-address FIFOs. Their write enables can be asserted whenever the FIFO is not full. What we mainly control is when and how they are read, and this is done with a state machine. The read enable of the write-data FIFO and the user-interface write enable (app_wdf_wren) can be treated as synchronous. The signals we need to generate are app_wdf_wren, app_wdf_end, wr_dfifo_ren, and wr_afifo_ren. The input signals that determine them are app_wdf_rdy, app_rdy, and work (the write request, asserted when the FIFOs are not empty, initialization is complete, and a write is requested).
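Below is a behavioral Python sketch of these write-side rules. It is a model, not synthesizable HDL, and it rests on several assumptions. Both FIFOs are first-word-fall-through, so the head entry is visible before it is popped. app_wdf_data is 128 bits and the DQ bus is 32 bits, so one BL8 burst (256 bits) needs two data beats. The work gating is left out, and the command and data channels are treated independently, each with its own ready signal. The function and argument names are hypothetical.

```python
from collections import deque


def write_path(addresses, words, app_rdy_trace, app_wdf_rdy_trace, beats_per_burst=2):
    """Cycle-by-cycle sketch of the write side of the user interface.

    Returns the accepted command addresses and the accepted (data, app_wdf_end) beats.
    """
    afifo, dfifo = deque(addresses), deque(words)
    commands, beats = [], []
    beat = 0
    for app_rdy, app_wdf_rdy in zip(app_rdy_trace, app_wdf_rdy_trace):
        # Command channel: app_en follows "address FIFO not empty";
        # wr_afifo_ren = app_en & app_rdy, so an address is popped only once it is accepted.
        if afifo and app_rdy:
            commands.append(afifo.popleft())
        # Data channel: app_wdf_wren follows "data FIFO not empty";
        # wr_dfifo_ren = app_wdf_wren & app_wdf_rdy, so the FIFO advances on EVERY accepted beat.
        if dfifo and app_wdf_rdy:
            beat += 1
            end = beat == beats_per_burst  # app_wdf_end marks the last beat of a burst
            beats.append((dfifo.popleft(), end))
            if end:
                beat = 0
    return commands, beats


cmds, data = write_path([0, 8], [10, 11, 12, 13], [1, 1, 0, 0, 0], [1, 0, 1, 1, 1])
print(cmds)  # [0, 8]
print(data)  # [(10, False), (11, True), (12, False), (13, True)]
```

Even with a stall on app_wdf_rdy, every 128-bit word goes out exactly once, and app_wdf_end lands on every second beat.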


Next, consider the read side. The read-address FIFO can be written whenever the address is valid, and the read-data FIFO whenever the returned data is valid. The read-data FIFO's read enable only needs the FIFO to be non-empty, and then the data comes out. What we mainly control is the read enable of the read-address FIFO, again through a state machine. The signals we need to generate are app_en and rd_afifo_ren, and the inputs that determine them are app_rdy and the read request.
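The read-address side reduces to one handshake rule: hold app_en while an address is waiting, and pop the FIFO (rd_afifo_ren) only in a cycle where app_rdy is also high. Popping without app_rdy would drop an address, and not popping after acceptance would issue it twice. This Python sketch assumes a first-word-fall-through FIFO; the function name and the per-cycle trace format are hypothetical.

```python
from collections import deque


def issue_reads(addresses, app_rdy_trace):
    """Per cycle, return (app_en, rd_afifo_ren, address presented or None)."""
    afifo = deque(addresses)
    trace = []
    for app_rdy in app_rdy_trace:
        app_en = bool(afifo)                 # request whenever an address is waiting
        addr = afifo[0] if afifo else None   # FWFT: head entry visible before popping
        rd_afifo_ren = app_en and bool(app_rdy)
        trace.append((app_en, rd_afifo_ren, addr))
        if rd_afifo_ren:
            afifo.popleft()                  # consumed only on an accepted handshake
    return trace


for cycle in issue_reads([0, 8], [0, 1, 0, 1, 1]):
    print(cycle)
```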


Once the design is finished, it's best to adjust the DDR3 IP core parameters to make simulation easier. Set SIM_BYPASS_INIT_CAL to FAST to skip the initial initialization and calibration and save simulation time. Set ORDERING to STRICT so that requests are not reordered, which makes the intermediate simulation results easy to inspect.
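With ORDERING set to STRICT, read data comes back in request order. That makes a simple software checker practical for data dumped from simulation or ChipScope. The sketch assumes the incrementing pattern of the traffic generator described above (data + 1 per write). It reports the first mismatch and flags back-to-back repeated words, a symptom worth watching for. The function names are hypothetical.

```python
def check_readback(read_words, first_value=0):
    """Compare in-order read data with an incrementing pattern.

    Returns (index, expected, got) for the first mismatch, or None if all match.
    """
    for i, got in enumerate(read_words):
        expected = first_value + i
        if got != expected:
            return i, expected, got
    return None


def find_repeats(read_words):
    """Indices where a word equals the one just before it."""
    return [i for i in range(1, len(read_words)) if read_words[i] == read_words[i - 1]]


print(check_readback([0, 1, 2, 3]))  # None
print(check_readback([0, 1, 1, 2]))  # (2, 2, 1)
print(find_repeats([0, 0, 1, 1, 2]))  # [1, 3]
```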


Next come synthesis, implementation, and bitstream generation, and then you can try it on the board…


It would be a miracle if nothing went wrong!


Let me tell you about my miserable journey!


Problem summary 1: the system cannot initialize


In general, if you add the user_design that comes with the generated core as-is, it compiles normally and fails to initialize only in rare cases!!! In your own project, check every setting carefully against the configuration summary. Use ChipScope to capture signals, triggering on one signal at a time, then test and analyze the cause. Don't rush to suspect the hardware (provided the board has worked before).


Problem summary 2: initialization succeeds, but no data is written


This problem mostly comes from the top-level instantiation: the initialization-complete signal isn't properly connected to the module. It usually comes down to carelessness, so check the top level carefully, especially anything you've modified yourself!


Here's a curious phenomenon. (The write data width started at 128 bits and was 256 bits in the final version. Changing it to 256 avoids a lot of problems; can you think why?)


1, The read/write clock was changed to 200 MHz (the write-FIFO clock, i.e., the 100 MHz above becomes 200 MHz; the DDR3 core master clock stays at 400 MHz), with burst length 8 (predicted result: the data repeats over two consecutive cycles), write data width 128, and read width 128. The captured result matched the prediction.


2, The read/write clock was changed to 100 MHz, with burst length 4 (predicted result: no repetition over two consecutive cycles), write data width 128, and read width 128. The captured result matched the prediction. So if you don't change the write width, changing the burst length to 4 also seems to be an option.

Below is a detailed analysis of these two cases:


Phenomenon 1: each DDR3 device on this platform is 16 bits wide, and the hardware connects two of them in parallel (see the figure below), giving a 32-bit bus. One BL8 burst on this bus therefore carries 32 × 8 = 256 bits, but the write data is only 128 bits wide. As shown above, each burst (one large box) receives only one new 128-bit word. Why does the remaining half hold a copy of the first 128 bits? This is related to the read enable of the write-data FIFO, explained below. For now, just know that with burst length 8, the first 128 bits and the last 128 bits are written with the same data. In phenomenon 2, we changed the burst length to 4, and then each read address corresponded to exactly one data word. Why? Although the burst length is 4, the program still steps the address by 8. So each 8-address step writes only 4 addresses' worth of data (reads work the same way). When reading, it's like decimation, keeping one of every two (the two are identical anyway). So the output looks correct, and in a sense it is. But is that the end? Think again: this wastes half of our storage space!!! Writing half useless data is intolerable in practice! What we need is to store 128-bit words one after another, without duplication and without wasted space. Changing the burst length does give the correct result, but the cost is obvious!
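The waste in phenomenon 2 can be quantified from numbers already in the text: a 32-bit bus, an address step of 8, and a burst of either 8 or 4 beats. The function name is hypothetical.

```python
def burst_usage(bus_width_bits: int, burst_beats: int, address_step: int):
    """Bits written per burst, bits of address space stepped over, and the used fraction."""
    bits_written = bus_width_bits * burst_beats    # data actually transferred per burst
    bits_reserved = bus_width_bits * address_step  # space the program skips past per burst
    return bits_written, bits_reserved, bits_written / bits_reserved


print(burst_usage(32, 8, 8))  # (256, 256, 1.0)  BL8, step 8: every address used
print(burst_usage(32, 4, 8))  # (128, 256, 0.5)  BC4, step 8: half the space wasted
```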


Earlier I mentioned that the duplicate data comes from a problem with the read enable of the write-data FIFO. Now let's focus on fixing it. First look at the two figures: the upper one is the correct simulation result, and the lower one is the wrong result that produces pairs of duplicate data.

The correct read enable should stay at 1, but in the lower figure it pulses. When the read enable is 0, the FIFO output holds the previous cycle's data. That is the root cause. Having found it, we can modify the program.
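The root cause can be reproduced with a tiny model of a FIFO output that advances when ren = 1 and holds its value when ren = 0. The model assumes app_wdf_wren is high on every cycle shown, so each cycle's output is captured as a data beat, and it treats the FIFO output as updating in the same cycle (first-word-fall-through style). The function name is hypothetical.

```python
def fifo_output(words, ren_trace):
    """FIFO output seen each cycle: advances when ren = 1, holds when ren = 0."""
    it = iter(words)
    out, current = [], None
    for ren in ren_trace:
        if ren:
            current = next(it)
        out.append(current)
    return out


print(fifo_output([1, 2, 3, 4], [1, 1, 1, 1]))  # [1, 2, 3, 4]  read enable held at 1
print(fifo_output([1, 2, 3, 4], [1, 0, 1, 0]))  # [1, 1, 2, 2]  pulsed read enable: pairs repeat
```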

I deliberately made app_wdf_wren and wr_dfifo_ren synchronous, which solved the problem of identical data over two consecutive cycles. But then another problem appeared; here's a picture:

Logically, the data should end at 502. So where does everything after it come from?


Remember that the read phase also ends when the address reaches 4000? The program increments the data by one and the address by 8 each time, which implicitly assumes one data word per 8-address step. After the fix, however, each 8-address step (one burst) consumes 2 data words. So on the write side, although the address counts up to 4000, the data only fills the first 2000 addresses. The read side still reads all 4000 addresses. Only the first 2000 hold correct data; the rest still hold the erroneous data written earlier (identical adjacent pairs). That explains the picture above, and the fix is to halve the read address range!
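The "half of 4000" conclusion follows directly from the numbers in the paragraph above. The sketch assumes one data word is generated per 8-address step and that, after the fix, each BL8 burst consumes two 128-bit words.

```python
ADDR_STEP = 8        # address increment per burst (BL8)
END_ADDR = 4000      # the write and read phases stop here in this design
WORDS_PER_BURST = 2  # two 128-bit words per 256-bit burst after the fix

words = END_ADDR // ADDR_STEP             # words generated: one per 8-address step
bursts = words // WORDS_PER_BURST         # bursts those words actually fill
last_valid_addr = bursts * ADDR_STEP      # address range that really holds new data

print(words, bursts, last_valid_addr)  # 500 250 2000
```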


One last point: for the read and write FIFOs, the clocks and bit widths should be well matched, without too large a gap between the two sides. Of course, this depends on your design.


OK, here's a simulation waveform with my initial parameters.


User input/output data width 256 bits, write clock 100 MHz, app_wdf_data 128 bits, burst length 8. The read-back data meets the requirements. There are some problems with the data at the very beginning, but the rest is fine. Below is the simulation result. I was too lazy to fix the glitch at the start, so I've left it as is. Delaying the data by a few clock cycles should solve it, hehe.
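The 256-bit user side and the 128-bit app_wdf_data side mean every user word is split into two halves and reassembled on the way back. The sketch below shows that width conversion. Which half comes out first in a real asymmetric FIFO depends on the FIFO IP, so check its documentation. Here the order is simply a parameter (msb_first), and the function names are hypothetical.

```python
def split_word(word: int, in_width: int = 256, out_width: int = 128, msb_first: bool = True):
    """Split one wide word into in_width // out_width narrower words."""
    if in_width % out_width:
        raise ValueError("in_width must be a multiple of out_width")
    mask = (1 << out_width) - 1
    parts = [(word >> (i * out_width)) & mask for i in range(in_width // out_width)]  # LSB first
    return parts[::-1] if msb_first else parts


def join_words(parts, out_width: int = 128, msb_first: bool = True) -> int:
    """Reassemble narrow words produced by split_word with the same ordering."""
    ordered = parts if msb_first else parts[::-1]
    word = 0
    for p in ordered:
        word = (word << out_width) | p
    return word


hi, lo = 0xAAAA, 0x5555
w = (hi << 128) | lo
print([hex(p) for p in split_word(w)])  # ['0xaaaa', '0x5555']
```

If the reassembly on the read side assumes the opposite order from the split on the write side, the two halves of every 256-bit word come back swapped. That is worth ruling out when checking read-back data.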


Well, typing is tiring. I hope this article helps beginners, and I hope the experts will offer plenty of criticism and corrections. Thanks in advance, and let's learn from each other. If you find it useful, please give it a like! I'm also debugging some other interfaces, and if they go well, I'll keep posting updates, so stay tuned!
