The theoretical maximum read or write speed of PCIe 3.0 x4 is 4 GB/s. Ignoring protocol overhead, it can transfer 4 GB / 4 KB = 1M 4K I/Os per second, so the theoretical maximum 4K IOPS is about 1000K. Whatever medium an SSD uses underneath, whether flash or 3D XPoint, the interface is the ceiling, and the maximum IOPS cannot exceed this value.
The operating environment of this tutorial: Windows 7 system, Dell G3 computer.
Introduction to PCIe interface
PCIe has gone through several generations, and each generation is faster than the one before it.
In the Link Width row we see x1, x2, x4, and so on. What does this mean? It is the number of lanes in the PCIe connection. Think of a highway, which may have one, two, or four lanes. Highways with eight or more lanes are uncommon, but a PCIe link can have up to 32 lanes.
The PCIe connection between two devices is called a Link, as shown in the figure below:
The connection between A and B is two-way: cars can drive from A to B while other cars drive from B to A, each on its own road. Two PCIe devices have dedicated transmit and receive channels between them, so data can flow in both directions at the same time. The PCIe spec calls this working mode dual-simplex, which can be understood as full-duplex.
What is the working mode of SATA?
Like PCIe, SATA has independent transmit and receive channels, but it works differently: only one channel can carry data at any given time. If you are sending data on one channel, you cannot receive data on the other, and vice versa. This is effectively half-duplex mode. PCIe is like a mobile phone, where both parties can talk at the same time, while SATA is like a walkie-talkie: when one person is talking, the other can only listen.
Go back to the PCIe bandwidth table. The bandwidth listed there, for example 2 GB/s for PCIe 3.0 x1, is the two-way bandwidth, that is, read and write combined. For reading alone or writing alone, the value is halved, giving a read speed or write speed of 1 GB/s.
Let’s take a look at how the bandwidth in the table is calculated.
PCIe is a serial bus. The line bit rate of PCIe 1.0 is 2.5 Gb/s. The physical layer uses 8b/10b encoding, meaning that every 8 bits of data are transmitted as 10 bits on the physical line, so:
PCIe 1.0 x1 bandwidth = (2.5 Gb/s x 2 (both directions)) / 10 bits = 0.5 GB/s
This is the bandwidth of a single Lane. If there are several Lanes, then the entire bandwidth is 0.5GB/s multiplied by the number of Lanes.
The line bit rate of PCIe 2.0 is double that of PCIe 1.0, at 5 Gb/s. The physical layer also uses 8b/10b encoding, so:
PCIe 2.0 x1 bandwidth = (5 Gb/s x 2 (both directions)) / 10 bits = 1 GB/s
Similarly, the bandwidth is 1GB/s multiplied by the number of Lanes.
The line bit rate of PCIe 3.0 is not double that of PCIe 2.0. It is 8 Gb/s rather than 10 Gb/s, but the physical layer switches to 128b/130b encoding. Neglecting that small encoding overhead (about 1.5%), each byte costs roughly 8 bits on the line, so:
PCIe 3.0 x1 bandwidth = (8 Gb/s x 2 (both directions)) / 8 bits ≈ 2 GB/s
Similarly, the bandwidth is 2GB/s multiplied by the number of Lanes.
With 128b/130b encoding, every 128 bits of data carry only 2 extra bits of overhead, so the proportion of useful data on the line goes up. Although the line bit rate has not doubled, the effective data bandwidth is still roughly double that of PCIe 2.0.
Note that the bandwidth figures calculated above already account for 8b/10b or 128b/130b encoding, so there is no need to factor in line encoding again when using them.
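The three per-lane calculations above can be reproduced with a short script. This is only a sketch: the names `GENERATIONS` and `lane_bandwidth_gbytes` are made up for illustration, and the 128b/130b overhead is applied exactly rather than rounded away, which is why PCIe 3.0 x1 comes out slightly under the 2 GB/s quoted in the text.

```python
# Line rate (Gb/s per lane, per direction) and encoding (payload bits, coded bits)
# for each generation, as described in the text above.
GENERATIONS = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
}


def lane_bandwidth_gbytes(gen, lanes=1, bidirectional=True):
    """Effective bandwidth in GB/s after removing line-encoding overhead."""
    line_rate_gbps, payload_bits, coded_bits = GENERATIONS[gen]
    # Useful bits per second in one direction, then bits -> bytes.
    per_direction = line_rate_gbps * payload_bits / coded_bits / 8
    directions = 2 if bidirectional else 1
    return per_direction * directions * lanes


if __name__ == "__main__":
    for gen in GENERATIONS:
        print(f"PCIe {gen} x1: {lane_bandwidth_gbytes(gen):.3f} GB/s (both directions)")
```

Calling `lane_bandwidth_gbytes("3.0", lanes=4, bidirectional=False)` gives the one-way figure for a PCIe 3.0 x4 link, a little under 4 GB/s.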
Unlike SATA, which has a single channel, a PCIe connection can scale its bandwidth by adding lanes, which makes it very flexible. The more lanes, the higher the speed. However, more lanes also mean higher cost, more board space, and more power consumption, so the lane count is a trade-off between performance and these other factors. Considering performance alone, PCIe 3.0 x32 reaches a bidirectional bandwidth of 64 GB/s, a staggering figure. Existing PCIe SSDs, however, generally use at most 4 lanes, for example PCIe 3.0 x4, with a bidirectional bandwidth of 8 GB/s and a read or write bandwidth of 4 GB/s.
The transfer speed is several GB/s, which is great for reading and writing small movies.
Now let us calculate the theoretical maximum 4K IOPS of PCIe 3.0 x4. Its theoretical maximum read or write speed is 4 GB/s. Ignoring protocol overhead, it can transfer 4 GB / 4 KB = 1M 4K I/Os per second, so the theoretical maximum IOPS is about 1000K. Whatever medium the SSD uses underneath, whether flash or 3D XPoint, the interface is the ceiling, and the maximum IOPS cannot exceed this value.
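The arithmetic behind the 1M figure is a single division. The sketch below uses a hypothetical helper, `max_iops`, and assumes binary units (4 GiB/s and 4 KiB per I/O), which is what makes the result exactly 2^20. The second call is an added refinement, not part of the original estimate: it applies the 128b/130b line encoding to four 8 Gb/s lanes and lands a little below 1M.

```python
def max_iops(bandwidth_bytes_per_s, io_size_bytes=4096):
    """Upper bound on I/Os per second if the link carried nothing but payload."""
    return int(bandwidth_bytes_per_s // io_size_bytes)


GIB = 1024 ** 3

# The article's estimate: 4 GB/s of one-way bandwidth, 4K per I/O.
ideal = max_iops(4 * GIB)

# Refinement (assumption): 4 lanes x 8 Gb/s, 128b/130b encoding, bits -> bytes.
encoded_bytes_per_s = 4 * 8e9 * 128 / 130 / 8
with_encoding = max_iops(encoded_bytes_per_s)

if __name__ == "__main__":
    print("ideal 4K IOPS:", ideal)
    print("4K IOPS with 128b/130b overhead:", with_encoding)
```

Real devices fall further below this bound, because packet headers and other protocol overhead are still ignored here.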
PCIe evolved from PCI. The "e" in PCIe stands for express, meaning fast. How can PCIe be faster than PCI (or PCI-X)? The two differ fundamentally in physical transmission: PCI transmits data in parallel, while PCIe transmits it serially. A PCI parallel bus can transfer 32 or 64 bits in a single clock cycle, so why can it not keep up with a serial bus that transfers only 1 bit per clock cycle?
When the actual clock frequency is relatively low, the parallel port is indeed faster than the serial port because it can transmit several bits at the same time. With the development of technology, the data transmission rate is required to be faster and faster, and the clock frequency is also required to be faster and faster. However, the parallel bus clock frequency cannot be as fast as you want.
At the sending end, data is driven out on a clock edge (the first rising edge of the clock on the left), and at the receiving end it is sampled on the next clock edge (the second rising edge of the clock on the right). To sample data correctly at the receiving end, the clock period must therefore be longer than the data transmission time (the flight time from sender to receiver). Because this transmission time grows with the length of the data lines, the clock frequency cannot be pushed too high. In addition, the clock signal itself picks up a phase offset as it travels along the line (clock skew), which disturbs sampling at the receiving end. Finally, in parallel transmission the receiver must wait for the slowest bit to arrive before it can latch the whole data word (signal skew).
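The timing constraint described above can be turned into a rough upper bound on the clock frequency. The model below is a deliberate simplification, and the function name and the nanosecond values are illustrative assumptions, not figures from any PCI specification: it only requires that one clock period be at least the flight time plus the skew.

```python
def max_parallel_clock_hz(flight_time_s, skew_s=0.0):
    """Highest clock frequency whose period still covers flight time plus skew."""
    return 1.0 / (flight_time_s + skew_s)


if __name__ == "__main__":
    # Illustrative numbers only: a short bus and a bus twice as long.
    short_bus = max_parallel_clock_hz(flight_time_s=2e-9, skew_s=0.5e-9)
    long_bus = max_parallel_clock_hz(flight_time_s=4e-9, skew_s=0.5e-9)
    print(f"short bus: {short_bus / 1e6:.0f} MHz")
    print(f"long bus:  {long_bus / 1e6:.0f} MHz")
```

Doubling the flight time roughly halves the achievable clock, which is the reason the text gives for parallel buses running out of headroom.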
PCIe avoids these problems by transmitting data serially. It has no external clock signal: the clock information is embedded in the data stream through 8b/10b or 128b/130b encoding, and the receiver recovers the clock from the data stream. It is therefore not limited by the transmission time on the line, however long the wire or however high the data rate. With no external clock signal there is no clock skew, and because serial transmission sends only one bit at a time, there is no signal skew either. However, when multiple lanes are used to transmit data (parallelism within serial, so to speak), the problem returns, because the receiver again has to wait for the data on the slowest lane before it can process the whole word.
Basic knowledge of the PCIe bus
Unlike the PCI bus, the PCIe bus uses point-to-point (end-to-end) connections. Each end of a link connects to exactly one device, and the two devices act as data sender and data receiver for each other. Besides the bus link itself, the PCIe bus is organized into several layers. The sender passes data down through these layers when transmitting, and the receiver passes it back up through them when receiving. This layered structure is similar to a network protocol stack.
The PCIe link uses end-to-end data transmission. Both the sending end and the receiving end contain TX (transmit logic) and RX (receive logic), structured as shown in the figure.
As shown in the figure above, one data path (Lane) of the physical PCIe link contains two differential signal pairs, four signal lines in total. The TX component of the sender is connected to the RX component of the receiver by one differential pair. This is the sender's transmit link and, equally, the receiver's receive link. The RX component of the sender is connected to the TX component of the receiver by the other differential pair. This is the sender's receive link and, equally, the receiver's transmit link. A PCIe link can be composed of multiple Lanes.
The electrical specification for high-speed differential signals requires a capacitor in series at the transmitting end for AC coupling. This capacitor is called the AC coupling capacitor. The PCIe link transmits data using differential signals. A differential signal consists of two signals, D+ and D-. The receiver compares the difference between the two to decide whether the sender sent a logic "1" or a logic "0".
Compared with single-ended signals, differential signals are more resistant to interference, because the two traces of a differential pair must be routed with equal length and equal width, close together, and on the same layer. External noise is therefore coupled onto D+ and D- with the same value at the same time. Ideally the noise cancels completely in the difference, so it has little effect on the logic value of the signal. Differential signaling can therefore run at higher bus frequencies.
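A small numeric experiment makes the noise-rejection argument concrete. Everything here is a toy model under stated assumptions: the voltage swing, the noise values, and the function names are invented for illustration, and the noise is assumed to hit both wires identically (the ideal case described above).

```python
def transmit_differential(bit, swing=0.4):
    """Return (d_plus, d_minus) voltages for one bit; illustrative levels only."""
    half = swing / 2
    return (half, -half) if bit else (-half, half)


def receive_differential(d_plus, d_minus):
    """The receiver decides on the difference, so common-mode noise cancels."""
    return 1 if (d_plus - d_minus) > 0 else 0


def receive_single_ended(voltage, threshold=0.0):
    """A single-ended receiver compares one wire against a fixed threshold."""
    return 1 if voltage > threshold else 0


bits = [1, 0, 1, 1, 0]
noise = [0.3, -0.5, 0.8, -0.3, 0.6]  # same noise added to both wires

differential_rx = []
single_ended_rx = []
for bit, n in zip(bits, noise):
    d_plus, d_minus = transmit_differential(bit)
    differential_rx.append(receive_differential(d_plus + n, d_minus + n))
    # Single-ended case: only the D+ wire, with the same noise on it.
    single_ended_rx.append(receive_single_ended(d_plus + n))

if __name__ == "__main__":
    print("sent:        ", bits)
    print("differential:", differential_rx)
    print("single-ended:", single_ended_rx)
```

The differential receiver recovers every bit, while the single-ended receiver misreads the bits where the noise is larger than the signal level.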
In addition, differential signals effectively suppress electromagnetic interference (EMI). The differential signals D+ and D- run very close together and have equal amplitude and opposite polarity, so the electromagnetic fields coupled between each wire and ground are equal in magnitude and cancel each other out. As a result, a differential signal radiates less interference to the outside world. The drawbacks of differential signaling are also clear. First, it uses two signals to carry one bit of data. Second, its routing requirements are relatively strict.
A PCIe link can be composed of multiple Lanes. Currently, a PCIe link can support 1, 2, 4, 8, 12, 16 and 32 Lanes, namely ×1, ×2, ×4, ×8, ×12, ×16 and ×32 width PCIe links. The bus frequency used on each Lane depends on the version of the PCIe bus in use.
The first PCIe bus specification was V1.0, followed by V1.0a, V1.1, V2.0 and V2.1. When this material was originally written, V2.1 was the latest specification and V3.0 was still under development, with release expected in 2010. Different PCIe bus specifications define different bus frequencies and link coding methods, as shown in Table 4-1.
The relationship between PCIe bus specification and bus frequency and coding
PCIe bus specification | Bus frequency | Peak bandwidth of a single Lane | Encoding method
---|---|---|---
1.x | 1.25GHz | 2.5GT/s | 8/10b encoding
2.x | 2.5GHz | 5GT/s | 8/10b encoding
3.0 | 4GHz | 8GT/s | 128/130b encoding
As the table above shows, although the bus frequency used by the V3.0 specification is only 4GHz, its effective bandwidth is twice that of V2.x. Taking the V2.x specification as an example, Table 4-2 shows the peak bandwidth that PCIe links of different widths can provide.
Peak bandwidth of PCIe bus
PCIe link width | ×1 | ×2 | ×4 | ×8 | ×12 | ×16 | ×32
---|---|---|---|---|---|---|---
Peak bandwidth (GT/s) | 5 | 10 | 20 | 40 | 60 | 80 | 160
On the PCIe bus, the peak bandwidth of a link is expressed in GT/s (gigatransfers per second). It is calculated as bus frequency × data bit width × 2, where the data bit width is the width of the link.
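The formula bus frequency × data bit width × 2 is enough to regenerate the peak-bandwidth figures for V2.x. In this sketch the dictionary `BUS_FREQ_GHZ` and the function `peak_gt_per_s` are illustrative names, and the data bit width is taken to be the link width (number of Lanes), which is the reading that matches the table values.

```python
# Bus frequency per specification, in GHz, as listed in the specification table.
BUS_FREQ_GHZ = {"1.x": 1.25, "2.x": 2.5, "3.0": 4.0}

LINK_WIDTHS = [1, 2, 4, 8, 12, 16, 32]


def peak_gt_per_s(spec, width):
    """Peak bandwidth in GT/s: bus frequency x data bit width x 2."""
    return BUS_FREQ_GHZ[spec] * width * 2


if __name__ == "__main__":
    for width in LINK_WIDTHS:
        print(f"V2.x x{width}: {peak_gt_per_s('2.x', width):g} GT/s")
```

Swapping `"2.x"` for `"1.x"` or `"3.0"` gives the corresponding rows for the other specifications.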
In the PCIe bus, there are many factors that affect the effective bandwidth, so its effective bandwidth is difficult to calculate. Despite this, the effective bandwidth provided by the PCIe bus is still much higher than that of the PCI bus. The PCIe bus also has its weaknesses, the most prominent of which is transmission latency.
The PCIe link transmits data serially. Inside the chip, however, the data bus is still parallel, so the PCIe link interface has to perform serial-to-parallel conversion, and this conversion introduces considerable delay. In addition, data packets on the PCIe bus must pass through the transaction layer, the data link layer and the physical layer, and each of these layers adds further delay.
Among devices based on the PCIe bus, ×1 PCIe links are the most common, while ×12 PCIe links are rare, and ×4 and ×8 PCIe devices are also rare. Intel usually integrates multiple ×1 PCIe links in the ICH to connect low-speed peripherals, and integrates a ×16 PCIe link in the MCH to connect the graphics card controller. PowerPC processors usually support ×8, ×4, ×2, and ×1 PCIe links.
Data transmission between physical links on the PCIe bus uses a clock-based synchronous transmission mechanism, but there is no clock line on the physical link. The receiving end of the PCIe bus contains a clock recovery module CDR (Clock Data Recovery). CDR will extract the receive clock from the received message to perform synchronous data transfer.
It is worth noting that, besides extracting the clock from the message, a PCIe device also uses the REFCLK+ and REFCLK- signal pair as its local reference clock.