
PCI Express Packet Analysis with Downstream Port Model and PIO Example Design

Deepesh Man Shakya

Internal and Unofficial document.


To be used as a reference only.

Page 1 of 66
Table of Contents
Introduction
PIO Example Design
Downstream Port Model
Files Hierarchy
PIO Example Design Schematics
Downstream Port Model Architecture
    Xilinx_pci_exp_dsport.vhd
    RX_APP (pci_exp_usrapp_rx.vhd)
        PROC_READ_DATA
        PROC_DECIPHER_FRAME
        PROC_3DW/PROC_4DW
        PROC_PARSE_FRAME
    TX_APP (pci_exp_usrapp_tx.vhd)
        pio_writeReadBack_test0 (tests.vhd)
        test_interface.vhd
        PROC_SYSTEM_INITIALIZATION
        PROC_BAR_INIT
        PROC_BAR_SCAN
        PROC_BUILD_PCIE_MAP
        PROC_BAR_PROGRAM
        PROC_TX_SYNCHRONIZE
        PROC_TX_TYPE0_CONFIGURATION_WRITE
        PROC_TX_TYPE1_CONFIGURATION_READ
        PROC_READ_DATA / PROC_PARSE_FRAME / PROC_DECIPHER_FRAME / PROC_4DW
        PROC_TX_IO_WRITE
        PROC_TX_MEMORY_WRITE_32
        PROC_WAIT_FOR_READ_DATA
PIO Example Design Packet Analysis
    Fmt and Type
    Configuration Write
    Configuration Write Completion
    Memory Write 64 / Memory Read 64 / Completion
    Memory Write 32 / Memory Read 32 / Completion
Limitations and Features of Downstream port model
Limitations of PIO Example Design
Playing with Packets
    Length Parameter Modification
    Poisoning Memory Read TLP
    Poisoning Configuration Write Request
    Testing with a new TLP
Appendix
    Generic TLP Header Fields
    IO Request Header Format
    Memory Request Header Format
    Configuration Request Header Format
    Message Request Header Format
    Completion Header Format
    Few points to note
References

Introduction
This document discusses the PIO example design and the downstream port model that are generated along with
the PCI Express Block Plus core. The main goal of this document is to provide detailed information on the
architecture of the PIO example design and the simulation setup that includes the downstream port model.

The PIO example design simulation emulates the packet transactions between a real root complex and the
endpoint. We will look at how the initialization process takes place, how configuration transactions are initiated
by the downstream port model, and how normal Memory Read, Memory Write, and IO Read/Write transactions are
initiated by the host. We will also look at how the endpoint generates completion packets.

The latter part of this document goes through the packet analysis of the TLPs generated by the downstream port
model and the corresponding completions generated by the endpoint example design.

PIO Example Design


The PIO example design is a simple target-only application that interfaces with the Endpoint for PCIe core's
Transaction (TRN) interface.

Following are the main features of the PIO example design:

• Four transaction-specific 2 KB target regions using the internal Xilinx FPGA Block RAMs, providing a total
target space of 8192 bytes.
• Supports single DWORD payload Read and Write PCI Express transactions to 32/64 bit address memory
spaces and IO space with support for completion TLPs.
• Utilizes the core's trn_rbar_hit_n[6:0] signals to differentiate between TLP destination Base Address
Registers.
• Provides separate implementations optimized for 32-bit and 64-bit TRN interfaces.

The following block diagram shows the PIO example design components:

Downstream Port Model
The downstream port model acts as a root complex, but it is not a complete root complex. The model represents only
the downstream port interface, which allows link training with the endpoint; a real root complex provides much
more functionality than this. The downstream port model and the provided testbench supply just enough tools to
perform reads and writes to the user design. The model is not a full-blown simulation model like a true
BFM available from third-party vendors. However, it provides enough functionality for basic testing of the user
design.

The downstream port model provides a mechanism for generating downstream PCI Express TLP traffic to access the
user application. It also provides a mechanism to receive upstream PCI Express TLP traffic from the customer
design in a simulation environment.

The downstream port model initializes the core's configuration space, creates TLP transactions, generates TLP logs,
and provides an interface for creating and verifying tests.

The following diagram shows the high-level architecture of the downstream port model.

Files Hierarchy
The following screenshots show the file hierarchy for the PIO example design simulation setup. All the files are
generated when the PCI Express Block Plus core is generated. This hierarchy was captured by creating an
ISE project from the VHDL files provided in the example design and the downstream port model.

PIO Example Design Architecture
The following diagram shows the top-level schematic of the PIO example design.

The following schematic shows the block instantiations inside the above block.

The following schematic gives the top level view of the PIO example design user application.

The following schematic gives the top level view of the PCIe endpoint block plus core.

The following schematic shows the PIO Module inside the user application block.

The rest of the IOs in the user application block are tied off as shown below:

The following schematic shows a block (pio_ep) inside the PIO interface.

Inside PIO_EP, a memory block and the RX and TX engines are defined as shown below:

The following schematic shows the close-up view of the EP_MEM_ACCESS module.

Inside this module there is a module called EP_MEM, in which all the memory blocks used in the PIO
example design are instantiated. The PIO_EP_MEM_ACCESS module processes data written to the
memory from incoming Memory and IO Write TLPs and provides data read from the memory in response to
Memory and IO Read TLPs. The EP_MEM module processes 1 DWORD 32-bit and 64-bit addressable Memory and
IO Write requests based on the information received from the RX Engine. The following schematic shows the
internals of the EP_MEM_ACCESS module.

The following schematic shows the internals of the EP_MEM module.

The following schematic shows the close-up view of the EP_RX_64 (Receive Engine) block.

The following schematic shows the close-up view of the TX engine block. The PIO_64_TX_ENGINE
(PIO_32_TX_ENGINE) module generates completions for received Memory and IO Read TLPs. The PIO design
does not generate outbound read or write requests. However, users can add this functionality to further customize
the design.

The following table shows the inputs required by the TX engine to generate completion packets.

After the completion is sent, the TX engine asserts the compl_done_i output indicating to the RX engine that it can
assert trn_rdst_rdy_n and continue receiving TLPs.

Downstream Port Model Architecture


The downstream port model consists of the following components:

Xilinx_pci_exp_dsport.vhd

This block essentially acts as a root complex. However, the model shouldn't be strictly treated as a root complex as
it doesn't provide many features that a real root complex would normally provide. The endpoint PCIe block plus at
the user side transmits TLPs across the PCI express link to the Downstream Port (dsport) model. The dsport and the
PCIe block plus core are responsible for the data link layer and physical layer processing when communicating
across the PCI Express fabric.

dsport_cfg configures the downstream port model.

RX_APP (pci_exp_usrapp_rx.vhd)

The following procedures are defined in RX_APP:

1. PROC_READ_DATA
2. PROC_DECIPHER_FRAME
3. PROC_3DW
4. PROC_4DW
5. PROC_PARSE_FRAME

PROC_READ_DATA

This procedure reads the receive transaction data bus (trn_rd) and stores it in frame_store_rx, as shown below:

PROC_DECIPHER_FRAME

This procedure extracts the information from the data collected by PROC_READ_DATA as shown below:

PROC_3DW/PROC_4DW

These procedures print the frame information to the output log as shown below:

PROC_PARSE_FRAME

PROC_PARSE_FRAME calls PROC_DECIPHER_FRAME, PROC_4DW and PROC_3DW (the last two write to the
tx.dat and rx.dat files).

The following code gives the RX_APP state machine:

TX_APP (pci_exp_usrapp_tx.vhd)

The usrapp_tx block sends TLPs to the dsport block for transmission across the PCI Express Link to the Endpoint
DUT. Transaction sequences or test programs are initiated by the usrapp_tx block to stimulate the endpoint device's
fabric interface.

All test programs are defined inside test_interface.vhd. All transaction sequences are defined in the tests.vhd file.

The tests that you can perform depend on whether you generate the VHDL or the Verilog version of the core.
The following table gives the details of the entire test suite that you can run with the downstream port model.

We will discuss the test flow with reference to the pio_writeReadBack_test0 test, which is available in both the
Verilog and VHDL versions of the generated core.

All downstream port model tests follow the same six steps, listed below:

1. Perform conditional comparison of a unique test name
2. Set up a master timeout in case the simulation hangs
3. Wait for reset and link-up
4. Initialize the configuration space of the endpoint
5. Transmit and receive TLPs between the Downstream Port Model and the Endpoint DUT
6. Verify that the test succeeded

The entire source code relating to TX_APP is presented here, along with the description from the user guide (in
the form of screenshots) and a description of the procedures where needed. The main objective behind
publishing the source code for all procedures defined in TX_APP is to allow readers to understand the
working mechanism of the PIO example design without needing to generate the core and browse through the
source code and the user guide.

pio_writeReadBack_test0 (tests.vhd)

This section presents the entire source code for the pio_writeReadBack_test0 test provided in the downstream
port model. The code involves a number of procedure calls. Source code is presented for each procedure used in
the test.

PROC_SYSTEM_INITIALIZATION will cause the test program to wait for the system reset to deassert as well as
the endpoint's trn_lnk_up_n signal to assert. This is an indication that the endpoint is ready to be configured by the
test program via the Downstream Port Model.

PROC_BAR_INIT will perform a series of Type 0 Configuration Writes and Reads to the Endpoint core's PCI
Configuration Space, determine the memory and IO requirements of the endpoint, and then program the endpoint's
Base Address Registers so that it is ready to receive TLPs from the Downstream Port Model.

In the following part of the source code, the test program cycles through all of the endpoint's
BARs and determines whether they are enabled and, if so, their type (for example, Mem32, Mem64, or
IO).

Whether a BAR is enabled is checked by probing the BAR_ENABLED[] global array. A non-zero value
indicates that the corresponding BAR is enabled. If the BAR is not enabled, the test program flow moves on to
check the next BAR. The previous call to PROC_BAR_INIT performed the necessary configuration TLP
communication with the endpoint device and filled in the appropriate values in the BAR_ENABLED[] array.

If the array element is enabled (that is, non-zero), the element's value indicates the BAR type. Values of 1, 2, and 3
indicate IO, Memory 32, and Memory 64 spaces, respectively.
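
The BAR_ENABLED[] encoding described above can be illustrated with a minimal sketch. It is written in Python rather than the model's VHDL, purely for illustration; the function name, the dictionary and the example array values are hypothetical, and only the 0/1/2/3 encoding is taken from the text.

```python
# Hypothetical stand-in for the model's BAR_ENABLED[] global array.
# Encoding taken from the text: 0 = disabled, 1 = IO, 2 = Memory 32, 3 = Memory 64.
BAR_TYPE_NAMES = {1: "IO", 2: "MEM32", 3: "MEM64"}


def plan_bar_tests(bar_enabled):
    """Return (bar_index, space_name) for every enabled BAR, in scan order."""
    plan = []
    for index, kind in enumerate(bar_enabled):
        if kind == 0:
            continue  # BAR not enabled: move on to the next one
        plan.append((index, BAR_TYPE_NAMES[kind]))
    return plan


# Illustrative values only: BAR0 as Memory 64, BAR2 as Memory 32, the rest disabled.
print(plan_bar_tests([3, 0, 2, 0, 0, 0, 0]))  # [(0, 'MEM64'), (2, 'MEM32')]
```

In the real test, each enabled entry would lead to the matching write/read procedure pair (IO, Memory 32 or Memory 64) instead of being collected into a list.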

test_interface.vhd

All the procedures called in tests.vhd are defined in test_interface.vhd. The following procedures are defined in
this file:

PROC_SYSTEM_INITIALIZATION
PROC_BAR_INIT
PROC_BAR_SCAN
PROC_BUILD_PCIE_MAP
PROC_DISPLAY_PCIE_MAP
PROC_BAR_PROGRAM
PROC_TX_SYNCHRONIZE
PROC_TX_TYPE0_CONFIGURATION_WRITE
PROC_TX_TYPE1_CONFIGURATION_READ
PROC_READ_DATA
PROC_PARSE_FRAME
PROC_DECIPHER_FRAME
PROC_4DW
PROC_TX_IO_WRITE
PROC_TX_IO_READ
PROC_TX_MEMORY_WRITE_32
PROC_TX_MEMORY_READ_32
PROC_WAIT_FOR_READ_DATA
PROC_TX_MEMORY_WRITE_64
PROC_TX_MEMORY_READ_64

PROC_SYSTEM_INITIALIZATION

This procedure waits for transaction interface reset and linkup between the Downstream Port Model and the
Endpoint DUT. This task must be invoked prior to the Endpoint core initialization.

PROC_BAR_INIT

PROC_BAR_INIT will perform a series of Type 0 Configuration Writes and Reads to the Endpoint core's PCI
Configuration Space, determine the memory and IO requirements of the endpoint, and then program the endpoint's
Base Address Registers so that it is ready to receive TLPs from the Downstream Port Model.

PROC_BAR_SCAN

This procedure performs a sequence of PCI Type 0 Configuration Writes and Configuration Reads using the PCI
Express fabric in order to determine the memory and IO requirements for the Endpoint. The task stores this
information in the global array BAR_RANGE[]. This task should only be called after
PROC_SYSTEM_INITIALIZATION.
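
As background for the scan, here is a sketch of how a BAR value read back after an all-ones write is conventionally interpreted under the PCI rules. It is not taken from PROC_BAR_SCAN itself; the function name is hypothetical, and the sketch only looks at a single 32-bit BAR register (for a 64-bit BAR it assumes the region is smaller than 4 GB, so the lower DWORD alone gives the size).

```python
def decode_bar_probe(readback):
    """Classify a 32-bit BAR value read back after writing all ones.

    Returns (space, size_in_bytes), or None when the BAR is not implemented.
    """
    readback &= 0xFFFFFFFF
    if readback == 0:
        return None                      # an unimplemented BAR reads back as zero
    if readback & 0x1:                   # bit 0 set: IO space
        mask = readback & 0xFFFFFFFC     # bits [1:0] are not address bits
        space = "IO"
    else:                                # bit 0 clear: memory space
        mask = readback & 0xFFFFFFF0     # bits [3:0] hold type and prefetch flags
        space = "MEM64" if (readback >> 1) & 0x3 == 0x2 else "MEM32"
    size = ((~mask) & 0xFFFFFFFF) + 1    # the lowest writable address bit gives the size
    return space, size


print(decode_bar_probe(0xFFFFF800))  # ('MEM32', 2048)
print(decode_bar_probe(0xFFFFF804))  # ('MEM64', 2048)
```

The 2048-byte result corresponds to the 2 KB target regions of the PIO design; the readback values themselves are made up for the example.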

PROC_BUILD_PCIE_MAP

[From User Guide] Performs memory/IO mapping algorithm and allocates Memory 32, Memory 64, and IO space
based on the Endpoint requirements. This task has been customized to work in conjunction with the limitations of
the PIO design and should only be called after completion of PROC_BAR_SCAN.

[From Source Code] This procedure checks whether BAR_RANGE has been defined for each BAR. If it has, that BAR is
enabled; if not, the BAR is disabled.

PROC_BAR_PROGRAM

PROC_TX_SYNCHRONIZE

The main function of this procedure is to synchronize TLP transmission with the trn_clk and trn_tdst_rdy_n signals.
Before a TLP is transferred, it waits for a positive edge of trn_clk and for trn_tdst_rdy_n to be asserted.

PROC_TX_SYNCHRONIZE calls PROC_READ_DATA and PROC_PARSE_FRAME.

PROC_PARSE_FRAME calls PROC_DECIPHER_FRAME, PROC_4DW and PROC_3DW. This chain of
procedures is called to log the outgoing TLPs to the output log (i.e. tx.dat). This is shown in the following source
code snippet.

PROC_TX_TYPE0_CONFIGURATION_WRITE

This procedure sends a Type 0 PCI Express Config Write TLP from Downstream Port Model to reg_addr_ of
Endpoint DUT with tag_ and first_dw_be_ inputs. Completion returned from Endpoint DUT will use contents of
global COMPLETE_ID_CFG as completion ID.

Inputs to this procedure are as follows:

First, PROC_TX_SYNCHRONIZE is called. The first call only synchronizes with the trn_clk and trn_tdst_rdy_n
signals. After that, the TLP information is put into trn_td_c. The second PROC_TX_SYNCHRONIZE call
synchronizes with the signals and also logs the TLP information into a local buffer to be parsed and sent to the output log.
The whole outgoing TLP is sent to the output log after the last TLP data has been transmitted.

PROC_TX_TYPE1_CONFIGURATION_READ

This procedure sends a Type 1 PCI Express Config Read TLP from Downstream Port Model to reg_addr_ of
Endpoint DUT with tag_ and first_dw_be_ inputs. CplD returned from Endpoint DUT will use contents of global
COMPLETE_ID_CFG as completion ID.

The definition of this procedure is the same as that of PROC_TX_TYPE0_CONFIGURATION_WRITE.

PROC_READ_DATA / PROC_PARSE_FRAME / PROC_DECIPHER_FRAME / PROC_4DW

These procedures are the general procedures used to log the TLP information in the output log i.e. tx.dat and rx.dat.
The description for these procedures has been provided in the RX_APP section.

PROC_TX_IO_WRITE

This procedure sends a PCI Express IO Write TLP from Downstream Port Model to IO address addr_[31:2] of
Endpoint DUT. CplD returned from Endpoint DUT will use contents of global COMPLETE_ID_CFG as
completion ID.

The code snippet for PROC_TX_IO_WRITE is as follows:

PROC_TX_MEMORY_WRITE_32

The inputs for this procedure are as follows:

The header format for 32 bit address memory write TLP is as follows:

The code snippet for PROC_TX_MEMORY_WRITE_32 is shown below:

PROC_WAIT_FOR_READ_DATA

This procedure waits for the next completion with data TLP that was sent by the Endpoint DUT. On successful
completion, the first DWORD of data from the CplD will be stored in the global P_READ_DATA. This task
should be called immediately following any of the read tasks in the TPI that request Completion with Data TLPs to
avoid any race conditions.

By default this task will locally time out and terminate the simulation after 1000 transaction interface clocks. The
global cpld_to_finish can be set to zero so that local time out returns execution to the calling test and does not
result in simulation timeout. For this case test programs should check the global cpld_to, which when set to one
indicates that this task has timed out and that the contents of P_READ_DATA are invalid.
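
The timeout behaviour described above can be sketched as follows. This is a Python illustration under stated assumptions, not the VHDL procedure: the function name and the `completions` iterable are hypothetical, one iteration stands for one transaction interface clock, and raising an exception stands in for terminating the simulation.

```python
def wait_for_read_data(completions, timeout_clocks=1000, cpld_to_finish=True):
    """Poll one item per clock from `completions` until a CplD payload shows up.

    `completions` is any iterable yielding None (nothing received on this clock)
    or an int (the first DWORD of a CplD). Returns (p_read_data, cpld_to).
    """
    clocks = iter(completions)
    for _ in range(timeout_clocks):
        data = next(clocks, None)
        if data is not None:
            return data, 0          # success: data is valid, timeout flag is clear
    if cpld_to_finish:
        # Stand-in for the model ending the simulation on a local timeout.
        raise TimeoutError("no CplD within %d clocks" % timeout_clocks)
    return None, 1                  # caller must check cpld_to before using the data


# A completion arriving on the third clock.
print(wait_for_read_data([None, None, 0x64636261]))  # (1684234849, 0)
```

With cpld_to_finish set to false, the caller gets control back and is expected to test the returned flag before trusting the data, which mirrors the role of cpld_to and P_READ_DATA in the text.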

This procedure is called from the main test (pio_writeReadBack_test0) code as follows:

PIO Example Design Packet Analysis


In the previous section a complete description of the pio_writeReadBack_test0 test was given. In this section a
detailed TLP packet analysis is presented by simulating the example design with the
pio_writeReadBack_test0 downstream port model test. To recap the working mechanism of this test, a
flow chart is presented below:

Fmt and Type

In packet analysis, it is important to understand what the Fmt and Type values in the packet indicate. The following
table shows the Fmt and Type encodings:
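
For the TLP kinds that appear in this document, the encodings can also be captured in a small lookup. The sketch below is in Python for illustration; the function and table names are hypothetical, and the table covers only a subset of the encodings defined by the PCI Express Base Specification (Fmt in bits [6:5] and Type in bits [4:0] of the first header byte).

```python
# (Fmt[1:0], Type[4:0]) -> TLP kind, limited to the kinds discussed in this document.
TLP_KINDS = {
    (0b00, 0b00000): "MRd32",
    (0b01, 0b00000): "MRd64",
    (0b10, 0b00000): "MWr32",
    (0b11, 0b00000): "MWr64",
    (0b00, 0b00010): "IORd",
    (0b10, 0b00010): "IOWr",
    (0b00, 0b00100): "CfgRd0",
    (0b10, 0b00100): "CfgWr0",
    (0b00, 0b00101): "CfgRd1",
    (0b10, 0b00101): "CfgWr1",
    (0b00, 0b01010): "Cpl",
    (0b10, 0b01010): "CplD",
}


def decode_first_byte(byte0):
    """Split the first header byte into (kind, header length in DW, has payload)."""
    fmt = (byte0 >> 5) & 0x3
    tlp_type = byte0 & 0x1F
    header_dwords = 4 if fmt & 0x1 else 3   # Fmt bit 0 selects a 4DW header
    has_data = bool(fmt & 0x2)              # Fmt bit 1 marks a TLP with data
    name = TLP_KINDS.get((fmt, tlp_type), "unknown")
    return name, header_dwords, has_data


for first_byte in (0x44, 0x60, 0x20, 0x40, 0x00, 0x4A):
    print(hex(first_byte), decode_first_byte(first_byte))
```

The six bytes in the loop are the leading header bytes that the packet analysis in this document deals with: configuration write, 64-bit memory write and read, 32-bit memory write and read, and completion with data.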

Configuration Write

The following waveform shows the zoomed out view of the example design simulation output with the default
settings.

The first TLP initiated by the dsport model is a configuration write transaction. If you open tx.dat, you will
see the following information for the first configuration write transaction:

This configuration write transaction is issued from PROC_BAR_PROGRAM. The first call to
PROC_TX_TYPE0_CONFIGURATION_WRITE programs BAR0. The code snippet with the parameters for
the first call is shown below. Our main goal here is to trace the parameters in tx.dat and in the simulation
waveform.

Let’s bring in the PROC_TX_TYPE0_CONFIGURATION_WRITE definition:

Let’s look at the header format for the configuration write transaction:

Based on the source code provided for the configuration write call and the definition of the procedure, let’s track
down the parameter values for the above header format.

R = 0
Fmt = 10
Type = 00100
R = 0
TC = 000
Reserved = 0000
TD = 0
EP = 0
Attr = 00
R = 00
Length = 0000000001
Requester ID = COMPLETER_ID_CFG (this value is a global constant definition)
Tag = 0f (passed in the PROC_TX_TYPE0_CONFIGURATION_WRITE call)
Last DW BE = 0000
1st DW BE = f (passed in the PROC_TX_TYPE0_CONFIGURATION_WRITE call)
Bus Number / Device Number / Function Number = COMPLETER_ID_CFG
Reserved = 0000
Ext Reg Number / Register Number = x010 (provided in the procedure call)
The reg_data is the content to be programmed into BAR (0) as shown below:

reg_data(7 downto 0) &
reg_data(15 downto 8) &
reg_data(23 downto 16) &
reg_data(31 downto 24);
Here BAR(0) = reg_data.
Therefore,
reg_data(7 downto 0) = 0x00
reg_data(15 downto 8) = 0x00
reg_data(23 downto 16) = 0x00
reg_data(31 downto 24) = 0x10
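
The field values listed above can be packed into header DWORDs to see what to expect on trn_td. The sketch below is a Python illustration with a hypothetical function name. It assumes COMPLETER_ID_CFG is 0x01A0, the ID value quoted from the waveform elsewhere in this document, and that BAR(0) is 0x10000000, as implied by the byte values above.

```python
def cfg_write0_dwords(requester_id, tag, first_be, completer_id, reg_addr, reg_data):
    """Pack a Type 0 Configuration Write into three header DWORDs plus payload."""
    dw0 = (0b10 << 29) | (0b00100 << 24) | 1              # Fmt=10, Type=00100, Length=1
    dw1 = (requester_id << 16) | (tag << 8) | first_be    # Last DW BE stays 0000
    dw2 = (completer_id << 16) | (reg_addr & 0xFFC)       # ext reg + reg number, low 2 bits reserved
    # Same byte reordering as the reg_data(7 downto 0) & ... & reg_data(31 downto 24) snippet.
    payload = int.from_bytes(reg_data.to_bytes(4, "big"), "little")
    return [dw0, dw1, dw2, payload]


dwords = cfg_write0_dwords(0x01A0, 0x0F, 0xF, 0x01A0, 0x010, 0x10000000)
print([format(d, "08X") for d in dwords])
# ['44000001', '01A00F0F', '01A00010', '00000010']
```

On a 64-bit TRN interface these four DWORDs would occupy two trn_td beats, which is why the waveform is examined as a first and a second 64-bit word.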

Now let’s check whether the above parameter values are reflected in the waveform.

The first 64 bits of trn_td, in the binary representation of the waveform, are as follows:

The second 64 bits of trn_td, in the binary representation of the waveform, are as follows:

If we break this down and put it in the header format above, we get the following:

There are nine configuration writes in total, which can also be seen in the waveform:

Configuration Write Completion

The yellow box in the waveform below shows the configuration write completions. Since there were 9
configuration writes, there are 9 corresponding configuration write completions.

Let’s check rx.dat for the output log of the first configuration write completion.

Let’s look at the completion header format:

If we zoom in on the waveform to trace the first configuration write completion TLP, we get the following:

In the waveform, the Completer ID and the Requester ID are the same, i.e. 01A0. The important field is the Tag. In the
first configuration write the Tag was 0F. We see the same Tag in the completion as well, indicating that it is the
completion for that specific configuration write.

Now let’s look at the second configuration write log and the corresponding configuration write completion log and
verify that both have the same Tag.

From tx.dat

From rx.dat

Memory Write 64 / Memory Read 64 / Completion

Let’s explore the Memory_Write_64 TLP as shown in the yellow box below:

A close-up view is as shown below. It is broken into two parts for clarity.

The first two hexadecimal digits (60) indicate that this TLP is a Memory Write TLP with a 4DW header, i.e. the write
TLP addresses a memory location with a 64-bit address.

Before going further, let’s check how the BARs were configured. You can see this in the ModelSim console.

As you can see here, in our current example design configuration the first BAR has been enabled for 64-bit
addressing. Hence, the second BAR is automatically disabled. The third BAR, i.e. BAR 2, has been enabled to map to
a 32-bit memory address space. If you refer to the flow chart presented above, the first TLP to be sent is generated by
PROC_TX_MEMORY_WRITE_64. Let’s explore how this procedure has been defined.

PROC_TX_MEMORY_WRITE_64 procedure has been called in pio_writeReadBack_test0 as follows:

Now let’s take a look at 4DW Memory Request header:

Let’s check the corresponding parameters for the TLP as assigned in PROC_TX_MEMORY_WRITE_64 and the
input parameters passed during this procedure call.

R = 0
Fmt = 11
Type = 00000
R = 0
TC = 000 (passed during the procedure call)
Reserved = 0000
TD = 0
EP = 0
Attr = 00
R = 00
Length = 0000000001 (passed during the procedure call)
Requester ID = COMPLETER_ID_CFG (this value is a global constant definition)
Tag = 04 (passed during the procedure call)
Last DW BE = 0000 (passed during the procedure call)
1st DW BE = 0xf (passed during the procedure call)
Bus Number / Device Number / Function Number = COMPLETER_ID_CFG
Address [63:32] = BAR (1) = 0x20000000
Address [31:2] = BAR (0) = 0x10000000
R = 00

Let’s put the first 32 bits from the above parameters together and see whether this matches what we see in the
waveform:

0110_0000_0000_0000_0000_0000_0000_0001

From the waveform we get:

Let’s look at the second 32 bits from the above parameter values, this time in hexadecimal.

01A0_04_0_f = 01A0040f

Now, let’s look in the waveform:

The next 64 bits are the memory address which, from the above parameter definition, should be:

20000000_10000000

In the waveform we see the same value:

In this TLP, the payload length is specified as ‘1’. Therefore, a payload of 1DW is attached to
this TLP.
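
The header comparisons above can be reproduced with a short sketch that packs the listed field values into the four header DWORDs. It is a Python illustration with a hypothetical function name, and it assumes the Requester ID is 0x01A0, as in the hexadecimal value 01A0040f quoted above.

```python
def mem_write64_header(requester_id, tag, last_be, first_be,
                       addr_hi, addr_lo, length=1, tc=0):
    """Pack a 4DW Memory Write request header (Fmt=11, Type=00000)."""
    dw0 = (0b11 << 29) | (tc << 20) | (length & 0x3FF)    # TD, EP and Attr left at 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    dw2 = addr_hi                                         # Address[63:32]
    dw3 = addr_lo & 0xFFFFFFFC                            # Address[31:2], low 2 bits reserved
    return [dw0, dw1, dw2, dw3]


header = mem_write64_header(0x01A0, 0x04, 0x0, 0xF, 0x20000000, 0x10000000)
print([format(d, "08X") for d in header])
# ['60000001', '01A0040F', '20000000', '10000000']
```

The first DWORD is the binary pattern 0110_0000_0000_0000_0000_0000_0000_0001 written in hexadecimal, and the last two DWORDs together form the 64-bit address.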

In pio_writeReadBack_test0, DATA_STORE is assigned values as follows:

Now let’s go back to PROC_TX_MEMORY_WRITE_64 definition

………
i := i+8;

The first condition in the ‘if’ statement is true; therefore DATA_STORE (4,5,6,7) are filled with ‘0’. The while loop
runs only once. The above source code transfers 64 bits of data, with the first 32 bits being “64636261” and the rest ‘0’.

Let’s see in the waveform if the data is correctly transferred or not:

The output log in tx.dat for this TLP is as shown below:

The yellow box in the following waveform shows the TLP sent from the dsport reaching the trn receive interface at
the endpoint.

The zoomed in view is as shown below:

The data in the yellow box is garbage data. Since the payload length is 1 (i.e. 1DW), this value is
ignored.

In pio_writeReadBack_test0, the Write TLP is followed by a Read TLP to make sure that the value was written
correctly. In the waveform below, the blue box is the Write TLP discussed above and the yellow box is the Read
TLP (this can be verified by checking the first two hexadecimal digits, which are ‘20’ here).

The output log for this read TLP in the tx.dat is as follows:

Memory read is a non-posted transaction. Therefore, a completion packet should be sent upstream by the
endpoint example design. This completion packet should be visible at the trn transmit interface of the endpoint
example design and also at the trn receive interface of the dsport model. Let’s trace the packet at these two
interfaces in the simulation waveform.

In the waveform below, the yellow box shows the completion packet at the user’s side.

By zooming into the yellow box we see:

Before analyzing this packet let’s check the format of the completion packet.

The first 8 bits of the completion packet, i.e. ‘4A’, indicate that it is a ‘Completion with Data’ packet. Let’s look at the
output log for the received completion packet:

As you can see in the waveform, the correct value is returned in the completion packet. The main interest here is
the Tag value. In the Memory read packet the Tag value was 0x05 (shown below for reference)

If we check the Tag value in the completion packet it is the same i.e. 0x05 indicating that this completion belongs
to the memory read packet that was sent earlier.

Also the completion status field is 0x0 indicating that completion is successful:

The yellow box in the completion packet marks the byte count field.

Byte count is defined as follows:

Since the one-DWORD request still needs four bytes of data, which follow these first 64 bits of the completion, the value of 4 here is correct.
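As a cross-check of the Byte Count value, the following Python sketch computes the Byte Count that the first completion of a Memory Read should carry, from the request's Length and byte enables. It follows the byte-count rules of the PCI Express Base Specification as understood here; the function names are hypothetical, and it is meant as an aid for reading waveforms, not as part of the test bench.

```python
# Minimal sketch: expected Byte Count for the first completion of a Memory Read.
# Byte enables are 4-bit values; bit 0 corresponds to the lowest-addressed byte.

def _lowest_set_bit(be):
    """Index of the least significant set bit of a non-zero byte-enable value."""
    return (be & -be).bit_length() - 1

def byte_count(length_dw, first_be, last_be=0b0000):
    if length_dw == 1:
        if first_be == 0:
            return 1  # zero-length read: Byte Count is reported as 1
        # Span from the lowest enabled byte to the highest enabled byte.
        return first_be.bit_length() - _lowest_set_bit(first_be)
    if first_be == 0 or last_be == 0:
        raise ValueError("multi-DW requests need non-zero first and last byte enables")
    leading_off = _lowest_set_bit(first_be)   # disabled bytes before the first enabled one
    trailing_off = 4 - last_be.bit_length()   # disabled bytes after the last enabled one
    return 4 * length_dw - leading_off - trailing_off

# The 1 DW read with all byte enables set, as in the walkthrough above:
print(byte_count(1, 0b1111))
```

For the single-DWORD read with First BE = 0xF this gives 4, which agrees with the value seen in the completion.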

The following waveform shows the packet being received at the receive trn interface of the dsport model.

The completion packet should satisfy the following requirement from the specification:

Memory Write 32 / Memory Read 32 / Completion

The yellow box in the following waveform shows the Memory Write TLP that addresses a 32-bit memory address.
The Memory Write TLP is immediately followed by a Memory Read TLP to the same memory location. This TLP is
included in the same box.

The close up view of the Memory Write TLP (followed by the Memory Read TLP) is shown below:

‘40’ indicates a Memory Write TLP addressing 32-bit memory. ‘00’ indicates a Memory Read TLP addressing 32-bit
memory.
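The first header byte values quoted in this document (‘40’, ‘00’, ‘20’ and ‘4A’) can be decoded mechanically. The sketch below, written in Python as an offline aid, splits the byte into the Fmt and Type fields defined by the PCI Express Base Specification and looks up a few common TLP kinds. The table is deliberately incomplete and the names are chosen for this example.

```python
# Minimal sketch: decode the first byte of a TLP header.
# Bit 7 is reserved, bits 6:5 are Fmt, bits 4:0 are Type.
TLP_KINDS = {
    (0b00, 0b00000): "Memory Read, 32-bit address",
    (0b01, 0b00000): "Memory Read, 64-bit address",
    (0b10, 0b00000): "Memory Write, 32-bit address",
    (0b11, 0b00000): "Memory Write, 64-bit address",
    (0b00, 0b00010): "IO Read",
    (0b10, 0b00010): "IO Write",
    (0b00, 0b00100): "Configuration Read Type 0",
    (0b10, 0b00100): "Configuration Write Type 0",
    (0b00, 0b01010): "Completion without Data",
    (0b10, 0b01010): "Completion with Data",
}

def decode_first_byte(byte0):
    """Return a readable name for the TLP kind encoded in header byte 0."""
    fmt = (byte0 >> 5) & 0x3
    tlp_type = byte0 & 0x1F
    return TLP_KINDS.get((fmt, tlp_type), "other/unknown")

for b in (0x40, 0x00, 0x20, 0x4A):
    print(f"{b:02X}: {decode_first_byte(b)}")
```

Run on the four values above, it reports a 32-bit Memory Write, a 32-bit Memory Read, a 64-bit Memory Read and a Completion with Data, matching the identification made from the waveforms.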

The output log for the memory write packet in tx.dat is as follows:

The output log for the memory read packet in tx.dat is as follows:

The yellow box in the following waveform shows the completion for the above Memory Read packet.

The close up view of the completion packet is shown below:

The output log for this packet in rx.dat is:

The yellow box in the following waveform shows the Memory Write to and Memory Read from the Expansion ROM,
which is mapped to BAR6.

Limitations and Features of Downstream port model


The following are the limitations of the Downstream Port Model:

1. The PIO design was created to support at most one IO BAR, one Mem64 BAR, and two Mem32 BARs
(one of which must be the EROM space). By default, the Downstream Port Model checks during
device configuration that the core has been configured to meet this requirement. A violation of
this check causes a warning message to be displayed and the offending BAR to be gracefully
disabled in the test bench. This check can be disabled by setting the pio_check_design variable to zero in
the pci_exp_usrapp_tx.v file.

2. The Dsport model includes a parallel test, i.e. a test that involves more than one process thread. The test
sample_smoke_test1 is an example of a parallel test with two process threads. Parallel tests are very useful
for verifying that a specific set of events has occurred when the order of these events is not
known.
3. Currently the VHDL version of the Downstream Port Model Test Bench does not support Parallel tests.
4. The Downstream Port Model has a 128-byte MPS capability in the receive direction and a 512-byte MPS
capability in the transmit direction.
5. The downstream port model and the provided testbench provide just enough tools to do writes and reads to the
user design. The downstream port model is not a full-blown simulation model like the true BFMs available from
third-party vendors. However, it provides enough functionality for basic testing of the user design.

Limitations of PIO Example Design


1. The PIO design is a simple target-only application that interfaces with the Endpoint for PCIe core’s
Transaction (TRN) interface.
2. The PIO design only supports single DWORD payload Read and Write PCI Express transactions to 32/64
bit address memory spaces and IO space with support for completion TLPs.
3. The example design supports one IO space BAR, one 32-bit Memory space (that cannot be the Expansion
ROM space), and one 64-bit Memory space. If these limits are exceeded, only the first space of a given
type will be active–accesses to the other spaces will not result in completions.
4. Each space is implemented with a 2 kB memory. If the corresponding BAR is configured to a wider
aperture, accesses beyond the 2 kB limit wrap around and overlap the 2 kB memory space.
5. The PIO design successfully processes single DWORD payload Memory Read and Write TLPs and IO
Read and Write TLPs. Memory Read or Memory Write TLPs of lengths larger than one DWORD are not
processed correctly by the PIO design; however, the core does accept these TLPs and passes them along to
the PIO design. If the PIO design receives a TLP with a length of greater than 1 DWORD, the TLP is
received completely from the core and discarded. No corresponding completion is generated.
6. The PIO design handles Memory writes and IO writes in different ways: it responds to IO
writes by generating a Completion Without Data (Cpl), a requirement of the PCI Express specification.
7. The PIO_32_TX_ENGINE and PIO_64_TX_ENGINE modules generate completions for received
memory and IO read TLPs. The PIO design does not generate outbound read or write requests. However,
users can add this functionality to further customize the design.
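The limitations above can be condensed into a small behavioral model, which is handy for predicting what the simulation should do before looking at the waveform. The Python sketch below is an approximation built only from the statements in this list (2 kB memory per space with wrap-around, single-DWORD accesses only, Cpl for IO writes, no completion for posted Memory writes). The function names and the TLP kind strings are invented for this illustration and do not correspond to anything in the PIO source code.

```python
# Minimal behavioral sketch of the PIO limitations listed above (not the PIO RTL).
PIO_MEM_BYTES = 2048  # each space is implemented with a 2 kB memory

def pio_offset(bar_offset):
    """Accesses beyond the 2 kB limit wrap around onto the same 2 kB memory."""
    return bar_offset % PIO_MEM_BYTES

def pio_response(kind, length_dw):
    """Completion expected from the PIO design, or None if none is generated.

    kind is one of 'MRd', 'MWr', 'IORd', 'IOWr' (labels chosen for this sketch).
    """
    if kind not in ("MRd", "MWr", "IORd", "IOWr"):
        raise ValueError(f"unsupported TLP kind: {kind}")
    if length_dw > 1:
        return None        # TLP is received completely and discarded
    if kind in ("MRd", "IORd"):
        return "CplD"      # read data is returned in a Completion with Data
    if kind == "IOWr":
        return "Cpl"       # IO writes are non-posted: Completion without Data
    return None            # Memory writes are posted: no completion

print(hex(pio_offset(0x804)), pio_response("MRd", 1), pio_response("MRd", 2))
```

The last case, a Memory Read with a length of 2 DW, returns None: no completion is expected, so a test that waits for one will time out.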

Playing with Packets


Length Parameter Modification

As stated in the earlier section, the PIO example design will not generate a completion for a TLP whose length is
greater than ‘1’. Let’s see how the simulation behaves if the Memory Read TLP has a Length parameter of ‘2’.
Let’s change the parameter value in pio_writeReadBack_test0 as shown below:

If we simulate the resulting design, the simulation times out. The TLP is passed from the dsport to the endpoint. It
appears at the receive trn interface of the endpoint of the example design as well but the completion is never
generated and hence the simulation times out.

In the yellow box in the following waveform, we can see that the TLP with the modified length parameter leaves
the trn transmit interface at the dsport model.

In the waveform below, in the yellow box, the TLP does arrive at the user side but completion is never generated.

Let’s take a closer look at the modified TLP and check whether the new length value shows up in the outgoing
packet or not.
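To check the Length value without counting bits by hand, the first header DWORD can be split into its fields. The following Python sketch uses the common TLP header layout from the PCI Express Base Specification; the function name and the sample words are assumptions made for this example, not values copied from the simulation.

```python
# Minimal sketch: split header DW0 of a TLP into its fields.
# [30:29] Fmt, [28:24] Type, [22:20] TC, [15] TD, [14] EP, [13:12] Attr, [9:0] Length

def decode_dw0(dw0):
    length = dw0 & 0x3FF
    return {
        "fmt": (dw0 >> 29) & 0x3,
        "type": (dw0 >> 24) & 0x1F,
        "tc": (dw0 >> 20) & 0x7,
        "td": (dw0 >> 15) & 0x1,
        "ep": (dw0 >> 14) & 0x1,
        "attr": (dw0 >> 12) & 0x3,
        # A Length field of 0 encodes the maximum of 1024 DW.
        "length_dw": length if length else 1024,
    }

# Hypothetical DW0 of a 32-bit Memory Read with Length = 2 (illustration only):
fields = decode_dw0(0x00000002)
print(fields["fmt"], fields["type"], fields["length_dw"])
```

If the modification took effect, the `length_dw` field decoded from the outgoing packet should read 2 instead of 1.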

Poisoning Memory Read TLP

Now let’s see what happens if the Memory_Read_32 TLP is poisoned. The TLP is poisoned by setting ‘ep’
to ‘1’ as shown below. The following code snippet is from the PROC_TX_MEMORY_READ_32 procedure in
test_interface.vhd.

The packet is generated and passed to the endpoint backend application. The user application detects that the TLP
with poisoned data has been received. The message is printed on the console window as shown below:

The user guide says the following:

Let’s check whether we see similar behavior in our simulation:

Looking at the waveform, it is not clear what the behavior of the completion should be for a poisoned incoming
non-posted packet. The completion packet seems to be generated correctly with the completion status set to
successful. This needs to be clarified.

Poisoning Configuration Write Request

Let’s see what happens when a Configuration Write request is poisoned. This is done by modifying the ‘ep’ bit in the
PROC_TX_TYPE0_CONFIGURATION_WRITE procedure in test_interface.vhd as shown below:

The PCI Express specification says:

We see this behavior in our simulation as well. The following message is printed on the console when a poisoned
Configuration Write is transmitted from the dsport model.

In the following waveform, the red box is the poisoned Configuration Write request and the yellow box is the
corresponding completion for this request. We will first check whether the outgoing Configuration Write request is
poisoned. Then we will look at the incoming completion to see whether the ‘Completion Status’ field is set
to ‘001’, indicating ‘Unsupported Request’.
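Both checks come down to a couple of bit fields. The Python sketch below shows where the poison bit sits in header DW0 of the request and how the Completion Status in DW1 of the completion is decoded, following the header layouts of the PCI Express Base Specification. The helper names and the sample header words are hypothetical and serve only as an illustration.

```python
# Minimal sketch: poison bit in a request and status field in a completion.
EP_BIT = 14  # EP (poisoned data) is bit 14 of header DW0

COMPLETION_STATUS = {
    0b000: "Successful Completion (SC)",
    0b001: "Unsupported Request (UR)",
    0b010: "Configuration Request Retry Status (CRS)",
    0b100: "Completer Abort (CA)",
}

def set_poison(dw0):
    """Return DW0 with the EP bit set, i.e. the header word of the poisoned TLP."""
    return dw0 | (1 << EP_BIT)

def is_poisoned(dw0):
    return bool((dw0 >> EP_BIT) & 0x1)

def completion_status(cpl_dw1):
    """Completion DW1: [31:16] Completer ID, [15:13] Status, [12] BCM, [11:0] Byte Count."""
    status = (cpl_dw1 >> 13) & 0x7
    return COMPLETION_STATUS.get(status, "reserved")

# Hypothetical words (assumption): Type 0 Configuration Write, Length = 1
cfg_wr_dw0 = 0x44000001
poisoned = set_poison(cfg_wr_dw0)
print(f"{poisoned:08X}", is_poisoned(poisoned))

# Hypothetical completion DW1 with Status = 001b and Byte Count = 4
print(completion_status(0x00002004))
```

In a waveform, the poisoned request should therefore differ from the clean one only in bit 14 of its first DWORD, and the returned completion should decode to Unsupported Request.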

The following waveform shows a close-up view of the first outgoing Configuration Write request with the poison
bit ‘ep’ set to ‘1’.

The header format of the configuration request TLP is presented below for quick reference.

The following waveform shows the corresponding completion with the completion status set to ‘Unsupported Request’.

The header format of the completion TLP is presented below for quick reference.

Testing with a new TLP

Let’s insert a new TLP that writes to the BAR mapped to a 32-bit memory address and then reads the value back. This is done
by modifying the tests.vhd file as follows:

We are not creating any fancy TLP here. We are just replicating the PROC_TX_MEMORY_WRITE_32 procedure to
write a different data set to, and read it back from, the same memory location that is used by
PROC_TX_MEMORY_WRITE_32. For this, a new procedure called PROC_TX_MEMORY_WRITE_321 is defined in
test_interface.vhd. The content of this procedure remains basically the same except for
DATA_STORE. A new array called DATA_STORE1 has been defined, and a different set of values is stored in this
array as shown in the code snippet above.

Let’s look at the resulting waveform. In our previous simulation we had 3 TLPs going down from dsport to the
endpoint. In this case we will have 5 TLPs.

Let’s take a closer look at the new outgoing TLP. We should see the following data payload with this TLP.

DATA_STORE1(0) := X"DE";
DATA_STORE1(1) := X"AD";
DATA_STORE1(2) := X"BE";
DATA_STORE1(3) := X"EF";

The output log in tx.dat for this TLP is as shown below:

A Memory Read TLP is then issued. Let’s see whether the same data pattern appears in the completion sent back from the endpoint
for this Memory Read TLP.

The output log in rx.dat for this completion packet is as shown below:

Appendix
Some reference material has been added in this section to make it easier for readers to do packet analysis and
hence debug their designs. The contents of this section have been taken from Chapter 4 of the ‘PCI Express System
Architecture’ book from MindShare.

Generic TLP Header Fields

IO Request Header Format

Memory Request Header Format

Configuration Request Header Format

Message Request Header Format

Completion Header Format

Few points to note

1. Byte enable bits are active high. A value of “0” indicates that the corresponding byte in the data payload should
not be written by the completer. A value of “1” indicates that it should.
2. If the header Length field indicates a transfer of more than 1 DW, the First DW Byte Enables field must have at
least one bit enabled.
3. A write request with a transfer length of 1DW and no byte enables set is legal, but has no effect on the
completer.
4. If a read request of 1 DW is done with no byte enable bits set, the completer returns a 1DW data payload of
undefined data. This may be used as a Flush mechanism. Because of ordering rules, a flush may be used to
force all previously posted writes to memory before the completion is returned.
5. The first byte of the data in the payload (immediately after the header) is always associated with the lowest
(start) address.
6. Receivers also must check for discrepancies between the value in the Length field and the actual amount of
data transferred in a TLP with data. Violations are also handled as Malformed TLPs.

7. Requests must not combine a start address and transfer length that would cause a memory space
access to cross a 4 KB boundary. While checking is optional in this case, receivers that check for violations
of this rule will report them as Malformed TLPs.
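Points 2, 3 and 7 above lend themselves to simple checks that can be applied to a request before it is sent from the test bench. The Python sketch below is one possible way to express them. The function names are invented for this example, and the checks cover only the rules stated in this list, not the full set of rules in the specification.

```python
# Minimal sketch of the request rules from points 2, 3 and 7 above.

def first_be_legal(length_dw, first_be):
    """Point 2: transfers longer than 1 DW need at least one First DW byte enable set.
    Point 3: a 1 DW request with no byte enables set is legal."""
    if length_dw > 1:
        return first_be != 0
    return True

def crosses_4kb(byte_address, length_dw):
    """Point 7: True if the access spans a 4 KB boundary.

    byte_address is the DW-aligned start address, length_dw the Length in DW.
    """
    first_byte = byte_address
    last_byte = byte_address + 4 * length_dw - 1
    return (first_byte >> 12) != (last_byte >> 12)

# A 2 DW access starting at the last DW of a 4 KB page crosses the boundary;
# a 1 DW access at the same address does not.
print(crosses_4kb(0x0FFC, 2), crosses_4kb(0x0FFC, 1))
print(first_be_legal(2, 0b0000), first_be_legal(1, 0b0000))
```

A request for which `crosses_4kb` returns True should be split into two requests at the boundary before it is issued.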

References
1. PCI Express™ Base Specification, Revision 1.1, March 28, 2005
2. LogiCORE™ IP Endpoint Block Plus v1.9 for PCI Express®, UG341, September 19, 2008
3. PCI Express System Architecture, by Ravi Budruk, Don Anderson, and Tom Shanley, MindShare, Inc.
