
Remote Peer to Peer Made Easy - Dolphin Express IX
Dolphin Interconnect Solutions Whitepaper
DISCLAIMER
DOLPHIN INTERCONNECT SOLUTIONS RESERVES THE RIGHT TO MAKE CHANGES WITHOUT FURTHER NOTICE TO ANY OF ITS PRODUCTS AND DOCUMENTATION TO IMPROVE RELIABILITY, FUNCTION, OR DESIGN. DOLPHIN INTERCONNECT SOLUTIONS DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE APPLICATION OR USE OF ANY PRODUCT OR DOCUMENTATION.
Table of Contents
1.0 Introduction
2.0 SISCI API
3.0 Hardware configuration
3.1 Connecting computers using PCIe over cable
4.0 Software configuration
4.1 How to make a PCIe address space available for remote access
4.2 How to set up a local PCIe device to access a remote segment or device
5.0 Data transfers
5.1 CPU or DMA engine used for direct remote access
5.2 FPGA direct access to remote memory
5.3 Multicast
6.0 Interrupt forwarding
6.1 Optimized remote interrupt forwarding
7.0 Availability
8.0 Reference and more information
1.0 Introduction
The PCI Express peer to peer communication (P2P) specification enables regular PCI Express devices to establish direct data transfers without using main memory as temporary storage or the CPU for data movement. Peer to peer communication significantly reduces communication latency, but until now it has been limited to a single system.

Dolphin's IX product family expands P2P communication beyond single systems. Local and remote PCI Express devices can communicate via P2P as if they were located in a single system. A single application can directly control all PCIe devices, while parallel applications running on multiple servers can implement a protocol to share these devices. Intel Xeon Phi, GPUs¹, custom FPGAs, specialized data grabbers, video IO devices, and similar hardware all benefit from exploiting remote P2P communication to reduce latency and communication overhead.
2.0 SISCI API
Dolphin uses the SISCI (Software Infrastructure Shared-Memory Cluster Interconnect) API, first defined in 1998, to add support for P2P communication. The SISCI API simplifies the setup and management of peer to peer transfers. Customers developing applications can directly access and utilize PCI Express functions without writing device drivers or spending time studying PCI Express chipset specifications. The SISCI software enables applications to use CPU / Programmed IO (PIO) or DMA operations to move data directly to or from local or remote PCI Express devices. Dolphin's reflective memory functionality enables transparent multicast of data to multiple devices. Dolphin benchmarks show end to end latencies as low as 0.74 µs and over 3500 megabytes/sec of data throughput at the application level. The benchmarking tools are included in the SISCI developers kit.

The SISCI developers kit consists of driver and API software, tools, documentation, and the source code needed to develop your own embedded application utilizing the low latency and high performance of a PCI Express cluster. The SISCI API provides a C programming interface to ease customer integration of PCI Express over cable solutions.
SISCI enables customer applications to easily and safely bypass the limitations of traditional network solutions, avoiding time consuming operating system calls and network protocol software overhead. SISCI resources (memory maps, DMA engines, interrupts, etc.) are identified by assigned IDs and managed by a resource manager, enabling portability and allowing independent applications to run concurrently on the same system.

The SISCI API was defined in the European Esprit project 23174 as a de facto industry standard Application Programming Interface (API) for shared memory based clustering.
1. PCI Express also supports GPUDirect™ functionality.
In addition to the reflective memory/multicast functionality, the SISCI API provides functions to access remote memory for unicast (single remote reads or writes) and Remote DMA (RDMA) using the onboard DMA engine. The API also includes support for sending and receiving remote interrupts and for error checking.
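As a minimal illustration, the sketch below shows the common prologue and epilogue of a SISCI program; the examples later in this paper assume a descriptor opened this way. The header name and error constants follow the SISCI developers kit as we recall them and should be verified against the SISCI API Functional Specification.

#include <stdio.h>
#include <stdlib.h>
#include "sisci_api.h"     /* main header of the SISCI developers kit */

int main(void)
{
    sci_error_t error;
    sci_desc_t  sd;

    /* Initialize the SISCI library before any other SISCI call. */
    SCIInitialize(0, &error);
    if (error != SCI_ERR_OK) {
        fprintf(stderr, "SCIInitialize failed: 0x%x\n", (unsigned)error);
        return EXIT_FAILURE;
    }

    /* Open a virtual device descriptor; segments, DMA queues and
     * interrupts are all created against such a descriptor. */
    SCIOpen(&sd, 0, &error);
    if (error != SCI_ERR_OK) {
        fprintf(stderr, "SCIOpen failed: 0x%x\n", (unsigned)error);
        SCITerminate();
        return EXIT_FAILURE;
    }

    /* ... create or connect segments, map memory, transfer data ... */

    SCIClose(sd, 0, &error);
    SCITerminate();
    return EXIT_SUCCESS;
}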
3.0 Hardware configuration
A peer to peer PC configuration requires several PCI Express slots and an I/O system that supports standard PCI Express peer to peer communication. A Dolphin IXH610 card is installed in one PCI Express slot, and the source device, in our example the FPGA shown in Figure 1, is installed in another PCI Express slot in the same system. Multiple devices can be installed in each host.

Figure 1: Single Node Configuration

The FPGA board operates in traditional transparent mode. Depending on the nature of the FPGA and its functionality, the local FPGA device driver may need to be aware of the remote connectivity and sharing. It is up to the system designer to solve any sharing issues that may arise between the local device driver and applications accessing the device from a remote system; the SISCI software only enables the sharing functionality.
3.1 Connecting computers using PCIe over cable
To enable communication between systems, a PCI Express iPass cable is connected between the IXH adapter cards installed in each system. When connecting more than two systems, an IXS600 PCI Express switch is added to the configuration to enable system scale-out. The switch configuration allows multiple nodes, each with multiple devices installed, to communicate peer to peer.
4.0 Software configuration
The first step in configuring software for peer to peer communication is to install the standard Dolphin DIS driver software package on all nodes. This package includes Dolphin's SuperSockets software and the SISCI API. The next step is to develop a SISCI application implementing the desired PCIe peer to peer communication control. The SISCI API provides the mechanisms to easily accomplish this task.
Figure 2: Two nodes interconnected with PCIe

A basic SISCI function is to allocate parts of the system main memory and share them with other cluster nodes. Segments and nodes are identified by a cluster-wide unique node ID and a system-wide unique segment ID. Applications use node IDs and segment IDs to establish connections. The
SISCI and Interconnect Resource Manager (IRM) drivers (part of the Dolphin driver
package) are responsible for safely managing the resources and low level tasks required
to establish connections. Non Transparent Bridging (NTB) mapping tables (LUTs)
perform the required local to remote address space translations after appropriate
physical addresses are exchanged by the drivers.
4.1 How to make a PCIe address space available for remote access
SISCI provides a rich toolbox for creating clustered applications. SISCI API functions
enable each hardware resource made available over PCIe to be mapped into the
controlling application's address space. Several applications, possibly running on
multiple nodes, can share these devices. Although SISCI provides the basic mechanism
for communication, it is the responsibility of the application programmer to implement
and handle the actual sharing.
In order to transfer data, PCIe device memory must be registered as a SISCI segment. To register PCIe device memory as a SISCI segment, the application programmer uses the SCIAttachPhysicalMemory() SISCI function to specify the physical address and number of bytes within the PCIe device that should be made available as a SISCI segment. The application also needs to call SCIPrepareSegment() and SCISetSegmentAvailable(). SCIPrepareSegment() guarantees that a local segment is accessible by an adapter. SCISetSegmentAvailable() makes a local segment visible to remote nodes, which can then connect to it. After these calls are completed, a remote host can connect to and map the physical memory. Dolphin provides example code in the rpcia.c SISCI test program; consult its source code for more details.
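The following sketch outlines this sequence in C. It is a simplified illustration rather than the rpcia.c example itself: the segment ID, adapter number, BAR address, and size are hypothetical placeholders, and the function signatures and flags are shown as recalled from the SISCI developers kit, so they should be verified against the SISCI API Functional Specification.

#include "sisci_api.h"

#define SEGMENT_ID   4             /* hypothetical system-wide unique segment ID   */
#define ADAPTER_NO   0             /* local Dolphin IX adapter number              */
#define DEVICE_BAR   0xf2000000ULL /* hypothetical physical BAR address to export  */
#define SEGMENT_SIZE 0x10000       /* number of device bytes to expose             */

/* Export a PCIe device memory region so that remote nodes can connect to it.
 * 'sd' is a SISCI descriptor opened with SCIOpen(), as shown in section 2.0. */
int export_device_memory(sci_desc_t sd, sci_local_segment_t *segment)
{
    sci_error_t error;

    /* Create an empty local segment; no system memory is allocated. */
    SCICreateSegment(sd, segment, SEGMENT_ID, SEGMENT_SIZE,
                     NULL, NULL, SCI_FLAG_EMPTY, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Attach the device BAR region to the segment instead of RAM.
     * Passing NULL for the virtual address is an assumption here;
     * check the SCIAttachPhysicalMemory() description in the spec. */
    SCIAttachPhysicalMemory((sci_ioaddr_t)DEVICE_BAR, NULL, 0, SEGMENT_SIZE,
                            *segment, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Make the segment accessible through the local IX adapter ... */
    SCIPrepareSegment(*segment, ADAPTER_NO, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* ... and visible to remote nodes, which may now connect to it. */
    SCISetSegmentAvailable(*segment, ADAPTER_NO, 0, &error);
    return (error == SCI_ERR_OK) ? 0 : -1;
}

Once SCISetSegmentAvailable() returns successfully, the device memory behaves like any other exported SISCI segment and remote nodes can connect to it by segment ID.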
4.2 How to set up a local PCIe device to access a remote segment or device
To initiate a transfer between a local device and a remote segment or device, the local
PCIe device must have access to a remote SISCI segment (memory or a remote device).
Developers need to identify the corresponding IO address in the local address space. This address is retrieved by the SISCI application using the SCIQuery() function with the SCI_Q_REMOTE_SEGMENT_IOADDR flag, after the remote segment has been connected and mapped. The address returned by the query is used directly by the PCIe master to access the remote segment. The address lies within the BAR address range of the IXH610 card and is directly mapped to the remote address. The customer must make this address available to the PCIe device master. The address becomes available after the application completes the SCIConnectSegment() and SCIMapRemoteSegment() calls. SCIConnectSegment() connects an application to a memory segment made available on a remote node and creates and initializes a descriptor for the connected segment. SCIMapRemoteSegment() maps an area of a remote segment, connected with SCIConnectSegment(), into the addressable space of the program and returns a pointer to the beginning of the mapped area.

PCIe devices must also be registered with the Dolphin IX card as approved PCIe masters by using the SCIRegisterPCIeRequester() SISCI function. This registration ensures that master accesses pass through the required NTB function.
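A corresponding sketch for the connecting side is shown below. Again, the node ID, segment ID, and sizes are hypothetical, the query command/subcommand split and structure field names are assumptions, and SCIRegisterPCIeRequester() is only referenced in a comment because its exact parameters should be taken from the SISCI API Functional Specification.

#include <stdio.h>
#include "sisci_api.h"

#define REMOTE_NODE_ID    8       /* hypothetical node ID of the remote system */
#define REMOTE_SEGMENT_ID 4       /* segment ID exported by the remote node    */
#define ADAPTER_NO        0       /* local Dolphin IX adapter number           */
#define SEGMENT_SIZE      0x10000

/* Connect to a remote segment and obtain the IO address that a local PCIe
 * master (for example an FPGA) can use to reach it directly. */
int connect_remote_segment(sci_desc_t sd)
{
    sci_error_t                error;
    sci_remote_segment_t       remote;
    sci_map_t                  map;
    volatile void             *va;
    sci_query_remote_segment_t query;

    /* Connect to the segment made available by the remote node. */
    SCIConnectSegment(sd, &remote, REMOTE_NODE_ID, REMOTE_SEGMENT_ID,
                      ADAPTER_NO, NULL, NULL, SCI_INFINITE_TIMEOUT, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Map it into this application's address space for CPU (PIO) access. */
    va = SCIMapRemoteSegment(remote, &map, 0, SEGMENT_SIZE, NULL, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Query the IO address of the connected segment.  This address lies
     * inside the IXH610 BAR window and is what the local PCIe master must
     * use to write directly to the remote segment.  The field names below
     * are assumptions; verify them against the SISCI spec. */
    query.segment    = remote;
    query.subcommand = SCI_Q_REMOTE_SEGMENT_IOADDR;
    SCIQuery(SCI_Q_REMOTE_SEGMENT, &query, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    printf("Remote segment IO address 0x%llx, mapped locally at %p\n",
           (unsigned long long)query.data.ioaddr, (void *)va);

    /* The FPGA must also be registered as an approved PCIe requester with
     * SCIRegisterPCIeRequester() so that its traffic passes through the
     * NTB function; see the SISCI API Functional Specification for its
     * exact parameters. */
    return 0;
}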
5.0 Data transfers
Once devices have been configured, connected, and mapped as described above, concurrent transfers between any installed devices, CPUs, and memory are enabled. Various configurations are possible with this flexible system.
5.1 CPU or DMA engine used for direct remote access
Figure 3 illustrates CPU direct remote access using basic CPU load or store operations. This operation has low latency and is ideal for small data transfers.

Figure 3: Remote access to remote IO device

Larger transfers are accomplished either by using the system bcopy() through the mapped remote segment, or, when CPU off-load is required, by engaging the IXH610 onboard DMA engine through the appropriate SISCI API DMA functions to move data from local memory to the remote FPGA. Please consult the SISCI API Functional Specification for more details on DMA.
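The sketch below contrasts the two approaches, assuming a remote segment that has already been connected and mapped as in the previous example. The DMA-queue calls are shown as recalled from the SISCI API and should be checked against the Functional Specification; the transfer size and offsets are placeholders.

#include <string.h>
#include "sisci_api.h"

#define ADAPTER_NO 0
#define XFER_SIZE  4096           /* hypothetical transfer size in bytes */

/* PIO: small transfers are ordinary CPU stores through the mapped pointer;
 * each store becomes a posted PCIe write towards the remote device/memory. */
void pio_write(volatile void *remote_va, const void *src, size_t len)
{
    memcpy((void *)remote_va, src, len);
}

/* DMA: larger transfers are off-loaded to the IXH610 onboard DMA engine. */
int dma_write(sci_desc_t sd, sci_local_segment_t local,
              sci_remote_segment_t remote)
{
    sci_error_t     error;
    sci_dma_queue_t dq;

    /* Create a DMA queue on the local IX adapter (one queue entry). */
    SCICreateDMAQueue(sd, &dq, ADAPTER_NO, 1, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Move XFER_SIZE bytes from offset 0 of the local segment to offset 0
     * of the remote segment without the CPU touching the data path. */
    SCIStartDmaTransfer(dq, local, remote, 0, XFER_SIZE, 0,
                        NULL, NULL, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    /* Block until the DMA engine reports completion, then clean up. */
    SCIWaitForDMAQueue(dq, SCI_INFINITE_TIMEOUT, 0, &error);
    SCIRemoveDMAQueue(dq, 0, &error);
    return (error == SCI_ERR_OK) ? 0 : -1;
}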
5.2 FPGA direct access to remote memory
An FPGA device that can act as a PCIe master can directly place data into remote
memory by using the address provided by the SCIQuery() function as described above.
(Note: slave devices may need special design consideration to achieve the desired source/sink transfer bandwidths. This is no different from a single-root P2P implementation.)
5.3 Multicast
The combination of SISCI software and a Dolphin Express IX cluster containing an IXS600 switch supports PCIe multicast functionality, often referred to by Dolphin as reflective memory. By combining PCIe multicast and PCIe peer to peer transfers, an FPGA, for example, can send data to multiple targets using a single posted write transaction. For details on Dolphin's reflective memory functionality, refer to the Dolphin reflective memory white paper available at www.dolphinics.com/solutions/whitepapers.html.
6.0 Interrupt forwarding
The Dolphin solution also includes support for interrupts. With interrupt forwarding, interrupts from a device that is accessed remotely will, by default, trigger a local interrupt in the host system where the device is installed. The local application controlling the device can then use the regular SISCI API functions SCICreateInterrupt() and SCITriggerInterrupt() to send interrupts to remote nodes, as sketched below.
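The following minimal sketch shows this interrupt pattern, with one routine for the node waiting for a notification and one for the node triggering it. The interrupt number, node ID, and the SCI_FLAG_FIXED_INTNO usage are assumptions made to keep the example short; consult the SISCI API Functional Specification for the exact signatures.

#include <stdio.h>
#include "sisci_api.h"

#define ADAPTER_NO     0
#define INTERRUPT_NO   1          /* hypothetical, agreed-on interrupt number  */
#define REMOTE_NODE_ID 8          /* node that created the interrupt resource  */

/* On the node that wants to be notified: create an interrupt resource with a
 * fixed number and block until a remote node triggers it. */
int wait_for_remote_interrupt(sci_desc_t sd)
{
    sci_error_t           error;
    sci_local_interrupt_t irq;
    unsigned int          irq_no = INTERRUPT_NO;

    SCICreateInterrupt(sd, &irq, ADAPTER_NO, &irq_no,
                       NULL, NULL, SCI_FLAG_FIXED_INTNO, &error);
    if (error != SCI_ERR_OK) return -1;

    SCIWaitForInterrupt(irq, SCI_INFINITE_TIMEOUT, 0, &error);
    printf("Remote interrupt received\n");

    SCIRemoveInterrupt(irq, 0, &error);
    return 0;
}

/* On the node controlling the device: connect to that interrupt and trigger
 * it, for example after the FPGA signals that a transfer has completed. */
int notify_remote_node(sci_desc_t sd)
{
    sci_error_t            error;
    sci_remote_interrupt_t irq;

    SCIConnectInterrupt(sd, &irq, REMOTE_NODE_ID, ADAPTER_NO, INTERRUPT_NO,
                        SCI_INFINITE_TIMEOUT, 0, &error);
    if (error != SCI_ERR_OK) return -1;

    SCITriggerInterrupt(irq, 0, &error);
    SCIDisconnectInterrupt(irq, 0, &error);
    return (error == SCI_ERR_OK) ? 0 : -1;
}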
6.1 Optimized remote interrupt forwarding
Dolphin plans to add an interface from FPGA device interrupt handlers to our kernel drivers. This will enable remote SISCI interrupts to be triggered directly from the device interrupt handler. Dolphin also plans to offer additional MSI interrupt functionality, where MSI interrupts would be automatically forwarded to remote doorbell registers, triggering the SISCI interrupt handler. Please contact Dolphin for further details if you are interested in exploiting the optimized remote interrupt mechanisms described above.
7.0 Availability
The PCIe peer to peer functionality described above is available in Dolphin Express IX
release DIS 4.4.0 or newer. The functionality is available through the SISCI API using
Linux, Windows, or RTX operating systems. The nodes can run any of the above operating systems; inter-communication between Linux, Windows, and RTX is fully supported.
8.0 Reference and more information
Please visit www.dolphinics.com for additional information on the Dolphin Express IX
product family.
Additional information, including the online SISCI API Functional Specification, can be found at http://www.dolphinics.com/products/embedded-sisci-developers-kit.html
Please contact pci-support@dolphinics.com if you have any questions.
