
Final Year Project

Spring Report

Implementing a USB 2.0 Intellectual Property Core on FPGA

Presented By: Liza Tutunjian, Arine Hadidian, George Ghanem

Advisors: Dr. Mazen Saghir, Dr. Ali Chehab

Department of Electrical and Computer Engineering

American University of Beirut



May 23, 2006

Abstract
Implementing a USB 2.0 Intellectual Property core on FPGA By Liza Tutunjian, Arine Hadidian and George Ghanem

This report contains the work we have completed concerning the implementation of a high speed USB Intellectual Property (IP) core on an FPGA.

In the first chapter, we define the problem that we set out to solve, highlight the practical importance of the topic we chose as our FYP, summarize the tools that we use to complete the project and the estimated budget, and present a schedule of our plan of work for the Spring term.

The second chapter, Review, provides the reader with background information on USB, a summary of solutions that have already been implemented, and the reasons why we chose an FPGA implementation over an ASIC implementation.

The third chapter, Design and Analysis, presents a high level block diagram of the USB system that we will implement and explains the relationship between the various blocks as well as the responsibility of each in delivering a working USB IP core.

The fourth chapter, Implementation, defines the design of our hardware system and our software layer. It provides a description of the main components involved in the system.

The fifth chapter, Evaluation, includes the test cases that we performed at different stages of the project to ensure the functionality of our project.

The last chapter, Conclusion, presents some of the problems we faced along with alternative solutions, areas of future work and consideration of design constraints.

Table Of Contents
1.0 Introduction
1.1 Problem Statement Background
1.2 Problem Statement
1.3 Practical Importance of our FYP
1.4 Budget
1.5 Tools for implementing the FYP
1.5.1 Virtex-II V2MB1000 Development Board
1.5.2 Embedded Development Kit Software
2.0 Review
2.1 Introduction
2.2 USB Background
2.2.1 Interlayer Communication Model
2.2.2 USB Packet Field Formats
2.3 Approach to solving problem
2.4 Literature Survey
3.0 Design and Analysis
3.1 High Level Design
3.2 Datasheet Summary
3.2.1 Introduction
3.2.2 Features
3.2.3 Block Diagram
3.2.4 System Overview
3.2.5 Signal Definitions
3.2.6 Registers
3.2.7 Choice of Design
4.0 Implementation
4.1 USB Core General Implementation
4.1.1 Host Controller
4.1.2 Transmitter
4.1.3 Receiver
4.1.4 Host Controller Driver
5.0 Evaluation
5.1 Host Controller Testing
5.2 Transmitter Testing
5.3 Receiver Testing
5.4 Testing the USB Core on the FPGA
6.0 Conclusion
6.1 Difficulties Faced
6.2 Future Work
6.3 Design Constraints
7.0 References

Table Of Figures
Figure 1: VirtexII System Board
Figure 2: Embedded Development Kit Environment
Figure 3: Simple USB Host/Device View
Figure 4: USB Implementation Areas
Figure 5: Host Communication
Figure 6: High Level System Block Diagram
Figure 7: Overall System
Figure 8: Host Controller
Figure 9: Transmitter
Figure 10: USB Cable
Figure 11: Speed and Data States
Figure 12: Handshake Packet
Figure 13: Data Packet
Figure 14: Token Packet
Figure 15: NRZI Encoding
Figure 16: Receiver
Figure 17: High Speed Data In Tick
Figure 18: Cygnal CP2101
Figure 19: FSM states
Figure 20: SETUP Transaction
Figure 21: OUT (0/1) Transaction
Figure 22: IN Transaction
Figure 23: Writing to the USB wire at High Speed
Figure 24: Idle State
Figure 25: SYNC Bytes 1 and 2
Figure 26: SYNC Byte 3 and 4
Figure 27: Token Packet
Figure 28: Setup Token Packet
Figure 29: Little Endian
Figure 30: End Of Packet
Figure 31: Data Packet
Figure 32: Bit Stuffing
Figure 33: Handshake Packet
Figure 34: High Speed Detection Waveform
Figure 35: Receiving Bits at High Speed Waveform
Figure 36: Forming a Byte Waveform
Figure 37: Processing Bytes that represent an ACK Waveform
Figure 38: Processing Bytes that represent Data Waveform
Figure 39: High Speed Rate Problem

List of Tables
Table 1: Signals
Table 2: Register Description
Table 3: Compare alternative designs with our design features
Table 4: Bus States
Table 5: Speed and Data States
Table 6: Bit to Byte Conversion
Table 7: Bytes input into the Byte Analyzer component
Table 8: Processing Bytes that represent Data
Table 9: Difficulties

1.0 Introduction
1.1 Problem Statement Background

The enhancements in hardware technology have led to the existence of Field Programmable Gate Arrays (FPGAs) that are large enough to accommodate a complete system on a single device. Such devices are therefore called systems on a programmable chip (SOPC). The single-chip design allows designers to place a large number of functions onto an SOPC and to reprogram the chip from the desktop, thus reducing the engineering costs of prototyping and testing new designs.

For the past decade, Intellectual Property (IP) cores have been developed to provide consumers with a collection of reusable building blocks that shorten the customer's time-to-market. IP cores form an essential element of design reuse and are part of the growing trend towards repeated use of previously designed components. A very diverse set of IP cores is now available for today's advanced FPGA devices. This offers system developers the opportunity to mix and match various microprocessor, embedded memory and system I/O functions on an FPGA while cutting down design time and reducing risk, thanks to the industry availability of well performing and low cost components. The vast majority of those IP cores are either created and owned by the FPGA vendors themselves (such as Xilinx) or licensed from microprocessor vendors.

Popular IP core implementations include functions such as USB, digital signal processing FIR and IIR filters, fast Fourier transforms, adders, multipliers, UARTs and more.

The development of FPGA plus IP core solutions accelerated because of the increased functionality, performance and flexibility that these solutions offer over other approaches to system design.

1.2 Problem Statement

Although high speed Universal Serial Bus (USB) IP cores have been implemented by commercial companies, vendors protect them as trade secrets and with patents because of their tremendous investment in time and development effort. FPGA vendors are working together with core providers to devise methods to license IP cores. A popular concept is to allow users to try out a core by downloading it from the net for trial use in simulation, placement and routing. If the user is satisfied with the core, a license fee is paid, enabling a key that allows programming of the device. The problem is that these cores are relatively expensive to purchase and provide only a broad definition of the interface between the processor and the downloaded core.

With the recent availability of advanced FPGA boards with physical USB ports in our digital labs, we found that a need exists for developing a high speed USB host IP core. Without such a core, a hardware system downloaded on an FPGA cannot communicate with a USB device. In a computer running the Windows operating system, this functionality is already implemented, and the user of a USB device knows very little about the USB packets that are sent back and forth between the host computer and the device. To allow a hardware system that has been downloaded on an FPGA to communicate with an attached USB device in the same way, we had to implement the hardware design of a host that implements the USB protocol.

Due to the significant emergence of USB based applications in today's world, we believe that implementing a USB IP core that can be reused by coming generations of engineering students at AUB will provide the groundwork for developing more advanced systems that make use of USB devices.

The main idea behind our Final Year Project is to develop a non-commercial, student research based IP core that implements a high speed Universal Serial Bus host, and to add this core as a block to the system hardware architecture. In other words, we will design a hardware system in which a processor runs code that communicates with a USB device.

1.3 Practical Importance of our FYP

Having defined the problem that we will be addressing, we need to highlight the practicality of meeting such a goal.

Generally speaking, developing IP cores to build complex systems allows companies to sell standard solutions written in HDL for implementation on the designer's own FPGAs. This removes an element of re-inventing the wheel while spreading development time, and thus costs, across different companies.

Moreover, in today's industry, time-to-market pressures continue to increase. Irrespective of how well the previous project was completed, there is pressure to complete the next one in less time, at lower cost and with higher performance. FPGA plus IP core solutions will continue to feed that market need to build faster and better systems-on-chip well into the next decade and beyond.

As a result of our FYP, the availability of an IP core for use by the Faculty of Engineering at AUB will remove an element of re-inventing the wheel and allow students and faculty in the future more time to focus fully on optimizing system architectures and developing even more functionality into their USB compliant end products.

1.4 Budget

The Faculty of Engineering and Architecture at AUB has purchased several Virtex-II V2MB1000 Development Boards that include Xilinx FPGAs. The price of a development kit (the FPGA and the board) is approximately $2900. All the hardware that we need to implement the USB core is available in the digital labs. The software is also available and installed on all the computers in the digital lab. To implement the USB host protocol, we had to study the USB 2.0 Specification, which we downloaded for free and which therefore did not affect our budget.

1.5 Tools for implementing the FYP

1.5.1 Virtex-II V2MB1000 Development Board

The Virtex-II board, shown in the figure below, utilizes up to 1 million gates and contains a large number of I/Os to facilitate implementation.

Figure 1: VirtexII System Board

Some of the components present on the board are:
- Xilinx FPGA
- 2 clock sources
- RS-232 port
- LEDs, switches and 7 segment displays
- P160 additional module interface that can add a USB physical interface, SRAM memory and an Ethernet interface
- 16M x 16 DDR memory

We initially wanted to place the P160 additional module, which contains the physical USB port, in the P160 expansion slot on the board to be able to test our system. However, we faced a problem with this and we had to come up with an alternative solution. Instead of actually testing the USB core using the physical USB wire, we designed a block in VHDL that simulates the USB device to ensure that our system was functional.

1.5.2 Embedded Development Kit Software

To implement our core on the FPGA, we used the Embedded Development Kit (EDK) software that came with the Virtex-II V2MB1000 Development Board. The EDK is an all encompassing design environment for Virtex-II Pro MicroBlaze based embedded systems in Xilinx FPGAs. The figure below shows the tools that the EDK environment provides for the implementation of embedded applications.

Figure 2: Embedded Development Kit Environment

We used Xilinx Platform Studio (XPS) within the EDK environment to build a hardware system that contains:

1- Pre-designed MicroBlaze soft processor: The design of this processor is provided with the EDK kit and can be added with a click of a button.
2- VHDL USB core peripheral: The design of this peripheral was implemented by us.

We connected the VHDL peripheral as a slave to the MicroBlaze processor through the On-chip Peripheral Bus (OPB). As part of the Hardware Development Flow, we synthesized the VHDL files for the system above into a netlist of logic gates (AND, OR, XOR, NAND and so forth). This netlist was then mapped, placed and routed to fit onto the FPGA. Finally, we generated the bitstream and downloaded it onto the FPGA through the JTAG port.


We also used XPS to write C code that functions as a simple driver for the USB hardware core and is run by the aforementioned MicroBlaze processor. As part of the Software Development Flow, we compiled the C code and downloaded it into on-chip memory. Specifically, we used 64 KB of BRAM, which serves as the local data and instruction memory.


2.0 Review
2.1 Introduction

USB (Universal Serial Bus) is a serial bus that provides Plug & Play connection of peripherals to PCs. It removes the need to open up a PC when adding a new peripheral device and allows the required software to be installed automatically.

In the mid-1990s, a core team of engineers from Compaq, DEC, IBM, Intel, Microsoft, NEC and Northern Telecom (now Nortel Networks) developed the original USB specification; the high speed USB 2.0 specification followed in 2000. Today, more than 1000 companies develop products which can be connected to the PC via USB.

Popular in the PC and telecom market for several years now, USB is designed to support standard PC peripherals and specialist devices. PC peripherals supported by USB include modems, keyboards, mice, CD ROM drives, joysticks, tape/floppy/hard drives, scanners and printers. Moreover, a new wave of peripherals such as telephones, digital speakers, digital snapshot and motion cameras, data gloves and digitizers are to take advantage of this exciting and versatile new interface.

A range of data traffic workloads can be serviced over USB: low-speed (10 to 100 kb/s) for interactive devices, full-speed (500 kb/s to 10 Mb/s) for phone, audio and compressed video, and high-speed (25 to 400 Mb/s) for video or storage. Note that the signaling rates for the low speed, full speed and high speed protocols are 1.5 Mbps, 12 Mbps and 480 Mbps respectively. These are maximum values, however, and in practice the rate of communication with a device is below these maxima.
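As a rough illustration of what these signaling rates mean on the wire, the following Python sketch computes the duration of one bit at each rate and a lower bound on the time needed to move a payload. The dictionary and function names are chosen here for illustration only. The bound counts payload bits alone and ignores SYNC, PID, CRC, bit stuffing and bus idle time, which is one reason why practical throughput stays below the signaling rate.

```python
# Signaling rates quoted in the text, in bits per second.
SIGNALING_RATES_BPS = {
    "low": 1_500_000,
    "full": 12_000_000,
    "high": 480_000_000,
}


def bit_time_ns(speed):
    """Duration of a single bit on the wire, in nanoseconds."""
    return 1e9 / SIGNALING_RATES_BPS[speed]


def min_wire_time_us(payload_bytes, speed):
    """Lower bound on transfer time in microseconds (payload bits only)."""
    return payload_bytes * 8 * 1e6 / SIGNALING_RATES_BPS[speed]


if __name__ == "__main__":
    for speed in SIGNALING_RATES_BPS:
        print(speed, round(bit_time_ns(speed), 3), "ns per bit")
    print(round(min_wire_time_us(512, "high"), 3), "us for 512 bytes at high speed")
```

At high speed a bit lasts only about 2 ns, which gives an idea of the timing constraints that the hardware has to meet.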


2.2 USB Background

The following two sections contain background information about the USB 2.0 Specification. The first section describes the different layers of the USB host communication model and the second section defines some common USB packet fields.

2.2.1 Interlayer Communication Model

The USB cable provides communication services between a host and attached USB devices. A host is any device that has USB devices attached to it. The view an end user sees of attaching one or more USB devices to a host is a little more complicated to implement than is indicated by the figure below.

Figure 3: Simple USB Host/Device View

The host is made of three distinct layers, shown in Figure 4 below. A physical device is attached to the host. This device is typically a function that provides capabilities to the system. The physical device is also implemented in three distinct layers (right side). Physical communication between the host and the physical device occurs horizontally through the lowest layer, which is the USB wire. The vertical arrows between the layers indicate the actual communication on the host. Moreover, there is logical host-device communication between each horizontal layer above the physical layer.

Figure 4: USB Implementation Areas


For our Final Year Project, we implemented parts of the host side only which is why the following discussion will focus on the layers to the left of the figure above. Figure 5 below is a more detailed view of the different layers of a USB host.

Our work focused mainly on the lowest, most physical layer (highlighted in blue in the figure below), which accepts hardware defined data from the level above it and sends bits on the USB cable. In order to test our hardware, we wrote a simple Host Controller Driver in C, highlighted in green in the same figure, which completes a transaction with the USB device. In fact, our software system interface is dependent on our hardware implementation and does not follow the specification for USB drivers. The software layer simply abstracts the details of the protocol that are implemented by the hardware layer. The software layer is involved at the level of transactions, whereas the hardware layer is involved at the level of packets and bits. The highest layer, the client software, would typically be C code that interacts with a USB device using only very high level functions such as read_USB( ) and write_USB( ). The client layer was not implemented as part of our FYP.

Figure 5: Host Communication (legend: layers we implemented in VHDL and in C)
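To make the difference between the transaction level and the packet level concrete, the sketch below lists the packets that make up each basic transaction type in the error-free case, following the USB 2.0 specification, together with the side that sends each packet. The table and helper names are hypothetical and do not correspond to the actual driver code. NAK, STALL and NYET responses are left out to keep the example short.

```python
HOST, DEVICE = "host", "device"

# (sender, packet) pairs for the error-free case of each transaction type.
TRANSACTIONS = {
    "SETUP": [(HOST, "SETUP token"), (HOST, "DATA0"), (DEVICE, "ACK")],
    "OUT": [(HOST, "OUT token"), (HOST, "DATA0/DATA1"), (DEVICE, "ACK")],
    "IN": [(HOST, "IN token"), (DEVICE, "DATA0/DATA1"), (HOST, "ACK")],
}


def packets_sent_by(transaction, sender):
    """Packets of one transaction that a given side puts on the wire."""
    return [packet for who, packet in TRANSACTIONS[transaction] if who == sender]


if __name__ == "__main__":
    for name in TRANSACTIONS:
        print(name, "host sends:", packets_sent_by(name, HOST))
```

In this picture, software requests one entry of the table, while the hardware is responsible for producing and checking each individual packet in the list.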


1. Physical Layer

The physical layer, referred to as the USB Bus Interface Layer in a USB environment (see Figure 5), is the hardware that handles the transmission of raw bits over the USB wire. This is the lowest layer in the figure above. It is composed of two blocks: Serial Interface Engine and Host Controller. Data flowing out of the USB host passes through the Host Controller first, then through the Serial Interface Engine.

a) Serial Interface Engine (SIE): The SIE performs several functions including serialization and de-serialization of transmissions, encoding and decoding of the signals, generation and verification of cyclic redundancy checks and detection of packet IDs and special signals.

b) Host Controller (HC): The host controller initiates transactions and controls access to the USB. It divides time into frames and issues a start-of-frame packet at each frame interval. In addition, it processes requests for data to and from the host and handles errors.
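The serialization and encoding steps performed by the SIE can be sketched in a few lines of Python. This is a behavioural illustration only, not the VHDL used in the core, and the function names are hypothetical. Bytes are shifted out LSB first, a 0 is inserted after every six consecutive 1s (bit stuffing), and the result is NRZI encoded: a 0 toggles the line between the J and K states and a 1 leaves it unchanged.

```python
def serialize_lsb_first(data):
    """Turn bytes into a list of bits, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit else 0
        if ones == 6:
            out.append(0)  # stuffed bit forces a transition after NRZI encoding
            ones = 0
    return out


def nrzi_encode(bits, idle="J"):
    """NRZI: a 0 toggles the line state, a 1 keeps it unchanged."""
    state, out = idle, []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)


if __name__ == "__main__":
    print(nrzi_encode(bit_stuff(serialize_lsb_first(b"\x80"))))
```

Encoding the byte 0x80 from an idle J line gives KJKJKJKK, which is the low and full speed SYNC pattern of the USB 2.0 specification. The stuffed 0 guarantees a line transition at least once every seven bit times, so that the receiver can stay synchronized.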

2. Protocol Engine Layer

The middle layer is composed of three sub-blocks: Host Controller Driver, USB Driver and USB system software.

a) Host Controller Driver (HCD): The HCD (see Figure 5) is the lowest tier in the USB software stack. It is the USB software layer that abstracts the Host Controller hardware and provides an interface for interaction with the Host Controller. We wrote part of the HCD to test our hardware system.

The blocks that are described after this are required to enable a client application to interact with a USB device. Implementing these layers was not within the scope of our project, but we will describe them so that the reader can have an idea of the logical flow that occurs from the highest to the lowest layers.

b) USB Driver (USBD): The USBD is the interface between the USB System Software and the Client Software. This interface provides clients with convenient functions for manipulating USB devices.

c) USB system software: The USB system software (see Figure 5) allocates bus bandwidth and manages bus power. It identifies, enumerates, and services data requests from devices on the bus.

3. Application Layer

The application layer is also known as Client Software (see Figure 5). Client software determines what transfers need to be made with a function. Client software is aware only of the set of pipes (i.e., the interface) it needs to manipulate its function. Requests made by the client software are presented via the USBD interface.

2.2.2 USB Packet Field Formats

For the purposes of our report, we did not find it necessary to define the details of the USB protocol such as the exact format of packets exchanged. However, we need to describe very briefly a few terms because they will be used in the Design and Analysis chapter to explain how our design meets the standard.

Here, we will simply define some of the most recurring fields in USB packets.

SYNC: All packets begin with a synchronization (SYNC) field. It is used by the input circuitry to align incoming data with the local clock. The Start-of-Packet (SOP) delimiter is part of the SYNC field.

PID: A packet identifier (PID) immediately follows the SYNC field of every USB packet. A PID consists of a four-bit packet type field followed by a four-bit check field. The PID indicates the type of packet and, by inference, the format of the packet and the type of error detection applied to the packet. There are four types of packets: Token (OUT, IN, SOF or SETUP), Data (DATA0, DATA1, DATA2 or MDATA), Handshake (ACK, NAK, STALL or NYET) and Special (PRE, ERR, SPLIT, PING or Reserved). Note that an IN PID specifies a transaction from a function to the host, whereas OUT and SETUP PIDs specify transactions from the host to a function.

ADDRESS FIELD: Function endpoints are addressed using two fields: the function address field and the endpoint field. A function needs to fully decode both address and endpoint fields.

DATA FIELD: The data field may range from zero to 1,024 bytes and must be an integral number of bytes. Data bits within each byte are shifted out LSB first.

CRC: Cyclic redundancy checks (CRCs) are used to protect all non-PID fields in token and data packets. Token and data packet CRCs provide 100% coverage for all single- and double-bit errors.
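The fields defined above can be put together in a short Python sketch that builds the bit sequence of a token packet. It is an illustration under stated assumptions, not the implementation of our core: the function names are hypothetical, and the field widths (7-bit address, 4-bit endpoint, 5-bit CRC), the PID codes and the CRC5 generator polynomial x^5 + x^2 + 1 are taken from the USB 2.0 specification rather than from the text above. SYNC and End-of-Packet are omitted.

```python
# Four-bit packet type codes from the USB 2.0 specification (subset).
PID_TYPES = {
    "OUT": 0x1, "IN": 0x9, "SOF": 0x5, "SETUP": 0xD,
    "DATA0": 0x3, "DATA1": 0xB,
    "ACK": 0x2, "NAK": 0xA, "STALL": 0xE,
}


def pid_byte(name):
    """Packet type in the low nibble, its one's complement as check field."""
    ptype = PID_TYPES[name]
    return ptype | ((~ptype & 0xF) << 4)


def pid_is_valid(byte):
    """A received PID is valid when the check field complements the type."""
    return (byte & 0xF) == (~(byte >> 4) & 0xF)


def lsb_first(value, width):
    """Bits of a field in transmission order (least significant bit first)."""
    return [(value >> i) & 1 for i in range(width)]


def crc5(bits):
    """USB token CRC: register preset to all ones, result inverted."""
    reg = 0x1F
    for bit in bits:
        msb = (reg >> 4) & 1
        reg = (reg << 1) & 0x1F
        if msb != bit:
            reg ^= 0x05  # generator polynomial x^5 + x^2 + 1
    return reg ^ 0x1F


def token_packet_bits(pid_name, address, endpoint):
    """PID, address, endpoint and CRC5 in the order they go on the wire."""
    fields = lsb_first(address, 7) + lsb_first(endpoint, 4)
    crc = crc5(fields)
    crc_bits = [(crc >> i) & 1 for i in range(4, -1, -1)]  # CRC is sent MSB first
    return lsb_first(pid_byte(pid_name), 8) + fields + crc_bits


if __name__ == "__main__":
    print(hex(pid_byte("SETUP")), token_packet_bits("SETUP", 0x15, 0xE))
```

Note that the PID is protected by its own check field, which is why the CRC is computed over the address and endpoint bits only.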

2.3 Approach to solving problem

We have decided to solve the problem of designing a hardware system that can interface to a USB device by implementing a USB 2.0 Revision (high-speed) compliant Intellectual Property core on an FPGA.

In fact, an alternative to implementing IP cores using FPGAs is doing so using Application Specific Integrated Circuits (ASIC). In the past, it was a rule of thumb that densities of more than 500,000 gates and volumes above 100,000 units were beyond the capability of FPGAs. Today, FPGAs approach ASIC-equivalent densities of 1 million gates which is the reason why we found it important to discuss why an ASIC based approach was not taken. In what follows, we will briefly discuss the alternative of implementing our project on an ASIC. In doing so, we will compare and contrast it to implementation on an FPGA and state why we chose to use an FPGA based implementation instead. First, an Application Specific Integrated Circuit (ASIC) is a chip designed to do a certain specific job or a small group of jobs. If you want to implement different functionality, then you need to use a different chip.


Second, an FPGA can be re-programmed again and again until all bugs are removed and the system is working correctly. An ASIC, however, is hard-wired with a mask: once it is fabricated, no changes can be made. In the commercial world, a system that has been prototyped on an FPGA is usually migrated to an ASIC as one of the final stages before the product is sold.

Although an FPGA consumes more power than an ASIC, it was a better choice for us as students implementing an IP core since we could take advantage of the debugging and reprogramming capabilities that it offers.

Third, ASICs are usually made in large quantities by big companies. The total investment is large, but the unit cost is small when the chip is manufactured in large quantities. ASICs carry high nonrecurring engineering (NRE) costs, which are hard to justify when only one chip is to be fabricated. FPGAs, on the other hand, can be used for one-offs since they do not have NRE costs. Since we are implementing only a single chip, using an FPGA avoids these high costs.

Moreover, the purpose of our project is not commercial and we are not concerned about the unit cost of implementing the chip since it will not be for sale. In addition to the reasons mentioned above, the defining factor in our choice of an FPGA over ASIC was the availability of the FPGA boards with physical USB boards in our labs.

2.4 Literature Survey

Have USB IPs been implemented before? The answer is yes: USB cores are implemented with every emerging processor, and a USB IP core that interfaces to the Microblaze processor is also available. However, these are commercial IPs and are not free.


Alternatives to USB core

The USB IP core is the controller that is required if, for example, you wish to use your USB mouse or USB memory stick. Our controller is the device that acts as a bridge between the Microblaze processor and other USB devices. If it is not present, then there is no way to use USB devices. To interface your processor with a USB device, three options are available:

1. Buy a standard chip or product
2. Buy a commercial core
3. Design the USB core

Buy a standard chip or product

In this solution, an extra chip is added to the design. This third-party chip has a microcontroller and other logic that act as a mediator between the processor and the USB devices. Since we are trying to add a USB core to the Microblaze soft CPU on the FPGA, this design option would be unwise: we would end up using another chip, leaving us with a bulky and costly design.

Buy a commercial core

Another solution would be to purchase a ready-made USB core that is mapped and downloaded into the FPGA. What we would actually be purchasing is the VHDL code that describes the USB IP. This solution is fast and not risky since the purchased USB core ensures high performance according to USB standards. It is usually pursued by design companies that require a USB interface to the processor. However, purchasing the IP core ourselves would do us no good since we wish to design a noncommercial IP core, one that will be used freely in AUB labs.

Design the USB core

Finally, we come to the solution that we have chosen. In this solution, the USB core is designed in VHDL and implemented in the FPGA. This solution is the most tedious of all. It requires a lot of work since familiarity with the exhaustive details of the USB protocol is needed. The hard part in creating the design is verifying compliance and interoperability with the USB standards.


3.0 Design and Analysis


3.1 High Level Design

We designed a hardware system on an FPGA that consists of the Microblaze Microprocessor, the RS232 Core and the USB core that we implemented. All these blocks are connected through the OPB bus as shown in the figure below.

[Block diagram: inside the FPGA, the MicroBlaze Microprocessor, the USB Core and the RS232 Core are attached to the On Chip Peripheral Bus (OPB). Legend: designed by Xilinx / designed by us.]

Figure 6: High Level System Block Diagram

The Microblaze processor runs the C code that implements the Host Controller Driver on the FPGA. This C code interacts with the USB core by writing and reading from registers. Finally we use the RS232 Core to display the results of the C code on the screen.

We studied a large number of USB embedded host controller datasheets from National Semiconductor, Cypress Semiconductor, Maxim and Philips. We also designed our system with reference to a full speed version of a USB HostSlave IP core. We wanted a USB host controller block diagram that is simple enough for us to implement, compliant with the USB 2.0 specification and inclusive of all the functionality necessary for the design of the USB-to-USB data transfer application. We finally settled on a block diagram whose summary datasheet, written with reference to the datasheets mentioned above, is given below.

3.2 Datasheet Summary

3.2.1 Introduction

The host controller enables an embedded system to function as a USB Host, dramatically expanding the degree of interconnectivity and extending the applicability of USB into many new areas.

3.2.2 Features

- USB host controller for embedded applications.
- USB Specification 2.0 compliant.
- Standard 8-bit microprocessor bus interface.
- Supports high speed, full speed and low speed USB transactions.
- Connected to the Microblaze processor as a slave on the OPB bus.


3.2.3 Block Diagram

[Block diagram of the USB Core. Blocks: Bus Interface, Calculate Reset, USB Host Controller, Receive FIFO, Transmit FIFO and USB Serial Interface Engine. Signals on the OPB bus side (Microblaze): address_i(0:7), data_i(0:7), clk, rst, we_i, strobe_i and data_o(0:7). Other outputs: HostResumeIntOut, HostTransDoneIntOut, HostConnEventIntOut, HostSOFSentIntOut and USBSpeed. Signals on the Physical Interface (USB Port) side: USBWireDataOut(0:1) and USBWireDataIn(0:1).]


3.2.4 System Overview

The host controller block diagram consists of five major blocks (refer to the figure above).

- The USB Serial Interface Engine: Seen in a dotted black box in the figure above. Provides the interface between the physical USB wires and the USB Core. It deals with low-level bit granularity by processing the incoming and outgoing data bits on the wires. It is composed of an SIE receiver that NRZI decodes, de-stuffs and parallelizes raw incoming data bits, and of an SIE transmitter that does the exact opposite with the outgoing bits.
- Receive and Transmit FIFOs: Seen in a dotted light blue box in the figure above. Implemented as First-In-First-Out buffers. We use the receive FIFO to hold the data payload of incoming data packets, which is read later by the software layer. The transmit FIFO holds the data payload of data packets to be transmitted; it is loaded with data by the higher software layer prior to a transaction.
- Host Controller: Seen in a dotted grey box in the figure above. Operates at the transaction and packet level, in contrast to the packet and bit level at which the USB Serial Interface Engine operates. It manages all transactions, sends packets and waits for response packets, and notifies the software layer when a transaction is done.
- Bus Interface: Seen in a dotted red box in the figure above. Selects and enables one of the Receive FIFO, Transmit FIFO or Host Controller by processing the address it receives. It also generates the USB clock from the bus clock (clk_i). The USB clock is 4 times slower than the bus clock and is used by the majority of the components in the USB Core.
- Calculate Reset: Seen in a dotted blue box in the figure above. Calculates separate reset signals for the USB clock and the bus clock. The bus clock reset should last 4 times more clock cycles than the USB clock reset.


Note that our USB Core is connected to the Microblaze microprocessor as a slave on the On-Chip Peripheral Bus which is designed for easy connection of the USB peripheral device. It provides a common design point for various on-chip peripherals. In the following sections, we provide a detailed description of each component.

3.2.5 Signal Definitions

Name                  IN/OUT  Description
clk_i                 IN      The bus clock linked to the system clock of the FPGA on the board.
rst_i                 IN      Resets all components if active-high.
address_i [7:0]       IN      Input Address
data_i [7:0]          IN      Input Data
we_i                  IN      Write Enable
strobe_i              IN      Indicates the start of a bus cycle period
data_o [7:0]          OUT     Output data corresponding to the address_i input
hostSOFSentIntOut     OUT     Active-high when a SOF transmission occurs
hostConnEventIntOut   OUT     Active-high when a connect or disconnect of the device occurs
hostResumeIntOut      OUT     Active-high when a resume state of the device is detected
hostTransDoneIntOut   OUT     Active-high when a transaction is complete
USBSpeed [1:0]        OUT     Speed of the attached device (low, full or high)

Table 1: Signals

3.2.6 Registers

These registers are the ones accessed through the signals address_i and data_i above. They provide the mode of communication between the software layer that implements the Host Controller Driver and the VHDL code that implements the USB core. By writing to and reading from these registers, the C code sends control or configuration information to the Host Controller or data to the Transmit FIFO. These registers also enable the software layer that implements the Host Controller Driver to read data located in the Receive FIFO, or to read data from the Host Controller in order to check whether the Host Controller has processed the right configuration information.


In our implementation, the values of these registers are held within the VHDL design itself and not in on-chip or off-chip memory. The VHDL code reads the address and data values sent on the OPB bus and decodes this information to take one out of a set of actions. The details of how the VHDL code deals with these values are described in the table below.

Register name            Address  Bits   Field name                Description
TRANSREQ_PREEN_SOFSYNC   0x00     1      TRANS_REQ                 Set to 1 to enable a transaction, 0 to disable it.
                                  2      SOF_SYNC                  Set to 1 to synchronize transaction with end of SOF transmission.
                                  3      PRE_EN                    Set to 1 to enable preamble packets.
SOF_ENABLE               0x01     1      SOF_EN                    Set to 1 to enable automatic transmission of SOF packets.
FRAMENUM_MSB             0x02     [2:0]  FRAMENUM_MSB              Most significant part of the frame number in SOF transmission.
FRAMENUM_LSB             0x03     [7:0]  FRAMENUM_LSB              Least significant part of the frame number in SOF transmission.
CONNECT_STATE            0x04     [1:0]  CONNECT_STATE             If 00, the device is in a disconnected state. If 01, low-speed state; if 10, full-speed; if 11, high-speed.
DEVICE_ADDRESS           0x05     [6:0]  DEV_ADDR                  USB Device Address.
ENDPOINT_ADDRESS         0x06     [3:0]  END_ADDR                  USB Device Endpoint Address.
TRANSACTION_TYPE         0x07     [1:0]  TRANS_TYPE                Specifies the transaction type required: Setup=0, IN=1, OUT0=2, OUT1=3.
INTERRUPT_STATUS         0x08     0      TRANS_DONE_INT            Set to 1 when a transaction is complete.
                                  1      RESUME_INT                Set to 1 when resume state is detected.
                                  2      CONNECTION_EVENT_INT      Set to 1 when a connect or disconnect occurs.
                                  3      SOF_SENT_INT              Set to 1 when a SOF transmission occurs.
INTERRUPT_MASK           0x09     0      TRANS_DONE_INT            Set to 1 to enable interrupt when a transaction is complete.
                                  1      RESUME_INT                Set to 1 to enable interrupt when resume state is detected.
                                  2      CONNECTION_EVENT_INT      Set to 1 to enable interrupt when a connect or disconnect occurs.
                                  3      SOF_SENT_INT              Set to 1 to enable interrupt when a SOF transmission occurs.
PID                      0xa      [3:0]  RX_PID                    Packet ID of the last packet received.
STATUS                   0xb      0      CRC_ERROR                 When set to 1, indicates CRC error in the last transaction.
                                  1      BIT_STUFF_ERROR           When set to 1, indicates bit stuffing error in the last transaction.
                                  2      OVERFLOW                  When set to 1, indicates that the receive FIFO is full and cannot accept any more of the incoming data.
                                  3      TIME_OUT                  When set to 1, indicates no response from USB device.
                                  4      NAK                       When set to 1, indicates that NAK has been received in response to the last packet sent.
                                  5      STALL                     When set to 1, indicates that STALL has been received in response to the last packet sent.
                                  6      ACK                       When set to 1, indicates that ACK has been received in response to the last packet sent.
                                  7      DATA_SEQUENCE OR NYET     Indicates the sequence number of the last packet received in case of an IN transaction, or, if it is a handshake packet, whether it is a NYET.
LINE_CONTROL_INFO        0xc      [1:0]  LINE_STATE                If direct control is enabled, LINE_STATE directly controls the state of the physical wires.
                                  2      DIRECT_CNTR               Set to 1 to enable direct control of the USB physical wires.
                                  [3:4]  LINE_POLARITY_BIT         If 00, enables low-speed line polarity; if 01, full-speed line polarity; if 10, high-speed line polarity.
RX_FIFO_DATA             0x20     [7:0]  RX_FIFO_DATA              Contains the receive payload of the last IN Transaction.
RX_FIFO_DATA_NUM_MSB     0x21     [7:0]  RX_FIFO_DATA_NUM_MSB      Most significant byte of the number of elements in the receive FIFO.
RX_FIFO_DATA_NUM_LSB     0x22     [7:0]  RX_FIFO_DATA_NUM_LSB      Least significant byte of the number of elements in the receive FIFO.
RX_FIFO_RESET            0x23     0      FORCE_EMPTY               When set to 1, deletes all data in the receive FIFO.
TX_FIFO_DATA             0x30     [7:0]  TX_FIFO_DATA              Contains the transmit payload of the last OUT Transaction.
TX_FIFO_RESET            0x31     0      FORCE_EMPTY               When set to 1, deletes all data in the transmit FIFO.

Table 2: Register Description
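As an illustration of how the software layer might interpret these registers, the following is a minimal C sketch. The addresses and bit positions are copied from Table 2; the helper names (status_has_error, status_handshake, rx_fifo_count) are hypothetical and are not taken from our driver code.

```c
#include <stdint.h>
#include <stdbool.h>

/* A few register addresses from Table 2. */
enum {
    REG_STATUS               = 0x0B,
    REG_RX_FIFO_DATA_NUM_MSB = 0x21,
    REG_RX_FIFO_DATA_NUM_LSB = 0x22
};

/* STATUS register bit masks (bit positions from Table 2). */
enum {
    ST_CRC_ERROR        = 1u << 0,
    ST_BIT_STUFF_ERROR  = 1u << 1,
    ST_OVERFLOW         = 1u << 2,
    ST_TIME_OUT         = 1u << 3,
    ST_NAK              = 1u << 4,
    ST_STALL            = 1u << 5,
    ST_ACK              = 1u << 6,
    ST_DATA_SEQ_OR_NYET = 1u << 7
};

/* True if the last transaction reported any of the four error conditions. */
static bool status_has_error(uint8_t status)
{
    return (status & (ST_CRC_ERROR | ST_BIT_STUFF_ERROR |
                      ST_OVERFLOW | ST_TIME_OUT)) != 0;
}

/* Name of the handshake reported in STATUS, or "none". */
static const char *status_handshake(uint8_t status)
{
    if (status & ST_ACK)   return "ACK";
    if (status & ST_NAK)   return "NAK";
    if (status & ST_STALL) return "STALL";
    return "none";
}

/* Combine the two byte-wide count registers into one element count. */
static uint16_t rx_fifo_count(uint8_t msb, uint8_t lsb)
{
    return (uint16_t)((msb << 8) | lsb);
}
```

After a transaction, the driver would read STATUS once and pass the value to helpers such as these, then read the two count registers to learn how many bytes to fetch from RX_FIFO_DATA.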


3.2.7 Choice of Design

In our quest for an appropriate block diagram for the USB host controller, we came across a number of different implementation designs, such as the ISP1760 (Embedded Hi-Speed USB host controller), the ISP1563 (Hi-Speed Universal Serial Bus PCI Host Controller from Philips) and the SL811HS (Embedded USB Host/Slave Controller). Many of these implementations had very low-level and complicated block diagrams and/or included more features and functionality than our project needed. The design we settled on is as simple as possible, implementing just the functionality we need. The table below lists examples that contrast our design choices with alternative ones, along with the reason for each choice.

1. Our Choice: Implement only the host controller functionality.
   Alternative Design: Include slave controller functionality along with that of the host controller.
   Reason of Choice: For simplicity purposes. In future work, slave functionality may be added.

2. Our Choice: Processor interface not designed to satisfy common standards among other interfaces. Satisfy only the OPB bus protocol.
   Alternative Design: Implement a processor interface which follows certain common standards (e.g. Wishbone-compatible).
   Reason of Choice: Other design alternatives have a processor interface to many kinds of microprocessors, microcontrollers, or directly to a variety of buses such as ISA or PCMCIA, whereas our design only needs to have an interface to the Microblaze processor.

3. Our Choice: Host Controller can be interfaced directly via 8 bits of its data bus and 8 bits of its address bus.
   Alternative Design: Provide an 8-bit bidirectional data path along with appropriate control lines to interface to external processors or controllers. Access to memory and control register space is a simple two-step process, requiring an address Write (set a certain control line called A0 to 0) followed by a register/memory Read or Write cycle with address line A0 = 1.
   Reason of Choice: Simpler design implementation.

Table 3: Compare alternative designs with our design features


4.0 Implementation
In order to build the USB High Speed core, we had to implement the USB 2.0 specification, which only specifies the language that high speed USB speaks but provides no details of implementation. Therefore, our first target was to complete the VHDL code that implements the USB High Speed protocol specification. As part of the specification, our core is supposed to be backward compatible with all three speeds of USB devices (low speed, full speed, high speed). This is because a high speed USB port residing on a host computer is expected to succeed in initiating communication with all USB devices, irrespective of their speed of operation. In this chapter, we will describe the implemented design of the core that has been written in VHDL.

Our VHDL code, which implements a host USB IP core, is composed of two main blocks as seen in the figure below: the Host Controller component and the SIE component. The SIE component itself is divided into the Transmitter and Receiver components. On its upper side, the core interfaces to the Host Controller Driver implemented in software; on its lower side, it interfaces to the physical USB port. Note that raw bits of 0 and 1 are sent on the USB port as seen in the figure below.

Figure 7: Overall System


4.1 USB Core General Implementation

When a USB device transmits to the host computer, the receiver reads the bits on the USB cable at the correct speed, decodes whether these bits represent a certain state (such as start of packet, end of packet or idle) or fields that are part of USB packets, and transfers this information to the Host Controller (HC). To achieve this, the receiver reads serial data at one of the USB speeds and converts it to bytes, which it sends to the HC. Note that the receiver block does a lot of processing between receiving raw bits on the wire and sending bytes to the HC. For example, for a packet with a CRC, it recomputes the expected CRC and checks whether it is equal to the one received. If it is not, it reports an error to the HC. The receiver also removes the bit stuffing that was performed by the USB device, because the bytes that go to the HC must be free of stuffed bits. Moreover, the receiver detects the speed of the USB device upon connection and provides this information to all other components. For example, if it detects that a high speed device was connected, it reports this to both the HC and the Transmitter. The Transmitter will then only send to the USB device at high speed. The HC is the component that initiates and controls the progress of all transactions, so it needs to know whether a device is high speed in order to send Start of Frame packets more often than it would for a low or full speed device. These are just a few examples of the communication that takes place when raw bits are received on the wire.
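To make the receiver's bit-level processing concrete, here is a minimal C model of NRZI decoding followed by bit de-stuffing. It assumes the standard USB rules (a transition on the line encodes a 0, no transition encodes a 1, and a 0 is stuffed after six consecutive 1s). The function name and the representation of line levels as an array of 0/1 values are our own illustrative choices, not the structure of the VHDL receiver.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n sampled line levels (0 or 1) into data bits.
 * idle_level is the line level before the first sample.
 * Returns the number of bits written to out, or -1 on a
 * bit stuffing error (seven consecutive 1s). */
static int nrzi_decode_unstuff(const uint8_t *levels, size_t n,
                               uint8_t idle_level, uint8_t *out)
{
    uint8_t prev = idle_level;
    int ones = 0;    /* consecutive 1s seen so far */
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        uint8_t bit = (levels[i] == prev) ? 1u : 0u;  /* no transition = 1 */
        prev = levels[i];

        if (ones == 6) {
            if (bit)
                return -1;   /* a 0 must follow six 1s */
            ones = 0;        /* stuffed 0: drop it */
            continue;
        }
        ones = bit ? ones + 1 : 0;
        out[count++] = bit;
    }
    return count;
}
```

The error return corresponds to the condition that the core reports through the BIT_STUFF_ERROR bit of the STATUS register.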

Now, assuming that the receiver has determined the speed at which the connected device is running, the core is supposed to have an initial transaction with the USB device. A transaction consists of several packets. The host controller initiates and controls the progress of all transactions. As input to the host controller, we specify the type of transaction that the Host Controller Driver (HCD) wishes to make with the USB device as well as other needed fields such as the address of the device. The host controller, knowing which transaction is to be sent, commands the transmitter to send the packets that make up that type of transaction. It then waits for a response from the receiver, which indicates either a handshake from the device or a timeout caused by the lack of a handshake.

For example, let us take the case of a setup transaction with the device at startup. When the HC realizes that it needs to initiate a setup transaction, it enters a state machine in which it sequentially sends a setup token packet and a data packet, then waits for a handshake. The setup token packet simply contains the address of the targeted device and indicates that the following packet carries setup information. The data packet contains the setup information itself. So the host controller indicates to the transmitter component that a setup token packet must be sent to the device with address x. The transmitter, receiving this information from the HC, implements the details of the physical USB protocol. For example, when the transmitter receives a command to send a setup token packet, it cannot simply send the received bits for the packet on the line; it needs to do a lot of processing first. The transmitter first sends a START OF PACKET sequence on the line, serializes the bytes that it receives from the HC into bits, calculates the CRC over the packet, performs bit stuffing and then sends the resulting bits on the USB cable. Once the setup token packet has been sent, it sends an END OF PACKET sequence on the line. Note that all this processing is for the first packet only. A similar sequence happens for the data packet in the Transmitter.

Now that both token and data packets have been sent, the core waits for a response from the device regarding the packets that were sent. Two cases can occur: the device either responds with a handshake packet or does not respond. The receiver block listens on the USB cable. If it receives no bits for a certain amount of time specified by the protocol, it reports a timeout to the HC. The host controller has a timeout interrupt output signal that it sets in this case. This signal is to be handled by the HCD, which should try to send the transaction again. On the other hand, if the receiver starts receiving bits, it decodes them to find out whether they make up a handshake packet. If so, it reports to the HC that a handshake packet has been received from the device. The HC at this stage has completed the whole transaction, so it interrupts the Host Controller Driver layer to signal that the transaction is complete.
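Seen from the software side, the setup transaction described above could be driven roughly as in the following C sketch. This is an illustration under stated assumptions, not our actual Host Controller Driver: the register addresses and status bits come from Table 2, while the usb_bus accessors, the function hcd_setup_transaction, the polling loop and the fake_* stand-in for the core are all hypothetical. In particular, Table 2 lists TRANS_REQ at bit position 1 and it is unclear whether that numbering starts at 0, so the mask used here is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Register addresses and bit masks taken from Table 2. */
enum {
    REG_TRANSREQ = 0x00, REG_DEVICE_ADDRESS = 0x05,
    REG_ENDPOINT_ADDRESS = 0x06, REG_TRANSACTION_TYPE = 0x07,
    REG_INTERRUPT_STATUS = 0x08, REG_STATUS = 0x0B, REG_TX_FIFO_DATA = 0x30
};
enum { TRANS_TYPE_SETUP = 0 };
enum { TRANS_DONE_INT = 1u << 0, STATUS_TIME_OUT = 1u << 3, STATUS_ACK = 1u << 6 };
enum { TRANS_REQ_MASK = 1u << 1 };  /* ASSUMPTION: see the note above */

/* Hypothetical bus accessors, e.g. wrappers around OPB reads and writes. */
typedef struct {
    uint8_t (*read)(uint8_t addr);
    void    (*write)(uint8_t addr, uint8_t value);
} usb_bus;

/* Returns 0 on ACK, -1 on timeout or no completion, -2 on any other outcome. */
static int hcd_setup_transaction(const usb_bus *bus, uint8_t dev_addr,
                                 uint8_t endpoint, const uint8_t *payload,
                                 size_t len, unsigned max_polls)
{
    for (size_t i = 0; i < len; i++)            /* load the data payload */
        bus->write(REG_TX_FIFO_DATA, payload[i]);
    bus->write(REG_DEVICE_ADDRESS, dev_addr & 0x7F);
    bus->write(REG_ENDPOINT_ADDRESS, endpoint & 0x0F);
    bus->write(REG_TRANSACTION_TYPE, TRANS_TYPE_SETUP);
    bus->write(REG_TRANSREQ, TRANS_REQ_MASK);   /* request the transaction */

    while (max_polls-- > 0) {                   /* poll instead of using the interrupt */
        if (bus->read(REG_INTERRUPT_STATUS) & TRANS_DONE_INT) {
            uint8_t status = bus->read(REG_STATUS);
            if (status & STATUS_TIME_OUT)
                return -1;
            return (status & STATUS_ACK) ? 0 : -2;
        }
    }
    return -1;
}

/* Stand-in for the core, for illustration only: it completes every
 * requested transaction immediately with an ACK. */
static uint8_t fake_regs[256];
static unsigned fake_tx_bytes;

static uint8_t fake_read(uint8_t addr) { return fake_regs[addr]; }

static void fake_write(uint8_t addr, uint8_t value)
{
    fake_regs[addr] = value;
    if (addr == REG_TX_FIFO_DATA)
        fake_tx_bytes++;
    if (addr == REG_TRANSREQ && (value & TRANS_REQ_MASK)) {
        fake_regs[REG_INTERRUPT_STATUS] |= TRANS_DONE_INT;
        fake_regs[REG_STATUS] = STATUS_ACK;
    }
}
```

A return value of -1 corresponds to the timeout case above, in which the driver should try to send the transaction again.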

The explanation above is a high-level, low-detail view of the interaction that happens between the components within the core, the software layer and the USB device. The implementation was quite tedious as it required us to take care of many cases of transactions and packets and also to have the core support all three speeds, each of which has a different signaling rate and different transfer protocols.


Our first step was to write the VHDL IP core that implements the hardware and to test it. We defined the interface to our core and made sure that it was working properly as a black box.

As for the HCD software layer, having it fully working requires covering all the cases implemented in the core, which is in fact defined as a protocol of its own called the Enhanced Host Controller Interface (EHCI). We concerned ourselves with writing a case that validates that the VHDL core is working, but it is not comprehensive. In what follows, we explain what we have implemented and the future work that must be done in order to complete the core so that it can communicate with a USB device from application-level software. When we described our VHDL core in the previous section, we only specified the main high-level blocks such as the Host Controller, Transmitter and Receiver. This was to give the reader an understanding of the overall functionality of the core. However, the code is quite detailed since it implements most of the USB protocol. In fact, the core is composed of many more components that are sub-components of the previously mentioned ones. In the discussion below, we explain the block-level design of each of these blocks.

4.2 Host Controller

The USB Host Controller is the main block in the USB Core that manages all outgoing and incoming transactions. The different components in the USB Host Controller can be logically divided into 2 parts: those that manage all outgoing data transfers (Host Controller Arbiter, Control SOF, Send SOF, HCA&SOF MUX, Check Preamble, Transmit Packet, Direct Control, SOF DC TxPacket MUX) and those that manage all incoming data transfers (Host Controller Arbiter, Receive Packet, Interrupt Generator). The USB Host Controller further contains a component that provides it with an interface to the bus. The figure below depicts the block diagram of the USB Host Controller along with all its subcomponents.


Figure 8: Host Controller

As a first step, the USB Host Controller processes all control information sent by the software layer (whether automatic transmission of SOF and PREAMBLE packets is enabled, and the type of transaction required) and sends this information to the addressed components within the USB core. It also sends information about the transaction that is taking place, or about the device the host is attached to, back to the software layer: the speed of the device, the kind of handshake received and so forth. This component also sends interrupts to the software layer when a transaction is done, a SOF is sent, resume is detected or the connection state of the USB physical line changes (the possible states being DISCONNECTED, LOW-SPEED, FULL-SPEED or HIGH-SPEED). Depending on the type of transaction required (IN, OUT0, OUT1 or SETUP), it takes care of sending the appropriate packets to the device (token, data or handshake). If automatic transmission of SOF and/or PREAMBLE packets is enabled, it sends SOF packets at the start of each frame, or PREAMBLE packets before each data or token packet.

If the software layer has also enabled direct control of the USB physical wires, the Host Controller takes care of sending to the device the state of the line as specified within the control information sent by the software layer.


The USB Host Controller stores the payload of the data packet it receives from the SIE Receiver and therefore from the device, in the Receive FIFO, to be read later by the software layer. It also packages the data in the Transmit FIFO as part of the payload of the data packet within an OUT or SETUP transaction to be sent to the SIE Transmitter and consequently to the device.

Host Controller Bus Interface

The Host Controller Bus Interface interfaces between the Host Controller component and the Bus Interface. Its job is to synchronize between the USB clock and the bus clock. It has a 4-bit address as input; this address identifies the register whose content is carried on the 8-bit dataIn signal. The Host Controller Bus Interface either divides this input data and assigns it to the appropriate signals, or assigns it as a whole to the dataOut output, to be sent to different components of the host controller or other components of the wrapper.

Host Controller Arbiter

The Host Controller Arbiter checks whether a transaction is required (transReq bit set by software layer components), then checks which transaction type is required by the upper level: SETUP, IN, OUT0 or OUT1. Accordingly, it sets the PID of the packets that are to be sent and asks the HCA&SOF MUX for a turn to send the packet, or it enables the Receive Packet component to read incoming packets.

In case an IN transaction is required, it sets the packet ID to IN and waits until this packet is sent and a packet is received from the device. It then sets the packet ID to ACK and notifies the upper level that the required transaction is done.

In case a SETUP transaction is required, it first sets the packet ID to SETUP, it waits till the packet is sent, then it sets the packet ID to DATA0, waits till the packet is sent and an ACK is received, then notifies the upper level that the required transaction is done.

In case an OUT0 transaction is required, if it received a NYET in response to the previous transaction, it sets the packet ID to PING; otherwise it sets it to OUT. It waits until the packet is sent, then sets the packet ID to DATA0 and waits until it is sent and an ACK is received. Finally, it notifies the upper level that the required transaction is done.

In case an OUT1 transaction is required, it sets the packet ID to OUT, waits until the packet is sent, then sets the packet ID to DATA0, waits until it is sent and an ACK is received, then notifies the upper level that the required transaction is done.
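The packet IDs that the arbiter sets for each transaction type can be summarized in a small C sketch. It mirrors the description above as written (including DATA0 being listed for both OUT0 and OUT1) rather than the USB specification, and the function name host_pid_sequence is hypothetical. The transaction type codes follow the TRANSACTION_TYPE register, and the PID values are the 4-bit codes of the USB 2.0 specification.

```c
#include <stdbool.h>

/* Transaction type codes as in the TRANSACTION_TYPE register. */
enum trans_type { TRANS_SETUP = 0, TRANS_IN = 1, TRANS_OUT0 = 2, TRANS_OUT1 = 3 };

/* 4-bit PID codes used by the host in these transactions. */
enum pid {
    PID_OUT = 0x1, PID_ACK = 0x2, PID_DATA0 = 0x3,
    PID_PING = 0x4, PID_IN = 0x9, PID_SETUP = 0xD
};

/* Fill pids[] with the packet IDs the host sends, in order.
 * prev_nyet tells whether the previous transaction ended with a NYET.
 * Returns the number of entries written. */
static int host_pid_sequence(enum trans_type type, bool prev_nyet,
                             enum pid pids[2])
{
    switch (type) {
    case TRANS_SETUP:
        pids[0] = PID_SETUP; pids[1] = PID_DATA0; return 2;
    case TRANS_IN:
        pids[0] = PID_IN;    pids[1] = PID_ACK;   return 2;  /* ACK follows the received data */
    case TRANS_OUT0:
        pids[0] = prev_nyet ? PID_PING : PID_OUT;
        pids[1] = PID_DATA0; return 2;
    case TRANS_OUT1:
        pids[0] = PID_OUT;   pids[1] = PID_DATA0; return 2;  /* DATA0 as stated in the text */
    }
    return 0;
}
```

Between the two entries of an IN transaction the arbiter waits for a packet from the device; for the other types it waits for an ACK after the second packet.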

Control SOF

This component keeps track of a timer for SOF. This timer is then used by the Send SOF component.

Send SOF

When the SOF timer maintained by the Control SOF component reaches a certain value (which differs between low speed and high speed), this component signals that a Start of Frame (SOF) packet needs to be transmitted.
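As a rough illustration of the SOF timer, the C sketch below computes the timer value at which a SOF is due. It assumes the frame periods of the USB 2.0 specification (1 ms frames at low and full speed, 125 microsecond microframes at high speed). The clock frequency parameter and the function names are hypothetical; the frequency of the USB clock in our core is not stated here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Speed encoding as in the CONNECT_STATE register. */
enum usb_speed { SPEED_LOW = 1, SPEED_FULL = 2, SPEED_HIGH = 3 };

/* Number of clock ticks between two SOF packets for a given
 * (hypothetical) timer clock frequency in Hz. */
static uint32_t sof_interval_ticks(enum usb_speed speed, uint32_t clk_hz)
{
    uint32_t period_us = (speed == SPEED_HIGH) ? 125u : 1000u;
    return (uint32_t)(((uint64_t)clk_hz * period_us) / 1000000u);
}

/* Free-running timer in the spirit of Control SOF. */
typedef struct { uint32_t count; } sof_timer;

/* Advance the timer by one tick; returns true when a SOF is due. */
static bool sof_timer_tick(sof_timer *t, uint32_t interval)
{
    if (++t->count >= interval) {
        t->count = 0;
        return true;
    }
    return false;
}
```

For example, with an assumed 60 MHz timer clock a high speed SOF would be due every 7500 ticks.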

HCA&SOF MUX

This block arbitrates between requests from the Host Controller Arbiter and the Send SOF components, both of which want to send packets. Send SOF wants to transmit SOF packets whereas the Host Controller Arbiter wants to transmit packets with any other PID. The block gives priority to Send SOF because the SOF packet needs to be transmitted first.

Check Preamble

As soon as there are packets that need to be sent, this block first checks whether the software layer components have enabled automatic transmission of preamble packets. If so, it waits until the Transmit Packet component is ready to send packets, then signals to it that a PREAMBLE packet needs to be sent. If the higher-level components have not enabled automatic transmission of preamble packets, or after the PREAMBLE packet is sent, it signals the Transmit Packet component that a packet needs to be sent and forwards the packet's ID with the value provided by the HCA&SOF MUX: either SOF or any other PID provided by the Host Controller Arbiter component itself. Note that PREAMBLE is only sent in low and full-speed transmissions before any token, data or handshake packet.


Transmit Packet

This block acts according to the packet ID (PID) it receives from Check Preamble. If the PID is SOF and the attached device operates at low speed, it sends a KEEP_ALIVE control signal to the SIE. Low speed devices do not see SOF packets; the KEEP_ALIVE signal plays the same role for them by keeping the low-speed device from entering the Suspend state. Otherwise, it sends a TX_PACKET_START control signal to the SIE Transmitter, then distinguishes between the data and token packets along with their PID types:

If the Packet ID is either DATA0 or DATA1, it reads data from the Transmit FIFO, and sends this data to the SIE Transmitter along with a control signal called TX_PACKET_STREAM to indicate it is sending data. When it has read all the data from the Transmit FIFO, it sends a TX_PACKET_STOP control signal to indicate that it has finished sending data.

If the Packet ID is SOF, it sends the frame number to the SIE Transmitter along with a control signal called TX_PACKET_STREAM. It also increments the frame number.

If the Packet ID is IN, OUT or SETUP, it sends the device endpoint number and address along with a control signal called TX_PACKET_STOP to indicate the end of the packet.

Direct Control

The Direct Control block checks whether the higher-level components allow direct control of the state of the USB physical wires. If so, it requests that the direct control line state specified by the upper-level components be sent to the SIE Transmitter, along with a TX_DIRECT_CONTROL control signal (to describe the data it is sending). If direct control is not enabled, it simply sends a control signal called TX_IDLE to the SIE Transmitter.


SOF DC TxPacket MUX

This block acts as a multiplexer between the Control SOF, Transmit Packet and Direct Control components, which all use the Transmit port of the host controller to send packets. Priority is given first to Control SOF, then to Transmit Packet and finally to Direct Control.
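The fixed-priority selection performed by this multiplexer can be modelled in a few lines of C. The request flags and the enum names are hypothetical; only the priority order (Control SOF, then Transmit Packet, then Direct Control) is taken from the description above.

```c
#include <stdbool.h>

/* Possible owners of the Transmit port, in decreasing priority. */
enum tx_source {
    TX_SRC_CONTROL_SOF,
    TX_SRC_TRANSMIT_PACKET,
    TX_SRC_DIRECT_CONTROL,
    TX_SRC_NONE
};

/* Grant the Transmit port to the highest-priority requester. */
static enum tx_source select_tx_source(bool sof_req, bool packet_req,
                                       bool direct_req)
{
    if (sof_req)    return TX_SRC_CONTROL_SOF;
    if (packet_req) return TX_SRC_TRANSMIT_PACKET;
    if (direct_req) return TX_SRC_DIRECT_CONTROL;
    return TX_SRC_NONE;
}
```

In hardware the same decision would typically be a priority-encoded multiplexer rather than a function call.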

Receive Packet

The Receive Packet block first checks whether the incoming data is valid, then whether the PID is HANDSHAKE or DATA. If it is a HANDSHAKE packet, it sends to the Host Controller Arbiter the information it received from the SIE about the packet (errors in CRC, Overflow, RX Time Out and the data sequence). If it is a DATA packet, then as long as the incoming data is valid, it reads it in and sends it to the Receive FIFO. However, if the Receive FIFO is full, it holds back the incoming data until there is some space in the FIFO.
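The behaviour of the Receive FIFO when it fills up can be illustrated with a small software model in C. The depth of 64 bytes and all names are hypothetical; the sketch only shows how a push is refused while the FIFO is full, so that the caller has to hold the data back, and how a forced empty discards all stored data.

```c
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 64u   /* hypothetical depth, not the depth used in our core */

typedef struct {
    uint8_t  data[FIFO_DEPTH];
    unsigned head;    /* next slot to write */
    unsigned tail;    /* next slot to read */
    unsigned count;   /* number of stored bytes */
} fifo;

/* Discard all stored data (compare FORCE_EMPTY in Table 2). */
static void fifo_force_empty(fifo *f)
{
    f->head = f->tail = f->count = 0;
}

static bool fifo_is_full(const fifo *f)
{
    return f->count == FIFO_DEPTH;
}

/* Store one byte; returns false if the FIFO is full. */
static bool fifo_push(fifo *f, uint8_t byte)
{
    if (fifo_is_full(f))
        return false;
    f->data[f->head] = byte;
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* Read the oldest byte; returns false if the FIFO is empty. */
static bool fifo_pop(fifo *f, uint8_t *byte)
{
    if (f->count == 0)
        return false;
    *byte = f->data[f->tail];
    f->tail = (f->tail + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

Once the software layer has read a byte out, the next push succeeds again.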

Interrupt Generator Interrupts the higher-level components in case the connection state is changed (the possible states being disconnected, low speed, full speed or high speed) or resume is detected by the SIE.

SpeedCtrlMux
This block sends the speed at which signaling with the USB device should occur to the Transmitter.

4.3 Transmitter
The transmitter block, which is a sub-component of the Serial Interface Engine (SIE) block, takes as input signals from the host controller and provides as output the bits to be sent on the USB port towards the USB device. The transmitter consists of sub-components, each of which has a specific function designed to support high speed, full speed and low speed USB communication. The figure below lists the subcomponents within the transmitter, which are: Data States, Transmit Controller, Token CRC, Data CRC, Bit Stuffer and Serializer, Direct Bits, the Bit Stuffer and Serializer and Direct Bits MUX, and USB Write. In what follows, we will describe each component in more detail.


Figure 9: Transmitter

Data States
USB transfers signals and power over a four-wire cable. The signaling occurs over the two wires D+ and D-, while the VBUS and GND wires on each segment deliver power to devices.

Figure 10: USB Cable

When transferring data, there are 4 possible states on the bus:

Bus State            D+   D-
Differential 0       0    1
Differential 1       1    0
Single-Ended-Zero    0    0
Single-Ended-One     1    1

Table 4: Bus States


In addition to the bus states mentioned above, which are defined by voltages on the lines, USB also defines two Data bus states, J and K. The J and K data states are the two logical levels used to communicate differential data in the system. They are defined by whether the bus state is Differential 1 or 0 and whether the cable segment is low, full or high speed.

               Data state for each bus state
Speed          Differential 0    Differential 1
Low Speed      J                 K
Full Speed     K                 J
High Speed     K                 J

Table 5: Speed and Data States

Figure 11: Speed and Data States (Differential 0 is J at low speed and K at full/high speed; Differential 1 is J at full/high speed and K at low speed)

The J and K states are defined in this manner so that one standard terminology can be used to describe a logic state on the USB cable even though the actual voltages on the D+ and D- lines differ with speed. For example, a Start-of-Packet (SOP) exists when the bus switches from the J state to the K state. On a high/full speed line, this means that D- becomes more positive than D+, while on a low-speed segment it means that D+ becomes more positive than D-.

Now that we know what the protocol requires of us, we can explain the functionality of Data States. This is a very simple block that takes as input the speed of the USB device that we are connected to and, depending on that speed, sets the J and K data states to either Differential 1 or Differential 0. In all blocks that follow, we just use the J and K states without having to deal with Differential 0 and 1.
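The mapping performed by Data States can be written as a short lookup, following Table 4 and Table 5. This is a minimal Python sketch for illustration only; the names `data_states`, `DIFFERENTIAL_0` and `DIFFERENTIAL_1` and the speed labels are assumptions and do not come from the VHDL core.

```python
# (D+, D-) line levels of the two differential bus states, as in Table 4.
DIFFERENTIAL_0 = (0, 1)
DIFFERENTIAL_1 = (1, 0)


def data_states(speed):
    """Map a speed label ('LS', 'FS' or 'HS') to the line levels of J and K."""
    if speed == "LS":
        # Low speed: J is Differential 0, K is Differential 1 (Table 5).
        return {"J": DIFFERENTIAL_0, "K": DIFFERENTIAL_1}
    if speed in ("FS", "HS"):
        # Full and high speed: J is Differential 1, K is Differential 0.
        return {"J": DIFFERENTIAL_1, "K": DIFFERENTIAL_0}
    raise ValueError("unknown speed: %r" % (speed,))


if __name__ == "__main__":
    for speed in ("LS", "FS", "HS"):
        print(speed, data_states(speed))
```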

Transmit Controller
This block is at the heart of the transmitter and is the most involved in controlling the states of all the other blocks in the transmitter. It receives bytes as input from the host controller (HC). It compares the first byte that it receives to constants to figure out whether a token, data, handshake or special packet is to be sent. This byte is the packet id of the corresponding packet. For each of the four types of packets, it enters a sequence of states in which it accepts the remaining fields of the packet from the HC, sends these fields to the Data CRC and Token CRC blocks, sends the bytes to the Bit Stuffer and Serializer, then appends the CRC (if applicable) to the end of the packet and sends it to be serialized and bit stuffed.

For a data packet, it first receives the packet id and decodes from it that this is a data packet. It sends this byte to the Bit Stuffer and Serializer along with control information indicating that this is the first byte of the packet. The Bit Stuffer and Serializer sends a start of packet sequence before going on to bit stuff and serialize the packet id. Knowing that a data packet can have multiple bytes after the packet id, the Transmit Controller moves to a state in which it waits for the HC to write a data byte into it. Along with this byte comes control information that tells the Transmit Controller whether this is the last data byte or whether there is more to come. If this is the last byte, it appends the CRC value computed by the Data CRC component and sends all this to the Bit Stuffer and Serializer to be further processed. If this is not the last byte, it waits for another byte and stays in this loop until the last byte is received.

For a token packet, it first receives the packet id and decodes that this is a token packet. Knowing that a token packet has a packet id followed by an address field and an endpoint field, it waits in successive states until it receives the remaining two bytes. Since the endpoint field is the last field of the packet that the host controller sends, it then reads the CRC value calculated by Token CRC, appends it to the packet and sends all this to the Bit Stuffer and Serializer to be further processed.

For a handshake/special packet, it first receives the first byte which is the packet id. From this it knows that this is a handshake/special packet. Knowing that a handshake/special packet has only a packet id it sends this to the Bit Stuffer and Serializer to be further processed.

Figure 12: Handshake Packet


Data CRC
A CRC is a cyclic redundancy check performed on data to see whether an error has occurred in reading or writing the data. The result of the CRC is transmitted with the checked data. At the receiving end, the transmitted result is compared to the CRC calculated over the received data to determine whether an error has occurred. The goal of inserting a CRC as part of the packet is to maximize the probability of detecting errors using only a small number of redundant bits. The divisor polynomial used to generate the CRC is $C(X) = X^{16} + X^{15} + X^{2} + 1$. When a data packet is sent (shown below), a 16 bit Data CRC is calculated for it. Note that the PID is not included in the CRC check; the Data CRC only covers the data field of the Data packet. The Data CRC block generates the CRC over all the data bytes that the Transmit Controller inputs to it sequentially. When the Transmit Controller has sent all the bytes of the DATA field to the Data CRC block, it reads the resulting 16 bit CRC and sends it to the Bit Stuffer and Serializer block to further process the packet.
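The Data CRC computation can be illustrated with a bit-serial model. This is a Python sketch rather than the VHDL of the core. It assumes the conventions of the USB 2.0 specification, which the text above does not spell out: the shift register is seeded with all ones, data bits enter LSB first, and the final remainder is inverted and sent MSB first. The function names are hypothetical. The second function shows the matching receiver-side check, where running the data followed by the received CRC through the same register leaves a fixed residual.

```python
POLY16 = 0x8005  # X^16 + X^15 + X^2 + 1, with the X^16 term implied


def _shift_in(reg, bit):
    """Advance the 16 bit CRC register by one input bit."""
    feedback = bit ^ (reg >> 15)
    reg = (reg << 1) & 0xFFFF
    return reg ^ POLY16 if feedback else reg


def usb_crc16(data):
    """CRC16 over the data field only (the PID is not included)."""
    reg = 0xFFFF                      # seed: all ones (USB 2.0 convention)
    for byte in data:
        for i in range(8):            # bits of each byte enter LSB first
            reg = _shift_in(reg, (byte >> i) & 1)
    return reg ^ 0xFFFF               # inverted remainder, sent MSB first


def crc16_residual(data, crc):
    """Receiver view: shift in the data, then the CRC bits MSB first."""
    reg = 0xFFFF
    for byte in data:
        for i in range(8):
            reg = _shift_in(reg, (byte >> i) & 1)
    for i in range(15, -1, -1):
        reg = _shift_in(reg, (crc >> i) & 1)
    return reg                        # 0x800D when no error occurred


if __name__ == "__main__":
    payload = bytes([0xF0, 0x0F])     # arbitrary two-byte example payload
    crc = usb_crc16(payload)
    print(format(crc, "016b"), hex(crc16_residual(payload, crc)))
```

Checking for a constant residual lets the receiver treat the CRC bytes like any other incoming bits instead of buffering them for a separate comparison.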

Figure 13: Data Packet

Token CRC
Similarly to data packets, TOKEN type packets are protected with a CRC, in this case a 5 bit one. The function of the Token CRC block is to generate the CRC for a TOKEN packet. The divisor polynomial used to generate the CRC is $C(X) = X^{5} + X^{2} + 1$. When a token packet is sent (shown below), a Token CRC is calculated for it. Note that the PID is not included in the CRC check. The Token CRC block generates the 5 bit CRC over the ADDR and ENDP fields, which the Transmit Controller inputs to it sequentially. The CRC is later appended to the end of the packet and sent as part of it.

Figure 14: Token Packet


Encoder, Bit Stuffer and Serializer
This block performs several functions. It receives two types of commands from the Transmit Controller: a command to send a byte of a packet, or a control command to send a special sequence on the USB cable that defines a USB Bus State (Idle, Start of Packet, End of Packet and so forth). For example, when the Transmit Controller wants to send a Data Packet, it sends the packet id of the data packet to this component along with control information specifying that this is the start of the packet. This block has been designed to automatically output the serial data that corresponds to the Start of Packet sequence defined by the specification.

The encoding format used by the USB protocol is called Non-Return to Zero Inverted (NRZI) where a 1 is represented by no change in level and a 0 is represented by a change in level. A string of zeros causes the NRZI data to toggle each bit time. On the other hand, a string of ones causes long periods with no transitions in the data. The figure below shows a data stream and the NRZI equivalent.
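The NRZI rule stated above (a 1 keeps the level, a 0 changes it) can be sketched in a few lines of Python. The function name `nrzi_encode` and the representation of line levels as the letters J and K are illustrative choices, and the starting level is a parameter because the rule only defines changes relative to the previous level.

```python
def nrzi_encode(bits, prev="J"):
    """NRZI-encode a list of 0/1 bits into a string of J/K line states.

    `prev` is the line state just before the first bit (an assumption
    made by the caller; it is not part of the encoded output).
    """
    states = []
    state = prev
    for bit in bits:
        if bit == 0:                       # a 0 toggles the level
            state = "K" if state == "J" else "J"
        states.append(state)               # a 1 leaves the level unchanged
    return "".join(states)


if __name__ == "__main__":
    print(nrzi_encode([0, 0, 0, 0]))   # a string of zeros toggles every bit time
    print(nrzi_encode([1, 1, 1]))      # a string of ones produces no transitions
```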

Figure 15: NRZI Encoding

Once we have NRZI encoded the data, we need to be able to send it on a physical USB cable, specifically on the D+ and D- wires which were described before. For the sequence shown above, the data sent to the USB cable for a high speed device would be JKKJJKKJKKKKJ. Note that a Differential 1 is a J at high speed.

Returning to the data packet: after the start of packet (SOP) sequence is sent, the bytes of the PID, DATA and CRC fields are bit stuffed and encoded. The protocol defines bit stuffing as the insertion of a zero after every six consecutive ones in the data stream before the data is encoded. If the data to be sent included a sequence of 7 or more consecutive ones, such as 011111110, then the data sent on the USB cable without bit stuffing would be KKKKKKKKJ (assuming the line was in the J state before the first bit). With bit stuffing, we insert a 0 after six ones to get 0111111010, which would be sent as KKKKKKKJJK. In the USB protocol, the host and device do not share any clock, so bit stuffing, which forces a transition in the data sent, ensures that the receiver remains synchronized with the transmitter without the overhead of sending a separate clock signal or Start and Stop bits with each byte.
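The effect of bit stuffing on the example stream 011111110 can be checked with a short Python sketch. It is an illustration under stated assumptions, not the logic of the VHDL block: the function names are hypothetical, and the line is assumed to be in the J state before the first bit. The sketch stuffs the stream and then measures the longest stretch of bit times without a transition, with and without stuffing.

```python
def bit_stuff(bits):
    """Insert a 0 after every six consecutive 1s."""
    out, ones = [], 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            out.append(0)      # stuffed bit forces a transition after encoding
            ones = 0
    return out


def nrzi_encode(bits, prev="J"):
    """NRZI: a 0 toggles the line state, a 1 keeps it."""
    states, state = [], prev
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return "".join(states)


def longest_run(states):
    """Length of the longest stretch of identical consecutive line states."""
    best = run = 0
    prev = None
    for s in states:
        run = run + 1 if s == prev else 1
        best = max(best, run)
        prev = s
    return best


if __name__ == "__main__":
    raw = [0, 1, 1, 1, 1, 1, 1, 1, 0]
    stuffed = bit_stuff(raw)
    print("".join(map(str, stuffed)))
    print(longest_run(nrzi_encode(raw)), longest_run(nrzi_encode(stuffed)))
```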

With the last byte of the Data packet (CRC byte), the block receives control information stating that this is the end of the packet so it inserts the End of Packet (EOP) sequence on the line. The output of this block is serial data that has undergone bit stuffing and encoding and is ready to be sent on the USB wire. However, in order to have the data written on the wire at the correct speed, we need to have a few other blocks that manage this.

Encoder, Bit Stuffer and Serializer and Transmit Controller MUX
The component above takes care of sending bus states and USB packets on the USB cable. The actual bits to be sent are calculated as the packet passes through the blocks of the Transmitter. The software layer simply needs to provide the core with the type of packet and the values of the fields in the packet, and the Transmitter along with the Host Controller take care of sending the packet in accordance with the details of the specification.

Apart from this functionality, our VHDL core has been designed to allow the software layer to place specific bits on the line, where these bits do not correspond to packet related information. To achieve this, the Transmit Controller has serial outputs through which it can output the specific bits requested by the software layer. Note that these are predefined serial bits and need not be bit stuffed and encoded, so they are sent directly from the Transmit Controller to the MUX block. The MUX block receives serial inputs from the Transmit Controller and from the Encoder, Bit Stuffer and Serializer block. The inputs from the Encoder, Bit Stuffer and Serializer have priority, since a packet cannot be interrupted in the middle to send a desired sequence. This is a very simple block that forwards the inputs from one of the two blocks to the USB Write block.


USB Write
This block is the final block before previously processed data is actually sent on the physical USB wire. It maintains an input buffer to accept the data that it receives and also manages an output buffer that is responsible for writing data at the specified speed to the physical USB wire. We have implemented these buffers as FIFO buffers so that the sequence of bits transmitted remains as it was supposed to be.

Since our core supports all three speeds, this block should be able to write to the line at the rates of 1.5Mbps (LS), 12Mbps (FS) and 480Mbps (HS). With a 960MHz input clock, we implemented a 7 bit counter that enables this block to write at all three speeds. We can send bits at HS each time the LSB of the counter rolls over. This divides the input clock by two to give 480Mbps. To send at FS, we wait for the 4 LSBs of the counter to roll over 5 times. This divides our clock by $2^{4} \times 5 = 80$, so we can write at 960/80 = 12Mbps. To send at LS, we wait for the 7 bits of the counter to roll over 5 times. This divides our clock by $2^{7} \times 5 = 640$, so we can write at 960/640 = 1.5Mbps.
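The divider arithmetic for the three speeds can be checked with a few lines of Python. The helper name `bit_rate_mbps` and the speed labels are hypothetical; the clock frequency, the counter widths and the rollover counts are the ones quoted in the text.

```python
CLOCK_MHZ = 960  # input clock quoted in the text


def bit_rate_mbps(counter_bits, rollovers):
    """Bit rate when the low `counter_bits` bits must roll over `rollovers` times per bit."""
    divider = (2 ** counter_bits) * rollovers
    return CLOCK_MHZ / divider


RATES = {
    "HS": bit_rate_mbps(1, 1),   # LSB rolls over once: divide by 2
    "FS": bit_rate_mbps(4, 5),   # 4 LSBs roll over 5 times: divide by 80
    "LS": bit_rate_mbps(7, 5),   # 7 bits roll over 5 times: divide by 640
}

if __name__ == "__main__":
    for speed, rate in RATES.items():
        print(speed, rate, "Mbps")
```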

4.4 Receiver
The receiver block, which is a sub-component of the Serial Interface Engine (SIE) block, takes as input signals directly from the USB wire. The two inputs are the D+ and the D- signals. The main function of the receiver is to convert the bits it receives into bytes, which are then analyzed and given to the Host Controller (HC). Before it sends the bytes to the HC, it checks for CRC errors and bit stuff errors. The Receiver has another very important task: it is responsible for detecting the speeds of the connecting devices. Every device signals the receiver in a way that depends on its speed (low, full or high), which gives the receiver the data it needs to determine the speed of the connecting device and notify the whole core of this speed. The figure below lists the subcomponents within the receiver, which are: USB Read, Bit To Byte Converter, Byte Analyzer and Detect Speed. In what follows, we will describe each component in more detail.


Figure 16: Receiver

USB Read
This is the lowest level component of the receiver block. Its function is to read the 1s and 0s from the D+ and D- lines of the USB wire. This component can read the input at low speed, full speed or high speed. The main concept behind reading the input is as follows: there is a 7 bit counter i that is incremented at every rising edge of the clock. If we are working at high speed, we take in a new input from the system whenever the least significant bit of i is 0. In the figure below, the first signal is the clock, the second is the counter i and the third is the high speed data in tick. The high speed data in tick has a period of 2.0833 ns, and this is the rate at which we take in inputs. Thus, we read from the wire whenever the high speed data in tick goes from 0 to 1. Note that the high speed data in tick is derived from the least significant bit of the counter i.

Figure 17: High Speed Data in Tick


The full speed and low speed rates have data in ticks that are 40 times and 320 times slower respectively and thus data will be read from the USB wire at those slower speeds. To toggle the full speed rate data in tick, we would wait until the four least significant bits of the counter i become 0000 5 times. For achieving low speed, we would wait for the seven bits of i to become 0000000 5 times.

This component also takes into account metastability issues using a very simple method. To avoid reading data while the USB wire is changing abruptly, we simply reset the counter whenever the input changes. This way, we never take in a bit that has not been on the line for 2.0833 ns. Whenever a new bit is read from the USB wire, it is first written to a buffer and then output from this component. Whenever a new bit is output, we signal to all the other components that we have received a new bit, so this is the only component that has to deal with timings. The rest of the components wait for the data out tick to toggle and thus know when a new bit has entered the system. In addition to taking in inputs and outputting them whilst setting the data out bit to 1, this component also checks whether a no activity time out has occurred and outputs this signal to higher level components such as the HC.

Bit to Byte Converter
This component deals with converting the bits into bytes and sending the bytes to the Byte Analyzer component. It monitors the output signal from the USB Read component and thus knows that a new bit has entered whenever the data out signal becomes 1. The function of this component is to NRZI decode the incoming bits, combine every 8 decoded bits into a byte, and output the byte. In addition to NRZI decoding, this component performs bit de-stuffing. The NRZI decoding is performed as follows: when a bit is received, it is compared to the previous bit that was received. If they are different, a 0 is inserted into the byte, otherwise a 1 is inserted. This process is repeated 8 times until a byte is formed. Note that the input can be either a differential 0, a differential 1, a single ended 0 or a single ended 1.

Let us take an example in which we receive 8 new bits: J, J, K, K, J, J, J, K. Let us assume that the input we had before those 8 inputs were received was a J. The byte that will be output is formed as follows:

Received Bit    Byte
(initial)       00000000
J               10000000
J               11000000
K               01100000
K               10110000
J               01011000
J               10101100
J               11010110
K               01101011

Table 6: Bit to Byte Conversion

Thus the byte received is 01101011. This byte will then be analyzed by the Byte Analyzer, which is the next component that we will be discussing.
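The conversion in Table 6 can be reproduced with a short Python sketch of the decoding rule: compare each received state with the previous one, and shift the decoded bit into the byte from the most significant end. The function name is hypothetical and, to keep the sketch short, bit de-stuffing is left out.

```python
def nrzi_decode_byte(states, prev):
    """Decode 8 received J/K states into a byte, logging each intermediate value.

    `prev` is the state seen just before the first of the 8 states.
    Bit de-stuffing is deliberately omitted from this sketch.
    """
    reg = 0
    history = []
    for state in states:
        bit = 1 if state == prev else 0     # no change -> 1, change -> 0
        reg = (reg >> 1) | (bit << 7)       # newest bit enters at the MSB
        history.append(format(reg, "08b"))
        prev = state
    return reg, history


if __name__ == "__main__":
    byte, history = nrzi_decode_byte("JJKKJJJK", prev="J")
    for state, value in zip("JJKKJJJK", history):
        print(state, value)
    print("byte received:", format(byte, "08b"))
```

Because the bit that was sent first ends up in the least significant position after eight shifts, the byte comes out in its normal MSB-first notation.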

Byte Analyzer
This component is responsible for analyzing the bytes that were formed in the Bit to Byte Converter component. It is also responsible for calculating the CRC and comparing it with the CRC that was sent with the packet. Whenever the Bit to Byte Converter sends a byte to this component, it sets a data out signal to 1, so this component waits for an input by monitoring that signal. The first byte that we wait for is the start of packet (SOP), which is 10000000 for low/full speed and a 1 followed by 31 zeros for high speed. Note that this takes into consideration that we receive the bytes in little endian order. This component therefore needs to know the speed at which we are working. After the SOP is received, we enter a state where we wait for the next byte. When the next byte after the SOP is received, it is analyzed and the PID field is checked to see whether it is a special, token, handshake or data packet. From the PID field we know what bytes to expect next. Finally, after all the bytes have been sent to us, an end of packet byte (EOP), which is 00000000 for low/full speed and 1111111 for high speed, is received. After the EOP is received, this component signals the upper components that a full transaction has been received and outputs the data to them.

For example, let us simulate receiving an ACK from a low speed device. The bytes that should be received are SOP: 10000000, ACK: 11010010, EOP: 00000000. After receiving the 11010010, which contains the PID of the ACK, we know that we should not expect to receive anything else, since this is only an ACK transaction, and so we expect an EOP. After receiving those three bytes, the Byte Analyzer component tells the HC (Host Controller) that an ACK has been received. If, instead of an ACK, we are receiving a data packet, then the EOP field tells us when the data stream has ended.
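The PID check performed on the first byte after the SOP can be sketched as follows. The sketch relies on two facts from the USB 2.0 specification that are not spelled out in the text: the upper four bits of a PID byte are the bitwise complement of the lower four, and the two least significant bits of the PID identify its group. The function name and the returned labels are hypothetical.

```python
# Packet group encoded in the two least significant PID bits (USB 2.0).
PID_GROUPS = {0b01: "token", 0b11: "data", 0b10: "handshake", 0b00: "special"}


def classify_pid(pid_byte):
    """Return the packet group of a PID byte, or None if its check nibble is wrong."""
    pid = pid_byte & 0x0F
    check = (pid_byte >> 4) & 0x0F
    if check != (~pid & 0x0F):      # upper nibble must complement the lower one
        return None
    return PID_GROUPS[pid & 0b11]


if __name__ == "__main__":
    print(classify_pid(0b00101101))   # SETUP PID byte used in the transmitter tests
    print(classify_pid(0b11000011))   # DATA0 PID byte used in the transmitter tests
    print(classify_pid(0b11010010))   # ACK PID byte
```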

Detect Speed
This component detects the speed at which a connected device is sending us bits. Let us assume that a low speed device is connected to our receiver. The first thing the device does is send a J bit (01) for a specific amount of time (2.5 ns). After the 2.5 ns has elapsed with the J bit as an input, the rest of the components are signaled that they should work at low speed. For full speed detection, a K bit (10) has to be received for 2.5 ns. We are left with the detection of high speed, which is a bit more complex and requires some interaction between the device and the Detect Speed component. A high speed device always connects at full speed first. That is, it sends a K bit for 2.5 ns and thus establishes a full speed connection. It then waits to be reset. Once reset, it sends 01 for a certain amount of time, which confirms that it is indeed a high speed device. After Detect Speed receives the 01, a sequence of bits is sent to the device to tell it that the host is high speed compatible and has accepted the 01. To summarize, this is what happens in this component for high speed connections: a high speed device is connected; the device sends a 10 for 2.5 ns; the receiver commands the transmitter block to send a 00 for 2.5 ns to reset the device; once the device is reset, it sends a 01 for 2.5 ns; once the receiver detects the 01, it commands the transmitter to send a sequence of bits to acknowledge the device.

4.5 Host Controller Driver

So far, all the blocks that we described were implemented in VHDL. For the software layer that interacts with the VHDL core, we wrote C code that implements the sending of a complete SETUP transaction. Such a transaction consists of the host sending token and data packets and waiting for a handshake packet as a response from the device. To test this software, we would have to run it: it would prompt the core to send token and data packets to the USB device through the wire and then wait for a response. Here, we faced a problem whereby we could not link the output of the core (the bits to be sent on the wire) to the physical USB port that resides on the P160 additional module that we had planned to attach to the Virtex VIIMB Development Board. The reason was that a physical chip interfaced to the USB pins. This chip was an RS232_USB Bridge Interface called the Cygnal CP2101.

Figure 18: Cygnal CP2101

Thus, to use the USB physical port, we would have to send data according to the RS232 interface. We had already completed a large part of the VHDL core that implements the USB specification and were eager to test our system according to the USB specification, so we had to come up with an alternative to ensure that our system implements its functionality on an FPGA.

Our alternative solution was to write a simple VHDL block that acts as a device. That is, it simulates the actions of the device at the bit level. We implemented this as a Finite State Machine that has the states shown in Figure 19.


Figure 19: FSM states

The FSM waits until the bits relating to SOF and startup are sent by the VHDL core as if they were being sent to an actual USB device. It then moves to a state waiting to receive a setup token packet, and then a data packet. Once the device simulator is at this stage, it should send a handshake packet. We simulated the device by hard-coding the bits that the device would send to the core if it were to send an actual USB handshake packet. We assumed that the device was sending an ACK Handshake packet, which would complete the transaction. Once the device simulator sends the ACK Handshake packet, the job of the FSM is complete and the Receiver part of the VHDL core takes over. The Receiver block of the VHDL core receives these bits from the outputs of the device simulator and processes them to figure out that an ACK response has been received. It then informs the Host Controller that the device responded with an ACK. The Host Controller outputs this information to the software layer by setting the Transaction Done bit to 1. This way, the user who initiated a transaction from the software layer by writing into a few registers can learn that the transaction was completed by reading the value of the Transaction Done bit.
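The sequence followed by the device simulator can be sketched as a table-driven state machine. This is a Python illustration of the behaviour described above, not the VHDL block itself; the state and event names are hypothetical and are not taken from Figure 19.

```python
# Hypothetical state and event names for the device simulator FSM.
TRANSITIONS = {
    ("WAIT_STARTUP", "startup_done"): "WAIT_SETUP_TOKEN",
    ("WAIT_SETUP_TOKEN", "setup_token_received"): "WAIT_DATA_PACKET",
    ("WAIT_DATA_PACKET", "data_packet_received"): "SEND_ACK",
    ("SEND_ACK", "ack_sent"): "DONE",
}


def run_device_simulator(events, state="WAIT_STARTUP"):
    """Feed events to the FSM; events that do not apply in a state are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


if __name__ == "__main__":
    setup_transaction = [
        "startup_done",
        "setup_token_received",
        "data_packet_received",
        "ack_sent",
    ]
    print(run_device_simulator(setup_transaction))
```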


5.0 Evaluation
In order to verify the functionality of our code, we tested each of the components described in the Implementation chapter separately, then combined the whole system and performed more comprehensive tests. In implementing our VHDL system, we developed the three main components (Host Controller, Transmitter, Receiver) in parallel once we had understood how they need to interface together. We then tested each separately as a black box, and finally tested the system as a whole to verify its functionality. Note that all the test cases below are described assuming a high speed device to avoid redundancy, although the same cases work for low speed and full speed.

5.1 Host Controller Testing
The tests done independently on the Host Controller involve the four possible types of transactions, which are: SETUP, IN, OUT0 and OUT1. Note that all transactions are initiated by the host and that every transaction consists of a number of packets.

Send SETUP Transaction

When the software layer requires a SETUP transaction, the Host Controller first sends a SETUP token packet, then a data packet whose payload is read from the Transmit FIFO previously loaded by the software layer. It then waits until a handshake packet is received from the device. By following the states of the Host Controller Arbiter component (shown in a yellow box in the simulation below), we can see that it first waits until the SETUP token packet is sent, then until the data packet is sent, and finally until a handshake packet is received. At that point it interrupts the software layer by setting the TransDone signal to 1 (circled in red in the simulation below).


Figure 20: SETUP Transaction

Send OUT(0/1) Transaction

When the software layer requires an OUT transaction, first a token packet with PID equal to OUT is sent, then a DATA(0/1) packet whose payload is read from the Transmit FIFO previously loaded by the upper layer. The Host Controller core then waits until the device sends back a handshake signal, at which point it interrupts the software layer with a Transmission Done interrupt signal (TransDone). The figure below shows the states that the Host Controller Arbiter component passes through (shown in a yellow box): it waits until the OUT token packet is sent, then until the DATA0 packet is sent, and finally until a handshake is received. It then sets the TransDone signal (circled in red below). Note that, in the case of an OUT0 transaction and a high speed device, if the previously received handshake is a NYET, the host keeps sending just a PING token packet, without a following data packet, until it receives an ACK. It can then start any other transaction.

Figure 21: OUT (0/1) Transaction

Send IN Transaction

When the software layer requires an IN transaction, first a token packet with PID equal to IN is sent to indicate to the device that, if it has packets to send, it can do so now. The Host Controller core then waits until the device sends a data packet and, when it does, sends back a handshake packet to the device to indicate that it processed the data packet. If the host does not detect anything, it sets the Time-Out bit, which is the 4th bit in the RxStatus signal. The software layer reads this signal, sees that there is a Time-Out and initiates the IN transaction one more time. The figure below deals with the second case, where there is a time-out. It shows the states that the Host Controller Arbiter component passes through (shown in a yellow box below): it waits until the IN token packet is sent, then waits for a DATA packet, which does not arrive. Finally the SIE Receiver detects a time-out, and the Host Controller Arbiter sets the TransDone bit (circled in red below) and sets the 4th bit of the RxStatus signal to 1 (shown in a red box below).

Figure 22: IN Transaction

5.2 Transmitter Testing

The tests done independently on the Transmitter will have granularity of packets as this is the unit of transfer that the Transmitter deals with. In other words, we will provide test cases that ensure proper transmission of packets to the USB device.

Writing bits on the USB wire at the correct speed
The figure below is a screenshot of the Transmitter sending a Token Packet (SETUP). From the figure, we see that a bit is written every 2.084 ns. This verifies that our core can write at a speed of 1 / 2.084 ns ≈ 480Mbps, which is the rate at which high speed signaling occurs.


Figure 23: Writing to the USB wire at High Speed

In order to test the functionality of the Transmitter, we will display tests that were performed to send a typical transaction to the USB device. As we explain each case, we will highlight how the details of the protocol were tested.

Sending a Token Packet (SETUP): The only inputs to the transmitter required to send a SETUP Token packet are the packet id, the address and endpoint of the targeted device. All the steps described below are carried out by the transmitter in the mentioned sequence.

1-Transmitter sends IDLE state
The start of a packet transmission requires the USB bus to be in an idle state. Therefore, prior to sending a packet, the Transmitter sends an Idle state. In the figure below, we see the output signal USBwirectrlout becoming one for the first time. This signifies that a bit is being written onto the USB wire. When we check the value of the corresponding bit, we see that USBwiredataout is a 00 (Single Ended Zero). This signifies an IDLE state on the bus and is necessary before we send a SYNC pattern in the next step.


Figure 24: Idle State

2- Transmitter sends SYNCHRONIZATION (SYNC)
In the USB protocol, the host and devices do not share a clock. Thus, the device cannot identify when the host will send a transition that signals the beginning of a new packet. A single transition is not sufficient to synchronize the receiver for the duration of a packet. Therefore, every packet has to begin with a SYNC field to enable the device to align, or synchronize, its clock to the transmitted data. For high speed devices, the host must send a SYNC pattern that is 4 bytes long: {1 and 31 zeros}, encoded according to NRZI as fifteen KJ successions and then a KK. The alternating Ks and Js provide the transitions for synchronizing, and the last two Ks mark the end of the field.

Figure 25: SYNC Bytes 1 and 2


Figure 26: SYNC Bytes 3 and 4

In the figures above, we illustrate the HS SYNC pattern being sent. As stated earlier, it is fifteen KJ successions and then a KK. Note that when transmission is at high speed, J is the value 2 (binary 10) and K is the value 1 (binary 01). Therefore a 1 on USBWiredataOut highlighted in the figures above is a K, and similarly a 2 is a J. In Figure 25, we can see that the first 2 bytes of the sync field are sent. In Figure 26, we see the last two bytes. Note that every switch from a K to a J helps the receiver synchronize. The last 2 bits sent are 1 1, that is KK, which indicates the end of the SYNC field. These 2 bits are circled in yellow.
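The shape of the HS SYNC pattern follows directly from the NRZI rule, as the following Python sketch shows. It encodes 31 zeros followed by a 1, and assumes that the encoder starts from the J state so that the first toggle produces a K; this starting state and the function names are assumptions made for the illustration.

```python
def nrzi_encode(bits, prev):
    """NRZI: a 0 toggles the line state, a 1 keeps it."""
    states, state = [], prev
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return "".join(states)


def hs_sync_states():
    """Line states of the 32 bit high speed SYNC field (31 zeros, then a 1)."""
    sync_bits = [0] * 31 + [1]
    return nrzi_encode(sync_bits, prev="J")   # assumed starting state: J


if __name__ == "__main__":
    pattern = hs_sync_states()
    print(pattern)
    print(pattern == "KJ" * 15 + "KK")
```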

3- Transmitter sends a Token Packet of type SETUP

Figure 27: Token Packet

This is the information in the packet that we input to the Transmitter: PID=00101101, ADDR=00000000, ENDP=00000000, CRC5=to be calculated


Figure 28: Setup Token Packet

Note that we first have the 3 SYNC bytes (00) then the last sync byte (80) then the token PID (2d) then the ADDR+1bit of ENDP (00) then the 3 bits of ENDP and CRC5.
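The byte sequence listed above can be reproduced with a bit-serial CRC5 model. This Python sketch assumes the conventions of the USB 2.0 specification, which are not spelled out in the text: the CRC register is seeded with all ones, the 11 bits of ADDR and ENDP enter LSB first, and the inverted remainder is sent MSB first. The function names are hypothetical.

```python
POLY5 = 0x05  # X^5 + X^2 + 1, with the X^5 term implied


def usb_crc5(field11):
    """CRC5 over the 11 bits of ADDR (7 bits) and ENDP (4 bits), LSB first."""
    reg = 0x1F                                 # seed: all ones
    for i in range(11):
        feedback = ((field11 >> i) & 1) ^ (reg >> 4)
        reg = (reg << 1) & 0x1F
        if feedback:
            reg ^= POLY5
    return reg ^ 0x1F                          # inverted remainder, sent MSB first


def token_bytes(pid_byte, addr, endp):
    """PID byte followed by the two bytes holding ADDR, ENDP and CRC5."""
    field = (addr & 0x7F) | ((endp & 0x0F) << 7)
    crc = usb_crc5(field)
    # The CRC goes out MSB first, so its first-sent bit must sit in the
    # lowest free position of a word that is serialized LSB first.
    crc_reversed = int(format(crc, "05b")[::-1], 2)
    word = field | (crc_reversed << 11)
    return bytes([pid_byte, word & 0xFF, word >> 8])


if __name__ == "__main__":
    sync = bytes([0x00, 0x00, 0x00, 0x80])     # the four SYNC bytes
    packet = sync + token_bytes(0x2D, addr=0, endp=0)
    print(packet.hex())
```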

As we can see in the figure above, the sequence of bytes listed above passes through the stages of bit stuffing, CRC calculation and NRZI encoding, and results in bits on the wire.

Note that, although PID=00101101 where MSB=0 and LSB=1, bits need to be sent out on the bus in little endian order, as specified by the USB protocol. That is, the LSB of a byte is sent out first, followed by the next LSB and through to the MSB.

Figure 29: Little Endian


In the figure above, we can follow how the PID (00101101) bits are encoded and sent. Please note on the figure how a J (10) and a K (01) are represented. Note that little endian order is used to send a byte, so the bits are sent as 10110100. With a preceding K on the line, they are encoded as KJJJKKJK.
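The encoding of the PID can be reproduced in a few lines of Python: the byte is serialized LSB first and then NRZI encoded, starting from the K that ends the SYNC field. The function names are illustrative and do not come from the core.

```python
def lsb_first_bits(byte):
    """Bits of a byte in the order they are sent on the bus (LSB first)."""
    return [(byte >> i) & 1 for i in range(8)]


def nrzi_encode(bits, prev):
    """NRZI: a 0 toggles the line state, a 1 keeps it."""
    states, state = [], prev
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return "".join(states)


if __name__ == "__main__":
    pid = 0b00101101                       # SETUP PID byte
    bits = lsb_first_bits(pid)
    print("".join(map(str, bits)))         # order of transmission
    print(nrzi_encode(bits, prev="K"))     # line states after a preceding K
```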

4-Transmitter sends a HS EOP (End of Packet)
In high-speed signaling, a sequence that would generate a bit stuff error at the receiving device is intentionally sent to indicate EOP. For almost all high-speed packets, the End of High-speed Packet is an encoded byte of 01111111, without bit stuffing. If the preceding bit was a J, the End of High-speed Packet is KKKKKKKK: the initial 0 causes the first bit to be a change of state from J to K, and the following 1s mean that the rest of the bits do not change. If the preceding bit was a K, the End of High-speed Packet is JJJJJJJJ: the initial 0 causes the first bit to be a change of state from K to J, and the following 1s mean that the rest of the bits do not change. In either case, a sequence of seven bits without a transition causes a bit stuff error.

When all fields of the token packet have been sent, a HS EOP pattern must be sent. The behavior described above can be seen experimentally in the simulation shown in the figure below. When the packet has been sent, a signal called HSEOP, which has been 0 all along, becomes 1. This causes a sequence of 8 data states on the wire that are opposite to the last data state sent by the last field of the packet. In the figure, the last bit was a J (2), and so we can see a sequence of 8 Ks sent consecutively to signal the end of the packet. This is highlighted in the white box.

Figure 30: End Of Packet
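The EOP rule shown in Figure 30 can be sketched in C as follows. This is an illustration under the description above, not the core's VHDL. Both hs_eop_states and has_bit_stuff_error are hypothetical helpers, and line states are written as 'J' and 'K'.

```c
#include <stddef.h>

/* Fill out with the eight line states of a high-speed EOP: the opposite
 * of the last state sent by the packet, repeated eight times. */
void hs_eop_states(char last, char out[9])
{
    char eop = (last == 'J') ? 'K' : 'J';
    for (int i = 0; i < 8; i++)
        out[i] = eop;
    out[8] = '\0';
}

/* Return 1 if the line is held for seven bit times in a row without a
 * transition (seven consecutive NRZI ones), which a receiver treats as a
 * bit stuff error. 'prev' is the state before the first listed state. */
int has_bit_stuff_error(char prev, const char *states)
{
    int ones = 0;
    for (size_t i = 0; states[i] != '\0'; i++) {
        if (states[i] == prev) {
            if (++ones >= 7)
                return 1;
        } else {
            ones = 0;
        }
        prev = states[i];
    }
    return 0;
}
```

With a last state of J, hs_eop_states produces KKKKKKKK, and has_bit_stuff_error reports an error for that sequence, whereas a normally encoded field such as KJJJKKJK does not trigger it.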


We have now tested the correct transmission of a token packet. Next, we will describe that of a Data Packet as it is a bit different.

Sending a DATA packet:

The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets sent, so we will skip the testing of these stages and directly start discussing the fields of the DATA packet.

This is the information in the packet that we wish to send: PID=11000011, DATA (1st byte)=11110000, DATA (2nd byte)=00001111, CRC16 (1st byte) and CRC16 (2nd byte)=to be calculated.

Figure 31: Data Packet

As we can see in the figure above, the PID, then DATA (1st byte), DATA (2nd byte), CRC16 (1st byte) and CRC16 (2nd byte) are sent out in succession. This completes the DATA packet.

In order to ensure adequate signal transitions, bit stuffing is employed by the transmitting device when sending a packet on USB. The rule for bit stuffing was described earlier in the Implementation chapter.

In the figure below, the two bytes highlighted in red are each sent in little endian bit order. This means that 11110000 goes out as 00001111, followed by 00001111, which goes out as 11110000. Since this places 8 consecutive 1 bits on the line, the txonecount signal shown below is asserted after the sixth 1, and a new 0 bit is stuffed and encoded. The bit in purple is the stuffed bit; the bits in orange are the 2+6=8 encoded bits of the second byte (2 before the stuffed bit and 6 after it).


Figure 32: Bit Stuffing

The EOP pattern that follows a packet is identical for all packets, so we will not repeat the details here.
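The bit stuffing step illustrated in Figure 32 can be sketched in C as below. The sketch assumes the usual USB rule that a 0 is inserted after six consecutive 1s. It is not taken from the VHDL core, and bit_stuff is a hypothetical helper that writes the serial bit stream as '0' and '1' characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Serialise bytes LSB first and insert a 0 after every run of six 1s.
 * Bits are written as '0'/'1' characters followed by a NUL. Returns the
 * number of bits written. out must have room for at least
 * len * 8 + (len * 8) / 6 + 1 characters. */
size_t bit_stuff(const uint8_t *data, size_t len, char *out)
{
    size_t n = 0;
    int ones = 0;

    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            int bit = (data[i] >> b) & 1;
            out[n++] = bit ? '1' : '0';
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {        /* six 1s in a row: stuff a 0 */
                out[n++] = '0';
                ones = 0;
            }
        }
    }
    out[n] = '\0';
    return n;
}
```

For the two data bytes of this test (11110000 then 00001111), the sketch produces 17 bits: 0000 111111 0 11 0000, where the single 0 after the six 1s is the stuffed bit.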

Sending a HANDSHAKE packet:

The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets sent, so we will directly start discussing the fields of the HANDSHAKE packet.

This is the information in the packet that we need to send for an ACK handshake: PID=11010010.

Figure 33: Handshake Packet

As we can see in the figure above, the PID is sent, which completes the Handshake packet. The EOP pattern that follows a packet is identical for all packets, so we will not repeat the details here.


5.3 Receiver Testing

The tests done independently on the receiver work at the granularity of bits that are converted to bytes at the output. In other words, we will provide test cases that ensure proper reception of packets from the USB device and proper sending of bytes to the HC.

High Speed Detection

The first component to test in the receiver is the Detect Speed component, since this component notifies all other components of the speed they should be working at. If this component malfunctions, then all the other components will be working at the wrong speed. Below is the test waveform that shows that a high speed device has been connected.

Figure 34: High Speed Detection Waveform

In the figure above, the input rxwiredatain is what we are receiving from the USB wire. Connectstate, highlighted in red above, is the output of this component. The values 00, 01, 10 and 11 correspond to low speed, full speed, high speed and disconnected respectively. The first thing we notice above is that connectstate changes from 11 to 10, which means that we have detected a high speed device. We can see that the resetdevice signal, highlighted in purple, goes high upon receiving the 2 input. This reset signal forces the transmitter to send an SE0 to reset the device. This in turn forces the device to respond and tell us whether it is high speed or not. If the device is high speed, it replies with a 01. Upon receiving the 01 input, the sendjkjkjk output is set to 1. This signal informs the receiver to send the acknowledgment sequence to the device and thus completes the process of detecting high speed.


Receiving Bits

The USB Read component is the component that reads the input from the USB wire and handles the timing. Below is a waveform that shows the timings.

Figure 35: Receiving Bits at High Speed Waveform

In the figure above, Rxbitsin is the input read from the D+ and D- lines on the USB wire and is highlighted in red on the waveform. The highspeedtick signal directly below the bits highlighted in red sets the rate at which we take in the input. At every rising edge of the highspeedtick signal, we take in a new input. The highspeedtick signal has a period of 2.0833 ns, and if the USB device holds each bit for this amount of time, then we are guaranteed not to miss any of the bits since each of them will see a rising edge of highspeedtick. The bits highlighted in yellow are the outputs and will be given to the Byte Analyzer component. We notice here that the output is 1 even though we are reading new inputs from the wire. This is because, due to the 64 buffers present, there is a delay in outputting the data. We also notice that fullspeedrate is 2, which means that, as required, we are reading data at a high speed rate.

Processing the bits

In this test we will be receiving the byte 10000000, which is part of the start of packet for high speed. The USB device will send the 10000000 starting from the least significant bit. Thus we will first detect seven 0s and then one 1.


Figure 36: Forming a Byte Waveform

In the figure above, the bits highlighted in red are the inputs that are coming from the USB read component. We see that the inputs are 01, 10, 01, 10, 01, 10, 01, and 01. The bit count highlighted in purple shows us which bit number we are at before we form the byte. The bits highlighted in yellow represent the bytes that will be sent to the Byte Analyzer component. Notice that after we receive the two 01s, the byte being formed becomes an 80 which is 10000000. Thus we have successfully received a byte that will be sent to the Byte Analyzer.
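The way the line states of Figure 36 turn into the byte 80 can be sketched in C as below. This is an illustration only, not the Bit to Byte Converter itself. The helper nrzi_decode_byte is hypothetical, the states 01 and 10 are written as 'K' and 'J', and the state before the first input is assumed to be a J (the end of the preceding KJ pair of the SYNC field).

```c
#include <stdint.h>

/* Decode eight NRZI line states into one byte, LSB first.
 * A transition decodes as 0, no transition decodes as 1.
 * 'prev' is the line state before the first listed state. */
uint8_t nrzi_decode_byte(char prev, const char states[8])
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++) {
        if (states[i] == prev)
            byte |= (uint8_t)(1u << i);
        prev = states[i];
    }
    return byte;
}
```

With the inputs of this test written as KJKJKJKK, the first seven states each differ from their predecessor and decode as 0, while the final repeated K decodes as 1 in the most significant position, giving 0x80.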

Processing the Bytes

1- Receiving a Handshake Packet (ACK)

Here we will test the component that processes the bytes and forwards the information to the HC (Host Controller). Below is a waveform showing an ACK handshake packet being received. The ACK packet is made up of the following parts: SOP, PID and EOP, where the SOP is 80,00,00,00 hex, the ACK PID is D2 hex, and the EOP is FF.


Figure 37: Processing Bytes that represent an ACK Waveform

In the figure above, highlighted in red we see that the Bit to Byte converter component signals the Byte Analyzer component when it is giving it inputs by setting the processrxdatainwen bit to 1. Here we see the inputs highlighted in yellow that are 80, 00, 00, 00, d2. Highlighted in purple, we see that, after the byte has been analyzed, the ackrxed signal has been set to one so that higher level components know that the ACK has just been received.

2- Receiving a DATA packet

In this test, we will be receiving a DATA packet that has two bytes in the data payload, which are 11110000 and 00110000. Before receiving the data we need to receive the PID telling us that this is in fact a data packet. After receiving the data we should expect to receive the CRC and the EOP. Below is the list of bytes that we should receive to complete a Data packet.

SOP   PID   Byte1   Byte2   CRCByte1   CRCByte2
80    C3    F0      30      BA         5B

Table 7: Bytes input into the Byte Analyzer component
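The two CRC bytes in Table 7 can be cross-checked in software. The C sketch below assumes the data CRC defined by the USB 2.0 specification (generator x^16 + x^15 + x^2 + 1, register preset to all ones, bits processed LSB first, result inverted). It is not the CRC block of the VHDL core, and usb_crc16 is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16 for USB data packets, computed over the data payload only.
 * 0xA001 is the bit-reversed form of the generator 0x8005, which lets
 * the loop consume each byte LSB first, as it appears on the wire.
 * The low byte of the result is the first CRC byte transmitted. */
uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;
}
```

For the payload F0 30 the sketch returns 0x5BBA. Its low byte BA and high byte 5B match CRCByte1 and CRCByte2 in Table 7.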


Figure 38: Processing Bytes that represent Data Waveform

In the figure above, highlighted in red we see the bytes that we are receiving. Note that these bytes are coming from the Bit to Byte Converter component. We see that we are receiving the correct bytes that are present in the table above. The CRC being calculated is highlighted in yellow, and we can see that the CRC error does not become 1; thus the CRC that was calculated is correct. The output from this component is highlighted in purple and will be sent to upper level components, specifically the HC. There are three things that should be looked at in the area highlighted in purple: rxdataout, rxcontrolout and rxdataoutwen. We see that rxdataoutwen becomes high 6 times and so we output data 6 times.

The outputs are shown in the table below:

DataOut   ControlOut
0         Rx_packet_start ( 0 )
c3        Rx_packet_stream ( 1 )
f0        Rx_packet_stream ( 1 )
30        Rx_packet_stream ( 1 )
ba        Rx_packet_stream ( 1 )
5b        Rx_packet_stream ( 1 )
0         Rx_packet_stop ( 2 )

Table 8: Processing Bytes that represent Data

After the rx_packet_stop has been output to the upper layer, the transaction is complete and the bytes have been successfully sent to the HC (Host Controller), which will use this information.


5.4 Testing the USB Core on the FPGA

After testing the VHDL USB core, we proceeded with downloading the system on the FPGA and testing it using the C code. We first had to add our VHDL USB core as a component in our hardware system.

Note that an IP core for USB is not available in the EDK peripheral libraries; therefore after having designed it ourselves, we had to import it into our project in XPS in order to be able to use it. To achieve this target, we used the Create and Import Peripheral Wizard that guided us through the design flow.

Within the latter mentioned wizard, we added our core peripheral as a slave device on the On-chip peripheral bus (OPB) which is attached to the Microblaze soft-core processor on the FPGA.

In this case, our IP core would normally need an interface compliant with the OPB bus protocol. However, EDK provides the Intellectual-Property Interface (IPIF) library, which offers a simplified bus protocol called IP interconnect that is much easier to use than operating on the bus directly with the OPB protocol.

Moreover, the Create and Import Peripheral wizard generates templates that take care of the OPB bus interface protocol and the connection between IPIF and our code. In fact, this wizard generates 2 files (among many others) in the pcores subdirectory under the project directory: one peripheral top-level file, which we do not modify, and another called user-logic. In the user-logic file (VHDL), we added the top-level wrapper of our USB core as a component and we port mapped each input and output of this wrapper to a register. These registers are different from the ones described in the Design and Analysis chapter. There are 11 of them, and they correspond to address_i, data_i, rst, we_i, strobe_i, data_o, HostResumeIntOut, HostTransDoneIntOut, HostConnEventIntOut, HostSOFSentIntOut and USB Speed. Note that if we compare to the block diagram, we see that registers for clk, USBWireDataOut and USBWireDataIn are missing. The reason is that we connected the clk directly to the system clock running at 100 MHz. As for the USBWireDataOut and USBWireDataIn signals, we omitted these because we used the device simulator, which was implemented in VHDL. Thus, these signals do not interface to the physical USB port; instead, they are connected within the VHDL core to the device simulator.

The last step was to generate a file with an extension .pao (peripheral analysis order file) in which HDL Analysis Information is found (dependent library files and HDL source files to compile the peripheral, as well as corresponding logical libraries those files will be compiled into). At

After having added the IP core, all the C code has to do to interact with the VHDL core from the software layer is to write to and read from the registers described above. This enables it to send input data to the core and read output data from the core. These registers are located within the address space assigned to the core. The functions used to read and write to these registers in C code are fairly simple.

In fact, the above mentioned wizard generates a C header file called HIGH_SPEED_USB_CORE.h (in accordance with our core which is called HIGH_SPEED_USB_CORE) in which one can find the functions used to read and write to the registers. This header provides many functions to choose from in order to read and write to the registers. A prototype of the functions we chose is:

HIGH_SPEED_USB_CORE_mWriteSlaveRegX(BaseAddress, Value)
HIGH_SPEED_USB_CORE_mReadSlaveRegX(BaseAddress)

Where

X: The number of the register we would like to read from or write to.
BaseAddress: The base address of the address space assigned to the core.
Value: The value we would like to write to register X.


Our C code configures the USB Core before requesting that a transaction begins. To do this, it assigns appropriate values to the inputs of the USB Core and consequently to the registers to which these inputs are assigned.

It is composed of 3 files: main.c, driver.c and HIGH_SPEED_USB_CORE.h. The last file is generated by the wizard and contains a list of functions we can choose from to write and read to registers (as mentioned above), whereas the other 2 files were written by us and are as follows:

main.c

The file main.c simply calls the function HIGH_SPEED_USB_CORE_SETUP_TRANSACTION() which is located in the driver.c file. It provides a higher level of abstraction to the user; the user will just have to call a function without having to deal with writing to registers at the bit level granularity.
#include "xbasic_types.h"
#include "xstatus.h"
#include "xparameters.h"
#include "xio.h"
#include "xuartlite_l.h"
#include "xuartlite.h"
#include "stdio.h"
#include "High_Speed_USB_Core.h"

#define BASEADDR 0x77400000

int main(void)
{
    print("-- Entering Main() --\r\n");
    print("-- Call the function that starts a setup transaction --\r\n");
    HIGH_SPEED_USB_CORE_SETUP_TRANSACTION( );
    return 0;
}

driver.c
#include "xbasic_types.h"
#include "xstatus.h"
#include "xparameters.h"
#include "xio.h"
#include "xuartlite_l.h"
#include "xuartlite.h"
#include "stdio.h"
#include "signal.h"
#include "High_Speed_USB_Core.h"

#define baseaddr 0x77400000

int HIGH_SPEED_USB_CORE_SETUP_TRANSACTION(void )
{
    Xuint32 Reg32Value;

    xil_printf("**************************************************************\n\r ");
    xil_printf("First reset all the components\n\r ");
    //rst_i=1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr);
    xil_printf(" - wrote %d to rst_i\n\r", Reg32Value);

    //address_i=x"34"
    xil_printf(" Set the address equal to that of the Transmit Fifo\n\r ");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 52);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=1
    xil_printf(" Set the data to be sent to the Transmit Fifo equal to 1\n\r, so as to delete all data in the fifo\n\r");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);
    //we_i=1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr);
    xil_printf(" - wrote %d to we_i \n\r", Reg32Value);
    //strobe_i=1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr);
    xil_printf(" - wrote %d to strobe_i\n\r", Reg32Value);

    //address_i=x"24"
    xil_printf(" Set the address equal to that of the Receive Fifo \n\r, so as to delete all data in the fifo\n\r");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 36);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i \n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    //rst_i=0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr);
    xil_printf(" - wrote %d to rst_i\n\r", Reg32Value);

    xil_printf(" Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time\n\r", Reg32Value);
    //address_i=0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=0 =>no transaction required
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);

    xil_printf("Set the transaction type equal to SETUP\n\r", Reg32Value);
    //address_i=7=>TRANSACTION_TYPE
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 7);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=0 =>Setup transaction
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);
    xil_printf("If it has processed the incoming data, it sends it back as output\n\r");
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg6(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 0 to the DEVICE_ADDRESS\n\r");
    //address_i=5=>DEVICE_ADDRESS
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 5);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 0 to the ENDPOINT_ADDRESS\n\r");
    //address_i=6=>ENDPOINT_ADDRESS
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 6);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1111 to the INTERRUPT_MASK\n\r");
    //address_i=9=>INTERRUPT_MASK
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 9);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=15
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    //address_i=12
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 12);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    // xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=1010000
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 80);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    // xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1 to the SOF_ENABLE\n\r");
    //address_i=1=>SOF_ENABLE
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 11110000 to the TX_FIFO_DATA\n\r");
    //address_i=48=>TX_FIFO_DATA
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 48);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=11110000
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 240);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 00001111 to the TX_FIFO_DATA\n\r");
    //data_i=00001111
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    //data_o_S;
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1 to the TRANSREQ_PREEN_SOFSYNC\n\r");
    //address_i=0=>TRANSREQ_PREEN_SOFSYNC
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    //data_i=1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);

    xil_printf("Write 0 to we_i\n\r");
    //we_i=0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr);
    xil_printf(" - wrote %d to we_i \n\r", Reg32Value);
    xil_printf("Write 0 to strobe_i\n\r");
    //strobe_i=0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr);
    xil_printf(" - wrote %d to strobe_i \n\r", Reg32Value);

    //clear the value before polling (assignment, not comparison)
    Reg32Value = 0;
    xil_printf("Wait till the transaction is done\n\r ", Reg32Value);
    while (Reg32Value==0) {
        Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg9(baseaddr);
    }
    xil_printf(" - read %d from Transaction Done interrupt bit\n\r", Reg32Value);
    xil_printf("Transaction completed with no errors ! ");
    return 0;
}

The C code works as follows. It first sends some configuration information to the USB core, such as enabling automatic transmission of Start-Of-Frame, the address of the device and its endpoint, and others. It also loads the Transmit FIFO with the data to be sent as part of a data packet, then specifies the type of transaction required (in this case a setup transaction) and exactly when the transaction should start.

Note that every time we write values to the address (address_i) and data (data_i) registers, we can check the value in data_o, which should be equal to data_i if the VHDL has processed the address and data correctly, meaning it is properly configured before transaction start.
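The write-then-check pattern described above repeats throughout driver.c and could be wrapped in one helper. The C sketch below is an illustration that runs on a PC, not the code we downloaded to the board. The register numbers 1, 2 and 5 follow the usage in driver.c, but reg_write, reg_read, mock_reg and usb_core_write_checked are hypothetical names, and the echo of data_i onto data_o is an assumption built into the mock to mimic the behavior described. On the board, reg_write and reg_read would be replaced by the generated HIGH_SPEED_USB_CORE_mWriteSlaveRegX and HIGH_SPEED_USB_CORE_mReadSlaveRegX macros.

```c
#include <stdint.h>

/* Stand-in register file so that the sketch runs without the board. */
static uint32_t mock_reg[16];

/* Slave register numbers as used in driver.c. */
enum { REG_ADDRESS_I = 1, REG_DATA_I = 2, REG_DATA_O = 5 };

static void reg_write(int n, uint32_t value)
{
    mock_reg[n] = value;
    if (n == REG_DATA_I)            /* mock only: echo data_i on data_o */
        mock_reg[REG_DATA_O] = value;
}

static uint32_t reg_read(int n)
{
    return mock_reg[n];
}

/* Select a core register through address_i, write data_i, then compare
 * data_o with the value written. Returns 1 on a match, 0 otherwise. */
int usb_core_write_checked(uint32_t address, uint32_t data)
{
    reg_write(REG_ADDRESS_I, address);
    reg_write(REG_DATA_I, data);
    return reg_read(REG_DATA_O) == data;
}
```

For example, usb_core_write_checked(9, 15) corresponds to the INTERRUPT_MASK step of the driver: it writes 9 to address_i and 15 to data_i, and reports whether data_o reads back 15.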

Finally we wait for the USB Core to set the Transaction Done bit, meaning that the transaction has been completed. Below is the output we get on the HyperTerminal window on the computer screen. This terminal is attached to the COM1 port, to which the FPGA sends data.

-- Entering Main() --
-- Call the function that starts a setup transaction --
**************************************************************
First reset all the components
 - wrote 1 to rst_i
Set the address equal to that of the Transmit Fifo
 - wrote 52 to address_i b
Set the data to be sent to the Transmit Fifo equal to 1
, so as to delete all data in the fifo
 - wrote 1 to data_i
 - wrote 1 to we_i
 - wrote 1 to strobe_i
Set the address equal to that of the Receive Fifo
, so as to delete all data in the fifo
 - wrote 36 to address_i
 - read 0 from data_o
 - wrote 0 to rst_i
Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time
 - wrote 0 to address_i b
 - wrote 0 to data_i
Set the transaction type equal to SETUP
 - wrote 7 to address_i b
 - wrote 0 to data_i
If it has processed the incoming data, it sends it back as output
 - read 0 from data_o
Write 0 to the DEVICE_ADDRESS
 - wrote 5 to address_i b
 - wrote 0 to data_i b
 - read 0 from data_o
Write 0 to the ENDPOINT_ADDRESS
 - wrote 6 to address_i b
 - read 0 from data_o
Write 1111 to the INTERRUPT_MASK
 - wrote 9 to address_i b
 - wrote 15 to data_i b
 - read 15 from data_o
Write 1 to the SOF_ENABLE
 - wrote 1 to address_i b
 - wrote 1 to data_i b
 - read 1 from data_o
Write 11110000 to the TX_FIFO_DATA
 - wrote 48 to address_i b
 - wrote 240 to data_i b
Write 00001111 to the TX_FIFO_DATA
 - wrote 15 to data_i b
Write 1 to the TRANSREQ_PREEN_SOFSYNC
 - wrote 0 to address_i b
 - wrote 1 to data_i b
Write 0 to we_i
 - wrote 0 to we_i
Wait till the transaction is done
 - read 1 from Transaction Done interrupt bit
Transaction completed with no errors !

We wrote the C code for the test case described above. As for the other transaction cases that we had tested in simulation, we did not implement them because the FSM would have to change for each case, since it deals with information at the bit level. For example, for an IN transaction, if the device wishes to send a token packet followed by a data packet that has 10 bytes in its data field, then we would need to simulate the sending of around 1000 bits to run the test case. Note that these bits need to be 100% correct or else the VHDL core would not work. For example, if one mistake is made in a bit of the PID field and the PID is invalid, then the whole test case fails. In any case, since the system worked on the FPGA for the case we tested, and it worked for the remaining three transaction cases in the VHDL simulator, it is expected to work on the FPGA for all other cases.


6.0 Conclusion
6.1 Difficulties Faced

Throughout our work on the Final Year Project we faced many difficulties and problems, some of which were mentioned throughout the report. The table below provides a summary of these along with the alternative solutions we found to overcome them:

Difficulty: Understanding and implementing the whole USB protocol.
Alternative: We implemented the parts of the USB protocol that were relevant to our project. In many cases we did not implement parts of the specification. An example is support of split transactions that are required in isochronous transfers.

Difficulty: We had planned to link our core to the physical USB port on the FPGA, but after having implemented most of the VHDL design, we noticed that the USB port has a parallel interface instead of a serial interface; as a result we could not test our code with an actual physical device.
Alternative: We simulated a USB device as part of a SETUP transaction required by the software layer; we implemented a finite state machine which, upon correctly receiving all the bits it should receive in the scope of a SETUP transaction, responds with an ACK.

Difficulty: The maximum frequency at which the internal FPGA clock runs is 100 MHz, whereas our High Speed Host Controller Core needs to run at 960 MHz.
Alternative: We had to readjust our core to 100 MHz to be able to make it work on the FPGA. This meant that signaling to the device simulator was slower than required by the USB specification.

Difficulty: Regulating the timing between commands in the C code at the software layer, because both the USB Core and its test bench are very sensitive to timing.
Alternative: By testing with several timing patterns on the FPGA, we managed to get the correct timing.

Table 9: Difficulties

Other than the problems listed in the table above, one of the major problems that we faced while making the core low, full and high speed compatible is that the high speed was too fast for us to handle. The high speed frequency is 960 MHz whereas the full speed is 12 MHz. Our receiver, which is the component responsible for reading the USB wire, can only handle a certain speed. Let us look at a waveform to understand the situation.

Figure 39: High Speed Rate Problem

Highlighted in red is the speed at which we take in bits. Highlighted in yellow is the speed at which we output the bits. Clearly, we are taking in bits at a faster rate than we are outputting them. The bits are output slowly because they have to go through 3 machine states before they are output, and thus we can output one bit every 3 rising edges of the main clock. This problem can be approached using two different methods. The first method was adding buffers. We tried adding several numbers of buffers and eventually chose to add 64 buffers. Let us do the analysis: we take in one bit every 2.0833 ns and we output one bit every 3.125 ns. With those rates, data that has been written to a buffer and has not yet been output can be overwritten at a later stage if the buffers become full, and thus the old data will be lost. Below is a table that shows the times at which we take in inputs and output them, and the number of buffers that are in use.


Time(ns)   take in new input   output   buffercounter
2.0833     Here                         1
3.125                          Here     0
4.1666     Here                         1
6.2499     Here                         2
6.25                           Here     1
8.3332     Here                         2
9.375                          Here     1
10.4165    Here                         2
12.4998    Here                         3
12.5                           Here     2
14.5831    Here                         3
15.625                         Here     2
16.6664    Here                         3
18.7497    Here                         4
18.75                          Here     3
20.833     Here                         4

Table 10: High Speed Rate Problem
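The buffercounter column of Table 10 can be reproduced with a few lines of C. The sketch below is a simplified model rather than the receiver's logic: it assumes that a bit arrives exactly every 2.0833 ns, that one bit leaves exactly every 3.125 ns, and that when both fall on the same instant the input is counted first, which is the row order used in the table. The function name buffer_occupancy is hypothetical.

```c
/* Number of buffers in use right after the n-th bit has been taken in.
 * Time is counted in units of one third of the output period (about
 * 1.0417 ns), so a bit arrives every 2 units (2.0833 ns) and one
 * leaves every 3 units (3.125 ns). */
int buffer_occupancy(int n_inputs)
{
    int t_last = 2 * n_inputs;        /* arrival time of the n-th input */
    int outputs = (t_last - 1) / 3;   /* outputs strictly before it     */
    return n_inputs - outputs;
}
```

For the ten inputs listed in Table 10 the model returns 1, 1, 2, 2, 2, 3, 3, 3, 4 and 4, which are the buffercounter values on the rows marked as inputs.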

As we can see from the table above, the buffers are being filled up fast and it will not take long until we start overwriting data. If we have 64 buffers, then the maximum number of sequential bits that we can have is solved below:

X × (2.0833 / 3.125) = 64 => X = 96 inputs

which is obviously not enough since we must receive thousands of bits in sequence. Thus we should either increase the number of buffers or look for other solutions. Increasing the buffers would increase the area used on the FPGA, the power consumption and so on, and is not considered a good solution. We thus analyze another solution, which tries to make the component output data at a faster rate.

The second approach actually speeds up the rate at which we output the results. Since we are constrained by the fact that we have to go through three machine states to be able to output, the best solution would be to make it possible to move from one state to the next on both the rising edge and the falling edge, so that it takes less time to output the data. The period would then go down from 3.125 ns to 1.5625 ns, and thus we would be able to output at a faster rate than our input. Buffers would not be necessary in this solution.


6.2 Future Work

Our USB Core functions properly concerning the main deliverables, but it still has room for improvement. We were not able to attempt the following suggestions due to time constraints. Future work may involve the following:

Implementing the whole USB protocol with all its details.

Finding a way to have the internal clock of the FPGA equal to the USB clock needed for high speed (960 MHz). For example, this might be achieved by working on a faster processor on the FPGA.

Implementing a USB Device Core and trying to attach our USB Host Controller Core to it, or as a second option, designing a simple board with only a USB physical port on it, which can be attached to the Virtex development board through the P160 expansion slot.

Developing the 3 remaining transaction cases in C code and generating libraries that abstract the Host Controller Driver Layer and implement the layers above it.

6.3 Design Constraints

FPGAs give us flexibility at the cost of performance. Since the FPGAs were present in the AUB labs, economic constraints do not apply to us. Even so, designing the USB core on an FPGA would be more expensive than designing the core on another chip that can be mass manufactured. FPGAs are devices that have been used and can stay operational for many years. However, technology is evolving rapidly along with the design of FPGAs. The Virtex-II board that we used has already been succeeded by two newer versions, and thus we expect that the FPGA we are using will become obsolete in about a decade; sustainability is therefore a major issue in our design. Furthermore, a new USB specification might come out, and the core would then need to be re-designed.


7.0 References
1. Axelson, Jan (2001). USB Complete: Everything You Need to Develop Custom USB Peripherals. Third Edition.
2. Birkner, J. (1998). HDL IP cores in FPGAs to drive pace of innovation. Cahners Publishing Company: Gale Group. Retrieved from: http://www.findarticles.com/p/articles/mi_m0EKF/is_n2203_v44/ai_20201029
3. Compaq Computer Corporation, Hewlett-Packard Company, Intel Corporation, Lucent Technologies Inc, Microsoft Corporation, NEC Corporation, Koninklijke Philips Electronics N.V. (2000). Universal Serial Bus Specification Revision 2.0.
4. Cypress Semiconductor Corporation (2005). SL811HS Embedded USB Host/Slave Controller. Document 38-08008.
5. Fanning, J. (1999). Literature Survey of Present State of FPGA's. Department of Instrumentation and Analytical Science. Retrieved from: http://dias.umist.ac.uk/old_pages/njg/fpga2.htm
6. Fielding, Steve. USBHostSlave IP Core Specification. Retrieved from: http://www.opencores.org
7. Hyde, John. USB Design by Example: A Practical Guide to Building I/O Devices.
8. Philips Semiconductors (1999). PDIUSBD11 USB device with serial interface. Retrieved from: http://www.semiconductors.philips.com/acrobat_download/datasheets/PDIUSBD11_N_3.pdf
9. Philips Semiconductors (2005). ISP1760 Hi-Speed Universal Serial Bus host controller for embedded applications.
10. Saini, M. (2004). FPGA Solutions: Using Synplify Software Synthesis with Xilinx Platform Studio. The Syndicated. Retrieved from: http://www.synplicity.com/literature/syndicated/pdf/v4_i2/platform_studio_v4_i2.pdf
11. TransDimension Inc. (2002). UHC124 USB Host Controller Data Sheet. TransDimension Document Number: MU1002. Retrieved from: http://www.transdimension.com/downloads/assets/hardware/uhc124/UHC124%20Product%20Brief.pdf
12. Vilakathara, H. Challenges in developing a reusable IP core: USB OTG IP case study. D & R Industry Articles.
13. Xilinx. Virtex-II V2MB1000 Development Board User's Guide.

