The Elements of Logic Design Style

By Shing Kong

Author: Shing Kong, March 2001
Original URL: http://www.cs.wisc.edu/~markhill/kong/
Copyright: Copyright (c) 2001 by Shing Ip Kong. All Rights Reserved.
Revisions by: Milo Martin, January 2006 (with permission)

This document was originally written by Shing Kong and published in March 2001. In January 2006, Milo Martin converted the document to HTML (via rST) and removed references to files not part of the distribution.

1. Introduction

The goal of this document is to summarize some ideas I find useful in logic design and Verilog coding (Note 1). Logic design is not the same as Verilog coding. One common mistake of some inexperienced logic designers is to treat logic design as a Verilog programming task. This often results in Verilog code that is hard to understand, hard to implement, and hard to debug.

Logic design is a process:

  1. Understand the problem.
  2. If necessary, divide the problem into multiple modules with clean and well defined interfaces.
  3. For each module
    1. Design the datapath that can process the data for that module.
    2. Design the controller to control the datapath and produce control outputs (if any) to other adjacent modules.

Verilog coding, on the other hand, is a modeling task. More specifically, after one has done some preliminary designs on the datapaths and controllers, Verilog code is then used to:

  1. Model the datapaths and the controllers.
  2. Connect the datapath and controller together to form modules.
  3. Connect the modules together to form the final design.

Note

Verilog is used as an example in this document. The ideas discussed in this document, however, should also be applicable to other Hardware Description Languages (such as VHDL) with minor adjustments.

The rest of this document is organized as follows:

Section 2: discusses the most important rule of logic design: keep it easy to understand. This section also introduces some basic Verilog coding guidelines.
Section 3: discusses the art of dividing a design into high-level modules and then how these modules can be divided into datapaths and controllers.
Section 4: discusses the logic design and Verilog coding guidelines for the datapath.
Section 5: discusses the logic design and Verilog coding guidelines for the controller.
Section 6: discusses some miscellaneous Verilog coding guidelines.
Section 7: is a summary of all the logic design and Verilog coding guidelines introduced in this document. This summary serves as a quick reference for readers who either: (a) may not have the time to read this entire document, or (b) have already read this document once but want a quick reminder later on.

The example Verilog files that model a module, a datapath, and a controller are included in Appendix A, Appendix B, and Appendix C for those readers who are interested in looking at the structure of a complete Verilog file.

2. The Most Important Rule of Logic Design & Basic Verilog Coding Guidelines

The most important logic design rule is more a philosophy than a rule :-)

Tip

Logic Design Guideline 2-1 (MOST IMPORTANT): The design MUST be as simple as possible and easy to understand!

If a design is hard to understand, then nobody will be able to help the original designer with his or her work. Also, as time passes, the hard-to-understand design will become impossible to maintain and debug, even for the original designer. Therefore, a logic designer must keep his or her design simple and easy to understand, even if that means the design is slightly bigger or slightly slower, as long as the design is still small enough and fast enough to meet the specification.

One important step in keeping a design simple and the Verilog code that models the design easy to understand is to use standard logic elements such as registers, multiplexers, and decoders. Consequently, the first step in any Verilog coding project is:

Tip

Verilog Coding Guideline 2-1: Model all the standard logic elements in a library file to be SHARED by ALL engineers in the design team.

Below are some examples of the basic logic elements defined in such a library:

/***************************************************************
 * Simple N-bit register with a 1 time-unit clock-to-q time
 ***************************************************************/
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end

endmodule // v_reg

/***************************************************************
 * Simple N-bit latch with a 1 time-unit clock-to-q time
 ***************************************************************/
module v_latch ( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(c or d) begin
        if (c) begin
            state  = d;
        end
    end

endmodule // v_latch

/***************************************************************
 * Simple N-bit 2-to-1 Multiplexer
 ***************************************************************/
module v_mux2e( z, s, a, b );
    parameter n = 1;

    output    [n-1:0] z;
    input     [n-1:0] a, b;
    input             s;

    assign  z =  s ? b : a ;    // s=1, z<-b; s=0, z<-a

endmodule

One key observation from the logic elements defined in this library:

Tip

Verilog Coding Guideline 2-2: Only the storage elements (examples: register and latch) have non-zero clock-to-q time. All combinational logic (example: mux) has zero delay.

The non-zero clock-to-q time of the storage elements will prevent hold time problems at all registers' inputs. In general, a logic designer must NOT rely on a combinational logic block to have a certain minimum delay. The zero delay in the Verilog model of the combinational logic elements ensures that the logic designer does not rely on any minimum delay during simulation.
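The effect of Guideline 2-2 can be seen with a small simulation. The sketch below is only an illustration: the testbench tb_hold, its signal names, and the inverter between the two registers are assumptions and are not part of the original library. It places two v_reg registers back to back with zero-delay logic in between and checks that the second register captures the value the first register held before the clock edge.

```verilog
// Copy of the library register shown above, repeated so that this
// sketch is self-contained.
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;         // 1 time-unit clock-to-q

    always @(posedge c) begin
        state  = d;
    end
endmodule // v_reg

// Hypothetical testbench (an assumption, not from the original text).
module tb_hold;
    reg         clk;
    reg  [3:0]  din;
    wire [3:0]  q1, q2;
    wire [3:0]  mid;

    assign mid = ~q1;               // zero-delay combinational logic

    v_reg #(4) stage1 (q1, clk, din);
    v_reg #(4) stage2 (q2, clk, mid);

    always #5 clk = ~clk;

    initial begin
        clk = 0;
        din = 4'h3;
        @(posedge clk);             // edge 1: stage1 captures 4'h3
        #2 din = 4'h9;
        @(posedge clk);             // edge 2: stage1 captures 4'h9 while
                                    // stage2 must still see ~4'h3
        #2;
        if (q2 !== 4'hc)
            $display("FAIL: q2 = %h", q2);
        else
            $display("PASS: stage2 captured the old value of stage1");
        $finish;
    end
endmodule // tb_hold
```

Because q1 changes one time unit after the clock edge, the input of stage2 is still stable when stage2 samples it, even though the logic between the two registers has no delay at all.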

Once the basic logic elements have been modeled in the library file:

Tip

Verilog Coding Guideline 2-3: Use explicit registers and latches (example: v_reg and v_latch as defined in the examples above) in your Verilog coding. Do not rely on logic synthesis tools to generate latches or registers for you.

By making the logic designer explicitly place the registers and/or latches, the logic designer is forced to consider the timing implications of the logic early in the design cycle. In other words, the designer is forced to ask himself or herself questions such as: do I have so much logic between registers that it may not meet the cycle time? Also, with explicit registers and latches in the Verilog code, it will be much easier for those who read the code to draw a simple block diagram showing all the registers in the design. Such a block diagram (see Section 4 and Section 5) is very useful for understanding the design (remember the MOST important Logic Design Guideline above: the design must be easy to understand) as well as for making timing tradeoffs when such tradeoffs are necessary.
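As a minimal sketch of what explicit register placement looks like, consider the registered adder below. The module add_stage and its port names are hypothetical and chosen only for illustration. Every register is an explicit v_reg instance, so the reader can see at a glance that exactly one adder sits between the two ranks of registers.

```verilog
// Copy of the library register shown above, repeated so that this
// sketch is self-contained.
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end
endmodule // v_reg

// Hypothetical example (an assumption, not from the original text):
// an 8-bit adder with explicit input and output registers.
module add_stage( sum_q, clk, a, b );
    output [7:0]    sum_q;          // Registered sum
    input           clk;
    input  [7:0]    a, b;

    wire   [7:0]    a_q, b_q;       // Registered operands
    wire   [7:0]    sum;            // Combinational result

    v_reg #(8) a_reg   (a_q,   clk, a);
    v_reg #(8) b_reg   (b_q,   clk, b);

    assign sum = a_q + b_q;         // ALL the logic between the registers

    v_reg #(8) sum_reg (sum_q, clk, sum);
endmodule // add_stage
```

If the adder later turns out to be too slow for the cycle time, the block diagram drawn from this code shows immediately where another register rank would have to be inserted.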

At first glance, it seems ironic that the logic designer needs to always keep in mind how much combinational logic exists between any two storage elements (registers or latches) while in Verilog coding (see Verilog Coding Guideline 2-2), we want to treat all combinational logic as having zero delay. The reason for this apparent contradiction is that in logic design, the delay of the combinational logic between storage elements determines the cycle time. Consequently, it is important for the logic designer to be aware of the complexity of the logic between two storage elements at all times. On the other hand, in order to reduce potential hold time problems, we also do not want the correct operation of the logic to depend on the logic having a certain minimum delay. The best way to make sure the logic can operate correctly without relying on the combinational logic blocks to have a certain minimum delay is to run the Verilog simulation with all combinational logic blocks having zero delay and rely on the storage elements' (registers and/or latches) non-zero clock-to-q time to satisfy the hold time requirement of the next register.

3. Hierarchal Design and Clock Domain Consideration

Another important step in keeping a design simple and the Verilog code that models the design easy to understand is to adopt a hierarchal approach to the design process and then make the Verilog code follow the same hierarchy.

Hierarchal design, however, should not be carried to an extreme. For example, as pointed out by one of my colleagues, Kyutaeg Oh [1], too deep a hierarchy can cause too many module instantiations, which will cause synthesis to run too slowly. Below is a hierarchal design strategy I find useful.

Tip

Logic Design Guideline 3-1: Use a hierarchal strategy that breaks the design into modules that consist of datapaths and controllers. More specifically:

  1. Divide the problem into multiple modules with clean and well-defined interfaces.
  2. For each module:
    1. Design the datapath that can process the data for that module.
    2. Design the controller to control the datapath and produce control outputs (if any) to other adjacent modules.

One example of such a hierarchal approach can be found in the Serial ATA to Parallel ATA Converter for the Disk (Device Dongle). As shown in Figure 3-1, the Device Dongle is divided into three modules:

  1. The Parallel ATA Interface to the disk: ATAIF. See Reference [2].
  2. The Transport Layer: Transport. See Reference [3].
  3. The Link Layer: Link. See Reference [4].
          +-----------+   +-------------+   +--------+
          |           |   |             |   |        |
 /-------\| Parallel  |--->  Transport  |--->  Link  +--> To Serializer
< ATA Bus >   ATA     |   |    Layer    |   |  Layer |
 \-------/| Interface |<--+             |<--+        |
          |  (ATAIF)  |   | (Transport) |   | (Link) <--- From Deserializer
          |           |   |             |   |        |
          +-----------+   +-------------+   +--------+

         Figure 3-1: The Three Modules that Form the Device Dongle

The Parallel ATA Interface (ATAIF), the Transport Layer (Transport) and the Link Layer (Link) shown in Figure 3-1 are further divided into datapath and controller modules as described below and shown in Figure 3-2:

                       +----------------------+  +----------------------+
                       |   Transport Layer    |  |      Link Layer      |
                       |        dtrans        |  |         link         |
                       | +------------------+ |  | +------------------+ |
                       | | Transmit Engine  | |  | | Transmit Engine  | |
                       | |    dtrans_tx     | |  | |     link_tx      | |
                       | | +--------------+ | |  | | +--------------+ | |
                       | | |   Datapath   | | |  | | |   Datapath   | | |
                       | | | dtrans_txdp  | | |  | | |  link_txdp   | | |
+------------------+   | | +--------------+ | |  | | +--------------+ | |
|   Parallel ATA   |   | |                  | |  | |                  | |
|    Interface     |   | | +--------------+ | |  | | +--------------+ | |
|      dataif      |   | | |  Controller  | | |  | | |  Controller  | | |
|  +-----------+   |   | | | dtrans_txctl | | |  | | |  link_txctl  | | |
|  | Datapath  |   |   | | +--------------+ | |  | | +--------------+ | |
|  |           |   |   | |                  | |  | |                  | |
|  | dataif_dp |   |   | | +--------------+ | |  | | +--------------+ | |
|  +-----------+   |   | | | Synchronizer | | |  | | | Synchronizer | | |
|                  |   | | | dtrans_txsyn | | |  | | |  link_txsyn  | | |
|                  |   | | +------------^-+ | |  | | +------------^-+ | |
|  +------------+  |   | +---+----------|---+ |  | +---+----------|---+ |
|  | Controller |  |   |     |(3)       |(3)  |  |     |(1)       |(2)  |
|  |            |  |   | +---|----------+---+ |  | +---|----------+---+ |
|  | dataif_ctl |  |   | | +-v------------+ | |  | | +-v------------+ | |
|  +------------+  |(4)| | | Synchronizer | | |  | | | Synchronizer | | |
|                  +------->              | | |  | | |              | | |
|                  |   | | | dtrans_rxsyn | | |  | | |  link_rxsyn  | | |
| +--------------+ |   | | +--------------+ | |  | | +--------------+ | |
| | Synchronizer | |(5)| | +--------------+ | |  | | +--------------+ | |
| |              <-----+ | |  Controller  | | |  | | |  Controller  | | |
| |  dataif_syn  | |   | | | dtrans_rxctl | | |  | | |  link_rxctl  | | |
| +--------------+ |   | | +--------------+ | |  | | +--------------+ | |
+------------------+   | |                  | |  | |                  | |
                       | | +--------------+ | |  | | +--------------+ | |
                       | | |   Datapath   | | |  | | |   Datapath   | | |
                       | | | dtrans_rxdp  | | |  | | |  link_rxdp   | | |
                       | | +--------------+ | |  | | +--------------+ | |
                       | |  Receive Engine  | |  | |  Receive Engine  | |
                       | |    dtrans_rx     | |  | |     link_rx      | |
                       | +------------------+ |  | +------------------+ |
                       +----------------------+  +----------------------+

       Figure 3-2: Further Divisions of the Device Dongle Modules

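The Parallel ATA Interface, modeled by the "module dataif" in the Verilog file dataif.v (see Reference [2]), is further divided into a datapath (dataif_dp), a controller (dataif_ctl), and a synchronizer (dataif_syn), as shown in Figure 3-2 (see Reference [5]).

The Transport Layer, modeled by the "module dtrans" in the Verilog file dtrans.v (see Reference [3]), is further divided into a Transmit Engine (dtrans_tx) and a Receive Engine (dtrans_rx), each of which consists of a datapath, a controller, and a synchronizer, as shown in Figure 3-2 (see Reference [6]).

Similarly, the Link Layer, modeled by the "module link" in the Verilog file link.v (see Reference [4]), is further divided into a Transmit Engine (link_tx) and a Receive Engine (link_rx), each of which consists of a datapath, a controller, and a synchronizer, as shown in Figure 3-2 (see Reference [7]).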

The detailed contents of these Verilog files (References [2 to 7]) are not needed to illustrate the following Verilog Coding Guideline:

Tip

Verilog Coding Guideline 3-1: A separate Verilog file is assigned to the Verilog code for:

  1. Each datapath. Example: dtrans_txdp.v
  2. Each controller. Example: dtrans_txctl.v
  3. Each high-level module, that is, a module at a hierarchy level higher than the datapath and the controller. Examples: link_tx.v, link_rx.v, and link.v

A corollary of the above Verilog Coding Guideline is as follows:

Tip

Verilog Coding Guideline 3-2: In order to keep the number of Verilog files under control, one should try not to assign a separate Verilog file to any low level module that is at a hierarchy level lower than the datapath and the controller.

For example, as I will show you in Section 4, the datapath will contain many datapath elements. Instead of assigning a separate Verilog file for each of these datapath elements, the datapath elements are all grouped into a single "library" file (link_library.v). Similarly, as I will show you in Section 5, the controller will contain a "Next State Logic" block and an "Output Logic" block. Instead of assigning a separate Verilog file for each logic block, the logic blocks will be included in the Verilog file assigned to the controller.

Enclosed in Appendix A are the Verilog files dtrans.v, dtrans_tx.v, and dtrans_rx.v. Here is something worth noticing:

Tip

Verilog Coding Guideline 3-3: The Verilog code for a high-level module, that is, a module at a hierarchy level higher than the datapath and the controller (examples: module dtrans_tx, module dtrans_rx, and module dtrans), should not contain any logic. It should only show how the lower-level modules are connected.

For example, if you look at the dtrans.v file in Appendix A, the "module dtrans" only shows how its transmit engine (dtrans_tx) and its receive engine (dtrans_rx) are connected. Similarly, if you look at the dtrans_tx.v file in Appendix A, the "module dtrans_tx" contains only the information on how its datapath (dtrans_txdp), its controller (dtrans_txctl), and its synchronizer (dtrans_txsyn) are connected together. In any case, neither the "module dtrans," the "module dtrans_tx," nor the "module dtrans_rx" contain any Verilog code that models raw logic.
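The shape of such a "wiring only" module can be sketched as follows. All module names, port names, and the placeholder logic inside the two lower-level modules are hypothetical; they are not taken from the files in Appendix A. The point is only that the high-level module ex_top contains nothing but wire declarations and instantiations.

```verilog
// Hypothetical stand-in for a datapath (placeholder body, an assumption).
module ex_dp( dout, is_zero, din, load );
    output [7:0]    dout;
    output          is_zero;        // Control output to the controller
    input  [7:0]    din;
    input           load;           // Control input from the controller

    assign dout    = load ? din : 8'h00;
    assign is_zero = (din == 8'h00);
endmodule // ex_dp

// Hypothetical stand-in for a controller (placeholder body, an assumption).
module ex_ctl( load, start, is_zero );
    output          load;
    input           start;
    input           is_zero;

    assign load = start & ~is_zero;
endmodule // ex_ctl

// High-level module: no logic, only connections between the
// lower-level modules.
module ex_top( dout, din, start );
    output [7:0]    dout;
    input  [7:0]    din;
    input           start;

    wire            load;           // Controller -> datapath
    wire            is_zero;        // Datapath -> controller

    ex_dp  datapath (
        .dout (dout),               .is_zero (is_zero),
        .din (din),                 .load (load));

    ex_ctl controller (
        .load (load),               .start (start),
        .is_zero (is_zero));
endmodule // ex_top
```

Named port connections are used here, as in the instantiations shown later in this document, so that the wiring can be checked by reading each connection rather than by counting port positions.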

Notice from Figure 3-2 that the ATA Interface module is divided directly into the datapath and the controller. The Transport Layer and the Link Layer, on the other hand, are first partitioned into the Transmit Engine and the Receive Engine before being further divided into controller and datapath. The reason for this extra level of hierarchy is that their Transmit Engines and Receive Engines work in different clock domains. More specifically, the ATA Interface, the Transmit Engine of the Link Layer, and the Transmit Engine of the Transport Layer all operate on the same clock, the transmit clock, while the Receive Engines of the Link and Transport Layers both operate on a different clock, the receive clock. This leads to the following design guideline:

Tip

Logic Design Guideline 3-2: Keep different clock domains separate and have an explicit synchronization module for signals that cross the clock domain.

For example, please refer to the places in Figure 3-2 labeled with numbers in parentheses as you read the numbered paragraphs below:

  1. All signals going from the Link Layer's Transmit Engine to its Receive Engine must go through synchronization via the module "link_rxsyn" before the signals can be used by the Receive Engine.
  2. Similarly, all signals going from the Link Layer's Receive Engine to its Transmit Engine must go through synchronization via the module "link_txsyn" before the signals can be used by the Transmit Engine.
  3. The discussion in Paragraph 1 and Paragraph 2 above also applies to the signals between the Transmit Engine and the Receive Engine of the Transport Layer.
  4. Since the Parallel ATA Interface and the Receive Engine of the Transport Layer operate in different clock domains, all signals going from the Parallel ATA Interface to the Transport Layer's Receive Engine must go through synchronization via the module "dtrans_rxsyn" before the signals can be used by that Receive Engine.
  5. Similarly, all signals going from the Transport Layer's Receive Engine to the Parallel ATA Interface must go through the synchronization module "dataif_syn" before the signals can be used by the ATA Interface.
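This document does not show the contents of the synchronization modules, so the sketch below is an assumption rather than the actual implementation of link_rxsyn or its siblings. It uses the common two-register synchronizer for a single-bit signal, built from the explicit v_reg element of Section 2. The module name ex_sync2 and its port names are hypothetical.

```verilog
// Copy of the library register from Section 2, repeated so that this
// sketch is self-contained.
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end
endmodule // v_reg

// Hypothetical single-bit synchronizer (an assumption, not the
// author's code). Both registers are clocked by the DESTINATION
// clock domain.
module ex_sync2( sync_out, dest_clk, async_in );
    output          sync_out;       // Safe to use in the destination domain
    input           dest_clk;       // Clock of the destination domain
    input           async_in;       // Signal from the other clock domain

    wire            meta;           // First stage output: may be metastable
                                    // in real hardware, so nothing else
                                    // should use it

    v_reg #(1) sync_ff1 (meta,     dest_clk, async_in);
    v_reg #(1) sync_ff2 (sync_out, dest_clk, meta);
endmodule // ex_sync2
```

Keeping all such registers inside one explicit synchronization module, as Guideline 3-2 asks, makes every clock domain crossing visible in the block diagram. Note that a synchronizer of this kind is appropriate for a single-bit signal; multi-bit values that cross clock domains need additional care that is outside the scope of this sketch.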

4. Datapath Design

Figure 4-1 is an example of a generalized datapath and the next paragraph describes some important observations from this figure:

            |<--- Control Inputs from the Controller (2) -->|
            |   |           |              |   |        |   |
            |...| (3a)      |              |   |  ...   |   |
          +-v---v--+        |Select        | +-v--------v-+ |
Input  N  |  See   | N      |              | | Simple (4) | |
  A ---/--> Figure +-/-+  + |    (5)       | |Random Logic| |
 (1)      |  4-2   |   |  |\v    (3d)      | +-+--------+-+ |
          +-+---+--+   |  | \    +---+     |...|        |...|  (3b)
            |...|      +-->0 +   |   |   +-v---v--+   +-v---v-+-+
            v   v         |  | N | R | N |  See   | N |  See  | |  N  Output
                          |  +-/-> E +-/-> Figure +-/->Figure | +--/--> Q
            |...| (3a)    |  | ^ | G | ^ |  4-2   |   |  4-3  | |      (1)
          +-v---v--+   +-->1 + | |   | | +-+---+--+   +-+---+-+^+
Input  N  |  See   | N |  | /  | +-^-+ |   |...|        |...|  | (5)
  B ---/--> Figure +-/-+  |/   |   |   |   v   v        v   v CLK
 (1)      |  4-2   |      +    |  CLK  |                    |
          +-+---+--+    (3c)   |       | (1)                |
            |...|              K (1)   Y Internal Signals   |
            v   v                                           |
            |<--- Control Outputs to the Controller (2) --->|

             Figure 4-1: Block Diagram of the General Datapath

When you read the numbered paragraphs below, please refer to the places in Figure 4-1 labeled with the same numbers in parentheses:

  1. This simple N-bit datapath has two N-bit data inputs (A and B) and one N-bit data output Q. The internal signals K and Y are marked here to facilitate the discussion of the pipeline register in Paragraph 5 below.
  2. Other than the N-bit data inputs and outputs discussed in Item 1, a generalized datapath should also have Control Inputs from the controller and Control Outputs to the controller (see Section 5).
  3. In general, a datapath consists of the following components:
    1. Combinational Datapath Elements shown in Figure 4-2 where the N-bit Data Output and the Control Outputs depend ONLY on the current values of the N-bit Data Input and Control Inputs. Examples of Combinational Datapath Elements are the multiplexer and the ALU.
    2. Sequential Datapath Elements shown in Figure 4-3 where the N-bit Data Output can depend on the current N-bit Data Input, the current Control Input, as well as the previous cycle's N-bit Data output. An 8-bit counter is an example of a Sequential Datapath Element.
    3. Multiplexers, which are just a special case of the Combinational Datapath Elements shown in Figure 4-2.
    4. Registers or Register File, which can be considered a special case of the Sequential Datapath Elements shown in Figure 4-3.
  4. The "Simple Random Logic" here is commonly referred to as "glue logic" and consists of simple inverters, AND gates, and OR gates. In theory, all this "glue logic" can be integrated into the controller that is discussed in Section 5. In practice, however, it is sometimes simpler to just use some "glue logic" in the datapath.
  5. The register described in Item 3d as well as the implicit register at the output of the Sequential Datapath Element (Item 3b) are commonly referred to as the pipeline register.
                             Control Inputs
                                | |...| |
                                | |   | |
                            +---v-v---v-v---+
                            |               |
                     N      | Combinational |      N
                -----/------>   Datapath    +------/----->
                   N-bit    |   Elements    |    N-bit
                Data Input  |               | Data Output 
                            +---+-+---+-+---+
                                | |...| |
                                | |   | |
                                v v   v v
                             Control Outputs 

             Figure 4-2: A Combinational Datapath Element

                            Control Inputs
                              | |...| |
                              | |   | | 
                          +---------------------+
                          |   | |   | |         |
                          | +-v-v---v-v---+---+ |
                          | |             |   | |
                          +->             |   +-+
                     N      | Sequential  | R |      N
                -----/------>  Datapath   | E +------/----->
                   N-bit    |  Elements   | G |    N-bit
                Data Input  |     +-----+ |   | Data Output
                            |     | REG < |   |
                            +-+-+-+-+-+-+-+-^-+
                              | |...| |     |
                              | |   | |    CLK
                              v v   v v
                            Control Outputs

Figure 4-3: A Sequential Datapath Element with Register at its Outputs
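The 8-bit counter mentioned above as an example of a Sequential Datapath Element can be sketched in the style of Figure 4-3. The module ex_counter8 and its port names are hypothetical and are not taken from any library file of this project. The register sits at the output, the next value depends on the Control Inputs and on the previous output, and the single Control Output is strictly combinational.

```verilog
// Copy of the library register from Section 2, repeated so that this
// sketch is self-contained.
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end
endmodule // v_reg

// Hypothetical 8-bit counter (an assumption, not the author's code).
module ex_counter8( count, at_max, clk, clear, enable );
    output [7:0]    count;          // Registered data output
    output          at_max;         // Combinational control output

    input           clk;
    input           clear;          // Control input: synchronous clear
    input           enable;         // Control input: count when high

    wire   [7:0]    incr;           // Previous output plus one
    wire   [7:0]    held;           // Output of the enable MUX
    wire   [7:0]    next;           // Input of the output register

    // Combinational part: clear has priority over enable.
    assign incr   = count + 8'd1;
    assign held   = enable ? incr : count;
    assign next   = clear  ? 8'd0 : held;
    assign at_max = (count == 8'hff);

    // Register part: the register at the output of the element.
    v_reg #(8) count_reg (count, clk, next);
endmodule // ex_counter8
```

Since v_reg has no reset input, this sketch relies on the synchronous clear to bring the counter to a known value before it is used.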

The main function of the explicit pipeline register shown in Figure 4-1's Item 3d and Item 5 is to limit the datapath's critical path delay to a value less than the desired cycle time of the system. The effect of such a pipeline register can be best understood with a timing diagram.

Tip

Logic Design Guideline 4-1: The best way to study the effect of the datapath's pipeline registers is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

Figure 4-4 below is an example of such a timing diagram for the generalized datapath example shown in Figure 4-1. In this timing diagram example (when you read the numbered paragraphs below, please refer to the places in Figure 4-4 labeled with the same numbers in parentheses):

  1. The N-bit Input A and Input B settle to their known values "A" and "B" sometime after the rising edge of Cycle 2.

    For the sake of simplicity, let's assume all the Control Inputs of this datapath (likely generated by a controller similar to the one described in Section 5) are stable prior to the rising edge of Cycle 2 so that they are not factors in the critical delay path considerations. In an actual design, such assumptions will be verified by static timing analysis.

  2. Due to the assumption about the Control Inputs listed in Item 1, we only need to make sure Input A and Input B settle early enough to allow the two Combinational Datapath Elements (Item 3a in Figure 4-1) and the multiplexer (Item 3c in Figure 4-1) to produce the Internal Signal K at least one set-up time prior to the rising edge of Cycle 3.

  3. If the condition listed in Item 2 is met, the pipeline register can then capture the value of Internal Signal K and set the Internal Signal Y to the value "Y" one clock-to-q time after the rising edge of Cycle 3.

  4. Once again, because of the assumption about the Control Inputs listed in Item 1, as long as the Combinational Datapath Element after the pipeline register (Item 3d in Figure 4-1), together with the combinational logic within the Sequential Datapath Element (Item 3b in Figure 4-1), can produce the result for the Sequential Datapath Element's "implicit" register at least one set-up time prior to the rising edge of Cycle 4, the Output of this datapath will be set to the stable value "Q" one clock-to-q time after the rising edge of Cycle 4.

          |    1    |    2    |    3    |    4    |    5    |    6    |
          |         |         |         |         |         |         |
          +----+    +----+    +----+    +----+    +----+    +----+    +-
Clock     |    |    |    |    |    |    |    |    |    |    |    |    |
----------+    +----+    +----+    +----+    +----+    +----+    +----+ 
          |         |         |         |         |         |         |
---------------------+ +-------+ +--------------------------------------
Input A ///////////// X    A    X //////////////////////////////////////
---------------------+ +-------+ +--------------------------------------
          |         | (1)     |         |         |         |         |
---------------------+ +-------+ +--------------------------------------
Input B ///////////// X    B    X //////////////////////////////////////
---------------------+ +-------+ +--------------------------------------
          |         |     (2) |         |         |         |         |
---------------------------+ +-------+ +--------------------------------
Internal Signal K ///////// X    K    X ////////////////////////////////
---------------------------+ +-------+ +--------------------------------
          |         |         |  (3)    |         |         |         |
-------------------------------+ +-------+ +----------------------------
Internal Signal Y ///////////// X    Y    X ////////////////////////////
-------------------------------+ +-------+ +----------------------------
          |         |         |         |  (4)    |         |         |
-----------------------------------------+ +-------+ +------------------
Output Q //////////////////////////////// X    Q    X //////////////////
-----------------------------------------+ +-------+ +------------------

    Figure 4-4: A Timing Diagram of the Datapath's Pipeline Register
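The same timing can be observed in simulation. The sketch below is a simplified model of the datapath in Figure 4-1 and rests on several assumptions: the module names ex_datapath and tb_datapath are hypothetical, the three combinational functions are arbitrary placeholders, and the Sequential Datapath Element is reduced to combinational logic followed by its output register. Running the testbench prints every change of K, Y, and Q together with the simulation time.

```verilog
// Copy of the library register from Section 2, repeated so that this
// sketch is self-contained.
module v_reg( q, c, d );
    parameter   n = 1;

    output  [n-1:0] q;
    input   [n-1:0] d;
    input           c;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end
endmodule // v_reg

// Hypothetical, simplified version of the datapath in Figure 4-1.
module ex_datapath( q, clk, a, b, sel );
    parameter   n = 8;

    output [n-1:0]  q;
    input           clk;
    input  [n-1:0]  a, b;
    input           sel;            // Control input: MUX select

    wire   [n-1:0]  fa, fb;         // Outputs of the elements at (3a)
    wire   [n-1:0]  k;              // Internal Signal K
    wire   [n-1:0]  y;              // Internal Signal Y
    wire   [n-1:0]  g;              // Input of the output register

    assign fa = a + 1'b1;           // Placeholder combinational element
    assign fb = b << 1;             // Placeholder combinational element
    assign k  = sel ? fb : fa;      // Multiplexer (3c)

    v_reg #(n) pipe_reg (y, clk, k);    // Pipeline register (3d)

    assign g  = ~y;                 // Placeholder logic after the register

    v_reg #(n) out_reg  (q, clk, g);    // Output register of (3b)
endmodule // ex_datapath

// Hypothetical testbench: apply one set of inputs and watch it move
// through the two registers.
module tb_datapath;
    reg         clk;
    reg  [7:0]  a, b;
    reg         sel;
    wire [7:0]  q;

    ex_datapath #(8) dut (q, clk, a, b, sel);

    always #5 clk = ~clk;

    initial begin
        $monitor("t=%0t clk=%b k=%h y=%h q=%h",
                 $time, clk, dut.k, dut.y, q);
        clk = 0;
        sel = 0;
        a   = 8'h10;
        b   = 8'h20;
        #30 $finish;
    end
endmodule // tb_datapath
```

In the printed trace, Y takes its value one clock-to-q time after the first rising edge that follows the inputs, and Q follows one clock-to-q time after the next rising edge, which is the relationship among Cycles 2, 3, and 4 drawn in Figure 4-4.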

Item 4 above brings up an interesting observation about the Sequential Datapath Element shown in Figure 4-3, where the implicit register of this datapath element is shown on the output side of the element. The placement of the register on the output side (versus the input side) in the drawing is intentional. It reflects the actual placement of the register in hardware. I like to place such a register at the output (versus the input) so that all N bits of the output will be stable at the same time, one clock-to-q time after each rising edge of the clock. Figure 4-3 also shows that some Control Outputs of the Sequential Datapath Element can be registered. This, however, is less common than keeping the Control Outputs strictly combinational, which gives the user of these signals (likely the controller, see Section 5) the flexibility of using these values one cycle earlier as long as the critical timing is not violated.

The above discussion of the timing diagram in Figure 4-4 illustrates that the logic designer cannot draw an accurate timing diagram unless he or she knows the exact location of the registers relative to the combinational logic. This brings us to a corollary of Logic Design Guideline 4-1:

Tip

Logic Design Guideline 4-2: The block diagram of the datapath should show ALL registers, including the implicit register of the Sequential Datapath Element.

Enclosed in Appendix B is the example Verilog file link_txdp.v which models the datapath for the Link Layer Transmit Engine (see Reference [8]). Let's take a look at some interesting observations from link_txdp.v:

Tip

Verilog Coding Guideline 4-1: Keep the Verilog coding of the datapath simple and straightforward. Leave the fancy coding (IF any) to the datapath elements and place such elements in a separate (library) file.

For example, the Verilog coding of link_txdp.v is simplified by using the following two Sequential Datapath Elements:

/*
 * Scrambler
 */
l_scramble scrambler (
    .scr_out (scr_out),             .scr_in (32'hc2d2768d),
    .scr_init (txscr_init),         .scr_run (txscr_run),
    .clk (txclk4x),                 .reset (lktx_reset));

/*
 * CRC Calculator
 */
l_crccal crc_calculator (
    .crc_out (crc_out),
    .crc_in (32'h52325032),         .datain (tp_txdata),
    .crc_init (txcrc_init),         .crc_cal (txcrc_cal),
    .clk (txclk4x),                 .reset (lktx_reset));

As well as a Combinational Datapath Element:

/* 
 * Generate the primitive (prim_out) based on the selection (sel_prim)
 */
l_primgen primgen (.prim_out (prim_out),    .sel_prim (sel_prim));

More specifically, the Verilog code in link_txdp.v only shows what the logic designer cares about the most at the datapath level: how the datapath elements (registers, multiplexers, counters, etc.) are connected together. The detailed modeling of these datapath elements is done in link_library.v, which contains all library elements for the Link Layer. For your reference, link_library.v is also attached in Appendix B (see Reference [9]). Below are a few lines from link_library.v that define the Scrambler:

/********************************************************************
 * l_scramble: 32-bit scrambler that can be:
 *  a. Reset to all zeros asynchronously
 *  b. Load a fixed pattern synchronously.
 *  c. Keep its old value if scrambling is not enabled.
 *  d. Update its output synchronously based on an LFSR algorithm.
 ********************************************************************/
module l_scramble (scr_out, scr_in, scr_init, scr_run, clk, reset);

output [31:0]       scr_out;        // Scrambler's output

input [31:0]        scr_in;         // Initial pattern to be loaded
input               scr_init;       // Load the initial pattern
input               scr_run;        // Update scr_out based on a LFSR
input               clk;
input               reset;

reg [31:0]          scram;          // Scramble data pattern
reg                 a15, a14, a13,  // Intermediate scramble bits
                    a12, a11, a10,
                    a9, a8, a7, a6, a5, a4, a3, a2, a1, a0;

wire [31:0]         runmuxout;      // Output of the scr_run MUX
wire [31:0]         lastmux;        // Output of the final MUX

/*
 * Combinational logic to produce the scramble pattern,
 * which should be updated whenever scr_out changes.
 * This logic was copied from Frank Lee's scramble.v
 */
always @(scr_out) begin
    a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
    a14 =               scr_out[30] ^ scr_out[21] ^ scr_out[17];
    a13 =               scr_out[31] ^ scr_out[22] ^ scr_out[18];
     :                     :
    scram[2]  = a15^a14^a13;
    scram[1]  = a15^a14;
    scram[0]  = a15;
end // Scrambling logic

/*                                   Priority:
 *          scram   scr_out          -------------------------------
 *              |   |                reset (asynchronous):   highest
 *          +---v---v---+            scr_init (synchronous): middle
 * scr_run-->\S 1   0  /             scr_run (synchronous):  lowest
 *            +---+---+  scr_in
 *                |       |
 *            +---v-------v---+
 *             \  0       1 S/<--scr_init (higher priority than scr_run)
 *              +-----+-----+
 *                    |
 *                    v
 *                  lastmux
 */
v_mux2e #(32) run_mux (runmuxout, scr_run, scr_out, scram);
v_mux2e #(32) init_mux (lastmux, scr_init, runmuxout, scr_in);
v_regre #(32) scr_ff (scr_out, clk, lastmux, (scr_run | scr_init), reset);

endmodule // l_scramble

The definition of the Scrambler l_scramble (the l_ prefix indicates that it is defined in link_library.v) illustrates another Logic Design Guideline:

Tip

Logic Design Guideline 4-3: When designing a Sequential Datapath Element, separate the element into two parts: (1) the combinational logic, and (2) the register.

For example, in l_scramble (defined in link_library.v), the combinational logic of the Scrambler is modeled by the "always" statement:

always @(scr_out) begin
    a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
     :                     :
    scram[2]  = a15^a14^a13;
end // Scrambling logic

while the register is modeled by the 32-bit-wide v_regre defined in the library shared by the entire design team (see Verilog Coding Guideline 4-2 below):

v_regre #(32) scr_ff (scr_out, clk, lastmux, (scr_run | scr_init), reset);

The use of v_regre (the prefix v_ indicates that this element is defined in the common library) illustrates the following Verilog Coding Guideline:

Tip

Verilog Coding Guideline 4-2: The Verilog coding of the datapath elements should make use of the standard logic elements (registers, multiplexers, etc.) already defined in the library discussed in Verilog Coding Guideline 2-1.
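
As a minimal, self-contained sketch of Logic Design Guideline 4-3 and Verilog Coding Guideline 4-2 together, the code below models a small 4-bit LFSR as a combinational block plus an instance of a library-style register. The module names ex_regre and ex_lfsr4, the port order, and the feedback taps are assumptions made purely for illustration; they are not taken from link_library.v or from the common library.

```verilog
// Hypothetical stand-in for a library register with enable and
// asynchronous reset (illustration only, NOT the real v_regre).
module ex_regre (q, c, d, en, r);
    parameter n = 1;

    output [n-1:0] q;
    input          c;       // Clock
    input  [n-1:0] d;       // Data input
    input          en;      // Load enable
    input          r;       // Asynchronous reset (active high)

    reg    [n-1:0] state;

    assign #(1) q = state;  // 1 time-unit clock-to-q delay

    always @(posedge c or posedge r) begin
        if (r)
            state <= {n{1'b0}};
        else if (en)
            state <= d;
    end
endmodule // ex_regre

// Hypothetical 4-bit LFSR split into (1) the combinational logic
// and (2) the register.
module ex_lfsr4 (lfsr_out, run, clk, reset);
    output [3:0] lfsr_out;
    input        run;       // Update lfsr_out when asserted
    input        clk;
    input        reset;

    wire   [3:0] next;      // Output of the combinational logic

    // (1) Combinational logic: shift left with XNOR feedback, so the
    //     all-zero reset value is not a stuck state.
    assign next = {lfsr_out[2:0], ~(lfsr_out[3] ^ lfsr_out[2])};

    // (2) Register: an instance of the library-style element.
    ex_regre #(4) lfsr_ff (.q (lfsr_out), .c (clk), .d (next),
                           .en (run), .r (reset));
endmodule // ex_lfsr4
```

Because the register is a separate instance, the combinational equation can be reviewed on its own, and the reset and enable behavior is inherited from the shared element instead of being re-coded in every datapath block.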

The last file included in Appendix B is "link_defs.v" (see Reference [10]), which defines all the "symbolic values" (i.e., symbolic names assigned to given constant values) to be used by all the Verilog files for the Link Layer. For example, the following line:

`include "link_defs.v"

is used in both the datapath file (link_txdp.v) and the Link Layer library file (link_library.v) so that all the symbolic values defined in link_defs.v can be used by these two files. Below are some examples of these symbolic values that are specific to the datapath:

/*
 * Number of primitives and the bit position of the 1-hot encoded vector
 */
`define num_prim         18

// Basic Primitives
`define B_ALIGN           0
`define B_SYNC            1
`define B_CONT            2
   :                      :
`define B_X_RDY           9
   :                      :
`define B_PMACK          16
`define B_PMNAK          17

These symbolic values are then used by the datapath file (link_txdp.v) in the following way:

/*
 * Interconnections within this portion of the datapath
 */
wire [`num_prim:0]                  // Number of primitives + D10.2
                    sel_prim;       // Select the proper primitives

// Primitives sent by the Transmit Controller
assign sel_prim[`B_ALIGN] = txsn_align;
assign sel_prim[`B_X_RDY] = txsn_xrdy;

It should be obvious that the Verilog code above is much easier to maintain and much easier to understand than the equivalent Verilog code:

/*
 * Interconnections within this portion of the datapath
 */
wire [18:0]         sel_prim;

// Primitives sent by the Transmit Controller
assign sel_prim[0] = txsn_align;
assign sel_prim[9] = txsn_xrdy;

This example of how Verilog code uses symbolic values to improve its ease of maintenance leads us to the following Verilog Coding Guideline:

Tip

Verilog Coding Guideline 4-3: Define symbolic values (see also Verilog Coding Guideline 5-2) in a header file (example: link_defs.v) and include this header file in all files that can make use of these symbolic values to make the Verilog code easier to maintain and easier to understand.

Other symbolic values defined in link_defs.v such as:

// Number of TX states and bit position of the 1-hot state encoding
`define num_lktxstate   15
`define B_NOCOMM         0
`define B_SENDALIGN      1
`define B_NOCOMMERR      2
   :        :            :
`define B_BUSYRCV       13
`define B_POWERDOWN     14

// State Values
`define RESET           15'h0000        // All bits are zeros
`define NOCOMM          15'h0001        // Bit 0 is set
`define SENDALIGN       15'h0002        // Bit 1 is set
   :        :            :
`define POWERSAVE       15'h4000        // Link layer is power down

are used for the Verilog code that models the controller for the Link Layer. How these symbolic values can be used to simplify the Verilog code of the controller will be explained in Section 5. More specifically, please refer to Verilog Coding Guideline 5-1 in Section 5.

5. Controller Design

Almost without exception, at the core of every controller are one or more finite state machines. This is shown in Figure 5-1, where only one finite state machine is shown for simplicity. Readers with enough imagination should be able to visualize how this picture can be generalized to multiple finite state machines.

         +------------------------------------------------+
         |             A General Controller               |
         |           +--------------+---+---+             |  (2a)
Inputs   | (1)       | Finite State |S  |   |             | Outputs
-----------+--------->    Machine   |T R|   +-+-------------------->
         | | +---+   |      (4)     |A E|   | | +---+     | Type 1
         | | | R |   |              |T G|   | | | R |     |
         | +-> E +-+->See Figure 5-2|E  |   | +-> E +-+------------>
         | | | G | | | or Figure 5-3|   |   | | | G | |   | Outputs
         | | +-^-+ | +--------------+-^-+---+ | +-^-+ |   | Type 2
         | |   |   |                  |       |   |   |   |  (2b)
         | |  clk  |                 clk      |  clk  |   |
         | |       |                          |       |   |
         | |       |                        +-v-------v-+ |
         | |       |                        |           | |
         | |       +------------------------>  Simple   | | Outputs
         | |                                |  Random   +---------->
         | +--------------------------------> Logic (3) | | Type 3
         |                                  |           | |  (2c)
         |                                  +-----------+ |
         +------------------------------------------------+

          Figure 5-1: Block Diagram of the General Controller

Here are some important observations from Figure 5-1. When you read the numbered paragraphs below, please refer to the places in Figure 5-1 labeled with the same numbers in parentheses:

  1. The inputs to the controller are divided into two groups. The first group is used as inputs to the finite state machine directly while the second group is "staged" by one or more stage(s) of pipeline registers before being used as inputs by the finite state machine.
  2. As far as the outputs of the controller are concerned, they can be classified into three types:
    1. Outputs that come directly from the finite state machine's outputs.
    2. Outputs of the finite state machine after they have been staged by one or more stage(s) of pipeline registers.
    3. Outputs of some random logic (see also Paragraph (3) below) whose inputs can be any of the signals described in Paragraph (1), Paragraph (2a), or Paragraph (2b) above.
  3. The "Simple Random Logic" here is commonly referred to as "glue logic," which consists of simple inverters, AND gates, and OR gates. In theory, all of this "glue logic" can be integrated into the finite state machine shown in either Figure 5-2 or Figure 5-3. In practice, however, it is sometimes simpler to just use some "glue logic."
  4. In general, there are two types of finite state machines:
    1. The simple Moore Machine shown in Figure 5-2 whose outputs depend ONLY on the current state.
    2. The more complex Mealy Machine shown in Figure 5-3 whose outputs depend on BOTH the current state and the inputs.

         +---------------------------+
         |     +-------+       +---+ |         +--------+
         |  N  | Next  | Next  |S  | | Current |        |
         +--/-->       | State |t R| |  State  | Output |
               | State +--/---->a e+-+---/----->        +--/--> Outputs
Inputs -----/-->       |  N    |t g|     N     | Logic  |  P
            M  | Logic |       |e  |           |        |
               +-------+       +-^-+           +--------+
                                 |
                               Clock

               Figure 5-2: The Moore State Machine

         +---------------------------+
         |     +-------+       +---+ |         +--------+
         |  N  | Next  | Next  |S  | | Current |        |
         +--/-->       | State |t R| |  State  | Output |
               | State +--/---->a e+-+---/----->        +--/--> Outputs
Inputs --+--/-->       |  N    |t g|     N     | Logic  |  P
         |  M  | Logic |       |e  |        +-->        |
         |     +-------+       +-^-+        |  |        |
         |                       |          |  +--------+
         |  Q  (Q <= M)        Clock        |
         +--/-------------------------------+

               Figure 5-3: The Mealy State Machine
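
The difference between the two output styles in Figure 5-2 and Figure 5-3 can be sketched in a few lines of Verilog. The machine below is a hypothetical two-state example; the module name ex_moore_mealy and the signals req, busy, and grant are assumptions for illustration only. To keep the sketch self-contained, the State Register is written as a plain "always" block; in a real design it would be an instance of the library register.

```verilog
// Hypothetical 2-state machine (one-hot: bit 0 = IDLE, bit 1 = BUSY)
// used only to contrast a Moore output with a Mealy output.
module ex_moore_mealy (busy, grant, req, clk, reset);
    output      busy;       // Moore output: current state only
    output      grant;      // Mealy output: current state AND input
    input       req;
    input       clk;
    input       reset;      // Synchronous reset via the Next State Logic

    reg  [1:0]  cur_state;  // State Register
    reg  [1:0]  next_state; // Output of the Next State Logic

    // Next State Logic
    always @(cur_state or req or reset) begin
        if (reset)
            next_state = 2'b01;
        else begin
            case (cur_state)
            2'b01:   next_state = req ? 2'b10 : 2'b01;
            2'b10:   next_state = req ? 2'b10 : 2'b01;
            default: next_state = 2'b01;
            endcase
        end
    end

    // State Register
    always @(posedge clk)
        cur_state <= next_state;

    // Output Logic
    assign busy  = cur_state[1];        // Moore: changes only after a clock edge
    assign grant = cur_state[0] & req;  // Mealy: follows req within the same cycle
endmodule // ex_moore_mealy
```

In this sketch, grant responds to req in the same cycle, while busy only changes after the State Register has been clocked.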

One question raised by Figure 5-1's Item 1 and Item 2 (see Paragraphs 1 and 2 above) is when and where we should use pipeline registers to stage the inputs or outputs. This leads us to the following logic design guideline:

Tip

Logic Design Guideline 5-1: The best way to decide when and where to use one or more pipeline registers to stage the controller inputs and outputs is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

       |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10  |
       |     |     |     |     |     |     |     |     |     |     |
       +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +-
Clock  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
-------+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+
       |     |     |     |     |     |     |     |     |     |     |
--------------+ +---------+ +----------------------------------------
Inputs /////// X  (1) A    X ////////////////////////////////////////
--------------+ +---------+ +----------------------------------------
       |     |     |     |     |     |     |     |     |     |     |
---------------+ +---------+ +---------------------------------------
Next State //// X    B (2)  X ///////////////////////////////////////
---------------+ +---------+ +---------------------------------------
       |     |     |     |     |     |     |     |     |     |     |
--------------------+ +---------+ +----------------------------------
Current State ////// X    B (3)  X //////////////////////////////////
--------------------+ +---------+ +----------------------------------

        Figure 5-4: A Timing Diagram Showing Relative Timing

One simple example of such a timing diagram is shown in Figure 5-4, which shows the effect of the State Register in Figure 5-2, the Moore State Machine. When you read the numbered paragraphs below, please refer to the places in Figure 5-4 labeled with the same numbers in parentheses:

  1. We assume the M-bit inputs change from unknown to "A" right after the rising edge of Cycle 2.
  2. Assume the Next State Logic is such that, as a result of the Inputs being "A," the Next State will become "B" regardless of the Current State. If we also assume the Next State Logic can generate the Next State output within the cycle time of Clock (this assumption needs to be verified with static timing analysis), then we no longer need to worry about the absolute delay of the Next State Logic.
  3. This is because, as long as Next State becomes "B" one set-up time before the rising edge of Cycle 3, the Current State will change to "B" one clock-to-q delay AFTER the rising edge of Cycle 3 due to the State Register.

In this simple example, only one register and three signals are shown. Needless to say, a real timing diagram will have multiple registers and many more signals. The basic idea, however, remains the same: show only the "relative timing," that is, show how the registers affect the timing of the signals with respect to the clock edge(s), not the absolute delays.

A corollary of the Design Guideline 5-1 is:

Tip

Logic Design Guideline 5-2: The block diagram of the controller should show ALL registers explicitly while the random logic can be represented by a simple black box.

By drawing all the registers EXPLICITLY in the block diagram, the designer is less likely to make a mistake when he or she attempts to draw a "relative timing" diagram similar to the one shown in Figure 5-4 (see note below) while thinking about the sequence of events that needs to be controlled. Notice that in Figure 5-1, we try to meet Design Guideline 5-2 by showing the State Register in the black box representing the Finite State Machine.

Note

Even if the designer does not draw such a timing diagram explicitly on paper, he or she may still have to "draw" it implicitly in his or her head.

Notice that both Figure 5-2 and Figure 5-3 show finite state machines with an M-bit input, an N-bit state register, and a P-bit output. The only difference is that in Figure 5-2, the Moore Machine, the P-bit output is a function of the N-bit current state only, while in Figure 5-3, the Mealy Machine, the P-bit output depends on both the N-bit current state and a subset (Q is an integer smaller than or equal to M) of the M-bit inputs. Depending on the state encoding, the N-bit state register can represent a maximum of 2**N states, or only N states if one-hot encoding is used.

Tip

Logic Design Guideline 5-3: If possible, use one-hot encoding for the finite state machine's state encoding to simplify the Output Logic as well as the Next State Logic.

One-hot encoding refers to the encoding style where each bit of the State Register represents one state, and the corresponding bit is asserted only when the finite state machine is in the state represented by that bit. Consequently, only ONE bit of the N-bit state register will be asserted at any given time. My experience is that one-hot encoding can greatly simplify the logic equations for the Output Logic block (in most cases reducing them to simple inverters, AND gates, and OR gates) as well as for the Next State Logic block. Philosophically, the reason why one-hot encoding can simplify the output logic is simple: when the designer creates a state in a finite state machine, he or she does so for one purpose: the state indicates the need to set the outputs to some values different from those of any other state (if not, there is no need to have a separate state!). Therefore, if the state information is not one-hot encoded, the Output Logic must first decode the N-bit state register before it can generate the output. On the other hand, when one-hot encoding is used, the need for an N-to-2**N decode is eliminated. Similarly, when one-hot encoding is used, the Next State Logic does not need to perform the equivalent of the N-to-2**N decode before deciding what the next state is, and once the next state is decided, it does not need to perform the equivalent of a 2**N-to-N encoding of the next state.
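
To make the decode argument concrete, here is a small sketch that generates the same output from a binary-encoded state and from a one-hot state. The four-state machine, the module names, and the output name "sending" are hypothetical and chosen only for illustration.

```verilog
// Hypothetical 4-state machine: the output "sending" must be asserted
// in states 1 and 2. Both modules produce the same output.

// (a) Binary encoding: the 2-bit state has to be decoded before the
//     output can be generated.
module ex_out_binary (sending, cur_state);
    output      sending;
    input [1:0] cur_state;  // State number, binary encoded

    assign sending = (cur_state == 2'd1) | (cur_state == 2'd2);
endmodule // ex_out_binary

// (b) One-hot encoding: each state already has its own bit, so the
//     Output Logic is a single OR gate with no decoder.
module ex_out_onehot (sending, cur_state);
    output      sending;
    input [3:0] cur_state;  // Bit i is set when the machine is in state i

    assign sending = cur_state[1] | cur_state[2];
endmodule // ex_out_onehot
```

With only four states the saving is small, but the decode in (a) grows with the number of states, while (b) stays a simple OR of the relevant state bits.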

One drawback of one-hot encoding is that for a finite state machine with a large number of states (i.e., N is a big number in Figure 5-2 and Figure 5-3), the State Register can be very wide. A wide register, however, is usually not that bad a problem. In any case, in order to keep the design easy to understand and debug, one may want to avoid using "one BIG and complex" finite state machine anyway:

Tip

Logic Design Guideline 5-4: Instead of designing a controller with a giant and complex finite state machine at its core, it may be easier to break the controller into multiple smaller controllers, each with a smaller and simpler finite state machine at its core.

In both Figure 5-2 and Figure 5-3, it is possible to integrate the Output Logic block and the Next State Logic block into one single random logic block. However, in order to keep the logic design easy to understand:

Tip

Logic Design Guideline 5-5: For finite state machine design, keep the Next State Logic block separate from the Output Logic block.

As I will show you later in Verilog Coding Guideline 5-3 and 5-4, the Verilog code that models the finite state machine is also easier to read and understand if the Next State Logic block is kept separate from the Output Logic block.

One final word on the Mealy Machine shown in Figure 5-3. The Output Logic's inputs are shown to come from both the Current State and the Inputs. In order to simplify the Output Logic block, it is "logically" equivalent to use some of the Next State bits (i.e., the output of the Next State Logic prior to the State Register) as inputs to the Output Logic block. This is shown in Figure 5-5. This, however, should be done with extreme care.

Tip

Logic Design Guideline 5-6: In a Mealy Machine design, it is possible to use the Next State Logic block's outputs as inputs to the Output Logic block. This must be done with caution since the total delay of the two logic blocks may become the critical path of the controller.

         +---------------------------+
         |     +-------+       +---+ |            +--------+
         |  N  | Next  | Next  |S  | |  Current   |        |
         +--/-->       | State |t R| |   State    | Output |
               | State +--/-+-->a e+-+----/------->        +--/--> Outputs
Inputs --+--/-->       |  N |  |t g|      N       |        |  P
         |  M  | Logic |    |  |e  |          +---> Logic  |
         |     +-------+    |  +-^-+          |   |        |
         |                  |    |   (R <= M) | +->        |
         |                  |  Clock    R     | | |        |
         |                  +------------/----+ | +--------+
         |  Q (Q <= M)                          |
         +--/-----------------------------------+

         Figure 5-5: An Alternate Form of the Mealy State Machine

Enclosed in Appendix C are two Verilog files illustrating the various controller design guidelines:

trans_defs.v: defines all the symbolic values that apply to all Transport Layer files, see Reference [11].
dtrans_txctl.v: models the controller for the Device Dongle's Transport Layer Transmit Engine, see Reference [12].

First, let's take a look at some interesting observations from trans_defs.v.

Tip

Verilog Coding Guideline 5-1: If one-hot encoding is used for the finite state machine (see Logic Design Guideline 5-3), define a symbolic value for each bit position as well as a symbolic value for the binary value when that bit position is set. This makes the Verilog code much easier to read and understand.

For example, here are some lines from the trans_defs.v file attached in Appendix C (readers can find the entire definition in Appendix C):

/*
 * Define the state values and bit position for Device's Transmit Finite
 * State machine (FSM in dtrans_txctl).  This FSM implements the "transmit"
 * states described in Section 8.7 (PP. 197-205) of SATA Spec, 1.0.
 */
`define num_dttxfsm     15
`define B_DTTXIDLE       0
`define B_DTCHKTYP       1
`define B_DTREGFIS       2      // Spec's DT_RegHDFIS
`define B_DTPIOSTUP      3      // Spec's DT_PIOSTUPFIS
    :       :            :
`define B_DTBISSTA      14

// Device Dongle's TX FSM State Values
`define DTTXIDLE        15'h0001
`define DTCHKTYP        15'h0002
`define DTREGFIS        15'h0004        // Spec's DT_RegHDFIS
`define DTPIOSTUP       15'h0008        // Spec's DT_PIOSTUPFIS
    :       :            :
`define DTBISSTA        15'h4000

Notice that I have used "define" to create these symbolic values:

Tip

Verilog Coding Guideline 5-2: One common convention used by many Verilog code writers is to use "define" for constant values such as:

`define DTTXIDLE        15'h0001

while "parameter" is used ONLY for things that can change, such as the width of registers, muxes, etc. (see also Section 2):

/***************************************************************
 * Simple N-bit register with a 1 time-unit clock-to-q time
 ***************************************************************/
module v_reg( q, c, d );

    parameter   n = 1;

    input   [n-1:0] d;
    input           c;
    output  [n-1:0] q;

    reg     [n-1:0] state;

    assign  #(1) q = state;

    always @(posedge c) begin
        state  = d;
    end

endmodule // v_reg

Next, let's look at the file dtrans_txctl.v. The main module of this file consists of the following sections, clearly labeled by comments:

module dtrans_txctl (
    // Outputs
    tp_acksendreg,
        :
    senddata,

    // Inputs
    at_sendreg,
        :
    tptx_reset);

    /*
     * Next State Logic and the State Register for the finite state machine
     */
    // Next State Logic
    dtrans_txfsm dtrans_txfsm ( ...

    // State Register
    v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

    /*
     * Counter and its MUX tree to select the count limit
     * for the generation of the expire signal
     */

    /*
     * Output Logic for generating output signals
     */

endmodule // dtrans_txctl

This leads to the following Logic Design and Verilog Coding guidelines.

Tip

Verilog Coding Guideline 5-3: Use an explicit State Register and separate the Next State Logic from this explicit register.

For example in dtrans_txctl.v, we have:

/*
 * Next State Logic and the State Register for the finite state machine
 */
// Next State Logic
dtrans_txfsm dtrans_txfsm (
    // Outputs
    .next_state (next_state),

    // Inputs
    .cur_state (cur_state),
    .at_sendreg (at_sendreg),       .at_senddmaa (at_senddmaa),
         :
    .txtimeout (txtimeout),         .expire (expire),
    .tptx_reset (tptx_reset));

// State Register
v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

The Next State Logic here is implemented in the separate "dtrans_txfsm" module in dtrans_txctl.v. The module "dtrans_txfsm" has only one output, the "next_state" vector, and contains only one thing: a "Case Statement" enclosed in an "always" block:

Tip

Verilog Coding Guideline 5-4: The Next State Logic, with only ONE output (the "next_state" vector), can be implemented easily with a Verilog Case statement.

/*************************************************************************
 * Module dtrans_txfsm: Random logic for the transmit finite state machine
 ************************************************************************/
module dtrans_txfsm (
    // Outputs
    next_state,

    // Inputs
    cur_state,
    at_sendreg,
    at_senddmaa,
      :
    expire,
    tptx_reset);
        :
        :
    always @(cur_state or at_sendreg or at_senddmaa ...

        /*** List ALL Inputs of this module ***/

            txtimeout or expire or tptx_reset) begin

        if (tptx_reset) begin
            next_state = `DTTXIDLE;
        end
        else begin
            case (cur_state)
            `DTTXIDLE:
                if (~r2t_rxempty) begin
                    /*
                     * Give the receive engine higher priority
                     */
                    next_state = `DTCHKTYP;
                end
                 :
            end
                 :

            `DTBISSTA:
                if (~lk_txfsmidle & ~txtimeout) begin
                    next_state = `DTBISSTA;
                end
                 :

            default: begin  // We should never be here
                next_state = `DTWAITTXID;
                $display (
                "*** Warning: Undefined HTP RX State, cur_state = %b ***",
                cur_state);
                end
            endcase
        end // End else (tptx_reset == 0)

    end // End always

endmodule // dtrans_txfsm

Notice that the module "dtrans_txfsm" has ONLY one output, "next_state." This is a very desirable feature when we use the Verilog "Case Statement," because every output MUST be assigned a value in each branch of the Case statement. Otherwise, the synthesis tool will generate a latch to keep the old value, which in most cases is NOT what the logic designer intends. Having only one output (the "next_state") for the Next State Logic makes this requirement easy to check, and it is one reason why Logic Design Guideline 5-5 encourages you to separate the Next State Logic block from the Output Logic block.
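
The latch problem is easy to reproduce in a small sketch. Both modules below are hypothetical (the names, the two-state one-hot encoding, and the "done" output are assumptions for illustration). The first mixes a second output into the case statement and forgets to assign it in some branches; the second keeps "next_state" as the only output of the case statement and moves "done" to a simple assign statement.

```verilog
// "done" is NOT assigned in every branch, so a synthesis tool must
// infer a latch to hold its old value.
module ex_two_outputs (next_state, done, cur_state, go);
    output [1:0] next_state;
    output       done;
    input  [1:0] cur_state;     // One-hot: bit 0 = IDLE, bit 1 = WORK
    input        go;

    reg    [1:0] next_state;
    reg          done;

    always @(cur_state or go) begin
        case (cur_state)
        2'b01:   next_state = go ? 2'b10 : 2'b01;   // "done" not assigned
        2'b10:   begin
                     next_state = 2'b01;
                     done       = 1'b1;
                 end
        default: next_state = 2'b01;                // "done" not assigned
        endcase
    end
endmodule // ex_two_outputs

// next_state is the ONLY output of the case statement and every branch
// assigns it, so the block is purely combinational.
module ex_one_output (next_state, done, cur_state, go);
    output [1:0] next_state;
    output       done;
    input  [1:0] cur_state;
    input        go;

    reg    [1:0] next_state;

    always @(cur_state or go) begin
        case (cur_state)
        2'b01:   next_state = go ? 2'b10 : 2'b01;
        2'b10:   next_state = 2'b01;
        default: next_state = 2'b01;
        endcase
    end

    assign done = cur_state[1];     // Output Logic kept separate
endmodule // ex_one_output
```

In simulation, the "done" output of ex_two_outputs stays at 1 after the machine leaves the WORK state, which is exactly the latch behavior the paragraph above warns about.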

In many finite state machine designs, the number of states can be reduced, and the Next State Logic can therefore be simplified, if one takes advantage of the fact that the state machine wants to stay in a certain state for "N cycles" (where N is a fixed integer >= 1), then go to the next state and stay there for another "M cycles" (M is another integer >= 1 but != N) before moving on to another state. One example of this behavior is the DRAM controller, where the controller will enter the "Row Address Active" state for a few cycles, then go to the "Column Address Active" state for a few cycles, before moving on to the "Precharge" state, etc.

Tip

Logic Design Guideline 5-7: A finite state machine containing states whose transitions to their next states are governed only by the number of cycles it has to wait can be simplified by building a multiplexer tree to select the number of cycles a counter must count before generating an "expire" signal to trigger the state transition.

Logic Design Guideline 5-7 is illustrated by the following Verilog code in dtrans_txctl.v. In a nutshell:

  1. We start the counter (count_enable = 1) when the current state is one of DTREGFIS, DTPIOSTUP, DTXMITBIS, or DTDMASTUP. Since we are using one-hot encoding, we are in one of these states when the corresponding bit in the cur_state register (cur_state[`B_DTREGFIS], cur_state[`B_DTPIOSTUP], cur_state[`B_DTXMITBIS], or cur_state[`B_DTDMASTUP]) is set:

    assign count_enable = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
      cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP];
    
    v_countN #(`log_maxfis) expire_count (
      .count_out (wcount),
      .count_enable (count_enable),
      .clk (txclk4x),
      .reset (tptx_reset | expire));
    
  2. Based on the current state, the multiplexer tree is used to select the number of cycles the counter must count (count_limit) before the state is triggered to transition to the next state:

    /*
     * Counter and its MUX tree to select the count limit
     * for the generation of the expire signal
     */
    v_mux2e #(`log_maxfis) regpio_mux (num_regpio,
        cur_state[`B_DTPIOSTUP], `NDFISREGm1, `NDFISPIOSm1);
    v_mux2e #(`log_maxfis) dmabis_mux (num_dmabis,
        cur_state[`B_DTXMITBIS], `NBFISDMASm1, `NBFISBISTAm1);
    v_mux2e #(`log_maxfis) cntlmt_mux (count_limit,
        (cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP]),
        num_regpio, num_dmabis);
    

    The number of cycles the counter needs to count for each state is defined in trans_defs.v:

    `define NDFISREGm1      3'd4    // Device-to-Host (D) Register (REG)
    `define NDFISPIOSm1     3'd4    // Device-to-Host (D) PIO Setup (PIOS)
    `define NBFISDMASm1     3'd6    // Bidirectional (B) DMA Setup (DMAS)
    `define NBFISBISTAm1    3'd2    // Bidirectional (B) BIST Activate (BISTA)
    
  3. Finally, the 3-bit comparator is used to generate the "expire" signal, which is used as an input to the Next State Logic to trigger the state transition when the counter reaches the "count_limit" selected by the MUX tree in Step 2:

    v_comparator #(`log_maxfis) expire_cmp (count_full, wcount, count_limit);
    assign expire = count_full & count_enable;
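
The three steps above can also be sketched as one small, self-contained module. This is a hypothetical reduction with only two timed states; the module name ex_expire, the cycle counts, and the synchronous counter reset are assumptions for illustration and do not reproduce v_countN, v_comparator, or the values in trans_defs.v.

```verilog
// Hypothetical expire logic for a one-hot machine with two timed
// states: bit 0 waits 3 cycles, bit 1 waits 5 cycles.
module ex_expire (expire, cur_state, clk, reset);
    output      expire;
    input [1:0] cur_state;
    input       clk;
    input       reset;          // Synchronous reset

    reg   [2:0] count;
    wire        count_enable;
    wire  [2:0] count_limit;

    // Step 1: count only while in one of the timed states
    assign count_enable = cur_state[0] | cur_state[1];

    // Step 2: MUX selects (number of cycles - 1) for the current state
    assign count_limit = cur_state[1] ? 3'd4 : 3'd2;

    // Counter: cleared by reset or expire, counts while enabled
    always @(posedge clk) begin
        if (reset | expire)
            count <= 3'd0;
        else if (count_enable)
            count <= count + 3'd1;
    end

    // Step 3: comparator generates the expire signal
    assign expire = (count == count_limit) & count_enable;
endmodule // ex_expire
```

The Next State Logic then only needs to test "expire" in each timed state, instead of having one state per cycle of waiting.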
    

The last part of the Verilog code in dtrans_txctl.v:

/*
 * Random logic for generating output signals 
 */ 
assign tp_acksendreg  = cur_state[`B_DTREGFIS];
          :
assign tp_acksenddata = cur_state[`B_DTDATAFIS]; 

assign tp_sendndfis = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
    cur_state[`B_DTDMASTUP] | cur_state[`B_DTDMAACT] |
    cur_state[`B_DTXMITBIS]; 

shows how the output logic and the "glue logic" (see Item 3 of Figure 5-1) can be implemented with simple "assign" statements.

Tip

Verilog Coding Guideline 5-5: With the more complex Next State Logic already taken care of by the "Case Statement" (see Verilog Coding Guideline 5-4) and with the help of one-hot encoding for the state machine, the Output Logic can usually be implemented easily with simple assign statements.

6. Miscellaneous Verilog Coding Guidelines

If you look at the Verilog files in Appendix A, Appendix B, and Appendix C, you will notice that all the Verilog files have a very similar format.

Tip

Verilog Coding Guideline 6-1: In order to keep the Verilog files easy to read and easy to understand for every member of the design team, adopt a standard format and use the same format for all Verilog files.

For example, the link_txdp.v file in Appendix B follows this format:

module module_name (
  // Bi-directional ports (if any)
  bi_port1,               //*** First list the inout ports (if any)
  bi_port2,               //*** List one port per line

  // Output ports
  o_port3,                //*** Then list the output ports
  o_port4,

  // Input ports
  i_port5);               //*** Finally, list the input ports

  /*
   * Declare all bi-directional ports 
   */
  inout           bi_port1;       //*** Declare one port per line
  inout           bi_port2;

  /*
   * Declare all output ports
   */
  output          o_port3;
  output          o_port4;
  
  /*
   * Declare all input ports
   */
  input           i_port5;

  /*
   * After all ports are declared, declare all the wires
   */
  wire            wire1;          //** Declare one wire per line
  wire            wire2;

  /*
   * Declare all registers (if any)
   */
  reg             reg1;           //** Declare one register per line
  reg             reg2;

  /*
   * Core of the Verilog code
   */

endmodule

Notice that in the link_txdp.v file in Appendix B, when the module "l_scramble" is instantiated, explicit connection (example: .reset (lktx_reset)) is used:

l_scramble scrambler (
    .scr_out (scr_out),             .scr_in (32'hc2d2768d),
    .scr_init (txscr_init),         .scr_run (txscr_run),
    .clk (txclk4x),                 .reset (lktx_reset));

Tip

Verilog Coding Guideline 6-2: In order to avoid confusion about which wire is connected to which port, use explicit connection (example: .port_name (wire)) when a module is instantiated.

The l_scramble module is defined in the file link_library.v, which is also included in Appendix B. Notice the detailed comment in this module:

/*                                   Priority:
 *          scram   scr_out          -------------------------------
 *              |   |                reset (asynchronous):   highest
 *          +---v---v---+            scr_init (synchronous): middle
 * scr_run-->\S 1   0  /             scr_run (synchronous):  lowest
 *            +---+---+  scr_in
 *                |       |
 *            +---v-------v---+
 *             \  0       1 S/<--scr_init (higher priority than scr_run)
 *              +-----+-----+
 *                    |
 *                    v
 *                  lastmux
 */

Tip

Verilog Coding Guideline 6-3: In order to keep the Verilog code easy to understand for everyone (including yourself :-), use detailed comments. More importantly, put in the comments as you do the coding because if you do not put in the comments now, it is unlikely you will put them in later.

Finally, one may notice the absence of "timescale" statements in all of the files that model the high level modules (Appendix A), the datapath (Appendix B), and the controller (Appendix C). The reason is that there is no need to have any timescale statements in the Verilog code if Verilog Coding Guideline 2-2 is followed:

Tip

Verilog Coding Guideline 2-2: Only the storage elements (examples: register and latch) have non-zero clock-to-q time. All combinational logic (example: mux) has zero delay.

More specifically, as shown in Section 2, the v_reg and v_latch each have a "1 time unit" clock-to-q delay. This clock-to-q delay is the ONLY delay we have in our Verilog code. Consequently, our Verilog code will work no matter what time scale this time unit is set to (i.e. it can be set to 1ps, 1ns, 1ms, etc.). The only time we need to have a timescale statement is when we want to run a simulation on our Verilog model.

Tip

Verilog Coding Guideline 6-4: Ideally, there should not be any "timescale" directive in any of the Verilog files that model the hardware (because they are not needed if we follow Verilog Coding Guideline 2-2). Consequently, there should be ONE and only ONE timescale directive in any Verilog simulation run, and that timescale directive should be placed at the beginning of the test bench file (see Reference [13]).
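
A minimal sketch of what such a test bench file might look like is shown below. It is not the test_init.v file of Reference [13]. The module names demo_reg and test_demo, the 1ns time unit, and the stimulus are all assumptions made for illustration, and demo_reg only stands in for the v_reg of Section 2, whose actual code may differ.

```verilog
`timescale 1ns / 1ps        // The ONLY timescale directive in the run

// Hypothetical register with a "1 time unit" clock-to-q delay.
// In a real project this module would live in the library file,
// which contains no timescale directive and is compiled after
// the test bench file.
module demo_reg (q, clk, d);
  parameter          width = 1;

  output [width-1:0] q;
  input              clk;
  input  [width-1:0] d;

  reg    [width-1:0] q;

  always @(posedge clk)
    q <= #1 d;              // 1 time unit = 1ns under the directive above

endmodule

module test_demo;

  reg             clk;
  reg             d;
  wire            q;

  demo_reg #(1) dut (
    .q (q),
    .clk (clk),             .d (d));

  always #5 clk = ~clk;     // Clock period of 10 time units

  initial begin
    clk = 1'b0;
    d = 1'b0;
    #12 d = 1'b1;           // Change d between two rising clock edges
    #20 $display ("time=%0t q=%b", $time, q);
    $finish;
  end

endmodule
```

Changing the first line to a different time unit rescales every delay in the run at once, without touching any file that models the hardware.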

7. Summary of Logic Design and Verilog Coding Guidelines

Below is a summary of all the logic design guidelines:

Tip

Logic Design Guideline 2-1 (MOST IMPORTANT): The design MUST be as simple as possible and easy to understand!

Tip

Logic Design Guideline 3-1: Use a hierarchical strategy that breaks the design into modules that consist of datapaths and controllers. More specifically:

  1. Divide the problem into multiple modules with clean and well defined interfaces.

  2. For each module:
    1. Design the datapath that can process the data for that module.
    2. Design the controller to control the datapath and produce control outputs (if any) to other adjacent modules.

Tip

Logic Design Guideline 3-2: Keep different clock domains separate and have an explicit synchronization module for signals that cross the clock domain.

Tip

Logic Design Guideline 4-1: The best way to study the effect of the datapath's pipeline registers is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

Tip

Logic Design Guideline 4-2: The block diagram of the datapath should show ALL registers, including the implicit register of the Sequential Datapath Element.

Tip

Logic Design Guideline 4-3: While designing the Sequential Datapath Elements, separate each element into two parts: (1) the combinational logic, and (2) the register.

Tip

Logic Design Guideline 5-1: The best way to decide when and where to use a pipeline register or registers to stage the controller inputs and outputs is to draw a timing diagram showing each register's effect on its outputs with respect to the rising or falling edge of the register's input clock.

Tip

Logic Design Guideline 5-2: The block diagram of the controller should show ALL registers explicitly while the random logic can be represented by a simple black box.

Tip

Logic Design Guideline 5-3: If possible, use one-hot encoding for the finite state machine's state encoding to simplify the Output Logic as well as the Next State Logic.

Tip

Logic Design Guideline 5-4: Instead of designing a controller with a giant and complex finite state machine at its core, it may be easier to break the controller into multiple smaller controllers, each with a smaller and simpler finite state machine at its core.

Tip

Logic Design Guideline 5-5: For finite state machine design, keep the Next State Logic block separate from the Output Logic block.

Tip

Logic Design Guideline 5-6: In a Mealy Machine design, it is possible to use the Next State Logic block's outputs as inputs to the Output Logic block. This must be done with caution since the total delay of the two logic blocks may become the critical path of the controller.

Tip

Logic Design Guideline 5-7: A finite state machine containing states whose transitions to their next states are governed only by the number of cycles they have to wait can be simplified by building a MUX tree to select the number of cycles a counter must count before generating an "expire" signal to trigger the state transition.
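
One possible shape of such a counter is sketched below. This is only an illustration under stated assumptions: the module names wait_counter and demo_reg, the signal names, the 8-bit width, and the four cycle counts are all made up, and demo_reg stands in for the v_reg of Section 2.

```verilog
// Hypothetical stand-in for the library register (v_reg) of Section 2
module demo_reg (q, clk, d);
  parameter          width = 1;

  output [width-1:0] q;
  input              clk;
  input  [width-1:0] d;

  reg    [width-1:0] q;

  always @(posedge clk)
    q <= #1 d;

endmodule

module wait_counter (
  // Output ports
  expire,

  // Input ports
  load,
  wait_sel,
  clk,
  reset);

  output          expire;       // The counter has reached zero
  input           load;         // Load a new cycle count
  input  [1:0]    wait_sel;     // Selects how many cycles to wait
  input           clk;
  input           reset;

  wire   [7:0]    wait_cycles;
  wire   [7:0]    count;
  wire   [7:0]    next_count;

  // MUX tree: select the number of cycles for the current wait state
  // (the four cycle counts below are made-up values)
  assign wait_cycles = (wait_sel == 2'b00) ? 8'd4 :
                       (wait_sel == 2'b01) ? 8'd16 :
                       (wait_sel == 2'b10) ? 8'd64 : 8'd200;

  // Next-count logic. Priority: reset (highest), load, count down.
  // Once the counter reaches zero it stays at zero.
  assign next_count = reset ? 8'd0 :
                      load ? wait_cycles :
                      (count == 8'd0) ? 8'd0 : count - 8'd1;

  // Explicit register holding the remaining cycle count
  demo_reg #(8) count_reg (
    .q (count),
    .clk (clk),             .d (next_count));

  // "expire" is high whenever the counter is at zero. The finite
  // state machine is assumed to look at it only in a wait state,
  // after it has asserted "load" for one cycle.
  assign expire = (count == 8'd0);

endmodule
```

With this arrangement, all the wait states of the finite state machine can share one transition condition ("expire"), and only the "wait_sel" value differs from one wait state to the next.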

Below is a summary of all the Verilog coding guidelines:

Tip

Verilog Coding Guideline 2-1: Model all the standard logic elements in a library file to be SHARED by ALL engineers in the design team.

Tip

Verilog Coding Guideline 2-2: Only the storage elements (examples: register and latch) have non-zero clock-to-q time. All combinational logic (example: mux) has zero delay.

Tip

Verilog Coding Guideline 2-3: Use explicit registers and latches (example: v_reg and v_latch as shown in Section 2) in your Verilog coding. Do not rely on logic synthesis tools to generate latches or registers for you.

Tip

Verilog Coding Guideline 3-1: A separate Verilog file is assigned to the Verilog code for:

  1. Each datapath. Example: dtrans_txdp.v
  2. Each controller. Example: dtrans_txctl.v
  3. Each high level module, that is, a module at a hierarchy level higher than the datapath and the controller. Examples: link_tx.v, link_rx.v, and link.v

Tip

Verilog Coding Guideline 3-2: In order to keep the number of Verilog files under control, one should try not to assign a separate Verilog file to any low level module that is at a hierarchy level lower than the datapath and the controller.

Tip

Verilog Coding Guideline 3-3: The Verilog code for the high level module, that is, a module at a hierarchy level higher than the datapath and the controller (examples: module dtrans_tx, module dtrans_rx, and module dtrans), should not contain any logic. It should only show how the lower level modules are connected.

Tip

Verilog Coding Guideline 4-1: Keep the Verilog coding of the datapath simple and straightforward. Leave the fancy coding (IF any) to the datapath elements and place such elements in a separate (library) file.

Tip

Verilog Coding Guideline 4-2: The Verilog coding of the datapath elements should make use of the standard logic elements (registers, multiplexers, ... etc.) already defined in the library discussed in Verilog Coding Guideline 2-1.

Tip

Verilog Coding Guideline 4-3: Define symbolic values (see also Verilog Coding Guideline 5-2) in a header file (example: link_defs.v) and include this header file in all files that can make use of these symbolic values to make the Verilog code easier to maintain and easier to understand.

Tip

Verilog Coding Guideline 5-1: If one-hot encoding is used for the finite state machine (see Logic Design Guideline 5-3), define a symbolic value for each bit position as well as a symbolic value for the binary value when that bit position is set. This makes the Verilog code much easier to read and understand.

Tip

Verilog Coding Guideline 5-2: One common convention used by many Verilog code writers is to use "define" for constant values such as:

`define DTTXIDLE        15'h0001

while "parameter" is used ONLY for things that can change, such as the width of registers, muxes, etc. (see also Section 2).
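
The difference between the two can be sketched as follows. The names DEMO_IDLE, demo_reg, and define_demo are hypothetical and chosen only for this illustration: the state value is a constant that is the same everywhere it is used, while the register width is overridden at each instantiation.

```verilog
`define DEMO_IDLE       4'b0001     // Constant value: use "define"

// Hypothetical register: its width can change from one instance
// to the next, so it is a "parameter"
module demo_reg (q, clk, d);
  parameter          width = 1;

  output [width-1:0] q;
  input              clk;
  input  [width-1:0] d;

  reg    [width-1:0] q;

  always @(posedge clk)
    q <= #1 d;

endmodule

module define_demo (
  // Output ports
  is_idle,
  state,

  // Input ports
  next_state,
  clk);

  output          is_idle;
  output [3:0]    state;
  input  [3:0]    next_state;
  input           clk;

  // The width parameter is set to 4 for this particular instance
  demo_reg #(4) state_reg (
    .q (state),
    .clk (clk),             .d (next_state));

  // The symbolic constant reads better than a bare 4'b0001
  assign is_idle = (state == `DEMO_IDLE);

endmodule
```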

Tip

Verilog Coding Guideline 5-3: Use an explicit State Register and separate the Next State Logic from this explicit register.

Tip

Verilog Coding Guideline 5-4: The Next State Logic, with only ONE output (the "next_state" vector), can be implemented easily with a Verilog Case statement.

Tip

Verilog Coding Guideline 5-5: With the more complex Next State Logic already taken care of by the "Case Statement" (see Verilog Coding Guideline 5-4) and with the help of one-hot encoding for the state machine, the Output Logic can usually be implemented easily with simple assign statements.
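
Verilog Coding Guidelines 5-1 and 5-3 through 5-5 can be put together in a small sketch such as the one below. This is not the dtrans_txctl.v controller discussed earlier. The three-state machine, its signal names, the DEMO_ symbolic values, and the synchronous reset are all assumptions made for illustration, and demo_reg stands in for the v_reg of Section 2.

```verilog
// Symbolic value for each bit position (Guideline 5-1)
`define DEMO_IDLE_BIT   0
`define DEMO_SEND_BIT   1
`define DEMO_DONE_BIT   2

// Symbolic value for the state vector when that bit is set
`define DEMO_IDLE       3'b001
`define DEMO_SEND       3'b010
`define DEMO_DONE       3'b100

// Hypothetical stand-in for the library register (v_reg) of Section 2
module demo_reg (q, clk, d);
  parameter          width = 1;

  output [width-1:0] q;
  input              clk;
  input  [width-1:0] d;

  reg    [width-1:0] q;

  always @(posedge clk)
    q <= #1 d;

endmodule

module demo_fsm (
  // Output ports
  busy,
  done,

  // Input ports
  start,
  last,
  clk,
  reset);

  output          busy;
  output          done;
  input           start;
  input           last;
  input           clk;
  input           reset;

  wire   [2:0]    state;
  reg    [2:0]    next_state;   // Combinational: declared as "reg" only
                                // because it is assigned in an always block

  // Explicit State Register (Guideline 5-3)
  demo_reg #(3) state_reg (
    .q (state),
    .clk (clk),             .d (next_state));

  // Next State Logic: ONE output, ONE case statement (Guideline 5-4)
  always @(state or start or last or reset) begin
    if (reset)
      next_state = `DEMO_IDLE;
    else
      case (state)
        `DEMO_IDLE: next_state = start ? `DEMO_SEND : `DEMO_IDLE;
        `DEMO_SEND: next_state = last ? `DEMO_DONE : `DEMO_SEND;
        `DEMO_DONE: next_state = `DEMO_IDLE;
        default:    next_state = `DEMO_IDLE;
      endcase
  end

  // Output Logic: with one-hot encoding each output is simply
  // one bit of the state vector (Guideline 5-5)
  assign busy = state[`DEMO_SEND_BIT];
  assign done = state[`DEMO_DONE_BIT];

endmodule
```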

Tip

Verilog Coding Guideline 6-1: In order to keep the Verilog files easy to read and easy to understand for every member of the design team, adopt a standard format and use the same format for all Verilog files.

Tip

Verilog Coding Guideline 6-2: In order to avoid confusion about which wire is connected to which port, use explicit connection (example: .port_name (wire)) when a module is instantiated.

Tip

Verilog Coding Guideline 6-3: In order to keep the Verilog code easy to understand for everyone (including yourself :-), use detailed comments. More importantly, put in the comments as you do the coding because if you do not put in the comments now, it is unlikely you will put them in later.

Tip

Verilog Coding Guideline 6-4: Ideally, there should not be any "timescale" directive in any of the Verilog files that model the hardware (because they are not needed if we follow Verilog Coding Guideline 2-2). Consequently, there should be ONE and only ONE timescale directive in any Verilog simulation run, and that timescale directive should be placed at the beginning of the test bench file (see Reference [13]).

With all these logic design and Verilog coding guidelines, does this mean there is no room for logic designers to be creative? Not at all. Artists such as movie directors and music composers need to follow many guidelines and yet nobody can say they are not doing creative work. They just spend their creativity on tasks that require creativity and follow the standard guidelines (such as a movie should be approximately 2 hours long) when creativity is not needed. Logic design is the same: be creative on tasks that truly deserve innovation (such as how to build a datapath that can process data at half the power) but not on tasks such as how to write a complex Verilog statement that saves a few lines of Verilog code but that nobody else can understand.

The ultimate goal for any logic designer is to keep his or her design and the Verilog code that models the design AS EASY TO UNDERSTAND AS POSSIBLE. Remember this: the easier it is for other people to understand your design and your Verilog code, the more people can help you in your work and the less likely it is that your vacation will be interrupted by late night phone calls from the coworker covering for you :-) So make your design easy to understand :-)

8. References

[1] Private communications, October 2001.

[2] For those readers who can access my home directory, the Parallel ATA
Interface to the disk is modeled by the module dataif in the Verilog file:
/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v
[3] For those readers who can access my home directory, the Transport Layer
is modeled by the module dtrans in the Verilog file:
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v
[4] For those readers who can access my home directory, the Link Layer
is modeled by the module link in the Verilog file:
/home/kong/P2001/Verilog/DeviceDongle/Link/link.v
[5] For those readers who can access my home directory, the files are in:
/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v
/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_dp.v
/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_ctl.v
[6] For those readers who can access my home directory, the files are in:

/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_tx.v
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txdp.v
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v

/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rx.v
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxdp.v
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxctl.v

[7] For those readers who can access my home directory, the files are in:

/home/kong/P2001/Verilog/DeviceDongle/Link/link.v
/home/kong/P2001/Verilog/DeviceDongle/Link/link_tx.v
/home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v
/home/kong/P2001/Verilog/DeviceDongle/Link/link_txctl.v

/home/kong/P2001/Verilog/DeviceDongle/Link/link_rx.v
/home/kong/P2001/Verilog/DeviceDongle/Link/link_rxdp.v
/home/kong/P2001/Verilog/DeviceDongle/Link/link_rxctl.v

[8] For those readers who can access my home directory, link_txdp.v is in:
/home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v
[9] For those readers who can access my home directory, link_library.v is in:
/home/kong/P2001/Verilog/CommonFiles/link_library.v
[10] For those readers who can access my home directory, link_defs.v is in:
/home/kong/P2001/Verilog/CommonFiles/link_defs.v
Note: Both link_library.v (Reference [9] above) and link_defs.v
are placed in the "CommonFiles" directory because they are used by all Link Layer files.
[11] For those readers who can access my home directory, trans_defs.v is in:
/home/kong/P2001/Verilog/CommonFiles/trans_defs.v

Note: The file trans_defs.v is placed in the "CommonFiles" directory because it is used by all Transport Layer files.

[12] For those readers who can access my home directory, dtrans_txctl.v is in:
/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v
[13] For those readers who can access my home directory, please refer to:
/home/kong/P2001/Verilog/SATASys/Tests/test_init.v

------------------------ That's all for now folks :-) ------------------------

Appendix A: Sample Verilog Files for Modeling a Module

dtrans.v

/**************************************************************************** 
 *
 * File Name: dtrans.v
 *
 * Comment: Device Dongle Transport Layer
 *
 * Author: Shing Kong
 * Creation Date: 5/9/2001
 *
 * $Source: /proj/gemini/cvs_root/P2002/Notes/Style/appendixA,v $
 * $Date: 2001/12/06 21:49:07 $
 * $Revision: 1.1 $
 *
 *===========================================================================
 * Copyright (c) 2001 by Shing Ip Kong.  All Rights Reserved.
 ****************************************************************************/
/*
 * $Id: appendixA,v 1.1 2001/12/06 21:49:07 kong Exp $
 */
module dtrans (
    // Transmit Engine's Outputs
    tp_acksendreg,
    tp_acksenddmaa,
    tp_acksendpios,
    tp_acksenddmas,
    tp_acksendbist,
    tp_acksenddata,
    tp_txdata,
    tp_txgoempty,
    tp_txempty,
    tp_sendndfis,
    tp_senddafis,
    tp_partial,
    tp_slumber,
    tp_spdsel,

    // Transmit Engine's Inputs
    at_data,
    at_error,
    at_seccnt,
    at_secnum,
    at_cyllow,
    at_cylhi,
    at_devhd,
    at_status,
    at_interrupt,
    at_sendreg,
    at_senddmaa,
    at_sendpios,
    at_senddmas,
    at_sendbista,
    at_senddata,
    lk_txfsmidle,
    lk_rdtxfifo,
    lk_txerror,
    txclk,
    txclk4x,

    // Receive Engine's Output
    tp_datain,
    tp_featurein,
    tp_seccntin,
    tp_secnumin,
    tp_cyllowin,
    tp_cylhiin,
    tp_devhdin,
    tp_cmdin,
    tp_devctlin,
    tp_cbit,
    tp_wrdata,
    tp_wrATAreg,
    tp_rxnearfull,
    tp_rxfull,
    tp_rxdabort,
    tp_fisgood,
    tp_fisundef,

    // Receive Engine's Input
    at_rxdabort,
    lk_rxdata,
    lk_wrrxfifo,
    lk_eofis,
    rxclk,
    rxclk4x,

    // Input for both the Transmit and Receive Engines
    at_reset);

    /*
     * Transmit Engine's Outputs to the Parallel ATA Interface (dataif.v)
     */
    output             tp_acksendreg;
    output             tp_acksenddmaa;
    output             tp_acksendpios;
    output             tp_acksenddmas;
    output             tp_acksendbist;
    output             tp_acksenddata;

    /*
     * Transmit Engine's Outputs to the Link Layer (link.v)
     */
    output [31:0]      tp_txdata;
    output             tp_txgoempty;
    output             tp_txempty;
    output             tp_sendndfis;   // Sending a non-data FIS
    output             tp_senddafis;   // Sending a data FIS
    output             tp_partial;
    output             tp_slumber;
    output             tp_spdsel;

    /*
     * Transmit Engine's Inputs from the Parallel ATA Interface (dataif.v)
     */
    input [15:0]       at_data;
    input [7:0]        at_error;
    input [7:0]        at_seccnt;
    input [7:0]        at_secnum;
    input [7:0]        at_cyllow;
    input [7:0]        at_cylhi;
    input [7:0]        at_devhd;
    input [7:0]        at_status;
    input              at_interrupt;

    input              at_sendreg;
    input              at_senddmaa;
    input              at_sendpios;
    input              at_senddmas;
    input              at_sendbista;
    input              at_senddata;

    /*
     * Transmit Engine's Inputs from the Link Layer (link.v)
     */
    input              lk_txfsmidle;   // TX FSM has returned to IDLE
    input              lk_rdtxfifo;
    input              lk_txerror;

    /*
     * Transmit Engine's Clocks
     */
    input              txclk;
    input              txclk4x;

    /*
     * Receive Engine's Outputs to the Parallel ATA Interface (dataif.v)
     */
    // These signals will not be synchronized.  Instead we will
    // synchronize the "write strobe" at "dataif"--see below
    output [15:0]      tp_datain;
    output [7:0]       tp_featurein;
    output [7:0]       tp_seccntin;
    output [7:0]       tp_secnumin;
    output [7:0]       tp_cyllowin;
    output [7:0]       tp_cylhiin;
    output [7:0]       tp_devhdin;
    output [7:0]       tp_cmdin;
    output [2:1]       tp_devctlin;
    output             tp_cbit;
    // These signals will be synchronized at "dataif"
    output             tp_wrdata;
    output             tp_wrATAreg;    // Write ATA registers (except Data)

    /*
     * Receive Engine's Outputs to the Link Layer (link.v)
     */
    output             tp_rxnearfull;
    output             tp_rxfull;
    output             tp_rxdabort;    // Data RX has been aborted
    output             tp_fisgood;     // Receive a valid FIS
    output             tp_fisundef;    // FIS not recognized by Transport

    /*
     * Receive Engine's Inputs from the Parallel ATA Interface (dataif.v)
     * These signals need to be synchronized w.r.t. the rxclk4x clock
     */
    input              at_rxdabort;    // ATA interface aborts data receiving

    /*
     * Receive Engine's Inputs from the Link Layer (link.v)
     */
    input [31:0]       lk_rxdata;
    input              lk_wrrxfifo;
    input              lk_eofis;       // Link layer finish writing the FIS

    /*
     * Receive Engine's Clocks
     */
    input              rxclk;
    input              rxclk4x;

    /*
     * Inputs to both the Transmit and Receive Engines
     */
    input              at_reset;

    /*
     * Interconnections within this module
     */
    // Outputs of the Transmit Engine (dtrans_tx.v)
    wire               txfsmidle;
    wire               txokrxgo;       // TX FSM gives RX FSM the OK

    // Outputs of the Receive Engine (dtrans_rx.v)
    wire               waittxid;
    wire               rxempty;

    /*
     * Device Dongle Transport Layer Transmit Engine (dtrans_tx.v)
     */
    dtrans_tx dtrans_tx (
       // Outputs
       .tp_acksendreg (tp_acksendreg),
       .tp_acksenddmaa (tp_acksenddmaa),
       .tp_acksendpios (tp_acksendpios),
       .tp_acksenddmas (tp_acksenddmas),
       .tp_acksendbist (tp_acksendbist),
       .tp_acksenddata (tp_acksenddata),
       .tp_txdata (tp_txdata),         .tp_txgoempty (tp_txgoempty),
       .tp_txempty (tp_txempty),       .tp_sendndfis (tp_sendndfis),
       .tp_senddafis (tp_senddafis),   .tp_partial (tp_partial),
       .tp_slumber (tp_slumber),       .tp_spdsel (tp_spdsel),
       .txfsmidle (txfsmidle),         .txokrxgo (txokrxgo),

       // Inputs
       .at_data (at_data),             .at_error (at_error),
       .at_seccnt (at_seccnt),         .at_secnum (at_secnum),
       .at_cyllow (at_cyllow),         .at_cylhi (at_cylhi),
       .at_devhd (at_devhd),           .at_status (at_status),
       .at_interrupt (at_interrupt),   .at_sendreg (at_sendreg),
       .at_senddmaa (at_senddmaa),     .at_sendpios (at_sendpios),
       .at_senddmas (at_senddmas),     .at_sendbista (at_sendbista),
       .at_senddata (at_senddata),     
       .lk_txfsmidle (lk_txfsmidle),   .lk_rdtxfifo (lk_rdtxfifo),
       .lk_txerror (lk_txerror),       .waittxid (waittxid),
       .rxempty (rxempty),             .txclk (txclk),
       .txclk4x (txclk4x),             .at_reset (at_reset));

    /*
     * Device Dongle Transport Layer Receive Engine (dtrans_rx.v)
     */
    dtrans_rx dtrans_rx (
       // Outputs
       .tp_datain (tp_datain),         .tp_featurein (tp_featurein),
       .tp_seccntin (tp_seccntin),     .tp_secnumin (tp_secnumin),
       .tp_cyllowin (tp_cyllowin),     .tp_cylhiin (tp_cylhiin),
       .tp_devhdin (tp_devhdin),       .tp_cmdin (tp_cmdin),
       .tp_devctlin (tp_devctlin),     .tp_cbit (tp_cbit),
       .tp_wrdata (tp_wrdata),         .tp_wrATAreg (tp_wrATAreg),
       .tp_rxnearfull (tp_rxnearfull), .tp_rxfull (tp_rxfull),
       .tp_rxdabort (tp_rxdabort),     .tp_fisgood (tp_fisgood),
       .tp_fisundef (tp_fisundef),
       .waittxid (waittxid),           .rxempty (rxempty),

       // Inputs
       .at_rxdabort (at_rxdabort),     .lk_rxdata (lk_rxdata),
       .lk_wrrxfifo (lk_wrrxfifo),     .lk_eofis (lk_eofis),
       .txfsmidle (txfsmidle),         .txokrxgo (txokrxgo),
       .rxclk (rxclk),                 .rxclk4x (rxclk4x),
       .at_reset (at_reset));

endmodule // dtrans

dtrans_tx.v

/****************************************************************************
 *
 * File Name: dtrans_tx.v
 *
 * Comment: Device Dongle Transport Layer Transmission Engine
 *
 * Author: Shing Kong
 * Creation Date: 3/25/2001
 *
 * $Source: /proj/gemini/cvs_root/P2002/Notes/Style/appendixA,v $
 * $Date: 2001/12/06 21:49:07 $
 * $Revision: 1.1 $
 *
 *===========================================================================
 * Copyright (c) 2001 by Shing Ip Kong.  All Rights Reserved.
 ****************************************************************************/
/*
 * $Id: appendixA,v 1.1 2001/12/06 21:49:07 kong Exp $
 */
`include "trans_defs.v"                // See ../../CommonFiles

module dtrans_tx (
    // Outputs
    tp_acksendreg,
    tp_acksenddmaa,
    tp_acksendpios,
    tp_acksenddmas,
    tp_acksendbist,
    tp_acksenddata,
    tp_txdata,
    tp_txgoempty,
    tp_txempty,
    tp_sendndfis,
    tp_senddafis,
    tp_partial,
    tp_slumber,
    tp_spdsel,
    txfsmidle,
    txokrxgo,

    // Inputs
    at_data,
    at_error,
    at_seccnt,
    at_secnum,
    at_cyllow,
    at_cylhi,
    at_devhd,
    at_status,
    at_interrupt,
    at_sendreg,
    at_senddmaa,
    at_sendpios,
    at_senddmas,
    at_sendbista,
    at_senddata,
    lk_txfsmidle,
    lk_rdtxfifo,
    lk_txerror,
    waittxid,
    rxempty,
    txclk,
    txclk4x,
    at_reset);

    /*
     * Outputs to the Parallel ATA Interface Layer (dataif.v)
     */
    output             tp_acksendreg;
    output             tp_acksenddmaa;
    output             tp_acksendpios;
    output             tp_acksenddmas;
    output             tp_acksendbist;
    output             tp_acksenddata;

    /*
     * Outputs to the Link Layer (link.v)
     */
    output [31:0]      tp_txdata;
    output             tp_txgoempty;
    output             tp_txempty;
    output             tp_sendndfis;   // Sending a non-data FIS
    output             tp_senddafis;   // Sending a data FIS
    output             tp_partial;
    output             tp_slumber;
    output             tp_spdsel;

    /*
     * Outputs to the Transport Layer Receive Engine (dtrans_rx.v)
     */
    output             txfsmidle;
    output             txokrxgo;       // TX FSM gives RX FSM the OK

    /*
     * Inputs from the Parallel ATA Interface (dataif.v)
     */
    input [15:0]       at_data;
    input [7:0]        at_error;
    input [7:0]        at_seccnt;
    input [7:0]        at_secnum;
    input [7:0]        at_cyllow;
    input [7:0]        at_cylhi;
    input [7:0]        at_devhd;
    input [7:0]        at_status;
    input              at_interrupt;

    input              at_sendreg;
    input              at_senddmaa;
    input              at_sendpios;
    input              at_senddmas;
    input              at_sendbista;
    input              at_senddata;

    /*
     * Inputs from the Link Layer (link.v)
     */
    input              lk_txfsmidle;   // TX FSM has returned to IDLE
    input              lk_rdtxfifo;
    input              lk_txerror;

    /*
     * Inputs from the Transport Layer Receive Engine (dtrans_rx.v)
     */
    input              waittxid;
    input              rxempty;

    /*
     * Reset signal and clocks
     */
    input              txclk;
    input              txclk4x;
    input              at_reset;

    /*
     * Interconnections within this module
     */
    // Outputs of the Synchronizer
    wire               tptx_reset;
    wire               r2t_waittxid;
    wire               r2t_rxempty;

    // Outputs of the Transport Layer Transmit Controller (dtrans_txctl.v)
    wire [`log_maxfis-1:0]
                       wcount;
    wire               wrtxfifo;
    wire               sendreg;
    wire               senddmaa;
    wire               sendpios;
    wire               senddmas;
    wire               sendbista;
    wire               senddata;

    // Outputs of the Transport Layer Transmit Datapath (dtrans_txdp.v)
    wire               txfull;
    wire               txtimeout;

    /*
     * Synchronizer that synchronizes the signals to the txclk4x domain
     */
    dtrans_txsyn dtrans_txsyn (
       .tptx_reset (tptx_reset),       .r2t_waittxid (r2t_waittxid),
       .r2t_rxempty (r2t_rxempty),
       .waittxid (waittxid),           .rxempty (rxempty),
       .txclk (txclk),                 .txclk4x (txclk4x),
       .at_reset (at_reset));

    /*
     * Device Transport Layer Transmit Controller (dtrans_txctl.v)
     */
    dtrans_txctl dtrans_txctl (
       // Outputs
       .tp_acksendreg (tp_acksendreg),
       .tp_acksenddmaa (tp_acksenddmaa),
       .tp_acksendpios (tp_acksendpios),
       .tp_acksenddmas (tp_acksenddmas),
       .tp_acksendbist (tp_acksendbist),
       .tp_acksenddata (tp_acksenddata),
       .tp_sendndfis (tp_sendndfis),   .tp_senddafis (tp_senddafis),
       .tp_partial (tp_partial),       .tp_slumber (tp_slumber),
       .tp_spdsel (tp_spdsel),
       .txfsmidle (txfsmidle),         .txokrxgo (txokrxgo),
       .wcount (wcount),               .wrtxfifo (wrtxfifo),
       .sendreg (sendreg),             .senddmaa (senddmaa),
       .sendpios (sendpios),           .senddmas (senddmas),
       .sendbista (sendbista),         .senddata (senddata),

       // Inputs
       .at_sendreg (at_sendreg),       .at_senddmaa (at_senddmaa),
       .at_sendpios (at_sendpios),     .at_senddmas (at_senddmas),
       .at_sendbista (at_sendbista),   .at_senddata (at_senddata),
       .lk_txfsmidle (lk_txfsmidle),   .lk_txerror (lk_txerror),
       .r2t_waittxid (r2t_waittxid),   .r2t_rxempty (r2t_rxempty),
       .txfull (txfull),               .txtimeout (txtimeout),
       .txclk4x (txclk4x),             .tptx_reset (tptx_reset));

    /*
     * Device Transport Layer Transmit Datapath (dtrans_txdp.v)
     */
    dtrans_txdp dtrans_txdp (
       // Outputs
       .tp_txdata (tp_txdata),         .tp_txgoempty (tp_txgoempty),
       .tp_txempty (tp_txempty),       .txfull (txfull),
       .txtimeout (txtimeout),

       // Inputs
       .at_data (at_data),             .at_error (at_error),
       .at_seccnt (at_seccnt),         .at_secnum (at_secnum),
       .at_cyllow (at_cyllow),         .at_cylhi (at_cylhi),
       .at_devhd (at_devhd),           .at_status (at_status),
       .at_interrupt (at_interrupt),   .lk_rdtxfifo (lk_rdtxfifo),
       .wcount (wcount),               .wrtxfifo (wrtxfifo),
       .sendreg (sendreg),             .senddmaa (senddmaa),
       .sendpios (sendpios),           .senddmas (senddmas),
       .sendbista (sendbista),         .senddata (senddata),
       .txclk4x (txclk4x),             .tptx_reset (tptx_reset));

endmodule // dtrans_tx

/****************************************************************************
 * Module dtrans_txsyn: Synchronize signals for the txclk4x clock domain
 ****************************************************************************/
module dtrans_txsyn (
    // Outputs
    tptx_reset,
    r2t_waittxid,
    r2t_rxempty,

    // Inputs
    waittxid,
    rxempty,
    txclk,
    txclk4x,
    at_reset);

    output     tptx_reset;

    output     r2t_waittxid;   // RX FSM waiting TX FSM to be idle
    output     r2t_rxempty;    // RX FIFO is empty

    input      waittxid;
    input      rxempty;

    input      txclk;
    input      txclk4x;

    input      at_reset;

    /*
     * Connection within this module
     */
    wire       txclk_waittxid;
    wire       txclk_rxempty;

    /*
     * Register the reset signal before using it locally
     */
    v_reg #(1) reset_ff (tptx_reset, txclk4x, at_reset);

    /*
     * For signals coming from the rxclk or rxclk4x domain,
     * Step 1: Sample it with the txclk clock.
     * Step 2: Sample it again with the txclk4x clock.
     *
     * Note: we assume the rising edges of txclk and txclk4x are aligned.
     * Consequently, sampling a signal with txclk followed by sampling it
     * with txclk4x has the same effect as sampling a signal twice with
     * txclk as far as the prevention of meta-stability is concerned.
     */
    v_reg #(1) txclk_ff0 (txclk_waittxid, txclk, waittxid);
    v_reg #(1) txclk_ff1 (txclk_rxempty, txclk, rxempty);

    v_reg #(1) syn_ff0 (r2t_waittxid, txclk4x, txclk_waittxid);
    v_reg #(1) syn_ff1 (r2t_rxempty, txclk4x, txclk_rxempty);

endmodule // dtrans_txsyn

dtrans_rx.v

/****************************************************************************
 *
 * File Name: dtrans_rx.v
 *
 * Comment: Device Dongle Transport Layer Receive Engine
 *
 * Author: Shing Kong
 * Creation Date: 5/3/2001
 *
 * $Source: /proj/gemini/cvs_root/P2002/Notes/Style/appendixA,v $
 * $Date: 2001/12/06 21:49:07 $
 * $Revision: 1.1 $
 *
 *===========================================================================
 * Copyright (c) 2001 by Shing Ip Kong.  All Rights Reserved.
 ****************************************************************************/
/*
 * $Id: appendixA,v 1.1 2001/12/06 21:49:07 kong Exp $
 */
module dtrans_rx (
    // Outputs
    tp_datain,
    tp_featurein,
    tp_seccntin,
    tp_secnumin,
    tp_cyllowin,
    tp_cylhiin,
    tp_devhdin,
    tp_cmdin,
    tp_devctlin,
    tp_cbit,
    tp_wrdata,
    tp_wrATAreg,
    tp_rxnearfull,
    tp_rxfull,
    tp_rxdabort,
    tp_fisgood,
    tp_fisundef,
    waittxid,
    rxempty,

    // Inputs
    at_rxdabort,
    lk_rxdata,
    lk_wrrxfifo,
    lk_eofis,
    txfsmidle,
    txokrxgo,
    rxclk,
    rxclk4x,
    at_reset);

    /*
     * Outputs to the Parallel ATA Interface Layer (dataif.v)
     */
    // These signals will not be synchronized.  Instead we will
    // synchronize the "write strobe" at "dataif"--see below
    output [15:0]      tp_datain;
    output [7:0]       tp_featurein;
    output [7:0]       tp_seccntin;
    output [7:0]       tp_secnumin;
    output [7:0]       tp_cyllowin;
    output [7:0]       tp_cylhiin;
    output [7:0]       tp_devhdin;
    output [7:0]       tp_cmdin;
    output [2:1]       tp_devctlin;
    output             tp_cbit;

    // These signals will be synchronized at "dataif"
    output             tp_wrdata;
    output             tp_wrATAreg;    // Write ATA registers (except Data)

    /*
     * Outputs to the Link Layer (link.v)
     */
    output             tp_rxnearfull;
    output             tp_rxfull;

    output             tp_rxdabort;    // Data RX has been aborted
    output             tp_fisgood;     // Received a valid FIS
    output             tp_fisundef;    // FIS not recognized by Transport

    /*
     * Output to the Transport Layer Transmit Engine (dtrans_tx.v)
     */
    output             waittxid;       // RX FSM waiting for TX FSM to be idle
    output             rxempty;

    /*
     * Inputs from the Parallel ATA Interface Layer (dataif.v)
     * These signals need to be synchronized w.r.t. the rxclk4x clock
     */
    input              at_rxdabort;    // ATA interface aborts data receiving

    /*
     * Inputs from the Link Layer (link.v)
     */
    input [31:0]       lk_rxdata;
    input              lk_wrrxfifo;
    input              lk_eofis;       // Link layer has finished writing the FIS

    /*
     * Inputs from the Transport Layer Transmit Engine (dtrans_tx.v)
     */
    input              txfsmidle;
    input              txokrxgo;               // TX FSM gives RX FSM the OK

    /*
     * Miscellaneous Inputs
     */
    input              rxclk;
    input              rxclk4x;
    input              at_reset;

    /*
     * Connections within this module
     */
    // Outputs of the Synchronizer
    wire               tprx_reset;
    wire               t2r_rxdabort;           // Abort receiving data
    wire               t2r_txfsmidle;
    wire               t2r_txokrxgo;           // TX FSM gives RX FSM the OK

    // Outputs of the Transport Layer Receive Controller (dtrans_rxctl.v)
    wire               incomefis;      // Receiving FIS
    wire               rdrxfifo;
    wire               hld_feature;
    wire               hld_seccnt;
    wire               hld_secnum;
    wire               hld_cyllow;
    wire               hld_cylhi;
    wire               hld_devhd;
    wire               hld_cmd;
    wire               hld_devctl;
    wire               hld_cbit;
    wire               upperdata;      // Use bits[31:16] of the Data FIS

    // Outputs of the Transport Layer Receive Datapath (dtrans_rxdp.v)
    wire               rxtimeout;
    wire               hregfis;
    wire               dmasfis;
    wire               bisafis;
    wire               datafis;

    /*
     * Synchronizer that synchronizes the signals to the rxclk4x domain
     */
    dtrans_rxsyn dtrans_rxsyn (
       // Outputs
       .tprx_reset (tprx_reset),       .t2r_rxdabort (t2r_rxdabort),
       .t2r_txfsmidle (t2r_txfsmidle), .t2r_txokrxgo (t2r_txokrxgo),

       // Inputs
       .at_reset (at_reset),   .at_rxdabort (at_rxdabort),
       .txfsmidle (txfsmidle),         .txokrxgo (txokrxgo),
       .rxclk (rxclk),                 .rxclk4x (rxclk4x));

    /*
     * Device Transport Layer Receive Controller (dtrans_rxctl.v)
     */
    dtrans_rxctl dtrans_rxctl (
       // Outputs
       .tp_wrdata (tp_wrdata),         .tp_wrATAreg (tp_wrATAreg),
       .tp_rxdabort (tp_rxdabort),     .tp_fisgood (tp_fisgood),
       .tp_fisundef (tp_fisundef),     .waittxid (waittxid),
       .incomefis (incomefis),         .rdrxfifo (rdrxfifo),
       .hld_feature (hld_feature),     .hld_seccnt (hld_seccnt),
       .hld_secnum (hld_secnum),       .hld_cyllow (hld_cyllow),
       .hld_cylhi (hld_cylhi),         .hld_devhd (hld_devhd),
       .hld_cmd (hld_cmd),             .hld_devctl (hld_devctl),
       .hld_cbit (hld_cbit),           .upperdata (upperdata),

       // Inputs
       .t2r_rxdabort (t2r_rxdabort),   .lk_eofis (lk_eofis),
       .t2r_txfsmidle (t2r_txfsmidle), .t2r_txokrxgo (t2r_txokrxgo),
       .rxempty (rxempty),             .rxtimeout (rxtimeout),
       .hregfis (hregfis),             .dmasfis (dmasfis),
       .bisafis (bisafis),             .datafis (datafis),
       .rxclk4x (rxclk4x),             .tprx_reset (tprx_reset));

    /*
     * Device Transport Layer Receive Datapath (dtrans_rxdp.v)
     */
    dtrans_rxdp dtrans_rxdp (
       // Outputs
       .tp_datain (tp_datain),         .tp_featurein (tp_featurein),
       .tp_seccntin (tp_seccntin),     .tp_secnumin (tp_secnumin),
       .tp_cyllowin (tp_cyllowin),     .tp_cylhiin (tp_cylhiin),
       .tp_devhdin (tp_devhdin),       .tp_cmdin (tp_cmdin),
       .tp_devctlin (tp_devctlin),     .tp_cbit (tp_cbit),
       .tp_rxnearfull (tp_rxnearfull), .tp_rxfull (tp_rxfull),
       .rxempty (rxempty),             .rxtimeout (rxtimeout),
       .hregfis (hregfis),             .dmasfis (dmasfis),
       .bisafis (bisafis),             .datafis (datafis),

       // Inputs
       .lk_rxdata (lk_rxdata),         .lk_wrrxfifo (lk_wrrxfifo),
       .incomefis (incomefis),         .rdrxfifo (rdrxfifo),
       .hld_feature (hld_feature),     .hld_seccnt (hld_seccnt),
       .hld_secnum (hld_secnum),       .hld_cyllow (hld_cyllow),
       .hld_cylhi (hld_cylhi),         .hld_devhd (hld_devhd),
       .hld_cmd (hld_cmd),             .hld_devctl (hld_devctl),
       .hld_cbit (hld_cbit),           .upperdata (upperdata),
       .rxclk4x (rxclk4x),             .tprx_reset (tprx_reset));

endmodule // dtrans_rx

/****************************************************************************
 * Module dtrans_rxsyn: Synchronize signals for the rxclk4x clock domain
 ****************************************************************************/
module dtrans_rxsyn (
    // Outputs
    tprx_reset,
    t2r_rxdabort,
    t2r_txfsmidle,
    t2r_txokrxgo,

    // Inputs
    at_rxdabort,
    txfsmidle,
    txokrxgo,
    rxclk,
    rxclk4x,
    at_reset);

    output     tprx_reset;

    output     t2r_rxdabort;           // Abort receiving data
    output     t2r_txfsmidle;
    output     t2r_txokrxgo;           // TX FSM gives RX FSM the OK

    input      at_rxdabort;    // ATA interface aborts data receiving
    input      txfsmidle;
    input      txokrxgo;       // TX FSM gives RX FSM the OK

    input      rxclk;
    input      rxclk4x;
    input      at_reset;

    /*
     * Interconnections within this module
     */
    wire       rxclk_rxdabort;         // Abort receiving data
    wire       rxclk_txfsmidle;
    wire       rxclk_txokrxgo;         // TX FSM gives RX FSM the OK

    v_reg #(1) reset_ff (tprx_reset, rxclk4x, at_reset);

    /*
     * For signals coming from the txclk or txclk4x domain,
     * Step 1: Sample it with the rxclk clock.
     * Step 2: Sample it again with the rxclk4x clock.
     *
     * Note: we assume the rising edges of rxclk and rxclk4x are aligned.
     * Consequently, sampling a signal with rxclk followed by sampling it
     * with rxclk4x has the same effect as sampling a signal twice with
     * rxclk as far as the prevention of meta-stability is concerned.
     */
    v_reg #(1) rxclk_ff0 (rxclk_rxdabort, rxclk, at_rxdabort);
    v_reg #(1) rxclk_ff1 (rxclk_txfsmidle, rxclk, txfsmidle);
    v_reg #(1) rxclk_ff2 (rxclk_txokrxgo, rxclk, txokrxgo);

    v_reg #(1) syn_ff0 (t2r_rxdabort, rxclk4x, rxclk_rxdabort);
    v_reg #(1) syn_ff1 (t2r_txfsmidle, rxclk4x, rxclk_txfsmidle);
    v_reg #(1) syn_ff2 (t2r_txokrxgo, rxclk4x, rxclk_txokrxgo);

endmodule // dtrans_rxsyn
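
The two-step sampling used in dtrans_txsyn and dtrans_rxsyn above can be sketched in isolation as follows. This is only a minimal sketch and not part of the original design: sketch_reg is a hypothetical stand-in for the library register v_reg, assumed here to be a plain positive-edge D register with the port order (q, clk, d) seen in the instantiations above, and sketch_sync, async_in, clk1x, and clk4x are illustrative names.

```verilog
// Hypothetical stand-in for v_reg: an n-bit positive-edge D register.
// The port order (q, clk, d) is an assumption taken from how v_reg is
// instantiated in the listings above.
module sketch_reg (q, clk, d);
    parameter n = 1;

    output [n-1:0] q;
    input          clk;
    input  [n-1:0] d;

    reg    [n-1:0] q;

    always @(posedge clk) begin
        q <= d;
    end
endmodule // sketch_reg

// Two-stage synchronizer: sample with the 1x clock, then sample the
// result again with the 4x clock (rising edges assumed to be aligned).
module sketch_sync (sync_out, async_in, clk1x, clk4x);
    output sync_out;
    input  async_in;       // Signal coming from another clock domain
    input  clk1x;
    input  clk4x;

    wire   stage1;         // First sample; feeds only the second flip-flop

    sketch_reg #(1) ff0 (stage1,   clk1x, async_in);
    sketch_reg #(1) ff1 (sync_out, clk4x, stage1);
endmodule // sketch_sync
```

In this sketch only sync_out is meant to be used by downstream logic; stage1 is kept private to the synchronizer, which mirrors how the rxclk_* wires above are used only inside dtrans_rxsyn.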

Appendix B: Sample Verilog Files for Modeling a Datapath

Appendix C: Sample Verilog Files for Modeling a Controller

trans_defs.v

/****************************************************************************
 *
 * File Name: trans_defs.v
 *
 * Comment: Definitions for the Transport Layer
 *
 * Author: Shing Kong
 * Creation Date: 3/21/2001
 *
 * $Source: /proj/gemini/cvs_root/P2002/Notes/Style/appendixC,v $
 * $Date: 2001/12/06 21:49:07 $
 * $Revision: 1.1 $
 *
 *===========================================================================
 * Copyright (c) 2001 by Shing Ip Kong.  All Rights Reserved.
 ****************************************************************************/
/*
 * $Id: appendixC,v 1.1 2001/12/06 21:49:07 kong Exp $
 */

/*
 * When the (Number of Unused Entries - 1) in the FIFO is less than
 * this value, declare the receive FIFO as "nearly full."
 */
`define NEARFULL        5'd20  // Declare "nearfull" when we have 20 entries

/*
 * For the Receive engine, assert timeout if the Transport layer is
 * expecting an incoming FIS and the Link Layer has not written anything
 * into the rxfifo for `RXTIMEOUT cycles.
 *
 * For the Transmit engine, assert timeout if the Transport layer has
 * finished writing the FIS into the txfifo but the Link Layer has not
 * done any more reading for `TXTIMEOUT cycles.
 */
`define RXTIMEOUT      10'd999
`define TXTIMEOUT      10'd999

/*
 * FIS Type: the 8-bit hex value as appeared in Byte 0 Word 0 of the FIS
 */
`define HFISREG                8'h27   // Host-to-Device (H) Register (REG)
`define DFISREG                8'h34   // Device-to-Host (D) Register (REG)
`define DFISDMAA       8'h39   // Device-to-Host (D) DMA Activate (DMAA)
`define DFISPIOS       8'h5F   // Device-to-Host (D) PIO Setup (PIOS)
`define BFISDMAS       8'h41   // Bidirectional (B) DMA Setup (DMAS)
`define BFISBISTA      8'h58   // Bidirectional (B) BIST Activate (BISTA)
// `define BFISDATA    8'h73   // Bidirectional (B) Data (DATA) FIS
`define BFISDATA       8'h46   // Changed per errata (see ktoh's 5/1's email)

/*
 * Number of 32-bit words in various FIS minus one
 *     For example, the Register (REG) Host-to-Device (H) FIS (see P.167
 *     of the SATA Spec, Version 1.0) has 5 words => NHFISREGm1 = 5 - 1 = 4.
 */
`define NHFISREGm1     3'd4    // Host-to-Device (H) Register (REG)
`define NDFISREGm1     3'd4    // Device-to-Host (D) Register (REG)
`define NDFISPIOSm1    3'd4    // Device-to-Host (D) PIO Setup (PIOS)
`define NBFISDMASm1    3'd6    // Bidirectional (B) DMA Setup (DMAS)
`define NBFISBISTAm1   3'd2    // Bidirectional (B) BIST Activate (BISTA)

/*
 * Number of bits needed to hold the biggest number that appears above.
 * The biggest number at this point is NBFISDMASm1 = 6, which needs
 * ceiling (log2 (6 + 1)) = 3 bits.  Notice that if this number changes,
 * we need to change the "3'd" definitions above as well as changing the
 * v_count3 to a different width counter in dtrans_ctl.v.
 */
`define log_maxfis      3

/*
 * Define the state values and bit position for the Host's Transmit Finite
 * State machine (FSM in htrans_txctl).  This FSM implements the "transmit"
 * states described in Section 8.6 (PP. 181-197) of SATA Spec, 1.0.
 */
`define num_httxfsm    16
`define B_HTTXIDLE      0
`define B_HTCHKTYP      1
`define B_HTCMDFIS      2
`define B_HTCTLFIS      3
`define B_HTDMASTUP     4      // Spec's HT_DMASTUPFIS
`define B_HTXMITBIS     5
`define B_HTPIOTX2      6      // Spec's HT_PIOOTrans2

// These states are entered via HT_CHKTYP after Transport Receiver
// decodes FISs that require the Host Transport Layer to transmit.
`define B_HTPIOTX1      7      // Spec's HT_PIOOTrans1
`define B_HTDMATX1      8      // Spec's HT_DMAOTrans1
`define B_HTDMATX2      9      // Spec's HT_DMAOTrans2

// The following are collectively referred to as: HT_TransStatus in the spec
`define B_HTCMDSTA     10
`define B_HTCTLSTA     11
`define B_HTDMASSTA    12
`define B_HTBISSTA     13

`define B_HTPIOTXEND   14      // Spec's HT_PIOEND
`define B_HTDMATXEND   15      // Spec's HT_DMAEND

// Host Dongle's TX FSM State Values
`define HTTXIDLE       16'h0001
`define HTCHKTYP       16'h0002
`define HTCMDFIS       16'h0004
`define HTCTLFIS       16'h0008
`define HTDMASTUP      16'h0010
`define HTXMITBIS      16'h0020
`define HTPIOTX2       16'h0040
`define HTPIOTX1       16'h0080
`define HTDMATX1       16'h0100
`define HTDMATX2       16'h0200
`define HTCMDSTA       16'h0400
`define HTCTLSTA       16'h0800
`define HTDMASSTA      16'h1000
`define HTBISSTA       16'h2000
`define HTPIOTXEND     16'h4000
`define HTDMATXEND     16'h8000

/*
 * Define the state values and bit position for the Host's Receive Finite
 * State machine (FSM in htrans_rxctl).  This FSM implements the "Decompose"
 * states described in Section 8.6 (PP. 181-197) of SATA Spec, 1.0.
 *
 * The following states are not defined in the spec:
 *     HTWAITTXID: wait for TX to return to Idle state.  This is added
 *                 so that TX and RX FSM can be partitioned cleanly.
 *     HTRXCLEAN:  clean up the mess if we receive an unknown FIS
 */
`define num_htrxfsm    15
`define B_HTRXIDLE      0
`define B_HTRCVREG      1      // Spec's HT_RegFIS: receive Register FIS
`define B_HTRCVDS       2      // Spec's HT_DS_FIS: receive DMA Setup
`define B_HTRCVPS       3      // Spec's HT_PS_FIS: receive PIO Setup
`define B_HTRCVBIST     4
`define B_HTRCVDAC      5      // Spec's HT_DMA_FIS: receive DMA Activate
`define B_HTPIORX1      6      // Spec's HT_PIOITrans1
`define B_HTPIORX2      7      // Spec's HT_PIOITrans2
`define B_HTDMARX       8      // Spec's HT_DMAITrans
`define B_HTBISTTRAN1   9
`define B_HTPIORXEND   10      // Spec's HT_PIOEND
`define B_HTDMARXEND   11      // Spec's HT_DMAEND
`define B_HTRXCLEAN    12      // Receive an unknown FIS, needs to clean up
`define B_HTTXBUSY     13
`define B_HTWAITTXID   14

// Host Dongle's RX FSM State Values
`define HTRXIDLE       15'h0001
`define HTRCVREG       15'h0002
`define HTRCVDS                15'h0004
`define HTRCVPS                15'h0008
`define HTRCVBIST      15'h0010
`define HTRCVDAC       15'h0020
`define HTPIORX1       15'h0040        // *** Debug 4/27: combine with RX2?
`define HTPIORX2       15'h0080        // *** Debug 4/27: combine with RX1?
`define HTDMARX                15'h0100
`define HTBISTTRAN1    15'h0200
`define HTPIORXEND     15'h0400
`define HTDMARXEND     15'h0800
`define HTRXCLEAN      15'h1000
`define HTTXBUSY       15'h2000
`define HTWAITTXID     15'h4000

/*
 * Define the state values and bit position for the Device's Transmit Finite
 * State machine (FSM in dtrans_txctl).  This FSM implements the "transmit"
 * states described in Section 8.7 (PP. 197-205) of SATA Spec, 1.0.
 */
`define num_dttxfsm    15
`define B_DTTXIDLE      0
`define B_DTCHKTYP      1
`define B_DTREGFIS      2      // Spec's DT_RegHDFIS
`define B_DTPIOSTUP     3      // Spec's DT_PIOSTUPFIS
`define B_DTDMASTUP     4      // Spec's DT_DMASTUPFIS
`define B_DTDMAACT      5      // Spec's DT_DMAACTFIS
`define B_DTXMITBIS     6
`define B_DTDATAFIS     7      // Spec's DT_DATAIFIS
`define B_DTDATATX      8      // Spec's DT_DATAITrans
`define B_DTDTXEND      9      // Spec's DT_DATAIEnd

// The following are collectively referred to as: DT_TransStatus in the spec
`define B_DTREGSTA     10
`define B_DTPIOSSTA    11
`define B_DTDMASSTA    12
`define B_DTDMAASTA    13
`define B_DTBISSTA     14

// Device Dongle's TX FSM State Values
`define DTTXIDLE       15'h0001
`define DTCHKTYP       15'h0002
`define DTREGFIS       15'h0004        // Spec's DT_RegHDFIS
`define DTPIOSTUP      15'h0008        // Spec's DT_PIOSTUPFIS
`define DTDMASTUP      15'h0010        // Spec's DT_DMASTUPFIS
`define DTDMAACT       15'h0020        // Spec's DT_DMAACTFIS
`define DTXMITBIS      15'h0040
`define DTDATAFIS      15'h0080        // Spec's DT_DATAIFIS
`define DTDATATX       15'h0100        // Spec's DT_DATAITrans
`define DTDTXEND       15'h0200        // Spec's DT_DATAIEnd

// The following are collectively referred to as: DT_TransStatus in the spec
`define DTREGSTA       15'h0400
`define DTPIOSSTA      15'h0800
`define DTDMASSTA      15'h1000
`define DTDMAASTA      15'h2000
`define DTBISSTA       15'h4000

/*
 * Define the state values and bit position for the Device's Receive Finite
 * State machine (FSM in dtrans_rxctl).  This FSM implements the "Decompose"
 * states described in Section 8.7 (PP. 197-210) of SATA Spec, 1.0.
 *
 * The following states are not defined in the spec:
 *     DTWAITTXID: wait for TX to return to Idle state.  This is added
 *                 so that TX and RX FSM can be partitioned cleanly.
 *     DTRXCLEAN:  clean up the mess if we receive an unknown FIS
 */
`define num_dtrxfsm    10
`define B_DTRXIDLE      0
`define B_DTRCVREG      1      // Spec's DT_RegHDFIS: receive Register FIS
`define B_DTRCVDMAS     2      // Spec's DT_DMASTUPFIS
`define B_DTRCVBIST     3
`define B_DTRCVDFIS     4      // Spec's DT_DATAOFIS
`define B_DTRCVDATA     5      // Spec's DT_DATAOREC
`define B_DTDEVABORT    6      // Spec's DT_DeviceAbort
`define B_DTBISTTRAN1   7
`define B_DTRXCLEAN     8      // Receive an unknown FIS, needs to clean up
`define B_DTWAITTXID    9

// Device Dongle's RX FSM State Values
`define DTRXIDLE       10'h001
`define DTRCVREG       10'h002 // Spec's DT_RegHDFIS: receive Register FIS
`define DTRCVDMAS      10'h004 // Spec's DT_DMASTUPFIS
`define DTRCVBIST      10'h008
`define DTRCVDFIS      10'h010 // Spec's DT_DATAOFIS
`define DTRCVDATA      10'h020 // Spec's DT_DATAOREC
`define DTDEVABORT     10'h040 // Spec's DT_DeviceAbort
`define DTBISTTRAN1    10'h080
`define DTRXCLEAN      10'h100 // Receive an unknown FIS, needs to clean up
`define DTWAITTXID     10'h200
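
Each FSM above is described twice: once as a bit position (the B_* macros) and once as a one-hot state value, where the state value is simply a 1 shifted left by the bit position. The following is a minimal self-checking sketch of that relationship, assuming a made-up 3-state FSM; the DEMO* macros and the module onehot_check are hypothetical names that do not appear in trans_defs.v.

```verilog
// Hypothetical 3-state FSM, defined in the same style as the tables above:
// a bit position (B_*) and the matching one-hot value for each state.
`define num_demofsm     3
`define B_DEMOIDLE      0
`define B_DEMOSEND      1
`define B_DEMOWAIT      2

`define DEMOIDLE        3'h1
`define DEMOSEND        3'h2
`define DEMOWAIT        3'h4

module onehot_check;
    reg [`num_demofsm-1:0] one;

    initial begin
        one = 1;

        // Each state value must have exactly the bit at its B_* position set
        if (`DEMOIDLE !== (one << `B_DEMOIDLE))
            $display ("*** DEMOIDLE does not match B_DEMOIDLE ***");
        if (`DEMOSEND !== (one << `B_DEMOSEND))
            $display ("*** DEMOSEND does not match B_DEMOSEND ***");
        if (`DEMOWAIT !== (one << `B_DEMOWAIT))
            $display ("*** DEMOWAIT does not match B_DEMOWAIT ***");

        $finish;
    end
endmodule // onehot_check
```

Keeping both forms lets the next state logic assign whole state values while the output logic tests a single bit, such as cur_state[`B_DEMOSEND], instead of comparing the full state vector.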

dtrans_txctl.v

/****************************************************************************
 *
 * File Name: dtrans_txctl.v
 *
 * Comment: Device Dongle Transport Layer controller for data transmission
 *
 * Author: Shing Kong
 * Creation Date: 3/25/2001
 *
 * $Source: /proj/gemini/cvs_root/P2002/Notes/Style/appendixC,v $
 * $Date: 2001/12/06 21:49:07 $
 * $Revision: 1.1 $
 *
 *===========================================================================
 * Copyright (c) 2001 by Shing Ip Kong.  All Rights Reserved.
 ****************************************************************************/
/*
 * $Id: appendixC,v 1.1 2001/12/06 21:49:07 kong Exp $
 */
`include "trans_defs.v"                // See ../../CommonFiles

module dtrans_txctl (
    // Outputs
    tp_acksendreg,
    tp_acksenddmaa,
    tp_acksendpios,
    tp_acksenddmas,
    tp_acksendbist,
    tp_acksenddata,
    tp_sendndfis,
    tp_senddafis,
    tp_partial,
    tp_slumber,
    tp_spdsel,
    txfsmidle,
    txokrxgo,
    wcount,
    wrtxfifo,
    sendreg,
    senddmaa,
    sendpios,
    senddmas,
    sendbista,
    senddata,

    // Inputs
    at_sendreg,
    at_senddmaa,
    at_sendpios,
    at_senddmas,
    at_sendbista,
    at_senddata,
    lk_txfsmidle,
    lk_txerror,
    r2t_waittxid,
    r2t_rxempty,
    txfull,
    txtimeout,
    txclk4x,
    tptx_reset);

    /*
     * Outputs to the Parallel ATA Interface Layer (dataif.v)
     */
    output             tp_acksendreg;
    output             tp_acksenddmaa;
    output             tp_acksendpios;
    output             tp_acksenddmas;
    output             tp_acksendbist;
    output             tp_acksenddata;

    /*
     * Outputs to the Link Layer (link.v)
     */
    output             tp_sendndfis;   // Sending a non-data FIS
    output             tp_senddafis;   // Sending a data FIS
    output             tp_partial;
    output             tp_slumber;
    output             tp_spdsel;

    /*
     * Outputs to the Transport Layer Receive Engine (dtrans_rx.v)
     */
    output             txfsmidle;
    output             txokrxgo;               // TX FSM gives RX FSM the OK

    /*
     * Outputs to the Transport Layer Transmit Datapath (dtrans_txdp.v)
     */
    output [`log_maxfis-1:0]
                       wcount;
    output             wrtxfifo;
    output             sendreg;
    output             senddmaa;
    output             sendpios;
    output             senddmas;
    output             sendbista;
    output             senddata;

    /*
     * Inputs from the Parallel ATA Interface Layer (dataif.v)
     * These signals must remain asserted until the FSM has changed state
     */
    input              at_sendreg;
    input              at_senddmaa;
    input              at_sendpios;
    input              at_senddmas;
    input              at_sendbista;
    input              at_senddata;

    /*
     * Inputs from the Link Layer (link.v)
     */
    input              lk_txfsmidle;   // TX FSM has returned to IDLE
    input              lk_txerror;     // Error in transmitting a FIS

    /*
     * Inputs from the Device Transport Layer Receive Engine (dtrans_rx.v)
     */
    input              r2t_waittxid;   // RX FSM waiting for TX FSM to be idle
    input              r2t_rxempty;    // RX FIFO is empty

    /*
     * Inputs from the Transport Layer Transmit Datapath (dtrans_txdp.v)
     */
    input              txfull;
    input              txtimeout;

    /*
     * Reset signal and clocks
     */
    input              txclk4x;        // 150 MHz Transmit Clock
    input              tptx_reset;

    /*
     * Interconnections within this controller
     */
    wire [`num_dttxfsm-1:0]    next_state;
    wire [`num_dttxfsm-1:0]    cur_state;

    // Output of the MUXes for selecting the count limit for various states
    wire [`log_maxfis-1:0]     num_regpio;
    wire [`log_maxfis-1:0]     num_dmabis;
    wire [`log_maxfis-1:0]     count_limit;

    wire                       count_enable;
    wire                       count_full;
    wire                       expire;         // Count has reached its limit

    /*
     * Next State Logic and the State Register for the finite state machine
     */
    // Next State Logic
    dtrans_txfsm dtrans_txfsm (
       // Outputs
       .next_state (next_state),

       // Inputs
       .cur_state (cur_state),
       .at_sendreg (at_sendreg),       .at_senddmaa (at_senddmaa),
       .at_sendpios (at_sendpios),     .at_senddmas (at_senddmas),
       .at_sendbista (at_sendbista),   .at_senddata (at_senddata),
       .lk_txfsmidle (lk_txfsmidle),   .lk_txerror (lk_txerror),
       .r2t_waittxid (r2t_waittxid),   .r2t_rxempty (r2t_rxempty),
       .txtimeout (txtimeout),         .expire (expire),
       .tptx_reset (tptx_reset));

    // State Register
    v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

    /*
     * Counter and its MUX tree to select the count limit
     * for the generation of the expire signal
     */
    v_mux2e #(`log_maxfis) regpio_mux (num_regpio,
       cur_state[`B_DTPIOSTUP], `NDFISREGm1, `NDFISPIOSm1);
    v_mux2e #(`log_maxfis) dmabis_mux (num_dmabis,
       cur_state[`B_DTXMITBIS], `NBFISDMASm1, `NBFISBISTAm1);
    v_mux2e #(`log_maxfis) cntlmt_mux (count_limit,
       (cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP]),
       num_regpio, num_dmabis);

    assign count_enable = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
       cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP];
    v_countN #(`log_maxfis) expire_count (
       .count_out (wcount),
       .count_enable (count_enable),
       .clk (txclk4x),
       .reset (tptx_reset | expire));

    v_comparator #(`log_maxfis) expire_cmp (count_full, wcount, count_limit);
    assign expire = count_full & count_enable;

    /*
     * Output Logic for generating output signals
     */
    assign tp_acksendreg  = cur_state[`B_DTREGFIS];
    assign tp_acksenddmaa = cur_state[`B_DTDMAACT];
    assign tp_acksendpios = cur_state[`B_DTPIOSTUP];
    assign tp_acksenddmas = cur_state[`B_DTDMASTUP];
    assign tp_acksendbist = cur_state[`B_DTXMITBIS];
    assign tp_acksenddata = cur_state[`B_DTDATAFIS];

    assign tp_sendndfis = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
       cur_state[`B_DTDMASTUP] | cur_state[`B_DTDMAACT] |
       cur_state[`B_DTXMITBIS];

    /*** Debug 5/7/2001: may need to fix the logic later for this ***/
    assign tp_senddafis = cur_state[`B_DTDATAFIS] | cur_state[`B_DTDATATX];

    /*** Debug 5/18/2001: need to fix the logic later ***/
    assign tp_partial = 1'b0;
    assign tp_slumber = 1'b0;
    assign tp_spdsel  = 1'b0;

    assign txfsmidle = cur_state[`B_DTTXIDLE];
    assign txokrxgo  = cur_state[`B_DTCHKTYP];

    assign wrtxfifo = ~txfull & (tp_sendndfis | tp_senddafis);

    assign sendreg   = cur_state[`B_DTREGFIS];
    assign senddmaa  = cur_state[`B_DTDMAACT];
    assign sendpios  = cur_state[`B_DTPIOSTUP];
    assign senddmas  = cur_state[`B_DTDMASTUP];
    assign sendbista = cur_state[`B_DTXMITBIS];
    assign senddata  = cur_state[`B_DTDATAFIS];
       
endmodule // dtrans_txctl

/****************************************************************************
 * Module dtrans_txfsm: Random logic for the transmit finite state machine
 ****************************************************************************/
module dtrans_txfsm (
    // Outputs
    next_state,

    // Inputs
    cur_state,
    at_sendreg,
    at_senddmaa,
    at_sendpios,
    at_senddmas,
    at_sendbista,
    at_senddata,
    lk_txfsmidle,
    lk_txerror,
    r2t_waittxid,
    r2t_rxempty,
    txtimeout,
    expire,
    tptx_reset);

    output [`num_dttxfsm-1:0]  next_state;

    input [`num_dttxfsm-1:0]   cur_state;

    /*
     * Inputs from the Parallel ATA Interface (dataif.v)
     */
    input              at_sendreg;
    input              at_senddmaa;
    input              at_sendpios;
    input              at_senddmas;
    input              at_sendbista;
    input              at_senddata;

    /*
     * Inputs from the Link Layer (link.v)
     */
    input              lk_txfsmidle;   // TX FSM has returned to IDLE
    input              lk_txerror;     // Error in transmitting a FIS

    /*
     * Inputs from the Device Transport Layer Receive Engine (dtrans_rx.v)
     */
    input              r2t_waittxid;   // RX FSM waiting for TX FSM to be idle
    input              r2t_rxempty;    // RX FIFO is empty

    /*
     * Inputs from the Device Transport Layer Transmit Datapath (dtrans_txdp.v)
     */
    input              txtimeout;      // Link layer did not empty the FIFO

    input              expire;
    input              tptx_reset;

    reg [`num_dttxfsm-1:0]     next_state;

    always @(cur_state or at_sendreg or at_senddmaa or at_sendpios or
    at_senddmas or at_sendbista or at_senddata or
    lk_txfsmidle or lk_txerror or
    r2t_waittxid or r2t_rxempty or
    txtimeout or expire or tptx_reset) begin

       if (tptx_reset) begin
           next_state = `DTTXIDLE;
       end
       else begin
           case (cur_state)
           `DTTXIDLE:
               if (~r2t_rxempty) begin
                   /*
                    * Give the receive engine higher priority
                    */
                   next_state = `DTCHKTYP;
               end
               else if (~lk_txfsmidle) begin
                   /*
                    * Do not send a new FIS until the Link Layer has
                    * finished reading the last one.
                    */
                   next_state = `DTTXIDLE;
               end
               else if (at_sendreg) begin
                   next_state = `DTREGFIS;
               end
               else if (at_sendpios) begin
                   next_state = `DTPIOSTUP;
               end
               else if (at_senddmas) begin
                   next_state = `DTDMASTUP;
               end
               else if (at_senddmaa) begin
                   next_state = `DTDMAACT;
               end
               else if (at_sendbista) begin
                   next_state = `DTXMITBIS;
               end
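               else if (at_senddata) begin
                   next_state = `DTDATAFIS;
               end
               else begin
                   /*
                    * Nothing to send: stay idle.  This final else also
                    * keeps next_state from being inferred as a latch.
                    */
                   next_state = `DTTXIDLE;
               end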

           `DTCHKTYP:  // Start the Receive Engine for receiving FIS
               if (r2t_waittxid) begin
                   /*
                    * RX FSM is done receiving
                    */ 
                   next_state = `DTTXIDLE;
               end
               else begin
                   /*
                    * RX FSM is still busy receiving
                    */ 
                   next_state = `DTCHKTYP;
               end

           `DTREGFIS:  // Send out a Device-to-Host Register FIS
               if (~expire) begin
                   next_state = `DTREGFIS;
               end
               else begin
                   next_state = `DTREGSTA;
               end

           `DTPIOSTUP: // Send out a PIO Setup FIS
               if (~expire) begin
                   next_state = `DTPIOSTUP;
               end
               else begin
                   next_state = `DTPIOSSTA; 
               end

           `DTDMASTUP: // Send out a DMA Setup FIS
               if (~expire) begin
                   next_state = `DTDMASTUP;
               end
               else begin
                   next_state = `DTDMASSTA;
               end

           `DTDMAACT:  // Send out a DMA Activate FIS (only one Dword)
               next_state = `DTDMAASTA;

           `DTXMITBIS: // Send out a BIST Activate FIS
               if (~expire) begin
                   next_state = `DTXMITBIS;
               end
               else begin
                   next_state = `DTBISSTA;
               end

           `DTDATAFIS: // Send out a DATA FIS
               next_state = `DTDATATX;

           `DTDATATX:  //*** Debug 5/7/2001: may be combined with DTDATAFIS?
               //*** Debug ***: may need signals other than expire?
               if (~expire) begin
                   next_state = `DTDATATX;
               end
               else begin
                   next_state = `DTDTXEND;
               end

           `DTDTXEND:  //*** Debug 5/7/2001: may need a better state
               next_state = `DTTXIDLE;

           `DTREGSTA:  // Check status after sending out a Register FIS
               if (~lk_txfsmidle & ~txtimeout) begin
                   next_state = `DTREGSTA;
               end
               else if (lk_txfsmidle & lk_txerror) begin
                   next_state = `DTREGFIS;     // Retry sending the FIS
               end
               else begin
                   next_state = `DTTXIDLE;
               end

           `DTPIOSSTA: // Check status after sending out a PIO Setup FIS
               if (~lk_txfsmidle & ~txtimeout) begin
                   next_state = `DTPIOSSTA;
               end
               else if (lk_txfsmidle & lk_txerror) begin
                   next_state = `DTPIOSTUP;    // Retry sending the FIS
               end
               else begin
                   next_state = `DTTXIDLE;
               end

           `DTDMASSTA: // Check status after sending the DMA Setup FIS
               if (~lk_txfsmidle & ~txtimeout) begin
                   next_state = `DTDMASSTA;
               end
               else if (lk_txfsmidle & lk_txerror) begin
                   next_state = `DTDMASTUP;    // Retry sending the FIS
               end
               else begin
                   next_state = `DTTXIDLE;
               end

           `DTDMAASTA: // Check status after sending the DMA Activate FIS
               if (~lk_txfsmidle & ~txtimeout) begin
                   next_state = `DTDMAASTA;
               end
               else if (lk_txfsmidle & lk_txerror) begin
                   next_state = `DTDMAACT;     // Retry sending the FIS
               end
               else begin
                   next_state = `DTTXIDLE;
               end

           `DTBISSTA:  // Check status after sending the BIST Activate FIS
               if (~lk_txfsmidle & ~txtimeout) begin
                   next_state = `DTBISSTA;
               end
               else if (lk_txfsmidle & lk_txerror) begin
                   next_state = `DTXMITBIS;    // Retry sending the FIS
               end
               else begin
                   next_state = `DTTXIDLE;
               end

           default: begin      // We should never be here
               next_state = `DTTXIDLE;
               $display (
               "*** Warning: Undefined DTP TX State, cur_state = %b ***",
               cur_state);
               end
           endcase
       end // End else (tptx_reset == 0)

    end // End always

endmodule // dtrans_txfsm
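
The counter, comparator, and expire logic in dtrans_txctl above can be sketched behaviorally as follows. This is a rough sketch under stated assumptions, not the library implementation: it assumes that v_countN is an up counter with a synchronous reset and that v_comparator is an equality comparator, neither of which is shown in this appendix. The module name sketch_expire and its port names are hypothetical.

```verilog
// Behavioral sketch of the "count up to a limit, then expire" logic.
// Assumptions: synchronous reset, equality compare, n-bit up counter.
module sketch_expire (expire, count, count_enable, count_limit, clk, reset);
    parameter n = 3;

    output         expire;
    output [n-1:0] count;
    input          count_enable;
    input  [n-1:0] count_limit;    // Number of words in the FIS minus one
    input          clk;
    input          reset;

    reg    [n-1:0] count;

    // expire is asserted in the cycle where the count reaches the limit
    assign expire = count_enable & (count == count_limit);

    always @(posedge clk) begin
        if (reset | expire) begin
            count <= {n{1'b0}};    // Start over for the next FIS
        end
        else if (count_enable) begin
            count <= count + 1'b1;
        end
    end
endmodule // sketch_expire
```

Under these assumptions, and with the count starting at zero, a limit of 4 keeps count_enable states active for five cycles (counts 0 through 4), which is why the limits in trans_defs.v are defined as the number of words minus one.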