CACTI 3.0 Revision

CACTI 3.0 is a tool for estimating cache area, access time, cycle time, and power consumption.  It was written by Norm Jouppi at Compaq WRL. This distribution is simply a re-engineered version of the original software package. It is free for use under the original terms of distribution.  The revision authors (Amir Roth and Vlad Petric) make no claims or restrictions other than those made by the original authors.

This page does not document the functionality implemented within CACTI 3.0.  That documentation exists as a collection of Compaq WRL technical reports.

Bug reports are welcome.

Summary of Changes

The primary modification we made is to the interface. The old interface specified the structural parameters indirectly in terms of the ABC cache parameters: total byte capacity, block size again in bytes, and associativity.  From these, CACTI computed the number of sets, tag size, etc. Our interface allows the user to specify the structural parameters directly and at bit granularity. Thus it supports the cache-style SRAM modeling of structures that aren't "caches" per se.  We have also parameterized the implementation technology and the objective function. With our new interface, CACTI will choose a cache sub-banking configuration that minimizes an arbitrarily weighted geometric combination of access time, energy consumption per access, and total area.

New Command Line Interface

The new CACTI command line interface takes a maximum of 14 parameters, but can be used with only the first 6, 9, or 11 parameters. The unspecified parameters are given default values.  The first six parameters do not have default values.  These are:

./cacti tech_size nsets assoc dbits tbits nbanks

The total data bit capacity of the structure is nsets * assoc * dbits * nbanks. Thus a structure with 512 total sets and 2 banks must be specified with nsets equal to 256.
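
The arithmetic above can be sketched in a few lines of C. The helper names sets_per_bank and data_bit_capacity are hypothetical and are not part of CACTI; they only restate the formula given in the text.

```c
#include <assert.h>

/* Hypothetical helper (not part of CACTI): the value to pass as nsets,
 * given the total number of sets in the structure and the bank count. */
static unsigned int sets_per_bank(unsigned int total_sets, unsigned int nbanks)
{
  assert(nbanks > 0 && total_sets % nbanks == 0);
  return total_sets / nbanks;
}

/* Total data bit capacity as defined in the text:
 * nsets * assoc * dbits * nbanks, where nsets is the per-bank count. */
static unsigned long long data_bit_capacity(unsigned int nsets,
                                            unsigned int assoc,
                                            unsigned int dbits,
                                            unsigned int nbanks)
{
  return (unsigned long long)nsets * assoc * dbits * nbanks;
}
```

For the example above, sets_per_bank(512, 2) gives the nsets value of 256.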

The next three fields that can be added specify the number and configuration of the ports.  The default values are given in brackets.  CACTI's hardcoded behavior (which you are free to change) is to treat all read ports as implemented using single-ended bitlines.

./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0]

The next two fields are the address and data bus width specifications.

./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0] abits[32] obits[64]

The final three fields are the objective function weights. The objective function is defined as pow(delay, dweight) * pow(power, pweight) * pow(area, aweight).

./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0] abits[32] obits[64] dweight[0.33] pweight[0.33] aweight[0.33]

New Programming Interface

The new CACTI programming interface is supplied by a single function:

void cacti_delay_power_area(const struct cacti_struct_params_t *cp,
                            struct cacti_tech_params_t *tp,
                            struct cacti_subarray_params_t *sap,
                            struct cacti_delay_power_result_t *dprp,
                            struct cacti_area_result_t *arp);

The function takes five arguments: the first two are inputs and the last three are outputs.  The first input describes the structural parameters and objective function.  The second describes the technology parameters.  The three outputs are the chosen subarray configuration, the delay and power results, and the area result.  Each structure is described in more detail below.

The first input describes the structural parameters and objective function. The first five fields describe the number and logical arrangement of the bits in the SRAM. nsets and assoc are the number of sets and the associativity; dbits and tbits are the bit widths of an individual data block and an individual block tag; nbanks is the number of banks.  The total data bit capacity of the structure is nsets * assoc * dbits * nbanks, so nsets is a per-bank count. The next four fields describe the ports; serport is the number of single-ended bitline read ports. abits and obits are the bit widths of the address and data interface buses, respectively. Finally, dweight, pweight, and aweight are the objective function weights of delay, power consumption, and area, respectively. The objective function is defined as pow(delay, dweight) * pow(power, pweight) * pow(area, aweight).

struct cacti_struct_params_t
{
  unsigned int nsets;
  unsigned int assoc;
  unsigned int dbits;
  unsigned int tbits;
  unsigned int nbanks;

  unsigned int rport;
  unsigned int wport;
  unsigned int rwport;
  unsigned int serport;

  unsigned int obits;
  unsigned int abits;

  double dweight;
  double pweight;
  double aweight;
};
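
The following sketch shows one way to fill in this structure using the command line defaults listed earlier. The helper make_struct_params is hypothetical and not part of CACTI. Setting serport equal to rport is an assumption based on the statement that all read ports are treated as single-ended. The structure definition is repeated only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Repeated from the definition above so the sketch is self-contained. */
struct cacti_struct_params_t
{
  unsigned int nsets, assoc, dbits, tbits, nbanks;
  unsigned int rport, wport, rwport, serport;
  unsigned int obits, abits;
  double dweight, pweight, aweight;
};

/* Hypothetical helper (not part of CACTI): takes the five structural
 * values and applies the documented command line defaults to the rest. */
static struct cacti_struct_params_t
make_struct_params(unsigned int nsets, unsigned int assoc,
                   unsigned int dbits, unsigned int tbits,
                   unsigned int nbanks)
{
  struct cacti_struct_params_t cp;

  assert(nsets > 0 && assoc > 0 && dbits > 0 && nbanks > 0);

  cp.nsets = nsets;
  cp.assoc = assoc;
  cp.dbits = dbits;
  cp.tbits = tbits;
  cp.nbanks = nbanks;

  cp.rwport = 1;          /* default rwport[1] */
  cp.rport = 0;           /* default rport[0] */
  cp.wport = 0;           /* default wport[0] */
  cp.serport = cp.rport;  /* assumption: every read port is single-ended */

  cp.abits = 32;          /* default abits[32] */
  cp.obits = 64;          /* default obits[64] */

  cp.dweight = 0.33;      /* default objective function weights */
  cp.pweight = 0.33;
  cp.aweight = 0.33;

  return cp;
}
```

A pointer to the resulting structure can then be passed as the first argument of cacti_delay_power_area.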

The second input describes the technology parameters. tech_size is the feature size. CACTI 3 has a "native" feature size of 0.8um, and uses scaling to approximate smaller feature sizes down to 0.10um. The caveat, of course, is that the more scaling is used, the less accurate the results. At present, vdd is not externally parameterizable, but rather is scaled from 5.0V using the technology scaling factor.

struct cacti_tech_params_t
{
  double tech_size;
  double vdd;
  double scaling_factor;
};

The first output parameter is the subarray configuration. Ndwl and Ndbl are the number of data array wordline and bitline (i.e., horizontal and vertical) subarray partitions. Nspd is the number of logical data sets aggregated per wordline. Ntwl, Ntbl, and Ntspd are the corresponding quantities for the tag array.  These are generally not useful as outputs per se.

struct cacti_subarray_params_t
{
  unsigned int Ndwl;
  unsigned int Ndbl;
  unsigned int Nspd;
  unsigned int Ntwl;
  unsigned int Ntbl;
  unsigned int Ntspd;
};

The second output structure gives the final delay and power consumption results.  The most useful fields are access_time and cycle_time for delay, and total_power_onebank and total_power_allbanks for power consumption.  The remaining fields give per-component breakdowns of delay and power consumption.  data_bitline_power and tag_bitline_power may also be useful for computing power consumption using dynamic bitline activity factors.

struct cacti_delay_power_result_t
{
  int muxover;

  double access_time;
  double cycle_time;
  double senseext_scale;

  double total_power_onebank;
  double total_power_allbanks;
  double total_power_allbanks_norouting;
  double total_address_routing_power;

  double subbank_address_routing_delay;
  double subbank_address_routing_power;
 
  /* data-side decoder */
  double data_decoder_delay;
  double data_decoder_driver_delay;
  double data_decoder_3to8_delay;
  double data_decoder_inv_delay;
  double data_decoder_power;
  int data_decoder_nor_inputs;

  /* data-side wordline */
  double data_wordline_delay;
  double data_wordline_power;

  /* data-side bitlines */
  double data_bitline_delay;
  double data_bitline_power;
 
  /* data-side senseamps */
  double data_senseamp_delay;
  double data_senseamp_power;

  /* data-side output driver */
  double data_output_delay;
  double data_output_power;

  /* data-side total (all banks) output driver */
  double data_total_output_delay;
  double data_total_output_power;

  /* tag-side decoder */
  double tag_decoder_delay;
  double tag_decoder_driver_delay;
  double tag_decoder_3to8_delay;
  double tag_decoder_inv_delay;
  double tag_decoder_power;
  int tag_decoder_nor_inputs;

  /* tag-side wordline */
  double tag_wordline_delay;
  double tag_wordline_power;

  /* tag-side bitlines */
  double tag_bitline_delay;
  double tag_bitline_power;
 
  /* tag-side senseamps */
  double tag_senseamp_delay;
  double tag_senseamp_power;

  double tag_compare_delay;
  double tag_compare_power;
 
  double mux_driver_delay;
  double mux_driver_power;
 
  double selb_driver_delay;
  double selb_driver_power;
 
  double valid_driver_delay;
  double valid_driver_power;
 
  double precharge_delay;
};
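
One possible use of the bitline power fields is sketched below. It assumes, without confirmation from the CACTI sources, that total_power_onebank includes data_bitline_power and tag_bitline_power as additive terms, and that an activity factor between 0 and 1 scales only the bitline contribution. The function name activity_scaled_power and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not part of CACTI): rescale the bitline share of a
 * bank's power by dynamic activity factors in the range [0, 1].
 * Assumption: total_power already contains both bitline terms additively. */
static double activity_scaled_power(double total_power,
                                    double data_bitline_power,
                                    double tag_bitline_power,
                                    double data_activity,
                                    double tag_activity)
{
  assert(data_activity >= 0.0 && data_activity <= 1.0);
  assert(tag_activity >= 0.0 && tag_activity <= 1.0);

  /* Subtract the part of each bitline term that is not exercised. */
  return total_power
         - (1.0 - data_activity) * data_bitline_power
         - (1.0 - tag_activity) * tag_bitline_power;
}
```

With both activity factors set to 1, the function returns the unmodified total.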

The final output is the area result.  This structure is a collection of hw_t structures, from which area is calculated using the function hw2area, which scales the "raw" area parameters height and width using the technology scaling factor. The useful fields here are bank_area and bank_efficiency, total_area and total_efficiency, where efficiency is defined as the fraction of the total area taken by the data and tag areas. Breakdowns are provided if more detail is necessary.

struct hw_t
{
   double height;
   double width;
};

struct cacti_area_result_t
{
  struct hw_t data_mem_hw;
  struct hw_t data_subarray_hw;
  struct hw_t data_subblock_hw;
  struct hw_t data_array_hw;

  struct hw_t data_predecode_hw;
  struct hw_t data_colmux_predecode_hw;
  struct hw_t data_colmux_postdecode_hw;
  struct hw_t data_write_sig_hw;
  double data_aspect_ratio;
  double data_area;
  double data_mem_all_area;
  double data_subarray_all_area;

  struct hw_t tag_mem_hw;
  struct hw_t tag_subarray_hw;
  struct hw_t tag_subblock_hw;
  struct hw_t tag_array_hw;

  struct hw_t tag_predecode_hw;
  struct hw_t tag_colmux_predecode_hw;
  struct hw_t tag_colmux_postdecode_hw;
  struct hw_t tag_outdrv_decode_hw;
  struct hw_t tag_outdrv_sig_hw;
  double tag_aspect_ratio;
  double tag_area;
  double tag_mem_all_area;
  double tag_subarray_all_area;

  double bank_area;
  double bank_efficiency;
  double bank_aspect_ratio;

  struct hw_t total_hw;
  double total_area;
  double total_efficiency;
  double total_aspect_ratio;
};
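
The efficiency definition above can be illustrated with a short sketch. The helpers raw_area and area_efficiency are hypothetical and not part of CACTI. In particular, raw_area does not apply the technology scaling that hw2area performs, and which fields CACTI itself uses when computing efficiency is not stated here. The hw_t definition is repeated only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Repeated from the definition above so the sketch is self-contained. */
struct hw_t
{
  double height;
  double width;
};

/* Hypothetical helper: unscaled area of a height/width pair.
 * Unlike hw2area, no technology scaling factor is applied. */
static double raw_area(const struct hw_t *hw)
{
  return hw->height * hw->width;
}

/* Efficiency as defined in the text: the fraction of the total area
 * taken by the data and tag areas. */
static double area_efficiency(double data_area, double tag_area,
                              double total_area)
{
  assert(total_area > 0.0);
  return (data_area + tag_area) / total_area;
}
```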