CACTI 3.0 is a tool for estimating cache area, access time, cycle time, and power consumption. It was written by Norm Jouppi at Compaq WRL. This distribution is simply a software re-engineered version of the original package. It is free for use under the original terms of distribution. The revision authors (Amir Roth and Vlad Petric) make no claims or restrictions other than those made by the original authors.
This page does not document the functionality implemented within CACTI 3.0. That documentation exists as a collection of Compaq WRL technical reports.
Bug reports are welcome.
./cacti tech_size nsets assoc dbits tbits nbanks
The total data bit capacity of the structure is nsets * assoc * dbits * nbanks. Thus a structure with 512 total sets and 2 banks must be specified with nsets equal to 256.
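As a quick check of the capacity arithmetic above, here is a minimal C sketch. The helper names total_data_bits and sets_per_bank are hypothetical and are not part of CACTI; they only restate the formula nsets * assoc * dbits * nbanks and the per-bank convention for nsets.

```c
#include <assert.h>

/* Hypothetical helper (not part of CACTI): total data capacity in bits,
 * computed as nsets * assoc * dbits * nbanks, with nsets given per bank. */
unsigned long long total_data_bits(unsigned int nsets, unsigned int assoc,
                                   unsigned int dbits, unsigned int nbanks)
{
    return (unsigned long long)nsets * assoc * dbits * nbanks;
}

/* Hypothetical helper: convert a total set count into the per-bank
 * nsets value that the command line expects. */
unsigned int sets_per_bank(unsigned int total_sets, unsigned int nbanks)
{
    assert(nbanks > 0 && total_sets % nbanks == 0);
    return total_sets / nbanks;
}
```

For the example in the text, sets_per_bank(512, 2) gives the nsets value of 256 to pass on the command line.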
The next three fields that can be added specify the number and configuration of the ports. The default values are given in brackets. CACTI's hardcoded behavior (which you are free to change) is to treat all read ports as implemented using single-ended bitlines.
./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0]
The next two fields are the address and data bus width specifications.
./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0] abits[32] obits[64]
The final three fields are the objective function weights. The objective function is defined as pow(delay, dweight) * pow(power, pweight) * pow(area, aweight).
./cacti tech_size nsets assoc dbits tbits nbanks rwport[1] rport[0] wport[0] abits[32] obits[64] dweight[0.33] pweight[0.33] aweight[0.33]
New Programming Interface
The new CACTI programming interface is supplied by a single function:
void cacti_delay_power_area(const struct cacti_struct_params_t *cp,
                            struct cacti_tech_params_t *tp,
                            struct cacti_subarray_params_t *sap,
                            struct cacti_delay_power_result_t *dprp,
                            struct cacti_area_result_t *arp);
The function takes five arguments: the first two are inputs and the last three are outputs. The first input describes the structural parameters and objective function; the second describes the technology parameters. The three outputs are the chosen subarray configuration, the delay and power results, and the area result. Each structure is described in more detail below.
The first input describes the structural parameters and objective function. The first four parameters describe the number and logical arrangement of the bits in the SRAM. nsets and assoc are self-explanatory; dbits and tbits are the bit widths of an individual data block and an individual block tag. nbanks is the number of banks; it matters because the total data bit capacity of the structure is nsets * assoc * dbits * nbanks. The port parameters follow: rport, wport, rwport, and serport, where serport refers to single-ended bitline read ports. abits and obits are the bit widths of the address and data interface buses. Finally, dweight, pweight, and aweight are the objective function weights of delay, power consumption, and area, respectively. The objective function is defined as pow(delay, dweight) * pow(power, pweight) * pow(area, aweight).
struct cacti_struct_params_t
{
    unsigned int nsets;
    unsigned int assoc;
    unsigned int dbits;
    unsigned int tbits;
    unsigned int nbanks;
    unsigned int rport;
    unsigned int wport;
    unsigned int rwport;
    unsigned int serport;
    unsigned int obits;
    unsigned int abits;
    double dweight;
    double pweight;
    double aweight;
};
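The following sketch shows one way to fill in this structure. The struct definition is repeated only so the block compiles on its own; drop it when including the real CACTI header. The helper example_struct_params is hypothetical. Its defaults mirror the bracketed command-line defaults listed earlier, and setting serport to 0 is an assumption, since no default for it is documented here.

```c
#include <assert.h>

/* Repeated from the definition above so this sketch is self-contained. */
struct cacti_struct_params_t
{
    unsigned int nsets, assoc, dbits, tbits, nbanks;
    unsigned int rport, wport, rwport, serport;
    unsigned int obits, abits;
    double dweight, pweight, aweight;
};

/* Hypothetical helper (not part of CACTI): build a parameter block from
 * the five mandatory fields, using the command-line defaults elsewhere. */
struct cacti_struct_params_t example_struct_params(unsigned int nsets,
                                                   unsigned int assoc,
                                                   unsigned int dbits,
                                                   unsigned int tbits,
                                                   unsigned int nbanks)
{
    struct cacti_struct_params_t cp = {0};

    assert(nsets > 0 && assoc > 0 && dbits > 0 && nbanks > 0);
    cp.nsets = nsets;
    cp.assoc = assoc;
    cp.dbits = dbits;
    cp.tbits = tbits;
    cp.nbanks = nbanks;
    cp.rwport = 1;      /* rwport[1] */
    cp.rport = 0;       /* rport[0] */
    cp.wport = 0;       /* wport[0] */
    cp.serport = 0;     /* assumption: no documented default */
    cp.abits = 32;      /* abits[32] */
    cp.obits = 64;      /* obits[64] */
    cp.dweight = 0.33;  /* dweight[0.33] */
    cp.pweight = 0.33;  /* pweight[0.33] */
    cp.aweight = 0.33;  /* aweight[0.33] */
    return cp;
}
```

A pointer to the resulting structure would then be passed as the first argument of cacti_delay_power_area.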
The second input describes the technology parameters. tech_size is the feature size. CACTI 3 has a "native" feature size of 0.8um, and uses scaling to approximate smaller feature sizes all the way to 0.10um. The caveat, of course, is that the more scaling is used, the less accurate the results become. At present, vdd is not externally parameterizable; it is scaled from 5.0V using the technology scaling factor.
struct cacti_tech_params_t
{
    double tech_size;
    double vdd;
    double scaling_factor;
};
The first output parameter is the subarray configuration. Ndwl and Ndbl are the number of data array wordline and bitline (i.e., horizontal and vertical) subarray partitions. Nspd is the number of logical data sets aggregated per wordline. Ntwl, Ntbl, and Ntspd are the corresponding quantities for the tag array. These are generally not useful as outputs per se.
struct cacti_subarray_params_t
{
    unsigned int Ndwl;
    unsigned int Ndbl;
    unsigned int Nspd;
    unsigned int Ntwl;
    unsigned int Ntbl;
    unsigned int Ntspd;
};
The second output structure gives the final delay and power consumption results. The most useful fields are access_time and cycle_time for delay, and total_power_onebank and total_power_allbanks for power consumption. The remaining fields break delay and power consumption down by component. data_bitline_power and tag_bitline_power may also be useful for computing power consumption using dynamic bitline activity factors.
struct cacti_delay_power_result_t
{
    int muxover;
    double access_time;
    double cycle_time;
    double senseext_scale;
    double total_power_onebank;
    double total_power_allbanks;
    double total_power_allbanks_norouting;
    double total_address_routing_power;
    double subbank_address_routing_delay;
    double subbank_address_routing_power;

    /* data-side decoder */
    double data_decoder_delay;
    double data_decoder_driver_delay;
    double data_decoder_3to8_delay;
    double data_decoder_inv_delay;
    double data_decoder_power;
    int data_decoder_nor_inputs;

    /* data-side wordline */
    double data_wordline_delay;
    double data_wordline_power;

    /* data-side bitlines */
    double data_bitline_delay;
    double data_bitline_power;

    /* data-side senseamps */
    double data_senseamp_delay;
    double data_senseamp_power;

    /* data-side output driver */
    double data_output_delay;
    double data_output_power;

    /* data-side total (all banks) output driver */
    double data_total_output_delay;
    double data_total_output_power;

    /* tag-side decoder */
    double tag_decoder_delay;
    double tag_decoder_driver_delay;
    double tag_decoder_3to8_delay;
    double tag_decoder_inv_delay;
    double tag_decoder_power;
    int tag_decoder_nor_inputs;

    /* tag-side wordline */
    double tag_wordline_delay;
    double tag_wordline_power;

    /* tag-side bitlines */
    double tag_bitline_delay;
    double tag_bitline_power;

    /* tag-side senseamps */
    double tag_senseamp_delay;
    double tag_senseamp_power;

    double tag_compare_delay;
    double tag_compare_power;
    double mux_driver_delay;
    double mux_driver_power;
    double selb_driver_delay;
    double selb_driver_power;
    double valid_driver_delay;
    double valid_driver_power;
    double precharge_delay;
};
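As an illustration of how the bitline power fields might be combined with an activity factor, here is a rough C sketch. It rests on an assumption that is not confirmed by this page: that the reported total power includes data_bitline_power and tag_bitline_power as additive components. The function name and the activity parameter are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not part of CACTI). Scales only the bitline share
 * of a power figure by an activity factor in [0, 1].
 * Assumption: total_power contains both bitline components additively. */
double bitline_adjusted_power(double total_power,
                              double data_bitline_power,
                              double tag_bitline_power,
                              double activity)
{
    double bitline = data_bitline_power + tag_bitline_power;

    assert(activity >= 0.0 && activity <= 1.0);
    /* Non-bitline components are left unchanged. */
    return (total_power - bitline) + activity * bitline;
}
```

With an activity factor of 1.0 the function returns the total power unchanged; smaller factors reduce only the bitline portion.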
The final output is the area result. This structure is a collection of hw_t structures, from which area is calculated using the function hw2area, which scales the "raw" area parameters height and width using the technology scaling factor. The useful fields here are bank_area and bank_efficiency, total_area and total_efficiency, where efficiency is defined as the fraction of the total area taken by the data and tag areas. Breakdowns are provided if more detail is necessary.
struct hw_t
{
    double height;
    double width;
};

struct cacti_area_result_t
{
    struct hw_t data_mem_hw;
    struct hw_t data_subarray_hw;
    struct hw_t data_subblock_hw;
    struct hw_t data_array_hw;
    struct hw_t data_predecode_hw;
    struct hw_t data_colmux_predecode_hw;
    struct hw_t data_colmux_postdecode_hw;
    struct hw_t data_write_sig_hw;
    double data_aspect_ratio;
    double data_area;
    double data_mem_all_area;
    double data_subarray_all_area;

    struct hw_t tag_mem_hw;
    struct hw_t tag_subarray_hw;
    struct hw_t tag_subblock_hw;
    struct hw_t tag_array_hw;
    struct hw_t tag_predecode_hw;
    struct hw_t tag_colmux_predecode_hw;
    struct hw_t tag_colmux_postdecode_hw;
    struct hw_t tag_outdrv_decode_hw;
    struct hw_t tag_outdrv_sig_hw;
    double tag_aspect_ratio;
    double tag_area;
    double tag_mem_all_area;
    double tag_subarray_all_area;

    double bank_area;
    double bank_efficiency;
    double bank_aspect_ratio;

    struct hw_t total_hw;
    double total_area;
    double total_efficiency;
    double total_aspect_ratio;
};
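To make the area and efficiency definitions concrete, here is a small C sketch. Both function names are hypothetical, and this is not the hw2area implementation. The scaling rule is an assumption: it treats scaling_factor as the ratio of the native feature size to the target feature size and divides the raw area by its square. The efficiency helper simply restates the definition given above.

```c
#include <assert.h>

/* Repeated from the definition above so this sketch is self-contained. */
struct hw_t
{
    double height;
    double width;
};

/* Hypothetical stand-in for an area computation (not hw2area itself).
 * Assumption: scaling_factor = native size / target size, so area
 * shrinks with the square of the factor. */
double scaled_area_sketch(struct hw_t hw, double scaling_factor)
{
    assert(scaling_factor > 0.0);
    return (hw.height * hw.width) / (scaling_factor * scaling_factor);
}

/* Efficiency as defined in the text: the fraction of the total area
 * taken by the data and tag areas. */
double area_efficiency_sketch(double data_area, double tag_area,
                              double total_area)
{
    assert(total_area > 0.0);
    return (data_area + tag_area) / total_area;
}
```

In practice the bank_efficiency and total_efficiency fields already hold these fractions, so the helper is useful mainly for cross-checking the breakdown fields.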