MiniRISC™ CW400X Building Blocks Technical Manual
This document contains data derived from functional simulations and performance estimates. LSI Logic has not verified either the functional descriptions or the electrical and mechanical specifications using production parts.

Document DB14-000022-00, First Edition (September 1996)

This document describes revision A of LSI Logic Corporation's MiniRISC™ CW400X Building Blocks and will remain the official reference source for all revisions/releases of this product until rescinded by an update.

To receive product literature, call us at 1-800-574-4286 (or 415-940-6877 outside the U.S. and Canada) and ask for Department JDS; or visit us at http://www.lsilogic.com.

LSI Logic Corporation reserves the right to make changes to any products herein at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except as expressly agreed to in writing by LSI Logic; nor does the purchase or use of a product from LSI Logic convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Logic or third parties.

Copyright © 1996 by LSI Logic Corporation. All rights reserved.

LSI Logic Corporation claims copyright in all original works of authorship herein, including, without limitation, modifications to the MIPS instruction set and the instruction set as modified. Unauthorized use and/or copying thereof is expressly forbidden.

Trademark Acknowledgment

LSI Logic logo design and CoreWare are registered trademarks and MiniRISC, MiniSIM, and Right-First-Time are trademarks of LSI Logic Corporation. MIPS is a trademark of MIPS Technologies, Inc. Verilog is a registered trademark of Cadence Design Systems, Inc. All other brand and product names may be trademarks of their respective companies.
Preface

This book is the primary reference and technical manual for the MiniRISC™ CW400X Building Blocks. It contains a complete functional and operational description of the building blocks. For layout guidelines, see Chapter 8 of the MiniRISC CW400X Microprocessor Core Technical Manual. For complete physical and electrical specifications, see the MiniRISC CW400X Building Blocks Technical Manual Specifications Addendum.

Audience

This document assumes that you have some familiarity with microprocessors and related support devices. This book is written for:

  - Engineers and managers who are evaluating the processor for possible use in a system
  - Engineers who are designing the processor into a system

Organization

This document has the following chapters and appendixes:

  - Chapter 1, Introduction
  - Chapter 2, Adding a Coprocessor
  - Chapter 3, Multiply/Divide Unit (MDU)
  - Chapter 4, Memory Management Unit (MMU)
  - Chapter 5, Basic BIU and Cache Controller (BBCC)
  - Chapter 6, Adding or Removing Write Buffers
  - Chapter 7, Timer
  - Chapter 8, Debugger (DBX)
Related Publications

  - MiniRISC™ CW400X Building Blocks Technical Manual Specifications Addendum, Order No. C14031.A1
  - MiniRISC™ CW400X Microprocessor Core Technical Manual, Order No. C14030
  - MiniRISC™ CW400X Microprocessor Core Technical Manual CW4001 and CW4002 Specifications Addendum, Order No. C14030.AD1
  - MiniRISC™ CW4001 Microprocessor Core Preliminary Datasheet, Order No. C15001
  - MiniRISC™ MR4001 Microprocessor Lead Vehicle Preliminary Datasheet, Order No. C15005
  - MiniRISC™ MR4002 Microprocessor Reference Device Preliminary Datasheet, Order No. C15008

Conventions Used in This Manual

The first time a word or phrase is defined in this manual, it is italicized.

The following signal naming conventions are used throughout this manual:

  - A level-significant signal that is true or valid when the signal is low always has an overbar over its name.
  - An edge-significant signal that initiates actions on a high-to-low transition always has an overbar over its name.

The word assert means to drive a signal true or active. The word deassert means to drive a signal false or inactive.

Hexadecimal numbers are indicated by the prefix 0x before the number; for example, 0x32CF. Binary numbers are indicated by a subscripted 2 following the number; for example, 0011.0010.1100.1111 (subscript 2).
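As a quick check of the two number notations, the following Python sketch confirms that the hexadecimal and binary examples given in the conventions denote the same value. Python is used here only for illustration, and the helper name `parse_dotted_binary` is hypothetical; neither comes from the manual.

```python
def parse_dotted_binary(text: str) -> int:
    """Parse a binary number written in 4-bit groups separated by dots.

    Hypothetical helper for illustration; the dots are only visual
    separators and carry no numeric meaning.
    """
    return int(text.replace(".", ""), 2)


value = parse_dotted_binary("0011.0010.1100.1111")
print(hex(value))
```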
Contents

Chapter 1  Introduction
  1.1  System Overview
  1.2  CW400X Features Summary
  1.3  MiniRISC Product Family
  1.4  CoreWare Program
    1.4.1  CoreWare Building Blocks
    1.4.2  Design Environment
    1.4.3  Expert Support

Chapter 2  Adding a Coprocessor
  2.1  Overview
  2.2  Connection Block Diagram
  2.3  Signals
  2.4  Instructions
  2.5  Read/Write Transactions
    2.5.1  CW400X Register Writes from Coprocessor (cfcz, mfcz)
    2.5.2  Coprocessor Register Writes from Memory (lwcz)
    2.5.3  Coprocessor Register Writes from CW400X (ctcz, mtcz)
    2.5.4  Memory Writes from Coprocessor to External Memory (swcz)
  2.6  Condition Bits
  2.7  Interrupt Protocol
  2.8  Instruction Cancellation
  2.9  Global Instruction Register Module (GIR)

Chapter 3  Multiply/Divide Unit (MDU)
  3.1  Overview
  3.2  Architecture
  3.3  Connection Block Diagram
  3.4  Signals
  3.5  Instructions
  3.6  Operation
    3.6.1  mult Followed by mfhi/lo
    3.6.2  div Followed by a mflo
    3.6.3  mult Followed by madd/msub
    3.6.4  div Followed by madd/msub
    3.6.5  Destructive MDU Instructions
    3.6.6  Effect of ckillxp on MDU Operations
    3.6.7  Effect of cpipe_runn on MDU Operations
    3.6.8  Effect of bcpuresetn

Chapter 4  Memory Management Unit (MMU)
  4.1  Overview
  4.2  Function and Operation
  4.3  MMU Modules
    4.3.1  TLB
    4.3.2  MMU Stub
    4.3.3  GIR
  4.4  Signals
  4.5  TLB Registers
    4.5.1  TLB Exception Processing Registers
    4.5.2  Other TLB Registers
  4.6  Address Translation Functional Waveform
  4.7  TLB Exceptions
    4.7.1  TLB Miss Exception
    4.7.2  TLB Modified Exception
    4.7.3  UTLB Miss Exception
  4.8  Differences from the R3000 MMU
    4.8.1  Writing and Reading MMU Registers
    4.8.2  Unique Features of the MiniRISC MMU
  4.9  Operation Peculiarities and Details
    4.9.1  mtc0 X2/IF TLB Miss Peculiarities
    4.9.2  Exceptions Between mtc0 EntryHi/EntryLo Instructions
    4.9.3  Register Consistency in the Pipeline
    4.9.4  TLB Initialization

Chapter 5  Basic BIU and Cache Controller (BBCC)
  5.1  Overview
  5.2  Features
  5.3  Functional Description
    5.3.1  Cache Controller (CC)
    5.3.2  Queue Controller (QC)
    5.3.3  BBus Controller (BC)
    5.3.4  System Configuration Module (BSYS)
  5.4  Signals
  5.5  Interfaces
    5.5.1  CBus
    5.5.2  Basic Bus (BBus)
    5.5.3  Caches
    5.5.4  On-Chip Memory (OCM)
    5.5.5  Write Buffer
  5.6  Cache-Miss Penalty, BBus Latency
  5.7  Adding Cache
    5.7.1  RAM Sizes
    5.7.2  Examples
    5.7.3  Adding Smaller Caches
  5.8  BBus Arbitration
  5.9  Timing Considerations
    5.9.1  Cache Data RAM Clocks
    5.9.2  Cache Data RAM Address
    5.9.3  Tag Match Logic

Chapter 6  Adding or Removing Write Buffers
  6.1  Overview
  6.2  Signals
  6.3  Basic Operation of a Write Buffer
  6.4  Adding a Write Buffer
    6.4.1  Connect the Inputs
    6.4.2  Connect the Outputs
  6.5  Removing a Write Buffer

Chapter 7  Timer
  7.1  Overview
  7.2  Features
  7.3  Functional Description
  7.4  Signals
  7.5  Registers
  7.6  Operation
    7.6.1  Reset
    7.6.2  Bus Control (Request/Grant)
    7.6.3  External Logic Half-Speed Mode
    7.6.4  Timer 0
    7.6.5  Timer 1

Chapter 8  Debugger (DBX)
  8.1  Overview
  8.2  Functional Description
  8.3  Connection Block Diagram
  8.4  Signals
  8.5  Registers
    8.5.1  DCS Register (7)
    8.5.2  BPC Register (18)
    8.5.3  BDA Register (19)
    8.5.4  BPCM Register (20)
    8.5.5  BDAM Register (21)
  8.6  Instructions
    8.6.1  mfd Instruction
    8.6.2  mtd Instruction
  8.7  Operation
    8.7.1  Breakpoints
    8.7.2  DBX Module Operation
    8.7.3  Clock Synchronization

Customer Feedback
Figures

  1.1   CW400X in a Typical System
  2.1   Typical Pipeline Flow
  2.2   Coprocessor in a CW400X System
  2.3   cfcz, mfcz Waveforms
  2.4   Load Scheduling lwcz Waveforms
  2.5   Non-Load Scheduling lwcz Waveforms
  2.6   mtcz/ctcz Waveforms
  2.7   Coprocessor Sending Interrupt
  2.8   Instruction Cancellation
  2.9   Global Instruction Register Logic
  2.10  Instruction Grabbing
  3.1   MDU Architecture
  3.2   Attaching the MDU
  3.3   mfhi, mflo
  3.4   mthi, mtlo
  3.5   mult(u), div(u), madd(u), msub(u)
  3.6   mult Followed by mfhi/lo
  3.7   div Followed by mflo
  3.8   mult Followed by madd/msub
  3.9   Effect of ckillxp on a mult Operation
  3.10  Effect of ckillxp on a mflo/hi Operation
  3.11  Effect of cpipe_runn on a mflo/hi Operation
  4.1   Virtual to Physical Address Mapping
  4.2   CAM Entry Matching Virtual Address
  4.3   TLB Miss Exception Conditions
  4.4   Index Register
  4.5   Random Register
  4.6   Context Register
  4.7   Bad Virtual Address Register
  4.8   TLB EntryHi Register
  4.9   TLB EntryLo Register
  4.10  MMU Address Translation
  4.11  mtc0 Inconsistency Pipeline
  4.12  WB Exception Followed by an IF Exception
  5.1   CW400X System with the BBCC
  5.2   BBCC Internal Block Diagram
  5.3   System Configuration Register
  5.4   CBus Transactions
  5.5   Bus Error During Instruction Fetch
  5.6   Bus Error During Data Transaction
  5.7   BBus Transactions
  5.8   Block Fetch with Four-Word Block Size
  5.9   Series of Burst Writes
  5.10  Examples of Bus Arbitration
  5.11  Hardware Cache Test Transactions
  5.12  I-Cache and D-Cache Snooping
  5.13  Default Driver Logic
  5.14  Default Driver Logic in System
  5.15  I-Cache Set 0/D-Cache Data RAM
  5.16  I-Cache Set 0/D-Cache Tag RAM
  5.17  I-Cache Set 1 Data RAM
  5.18  I-Cache Set 1 Tag RAM
  5.19  Normal I-Cache Transactions
  5.20  D-Cache Transactions
  5.21  Tag RAM Read Data
  5.22  Normal BBus Cache Transactions
  5.23  OCM Transactions
  5.24  Cache-Miss Penalty, BBus Latency
  5.25  Cache RAMs for a System
  5.26  Example 2 RAM Configuration
  5.27  Example 3 RAM Configuration
  5.28  Example 4 RAM Configuration
  5.29  Block Diagram of Example Arbiter
  5.30  Example BBus Arbiter State Diagram
  5.31  Conceptual System Diagram
  5.32  Writes to the D-Cache/I-Cache Set 0 Data RAM
  5.33  RAM Transactions (Clock with a 50% Duty Cycle)
  5.34  RAM Transactions (Clock with a 30% Duty Cycle)
  6.1   Typical Write Buffer Configuration
  6.2   Write Buffer Connection Diagram
  7.1   CW400X System with the Timer
  7.2   Timer Internal Block Diagram
  7.3   Mode Register
  7.4   Interrupt Status Register
  7.5   Half-Speed Mode
  7.6   Timer 0 Enabled, Read, and Disabled
  7.7   Timer 0 Output
  7.8   Timer 1 Enabled, Read, and Disabled
  7.9   Timer 1 Watch Dog Mode Triggers berr
  8.1   DBX Interface to the CW400X and Building Blocks
  8.2   DBX in a CW400X System
  8.3   DCS Register
  8.4   BPC Register
  8.5   BDA Register
  8.6   BPCM Register
  8.7   BDAM Register
  8.8   mfd Instruction
  8.9   mtd Instruction
  8.10  DBX Internal Block Diagram

Tables

  2.1   Coprocessor Input Signals Summary
  2.2   Coprocessor Output Signals Summary
  2.3   Coprocessor Bidirectional Signals Summary
  2.4   Coprocessor Instructions
  3.1   MDU Input Signals Summary
  3.2   MDU Output Signals Summary
  3.3   MDU Instructions
  3.4   Multiply/Divide Instruction Summary
  3.5   Execution Time of Multiply/Divide Instructions Using MDU
  4.1   MMU Input Signals Summary
  4.2   MMU Output Signals Summary
  4.3   MMU Bidirectional Signals Summary
  4.4   TLB Exception Processing Register Addresses
  5.1   BBCC Input Signals Summary
  5.2   BBCC Output Signals Summary
  5.3   BBCC Bidirectional Signals Summary
  5.4   Transaction Type Signal Decoding
  5.5   Transaction Type Signal Decoding Simplified
  5.6   Example Data Output Enable Logic
  5.7   System Configuration Register Settings for Software Cache Test Mode
  5.8   Direct Mapped I-Cache
  5.9   Two-Way Set Associative I-Cache
  5.10  D-Cache
  5.11  Direct-Mapped I-Cache with D-Cache
  5.12  Two-Way Set Associative I-Cache with D-Cache
  5.13  D-Cache/I-Cache Set 0 Data RAM
  5.14  D-Cache/I-Cache Set 0 Tag RAM
  5.15  I-Cache Set 1 Data RAM
  5.16  I-Cache Set 1 Tag RAM
  5.17  D-Cache/I-Cache Set 0 Data RAM
  5.18  D-Cache/I-Cache Set 0 Tag RAM
  5.19  I-Cache Set 1 Data RAM
  5.20  I-Cache Set 1 Tag RAM
  5.21  D-Cache/I-Cache Data RAM
  5.22  D-Cache/I-Cache Set 0 Tag RAM
  5.23  Cache Data RAM
  5.24  D-Cache/I-Cache Set 0 Tag RAM
  6.1   Write Buffer Input Signals Summary
  6.2   Write Buffer Output Signals Summary
  6.3   Write Buffer Operation for Reads and Writes
  7.1   Timer Input Signals Summary
  7.2   Timer Output Signals Summary
  7.3   BBCC Bidirectional Signals Summary
  7.4   Timer Register Addresses
  8.1   DBX Input Signals Summary
  8.2   DBX Output Signals Summary
  8.3   Register Selection
Chapter 1
Introduction

This chapter introduces LSI Logic's MiniRISC CW400X Building Blocks. Chapter 2 describes how to attach a coprocessor to the CW400X core, while Chapters 3 through 8 describe the building blocks in detail. For building block methodologies and layout guidelines, see Chapter 8 of the MiniRISC CW400X Microprocessor Core Technical Manual.

This chapter contains the following sections:

  - Section 1.1, System Overview
  - Section 1.2, CW400X Features Summary
  - Section 1.3, MiniRISC Product Family
  - Section 1.4, CoreWare Program

1.1 System Overview

The MiniRISC CW400X microprocessor cores, components of the LSI Logic CoreWare® library, are exceptionally compact, high-performance microprocessors compatible with the MIPS R4000 MIPS-II instruction set (for details, see Chapter 4 of the MiniRISC CW400X Microprocessor Core Technical Manual). The CW400X can be easily designed into a wide range of products. It can be combined with industry-standard cores and proprietary functional building blocks to create a completely customized embedded system on a chip.

LSI Logic currently provides the following optional building blocks:

  - Multiply/Divide Unit (MDU)
  - Memory Management Unit (MMU)
  - Basic Bus Interface Unit and Cache Controller (BBCC)
  - Write Buffers
  - Timer
  - Debugger (DBX)
System designers can use these building blocks (unmodified or modified) and/or add their own customized logic to the CW400X core.

LSI Logic also provides the following external modules (for more information, see Chapter 6 of the MiniRISC CW400X Microprocessor Core Technical Manual):

  - Global Output Enable module (GOE)
  - MMU Stub (to be used if there is no MMU in the system)

The CW400X has been optimized for low-power and cost-sensitive applications such as portable telecommunications, digital cameras, and consumer multimedia systems.

The CW400X's FlexLink interface allows customer-specific microprocessor instructions. The core implements a simple three-stage pipeline and provides a single cache/memory interface for both instructions and data. The core implements full scan to achieve greater than 99% fault coverage.

Figure 1.1 shows how the CW400X microprocessor core interfaces with system logic in a typical customer design.

Figure 1.1 CW400X in a Typical System
[Figure: block diagram. Recoverable labels: CW400X, MMU or coprocessor, CBus, RAM/ROM, cache, DRAM, DMA, timer, BIU and cache controller (BBCC), write buffer(s), MDU, BBus, CBus interface, controller, FlexLink interface, GOE, MMU stub.]
1.2 CW400X Features Summary

The CW400X has the following features:

  - MIPS-II instruction set compatible
  - Configurable, compact, modular design and unified bus architecture
  - Simple three-stage pipeline: fetch, execute, and writeback
  - waiti (wait for interrupt) instruction for power savings
  - Powerful FlexLink interface allows customer-specific microprocessor instructions
  - High-performance coprocessor interface for user-definable coprocessors and high-performance hardware floating-point unit
  - 32-bit memory and cache interfaces
  - Optional building blocks: timer, MMU, MDU, BBCC
  - 3.3-volt operation
  - Implementation of full scan to achieve 99% fault coverage
  - 85-MHz worst-case commercial maximum clock rate using high-performance 0.35-micron process
  - 85 MIPS peak, 64 MIPS sustained with standard compiled MIPS code at 85 MHz
  - Performance and software development, VHDL, Verilog, and gate-level, timing-accurate models available
  - Compatible with the full range of MIPS, third-party software development, and system verification environment tools
  - Fully testable in embedded ASIC designs
  - MR4001 lead vehicle chip available with cache, MMU, and MDU

1.3 MiniRISC Product Family

The MiniRISC product family has all the necessary tools to develop a system on a chip, including:

  - LSI Logic's MiniSIM™ architectural simulator
  - Verilog and VHDL models
  - A system verification environment
  - A PROM monitor
  - Third-party software support
  - A core bond-out chip for emulation
1.4 CoreWare Program

Through the CoreWare program, LSI Logic lets customers combine the CW400X microprocessor core with other cores on a single chip to create products uniquely suited to specific applications. This approach, which combines high-performance building blocks, sophisticated design software, and expert support, provides unparalleled design flexibility and lets designers create high-quality, leading-edge products for a wide range of markets.

The CoreWare program consists of three main elements: a library of cores, a design development and simulation package, and expert applications support. The CoreWare library contains a wide range of complex cores based on accepted and emerging industry standards, including high-speed interconnection, digital video, DSP, and others. LSI Logic provides a complete framework for device and system development and simulation. LSI Logic's advanced ASIC technologies consistently produce Right-First-Time™ silicon. LSI Logic's in-house experts provide design support from system architecture definition through chip layout and test vector generation.

1.4.1 CoreWare Building Blocks

The CoreWare building blocks include elements based on the LSI Logic high-performance standard products as well as other industry-standard products. The CoreWare building blocks, which include embedded MiniRISC MIPS processors, bus interface controllers, and a family of floating-point processors, are fully supported library elements for use in the LSI Logic hardware development environment. The building blocks include gate-level simulation models with timing information, so designers can accurately simulate device performance and trade off various implementation options. In addition to gate-level simulation models, the building blocks also include behavioral simulation models.

1.4.2 Design Environment

The new ASIC families are supported by LSI Logic's comprehensive system-on-a-chip design methodology.
This design methodology uses both internally developed and industry-standard tools integrated with the LSI Toolkit. LSI Toolkit is a system of software and libraries that lets engineers use third-party software to access LSI Logic's technology. Designers can select from a suite of industry-standard simulators, synthesizers, timing analyzers, and test tools that are seamlessly integrated into a common environment for verification and sign-off.
1.4.3 Expert Support

LSI Logic's in-house experts support the CoreWare program with high-level design experience in a wide variety of application areas. These experts provide design support from system architecture definition through chip layout and test vector generation. They help determine how many functions to integrate on a single chip, trading off functionality versus cost to find the most cost-effective solution.
Chapter 2
Adding a Coprocessor

This chapter explains how to integrate a customer-defined coprocessor with the MiniRISC CW400X microprocessor core. An example of a customer-defined coprocessor might be a floating-point unit or graphics processor.

This chapter contains the following sections:

  - Section 2.1, Overview
  - Section 2.2, Connection Block Diagram
  - Section 2.3, Signals
  - Section 2.4, Instructions
  - Section 2.5, Read/Write Transactions
  - Section 2.6, Condition Bits
  - Section 2.7, Interrupt Protocol
  - Section 2.8, Instruction Cancellation
  - Section 2.9, Global Instruction Register Module (GIR)

2.1 Overview

A coprocessor is a user-defined, external sub-module to the CW400X. Since a coprocessor is not a stand-alone unit, it can either pass data through the CW400X or pass it directly to and from external memory.

A CW400X coprocessor can:

  - Read the data bus for instructions
  - Read/write to external memory (load/store)
  - Read/write to CW400X registers (mtcz/mfcz)
  - Stall and interrupt the CW400X
The system designer must adhere to four guidelines to correctly implement an interface between the CW400X microprocessor core and a coprocessor.

1. The coprocessor must be compatible with the CW400X three-stage pipeline. Figure 2.1 shows a typical pipeline flow of instruction fetches and data transactions. For more information, see Section 2.3, Pipeline Architecture, of the MiniRISC CW400X Microprocessor Core Technical Manual. The coprocessor should execute instructions in lock-step with the CW400X (whatever instructions are in the instruction fetch (IF)/execute (X)/writeback (WB) stages in a specific cycle in the CW400X should be the same ones in the coprocessor's IF/X/WB stages).

   Figure 2.1 Typical Pipeline Flow
   [Figure: pipeline stage sequences (IF, X1, X2, WB) for three example instructions: 1. add; 2. load/store/mtcz/mfcz; 3. ori.]

2. The system designer must implement a defined set of signals between the CW400X and the coprocessor. These signals allow the coprocessor to perform the four transaction types listed above.

3. The coprocessor must conform to a specific signal protocol when performing the four transactions listed above.

4. The user must include two modules, in addition to the coprocessor, as part of the CW400X-coprocessor interface:
   - GOE (Global Output Enable module), which is external to the coprocessor
   - GIR (Global Instruction Register module), which is internal to the coprocessor

For more information on the GOE, see Chapter 6 of the MiniRISC CW400X Microprocessor Core Technical Manual. For more information on the GIR, see Section 2.9, Global Instruction Register Module (GIR).
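The lock-step requirement in guideline 1 can be pictured with a small behavioral model. The Python sketch below is illustrative only: the class `ShadowPipeline`, its fields, and the instruction strings are hypothetical, the X1/X2 split and instruction kills are ignored, and `run=True` stands for a cycle in which the run signal is asserted. It shows a coprocessor-side tracker that advances only on run cycles, so it holds the same instructions in IF/X/WB as the core.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ShadowPipeline:
    """Hypothetical three-stage (IF, X, WB) instruction tracker."""
    if_stage: Optional[str] = None
    x_stage: Optional[str] = None
    wb_stage: Optional[str] = None

    def step(self, fetched: Optional[str], run: bool) -> None:
        """Advance one clock; on a stall cycle (run is False) all stages hold."""
        if not run:
            return
        self.wb_stage = self.x_stage
        self.x_stage = self.if_stage
        self.if_stage = fetched

    def stages(self) -> Tuple[Optional[str], Optional[str], Optional[str]]:
        return (self.if_stage, self.x_stage, self.wb_stage)


# Core and coprocessor see the same fetch stream and the same run/stall signal.
core, cop = ShadowPipeline(), ShadowPipeline()
trace = [("add", True), ("lwc1", True), ("ori", False), ("ori", True)]
for instr, run in trace:
    core.step(instr, run)
    cop.step(instr, run)

print(cop.stages())
```

Because both trackers update from identical inputs, any divergence between them in a real design would indicate that the coprocessor missed a stall or a fetch.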
2.2 Connection Block Diagram

Figure 2.2 shows how the coprocessor connects to the CW400X microprocessor core and a BIU. Notice that the required interface modules, the GOE and the GIR, are included. LSI Logic recommends that the GIR be integrated into the coprocessor and that the GOE remain external to the coprocessor.

Figure 2.2 Coprocessor in a CW400X System
[Figure: block diagram connecting the CW400X, coprocessor x (containing the GIR), the GOE, and the BIU. Signals shown: grun_outxp (note 3), ckillxp, bintpz (note 1), copexistpx (note 2), bcpcondp[3:0], addrp[31:0], datap[31:0], cop_runn, copxoen, birdyp, bresetn, bdrdyp, pclkp, cip_dn, se, si, so.]
  1. z = 3, 4, or 5. Coprocessor 1 connects to bintp3, coprocessor 2 connects to bintp4, coprocessor 3 connects to bintp5.
  2. x = 1, 2, or 3.
  3. x = 1 or 2.

2.3 Signals

This section describes the signals that comprise the bit-level interface of a coprocessor. Tables 2.1 through 2.3 summarize the coprocessor signals. Detailed descriptions follow the tables.

The signals are described in alphabetical order by mnemonic. Each signal definition contains the mnemonic and the full signal name. The mnemonics for active-low signals end with an n and have an overbar over their names. In the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.

Table 2.1 Coprocessor Input Signals Summary

  Input          Source         Description
  addrp[31:0]    CW400X         Address bus
  bresetn        BIU            Coprocessor reset
  bdrdyp         BIU            Load word ready - data on data bus
  birdyp         BIU            Instruction ready
  cip_dn         CW400X         CW400X instruction/data indication
  ckillxp        CW400X         Cancel instruction in execute stage
  copxoen        GOE            Coprocessor enable (x = 1, 2, or 3)
  cop_runn       GOE            Coprocessor run (note 1)
  pclkp          System logic   System clock
  se             System logic   Scan enable
  si             Scan chain     Scan data in

  1. Since the GOE does not have a cop_runn signal, the user must connect cop_runn to either crun_inn, brun_inn, or mrun_inn. LSI Logic recommends using the least loaded of the three.

Table 2.2 Coprocessor Output Signals Summary

  Output          Destination   Description
  bcpcondp[3:0]   CW400X        Coprocessor condition
  bintpz          CW400X        Interrupts (note 1) (z = 0, 1, 2, 3, 4, or 5)
  copexistpx      GOE           Coprocessor exists (x = 1, 2, or 3)
  grun_outxp      GOE           Coprocessor run (x = 1 or 2)
  so              Scan chain    Scan data out

  1. Coprocessor 1 connects to bintp3, coprocessor 2 connects to bintp4, coprocessor 3 connects to bintp5 to maintain MIPS compatibility.

Table 2.3 Coprocessor Bidirectional Signals Summary

  Bidirectional   Connect   Description
  datap[31:0]     CW400X    Data bus
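The interrupt wiring convention noted under Table 2.2 (coprocessor 1 to bintp3, coprocessor 2 to bintp4, coprocessor 3 to bintp5) amounts to a fixed offset of two. A minimal Python sketch of that mapping follows; the function name `interrupt_line_for_coprocessor` is hypothetical and serves only to make the rule explicit.

```python
def interrupt_line_for_coprocessor(z: int) -> str:
    """Return the CW400X interrupt input used by coprocessor z.

    Hypothetical helper: encodes the MIPS-compatible wiring rule
    stated in the manual (coprocessor z drives bintp[z + 2]).
    """
    if z not in (1, 2, 3):
        raise ValueError("coprocessor number must be 1, 2, or 3")
    return f"bintp{z + 2}"


for z in (1, 2, 3):
    print(z, interrupt_line_for_coprocessor(z))
```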
addrp[31:0]     Address Bus                                      Input
    The CW400X drives these signals with the instruction or data address.

bcpcondp[3:0]   Coprocessor Condition                            Output
    These signals inform the CW400X of the corresponding coprocessor condition. bcpcondp[3:0] correspond to coprocessors 3, 2, 1, 0. The CW400X tests these signals during the execute stage of bczf, bczfl, bczt, and bcztl instructions.

bdrdyp          Load Word Ready - Data on Data Bus               Input
    The BIU asserts this signal to inform the coprocessor that datap[31:0] contains valid data for a data fetch.

bintpz          Interrupt (z = 0, 1, 2, 3, 4, or 5)              Output
    The coprocessor asserts this signal to cause the CW400X to take an interrupt exception when interrupts are enabled. To maintain MIPS compatibility, coprocessor 1 connects to bintp3, coprocessor 2 connects to bintp4, and coprocessor 3 connects to bintp5.

birdyp          Instruction Ready                                Input
    The BIU asserts this signal to inform the coprocessor that datap[31:0] contains valid data for an instruction fetch.

bresetn         Coprocessor Reset                                Input
    The BIU asserts this signal to reset the coprocessor. (If the BBCC is present, the BBCC bcpuresetn output drives this input.)

cip_dn          CW400X Instruction/Data Indication               Input
    This signal qualifies the type of memory fetch. The CW400X drives this signal high to indicate that it is performing an instruction fetch. The CW400X drives this signal low to indicate that it is performing a data fetch.

ckillxp         Cancel Instruction in Execute Stage              Input
    After the CW400X kills an instruction in the execute stage, it asserts this signal to cause the coprocessor to also kill the instruction in the execute stage.

copexistpx      Coprocessor Exists (x = 1, 2, or 3)              Output
    The coprocessor asserts this signal to inform the GOE that it is coprocessor x.
cop_runn        Coprocessor Run                                  Input
    The GOE asserts this signal to inform the coprocessor that it can continue running. The GOE deasserts this signal to stall the coprocessor.

copxoen         Coprocessor Enable (x = 1, 2, or 3)              Input
    This signal from the GOE indicates which coprocessor (1, 2, or 3) drives the data bus.

datap[31:0]     Data Bus                                         Bidirectional
    These signals transfer data to and from the CW400X.

grun_outxp      Coprocessor Run (x = 1 or 2)                     Output
    This output is an input to the GOE. Deasserting this signal (driving it low) stalls the CW400X.

pclkp           System Clock                                     Input
    This signal is the global clock input.

se              Scan Enable                                      Input
    Asserting this signal enables the scan chain.

si              Scan Data In                                     Input
    This signal is the scan data input.

so              Scan Data Out                                    Output
    This signal is the scan data output.

2.4 Instructions

Table 2.4 summarizes the predefined instructions that the CW400X supports for coprocessors 1 through 3. For more detailed descriptions, see Section 4.10, Coprocessor Instructions, of the MiniRISC CW400X Microprocessor Core Technical Manual.

The user can define new coprocessor instructions, as long as they adhere to the opcode bit encoding defined in Chapter 4 of the MiniRISC CW400X Microprocessor Core Technical Manual.
Table 2.4 Coprocessor Instructions

  Instruction   Description
  bczt          Branch on coprocessor z true (note 1)
  bczf          Branch on coprocessor z false
  bcztl         Branch on coprocessor z true likely
  bczfl         Branch on coprocessor z false likely
  ctcz          Move control to coprocessor z
  cfcz          Move control from coprocessor z
  lwcz          Load word to coprocessor z
  mtcz          Move to coprocessor z
  mfcz          Move from coprocessor z
  swcz          Store word from coprocessor z

  1. z = 1, 2, or 3.

2.5 Read/Write Transactions

This section explains how the CW400X sends data to and receives data from an attached coprocessor. It also provides a functional description and waveform for each coprocessor transaction type.

The coprocessor must decode its own instructions off the data bus, datap[31:0], and know when to read and write datap[31:0]. There are several guidelines to observe, and several signals to monitor, when performing these transactions.

Control signals are valid during run cycles (cop_runn asserted), and pipe stages are extended by stalls. The GOE asserts the run/stall signal, cop_runn, for every bus cycle, including the first cycle of the X2 stage. (For a detailed description of the pipeline stages and run/stall cycles, see Chapter 2 of the MiniRISC CW400X Microprocessor Core Technical Manual.)

The coprocessor should also monitor the ckillxp signal to determine when to invalidate an instruction. The coprocessor should use ckillxp to prevent writes (mtcz/ctcz/lwcz) to coprocessor registers, so that a cancelled instruction does not alter the coprocessor state.

Using ckillxp properly in load invalidation is crucial to maintaining coprocessor integrity. If ckillxp is asserted in either the X1 or X2 stage, the load is not scheduled and is thus invalidated. If ckillxp is asserted in the X1 stage, the X2 stage does not occur. However, once the load passes the X2 stage, it has been scheduled and cannot be invalidated.
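Since the coprocessor has to recognize its own instructions on datap[31:0], a decoder for the predefined instructions in Table 2.4 is a natural first block. The Python sketch below is a minimal illustration that assumes the standard MIPS-II field encodings (COPz, LWCz, and SWCz major opcodes with z in the low two opcode bits, and the rs/rt sub-fields for moves and branches). The manual defers the actual bit encodings to Chapter 4 of the core manual, which remains authoritative; the function name and the returned mnemonic strings are hypothetical.

```python
from typing import Optional

# Assumed standard MIPS-II encodings (not taken from this manual).
_COP_RS = {0b00000: "mfc", 0b00010: "cfc", 0b00100: "mtc", 0b00110: "ctc"}
_BC_RT = {0b00000: "f", 0b00001: "t", 0b00010: "fl", 0b00011: "tl"}


def decode_coprocessor_instruction(word: int) -> Optional[str]:
    """Return a mnemonic such as 'mfc1' or 'lwc2', or None if not recognized."""
    opcode = (word >> 26) & 0x3F
    z = opcode & 0x3          # coprocessor number in the low opcode bits
    major = opcode >> 2       # upper four opcode bits select the class
    if z == 0:
        return None           # this sketch covers coprocessors 1 through 3 only
    if major == 0b1100:
        return f"lwc{z}"
    if major == 0b1110:
        return f"swc{z}"
    if major != 0b0100:
        return None           # not a coprocessor instruction
    rs = (word >> 21) & 0x1F
    if rs in _COP_RS:
        return f"{_COP_RS[rs]}{z}"
    if rs == 0b01000:         # branch on coprocessor condition
        rt = (word >> 16) & 0x1F
        suffix = _BC_RT.get(rt)
        return f"bc{z}{suffix}" if suffix is not None else None
    return None               # coprocessor operation or user-defined encoding


print(decode_coprocessor_instruction(0x44022000))
```

Encodings that fall through to `None` inside the COPz class are where user-defined coprocessor instructions would be handled in a real decoder.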
2.5.1 CW400X Register Writes from Coprocessor (cfcz, mfcz)

The data for cfcz or mfcz should be available on the data bus during the second execute (X2) stage of the instruction whenever copxoen is asserted. If the data is not ready to be driven onto the data bus, the coprocessor should stall the CW400X by deasserting grun_outxp. When the coprocessor does drive the data bus (copxoen asserted), the data should be valid on the rising edge of the clock. This cycle (X2 stage) is a CW400X register write, so the CW400X latches the data from the data bus. The coprocessor must monitor the appropriate copxoen to know when to drive the data bus.

Figure 2.3 cfcz, mfcz Waveforms
[Figure: timing diagram. Signals: pclkp, datap[31:0] (driven alternately by the BIU and the coprocessor), cop_runn, copxoen. Pipeline stages IF, X1, X2, WB are shown for two cases.]
  1. mfcz and cfcz transaction with no stall in execute stage.
  2. mfcz and cfcz transaction with stall in execute stage.

2.5.2 Coprocessor Register Writes from Memory (lwcz)

The CW400X coprocessor interface supports two mechanisms for reads from external memory (loads): load scheduling and non-load scheduling.

While the CW400X waits for load data from external memory, it continues to execute instructions until a data dependency occurs, and then it stalls. A coprocessor that supports load scheduling should emulate this behavior; that is, it should continue to execute instructions and stall only if there is a data dependency. Assertion of bdrdyp informs the coprocessor that the lwcz has fetched the data and that the data is ready on the data bus. The coprocessor should immediately write the data on the bus back to the coprocessor register file to preserve the pipeline.

If the coprocessor does not support load scheduling, it should signal a stall in the execute stage by deasserting grun_outxp in the X2 stage. The CW400X waits until the load data is ready (the BIU asserts bdrdyp and the coprocessor asserts grun_outxp) before resuming execution. When bdrdyp is asserted, the coprocessor should latch the load data from the data bus and write it into its register file in the next run cycle (WB).
figure 2.4 shows load scheduling lwcz waveforms. figure 2.5 shows non-load scheduling lwcz waveforms.

figure 2.4 load scheduling lwcz waveforms
[waveform not reproduced. signals shown: pclkp, bdrdyp, datap[31:0], cop_runn, with a cache miss followed by the scheduled load data.]

figure 2.5 non-load scheduling lwcz waveforms
[waveform not reproduced. signals shown: pclkp, grun_outp, cop_runn, datap[31:0], bdrdyp.]

2.5.3 coprocessor register writes from CW400X (ctcz, mtcz)

the data for mtcz or ctcz is always ready on the data bus during the last cycle of the instruction's x2 stage. thus, the data latched in the final cycle of the x2 stage is the valid data. the data driven in the earlier cycles of the x2 stage should be disregarded. if the coprocessor requires more than one cycle to latch the data, or is not ready to latch the data, it should deassert grun_outxp to stall the CW400X. figure 2.6 shows mtcz/ctcz waveforms.

figure 2.6 mtcz/ctcz waveforms
[waveform not reproduced. signals shown: pclkp, datap[31:0], cip_dn, cop_runn.]
2.5.4 memory writes from coprocessor to external memory (swcz)

the data for the write to memory should be ready in the x2 stage of the store (swcz). if it is not, the coprocessor should stall the pipeline by deasserting grun_outxp. the coprocessor should always drive the data bus with the data for the store when the appropriate copxoen is asserted during the x2 stage. see figure 2.3 for the waveform, substituting swcz for cfcz/mfcz as the instruction being executed.

2.6 condition bits

the CW400X supports the branch on coprocessor condition for coprocessors 0 through 3. the instructions bczt/l and bczf/l branch depending on the state of the condition bit for coprocessor z. the coprocessor can set these bits in any way. the coprocessor usability bit must be set in the CW400X status register for the coprocessor instructions relating to the condition bits to function properly.

2.7 interrupt protocol

the coprocessor can request an interrupt by asserting one of the bintp[5:0] inputs to the CW400X, causing an interrupt exception. the coprocessor should keep the interrupt asserted until the CW400X writes a clear interrupt value to the coprocessor's control register. asserting one of the bintp[5:0] inputs for a single run cycle is enough to make the CW400X take an exception; however, to be compatible with the mips architecture, the coprocessor must hold the interrupt high until it is cleared by the CW400X.

the coprocessor should signal the interrupt to the CW400X in the execute stage of the instruction causing the interrupt. the CW400X asserts ckillxp in the same run cycle, informing the coprocessor to cancel the instruction in the execute stage.

figure 2.7 shows a waveform of a coprocessor sending an interrupt. note that ckillxp is asserted during the x stage of two instructions. the first instruction is the one that caused the interrupt. the second is the one following the interrupt.

for more details regarding the instruction cancellation mechanism, see chapter 5 of the minirisc CW400X microprocessor core technical manual.
figure 2.7 coprocessor sending interrupt
[waveform not reproduced. signals shown: cop_runn, pclkp, bintpz (1), ckillxp.]

1. z = 3, 4, or 5. coprocessor 1 connects to bintp3, coprocessor 2 connects to bintp4, coprocessor 3 connects to bintp5.

2.8 instruction cancellation

when an exception occurs, the CW400X indicates to the coprocessor which instructions to cancel by asserting ckillxp at the appropriate time. the duration of the instruction cancellation signal (ckillxp) depends on when the exception was signalled (in the if, x, or wb stage).

the example in figure 2.8 shows an exception signalled in the execute stage of the exceptional instruction. the CW400X asserts ckillxp for two run cycles to cancel the execute stage of the exceptional instruction and the following instruction in the pipeline, which is currently in the if stage.

figure 2.8 instruction cancellation
[waveform not reproduced. signals shown: cop_runn, pclkp, ckillxp.]

2.9 global instruction register module (gir)

all coprocessors are required to include a global instruction register (gir) as the interface between the coprocessor decode and the data bus (see figure 2.9). this module allows the coprocessor to latch the correct instruction off the data bus for a decode. the instruction bus latching uses a master-slave circuit. the instruction is latched in the master at the rising edge of the clock cycle when birdyp is high. the instruction is then latched into the slave and made available to the decode at the rising edge of the clock cycle in the next run cycle.

figure 2.10 shows a typical transaction. the instruction is on the data bus at point a, and is latched in by the master flip-flop. it is gated by the slave only in the instruction's x stage, point b. the slave gate logic is complicated because load scheduling must be taken into account (data from a scheduled load can be asserted on the bus at any time).

figure 2.9 global instruction register logic
[schematic not reproduced. signals shown: datap[31:0], birdyp, pclkp, cip_dn, cop_runn, gscan_enablep; 32-bit output to decode.]

figure 2.10 instruction grabbing
[waveform not reproduced. signals shown: pclkp, birdyp, datap[31:0], cip_dn, cop_runn. point a falls in the if stage and point b in the x stage.]
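The master-slave latching in the gir can be illustrated with a simplified cycle-level model. This is a rough sketch under stated assumptions, not the actual gir logic: the class and method names are hypothetical, the cip_dn and scan inputs are omitted, and the slave gating is reduced to "update on a run cycle".

```python
# Simplified, hypothetical model of the gir master-slave latching.
# Omits cip_dn, gscan_enablep, and the real slave gating logic.

class GirModel:
    def __init__(self):
        self.master = 0  # instruction captured off datap[31:0]
        self.slave = 0   # instruction presented to the decode

    def clock(self, datap, birdyp, run):
        """Model one rising edge of pclkp and return the slave output."""
        # The slave samples the value the master held before this edge.
        if run:
            self.slave = self.master
        # The master captures the bus only when birdyp flags an instruction.
        if birdyp:
            self.master = datap & 0xFFFFFFFF
        return self.slave
```

In this model, data that appears on the bus while birdyp is low (for example, data from a scheduled load) never reaches the decode, which is the property the gir is meant to guarantee.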
chapter 3 multiply/divide unit (mdu)

this chapter describes the minirisc multiply/divide unit (mdu) building block for the CW400X. this chapter contains the following sections:

- section 3.1, overview
- section 3.2, architecture
- section 3.3, connection block diagram
- section 3.4, signals
- section 3.5, instructions
- section 3.6, operation

3.1 overview

the multiply/divide unit (mdu) is a high-performance arithmetic engine, closely coupled to the minirisc CW400X microprocessor core through the flexlink interface. it supports the following arithmetic functions:

- 32-bit x 32-bit signed and unsigned integer multiplication, with a 2-cycle latency
- 32-bit x 32-bit signed and unsigned integer multiplication-accumulation, with a 2-cycle latency and a throughput of one multiply-accumulate per cycle
- 32-bit signed and unsigned division, with a 34-cycle latency for the quotient and a 35-cycle latency for the remainder
- mfhi, mflo, mthi, mtlo instructions for moving data between the CW400X and the mdu

the mdu generates hardware interlocks if the CW400X tries to read from the mdu's hi or lo registers before the mdu has written the results of any previous mdu operations to the hi and lo registers.
multiplication is a full 32-bit by 32-bit operation. the mdu takes the operands from crsp[31:0] and crtp[31:0], multiplies them, and writes the higher 32 bits of the result into the hi register, and the lower 32 bits into the lo register. it takes two cycles to complete a multiplication.

multiplication-accumulation is similar to multiplication, except that, instead of writing the result back to the hi/lo registers, the mdu adds the multiplication result to (madd), or subtracts it from (msub), the previous result in the hi/lo registers. a multiplication-accumulation takes two cycles to complete. the mdu supports a throughput of one instruction per cycle.

for division, the mdu takes crsp[31:0] as the dividend and crtp[31:0] as the divisor. after performing the division, the mdu puts the quotient into the lo register, and the remainder into the hi register. the latency for the quotient is 34 cycles, and for the remainder it is 35 cycles.

based on the latency and throughput of the various operations, the mdu does not generate a hardware interlock for the following pieces of code:

    mult(u) or madd(u) or msub(u)
    nop
    mfhi or mflo
    ------------------------------
    mult(u) or madd(u) or msub(u)
    madd(u) or msub(u) (1)
    madd(u) or msub(u)
    nop
    mfhi or mflo
    ------------------------------
    div(u)
    33 nops
    mflo
    mfhi

no overflow exceptions occur for any multiplications, multiplication-accumulations, or divisions. the mdu has no hardware detection for division by zero; such a division still takes 34/35 cycles to complete, and leaves undefined values in the hi/lo registers.

1. note that madd or msub adds its result to that of the immediately preceding instruction even though the preceding mdu instruction has not completed when this madd or msub is executed.
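The hi/lo arithmetic described above can be summarized in a short functional model. It is a sketch only: the class and method names are hypothetical, timing is ignored, and signed division is left out because the rounding convention is not stated here. Division by zero raises an error in the model, whereas the hardware simply leaves undefined values in hi/lo.

```python
# Hypothetical functional model of the mdu hi/lo results (no timing).
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a twos complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

class MduModel:
    def __init__(self):
        self.hi = 0
        self.lo = 0

    def _set64(self, value):
        value &= MASK64                  # wrap to 64 bits, no overflow exception
        self.hi = (value >> 32) & MASK32
        self.lo = value & MASK32

    def _product(self, rs, rt, signed):
        if signed:
            return to_signed32(rs) * to_signed32(rt)
        return (rs & MASK32) * (rt & MASK32)

    def mult(self, rs, rt, signed=True):
        self._set64(self._product(rs, rt, signed))

    def madd(self, rs, rt, signed=True):
        acc = (self.hi << 32) | self.lo  # hi/lo used as one 64-bit register
        self._set64(acc + self._product(rs, rt, signed))

    def msub(self, rs, rt, signed=True):
        acc = (self.hi << 32) | self.lo
        self._set64(acc - self._product(rs, rt, signed))

    def divu(self, rs, rt):
        if (rt & MASK32) == 0:
            raise ValueError("hi/lo are undefined after division by zero")
        self.lo = (rs & MASK32) // (rt & MASK32)  # quotient
        self.hi = (rs & MASK32) % (rt & MASK32)   # remainder
```

For example, `mult(2, 5)` followed by `madd(3, 4)` leaves 22 in lo and 0 in hi, because the second product is accumulated rather than overwriting the first.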
3.2 architecture

logically, the mdu consists of two parts: the control unit, and the datapath (which has a unified multiply and divide unit). this configuration allows the multiplier to share resources with the divider. figure 3.1 illustrates the high-level structure of the mdu.

figure 3.1 mdu architecture
[block diagram not reproduced. blocks shown: control unit; datapath with operand flip-flops for crsp[31:0] and crtp[31:0], logic (1), set-up for the mult array, booth recoding and mult array, 64-bit adder, hi and lo registers, and bypass to axbusp[31:0].]

1. power-saving logic which blocks the operands for non-mdu instructions.
the control unit decodes the instructions from cir_topp[5:0] and cir_botp[5:0], and asserts aselp if it decodes a valid mdu instruction. it also generates interlocks between the mult/div/madd/msub and the mfhi/lo instructions if needed, and forwards the multiply/divide result to axbusp[31:0].

the multiply operation starts in the execute stage. to minimize power dissipation, the mdu loads operands off crsp[31:0] and crtp[31:0] only after decoding a valid mdu instruction. the multiply unit uses 3-bit booth recoding to encode the 32-bit multiplier. a multiply array sums partial products to produce a 64-bit sum and a carry bit. in multiplication-accumulation, the result of the multiply operation is added to, or subtracted from, the previous result contained in the hi/lo registers. the lower 32 bits of the result are loaded into the lo register, and the higher 32 bits are loaded into the hi register.

the divide unit implements a nonrestoring algorithm, and the operation starts in phase 1 of the execute stage. when the operation is completed, the mdu loads the quotient into the lo register and the remainder into the hi register.

since it takes two cycles to complete a multiplication or a multiplication-accumulation, and 34/35 cycles to complete a division, the control unit stalls the CW400X (by asserting astallp) if there is an attempt to read the hi/lo registers before an outstanding operation is finished.
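The interlock behavior can be approximated with a simple cycle counter. The sketch below is an interpretation, not the actual control logic: it assumes an instruction issued in cycle t with latency n makes its result readable in cycle t + n, which reproduces the interlock-free sequences listed in section 3.1. The function name and program format are hypothetical, and destructive instructions, div preemption, and ckillxp are not modeled.

```python
# Hypothetical stall estimator for mfhi/mflo interlocks.
# Latencies are (lo, hi) in cycles, taken from the figures quoted in the text.
LATENCY = {"mult": (2, 2), "madd": (2, 2), "msub": (2, 2), "div": (34, 35)}

def count_stalls(program):
    """program is a list of mnemonics, for example ["mult", "nop", "mfhi"]."""
    cycle = 0
    lo_ready = hi_ready = 0
    stalls = 0
    for op in program:
        base = op[:-1] if op.endswith("u") else op  # multu -> mult, divu -> div
        if op in ("mfhi", "mflo"):
            ready = hi_ready if op == "mfhi" else lo_ready
            if ready > cycle:
                # astallp would hold the pipeline until the result is ready.
                stalls += ready - cycle
                cycle = ready
        elif base in LATENCY:
            lo_lat, hi_lat = LATENCY[base]
            lo_ready = cycle + lo_lat
            hi_ready = cycle + hi_lat
        cycle += 1
    return stalls
```

With this model, `["mult", "nop", "mfhi"]` and a div followed by 33 nops, mflo, and mfhi both give zero stall cycles, while `["mult", "mfhi"]` gives one.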
3.3 connection block diagram

figure 3.2 shows how to attach the mdu to the CW400X, the building blocks, and system logic.

figure 3.2 attaching the mdu
[block diagram not reproduced. the mdu connects to the CW400X through aselp, astallp, axbusp[31:0], cir_botp[5:0], cir_topp[5:0], ckillxp, crsp[31:0], and crtp[31:0]. bcpuresetn comes from the bbcc and cpipe_runn from the goe. pclkp is the system clock and gscan_enablep is the global scan enable. gscan_inp comes from the scan chain test output of another module, and gscan_outp goes to the scan chain test input of another module. the CW400X crx_validn output is left unconnected.]

3.4 signals

this section describes the signals that comprise the bit-level interface of the mdu. tables 3.1 and 3.2 summarize the mdu signals. detailed descriptions follow the tables.

the signals are described in alphabetical order by mnemonic. each signal definition contains the mnemonic and the full signal name. the mnemonics for active low signals end with an n and have an overbar over their names; the mnemonics for active high signals end in a p. in the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.
note that the crx_validn CW400X flexlink interface signal does not connect to the mdu. this signal indicates when crsp[31:0] and crtp[31:0] are valid during stall cycles, and is intended for the implementation of arithmetic operations that write results directly to the CW400X registers. since all the mdu operations write results back to the hi/lo registers, and an mfhi/lo instruction is needed to move results back to the CW400X registers, the mdu does not use crx_validn.

table 3.1 mdu input signals summary

input            source         definition
bcpuresetn       bbcc           global reset
cir_botp[5:0]    CW400X         instruction register bottom six bits
cir_topp[5:0]    CW400X         instruction register top six bits
ckillxp          CW400X         kill instruction in execute stage
cpipe_runn       goe            CW400X pipe run
crsp[31:0]       CW400X         CW400X source register (rs) bus
crtp[31:0]       CW400X         CW400X source register (rt) bus
gscan_enablep    system logic   scan test mode enable
gscan_inp        system logic   scan test input
pclkp            system logic   system clock

table 3.2 mdu output signals summary

output           destination    definition
aselp            CW400X         mdu select
astallp          CW400X         mdu stall request
axbusp[31:0]     CW400X         mdu result bus
gscan_outp       system logic   scan test output

aselp (mdu select, output)
    the mdu asserts this signal to inform the CW400X that the current instruction is an mdu instruction. this prevents the CW400X from signalling a reserved instruction exception.

astallp (mdu stall request, output)
    the mdu asserts this signal to request a stall of the pipeline. the mdu asserts astallp if it discovers any data dependencies that prevent it from executing the upcoming mdu instruction. the CW400X might override this stall if ckillxp is also asserted, because the CW400X kills the upcoming mdu instruction.

axbusp[31:0] (mdu result bus, output)
    the mdu puts the result from the hi or lo registers onto this bus.

bcpuresetn (global reset, input)
    asserting this signal resets the mdu, and kills any outstanding mdu instructions. the contents of the hi and lo registers become unknown.

cir_botp[5:0] (bottom six bits of instruction register, input)
    these signals from the CW400X contain the bottom six bits of the instruction register. these signals allow the mdu to decode its own instructions.

cir_topp[5:0] (top six bits of instruction register, input)
    these signals from the CW400X contain the top six bits of the instruction register. these signals allow the mdu to decode its own instructions.

ckillxp (instruction killed in execute stage, input)
    the CW400X asserts this signal to request that the mdu kill the mdu instruction that is in the execute stage.

cpipe_runn (CW400X pipeline run indicator, input)
    the goe module asserts this signal to inform the mdu that the core is in a pipeline run cycle. the goe deasserts this signal to inform the mdu that the core is in a pipeline stall cycle. for more information on the goe, see the minirisc CW400X microprocessor core technical manual.

crsp[31:0] (CW400X source register (rs) bus, input)
    these signals contain the rs operand of the current instruction from the CW400X.
crtp[31:0] (CW400X source register (rt) bus, input)
    these signals contain the rt operand of the current instruction from the CW400X.

gscan_enablep (scan test mode enable, input)
    asserting this signal enables scan testing.

gscan_inp (scan test input, input)
    another module drives this signal with the scan test input.

gscan_outp (scan test output, output)
    the mdu drives this signal with the scan test output.

pclkp (system clock, input)
    this signal is the global clock input.

3.5 instructions

all mdu instructions are in one of the three formats shown in figures 3.3 through 3.5. table 3.3 lists all of the mdu instructions and their corresponding opcode bits. table 3.4 summarizes and describes the multiply/divide instructions. all constant fields below are binary.

figure 3.3 mfhi, mflo

bits 31:16          bits 15:11   bits 10:6   bits 5:0
0000000000000000    rd address   00000       mdu opcode

figure 3.4 mthi, mtlo

bits 31:26   bits 25:21   bits 20:6         bits 5:0
000000       rs address   000000000000000   mdu opcode

figure 3.5 mult(u), div(u), madd(u), msub(u)

bits 31:26   bits 25:21   bits 20:16   bits 15:6    bits 5:0
000000       rs address   rt address   0000000000   mdu opcode
table 3.3 mdu instructions

instruction   description                                                       opcode
div           divide signed numbers                                             011010
divu          divide unsigned numbers                                           011011
madd (1)      multiply, and add the result to hi/lo registers                   011100
maddu (1)     unsigned multiply, and add result to hi/lo registers              011101
mflo          move from lo register                                             010010
mfhi          move from hi register                                             010000
msub (1)      multiply, then subtract the result from hi/lo registers           011110
msubu (1)     unsigned multiply, then subtract the result from hi/lo registers  011111
mthi          move to hi register                                               010001
mtlo          move to lo register                                               010011
mult          multiply signed numbers                                           011000
multu         multiply unsigned numbers                                         011001

1. mr400x-specific instruction.
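The opcode table translates directly into a decode step. The following sketch shows how a model of the control unit might derive aselp from a 32-bit instruction word. The function name and return format are hypothetical, and the check is simplified: it tests only the top six and bottom six bits, as carried on cir_topp[5:0] and cir_botp[5:0], and does not verify the zero fields in the middle of the word.

```python
# Hypothetical decode sketch based on table 3.3 (opcode = bits 5:0).
MDU_OPCODES = {
    0b011010: "div",   0b011011: "divu",
    0b011100: "madd",  0b011101: "maddu",
    0b010010: "mflo",  0b010000: "mfhi",
    0b011110: "msub",  0b011111: "msubu",
    0b010001: "mthi",  0b010011: "mtlo",
    0b011000: "mult",  0b011001: "multu",
}

def decode_mdu(instruction):
    """Return (aselp, mnemonic) for a 32-bit instruction word."""
    cir_topp = (instruction >> 26) & 0x3F  # top six bits
    cir_botp = instruction & 0x3F          # bottom six bits
    if cir_topp == 0 and cir_botp in MDU_OPCODES:
        return True, MDU_OPCODES[cir_botp]
    return False, None

def encode_mult(rs, rt):
    """Build a mult rs, rt word following the figure 3.5 layout."""
    return ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | 0b011000
```

For instance, `decode_mdu(encode_mult(4, 5))` reports a valid mdu instruction with the mnemonic mult, while a word whose top six bits are nonzero leaves aselp deasserted.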
table 3.4 multiply/divide instruction summary (1)

multiply and add: madd rs, rt
    multiplies the contents of registers rs and rt as twos complement values. adds the 64-bit result to special registers hi and lo (2).

multiply and add unsigned: maddu rs, rt
    multiplies the contents of registers rs and rt as unsigned values. adds the 64-bit result to special registers hi and lo (2).

multiply and subtract: msub rs, rt
    multiplies the contents of registers rs and rt as twos complement values. subtracts the 64-bit result from special registers hi and lo (2).

multiply and subtract unsigned: msubu rs, rt
    multiplies the contents of registers rs and rt as unsigned values. subtracts the 64-bit result from special registers hi and lo (2).

multiply: mult rs, rt
    multiplies the contents of registers rs and rt as twos complement values. stores the 64-bit result into special registers hi and lo (2).

multiply unsigned: multu rs, rt
    multiplies the contents of registers rs and rt as unsigned values. stores the 64-bit result into special registers hi and lo (2).

divide: div rs, rt
    divides the content of register rs by the content of register rt as twos complement values. stores the 32-bit quotient into special register lo, and the 32-bit remainder into special register hi.

divide unsigned: divu rs, rt
    divides the content of register rs by the content of register rt as unsigned values. stores the 32-bit quotient into special register lo, and the 32-bit remainder into special register hi.

move from hi register: mfhi rd
    moves the content of special register hi into register rd.

move from lo register: mflo rd
    moves the content of special register lo into register rd.

move to hi register: mthi rd
    moves the content of register rd into special register hi.

move to lo register: mtlo rd
    moves the content of register rd into special register lo.

1. these instructions require the addition of a multiply/divide unit.
2. the hi and lo registers are used as one 64-bit register.
table 3.5 shows the execution time of multiply and divide instructions when using the mdu.

table 3.5 execution time of multiply/divide instructions using mdu

multiply operands   number of cycles
mult                2
madd/msub (1)       2
div                 34 (quotient), 35 (remainder)

1. with the results accumulating in the hi and lo registers. the throughput is one instruction per cycle.

3.6 operation

this section explains mdu operation and contains waveforms for each operation.

3.6.1 mult followed by mfhi/lo

figure 3.6 shows a mult followed by a mfhi/lo.

figure 3.6 mult followed by mfhi/lo
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0]/cir_botp[5:0], crsp[31:0]/crtp[31:0], aselp, astallp, axbusp[31:0], over cycles 1 through 3, with points a, b, and c marked.]

a. the mdu asserts aselp as soon as it decodes a valid instruction to prevent the CW400X from generating a reserved instruction exception.
b. the mdu asserts astallp for interlock since the CW400X tries to read the result (mfhi/lo) before it is ready.
c. both the hi and lo results are ready. the mdu writes them back into the hi/lo registers at this rising clock edge. at the same time, the mdu forwards them to axbusp[31:0] so that the CW400X can latch the result back to its register file.

3.6.2 div followed by a mflo

figure 3.7 shows a div followed by a mflo.

figure 3.7 div followed by mflo
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0]/cir_botp[5:0], crsp[31:0]/crtp[31:0], aselp, astallp, axbusp[31:0] (1), with 32 cycles of quotient computation followed by a quotient adjust cycle (point a) and a remainder adjust cycle (point b).]

1. contains the lo result.

a. the quotient result is ready, and the mdu writes it back into the lo register at this rising clock edge. at the same time, the mdu also forwards the quotient to axbusp[31:0] so that the CW400X can latch the result back to its register file.
b. the remainder result is ready, and the mdu writes it back to the hi register at this rising clock edge.
3.6.3 mult followed by madd/msub

figure 3.8 shows a mult followed by a madd/msub.

figure 3.8 mult followed by madd/msub
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0], crsp[31:0]/crtp[31:0] (1), aselp, astallp, axbusp[31:0], over cycles 1 through 5 for the sequence mult, madd, mfhi, with points a and b marked.]

1. operands for mult and madd.

a. the mdu writes the result of mult back into the hi/lo registers.
b. the mdu writes the result of madd back into the hi/lo registers, and the hi result is also forwarded to axbusp[31:0] at the same rising clock edge.

note that sequences like the following do not require mdu stalls:

    mult
    madd
    msub
    madd
    madd

the mdu accumulates each madd and msub multiplication result (added to, or subtracted from, the result of the preceding instruction) instead of overwriting it, even if the preceding instruction is not finished when the madd or msub is executed.

3.6.4 div followed by madd/msub

if any unkilled madd or msub instructions are issued during an outstanding div, the mdu does not generate any interlocks. the mdu preempts the div in the middle of its operation, executes the madd or msub, and then adds or subtracts the division result to whatever values the hi/lo registers contain when the div is preempted. therefore, the hi/lo results are unknown.
3.6.5 destructive mdu instructions

any mult, multu, div, divu, mthi, or mtlo kills outstanding mult(u), div(u), madd(u), or msub(u) instructions. therefore, for a sequence of instructions like the one below, mfhi gets the result of the mthi, and the content of the lo register is undefined.

    mult
    mult
    mthi
    mfhi

the same happens to the hi register during an mtlo.

3.6.6 effect of ckillxp on mdu operations

figure 3.9 shows the effect of ckillxp on a mult. the same waveform applies to madd, msub, and div. ckillxp kills the mult(2) instruction, which is in its execute stage. mult(1) finishes gracefully, and the mflo gets the lo result of mult(1). the mdu does not generate any interlock stalls.

figure 3.9 effect of ckillxp on a mult operation
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0], ckillxp, cpipe_runn, aselp, astallp, axbusp[31:0], over cycles 1 through 4 for the sequence mult(1), mult(2), mflo. axbusp[31:0] carries the lo result of mult(1).]
figure 3.10 shows the effect of ckillxp on a mfhi or mflo.

figure 3.10 effect of ckillxp on a mflo/hi operation
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0], ckillxp, cpipe_runn, aselp, astallp, axbusp[31:0], over cycles 1 through 3 for the sequence mult/div, mflo/hi, non-mdu instruction.]

cycle 1: a mult or div is issued.

cycle 2: a mflo or mfhi is issued. the assertion of ckillxp kills the mflo/hi; however, because of the critical timing, astallp remains asserted. the CW400X detects the assertion of ckillxp, and ignores astallp. cpipe_runn does not go high even though astallp is asserted. this example assumes that there are no other stalls (for example, stalls initiated by the CW400X itself or the coprocessors); otherwise, cpipe_runn might still go high due to other stalls.

cycle 3: the CW400X moves on to the next instruction, which is not an mdu instruction; thus, astallp and aselp are both deasserted.

3.6.7 effect of cpipe_runn on mdu operations

figure 3.11 shows the effect of cpipe_runn on an mdu operation. ckillxp kills the mult(2) instruction, which is in its execute stage. mult(1) finishes correctly, and the mflo gets the lo result of mult(1). the mdu does not generate interlock stalls.
figure 3.11 effect of cpipe_runn on a mflo/hi operation
[waveform not reproduced. signals shown: pclkp, cir_topp[5:0], crsp[31:0]/crtp[31:0], ckillxp, cpipe_runn, aselp, astallp, over cycles 1, 2, and n.]

cycle 1: a mult or div is issued.

cycle 2: another mult or div is issued. cpipe_runn goes high by the end of cycle 2. this extends the execute stage of the second mult or div, and prevents the instruction from being executed.

cycle n: cpipe_runn goes low, and the mult/div, which has been held in its execute stage, proceeds again. however, ckillxp is asserted by the end of the cycle, which kills this second mult or div. the hi/lo registers still hold the results of the first mult or div.

in conclusion, both ckillxp and cpipe_runn affect only the mdu instruction in its execute stage. cpipe_runn can extend an instruction's execute stage, and ckillxp can kill an instruction in its execute stage if it is asserted in the last execute cycle of that instruction. after an instruction has passed beyond its execute stage, neither input signal has any effect on the operation of that instruction. for example, if ckillxp or cpipe_runn goes high in the twentieth cycle of a div operation, the div keeps going without stalling or being killed.
3.6.8 effect of bcpuresetn

the mdu acknowledges the assertion of bcpuresetn within one clock cycle. however, since the CW400X requires a minimum of two clock cycles to reset correctly, lsi logic recommends asserting bcpuresetn for at least two clock cycles.

asserting bcpuresetn:

- causes outstanding mdu instructions to stall (not finish)
- resets the state machine for the divide operation
- deasserts astallp if it has been asserted
- causes the contents of the hi/lo registers to be undefined
chapter 4 memory management unit (mmu)

this chapter describes the memory management unit (mmu) building block for the CW400X. for information on the mmu stub, the logic required if the system has no mmu, see chapter 6 of the minirisc CW400X microprocessor core technical manual.

this chapter contains the following sections:

- section 4.1, overview
- section 4.2, function and operation
- section 4.3, mmu modules
- section 4.4, signals
- section 4.5, tlb registers
- section 4.6, address translation functional waveform
- section 4.7, tlb exceptions
- section 4.8, differences from the r3000 mmu
- section 4.9, operation peculiarities and details

4.1 overview

the minirisc memory management unit (mmu) translates virtual addresses from the minirisc CW400X core into physical addresses. the mmu is mips-compatible and similar in design and function to the mips r3000 mmu.

the eight-entry fully-associative translation lookaside buffer (tlb) is a cache of page table entries (pte) for the operating system. the page table contains the information for mapping processes from virtual to physical addresses. caching ptes minimizes the size of the page table structure implemented in hardware. the tlb is implemented as an 8-entry, 20-bit data content-addressable memory (cam) coupled with a 23-bit ram array custom-designed for the mmu.
4.2 function and operation

the mmu takes virtual addresses from the minirisc CW400X core and translates them to physical addresses. figure 4.1 illustrates how the mmu performs this translation by replacing the 20 most significant bits (bits [31:12]) of the virtual address (called the virtual page number, or vpn) with 20 bits from the page table (called the page table entry, or pte). the mmu reads the pte from the tlb.

figure 4.1 virtual to physical address mapping
[diagram not reproduced. the virtual address consists of the vpn in bits 31:12 and the offset in bits 11:0. the physical address consists of the pte in bits 31:12 and the same offset in bits 11:0.]

before the pte can replace the vpn, the mmu uses the tlb virtual page number cam (tlb vpn cam) to determine which of the ptes stored in the tlb to use. the tlb vpn cam is an 8 x 20 array of cam cells that can be read, written, and used in match cycles. in a match cycle, the CW400X passes a vpn from the address bus, addrp[19:0], to the cam array, which activates a match line for the entry in the tlb that matches. once this match line is driven, the cam notes a match. the match line drives the word line in the adjoining 8 x 23 pte ram, which in turn drives the 23-bit pte ram word out of the ram array and onto the mmu address bus, maddroutp[19:0].

figure 4.2 shows an example where the address in cam entry 1 matches the virtual address. the mmu drives ram entry 1 as the physical address.
figure 4.2 cam entry matching virtual address
[diagram not reproduced. it shows cam entries 0 through 7 alongside ram entries 0 through 7.]

the 20-bit pte/vpn size allows for a 12-bit offset field, which allows for up to 4 kbytes of cache with a physical address. the 23-bit word consists of the pte, the valid bit, the dirty bit, and the no-cache bit. the valid bit set to one indicates that the tlb entry is valid. the dirty bit set to one indicates that the page is writable. the no-cache bit set to one indicates that the entry is not cacheable.

in addition to the soft-mapping made possible by the tlb, the mmu performs hard-mapping. to perform hard-mapping, the mmu implements the r3000 kseg0/kseg1 memory map. when the CW400X generates an address within kseg0 or kseg1, the mmu maps it immediately to the lowest 512 mbytes of physical space, without using the tlb. if the CW400X address is in kseg1, the mmu signals that the access is not to be cached. for further details, see the mmu stub section in chapter 6 of the minirisc CW400X microprocessor core technical manual.

under the following conditions, the mmu does not soft-map the vpn to a pte:

- the mmuenp input is inactive, which disables the mmu.
- the two msbs of the address are 10 (binary), which indicates operation in kseg1 or kseg0 (defined in the mips architecture to be unmapped segments in kernel space). these segments are physically mapped by the mmu stub to the address where the two msbs are 00 (binary).
- the current cycle is not a bus run cycle.
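The two mapping paths can be sketched in a few lines. This is an illustrative model only: the function name and the tuple layout of the tlb entries are hypothetical, the valid and dirty bits are left out, and the kseg0/kseg1 ranges follow the standard mips r3000 memory map (kseg0 at 0x80000000, kseg1 at 0xa0000000), which is assumed here rather than quoted from this manual.

```python
# Hypothetical address translation sketch (no valid/dirty checks, no exceptions).
# Each tlb entry is a tuple (vpn, pte, nocache) with 20-bit vpn and pte fields.

def translate(vaddr, tlb):
    """Return (physical_address, nocache), or None when no tlb entry matches."""
    vaddr &= 0xFFFFFFFF
    if (vaddr >> 30) == 0b10:
        # kseg0 or kseg1: hard-mapped to the lowest 512 mbytes, tlb not used.
        in_kseg1 = bool(vaddr & 0x20000000)   # bit 29 distinguishes kseg1
        return vaddr & 0x1FFFFFFF, in_kseg1   # kseg1 accesses are not cached
    vpn = vaddr >> 12                         # bits 31:12
    offset = vaddr & 0xFFF                    # bits 11:0 pass through unchanged
    for entry_vpn, pte, nocache in tlb:       # models the cam match cycle
        if entry_vpn == vpn:
            return (pte << 12) | offset, nocache
    return None
```

A software model of the real tlb would hold at most eight entries and would also have to report the exception cases, which this sketch collapses into a `None` result.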
there are several conditions in which, although an exception has been generated, the mmu performs a translation on the available data. the exception handler software should prevent any erroneous memory accesses based on this falsely translated address. these conditions are:

- there is no vpn match, which causes an exception. a reference to kernel space causes a tlb miss exception. a reference to user space causes a utlb miss exception.
- the vpn match points to a pte ram entry with the valid bit low. this condition indicates that the page is invalid, and causes a tlb miss exception to the CW400X.
- the dirty bit is not set for the current page. this condition occurs only if the entry was valid and only for a store cycle; it indicates that an attempt was made to write to memory that is protected (clean), which causes a tlb modified exception to the CW400X.

figure 4.3 illustrates the conditions that cause tlb exceptions.

figure 4.3 tlb miss exception conditions
[flowchart not reproduced. decisions shown: vpn match, msb = 1, v = 1, write, d = 1. outcomes shown: utlb miss, tlb miss, tlb modified, pte.]

if there is a tlb hit, and the entry is valid and dirty (the valid and dirty bits are set), the mmu replaces the vpn in the address with the pte.
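The exception conditions above amount to a short decision sequence, sketched below. The function name, argument names, and result strings are hypothetical; the order of the checks follows the bullet list, and "kernel space" is taken to mean an address whose most significant bit is 1, as suggested by the msb = 1 decision in figure 4.3.

```python
# Hypothetical classification of a mapped access, following the conditions
# listed in the text. Returns the exception name or "pte" for a clean hit.

def classify_access(vpn_match, msb, valid, dirty, write):
    if not vpn_match:
        # No cam entry matched: kernel references (msb = 1) raise tlb miss,
        # user references (msb = 0) raise utlb miss.
        return "tlb miss" if msb == 1 else "utlb miss"
    if not valid:
        # Matching entry exists but the page is marked invalid.
        return "tlb miss"
    if write and not dirty:
        # Store to a valid but clean (write-protected) page.
        return "tlb modified"
    # Valid hit: the pte replaces the vpn.
    return "pte"
```

An exception handler test bench could use such a function as a reference when checking which of mtlbmissexcp, mutlbmissexcp, or mtlbmodexcp is expected for a given access.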
if there is a tlb modified exception, or a tlb miss exception because the entry was invalid, the mmu encodes the eight match lines and writes them to the CW400X index register. if the tlb miss exception or utlb miss exception was simply due to no vpn match, the mmu writes the value of the random register to the index register, and sets the p bit. setting the p bit suggests that the entry needs to be replaced; however, the operating system does not need to use the p bit.

4.3 mmu modules

the following modules are part of the mmu:

- tlb
- mmu stub
- gir

4.3.1 tlb

the tlb is made of an 8 x 20 cam, tightly coupled with an 8 x 23 ram. the cam match lines drive the word lines in the ram. the entire tlb is a fully-customized, special high-density structure capable of writes and reads to the cam and ram portions. it also performs match cycles. the match lines are driven out so that the values can be encoded.

the CW400X must load the index register (ir) before writing to or reading from the hi (cam) or lo (ram) words in the tlb. the ir drives the address for a write or read to the tlb. writes to the tlb occur at the end of the first x2 cycle; because of this, the mmu generates a stall to hold the CW400X in x2 for one more cycle so that it can complete the write. thus, every tlb write requires two x2 cycles. tlb reads require only one x2 cycle.

the tlb also detects the tlb shutdown condition (multiple matching entries), and shuts down the read to prevent permanent hardware damage.

4.3.2 mmu stub

the stand-alone mmu stub module is also a part of the mmu. this module registers the address for every bus run cycle and also translates the CW400X address, if in kseg0 or kseg1, to the least significant 512 mbytes of physical memory. in addition, it indicates that the address received by the mmu stub was in kseg0 or kseg1, and has been hard-mapped rather than mapped through the tlb (see section 6.2, mmu stub, of the minirisc CW400X microprocessor core technical manual).
4.3.3 gir
the global instruction register (gir) is the specified interface to the CW400X data bus, datap[31:0]. it guarantees that instructions are registered, since they are to be fetched at the end of the if stage of the instruction. the gir guarantees that a valid instruction is loaded during the run cycle in which the if stage occurred. thus, designers do not need to design logic to:
- detect stalls from other instructions
- distinguish between data and instructions on the unified bus (datap[31:0])
when the instruction is passed out of the gir, it has reached the x stage. for more information on the gir, see section 2.9, global instruction register module (gir).
4.4 signals
this section describes the signals that comprise the bit-level interface of the mmu. tables 4.1 through 4.3 summarize the mmu signals. detailed descriptions follow the tables.
the signals are described in alphabetical order by mnemonic. each signal definition contains the mnemonic and the full signal name. the mnemonics for active low signals end with an n and have an overbar over their names. in the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.
table 4.1 mmu input signals summary
input          source    definition
addrp[19:0]    CW400X    CW400X virtual address bus
bbus_stealn    biu       bbus bus steal
birdyp         biu       bbus instruction data ready
bsysresetn     biu       reset
caddr_errorp   CW400X    CW400X memory address error
cip_dn         CW400X    CW400X instruction/data indication
(sheet 1 of 2)
table 4.1 (cont.) mmu input signals summary
input          source         definition
ckillxp        CW400X         CW400X instruction killed in execute stage
cmem_fetchp    CW400X         CW400X memory fetch request
cstorep        CW400X         CW400X store to memory request
mmuenp         system logic   mmu enable
mrun_inn       goe            system running
pclkp          system logic   system clock
(sheet 2 of 2)

table 4.2 mmu output signals summary
output            destination   definition
maddroutp[19:0]   biu           CW400X physical address bus
mnocachep         biu           mmu non-cacheable page
mrun_outp         goe           mmu running
mtlbmissexcp      CW400X        mmu tlb miss exception
mtlbmodexcp       CW400X        mmu tlb modified exception
mutlbmissexcp     CW400X        mmu user tlb miss exception

table 4.3 mmu bidirectional signals summary
bidirectional   connect   description
datap[31:0]     CW400X    CW400X data bus

addrp[19:0]  CW400X virtual address bus  input
these signals are the 20 msbs of the CW400X address bus (the virtual page number).

bbus_stealn  bbus bus steal  input
the biu asserts this signal to inform the mmu that the biu has stolen the bbus, and the mmu should not drive the bbus or write the value on the bbus into a register (because the CW400X is not controlling the bus).

birdyp  bbus instruction data ready  input
the biu asserts this signal to inform the mmu that datap[31:0] contains valid data for an instruction fetch. this is a control signal to the gir. for more information on the gir, see section 2.9, global instruction register module (gir).
bsysresetn  reset  input
the biu asserts this signal to inform the mmu that the system is in reset state, and to reset the mmu. this signal must be valid for at least one clock cycle, although lsi logic recommends that for future design flexibility, this signal be valid for at least three clock cycles.

caddr_errorp  CW400X memory address error  input
the CW400X asserts this signal to inform the mmu that a memory transaction address error has occurred. the CW400X can assert this signal in either the if or the x2 stage. this information is necessary to store the bad virtual address in the badva register.

cip_dn  CW400X instruction/data indication  input
the CW400X drives this signal low to inform the mmu that the decoded mmu mtc0/mfc0 instruction has entered the x2 stage, and that the mmu (if bbus_stealn is not asserted) can either write a register or drive data on datap[31:0].
this signal qualifies the type of memory fetch when a memory fetch is indicated by cmem_fetchp. the CW400X drives this signal high to indicate that it is performing an instruction fetch. the CW400X drives this signal low to indicate that it is performing a data fetch.

ckillxp  CW400X instruction killed in execute stage  input
the CW400X asserts this signal to inform the mmu that the instruction currently executing in the x stage (x1 or x2) has been cancelled due to an exception. asserting this signal causes the mmu to ignore:
- any mtc0 instruction currently in the x1 or x2 stage before the write occurs to the register, or
- any mfc0 instruction in the x1 stage, before the read data is driven on the bus.
if this signal is not asserted by the end of the x1 stage, the mmu drives the data during the x2 stage of an mfc0 instruction.
cmem_fetchp  CW400X memory fetch request  input
the CW400X asserts this signal to inform the mmu that the CW400X is now ready to load data from memory. this indicates that the mmu is now able to drive maddroutp[19:0] in a translation. the CW400X asserts this signal in the x1 stage. it is valid at the rising edge of the clock in the x2 stage.

cstorep  CW400X store to memory request  input
the CW400X asserts this signal to inform the mmu that the CW400X is now ready to store data to memory. the mmu can now drive maddroutp[19:0] in a translation. the CW400X asserts this signal in the x1 stage. it is valid at the rising edge of the clock in the x2 stage.

datap[31:0]  CW400X data bus  bidirectional
these signals transfer data to and from the CW400X.

maddroutp[19:0]  CW400X physical address bus  output
the mmu drives the 20 msbs of the translated CW400X address bus (the page table entry) onto these signals. when the mmu is not translating, these signals are a registered version of addrp[19:0].

mmuenp  mmu enable  input
asserting this signal enables the mmu. it should be a bit in a register elsewhere in the design, or it should be hardwired high.

mnocachep  mmu non-cacheable page  output
the mmu asserts this signal to indicate that the mmu is preventing data from being stored into, or read from, the cache.

mrun_inn  system running  input
the goe asserts this signal low to inform the mmu that the system is running. the goe deasserts this signal high to inform the mmu that the system is stalled. the cause of this stall is unknown to the mmu.

mrun_outp  mmu running  output
the mmu asserts this signal high to indicate that it is running. the mmu deasserts this signal low in the first x2 cycle of a mtc0 entryhi/entrylo instruction and
reasserts it one cycle later, indicating that the mmu requires a stall cycle.

mtlbmissexcp  mmu tlb miss exception  output
the mmu asserts this signal to indicate that it has detected an mmu tlb miss exception condition.

mtlbmodexcp  mmu tlb modified exception  output
the mmu asserts this signal to indicate that it has detected an mmu tlb modified exception condition.

mutlbmissexcp  mmu user tlb miss exception  output
the mmu asserts this signal to indicate that it has detected an mmu user tlb miss exception condition.

pclkp  system clock  input
this signal is the global clock input. it is used to clock elements in the CW400X interface.

4.5 tlb registers
this section describes:
- tlb exception processing registers
- other tlb registers
4.5.1 tlb exception processing registers
table 4.4 lists the tlb exception processing registers and their addresses.
table 4.4 tlb exception processing register addresses
address   register name
0         index
1         random
4         context (read only)
8         bad virtual address (read only)
4.5.1.1 index register (r0)
the index register contains the matching tlb entry for any tlb-generated exception. if the exception was caused by no matching entries in the tlb, the index register contains a random number from the random register. this number is the address of a possible tlb entry to be replaced.
the minirisc mmu probe bit differs in function from the r3000's. the mmu sets the probe bit when it writes the value from the random register to the index register, not when executing the tlbp instruction, as in the r3000. the mmu sets the index register probe bit only on a tlb miss. a tlb hit-based exception does not cause the mmu to set the probe bit.
when attempting to write to the index register, the CW400X stalls and drives the data bus in the x2 stage of the pipeline. the CW400X writes the register at the end of the x2 stage. index register reads occur during the x2 stage when the CW400X has stalled and allowed the mmu to drive the data bus.
the index register contains:
- the address of the tlb entry to be accessed in a mtc0 entryhi, mtc0 entrylo, mfc0 entryhi, or mfc0 entrylo,
- the address of the matching tlb entry on any tlb-generated exception, or
- a random number from the random register that is the address of a tlb entry that can be replaced when there are no tlb matches.
figure 4.4 shows the format of the index register. upon reset, the content of this register is undefined.
figure 4.4 index register
bits    31   30:11      10:8    7:0
field   p    reserved   index   reserved

p  probe  31
the mmu sets this bit to indicate that no tlb entry matched the virtual page number and that the index field contains a random entry address to be used for replacement.
reserved  reserved bits  [30:11], [7:0]
these bits are not writable, and read as zero.

index  index bits  [10:8]
the mmu uses these bits to:
- indirectly index the address of the tlb entry of an upcoming tlb access by a CW400X mtc0/mfc0 instruction
- indicate the value of a matching tlb entry when that entry caused a tlb exception
- hold the random number from the random register after a tlb miss
4.5.1.2 random register (r1)
the mmu random register is smaller than the r3000 random register because the mmu tlb is smaller (8 entries, instead of 64). therefore, the random field in the register only requires three bits, instead of six. entries can be made safe, or reserved, by writing a value to the random field. for example, writing a five to the random register makes entries 0 through 4 safe.
the random register provides the address of a random entry in the tlb that can be replaced when a tlb miss occurs. the random register module consists of a three-bit counter and a three-bit register that is the output of the counter. the CW400X can load the register using the mtc0 random instruction, and read the counter using the mfc0 random instruction. the mmu loads the counter value into the register and counts continually every run cycle, as long as the mmu is enabled. when the counter reaches seven, the mmu reloads it with the value in the register. loading a value into the random register allows the CW400X to reserve certain entries in the tlb.
when the CW400X attempts to write to the random register, the CW400X stalls and drives the data bus in the x2 stage of the pipeline. the CW400X writes the register at the end of the x2 stage. random register reads occur during the x2 stage when the CW400X has stalled and allowed the mmu to drive the data bus.
figure 4.5 shows the format of the random register. upon reset, the content of this register is undefined.
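the counting and reload behavior described above can be modeled with a few lines of code. this is a simplified sketch, assuming that a write loads both the register and the counter and that one call to `tick` stands for one run cycle; the class and method names are hypothetical, and the reset value used in the constructor is an assumption (the manual states that the reset content is undefined).

```python
class RandomRegister:
    """software model of the three-bit random counter and its reload register."""

    def __init__(self, floor=0):
        # assumption: start from a known value; real hardware is undefined at reset
        self.load(floor)

    def load(self, floor):
        """model an mtc0 random write; floor is the three-bit random field value."""
        self.register = floor & 0x7
        self.counter = self.register

    def tick(self):
        """advance one run cycle: count up, reloading from the register after seven."""
        if self.counter == 7:
            self.counter = self.register
        else:
            self.counter += 1

    def read(self):
        """model an mfc0 random read; the count is returned in bits [10:8]."""
        return self.counter << 8
```

after `load(5)`, the counter only ever takes the values 5, 6, and 7, so entries 0 through 4 are never suggested for replacement.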
figure 4.5 random register
bits    31:11      10:8     7:0
field   reserved   random   reserved

reserved  reserved bits  [31:11], [7:0]
these bits are not writable, and read as zero.

random  random register counter  [10:8]
these bits contain the random register count. a write to this register initializes the counter. a read from this register reads from the counter.
4.5.1.3 context register (r4)
lsi logic has included this register to maintain r3000 compatibility. it provides a register and format useful for tlb exception handlers. however, the minirisc mmu does not implement the full r3000 context register. lsi logic has removed the physical table entry (pte) base field. the badvpn field is in the same location as in the r3000, bits [20:2].
the CW400X cannot write to this register. when the CW400X reads from this register, the mmu drives bits [30:12] of the badva register onto the data bus.
figure 4.6 shows the format of the context register. upon reset, the content of this register is undefined.
figure 4.6 context register
bits    31:21   20:2     1:0
field   res     badvpn   res

res  reserved bits  [31:21], [1:0]
these bits are not writable, and read as zero.

badvpn  bad virtual page number  [20:2]
this field holds the virtual page number from the badva register, which contains the virtual page number that caused the last address error or mmu exception.
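the relationship between the badva register and the value read from the context register can be written out as a short calculation. this is an illustrative sketch based on the field positions given above (badva bits [30:12] placed in context bits [20:2], all other bits zero); the function name is hypothetical.

```python
def context_from_badva(badva):
    """return the value read from the context register for a given badva value."""
    badvpn = (badva >> 12) & 0x7FFFF   # badva bits [30:12], 19 bits wide
    return badvpn << 2                 # badvpn sits in context bits [20:2]
```

for example, a badva value of 0x00403abc gives a context value of 0x0000100c.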
4.5.1.4 bad virtual address (badva) register (r8)
the bad virtual address (badva) register is a read-only register that saves the bad virtual address associated with an illegal access (an address exception, either an address error or a tlb exception). this register saves only addresses for addressing errors (CW400X cause register exception code adel, ades, tlbmod, tlbl, or tlbs), not bus errors.
figure 4.7 shows the format of the bad virtual address register. upon reset, the content of this register is undefined.
figure 4.7 bad virtual address register
bits    31:0
field   badva

badva  bad virtual address  [31:0]
this field stores the virtual address that was the cause of either an address error or an mmu exception.
4.5.2 other tlb registers
the entryhi and entrylo registers are the tlb entries.
4.5.2.1 tlb entryhi register (r10)
this register refers to the cam entries.
figure 4.8 tlb entryhi register
bits    31:12   11:0
field   vpn     reserved

vpn  virtual page number  [31:12]
this 20-bit field contains the virtual page number for an entry in the tlb. this is the value compared with the 20 msbs of the address from the CW400X microprocessor (cam) to detect a tlb hit.

reserved  reserved bits  [11:0]
these bits are not writable, and read as zero.
4.5.2.2 tlb entrylo register (r2)
this register refers to the ram entries.
figure 4.9 tlb entrylo register
bits    31:12   11   10   9   8   7:0
field   pte     n    d    v   1   reserved

pte  page table entry  [31:12]
this 20-bit field contains the page table entry for each entry in the tlb. this is the value that replaces the 20 msbs of the address from the CW400X microprocessor to generate a physical address.

n  non-cacheable  11
this bit, set to one, indicates that caching this page is not allowed.

d  dirty  10
this bit, set to one, indicates that this line is dirty and is writable. writing to a page with this bit cleared to zero causes a tlb modified exception.

v  valid  9
this bit, set to one, indicates that this line is valid. an access to a line with this bit cleared to zero causes a tlb load or tlb store miss exception.

1  one  8
this bit is hardwired to one for mips compatibility.

reserved  reserved bits  [7:0]
these bits are not writable, and read as zero.
4.6 address translation functional waveform
figure 4.10 shows the timing for a simple mmu address translation. a match cycle can occur every cycle, and therefore is independent of the instruction pipeline. a translation occurs for each bus run cycle. the rising edge of the clock starts the match cycle. if there is a match, the mmu drives data onto maddroutp[19:0]. the mnocachep signal is directly driven from the ram contents, or from a kseg1 hard translation. asserting mnocachep causes the biu to invalidate any cache access
for the current transaction. the mmu determines the type of tlb exception from the values in the cam array. these values are available after the rising clock edge.
figure 4.10 is also valid for nontranslated addresses. if the translation does not occur, the mmu registers the address and drives it onto maddroutp[19:0] after the clock edge, just as it drives a translated address.
for a detailed description of mmu register reading and writing, see section 2.5, read/write transactions.
figure 4.10 mmu address translation
[waveform md96.196: pclkp, cstorep, cmem_fetchp, mrun_inn, addrp[19:0] (virtual address), maddroutp[19:0] (physical address), mnocachep, mtlbmissexcp, mutlbmissexcp, mtlbmodexcp]
4.7 tlb exceptions
the following exceptions are valid for the minirisc mmu. none of them are maskable. for more information on exceptions, see chapter 5 of the minirisc CW400X microprocessor core technical manual.
three types of tlb exceptions can occur in the minirisc mmu:
- if the input virtual page number (vpn) does not match the vpn of any tlb entry, it causes a:
  - utlb miss, for kuseg
  - tlb miss, for kseg2
- if an entry matches, but the valid bit is not set, it causes a tlb miss.
- if the dirty bit in a matching tlb entry is not set and the access is a write, it causes a tlb modified exception.
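when none of the exceptions above applies, the translation of section 4.6 amounts to replacing the 20 msbs of the address with the pte field of the matching entrylo word. the sketch below shows that arithmetic using the entrylo bit positions from section 4.5.2.2. it assumes a hit on a valid entry and does not model the exception checks; the function names and the dictionary keys are hypothetical.

```python
def decode_entrylo(word):
    """split a 32-bit entrylo word into its fields (see figure 4.9)."""
    return {
        "pte": (word >> 12) & 0xFFFFF,  # bits [31:12]
        "n": (word >> 11) & 1,          # non-cacheable
        "d": (word >> 10) & 1,          # dirty (writable)
        "v": (word >> 9) & 1,           # valid
    }

def translate(vaddr, entrylo):
    """return the physical address for vaddr, assuming entrylo is the matching entry."""
    fields = decode_entrylo(entrylo)
    # the pte replaces the 20 msbs; the 12-bit page offset passes through unchanged
    return (fields["pte"] << 12) | (vaddr & 0xFFF)
```

for example, with an entrylo word of 0x12345700 (pte 0x12345, d and v set), virtual address 0x00400abc translates to physical address 0x12345abc.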
for more information on kuseg and kseg2, see section 6.2 of the minirisc CW400X microprocessor core technical manual.
4.7.1 tlb miss exception
4.7.1.1 cause
a tlb miss exception occurs when the mmu does not map a virtual address reference to memory and the most significant bit of the address is one, or when a virtual address reference to memory matches an invalid tlb entry (the valid bit in the matching entrylo register is zero).
4.7.1.2 handling
the CW400X branches to the general exception vector (0x80000080 or 0xbfc00180) and sets the tlbl or tlbs code in the cause register exccode field to indicate whether the miss was due to an instruction fetch or load operation (tlbl) or a store operation (tlbs).
the epc register points to the instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. in that case, the epc register points to the branch instruction that preceded the exceptional instruction, and the CW400X sets the bd bit of the cause register.
the mmu saves the kup, iep, kuc, and iec bits of the status register into the kuo, ieo, kup, and iep bits, respectively, and clears the kuc and iec bits.
when this exception occurs, the badva and context registers contain the virtual address that failed address translation. if the exception was caused by an invalid entry, the index register contains the address of the invalid entry. if the exception was caused by an unmapped reference, the index register contains the value from the random register, and the p bit is set. the value in the random register is the address of a suggested entry to be replaced.
4.7.1.3 servicing
the index register refers to the invalid or suggested entry to be replaced. the operating system should load the entry's entrylo register with the appropriate pte that contains the physical page frame and access control bits.
4.7.2 tlb modified exception
4.7.2.1 cause
a tlb modified exception occurs when a store operation's virtual address reference to memory matches a tlb entry that is marked valid but not dirty (the valid bit is set, but the dirty bit is not set for the matching entry).
4.7.2.2 handling
the CW400X branches to the general exception vector (0x80000080 or 0xbfc00180) and sets the tlbmod code in the cause register exccode field.
the epc register points to the instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. in that case, the epc register points to the branch instruction that preceded the exceptional instruction, and the CW400X sets the bd bit of the cause register.
the mmu saves the kup, iep, kuc, and iec bits of the status register into the kuo, ieo, kup, and iep bits, respectively, and clears the kuc and iec bits.
when this exception occurs, the badva and context registers contain the virtual address that failed address translation.
4.7.2.3 servicing
the badva register and index register hold the address and index of the failing virtual address and entry. the operating system should transfer control to the appropriate system routine.
4.7.3 utlb miss exception
4.7.3.1 cause
a virtual address reference to unmapped user memory space (the most significant bit in the address is zero) causes a utlb miss exception.
4.7.3.2 handling
the CW400X branches to the utlb miss exception vector (0x80000000 or 0xbfc00100) and sets the tlbl or tlbs code in the cause register
exccode field to indicate whether the miss was due to an instruction fetch or load operation (tlbl) or a store operation (tlbs).
the epc register points to the instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. in that case, the epc register points to the branch instruction that preceded the instruction that caused the exception, and the CW400X sets the bd bit of the cause register.
the mmu saves the kup, iep, kuc, and iec bits of the status register into the kuo, ieo, kup, and iep bits, respectively, and clears the kuc and iec bits.
when this exception occurs, the badva and context registers contain the virtual address that failed address translation. the index register contains the value from the random register and the p bit is set, indicating no matching tlb entries.
4.7.3.3 servicing
the value in the index register refers to a suggested entry to be replaced. the operating system should load the entry's entrylo register with the appropriate pte that contains the physical page frame and access control bits. the operating system should load the respective entryhi with the virtual page number.
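the servicing steps above can be illustrated by computing the values a handler would work with. the sketch below decodes an index register value and builds the entryhi and entrylo words for a refill, using the field positions from sections 4.5.1.1 and 4.5.2. it is a host-side model only, not handler code for the CW400X; the function names, the `pfn` argument, and the `writable` and `cacheable` options are hypothetical.

```python
P_BIT = 1 << 31  # probe bit of the index register

def decode_index(index_reg):
    """return (entry, p) where entry is index bits [10:8] and p is the probe bit."""
    return (index_reg >> 8) & 0x7, bool(index_reg & P_BIT)

def refill_words(badva, pfn, writable=False, cacheable=True):
    """build (entryhi, entrylo) for the page containing badva.

    pfn is the 20-bit physical page frame number chosen by the operating system.
    """
    entryhi = badva & 0xFFFFF000          # vpn in bits [31:12]
    entrylo = (pfn & 0xFFFFF) << 12       # pte in bits [31:12]
    if not cacheable:
        entrylo |= 1 << 11                # n bit
    if writable:
        entrylo |= 1 << 10                # d bit
    entrylo |= 1 << 9                     # v bit: mark the entry valid
    return entryhi, entrylo
```

for example, an index register value of 0x80000300 decodes to entry 3 with the p bit set, which is the case in which the entry is only a suggested replacement.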
4.8 differences from the r3000 mmu
this section explains the differences between the minirisc mmu building block and the mips r3000 mmu.
4.8.1 writing and reading mmu registers
each register in the minirisc mmu is part of the cp0 register set, but the mmu does not have access to the internal CW400X data bus. therefore, the CW400X must stall at least one cycle to allow the external data bus (datap[31:0]) to be driven with the data for the mtc0 or mfc0 instruction. the mmu ignores the address bus (addrp[31:0]) during an mtc0/mfc0 instruction, since the instruction itself contains all information necessary for the write.
in contrast to the r3000, the minirisc mmu allows direct tlb access. this makes the tlbp, tlbr, tlbwi, and tlbwr instructions obsolete. the mtc0/mfc0 entryhi/entrylo instructions directly access the data in the cam/ram instead of requiring an mtc0 entryhi/tlbwi-type instruction combination. this difference makes the mmu design simpler and smaller, as well as more testable.
to access a tlb entry, software must load the index register with the value to be accessed, then perform a mtc0/mfc0 entryhi/entrylo instruction.
4.8.2 unique features of the minirisc mmu
other differences between the minirisc mmu and a mips r3000 mmu are:
- the mmu only implements an 8-entry tlb.
- there are no pid fields and no g bit. to maintain compatibility with mips code, the g bit is set high for mfc0 entrylo instructions. all tlb entries are assumed to be global.
- the context register only implements the badvpn field, not the pte base field.
4.9 operation peculiarities and details
this section describes conditions in which the pipeline, or events, cause unexpected results unless these conditions are prevented.
4.9.1 mtc0 x2/if tlb miss peculiarities
this section explains scenarios in which an exception in the if stage of an instruction following a mtc0/mfc0 instruction (whose action occurs one cycle later, in the x2 stage) causes an unexpected result. figure 4.11 illustrates the pipeline stages for each instruction when this situation occurs.
figure 4.11 mtc0 inconsistency pipeline
[pipeline diagram md96.197: mtc0 ir passes through if, x1, x2, wb (write to register); the next instruction passes through if, x, wb (write to ir due to tlb miss exception)]
index register (ir)
if a write to the ir is followed by an instruction that causes a tlb miss exception due to a faulty instruction fetch (if), the mmu generates the exception at the end of the if stage. the write to the ir from the mtc0 ir instruction does not occur until the end of the x2 stage, which causes the mmu to write this value over the value generated by the if exception. this causes an incorrect value to be passed to the operating system.
in addition, if the mtc0 ir, mfc0 ir sequence is followed by an instruction generating a tlb miss condition in the if stage, the if causes the mmu to overwrite the ir with the value causing the exception, before the mfc0 instruction can do the ir read. this overwrite causes the register to have an unexpected value.
translation lookaside buffer (tlb)
the state of the tlb must be preserved so the CW400X can examine it. if, however, an mtc0 entryhi or mtc0 entrylo instruction
occurs, followed by an if tlb miss, the instruction might replace the entry in the tlb that caused the if exception.
random register (rr)
the mmu might produce an unexpected result when using the rr to protect low address entries in the tlb. if the mtc0 instruction is an mtc0 rr, increasing the value held in the rr to protect more addresses (say from 1 to 3), and the instruction following creates an if tlb miss, the rr address written to the ir might be in the region of the tlb that the previous mtc0 rr is attempting to protect (1 or 2). in this case, if the value in the ir is read and used as a replacement location for a new tlb entry, the address might be inside the newly protected area.
to prevent this condition, ensure that an mtc0 rr instruction that increases the protected area is not followed by an instruction that can cause a tlb miss. the converse situation (decreasing the protected area) can cause a nonoptimal page replacement scheme for the missed page, causing fewer locations than possible to be considered.
programmers should avoid creating software that executes an mtc0/mfc0 instruction before any instruction that might cause an if exception. software should only perform tlb maintenance in nonmapped space. if this is not possible, the programmer must ensure that the page containing the instruction that follows the mtc0 instruction is valid, and not on a page that is not loaded into the tlb.
there is another case where a previous instruction's write to the mmu registers overwrites the current instruction's writes. however, in this case, the overwrite does not cause an error, so nothing special need be done. this situation occurs, for example, when the instruction labeled mtc0 ir in figure 4.11 is a store or load instruction with a tlb miss exception in the x2 stage, followed by any instruction generating a tlb miss in the if stage.
the write to the ir in the x2 stage would, in this case, be caused by an exception, not an mtc0 instruction. the exception from the store or load should be taken. in this case, the write to the ir for the correct (first) exception overwrites the write to the ir caused by the second exception.
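the ordering problem described in this section comes down to which write to the ir happens last. the sketch below models it with illustrative cycle numbers: the mtc0 ir instruction is assumed to be in if, x1, and x2 during cycles 0, 1, and 2, and the following instruction is assumed to be in if during cycle 1. these cycle numbers, the function name, and the value labels are assumptions made for illustration; only the relative order is taken from figure 4.11.

```python
def final_ir(writes):
    """return the value left in the ir after a list of (cycle, value) writes.

    each write is assumed to take effect at the end of its cycle,
    so the write with the highest cycle number determines the result.
    """
    value = None
    for _, written in sorted(writes, key=lambda w: w[0]):
        value = written
    return value

# mtc0 ir writes at the end of x2 (cycle 2, assumed);
# the if tlb miss of the next instruction writes at the end of its if stage (cycle 1, assumed)
hazard = [(2, "mtc0 value"), (1, "if miss value")]
result = final_ir(hazard)  # "mtc0 value": the exception's index has been overwritten
```

the same model shows why the load or store case is harmless: the later write belongs to the exception that is actually taken.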
4.9.2 exceptions between mtc0 entryhi/entrylo instructions
unlike the r3000, the minirisc mmu tlb entryhi and entrylo registers are actually located in the tlb. because of this difference, the tlb entry is not updated in one cycle, as in the r3000. if an exception occurs between a mtc0 entryhi and an mtc0 entrylo instruction, the mmu tlb enters an intermediate state, only partially updated. (in the r3000, these instructions only write to separate registers that are then loaded simultaneously into the tlb with a tlbw instruction. this causes the entire entry to be written at once, removing the possibility of an intermediate state stored in the tlb if an exception occurs.)
because of this condition, it is necessary to prevent exceptions and interrupts from occurring between mtc0 entryhi and entrylo instructions (in either order) to maintain r3000 compatibility.
4.9.3 register consistency in the pipeline
it is possible to corrupt all of the mmu registers unexpectedly when a wb exception in one instruction is followed by an if exception (an address error, or mmu exception, that causes a write to the mmu registers) in the following instruction (see figure 4.12).
figure 4.12 wb exception followed by an if exception
since wb exceptions do not write the mmu registers, and it is not known until after the if exception occurs that the registers should not be written, this condition unexpectedly corrupts the mmu registers. therefore, if the value in the registers is important, ensure that this condition does not occur. in general, if these registers are important, lsi logic recommends that the user prevent the generation of an if or x2 exception in the subsequent code (usually the exception handler).
this condition also occurs during simultaneous exceptions. the cp0 prioritizes exceptions based on the type of exception and which stage of the pipeline it is in.
[figure 4.12 pipeline diagram md96.198: the first instruction (if, x, wb) takes a wb exception; the next instruction (if, x, wb) takes an if exception]
therefore, it is possible that the mmu exception (or
address error) will be a lower priority exception than the exception that is actually recognized. however, this exception writes the mmu registers, causing the mmu registers to be in an unexpected state.
4.9.4 tlb initialization
initializing the tlb requires at least one nop cycle with a valid clock signal, and that all control inputs be inactive. the mmu generates this required nop when executing an mtc0/mfc0 to those mmu registers not in the tlb. to properly and quickly initialize the tlb and prevent an intermediate state, assert bsysresetn for at least one, preferably three, clock cycles. asserting bsysresetn forces all control inputs low and allows the gated clock to clock the tlb during a reset cycle.
chapter 5 basic biu and cache controller (bbcc)
this chapter describes the basic biu and cache controller (bbcc) building block for the CW400X core. for information on how to add or remove a write buffer, see chapter 6. this chapter contains the following sections:
- section 5.1, overview
- section 5.2, features
- section 5.3, functional description
- section 5.4, signals
- section 5.5, interfaces
- section 5.6, cache-miss penalty, bbus latency
- section 5.7, adding cache
- section 5.8, bbus arbitration
- section 5.9, timing considerations
5.1 overview
the basic biu and cache controller (bbcc) is a generic bus interface unit and cache controller for use with the CW400X microprocessor core. it serves as the CW400X's interface to on-chip memory (ocm), the cache rams, and other devices. figure 5.1 shows a block diagram of a system using the CW400X and the bbcc.
figure 5.1 CW400X system with the bbcc
[block diagram md96.199; blocks shown: CW400X, bbcc, cbus, ocm, write buffer, caches, bbus, devices]
5.2 features
the bbcc supports:
- two-way set associative or direct-mapped instruction-cache (i-cache)
- direct-mapped data-cache (d-cache)
- cache sizes of 1 to 32 kbytes (for d-cache and each i-cache set)
- locking of i-cache set 0 lines
- snooping on i-cache and/or d-cache
- software cache test mode
- hardware cache test mode
- minimum cache miss penalty of two clock cycles
- 1-8 deep write buffer (one internal entry, with support for seven external write buffer entries)
- load scheduling
- instruction streaming
- data streaming
- read priority
- block fetching
- burst writes
- reset control
- system configuration register
5.3 functional description
the bbcc consists of four modules:
- cache controller (cc)
- queue controller (qc)
- bbus controller (bc)
- system configuration module (bsys)
figure 5.2 shows the communication between the modules and which modules control the external interfaces.
figure 5.2 bbcc internal block diagram
[block diagram md96.200; blocks and labels shown: qc, cc, bc, bsys, CW400X (cbus), bbus, ocm, write buffer, cache rams, refill control, queue information, queue control, cache information, register control, reset and system configuration, system configuration register output]
5.3.1 cache controller (cc)
the cache controller (cc) controls the system instruction and data caches. it identifies instruction and data transactions, checks if the appropriate line is in cache, and (if the line is in cache) performs the requested transaction. the cc also informs the qc of the status of the transaction (cache hit/miss and other cache-related information), and allows the bc to read and write the cache and tag rams.
the cc has the following features:
- cache configuration: the cc allows for the i-cache to be configured to be direct-mapped or two-way set associative. the d-cache is always direct-mapped.
- instruction set 0 line-locking: each line of i-cache set 0 can be locked (lock bit) to guarantee that the contents of the selected lines remain in cache.
- flexible cache size: the cc supports maximum cache sizes of 64 kbytes for i-cache (32 kbytes for each set) and 32 kbytes for d-cache. the minimum supported size is 1 kbyte for i-cache (2 kbytes for two-way set associative) and 1 kbyte for d-cache. smaller i-caches can be supported with a small amount of additional logic.
- software cache test mode: the cc allows the CW400X to directly write/read to/from the cache and tag rams.
- direct cache access: the cc allows the bc to directly access the cache and tag rams. this mechanism can be used to perform hardware test of the cache rams.
- snooping: the cc provides logic to check the tags and invalidate lines in the d-cache and i-cache. if snooping is performed on only d-cache or i-cache, each snoop requires two clock cycles. if both d-cache and i-cache snooping are performed, each snoop requires four clock cycles.
- integrated instruction-cache set 0 and data-cache: i-cache set 0 and d-cache share the same physical data and tag rams.
5.3.2 queue controller (qc)
the queue controller (qc) orders memory requests to the main memory. it consists of three queues:
- instruction fetch queue (one entry)
  data fetch queue (one entry)
  data store queue (one internal entry, with the ability to add up to seven additional write buffer entries external to the bbcc)

the qc monitors the CW400X memory transaction signals, then enters requests into the appropriate queues. the qc arbitrates requests in the queue and generates transaction requests to the bc. the qc also issues the ready signals to the CW400X, and when necessary, stalls the system.

the qc has the following features:

  burst writes: when consecutive stores are to the same page, and the bbus device is able to handle burst writes, the qc gives priority to stores.

  read priority: if configured to do so, the qc gives data fetches higher priority than stores, when possible.

  ocm control: the qc issues the write enable and output enable signals to the ocm.

  variable write buffer depth: one store queue entry is internal to the bbcc, and up to seven additional write buffer entries can be added externally.

5.3.3 bbus controller (bc)

the bbus is the basic bus, which serves as a generic interface to memory, dma controllers, dram controllers, and other devices. its characteristics are:

  32-bit bus with separate address and data buses
  zero to n wait states (one-cycle transactions are possible)
  burst transactions (bbcc supports bursts of 1, 2, 4, and 8 words)
  back to back transactions
  maximum bandwidth of 200 mbytes/s at 50 mhz
  bus error and bus retry reporting
  multiple bus masters possible

for more information on the bbus, see section 5.5.2, basic bus (bbus).

the bbus controller (bc) handles the bbus. the bc handles requests by the CW400X, and uses a direct cache access interface to refill the
cache with instructions or data. the bc also uses a direct cache access interface to refill the caches during hardware cache test mode.

the bc has two modes: master and slave. when the bc is master, there are two types of requests: memory reads and memory writes. when the bc is a slave (when an external device is granted the bus and starts a bbus transaction), the bc can perform a direct cache access operation, or snoop on the bbus transaction.

the bc has the following features:

  block fetching support
  system configuration control
  snooping support

5.3.4 system configuration module (bsys)

the system configuration module issues reset signals to the system, and contains the system configuration register. the bc controls the system configuration register. when the CW400X loads or stores to the address 0xbfff0000, the transaction accesses the system configuration register rather than memory. all the bits of the system configuration register are output by the bbcc on bs_configp[31:0]. some of the bits are predefined; those that are not predefined are available for use by the system designer. figure 5.3 shows the system configuration register.

figure 5.3 system configuration register

  bit:    31   30   [29:14]   13   [12:10]   [9:8]   7   [6:5]   4   [3:2]   1    0
  field:  a    w    a         e    ps        cm      r   drs     d   irs     1e   ie

a    available    31, [29:14]
    these bits are currently unassigned and are available.

w    write buffer enable    30
    setting this bit enables the write buffer.

e    data error    13
    the bbcc sets this bit when a bus error occurs during a data load/store. writing a zero to this bit clears it.
ps    page size    [12:10]
    writing these bits informs the bbcc of the page size so it can determine whether stores are burst writes (consecutive stores to the same page).

cm    cache mode    [9:8]
    writing these bits sets the cache mode.

r    read priority enable    7
    setting this bit causes the bbcc to give loads higher priority than stores, when possible.

drs    d-cache block refill size    [6:5]
    writing these bits sets the d-cache block refill size.

d    d-cache enable    4
    setting this bit enables the data cache.

irs    i-cache block refill size    [3:2]
    writing these bits sets the i-cache block refill size.

  ps     page size        ps     page size
  000    16 words         100    256 words
  001    32 words         101    512 words
  010    64 words         110    1024 words
  011    128 words        111    2048 words

  cm    cache mode
  00    normal
  01    i-cache software test - data ram
  10    i-cache software test - tag ram
  11    d-cache software test - tag ram

  drs    refill size
  00     1 word
  01     2 words
  10     4 words
  11     8 words
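as an illustration of the field layout, the following python sketch splits a 32-bit system configuration register value into its fields. it is a minimal sketch and not part of the bbcc deliverables: the function name decode_sysconfig and the dictionary keys are hypothetical. the positions of 1e (bit 1) and ie (bit 0) are taken from the bit layout in figure 5.3, and irs is decoded with the same 1, 2, 4, 8 word encoding as drs, as given in the irs refill size table.

```python
def decode_sysconfig(value):
    """Split a 32-bit system configuration register value into named fields."""

    def bits(hi, lo):
        # extract value[hi:lo] as an unsigned integer
        return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

    return {
        "write_buffer_enable": bool(bits(30, 30)),  # w
        "data_error": bool(bits(13, 13)),           # e
        "page_size_words": 16 << bits(12, 10),      # ps: 000 -> 16 ... 111 -> 2048
        "cache_mode": bits(9, 8),                   # cm: 00 is normal operation
        "read_priority": bool(bits(7, 7)),          # r
        "dcache_refill_words": 1 << bits(6, 5),     # drs: 00 -> 1 ... 11 -> 8
        "dcache_enable": bool(bits(4, 4)),          # d
        "icache_refill_words": 1 << bits(3, 2),     # irs: same encoding as drs
        "icache_set1_enable": bool(bits(1, 1)),     # 1e
        "icache_enable": bool(bits(0, 0)),          # ie
    }


# hypothetical register value: write buffer on, 64-word pages, read priority,
# 4-word d-cache refill, d-cache on, 8-word i-cache refill, i-cache on (set 1 off)
example = (1 << 30) | (0b010 << 10) | (1 << 7) | (0b10 << 5) | (1 << 4) | (0b11 << 2) | 1
fields = decode_sysconfig(example)
```

the shift-based decode of ps and the refill sizes relies on both encodings being powers of two, which matches the tables; a lookup table would work equally well.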
1e    i-cache set 1 enable    1
    setting this bit enables the instruction cache set 1.

ie    i-cache enable    0
    setting this bit enables the instruction cache.

  irs    refill size
  00     1 word
  01     2 words
  10     4 words
  11     8 words

5.4 signals

this section describes the signals that comprise the bit-level interface of the bbcc. tables 5.1 through 5.3 summarize the bbcc signals. detailed descriptions follow the tables.

the signals are described in alphabetical order by mnemonic. each signal definition contains the mnemonic and the full signal name. the mnemonics for active low signals have an overbar over their names. in the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.

table 5.1 bbcc input signals summary

  input             source                     description
  addrp[14:2]       CW400X                     CW400X address bus
  baddrpi[31:2]     bbus                       bbus address input bus
  bcache_selp       bbus                       hardware cache test mode
  bdatapi[31:0]     bbus                       bbus data input bus
  bdsnoop           user-specified value       data snoop enable
  berrorn           bbus                       bbus bus error
  bgntn             bbus                       bbus bus grant
  bip_dni           bbus                       bbus instruction/data input indicator
  bisetp            bbus                       bbus instruction cache set
  bisnoop           user-specified value       instruction snoop enable
  biuoen            global output enable       bbcc data output enable
  blkgntn           bbus                       bbus block transaction grant
  bocmexistp        user-specified value       on-chip memory (ocm) exists indicator
  bocmselp          ocm                        ocm memory transaction
  brdyni            bbus                       bbus ready input
  bresetn           system logic/reset module  reset
  bretryn           bbus                       bbus bus retry
  brun_inn          global output enable       run enable
  bstartni          bbus                       bbus transaction start input
  btagtestn         bbus                       bbus tag ram transaction
  btxni             bbus                       bbus transaction indicator input
  bwburst_reqn      bbus                       bbus burst write request
  bwrni             bbus                       bbus write transaction indicator input
  cbytep[3:0]       CW400X                     cbus byte enables
  cip_dn            CW400X                     instruction/data indicator
  ckillmemp         CW400X                     kill memory transaction request
  cmem_fetchp       CW400X                     fetch indicator
  cstorep           CW400X                     store indicator
  idlckp            i-set 0/d tag ram          lock bit
  idmatchp          i-set 0/d tag match logic  tag match
  idvldp[3:0]       i-set 0/d tag ram          valid bits
  i1matchp          i-set 1 tag match logic    tag match
  i1vldp[3:0]       i-set 1 tag ram            valid bits
  maddroutp[31:2]   mmu                        mapped address
  mearlyks1p        mmu                        mmu early kseg1 indicator
  mnocachep         mmu                        mapped address not cacheable
  pclkp             system logic               system clock
  se                system logic               scan enable
  si                scan chain                 scan data in
  tst               system logic               test enable
  wb_addrp[31:2]    write buffer               write buffer address
  wb_arrivebfldp    write buffer               write buffer store arrived before load
  wb_bytep[3:0]     write buffer               write buffer byte enables
  wb_cfgp           write buffer               write buffer store to system configuration register
  wb_datap[31:0]    write buffer               write buffer data
  wb_fullp          write buffer               write buffer full
  wb_stpndp         write buffer               write buffer store pending
  wb_vwbfldp        write buffer               write buffer valid write before load

table 5.2 bbcc output signals summary

  output               destination              description
  baddrpo[31:2]        bbus                     bbus address output bus
  bb_slvdoen           global output enable     slave data output enable
  bbus_stealn          various, external logic  cbus bus steal indicator
  bcpuresetn           CW400X                   CW400X reset
  bdatapo[31:0]        bbus                     bbus data output bus
  bdoep                bbus                     bbcc data output enable request
  bdrdyp               CW400X                   load data ready
  biberrorp            CW400X                   instruction bus error
  bip_dno              bbus                     bbus instruction/data output indicator
  birdyp               CW400X                   instruction data ready
  blkreqn              bbus                     bbus block transaction request
  bmcntloep            bbus                     bbcc master control output enable request
  bq_ocmoep            ocm                      ocm output enable
  bq_ocmwen            ocm                      ocm write enable
  brdyno               bbus                     bbus ready output
  breqn                bbus                     bbus bus request
  brun_outp            global output enable     run enable output
  bscntloep            bbus                     bbus slave control output enable request
  bs_configp[31:0]     various, external logic  system configuration register
  bsnoopwaitp          bbus                     bbus snoop wait
  bstartno             bbus                     bbus transaction start output
  bsysresetn           various, external logic  reset
  btxno                bbus                     bbus transaction indicator output
  bwburst_gntn         bbus                     bbus burst write grant
  bwrno                bbus                     bbus write transaction indicator output
  bw_arrivebfldp       write buffer             store arrived before load
  bw_cfgselp           write buffer             store to system configuration register
  bw_dfdonep           write buffer             data fetch done
  bw_dfqaddrp[3:0]     write buffer             data fetch queue address bits
  bw_dfqupdatep        write buffer             data fetch queue update
  bw_rdstqp            write buffer             read store queue
  bw_stpndp            write buffer             store pending
  bw_wrstqp            write buffer             write store queue
  byteno[3:0]          bbus                     bbus byte enables output
  bz_iddclkp           i-set 0/d data ram       i-cache set 0/d-cache data ram clock
  bz_iddoep            i-set 0/d data ram       i-cache set 0/d-cache data ram output enable
  bz_iddwep[3:0]       i-set 0/d data ram       i-cache set 0/d-cache data ram write enables
  bz_idtclkp           i-set 0/data tag ram     i-cache set 0/d-cache tag ram clock
  bz_idtwep[5:0]       i-set 0/data tag ram     i-cache set 0/d-cache tag ram write enables
  bz_idt_oen           external 3-state gates   i-cache set 0/d-cache tag ram output enable
  bz_indexp[12:0]      cache rams               cache ram index
  bz_ip_dn             cache rams               instruction/data cache select
  bz_ip_dn_l           tag match logic          instruction/data cache select, registered
  bz_i1dclkp           i-set 1 data ram         i-cache set 1 data ram clock
  bz_i1doep            i-set 1 data ram         i-cache set 1 data ram output enable
  bz_i1dwep            i-set 1 data ram         i-cache set 1 data ram write enable
  bz_i1tclkp           i-set 1 tag ram          i-cache set 1 tag ram clock
  bz_i1twep[4:0]       i-set 1 tag ram          i-cache set 1 tag ram write enables
  bz_i1t_oen           external 3-state gates   i-cache set 1 tag ram output enable
  bz_lockp             i-set 0/data tag ram     cache ram lock bit
  bz_tagp[21:0]        cache tag rams           cache ram tag
  bz_tag4matchp[21:0]  tag match logic          tag for tag match
  bz_validp[3:0]       cache tag rams           cache ram valid bits
  so                   scan chain               scan data out
table 5.3 bbcc bidirectional signals summary

  bidirectional     connect                    description
  datap[31:0]       various, external logic    data bus

addrp[14:2]    CW400X address bus    input
    the CW400X drives the lower bits of the unmapped address for cbus transactions onto these inputs. these signals are valid in the clock cycle before the run cycle.

baddrpi[31:2]    bbus address input bus    input
    when the bbcc is a slave on the bbus, the bbus master drives the address bus for bbus transactions onto these inputs.

baddrpo[31:2]    bbus address output bus    output
    when the bbcc is a master on the bbus, the bbcc drives the address for bbus transactions onto these outputs.

bb_slvdoen    slave data output    output
    this signal is valid when the bbcc asserts bbus_stealn. the bbcc asserts this signal to the global output enable module (goe) to indicate that in the next clock cycle, one of the cache rams will drive datap[31:0].

bbus_stealn    cbus bus steal    output
    the bbcc asserts this signal to indicate that it will control driving data on datap[31:0] during the next clock cycle.

bcache_selp    hardware cache test mode    input
    this signal connects to the bbus. asserting this signal causes the bbcc to interpret bbus transactions as hardware cache mode transactions.

bcpuresetn    CW400X reset    output
    the bbcc asserts this signal to reset the CW400X.

bdatapi[31:0]    bbus data input bus    input
    the bbus device drives the input data for bbus transactions onto these signals.
bdatapo[31:0]    bbus data output bus    output
    the bbcc drives the output data for bbus transactions onto these signals.

bdoep    bbus data output enable request    output
    this signal is provided for systems with a 3-state bbus design. the bbcc asserts this signal to inform the bbus output enable logic that the bbcc wants to place bdatapo[31:0] on the appropriate 3-state bus.

bdrdyp    load data ready    output
    the bbcc asserts this signal to inform the CW400X that datap[31:0] contains valid data for a data fetch.

bdsnoop    data snoop enable    input
    asserting this signal causes the bbcc to snoop on the d-cache when writes are performed by an external master on the bbus.

berrorn    bbus bus error    input
    asserting this signal informs the bbcc that the current bbus transaction terminated with an error.

bgntn    bbus bus grant    input
    asserting this signal informs the bbcc that it has been granted mastership of the bbus.

biberrorp    instruction bus error    output
    the bbcc asserts this signal to inform the CW400X that the current instruction fetch terminated with an error.

bip_dni    bbus instruction/data input indicator    input
    this signal is used during hardware cache test mode. the bbus drives this signal high to inform the bbcc that the i-cache is being accessed. the bbus drives this signal low to inform the bbcc that the d-cache is being accessed.

bip_dno    bbus instruction/data output indicator    output
    the bbcc drives this signal high to indicate that an instruction transaction is being performed on the bbus. the bbcc drives this signal low to indicate that a data transaction is being performed on the bbus.
birdyp    instruction data ready    output
    the bbcc asserts this signal to inform the CW400X that datap[31:0] contains valid data for an instruction fetch.

bisetp    bbus instruction cache set    input
    this signal is used during hardware cache test mode. the bbus drives this signal high to inform the bbcc that the i-cache set 1 is being accessed. the bbus drives this signal low to inform the bbcc that the i-cache set 0 is being accessed.

bisnoop    instruction snoop enable    input
    asserting this signal causes the bbcc to snoop on the i-cache when writes are performed by an external master on the bbus.

biuoen    bbcc data output enable    input
    the global output enable module (goe) asserts this signal to cause the bbcc to drive datap[31:0].

blkgntn    bbus block transaction grant    input
    asserting this signal informs the bbcc that it has been granted permission to perform a block transaction on the bbus.

blkreqn    bbus block transaction request    output
    the bbcc asserts this signal to request a block transaction on the bbus.

bmcntloep    bbcc master control output enable request    output
    this signal, which connects to the bbus, is provided for systems with a 3-state bbus design. the bbcc asserts this signal to inform the bbus output enable logic that the bbcc is attempting to place baddrpo[31:2], bip_dno, bstartno, btxno, and bwrno on their appropriate 3-state buses.

bocmexistp    on-chip memory (ocm) exists indicator    input
    asserting this signal informs the bbcc that the system has ocm.

bocmselp    ocm memory transaction    input
    asserting this signal informs the bbcc that the current cbus transaction is for the ocm.
bq_ocmoep    ocm output enable    output
    the bbcc asserts this signal to cause the ocm to drive its data onto datap[31:0].

bq_ocmwen    ocm write enable    output
    this output connects to the ocm. the bbcc asserts this signal to enable writes to the ocm.

brdyni    bbus ready input    input
    asserting this signal informs the bbcc that the bbus transaction will finish at the end of the current clock cycle.

brdyno    bbus ready output    output
    the bbcc asserts this signal to indicate that the bbus transaction will finish at the end of the current clock cycle.

breqn    bbus bus request    output
    the bbcc asserts this signal to request mastership of the bbus.

bresetn    reset    input
    asserting this signal causes the bbcc to reset and generate the bcpuresetn and bsysresetn signals.

bretryn    bbus bus retry    input
    asserting this signal informs the bbcc that it should retry the current bbus transaction.

brun_inn    run enable    input
    the goe asserts this signal to inform the bbcc that the next clock cycle will be a run cycle.

brun_outp    run enable output    output
    the bbcc asserts this signal to the goe to enable the system to run. the bbcc deasserts this signal to the goe to cause the system to stall.

bscntloep    bbus slave control output enable request    output
    this signal is provided for systems with a 3-state bbus design. the bbcc asserts this signal to inform the bbus output enable logic that the bbcc wants to place the brdyno signal on the appropriate 3-state bus.
bs_configp[31:0]    system configuration register    output
    the bbcc places the value of the system configuration register on these signals.

bsnoopwaitp    bbus snoop wait    output
    the bbcc asserts this signal to inform other devices attached to the bbus that they should not start another bbus transaction because the bbcc is performing both i-cache and d-cache snooping on a bbus transaction.

bstartni    bbus transaction start input    input
    asserting this signal informs the bbcc that a bbus transaction is starting.

bstartno    bbus transaction start output    output
    the bbcc asserts this signal to indicate when it starts a bbus transaction.

bsysresetn    reset    output
    the bbcc asserts this signal to reset the system.

btagtestn    bbus tag ram transaction    input
    this signal is used during hardware cache test mode. the bbus drives this signal low to inform the bbcc that the tag ram is being accessed. the bbus drives this signal high to inform the bbcc that the data ram is being accessed.

btxni    bbus transaction indicator input    input
    asserting this signal informs the bbcc that a bbus transaction is in progress.

btxno    bbus transaction indicator output    output
    the bbcc asserts this signal to indicate that a bbus transaction is in progress.

bwburst_gntn    bbus burst write grant    output
    the bbcc asserts this signal to indicate that it will perform a burst write.
bwburst_reqn    bbus burst write request    input
    asserting this signal requests that the bbcc perform a burst write.

bwrni    bbus write transaction indicator input    input
    asserting this signal informs the bbcc that the current bbus transaction is a write.

bwrno    bbus write transaction indicator output    output
    the bbcc asserts this signal to indicate that the current bbus transaction is a write.

bw_arrivebfldp    store arrived before load    output
    the bbcc asserts this signal to inform the write buffer that the current cbus store transaction is occurring while the data fetch queue is empty.

bw_cfgselp    store to system configuration register    output
    the bbcc asserts this signal to inform the write buffer that the current cbus store is to the system configuration register.

bw_dfdonep    data fetch done    output
    the bbcc asserts this signal to inform the write buffer that a data fetch transaction has just completed.

bw_dfqaddrp[3:0]    data fetch queue address bits    output
    these signals are a few bits of the address from the data fetch queue. the write buffer uses these bits to detect load/store dependencies.

bw_dfqupdatep    data fetch queue update    output
    the bbcc asserts this signal to inform the write buffer that the data fetch queue is being updated.

bw_rdstqp    read store queue    output
    the bbcc asserts this signal to initiate a read operation to the write buffer.
bw_stpndp    store pending    output
    the bbcc asserts this signal to inform the write buffer that a store is pending in the store queue.

bw_wrstqp    write store queue    output
    the bbcc asserts this signal to initiate a write operation to the write buffer.

byteno[3:0]    bbus byte enables output    output
    these signals are the byte enables from the bbcc. the bbcc asserts the byte enables high to inform the bbus device that the corresponding bytes are valid on bdatapo[31:0]. the following table shows the correspondence between byte enables and the data bus bytes.

    byte enable    corresponding bdatapo[31:0] byte
    byteno3        [31:24]
    byteno2        [23:16]
    byteno1        [15:8]
    byteno0        [7:0]

bz_iddclkp    i-cache set 0/d-cache data ram clock    output
    this output is connected to the i-cache set 0/d-cache data ram clock input.

bz_iddoep    i-cache set 0/d-cache data ram output enable    output
    the bbcc asserts this signal to enable the i-cache set 0/d-cache data ram to drive data onto datap[31:0].

bz_iddwep[3:0]    i-cache set 0/d-cache data ram write enables    output
    the bbcc asserts these signals to enable writes to the i-cache set 0/d-cache data ram from datap[31:0]. the following table shows the correspondence between write enables and the data bus bytes.

    write enable    corresponding datap[31:0] byte
    bz_iddwep3      [31:24]
    bz_iddwep2      [23:16]
    bz_iddwep1      [15:8]
    bz_iddwep0      [7:0]
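the byte-lane correspondence in the two tables above can be expressed as a small helper. the following python sketch is for illustration only; the function names are hypothetical, and bit i of the 4-bit argument stands for byteno[i] (or, equally, bz_iddwep[i]).

```python
def enabled_byte_lanes(enables):
    """Return the (msb, lsb) data bus ranges selected by a 4-bit enable value."""
    # enable bit 3 maps to [31:24], bit 2 to [23:16], bit 1 to [15:8], bit 0 to [7:0]
    return [(8 * i + 7, 8 * i) for i in range(3, -1, -1) if (enables >> i) & 1]


def byte_lane_mask(enables):
    """Return a 32-bit mask with 0xff in every enabled byte lane."""
    mask = 0
    for i in range(4):
        if (enables >> i) & 1:
            mask |= 0xFF << (8 * i)
    return mask


upper_halfword = enabled_byte_lanes(0b1100)  # lanes [31:24] and [23:16]
alternating = byte_lane_mask(0b1010)         # lanes [31:24] and [15:8]
```

a mask of this form is one way a bus model could merge a partial store into a stored word; how the bbus device actually uses the enables is outside the scope of this sketch.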
bz_idtclkp    i-cache set 0/d-cache tag ram clock    output
    this output is connected to the i-cache set 0/d-cache tag ram clock input.

bz_idtwep[5:0]    i-cache set 0/d-cache tag ram write enables    output
    the bbcc asserts these signals to enable writes to the i-cache set 0/d-cache tag ram. figure 5.16 on page 5-48 shows the correspondence between byte enables and what is enabled.

bz_idt_oen    i-cache set 0/d-cache tag ram output enable    output
    this output is an input to a set of external 3-state gates. the bbcc asserts this signal to enable the 3-state gates to drive data from the i-cache set 0/d-cache tag ram onto datap[31:0].

bz_indexp[12:0]    cache ram index    output
    these signals are connected to the cache ram address inputs.

bz_ip_dn    instruction/data cache select    output
    the bbcc drives this signal high to read/write from/to the i-cache, and drives this signal low to read/write from/to the d-cache.

bz_ip_dn_l    instruction/data cache select, registered    output
    this output is an input to the tag match logic. this signal is the same as bz_ip_dn, but delayed by one clock cycle.

bz_i1dclkp    i-cache set 1 data ram clock    output
    this output is connected to the i-cache set 1 data ram clock input.

bz_i1doep    i-cache set 1 data ram output enable    output
    the bbcc asserts this signal to enable the i-cache set 1 data ram to drive data onto datap[31:0].

bz_i1dwep[3:0]    i-cache set 1 data ram write enable    output
    the bbcc asserts this signal to enable writes to the i-cache set 1 data ram.
bz_i1tclkp    i-cache set 1 tag ram clock    output
    this output is connected to the i-cache set 1 tag ram clock input.

bz_i1twep[4:0]    i-cache set 1 tag ram write enables    output
    the bbcc asserts these signals to enable writes to the i-cache set 1 tag ram. figure 5.18 on page 5-48 shows the correspondence between byte enables and what is enabled.

bz_i1t_oen    i-cache set 1 tag ram output enable    output
    this output is an input to a set of external 3-state gates. the bbcc asserts this signal to enable the 3-state gates to drive data from the i-cache set 1 tag ram onto datap[31:0].

bz_lockp    cache ram lock bit    output
    this output is connected to the lock bit input to the i-set 0/d tag ram. the bbcc asserts this signal to lock the i-set 0 tag ram line.

bz_tagp[21:0]    cache ram tag    output
    these signals contain the tag which the bbcc writes to the cache tag rams.

bz_tag4matchp[21:0]    tag for tag match    output
    these signals are inputs to the tag match logic. they contain the tag which is compared against the tag in the tag rams to determine if there is a cache hit or miss.

bz_validp[3:0]    cache ram valid bits    output
    these signals are the valid bits which are written to the cache tag rams.

cbytep[3:0]    cbus byte enables    input
    these signals are the byte enables from the CW400X. the CW400X asserts the byte enables high to inform the bbcc that the corresponding bytes are valid on datap[31:0].
    the following table shows the correspondence between byte enables and the data bus bytes.

    byte enable    corresponding datap[31:0] byte
    cbytep3        [31:24]
    cbytep2        [23:16]
    cbytep1        [15:8]
    cbytep0        [7:0]

cip_dn    instruction/data indicator    input
    the CW400X drives this signal high to indicate that it is performing an instruction fetch. the CW400X drives this signal low to indicate that it is performing a data fetch or store.

ckillmemp    kill memory transaction request    input
    the CW400X asserts this signal to request that the bbcc kill the current cbus transaction. ckillmemp is valid only during run cycles.

cmem_fetchp    fetch indicator    input
    the CW400X asserts this signal to inform the bbcc that the current cbus transaction is a fetch.

cstorep    store indicator    input
    the CW400X asserts this signal to indicate that the current cbus transaction is a store.

datap[31:0]    data bus    bidirectional
    these signals are the cbus data bus.

idlckp    lock bit    input
    this signal is the lock bit from the i-cache set 0/d-cache tag ram.

idmatchp    tag match    input
    the i-set 0/d tag match logic asserts this signal to inform the bbcc that the tag from the i-cache set 0/d-cache tag ram matched the appropriate bits of bz_tag4matchp[21:0].

idvldp[3:0]    valid bits    input
    these signals are the valid bits from the i-cache set 0/d-cache tag ram.
i1matchp    tag match    input
    the i-set 1 tag match logic asserts this signal to inform the bbcc that the tag from the i-cache set 1 tag ram matched the appropriate bits of bz_tag4matchp[21:0].

i1vldp[3:0]    valid bits    input
    these signals are the valid bits from the i-cache set 1 tag ram.

maddroutp[31:2]    mapped address    input
    these signals are the mapped cbus address from the mmu or mmu stub.

mearlyks1p    mmu early kseg1 indicator    input
    the mmu or mmu stub asserts this signal to inform the bbcc that the cbus unmapped (virtual) address is in kseg1 (non-cacheable address space).

mnocachep    mapped address not cacheable    input
    the mmu or mmu stub asserts this signal to inform the bbcc that the current cbus transaction is non-cacheable.

pclkp    system clock    input
    this signal is the global clock input.

se    scan enable    input
    asserting this signal enables the scan chain.

si    scan data in    input
    this signal is the scan data input.

so    scan data out    output
    this signal is the scan data output.

tst    test enable    input
    asserting this signal puts the bbcc in test mode for scan.

wb_addrp[31:2]    write buffer address    input
    these signals are the address of the earliest store transaction held in the write buffer.
wb_arrivebfldp    write buffer store arrived before load    input
    the write buffer asserts this signal to inform the bbcc that the earliest store transaction held in the write buffer was started while the data fetch queue was empty.

wb_bytep[3:0]    write buffer byte enables    input
    the write buffer drives these signals, which are the byte enables of the earliest store transaction held in the write buffer.

wb_cfgp    write buffer store to configuration register    input
    the write buffer asserts this signal to inform the bbcc that the earliest store transaction held in the write buffer is to the system configuration register.

wb_datap[31:0]    write buffer data    input
    these signals are the data of the earliest store transaction held in the write buffer.

wb_fullp    write buffer full    input
    the write buffer asserts this signal to inform the bbcc that the write buffer is full.

wb_stpndp    write buffer store pending    input
    the write buffer asserts this signal to inform the bbcc that the write buffer contains a valid store transaction.

wb_vwbfldp    write buffer valid write before load    input
    the write buffer asserts this signal to inform the bbcc that the write buffer contains a valid store transaction that should have a higher priority than data fetch transactions.
5.5 interfaces

this section describes the bbcc interfaces to the following:

  cbus
  basic bus (bbus)
  caches
  on-chip memory (ocm)
  write buffer

5.5.1 cbus

5.5.1.1 cbus transactions

this section describes the use of the cbus signals.

cbus transactions occur during run cycles. the global output enable module (goe, see chapter 6 of the minirisc CW400X microprocessor core technical manual) asserts brun_inn to inform the bbcc that the following clock cycle will be a run cycle. when a clock cycle is not a run cycle (brun_inn deasserted), the system stalls (a stall cycle).

the CW400X cip_dn, cmem_fetchp, and cstorep signals are valid in the clock cycle before the run cycle, and indicate what type of transaction will occur during the run cycle. the signals are decoded as shown in table 5.4. all other combinations do not occur, so designers can simplify the decoding to that in table 5.5.

table 5.4 transaction type signal decoding

  brun_inn   cip_dn   cmem_fetchp   cstorep   transaction
  1          -        -             -         no operation
  0          1        1             0         instruction fetch
  0          0        1             0         data fetch
  0          0        0             1         store
  0          0        0             0         coprocessor

table 5.5 transaction type signal decoding simplified

  brun_inn   cip_dn   cmem_fetchp   cstorep   transaction
  1          -        -             -         no operation
  0          1        -             -         instruction fetch
  0          0        1             -         data fetch
  0          -        -             1         store
  0          -        0             0         coprocessor
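as an illustration of the simplified decoding in table 5.5, the following python sketch maps the four signal values to a transaction type. it is a behavioral sketch, not rtl: the enum and function names are hypothetical, signals are passed as the integers 0 or 1, and the run enable input is treated as active low, as in the tables.

```python
from enum import Enum


class Transaction(Enum):
    NO_OPERATION = "no operation"
    INSTRUCTION_FETCH = "instruction fetch"
    DATA_FETCH = "data fetch"
    STORE = "store"
    COPROCESSOR = "coprocessor"


def decode_transaction(brun_inn, cip_dn, cmem_fetchp, cstorep):
    """Decode the transaction for the next cycle using the table 5.5 don't-cares."""
    if brun_inn == 1:
        # run enable deasserted: the next cycle is a stall cycle
        return Transaction.NO_OPERATION
    if cip_dn == 1:
        return Transaction.INSTRUCTION_FETCH
    if cstorep == 1:
        return Transaction.STORE
    if cmem_fetchp == 1:
        return Transaction.DATA_FETCH
    return Transaction.COPROCESSOR


# the four run-cycle rows of table 5.4
rows = [(0, 1, 1, 0), (0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 0, 0)]
decoded = [decode_transaction(*row) for row in rows]
```

because the combinations that table 5.4 omits do not occur, the order in which cip_dn, cstorep, and cmem_fetchp are tested does not affect the result for any valid input.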
the CW400X drives the lower bits of the unmapped address for cbus transactions onto addrp[14:2]. these signals are valid in the clock cycle before the run cycle.

maddroutp[31:2] are the mapped cbus transaction address from the mmu or mmu stub. these signals are valid during the run cycle. the mmu or mmu stub holds these signals stable until the next run cycle.

if addrp[31:29] = 101 (binary), the mmu or mmu stub asserts mearlyks1p to inform the bbcc that the cbus unmapped (virtual) address is in kseg1 (non-cacheable address space).

the mmu or mmu stub asserts mnocachep to indicate that maddroutp[31:2] is a non-cacheable address, either because the address is in kseg1 or because the mmu is set up to make the address non-cacheable.

the CW400X asserts ckillmemp to request that the bbcc kill the memory transaction. ckillmemp is valid during run cycles.

the CW400X drives cbytep[3:0] to indicate which bytes it is requesting to read or write. they are only valid for data transactions. cbytep[3:0] is not used for instruction fetches since instruction fetches are always word fetches. cbytep[3:0] holds its value during stall cycles.

cbus fetch transactions are normally completed when the bbcc asserts birdyp (instruction ready) or bdrdyp (data ready) to the CW400X. at this time, the data for the fetch is on datap[31:0]. for instruction fetches, birdyp must be received before the CW400X pipeline can proceed. for data fetches, the CW400X pipeline can proceed without receiving a bdrdyp assertion, as long as no dependency occurs with the data fetch, and no other data fetch transaction needs to be started. only one data fetch can be outstanding at any time. for stores, the CW400X pipeline can proceed unless the write buffer is full, in which case the bbcc stalls the CW400X pipeline.

the bbcc asserts bbus_stealn to indicate that the next clock cycle will be bus-stolen.
a clock cycle is bus-stolen when the bbcc assumes control of datap[31:0] so it can place data on it for an instruction or data fetch. when the bbcc places the data for an instruction or data fetch on datap[31:0], it asserts birdyp or bdrdyp. for block fetches, the bbcc asserts bbus_stealn for each word, in order to write it to the cache. when the data for a cache refill is also the data required by the
CW400X for an instruction or data fetch, the bbcc asserts birdyp or bdrdyp, and is said to be streaming the data to the CW400X.

figure 5.4 shows examples of some cbus transactions.

figure 5.4 cbus transactions
[timing diagram over cycles 1 to 5; signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, addrp[14:2], mearlyks1p, mnocachep, birdyp, bdrdyp, datap[31:0], maddroutp[31:2]. the address values in the diagram are shown as [31:0].]

cycle 1: the cbus signals indicate that the next cycle will be a store to address 0x00023000. the data for the store (0x8d830000) is on datap[31:0].

cycle 2: maddroutp[31:2] contains the mapped address (in this case it is the same as the unmapped address) for the store. the store is to a cacheable address, as shown by mnocachep. the next cycle will be an instruction fetch from address 0x00021424.

cycle 3: maddroutp[31:2] contains the mapped address for the instruction fetch. birdyp indicates that the data for the
5-28 basic biu and cache controller (bbcc) instruction fetch is on datap[31:0]. the next cycle will be a data fetch from address 0x00023000. cycle 4: maddroutp[31:2] contains the mapped address for the data fetch. bdrdyp indicates that the data for the data fetch is on datap[31:0]. the next cycle will be an instruction fetch from address 0x00021428. cycle 5: maddroutp[31:2] contains the mapped address for the instruction fetch. birdyp indicates that the data for the instruction fetch is on datap[31:0]. the next cycle will be a move-from-coprocessor or a move-to-coprocessor. 5.5.1.2 bus error during an instruction fetch when a transaction terminates unsuccessfully, the bbcc reports a bus error to the CW400X. instruction fetch bus errors are handled differently than data transaction bus errors. on an instruction fetch, the CW400X must wait for birdyp to be asserted before the CW400X pipeline can proceed. therefore, bus errors that are signaled for instruction fetches occur during the instruction fetch. the bbcc asserts biberrorp and birdyp to signal a bus error to the CW400X. figure 5.5 shows a bus error during an instruction fetch.
figure 5.5 bus error during instruction fetch (timing diagram, cycles 1-6; signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, addrp[14:2], mearlyks1p, mnocachep, birdyp, bdrdyp, biberrorp, ckillmemp, datap[31:0], maddroutp[31:2]; address values are shown as [31:0])

cycle 1: cbus signals indicate that the next cycle will be an instruction fetch from address 0x00021800.

cycle 2: maddroutp[31:2] contains the mapped address for the instruction fetch. the goe deasserts brun_inn to indicate that the CW400X will stall in the next clock cycle.

cycle 5: birdyp and biberrorp are asserted, signaling a bus error. the next cycle will be an instruction fetch from address 0x00021804.

cycle 6: the instruction fetch from address 0x00021804 is killed by the assertion of ckillmemp. the next cycle will be an instruction fetch from the exception handler (address 0x80000080).
5.5.1.3 bus error during a data transaction

for data transactions, bus errors might not occur during the data transaction, since the pipeline can proceed before the data transaction is complete. in this case, the bbcc signals a bus error by setting system configuration register bit 13 (e bit). this bit is reset by writing a zero to it. the corresponding signal (bs_configp13 in figure 5.6) can be used to detect the bus error by tying it to one of the CW400X interrupts. the CW400X cp0 status register should be set up to detect this interrupt, or the bus error will not be detected. figure 5.6 shows a bus error during a data transaction.

figure 5.6 bus error during data transaction (timing diagram, cycles 1-5; signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, addrp[14:2], mearlyks1p, mnocachep, birdyp, bdrdyp, ckillmemp, bs_configp13, bintp0, datap[31:0], maddroutp[31:2]; address values are shown as [31:0])
cycle 1: the cbus signals indicate that the next cycle will be a data fetch from address 0x00023020.

cycles 2-4: the CW400X continues to fetch instructions.

cycle 5: the bbcc asserts bdrdyp. the bbcc asserts bs_configp13, which is tied to interrupt 0 of the CW400X, and holds it. the next cycle will be an instruction fetch from the exception handler (address 0x80000080).

for more details about the cbus interface, see the minirisc CW400X microprocessor core technical manual.

5.5.2 basic bus (bbus)

the bbus has a simple, generic bus protocol to interface to bbus devices. the bbcc contains two sets of inputs/outputs to interface to the bbus, one set for when the bbcc is the master on the bbus, the other for when the bbcc is a slave on the bbus.

5.5.2.1 single transactions

this section describes the use of the main signals on the bbus for simple transactions with the bbcc as a bus master.

the bbcc asserts bstartno at the beginning of transactions for one clock cycle, then deasserts it until the beginning of a new transaction.

the bbcc asserts btxno at the beginning of a transaction, and continues to assert it for the duration of the transaction. btxno might stay asserted between back-to-back bbus transactions.

the bbcc asserts bwrno to indicate the transaction is a write. it deasserts this signal to indicate the transaction is a read. the bbcc holds this signal stable for the duration of the transaction.

the bbcc drives bip_dno low to indicate the transaction is a data transaction. it drives this signal high to indicate the transaction is an instruction fetch. the bbcc also holds this signal stable for the duration of the transaction.

the bbcc puts the transaction address on baddrpo[31:2] and the data to be stored on bdatapo[31:0]. the bbus device drives data on bdatapi[31:0] for instruction and data fetches.

the bbcc also drives byteno[3:0] to indicate which bytes are being fetched or stored.
the bbcc asserts all the byte enables for instruction fetches and cacheable data fetches; otherwise it asserts only the byte enables for the bytes which the CW400X requested to read or write.
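the byte-enable rule above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the bbcc logic itself: the function name byteno_for and its arguments are hypothetical, and the sketch assumes that cbytep[3:0] is active-high and byteno[3:0] is active-low, which is what the "p" and "n" suffixes and the values in figures 5.7 and 5.9 suggest (0x0 for a full-word transfer, 0xe when only byte 0 is stored).

```python
def byteno_for(cbytep: int, instruction_fetch: bool, cacheable_data_fetch: bool) -> int:
    """Return a 4-bit byte-enable value, assumed active-low (0 = byte enabled).

    cbytep is the 4-bit request from the core, assumed active-high.
    """
    if instruction_fetch or cacheable_data_fetch:
        requested = 0xF          # all four bytes are enabled
    else:
        requested = cbytep & 0xF  # only the bytes the core asked for
    return ~requested & 0xF       # invert for the assumed active-low bus signal


# Non-cacheable store of byte 0 only: enable byte 0, disable bytes 1-3.
byte0_only = byteno_for(0x1, instruction_fetch=False, cacheable_data_fetch=False)
```

under these assumptions, a byte-0 store produces the same 0xe value that appears on byteno[3:0] in the burst write example of figure 5.9.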
the bbus device asserts brdyni when it places data on bdatapi[31:0] (for fetches) or when the bbus device has received the data from bdatapo[31:0] (for stores). the bbcc supports one-cycle transactions, so brdyni can be asserted in the first clock cycle (the same clock cycle that bstartno is asserted).

the bbus device asserts berrorn when the transaction terminates in an error.

the bbus device asserts bretryn to request that the bbcc do the transaction over again.

figure 5.7 shows some bbus transactions.

figure 5.7 bbus transactions (timing diagram, cycles 1-9; signals shown: pclkp, bstartno, btxno, bip_dno, bwrno, baddrpo[31:2], bdatapo[31:0], bdatapi[31:0], byteno[3:0], brdyni, blkreqn, blkgntn, bbus_stealn; address values are shown as [31:0])

cycle 1: the bbcc asserts bstartno and btxno low to indicate the beginning of a bbus transaction, bip_dno low to indicate that it is a data transaction, and bwrno low to indicate that it is a write transaction. baddrpo[31:2] shows that the transaction address is 0x00021400. since this is a store operation, bdatapo[31:0] contains the data to be stored. byteno[3:0] indicates that all four bytes will be written.

cycle 2: the bbcc deasserts bstartno. the bbus device asserts brdyni to indicate that it has received the data from bdatapo[31:0]. the transaction is terminated at the end of this clock cycle, when the bbcc detects the brdyni assertion.

cycle 3: the previous bbcc transaction is finished, and the bbcc asserts bstartno to start a new transaction. this transaction is back-to-back with the previous store, so btxno stays asserted. bip_dno and bwrno indicate that this is an instruction fetch from address 0x0002106c.

cycle 4: the bbcc deasserts bstartno, and waits for the assertion of brdyni.

cycle 5: the bbus device asserts brdyni, and drives the data on bdatapi[31:0]. the transaction is terminated at the end of this clock cycle, at which time the bbcc registers the data from bdatapi[31:0]. it asserts bbus_stealn so that it can place the data on datap[31:0] in the next clock cycle.

cycle 8: the bbcc starts a data fetch transaction from address 0x00021404.

cycle 9: the bbus device asserts brdyni and places the data on bdatapi[31:0]. the bbcc asserts bbus_stealn so it can put the data on datap[31:0] in the next clock cycle.

5.5.2.2 block fetching

the bbcc executes a block fetch transaction to refill the caches. the transaction must be a cacheable transaction, and the system configuration register must be written so that the block size is more than one word. in the system configuration register, the block size can be set to 1, 2, 4, or 8 words. for a four-word block size, the bbcc stops the refill once the line has all the words valid.
for example, if words 2 and 3 of a line are valid, and a cache miss occurs on word 0, the bbcc does a block fetch for only words 0 and 1. for block sizes of 1, 2, or 8, the bbcc tries to fetch 1, 2, or 8 words, respectively.

the bbcc also stops block fetches if a subsequent store occurs within the same eight-word block as a block fetch. stopping block fetches prevents a block fetch from overwriting a later value in the cache. the bbcc stops block fetches if a new store operation address matches the refill address in bits [9:5]. in this case, the bbcc continues to expect the currently requested word, but does not write that word to the cache when it is received.

the bbcc starts block fetches with the word which was fetched by the CW400X. for example, if the block size is eight words, and the fetch was for 0x0002101c, the order of the block fetch is: 0x0002101c, 0x00021000, 0x00021004, 0x00021008, 0x0002100c, 0x00021010, 0x00021014, 0x00021018.

when the bbcc is able to do a block fetch, it asserts blkreqn with the start signal, bstartno. the bbus device asserts blkgntn to acknowledge the request for a block fetch. if the bbus device does not assert blkgntn, the block transfer ends. blkgntn is sampled by the bbcc when brdyni is asserted.

if the bbus device asserts berrorn or bretryn, the bbcc ends the block fetch. the bbcc signals a bus error to the CW400X if the bbus device asserts berrorn while the bbcc is fetching the first word of the block fetch. if the bbus device asserts berrorn while the bbcc is fetching a subsequent word of the block fetch, the bbcc ends the block fetch, but does not report the bus error to the CW400X.

the bbcc retries a transaction if the bbus device asserts bretryn while the bbcc is fetching the first word of the block fetch. if the bbus device asserts bretryn while the bbcc is fetching a subsequent word of the block fetch, the bbcc ends the block fetch, but does not retry the transaction.

figure 5.8 shows an example of a block fetch with a four-word block size.
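the wrap-around address order and the store-conflict check described in this section can be sketched in python. this is an illustrative model only: the function names are hypothetical, the sketch assumes 4-byte words and a block aligned to its own size, and it ignores the early termination cases (valid words in a four-word line, missing blkgntn, berrorn, bretryn).

```python
def block_fetch_order(fetch_addr: int, block_words: int) -> list[int]:
    """Word addresses of a block fetch, starting at the requested word and wrapping."""
    if block_words not in (1, 2, 4, 8):
        raise ValueError("block size must be 1, 2, 4, or 8 words")
    block_bytes = block_words * 4
    base = fetch_addr & ~(block_bytes - 1)        # first word of the block
    start = fetch_addr & (block_bytes - 1) & ~0x3  # word-aligned offset of the requested word
    return [base + (start + 4 * i) % block_bytes for i in range(block_words)]


def store_hits_refill_block(store_addr: int, refill_addr: int) -> bool:
    """True when the two addresses agree in bits [9:5], the comparison named in the text."""
    return ((store_addr ^ refill_addr) >> 5) & 0x1F == 0


order = block_fetch_order(0x0002101C, 8)
```

for the eight-word example in the text, the sketch returns 0x0002101c followed by 0x00021000 through 0x00021018; for a four-word block starting at 0x00022004 it returns the address sequence used in figure 5.8.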
figure 5.8 block fetch with four-word block size (timing diagram, cycles 1-9; signals shown: pclkp, bstartno, btxno, bip_dno, bwrno, baddrpo[31:2], bdatapo[31:0], bdatapi[31:0], byteno[3:0], brdyni, blkreqn, blkgntn, bbus_stealn; address values are shown as [31:0])

cycle 1: the bbcc asserts the bstartno and btxno signals to start an instruction fetch from address 0x00022004 (baddrpo[31:0]). the bbcc asserts blkreqn to indicate that it is able to perform a block fetch.

cycle 3: the bbus device asserts brdyni to indicate that it has placed the data on bdatapi[31:0]. it also asserts blkgntn to indicate that it can perform a block fetch.

cycle 4: the bbcc places the address 0x00022008 on baddrpo[31:2] for the next fetch of the block fetch.

cycle 5: the bbus device asserts brdyni and blkgntn to continue the block fetch.
cycle 6: the bbcc places the address 0x0002200c on baddrpo[31:2] for the next fetch of the block fetch.

cycle 7: the bbus device asserts brdyni and blkgntn to continue the block fetch.

cycle 8: the bbcc places the address 0x00022000 on baddrpo[31:2] for the next fetch of the block fetch. since the block size is four, the address wrapped to the first word of the line. the bbcc deasserts blkreqn to indicate that this is the last word of the block fetch.

cycle 9: the bbus device asserts brdyni for the last fetch.

5.5.2.3 burst writes

burst writes are writes that occur back-to-back to the same page. they are different from block fetches in that each burst write transaction is separate. the bbcc still asserts bstartno at the beginning of the subsequent writes.

the bbcc performs burst writes when: a bbus device requests it by asserting bwburst_reqn, the bbcc is currently doing a store transaction, and the following data transaction is a store residing in the write buffer which stores to the same page (page size is defined in the system configuration register) as the current store.

when these conditions are met, the bbcc asserts bwburst_gntn to indicate that it will perform a burst write. although bwburst_reqn is sampled at all times, and the bbcc asserts/deasserts bwburst_gntn based on the state of bwburst_reqn, bwburst_reqn is only meaningful when brdyni is asserted by the bbus device, because that is the clock cycle in which the next bbus transaction is determined.

figure 5.9 shows a series of burst writes.
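the three burst write conditions can be written as a single predicate. the python sketch below is an assumption-laden illustration: the function name and arguments are hypothetical, booleans mean "asserted" regardless of signal polarity, and the page size is passed in as a power-of-two byte count because this section does not list the values the system configuration register allows.

```python
from typing import Optional


def burst_write_granted(bwburst_req: bool,
                        current_is_store: bool,
                        current_addr: int,
                        next_store_addr: Optional[int],
                        page_bytes: int) -> bool:
    """True when all burst write conditions from the text hold.

    next_store_addr is None when the next data transaction in the write
    buffer is not a store. page_bytes is a hypothetical parameter.
    """
    if page_bytes <= 0 or page_bytes & (page_bytes - 1):
        raise ValueError("page_bytes must be a power of two")
    if not (bwburst_req and current_is_store) or next_store_addr is None:
        return False
    # Same page: the addresses agree in every bit above the page offset.
    return (current_addr ^ next_store_addr) & ~(page_bytes - 1) == 0


# Stores to 0x00021c00 then 0x00021c04 with an assumed 1-kbyte page.
granted = burst_write_granted(True, True, 0x00021C00, 0x00021C04, 1024)
```

the predicate only has an effect in the clock cycle where brdyni is asserted, since that is when the next bbus transaction is chosen; the sketch leaves that sampling to the caller.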
figure 5.9 series of burst writes (timing diagram, cycles 1-8; signals shown: pclkp, bstartno, btxno, bip_dno, bwrno, baddrpo[31:2], bdatapo[31:0], byteno[3:0], brdyni, bwburst_reqn, bwburst_gntn; address values are shown as [31:0])

cycle 1: the bbcc starts a word store to address 0x00021c00 (baddrpo[31:0]).

cycle 4: the bbus device asserts brdyni to indicate that it has received the data from bdatapo[31:0]. it also asserts bwburst_reqn to request a burst write. the bbcc responds by asserting bwburst_gntn, indicating that the next transaction will be a store to the same page.

cycle 5: the bbcc starts a new store to the same page. the address in this case is 0x00021c04 (baddrpo[31:0]). byteno[3:0] indicates that only byte 0 will be stored.

cycle 7: the bbus device again asserts brdyni and bwburst_reqn. however, the bbcc does not assert bwburst_gntn in response.
5.5.2.4 bus mastership

the bbus can have multiple bus masters, with only one bus master taking control of the bus at a time. the arbitration of bus ownership can occur while a transaction is in progress, because the transfer of ownership from one master to another does not affect the transaction in progress. bus masters should not begin transactions until btxno is deasserted, so there are no conflicts on the bus.

breqn and bgntn control ownership of the bus. the bbcc asserts breqn when it does not have ownership of the bus, but needs control of the bus. the bbcc also asserts breqn any time that it has two or more transactions queued. when bgntn is asserted, the bbcc has ownership of the bus, and can start a bbus transaction after btxni has been deasserted.

when the bbcc requests the bbus, the bbcc normally starts a transaction when bgntn is asserted and btxni is deasserted. an exception is when the bus request is for the system configuration register. in this case, the bbcc performs an internal transaction which is not visible on the bbus. the bbcc deasserts breqn when the system configuration register transaction is completed.

figure 5.10 shows examples of bus arbitration.
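the request and start rules can be summarized as two small predicates. this python sketch is a reading of the paragraphs above rather than the bbcc arbitration logic: the function names are hypothetical, booleans mean "asserted" (the signals themselves are active-low), the meaning of "queued" is an assumption, and the system configuration register exception is not modeled.

```python
def breq_asserted(owns_bus: bool, queued_transactions: int) -> bool:
    """Bus request: needed but not owned, or two or more transactions queued."""
    needs_bus = queued_transactions >= 1
    return (needs_bus and not owns_bus) or queued_transactions >= 2


def may_start_transaction(bgnt_asserted: bool, btx_in_asserted: bool) -> bool:
    """A master may start once it is granted the bus and no transaction is active."""
    return bgnt_asserted and not btx_in_asserted


# One pending transaction and no ownership: the request is raised.
request = breq_asserted(owns_bus=False, queued_transactions=1)
```

keeping the two predicates separate mirrors the text: ownership can change hands while a transaction is still in progress, but a new transaction waits until the bus is idle.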
figure 5.10 examples of bus arbitration (timing diagram, cycles 1-9; signals shown: pclkp, breqn, bgntn, bstartno, btxno, bip_dno, bwrno, baddrpo[31:2], bdatapo[31:0], byteno[3:0], brdyni, blkreqn, blkgntn; address values are shown as [31:0])

cycle 1: the bbcc asserts breqn to indicate that it needs the bbus. the external arbiter asserts bgntn in reply to give ownership of the bus to the bbcc.

cycle 2: the bbcc detects bgntn asserted and begins a data fetch bbus transaction. the bbcc asserts blkreqn to indicate a block fetch. because there was only one transaction queued, the bbcc deasserts breqn.

cycle 3: the bbus device asserts brdyni, but not blkgntn. the transaction is complete when the bbcc detects the brdyni assertion at the end of this clock cycle. another transaction was queued, so the bbcc asserts breqn again.

cycle 4: bgntn is not asserted, so no transaction occurs.
cycle 5: the external arbiter asserts bgntn, giving ownership of the bus to the bbcc.

cycle 6: the bbcc begins an instruction fetch transaction, requesting a block fetch. it deasserts breqn, since it no longer needs the bus.

cycle 7: the bbus device asserts brdyni and blkgntn. the transaction continues.

cycle 8: the bbcc deasserts blkreqn, indicating that this is the last fetch of the block fetch.

cycle 9: the bbus device asserts brdyni, and the transaction completes at the end of this clock cycle.

5.5.2.5 hardware cache test

hardware cache test mode allows the caches to be accessed through the bbus. when the bbcc is not the master on the bbus, and bcache_selp is asserted, the bbcc is in cache test mode. bcache_selp acts as a select to the bbcc. when bcache_selp is asserted and a transaction is started on the bbus, the bbcc assumes it is the target of the transaction; when bcache_selp is not asserted, the bbcc assumes that the transaction is to another device on the bbus.

hardware cache test mode transactions are all two-cycle transactions. the cache transaction is determined by the inputs:

  bip_dni (instruction or data cache)
  bwrni (read or write transaction)
  btagtestn (tag portion or data portion of cache)
  bisetp (which instruction set)

for hardware test reads, the bbcc asserts bbus_stealn in the first clock cycle, to perform the read from the designated ram at the end of the first clock cycle. in the second clock cycle, the data is available from the ram, and is returned on bdatapo[31:0]. for hardware test writes, the bbcc asserts bbus_stealn in the second clock cycle, and performs the write at the end of the second clock cycle.

reads from the data portion of the caches are straightforward; the data from the cache is simply returned on bdatapo[31:0]. write transactions to the data portion of the cache cause the bbcc to do a write-first type transaction. the bbcc writes the data and tag, sets the corresponding valid bit for the word, and clears the other valid bits in the line.

reads from the tag portion of the cache have the same format as in software cache test mode; the tag is in the upper portion of the bus, and the lock and valid bits are in the lower portion of the bus. writes to the tag portion of the cache are similar to software test writes; the tag portion of baddrpi[31:2] is written as the tag, and the lower five bits of bdatapi[31:0] are used as the lock and valid bits.

figure 5.11 shows some hardware cache test transactions.

figure 5.11 hardware cache test transactions (timing diagram, cycles 1-8; signals shown: pclkp, bcache_selp, btagtestn, bip_dni, bwrni, bisetp, bstartni, btxni, baddrpi[31:2], bdatapi[31:0], bdatapo[31:0], brdyno, bbus_stealn, datap[31:0]; address values are shown as [31:0])
cycle 1: the bbus device initiates a transaction to the bbcc (bcache_selp asserted). the transaction is a write to the data ram of i-cache set 0 at address 0x80000000. this corresponds to line 0 of the cache tag ram, and word 0 of the cache data ram.

cycle 2: the data for the write should be on bdatapi[31:0] during this clock cycle. the bbcc asserts bbus_stealn to steal datap[31:0] for the write, and asserts brdyno to signal the transaction is complete.

cycle 3: datap[31:0] is driven with the data to be written to the ram. the write is performed at the beginning of the clock cycle. a new hardware cache test transaction is started. this time it is a read from the tag ram of the i-cache set 0, at address 0x00000004. this corresponds to line 0 of the cache tag ram. the bbcc asserts bbus_stealn to perform the read from the ram.

cycle 4: the bbcc puts the data from the read onto bdatapo[31:0]. the bbcc asserts brdyno. the data reflects what was written in the tag ram from the write to the data ram, which uses a write-first type of transaction. the upper bits contain the tag from the write address, and the appropriate valid bit is set.

cycle 5: the bbus device starts a write to the tag ram of the i-cache set 1. the address 0x40000008 corresponds to line 0 of the tag ram.

cycle 6: the data written to the tag ram is the tag portion of baddrpi[31:2] and the lower five bits of bdatapi[31:0]. in this case, the lock bit and two of the valid bits are set. the bbcc asserts brdyno.

cycle 7: the bbus device starts a read from the data ram of the d-cache.

cycle 8: the bbcc returns data on bdatapo[31:0], and asserts brdyno.

5.5.2.6 snooping

snooping occurs when:

  the bisnoop and/or bdsnoop signals are asserted,
  the bbcc is not a master on the bbus,
  bcache_selp is not asserted, and
  a bbus write transaction occurs.

when these conditions are met, the bbcc does a tag comparison on the appropriate line(s) in the caches, and if the tags match, it invalidates the cache line(s). snooping the i-cache requires two clock cycles (for either direct-mapped or two-way set associative) and snooping the d-cache requires two clock cycles. if both i-cache and d-cache snooping are enabled (bisnoop and bdsnoop set), then snooping requires four clock cycles. to help assure that the bbus device does not perform write transactions more frequently than it is possible to snoop on, the bbcc asserts bsnoopwaitp when both i-cache and d-cache snooping are enabled.

figure 5.12 shows i-cache and d-cache snooping.

figure 5.12 i-cache and d-cache snooping (timing diagram, cycles 1-4; signals shown: pclkp, bisnoop, bdsnoop, bcache_selp, bstartni, btxni, bwrni, baddrpi[31:2], bsnoopwaitp, bbus_stealn; address values are shown as [31:0])

cycle 1: the bbus device begins a write transaction. the bbus device does not assert bcache_selp, and bisnoop and bdsnoop are asserted, so the bbcc snoops on address 0x000200d4 (baddrpi[31:2]). the bbcc asserts bbus_stealn to read the appropriate tag from the d-cache.
when both d-cache snooping and i-cache snooping are enabled, the bbcc performs the d-cache snooping first.

cycle 2: the tag match logic performs the d-cache tag comparison in this clock cycle. if the tag in the tag ram matched the baddrpi tag, the bbcc turns off (clears) the valid bits for that line. the bbcc asserts bbus_stealn so that it can perform the tag ram write, if necessary. the bbcc asserts bsnoopwaitp, indicating that it will perform both d-cache and i-cache snooping.

cycle 3: the bbcc asserts bbus_stealn so it can read the appropriate tag from the i-cache (both tags if two-way set associative i-cache).

cycle 4: the tag comparison is done in this clock cycle. the bbcc asserts bbus_stealn so that it can write the valid bits if necessary. the bbcc deasserts bsnoopwaitp since this is the last clock cycle.

5.5.2.7 bidirectional bbus

the bbus inputs and outputs can be made into a bidirectional bus by adding 3-state gates, and using the bmcntloep, bdoep, and bscntloep bbcc outputs. when bmcntloep is asserted, bstartno, btxno, bwrno, bip_dno, and baddrpo[31:2] from the bbcc should be driven onto their 3-state buses. when bdoep is asserted, bdatapo[31:0] from the bbcc should be driven onto its 3-state bus. when bscntloep is asserted, brdyno from the bbcc should be driven onto its 3-state bus.

for all 3-state buses, it is important that one and only one driver be driving the bus at all times. this involves assuring that there is a default driver when no driver needs to drive the bus, and that there is a resolution to having multiple drivers trying to drive the bus at the same time.

the following example shows how to create a default driver module for the bbus data bus. if the bbus data bus is connected to the bbcc and two other devices (device x and device z, for example), the bbcc can be made the default driver by implementing table 5.6 in logic.
table 5.6 example data output enable logic

signal descriptions:

  bdoep (input): bbcc bbus data output enable request. the bbcc asserts this signal to inform the bbus output enable logic that the bbcc wants to place its data (bdatapo[31:0]) on the 3-state bus.
  xdoep (input): device x bbus data output enable request. device x asserts this signal to inform the bbus output enable logic that device x wants to place its data (xdatapo[31:0]) on the 3-state bus.
  zdoep (input): device z bbus data output enable request. device z asserts this signal to inform the bbus output enable logic that device z wants to place its data (zdatapo[31:0]) on the 3-state bus.
  bbcc_doep (output): bbcc data output enable. the bbus output enable logic asserts this signal to cause the set of 3-state gates to place bdatapo[31:0] on the 3-state bus.
  devx_doep (output): device x data output enable. the bbus output enable logic asserts this signal to cause the set of 3-state gates to place xdatapo[31:0] on the 3-state bus.
  devz_doep (output): device z data output enable. the bbus output enable logic asserts this signal to cause the set of 3-state gates to place zdatapo[31:0] on the 3-state bus.

truth table:

  inputs                  outputs
  bdoep  xdoep  zdoep     bbcc_doep  devx_doep  devz_doep
  0      0      0         1          0          0
  0      0      1         0          0          1
  0      1      0         0          1          0
  0      1      1         1          0          0
  1      0      0         1          0          0
  1      0      1         1          0          0
  1      1      0         1          0          0
  1      1      1         1          0          0

figure 5.13 shows the gates for table 5.6 logic. the buffer gates are added to make the delay through the module approximately equal for all inputs to outputs. the equal delay causes the switching of the 3-state gates to occur nearly simultaneously. the duplicate outputs are included because the data bus is 32 bits wide, and having multiple drivers improves the driving of the 32 3-state gates (each output can drive 16 3-state gates). figure 5.14 shows how the default driver logic should be hooked up in the system.

figure 5.13 default driver logic (gate diagram; inputs: bdoep, xdoep, zdoep; outputs: bbcc_doep1, bbcc_doep0, devx_doep1, devx_doep0, devz_doep1, devz_doep0)
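the truth table in table 5.6 reduces to three boolean expressions, which the python sketch below checks against all eight rows. the function name is hypothetical, and the sketch models only the logical behavior: the delay-matching buffers and duplicated outputs of figure 5.13 are not represented.

```python
def default_driver(bdoep: int, xdoep: int, zdoep: int) -> tuple[int, int, int]:
    """Return (bbcc_doep, devx_doep, devz_doep) for one row of table 5.6."""
    # A device other than the bbcc drives only when it is the sole requester.
    devx = int(bool(xdoep) and not bdoep and not zdoep)
    devz = int(bool(zdoep) and not bdoep and not xdoep)
    # In every other case (no request, bbcc request, or a conflict) the bbcc drives.
    bbcc = int(not (devx or devz))
    return bbcc, devx, devz


# Rows of table 5.6 keyed by (bdoep, xdoep, zdoep).
TABLE_5_6 = {
    (0, 0, 0): (1, 0, 0), (0, 0, 1): (0, 0, 1),
    (0, 1, 0): (0, 1, 0), (0, 1, 1): (1, 0, 0),
    (1, 0, 0): (1, 0, 0), (1, 0, 1): (1, 0, 0),
    (1, 1, 0): (1, 0, 0), (1, 1, 1): (1, 0, 0),
}
all_rows_match = all(default_driver(*k) == v for k, v in TABLE_5_6.items())
```

written this way, it is easy to see that exactly one output is asserted for every input combination, which is the "one and only one driver" requirement stated for 3-state buses.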
figure 5.14 default driver logic in system (block diagram; the default driver logic receives bdoep from the bbcc, xdoep from device x, and zdoep from device z, and returns bbcc_doep[1:0], devx_doep[1:0], and devz_doep[1:0]; the data ports shown are bdatapo[31:0]/bdatapi[31:0], xdatapo[31:0]/xdatapi[31:0], and zdatapo[31:0]/zdatapi[31:0])

5.5.3 caches

the bbcc interfaces to either two or four cache rams. a system with a two-way set associative i-cache requires four rams. otherwise, only two rams are required. all of the signals to the rams (clock, write enables, output enables, data) come from the bbcc except for datap[31:0]. the outputs of the rams go to datap[31:0], the bbcc, and to logic that determines if there was a tag match.

because the caches use the unmapped address to access the cache rams, if an mmu is in the system, the size of the caches must be smaller than the mmu page size. if the mmu stub is installed, rather than the mmu, this limitation does not apply.

the instruction address space and data address space are assumed to be nonoverlapping. when the CW400X performs a store to an address, no checking is performed to determine if that address is contained in the i-cache.

figures 5.15 through 5.18 show the contents of each ram for a 1-kbyte cache, and which portions of the ram each write enable controls.
figure 5.15 i-cache set 0/d-cache data ram

  bits [31:24]  byte 3  (write enable bz_iddwep3)
  bits [23:16]  byte 2  (write enable bz_iddwep2)
  bits [15:8]   byte 1  (write enable bz_iddwep1)
  bits [7:0]    byte 0  (write enable bz_iddwep0)

figure 5.16 i-cache set 0/d-cache tag ram

  bits [26:5]  tag  (write enable bz_idtwep[5:0] bit 5)
  bit 4        l    (bit 4)
  bit 3        v3   (bit 3)
  bit 2        v2   (bit 2)
  bit 1        v1   (bit 1)
  bit 0        v0   (bit 0)

figure 5.17 i-cache set 1 data ram

  bits [31:0]  instruction  (write enable bz_i1dwep)

figure 5.18 i-cache set 1 tag ram

  bits [25:4]  tag  (write enable bz_i1twep[4:0] bit 4)
  bit 3        v3   (bit 3)
  bit 2        v2   (bit 2)
  bit 1        v1   (bit 1)
  bit 0        v0   (bit 0)

there are two operating modes for the CW400X interface to the caches: normal and software cache test. the system configuration register determines in which mode the bbcc operates. there are three modes for the bbus interfaces to the caches: normal, hardware test, and snooping. these modes are determined by the transactions on the bbus.

5.5.3.1 normal instruction cache transactions

when an instruction fetch is initiated by the CW400X, the bbcc does a tag lookup in the first non-bus-stolen cycle. in this cycle, the bbcc looks at whether there was a tag match (idmatchp and i1matchp) and the valid bits (idvldp[3:0] and i1vldp[3:0]) to determine whether the instruction is in the i-cache. in the same cycle, the i-cache places its data onto datap[31:0]. if the instruction is in the i-cache (a cache hit), the bbcc asserts birdyp to the CW400X, and the CW400X uses the data on datap[31:0]. if the instruction is not in the i-cache, the bbcc does not assert birdyp, and the CW400X stalls until the instruction is fetched from the external memory.

the i-cache can be direct-mapped or two-way set associative. for a two-way set associative i-cache, when a tag miss occurs and both lines contain valid instructions, the bbcc replaces one of the lines. at each tag miss when both sets contain valid instructions, the bbcc alternates between which set it replaces. the first time it replaces set 0, the second time set 1, then set 0, then set 1, and so on.

each line of i-cache set 0 can be locked (lock bit) to guarantee that the contents of the selected lines remain in cache. locking is useful for keeping certain code in the cache all the time, such as an exception handler, or some other time-critical code. setting the lock bit in the tag ram during software cache test mode locks a line in the i-cache. the lock bit can only be modified while in software cache test mode or hardware cache test mode.

the clocks to the data rams of the caches are delayed clocks, so that stores can happen in the first cycle of the store.

figure 5.19 shows some normal i-cache transactions.
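the hit decision and the alternating replacement policy described above can be sketched in python. this is a simplified model under stated assumptions: the names icache_hit and SetReplacer are hypothetical, the word within a line is assumed to be selected by address bits [3:2] (consistent with four valid bits per line and with the figure 5.19 walkthrough, where a fetch from 0x0002181c checks idvldp3), and the interaction between replacement and locked lines is not modeled because this section does not describe it.

```python
def icache_hit(addr: int, match0: bool, vld0: int,
               match1: bool = False, vld1: int = 0) -> bool:
    """Hit when a set's tag matches and the valid bit of the fetched word is set."""
    word = (addr >> 2) & 0x3                      # assumed word index within the line
    hit0 = match0 and bool((vld0 >> word) & 1)    # set 0: idmatchp and idvldp[word]
    hit1 = match1 and bool((vld1 >> word) & 1)    # set 1: i1matchp and i1vldp[word]
    return hit0 or hit1


class SetReplacer:
    """Alternates the replaced set: 0, 1, 0, 1, and so on."""

    def __init__(self) -> None:
        self._next = 0

    def victim(self) -> int:
        chosen = self._next
        self._next ^= 1
        return chosen


replacer = SetReplacer()
victims = [replacer.victim() for _ in range(4)]
```

with a tag match in set 0 and valid bits 0x7, a fetch from 0x0002181c misses because bit 3 is clear, while valid bits 0xf give a hit for a fetch from 0x00021820.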
figure 5.19 normal i-cache transactions (timing diagram, cycles 1-5; signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, addrp[14:2], birdyp, bbus_stealn, biuoen, idmatchp, idvldp[3:0], i1matchp, i1vldp[3:0], datap[31:0], bz_iddoep, bz_i1doep, bz_idtclkp, bz_idtwep[5:0], bz_i1tclkp, bz_i1twep[4:0], bz_iddclkp, bz_iddwep[3:0], bz_i1dclkp, bz_i1dwep; address values are shown as [31:0])
cycle 2: the next cycle will be an instruction fetch from address 0x0002181c.

cycle 3: the tag look-up is done in this cycle, since the cycle is not bus-stolen. idmatchp high indicates that the current address matched the tag in instruction set 0. however, the valid bit corresponding to the fetched word, idvldp3, is zero. this indicates a cache miss. birdyp is not asserted, and the CW400X stalls. even though there was a cache miss, the caches still drive datap[31:0] (in this case it is set 0, indicated by bz_iddoep high). since birdyp was not asserted, the CW400X ignores the value on datap[31:0]. the bbcc asserts bbus_stealn to show that the next cycle will be bus-stolen.

cycle 4: the goe asserts biuoen, and the bbcc drives datap[31:0] with the instruction for the instruction fetch during this bus-stolen cycle. the bbcc asserts birdyp. at the beginning of this cycle, the bbcc also writes the data into set 0 of the i-cache (note bz_iddwep[3:0] and bz_iddclkp), and sets the appropriate valid bit in the set 0 tag ram (note bz_idtwep[5:0] and bz_idtclkp). the next cycle is an instruction fetch from address 0x00021820.

cycle 5: the tag look-up is done in this cycle, since the cycle is not bus-stolen. the tag matched set 0, and the appropriate valid bit was set. these conditions indicate an i-cache hit in set 0. the set 0 ram drives datap[31:0], and the bbcc asserts birdyp to inform the CW400X that the instruction is on datap[31:0].

5.5.3.2 normal data cache transactions

d-cache fetches are similar to i-cache fetches, except the d-cache is always direct-mapped, and the lines are not lockable.

stores to d-cache occur if the store is in cacheable address space. the write to the data ram occurs at the beginning of the first non-bus-stolen cycle, using a delayed clock. also at the beginning of the clock cycle, if the store is a full-word store, the bbcc writes the appropriate valid bit to the tag ram.
if the store is a partial store, then no valid bits are written. in this same cycle, the tag match is done. based on the tag match and the valid bits, the bbcc might stall the CW400X to update the tag ram.
this stalling is necessary for the following cases:

  - full word store, no tag match: allocate line (update tag, clear other valid bits)
  - partial store, no tag match, word was valid: invalidate word
  - ckillmemp or mnocachep asserted, word was valid or full word store: invalidate word

if the tag ram needs to be updated, it is updated at the beginning of the next non-bus-stolen cycle. figure 5.20 shows some d-cache transactions.
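the stall cases listed above can be sketched as a small python decision function. this is an illustrative model only, not lsi logic code: the function name, the argument names, and the returned strings are hypothetical, and checking the ckillmemp/mnocachep case first is an assumption, since the manual lists the cases without giving a priority.

```python
from typing import Optional


def dcache_store_fixup(full_word: bool, tag_match: bool,
                       word_was_valid: bool,
                       kill_or_nocache: bool = False) -> Optional[str]:
    """Return the tag RAM fix-up a D-cache store needs, or None.

    word_was_valid is the state of the word's valid bit before the store.
    kill_or_nocache models ckillmemp or mnocachep being asserted.
    """
    # Assumed priority: the kill / no-cache case is examined first.
    if kill_or_nocache:
        if word_was_valid or full_word:
            return "invalidate word"
        return None
    if not tag_match:
        if full_word:
            # Update the tag and clear the other valid bits.
            return "allocate line"
        if word_was_valid:
            # A partial store hit a word that belongs to another line.
            return "invalidate word"
    # Tag matched, or nothing stale was left behind: no stall needed.
    return None
```

a fix-up result other than None corresponds to the bbcc stalling the CW400X until the next non-bus-stolen cycle.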
figure 5.20 d-cache transactions

[timing diagram, cycles 1 through 5. signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, cbytep[3:0], addrp[14:2], birdyp, bdrdyp, bbus_stealn, biuoen, datap[31:0], idmatchp, idvldp[3:0], bz_iddoep, bz_idtclkp, bz_idtwep[5:0], bz_iddclkp, bz_iddwep[3:0]. addresses shown: 0x00021c20, 0x00021090, 0x00021c20, 0x00021094.]

note: address values shown are [31:0].

cycle 1: the next cycle will be a store to address 0x00021c20.
cycle 2: since the cycle is not bus-stolen (bbus_stealn was not asserted in the previous cycle), the data is written to the d-cache ram, using cbytep[3:0] to control which bytes are written. in this case, the transaction is a full-word store, and so the bbcc writes the corresponding valid bit. both of these transactions occur at the beginning of the cycle; the valid bit is written on the normal rising edge, and the data is written with a delayed clock. during this cycle, the bbcc checks whether the tag matches and whether the appropriate valid bit is on. the valid bit is set since it was just written, and the tag does not match (note idmatchp). these conditions indicate a cache miss, and so the bbcc allocates the line for the new data by updating the tag, and clearing the appropriate valid bits. since a tag ram fix-up cycle is needed, the bbcc stalls the CW400X by deasserting brun_outp, which causes brun_inn to be deasserted.

cycle 3: since this cycle is not bus-stolen, the bbcc updates the d-cache tag ram. the bbcc writes the tag and the appropriate valid bits. if the cycle had been bus-stolen, the bbcc would have continued stalling, waiting for a non-bus-stolen cycle. the next cycle will be an instruction fetch from address 0x00021090.

cycle 4: this cycle is non-bus-stolen, and the instruction is in i-cache set 0. the bbcc asserts birdyp and puts the instruction onto datap[31:0] by asserting bz_iddoep. the next cycle will be a data fetch from address 0x00021c20.

cycle 5: this cycle is non-bus-stolen, and the data is in the d-cache. the bbcc asserts bdrdyp and puts the data onto datap[31:0] by asserting bz_iddoep. the next cycle will be an instruction fetch from address 0x00021094.

5.5.3.3 software cache test mode

software cache test mode allows the CW400X to write and read to cache rams that it would not normally have direct access to: the i-cache data rams and the i-cache and d-cache tag rams.
this mode is useful for initializing the cache rams on reset, or for locking code into the i-cache set 0.
when the system configuration register is written such that the cm bits (bs_configp[9:8]) indicate that the bbcc is in software cache test mode, the bbcc interprets loads and stores differently. all loads come from the appropriate cache ram, unless the address is in the kseg1 address space. all stores are written to the appropriate cache ram if they are not in kseg1. when bs_configp[9:8] indicate that the bbcc is in software cache test mode for the i-cache, the 1e bit (bs_configp1, i-cache set 1 enable) in the system configuration register specifies the i-cache set.

during software cache test mode, the mmu should be disabled, and the software test code should be in kseg1. the d bit (bs_configp4) in the system configuration register must be set for software cache test mode, regardless of whether d-cache exists. asserting ckillmemp causes unexpected results (sometimes the transaction occurs, and other times it does not), so ckillmemp must not be asserted during software cache test mode.

table 5.7 shows the settings of the system configuration register for the various software cache test modes.

table 5.7 system configuration register settings for software cache test mode

  cache mode        d-cache enable   i-cache set 1 enable   software cache
  bs_configp[9:8]   bs_configp4      bs_configp1            test mode
  01                1                0                      i-cache set 0 data
  01                1                1                      i-cache set 1 data
  10                1                0                      i-cache set 0 tag
  10                1                1                      i-cache set 1 tag
  11                1                -                      d-cache tag

the address for the rams is the normally used index, and is dependent on the size of the caches. for the data rams, the lower two bits of the address are ignored. all loads and stores are interpreted as full word transactions. for the tag rams, the lower four bits are ignored. loads and stores to the tag rams are line operations.

during stores to the data ram of the i-cache, the data is on datap[31:0]. during stores to the tag ram of the i-cache or d-cache, the tag originates from the upper bits of maddroutp[31:2] (the tag normally used for the cache) and is placed on bz_tagp[21:0]. the lock bit originates from datap4, and the valid bits originate from datap[3:0]. the bbcc places these bits on bz_lockp and bz_validp[3:0] to write them to the rams. because the data on datap[31:0] is not available at the rising edge of the tag ram clock, writing to the tag ram requires two clock cycles. the bbcc stalls the CW400X for one cycle by deasserting brun_outp, and writes the data on a subsequent non-bus-stolen cycle.

fetches from the i-cache data ram are straightforward. the ram places the 32 bits of its data on datap[31:0]. figure 5.21 shows data returned from reading the tag ram of a 1 kbyte cache while in software cache test mode.

figure 5.21 tag ram read data

  bits [31:10]   tag
  bits [9:5]     res (reserved)
  bit 4          l (lock)
  bits [3:0]     v3, v2, v1, v0 (valid bits)

the following is sample code to lock six instructions in the i-cache.

    setup_t0:
        li    t0, 0x000002fd    # sw test to write tags to set 0
        li    t1, 0xbfff0000    # address of configuration register
        sw    t0, 0(t1)         # store to configuration register
        lw    r0, 0(t1)         # to flush write buffer
        addi  r0, r0, 1         # force dependency on load
        li    t0, 0x80000000    # tag
        li    t1, 0x1f          # lock, all words valid
        li    t2, 0x13          # lock, two words valid
        sw    t1, 0x80(t0)      # line 0
        sw    t2, 0x90(t0)      # line 1

    setup_d0:
        li    t0, 0x000001fd    # sw test to write data to set 0
        li    t1, 0xbfff0000    # address of configuration register
        sw    t0, 0(t1)         # store to configuration register
        lw    r0, 0(t1)         # to flush write buffer
        addi  r0, r0, 1         # force dependency on load
        li    t0, 0x80000000    # tag doesn't really matter
        li    t1, 0x3c0eb000    # lui t6, 0xb000
        li    t2, 0x8dcf0000    # lw t7, 0(t6)
        li    t3, 0x21ef0001    # addi t7, t7, 1
        li    t4, 0xadcf0000    # sw t7, 0(t6)
        li    t5, 0x03e00008    # jr ra
        li    t6, 0x00000000    # nop
        sw    t1, 0x80(t0)      # instruction 0
        sw    t2, 0x84(t0)      # instruction 1
        sw    t3, 0x88(t0)      # instruction 2
        sw    t4, 0x8c(t0)      # instruction 3
        sw    t5, 0x90(t0)      # instruction 4
        sw    t6, 0x94(t0)      # instruction 5

5.5.3.4 bbus normal cache interface

when the bbcc is a master on the bbus, it places the data for cacheable transactions in the cache. there are two types of cache transactions: write-first and write-subsequent. the transaction can be to the i-cache set 0, the i-cache set 1, or the d-cache. each transaction generates a bus steal, so that the bbcc can write the cache rams.

the write-first type transaction writes the data (always a full word) to the appropriate data ram, updates the tag with the tag of the bbus transaction address, sets the valid bit for the word which is written, and clears all the other valid bits. the write-subsequent type transaction writes the data to the appropriate data ram, and sets the corresponding valid bit. figure 5.22 shows some normal bbus cache transactions.
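the difference between the write-first and write-subsequent transaction types can be sketched in python as two updates on a tag entry. this is a minimal model under stated assumptions: the TagEntry class, the function names, and the 1-kbyte default are hypothetical; the mapping of valid bit n to word n of the line is inferred from the earlier i-cache example, where the word at 0x0002181c corresponds to idvldp3.

```python
from dataclasses import dataclass

LINE_WORDS = 4  # four words per line, one valid bit per word


@dataclass
class TagEntry:
    tag: int = 0
    valid: int = 0  # bit n set means word n of the line is valid


def split_address(addr: int, cache_bytes: int = 1024):
    """Split a byte address into (tag, line index, word within line)."""
    tag = addr // cache_bytes
    line = (addr % cache_bytes) // 16
    word = (addr >> 2) % LINE_WORDS
    return tag, line, word


def write_first(entry: TagEntry, tag: int, word: int) -> None:
    """Write-first: replace the tag, leave only this word valid."""
    entry.tag = tag
    entry.valid = 1 << word


def write_subsequent(entry: TagEntry, word: int) -> None:
    """Write-subsequent: keep the tag, add this word's valid bit."""
    entry.valid |= 1 << word
```

for example, split_address(0x00021000) gives tag 0x84 for a 1-kbyte cache; a write-first to word 0 followed by a write-subsequent to word 1 leaves the valid bits at 0b0011.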
figure 5.22 normal bbus cache transactions

[timing diagram, cycles 1 through 5. signals shown: pclkp, bbus_stealn, biuoen, datap[31:0], birdyp, bz_iddoep, bz_idtclkp, bz_idtwep[5:0], bz_iddclkp, bz_iddwep[3:0], bz_tagp[21:0], bz_ip_dn, bz_indexp[12:0]. bz_tagp[21:0] is 0x84; instruction values shown on datap[31:0]: 0x3c08c000, 0x350800ef.]

cycle 1: the bbcc asserts bbus_stealn, which indicates that in the next cycle it will drive datap[31:0].

cycle 2: at the beginning of the cycle, the bbcc writes the i-cache set 0 tag ram with a write-first type of operation; it writes the tag (corresponding to address 0x00021000 for a 1-kbyte cache), sets the appropriate valid bit, and resets the other valid bits. also at the beginning of the cycle, the bbcc writes the instruction on datap[31:0] into the i-cache set 0 data ram, using the delayed clock. this instruction is also the instruction being fetched, so the bbcc asserts birdyp.

cycle 3: the bbcc asserts bbus_stealn to bus steal the next cycle.

cycle 4: the bbcc again writes the instruction to the i-cache set 0 data ram. this time, however, the bbcc writes only one valid bit to the tag ram. this operation is a write-subsequent type. the bbcc asserts birdyp again, to indicate that the instruction on datap[31:0] is the fetched instruction.

this example demonstrates streaming; the bbcc transmits the instruction to the CW400X at the same time as the instruction is written to the cache.

5.5.3.5 bbus hardware test cache interface

the hardware cache test mode cache interface is similar to the normal mode cache interface. the write-first and write-subsequent type of operations are used to write to the data rams. in addition, the bbus transaction might indicate that the bbcc should read the data rams, or write/read the tag rams.

hardware cache test mode transactions are all similar. the bbcc asserts bbus_stealn, and performs the appropriate ram transaction. for data ram reads, the 32-bit data is read from the ram and output enabled onto datap[31:0]. for tag ram reads, the format is the same as that of software cache test mode; the tag is on the upper bits of datap[31:0], the lock bit is datap4, and the valid bits are datap[3:0]. for tag ram writes, the tag portion of the address on baddrpi[31:2] is placed on bz_tagp[21:0], and the bbcc writes these signals to the rams. the lock and valid bits are from bdatapi[4:0], and are placed on bz_lockp and bz_validp[3:0] to be written to the rams.

5.5.3.6 snooping

the bbcc monitors transactions on the bbus. when another device on the bbus writes to memory, the bbcc generates a snoop transaction. snooping requires two cycles for the i-cache, and two cycles for the d-cache. the bbcc uses the inputs bisnoop and bdsnoop to determine whether or not to snoop on the caches. if both bisnoop and bdsnoop are asserted, snooping takes four clock cycles. each cycle, the bbcc asserts bbus_stealn.

for each snoop operation, in the first cycle the bbcc compares bz_tag4matchp[21:0] (which contains the value from the tag portion of baddrpi[31:2]) and the tag in the tag ram.
during the second cycle of the snoop, if there is a tag match, the bbcc invalidates the line.
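the two-cycle snoop sequence can be sketched in python as follows. this is an illustrative model, not the bbcc implementation: the data layout (a dictionary of per-line [tag, valid mask] lists), the function name, and the order in which the i-cache and d-cache are snooped are all assumptions, invalidation is modelled as clearing all four valid bits, and the two-way set structure of the i-cache is left out.

```python
def snoop(caches, line, snoop_tag, bisnoop, bdsnoop):
    """Snoop one line; return the number of bus-stolen cycles used.

    caches maps "i" and "d" to lists of [tag, valid_mask] entries.
    """
    cycles = 0
    for name, enabled in (("i", bisnoop), ("d", bdsnoop)):
        if not enabled:
            continue
        entry = caches[name][line]
        # First cycle: compare the snoop tag with the stored tag.
        hit = entry[0] == snoop_tag
        cycles += 1
        # Second cycle: on a tag match, invalidate the line.
        if hit:
            entry[1] = 0
        cycles += 1
    return cycles
```

with both bisnoop and bdsnoop asserted the function returns four cycles, matching the count given in the text; with only one asserted it returns two.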
5.5.4 on-chip memory (ocm)

on-chip memory (ocm) resides on the cbus, and is accessed in one clock cycle. asserting bocmexistp informs the bbcc that ocm exists. transactions to the ocm do not cause bbus transactions; the transactions are only to the ocm, not to external memory. ocm can be used as a data scratchpad or as a boot rom. the address space for the ocm must be in kseg1 (non-cacheable space).

the ocm uses both addrp[31:2] and maddroutp[31:2] to address the memory. for reads, the ocm uses addrp[31:2]; for writes, it uses maddroutp[31:2]. the ocm must select which address to use. the ocm must also assert bocmselp to the bbcc when maddroutp[31:2] matches the ocm address space. the bbcc provides the output enable and write enable to the ocm. if ocm is installed, the ocm is output enabled onto datap[31:0] for every non-bus-stolen kseg1 fetch, regardless of whether the address matched the ocm or not.

the ocm uses the unmapped address to access the ram. thus, if an mmu is in the system, the ocm must be smaller in size than the mmu page size. if the mmu stub is installed, rather than the mmu, this limitation does not apply.

reads from the ocm take place at the beginning of the run cycle, using addrp[31:2] as the address. if the first run cycle is bus-stolen, then the ocm must do the read over, this time using maddroutp[31:2] as the address. alternatively, a gated clock can be used for the ocm, which would cause the read to be only done once. the data from the ocm is output enabled onto datap[31:0] during the first non-bus-stolen cycle. at this time, the bbcc asserts birdyp or bdrdyp to the CW400X.

writes to the ocm take place at the end of the run cycle. this enables the CW400X to have time to place the store data on datap[31:0] before the write occurs. writes occur in the first non-bus-stolen cycle after the run cycle.
because writes occur at the end of the cycle, instead of the beginning, it is not possible to write data to the ocm while executing instructions out of the ocm (since the write to the ocm is followed by an instruction fetch, which occurs at the beginning of the cycle). one way to work around this limitation, at the expense of some performance, is to force the system to stall whenever a write to ocm takes place. this stalling is easily done by anding brun_outp with bq_ocmwen, and using this modified brun_outp signal as an input to the goe module.

the address match logic for bocmselp should be based on maddroutp[31:2] and a registered version of mearlyks1p. the registered mearlyks1p signal is needed to remember whether the transaction was in the kseg1 address space; since maddroutp[31:2] is mapped, the information needs to be held separately. the range for the address match is determined by the size of the ocm.

using a gated clock for the ocm saves power. if the operating frequency is slow, addrp[31:2] can be decoded, and the ocm can be clocked only when addrp[31:2] matches the ocm address space. this method might be too slow, since addrp[31:2] is one of the later-arriving signals. power can still be saved by clocking the ocm only on kseg1 run cycles.

figure 5.23 shows some ocm transactions. in this example, the ocm is 1 kbyte, and has an address space from 0xb0000000 to 0xb00003ff.
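the address match described above can be sketched in python. this is a behavioural illustration, not the manual's logic: the function names and defaults are hypothetical, the ocm size is assumed to be a power of two, and the kseg1 mapping used is the standard mips one (the upper three address bits are dropped), which agrees with the 0xb0000040 and 0x10000040 pair used in the ocm example.

```python
KSEG1_START = 0xA0000000
KSEG1_END = 0xBFFFFFFF


def kseg1_to_physical(vaddr: int) -> int:
    """Map a kseg1 virtual address to its physical (mapped) address."""
    return vaddr & 0x1FFFFFFF


def ocm_select(vaddr: int, ocm_base: int = 0xB0000000,
               ocm_bytes: int = 1024) -> bool:
    """Model of the select: kseg1 access whose upper bits match the OCM."""
    if not (KSEG1_START <= vaddr <= KSEG1_END):
        return False  # plays the role of the registered mearlyks1p
    offset_bits = ocm_bytes.bit_length() - 1  # log2 of a power-of-two size
    return (kseg1_to_physical(vaddr) >> offset_bits) == \
           (kseg1_to_physical(ocm_base) >> offset_bits)


def ocm_word_index(vaddr: int, ocm_bytes: int = 1024) -> int:
    """Word address presented to the RAM (byte offset divided by four)."""
    return (vaddr % ocm_bytes) >> 2
```

for the 1-kbyte example, ocm_select(0xb0000040) is true and the ram word index is 0x10, while 0xb0000400 and the cached address 0x00021430 are not selected.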
figure 5.23 ocm transactions

[timing diagram, cycles 1 through 5. signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, cstorep, addrp[31:2], mearlyks1p, mnocachep, maddroutp[31:2], birdyp, bdrdyp, datap[31:0], bocmselp, bq_ocmoep, bq_ocmwen. the ocm address 0xb0000040 on addrp[31:2] appears as 0x10000040 on maddroutp[31:2].]

note: address values shown are [31:0].

cycle 1: the cbus signals indicate that the next cycle will be an instruction fetch from address 0x00021430.

cycle 2: maddroutp[31:2] contains the mapped address for the instruction fetch. the next cycle will be a data fetch from 0xb0000040, which is an ocm address.

cycle 3: maddroutp[31:2] contains the mapped address for the data fetch. the ocm asserts bocmselp to indicate that the transaction is for the ocm. the bbcc asserts the output enable to the ocm, and bdrdyp to the CW400X. the next cycle will be an instruction fetch from 0x00021434.

cycle 4: the next cycle will be a store to address 0xb0000040, an ocm address.

cycle 5: the ocm asserts bocmselp to indicate that the transaction is for the ocm. the bbcc asserts the write enable to the ocm, and the store occurs at the end of this cycle.

the following is sample verilog code for a 1-kbyte ocm. it uses the mg922 ram and a gated clock.

    module CW400X_ocm (pclkp, runn, ocmoep, ocmwen, mearlyks1p, addrp,
                       upper_maddroutp, maddroutp, ocmselp, datap);

    //****************** port declarations ********************
    input pclkp, runn, ocmoep, ocmwen, mearlyks1p;
    input [29:10] upper_maddroutp;
    input [9:2] addrp, maddroutp;
    output ocmselp;
    inout [31:0] datap;
    //*************** end of port declarations ****************

    //********************** parameters ***********************
    parameter start_adrs = 19'h40000; // address is 0xb0000000
    //****************** end of parameters ********************

    //************************ wires **************************
    wire gclkp;
    wire [9:2] adrsp;
    //********************** end of wires *********************

    //************************** regs *************************
    reg gatep, kseg1p;
    //********************** end of regs **********************

    //******************** infer registers ********************
    // hold the kseg1 indication captured on the run cycle
    always @ (posedge pclkp)
      kseg1p <= runn ? kseg1p : mearlyks1p;

    // clock gate latch, transparent while pclkp is low
    always @ (pclkp or mearlyks1p or runn or ocmwen)
      if (pclkp == 1'b0)
        gatep <= ((mearlyks1p & ~runn) | ~ocmwen);

    assign gclkp = pclkp & gatep;
    //**************** end of infer registers *****************

    // reads use addrp, writes use maddroutp
    assign adrsp = ocmwen ? addrp : maddroutp;
    assign ocmselp = kseg1p & (upper_maddroutp == start_adrs);

    //****************** ram instantiation ********************
    rr32x256_922 ocm_rami(
      .a(adrsp[9:2]),
      .clk(gclkp),
      .oe(ocmoep),
      .we(~ocmwen),
      .di(datap[31:0]),
      .do(datap[31:0])
    );
    endmodule

5.5.5 write buffer

the write buffer is an extension to the store queue in the bbcc. it is a fifo for store transactions. when stores are received from the CW400X, the store is either placed in the store queue in the bbcc, or in one of the entries of the write buffer. the address (from maddroutp[31:2]), the data (from datap[31:0]), and the byte enables (from cbytep[3:0]) are held until the store can be done. additional information about the store is held in the store queue/write buffer, such as whether the store was to the system configuration register (bw_cfgselp), and whether the store arrived while the data fetch queue was empty (bw_arrivebfldp).

the store queue/write buffer entries are loaded with the store information at the end of the run cycle (unless it is a bus-stolen cycle, in which case the entry is loaded after the bus is no longer stolen). when a store transaction is completed, the store transactions in the store queue and write buffer are passed up the fifo queue.

the bbcc signals starting with wb_ are inputs to the bbcc from the write buffer. the wb_arrivebfldp, wb_vwbfldp, wb_stpndp, and wb_fullp signals are queue management information. the wb_addrp[31:2] and wb_datap[31:0] signals are put in the bbcc store queue. the bbcc signals starting with bw_ are outputs from the bbcc to the write buffer.
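the store queue/write buffer behaviour can be sketched in python as a bounded fifo. this is a minimal model under stated assumptions: the class and method names and the depth of four are hypothetical, and the store_before_load helper applies the comparison of address bits [8:5] that this section uses to decide whether a pending store must complete before a load when read priority is on.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int          # byte address, from maddroutp[31:2]
    data: int          # from datap[31:0]
    byte_enables: int  # from cbytep[3:0]


def block_bits(addr: int) -> int:
    """Address bits [8:5], the field each entry compares."""
    return (addr >> 5) & 0xF


class StoreFifo:
    def __init__(self, depth: int = 4):  # depth is an assumed value
        self.depth = depth
        self.entries = deque()

    def full(self) -> bool:
        return len(self.entries) >= self.depth

    def push(self, entry: StoreEntry) -> bool:
        """Queue a store; returns False when no entry is free."""
        if self.full():
            return False
        self.entries.append(entry)
        return True

    def complete_oldest(self) -> StoreEntry:
        """Finish the oldest store; the remaining entries move up."""
        return self.entries.popleft()

    def store_before_load(self, load_addr: int) -> bool:
        """True if a queued store falls in the same block as the load."""
        return any(block_bits(e.addr) == block_bits(load_addr)
                   for e in self.entries)
```

a load to 0x00020048 would wait behind a queued store to 0x10000040, because both addresses have the value 2 in bits [8:5], even though the full addresses differ.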
flushing of the write buffer is based on bits [8:5] of the address. when read priority is on, the load queue is given priority over the store queue. however, if the load is preceded by a store to the same block, then the store must be done before the load for proper operation. each entry for the store queue/write buffer compares bits [8:5] of its address to bits [8:5] of the load transaction's address. if there is a match, and the store transaction preceded the load transaction, then the store queue has priority over the load queue.

for information on how to attach additional write buffers, see chapter 6.

5.6 cache-miss penalty, bbus latency

the minimum cache-miss penalty for the bbcc is two cycles. the minimum cache-miss penalty occurs when the bbus device can support a one-cycle transaction, as shown in figure 5.24.

figure 5.24 cache-miss penalty, bbus latency

[timing diagram, cycles 1 through 6. signals shown: pclkp, brun_inn, cip_dn, cmem_fetchp, addrp[14:2], maddroutp[31:2], birdyp, bbus_stealn, bstartno, btxno, baddrpo[31:2], brdyni. fetch addresses shown: 0x00021400, 0x00021404, 0x00021408, 0x0002140c.]

note: address values shown are [31:0].
cycle 1: the previous instruction fetch has been received (the bbcc asserted birdyp), and so the CW400X begins a new instruction fetch from address 0x00021400.

cycle 2: the instruction is in the cache, and the bbcc asserts birdyp. the CW400X begins a new instruction fetch from address 0x00021404.

cycle 3: the instruction is in the cache, and the bbcc asserts birdyp. the CW400X begins a new instruction fetch from address 0x00021408.

cycle 4: the instruction is not in the cache, so the bbcc does not assert birdyp. the bbcc must generate a bbus transaction to fetch the instruction. the CW400X stalls, waiting for the instruction to be returned.

cycle 5: the bbcc starts a bbus transaction to address 0x00021408. the bbcc asserts bstartno and btxno, and places the address on baddrpo[31:2]. in this case, the bbus device supports a one-cycle bbus transaction, and so the bbus device asserts brdyni in this cycle (in response to bstartno being asserted), and the bbcc asserts bbus_stealn so that it can put the instruction on datap[31:0].

cycle 6: the bbcc asserts birdyp to indicate that the instruction is on datap[31:0]. the CW400X begins a new instruction fetch.

as seen in figure 5.24, the latency to start a bbus transaction is one cycle after the cache-lookup cycle. this accounts for the first cache-miss penalty cycle. also, the latency from the assertion of brdyni to the assertion of birdyp is one cycle. this accounts for the second cache-miss penalty cycle.

5.7 adding cache

the bbcc interfaces to four cache rams if the system has a two-way set associative i-cache; otherwise it interfaces to two cache rams. figure 5.25 shows the cache rams for a system. the d-cache and i-cache set 0 share the same rams. however, the d-cache and i-cache set 0 are distinct portions of the ram and act as separate caches, not a unified cache. bz_ip_dn, which is used as the highest bit of the address, determines which portion of the ram to access.
figure 5.25 cache rams for a system

[block diagram. blocks shown: d-cache data, i-cache set 0 data, i-cache set 1 data, d-cache tags, i-cache set 0 tags, i-cache set 1 tags.]

the data rams of the caches are word-wide (32 bits wide), and the lowest address bit is bit 2 of the address (bz_indexp0). each location of the tag rams is associated with a cache line, and the lowest address bit is bit 4 of the address (bz_indexp2).

5.7.1 ram sizes

tables 5.8 through 5.12 show the ram sizes needed for various cache configurations. note that although some configurations call for rams greater than 4 kbytes deep, the 500 kbyte memory compiler currently only supports mg922 rams up to 4 kbytes in depth. shaded entries represent ram configurations that are not supported with the current 500 kbyte mg922 ram compiler.

the caches use the unmapped address to access the cache rams. thus, if an mmu is in the system, the size of the caches must be smaller than the mmu page size. if the mmu stub is installed, rather than the mmu, this limitation does not apply.

table 5.8 direct mapped i-cache

          1k         2k         4k         8k         16k        32k
  tag     64 x 27    128 x 26   256 x 25   512 x 24   1k x 23    2k x 22
  data    256 x 32   512 x 32   1k x 32    2k x 32    4k x 32    8k x 32
table 5.9 two-way set associative i-cache

                1k         2k         4k         8k         16k        32k
  tag, set 0    64 x 27    128 x 26   256 x 25   512 x 24   1k x 23    2k x 22
  tag, set 1    64 x 26    128 x 25   256 x 24   512 x 23   1k x 22    2k x 21
  data, set 0   256 x 32   512 x 32   1k x 32    2k x 32    4k x 32    8k x 32
  data, set 1   256 x 32   512 x 32   1k x 32    2k x 32    4k x 32    8k x 32

table 5.10 d-cache

          1k         2k         4k         8k         16k        32k
  tag     64 x 27    128 x 26   256 x 25   512 x 24   1k x 23    2k x 22
  data    256 x 32   512 x 32   1k x 32    2k x 32    4k x 32    8k x 32

table 5.11 direct-mapped i-cache with d-cache

  d-cache   ram    i-cache
                   1k          2k          4k          8k          16k         32k
  1k        tag    128 x 27    192 x 27    320 x 27    576 x 27    1088 x 27   2112 x 27
            data   512 x 32    768 x 32    1280 x 32   2304 x 32   4352 x 32   8448 x 32
  2k        tag    192 x 27    256 x 26    384 x 26    640 x 26    1152 x 26   2176 x 26
            data   768 x 32    1k x 32     1536 x 32   2560 x 32   4608 x 32   8704 x 32
  4k        tag    320 x 27    384 x 26    512 x 25    768 x 25    1280 x 25   2304 x 25
            data   1280 x 32   1536 x 32   2k x 32     3k x 32     5k x 32     9k x 32
  8k        tag    576 x 27    640 x 26    768 x 25    1k x 24     1536 x 24   2560 x 24
            data   2304 x 32   2560 x 32   3k x 32     4k x 32     6k x 32     10k x 32
  16k       tag    1088 x 27   1152 x 26   1280 x 25   1536 x 24   2k x 23     3k x 23
            data   4352 x 32   4608 x 32   5k x 32     6k x 32     8k x 32     12k x 32
  32k       tag    2112 x 27   2176 x 26   2304 x 25   2560 x 24   3k x 23     4k x 22
            data   8448 x 32   8704 x 32   9k x 32     10k x 32    12k x 32    16k x 32
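the entries in tables 5.8 through 5.11 follow a regular pattern, which the python sketch below reproduces. the rule is inferred from the table values rather than stated in the manual, so treat it as an assumption: a tag ram has one location per 16-byte line and is as wide as the tag plus four valid bits plus one lock bit, a data ram has one location per word, and a shared d-cache/i-cache ram has the summed depth and the wider of the two tag widths. the function names are hypothetical, and cache sizes are assumed to be powers of two.

```python
def tag_ram(cache_bytes: int, lock_bit: bool = True):
    """Return (depth, width) of the tag RAM for one cache or set."""
    index_bits = cache_bytes.bit_length() - 1  # log2 of the cache size
    depth = cache_bytes // 16                  # one entry per 4-word line
    width = (32 - index_bits) + 4 + (1 if lock_bit else 0)
    return depth, width


def data_ram(cache_bytes: int):
    """Return (depth, width) of the data RAM: one 32-bit word per entry."""
    return cache_bytes // 4, 32


def shared_rams(d_bytes: int, i_bytes: int):
    """Return ((tag depth, tag width), (data depth, data width)) for a
    RAM pair shared by the D-cache and a direct-mapped I-cache."""
    d_tag, i_tag = tag_ram(d_bytes), tag_ram(i_bytes)
    tag = (d_tag[0] + i_tag[0], max(d_tag[1], i_tag[1]))
    data = (data_ram(d_bytes)[0] + data_ram(i_bytes)[0], 32)
    return tag, data
```

for example, shared_rams(4096, 8192) gives a 768 x 25 tag ram and a 3072 x 32 (3k x 32) data ram, the table 5.11 entry for a 4k d-cache with an 8k i-cache. calling tag_ram with lock_bit=False gives the narrower set 1 widths of table 5.9.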
table 5.12 two-way set associative i-cache with d-cache

  d-cache   ram                  i-cache
                                 1k          2k          4k          8k          16k         32k
  1k        tag, d + i set 0     128 x 27    192 x 27    320 x 27    576 x 27    1088 x 27   2112 x 27
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    512 x 32    768 x 32    1280 x 32   2304 x 32   4352 x 32   8448 x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32
  2k        tag, d + i set 0     192 x 27    256 x 26    384 x 26    640 x 26    1152 x 26   2176 x 26
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    768 x 32    1k x 32     1536 x 32   2560 x 32   4608 x 32   8704 x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32
  4k        tag, d + i set 0     320 x 27    384 x 26    512 x 25    768 x 25    1280 x 25   2304 x 25
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    1280 x 32   1536 x 32   2k x 32     3k x 32     5k x 32     9k x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32
  8k        tag, d + i set 0     576 x 27    640 x 26    768 x 25    1k x 24     1536 x 24   2560 x 24
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    2304 x 32   2560 x 32   3k x 32     4k x 32     6k x 32     10k x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32
  16k       tag, d + i set 0     1088 x 27   1152 x 26   1280 x 25   1536 x 24   2k x 23     3k x 23
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    4352 x 32   4608 x 32   5k x 32     6k x 32     8k x 32     12k x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32
  32k       tag, d + i set 0     2112 x 27   2176 x 26   2304 x 25   2560 x 24   3k x 23     4k x 22
            tag, i set 1         64 x 26     128 x 25    256 x 24    512 x 23    1k x 22     2k x 21
            data, d + i set 0    8448 x 32   8704 x 32   9k x 32     10k x 32    12k x 32    16k x 32
            data, i set 1        256 x 32    512 x 32    1k x 32     2k x 32     4k x 32     8k x 32

5.7.2 examples

this subsection describes four example ram configurations:

  - example 1: 1-kbyte d-cache and 2-kbyte two-way set associative i-cache (1 kbyte each set)
  - example 2: 4-kbyte d-cache and 2-kbyte two-way set associative i-cache (1 kbyte each set)
  - example 3: 2-kbyte d-cache and 4-kbyte direct mapped i-cache
  - example 4: 8-kbyte non-partitioned cache
5.7.2.1 example 1: 1-kbyte d-cache and 2-kbyte two-way set associative i-cache (1 kbyte each set)

tables 5.13 through 5.16 show the signal connections for the rams.

table 5.13 d-cache/i-cache set 0 data ram

  ram port    bbcc connection   function
  clk         bz_iddclkp        clock
  a8          bz_ip_dn          select d-cache or i-cache portion
  a[7:0]      bz_indexp[7:0]    address
  we[31:24]   bz_iddwep3        byte 3 write enable
  we[23:16]   bz_iddwep2        byte 2 write enable
  we[15:8]    bz_iddwep1        byte 1 write enable
  we[7:0]     bz_iddwep0        byte 0 write enable
  oe[31:0]    bz_iddoep         output enable
  di[31:0]    datap[31:0]       data in
  do[31:0]    datap[31:0]       data out

table 5.14 d-cache/i-cache set 0 tag ram

  ram port    bbcc connection   function
  clk         bz_idtclkp        clock
  a6          bz_ip_dn          select d-cache or i-cache portion
  a[5:0]      bz_indexp[7:2]    address
  we[26:5]    bz_idtwep5        tag write enable
  we4         bz_idtwep4        lock bit write enable
  we[3:0]     bz_idtwep[3:0]    valid bits write enables
  oe[26:0]    tied high         output enable
  di[26:5]    bz_tagp[21:0]     tag
  di4         bz_lockp          lock bit
  di[3:0]     bz_validp[3:0]    valid bits
  do[26:5]    idtagp[21:0]      tag (to match logic)
  do4         idlckp            lock bit
  do[3:0]     idvldp[3:0]       valid bits
table 5.15 i-cache set 1 data ram

  ram port    bbcc connection   function
  clk         bz_i1dclkp        clock
  a[7:0]      bz_indexp[7:0]    address
  we          bz_i1dwep         write enable
  oe          bz_i1doep         output enable
  di[31:0]    datap[31:0]       data in
  do[31:0]    datap[31:0]       data out

table 5.16 i-cache set 1 tag ram

  ram port    bbcc connection   function
  clk         bz_i1tclkp        clock
  a[5:0]      bz_indexp[7:2]    address
  we[25:4]    bz_i1twep4        tag write enable
  we[3:0]     bz_i1twep[3:0]    valid bits write enables
  oe[25:0]    tied high         output enable
  di[25:4]    bz_tagp[21:0]     tag
  di[3:0]     bz_validp[3:0]    valid bits
  do[25:4]    i1tagp[21:0]      tag (to match logic)
  do[3:0]     i1vldp[3:0]       valid bits

in addition to hooking up the rams, the system designer must code the tag match logic and the logic to put the tag on datap[31:0] (for software cache test mode and hardware cache test mode). in verilog code, the tag match logic is:

    assign idmatchp = (bz_tag4matchp[21:0] == idtagp[21:0]);
    assign i1matchp = (bz_tag4matchp[21:0] == i1tagp[21:0]);

the verilog code for the 3-state gates to place the tag ram outputs onto datap[31:0] is:

    assign datap = bz_idt_oen ? 32'hz : {idtagp[21:0], 5'h0, idlckp, idvldp[3:0]};
    assign datap = bz_i1t_oen ? 32'hz : {i1tagp[21:0], 6'h0, i1vldp[3:0]};
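the tag match and the 32-bit tag read word for example 1 can be modelled in python to check the bit positions. this is an illustrative sketch: the function names are hypothetical, and the layout follows the set 0 concatenation (22-bit tag in bits [31:10], five zero bits, the lock bit in bit 4, valid bits in [3:0]).

```python
TAG_MASK = 0x3FFFFF  # 22-bit tag, bz_tagp[21:0]


def pack_tag_word(tag: int, lock: int, valid: int) -> int:
    """Build the 32-bit word driven on datap[31:0] for a set 0 tag read."""
    return ((tag & TAG_MASK) << 10) | ((lock & 1) << 4) | (valid & 0xF)


def unpack_tag_word(word: int):
    """Split a tag read word into (tag, lock, valid)."""
    return (word >> 10) & TAG_MASK, (word >> 4) & 1, word & 0xF


def tag_match(tag4match: int, stored_tag: int) -> bool:
    """Equality compare, as in the idmatchp and i1matchp assignments."""
    return (tag4match & TAG_MASK) == (stored_tag & TAG_MASK)
```

a locked line with tag 0x84 and all four words valid reads back as 0x0002101f. for set 1 the lock bit position is always zero, since that tag ram has no lock bit.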
5.7.2.2 example 2: 4-kbyte d-cache and 2-kbyte two-way set associative i-cache (1 kbyte each set)

in this example, the combined d-cache and i-cache set 0 ram sizes are irregular; they are not a power-of-two size. in this case, the address and tag must be manipulated. the address to the ram is limited when the i-cache is accessed. also, the tag for the d-cache is smaller, so some bits of the tag are forced to zero when it is written to the ram. figure 5.26 shows the ram configuration.

figure 5.26 example 2 ram configuration

[block diagram. the shared data ram and the shared tag ram each hold a d-cache portion (4 k), an i-cache set 0 portion (1 k), and a non-existent portion (3 k). the i-cache set 1 data and i-cache set 1 tags are in separate rams.]

tables 5.17 through 5.20 show the signal connections for the rams.
table 5.17 d-cache/i-cache set 0 data ram

  ram port    bbcc connection                                          function
  clk         bz_iddclkp                                               clock
  a10         bz_ip_dn                                                 select d-cache or i-cache portion
  a[9:8]      bz_indexp[9:8] & {~bz_ip_dn, ~bz_ip_dn} (notes 1, 2)     address high bits (force to zero for i-cache)
  a[7:0]      bz_indexp[7:0]                                           address low bits
  we[31:24]   bz_iddwep3                                               byte 3 write enable
  we[23:16]   bz_iddwep2                                               byte 2 write enable
  we[15:8]    bz_iddwep1                                               byte 1 write enable
  we[7:0]     bz_iddwep0                                               byte 0 write enable
  oe[31:0]    bz_iddoep                                                output enable
  di[31:0]    datap[31:0]                                              data in
  do[31:0]    datap[31:0]                                              data out

  1. & means logical and.
  2. ~ means that the signal is connected through an inverter.

table 5.18 d-cache/i-cache set 0 tag ram

  ram port    bbcc connection                                          function
  clk         bz_idtclkp                                               clock
  a8          bz_ip_dn                                                 select d-cache or i-cache portion
  a[7:6]      bz_indexp[9:8] & {~bz_ip_dn, ~bz_ip_dn} (notes 1, 2)     address high bits (force to zero for i-cache)
  a[5:0]      bz_indexp[7:2]                                           address low bits
  we[26:5]    bz_idtwep5                                               tag write enable
  we4         bz_idtwep4                                               lock bit write enable
  we[3:0]     bz_idtwep[3:0]                                           valid bits write enables
  oe[26:0]    tied high                                                output enable
  di[26:7]    bz_tagp[21:2]                                            tag high bits
  di[6:5]     bz_tagp[1:0] & {bz_ip_dn, bz_ip_dn} (note 1)             tag low bits (force to zero for d-cache)
  di4         bz_lockp                                                 lock bit
  di[3:0]     bz_validp[3:0]                                           valid bits
  do[26:5]    idtagp[21:0]                                             tag (to match logic)
  do4         idlckp                                                   lock bit
  do[3:0]     idvldp[3:0]                                              valid bits

  1. & means logical and.
  2. ~ means that the signal is connected through an inverter.
table 5.19 i-cache set 1 data ram

  ram port    bbcc connection   function
  clk         bz_i1dclkp        clock
  a[7:0]      bz_indexp[7:0]    address
  we          bz_i1dwep         write enable
  oe          bz_i1doep         output enable
  di[31:0]    datap[31:0]       data in
  do[31:0]    datap[31:0]       data out

table 5.20 i-cache set 1 tag ram

  ram port    bbcc connection   function
  clk         bz_i1tclkp        clock
  a[5:0]      bz_indexp[7:2]    address
  we[25:4]    bz_i1twep4        tag write enable
  we[3:0]     bz_i1twep[3:0]    valid bits write enables
  oe[25:0]    tied high         output enable
  di[25:4]    bz_tagp[21:0]     tag
  di[3:0]     bz_validp[3:0]    valid bits
  do[25:4]    i1tagp[21:0]      tag (to match logic)
  do[3:0]     i1vldp[3:0]       valid bits

the match logic is the same as in example 1, except that the tag for the d-cache is smaller. one way to handle this is to force some of the bits of bz_tag4matchp[21:0] to zero when the d-cache is accessed. the bz_ip_dn_l signal, a registered version of bz_ip_dn, is provided for anding with the bits to force them to zero.

    assign new_tag4matchp = {bz_tag4matchp[21:2],
                             bz_tag4matchp[1:0] & {bz_ip_dn_l, bz_ip_dn_l}};
    assign idmatchp = (new_tag4matchp[21:0] == idtagp[21:0]);
    assign i1matchp = (bz_tag4matchp[21:0] == i1tagp[21:0]);

the verilog code for the 3-state gates to place the tag ram outputs onto datap[31:0] is the same as in example 1:

    assign datap = bz_idt_oen ? 32'hz : {idtagp[21:0], 5'h0, idlckp, idvldp[3:0]};
    assign datap = bz_i1t_oen ? 32'hz : {i1tagp[21:0], 6'h0, i1vldp[3:0]};
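the address and tag manipulation for example 2 can be checked with a short python model of the connections in tables 5.17 and 5.18. this is a sketch under stated assumptions: the function names are hypothetical, index stands for bz_indexp (bit 0 is address bit 2), and ip_dn is 1 for an i-cache access and 0 for a d-cache access, as implied by the "force to zero for i-cache" entries.

```python
def data_ram_address(index: int, ip_dn: int) -> int:
    """Shared data RAM address: a10 = ip_dn, a[9:8] masked for the I-cache."""
    high = 0 if ip_dn else (index >> 8) & 0x3
    return (ip_dn << 10) | (high << 8) | (index & 0xFF)


def tag_ram_address(index: int, ip_dn: int) -> int:
    """Shared tag RAM address: a8 = ip_dn, a[7:6] masked, a[5:0] = index[7:2]."""
    high = 0 if ip_dn else (index >> 8) & 0x3
    return (ip_dn << 8) | (high << 6) | ((index >> 2) & 0x3F)


def stored_tag(tag: int, ip_dn: int) -> int:
    """Tag written to the RAM: the two low bits are zeroed for the D-cache."""
    low = (tag & 0x3) if ip_dn else 0
    return (tag & 0x3FFFFC) | low
```

with these connections the 4-kbyte d-cache occupies data ram addresses 0x000 through 0x3ff, and the 1-kbyte i-cache set 0 occupies 0x400 through 0x4ff. the same masking is what the new_tag4matchp assignment applies on the compare side.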
5.7.2.3 example 3: 2-kbyte d-cache and 4-kbyte direct mapped i-cache

in this example, the i-cache is larger than the d-cache, and should be placed in the lower-addressed portion of the ram. again, the address and tag must be manipulated. the address to the ram is limited when the d-cache is accessed. also, the tag for the i-cache is smaller, so the same bits of the tag are forced to zero when it is written to the ram. figure 5.27 shows the ram configuration.

figure 5.27 example 3 ram configuration
(figure not reproduced. data ram regions: i-cache data, 4 k; d-cache data, 2 k; non-existent, 2 k. tag ram regions: i-cache tags, d-cache tags, non-existent.)

tables 5.21 and 5.22 show the signal connections for the rams.

table 5.21 d-cache/i-cache data ram

ram port   bbcc connection             function
clk        bz_iddclkp                  clock
a10        ~bz_ip_dn (1)               select d-cache or i-cache portion
a9         bz_indexp9 & bz_ip_dn (2)   address high bit (zero for d-cache)
a[8:0]     bz_indexp[8:0]              address low bits
we[31:24]  bz_iddwep3                  byte 3 write enable
we[23:16]  bz_iddwep2                  byte 2 write enable
we[15:8]   bz_iddwep1                  byte 1 write enable

(sheet 1 of 2)
table 5.21 (cont.) d-cache/i-cache data ram

ram port   bbcc connection             function
we[7:0]    bz_iddwep0                  byte 0 write enable
oe[31:0]   bz_iddoep                   output enable
di[31:0]   datap[31:0]                 data in
do[31:0]   datap[31:0]                 data out

1. ~ means that the signal is connected through an inverter.
2. & means logical and.

(sheet 2 of 2)

table 5.22 d-cache/i-cache set 0 tag ram

ram port   bbcc connection               function
clk        bz_idtclkp                    clock
a8         ~bz_ip_dn (1)                 select d-cache or i-cache portion
a7         bz_indexp9 & bz_ip_dn (2)     address high bit (zero for d-cache)
a[6:0]     bz_indexp[8:2]                address low bits
we[25:5]   bz_idtwep5                    tag write enable
we4        bz_idtwep4                    lock bit write enable
we[3:0]    bz_idtwep[3:0]                valid bits write enables
oe[25:0]   tied high                     output enable
di[25:6]   bz_tagp[21:2]                 tag high bits
di5        bz_tagp1 & ~bz_ip_dn (1, 2)   tag low bit (zero for i-cache)
di4        bz_lockp                      lock bit
di[3:0]    bz_validp[3:0]                valid bits
do[25:5]   idtagp[21:1]                  tag (to match logic)
do4        idlckp                        lock bit
do[3:0]    idvldp[3:0]                   valid bits

1. ~ means that the signal is connected through an inverter.
2. & means logical and.

the match logic is the same as in examples 1 and 2, except that the tag for the i-cache is smaller. one way to handle this is to force some of the bits of bz_tag4matchp[21:0] to zero when the i-cache is accessed. the bz_ip_dn_l signal, a registered version of bz_ip_dn, is provided for anding with the bits to force them to zero.

    assign new_tag4matchp = {bz_tag4matchp[21:2], bz_tag4matchp[1] & ~bz_ip_dn_l};
    assign idmatchp = (new_tag4matchp[21:1] == idtagp[21:1]);
    assign i1matchp = 1'b0;

the verilog code that causes the 3-state gates to place the tag ram outputs onto datap[31:0] is:

    assign datap = bz_idt_oen ? 32'hz : {idtagp[21:1], 6'h0, idlckp, idvldp[3:0]};

5.7.2.4 example 4: 8-kbyte non-partitioned cache

in this example, the cache is not partitioned, and can be used as either d-cache or i-cache, depending on the settings of the cache enable bits in the system configuration register. if the d-cache enable bit is on, and the i-cache enable bit is not, then the cache is d-cache. if the i-cache enable bit is on, and the d-cache enable bit is not, then the cache is i-cache. if both the d-cache and i-cache enable bits are on, then the cache acts as a unified cache. however, when the cache is unified, the lock bits should not be used, since the data fetches and stores ignore the lock bit. figure 5.28 shows the ram configuration.

figure 5.28 example 4 ram configuration
(figure not reproduced. it shows a single 8 k data ram and its tag ram.)
tables 5.23 and 5.24 show the signal connections for the rams.

table 5.23 cache data ram

ram port   bbcc connection    function
clk        bz_iddclkp         clock
a[10:0]    bz_indexp[10:0]    address
we[31:24]  bz_iddwep3         byte 3 write enable
we[23:16]  bz_iddwep2         byte 2 write enable
we[15:8]   bz_iddwep1         byte 1 write enable
we[7:0]    bz_iddwep0         byte 0 write enable
oe[31:0]   bz_iddoep          output enable
di[31:0]   datap[31:0]        data in
do[31:0]   datap[31:0]        data out

table 5.24 d-cache/i-cache set 0 tag ram

ram port   bbcc connection    function
clk        bz_idtclkp         clock
a[8:0]     bz_indexp[10:2]    address
we[23:5]   bz_idtwep5         tag write enable
we4        bz_idtwep4         lock bit write enable
we[3:0]    bz_idtwep[3:0]     valid bits write enables
oe[23:0]   tied high          output enable
di[23:5]   bz_tagp[21:3]      tag
di4        bz_lockp           lock bit
di[3:0]    bz_validp[3:0]     valid bits
do[23:5]   idtagp[21:3]       tag (to match logic)
do4        idlckp             lock bit
do[3:0]    idvldp[3:0]        valid bits

the match logic is:

    assign idmatchp = (bz_tag4matchp[21:3] == idtagp[21:3]);
    assign i1matchp = 1'b0;

the verilog code for the 3-state gates to place the tag ram outputs onto datap[31:0] is:

    assign datap = bz_idt_oen ? 32'hz : {idtagp[21:3], 8'h0, idlckp, idvldp[3:0]};
5.7.3 adding smaller caches

systems that require i-caches smaller than 1 kbyte need a small amount of additional logic, which allows the bbcc to work with i-caches as small as 32 bytes. this logic generates additional cache ram tag bits and appends these bits to bz_tagp[21:0]. it also generates additional bits of the tag for tag match and appends these bits to bz_tag4matchp[21:0].

5.7.3.1 cache ram tag

to support smaller i-caches, additional low order bits must be appended to bz_tagp[21:0]. the additional bits of the cache ram tag should be selected from either maddroutp[9:5] or the appropriate bbus address. the additional bits of the cache ram tag must also be put through a transparent-low latch to provide sufficient hold time for the cache rams.

the verilog code for the cache ram tag additional logic follows:

    assign bbus_tagp[9:5] = bmcntloep ? baddrpo[9:5] : baddrpi[9:5];
    assign bz_tagp_d[9:5] = bbus_stealn ? maddroutp[9:5] : bbus_tagp[9:5];
    always @ (clock or bz_tagp_d)
        if (clock == 1'b0)
            low_bz_tagp[4:0] <= bz_tagp_d[9:5];

5.7.3.2 tag for tag match

to support smaller i-caches, additional low order bits must be appended to bz_tag4matchp[21:0]. the tag for tag match additional logic is only needed if the system requires the bbcc to snoop. if the system requires snooping, the tag for tag match additional logic supplies lower order bits to be appended to bz_tag4matchp[21:0]. if the system does not require snooping, the appropriate bits of maddroutp[9:5] supply the lower order bits to be appended to bz_tag4matchp[21:0] (for example, assign low_bz_tag4matchp[4:0] = maddroutp[9:5]).

the verilog code for the tag for tag match additional logic follows:

    always @ (posedge clock)
        bbus_stolen <= bbus_stealn;
    always @ (posedge clock)
        bbus_tagp_l[9:5] <= bbus_tagp[9:5];
    assign low_bz_tag4matchp[4:0] = bbus_stolen ? maddroutp[9:5] : bbus_tagp_l[9:5];
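once low_bz_tagp[4:0] and low_bz_tag4matchp[4:0] exist, they still have to be appended to the 22-bit bbcc tags. the following is a minimal sketch of that step, assuming that all five extra bits are used (as for the smallest, 32-byte, i-cache) and that the tag ram has been widened to 27 tag bits. the module name small_icache_tag and the signals full_tagp, ram_tagp, and matchp are hypothetical names introduced for illustration; a larger cache would append fewer bits.

```verilog
// sketch: appending the extra low order bits to the tag and to the
// tag for tag match. small_icache_tag, full_tagp, ram_tagp, and matchp
// are hypothetical; the low_bz_* inputs come from the logic shown above.
module small_icache_tag (bz_tagp, low_bz_tagp,
                         bz_tag4matchp, low_bz_tag4matchp,
                         ram_tagp, full_tagp, matchp);
  input  [21:0] bz_tagp;            // tag from the bbcc
  input  [4:0]  low_bz_tagp;        // latched extra tag bits
  input  [21:0] bz_tag4matchp;      // tag for tag match from the bbcc
  input  [4:0]  low_bz_tag4matchp;  // extra bits for tag match
  input  [26:0] ram_tagp;           // tag read back from the widened tag ram
  output [26:0] full_tagp;          // tag written to the widened tag ram
  output        matchp;

  // extra bits go below the existing 22 bits
  assign full_tagp = {bz_tagp[21:0], low_bz_tagp[4:0]};

  // compare all 27 bits
  assign matchp = ({bz_tag4matchp[21:0], low_bz_tag4matchp[4:0]} == ram_tagp);
endmodule
```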
5.8 bbus arbitration

bbus arbitration occurs independently of transactions on the bbus. while a device is currently doing a transaction, another device can be granted mastership of the bus. the criteria the bbcc uses to determine whether it is able to start a bbus transaction are listed below. other bbus devices might use similar criteria.

- in the previous cycle, bgntn was asserted.
- in the previous cycle, btxni was deasserted (no other device is on the bus) or btxno was asserted (the bbcc was doing a transaction. if a new transaction is started, btxno remains asserted, and the bbcc can maintain control of the bus).

many types of bbus arbiters can be designed. figure 5.29 shows a block diagram of an example arbiter. figure 5.30 shows a state-transition diagram of an example arbiter. the rules and assumptions for the arbiter are:

- btxn is a bidirectional bus that is driven by both the bbcc and another bbus device called device x.
- when neither the bbcc nor device x requests the bbus, the bus grant is given to the bbcc.
- once the bbcc or the other bbus device is given the bus grant, it maintains the grant until the bus request is deasserted or a transaction is started.
- the bus requests and btxn are fast signals (signals that come early in the cycle). the bus grant signals do not need much setup time.

figure 5.29 block diagram of example arbiter
(figure not reproduced. blocks: bbcc, bbus arbiter, device x. signals: breqn, bgntn, xreqn, xgntn, btxn.)
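the two start criteria listed at the beginning of this section can be sketched in verilog as follows. this is only an illustration under stated assumptions, not the bbcc implementation: the module name bbus_start_ok and the output can_startp are hypothetical, btxni and btxno are assumed to be the input and output sides of btxn at the bbcc, and every signal ending in n (including btxni and btxno) is assumed to be active low.

```verilog
// sketch of the bbcc start criteria in section 5.8.
// bbus_start_ok and can_startp are hypothetical names.
module bbus_start_ok (pclkp, bgntn, btxni, btxno, can_startp);
  input  pclkp, bgntn, btxni, btxno;
  output can_startp;
  reg    gnt_prev, idle_prev, own_prev;

  // sample the conditions so that they describe the previous cycle
  always @ (posedge pclkp) begin
    gnt_prev  <= ~bgntn;  // bgntn was asserted (low)
    idle_prev <= btxni;   // btxni was deasserted: no other device on the bus
    own_prev  <= ~btxno;  // btxno was asserted: the bbcc was doing a transaction
  end

  // granted, and the bus was either idle or already owned by the bbcc
  assign can_startp = gnt_prev & (idle_prev | own_prev);
endmodule
```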
figure 5.30 example bbus arbiter state diagram
(figure not reproduced. states: 00, bbcc granted bus; 11, device x granted bus, wait for txn; 10, device x granted bus; 01, bbcc granted bus, wait for txn. the transitions are labeled a through l and are listed below.)

state 00 is when the bbcc is granted the bus. the arbiter is in state 00 when neither the bbcc nor device x is requesting the bus or performing a transaction. the arbiter stays in this state until device x requests the bus and either the bbcc is not requesting the bus, or the bbcc starts a transaction.

state 11 is a transition state. the bus is granted to device x, but the bbcc is performing a transaction.

state 10 is when device x is granted the bus. the arbiter stays in this state until the bbcc requests the bus and either device x is not requesting the bus, or device x starts a transaction. the arbiter also leaves this state if neither device x nor the bbcc is requesting the bus.

the transition conditions and outputs from the figure are listed below. the states column is not part of the figure text; it is derived from the verilog state machine for this arbiter.

transition  states    condition                                   outputs
a           00 to 00  xreqn | (~breqn & btxn)                     bgntn = 0, xgntn = 1
b           00 to 11  ~xreqn & ~btxn                              bgntn = 1, xgntn = 0
c           00 to 10  breqn & ~xreqn & btxn                       bgntn = 1, xgntn = 0
d           11 to 11  ~xreqn & ~btxn                              bgntn = 1, xgntn = 0
e           11 to 10  ~xreqn & btxn                               bgntn = 1, xgntn = 0
f           11 to 00  xreqn                                       bgntn = 0, xgntn = 1
g           10 to 10  (breqn & ~xreqn) | (~xreqn & btxn)          bgntn = 1, xgntn = 0
h           10 to 01  (~breqn & ~btxn) | (xreqn & ~btxn)          bgntn = 0, xgntn = 1
i           10 to 00  xreqn & btxn                                bgntn = 0, xgntn = 1
j           01 to 01  (~breqn & ~btxn) | (xreqn & ~btxn)          bgntn = 0, xgntn = 1
k           01 to 00  (~breqn & btxn) | (xreqn & btxn)            bgntn = 0, xgntn = 1
l           01 to 10  breqn & ~xreqn                              bgntn = 1, xgntn = 0

1. ~ means that the signal is connected through an inverter.
2. & means logical and.
3. | means logical or.
state 01 is a transition state. the bus is granted to the bbcc, but device x is performing a transaction.

the verilog code for the state machine is:

    module arbiter ( pclkp, breqn, xreqn, btxn, bgntn, xgntn );
        input pclkp, breqn, xreqn, btxn;
        output bgntn, xgntn;
        reg bgntn, wait_for_txn;
        reg [1:0] state;

        always @ (posedge pclkp)
            state <= {bgntn, wait_for_txn};

        assign xgntn = ~bgntn;

        always @ ( state or breqn or xreqn or btxn )
        begin
            case (state)
            2'b00: begin
                bgntn = ~(xreqn | (~breqn & btxn));
                wait_for_txn = ~xreqn & ~btxn;
            end
            2'b11: begin
                bgntn = ~xreqn;
                wait_for_txn = ~xreqn & ~btxn;
            end
            2'b10: begin
                bgntn = ~(xreqn | (~breqn & ~btxn));
                wait_for_txn = (~breqn & ~btxn) | (xreqn & ~btxn);
            end
            2'b01: begin
                bgntn = ~(~breqn | xreqn);
                wait_for_txn = (~breqn & ~btxn) | (xreqn & ~btxn);
            end
            endcase
        end
    endmodule
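the listing above has no reset, so in simulation the state register starts as unknown and none of the case items match. one possible way to start from a known state is sketched below. this is an assumption layered on the example, not part of the original arbiter: the module name arbiter_rst and the active-low input resetn are hypothetical, and the default branch is only there to give the outputs defined values while state is unknown. the four case items are copied unchanged from the listing.

```verilog
// sketch: the example arbiter with a hypothetical synchronous,
// active-low reset (resetn) that forces state 00 (bbcc granted bus).
module arbiter_rst (pclkp, resetn, breqn, xreqn, btxn, bgntn, xgntn);
  input  pclkp, resetn, breqn, xreqn, btxn;
  output bgntn, xgntn;
  reg    bgntn, wait_for_txn;
  reg [1:0] state;

  always @ (posedge pclkp)
    if (!resetn)
      state <= 2'b00;                     // known starting state
    else
      state <= {bgntn, wait_for_txn};

  assign xgntn = ~bgntn;

  always @ (state or breqn or xreqn or btxn)
  begin
    case (state)
      2'b00: begin
        bgntn = ~(xreqn | (~breqn & btxn));
        wait_for_txn = ~xreqn & ~btxn;
      end
      2'b11: begin
        bgntn = ~xreqn;
        wait_for_txn = ~xreqn & ~btxn;
      end
      2'b10: begin
        bgntn = ~(xreqn | (~breqn & ~btxn));
        wait_for_txn = (~breqn & ~btxn) | (xreqn & ~btxn);
      end
      2'b01: begin
        bgntn = ~(~breqn | xreqn);
        wait_for_txn = (~breqn & ~btxn) | (xreqn & ~btxn);
      end
      default: begin                      // state unknown: grant the bbcc
        bgntn = 1'b0;
        wait_for_txn = 1'b0;
      end
    endcase
  end
endmodule
```

because bgntn is a combinational function of the inputs, the grant changes in the same cycle as the request; the reset only affects the registered state.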
5.9 timing considerations

this section describes timing considerations when designing with the bbcc.

5.9.1 cache data ram clocks

the cache data rams are clocked with delayed clocks (bz_iddclkp, bz_i1dclkp). the system designer might need to add delay to these clocks, depending on the loading of datap[31:0] and the placement of the rams. figure 5.31 is a conceptual drawing of the CW400X, the bbcc, and the cache data rams. the data being written to the rams can be from the CW400X (for stores) or from the bbcc (for cache refills).

figure 5.31 conceptual system diagram
(figure not reproduced. blocks: CW400X, bbcc, goe, i-cache set 1 data ram, d-cache/i-cache set 0 data ram. signals: coen, biuoen.)

figure 5.32 shows waveforms for writes to the d-cache/i-cache set 0 data ram. writes to the i-cache set 1 data ram are similar. the purpose of the delayed clock to the ram is to allow data setup time to the rams.
figure 5.32 writes to the d-cache/i-cache set 0 data ram
(figure not reproduced. waveforms: pclkp, coen, biuoen, datap[31:0], bz_i1dwep, bz_iddclkp. datap[31:0] carries CW400X data in cycle 1 and bbcc data in cycle 2.)

cycle 1: the CW400X performs a store to the ram. there are two paths that matter in this case. the first path is the maximum delay from the goe module's coen output enable (to put the data from the CW400X onto datap[31:0]) to the cache ram (a zero-cycle path). the second path is the data from the CW400X (a one-cycle path). the clock to the ram should be delayed so that datap[31:0] has enough setup time before the ram is clocked. usually, the longer path is from coen.

cycle 2: the bbcc performs a cache refill, and the biuoen assertion enables the output data onto datap[31:0]. the data originates from a register in the bbcc called (in the verilog code) datareg. in this case, there are also two paths that matter. the first path is the maximum delay from the goe module's biuoen output enable (to put the data from the bbcc onto datap[31:0]) (a zero-cycle path). the second path is the maximum delay from the datareg register (also a zero-cycle path). the clock to the ram should be delayed enough so that datap[31:0] has enough setup time before the ram is clocked.

in static timing analysis, check the cache data rams' setup time constraint. the CW400X data path should be considered a one-cycle path, and the paths from coen, biuoen, and datareg should be considered zero-cycle paths. check the setup time constraint at the best and worst timing conditions (bccom and wccom). if necessary, add delay gates to bz_iddclkp and bz_i1dclkp. bz_iddclkp and bz_i1dclkp should not be delayed too much, however, since delays on these signals cause ram reads to occur later in the clock cycle.
5.9.2 cache data ram address

another timing constraint is the address bus to the cache data rams, bz_indexp[12:0] and bz_ip_dn. since the clock signals to these rams are delayed, it is especially important to check the address hold time requirement of these rams. figure 5.33 shows some transactions to the rams with the clock at a 50% duty cycle and figure 5.34 shows some transactions to the rams with the clock at a 30% duty cycle. the duty cycle is important since bz_ip_dn and bz_indexp[12:0] come from transparent-low latches. these signals change at the falling edge of pclkp. at a 50% duty cycle, there is plenty of hold time on the address to the ram. at a 30% duty cycle, there is less hold time on the address since the duty cycle is smaller.

the address to the cache data rams might need to be delayed, depending on the:

- required minimum duty cycle
- amount of time bz_iddclkp and bz_i1dclkp are delayed
- clock frequency

if the address to the cache data rams must be delayed, add delay gates to the bz_ip_dn and bz_indexp[12:0] signals to the cache data rams, but not to the cache tag rams.

figure 5.33 ram transactions (clock with a 50% duty cycle)
(figure not reproduced. waveforms: pclkp, bz_ip_dn, bz_indexp[12:0], bz_iddclkp. the figure marks the address hold time for three successive index values.)
figure 5.34 ram transactions (clock with a 30% duty cycle)
(figure not reproduced. waveforms: pclkp, bz_ip_dn, bz_indexp[12:0], bz_iddclkp. the figure marks the address hold time for three successive index values.)

5.9.3 tag match logic

if a real mmu (not the mmu stub) is in the system, and the system has a two-way set associative i-cache, maddroutp[31:2] (from the mmu through the tag match logic) might be the critical path. if this is the case, and if the bbcc does not need to snoop, then this path can be reduced by replacing bz_tag4matchp[21:0] with maddroutp[31:10]. this replacement reduces the delay from the mmu to the tag match logic.
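the replacement described in section 5.9.3 can be sketched as follows, assuming a configuration with full 22-bit tags in both sets (as in the example 1 style match logic) and a system that does not snoop. the module name fast_tag_match is hypothetical; the other names are the signals used in the match logic examples earlier in this chapter.

```verilog
// sketch: two-way i-cache match logic fed directly from the mmu address
// instead of bz_tag4matchp[21:0]. valid only when the bbcc does not snoop.
// fast_tag_match is a hypothetical module name.
module fast_tag_match (maddroutp, idtagp, i1tagp, idmatchp, i1matchp);
  input  [31:10] maddroutp;  // mapped address from the mmu
  input  [21:0]  idtagp;     // tag read from the set 0 tag ram
  input  [21:0]  i1tagp;     // tag read from the set 1 tag ram
  output         idmatchp, i1matchp;

  // maddroutp[31:10] takes the place of bz_tag4matchp[21:0]
  assign idmatchp = (maddroutp[31:10] == idtagp[21:0]);
  assign i1matchp = (maddroutp[31:10] == i1tagp[21:0]);
endmodule
```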
chapter 6 adding or removing write buffers

this chapter explains how to add write buffers to the bbcc by modifying the lsi logic CW400X_wb3 module. it also explains how to remove them. this chapter contains the following sections:

- section 6.1, overview
- section 6.2, signals
- section 6.3, basic operation of a write buffer
- section 6.4, adding a write buffer
- section 6.5, removing a write buffer

6.1 overview

the bbcc contains a single write buffer. it can support up to seven additional write buffers outside the bbcc, attached as shown in figure 6.1. these additional buffers form a first-in-first-out (fifo) queue; the write buffer inside the bbcc is at the head of this queue.

figure 6.1 typical write buffer configuration
(figure not reproduced. it shows a chain of additional write buffers attached to the single write buffer inside the bbcc.)
6.2 signals

this section describes the signals that comprise the bit-level interface of a write buffer (wb). tables 6.1 and 6.2 summarize the write buffer signals. detailed descriptions follow the tables.

the signals are described in alphabetical order by mnemonic. each signal definition contains the mnemonic and the full signal name. the mnemonics for active low signals end in an n and have an overbar over their names. in the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.

table 6.1 write buffer input signals summary

input                  source                                                      description
bhd_addrp[31:2]        next transaction wb's wb1_addrp[31:2], ground if last wb    store address
bhd_arrivebfldp (1)    next transaction wb's wb1_arrivebfldp, ground if last wb    arrive before load
bhd_bytep[3:0]         next transaction wb's wb1_bytep[3:0], ground if last wb     byte enables
bhd_cfgp               next transaction wb's wb1_cfgp, ground if last wb           store to configuration register
bhd_datap[31:0]        next transaction wb's wb1_datap[31:0], ground if last wb    store data
bhd_stpndp             next transaction wb's wb1_stpndp, ground if last wb         store pending
bq_arrivebfldp (1)     bbcc                                                        arrive before load
bq_cfgselp             bbcc                                                        store to configuration register
bqd_dfqaddrp[3:0] (1)  bbcc                                                        data fetch address
bq_dfqupdatep          bbcc                                                        update data fetch queue
bqo_dfdonep            bbcc                                                        data fetch done
bq_rdstqp              bbcc                                                        read store queue
bq_wrstqp              bbcc                                                        write store queue
bresetn                system logic/reset module                                   reset
cbytep[3:0]            CW400X                                                      CW400X byte enables
datap[31:0]            CW400X                                                      CW400X store data

(sheet 1 of 2)
table 6.1 (cont.) write buffer input signals summary

input                  source                                                      description
maddroutp[31:2]        cbus                                                        cbus store address
pclkp                  system logic                                                system clock
prev_stpndp            previous transaction wb's wb1_stpndp or bbcc's bw_stpndp    store pending
se                     system logic                                                scan enable
si                     previous transaction wb's or bbcc's so                      scan data in

1. needed to implement read priority.

(sheet 2 of 2)

table 6.2 write buffer output signals summary

output             destination                                    description
so                 next transaction wb                            scan data out
wb1_addrp[31:2]    previous transaction wb's bhd_addrp[31:2],     store address
                   bbcc's wb_addrp[31:2] if first external wb
wb1_arrivebfldp    previous transaction wb's bhd_arrivebfldp,     arrive before load
                   bbcc's wb_arrivebfldp if first external wb
wb1_bytep[3:0]     previous transaction wb's bhd_bytep[3:0],      byte enables
                   bbcc's wb_bytep[3:0] if first external wb
wb1_cfgp           previous transaction wb's bhd_cfgp,            configuration register select
                   bbcc's wb_cfgp if first external wb
wb1_datap[31:0]    previous transaction wb's bhd_datap[31:0],     store data
                   bbcc's wb_datap[31:0] if first external wb
wb1_stpndp         previous transaction wb's bhd_stpndp,          store pending
                   bbcc's wb_stpndp if first external wb
wb1_vwbfldp        or-gated to bbcc                               valid write before load

bhd_addrp[31:2]    store address    input
    the current wb receives the store address from the next transaction wb on these signals when data moves up the fifo queue. if the current wb is the last wb in the queue, these signals are connected to ground.
bhd_arrivebfldp    arrive before load    input
    the current wb receives the arrive before load signal from the next transaction wb on this signal when data moves up the fifo queue. if the current wb is the last wb in the queue, this signal is connected to ground.

bhd_bytep[3:0]    byte enables    input
    the current wb receives the byte enables from the next transaction wb on these signals when data moves up the fifo queue. if the current wb is the last wb in the queue, these signals are connected to ground.

bhd_cfgp    store to configuration register    input
    the current wb receives the configuration register select signal from the next transaction wb on this signal when data moves up the fifo queue. if the current wb is the last wb in the queue, this signal is connected to ground.

bhd_datap[31:0]    store data    input
    the current wb receives the store data from the next transaction wb on these signals when data moves up the fifo queue. if the current wb is the last wb in the queue, these signals are connected to ground.

bhd_stpndp    store pending    input
    the current wb receives the store pending signal from the next transaction wb on this signal when data moves up the fifo queue. if the current wb is the last wb in the queue, this signal is connected to ground.

bq_arrivebfldp    store arrived before load    input
    the bbcc bw_arrivebfldp output connects to this input. the bbcc asserts this signal to inform the wb that the current cbus store transaction is occurring while the data fetch queue is empty. this signal is needed to implement read priority.

bq_cfgselp    store to system configuration register    input
    the bbcc bw_cfgselp output connects to this input. the bbcc asserts this signal to inform the wb that the current cbus store is to the system configuration register.
bqd_dfqaddrp[3:0]    data fetch queue address    input
    the bbcc bw_dfqaddrp[3:0] outputs connect to these inputs. these signals are a few bits of the address from the data fetch queue. the wb uses these bits to detect load/store dependencies. this signal is needed to implement read priority.

bq_dfqupdatep    data fetch queue update    input
    the bbcc bw_dfqupdatep output connects to this input. the bbcc asserts this signal to inform the wb that the data fetch queue is being updated.

bqo_dfdonep    data fetch done    input
    the bbcc bw_dfdonep output connects to this input. the bbcc asserts this signal to inform the wb that a data fetch transaction just completed.

bq_rdstqp    read store queue    input
    the bbcc bw_rdstqp output connects to this input. the bbcc asserts this signal to initiate a read operation to the wb.

bq_wrstqp    write store queue    input
    the bbcc bw_wrstqp output connects to this input. the bbcc asserts this signal to initiate a write operation to the wb.

bresetn    reset    input
    asserting this signal resets the wb.

cbytep[3:0]    byte enables    input
    these signals from the CW400X inform the wb (when asserted high) which corresponding bytes are valid on datap[31:0]. the following table shows the correspondence between byte enables and the data bus bytes.

    byte enable   corresponding datap[31:0] byte
    cbytep3       [31:24]
    cbytep2       [23:16]
    cbytep1       [15:8]
    cbytep0       [7:0]
datap[31:0]    CW400X data bus    bidirectional
    these signals transfer data to, and from, the CW400X.

maddroutp[31:2]    mapped address    input
    these signals are the mapped cbus address from the mmu or mmu stub.

pclkp    system clock    input
    this signal is the global clock input.

prev_stpndp    store pending    input
    when the previous transaction wb (which might be in the bbcc) asserts this signal (by passing along wb1_stpndp), it informs the current wb that the previous transaction wb contains a valid store transaction.

se    scan enable    input
    asserting this signal enables the scan chain.

si    scan data in    input
    this signal is the scan data input.

so    scan data out    output
    this signal is the scan data output.

wb1_addrp[31:2]    store address    output
    the current wb uses these signals to pass on its store address to the previous transaction wb (which might be in the bbcc).

wb1_arrivebfldp    arrive before load    output
    the current wb uses this signal to pass on its arrive before load signal to the previous transaction wb (which might be in the bbcc). if this signal is asserted, it informs the previous transaction wb in the queue (which might be in the bbcc) that the store transaction held in the current wb was started while the data fetch queue was empty.
wb1_bytep[3:0]    byte enables to previous transaction wb    output
    the current wb uses these signals to pass on its byte enables to the previous transaction wb (which might be in the bbcc).

wb1_cfgp    configuration register select    output
    the current wb uses this signal to pass on its configuration register select signal to the previous transaction wb (which might be in the bbcc).

wb1_datap[31:0]    store data to previous transaction wb    output
    the current wb uses these signals to pass on its store data to the previous transaction wb (which might be in the bbcc).

wb1_stpndp    store pending    output
    the current wb uses this signal to pass on its store pending signal to the previous transaction wb (which might be in the bbcc). if this signal is asserted, it informs the previous transaction wb and the next transaction wb (or, if there is no next transaction wb, then the bbcc) that the current wb contains a valid store transaction. if the current wb is the last wb in the queue, this signal asserted also informs the bbcc that the wb is full (it is connected to the bbcc wb_fullp signal).

wb1_vwbfldp    valid write before load    output
    the current wb asserts this signal to inform the bbcc that the current wb contains a valid store transaction that has higher priority than data fetch transactions. all the wbs' wb1_vwbfldp outputs are logically ored to provide the bbcc's wb_vwbfldp input.
6.3 basic operation of a write buffer

the three store pending bits (prev_stpndp, bhd_stpndp, and wb1_stpndp) and the read/write indicators (bq_rdstqp and bq_wrstqp) determine the source of the wb input.

during a read operation (bq_rdstqp high), the bbcc takes data from the top of the fifo queue, and subsequent data moves up the queue. the current wb takes its input from the next wb in queue (the next transaction wb).

during a write operation (bq_wrstqp high), if prev_stpndp is high and wb1_stpndp is low, the current wb is the first empty buffer in the queue; therefore it takes its input from the CW400X. otherwise, the current wb holds its current data.

table 6.3 shows how bhd_stpndp and wb1_stpndp control the current wb when a read and a write operation happen at the same time.

table 6.3 write buffer operation for reads and writes

bhd_stpndp   wb1_stpndp   results
x            0            does not clock in new data
0            1            clocks in data from the CW400X
1            1            clocks in data from the next write buffer

6.4 adding a write buffer

figure 6.2 shows a block diagram of the write buffer connections within the CW400X_wb3 module.
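the input-source rules of section 6.3, including table 6.3, can be summarized as a small piece of selection logic. the following is one reading of those rules, offered as a sketch under stated assumptions and not as the logic inside the CW400X_wb3 module, which this manual does not list. the module name wb_src_select, the output src, and the SRC_* encodings are hypothetical.

```verilog
// sketch of the write buffer input-source selection (section 6.3).
// wb_src_select, src, and the SRC_* encodings are hypothetical.
module wb_src_select (bq_rdstqp, bq_wrstqp, prev_stpndp, bhd_stpndp,
                      wb1_stpndp, src);
  input        bq_rdstqp, bq_wrstqp;
  input        prev_stpndp, bhd_stpndp, wb1_stpndp;
  output [1:0] src;
  reg    [1:0] src;

  parameter SRC_HOLD = 2'b00,  // keep the current contents
            SRC_CPU  = 2'b01,  // clock in data from the CW400X
            SRC_NEXT = 2'b10;  // clock in data from the next transaction wb

  always @ (bq_rdstqp or bq_wrstqp or prev_stpndp or bhd_stpndp or wb1_stpndp)
  begin
    src = SRC_HOLD;
    if (bq_rdstqp & bq_wrstqp) begin
      // simultaneous read and write: table 6.3
      if (wb1_stpndp)
        src = bhd_stpndp ? SRC_NEXT : SRC_CPU;
    end
    else if (bq_rdstqp)
      src = SRC_NEXT;          // read only: data moves up the queue
    else if (bq_wrstqp & prev_stpndp & ~wb1_stpndp)
      src = SRC_CPU;           // write only: new store enters the first free buffer
  end
endmodule
```

in the simultaneous case, a buffer that is empty (wb1_stpndp low) holds, which corresponds to the first row of table 6.3.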
figure 6.2 write buffer connection diagram
(figure not reproduced. it shows the bbcc and the three write buffers inside the CW400X_wb3 module. the previous transaction wb is to the left of the current wb, and the next transaction wb is to the right of the current wb. the figure labels the following signals:

- bbcc side: wb_addrp[31:2], wb_arrivebfldp, wb_bytep[3:0], wb_cfgp, wb_datap[31:0], wb_stpndp, wb_vwbfldp, wb_fullp, bw_stpndp, and so.
- each wb, outputs: wb1_addrp[31:2], wb1_arrivebfldp, wb1_bytep[3:0], wb1_cfgp, wb1_datap[31:0], wb1_stpndp, wb1_vwbfldp, and so.
- each wb, inputs from its neighbors: bhd_addrp[31:2], bhd_arrivebfldp, bhd_bytep[3:0], bhd_cfgp, bhd_datap[31:0], bhd_stpndp, prev_stpndp, and si.
- an or gate combines the wb1_vwbfldp outputs.
- inputs from the bbcc to every wb: bqo_dfdonep, bqd_dfqaddrp[3:0], bq_dfqupdatep, bq_arrivebfldp, bq_cfgselp, bq_rdstqp, and bq_wrstqp.
- inputs from the CW400X to every wb: cbytep[3:0] and datap[31:0].
- input from the cbus to every wb: maddroutp[31:2].
- inputs from the chip system logic to every wb: bresetn, pclkp, and se.)
6.4.1 connect the inputs

connect the input signals as follows:

- inputs from the next transaction write buffer (bhd_*): tie to ground.
- inputs from the CW400X, cbus, system logic, and bbcc: these signals are direct inputs to all of the write buffers (see figure 6.2).
- inputs from the previous transaction write buffer: connect the previous transaction wb's wb1_stpndp output to the prev_stpndp input. if the previous transaction write buffer is inside the bbcc, then connect the bbcc bw_stpndp output to the prev_stpndp input.
- scan input: connect the previous transaction wb's scan out signal, so, to the scan in signal, si.

6.4.2 connect the outputs

connect the output signals as follows:

- outputs to the previous transaction write buffer inputs: connect the outputs with the corresponding signal names to the previous transaction write buffer's inputs (wb1_* to bhd_*). for example, connect the wb1_stpndp output to the bhd_stpndp input of the previous transaction write buffer.
- valid write before load global output signal: if any of the wb1_vwbfldp signals inside the store queue are asserted, the write buffer should be flushed before the queue controller lets the data fetch onto the bus. therefore, or all the wbs' wb1_vwbfldp signals to generate wb_vwbfldp. then connect wb_vwbfldp to the bbcc's wb_vwbfldp input. the bbcc passes this signal on to the queue controller inside the bbcc.
- wb1_stpndp signal: if there is a valid store operation in the last write buffer, the write buffer asserts wb1_stpndp, which indicates that the store queue is full. therefore, connect the bbcc wb_fullp signal to the wb1_stpndp signal of the last write buffer. assertion of wb_fullp informs the queue controller that the store queue is full and therefore the queue controller should stall the CW400X if there are any more data stores issued.
- scan out: connect the scan out signal, so, to the next transaction wb's scan in signal, si.
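the two global outputs described in section 6.4.2 need only a little glue logic. the following is a minimal sketch for a queue with three external write buffers; it is an illustration, not the contents of the CW400X_wb3 module. the module name wb_glue and the per-buffer input names (wb_a_*, wb_b_*, wb_c_*) are hypothetical, with wb_c standing for the last write buffer in the queue.

```verilog
// sketch of the glue logic for the valid write before load and
// store queue full signals. wb_glue and the wb_a/wb_b/wb_c input
// names are hypothetical; wb_c is the last write buffer in the queue.
module wb_glue (wb_a_vwbfldp, wb_b_vwbfldp, wb_c_vwbfldp, wb_c_stpndp,
                wb_vwbfldp, wb_fullp);
  input  wb_a_vwbfldp, wb_b_vwbfldp, wb_c_vwbfldp;  // wb1_vwbfldp of each wb
  input  wb_c_stpndp;                               // wb1_stpndp of the last wb
  output wb_vwbfldp, wb_fullp;                      // to the bbcc

  // any buffer holding a store that must go before a load
  assign wb_vwbfldp = wb_a_vwbfldp | wb_b_vwbfldp | wb_c_vwbfldp;

  // a valid store in the last buffer means the store queue is full
  assign wb_fullp = wb_c_stpndp;
endmodule
```

when a write buffer is added or removed, only the or term list and the source of wb_fullp change.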
6.5 removing a write buffer

removing a write buffer is the reverse of adding a write buffer. disconnect the inputs and outputs. ensure that the last (previous) wb's bhd_* signals are connected to ground, and that the wb1_stpndp signal is connected to the bbcc's wb_fullp input (see figure 6.2).
chapter 7 timer

this chapter describes the timer building block for the CW400X. it contains the following sections:

- section 7.1, overview
- section 7.2, features
- section 7.3, functional description
- section 7.4, signals
- section 7.5, registers
- section 7.6, operation

7.1 overview

the timer contains two general-purpose programmable timers (timer 0 and timer 1) that can be used to control special functions. figure 7.1 shows a block diagram of a system using the CW400X and the timer.

figure 7.1 CW400X system with the timer
(figure not reproduced. blocks: CW400X, cbus, cbus interface, biu and cache controller (bbcc), bbus, timer.)

7.2 features

the timer supports the following features:

- two independent general-purpose 16-bit down counters with programmable initial count values
- programmable output modes (toggle or pulse)
- special modes: bus watch dog (timer 1), interrupt (timer 0)
- external logic half-speed mode

7.3 functional description

figure 7.2 shows an internal block diagram of the timer building block.

figure 7.2 timer internal block diagram
(figure not reproduced. blocks: address decode, initial count registers (timers 0 and 1), counters/current counts (timers 0 and 1), counter mode register, interrupt status register. signals: btxn, bstartn, twr, tresetn, pclkp, taddrp[31:2], tdatap[15:0], tcsp, tbrdyn, t0_outn, t1_outn.)

the timer decodes the bbus transaction address (see table 7.4) from the bbcc and asserts a select signal to indicate that the bbcc has selected the timer.

each timer is a 16-bit down counter (a 32-bit data bus with the high 16 bits tied to 0) with an initial count register and a current count register. the mode register defines the operation of both counters. the interrupt status register enables timer 0's interrupt mode and provides the interrupt output. section 7.5, registers, describes the mode and interrupt status registers in detail.
7.4 signals

this section describes the signals that comprise the bit-level interface of the timer. tables 7.1 through 7.3 summarize the timer signals. detailed descriptions follow the tables.

the signals are described in alphabetical order by mnemonic. each signal definition contains the mnemonic and the full signal name. the mnemonics for signals that are active low end with n and have an overbar over their names. in the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.

table 7.1 timer input signals summary

input          source            definition
brdynodrive    bbus controller   tbrdyn 3-state control
bstartn        bbcc              bbus transaction start
btxn           bbcc              bbus transaction active
ctestp         external logic    running cache test, disable timer
datanodrive    bbus controller   data bus 3-state control
halfp          external logic    running at half CW400X speed indicator
hclkp          external logic    half-speed clock
pclkp          external logic    system clock
se             external logic    scan test mode enable
si             external logic    scan test input
taddrp[31:2]   bbcc              bbus address bus
tresetn        external logic    reset
twr            bbcc              bbcc read/write indicator
Table 7.2 Timer Output Signals Summary

Output      Destination       Definition
so          External logic    Scan test output
tberrorn    External logic    Timer 1 counted down to zero bus error
tbrdyn      BBCC              Timer data to BBCC ready
tcsp        External logic    Timer module selected
toutenp     BBus controller   Data bus output enable request
trdyenp     BBus controller   Data ready signal output enable request
t0_intn     External logic    Timer 0 interrupt
t0_outn     External logic    Timer 0 output for general purpose counting
t1_outn     External logic    Timer 1 output for general purpose counting

Table 7.3 BBCC Bidirectional Signals Summary

Bidirectional   Connect   Description
tdatap[31:0]    BBCC      BBus data bus

brdynodrive    tbrdyn 3-State Control    Input
    The BBus controller asserts this signal to enable the timer to assert tbrdyn.

bstartn    BBus Transaction Start    Input
    The BBCC asserts this signal to inform the timer that a BBus transaction is starting.

btxn    BBus Transaction Active    Input
    The BBCC asserts this signal to inform the timer that a BBus transaction is in progress.

ctestp    Running Cache Test, Disable Timer    Input
    Asserting this signal during a hardware cache test disables the timer.

datanodrive    Data Bus 3-State Control    Input
    Asserting this signal allows the timer to output data onto the data bus, tdatap[31:0].
halfp    Running at Half CW400X Speed Indicator    Input
    Asserting this signal informs the timer that external logic is running at half of the CW400X clock speed.

hclkp    Half-Speed Clock    Input
    This signal is the half-speed clock input.

pclkp    System Clock    Input
    This signal is the global clock input.

se    Scan Enable    Input
    Asserting this signal enables the scan chain.

si    Scan Data In    Input
    This signal is the scan data input.

so    Scan Data Out    Output
    This signal is the scan data output.

taddrp[31:2]    BBus Address Bus    Input
    The BBCC drives these signals with the address for the current BBus transaction.

tberrorn    Timer 1 Counted Down to Zero Bus Error    Output
    The timer asserts this signal to indicate that a bus error occurred because Timer 1 counted down to zero. The timer asserts this signal only if the mode register selects watch dog mode.

tbrdyn    Timer Data to BBCC Ready    Output
    The timer asserts this signal to indicate that data is ready on tdatap[31:0] (only if brdynodrive is also high).

tcsp    Timer Module Selected    Output
    The timer asserts this signal to indicate that it has been selected.

tdatap[31:0]    BBus Data Bus    Bidirectional
    These signals transfer data between the timer and the BBCC.

toutenp    Data Bus Output Enable Request    Output
    The timer asserts this signal to request permission from external logic (usually the BBus controller) to use the BBus data bus, tdatap[31:0].
trdyenp    Data Ready Signal Output Enable Request    Output
    The timer asserts this signal to external logic (usually the BBus controller) to request permission to assert tbrdyn. In turn, the external logic must assert brdynodrive before the timer can assert tbrdyn. See Section 7.6.2, Bus Control (Request/Grant), for more information on the use of trdyenp.

tresetn    Reset    Input
    External logic asserts this signal to reset the timer. Usually the system reset drives this signal.

twr    BBCC Write/Read Indicator    Input
    The BBCC drives this signal low to inform the timer that the current BBus transaction is a write. The BBCC drives this signal high to inform the timer that the current BBus transaction is a read.

t0_intn    Timer 0 Interrupt    Output
    The timer asserts this signal to external logic when Timer 0 is programmed as an interrupt generator (interrupt mode) and Timer 0 hits zero.

t0_outn    Timer 0 Output for General Purpose Counting    Output
    The timer asserts this signal to external logic when Timer 0 hits zero in toggle mode. The timer pulses this signal to external logic when Timer 0 hits zero in pulse mode.

t1_outn    Timer 1 Output for General Purpose Counting    Output
    The timer asserts this signal to external logic when Timer 1 hits zero in toggle mode. The timer pulses this signal to external logic when Timer 1 hits zero in pulse mode.
7.5 Registers

All timer registers are memory-mapped as shown in Table 7.4.

Table 7.4 Timer Register Addresses

Address       Register
0xbfff0100    Timer 0 initial count register
0xbfff0104    Timer 0 current count register
0xbfff0108    Timer 1 initial count register
0xbfff010c    Timer 1 current count register
0xbfff0110    Mode register
0xbfff0114    Interrupt status register (Timer 0 only)

The initial and current count registers are all simple 16-bit registers.

Figure 7.3 shows the mode register. The mode register controls the operation of the timers.

Figure 7.3 Mode Register

Bits       Field
[31:11]    Reserved
10         M
9          O1
8          E1
[7:2]      Reserved
1          O0
0          E0

Reserved    Reserved    Bits [31:11], [7:2]
    These bits are reserved.

M    Timer 1 Watch Dog Mode    Bit 10
    Setting this bit to one puts Timer 1 into watch dog mode. Clearing this bit to zero puts Timer 1 into general purpose mode. Note that if Timer 1 is in watch dog mode and the output is programmed to toggle, watch dog mode forces the output to be a pulse.

O1    Timer 1 Output Mode    Bit 9
    Setting this bit to one puts Timer 1 into pulse mode. Clearing this bit to zero puts Timer 1 into toggle mode. Note that if Timer 1 is in watch dog mode and the output is programmed to toggle, watch dog mode forces the output to be a pulse.
E1    Timer 1 Enable    Bit 8
    Setting this bit to one enables Timer 1. Clearing this bit to zero disables Timer 1. When disabled, the counter holds the current value.

O0    Timer 0 Output Mode    Bit 1
    Setting this bit to one puts Timer 0 into pulse mode. Clearing this bit to zero puts Timer 0 into toggle mode.

E0    Timer 0 Enable    Bit 0
    Setting this bit to one enables Timer 0. Clearing this bit to zero disables Timer 0. When disabled, the counter holds the current value.

Timer 0 resets its output whenever a write to mode register bit 1 changes the output mode. Timer 1 resets its output whenever a write to mode register bit 9 changes the output mode. However, if the new mode is the same as the existing one, the timer ignores the update and does not reset the output. This feature allows the timers to be turned on or off individually.

Figure 7.4 shows the interrupt status register (Timer 0 only).

Figure 7.4 Interrupt Status Register

Bits      Field
[31:2]    Reserved
1         IN
0         IE

Reserved    Reserved    Bits [31:2]
    These bits are reserved.

IN    Timer 0 Interrupt    Bit 1
    When Timer 0 counts down to zero, it sets this bit to one. The bit is sticky: it stays set until the CW400X writes a zero to it.

IE    Timer 0 Interrupt Enable    Bit 0
    Setting this bit to one enables Timer 0 interrupt mode. Clearing this bit to zero disables Timer 0 interrupt mode.
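The register map in Table 7.4 and the bit assignments in Figures 7.3 and 7.4 can be captured as constants. The following is a minimal sketch, not vendor code: the constant and function names are hypothetical, and writing zeros to the reserved bits is an assumption that the text above does not confirm.

```python
# Register addresses from Table 7.4.
TIMER0_INITIAL_COUNT = 0xBFFF0100
TIMER0_CURRENT_COUNT = 0xBFFF0104
TIMER1_INITIAL_COUNT = 0xBFFF0108
TIMER1_CURRENT_COUNT = 0xBFFF010C
TIMER_MODE = 0xBFFF0110
TIMER_INT_STATUS = 0xBFFF0114

# Mode register fields from Figure 7.3.
MODE_E0 = 1 << 0    # Timer 0 enable
MODE_O0 = 1 << 1    # Timer 0 output mode: 1 = pulse, 0 = toggle
MODE_E1 = 1 << 8    # Timer 1 enable
MODE_O1 = 1 << 9    # Timer 1 output mode: 1 = pulse, 0 = toggle
MODE_M = 1 << 10    # Timer 1 watch dog mode

# Interrupt status register fields from Figure 7.4.
INT_IE = 1 << 0     # Timer 0 interrupt enable
INT_IN = 1 << 1     # Timer 0 interrupt (sticky status)


def mode_value(t0_enable=False, t0_pulse=False,
               t1_enable=False, t1_pulse=False, t1_watchdog=False):
    """Build a mode register value; reserved bits stay zero (assumption)."""
    value = 0
    if t0_enable:
        value |= MODE_E0
    if t0_pulse:
        value |= MODE_O0
    if t1_enable:
        value |= MODE_E1
    if t1_pulse:
        value |= MODE_O1
    if t1_watchdog:
        value |= MODE_M
    return value


def decode_mode(value):
    """Split a mode register value into named boolean fields."""
    return {
        "t0_enable": bool(value & MODE_E0),
        "t0_pulse": bool(value & MODE_O0),
        "t1_enable": bool(value & MODE_E1),
        "t1_pulse": bool(value & MODE_O1),
        "t1_watchdog": bool(value & MODE_M),
    }
```

For example, mode_value(t1_enable=True, t1_watchdog=True) produces the value that would be written to the mode register address to enable Timer 1 as a bus watch dog.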
7.6 Operation

This section describes the operation of the timer.

7.6.1 Reset

A system reset (asserting tresetn) disables both timers, clears all the registers to zero, and drives the outputs, t0_outn and t1_outn, high. Reset also puts Timer 1 into general purpose mode.

7.6.2 Bus Control (Request/Grant)

Asserting toutenp requests permission from the external bus control logic to drive the BBus data bus. Asserting trdyenp requests permission from the external bus control logic to drive the ready signal. The external bus control logic monitors the BBus data bus and ready signals for multiple drivers, and ensures that only one driver drives the BBus data bus and the ready signals in each cycle.

Once the timer asserts toutenp and trdyenp, the external logic should in turn assert brdynodrive and datanodrive high, which allows the timer's internal 3-state buffers to drive the data onto the BBus data bus, tdatap[31:0], and to drive the ready signal, tbrdyn. The current design requires that the external bus control logic assert brdynodrive and datanodrive in the same cycle that the timer asserts toutenp and trdyenp.

7.6.3 External Logic Half-Speed Mode

The external logic half-speed mode is implemented specifically for an external memory system that can only run at half the speed of the CW400X and that uses the timer output signal for memory refresh. Asserting halfp informs the timer that external logic is running at half of the CW400X clock speed.

During timer operation, all internal timer submodules run at the normal system clock frequency, except the output logic that controls the switching of the primary timer outputs (t0_outn, t1_outn, and t0_intn). The hclkp input clocks the output logic at half the frequency of the system clock. When in pulse mode, the timer generates a pulse that is one hclkp cycle long. When in toggle mode, the low-to-high toggle transition occurs one hclkp cycle later than in normal-speed mode.
Figure 7.5 shows a waveform comparing half-speed mode with normal-speed mode.
Figure 7.5 Half-Speed Mode
[Waveform: pclkp, t0_outn, hclkp, and t0_outn (half speed) over cycles 1 through 9.]
Note 1: Cycle 2 - Timer 0 generates a pulse. Cycle 6 - Timer 0 toggles to low. Cycle 9 - Timer 0 toggles to high.

7.6.4 Timer 0

Timer 0 is a 16-bit general-purpose down counter that can be configured to either toggle or generate a pulse when it reaches zero. Setting mode register bit 1 causes Timer 0 to pulse t0_outn when it counts down to zero (pulse mode). Clearing mode register bit 1 causes Timer 0 to toggle t0_outn when it counts down to zero (toggle mode). The CW400X system clock drives the timer.

Setting mode register bit 0 to one enables Timer 0. When enabled, Timer 0 loads the value from the Timer 0 initial count register into the Timer 0 current count register and starts counting down.

Clearing mode register bit 0 to zero disables Timer 0. When disabled, Timer 0 stops counting. The Timer 0 current count register retains the current count value. If the counter is re-enabled, the timer reloads the Timer 0 initial count value into the Timer 0 current count register and proceeds to count down. It does not resume from the retained current count value.

A system reset (asserting tresetn) disables Timer 0, clears the Timer 0 initial count register to zero, and drives t0_outn high.

Writing to the Timer 0 initial count register causes the timer to update the Timer 0 current count register value and initiate a countdown from the new value (if the timer is enabled). The timer also resets t0_outn to the high default state. Clearing the Timer 0 initial count register to zero disables Timer 0. When disabled, Timer 0 drives t0_outn high and does not count.

When halfp is asserted, Timer 0 functions internally the same way it does in normal mode (when halfp is not asserted), except that the t0_outn output lasts one system clock cycle longer. When in pulse mode, Timer 0 generates a pulse that is two system clock cycles long. When in toggle mode, Timer 0 toggles its output one cycle later than in normal mode. halfp allows the Timer 0 t0_outn output to be used for DRAM refresh when an external memory system is running at half speed.

Timer 0 can generate an interrupt signal to a core output. When interrupt status register bit 0 is set to one and Timer 0 reaches zero, Timer 0 drives t0_intn low and sets the interrupt bit in the interrupt status register. Timer 0 then reloads from the initial count register and counts down again. The t0_intn output stays low until the status bit is cleared by a write to the interrupt status register. Once the status bit is cleared, Timer 0 drives t0_intn high again. If the status bit is not cleared when the next countdown to zero occurs, the t0_intn output stays low. Note that clearing the status bit does not cause the timer to reload the initial count.

Figure 7.6 shows a waveform of Timer 0 being enabled, read, and then disabled by the CW400X.

Figure 7.6 Timer 0 Enabled, Read, and Disabled
[Waveform: pclkp, taddrp[31:2], tdatap[31:0], twr, bstartn, btxn, tcsp, tbrdyn, and the internal t0_current[15:0] over cycles 1 through 12, covering transactions A (mode register write), B (current count register read), and C (mode register write). The current count runs from initial down to initial-7 and then holds.]
7.6.4.1 Transaction A

Cycle 1: The BBCC asserts twr and bstartn to initiate a write to the mode register, which starts Timer 0 counting. The BBCC asserts btxn to start a bus transaction. Timer 0 decodes the address and asserts tcsp to indicate that the transaction is to the timer module.
Cycle 2: Timer 0 decodes the input, loads from the Timer 0 initial count register, and asserts tbrdyn to the BBCC.
Cycle 3: The timer deasserts tbrdyn, and Timer 0 starts counting.

7.6.4.2 Transaction B

Cycle 5: The BBCC asserts bstartn to initiate a memory read from the Timer 0 current count register.
Cycle 6: Timer 0 puts the count value onto tdatap[31:0] and asserts tbrdyn to the BBCC.
Cycle 7: The timer deasserts tbrdyn.

7.6.4.3 Transaction C

Cycle 9: The BBCC asserts signals to the timer to disable the counting. (The BBCC writes a zero into bit 0 of the mode register to disable Timer 0.)
Cycle 11: Timer 0 stops counting and holds the count value.
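The Timer 0 behavior described in Section 7.6.4 can be illustrated with a small behavioral model. This is a sketch under stated assumptions rather than a cycle-accurate description of the hardware: the class and method names are hypothetical, the interrupt path, half-speed mode, and the output reset on a mode change are omitted, and a write to the initial count register updates the current count only while the timer is enabled (one reading of the text).

```python
class Timer0Model:
    """Behavioral sketch of Timer 0 (hypothetical; not cycle-accurate)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # tresetn: disable the timer, clear the initial count, drive t0_outn high.
        self.initial = 0
        self.current = 0
        self.enabled = False
        self.pulse_mode = False   # mode register bit 1
        self.out_n = 1            # t0_outn idles high

    def write_initial(self, value):
        # Writing the initial count returns the output to its high default state.
        self.initial = value & 0xFFFF
        self.out_n = 1
        if self.enabled:
            self.current = self.initial   # restart the countdown

    def set_enable(self, enable):
        # Mode register bit 0. Enabling always reloads the initial count;
        # the timer never resumes from a retained current count.
        if enable and not self.enabled:
            self.current = self.initial
        self.enabled = enable

    def tick(self):
        """Advance one system clock cycle."""
        if self.pulse_mode and self.out_n == 0:
            self.out_n = 1                # a pulse lasts one cycle
        if not self.enabled or self.initial == 0:
            return                        # an initial count of zero disables counting
        if self.current == 0:
            if self.pulse_mode:
                self.out_n = 0            # pulse the output
            else:
                self.out_n ^= 1           # toggle the output
            self.current = self.initial   # reload and keep counting
        else:
            self.current -= 1
```

With an initial count of 3 in toggle mode, the current count in this model steps through 2, 1, 0 and then reloads to 3 while the output toggles, which mirrors the count sequence shown in the waveforms.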
Figure 7.7 shows a waveform of the Timer 0 output, t0_outn.

Figure 7.7 Timer 0 Output
[Waveform: pclkp, taddrp[31:2], tdatap[31:0], twr, bstartn, btxn, tcsp, tbrdyn, t0_outn, and the internal current count over cycles 1 through 12. The current count runs from initial down through initial-4 to 0x0000, then reloads to initial and continues with initial-1 and initial-2.]

7.6.4.4 Transaction A

Cycles 1-4: Same as in Figure 7.6.
Cycle 8: Timer 0 reaches zero.
Cycle 9: Timer 0 toggles t0_outn (or generates a pulse) and reloads the Timer 0 initial count value into the counter again.
Cycle 10: Timer 0 starts counting.

7.6.5 Timer 1

Timer 1 is similar to Timer 0, but with one additional feature. Timer 1 can be programmed as a bus watch dog timer. In this mode, Timer 1 loads from the Timer 1 initial count register and starts counting down whenever the CW400X starts a new bus transaction cycle (other than a timer transaction). The BBCC signals a new bus transaction cycle by asserting btxn and bstartn.
Timer 1 is a 16-bit general-purpose down counter that can be configured to either toggle or generate a pulse when it reaches zero. Setting mode register bit 9 causes Timer 1 to pulse t1_outn when it counts down to zero (pulse mode). Clearing mode register bit 9 causes Timer 1 to toggle t1_outn when it counts down to zero (toggle mode). The CW400X system clock drives the timer.

Setting mode register bit 8 to one enables Timer 1. When enabled, Timer 1 loads the value from the Timer 1 initial count register into the Timer 1 current count register and starts counting down.

Clearing mode register bit 8 to zero disables Timer 1. When disabled, Timer 1 stops counting. The Timer 1 current count register retains the current count value. If the counter is re-enabled, the timer reloads the Timer 1 initial count value into the Timer 1 current count register and proceeds to count down. It does not resume from the retained current count value.

A system reset (asserting tresetn) disables Timer 1, puts it into general purpose mode, clears the Timer 1 initial count register to zero, and drives t1_outn high.

Writing to the Timer 1 initial count register causes the timer to update the Timer 1 current count register value and initiate a countdown from the new value (if the timer is enabled). The timer also resets t1_outn to the high default state. Clearing the Timer 1 initial count register to zero disables Timer 1. When disabled, Timer 1 drives t1_outn high and does not count.

In pulse mode, the pulse lasts one system clock cycle, which allows the interrupt controller to register the assertion of tberrorn. Once Timer 1 hits zero, it asserts tberrorn, disables itself, and waits for the next bus transaction. If the bus cycle ends before Timer 1 reaches zero, Timer 1 disables itself and holds the current count value. A new bus transaction causes Timer 1 to reload the initial count value and proceed to count down.
When halfp is asserted, Timer 1 functions internally exactly the same as it does in normal mode (when halfp is not asserted), except that the t1_outn output lasts one system clock cycle longer. When in pulse mode, Timer 1 generates a pulse that is two system clock cycles long. When in toggle mode, Timer 1 toggles its output one cycle later than in normal mode. halfp allows the Timer 1 t1_outn output to be used for DRAM refresh when an external memory system is running at half speed.
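The bus watch dog behavior of Timer 1 can be sketched as a small state machine: reload on each new bus transaction, hold when the transaction ends early, and flag a bus error when the count runs out. The class, method, and helper names below are hypothetical, and the model is a simplified illustration that ignores the exact cycle in which btxn is latched.

```python
class WatchdogModel:
    """Behavioral sketch of Timer 1 in watch dog mode (hypothetical names)."""

    def __init__(self, initial):
        self.initial = initial & 0xFFFF
        self.current = 0
        self.counting = False
        self.bus_error = False    # models a one-cycle tberrorn pulse

    def start_transaction(self):
        # A new (non-timer) bus transaction reloads the initial count.
        self.current = self.initial
        self.counting = True
        self.bus_error = False

    def end_transaction(self):
        # The bus cycle ended in time: stop and hold the current count.
        self.counting = False

    def tick(self):
        """Advance one system clock cycle."""
        self.bus_error = False
        if not self.counting:
            return
        if self.current == 0:
            self.bus_error = True     # counted down to zero: signal bus error
            self.counting = False     # disable until the next transaction
        else:
            self.current -= 1


def transaction_times_out(watchdog, cycles):
    """Run one bus transaction of the given length; return True on bus error."""
    watchdog.start_transaction()
    for _ in range(cycles):
        watchdog.tick()
        if watchdog.bus_error:
            return True
    watchdog.end_transaction()
    return False
```

In this model, a watch dog loaded with an initial count of 3 tolerates a transaction that lasts up to three cycles and flags a bus error on the fourth.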
Figure 7.8 shows a waveform of Timer 1 being enabled, read, and then disabled by the BBCC.

Figure 7.8 Timer 1 Enabled, Read, and Disabled
[Waveform: pclkp, taddrp[31:2], tdatap[31:0], twr, bstartn, btxn, tcsp, tbrdyn, and the internal t1_current[15:0] over cycles 1 through 12, covering transactions A, B, and C. The current count loads initial, steps to initial-1, and then holds at initial-1.]

7.6.5.1 Transaction A

Cycles 1-4 are the same as in Figure 7.6. The BBCC writes to the mode register to enable Timer 1 watch dog mode.

7.6.5.2 Transaction B

Cycle 5: The BBCC starts a BBus transaction to a device other than Timer 1.
Cycle 6: Timer 1 loads the Timer 1 initial count value.
Cycle 7: The transaction ends.
Cycle 8: Timer 1 latches in the transaction end from btxn and holds the current cycle count value.
7.6.5.3 Transaction C

Cycle 9: The BBCC writes a zero into bit 8 of the mode register to disable Timer 1 watch dog mode.
Cycle 11: Timer 1 stops counting and holds the current count value.

Figure 7.9 shows waveforms of Timer 1 behavior when watch dog mode triggers tberrorn.

Figure 7.9 Timer 1 Watch Dog Mode Triggers BERR
[Waveform: pclkp, taddrp[31:2], tdatap[31:0], twr, bstartn, btxn, tbrdyn, t1_outn, tberrorn, and the internal t1_current[15:0] over cycles 1 through 12, covering transactions A and B. The current count runs initial, initial-1, initial-2, and then 0x0000.]

7.6.5.4 Transaction A

Cycles 1-4 are the same as in Figure 7.6. The BBCC writes to the mode register to enable Timer 1 watch dog mode.
7.6.5.5 Transaction B

Cycle 5: The BBCC asserts btxn and starts a bus transaction not related to the timer.
Cycle 6: Timer 1 starts counting down.
Cycle 9: Timer 1 counts to zero.
Cycle 10: Timer 1 pulses t1_outn and tberrorn to indicate that it reached zero.
Chapter 8 Debugger (DBX)

This chapter describes the Debugger (DBX) building block for the CW400X. The DBX provides hardware debug support for CW400X systems. This chapter contains the following sections:

Section 8.1, Overview
Section 8.2, Functional Description
Section 8.3, Connection Block Diagram
Section 8.4, Signals
Section 8.5, Registers
Section 8.6, Instructions
Section 8.7, Operation

8.1 Overview

The DBX enables instruction and data access breakpoints as well as trace breakpointing. It also allows customers to use the MiniRISC ScanICE debug system to control and observe internal registers. The MiniRISC ScanICE debug system interfaces with the IEEE 1149.1 JTAG pins, so no additional pins are required.

LSI Logic developed the ScanICE debug system for designers using ICE as their debug environment. Because the CW400X is deeply embedded within the ASIC, the CW400X's pins cannot be accessed directly. A standard ICE cannot be plugged into the ASIC's socket, as could be done in a standard-product microprocessor-based design. The ScanICE debug system accesses the CW400X through its scan chain, thus providing a virtual ICE. From the point of view of a user running a debug environment on a host controller, there is no difference in debug methodology between a standard-product microprocessor ICE and the MiniRISC ScanICE debug system. ScanICE is also host-independent and nonintrusive.

The DBX attaches to the CW400X FlexLink interface. The FlexLink interface allows new instructions to be added to the CW400X default instruction set. For details on the FlexLink interface, see the MiniRISC CW400X Microprocessor Core Technical Manual.

Figure 8.1 shows how the DBX attaches to the CW400X and MiniRISC building blocks.

Figure 8.1 DBX Interface to the CW400X and Building Blocks
[Block diagram. Recoverable labels: CW400X, DBX, FlexLink, CBus, extended BIU and cache controller (BBCC), I-cache set 0, I-cache set 1, D-cache interface, BBus, timer, bus controller, PCI-like memory interface.]

8.2 Functional Description

The DBX attaches to the CW400X using the FlexLink interface, which gives the programmer access to the CW400X registers. The DBX allows hardware debugging in real time using breakpoints; it also allows the scan chain to load the processor state. For details on the FlexLink interface, see the MiniRISC CW400X Microprocessor Core Technical Manual.

To detect data and instruction addresses, the DBX monitors the CBus. See the MiniRISC CW400X Microprocessor Core Technical Manual for information on the CBus. Software controls the DBX through instructions that access the DBX registers.
8.3 Connection Block Diagram

Figure 8.2 shows how to attach the DBX to the CW400X, the building blocks, and system logic.

Figure 8.2 DBX in a CW400X System
[Connection diagram. Recoverable block labels: CW400X, DBX, BBCC, GOE. Recoverable signal labels: pclkp, aselp, axbusp[31:0], cir_botp[5:0], cir_topp[5:0], ckillxp, crsp[31:0], crtp[31:0], crx_validp, cip_dn, cmem_fetchp, cstorep, addrp[31:0] (CBus address), bbep, dbreak_bbep, biberrorp, bcpuresetn, bresetn (reset), runn, crun_inn, scanp_runn (scan/run), scanclk (scan clock), sysclk (system clock), topclk, and debreakp (external break, unconnected output).]
8.4 Signals

This section describes the signals that comprise the bit-level interface of the DBX. Tables 8.1 and 8.2 summarize the DBX signals. Detailed signal descriptions follow the tables.

The signals are described in alphabetical order by mnemonic. Each signal definition contains the mnemonic and the full signal name. The mnemonics for active low signals end with an n and have an overbar over their names. In the descriptions that follow, assert means to drive true or active and deassert means to drive false or inactive.

Table 8.1 DBX Input Signals Summary

Input            Source         Description
addrp[31:0]      CW400X         CBus address
bcpuresetn       BBCC           Reset
biberrorp        BBCC           Instruction bus error
bresetn          System logic   Reset
cip_dn           CW400X         CBus instruction/data flag
cir_botp[5:0]    CW400X         FlexLink instruction opcode bottom six bits
cir_topp[5:0]    CW400X         FlexLink instruction opcode top six bits
ckillxp          CW400X         CBus instruction killed in execute stage
cmem_fetchp      CW400X         CBus memory fetch
crsp[31:0]       CW400X         FlexLink source register (rs) bus
crtp[31:0]       CW400X         FlexLink source register (rt) bus
cstorep          CW400X         CBus store indicator
gscan_enablep    System logic   Scan enable
gscan_inp        Scan chain     Scan data in
pclkp            System logic   System clock
runn             GOE            Run enable/stall
scanclk          System logic   Scan clock
scanp_runn       System logic   Clock signal select
sysclk           System logic   Main system clock
tst_wio          System logic   Test enable
Table 8.2 DBX Output Signals Summary

Output          Destination    Description
aselp           CW400X         FlexLink instruction select
axbusp[31:0]    CW400X         FlexLink result bus (rd)
dbreak_bbep     CW400X         Internal break
debreakp        System logic   External break
gscan_outp      Scan chain     Scan data out
topclk          System logic   Clock output

addrp[31:0]    CBus Address    Input
    These signals contain the fetch and store addresses the DBX uses to check for breakpoints.

aselp    FlexLink Instruction Select    Output
    The DBX asserts this signal when the cir_topp[5:0] signals contain a computational instruction.

axbusp[31:0]    FlexLink Result Bus (rd)    Output
    During a Move From Debug (MFD) instruction, these signals contain the value of an internal DBX register for the CW400X.

bcpuresetn    Reset    Input
    The BBCC asserts this signal to reset the DBX.

biberrorp    BBCC Bus Error    Input
    The BBCC asserts this signal to inform the DBX that the current instruction fetch terminated with an error. The DBX combines this signal with its internal break signal to create dbreak_bbep.

bresetn    Reset    Input
    Asserting this signal resets the DBX.

cip_dn    CBus Instruction/Data Flag    Input
    This signal qualifies the type of memory fetch when a memory fetch is indicated by cmem_fetchp. The CW400X drives this signal high when it is performing an instruction fetch. The CW400X drives this signal low when it is performing a data fetch.
cir_botp[5:0]    FlexLink Instruction Opcode Bottom Six Bits    Input
    These signals from the CW400X contain the bottom six bits of the instruction register. They allow the DBX to decode the computational instruction.

cir_topp[5:0]    FlexLink Instruction Opcode Top Six Bits    Input
    These signals from the CW400X contain the top six bits of the instruction register. They allow the DBX to decode the computational instruction.

ckillxp    CBus Instruction Killed in Execute Stage    Input
    The CW400X asserts this signal to inform the DBX that the instruction in the execute stage is killed.

cmem_fetchp    CBus Memory Fetch    Input
    The CW400X asserts this signal to inform the DBX that it is performing a memory fetch. cmem_fetchp is valid only during run cycles.

crsp[31:0]    CW400X Source Register (rs) Bus    Input
    These signals contain the rs operand of the current instruction from the CW400X.

crtp[31:0]    CW400X Source Register (rt) Bus    Input
    These signals contain the rt operand of the current instruction from the CW400X.

cstorep    Store Indicator    Input
    The CW400X asserts this signal to indicate a CBus store operation. cstorep is valid only during run cycles.

dbreak_bbep    Bus Error    Output
    The DBX asserts this signal to cause the CW400X to take a bus error exception.

debreakp    External Break    Output
    The DBX asserts this signal when an external break condition occurs.

gscan_enablep    Scan Enable    Input
    Asserting this signal enables the scan chain.
gscan_inp    Scan Data In    Input
    This signal is the scan data input.

gscan_outp    Scan Data Out    Output
    This signal is the scan data output.

pclkp    System Clock    Input
    The pclkp clock is the global clock input.

runn    Run Enable/Stall    Input
    This signal connects to the GOE crun_inn output. The GOE asserts this signal low to enable the CW400X to go on to the next run cycle. The GOE deasserts this signal high to stall the CW400X. (For more information on the GOE, see the MiniRISC CW400X Microprocessor Core Technical Manual.)

scanclk    Scan Clock    Input
    The DBX generates topclk from either the scanclk clock input or the sysclk clock input, as determined by the scanp_runn input signal.

scanp_runn    topclk Select    Input
    When external system logic asserts this signal, the DBX selects scanclk as the source of the topclk output. When external system logic deasserts scanp_runn, the DBX selects sysclk as the source of the topclk output clock.

sysclk    Main System Clock    Input
    The DBX generates topclk from either the scanclk clock input or the sysclk clock input, as determined by the scanp_runn input signal.

topclk    Clock Output    Output
    The DBX generates topclk from either the scanclk clock input or the sysclk clock input, as determined by the scanp_runn input signal. topclk then drives pclkp.

tst_wio    Test Enable    Input
    Asserting this signal puts the DBX in test mode for scan.
8.5 Registers

Programmers access the DBX registers using the MFD and MTD instructions (see Section 8.6, Instructions). All bits are read/write, except for the bits that are hardwired to zero.

8.5.1 DCS Register (7)

The Debug Control and Status (DCS) register contains the enable and status bits for the system scan facilities. All status bits are sticky; debug events set the status bits only if the corresponding enables are set. The status bits must be cleared by using an MTD instruction to update the DCS register. The UD and KD bits are always the same; setting either bit sets both bits. Reset clears the DE, IBD, and EBE bits. All other bits are unknown after reset.

MTD instructions have higher priority than status updates. The DBX updates the DCS register with MTD data if an MTD instruction occurs simultaneously with a status update caused by a break event.

Figure 8.3 shows the format of the DCS register.

Figure 8.3 DCS Register

Bit       Field
31        TR
30        UD
29        KD
28        TE
27        DW
26        DR
25        DAE
24        PCE
23        DE
22        IBD
21        EBE
[20:6]    Reserved
5         T
4         W
3         R
2         DA
1         PC
0         DB

TR    Trap Enable    Bit 31
    Setting TR causes debug events to trap to the debug exception vector. When TR is cleared, trapping is not enabled, but debug status bits are updated with debug event information.

UD    User Mode Debug Event Detection    Bit 30
    Setting UD enables debug event detection in user mode. UD and KD always contain the same value. Setting either bit sets both bits.

KD    Kernel Mode Debug Event Detection    Bit 29
    Setting KD enables debug event detection in kernel mode. UD and KD always contain the same value. Setting either bit sets both bits.

TE    Trace Detection Enable    Bit 28
    Setting TE enables trace (nonsequential fetch) event detection.
DW    Data Write    Bit 27
    If DAE is set, setting DW enables data write event detection.

DR    Data Read    Bit 26
    If DAE is set, setting DR enables data read event detection.

DAE    Data Access Breakpoint Enable    Bit 25
    Setting DAE enables data address breakpoint debug events.

PCE    Program Counter Breakpoint Enable    Bit 24
    Setting PCE enables program counter breakpoint debug events.

DE    Debug Enable    Bit 23
    Setting DE enables debug breaks. Clearing DE disables debug breaks.

IBD    Internal Break Disable    Bit 22
    Setting IBD disables internal breaks. Clearing IBD enables internal breaks.

EBE    External Break Enable    Bit 21
    Setting EBE enables external breaks. Clearing EBE disables external breaks.

Reserved    Reserved    Bits [20:6]
    These bits are reserved.

T    Trace Event Detected    Bit 5
    The DBX sets this bit when it has detected a trace event.

W    Write Reference Detected    Bit 4
    The DBX sets this bit when it detects a write to the address in the BDA register.

R    Read Reference Detected    Bit 3
    The DBX sets this bit when it detects a read from the address in the BDA register.

DA    Data Access Debug Condition Detected    Bit 2
    The DBX sets this bit when it has detected a data access debug condition.
PC    Program Counter Debug Condition Detected    Bit 1
    The DBX sets this bit when it has detected a program counter debug condition.

DB    Debug Condition Detected    Bit 0
    The DBX sets this bit when it has detected a debug event.

8.5.2 BPC Register (18)

Software uses the Breakpoint Program Counter (BPC) register to specify a program counter breakpoint. This register is used in conjunction with the Breakpoint Program Counter Mask register described in Section 8.5.4. A breakpoint is detected for any instruction fetch in which all unmasked bits in the BPC register match the corresponding bits in the program counter value.

Figure 8.4 shows the format of the BPC register.

Figure 8.4 BPC Register

Bits      Field
[31:2]    Breakpoint program counter
[1:0]     00 (binary)

8.5.3 BDA Register (19)

Software uses the Breakpoint Data Address (BDA) register to specify a data address breakpoint. This register is used in conjunction with the Breakpoint Data Address Mask register described in Section 8.5.5. A breakpoint is detected for any data reference in which all unmasked bits in the BDA register match the corresponding bits in the data address.

Figure 8.5 shows the format of the BDA register. The reserved bits, bits [1:0], are read as zeroes.

Figure 8.5 BDA Register

Bits      Field
[31:2]    Breakpoint data address
[1:0]     Reserved
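The DCS bit assignments from Figure 8.3 lend themselves to a small pack/unpack helper. This is an illustrative sketch: the dictionary and function names are hypothetical, and the helper only mirrors the documented rule that setting either UD or KD sets both.

```python
# DCS field name -> bit position, taken from Figure 8.3.
DCS_BITS = {
    "TR": 31, "UD": 30, "KD": 29, "TE": 28, "DW": 27, "DR": 26,
    "DAE": 25, "PCE": 24, "DE": 23, "IBD": 22, "EBE": 21,
    "T": 5, "W": 4, "R": 3, "DA": 2, "PC": 1, "DB": 0,
}


def dcs_pack(names):
    """Return the DCS value with the named fields set."""
    fields = set(names)
    if fields & {"UD", "KD"}:
        fields |= {"UD", "KD"}    # UD and KD always hold the same value
    value = 0
    for name in fields:
        value |= 1 << DCS_BITS[name]
    return value


def dcs_unpack(value):
    """Return the set of field names whose bits are set in a DCS value."""
    return {name for name, bit in DCS_BITS.items() if (value >> bit) & 1}
```

A debug monitor could use dcs_unpack on a value read with MFD to see which sticky status bits (T, W, R, DA, PC, DB) are set before clearing them with MTD.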
8.5.4 BPCM Register (20)

The Breakpoint Program Counter Mask (BPCM) register masks bits in the BPC register. Writing a 1 to any bit n of the BPCM register unmasks the bitwise comparison of bit n in the BPC register with bit n in the program counter as the input to the breakpoint detection logic. Conversely, writing a 0 to bit n of the BPCM register masks the comparison, and forces the breakpoint logic to assume a match between bit n in the BPC and bit n in the program counter regardless of the true result.

Figure 8.6 shows the format of the BPCM register.

Figure 8.6 BPCM Register

Bits      Field
[31:2]    Breakpoint program counter mask
[1:0]     0

8.5.5 BDAM Register (21)

The Breakpoint Data Address Mask (BDAM) register masks bits in the BDA register. Writing a 1 to any bit n of the BDAM register unmasks the bitwise comparison of bit n in the BDA register with bit n in the data reference address as the input to the breakpoint detection logic. Conversely, writing a 0 to bit n of the BDAM register masks the comparison, and forces the breakpoint logic to assume a match between bit n in the BDA and bit n in the data reference address regardless of the true result.

Figure 8.7 shows the format of the BDAM register. The reserved bits, bits [1:0], are read as zeroes.

Figure 8.7 BDAM Register

Bits      Field
[31:2]    Breakpoint data address mask
[1:0]     Reserved
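The masked comparison described for the BPC/BPCM and BDA/BDAM register pairs reduces to one bitwise expression: only the bit positions whose mask bit is 1 take part in the comparison. The sketch below uses a hypothetical function name and assumes that bits [1:0] never take part, since those bits are hardwired or reserved in all four registers.

```python
WORD_ADDRESS_BITS = 0xFFFFFFFC   # bits [31:2]; bits [1:0] are assumed never compared


def breakpoint_match(address, bp_value, mask):
    """Return True if every unmasked bit of address matches bp_value.

    A mask bit of 1 enables the comparison for that bit position;
    a mask bit of 0 forces that position to count as a match.
    """
    compared_bits = mask & WORD_ADDRESS_BITS
    return ((address ^ bp_value) & compared_bits) == 0
```

For example, with a breakpoint value of 0x80001000 and a mask of 0xFFFFF000, any address in the 4-Kbyte range starting at 0x80001000 matches, while 0x80002000 does not.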
8.6 instructions

the mfd and mtd instructions access the dbx registers. these instructions use the flexlink interface of the CW400X and are not supported by compilers or assemblers.

8.6.1 mfd instruction

the move from debug (mfd) instruction loads the contents of dbx register rd into a general register rt. this instruction is only valid if rd = 7, 18, 19, 20, or 21. all other rd values are undefined. the values for each debug register rd are indicated in parentheses in the register descriptions (see section 8.5, registers on page 8-8).

operation: t: gpr[rt] <- debug[rd]
exceptions: none

figure 8.8 shows the format of the mfd instruction.

figure 8.8 mfd instruction
bits [31:26]: mfd = 011111 (binary)
bits [25:21]: rs = 00000 (binary)
bits [20:16]: rt
bits [15:11]: rd
bits [10:0]: 0

8.6.2 mtd instruction

the move to debug (mtd) instruction loads the contents of general register rs into a dbx register rd. this instruction is only valid if rd = 7, 18, 19, 20, or 21. all other rd values are undefined. the values for each debug register rd are indicated in parentheses in the register descriptions (see section 8.5, registers on page 8-8).

operation: t: debug[rd] <- gpr[rs]
exceptions: none

figure 8.9 shows the format of the mtd instruction.

figure 8.9 mtd instruction
bits [31:26]: mtd = 011110 (binary)
bits [25:21]: rs
bits [20:16]: rt = 00000 (binary)
bits [15:11]: rd
bits [10:0]: 0
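because compilers and assemblers do not support mfd and mtd, one way to use them is to compute the instruction words by hand and place them in the instruction stream as raw data. the following python sketch builds the 32-bit words from the field layouts in figures 8.8 and 8.9. the helper names (encode_mfd, encode_mtd) are hypothetical, and the `.word` directive in the printed output is an assumption about the assembler in use, not something this manual specifies.

```python
MFD_OPCODE = 0b011111   # bits [31:26] of mfd (figure 8.8)
MTD_OPCODE = 0b011110   # bits [31:26] of mtd (figure 8.9)
VALID_DEBUG_REGS = {7, 18, 19, 20, 21}   # the only rd values defined in section 8.6


def _check(gpr, rd):
    """Reject register numbers that do not fit the fields or are undefined."""
    if not 0 <= gpr <= 31:
        raise ValueError("general register number must be 0..31")
    if rd not in VALID_DEBUG_REGS:
        raise ValueError("rd must be 7, 18, 19, 20, or 21")


def encode_mfd(rt, rd):
    """gpr[rt] <- debug[rd]; the rs field [25:21] is zero."""
    _check(rt, rd)
    return (MFD_OPCODE << 26) | (rt << 16) | (rd << 11)


def encode_mtd(rs, rd):
    """debug[rd] <- gpr[rs]; the rt field [20:16] is zero."""
    _check(rs, rd)
    return (MTD_OPCODE << 26) | (rs << 21) | (rd << 11)


# Example: move the bpc register (18) to and from general register 2.
print(f".word 0x{encode_mfd(rt=2, rd=18):08x}")   # .word 0x7c029000
print(f".word 0x{encode_mtd(rs=2, rd=18):08x}")   # .word 0x78409000
```

the range check on rd mirrors the statement that all other rd values are undefined; a real tool might prefer to warn rather than raise.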
8.7 operation

this section describes the operation of the dbx building block in greater detail than the previous sections. it covers breakpoint operation as well as the operation of the dbx internal blocks.

8.7.1 breakpoints

to enable breakpoints, the system designer must follow three rules:

rule #1: scanclk must be low when debug is enabled.
rule #2: scanclk must be low when switching back to sysclk.
rule #3: scanclk must be low when enabling/disabling scan mode for scan debug.

when a breakpoint occurs, the dbx sets the proper status bit. all status bits are sticky: they are never cleared by hardware, and must be cleared by software. if the ebe bit of the dcs register is set, the breakpoint causes debreakp to be asserted for at least one cycle, which causes the system clock, topclk, to be switched from sysclk to scanclk. after the first cycle, the system clock remains scanclk as long as scanp_runn is high. the system then has access to the internal scan chain of the mr400x through the scan chain input, scan chain output, scan enable, and scan clock signals.

the dcs status bits are updated even for instructions that are killed and would have caused a breakpoint. dcs status bits are also updated for instructions in a branch likely delay slot. breakpoints caused by instructions in delay slots can be misleading because they occur even if the instruction is not executed.

breaks are signalled according to the order in which the CW400X accesses data. an instruction breakpoint might be signalled before a data breakpoint, if the offending instruction occurs after the data access, because the CW400X pipeline requires that the instruction access occur first.

8.7.1.1 trace breakpoint

a trace breakpoint occurs when the program counter branches to a nonconsecutive address. a trace breakpoint is not necessarily triggered by a branch instruction. it is only triggered if the instruction stream itself is nonconsecutive.
the de and te bits of the dcs register enable trace breakpoints. no other user inputs are needed. when the trace breakpoint occurs and te is enabled, the CW400X signals an ibus error, and the exception program counter (epc) register points to the first out-of-order instruction. this instruction is killed, but all previous instructions complete execution.

8.7.1.2 data write breakpoint

a data write breakpoint occurs when the CW400X writes data to an address that matches all of the bits of the bda register that are not masked by the bdam register.

the de, dw, and dae bits of the dcs register enable data write breakpoints. the bda and bdam registers must be properly loaded before the breakpoint is enabled. when the data write breakpoint occurs and the te bit of the dcs register is set, the CW400X signals a dbus error, and the epc register points to the offending write instruction. the instruction is killed, but all previous instructions complete execution.

8.7.1.3 data read breakpoint

a data read breakpoint occurs when the CW400X reads data (not an instruction fetch) from an address that matches all of the bits of the bda register that are not masked by the bdam register.

the de, dr, and dae bits of the dcs register enable the data read breakpoint. the bda and bdam registers must be properly loaded before the breakpoint is enabled. when the breakpoint occurs and the te bit of the dcs register is set, the CW400X signals a dbus error, and the epc register points to the offending read instruction. the instruction is killed, but all previous instructions complete execution.

8.7.1.4 program counter breakpoint

a program counter breakpoint occurs when the CW400X fetches an instruction from an address that matches all of the bits of the bpc register that are not masked by the bpcm register.

the de and pce bits of the dcs register enable this breakpoint. the bpc and bpcm registers must be properly loaded before the breakpoint is enabled.
when the program counter hits the program counter breakpoint and the te bit of the dcs register is set, the CW400X signals an ibus error, and the epc register points to the offending instruction. the instruction is killed, but all previous instructions complete execution.

8.7.1.5 break instruction breakpoint

if the de bit in the dcs register is set, the dbx also monitors the instruction stream for break instructions. break instructions do not cause an internal break because the CW400X automatically breaks on the instruction. however, if the ebe bit of the dcs register is set, an external break occurs. the external break does not update any status bits in the dcs register.

8.7.2 dbx module operation

the dbx building block for the minirisc CW400X uses the flexlink interface, which gives the programmer access to the registers and allows the hardware to be debugged in real-time using breakpoints. it also allows the use of the scan chain to load the state of the processor.

figure 8.10 shows the dbx internal functional blocks. subsections following the figure describe each block in detail.
figure 8.10 dbx internal block diagram (blocks: ibreak, dbreak, tbreak, breaki, and bdebugx; clock inputs sysclk and scanclk; clock output topclk)

8.7.2.1 ibreak module

the ibreak and dbreak modules check for breakpoint exceptions.

the ibreak module asserts ibreakp if the content of the bpc register (bpc[31:2]) matches the content of addrp[31:2]. the dbx compares only those bits of the bpc register against addrp[31:0] whose corresponding bits are set in the bpcm register. bits of the bpcm register that are cleared to zero are not compared. the ibreak module does not compare bits [1:0] of the bpc register to the address, because instruction fetches are always on word boundaries.

8.7.2.2 dbreak module

the dbreak module works in the same manner as the ibreak module, but uses the bda register and the bdam register and compares all 32 bits of the data address.

when the bdebugx module asserts dregwen, the CW400X writes the data contained in crsp[31:0] to a selected register in one of the modules. table 8.3 shows how the registers are selected.

table 8.3 register selection

dregaddrp[1:0]   register selected
0                bpcm
1                bdam
2                bpc
3                bda

8.7.2.3 tbreak module

the tbreak module detects trace breakpoints. it contains a 30-bit register and an incrementor. the incremented value of the register is compared against the current ifetch (addrp[31:2] when cip_dn is high). the register is always loaded with the new ifetch address. the tbreak module asserts tbreakp for one cycle when the instruction stream branches.

8.7.2.4 breaki module

if the de bit is set, the breaki module also monitors the instruction stream for break instructions. the break instructions do not cause an internal break, because the CW400X automatically breaks on the instruction. however, if the ebe bit of the dcs register is set, an external break occurs. the external break does not update any status bits in the dcs register.
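the tbreak comparison described in section 8.7.2.3 can be modelled in a few lines. the python sketch below is a behavioural illustration only, not the hardware implementation: the class name TraceBreakModel and its method are hypothetical, each call stands for one instruction fetch with cip_dn high, and the behaviour on the very first fetch (no break reported) is an assumption, since the manual does not state the reset value of the register.

```python
class TraceBreakModel:
    """Behavioural model of a 30-bit last-fetch register plus incrementor."""

    MASK30 = (1 << 30) - 1   # the register holds addrp[31:2], i.e. 30 bits

    def __init__(self):
        self.last_word_addr = None   # assumption: nothing to compare before the first fetch

    def fetch(self, addr):
        """Record one instruction fetch; return True if it is nonconsecutive."""
        word_addr = (addr >> 2) & self.MASK30
        previous = self.last_word_addr
        self.last_word_addr = word_addr      # the register is always loaded with the new address
        if previous is None:
            return False
        expected = (previous + 1) & self.MASK30   # incremented value of the register
        return word_addr != expected


model = TraceBreakModel()
stream = [0x100, 0x104, 0x108, 0x200, 0x204]
print([model.fetch(a) for a in stream])   # [False, False, False, True, False]
```

only the fetch at 0x200 is flagged, which matches the description that the break is tied to the instruction stream becoming nonconsecutive rather than to the branch instruction itself.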
8.7.2.5 bdebugx module

the bdebugx module contains all of the control logic for dbx. the axselp[2:0] signals select the appropriate register for the mfd instruction. dregaddrp[1:0] select the appropriate register for writing during mtd instructions.

the bdebugx module asserts dregwen at the end of the execute stage of an mtd instruction (during a run cycle), when ckillxp is deasserted. this action prevents the registers from being altered if the mtd instruction is killed.

the bdebugx module ignores ckillxp when it is updating its status registers, if break event detection is enabled, because the bdebugx module is unable to determine whether it caused the ckillxp assertion by asserting dbreak_bbep or if some other instruction caused ckillxp to be asserted.

the dbx detects a break according to the enables in the dcs register and the transaction signals, cip_dn, cmem_fetchp, and cstorep. the breaks are reflected in the dcs register and in the dbreak_bbep and dbreakp signals. dbreak_bbep is enabled by default on reset. dbreakp is disabled by default at reset.

the break signals should be connected to the bus error input of the CW400X. then, if an instruction break is signalled, it is valid by the end of the instruction fetch (if) stage. a bus error causes an ibus error, and the epc register points to the offending instruction. all instructions up to this offending one are completed. if a data breakpoint occurs, a bus error is indicated during the x2 stage, which causes a dbus error exception, and the epc register again points to the offending instruction.

the break signals are pulsed (asserted for only one cycle). the status bits are sticky (they hold their value until another breakpoint occurs so that their value is correct for the current breakpoint).

note also that breaks are signalled according to the order in which the CW400X accesses data.
an instruction breakpoint may be signalled before a data breakpoint if the offending instruction occurs after the data access. this is because the CW400X pipeline requires that the instruction access happen first.

the dbx does not support break event detection in the branch delay slots. if a break is set in the branch delay slot, the dbx switches the clock to scanclk even though the instruction that caused the break is killed.

normally, the topclk clock output is selected to be the sysclk input. the bdebugx subblock switches topclk to scanclk if the dbreak module asserts debreakp, or if scanp_runn is low. then, if dbreakp is asserted, the processor switches to scanclk on a break. the system can control scanp_runn and scanclk to control the scanning of the chain.

8.7.3 clock synchronization

the scanclk, scanp_runn, and bresetn signals pass through flip-flops to synchronize them to sysclk. sysclk is always free-running.

