
MiniRISC™ CW4010 Superscalar Microprocessor Core
Technical Manual

A CoreWare® Product

Order Number C14032
This document is preliminary. As such, it contains data derived from functional simulations and performance estimates. LSI Logic has not verified either the functional descriptions, or the electrical and mechanical specifications using production parts.

Document DB14-000027-00, First Edition (July 1996)

This document describes revision A of LSI Logic Corporation's CW4010 Superscalar Microprocessor Core and will remain the official reference source for all revisions/releases of this product until rescinded by an update.

To receive product literature, call us at 1-800-574-4286 (or 415-940-6877 outside the U.S. and Canada) and ask for Department JDS; or visit us at http://www.lsilogic.com.

LSI Logic Corporation reserves the right to make changes to any products herein at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except as expressly agreed to in writing by LSI Logic; nor does the purchase or use of a product from LSI Logic convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Logic or third parties.

Copyright © 1996 by LSI Logic Corporation. All rights reserved.

Trademark Acknowledgment

LSI Logic logo design and CoreWare are registered trademarks and MiniRISC and MiniSIM are trademarks of LSI Logic Corporation. Sun and SPARCstation are trademarks of Sun Microsystems, Inc. SPARC is a registered trademark of SPARC International, Inc. Products bearing the SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc. MIPS is a trademark of MIPS Technologies, Inc. Verilog is a registered trademark of Cadence Design Systems, Inc. All other brand and product names may be trademarks of their respective companies.
Preface

This book is the primary reference and technical manual for the MiniRISC™ CW4010 Superscalar Microprocessor Core, referred to in this document as the CW4010 core, the CW4010, or the core. The book contains a complete functional description of the CW4010.

Audience

The book is intended for use by engineers and managers who are evaluating the CW4010 core, or for engineers who are designing with the core. The book assumes that this audience is familiar with the concepts of microprocessors and related support devices.

Organization

The book has the following chapters and a glossary of terms.

- Chapter 1, Introduction, provides an overview of the CW4010 core and describes the features of the LSI Logic CoreWare® program.
- Chapter 2, Architectural Overview, describes the CPU pipeline and microarchitecture, the instruction set architecture, the system coprocessor (CP0), memory management, exception processing, and cache maintenance.
- Chapter 3, Instruction Set Summary, describes the MIPS R-series instructions and the instruction set extensions supported in the CW4010 core.
- Chapter 4, CW4010 Exception Processing, describes how the CW4010 handles exception processing.
- Chapter 5, CW4010 Memory Management, provides detailed information about CP0 and the CW4010 memory management system.
- Chapter 6, CW4010 Caches, provides detailed information about the CW4010 caches and cache maintenance.
- Chapter 7, Signals, describes the CW4010 core I/O signals.
- Chapter 8, Interface Operation, describes the main timing scenarios for CW4010 transactions.
- Chapter 9, Specifications, refers you to an addendum that contains specifications for the CW4010 core.
- Appendix A, Programmer's Notes, provides information that is useful if you are writing software for the CW4010 core.

Related Publications

CW33300 Enhanced Self-Embedding Processor Core User's Manual, Order No. C14014

Conventions Used in This Manual

Terms that appear in the glossary are shown in boldface the first time they are mentioned in the text.

The term word is used to define a 32-bit quantity, either signed or unsigned. This means that in the CW4010 core a word consists of four 8-bit bytes; a doubleword has 64 bits, or eight 8-bit bytes; and a halfword has 16 bits, or two 8-bit bytes.

Hexadecimal numbers are indicated by the prefix 0x before the number, for example, 0x32CF. Binary numbers are indicated by a subscript 2 following the number, for example 000.0010.1100.1111₂.

The following signal conventions are used throughout the manual:

- Signals that are inputs have a lowercase i as part of the signal name, for example, SCDip. Signals that are outputs have a lowercase o, for example, SCDop. Lowercase characters are used to avoid confusion between uppercase I and 1, and uppercase O and 0.
- Active-low signals have a lowercase n at the end of the signal name, for example RESETn. Active-high signals have a lowercase p at the end of the signal name, for example SCAop.
- The term assert means to drive a signal true or active. The term deassert means to drive a signal false or inactive.
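The signal-naming convention above can be expressed as a small classifier. This is an illustrative sketch only: the function name `classify_signal` is hypothetical and not part of any LSI Logic tool, and it assumes that a direction letter (i or o), when present, comes immediately before the polarity letter (n or p), as in SCDip.

```python
def classify_signal(name):
    """Split a signal name into (base, direction, polarity).

    Follows the manual's suffix convention: a trailing lowercase n or p
    gives the polarity, and an optional lowercase i or o just before it
    gives the direction. Names with no polarity suffix are returned as-is.
    """
    polarity = {"n": "active-low", "p": "active-high"}.get(name[-1:])
    if polarity is None:
        return name, None, None
    rest = name[:-1]
    # Only lowercase i/o count, so an uppercase I or O in the base is untouched.
    direction = {"i": "input", "o": "output"}.get(rest[-1:])
    if direction is not None:
        rest = rest[:-1]
    return rest, direction, polarity


for signal in ("SCDip", "SCDop", "RESETn", "SCAop"):
    print(signal, classify_signal(signal))
```

The four names passed to the function are the examples given in the conventions list; no other signal names are implied.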
Contents

Chapter 1 Introduction
  1.1 CW4010 Overview
    1.1.1 Core and Shell
    1.1.2 Interfaces
    1.1.3 Related Modules
  1.2 Features
  1.3 CoreWare Program
    1.3.1 CoreWare Building Blocks
    1.3.2 Design Environment
    1.3.3 Expert Support

Chapter 2 Architectural Overview
  2.1 Architectural Overview
  2.2 Cache and External Interface
  2.3 Clocking and Power Management
  2.4 Pipeline Architecture (ISA)
    2.4.1 Instruction Fetch and Scheduling: IF, Q, and RD Stages
    2.4.2 Execute Stage
    2.4.3 CR and WB Stages
  2.5 Instruction Set Summary
  2.6 Configurability and Options
    2.6.1 Cache Sizes
    2.6.2 Standard vs High-Performance Multiply Accumulate Unit
    2.6.3 64-Bit vs 32-Bit Memory Interface
    2.6.4 Memory Management Unit
  2.7 Supporting Models and Tools

Chapter 3 Instruction Set Summary
  3.1 Instruction Set Formats
  3.2 Load and Store Instructions
  3.3 Computational Instructions
  3.4 Jump and Branch Instructions
  3.5 Trap Instructions
  3.6 Special Instructions
  3.7 Coprocessor Instructions
  3.8 System Control Coprocessor (CP0) Instructions
  3.9 Cache Maintenance Instructions
  3.10 CW4010 Instruction Set Extensions
  3.11 CPU Instruction Opcode Bit Encoding

Chapter 4 CW4010 Exception Processing
  4.1 Overview
  4.2 R3000 Exception Compatibility Mode
  4.3 Exception Handling Registers
    4.3.1 Context Register (4)
    4.3.2 Debug Control and Status (DCS) Register (7)
    4.3.3 Bad Virtual Address (BadVAddr) Register (8)
    4.3.4 Count Register (9)
    4.3.5 Compare Register (11)
    4.3.6 Status Register (12)
    4.3.7 Cause Register (13)
    4.3.8 Exception Program Counter Register (14)
    4.3.9 Processor Revision Identifier Register (15)
    4.3.10 Configuration and Cache Control (CCC) Register (16)
    4.3.11 Load Linked Address (LLAddr) Register (17)
    4.3.12 Breakpoint Program Counter (BPC) Register (18)
    4.3.13 Breakpoint Data Address (BDA) Register (19)
    4.3.14 Breakpoint PC Mask (BPCM) Register (20)
    4.3.15 Breakpoint Data Address Mask (BDAM) Register (21)
    4.3.16 Rotate Register (23)
    4.3.17 Circular Mask (CMask) Register (24)
    4.3.18 Error Exception Program Counter (Error EPC) Register (30)
  4.4 Exception Description Details
    4.4.1 Exception Operation
    4.4.2 Precision of Exceptions
    4.4.3 Exception Vector Locations
    4.4.4 Priority of Exceptions
    4.4.5 Cold Reset Exception
    4.4.6 Warm Reset Exception
    4.4.7 Non-Maskable Interrupt (NMI) Exception
    4.4.8 Address Error Exception
    4.4.9 TLB Refill Exception
    4.4.10 TLB Invalid Exception
    4.4.11 TLB Modified Exception
    4.4.12 Bus Error Exception
    4.4.13 Integer Overflow Exception
    4.4.14 Trap Exception
    4.4.15 System Call Exception
    4.4.16 Breakpoint Exception
    4.4.17 Reserved Instruction Exception
    4.4.18 Floating-Point Exception
    4.4.19 Coprocessor Unusable Exception
    4.4.20 Debug Exception
    4.4.21 Interrupt Exception
    4.4.22 External Vectored Interrupt Exception

Chapter 5 CW4010 Memory Management
  5.1 TLB Physical Organization
  5.2 Memory Management System
    5.2.1 Operating Modes
    5.2.2 User Mode Virtual Addressing
    5.2.3 Kernel Mode Virtual Addressing
  5.3 Virtual Memory and the TLB
    5.3.1 TLB Entry Format
    5.3.2 TLB Support Registers
    5.3.3 Virtual Address Translation
    5.3.4 TLB Instructions

Chapter 6 CW4010 Caches
  6.1 Cache Memory Organization
  6.2 Cache States
    6.2.1 ICache and Writethrough DCache
    6.2.2 Writeback DCache
  6.3 Address and Cache Tag
  6.4 DCache Scratch Pad RAM Mode
  6.5 External Invalidation
  6.6 Cache Instructions
    6.6.1 Flush (All Cache Invalidation)
    6.6.2 Writeback
    6.6.3 Cache Maintenance by CCC Register

Chapter 7 Signals
  7.1 Signal Conventions
  7.2 Signal Synchronization
  7.3 CW4010 Modularity
  7.4 CW4010 Shell Interface Signal Definitions
    7.4.1 Reset Signals
    7.4.2 Interrupt Signals
    7.4.3 SCbus Interface Signals
    7.4.4 Cache Invalidation Interface Signals
    7.4.5 Coprocessor Interface Signals
    7.4.6 OCAbus Interface Signals
    7.4.7 Miscellaneous Signals

Chapter 8 Interface Operation
  8.1 Reset and Exception Signals
    8.1.1 Cold Reset (CRESETn)
    8.1.2 Handling Cold Resets
    8.1.3 Warm Reset (WRESETn)
    8.1.4 Non-Maskable Interrupt (NMIn)
    8.1.5 Bus Error (SCBERRn)
    8.1.6 Floating-Point Unit (FPERRXn) Exceptions
    8.1.7 External Interrupts (EXTINTn)
    8.1.8 External Vectored Interrupt (EXVINTn)
    8.1.9 WAITI Instruction and WSTALLp
  8.2 SCbus Interface Behavior
    8.2.1 SCbus Basic Transaction
    8.2.2 SCbus Burst Transaction
    8.2.3 SCbus In-Page Write Transaction
    8.2.4 SCbus Bus Hold
    8.2.5 SCbus Bus Retry
    8.2.6 SCbus Bus Error
    8.2.7 SCbus Bus Sizing
    8.2.8 SCbus Bus Lock
    8.2.9 Big Endian Configuration
  8.3 OCAbus Interface Behavior
    8.3.1 Basic OCAbus Transaction
    8.3.2 OCAbus Transaction Rejected
    8.3.3 OCAbus Access with Stall at EX Stage
    8.3.4 OCAbus Access with Stall at CR Stage
    8.3.5 OCAbus Access with Stall Request
    8.3.6 OCAbus Access with Pipeline Cancel
  8.4 Cache Interface Behavior
  8.5 Coprocessor Interface Behavior
    8.5.1 Coprocessor Functional Instruction
    8.5.2 Data Movement to and from CPU General Purpose Register Instructions
    8.5.3 Instructions Moving Data from or to Memory
    8.5.4 Branch on Coprocessor Condition Instructions
    8.5.5 Coprocessor Operation (COPz)
    8.5.6 Data Movement to and from CPU Registers
    8.5.7 Data Movement to or from Memories
    8.5.8 Coprocessor Conditions
    8.5.9 Even/Odd Slot and Pipeline Cancel
    8.5.10 Branch-Likely in Even Slot Is False
    8.5.11 EX Stage Suspension
    8.5.12 Floating-Point Unit Exception

Chapter 9 Specifications

Appendix A Programmer's Notes
  A.1 Instruction Related
  A.2 CP0 or TLB Related
  A.3 Cache Related
  A.4 CW33300 Compatible Debug Extensions

Glossary

Customer Feedback

Figures
  1.1 CW4010 Core Interface to External Building Blocks
  2.1 CW4010 Block Diagram
  2.2 CW4010 Instruction Pipeline
  3.1 Instruction Format
  3.2 Byte Specifications for Loads/Stores
  4.1 Context Register
  4.2 DCS Register
  4.3 BadVAddr Register
  4.4 Count Register
  4.5 Compare Register
  4.6 Status Register (R4000 Mode)
  4.7 Status Register (R3000 Mode)
  4.8 Status Register and Exception Recognition
  4.9 Cause Register
  4.10 EPC Register
  4.11 PRId Register
  4.12 CCC Register
  4.13 LLAddr Register
  4.14 BPC Register
  4.15 BDA Register
  4.16 BPCM Register
  4.17 BDAM Register
  4.18 Rotate Register
  4.19 CMask Register
  4.20 Error EPC Register
  4.21 Cold Reset Exception
  4.22 Warm Reset, NMI Exceptions
  4.23 Common Exceptions
  4.24 Debug Exception
  4.25 External Vectored Interrupt Exception
  5.1 TLB Block Diagram
  5.2 CW4010 Virtual Memory Map
  5.3 CW4010 Virtual Address Format
  5.4 Format of CW4010 TLB Entry
  5.5 EntryHi Register
  5.6 EntryLo Register
  5.7 PageMask Register
  5.8 Index Register
  5.9 Random Register
  5.10 Wired Register Location
  5.11 Wired Register
  5.12 CW4010 TLB Address Translation Process
  6.1 Cache State Diagram: ICache and Writethrough DCache
  6.2 Cache State Diagram: DCache Writeback
  6.3 Address to Cache Tag and Line Number
  6.4 Cache Instruction Format
  6.5 Tag Test Mode Loaded Data Format
  7.1 CW4010 Module
  7.2 CW4010 Interface Signals
  8.1 Cold Reset and Pipeline
  8.2 NMIn and Pipeline (NMIn Is Detected Immediately)
  8.3 NMIn and Pipeline (NMIn Is Not Detected Immediately Due to Stall)
  8.4 Bus Error and Pipeline (Detected Immediately)
  8.5 Bus Error and Pipeline (with Stall Cycles)
  8.6 FPU Exception and Pipeline (Detected Immediately)
  8.7 FPU Exception and Pipeline (with Stall Cycles)
  8.8 FPU Exception and Pipeline (Cancel, Then Not Serviced)
  8.9 Interrupt and Pipeline (Interrupt Is Detected Immediately)
  8.10 Fastest Accepted Case of External Vectored Interrupt
  8.11 WAITI and Pipeline Stall (WSTALLp)
  8.12 SCbus Basic Transaction
  8.13 SCbus Eight-Word Burst Transaction Timing Chart
  8.14 SCbus Eight-Word Burst Transaction
  8.15 SCbus Eight-Word Burst Transaction Timing Chart
  8.16 SCbus In-Page Write Transaction Timing Chart (Four Words)
  8.17 SCbus Hold Request and Grant
  8.18 Sampled Bytes of First and Second Transaction SCbus Data
  8.19 Read Bytes to ISU and LSU with Sizing
  8.20 Write Bytes to the SCbus with Sizing
  8.21 Write Data Bytes from LSU
  8.22 SCbus Locked Transaction
  8.23 Typical OCAbus Transaction
  8.24 OCAbus Transaction Rejected by Address Decoder
  8.25 OCAbus with Stall at EX Stage
  8.26 OCAbus Access with Stall at CR Stage
  8.27 OCAbus Access with Stall Request
  8.28 OCAbus Access with Pipeline Cancel
  8.29 DCache Invalidation by Snooping
  8.30 ICache Invalidation by Snooping
  8.31 Coprocessor Functional Instruction
  8.32 CPU General Purpose Register Data Movement Instructions
  8.33 Memory Data Movement Instructions
  8.34 Branch on Coprocessor Condition Instructions
  8.35 COPz Execution
  8.36 Data Movement to/from CPU Registers without Stall Cycles
  8.37 Data Movement to/from CPU Registers with Stall Cycles at EX Stage
  8.38 Data Movement to/from CPU Registers with Stall Cycles at CR Stage
  8.39 LWCz DCache Hit and Miss
  8.40 SWCz Timing
  8.41 Even/Odd Slot and Pipeline Cancel
  8.42 Branch-Likely in Even Slot Is False
  8.43 EX Stage Suspension
  8.44 EX Stage Suspension (Cancelled)
  8.45 FPU Exception and Pipeline Cancel Timing

Tables
  2.1 CW4010 Instruction Set Summary
  2.2 Instruction Set Extensions
  3.1 Load and Store Instruction Summary
  3.2 Load and Store Instruction Summary (MIPS-II ISA Extensions)
  3.3 ALU Immediate Instruction Summary
  3.4 3-Operand, Register-Type Instruction Summary
  3.5 Shift Instruction Summary
  3.6 Multiply/Divide Instruction Summary
  3.7 Computation Instruction Extensions Summary (CW4010 ISA)
  3.8 Execution Time of Multiply and Divide Instructions
  3.9 Jump Instruction Summary
  3.10 Branch Instruction Summary
  3.11 Branch-Likely Instruction Summary (MIPS-II ISA Extensions)
  3.12 Trap Instruction Summary (MIPS-II ISA Extensions)
  3.13 Special Instruction Summary
  3.14 Coprocessor Instruction Summary
  3.15 CP0 Instruction Summary
  3.16 CP0 Instruction Extension Summary
  3.17 Cache Maintenance Instruction Summary
  3.18 CW4010 Opcode Bit Encoding
  3.19 SPECIAL Opcode Bit Encoding
  3.20 REGIMM Opcode rt Bit Encoding
  3.21 Cache x2 Opcode rt Bit Encoding
  3.22 COPz rs Opcode Bit Encoding
  3.23 COPz rt Opcode Bit Encoding
  3.24 CP0 Opcode Bit Encoding
  4.1 CW4010 Exceptions
  4.2 CP0 Exception Processing Registers
  4.3 Cause Register ExcCode Field
  4.4 Current Processor Mode
  4.5 Exception Vector Base Addresses
  4.6 Exception Vector Offset Addresses
  4.7 Exception Priority Order
  5.1 Caching Algorithm Criteria
  5.2 Cache Algorithm Bit Values
  5.3 TLB Instruction
  6.1 DCache Writeback Mode
  6.2 Setting Cache Size
  6.3 CCC Bits Related to Cache Configuration
  6.4 Tag and Inv Encoding
  6.5 Tag and Inv Encoding
  8.1 Common Exception Vector
  8.2 SCbus Transaction Types
  8.3 SCTBEn and Valid SCDp

Chapter 1
Introduction

This chapter introduces the LSI Logic CoreWare program and describes its features. It also provides an overview of the CW4010 core. This chapter contains the following sections:

- Section 1.1, CW4010 Overview
- Section 1.2, Features
- Section 1.3, CoreWare Program

1.1 CW4010 Overview

LSI Logic Corporation has developed the MiniRISC CW4010 Superscalar Core, the world's first MIPS-II compatible superscalar core, using LSI Logic's CoreWare system-on-a-chip methodology. The CW4010 is a member of LSI Logic's MiniRISC family, the next generation of MIPS RISC products. You can use the CW4010 as a microprocessor core in products that require higher performance than that of the CW4001 microprocessor core.

The CW4010 is available as a CoreWare product for use in customer ASIC designs, and is also used in LSI Logic's ASSPs (application specific standard products).

1.1.1 Core and Shell

As shown in Figure 1.1, the CW4010 is implemented at two levels: the core and the shell.

[Figure 1.1: CW4010 Core Interface to External Building Blocks. Labels shown: reset, interrupts; SCbus; writeback buffer; coprocessor interface; cache invalidation; CW4010 core interface; OCA interface; CW4010 shell; ALU; ISU; CP0; MMU; DCache set-0; DCache set-1; LSU; BIU; ICache set-0; ICache set-1; multiplier interface.]

The CW4010 superscalar microprocessor core is an encrypted synthesizable Verilog model. It is process independent and made up of the following units:

- An arithmetic logic unit (ALU)
- A system control coprocessor (CP0)
- A bus interface unit (BIU)
- A load store unit (LSU)
- An instruction scheduler unit (ISU)

The following microprocessor building blocks are available with the basic microprocessor core and are shown as part of the shell. The shell is an unencrypted Verilog model that contains:

- A direct-mapped or two-way set associative instruction cache with cache sizes selectable up to 16 Kbytes
- A direct-mapped or two-way set associative data cache
- A memory management unit (MMU) with a translation lookaside buffer (TLB) of up to 64 entries
- A standard multiply/divide unit or a high-performance multiply/accumulate unit
- A writeback buffer for writeback cache mode

1.1.2 Interfaces

The CW4010 has four on-chip interfaces:

- The coprocessor interface connects the core with up to three coprocessors (CP1, CP2, and CP3), as well as the internal coprocessor (CP0).
- The cache invalidation interface connects the core with optional cache coherency logic. The core uses this bus to communicate only with the on-chip caches.
- The SCbus, the bidirectional system bus, allows the CW4010 to communicate with system elements outside the core.
- The OCAbus (on-chip access bus) allows access to on-chip modules at the CR stage without going through the SCbus.

1.1.3 Related Modules

In addition to the core, the MiniRISC product family includes a variety of other modules including:

- LSI Logic's MiniSIM™ architectural simulator
- Verilog and VHDL models
- A system verification environment
- A PROM monitor
- Third party software support
- Core bond-out chip for emulation
- Evaluation boards for concurrent software development
- LSI Logic's CoreWare, described in Section 1.3, CoreWare Program

1.2 Features

The CW4010 core has the following features:

- Full MIPS-II instruction set implementation (R4000 32-bit mode compatible)
- Instruction set extensions to support embedded applications
- Superscalar execution with up to two instructions issued per clock cycle
- 64-bit on-chip system interface
- High-performance coprocessor interface for user definable coprocessors and high performance hardware floating-point unit (FPU)
- 3.3 volt operation
- 80-MHz worst-case commercial maximum clock rate using standard cell ASIC
- 150 Dhrystone MIPS at 80 MHz
- 160 native MIPS peak, 110 native MIPS sustained with standard compiled MIPS code at 80 MHz
- Core power: 5 mW/MHz with power management
- Configurable modular design to meet customer requirements
- Integrated cache controllers with separate instruction and data caches, with sizes selectable from 2 Kbytes to 16 Kbytes each
- Optional, modifiable building blocks, such as a multiplier with or without accumulator and a memory management unit (MMU)
- Fully testable in embedded ASIC designs
- Models available:
  - Performance and software development model
  - Verilog and VHDL models (referred to in this manual as HDL models)
  - Gate-level, timing-accurate model in various third party simulation environments
- Compatible with the full range of MIPS and third party software development tools
- Compact basic microprocessor core size: 3 mm by 3 mm including BIU, cache controllers, and write buffer
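As a quick check on how the performance figures in the feature list relate to one another, the arithmetic can be sketched as follows. Only the four input numbers come from the list; the derived values (sustained instructions per clock and total core power at 80 MHz) are calculations from those numbers, not additional specifications, and the variable names are chosen here for illustration.

```python
CLOCK_MHZ = 80          # worst-case commercial maximum clock rate
ISSUE_WIDTH = 2         # up to two instructions issued per clock cycle
SUSTAINED_MIPS = 110    # sustained native MIPS quoted at 80 MHz
POWER_MW_PER_MHZ = 5    # core power with power management

# Peak rate: every cycle issues the maximum of two instructions.
peak_mips = CLOCK_MHZ * ISSUE_WIDTH

# Average instructions completed per clock implied by the sustained figure.
sustained_ipc = SUSTAINED_MIPS / CLOCK_MHZ

# Core power at the maximum clock rate, derived from the per-MHz figure.
core_power_mw = POWER_MW_PER_MHZ * CLOCK_MHZ

print(peak_mips, sustained_ipc, core_power_mw)  # prints: 160 1.375 400
```

The computed peak of 160 matches the quoted peak native MIPS figure, which is consistent with dual issue at 80 MHz.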

1.3 CoreWare Program

The CoreWare program offers a new approach to system design. Through the CoreWare program, LSI Logic gives you the ability to combine the MiniRISC CW4010 Superscalar Microprocessor Core with other cores (such as microprocessors, floating-point processors, and peripheral building blocks) on a single chip, and to create products uniquely suited to your applications. This approach, combining high-performance building blocks, sophisticated design software, and expert support, gives you unparalleled design flexibility and allows you to create high-quality, leading-edge products for a wide range of markets.

CoreWare is LSI Logic's proprietary approach to creating custom, single-chip systems with more complex high-level logic blocks than those provided by standard gate arrays. It provides a new paradigm that allows you to use applications-optimized engineering. CoreWare elements allow you to produce silicon that meets your specific application requirements. Complex systems-on-a-chip can be fabricated with unprecedented time-to-market and low cost compared with the gate array approach or full custom design.

The CoreWare program provides three major design components:

- CoreWare building blocks
- A design environment
- Expert technical support

1.3.1 CoreWare Building Blocks

CoreWare building blocks include elements based on LSI Logic's high-performance standard products as well as other, industry-standard products. These blocks are fully supported library elements for use in the LSI Logic hardware development environment.

The CoreWare library contains a wide range of complex cores based on accepted and emerging industry standards for networking, system logic, digital video, and other applications. These cores can be combined with your own unique logic to create a wide variety of single-chip applications such as network routers, RAID disk controllers, and PC I/O docking stations.
Note that the building blocks include gate-level simulation models with timing information, so you can accurately simulate device performance and trade off various implementation options. In addition to gate-level simulation models, some building blocks also include behavioral simulation models.

1.3.2 Design Environment

The CoreWare building blocks, which include embedded MIPS and SPARC processors, bus interface controllers, and a family of floating-point processors, are fully supported library elements for use in the LSI Logic hardware development environment.

1.3.3 Expert Support

LSI Logic's in-house experts support the CoreWare program with high-level design and market experience in a wide variety of application areas. These experts provide design support from system architecture definition through chip layout and test vector generation. They help determine how many functions to integrate on a single chip, trading off functionality versus cost to find the most cost-effective solution. When the trade-offs are complete, working with LSI Logic's applications engineers, you can implement and test the design.

Chapter 2
Architectural Overview

This chapter discusses the CPU pipeline and microarchitecture, the instruction set architecture, the system coprocessor (coprocessor-0), memory management, exception processing, and cache maintenance. This chapter contains the following sections:

- Section 2.1, Architectural Overview
- Section 2.2, Cache and External Interface
- Section 2.3, Clocking and Power Management
- Section 2.4, Pipeline Architecture (ISA)
- Section 2.5, Instruction Set Summary
- Section 2.6, Configurability and Options
- Section 2.7, Supporting Models and Tools

2.1 Architectural Overview

The CW4010 is fully compatible with the R3000 and R4000 32-bit instruction sets (MIPS-I and MIPS-II), but uses an updated hardware architecture to provide higher absolute performance than any other available MIPS core. The CW4010 also provides substantially better instructions-per-clock performance than other MIPS processors. At the same time, the hardware design remains compact in comparison with other superscalar architectures.

The CW4010 implements a 32-bit virtual address space. Individual memory locations are byte-addressed. Up to 2 Gbytes of virtual address space is available to each user-level process.

The CW4010 implements a 32-bit physical address space. Individual memory locations are byte-addressed, and, combined with the virtual address space, provide a total of 4 Gbytes of addressable physical memory.
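The address-space sizes quoted above follow directly from the address width. The sketch below works through the arithmetic. The helper `is_user_address` is hypothetical, and it assumes the 2-Gbyte user region occupies the lower half of the virtual address space, as in the conventional MIPS layout; this chapter does not itself state where the user region sits.

```python
ADDRESS_BITS = 32

# A 32-bit byte address can name 2**32 distinct bytes, which is 4 Gbytes.
total_bytes = 1 << ADDRESS_BITS

# Half of that space, 2 Gbytes, is available to each user-level process.
USER_SPACE_BYTES = 2 * 1024**3


def is_user_address(vaddr):
    """Return True if vaddr falls in the lower 2 Gbytes.

    Assumption: the user region starts at address 0, as in the
    conventional MIPS memory map.
    """
    return 0 <= vaddr < USER_SPACE_BYTES


print(total_bytes // 1024**3)          # prints: 4
print(is_user_address(0x7FFFFFFF))     # prints: True
print(is_user_address(0x80000000))     # prints: False
```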

The CW4010 can issue and complete two instructions per cycle using a combination of five independent execution units:

- Arithmetic logic unit (ALU)
- Load/store/add unit (LSU): the LSU executes load and store instructions. It also executes add and load immediate instructions, allowing an add instruction to be issued with another add or logical instruction.
- Branch unit
- Multiply/shift unit
- Coprocessor interface: the coprocessor interface can feed an instruction to an LSI or customer-defined coprocessor unit.

All instructions, except multiply and divide, can be completed in a single cycle.

Load instructions have a single hardware delay slot for loads that hit in the cache, but the hardware activates an interlock on register conflicts so that no NOP (no operation) is required in the delay slot. On a load miss, the CW4010 extends the hardware conflict detection so that if the load data is not required by subsequent instructions in the pipeline, the CPU is not stalled. This operation is called load scheduling.

The CW4010 has an instruction pre-fetch queue and branch prediction logic to boost branch performance. This means that correctly predicted branches are completed with no penalty and incorrectly predicted branches normally have a penalty of just one cycle. The CW4010 accomplishes branch prediction with a simple hardware algorithm that is more than 90% accurate for most application code.

Figure 2.1 shows a simplified block diagram of the basic CPU core.
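The branch figures quoted above imply a small average cost per branch, which can be estimated as shown below. This is a back-of-the-envelope sketch rather than a model of the actual prediction hardware: the function name is hypothetical, and it assumes every misprediction costs exactly the normal one-cycle penalty.

```python
def average_branch_penalty(accuracy, mispredict_penalty_cycles=1):
    """Expected penalty cycles per branch.

    Correctly predicted branches cost nothing; each mispredicted branch
    is assumed to cost mispredict_penalty_cycles.
    """
    return (1.0 - accuracy) * mispredict_penalty_cycles


# At 90% accuracy with a one-cycle penalty, the average is about 0.1 cycle.
print(round(average_branch_penalty(0.90), 3))
```

At the quoted accuracy of more than 90%, the average cost works out to less than about one tenth of a cycle per branch under these assumptions.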

[Figure 2.1: CW4010 Block Diagram. Blocks shown: coprocessor-0; instruction schedule unit (ISU), containing the IFetch queue, IDecode unit, and branch unit; ICache; coprocessor interface; register file; load/store/add unit (LSU); arithmetic logic unit (ALU); multiply/shift interface unit; multiply/divide unit; DCache; write buffer; bus interface unit (BIU); SCbus interface; cache invalidation interface; OCAbus interface; internal instruction execution bus; address, data, and control paths.]

Three units handle instructions:

- The IFetch queue optimizes the supply of instructions to the microprocessor, even across breaks in the sequential flow of execution (jumps and branches).
- The IDecode unit decodes the instructions from the IFetch queue, determines the actions required for the instruction execution, and manages the register file, LSU, ALU, and multiply/divide units accordingly.
- The branch unit is used when branch and jump instructions are recognized within the instruction stream.

The register file contains the core's general purpose registers. (There are 32 general purpose registers located in CP0. Of these registers 31 are read/write registers and one is the zero register.) The register file supplies source operands to the execution units and handles the storage of results to target registers.

Three units perform logical, arithmetic, and data-movement operations:

- The load/store/add unit (LSU) manages loads and stores of data values. Data values are loaded from either the DCache or from the SCbus interface in the event of a DCache miss. Stores pass to the DCache and the SCbus interface through the write buffer. The LSU is also able to perform a restricted set of arithmetic operations, including the addition of an immediate offset as required in address calculations.
- The arithmetic logic unit (ALU) calculates the result of an arithmetic or logical operation.
- The multiply/shift interface unit performs multiply and divide operations. You can select a number of modular options for this unit, including an option with full multiply/accumulate capability.

The CW4010 core has four interfaces:

- The bus interface unit (BIU) manages the flow of instructions and data between the core and the system by means of the SCbus interface. This interface provides the main channel for communication between the CW4010 core and the other functional blocks in the system. Some blocks may be implemented as CoreWare library functions integrated on the same die as the microprocessor core; others may be implemented in separate devices connected by means of I/O pins at the board level.
- The coprocessor interface allows tightly coupled special-purpose processing units to be attached to the core, enhancing the microprocessor's general-purpose computational power. This approach allows high-performance application-specific hardware to be made directly accessible to the programmer at the instruction set level. For example, a coprocessor might offer accelerated bit-mapped graphics operations or real-time video decompression.
- The cache invalidation interface allows supporting hardware outside the microprocessor core to maintain the coherency of on-board cache contents for systems that include multiple main-bus masters.
- The OCAbus interface allows on-chip modules to be accessed at the CR (cache read) stage of the pipeline without going through an SCbus transaction. This improves performance since it reduces traffic on the SCbus and therefore reduces latency.

2.2 Cache and External Interface

Instruction cache (ICache) control is performed by the ISU (instruction scheduling unit). Data cache (DCache) control is performed by the LSU (load/store unit).

A write buffer is also implemented within the LSU, so that CPU execution need not stall if a number of stores are performed in quick succession. The write buffer accepts the store addresses and data values, and passes them on to main memory as rapidly as main memory can accept them. During this time, the CPU proceeds with execution.

The BIU provides the interface to on-chip peripherals. One or more peripherals will typically provide a path to off-chip resources, including main memory. The on-chip system interface presented by the BIU is the SCbus. This bus has a 64-bit data bus and a 32-bit address bus. Address and data are not multiplexed. Refills for both the ICache and DCache exploit the 64-bit width of the data bus to achieve the highest possible performance.

In its standard form, the BIU does not include any support for dynamic bus sizing. That is, it provides no mechanism to subdivide transactions between the CPU core and on-chip peripherals that cannot directly support the requested data transfer width. However, such functionality can be accommodated through simple supporting logic interposed between the BIU and the on-chip peripherals.

Likewise, the standard BIU does not support writeback data cache management. It implements only a writethrough policy. However, an additional supporting unit, the writeback buffer, may be attached to provide writeback control in those applications that require it.

Excluding features such as dynamic bus sizing and writeback cache management from the standard BIU keeps it small and simple. Designs that do not need these features are not compelled to implement them.
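The role of the write buffer described in this section can be illustrated with a simple first-in, first-out queue. This is a behavioral sketch only, not the LSU implementation: the class and method names are hypothetical, and the buffer depth of four entries is an assumption made for the example, since this section does not give a depth.

```python
from collections import deque


class WriteBuffer:
    """Toy FIFO model of a store buffer between the CPU and main memory."""

    def __init__(self, depth=4):
        # Depth is an assumed value; the manual excerpt does not state one.
        self.depth = depth
        self.entries = deque()

    def store(self, address, data):
        """Queue a store. Return False if the buffer is full, which is
        the case where the CPU would have to stall."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, data))
        return True

    def drain_one(self):
        """Pass the oldest queued store on to memory. Return the
        (address, data) pair, or None if the buffer is empty."""
        return self.entries.popleft() if self.entries else None


buffer = WriteBuffer(depth=2)
print(buffer.store(0x1000, 0xAA))   # accepted, CPU continues
print(buffer.store(0x1008, 0xBB))   # accepted, CPU continues
print(buffer.store(0x1010, 0xCC))   # buffer full, CPU would stall
print(buffer.drain_one())           # memory accepts the oldest store
```

The point of the model is that stores issued in quick succession are absorbed without stalling as long as the buffer has room, while memory drains the entries in order at its own pace.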
2.3 clocking and power management

the cpu core is clocked by a single-phase, 1x clock with a 40% to 60% duty cycle requirement. applications that require a slower system clock interface may use a phase-locked loop (pll), available as a cell in lsi logic's asic libraries, together with additional logic to implement a clock multiplier circuit for the cpu.

power management is provided by the waiti (wait for interrupt) instruction and by gating the clock separately for each functional unit. units are clocked only when needed. in addition, the core and cache rams are completely static, so the user logic may slow or stop the clock to save power.

2.4 pipeline architecture (isa)

this section describes the cpu pipelines and instruction fetching and scheduling. it also contains an instruction set summary.

as shown in figure 2.2, the CW4010 core has two identical concurrent six-stage pipelines that provide the core with its superscalar capabilities. one pipeline is known as the even pipeline or even slot, and the other as the odd pipeline or odd slot.

figure 2.2 CW4010 instruction pipeline
[figure: two six-stage pipelines, the even slot (pipeline 0) and the odd slot (pipeline 1). each runs if, q, rd (instruction fetch) followed by ex, cr, wb (instruction execution). figure notes: 1. branch instruction encountered. 2. q stage bypassed.]

the first three pipeline stages are used during instruction fetch and the last three stages during instruction execution. once a stage has accepted an instruction from the previous stage, it must hold the instruction for re-execution in case the pipeline stalls. the function of each pipeline stage is summarized below.

1. if (instruction fetch): the CW4010 fetches the instruction during the first stage.
2. q (queuing): instructions may enter this conditional stage if they deal with branches or register conflicts. an instruction that does not cause a branch or register conflict is fed directly to the rd stage.
3. rd (read): during this stage, any required operands are read from the register file while the instruction is decoded.
4. ex (execute): all instructions are executed in this stage. conditional branches are resolved in this cycle. the address calculation for load and store instructions is performed in this stage.
5. cr (cache read): this stage is used to read the cache for load and store instructions. data is returned to the register bypass logic at the end of this stage.
6. wb (writeback): results are written into the register file during this stage.

sections 2.4.1, 2.4.2, and 2.4.3 provide more detailed information about pipeline transactions.

2.4.1 instruction fetch and scheduling: if, q, and rd stages

the if, q, and rd stages fetch two instructions per cycle and issue them to the ex (execute) stage. the CW4010 fetches instructions as doubleword-aligned pairs (slot 0 and slot 1). there is a two-instruction window in the rd stage during the instruction decode operation. when only slot 0 can be scheduled because slot 1 has a dependency, the window slides down one instruction. in other words, although instructions are always fetched as doubleword pairs, they are scheduled on single-word boundaries.

the primary purpose of the q stage is the execution of branch instructions with minimal penalty. the CW4010 generally fills the q stage whenever the rd stage has to stall. this occurs fairly frequently in typical compiled code because of register conflicts, cache misses, and resource conflicts.
filling the q stage in these cases allows the if stage to work ahead one cycle.

if a branch instruction is encountered when the q stage is already active, it is predicted that the branch will be taken. the if stage does not bring in any more instructions following from the current address, but instead begins fetching instructions starting at the branch target address. at this point, the q stage still holds the pair of instructions immediately following the pair that contained the branch. the branch target enters the rd stage, bypassing the q stage, as shown in figure 2.2.

the branch prediction logic in the isu (instruction scheduling unit) resolves the branch condition when the branch instruction enters the ex stage. if the branch prediction logic predicts the branch correctly, the instructions in the q stage are cancelled. if it predicts the branch incorrectly, the isu cancels the branch target. in this case, it takes the non-branch sequential instructions from the q stage and restarts the if stage at the non-branch sequential stream.

the process is different when the branch instruction is in the odd instruction slot. if the branch prediction logic correctly predicts a branch in the even instruction slot when the q stage is full, there is generally no cycle penalty associated with it. if the branch prediction logic predicts the branch incorrectly, the branch has a one-cycle penalty. if the branch instruction was in the odd instruction slot, the branch delay slot instruction always executes by itself and has no chance to fill the other execution slot. an assembler that attempts to place branches at even word addresses may therefore gain some advantage.

the branch prediction logic must be able to look at two instructions at the same time, from either the q latches or the rd latches, depending on whether the q stage is active. when it looks at the two instructions, if one is a branch, it passes the offset in that instruction into a dedicated adder to calculate the branch address for the if stage of the instruction fetch.
because this is done speculatively, it also saves the non-branch value of the pc (program counter) for the possible restart of the sequential instructions from the q stage.

after the isu has allowed an instruction pair to pass into the rd stage, the instructions are decoded, and at the same time the register source addresses are passed to the register file so that the operands can be read. register dependencies and resource dependencies are checked in this stage. if the instruction in slot 0 has no dependency on a register or resource currently tied up by a previous instruction, it is passed immediately into the ex stage, where it forks to the appropriate execution unit. the instruction in slot 1 may also be dependent on a resource or register in slot 0, so it must be checked for dependencies against both slot 0 and any previous unretired instruction.

if either instruction must be held in the rd stage and the q stage is not full, the if stage is allowed to continue to fill the q stage. if the q stage is full, then the q and if stages are frozen (stalled).

in the rd stage, register bypass opportunities are considered and the bypass multiplexer control signals are set for potential bypass cases from a previous instruction still in the pipeline.

2.4.2 execute stage

during instruction execution, a pair of instructions (or a single instruction when there was a previous block) are individually passed to independent execution units. each execution unit receives its operands from the register bypass logic and an instruction from the instruction scheduler. each instruction spends one run cycle in an execution unit. for alu and other single-cycle instructions, the result is then fed to the register/bypass unit for the cr stage.

2.4.3 cr and wb stages

for load and store instructions, the cache lookup occurs during the cr stage. for load instructions, including loads to coprocessors, data is returned to the register/bypass unit during the cr stage. for all other instructions, cr and wb are holding stages used to hold the result of the execute stage for writeback to the register file.

2.5 instruction set summary

table 2.1 summarizes the instruction set for the CW4010. the CW4010 supports both mips-i and mips-ii instructions, and also implements some additional CW4010-specific instructions. if the design includes the optional mmu, the CW4010 supports the tlb instructions. all instructions are 32 bits long. table 2.1 includes only the mips-ii, CW4010-specific, and tlb instructions. with the exception of rfe, mips-i instructions are not shown.
table 2.1 CW4010 instruction set summary

  op            description

arithmetic instructions: alu immediate
  addi          add immediate
  addiu         add immediate unsigned
  slti          set on less than immediate
  sltiu         set on less than immediate unsigned
  andi          and immediate
  ori           or immediate
  xori          exclusive or immediate
  lui           load upper immediate

arithmetic instructions: three-operand, register-type
  add           add
  addu          add unsigned
  sub           subtract
  subu          subtract unsigned
  slt           set on less than
  sltu          set on less than unsigned
  and           and
  or            or
  xor           exclusive or
  nor           nor

branch likely instructions
  beql [1]      branch on equal likely
  bnel [1]      branch on not equal likely
  blezl [1]     branch on less than or equal to zero likely
  bgtzl [1]     branch on greater than zero likely
  bltzl [1]     branch on less than zero likely
  bgezl [1]     branch on greater than or equal to zero likely
  bltzall [1]   branch on less than zero and link likely
  bgezall [1]   branch on greater than or equal to zero and link likely
  bcztl [1]     branch on coprocessor z true likely
  bczfl [1]     branch on coprocessor z false likely

coprocessor instructions
  lwcz          load word to coprocessor z
  swcz          store word from coprocessor z
  mtcz          move to coprocessor z
  mfcz          move from coprocessor z
  ctcz          move control to coprocessor z
  cfcz          move control from coprocessor z
  copz          coprocessor operation
  bczt          branch on coprocessor z true
  bczf          branch on coprocessor z false

jump and branch instructions
  j             jump
  jal           jump and link
  jr            jump register
  jalr          jump and link register
  beq           branch on equal
  bne           branch on not equal
  blez          branch on less than or equal to zero
  bgtz          branch on greater than zero
  bgez          branch on greater than or equal to zero
  bltzal        branch on less than zero and link
  bgezal        branch on greater than or equal to zero and link

load/store instructions
  lb            load byte
  lbu           load byte unsigned
  lh            load halfword
  lhu           load halfword unsigned
  lw            load word
  lwl           load word left
  lwr           load word right
  sb            store byte
  sh            store halfword
  sw            store word
  swl           store word left
  swr           store word right
  ll [1]        load linked
  sc [1]        store conditional
  sync [1]      sync

multiply/divide instructions
  mult          multiply
  multu         multiply unsigned
  div           divide
  divu          divide unsigned
  mfhi          move from hi
  mthi          move to hi
  mflo          move from lo
  mtlo          move to lo
  min           select minimum value
  max           select maximum value

other computational instructions
  addciu [2]    add circular immediate
  ffs [2]       find first set
  ffc [2]       find first clear
  flushd [2,3]  flush data cache
  flushi [2,3]  flush instruction cache
  flushid [2,3] flush instruction and data cache
  selsr [2]     select and shift right
  selsl [2]     select and shift left
  madd [2]      multiply/add
  maddu [2]     multiply/add unsigned
  msub [2]      multiply/subtract
  msubu [2]     multiply/subtract unsigned

special instructions
  syscall       system call
  break         breakpoint

system control coprocessor (cp0) instructions
  mtc0          move to cp0
  mfc0          move from cp0
  rfe           restore from exception (r3000 mode only)
  eret          exception return (r4000 mode only)
  tlbr [4]      read indexed tlb entry
  tlbwi [4]     write indexed tlb entry
  tlbwr [4]     write random tlb entry
  tlbp [4]      probe tlb for matching entry
  waiti [2]     wait for interrupt

1. mips-ii instruction
2. CW4010-specific instruction
3. do not confuse these instructions with the flush instruction in r6000 processors
4. valid only with implemented mmu (memory management unit) building block

in addition to the standard mips-ii instruction set, the CW4010 implements certain instruction set extensions, shown in table 2.2, that provide greater application code performance for typical embedded applications. instruction set extensions are included only if they significantly improve performance, have no impact on clock cycle rate, and have minimal impact on the size and complexity of the hardware.
table 2.2 instruction set extensions

find first set, find first clear
  format: ffs, ffc
  these instructions, respectively, find the first set bit and the first clear bit in the source register, and return the bit number to the destination register. they are useful for many applications such as interrupt handlers, floating point emulation, and graphics.

select and rotate left, select and rotate right
  format: selsl, selsr
  these instructions select 32 bits from the 64-bit source register pair and rotate the selected data left or right by the number of bits specified in the new cp0 rotate register. they are useful for data alignment operations in graphics and in bit-field selection routines for data transmission and compression applications.

add circular immediate
  format: addciu
  this instruction does an immediate add, modified according to the value in the new cp0 cmask register. it is important in dsp (digital signal processing) and other applications that address circular buffers.

multiply/add, multiply/sub instructions
  format: madd, msub
  these instructions are useful in many signal processing and graphics transform algorithms. implemented only with the high-performance multiply/accumulate unit, these instructions do a 32 x 32 multiply and then either add the result to, or subtract it from, the 64-bit hi/lo register pair.

wait for interrupt
  format: waiti
  this instruction halts the cpu in a power-saving mode until one of the hardware interrupt lines becomes active. upon interrupt, normal execution is resumed starting at the interrupt vector address.

minimum
  format: min rd, rs, rt
  the source operands rs and rt are compared as two's complement values. the smaller value is stored in the rd register.

maximum
  format: max rd, rs, rt
  the source operands rs and rt are compared as two's complement values. the larger value is stored in the rd register.

2.6 configurability and options

the CW4010 is implemented using verilog hdl (hardware description language) as the design source, and the lsi logic standard cell library and layout tools for physical design. you can easily modify and configure the CW4010 core to meet specific design requirements. the options available in the basic core are shown in the following sections.

note: vhdl models are also available.
2.6.1 cache sizes

the instruction cache sizes available are 0 kbytes to 16 kbytes, direct mapped or two-way set associative. the data cache sizes available are 0 kbytes to 16 kbytes, direct mapped or two-way set associative.

2.6.2 standard vs high-performance multiply accumulate unit

each project may choose either a standard multiply unit that provides the base r3000 and r4000 multiply instructions (with similar performance), or a high-performance unit that also implements the madd and msub instructions. in the standard unit, multiplies are executed by retiring two bits per clock cycle, with early termination that depends on the size of the multiplicand (8 x 32-bit multiply = 4 cycles, 16 x 32 = 8 cycles, 32 x 32 = 16 cycles).

the high-performance unit is intended for applications with substantial multiply/accumulate performance needs. it includes a 32 x 32 pipelined array multiplier and a 64-bit accumulator that can retire a multiply or multiply/accumulate instruction every two clock cycles with a latency of three clock cycles per result. the highest-performance multiply/accumulate unit can retire a multiply/accumulate instruction every clock cycle, with a latency of two clock cycles per result.

2.6.3 64-bit vs 32-bit memory interface

for cost-sensitive designs or applications with low memory bandwidth, the biu can be modified to present a 32-bit data bus instead of a 64-bit one. the CW4010 biu supports sizing for a 32-bit interface.

2.6.4 memory management unit

the CW4010 is designed to support the 32-bit addressing mode of the r4000 mmu. the tlb that is available in the base processor design contains up to 64 single-page entries. each page can be individually specified to be 4 kbytes or 16 mbytes. for designs with no tlb requirements, the tlb can be removed to save silicon.

2.7 supporting models and tools

a software model, minisim, is available for software development and performance modeling. the model is a stand-alone c program. it provides a software-level model together with the lsi logic pmon and c runtime library that are supplied with lsi logic's hardware evaluation boards. you can run r3000 or r4000 compiled code on the model, using the pmon debugging environment, and obtain performance data for benchmarks and code tuning. currently, minisim is available for the sun sparcstation. contact lsi logic for versions supporting other platforms.
for use later in the development cycle, but before production of silicon, lsi logic provides a behavioral verilog model that is pin compatible and cycle accurate with the real design, but without timing information. you can use the model for behavioral modeling of your system design. a vhdl version of the model is also available. the gate-level model is available in a variety of simulation environments for timing-accurate simulations.

for faster synthesis of logic surrounding the cpu core, lsi logic provides a synthesis shell. this includes all drive capability, input loading, and timing information. it excludes the considerable complexity of the core internals, since these are of no relevance to synthesis tools.

as the design nears completion, timing analysis may be performed using a timing shell. this includes the same set of information as the synthesis shell, provided in a format directly usable with supported static timing analysis tools. again, the core internals are of no relevance to this process, and would only reduce its efficiency.

structural core models are provided for final versions of the design in various supported simulation tools. contact lsi logic for availability of the version supporting your preferred simulator.

you can use the full range of mips and third-party software development tools for the r3000 and r4000 on the CW4010 product. the cpu is compatible with both mips-i and mips-ii compilers and assemblers.
chapter 3 instruction set summary

this chapter presents an overview of the mips r-series instructions and the instruction set extensions supported in the CW4010. this chapter contains the following sections:

section 3.1, instruction set formats
section 3.2, load and store instructions
section 3.3, computational instructions
section 3.4, jump and branch instructions
section 3.5, trap instructions
section 3.6, special instructions
section 3.7, coprocessor instructions
section 3.8, system control coprocessor (cp0) instructions
section 3.9, cache maintenance instructions
section 3.10, CW4010 instruction set extensions
section 3.11, cpu instruction opcode bit encoding

3.1 instruction set formats

every r-series instruction consists of a single word (32 bits) aligned on a word boundary. as shown in figure 3.1, there are three instruction formats: i-type (immediate), j-type (jump), and r-type (register). the restricted format approach simplifies instruction decoding. the compiler and assembler can synthesize more complicated (and less frequently used) operations and addressing modes.
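as a rough illustration of the three formats, the python sketch below packs and unpacks the fields at the bit positions given in figure 3.1. the function names are hypothetical and are not part of any lsi logic tool, and the field values used in a call are arbitrary numbers rather than opcodes taken from this manual.

```python
def encode_i(op, rs, rt, immediate):
    """Pack an i-type word: op[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return ((op & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16
            | (immediate & 0xFFFF))


def encode_j(op, target):
    """Pack a j-type word: op[31:26] target[25:0]."""
    return (op & 0x3F) << 26 | (target & 0x03FFFFFF)


def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack an r-type word: op rs rt rd[15:11] shamt[10:6] funct[5:0]."""
    return ((op & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16
            | (rd & 0x1F) << 11 | (shamt & 0x1F) << 6 | (funct & 0x3F))


def decode(word):
    """Extract every field from a 32-bit word.

    Which fields are meaningful depends on the format selected by op;
    this sketch simply returns all of them.
    """
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x03FFFFFF,
    }
```

because every format keeps op in bits 31:26, a decoder can read op first and only then decide which of the remaining fields apply.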
figure 3.1 instruction format

i-type (immediate)
  bits 31:26   op
  bits 25:21   rs
  bits 20:16   rt
  bits 15:0    immediate

j-type (jump)
  bits 31:26   op
  bits 25:0    target

r-type (register)
  bits 31:26   op
  bits 25:21   rs
  bits 20:16   rt
  bits 15:11   rd
  bits 10:6    shamt
  bits 5:0     funct

  op          6-bit operation code
  rs          5-bit source register specifier
  rt          5-bit target (source/destination register)
  immediate   16-bit immediate, branch displacement, or address displacement
  target      26-bit jump target address
  rd          5-bit destination register specifier
  shamt       5-bit shift amount
  funct       6-bit function field

3.2 load and store instructions

load and store instructions are all i-type instructions and move data between memory and general purpose registers. the only addressing mode directly supported in the base r-series architecture is base register plus 16-bit signed immediate offset.

the mips-ii extensions add the load-linked and store-conditional instructions, which support multiple processors, and the sync instruction, which synchronizes loads and stores. the CW4010 supports these instructions.

the load/store instruction operation code (opcode) determines the access type, which in turn indicates the size of the data item to be loaded or stored. regardless of access type or byte-numbering order (big-endian or little-endian), the address specifies the byte that has the smallest byte address of all the bytes in the addressed field. for a big-endian machine, the smallest byte is the leftmost byte; for a little-endian machine, it is the rightmost byte.

the bytes used within the addressed word can be determined directly from the access type and the two low-order bits of the address, as shown in figure 3.2. note that certain combinations of access type and low-order address bits can never occur; only the combinations shown in figure 3.2 are allowed.

figure 3.2 byte specifications for loads/stores
[figure: for each access type and each allowed value of the low-order address bits a2 a1 a0, the figure marks the bytes accessed on the 64-bit data bus (bits 63 to 0, byte numbers 7 to 0) for big-endian and little-endian byte ordering. the allowed combinations are listed below; the per-byte markings are not recoverable from this text.]

  access type   low-order address bits (a2 a1 a0)
  doubleword    000
  word          000, 100
  tribyte       000, 001, 100, 101
  halfword      000, 010, 100, 110
  byte          000, 001, 010, 011, 100, 101, 110, 111
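the addressing rule in section 3.2 (base register plus sign-extended 16-bit offset) and the restriction on access type versus low-order address bits can be sketched in python as follows. the function names are hypothetical, and the table of allowed combinations is an assumption read from figure 3.2; the figure itself remains the reference.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Base register plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & MASK32


# Allowed low-order address bits (a2 a1 a0) per access type.
# Assumption: transcribed from figure 3.2.
ALLOWED_LOW_BITS = {
    "doubleword": {0b000},
    "word": {0b000, 0b100},
    "tribyte": {0b000, 0b001, 0b100, 0b101},
    "halfword": {0b000, 0b010, 0b100, 0b110},
    "byte": set(range(8)),
}


def combination_allowed(access_type, address):
    """True if this access type may start at these low-order address bits."""
    return (address & 0b111) in ALLOWED_LOW_BITS[access_type]
```

for example, an offset field of 0xfffc is the value -4, so a base of 0x1000 yields the address 0x0ffc, which is a legal word address but not a legal doubleword address.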
table 3.1 and table 3.2 describe the load and store instructions supported by the CW4010. the instruction format is shown on the line below each instruction name, for example, lb rt, offset(base).

table 3.1 load and store instruction summary

load byte
  lb rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. sign-extend the contents of addressed byte and load into rt.

load byte unsigned
  lbu rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. zero-extend the contents of addressed byte and load into rt.

load halfword
  lh rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. sign-extend contents of addressed halfword and load into rt.

load halfword unsigned
  lhu rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. zero-extend contents of addressed halfword and load into rt.

load word
  lw rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address, and load the addressed word into rt.

load word left
  lwl rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. shift addressed word left so that addressed byte is leftmost byte of a word. merge bytes from memory with contents of register rt and load result into register rt.

load word right
  lwr rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. shift addressed word right so that addressed byte is rightmost byte of a word. merge bytes from memory with contents of register rt and load result into register rt.

store byte
  sb rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. store least-significant byte of register rt at addressed location.

store halfword
  sh rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. store least-significant halfword of register rt at addressed location.

store word
  sw rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. store contents of register rt at addressed location.

store word left
  swl rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. shift contents of register rt left so that the leftmost byte of the word is in the position of the addressed byte. store word containing shifted bytes into word at addressed byte.

store word right
  swr rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. shift contents of register rt right so that the rightmost byte of the word is in the position of the addressed byte. store word containing shifted bytes into word at addressed byte.

table 3.2 load and store instruction summary (mips-ii isa extensions)

load linked
  ll rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address, and load the addressed word into register rt.

store conditional
  sc rt, offset(base)
  sign-extend the 16-bit offset and add to the contents of register base to form address. conditionally store register rt at address, based on whether the load-link has been broken.

sync
  sync
  complete all outstanding load and store instructions before allowing any new load or store instruction to start.

3.3 computational instructions

computational instructions perform arithmetic, logical, and shift operations on values in registers. computational instructions occur in both r-type (both operands are registers) and i-type (one operand is a 16-bit immediate) formats. there are five categories of computational instructions:

table 3.3 summarizes the alu immediate instructions
table 3.4 summarizes the 3-operand, register-type instructions
table 3.5 summarizes the shift instructions
table 3.6 summarizes the multiply/divide instructions
table 3.7 summarizes the computational CW4010 instruction extensions (CW4010 isa)
table 3.3 alu immediate instruction summary

add immediate
  addi rt, rs, immediate
  add 16-bit, sign-extended immediate to register rs and place 32-bit result in register rt. trap on two's complement overflow.

add immediate unsigned
  addiu rt, rs, immediate
  add 16-bit, sign-extended immediate to register rs and place 32-bit result in register rt. do not trap on overflow.

set on less than immediate
  slti rt, rs, immediate
  compare 16-bit, sign-extended immediate with register rs as signed 32-bit integers. result = 1 if rs is less than immediate; otherwise result = 0. place result in register rt.

set on less than immediate unsigned
  sltiu rt, rs, immediate
  compare 16-bit, sign-extended immediate with register rs as unsigned 32-bit integers. result = 1 if rs is less than immediate; otherwise result = 0. place result in register rt.

and immediate
  andi rt, rs, immediate
  zero-extend 16-bit immediate, and with contents of register rs, and place result in register rt.

or immediate
  ori rt, rs, immediate
  zero-extend 16-bit immediate, or with contents of register rs, and place result in register rt.

exclusive or immediate
  xori rt, rs, immediate
  zero-extend 16-bit immediate, exclusive or with contents of register rs, and place result in register rt.

load upper immediate
  lui rt, immediate
  shift 16-bit immediate left 16 bits. set least-significant 16 bits of word to zeros. store result in register rt.
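the difference between the sign-extended immediates (addiu, slti, sltiu) and the zero-extended immediates (andi, ori, xori) in table 3.3 is easy to miss, so the python sketch below models both. it is a behavioral illustration only: registers are plain integers, the function names simply mirror the mnemonics, and addi is omitted because its overflow trap is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement number."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def addiu(rs, imm):
    # Sign-extended add, result wrapped to 32 bits, no overflow trap.
    return (rs + sign_extend16(imm)) & MASK32


def slti(rs, imm):
    # Signed comparison against the sign-extended immediate.
    return 1 if to_signed32(rs) < sign_extend16(imm) else 0


def sltiu(rs, imm):
    # The immediate is still sign-extended, then both are compared unsigned.
    return 1 if (rs & MASK32) < (sign_extend16(imm) & MASK32) else 0


def andi(rs, imm):
    return rs & MASK32 & (imm & 0xFFFF)   # zero-extended immediate


def ori(rs, imm):
    return (rs & MASK32) | (imm & 0xFFFF)


def xori(rs, imm):
    return (rs & MASK32) ^ (imm & 0xFFFF)


def lui(imm):
    # Immediate moves to the upper halfword; the lower halfword is zero.
    return (imm & 0xFFFF) << 16
```

a common use of these rules is building a 32-bit constant in two steps: ori(lui(0x1234), 0x5678) gives 0x12345678 precisely because ori zero-extends its immediate.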
table 3.4 3-operand, register-type instruction summary

add
  add rd, rs, rt
  add contents of registers rs and rt and place 32-bit result in register rd. trap on two's complement overflow.

add unsigned
  addu rd, rs, rt
  add contents of registers rs and rt and place 32-bit result in register rd. do not trap on overflow.

subtract
  sub rd, rs, rt
  subtract contents of register rt from rs and place 32-bit result in register rd. trap on two's complement overflow.

subtract unsigned
  subu rd, rs, rt
  subtract contents of register rt from rs and place 32-bit result in register rd. do not trap on overflow.

set on less than
  slt rd, rs, rt
  compare contents of register rt to register rs (as signed, 32-bit integers). if register rs is less than rt, rd = 1; otherwise, rd = 0.

set on less than unsigned
  sltu rd, rs, rt
  compare contents of register rt to register rs (as unsigned, 32-bit integers). if register rs is less than rt, rd = 1; otherwise, rd = 0.

and
  and rd, rs, rt
  bitwise and contents of registers rs and rt and place result in register rd.

or
  or rd, rs, rt
  bitwise or contents of registers rs and rt and place result in register rd.

exclusive or
  xor rd, rs, rt
  bitwise exclusive or contents of registers rs and rt and place result in register rd.

nor
  nor rd, rs, rt
  bitwise nor contents of registers rs and rt and place result in register rd.
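in table 3.4 the only difference between add and addu (and between sub and subu) is the trap on two's complement overflow; the 32-bit result is otherwise identical. the python sketch below illustrates this under stated assumptions: the trap is modeled as a hypothetical OverflowTrap exception, which is a modeling choice of this sketch and not the core's exception mechanism.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


class OverflowTrap(Exception):
    """Hypothetical stand-in for the two's complement overflow trap."""


def to_signed32(x):
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def add(rs, rt):
    total = to_signed32(rs) + to_signed32(rt)
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowTrap("add")
    return total & MASK32


def addu(rs, rt):
    return (rs + rt) & MASK32          # same bits as add, never traps


def sub(rs, rt):
    diff = to_signed32(rs) - to_signed32(rt)
    if not INT32_MIN <= diff <= INT32_MAX:
        raise OverflowTrap("sub")
    return diff & MASK32


def subu(rs, rt):
    return (rs - rt) & MASK32


def slt(rs, rt):
    return 1 if to_signed32(rs) < to_signed32(rt) else 0


def sltu(rs, rt):
    return 1 if (rs & MASK32) < (rt & MASK32) else 0


def nor(rs, rt):
    return ~(rs | rt) & MASK32
```

the same register contents can therefore compare differently: 0xffffffff is less than 1 for slt (it is -1 when signed) but not for sltu.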
table 3.5 shift instruction summary

shift left logical
  sll rd, rt, shamt
  shift contents of register rt left by shamt bits, inserting zeros into low-order bits. place 32-bit result in register rd.

shift right logical
  srl rd, rt, shamt
  shift contents of register rt right by shamt bits, inserting zeros into high-order bits. place 32-bit result in register rd.

shift right arithmetic
  sra rd, rt, shamt
  shift contents of register rt right by shamt bits, sign-extending the high-order bits. place 32-bit result in register rd.

shift left logical variable
  sllv rd, rt, rs
  shift contents of register rt left. low-order 5 bits of register rs specify the number of bits to shift. insert zeros into low-order bits of rt and place 32-bit result in register rd.

shift right logical variable
  srlv rd, rt, rs
  shift contents of register rt right. low-order 5 bits of register rs specify the number of bits to shift. insert zeros into high-order bits of rt and place 32-bit result in register rd.

shift right arithmetic variable
  srav rd, rt, rs
  shift contents of register rt right. low-order 5 bits of register rs specify the number of bits to shift. sign-extend the high-order bits of rt and place 32-bit result in register rd.
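the shift semantics in table 3.5 can be illustrated with a short python sketch. it is behavioral only; the function names mirror the mnemonics, and the variable forms simply reuse the fixed forms after keeping the low-order 5 bits of rs, as the table describes.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def sll(rt, shamt):
    # Zeros enter from the right; bits shifted past bit 31 are lost.
    return (rt << (shamt & 0x1F)) & MASK32


def srl(rt, shamt):
    # Zeros enter from the left.
    return (rt & MASK32) >> (shamt & 0x1F)


def sra(rt, shamt):
    # Python's >> on a negative int replicates the sign bit.
    return (to_signed32(rt) >> (shamt & 0x1F)) & MASK32


def sllv(rt, rs):
    return sll(rt, rs & 0x1F)   # only the low-order 5 bits of rs count


def srlv(rt, rs):
    return srl(rt, rs & 0x1F)


def srav(rt, rs):
    return sra(rt, rs & 0x1F)
```

because only 5 bits of rs are used, a shift count of 0x24 behaves like a count of 4 in the variable forms.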
table 3.6 multiply/divide instruction summary

multiply
  mult rs, rt
  multiply contents of registers rs and rt as two's complement values. place 64-bit results in special registers hi and lo.

multiply unsigned
  multu rs, rt
  multiply contents of registers rs and rt as unsigned values. place 64-bit results in special registers hi and lo.

divide
  div rs, rt
  divide contents of register rs by the contents of rt as two's complement values. place 32-bit quotient in special register lo and 32-bit remainder in hi.

divide unsigned
  divu rs, rt
  divide contents of register rs by the contents of rt as unsigned values. place 32-bit quotient in special register lo and 32-bit remainder in hi.

move from hi
  mfhi rd
  move contents of special register hi to register rd.

move from lo
  mflo rd
  move contents of special register lo to register rd.

move to hi
  mthi rs
  move contents of register rs to special register hi.

move to lo
  mtlo rs
  move contents of register rs to special register lo.

minimum
  min rd, rs, rt
  compare the contents of registers rs and rt as two's complement values. the smaller value is stored in register rd.

maximum
  max rd, rs, rt
  compare the contents of registers rs and rt as two's complement values. the larger value is stored in register rd.
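how the 64-bit product and the quotient/remainder pair land in hi and lo can be sketched in python as below. several details are assumptions of this sketch rather than statements from table 3.6: each function returns a (hi, lo) tuple instead of writing special registers, signed division truncates toward zero (the usual mips convention), and division by zero raises ZeroDivisionError because the table does not say what the hardware does in that case.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def _split(product):
    """Split a 64-bit value into (hi, lo) 32-bit halves."""
    product &= MASK64
    return product >> 32, product & MASK32


def mult(rs, rt):
    return _split(to_signed32(rs) * to_signed32(rt))


def multu(rs, rt):
    return _split((rs & MASK32) * (rt & MASK32))


def div(rs, rt):
    a, b = to_signed32(rs), to_signed32(rt)
    quotient = abs(a) // abs(b)        # raises ZeroDivisionError if b == 0
    if (a < 0) != (b < 0):
        quotient = -quotient           # assumption: truncate toward zero
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32   # (hi, lo)


def divu(rs, rt):
    a, b = rs & MASK32, rt & MASK32
    return a % b, a // b               # (hi, lo)


def min_(rs, rt):
    return rs & MASK32 if to_signed32(rs) <= to_signed32(rt) else rt & MASK32


def max_(rs, rt):
    return rs & MASK32 if to_signed32(rs) >= to_signed32(rt) else rt & MASK32
```

the signed and unsigned forms differ only in how the operands are interpreted: multiplying 0xffffffff by 2 leaves hi = 0xffffffff with mult (the product is -2) but hi = 1 with multu.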
table 3.7 computation instruction extensions summary (CW4010 isa)

add circular immediate
  addciu rt, rs, immediate
  the 16-bit immediate is sign-extended and added to the contents of general register rs, with the result masked by the value in cp0 register cmask according to the formula:
  rt = rs[31:cmask] || (rs + sign_extended_immediate)[cmask-1:0]

find first set bit
  ffs rd, rs
  starting at the most significant bit in register rs, find the first bit that is set to a one, and return the bit number in register rd. if no bit is set, return with all bits of rd set to 1.

find first clear bit
  ffc rd, rs
  starting at the most significant bit in register rs, find the first bit that is set to a zero, and return the bit number in register rd. if no bit is clear, return with all bits of rd set to 1.

select and shift right
  selsr rd, rs, rt
  using registers rs and rt as a 64-bit register pair and the cp0 register rotate as the shift count, shift the register pair rs || rt right the number of bits specified in rotate, and place the least significant 32-bit value in result register rd.

select and shift left
  selsl rd, rs, rt
  using registers rs and rt as a 64-bit register pair and the cp0 register rotate as the shift count, shift the register pair rs || rt left the number of bits specified in rotate, and place the most significant 32-bit value in result register rd.

multiply/add
  madd rs, rt
  multiply contents of registers rs and rt as two's complement values. add 64-bit results to contents in special register pair hi/lo, and place results in hi and lo.

multiply/add unsigned
  maddu rs, rt
  multiply contents of registers rs and rt as unsigned values. add 64-bit results to contents in special register pair hi/lo, and place results in hi and lo.

multiply/subtract
  msub rs, rt
  multiply contents of registers rs and rt as two's complement values. subtract 64-bit results from contents in special register pair hi/lo, and place results in hi and lo.

multiply/subtract unsigned
  msubu rs, rt
  multiply contents of registers rs and rt as unsigned values. subtract 64-bit results from contents in special register pair hi/lo, and place results in hi and lo.
table 3.8 shows the execution time of the multiply/divide/accumulate type instructions.

table 3.8 execution time of multiply and divide instructions

operation      r3000   cw33300        r4000   CW4010 high speed   CW4010 basic
multiply       12      1 + (bits/3)   10      3                   1 + (bits/2)
multiply/add   na      na             na      3 (note 1)          1 + (bits/2)
divide         34      34             69      34/17 (note 2)      35

1. for high-speed CW4010 multiply/add instructions, instructions can be pipelined for a throughput of one operation every clock cycle while the latency is three cycles. pipelining the instructions accelerates calculations such as dot products and fir filters that perform a series of multiplies/adds to compute a single result.
2. the divide time is shortened to 17 cycles if the divisor has fewer than 16 significant bits.

3.4 jump and branch instructions

jump and branch instructions change the control flow of a program. mips-i jump and branch instructions always occur with a one-instruction delay. the instruction immediately following the jump or branch is always executed while the target instruction is being fetched from storage. there may be additional cycle penalties, depending on circumstances and implementation, but the penalties are interlocked in hardware.

the mips-ii isa extensions add the branch likely class of instructions that operate exactly like their non-likely counterparts, except that when the branch is not taken, the instruction following the branch is cancelled.

the j-type instruction format is used for both jump and jump-and-link instructions for subroutine calls. in the j-type format, the 26-bit target address is shifted left two bits and combined with the 4 high-order bits of the current program counter to form a 32-bit absolute address.

the r-type instruction format, which takes a 32-bit byte address contained in a register, is used for returns, dispatches, and cross-page jumps.

branches have 16-bit signed offsets relative to the program counter (i-type). jump-and-link and branch-and-link instructions save a return address in register 31.
table 3.9 summarizes the r-series jump instructions, table 3.10 summarizes the branch instructions, and table 3.11 summarizes the branch likely instructions.

table 3.9 jump instruction summary
instruction: format, followed by description

jump: j target
    shift 26-bit target address left two bits, combine with four high-order bits of pc, and jump to address with a one-instruction delay.
jump and link: jal target
    shift 26-bit target address left two bits, combine with four high-order bits of pc, and jump to address with a one-instruction delay. place address of instruction following delay slot in register 31 (link register).
jump register: jr rs
    jump to address contained in register rs with a one-instruction delay.
jump and link register: jalr rs, rd
    jump to address contained in register rs with a one-instruction delay. place address of instruction following delay slot in rd.
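the j-type target calculation in table 3.9 can be sketched in c as follows. this is only an illustrative model: the function names are hypothetical, and it assumes, as in the usual mips-i definition, that the "four high-order bits of pc" are taken from the address of the delay-slot instruction.

```c
#include <stdint.h>

/* Model of the j/jal target calculation.
 * delay_slot_pc: address of the instruction in the delay slot
 *                (assumption: this is the pc whose top four bits are used).
 * instr:         the 32-bit j or jal instruction word. */
static uint32_t jump_target(uint32_t delay_slot_pc, uint32_t instr)
{
    uint32_t target26 = instr & 0x03FFFFFFu;          /* bits [25:0] */
    return (delay_slot_pc & 0xF0000000u) | (target26 << 2);
}

/* Return address saved by jal/jalr: the instruction after the delay slot. */
static uint32_t link_address(uint32_t jump_pc)
{
    return jump_pc + 8u;
}
```

for example, a jal at address 0x80001000 with a target field of 0x40 transfers control to 0x80000100 and saves 0x80001008 as the return address.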
table 3.10 branch instruction summary
instruction: format, followed by description

branch on equal: beq rs, rt, offset
    branch to target address (note 1) if register rs is equal to register rt.
branch on not equal: bne rs, rt, offset
    branch to target address if register rs does not equal register rt.
branch on less than or equal to zero: blez rs, offset
    branch to target address if register rs is less than or equal to 0.
branch on greater than zero: bgtz rs, offset
    branch to target address if register rs is greater than 0.
branch on less than zero: bltz rs, offset
    branch to target address if register rs is less than 0.
branch on greater than or equal to zero: bgez rs, offset
    branch to target address if register rs is greater than or equal to 0.
branch on less than zero and link: bltzal rs, offset
    place address of instruction following delay slot in register 31 (link register). branch to target address if register rs is less than 0.
branch on greater than or equal to zero and link: bgezal rs, offset
    place address of instruction following delay slot in register 31 (link register). branch to target address if register rs is greater than or equal to 0.

1. all branch-instruction target addresses are computed as follows: add address of instruction in delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). all branches occur with a delay of one instruction.
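the target address rule in note 1 can be illustrated with a short c sketch. the function name is hypothetical; the only input assumptions are that branch_pc is the address of the branch instruction itself, so the delay slot sits at branch_pc + 4.

```c
#include <stdint.h>

/* Branch target = address of delay-slot instruction
 *               + (16-bit offset, shifted left two bits, sign-extended). */
static uint32_t branch_target(uint32_t branch_pc, uint16_t offset16)
{
    int32_t displacement = (int32_t)(int16_t)offset16 * 4;  /* sign-extend, then << 2 */
    uint32_t delay_slot_pc = branch_pc + 4u;
    return delay_slot_pc + (uint32_t)displacement;
}
```

an offset of 0xffff (that is, -1) in a branch at 0x1000 therefore targets the branch itself, since 0x1004 - 4 = 0x1000.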
table 3.11 branch likely instruction summary (mips-ii isa extensions)
instruction: format, followed by description

branch on equal likely: beql rs, rt, offset
    branch to target address (note 1) if register rs is equal to register rt.
branch on not equal likely: bnel rs, rt, offset
    branch to target address if register rs does not equal register rt.
branch on less than or equal to zero likely: blezl rs, offset
    branch to target address if register rs is less than or equal to 0.
branch on greater than zero likely: bgtzl rs, offset
    branch to target address if register rs is greater than 0.
branch on less than zero likely: bltzl rs, offset
    branch to target address if register rs is less than 0.
branch on greater than or equal to zero likely: bgezl rs, offset
    branch to target address if register rs is greater than or equal to 0.
branch on less than zero and link likely: bltzall rs, offset
    place address of instruction following delay slot in register 31 (link register). branch to target address if register rs is less than 0.
branch on greater than or equal to zero and link likely: bgezall rs, offset
    place address of instruction following delay slot in register 31 (link register). branch to target address if register rs is greater than or equal to 0.

1. all branch-instruction target addresses are computed as follows: add address of instruction in delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). all branches occur with a delay of one instruction.
3.5 trap instructions

trap instructions are part of the mips-ii instruction set. they conditionally create an exception, based on the same conditions tested in the branch instructions. table 3.12 summarizes the trap instructions.

table 3.12 trap instruction summary (mips-ii isa extensions)
instruction: format, followed by description

trap on equal: teq rs, rt
    trap if register rs is equal to register rt.
trap on equal immediate: teqi rs, immediate
    trap if register rs is equal to the immediate value.
trap on greater than or equal: tge rs, rt
    trap if register rs is greater than or equal to register rt.
trap on greater than or equal immediate: tgei rs, immediate
    trap if register rs is greater than or equal to the immediate value.
trap on greater than or equal unsigned: tgeu rs, rt
    trap if register rs is greater than or equal to register rt, comparing both as unsigned values.
trap on greater than or equal immediate unsigned: tgeiu rs, immediate
    trap if register rs is greater than or equal to the immediate value, comparing both as unsigned values.
trap on less than: tlt rs, rt
    trap if register rs is less than register rt.
trap on less than immediate: tlti rs, immediate
    trap if register rs is less than the immediate value.
trap on less than unsigned: tltu rs, rt
    trap if register rs is less than register rt, comparing both as unsigned values.
trap on less than immediate unsigned: tltiu rs, immediate
    trap if register rs is less than the immediate value, comparing both as unsigned values.
trap if not equal: tne rs, rt
    trap if register rs is not equal to register rt.
trap if not equal immediate: tnei rs, immediate
    trap if register rs is not equal to the immediate value.
3.6 special instructions

special instructions cause an unconditional branch to the general exception-handling vector. special instructions are always r-type and are summarized in table 3.13.

table 3.13 special instruction summary
instruction: format, followed by description

system call: syscall
    initiates system call trap, immediately transferring control to exception handler.
breakpoint: break
    initiates breakpoint trap, immediately transferring control to exception handler.
3.7 coprocessor instructions

the CW4010 supports external (on-chip) coprocessors and implements the coprocessor instruction set. coprocessor branch instructions are j-type. table 3.14 summarizes the different coprocessor instructions.

table 3.14 coprocessor instruction summary
instruction: format, followed by description

load word to coprocessor: lwcz rt, offset(base)
    extends the sign of the 16-bit offset and adds the offset to the contents of the general register base to form a 32-bit unsigned effective address. the word at the memory location specified is loaded into coprocessor register rt of the coprocessor unit z.
store word from coprocessor: swcz rt, offset(base)
    extends the sign of the 16-bit offset and adds the offset to the contents of the general register base to form a 32-bit unsigned effective address. the contents of coprocessor register rt of the coprocessor unit z are stored at the address specified by the 32-bit unsigned effective address.
move to coprocessor: mtcz rt, rd
    loads the contents of general register rt into the rd register of coprocessor unit z.
move from coprocessor: mfcz rt, rd
    loads the contents of the rd register of coprocessor unit z into general register rt.
move control to coprocessor: ctcz rt, rd
    loads the contents of general register rt into the control register rd of coprocessor unit z.
move control from coprocessor: cfcz rt, rd
    loads the contents of the control register rd of coprocessor unit z into general register rt.
coprocessor operation: copz cofun
    initiates a coprocessor operation that may specify and reference the coprocessor's internal registers or change the state of the coprocessor's condition line, but does not change the state within the processor or the cache memory.
branch on coprocessor z true (likely): bczt offset (bcztl offset)
    compute a branch target address by adding address of instruction to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). branch to the target address (with a delay of one instruction) if coprocessor z's condition line is true. in the case of branch likely, the delay slot instruction is not executed when the branch is not taken.
branch on coprocessor z false (likely): bczf offset (bczfl offset)
    compute a branch target address by adding address of instruction to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). branch to the target address (with a delay of one instruction) if coprocessor z's condition line is false. in the case of branch likely, the delay slot instruction is not executed when the branch is not taken.
3.8 system control coprocessor (cp0) instructions

coprocessor-0 instructions perform operations on the system control coprocessor (cp0) registers to manipulate the memory management and exception-handling facilities of the processor. table 3.15 summarizes the cp0 instructions and table 3.16 shows the extensions.

if the tlb is removed, the tlb instructions (tlbr, tlbwi, tlbwr, tlbp) cause an ri (reserved instruction) exception. if the CW4010 is in r3000 compatibility mode, the eret (exception return) instruction is unavailable and causes an ri exception. conversely, if the CW4010 is in r4000 mode, the rfe (restore from exception) instruction is unavailable and causes an ri exception.

table 3.15 cp0 instruction summary
instruction: format, followed by description

move to cp0: mtc0 rt, rd
    loads contents of cpu register rt into cp0 register rd.
move from cp0: mfc0 rt, rd
    loads contents of cp0 register rd into cpu register rt.
read indexed tlb entry (note 1): tlbr
    loads entryhi and entrylo with the tlb entry pointed to by the index register.
write indexed tlb entry (note 1): tlbwi
    loads tlb entry pointed to by the index register with the contents of the entryhi and entrylo registers.
write random tlb entry (note 1): tlbwr
    loads tlb entry pointed to by the random register with the contents of the entryhi and entrylo registers.
probe tlb for matching entry (note 1): tlbp
    loads the index register with the address of the tlb entry whose contents match the entryhi and entrylo registers. if no tlb entry matches, sets the high-order bit of the index register.
exception return (note 2): eret (r4000 mode)
    loads the pc from errorepc (sr2 = 1: error exception) or epc (sr2 = 0: exception) and clears the erl bit (sr2 = 1) or exl bit (sr2 = 0) in the status register. sr2 is status register bit 2.
restore from exception (note 2): rfe (r3000 mode)
    restores previous interrupt mask and mode bits of the status register into current status bits. restores old status bits into previous status bits.

1. if there is no mmu (memory management unit) installed, any of these instructions can cause a reserved instruction exception.
2. only one of these instructions is legal at any one time. the one that is not legal causes a reserved instruction exception.
table 3.16 cp0 instruction extension summary
instruction: format, followed by description

wait for interrupt: waiti
    stops execution of instructions and places the processor into a power save (stall) condition until a hardware interrupt, nmi, or reset is received.

3.9 cache maintenance instructions

cache maintenance instructions are always i-type. table 3.17 summarizes these instructions.

table 3.17 cache maintenance instruction summary
instruction: format, followed by description

flush icache: flushi
    flush icache. needs 256 stall cycles.
flush dcache: flushd
    flush dcache. needs 256 stall cycles.
flush icache & dcache: flushid
    flush both icache and dcache in 256 stall cycles.
writeback: wb offset(base)
    write back a dcache line addressed by offset+gpr[base].
3.10 CW4010 instruction set extensions

this section defines the CW4010 instruction set extensions.

addciu: add with circular mask immediate
format: [31:26] addciu = 011100 | [25:21] rs | [20:16] rt | [15:0] immediate
syntax: addciu rt, rs, immediate
description: the immediate field of the instruction is sign-extended and added to the contents of general register rs, the result of which is masked with the expanded value in special register cmask according to the equation shown below. the cmask register is cp0 register number 24, whose valid bits are [4:0]. the carries resulting from the addition of the sign-extended offset are not propagated into the final result beyond bit [cmask-1].
operation:
    t: sign_extend_immed = (immediate_{15})^{16} || immediate_{15..0}
       gpr[rt] = gpr[rs]_{31..cmask} || (gpr[rs] + sign_extend_immed)_{cmask-1..0}
exceptions: none
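a minimal c model of the addciu operation may make the masking easier to follow. the function name and the way the cmask register is passed in are illustrative assumptions; the sketch also assumes, reading the equation literally, that a cmask value of 0 leaves rs unchanged.

```c
#include <stdint.h>

/* Model of addciu: only the low `cmask` bits of the sum are kept;
 * the upper bits of rs pass through unchanged, so the address wraps
 * inside a 2^cmask-byte window (a circular buffer).
 * cmask_reg models cp0 register 24; only bits [4:0] are used. */
static uint32_t addciu(uint32_t rs, uint16_t immediate, uint32_t cmask_reg)
{
    uint32_t n = cmask_reg & 0x1Fu;               /* valid bits [4:0] */
    uint32_t low_mask = (1u << n) - 1u;           /* bits [n-1:0]; zero when n == 0 */
    uint32_t sum = rs + (uint32_t)(int32_t)(int16_t)immediate;
    return (rs & ~low_mask) | (sum & low_mask);
}
```

with cmask = 4 the pointer wraps inside a 16-byte buffer: stepping 0x1000000e forward by 4 gives 0x10000002 rather than 0x10000012.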
ffc: find first clear bit
format: [31:26] special = 000000 | [25:21] rs | [20:16] 0 | [15:11] rd | [10:6] 00000 | [5:0] ffc = 001011
syntax: ffc rd, rs
description: the contents of general register rs are examined starting with the most significant bit. the bit number of the first clear bit is returned in general register rd. if no bit is clear, all ones are returned in rd.
exceptions: none
ffs: find first set bit
format: [31:26] special = 000000 | [25:21] rs | [20:16] 0 | [15:11] rd | [10:6] 00000 | [5:0] ffs = 001010
syntax: ffs rd, rs
description: the contents of general register rs are examined starting with the most significant bit. the bit number of the first set bit is returned in general register rd. if no bit is set, all ones are returned in rd.
exceptions: none
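the search performed by ffs and ffc can be sketched in c. the names ffs_msb and ffc_msb are hypothetical, and the sketch assumes the returned "bit number" uses the usual numbering in which the most significant bit is bit 31.

```c
#include <stdint.h>

/* Scan from bit 31 downward; return the number of the first set bit,
 * or all ones if no bit is set. */
static uint32_t ffs_msb(uint32_t rs)
{
    for (int bit = 31; bit >= 0; --bit) {
        if ((rs >> bit) & 1u) {
            return (uint32_t)bit;
        }
    }
    return 0xFFFFFFFFu;
}

/* The first clear bit of rs is the first set bit of its complement. */
static uint32_t ffc_msb(uint32_t rs)
{
    return ffs_msb(~rs);
}
```

under these assumptions, ffs on 0x00010000 returns 16 and ffc on 0xffff0000 returns 15.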
flushd: flush data cache
format: [31:26] cache = 101111 | [25:21] 00000 | [20:16] flushd = 00010 | [15:0] 0
syntax: flushd
description: flushd flushes all data cache lines and causes stall cycles for 256 clocks, regardless of the cache size.
exceptions: none

flushi: flush instruction cache
format: [31:26] cache = 101111 | [25:21] 00000 | [20:16] flushi = 00001 | [15:0] 0
syntax: flushi
description: flushi flushes all instruction cache lines and causes stall cycles for 256 clocks, regardless of the cache size.
exceptions: none

flushid: flush instruction and data cache
format: [31:26] cache = 101111 | [25:21] 00000 | [20:16] flushid = 00011 | [15:0] 0
syntax: flushid
description: flushid flushes all data and instruction cache lines and causes stall cycles for 256 clocks, regardless of the cache size.
exceptions: none
madd: multiply add
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] 0 | [10:6] 00000 | [5:0] madd = 011100
syntax: madd rs, rt
description: the contents of general register rs and the contents of general register rt are multiplied. both operands are treated as 32-bit two's complement values. when the operation is completed, the doubleword result is added to special register pair hi/lo. no overflow exception occurs under any circumstances. this instruction is only available when the chip has multiplier-accumulator module hardware and mad/mul are set to one in the cache configuration and control (ccc) register. madd executes in multiple cycles, depending on the number of significant bits in the operands. refer to table 3.19 on page 3-37.
operation:
    t: t <- (hi || lo) + (gpr[rs] * gpr[rt])
       lo <- t_{31..0}, hi <- t_{63..32}
exceptions: none
maddu: multiply add unsigned
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] 0 | [10:6] 00000 | [5:0] maddu = 011101
syntax: maddu rs, rt
description: the contents of general register rs and the contents of general register rt are multiplied with both operands treated as 32-bit unsigned values. when the operation is completed, the doubleword result is added to special register pair hi/lo. no overflow exception occurs under any circumstances. this instruction is only available when the chip has multiplier-accumulator module hardware and mad/mul are set to one in the ccc register. the instruction executes in multiple cycles, depending on the number of significant bits in the operands. refer to table 3.19 on page 3-37.
operation:
    t: t <- (hi || lo) + ((0 || gpr[rs]) * (0 || gpr[rt]))
       lo <- t_{31..0}, hi <- t_{63..32}
exceptions: none
max: maximum
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] rd | [10:6] 00000 | [5:0] max = 101001
syntax: max rd, rs, rt
description: the source operands rs and rt are compared as two's complement values. the larger value is stored in the rd register.
operation:
    t: if gpr[rs] > gpr[rt] then gpr[rd] <- gpr[rs] else gpr[rd] <- gpr[rt] endif
exceptions: none
min: minimum
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] rd | [10:6] 00000 | [5:0] min = 101000
syntax: min rd, rs, rt
description: the source operands rs and rt are compared as two's complement values. the smaller value is stored in the rd register.
operation:
    t: if gpr[rs] < gpr[rt] then gpr[rd] <- gpr[rs] else gpr[rd] <- gpr[rt] endif
exceptions: none

msub: multiply subtract
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] 0 | [10:6] 00000 | [5:0] msub = 011110
syntax: msub rs, rt
description: the contents of general registers rs and rt are multiplied and both operands are treated as 32-bit two's complement values. when the operation is complete, the doubleword result is subtracted from special register pair hi/lo. no overflow exception occurs under any circumstances. this instruction is only available when the chip has multiplier-accumulator module hardware and mad/mul are set to one in the ccc register. the instruction executes in multiple cycles, depending on the number of significant bits in the operands. refer to table 3.19 on page 3-37.
operation:
    t: t <- (hi || lo) - (gpr[rs] * gpr[rt])
       lo <- t_{31..0}, hi <- t_{63..32}
exceptions: none
msubu: multiply subtract unsigned
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] 0 | [10:6] 00000 | [5:0] msubu = 011111
syntax: msubu rs, rt
description: the contents of general registers rs and rt are multiplied and both operands are treated as 32-bit unsigned values. when the operation is completed, the doubleword result is subtracted from special register pair hi/lo. no overflow exception occurs under any circumstances. this instruction is only available when the chip has multiplier-accumulator module hardware and mad/mul are set to one in the ccc register. the instruction executes in multiple cycles, depending on the number of significant bits in the operands. refer to table 3.19 on page 3-37.
operation:
    t: t <- (hi || lo) - ((0 || gpr[rs]) * (0 || gpr[rt]))
       lo <- t_{31..0}, hi <- t_{63..32}
exceptions: none
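the four multiply/accumulate operations differ only in the signedness of the product and the direction of the accumulation. the following c sketch models that arithmetic only (not timing or the ccc enable bits); the hilo_t type and all function names are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical container for the hi/lo special register pair. */
typedef struct { uint32_t hi, lo; } hilo_t;

static uint64_t join(hilo_t a) { return ((uint64_t)a.hi << 32) | a.lo; }

static hilo_t split(uint64_t t)
{
    hilo_t r = { (uint32_t)(t >> 32), (uint32_t)t };   /* hi = t[63:32], lo = t[31:0] */
    return r;
}

/* 64-bit products; the signed form sign-extends both 32-bit operands first. */
static uint64_t signed_product(uint32_t rs, uint32_t rt)
{
    return (uint64_t)((int64_t)(int32_t)rs * (int64_t)(int32_t)rt);
}

static uint64_t unsigned_product(uint32_t rs, uint32_t rt)
{
    return (uint64_t)rs * (uint64_t)rt;
}

/* Accumulation wraps modulo 2^64, matching "no overflow exception occurs". */
static hilo_t madd(hilo_t acc, uint32_t rs, uint32_t rt)  { return split(join(acc) + signed_product(rs, rt)); }
static hilo_t maddu(hilo_t acc, uint32_t rs, uint32_t rt) { return split(join(acc) + unsigned_product(rs, rt)); }
static hilo_t msub(hilo_t acc, uint32_t rs, uint32_t rt)  { return split(join(acc) - signed_product(rs, rt)); }
static hilo_t msubu(hilo_t acc, uint32_t rs, uint32_t rt) { return split(join(acc) - unsigned_product(rs, rt)); }
```

a dot product, for example, would clear hi/lo and then apply madd once per element pair.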
selsl: select and shift left
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] rd | [10:6] 00000 | [5:0] selsl = 000101
syntax: selsl rd, rs, rt
description: the contents of general registers rs and rt are combined to form a 64-bit doubleword. the doubleword is shifted left the number of bits specified in the cp0 register rotate, and the upper 32 bits of the result are placed in general register rd. the rotate register is cp0 register number 23, with valid bits [4:0].
operation:
    t: s <- rotate_{4..0}
       gpr[rd] <- gpr[rs]_{31-s..0} || gpr[rt]_{31..32-s}
exceptions: none

selsr: select and shift right
format: [31:26] special = 000000 | [25:21] rs | [20:16] rt | [15:11] rd | [10:6] 00000 | [5:0] selsr = 000001
syntax: selsr rd, rs, rt
description: the contents of general registers rs and rt are combined to form a 64-bit doubleword. the doubleword is shifted right the number of bits specified in cp0 register rotate, and the lower 32 bits of the result are placed in general register rd. the rotate register is cp0 register number 23, with valid bits [4:0].
operation:
    t: s <- rotate_{4..0}
       gpr[rd] <- gpr[rs]_{s-1..0} || gpr[rt]_{31..s}
exceptions: none
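the two select-and-shift operations can be modelled in c with a 64-bit intermediate. the function names and the rotate argument (standing in for cp0 register 23) are assumptions made for illustration only.

```c
#include <stdint.h>

/* selsl: shift the pair rs || rt left by rotate[4:0], keep the upper 32 bits. */
static uint32_t selsl(uint32_t rs, uint32_t rt, uint32_t rotate)
{
    uint32_t s = rotate & 0x1Fu;
    uint64_t pair = ((uint64_t)rs << 32) | rt;
    return (uint32_t)((pair << s) >> 32);
}

/* selsr: shift the pair rs || rt right by rotate[4:0], keep the lower 32 bits. */
static uint32_t selsr(uint32_t rs, uint32_t rt, uint32_t rotate)
{
    uint32_t s = rotate & 0x1Fu;
    uint64_t pair = ((uint64_t)rs << 32) | rt;
    return (uint32_t)(pair >> s);
}
```

with rs = 0x12345678, rt = 0x9abcdef0, and a shift count of 8, selsl yields 0x3456789a and selsr yields 0x789abcde, which is the kind of byte realignment these instructions are suited to.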
waiti: wait for interrupt
format: [31:26] cop0 = 010000 | [25:21] 10000 | [20:16] 00000 | [15:11] 00000 | [10:6] 00000 | [5:0] waiti = 100000
syntax: waiti
description: when this instruction is executed, the main processor clock stops and execution of instructions is halted. execution resumes when a hardware interrupt, nmi, or reset exception is received. while it is in wait mode, the processor is in a power saving mode, using very little current because the clock is turned off to most of the circuitry. waiti must be followed by two or more no-operation instructions; otherwise, the results may be undefined. refer to appendix a, programmer's notes, for further information.
exceptions: none
wb: writeback data cache
format: [31:26] cache = 101111 | [25:21] base | [20:16] wb = 00100 | [15:0] offset
syntax: wb offset(base)
description: eight words of the data cache line addressed by offset+gpr[base] are written back to memory if the line is dirty. upper bits of offset+gpr[base] are ignored.
exceptions: none
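since the four cache maintenance instructions share one major opcode and differ only in the sub-opcode in bits [20:16], their encodings can be assembled with a small helper. the sketch below is illustrative; the enum and function names are hypothetical, while the field values come from the format descriptions above.

```c
#include <stdint.h>

enum {
    CACHE_OPCODE   = 0x2F,  /* 101111 in bits [31:26] */
    SUBOP_FLUSHI   = 0x01,  /* 00001 */
    SUBOP_FLUSHD   = 0x02,  /* 00010 */
    SUBOP_FLUSHID  = 0x03,  /* 00011 */
    SUBOP_WB       = 0x04   /* 00100 */
};

/* Build a cache-maintenance instruction word.
 * base and offset are only meaningful for wb; pass 0 for the flushes. */
static uint32_t encode_cache(uint32_t base, uint32_t subop, uint16_t offset)
{
    return ((uint32_t)CACHE_OPCODE << 26)
         | ((base  & 0x1Fu) << 21)
         | ((subop & 0x1Fu) << 16)
         | offset;
}
```

for instance, encode_cache(0, SUBOP_FLUSHID, 0) gives 0xbc030000, and wb 16(r4) encodes as 0xbc840010. an assembler without these mnemonics could emit such words as raw data.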
3.11 cpu instruction opcode bit encoding

tables 3.18 through 3.24 show the opcode bit encoding for CW4010 instructions. the following keys are referenced in the tables:

*rxf1: operation codes marked with *rxf1 cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture.

*rxf2: operation codes marked with *rxf2 cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture. *rxf2 is separated from other reserved instructions for copz. these are not detected as reserved instruction codes that cause an exception on the r3000. the r4000 detects them.

*rx40: an operation code marked with *rx40 causes a reserved instruction exception on r4000 and CW4010 processors (when in r4000 mode). it is used as a restore from exception (rfe) instruction on the r3000, lr33000, lr33300, and CW4010 in r3000 mode.

*rx64: operation codes marked with *rx64 cause a reserved instruction exception. they are 64-bit instructions on r4000.

*nrx: operation codes marked with *nrx are invalid but do not cause reserved instruction exceptions in CW4010 implementations.

x1: operation codes marked with x1 are originally extended instructions in CW4010 implementations. they are reserved instructions that cause an exception on r4000.

x2: the operation code cache marked with x2 is valid only for CW4010 processors with cp0 enabled and causes a reserved instruction exception with cp0 disabled. bits [20:16] are sub-opcodes. they are instructions for cache maintenance, and the functions are not compatible with r4000. recommended mnemonics are flushi, flushd, flushid, and wb offset(base). undefined opcodes of the cache instruction do not cause a reserved instruction exception in CW4010 implementations.

x3: operation codes marked with x3 are originally extended instructions in CW4010 implementations. they are used for 64-bit multiply and divide instructions on r4000. if the mul bit or mad bit in the ccc register is zero, they cause a reserved instruction exception. the ccc register is described in detail in section 4.3.10, configuration and cache control (ccc) register (16), on page 4-20.

x4: operation codes marked with x4 cause a reserved instruction exception if the mul bit in the ccc register is zero.

x5: the operation code eret marked with x5 is valid only on the r4000 and CW4010 in r4000 mode.
x6: operation codes marked with x6 are coprocessor-3 instructions, which are not available on r4000. these are available on the r3000 and CW4010.

table 3.18 CW4010 opcode bit encoding
(rows: opcode bits [31:29]; columns: opcode bits [28:26])

       0           1           2           3           4           5           6           7
0      special     regimm      j           jal         beq         bne         blez        bgtz
1      addi        addiu       slti        sltiu       andi        ori         xori        lui
2      cop0        cop1        cop2        cop3(x6)    beql        bnel        blezl       bgtzl
3      *rx64       *rx64       *rx64       *rx64       addciu(x1)  *rxf1       *rxf1       *rxf1
4      lb          lh          lwl         lw          lbu         lhu         lwr         *rx64
5      sb          sh          swl         sw          *rx64       *rx64       swr         cache(x2)
6      ll          lwc1        lwc2        lwc3(x6)    *rx64       *rx64       *rx64       *rx64
7      sc          swc1        swc2        swc3(x6)    *rx64       *rx64       *rx64       *rx64

table 3.19 special opcode bit encoding
(rows: special function bits [5:3]; columns: special function bits [2:0])

       0           1           2           3           4           5           6           7
0      sll         selsr(x1)   srl         sra         sllv        selsl(x1)   srlv        srav
1      jr          jalr        ffs(x1)     ffc(x1)     syscall     break       *rxf1       sync
2      mfhi(x4)    mthi(x4)    mflo(x4)    mtlo(x4)    *rx64       *rxf1       *rx64       *rx64
3      mult(x4)    multu(x4)   div(x4)     divu(x4)    madd(x3)    maddu(x3)   msub(x3)    msubu(x3)
4      add         addu        sub         subu        and         or          xor         nor
5      min(x1)     max(x1)     slt         sltu        *rx64       *rx64       *rx64       *rx64
6      tge         tgeu        tlt         tltu        teq         *rxf1       tne         *rxf1
7      *rx64       *rxf1       *rx64       *rx64       *rx64       *rxf1       *rx64       *rx64
table 3.20 regimm opcode rt bit encoding
(rows: regimm rt bits [20:19]; columns: rt bits [18:16])

       0           1           2           3           4           5           6           7
0      bltz        bgez        bltzl       bgezl       *rxf1       *rxf1       *rxf1       *rxf1
1      tgei        tgeiu       tlti        tltiu       teqi        *rxf1       tnei        *rxf1
2      bltzal      bgezal      bltzall     bgezall     *rxf1       *rxf1       *rxf1       *rxf1
3      *rxf1       *rxf1       *rxf1       *rxf1       *rxf1       *rxf1       *rxf1       *rxf1

table 3.21 cache (x2) opcode rt bit encoding
(rows: cache rt bits [20:19]; columns: rt bits [18:16])

       0           1           2           3           4           5           6           7
0      *nrx        flushi(x2)  flushd(x2)  flushid(x2) wb(x2)      *nrx        *nrx        *nrx
1      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
2      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
3      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx

table 3.22 copz rs opcode bit encoding
(rows: copz rs bits [25:24]; columns: rs bits [23:21])

       0           1           2           3           4           5           6           7
0      mfcz        *rx64       cfcz        *rxf2       mtcz        *rx64       ctcz        *rxf2
1      bc          *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2
2, 3   copz (coprocessor defined instructions)
table 3.23 copz rt opcode bit encoding
(rows: copz rt bits [20:19]; columns: rt bits [18:16])

       0           1           2           3           4           5           6           7
0      bcf         bct         bcfl        bctl        *rxf2       *rxf2       *rxf2       *rxf2
1      *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2
2      *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2
3      *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2       *rxf2

table 3.24 cp0 opcode bit encoding
(rows: cp0 function bits [5:3]; columns: function bits [2:0])

       0           1           2           3           4           5           6           7
0      *nrx        tlbr        tlbwi       *nrx        *nrx        *nrx        tlbwr       *nrx
1      tlbp        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
2      rfe(*rx40)  *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
3      eret(x5)    *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
4      waiti(x1)   *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
5      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
6      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
7      *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx        *nrx
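the encoding tables are indexed by splitting a 6-bit (or 5-bit) code into a row (upper bits) and a column (lower three bits). the c sketch below shows one way to compute those indices from an instruction word; the type and function names are hypothetical and are not part of the CW4010 documentation.

```c
#include <stdint.h>

/* Row/column position of a code within one of the encoding tables. */
typedef struct { unsigned row, col; } table_pos_t;

static unsigned major_opcode(uint32_t instr)     { return (unsigned)(instr >> 26); }           /* bits [31:26] */
static unsigned special_function(uint32_t instr) { return (unsigned)(instr & 0x3Fu); }         /* bits [5:0]   */
static unsigned rt_field(uint32_t instr)         { return (unsigned)((instr >> 16) & 0x1Fu); } /* bits [20:16] */

/* 6-bit codes (tables 3.18, 3.19, 3.24): row = upper three bits, column = lower three. */
static table_pos_t pos6(unsigned code6)
{
    table_pos_t p = { (code6 >> 3) & 7u, code6 & 7u };
    return p;
}

/* 5-bit codes (tables 3.20, 3.21, 3.23): row = upper two bits, column = lower three. */
static table_pos_t pos5(unsigned code5)
{
    table_pos_t p = { (code5 >> 3) & 3u, code5 & 7u };
    return p;
}
```

as a check against the tables, major opcode 011100 (addciu) lands at row 3, column 4 of table 3.18, and special function 101001 (max) lands at row 5, column 1 of table 3.19.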
chapter 4 CW4010 exception processing

this chapter describes the CW4010's system coprocessor, coprocessor-0 (cp0), and explains how the CW4010 handles exception processing. the chapter contains the following sections:

section 4.1, overview, on page 4-1
section 4.2, r3000 exception compatibility mode, on page 4-3
section 4.3, exception handling registers, on page 4-4
section 4.4, exception description details, on page 4-28

4.1 overview

when the CW4010 detects an exception, it suspends the normal sequence of instruction execution, exits from user mode, and enters kernel mode where it can handle exceptions. the CW4010 reverts to kernel mode, regardless of the mode at the time of the exception. the processor then disables interrupts and forces a software handler located at a fixed address in memory to be executed. the handler saves the context of the processor. the context must be restored when the exception has been handled. section 5.2.1, operating modes, beginning on page 5-3 provides more information on this subject.

when an exception occurs, the cp0 loads the exception program counter (epc) with a restart location where execution may resume after the exception has been serviced. the restart location in the epc is the address of the instruction that caused the exception or, if the instruction was executing in a branch delay slot, the address of the branch instruction immediately preceding the delay slot. the instruction causing the exception and all the instructions following in the pipeline are aborted. they will be refetched after return from the exception.
this chapter describes the events that can initiate exception processing. table 4.1 summarizes the events.

table 4.1 CW4010 exceptions
exception: cause

cold reset: deassertion of the CW4010 cold reset signal, cresetn.
warm reset: deassertion of the CW4010 warm reset signal, wresetn.
non-maskable interrupt: assertion of the non-maskable interrupt signal, nmin.
debug: detection of a program counter breakpoint, data address breakpoint, or trace event. not supported in standard r3000 and r4000 processors.
address error: attempt to load, fetch, or store an unaligned word (that is, a word at an address not evenly divisible by four) or an unaligned halfword (a halfword at an address not evenly divisible by two). references to an address for which the most significant bit was set while the CW4010 was in user mode may also cause an address error.
tlb refill: there is no tlb entry to match a reference to a mapped address space.
tlb entry invalid: a virtual address reference matches a tlb entry that is marked invalid.
tlb modified: a store operation's virtual address reference matches a tlb entry that is marked valid but is not dirty/writable.
bus error: assertion of the CW4010 external bus error signal, scberrn.
integer overflow: two's complement overflow during an add or subtract.
trap: one of the trap instructions results in a true condition.
system call: an attempt to execute the syscall instruction.
breakpoint: an attempt to execute the break instruction.
reserved instruction: execution of an instruction with an undefined or reserved major operation code (bits [31:26]), or a special instruction whose minor operation code (bits [5:0]) is undefined.
coprocessor unusable: execution of a coprocessor instruction where the cu (coprocessor usable) bit is not set for the target coprocessor.
floating point: available for use by an external floating-point coprocessor.
interrupt: assertion of one of the CW4010's six hardware interrupt inputs, or the setting of one of the two software interrupt bits in the cause register. interrupts must be enabled.
external vectored interrupt: assertion of the CW4010 exvintn input. not supported in r3000 and r4000 processors.
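the address error conditions listed in table 4.1 amount to a simple predicate on the address, the access size, and the operating mode. the c sketch below is a simplified illustration only: the function name and parameters are hypothetical, and it treats the user-mode condition as always raising the exception, whereas the table says only that such references "may" do so.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an access would meet an address error condition
 * from table 4.1. size_bytes is 1, 2, or 4. */
static bool is_address_error(uint32_t addr, unsigned size_bytes, bool user_mode)
{
    bool misaligned_word     = (size_bytes == 4u) && ((addr & 3u) != 0u); /* not divisible by four */
    bool misaligned_halfword = (size_bytes == 2u) && ((addr & 1u) != 0u); /* not divisible by two  */
    bool msb_in_user_mode    = user_mode && ((addr & 0x80000000u) != 0u); /* assumption: always an error */
    return misaligned_word || misaligned_halfword || msb_in_user_mode;
}
```

byte accesses have no alignment requirement, so in this model they can only fail through the user-mode check.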
4.2 r3000 exception compatibility mode

although the CW4010 processor is based on the mips r4000 architecture, an r3000-style exception processing capability has been added. this facility allows you to configure cp0 exception processing in such a way that existing r3000 exception handling code can be run on the CW4010 processor with little or no modification to the code.

r3000 compatibility mode is under the control of the compatibility bit (bit 24) of the configuration and cache control (ccc) register, discussed in section 4.3.10, configuration and cache control (ccc) register (16), on page 4-20. the compatibility bit is reset to 0 (r4000 mode) when a cold reset exception occurs. if r3000 mode operation is desired, bit 24 should be set to 1 as part of the cold reset handler. once it has been placed in r3000 mode, the processor should only be switched back to r4000 mode by another cold reset.

when r3000 mode is enabled, the behavior of the following areas is affected:

status register
    the lower six bits of the status register are redefined to implement the kernel/user mode and interrupt enable stack as defined by the r3000 architecture. the status register is discussed in detail in section 4.3.6, status register (12), on page 4-9.

exception handling vectors
    the exception handling vectors (base and offset) are remapped to those specified by the r3000 architecture. the exception vectors are discussed in detail in section 4.4.3, exception vector locations, on page 4-31.

exception return (rfe vs. eret)
    when operating in r3000 compatibility mode, exception return is accomplished using the rfe instruction. if an attempt is made to use the eret instruction, a reserved instruction exception will be recognized.

the following sections provide more detail on CW4010 exception handling. where appropriate, the differences between standard r4000 operation and r3000 compatibility mode are noted. in all other cases, operation is identical.
4.3 exception handling registers

this section describes the cp0 registers used in exception processing. software examines these registers during exception processing to determine the cause of an exception and the state of the cpu at the time of the exception. each of the registers is listed in table 4.2 and described in detail in the sections that follow.

two other cp0 registers that are part of the virtual memory management system and contain important information about exception handling are the index register (cp0 register 0), described in section 5.3.2.4, index register (0), on page 5-12, and the random register (cp0 register 1), described in section 5.3.2.5, random register (1), on page 5-12.

you can use the mtc0 (move to coprocessor-0) instruction to set the bits in the registers, and mfc0 (move from coprocessor-0) to read the contents of the registers.

table 4.2 cp0 exception processing registers

register name                             cp0 register number
context                                   4
dcs (debug control and status)            7
badvaddr (bad virtual address)            8
count                                     9
compare                                   11
status                                    12
cause                                     13
epc (exception program counter)           14
prid (processor revision identifier)      15
ccc (configuration and cache control)     16
lladdr (load linked address)              17
bpc (breakpoint program counter)          18
bda (breakpoint data address)             19
bpcm (breakpoint pc mask)                 20
bdam (breakpoint data addr mask)          21
errorpc                                   30
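the register numbers in table 4.2 can be captured in a c enumeration so that handler code refers to cp0 registers by name. this is a minimal sketch: the numbers are taken from the table, while the identifier names and the lookup function are hypothetical.

```c
#include <stddef.h>

/* cp0 register numbers from table 4.2; identifier names are hypothetical. */
enum cw4010_cp0_reg {
    CP0_CONTEXT  = 4,
    CP0_DCS      = 7,
    CP0_BADVADDR = 8,
    CP0_COUNT    = 9,
    CP0_COMPARE  = 11,
    CP0_STATUS   = 12,
    CP0_CAUSE    = 13,
    CP0_EPC      = 14,
    CP0_PRID     = 15,
    CP0_CCC      = 16,
    CP0_LLADDR   = 17,
    CP0_BPC      = 18,
    CP0_BDA      = 19,
    CP0_BPCM     = 20,
    CP0_BDAM     = 21,
    CP0_ERRORPC  = 30
};

/* return the table 4.2 name for a register number, or NULL if not listed. */
static const char *cp0_reg_name(int reg)
{
    switch (reg) {
    case CP0_CONTEXT:  return "context";
    case CP0_DCS:      return "dcs";
    case CP0_BADVADDR: return "badvaddr";
    case CP0_COUNT:    return "count";
    case CP0_COMPARE:  return "compare";
    case CP0_STATUS:   return "status";
    case CP0_CAUSE:    return "cause";
    case CP0_EPC:      return "epc";
    case CP0_PRID:     return "prid";
    case CP0_CCC:      return "ccc";
    case CP0_LLADDR:   return "lladdr";
    case CP0_BPC:      return "bpc";
    case CP0_BDA:      return "bda";
    case CP0_BPCM:     return "bpcm";
    case CP0_BDAM:     return "bdam";
    case CP0_ERRORPC:  return "errorpc";
    default:           return NULL;
    }
}
```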
4.3.1 context register (4)

the context register is a read/write register containing a pointer to an entry in the page table entry (pte) array. this array is an operating system data structure that stores virtual to physical address translations. when there is a tlb miss, operating system software handles the miss by loading the tlb with the missing translation from the pte array.

the badvpn field is not writable. it contains the vpn of the most recently translated virtual address that did not have a valid translation (tlbl or tlbs). the ptebase field is both writable and readable, and indicates the base address of the pte table of the current user address space.

the context register duplicates some of the information provided in the badvaddr register, but the information is in a form that is more useful for a software tlb exception handler.

the context register can be used by the operating system to hold a pointer into the pte array. the operating system sets the ptebase field as needed. normally, the operating system uses the context register to address the current page map, which resides in the kernel-mapped segment kseg2. the register is included solely for the use of the operating system.

figure 4.1 shows the format of the context register.

figure 4.1 context register
[31:22] ptebase | [21:2] badvpn | [1:0] r

ptebase  page table entry base  [31:22]
this field is the operating system pointer. it points to the page table entry in memory.

badvpn  bad virtual page number  [21:2]
this field contains the virtual page number of the most recently translated virtual address that did not have a valid translation, that is, bits [31:12] of the virtual address that caused the tlb miss. this format provides a table of four-byte ptes for a page size of 4 kbytes. for other pte and page sizes, shifting and masking bits [31:12] produces an appropriate address.
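the field layout in figure 4.1 can be sketched in c as follows. the masks follow the bit ranges given above; the function names are hypothetical. with four-byte ptes and 4-kbyte pages, the composed value is itself the address of the pte for the faulting page, which is why the handler can use the context register directly.

```c
#include <stdint.h>

#define CONTEXT_PTEBASE_MASK UINT32_C(0xFFC00000)  /* bits [31:22] */
#define CONTEXT_BADVPN_MASK  UINT32_C(0x003FFFFC)  /* bits [21:2]  */

/* extract the 20-bit bad virtual page number from a context value. */
static inline uint32_t context_badvpn(uint32_t context)
{
    return (context & CONTEXT_BADVPN_MASK) >> 2;
}

/* hypothetical model of what the context register holds after a tlb miss:
 * the os-supplied ptebase combined with bits [31:12] of the faulting
 * virtual address placed in bits [21:2]. */
static inline uint32_t context_compose(uint32_t ptebase, uint32_t bad_vaddr)
{
    uint32_t vpn = bad_vaddr >> 12;  /* virtual page number, 4-kbyte pages */
    return (ptebase & CONTEXT_PTEBASE_MASK) | ((vpn << 2) & CONTEXT_BADVPN_MASK);
}
```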
r  reserved  [1:0]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

4.3.2 debug control and status (dcs) register (7)

the debug control and status (dcs) register contains the enable and status bits for the CW4010 debug facility. all bits have read/write access. figure 4.2 shows the format of the dcs register.

figure 4.2 dcs register
[31] tr | [30] ud | [29] kd | [28] te | [27] dw | [26] dr | [25] dae | [24] pce | [23] de | [22:6] r | [5] t | [4] w | [3] rd | [2] da | [1] pc | [0] db

tr  trap  31
this is the trap enable bit. setting it to 1 traps debug events to the debug exception vector. clearing it to 0 disables the trap; however, the status bits are still updated with debug event information when the bit is cleared.

ud  user mode debug event  30
this bit is set to 1 to detect a debug event when the CW4010 is operating in user mode.

kd  kernel mode debug event  29
this bit is set to 1 to detect a debug event when the CW4010 is operating in kernel mode.

te  trace event  28
this bit is set to 1 to detect a trace event (non-sequential fetch operation).

dw  data write  27
this bit is set to 1 to detect a data write at the bda (breakpoint data address). the bit is used in conjunction with dae.
dr  data read  26
this bit is set to 1 to detect a data read at the bda. the bit is used in conjunction with dae.

dae  detect bda event  25
this bit is set to 1 to detect bda debug events.

pce  program counter breakpoint event  24
this bit is set to 1 to detect program counter breakpoint events.

de  debug enable  23
this bit is set to 1 to enable the debug facility. clearing the bit disables the debug facility.

r  reserved  [22:6]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

t  trace  5
this bit indicates the trace event status. it is set to 1 to indicate that a trace event has been detected.

w  write  4
this bit is the write reference bit and its setting matches the dw bit setting.

rd  read  3
this bit is the read reference bit and its setting matches the dr bit setting.

da  dae debug condition  2
this bit indicates the status of the dae debug condition.

pc  pce debug condition  1
this bit indicates the status of the pce debug condition.

db  debug detected  0
this bit is set whenever any debug condition is detected.
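the dcs bit assignments listed above can be written as c masks. the sketch below also builds one possible control value for a data-write breakpoint and checks the status bits afterwards. the bit positions come from figure 4.2; the choice of which enable bits to combine, the reading of the w bit as a hit indicator, and all names are assumptions made for illustration.

```c
#include <stdint.h>

/* dcs enable bits (figure 4.2). */
#define DCS_TR  (UINT32_C(1) << 31)  /* trap to debug exception vector */
#define DCS_UD  (UINT32_C(1) << 30)  /* detect in user mode */
#define DCS_KD  (UINT32_C(1) << 29)  /* detect in kernel mode */
#define DCS_TE  (UINT32_C(1) << 28)  /* trace event */
#define DCS_DW  (UINT32_C(1) << 27)  /* data write at bda */
#define DCS_DR  (UINT32_C(1) << 26)  /* data read at bda */
#define DCS_DAE (UINT32_C(1) << 25)  /* detect bda event */
#define DCS_PCE (UINT32_C(1) << 24)  /* detect pc breakpoint event */
#define DCS_DE  (UINT32_C(1) << 23)  /* debug facility enable */

/* dcs status bits. */
#define DCS_T   (UINT32_C(1) << 5)
#define DCS_W   (UINT32_C(1) << 4)
#define DCS_RD  (UINT32_C(1) << 3)
#define DCS_DA  (UINT32_C(1) << 2)
#define DCS_PC  (UINT32_C(1) << 1)
#define DCS_DB  (UINT32_C(1) << 0)

/* assumed combination: trap on data writes at bda in both user and kernel mode. */
static inline uint32_t dcs_data_write_watch(void)
{
    return DCS_DE | DCS_TR | DCS_UD | DCS_KD | DCS_DAE | DCS_DW;
}

/* assumed interpretation: a data-write breakpoint was detected when the
 * db, da, and w status bits are all set. */
static inline int dcs_data_write_hit(uint32_t dcs)
{
    const uint32_t want = DCS_DB | DCS_DA | DCS_W;
    return (dcs & want) == want;
}
```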
4.3.3 bad virtual address (badvaddr) register (8)

the bad virtual address (badvaddr) register is a read-only register that holds the 32-bit failing virtual address for address error (adel, ades) and tlb translation (tlbl, tlbs, mod) exceptions. figure 4.3 shows the format of the badvaddr register.

figure 4.3 badvaddr register
[31:0] bad virtual address

4.3.4 count register (9)

the count register acts as a timer. it increments at a constant rate regardless of whether an instruction is executed or retried, or whether any forward progress is made. the count register increments at half the maximum instruction issue rate.

the count register is a read/write register; it can be written for diagnostic purposes or for system initialization to synchronize two processors operating in lock step. figure 4.4 shows the format of the count register.

figure 4.4 count register
[31:0] count

4.3.5 compare register (11)

the compare register works with the count register to implement a timer service. it holds a stable value and does not change on its own. when the timer facility is enabled and the value of the count register equals the value of the compare register, interrupt bit ip7 in the cause register is set. this causes an interrupt on the next execution cycle when the interrupt is enabled. writing a value to the compare register clears the timer interrupt.
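because count is a free-running 32-bit value and the timer interrupt fires on equality with compare, a periodic timer is usually programmed by adding an interval to the current count and letting the sum wrap. the sketch below shows that arithmetic only; the function names are hypothetical and the cp0 accesses themselves are not shown.

```c
#include <stdint.h>

/* next value to write to compare so that the timer fires `interval`
 * count ticks from `count_now`. unsigned addition wraps modulo 2^32,
 * matching a 32-bit counter that rolls over. */
static inline uint32_t timer_next_compare(uint32_t count_now, uint32_t interval)
{
    return count_now + interval;
}

/* wrap-safe check: nonzero once at least `interval` ticks have elapsed
 * between `start` and `now`, even if count has rolled over once. */
static inline int timer_elapsed(uint32_t start, uint32_t now, uint32_t interval)
{
    return (uint32_t)(now - start) >= interval;
}
```

since writing compare also clears the timer interrupt, a handler that reprograms compare with timer_next_compare acknowledges the interrupt in the same step.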
for diagnostic purposes, the compare register is a read/write register. in normal operation, the compare register is only written. figure 4.5 shows the format of the compare register.

figure 4.5 compare register
[31:0] compare

4.3.6 status register (12)

the status register is a read/write register that contains the operating mode, interrupt enabling, and the diagnostic states of the processor. the format of the status register differs slightly between r4000 mode and r3000 mode. section 4.3.6.1, r4000 mode operation, below describes the format for r4000 mode operation. section 4.3.6.2, r3000 mode operation, on page 4-12 describes the format for r3000 mode operation.

4.3.6.1 r4000 mode operation

the format of the r4000 version of the status register (ccc24 = 0) is shown in figure 4.6.

figure 4.6 status register (r4000 mode)
[31:28] cu[3:0] | [27:23] r | [22] bev | [21] r | [20] sr | [19:16] r | [15:10] int[5:0] | [9:8] sw[1:0] | [7:5] r | [4:3] ksu[1:0] | [2] erl | [1] exl | [0] ie

cu[3:0]  coprocessor usability bits  [31:28]
software uses this field to control accesses to the coprocessors. when a bit is set to 1 the corresponding coprocessor is usable, as shown below:
cu3 = 1 enables coprocessor 3
cu2 = 1 enables coprocessor 2
cu1 = 1 enables coprocessor 1
cu0 = 1 enables coprocessor 0
r  reserved  [27:23, 21, 19:16, 7:5]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

bev  bootstrap exception vector  22
this bit controls the location of the tlb refill and the general exception vectors. setting the bit to 1 implements a bootstrap operation and bootstrap vector locations are used. when the bit is cleared to 0, normal exception vectors are used. refer to the following subsections for further information: cold reset on page 4-12; warm reset on page 4-12.

sr  soft reset  20
this bit indicates whether a warm reset or a non-maskable interrupt has occurred. when the bit is set to 1, it indicates a warm reset. when it is cleared to 0, it indicates a non-maskable interrupt. refer to the subsection warm reset on page 4-12 for further information.

int[5:0]  interrupt mask  [15:10]
this field is a six-bit [5:0] hardware interrupt mask. setting a bit to 1 enables the corresponding hardware interrupt. for example, setting bit 5 to 1 enables hardware interrupt 5.

sw[1:0]  software interrupt mask  [9:8]
this field is a two-bit [1:0] software interrupt mask. setting a bit to 1 enables the corresponding software interrupt.

ksu  kernel/user mode  [4:3]
this field determines the base operating mode of the CW4010 core as follows:
[1:0] = 00, base mode is kernel
[1:0] = 10, base mode is user
all other settings are reserved. refer to the following subsections for further information: processor modes on page 4-11; kernel address space accesses on page 4-12; warm reset on page 4-12.
erl  error level  2
this bit determines the error level of the CW4010. when it is set to 1, the level is error. when it is cleared to 0, the level is normal. refer to the following subsections for further information: interrupt enable on page 4-11; processor modes on page 4-11; kernel address space accesses on page 4-12; cold reset on page 4-12.

exl  exception level  1
this bit determines the exception level of the CW4010. when it is set to 1, the level is exception. when it is cleared to 0, the level is normal. refer to the following subsections for further information: interrupt enable on page 4-11; processor modes on page 4-11; kernel address space accesses on page 4-12.

ie  interrupt enable  0
setting this bit to 1 enables interrupts. clearing it to 0 disables interrupts. refer to subsection interrupt enable below for further information.

interrupt enable: interrupts are enabled when all of the following field conditions are true:
ie is set to 1.
exl is cleared to 0.
erl is cleared to 0.
if these conditions are met, interrupts are recognized according to the setting of the int and sw mask bits.

processor modes: the setting of the ksu field, in conjunction with the settings of the exl and erl bits, defines the CW4010 processor modes as follows:
the processor is in user mode when ksu is equal to 10b, and exl and erl are cleared to 0.
the processor is in kernel mode under any one of the following conditions:
ksu is equal to 00b.
exl is set to 1.
erl is set to 1.

kernel address space accesses: access to the kernel address space is allowed only when the processor is in kernel mode, that is, under any one of the following conditions:
ksu is equal to 00b.
exl is set to 1.
erl is set to 1.

user address space accesses: access to the user address space is always allowed.

cold reset: the contents of the status register are undefined after a cold reset, except for the following bits:
erl and bev are set to 1.

warm reset: the contents of the status register are unchanged by warm reset, except for the following bits:
the erl, bev, and sr bits are set to 1.

4.3.6.2 r3000 mode operation

the format of the r3000 version of the status register (ccc24 = 1) is shown in figure 4.7.

figure 4.7 status register (r3000 mode)
[31:28] cu[3:0] | [27:23] r | [22] bev | [21] r | [20] sr | [19:16] r | [15:10] int[5:0] | [9:8] sw[1:0] | [7:6] r | [5] kuo | [4] ieo | [3] kup | [2] iep | [1] kuc | [0] iec
cu[3:0]  coprocessor usability bits  [31:28]
software uses this field to control accesses to the coprocessors. when a bit is set to 1 the corresponding coprocessor is usable, as shown below:
cu3 = 1 enables coprocessor 3
cu2 = 1 enables coprocessor 2
cu1 = 1 enables coprocessor 1
cu0 = 1 enables coprocessor 0

r  reserved  [27:23, 21, 19:16, 7:6]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

bev  bootstrap exception vector  22
this bit controls the location of the tlb refill and the general exception vectors. setting the bit to 1 implements a bootstrap operation and bootstrap vector locations are used. when the bit is cleared to 0, normal exception vectors are used. refer to the subsection warm reset on page 4-15 for further information.

sr  soft reset  20
this bit indicates whether a warm reset or a non-maskable interrupt has occurred. when the bit is set to 1, it indicates a warm reset. when it is cleared to 0, it indicates a non-maskable interrupt. refer to the subsection warm reset on page 4-12 for further information.

int[5:0]  interrupt mask  [15:10]
this field is a six-bit [5:0] hardware interrupt mask. setting a bit to 1 enables the corresponding hardware interrupt. for example, setting bit 5 to 1 enables hardware interrupt 5.

sw[1:0]  software interrupt mask  [9:8]
this field is a two-bit [1:0] software interrupt mask. setting a bit to 1 enables the corresponding software interrupt.
kuo  kernel/user mode, old  5
this bit shows the old base operating mode of the CW4010 core. a value of 1 indicates user mode; a value of 0 indicates kernel mode. the bit is part of a three-bit stack that indicates old, previous, and current modes. refer to the subsection warm reset on page 4-15 for further information.

ieo  interrupt enable, old  4
this bit shows the old interrupt enable setting. a value of 1 indicates that interrupts are enabled; a value of 0 indicates that interrupts are disabled. the bit is part of a three-bit stack that indicates old, previous, and current interrupt enable settings. refer to the subsection interrupt enable on page 4-11 for further information.

kup  kernel/user mode, previous  3
this bit shows the previous base operating mode of the CW4010 core. a value of 1 indicates user mode; a value of 0 indicates kernel mode. the bit is part of a three-bit stack that indicates old, previous, and current modes. refer to the following subsections for further information: warm reset on page 4-15; processor modes on page 4-11.

iep  interrupt enable, previous  2
this bit shows the previous interrupt enable setting. a value of 1 indicates that interrupts are enabled; a value of 0 indicates that interrupts are disabled. the bit is part of a three-bit stack that indicates old, previous, and current interrupt enable settings. refer to the subsection interrupt enable on page 4-11 for further information.

kuc  kernel/user mode, current  1
this bit shows the current base operating mode of the CW4010 core. a value of 1 indicates user mode; a value of 0 indicates kernel mode. the bit is part of a three-bit stack that indicates old, previous, and current modes.
refer to the following subsections for further information: warm reset on page 4-15; processor modes on page 4-11.

iec  interrupt enable, current  0
this bit shows the current interrupt enable setting. a value of 1 indicates that interrupts are enabled; a value of 0 indicates that interrupts are disabled. the bit is part of a three-bit stack that indicates old, previous, and current interrupt enable settings. refer to subsection interrupt enable on page 4-11 for further information.

interrupt enable: interrupts are enabled when iec is set to 1. in this case, interrupts are recognized according to the setting of the int and sw masks.

processor modes: CW4010 processor modes are defined by the setting of the kuc bit:
the processor is in user mode when kuc is set to 1.
the processor is in kernel mode when kuc is cleared to 0.

kernel address space accesses: access to the kernel address space is allowed only when the processor is in kernel mode.

user address space accesses: access to the user address space is always allowed.

warm reset: the contents of the status register are unchanged by warm reset, except for the following bits:
the bev and sr bits are set to 1.
the ku and ie bits are pushed deeper into the stack and kuc and iec are cleared to 0, that is: kuo/ieo <- kup/iep <- kuc/iec <- 0/0.

figure 4.8 shows how the CW4010 core manipulates the status register during exception recognition.
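the three-deep ku/ie stack in bits [5:0] of the r3000-mode status register can be modeled with two shifts. this is a sketch, not the hardware implementation: the push models exception recognition and warm reset as described in this section, and the pop models the rfe instruction. leaving kuo/ieo unchanged on a pop follows the r3000 architecture and is an assumption as far as this manual is concerned; the function names are hypothetical.

```c
#include <stdint.h>

#define SR_KUIE_STACK UINT32_C(0x3F)  /* kuo ieo kup iep kuc iec, bits [5:0] */

/* exception recognition: kuo/ieo <- kup/iep <- kuc/iec <- 0/0. */
static inline uint32_t status_r3000_push(uint32_t status)
{
    uint32_t stack = (status & SR_KUIE_STACK) << 2;   /* shift pairs up */
    return (status & ~SR_KUIE_STACK) | (stack & SR_KUIE_STACK);
}

/* rfe: kuc/iec <- kup/iep <- kuo/ieo; the old pair is kept (assumption). */
static inline uint32_t status_r3000_pop(uint32_t status)
{
    uint32_t stack = status & SR_KUIE_STACK;
    stack = (stack & UINT32_C(0x30)) | (stack >> 2);  /* shift pairs down */
    return (status & ~SR_KUIE_STACK) | stack;
}
```

a push followed by a pop restores the current pair, which is what lets an r3000-style handler return to the interrupted mode with a single rfe.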
figure 4.8 status register and exception recognition
bits [5:0] before exception recognition: kuo ieo kup iep kuc iec
bits [5:0] after exception recognition: kup iep kuc iec 0 0 (each pair moves up one position)
kuc  current kernel/user mode bit
iec  current interrupt enable mode bit
kup  previous kernel/user mode bit
iep  previous interrupt enable mode bit
kuo  old kernel/user mode bit
ieo  old interrupt enable mode bit

when the CW4010 recognizes an exception, it saves the current kernel/user mode bit (kuc) and the current interrupt enable bit (iec) in the previous kernel/user mode bit (kup) and previous interrupt enable bit (iep), respectively. the previous bits are saved in the old bits, and the current bits are cleared to 0. the process is: kuo/ieo <- kup/iep <- kuc/iec <- 0.

when the CW4010 executes a return from exception (rfe) instruction, the values are popped off the stack and kuc and iec are restored to their previous values: kuc/iec <- kup/iep <- kuo/ieo.

4.3.7 cause register (13)

the cause register is a read/write register. the contents of this register provide information about the most recent exception. the format of the register is shown in figure 4.9. all bits in the register, with the exception of ip[1:0], are read-only bits.

figure 4.9 cause register
[31] bd | [30] bt | [29:28] ce[1:0] | [27:16] r | [15:8] ip[7:0] | [7] r | [6:2] exccode[4:0] | [1:0] r

bd  branch delay  31
this bit indicates whether or not the last exception was taken while the CW4010 was executing an instruction in the branch delay slot. a value of 1 indicates that the exception was taken in a branch delay slot. a value of 0 indicates that it was not.
bt  bd set  30
if the bd bit is set, setting this bit to 1 indicates that the branch was taken.

ce[1:0]  coprocessor error  [29:28]
the value in the coprocessor error field indicates the coprocessor unit referenced when a coprocessor unusable exception is taken:

ce1  ce0  coprocessor referenced
1    1    coprocessor 3
1    0    coprocessor 2
0    1    coprocessor 1
0    0    coprocessor 0

r  reserved  [27:16, 7, 1:0]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

ip[7:0]  interrupt pending  [15:8]
this field indicates if an interrupt is pending. there is a direct correlation between the bit set and the interrupt pending. for example, if ip7 (bit 15) is set, interrupt 7 is pending.

exccode[4:0]  exception code  [6:2]
this field defines the exception code. table 4.3 lists the valid exception code values.

table 4.3 cause register exccode field (sheet 1 of 2)

exception code value  mnemonic  description
0                     int       interrupt
1                     mod       tlb modification exception
2                     tlbl      tlb exception (load or instruction fetch)
3                     tlbs      tlb exception (store)
4                     adel      address error exception (load or instruction fetch)
5                     ades      address error exception (store)
6                     bus       bus error exception
table 4.3 (cont.) cause register exccode field (sheet 2 of 2)

exception code value  mnemonic  description
7                               reserved
8                     sys       syscall exception
9                     bp        breakpoint exception
10                    ri        reserved instruction exception
11                    cpu       coprocessor unusable exception
12                    ov        arithmetic overflow exception
13                    tr        trap exception
14                              reserved
15                    fpe       floating-point exception
16-31                           reserved

4.3.8 exception program counter register (14)

the exception program counter (epc) is a read/write register that contains the address where processing resumes after an exception has been serviced.

for synchronous exceptions, the epc register contains either:
the virtual address of the instruction that was the direct cause of the exception, or
the virtual address of the immediately preceding branch or jump instruction (when the instruction is in a branch delay slot, and the branch delay bit in the cause register is set).

figure 4.10 shows the format of the epc register. bits [31:2] make up the program counter. bits [1:0] are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

figure 4.10 epc register
[31:2] exception program counter | [1:0] r
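a general exception handler typically starts by decoding the cause register and working out which instruction faulted. the c sketch below extracts the fields of figure 4.9, maps the exception code to the mnemonics of table 4.3, and applies the epc rule for branch delay slots. the function names are hypothetical, and the epc + 4 adjustment assumes the fixed four-byte mips instruction size.

```c
#include <stdint.h>

static inline unsigned cause_exccode(uint32_t cause) { return (cause >> 2) & 0x1Fu; }
static inline unsigned cause_ip(uint32_t cause)      { return (cause >> 8) & 0xFFu; }
static inline unsigned cause_ce(uint32_t cause)      { return (cause >> 28) & 0x3u; }
static inline int      cause_bd(uint32_t cause)      { return (int)((cause >> 31) & 1u); }

/* mnemonic for an exception code, following table 4.3. */
static const char *exccode_mnemonic(unsigned code)
{
    static const char *const names[16] = {
        "int", "mod", "tlbl", "tlbs", "adel", "ades", "bus", "reserved",
        "sys", "bp", "ri", "cpu", "ov", "tr", "reserved", "fpe"
    };
    return code < 16 ? names[code] : "reserved";
}

/* address of the instruction that caused a synchronous exception:
 * epc itself, or the delay slot after the branch held in epc when bd is set. */
static inline uint32_t faulting_instruction(uint32_t cause, uint32_t epc)
{
    return cause_bd(cause) ? epc + 4u : epc;
}
```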
4.3.9 processor revision identifier register (15)

the processor revision identifier (prid) is a 32-bit, read-only register that contains information identifying the implementation and revision level of the CW4010 core, as shown in figure 4.11.

figure 4.11 prid register
[31:16] r | [15:8] imp | [7:0] rev

r  reserved  [31:16]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

imp  implementation number  [15:8]
the value in this field represents the core's implementation number. this field can be programmed at the core interface using the implop[3:0] lines.

rev  revision number  [7:0]
the value of this field is interpreted as a processor unit revision number. the revision number is a value of the form y.x, where y is a major revision number in bits [7:4] and x is a minor revision number in bits [3:0]. this field can be programmed at the core interface using the revlop[3:0] lines.

the revision number can distinguish between some chip revisions. however, lsi logic does not guarantee that changes to this core will necessarily be reflected in the prid register, or that changes to the revision number necessarily reflect real core changes. for this reason, these values are not listed and software should not rely on the revision number in the prid register to characterize the core.
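the prid fields can be unpacked with shifts and masks, as in the sketch below. the bit ranges follow figure 4.11; the function names are hypothetical, and, as noted above, the decoded values are suitable for reporting rather than for characterizing the core.

```c
#include <stdint.h>

/* implementation number, bits [15:8]. */
static inline unsigned prid_imp(uint32_t prid)       { return (prid >> 8) & 0xFFu; }

/* major revision y, bits [7:4], of a revision number of the form y.x. */
static inline unsigned prid_rev_major(uint32_t prid) { return (prid >> 4) & 0xFu; }

/* minor revision x, bits [3:0]. */
static inline unsigned prid_rev_minor(uint32_t prid) { return prid & 0xFu; }
```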
4.3.10 configuration and cache control (ccc) register (16)

the configuration and cache control (ccc) register allows software to configure various pieces of the CW4010 design (for example, biu, tlb, and cache controllers). figure 4.12 shows the format of the ccc register.

figure 4.12 ccc register
[31:28] r | [27] sdb | [26] isr1 | [25] evi | [24] cmp | [23] iie | [22] die | [21] mul | [20] mad | [19] tmr | [18] bge | [17] ie0 | [16] ie1 | [15:14] is[1:0] | [13] de0 | [12] de1 | [11:10] ds[1:0] | [9] ipwe | [8:7] ipws[1:0] | [6] te | [5] wb | [4] sr0 | [3] sr1 | [2] isc | [1] tag | [0] inv

r  reserved  [31:28]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

sdb  scan debug mode  27
this bit enables the scan debug mode. the bit is set to 1 to enable the mode and cleared to 0 to disable the mode.

isr1  icache scratchpad ram  26
this bit enables the icache to be used as a scratchpad ram. setting the bit to 1 enables scratchpad ram mode. clearing it to 0 disables scratchpad ram mode.

evi  external vectored interrupt  25
this bit enables and disables the external vectored interrupt. setting the bit to 1 enables the interrupt and clearing it to 0 disables the interrupt.

cmp  r3000 compatibility  24
this bit enables and disables r3000 compatibility mode. setting the bit to 1 enables the mode and clearing it to 0 disables the mode.

iie  icache invalidate enable  23
this bit enables and disables the icache invalidate request. setting the bit to 1 enables the request and clearing it to 0 disables the request.
die  dcache invalidate enable  22
this bit enables and disables the dcache invalidate request. setting the bit to 1 enables the request and clearing it to 0 disables the request.

mul  multiplier enable  21
this bit enables and disables the hardware multiplier. setting the bit to 1 enables the multiplier and clearing it to 0 disables the multiplier.

mad  multiplier accumulate extensions  20
this bit allows the multiplier to support accumulate extensions. setting the bit to 1 enables the feature and clearing the bit to 0 disables the feature. when this bit is set, mul must also be set.

tmr  timer  19
setting this bit to 1 enables the timer facility (count = compare -> ip7).

beg  biu bus enable grant  18
this bit enables and disables the biu bus grant. setting this bit to 1 enables the external bus master. clearing it to 0 allows the CW4010 core to ignore the external bus master.

ie0  icache set-0 enable  17
this bit enables and disables set-0 of the icache. setting the bit to 1 enables set-0 and clearing it to 0 disables set-0.

ie1  icache set-1 enable  16
this bit enables and disables set-1 of the icache. setting the bit to 1 enables set-1 and clearing it to 0 disables set-1.

is[1:0]  icache size  [15:14]
the is[1:0] field determines the size of the icache set. the field is set as follows:

is1  is0  cache size
0    0    1k
0    1    2k
1    0    4k
1    1    8k
de0  dcache set-0 enable  13
this bit enables and disables set-0 of the dcache. setting the bit to 1 enables set-0 and clearing it to 0 disables set-0.

de1  dcache set-1 enable  12
this bit enables and disables set-1 of the dcache. setting the bit to 1 enables set-1 and clearing it to 0 disables set-1.

ds[1:0]  dcache size  [11:10]
the ds[1:0] field determines the size of the dcache set. the field is set as follows:

ds1  ds0  cache size
0    0    1k
0    1    2k
1    0    4k
1    1    8k

ipwe  in-page write enable  9
this bit enables and disables in-page write operations. setting the bit to 1 enables in-page write and clearing it to 0 disables in-page write.

ipws[1:0]  in-page write size  [8:7]
the ipws[1:0] field determines the size of the icache for in-page write operations. the field is set as follows:

ipws1  ipws0  in-page write size
0      0      1k
0      1      2k
1      0      4k
1      1      8k

te  tlb enable  6
this bit enables and disables the tlb. setting the bit to 1 enables the tlb and clearing the bit to 0 disables the tlb.
wb  writeback  5
this bit defines operation for addresses not mapped by the tlb. setting the bit to 1 enables a writeback operation and clearing it to 0 enables a writethrough operation.

sr0  scratchpad ram mode set-0  4
this bit enables and disables scratchpad ram mode for set-0 of the dcache. setting the bit to 1 enables scratchpad mode and clearing it to 0 disables scratchpad mode.

sr1  scratchpad ram mode set-1  3
this bit enables and disables scratchpad ram mode for set-1 of the dcache. setting the bit to 1 enables scratchpad mode and clearing it to 0 disables scratchpad mode.

isc  isolate cache  2
this bit enables isolate cache mode. this means that stores to the cache are not propagated to external memory. setting the bit to 1 enables the mode and clearing it to 0 disables the mode.

tag  tag test mode  1
this bit enables and disables tag test mode, which is used for cache maintenance. setting the bit to 1 enables the mode and clearing it to 0 disables the mode.

inv  invalidate cache mode  0
this bit enables and disables cache invalidate mode, which is used for cache maintenance. setting the bit to 1 enables the mode and clearing it to 0 disables the mode.

4.3.11 load linked address (lladdr) register (17)

the load linked address (lladdr) register is a read/write register that contains the physical address (paddr[31:2]) read by the most recent load linked instruction. this register is used for diagnostic purposes only, and serves no function during normal operation.

the lladdr register is physically located in the lsu. the cp0 must send read/write signals to the lsu when the value of the register is to be read or written.
figure 4.13 shows the format of the lladdr register. bits [31:2] contain the paddr[31:2]. bits [1:0] are reserved and cleared to 0.

figure 4.13 lladdr register
[31:2] paddr[31:2] | [1:0] r

4.3.12 breakpoint program counter (bpc) register (18)

the breakpoint program counter (bpc) register is a read/write register that software uses to specify a program counter breakpoint. the bpc register is used in conjunction with the breakpoint pc mask register, described in section 4.3.14, breakpoint pc mask (bpcm) register (20), on page 4-25.

figure 4.14 shows the format of the 32-bit bpc register. bits [31:2] make up the breakpoint program counter. bits [1:0] are reserved and cleared to 0.

figure 4.14 bpc register
[31:2] breakpoint program counter | [1:0] r

4.3.13 breakpoint data address (bda) register (19)

the breakpoint data address (bda) register is a read/write register that software uses to specify a virtual data address breakpoint. the bda register is used in conjunction with the breakpoint data address mask register, described in section 4.3.15, breakpoint data address mask (bdam) register (21), on page 4-25.

figure 4.15 shows the format of the 32-bit bda register. bits [31:0] make up the bda.

figure 4.15 bda register
[31:0] breakpoint data address
4.3.14 breakpoint pc mask (bpcm) register (20)

the breakpoint program counter mask (bpcm) register is a read/write register that masks bits in the bpc register. a 1 in any bit in the bpcm register indicates that the CW4010 compares the value of that program counter bit with the corresponding bit in the bpc register for program counter (debug) exceptions. values of 0 in the mask indicate that the CW4010 does not check the corresponding bits in the bpc register.

figure 4.16 shows the format of the 32-bit bpcm register. bits [31:2] make up the mask. bits [1:0] are reserved and cleared to 0.

figure 4.16 bpcm register
[31:2] breakpoint program counter mask | [1:0] r

4.3.15 breakpoint data address mask (bdam) register (21)

the breakpoint data address mask (bdam) register is a read/write register that masks bits in the bda register. a 1 in any bit in the bdam register indicates that the CW4010 compares the value of that data address bit with the corresponding bit in the bda register for data address (debug) exceptions. values of 0 in the mask indicate that the CW4010 does not check the corresponding bits in the bda register.

figure 4.17 shows the format of the 32-bit bdam register. bits [31:0] make up the bdam.

figure 4.17 bdam register
[31:0] breakpoint data address mask
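the masked comparison described for the bpc/bpcm and bda/bdam register pairs amounts to comparing only the bit positions where the mask holds a 1. the following c sketch models that comparison in software; the function name is hypothetical, and the model says nothing about when in the pipeline the hardware performs the check.

```c
#include <stdint.h>

/* nonzero when `addr` matches `breakpoint` in every bit position where
 * `mask` is 1; bit positions where `mask` is 0 are not checked. */
static inline int breakpoint_match(uint32_t addr, uint32_t breakpoint, uint32_t mask)
{
    return ((addr ^ breakpoint) & mask) == 0;
}
```

for example, a mask of 0xfffff000 makes any address within the same 4-kbyte page as the breakpoint value match, while a mask of 0xfffffffc matches a single word.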
4.3.16 rotate register (23)

the rotate register is used by the CW4010 instruction set extensions. select and rotate left (selsl) and select and rotate right (selsr) use the lower five bits of the register [4:0] as the shift count. this is useful for data alignment operations in graphics and in bit-field selection routines for data transmission and compression applications.

even though the rotate register resides in the cp0, user-mode access to the register is always granted, regardless of the value contained in the cu0 bit of the status register. figure 4.18 shows the format of the rotate register.

figure 4.18 rotate register
[31:5] r | [4:0] rotate

r  reserved  [31:5]
these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

rotate  rotate  [4:0]
this field determines the shift count.

4.3.17 circular mask (cmask) register (24)

the circular mask (cmask) register is used by the CW4010 instruction set extensions. the load/store word/halfword/byte with update circular instructions store a value in the destination register and update the base address register with the addition of base + offset, which is modified according to the value of bits [4:0]. this feature is important in dsp (digital signal processing) and other applications that use circular buffers.

even though the circular mask register resides within the cp0, user-mode access is always granted to the register, regardless of the value contained in status[cu0]. figure 4.19 shows the format of the cmask register.
figure 4.19 cmask register
[register layout: bits 31:5 reserved (r); bits 4:0 cmask]

r        reserved [31:5]
         these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.
cmask    circular mask [4:0]
         this field contains the circular mask.

4.3.18 error exception program counter (error epc) register (30)

the error exception program counter (error epc) register is similar to the epc. it stores the pc (program counter) on cold reset, warm reset, and nmi exceptions. the read/write error epc register contains the virtual address at which instruction processing can resume after the interrupt has been serviced. the address may be either:

- the virtual address of the first instruction terminated by the exception
- the virtual address of the immediately preceding branch or jump instruction when the terminated instruction is in a branch delay slot

figure 4.20 shows the format of the error epc register. bits [31:2] make up the error epc. bits [1:0] are reserved and cleared to 0. there is no branch delay slot indication for the error epc register.

figure 4.20 error epc register
[register layout: bits 31:2 error epc; bits 1:0 reserved (r)]
4.4 exception description details

this section describes each of the CW4010 exceptions, what causes them, and how they are handled and serviced.

4.4.1 exception operation

to handle an exception, the processor saves the current operating state, enters kernel mode, disables interrupts, and forces execution of a handler at a fixed address. to resume normal operation, the operating state must be restored and interrupts enabled.

when an exception occurs, the epc register is loaded with the restart location at which execution can resume after the exception has been serviced. the epc register contains the address of the instruction associated with the exception, or, if the instruction was executing in a branch delay slot, the epc register contains the address of the branch instruction immediately preceding.

4.4.1.1 r4000 mode operation (default after cold reset)

the CW4010 processor uses the following mechanisms for saving and restoring the operating mode and interrupt status:

- a single interrupt enable bit (ie) located in the status register.
- a base operating mode (user, kernel) located in the ksu field of the status register.
- an exception level (normal, exception) located in the exl field of the status register.
- an error level (normal, error) located in the erl field of the status register.

interrupts are enabled by setting the ie bit to 1 and both levels (exl, erl) to normal. table 4.4 shows how the current processor operating mode is defined.

table 4.4 current processor mode

current mode   status ksu[1:0]   status exl   status erl
user           10                0            0
kernel         00                0            0
kernel         xx                1            0
kernel         xx                0            1
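the mode decoding in table 4.4 and the interrupt-enable rule stated above it can be expressed as a small lookup. this is a sketch under stated assumptions: the function names are hypothetical, and ksu encodings other than 10 and 00 are rejected because table 4.4 does not define them.

```python
# hypothetical decoder for table 4.4 (r4000 mode only)


def current_mode(ksu: int, exl: int, erl: int) -> str:
    """return 'user' or 'kernel' from the status register fields."""
    if exl == 1 or erl == 1:
        return "kernel"      # exception or error level forces kernel mode
    if ksu == 0b10:
        return "user"
    if ksu == 0b00:
        return "kernel"
    raise ValueError("ksu encoding not listed in table 4.4")


def interrupts_enabled(ie: int, exl: int, erl: int) -> bool:
    """interrupts are enabled when ie = 1 and both levels are normal."""
    return ie == 1 and exl == 0 and erl == 0
```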
exceptions set the exception level to exception (exl = 1). the exception handler typically resets the exception level to normal (exl = 0) after saving the appropriate state. it sets it back to exception while restoring that state. returning from an exception (eret instruction) resets the exception level to normal.

4.4.1.2 r3000 mode operation

the r3000 mode of operation is much simpler than the r4000 mode. the current processor operating state is always defined by the kuc bit (0 -> kernel, 1 -> user). the basic mechanism for saving and restoring the operating state of the processor is the kernel/user (ku) and interrupt enable (ie) stack located in the bottom six bits of the status register.

when responding to an exception, the current mode bits (kuc/iec) are saved into the previous mode bits (kup/iep); the previous mode bits are saved into the old mode bits (kuo/ieo); and the current mode bits (kuc/iec) are both cleared to 0.

after exception processing has been completed, the saved state is restored using the rfe instruction, which causes the previous mode bits to be copied back into the current mode bits and the old mode bits to be copied back into the previous mode bits. the old mode bits are left unchanged.

4.4.1.3 exception processing diagrams

figures 4.21 through 4.25 show the basic set of actions taken for each of the major CW4010 exception classes: cold reset, warm reset, non-maskable interrupt, common, debug, and external vectored interrupt. in these figures, || denotes bit concatenation and 0^n denotes a string of n zero bits.

figure 4.21 cold reset exception

random ← tlbentries - 1
wired ← 0
ccc ← 0^32
dcs ← 0^32
errorpc ← pc
sr ← 0^4 || sr[27:23] || 1 || 0 || 0 || sr[19:3] || 1 || sr[1:0]
pc ← 0xbfc0 0000

figure 4.22 warm reset, nmi exceptions

errorpc ← pc
if (ccc[24] = 0) then
    sr ← sr[31:23] || 1 || 0 || 1 || sr[19:3] || 1 || sr[1:0]
else
    sr ← sr[31:23] || 1 || 0 || 1 || sr[19:6] || sr[3:0] || 0^2
endif
pc ← 0xbfc0 0000
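the r3000-mode ku/ie stack described in section 4.4.1.2 behaves like a three-entry shift register in status bits [5:0]. the sketch below models the push on exception entry and the pop performed by rfe; the function names are hypothetical, and the bit order (kuo, ieo, kup, iep, kuc, iec from bit 5 down to bit 0) is inferred from the r3000-mode status update shown in the exception processing figures.

```python
# hypothetical model of the r3000-mode ku/ie stack in status[5:0]
# bit 5 kuo, bit 4 ieo, bit 3 kup, bit 2 iep, bit 1 kuc, bit 0 iec


def push_ku_ie(sr: int) -> int:
    """exception entry: current -> previous, previous -> old, current cleared."""
    return (sr & 0xFFFFFFC0) | ((sr & 0xF) << 2)


def rfe(sr: int) -> int:
    """rfe: previous -> current, old -> previous, old left unchanged."""
    return (sr & 0xFFFFFFF0) | ((sr >> 2) & 0xF)
```

a push followed by rfe restores the original current and previous pairs, but the original old pair is lost on the push, which is why nesting deeper than the stack requires the handler to save the status register itself.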
figure 4.23 common exceptions

cause ← bd || bt || ce || 0^12 || cause[15:8] || 0 || exccode || 0^2
if ((ccc[24] = 1) | (sr[1] = 0)) then
    epc ← pc
endif
if (ccc[24] = 0) then
    sr ← sr[31:2] || 1 || sr[0]
else
    sr ← sr[31:6] || sr[3:0] || 0^2
endif
if (sr[22] = 1) then
    if (ccc[24] = 0) then
        pc ← 0xbfc0 0200 + vector offset
    else
        pc ← 0xbfc0 0100 + vector offset
    endif
else
    pc ← 0x8000 0000 + vector offset
endif

figure 4.24 debug exception

dcs ← dcs[31:6] || t || w || r || da || pc || db
cause ← bd || bt || cause[29:0]
if ((ccc[24] = 1) | (sr[1] = 0)) then
    epc ← pc
endif
if (ccc[24] = 0) then
    sr ← sr[31:2] || 1 || sr[0]
else
    sr ← sr[31:6] || sr[3:0] || 0^2
endif
if (sr[22] = 1) then
    if (ccc[24] = 0) then
        pc ← 0xbfc0 0200 + vector offset
    else
        pc ← 0xbfc0 0100 + vector offset
    endif
else
    pc ← 0x8000 0000 + vector offset
endif

figure 4.25 external vectored interrupt exception

cause ← bd || bt || cause[29:0]
if ((ccc[24] = 1) | (sr[1] = 0)) then
    epc ← pc
endif
if (ccc[24] = 0) then
    sr ← sr[31:2] || 1 || sr[0]
else
    sr ← sr[31:6] || sr[3:0] || 0^2
endif
pc ← exvap[31:2] || 0^2
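the cause register update in figure 4.23 is a bit concatenation, which can be easier to follow as shifts and masks. the sketch below is illustrative only; the function name is hypothetical and the field positions are read directly off the concatenation widths in the figure (bd in bit 31, bt in bit 30, ce in bits [29:28], preserved bits [15:8], exccode in bits [6:2]).

```python
# hypothetical packing of the cause register on a common exception (figure 4.23)


def pack_cause(old_cause: int, bd: int, bt: int, ce: int, exccode: int) -> int:
    """build bd || bt || ce || 12 zero bits || old[15:8] || 0 || exccode || 2 zero bits."""
    return (
        ((bd & 0x1) << 31)
        | ((bt & 0x1) << 30)
        | ((ce & 0x3) << 28)
        | (old_cause & 0x0000FF00)   # bits [15:8] are carried over unchanged
        | ((exccode & 0x1F) << 2)
    )
```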
4.4.2 precision of exceptions

exceptions are logically precise. this means that the instruction that causes an exception and all those that follow it are aborted, generally before committing to any state; execution picks up where it left off before the exception; and the instruction can be re-executed after the exception has been serviced. when following instructions are killed, exceptions associated with those instructions are also killed, so that exceptions are taken in instruction fetch order rather than in the order detected.

interrupts generated by external devices attached to the processor have a variety of meanings, depending on the system environment into which the CW4010 core is designed. variations in memory system design can affect the meaning of bus error exceptions and the location and means of accessing relevant parameters to service them.

as far as possible, this architectural description of the exception handling system details which state information is reliable and which is unreliable. in some cases, however, the characteristics of the pipeline staging cannot guarantee that all states in the processor and associated system will remain completely unchanged. this is because instructions immediately following the instruction that caused the exception may have partially executed. state changes that may occur include the following:

- instructions may be read from memory and loaded into the instruction cache.
- the multiply/divide registers (hi and lo) may have been altered by a mult/multu, div/divu, or mthi/mtlo instruction.

these changes can normally be ignored because the state of the machine is sufficiently restored, allowing execution to resume after the exception has been serviced.

4.4.3 exception vector locations

the cold reset, warm reset, and nmi exceptions are always vectored to location 0xbfc00000.
addresses for other exceptions are the sum of a vector base address and a vector offset. the base address is selected by the bev bit of the status register and by the operating mode. table 4.5 shows the vector base addresses and table 4.6 shows the vector offsets.
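the way a base address from table 4.5 and an offset from table 4.6 combine can be sketched as follows. this is a model for illustration, not a description of the hardware: the function names and the exception "kind" strings are hypothetical, and the r3000-mode bev = 1 base is taken as 0xbfc00100, the value used in the exception processing figures.

```python
# hypothetical vector address calculation from tables 4.5 and 4.6
# key: (bev, r3000_mode) -> vector base address
VECTOR_BASE = {
    (0, False): 0x80000000,
    (0, True): 0x80000000,
    (1, False): 0xBFC00200,
    (1, True): 0xBFC00100,
}


def vector_offset(kind: str, r3000: bool, exl: int = 0, kuseg: bool = False) -> int:
    """offset per table 4.6; any kind other than 'debug' or 'tlb_refill' is 'all others'."""
    if kind == "debug":
        return 0x040
    if kind == "tlb_refill":
        # r4000 mode: special vector only when exl = 0
        # r3000 mode: special vector only for kuseg references
        uses_special_vector = kuseg if r3000 else (exl == 0)
        if uses_special_vector:
            return 0x000
    return 0x080 if r3000 else 0x180


def vector_address(kind: str, bev: int, r3000: bool, exl: int = 0, kuseg: bool = False) -> int:
    return VECTOR_BASE[(bev, r3000)] + vector_offset(kind, r3000, exl, kuseg)
```

a tlb refill that does not qualify for the special vector falls through to the common offset, which matches the nested-refill behavior described for the tlb refill exception.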
table 4.5 exception vector base addresses

bev   r4000 mode (ccc[24] = 0)   r3000 mode (ccc[24] = 1)
0     0x80000000                 0x80000000
1     0xbfc00200                 0xbfc00100

table 4.6 exception vector offset addresses

exception    r4000 mode (ccc[24] = 0)   r3000 mode (ccc[24] = 1)
tlb refill   0x000 (exl = 0)            0x000 (kuseg access)
debug        0x040                      0x040
all others   0x180                      0x080

4.4.4 priority of exceptions

while more than one exception can occur for a single instruction, only one exception is reported. table 4.7 shows the priority order given to the exceptions, with cold reset having the highest priority.

table 4.7 exception priority order

priority (highest first)
cold reset
warm reset
nmi
address error - instruction fetch
tlb refill - instruction fetch
tlb invalid - instruction fetch
bus error
integer overflow, trap, system call, breakpoint, reserved instruction, coprocessor unusable, floating-point error
address error - data access
tlb refill - data access
tlb invalid - data access
tlb modified - data write
interrupt
external vectored interrupt
debug

4.4.5 cold reset exception

the primary purpose of a cold reset is to initialize the CW4010 core at power up. this section describes the cause of and response to a cold reset exception.
4.4.5.1 cause

the cold reset exception occurs when the cresetn signal is asserted and then deasserted. this exception is not maskable.

4.4.5.2 handling

the cpu provides a special interrupt vector (0xbfc00000) for the cold reset exception. the reset vector resides in unmapped and uncached cpu address space, so the hardware need not initialize the tlb or the cache to handle the exception. the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state.

the contents of all registers in the cpu are undefined when the cold reset exception occurs except for the following:

- in the status register, the cu[3:0] and sr bits are cleared to 0 and the erl and bev bits are set to 1. other bits are undefined.
- the random register is initialized to the value of its upper bound.
- the wired register is initialized to 0.

4.4.5.3 servicing

the cold reset exception is serviced by initializing all processor registers, coprocessor registers, caches, and the memory system. servicing is accomplished by performing diagnostic tests, and by bootstrapping the operating system.

4.4.6 warm reset exception

the primary purpose of the warm reset exception is to reinitialize the processor after a fatal error. unlike non-maskable interrupts, all cache and bus state machines are reset by this exception. like cold reset, it can be used on the processor in any state. the caches, tlb, and normal exception vectors need not be properly initialized. this section describes the cause of and response to a warm reset exception.

4.4.6.1 cause

the warm reset exception occurs when the wresetn signal is asserted and then deasserted. this exception is not maskable.
4.4.6.2 handling

the reset exception vector (0xbfc00000) is used for this exception. the vector resides in unmapped and uncached cpu address space, so the hardware need not initialize the tlb or the cache to handle the exception. the sr bit of the status register is set to distinguish between a warm reset exception and a cold reset exception.

the contents of all registers are preserved when the warm reset exception occurs, except for the following:

- the errorpc register, which contains the restart pc (program counter)
- the bev and sr bits of the status register, which are set to 1
- r4000 mode, in which the erl bit is set to 1
- r3000 mode, in which kuo/ieo ← kup/iep ← kuc/iec ← 0/0

because warm reset can abort cache and bus operations, cache and memory state is undefined when the warm reset exception occurs. refer to figure 4.8 for further information.

4.4.6.3 servicing

the warm reset exception is serviced by saving the current processor state for diagnostic purposes, and reinitializing in a manner similar to that for the cold reset exception.

4.4.7 non-maskable interrupt (nmi) exception

non-maskable interrupts cannot be disabled. they occur when a catastrophic event, such as power failure, requires immediate attention to maintain system integrity.

4.4.7.1 cause

the non-maskable interrupt exception occurs in response to the falling edge of the nmi pin. as the name implies, the nmi exception is not maskable, and occurs regardless of the settings of the exl, erl, and ie status register bits.

4.4.7.2 handling

the reset exception vector (0xbfc00000) is also used for this exception. the reset vector resides in unmapped and uncached cpu address
space, so the hardware need not initialize the tlb or the cache to handle the nmi interrupt. the sr bit of the status register is set to differentiate the nmi exception from a cold reset exception.

because an nmi could occur in the middle of another exception, it is generally not possible to continue program execution after servicing an nmi.

unlike cold and warm reset, but in common with other exceptions, nmi is taken only at instruction boundaries. the states of the caches and memory system are preserved by this exception.

the contents of all registers in the cpu are preserved when this exception occurs, except for the following:

- the errorpc register, which contains the restart pc
- the bev and sr bits of the status register, which are set to 1
- r4000 mode, in which the erl bit is set to 1
- r3000 mode, in which kuo/ieo ← kup/iep ← kuc/iec ← 0/0

4.4.7.3 servicing

the nmi exception is serviced by saving the current processor state for diagnostic purposes and reinitializing the system in a manner similar to that for the cold reset exception.

4.4.8 address error exception

this section describes the cause of and response to an address error exception.

4.4.8.1 cause

the address error exception occurs when an attempt is made to:

- load, fetch, or store a word that is not aligned on a word boundary
- load or store a halfword that is not aligned on a halfword boundary
- reference the kernel address space from user mode

the address error exception is not maskable.
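the three address error conditions listed above reduce to an alignment test and a privilege test. the sketch below is illustrative; the function name is hypothetical, and it assumes the conventional mips 32-bit memory map in which user space (kuseg) is the half of the address space with bit 31 clear, so any address with bit 31 set is treated as kernel space.

```python
# hypothetical check for the address error conditions in section 4.4.8.1
KERNEL_SPACE_BIT = 0x80000000  # assumption: bit 31 set means kernel address space


def address_error(vaddr: int, size: int, user_mode: bool) -> bool:
    """size is the access width in bytes: 1 (byte), 2 (halfword), or 4 (word)."""
    misaligned = size in (2, 4) and (vaddr % size) != 0
    kernel_from_user = user_mode and (vaddr & KERNEL_SPACE_BIT) != 0
    return misaligned or kernel_from_user
```

instruction fetches can be modeled as word accesses (size 4) in this sketch.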
4.4.8.2 handling

the common exception vector is used for this exception. the cause register exccode is set based on the type of reference that caused the exception: adel for a data load or instruction fetch, ades for a data store operation.

when the address error exception occurs, the badvaddr register retains the virtual address that was not properly aligned or that referenced protected address space. the contents of the vpn field of the context and entryhi registers are undefined, as are the contents of the entrylo register.

the epc register points at the instruction that caused the exception unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.8.3 servicing

the process executing at the time should be handed a segmentation violation signal. this error is usually fatal to the process incurring the exception.

4.4.9 tlb refill exception

this section describes the cause of and response to a tlb refill exception.

4.4.9.1 cause

the tlb refill exception occurs when there is no tlb entry to match a reference to a mapped address space. this exception is not maskable.

4.4.9.2 handling

a special tlb refill exception vector is used for this exception. the cause register exccode is set based on the type of reference that caused the exception: tlbl for a data load or instruction fetch and tlbs for a data store operation.

when the tlb refill exception occurs, the badvaddr, context, and entryhi registers hold the virtual address that failed translation. the entryhi register also contains the asid from which the translation fault occurred. the random register normally contains a valid location in
which to place the replacement tlb entry. the contents of the entrylo register are undefined.

the epc register points at the instruction that caused the exception unless the instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

r4000 mode: this special exception vector is used when the exception level (at the time of tlb miss detection) is set to normal (exl = 0). if the exception level is exception (exl = 1), the common exception vector is used.

r3000 mode: this special exception vector is used when user or kernel mode references to user memory space (kuseg) do not find a matching entry in the tlb. if the reference is to kernel memory space (kseg0:2), the common exception vector is used.

4.4.9.3 servicing

to service this exception, the contents of the context register are used as a virtual address to fetch memory locations containing the physical page frame and access control bits for a tlb entry. this information is placed in the entryhi and entrylo registers and written into the tlb.

it is possible that the virtual address used to obtain the physical address and access control information is on a page that is not resident in the tlb. in this case, a tlb refill exception is allowed inside the tlb refill handler. while the first exception goes to a special exception vector offset (0x000), the second exception goes to the common exception vector offset (0x180).

the second tlb refill exception obscures the contents of the badvaddr, context, and entryhi registers within the tlb refill handler. as a result, the exact virtual address whose translation caused the first fault is not known unless the tlb refill handler specifically saved this address. it is possible to observe only the failing pte virtual address.
the badvaddr register now contains the original contents of the context register within the tlb refill handler, which is the pte address for the original failing address. the operating system can determine the original virtual page number that caused the fault, but not the complete address. the operating
system uses this information to fetch the pte that contains the physical address and access control information. it also writes the entry into the tlb and returns to the original user program. returning to the tlb refill handler at this point should be avoided.

r4000 mode: when the exl bit is set, it prevents the epc from the first tlb refill exception from being overwritten by the second tlb refill exception. consequently, the appropriate return address can be determined from the values of the current epc and the bd bit of the cause register.

r3000 mode: the tlb refill handler must save the first refill epc and cause[bd] information in a way that allows the second refill to find it. using this saved epc register and cause[bd] information, the appropriate return address can be determined.

4.4.10 tlb invalid exception

this section describes the cause of and response to a tlb invalid exception.

4.4.10.1 cause

the tlb invalid exception occurs when a virtual address reference matches a tlb entry that is marked invalid. this exception is not maskable.

4.4.10.2 handling

the common exception vector is used for this exception. the cause register exccode is set based on the type of reference that caused the exception: tlbl for a data load or instruction fetch, tlbs for a data store operation.

when the tlb invalid exception occurs, the badvaddr, context, and entryhi registers hold the virtual address that failed translation. the entryhi register also contains the asid from which the translation fault occurred. the random register normally contains a valid location in which to place the replacement tlb entry. the contents of the entrylo register are undefined.

the epc register points at the instruction that caused the exception, unless this instruction is in a branch delay slot. if the instruction is in a
branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.10.3 servicing

the valid bit of the tlb entry is typically cleared when:

- a virtual address does not exist
- the virtual address exists, but is not in main memory (a page fault)
- a trap is desired on any reference to the page (for example, to maintain a reference bit)

after servicing the cause of this exception, the tlb entry is located with the tlb probe (tlbp) instruction, and replaced by an entry with the valid bit set.

4.4.11 tlb modified exception

this section describes the cause of and response to a tlb modified exception.

4.4.11.1 cause

the tlb modified exception occurs during a store operation when the virtual address reference to memory matches a tlb entry that is marked valid but is not dirty or writable. this exception is not maskable.

4.4.11.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to mod.

when the tlb modified exception occurs, the badvaddr, context, and entryhi registers hold the virtual address that failed translation. the entryhi register also contains the asid from which the translation fault occurred. the random register normally contains a valid location in which to place the replacement tlb entry. the contents of the entrylo register are undefined.

the epc register points at the instruction that caused the exception, unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.
4.4.11.3 servicing

the kernel uses the failed virtual address and virtual page number to identify the corresponding access control information. the page identified may or may not permit write access. if writes are not permitted, a write protection violation has occurred.

if write access is permitted, the kernel marks the page frame as dirty/writable in the kernel's own data structures. the tlbp instruction is used to place the index of the tlb entry that must be altered in the index register. the entrylo register is loaded with the physical page frame and access control bits (with the d bit set), and the entryhi and entrylo registers are written into the tlb.

4.4.12 bus error exception

this section describes the cause of and response to a bus error exception.

4.4.12.1 cause

the bus error exception occurs when signaled by board-level circuitry for events such as bus time-out, bus parity errors, and invalid physical memory accesses. this exception is not maskable.

in the CW4010, bus errors are asynchronous events with respect to cpu instruction processing (much like the nmi interrupt). this means that there is no attempt to identify the instruction that was the root source of the error.

4.4.12.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to bus.

the epc register points at the first instruction for which processing did not complete unless the instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.12.3 servicing

the physical address at which the fault occurred is not available to the exception handler. the process executing at the time of the exception must be handed a bus error signal, which is usually fatal.
4.4.13 integer overflow exception

this section describes the cause of and response to an integer overflow exception.

4.4.13.1 cause

the integer overflow exception occurs when an add, addi, sub, dadd, daddi, or dsubi instruction results in a two's complement overflow. this exception is not maskable.

4.4.13.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to ov.

the epc register points at the instruction that caused the exception unless the instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.13.3 servicing

the process executing at the time of the exception should be handed an integer overflow signal. this error is usually fatal to the current process.

4.4.14 trap exception

this section describes the cause of and response to a trap exception.

4.4.14.1 cause

the trap exception occurs when a tge, tgeu, tlt, tltu, teq, tne, tgei, tgeui, tlti, tltui, teqi, or tnei instruction results in a true condition. this exception is not maskable.

4.4.14.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to tr.

the epc register points at the instruction that caused the exception unless the instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.
4.4.14.3 servicing

the process executing at the time of the exception should be handed a trap signal. this error is usually fatal.

4.4.15 system call exception

this section describes the cause of and response to a system call exception.

4.4.15.1 cause

the system call exception occurs when an attempt is made to execute the syscall instruction. this exception is not maskable.

4.4.15.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to sys.

the epc register points at the syscall instruction that caused the exception unless this instruction is in a branch delay slot. if the instruction is in the branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set as an indicator.

4.4.15.3 servicing

when this exception occurs, control is transferred to the applicable system routine. to resume execution, the routine must restart instruction execution after the syscall instruction. this restart address can be computed using the epc register along with the bd and bt bits in the cause register.

if (bd = 0) then restart_pc = epc + 4
if ((bd = 1) and (bt = 0)) then restart_pc = epc + 8
if ((bd = 1) and (bt = 1)) then restart_pc = branch target address

it is up to the exception handler to obtain the branch target address from the prior branch when the syscall instruction resides in a branch delay slot.
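the three restart cases above can be written as a single helper. this is a minimal sketch: the function name and the branch_target parameter are hypothetical, and the handler is assumed to have already decoded the preceding branch to obtain the target when bd = 1 and bt = 1.

```python
# hypothetical restart address calculation for syscall-style exceptions
from typing import Optional

MASK32 = 0xFFFFFFFF


def restart_pc(epc: int, bd: int, bt: int, branch_target: Optional[int] = None) -> int:
    """address of the instruction to resume at, skipping the trapping instruction."""
    if bd == 0:
        return (epc + 4) & MASK32      # trapping instruction sits at epc itself
    if bt == 0:
        return (epc + 8) & MASK32      # delay slot of a branch that was not taken
    if branch_target is None:
        raise ValueError("branch taken: the handler must supply the branch target")
    return branch_target & MASK32      # delay slot of a taken branch
```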
4.4.16 breakpoint exception

this section describes the cause of and response to a breakpoint exception.

4.4.16.1 cause

the breakpoint exception occurs when an attempt is made to execute the break instruction. this exception is not maskable.

4.4.16.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to bp.

the epc register points at the break instruction that caused the exception unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set as an indicator.

4.4.16.3 servicing

when the breakpoint exception occurs, control is transferred to the applicable system routine. additional distinctions can be made from the unused bits of the break instruction (bits [25:6]), by loading the contents of the instruction at which the epc register points. (a value of four must be added to the epc register to locate the instruction if it resides in a branch delay slot.)

to resume execution, the routine must start executing the instruction again after the break instruction. the restart address can be computed using the epc register along with the bd and bt bits held in the cause register.

if (bd = 0) then restart_pc = epc + 4
if ((bd = 1) and (bt = 0)) then restart_pc = epc + 8
if ((bd = 1) and (bt = 1)) then restart_pc = branch target address

when the break instruction resides in a branch delay slot, it is up to the exception handler to obtain the branch target address from the prior branch.
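locating the break instruction and pulling out its unused bits, as described in the servicing section above, amounts to one address adjustment and one field extraction. the names below are hypothetical, and how the handler actually reads the instruction word from memory is left out of the sketch.

```python
# hypothetical helpers for decoding a break instruction in the handler


def break_instruction_address(epc: int, bd: int) -> int:
    """the break sits at epc, or at epc + 4 when it is in a branch delay slot."""
    return (epc + 4) & 0xFFFFFFFF if bd else epc


def break_code(instruction_word: int) -> int:
    """extract the 20 unused bits [25:6] of the break instruction."""
    return (instruction_word >> 6) & 0xFFFFF
```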
4.4.17 reserved instruction exception

this section describes the cause of and response to a reserved instruction exception.

4.4.17.1 cause

the reserved instruction exception occurs when an attempt is made to execute an instruction whose major opcode (bits [31:26]) is undefined, or a special instruction whose minor opcode (bits [5:0]) is undefined. this exception also occurs on a regimm instruction whose minor opcode (bits [20:16]) is undefined. this exception is not maskable.

4.4.17.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to ri.

the epc register points at the reserved instruction that caused the exception unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.17.3 servicing

the reserved instruction exception can be used to trap to emulation routines for instructions not supported in the CW4010 instruction set. once emulation has been completed, execution can be resumed using the epc register along with the bd and bt bits in the cause register.

if (bd = 0) then restart_pc = epc + 4
if ((bd = 1) and (bt = 0)) then restart_pc = epc + 8
if ((bd = 1) and (bt = 1)) then restart_pc = branch target address

when the instruction receiving a reserved instruction exception resides in a branch delay slot, it is up to the exception handler to obtain the branch target address from the prior branch.

if there is no emulation routine, the process executing at the time of the exception should be given an illegal instruction signal. this error is usually fatal.
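an emulation routine usually starts by splitting the trapped instruction word into the three opcode fields named in the cause section above. the sketch below shows that decoding; the function names are hypothetical, the sets of defined opcodes are supplied by the caller rather than taken from this manual, and the major opcode values for special (0) and regimm (1) are assumed from the standard mips encoding.

```python
# hypothetical opcode-field decoding for a reserved instruction handler
from typing import Set

SPECIAL = 0  # assumption: standard mips major opcode for special instructions
REGIMM = 1   # assumption: standard mips major opcode for regimm instructions


def major_opcode(word: int) -> int:
    return (word >> 26) & 0x3F   # bits [31:26]


def special_minor(word: int) -> int:
    return word & 0x3F           # bits [5:0]


def regimm_minor(word: int) -> int:
    return (word >> 16) & 0x1F   # bits [20:16]


def is_reserved(word: int, majors: Set[int], specials: Set[int], regimms: Set[int]) -> bool:
    """true when the word falls outside the caller-supplied sets of defined opcodes."""
    major = major_opcode(word)
    if major not in majors:
        return True
    if major == SPECIAL:
        return special_minor(word) not in specials
    if major == REGIMM:
        return regimm_minor(word) not in regimms
    return False
```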
4.4.18 floating-point exception

this section describes the cause of and response to a floating-point exception.

4.4.18.1 cause

the floating-point exception is used by the floating-point coprocessor (if installed). the contents of the floating-point control status register (inside cp1) indicate the cause of the exception.

4.4.18.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to fpe.

the epc register points at the first instruction for which processing was not completed unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.18.3 servicing

this exception is cleared by clearing the appropriate bit in the floating-point control status register. for an unimplemented instruction exception, the kernel should emulate the instruction. for other exceptions, the kernel should pass the exception to the user process that caused the exception.

4.4.19 coprocessor unusable exception

this section describes the cause of and response to a coprocessor unusable exception.

4.4.19.1 cause

the coprocessor unusable exception occurs when an attempt is made to execute a coprocessor instruction for either a corresponding coprocessor unit that has not been marked usable, or for cp0 instructions, when the unit has not been marked usable and the process is executing in user mode. this exception is not maskable.
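the cause condition above distinguishes cp0 from the other coprocessor units. a minimal sketch of that rule follows; the function name is hypothetical, cu is taken to be the 4-bit cu[3:0] field of the status register with bit n marking coprocessor n usable, and the sketch does not model the rotate and cmask registers, which sections 4.3.16 and 4.3.17 describe as always accessible from user mode.

```python
# hypothetical test for the coprocessor unusable condition (section 4.4.19.1)


def coprocessor_unusable(cu: int, unit: int, user_mode: bool) -> bool:
    """cu is status cu[3:0]; unit is the coprocessor number 0..3."""
    marked_usable = ((cu >> unit) & 1) == 1
    if unit == 0:
        # cp0 instructions trap only when cu0 is clear and the process is in user mode
        return (not marked_usable) and user_mode
    return not marked_usable
```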
4.4.19.2 handling

the common exception vector is used for this exception. the exccode field in the cause register is set to cpu. the contents of the ce field in the cause register indicate the coprocessor to which an attempted reference has been made.

the epc register points at the instruction that caused the exception unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set.

4.4.19.3 servicing

the coprocessor unit to which an attempted reference was made is identified by the ce field of the cause register. the result is one of the following:

- if the process is entitled to access, the coprocessor is marked usable and the corresponding user state is restored.
- if the process is entitled to access the coprocessor, but the coprocessor does not exist or has failed, interpretation of the coprocessor instruction is possible.
- if the process is not entitled to access the coprocessor, the process executing at the time should be given some sort of illegal/privileged instruction signal. this error is usually fatal.

4.4.20 debug exception

this section describes the cause of and response to a debug exception.

4.4.20.1 cause

the debug exception occurs when a debug condition (read/write access at breakpoint data address, read access at breakpoint program counter, trace) is detected by the cp0. the debug control and status (dcs) register specifies which event was detected.

r4000 mode: in r4000 mode, the debug exception can be masked by setting the exl bit in the status register. when this bit is set, a debug event does not cause an exception trap even if the dcs[te] bit is set to 1. however, the status bits of the dcs register are updated to indicate that an event was recognized.
exception description details 4-47 r3000 mode C in r3000 mode, the debug exception is not maskable. 4.4.20.2 handling the debug exception vector handles this exception. 4.4.20.3 servicing the debug exception is a debugging aid. typically the exception handler transfers control to a debugger, allowing you to examine the situation. the debug exception condition must be disabled to execute the failing instruction and then re-enabled. notes: 1. the trace status bit (dcs5) is set whenever a branch instruction is encountered regardless of whether the branch is actually taken. however, if the debug exception trap is enabled (dcs31 = 1), an exception is recognized only if the branch is taken and the target instruction executed. 2. the program counter debug status bit (dcs1) is set whenever the target address of a branch falls within the specified pc address range (bpc, bpcm) regardless of whether the branch is actually taken. however, if the debug exception trap is enabled (dcs31 = 1), an exception is recognized only if the branch is taken and the target instruction executed. 4.4.21 interrupt exception this section describes the cause of and response to an interrupt exception. 4.4.21.1 cause the interrupt exception occurs when one of the eight interrupt conditions is asserted. the signi?cance of these interrupts depends on the speci?c system implementation. each of the eight interrupts can be masked by clearing the corresponding bit in the int-mask ?eld of the status register. all eight interrupts can be masked at once by clearing the ie bit of the status register. 4.4.21.2 handling the common exception vector is used for this exception. the exccode ?eld in the cause register is set to int.
the ip field of the cause register indicates the current interrupt requests. it is possible that more than one of the bits will be set at the same time, or that no bits will be set if an interrupt is asserted and then deasserted before the cause register is read.

the epc register points at the first instruction for which processing was not completed unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set as an indicator.

4.4.21.3 servicing

if the interrupt is caused by one of the two software generated exceptions, the interrupt condition is cleared by setting the corresponding cause register bit to 0. if the interrupt is hardware generated, the interrupt condition is cleared by correcting the condition causing the interrupt pin to be asserted.

4.4.22 external vectored interrupt exception

the CW4010 implements an external vectored interrupt interface, which consists of an interrupt input (exvintn), interrupt vector virtual address input (exvap[31:2]), and interrupt accepted output (exvaen). the signals must be asserted and deasserted on the rising edge of the system clock. this interrupt class can be enabled or disabled using the evi bit in the ccc register (enabled when ccc[24] = 1).

this section describes the cause of and response to an external vectored interrupt exception.

4.4.22.1 cause

an external vectored interrupt occurs when exvintn is asserted. the significance of this interrupt depends on the specific system implementation. the interrupt can be masked by clearing the ie (r3000 = iec) bit of the status register.

4.4.22.2 handling

the virtual address specified by the exvap[31:2] interface is used to specify the target exception handling routine. the exvap[31:2] address must be provided by a user-defined interrupt controller. the exvintn and exvap[31:2] inputs must be held stable and valid until the exception is accepted. this is indicated by the assertion of the exvaen output for one cycle.

the epc register points at the first instruction for which processing was not completed unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set as an indicator.

4.4.22.3 servicing

the interrupt condition can be cleared in the user-defined interrupt controller in one of two ways: by detecting the assertion of the interrupt accepted output (exvaen), or by correcting the condition causing the interrupt pin (exvintn) to be asserted.
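the epc and bd rule that recurs in the handling subsections above can be sketched as a small behavioral model. this is an illustration only, not lsi logic code: the function names are hypothetical, and the 4-byte instruction size is an assumption based on the fixed 32-bit instruction word.

```python
INSTRUCTION_BYTES = 4  # assumption: every instruction is one 32-bit word


def exception_epc(pc, in_branch_delay_slot):
    """Return (epc, bd) for an exception raised by the instruction at pc.

    If the instruction sits in a branch delay slot, epc points at the
    preceding branch and bd is set; otherwise epc points at the instruction.
    """
    if in_branch_delay_slot:
        return (pc - INSTRUCTION_BYTES) & 0xFFFFFFFF, True
    return pc & 0xFFFFFFFF, False


def faulting_instruction_address(epc, bd):
    """Recover the address of the instruction that was not completed."""
    if bd:
        return (epc + INSTRUCTION_BYTES) & 0xFFFFFFFF
    return epc & 0xFFFFFFFF
```

a handler that only inspects the failing instruction can use the second function; restarting execution still has to resume at epc so that the branch is re-executed together with its delay slot.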
chapter 5 CW4010 memory management

this chapter describes the system coprocessor (coprocessor-0) and memory management. it contains the following sections:

- section 5.1, tlb physical organization
- section 5.2, memory management system
- section 5.3, virtual memory and the tlb

5.1 tlb physical organization

the physical implementation of the tlb consists of two main parts:

1. a two-entry instruction tlb (itlb).
2. a 64-entry joint tlb (jtlb) that holds both instruction fetch and data access page translations.

the cp0 can receive virtual address translation requests from both the isu (instruction fetch unit) and the lsu (operand data access) during the same cycle. for maximum performance, address translations must occur in parallel. the two-piece tlb structure shown in figure 5.1 addresses this problem by creating a separate two-entry tlb to be used for instruction fetch translations. with this structure, isu and lsu fetches can be independently processed.
figure 5.1 tlb block diagram
(block diagram: inside the mmu, the isu presents vadr[31:12] to the two-entry itlb and the lsu presents vadr[31:12] to the 64-entry jtlb; each returns padr[31:12]. on an itlb miss the pipeline is stalled and the tlb entry is supplied from the jtlb. vadr = virtual address, padr = physical address.)

the itlb holds the two most recently used instruction fetch page translations. if a valid translation cannot be found in the itlb, the cp0 must stall the pipeline for two cycles and search the jtlb for a valid entry. if the cp0 finds a valid entry in the jtlb, it copies it into the less recently used itlb entry and processing continues. if a valid entry cannot be found, a tlb exception must be posted (see chapter 4, CW4010 exception processing, for details).

the entries in the itlb are purged when the entryhi register is written (for example, during a task switch). consequently, the itlb does not need to keep an eight-bit asid for each entry. this reduces storage and match circuitry. this simplification should cause little or no performance penalty, because the entries probably need to be replaced anyway.

when no tlb is present in the system, the te field of the configuration and cache control (ccc) register is cleared to 0. this is transparent to the other modules in the CW4010 core. the cp0 modifies its translation behavior in the following manner:

physical address[31:12] = virtual address[31:12]
(for kseg0 and kseg1, physical address[31:29] = 0); the same is true with tlb present

the caching algorithm used for each access is based on the address segment being accessed (kuseg, kseg0, and kseg2 = cached; kseg1 = uncached), and the ccc register fields (ie0, ie1, de0, de1, and wb). table 5.1 shows the caching algorithm criteria.

table 5.1 caching algorithm criteria

address segment      icache enabled  ifetch cache algorithm  dcache enabled  wb  dcache algorithm
kuseg, kseg0, kseg2  0               uncached                0               x   uncached
                     1               cached                  1               0   cached, writethrough
                                                             1               1   cached, writeback
kseg1                x               uncached                x               x   uncached

5.2 memory management system

the memory model used for the CW4010 processor is based on the r3000. to extend the cpu's address space, the virtual memory translates addresses composed in a large virtual address space into the physical memory system. the CW4010 physical address space is 4 gbytes and uses a 32-bit address. the virtual address is also 32 bits wide, and the maximum user process size is 2 gbytes (2^31 bytes).

the virtual address is extended with an address space identifier (asid) to reduce the frequency of translation lookaside buffer (tlb) flushing when switching context. the size of the asid is 8 bits. the asid is contained in the cp0 entryhi register and is described in the subsection entitled entryhi register (10).

5.2.1 operating modes

this section describes the two modes for 32-bit CW4010 operation:

- user mode, where non-supervisory programs are executed
- kernel mode, which is analogous to the supervisory mode provided by many machines

the CW4010 usually operates in user mode until an exception forces it into kernel mode. it remains in kernel mode until a restore from
exception instruction (r3000 mode), or exception return (r4000 mode) instruction is executed to restore the processor to the mode existing prior to the exception.

address mapping is different for kernel and user modes. to simplify the management of user state from within the kernel, the user-mode address space is a subset of the kernel-mode address space. figure 5.2 shows the virtual-to-physical memory map for both the user mode and kernel mode segments.

figure 5.2 CW4010 virtual memory map

virtual address range      segment  attributes                 physical address range
0xc000 0000 - 0xffff ffff  kseg2    kernel mapped cacheable    any
0xa000 0000 - 0xbfff ffff  kseg1    kernel unmapped uncached   0x0000 0000 - 0x1fff ffff (512 mbytes)
0x8000 0000 - 0x9fff ffff  kseg0    kernel unmapped cached     0x0000 0000 - 0x1fff ffff (512 mbytes)
0x0000 0000 - 0x7fff ffff  kuseg    user mapped cacheable      any

physical memory spans 0x0000 0000 - 0xffff ffff (4 gbytes).

5.2.2 user mode virtual addressing

in user mode, a single, uniform virtual address space (kuseg) of 2 gbytes (2^31 bytes) is available. the user segment starts at address 0x0000 0000, and all valid accesses have the most-significant bit cleared to zero. referencing an address with the most significant bit set while in user mode causes an address error exception. the tlb maps all references to kuseg identically for either mode, and controls cache accessibility. kuseg is typically used to hold user code and data, as well
as the current user process. the definition of the user and kernel mode processor states can be found in section 4.3.6, status register (12).

5.2.3 kernel mode virtual addressing

the virtual address space is divided into regions, differentiated by the high-order bits of the address, as shown in figure 5.2 and as listed below.

kuseg  starts at virtual address 0x00000000 and is 2 gbytes long. it allows selective caching and mapping on a per-page basis, rather than requiring an all or nothing approach. this segment overlaps kernel memory accesses with user memory accesses as described above.

kseg0  starts at virtual address 0x80000000 and is 512 mbytes long. the CW4010 direct maps references within kseg0 onto the first 512 mbytes of physical memory. these references use cache memory, but do not use the tlb for address translation. thus, kseg0 is typically used for kernel executable code and some kernel data.

kseg1  starts at virtual address 0xa0000000 and is 512 mbytes long. the CW4010 direct maps references within kseg1 onto the first 512 mbytes of physical memory. these references do not use cache memory or the tlb for address translation. thus, kseg1 is typically used by operating systems for i/o registers, rom code and disk buffers.

kseg2  starts at virtual address 0xc0000000 and is 1024 mbytes long. like kuseg, it uses tlb entries to map virtual addresses to arbitrary physical ones, with or without caching. an operating system typically uses kseg2 for stacks and per-process data that must remap on context switches. the operating system also uses kseg2 for user page tables and some dynamically allocated data areas.

5.3 virtual memory and the tlb

mapped virtual addresses are translated into physical addresses using an on-chip translation lookaside buffer (tlb). the tlb is a fully-associative memory that holds 64 entries that provide mapping to 64 physical page frames. the address range mapped by a page can be either 4 kbytes or 16 mbytes in size.

when address mapping is indicated, each tlb entry is simultaneously matched against the virtual address extended by the current asid stored in the entryhi register. if there is a match (hit), the physical page number is extracted from the tlb and concatenated with the offset to form the physical address, as shown in figure 5.3.
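the address handling described in sections 5.2.3 and 5.3 can be sketched as a small model: the segment decides whether an address is direct mapped or goes through the tlb, and a mapped address is split into a virtual page number and an offset that is later concatenated with the page frame number. all function names are hypothetical, and the sketch assumes that the 20-bit pfn holds physical address bits [31:12], of which only the top eight are used for a 16-mbyte page.

```python
KSEG0_BASE = 0x80000000
KSEG1_BASE = 0xA0000000
KSEG2_BASE = 0xC0000000


def classify_segment(vaddr):
    """Name the segment that a 32-bit virtual address falls in."""
    vaddr &= 0xFFFFFFFF
    if vaddr < KSEG0_BASE:
        return "kuseg"
    if vaddr < KSEG1_BASE:
        return "kseg0"
    if vaddr < KSEG2_BASE:
        return "kseg1"
    return "kseg2"


def uses_tlb(vaddr):
    """kuseg and kseg2 are mapped; kseg0 and kseg1 bypass the tlb."""
    return classify_segment(vaddr) in ("kuseg", "kseg2")


def direct_mapped_physical(vaddr):
    """kseg0/kseg1 map onto the first 512 mbytes: address bits [31:29] become 0."""
    if uses_tlb(vaddr):
        raise ValueError("mapped segment: translation needs a tlb entry")
    return vaddr & 0x1FFFFFFF


def split_virtual_address(vaddr, large_page):
    """Return (vpn, offset) for a 4-kbyte page or, if large_page, a 16-mbyte page."""
    offset_bits = 24 if large_page else 12
    vaddr &= 0xFFFFFFFF
    return vaddr >> offset_bits, vaddr & ((1 << offset_bits) - 1)


def form_physical_address(pfn, vaddr, large_page):
    """Concatenate the page frame number with the untranslated offset."""
    if large_page:
        return ((pfn >> 12) << 24) | (vaddr & 0xFFFFFF)
    return ((pfn & 0xFFFFF) << 12) | (vaddr & 0xFFF)
```

for example, 0xbfc00000 lies in kseg1 and is direct mapped to physical address 0x1fc00000 without consulting the tlb.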
figure 5.3 CW4010 virtual address format
(virtual address with 4-kbyte page size: asid [39:32], 8 bits; vpn [31:12], 20 bits; offset [11:0], 12 bits. virtual address with 16-mbyte page size: asid [39:32], 8 bits; vpn [31:24], 8 bits; offset [23:0], 24 bits. bits 31:29 select user or kernel address spaces. the vpn passes through virtual to physical translation and the offset is passed unchanged to form the 32-bit physical address.)

if no match occurs (a page miss), an exception is taken. typically, software refills the tlb from a page table maintained by the system. software can write over a selected tlb entry or use a hardware mechanism to write into a random location.

the CW4010 does not support the tlb-shutdown (ts) bit in the status register, which indicates that more than one entry in the tlb matches the virtual address being translated. if more than one tlb entry matches the virtual address, the virtual address may be translated to an incorrect physical address. system software must ensure that this situation is never created.
5.3.1 tlb entry format

figure 5.4 shows the 32-bit addressing tlb entry format the CW4010 uses. each field of an entry has a corresponding field in the entryhi, entrylo, or pagemask registers described in sections 5.3.2.1, 5.3.2.2, and 5.3.2.3.

figure 5.4 format of CW4010 tlb entry
(bit layout, high to low: r [95:78], m 77, r [76:64], vpn [63:44], r [43:40], asid [39:32], r [31:26], pfn [25:6], c [5:3], d 2, v 1, g 0)

r     reserved [95:78, 76:64, 43:40, 31:26]
      these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

m     mask 77
      this bit is the page mask bit. it is set to 1 for a 16m page and cleared to 0 for a 4k page.

vpn   virtual page number [63:44]
      this field contains the virtual page number.

asid  address space id field [39:32]
      this field contains the address space id.

pfn   page frame number [25:6]
      this field contains the page frame number. this is the upper bits of the physical address.

c     cache [5:3]
      this field contains the cache algorithm, which specifies whether references to the page should be cached. if the references are to be cached, you can select one of two algorithms: writeback or writethrough. table 5.2 shows how the cache bits are decoded.

d     dirty 2
      if this bit is set to 1, it indicates that the page marked is dirty and writable.

v     valid 1
      if this bit is set to 1, it indicates that the tlb entry is valid.

g     global 0
      if this bit is set to 1, you can ignore the contents of the asid field during tlb lookup.

table 5.2 cache algorithm bit values

c bit settings  value  algorithm
0 0 0           0      reserved
0 0 1           1      reserved
0 1 0           2      uncached
0 1 1           3      cacheable, writethrough
1 0 0           4      reserved
1 0 1           5      reserved
1 1 0           6      reserved
1 1 1           7      cacheable, writeback
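the decoding in table 5.2 is small enough to express directly. the following is a minimal sketch; the function names and the returned strings are illustrative choices, not part of the CW4010 documentation.

```python
# Values of the 3-bit c field that table 5.2 defines; all others are reserved.
CACHE_ALGORITHMS = {
    0b010: "uncached",
    0b011: "cacheable, writethrough",
    0b111: "cacheable, writeback",
}


def decode_cache_algorithm(c_bits):
    """Map the 3-bit c field of a tlb entry to its algorithm name."""
    if not 0 <= c_bits <= 7:
        raise ValueError("the c field is three bits wide")
    return CACHE_ALGORITHMS.get(c_bits, "reserved")


def bypasses_cache(c_bits):
    """True when the page is accessed in main memory directly (c = 010)."""
    return c_bits == 0b010
```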
5.3.2 tlb support registers

the registers described in the following subsections are used in association with the cp0 tlb.

- entryhi register (10)
- entrylo register (2)
- pagemask register (5)
- index register (0)
- random register (1)
- wired register (6)

5.3.2.1 entryhi register (10)

the entryhi register is a read/write register used to access the tlb. in addition, this register contains the current asid value for the processor. the asid value is used to match the virtual address with a tlb entry during virtual address translation. typically, the operating system assigns a unique asid value to each known process. in this way, mappings held in the tlb are made unique to the process whose asid they match.

the entryhi register holds the high-order bits of a tlb entry when performing tlb read and write operations. when a tlb refill, tlb invalid, or tlb modified exception occurs, the entryhi register is loaded with the virtual page number (vpn) and the asid of the virtual address that failed to have a matching tlb entry. entryhi is accessed by the tlbp, tlbwr, tlbwi, and tlbr instructions. figure 5.5 shows the format of this register.

figure 5.5 entryhi register
(bit layout, high to low: vpn [31:12], r [11:8], asid [7:0])

vpn   virtual page number [31:12]
      this field contains the virtual page number.

r     reserved [11:8]
      these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

asid  address space id [7:0]
      this field contains the address space id.

5.3.2.2 entrylo register (2)

the entrylo register is a read/write register used to access the tlb. when performing read and write operations, the register contains a physical page frame number, cache algorithm, page dirty, translation valid, and global entry information. figure 5.6 shows the format of this register.

figure 5.6 entrylo register
(bit layout, high to low: r [31:26], pfn [25:6], c [5:3], d 2, v 1, g 0)

r     reserved [31:26]
      these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

pfn   physical page frame number [25:6]
      this field contains the physical page frame number.

c     cache [5:3]
      this field contains the cache algorithm, which specifies whether references to the page should be cached. if the references are to be cached, you can select one of two algorithms: writeback or writethrough. table 5.2 shows how the cache bits are decoded.

d     dirty 2
      if this bit is set to 1, it indicates that the page marked is dirty and writable.

v     valid 1
      if this bit is set to 1, it indicates that the tlb entry is valid.

g     global 0
      if this bit is set to 1, you can ignore the contents of the asid field during tlb lookup. mapping is globally available to all asids.

5.3.2.3 pagemask register (5)

the pagemask register is a read/write register used to access the tlb. it implements a variable page size by holding a per-entry comparison mask. when virtual addresses are presented for translation, the corresponding pagemask bit in the tlb specifies whether or not virtual address bits [23:12] participate in the comparison. figure 5.7 shows the format of the pagemask register.

figure 5.7 pagemask register
(bit layout, high to low: r [31:14], m 13, r [12:0])

r     reserved [31:14, 12:0]
      these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

m     mask 13
      this field contains the pagemask. if it is set to 1, the page size is 16 mbytes, and physical address [31:0] is formed from pfn[31:24] and virtual address [23:0]. if the bit is cleared to 0, the page size is 4 kbytes, and physical address [31:0] is formed from pfn[31:12] and virtual address [11:0].
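the register layouts in figures 5.5 through 5.7 translate directly into shift-and-mask helpers. the sketch below is illustrative only: the function names and the dictionary return type are hypothetical, and reserved bits are always written as 0, as the field descriptions recommend.

```python
def pack_entryhi(vpn, asid):
    """entryhi: vpn in [31:12], reserved [11:8] written as 0, asid in [7:0]."""
    return ((vpn & 0xFFFFF) << 12) | (asid & 0xFF)


def unpack_entryhi(value):
    return {"vpn": (value >> 12) & 0xFFFFF, "asid": value & 0xFF}


def pack_entrylo(pfn, c, d, v, g):
    """entrylo: pfn in [25:6], c in [5:3], d in bit 2, v in bit 1, g in bit 0."""
    return (((pfn & 0xFFFFF) << 6) | ((c & 0x7) << 3)
            | ((d & 1) << 2) | ((v & 1) << 1) | (g & 1))


def unpack_entrylo(value):
    return {
        "pfn": (value >> 6) & 0xFFFFF,
        "c": (value >> 3) & 0x7,
        "d": (value >> 2) & 1,
        "v": (value >> 1) & 1,
        "g": value & 1,
    }


def pack_pagemask(large_page):
    """pagemask: m is bit 13; 1 selects a 16-mbyte page, 0 a 4-kbyte page."""
    return (1 << 13) if large_page else 0
```

these values are what software would move into the cp0 registers with mtc0 before issuing tlbwi or tlbwr.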
5.3.2.4 index register (0)

the index register is a 32-bit, read/write register containing six bits that are used to index an entry in the tlb. the high-order bit indicates the success or failure of a tlb probe (tlbp) instruction.

the index register also specifies the tlb entry that is affected by the tlb read (tlbr) and tlb write index (tlbwi) instructions. figure 5.8 shows the format of the index register.

figure 5.8 index register
(bit layout, high to low: p 31, r [30:6], index [5:0])

p      probe 31
       if this bit is set to 1, it indicates that the last tlbp instruction failed to find a match.

r      reserved [30:6]
       these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

index  index [5:0]
       this field contains the index to the tlb entry. the tlbr and tlbwi instructions use this index.

5.3.2.5 random register (1)

the random register is a 32-bit read-only register that contains six bits that are used to index an entry in the tlb. the register decrements for each clock cycle. the values range between a lower bound set by the number of tlb entries reserved for exclusive use by the operating system (defined in the wired register), and an upper bound set by the total number of tlb entries (64 maximum).

the random register specifies the entry in the tlb affected by the tlb write random (tlbwr) instruction. the register does not need to be read for this purpose, but the register can be read to verify proper operation.

to simplify testing, the random register is set to the value of the upper bound when the system is reset. it is also set to its upper bound when the wired register is written. the format of this register is shown in figure 5.9.

figure 5.9 random register
(bit layout, high to low: r [31:6], random [5:0])

r       reserved [31:6]
        these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

random  random [5:0]
        this field contains the index to the tlb entry affected by the tlbwr instruction.

5.3.2.6 wired register (6)

the wired register is a read/write register that specifies the boundary between the wired entries (fixed, non-replaceable entries that cannot be overwritten by a tlbwr operation) and the random entries of the tlb. figure 5.10 shows the location of the wired register.

figure 5.10 wired register location
(diagram: tlb entries numbered 0 through 63; the wired register marks the boundary between the range of wired entries, which starts at entry 0, and the range of random entries, which ends at entry 63.)

when the system is reset, the wired register is set to zero. writing the register also sets the random register to the value of its upper bound. figure 5.11 shows the format of the wired register.

figure 5.11 wired register
(bit layout, high to low: r [31:6], wired [5:0])

r      reserved [31:6]
       these bits are not used and are read as 0. the CW4010 ignores attempts to set these bits; however, software should write these bits as 0 to ensure compatibility with future versions of the software.

wired  wired [5:0]
       this field defines the lower boundary of random tlb entries.
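the interaction between the random and wired registers can be modeled in a few lines. this is a behavioral sketch under two assumptions that the text above does not spell out: the upper bound is taken to be entry index 63, and the random register is assumed to wrap back to the upper bound after it reaches the wired boundary. the class and method names are hypothetical.

```python
TLB_ENTRIES = 64


class RandomWired:
    """Behavioral model of the random and wired registers."""

    def __init__(self):
        # Reset state: wired is zero and random starts at its upper bound.
        self.wired = 0
        self.random = TLB_ENTRIES - 1

    def write_wired(self, value):
        """Writing wired also sets random to its upper bound."""
        self.wired = value & 0x3F
        self.random = TLB_ENTRIES - 1

    def tick(self):
        """Advance one clock cycle: decrement, wrapping at the wired boundary."""
        if self.random <= self.wired:
            self.random = TLB_ENTRIES - 1  # assumption: wrap to the upper bound
        else:
            self.random -= 1

    def run(self, cycles):
        """Advance several cycles and return the resulting random value."""
        for _ in range(cycles):
            self.tick()
        return self.random
```

with this model a tlbwr can never select an entry below the wired boundary, which is the property the wired register exists to guarantee.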
5.3.3 virtual address translation

during virtual-to-physical address translation, the cpu compares the asid and, depending upon the page size, the highest 7 to 20 bits of the virtual address to the contents of the tlb. figure 5.12 illustrates the tlb address translation process.

figure 5.12 CW4010 tlb address translation process
(flowchart, summarized from its decision labels)

1. input virtual address.
2. user mode? if yes and msb = 1, an address error exception is indicated.
3. vpn match? if no, a tlb refill exception is indicated.
4. g = 1? if no, asid match? if no, a tlb refill exception is indicated.
5. v = 1? if no, a tlb invalid exception is indicated.
6. write? if yes, d = 1? if no, a tlb mod exception is indicated.
7. c = 010? if yes, access main memory; if no, access cache.
8. output physical address.

g, v, d, and c are bits in the tlb entry.
a virtual address matches a tlb entry when the vpn field of the virtual address equals the vpn field of the entry, and either of the following is true:

- the g bit of the tlb entry is set
- the asid held in the entryhi register matches the asid field in the tlb entry

although the v bit of the tlb entry must be set for a valid translation to take place, it is not involved in the determination of a matching tlb entry.

if a tlb entry matches, the physical address and access control bits (c, d, and v) are retrieved from the entry. if no match is found, a tlb miss exception occurs. if the access control bits (d and v) indicate that the access is not valid, a tlb modification or tlb invalid exception occurs, respectively. if the c bits equal 010b, the physical address that is retrieved is used to access main memory, bypassing the cache.

5.3.4 tlb instructions

the instructions that the CW4010 provides for working with the tlb are listed in table 5.3.

table 5.3 tlb instructions

instruction: translation lookaside buffer probe (tlbp)
description: the index register is loaded with the address of the tlb entry whose contents match the contents of the entryhi register. if no tlb entry matches, the highest order bit of the index register is set. results are undefined if a tlb reference encounters more than one matching tlb entry.

instruction: translation lookaside buffer read (tlbr)
description: this instruction loads the entryhi, entrylo, and pagemask registers with the contents of the tlb entry specified by the index register.

instruction: translation lookaside buffer write index (tlbwi)
description: this instruction loads the tlb entry specified by the index register with the contents of the entryhi, entrylo, and pagemask registers.

instruction: translation lookaside buffer write random (tlbwr)
description: this instruction loads the tlb entry specified by the random register with the contents of the entryhi, entrylo, and pagemask registers.
notes:

1. if the tlb is not present or not enabled in the system, the cp0 register reflects a coprocessor unusable exception if an attempt is made to execute any of the tlb instructions.

2. tlb instructions (tlbp, tlbr, tlbwi, and tlbwr) cannot be immediately preceded or followed by a data load instruction that requires target address translation (that is, a load from kuseg or kseg2).

3. the instruction prior to a tlbw instruction must not generate an exception. it is recommended that you use a nop to make sure this restriction is met.

4. three instructions are needed between an mtc0 (to entryhi, entrylo, pagemask, or index) and a subsequent tlbwi or tlbwr instruction for the result of the mtc0 operation to take effect.
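the matching rule and the exception checks of section 5.3.3 can be put together in one lookup routine. this is a software sketch of the decision order, not a description of the hardware, which matches all entries in parallel. the TlbEntry class, the TlbFault exception, and the function names are hypothetical; the vpn and pfn fields are assumed to hold address bits [31:12], with only the top eight bits used for 16-mbyte pages.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int             # virtual address bits [31:12]
    asid: int
    pfn: int             # physical address bits [31:12]
    c: int               # cache algorithm, see table 5.2
    d: bool              # dirty (writable)
    v: bool              # valid
    g: bool              # global: ignore the asid
    large: bool = False  # True for a 16-mbyte page


class TlbFault(Exception):
    """Raised with the name of the exception the lookup would signal."""


def entry_matches(entry, vaddr, asid):
    """vpn must be equal, and either g is set or the asid matches."""
    shift = 24 if entry.large else 12
    vpn_equal = (vaddr >> shift) == (entry.vpn >> (shift - 12))
    return vpn_equal and (entry.g or entry.asid == asid)


def translate(entries, vaddr, asid, is_write):
    """Return (physical address, cached) or raise TlbFault."""
    for entry in entries:
        if not entry_matches(entry, vaddr, asid):
            continue
        if not entry.v:
            raise TlbFault("tlb invalid")
        if is_write and not entry.d:
            raise TlbFault("tlb modified")
        if entry.large:
            paddr = ((entry.pfn >> 12) << 24) | (vaddr & 0xFFFFFF)
        else:
            paddr = (entry.pfn << 12) | (vaddr & 0xFFF)
        return paddr, entry.c != 0b010  # c = 010 bypasses the cache
    raise TlbFault("tlb refill")


def fault_kind(entries, vaddr, asid, is_write):
    """Convenience wrapper: name of the fault, or None on success."""
    try:
        translate(entries, vaddr, asid, is_write)
    except TlbFault as fault:
        return str(fault)
    return None
```

note that the v bit is checked only after a match has been found, which mirrors the statement that v does not take part in determining a matching entry.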
chapter 6 CW4010 caches

this chapter describes the CW4010 cache and cache maintenance. it contains the following sections:

- section 6.1, cache memory organization
- section 6.2, cache states
- section 6.3, address and cache tag
- section 6.4, dcache scratch pad ram mode
- section 6.5, external invalidation
- section 6.6, cache instructions

6.1 cache memory organization

the CW4010 has separate caches for instructions and data. these are the icache and dcache, respectively. cache maintenance features include:

- external invalidation for snooping
- initialization
- writeback to dcache
- testing

the CW4010 icache and dcache are organized as follows:

1. the icache and dcache can be organized as direct-mapped or two-way set associative caches. a least recently used (lru) algorithm is used for two-way set associative cache replacement for the icache. the dcache is replaced randomly.

2. the cache controllers support configurations of 1, 2, 4 or 8 kbyte for each set. thus, the smallest supported configuration is a 1 kbyte direct-mapped cache, and the largest is a 16 kbyte two-way set associative cache, with 8 kbyte per set.

3. the caches are indexed with a virtual address.

4. they are tagged with a physical address tag.

5. one cache line is 8 words. a single word consists of four 8-bit bytes. the cache line consists of four doublewords (32 bytes or 256 bits). refill address ordering is wrap-around from the missing address.

6. you can select between writeback and writethrough modes. if the system has no memory management unit (mmu), the wb bit in the ccc register defines the mode for all cacheable regions of memory. when the wb bit is set to 0, the mode is writethrough. when it is set to 1, the mode is writeback. if the system has an mmu, the translation lookaside buffer (tlb) entry determines the mode on a per-page basis.

7. scratch pad ram mode is available, and works in a similar way to the scratch pad ram in the lr33300. this is discussed in more detail in section 6.4, dcache scratch pad ram mode.

6.2 cache states

this section describes cache states for the icache, writethrough dcache, and writeback dcache.

6.2.1 icache and writethrough dcache

when the icache and dcache are operating in writethrough mode, only two states are used: invalid and valid clean. initialization sets all cache lines to the invalid state. this is done using the cache invalidate mode described in section 6.6.3, cache maintenance by ccc register, or the cache flush instructions described in section 6.6.1, flush (all cache invalidation).

the first time a cache line is refilled because of a cache miss, its state goes from invalid to valid clean. the cache line remains in the valid clean state unless and until it is forced back to invalid. this occurs in one of the following events:

- an external invalidate
- another cache instruction
- modification of the ccc register settings (see section 4.3.10, configuration and cache control (ccc) register (16))
the v bit of each cache line indicates the state. v = 0 is invalid and v = 1 is valid clean. figure 6.1 shows the state diagram for icache and writethrough dcache.

figure 6.1 cache state diagram: icache and writethrough dcache
(state diagram with two states, invalid and valid clean. transition labels recoverable from the figure: load-miss, then refill; load-hit; store-hit; store-miss; invalidation.)

6.2.2 writeback dcache

when the dcache operates in writeback mode, three cache line states are required: invalid, valid clean, and valid dirty. figure 6.2 shows the state diagram for writeback dcache. the v bit and wb bit of each line indicate the state, as shown in table 6.1.

figure 6.2 cache state diagram: dcache writeback
(state diagram with three states, invalid, valid clean, and valid dirty. transition labels recoverable from the figure: load-miss, then refill; load-miss, then writeback and refill; load-hit; store-hit; store-miss; invalidation.)

table 6.1 dcache writeback mode

state        v bit  wb bit  condition
invalid      0      x (0)   the cache line does not contain valid information.
valid clean  1      0       the cache line includes valid information consistent with memory.
valid dirty  1      1       the cache line includes valid information, but it is not consistent with memory.
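table 6.1 and the writeback state diagram can be captured as a small lookup. the state encoding follows table 6.1 directly. the transition table, however, is one reading of figure 6.2, whose arrows survive here only as labels, so treat it as an assumption rather than a transcription. all names are hypothetical.

```python
INVALID = "invalid"
VALID_CLEAN = "valid clean"
VALID_DIRTY = "valid dirty"


def line_state(v_bit, wb_bit):
    """Decode the (v, wb) bits of a dcache line as in table 6.1."""
    if not v_bit:
        return INVALID  # the wb bit is a don't-care when v = 0
    return VALID_DIRTY if wb_bit else VALID_CLEAN


# Assumed reading of figure 6.2: (current state, event) -> next state.
TRANSITIONS = {
    (INVALID, "load-miss"): VALID_CLEAN,      # refill
    (INVALID, "store-miss"): INVALID,         # data goes to the write buffer
    (VALID_CLEAN, "load-hit"): VALID_CLEAN,
    (VALID_CLEAN, "load-miss"): VALID_CLEAN,  # refill replaces the line
    (VALID_CLEAN, "store-miss"): VALID_CLEAN,
    (VALID_CLEAN, "store-hit"): VALID_DIRTY,
    (VALID_DIRTY, "load-hit"): VALID_DIRTY,
    (VALID_DIRTY, "store-hit"): VALID_DIRTY,
    (VALID_DIRTY, "store-miss"): VALID_DIRTY,
    (VALID_DIRTY, "load-miss"): VALID_CLEAN,  # writeback, then refill
}


def next_state(state, event):
    """Return the line state after an event; invalidation always wins."""
    if event == "invalidation":
        return INVALID
    return TRANSITIONS[(state, event)]


def needs_writeback(state, event):
    """Only replacing a dirty line on a load miss writes data back."""
    return state == VALID_DIRTY and event == "load-miss"
```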
a store operation is considered to be a dcache hit when the tag is coincident with the physical address and the v bit is set. of course, the physical address must be in a cached area.

when a store-miss occurs, the state condition of the cache line is not changed, and the store data is not written into dcache. instead, the store data is written to the four-word-deep write buffers, which pass it to the system's main memory.

some lines, known as dirty lines, contain more recent information than the main memory. occasionally you may need to force the writing of dirty lines to main memory. you can do this using the writeback cache instruction.

in writeback mode, data stored in the dcache may not be passed on to the external write controller immediately. because of this, the writeback cache instruction writes back each line of both sets in a two-way set associative configuration. the instruction does not check whether the address specified by the instruction would hit or miss at the cache line to which it pages. if the wb bit is set, the line data is written back and causes several stall cycles to read data from the dcache. the actual number of stall cycles depends on the speed of memory access.

cache lines can be invalidated by an external bus master. a cache line is invalidated when the invalidate address matches the cache tag id, and the cache invalidation signal(s) are asserted.

6.3 address and cache tag

figure 6.3 illustrates the relationship between instruction and data address and cache memory location, for both direct map and two-way set associative cache configurations. the word offset field addresses a word in a line. the line number field addresses a line in the cache memory. the cache tag id field serves as the tag for the address line.

if the system has an mmu, the cache access is indexed by the virtual address and tagged by the physical address. because the minimum memory page size is 4 kbyte, there is no virtual/physical address issue if the cache set size is 4 kbyte or less. if the cache set size is 8 kbyte and the page size is 4 kbyte, address bit 12 of the virtual and physical address must be coincident.
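the field boundaries of figure 6.3 and the values of n in table 6.2 give a direct way to split an address, and they also show why bit 12 matters for an 8 kbyte set. the sketch below is illustrative; the function names are hypothetical, and the 4 kbyte default page size reflects the minimum page size stated above.

```python
# Table 6.2: cache set size in bytes -> value of n.
SET_SIZE_TO_N = {1024: 1, 2048: 2, 4096: 3, 8192: 4}


def split_cache_address(addr, set_size):
    """Return (cache tag id, line number, word offset) for a 32-bit address.

    Field boundaries follow figure 6.3: tag [31:9+n], line number [8+n:5],
    word offset [4:2]; bits [1:0] are not used.
    """
    n = SET_SIZE_TO_N[set_size]
    word_offset = (addr >> 2) & 0x7
    line_number = (addr >> 5) & ((1 << (n + 4)) - 1)
    tag = (addr & 0xFFFFFFFF) >> (9 + n)
    return tag, line_number, word_offset


def index_uses_translated_bits(set_size, page_size=4096):
    """True when the line number reaches above the untranslated page offset."""
    return set_size > page_size
```

for an 8 kbyte set the line number spans bits [12:5], one bit above the 12-bit offset of a 4 kbyte page, which is why virtual and physical bit 12 must agree in that configuration.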
figure 6.3 address to cache tag and line number

  bits [31:9+n]   cache tag id
  bits [8+n:5]    line number
  bits [4:2]      word offset
  bits [1:0]      r

table 6.2 shows how the value of n sets different cache sizes.

table 6.2 setting cache size

  cache size   value of n
  1 kbyte      1
  2 kbyte      2
  4 kbyte      3
  8 kbyte      4

6.4 dcache scratch pad ram mode

the CW4010 dcache set-0 or set-1 can be configured as a scratch pad ram. you can do this by setting the sr0 bit of the ccc register for set-0 and the sr1 bit for set-1. when sr0 is set to 1, set-0 is enabled as a scratch pad ram. when sr1 is set to 1, set-1 is enabled as a scratch pad ram. the scratch pad ram must be located in one specific physical address space, like a local data memory.

if the CW4010 asic device has dcache tag ram, the tag must be programmed, with the cache isolated, before the sr bit(s) are set. you do this by programming the ccc register as follows: isc = 1, tag = 1, inv = 0; and by setting either de0 or de1.

if the dcache ram is used only as a scratch pad ram in the asic device, the dcache tag ram can be removed physically from the device. in this case, the dcache tag inputs of the CW4010 core must be set high or low according to the address of the scratch pad ram area. the sr bit(s) should all be set to 1.

when the dcache scratch pad ram is enabled, an access to the scratch pad ram area is a local memory access without any stall cycle.
6.5 external invalidation

icache and dcache lines can be invalidated by external hardware for bus snooping. the CW4010 has an invalidate strobe and invalidate address bus input. writeback by external hardware is not supported. details are described in chapter 7, signals.

6.6 cache instructions

the CW4010 has two types of cache instructions, for initialization and writeback. the cache instruction must be followed by three no operation (nop) instructions. figure 6.4 shows the cache instruction format.

figure 6.4 cache instruction format

  cache op, offset(base)

  bits [31:26]   cache (1 0 1 1 1 1)
  bits [25:21]   base
  bits [20:16]   op
  bits [15:0]    offset
  (the figure annotates the base and offset fields with "0 0 0 0 0" and "valid for wb only")

  bit[20:18]   000 flush (all cache invalidation)
               001 writeback (dcache only)
  bit 17       dcache effect (1)/non-effect (0)
  bit 16       icache effect (1)/non-effect (0)

  flushi (op = 00001)             flush icache
  flushd (op = 00010)             flush dcache
  flushid (op = 00011)            flush icache & dcache
  wb, offset(base) (op = 00100)   writeback dcache addressed by offset+[r0]

6.6.1 flush (all cache invalidation)

one execution of a cache instruction can invalidate all lines of the dcache or the icache, or of both. bit 17 of the instruction defines effect and non-effect for the dcache, and bit 16 defines effect and non-effect for the icache. if both bits are 0, this is a no operation (nop), and the base register and the offset have no meaning. one cache line of one or more cache sets is invalidated during one clock cycle. invalidation starts from the wb stage of the execution pipeline, and the pipeline stall request signal is asserted during the time that the cache lines are invalidated. if the pipeline cancel signal is asserted, the invalidation is not executed. the number of the invalidation clock cycles
is always 256, regardless of the cache size actually implemented. during this time, the cpu does not respond to interrupts.

6.6.2 writeback

writeback is effective for the dcache only, so bits 17 and 16 are ignored. bits [12:5] of the effective address, which is offset+gpr[base], specify the dcache line. the number of bits actually used depends on the cache size. for example, if the cache size is 1 kbyte direct mapped or 2 kbyte two-way set associative, only bits [9:5] are used. upper bits of the effective address are ignored. note that the tag is not checked. one writeback instruction writes back both lines of the two-way set associative cache if the wb bit is set. if wb is cleared, there is no operation. the writeback instruction is executed at the wb stage and causes four stall cycles to read data from a dirty line. wb bits are cleared after the cache lines are written back.

6.6.3 cache maintenance by ccc register

certain ccc register bits support dcache and icache maintenance and testing. table 6.3 lists the bits of the ccc register related to the cache.

table 6.3 ccc bits related to cache configuration

  bit(s)     function
  ie0        icache set-0 enable (1)/disable (0)
  ie1        icache set-1 enable (1)/disable (0)
  is[1:0]    icache size (00: 1 kbyte, 01: 2 kbytes, 10: 4 kbytes, 11: 8 kbytes)
  de0        dcache set-0 enable (1)/disable (0)
  de1        dcache set-1 enable (1)/disable (0)
  ds[1:0]    dcache size (00: 1 kbyte, 01: 2 kbytes, 10: 4 kbytes, 11: 8 kbytes)
  wb         dcache writeback (1)/writethrough (0) if te bit = 1
  sr0        dcache set-0 scratch pad ram enable (1)/disable (0)
  sr1        dcache set-1 scratch pad ram enable (1)/disable (0)
  isc        dcache/icache isolate cache mode enable (1)/disable (0)
  tag        dcache/icache tag test mode enable (1)/disable (0)
  inv        dcache/icache invalidate mode enable (1)/disable (0)

the CW4010 has three maintenance modes that allow you to maintain and test the internal icache and dcache. the three modes are data test, tag test, and invalidate. before entering any of these modes, the
processor must be executing in kseg1 (non-cacheable address space), interrupts must be disabled, and the caches must be isolated (isc bit = 1). when the caches are isolated, load and store instructions access the icache and dcache. the system's external main memory is not affected by these load and store accesses.

to enable the cache maintenance mode, use the following procedure:

1. set the appropriate bits in the ccc register with isc bit = 1. you can do this by executing the mtc0 instruction, which has one delay slot. the three instructions immediately following the mtc0 instruction should not be load or store instructions. the ie0, ie1, de0, and de1 bits in the ccc register select the cache set that is to be accessed, as shown in table 6.4. only one cache set should be enabled when performing a load operation. multiple caches may be enabled when performing a store operation.

table 6.4 tag and inv encoding

  bit set   bit number   cache set accessed
  ie0       17           icache set-0
  ie1       16           icache set-1
  de0       13           dcache set-0
  de1       12           dcache set-1

the tag and inv bits in the ccc register select the cache maintenance function. table 6.5 shows the encoding for the two bits.

table 6.5 tag and inv encoding

  tag bit 1   inv bit 0   cache maintenance mode
  0           0           data test
  1           0           tag test
  x           1           invalidate

2. clear the ie bit in the status register to disable all interrupts. this operation is usually done automatically because cache maintenance operations are done in an exception handler (most commonly the reset handler).

- data test mode
in this mode, all loads and stores access the data rams selected by the ie0, ie1, de0, and de1 bits. effective lower address
bits specify the cache address. the precise bit field depends on the cache size and configuration actually implemented.

- tag test mode
when the tag bit is set to 1, the CW4010 is in tag test mode. load and store operations access the tag rams. the tag bits available for testing in the tag test mode are the tag data, hit, writeback, and valid bits. note that the writeback bit is present only in the dcache. the hit bit is ignored during a store operation. for a load operation, the hit bit is set if a match occurs. the cache tag id bits are written from or compared to the most significant bits of the effective address (offset + gpr[base]). a load operation from the tag ram returns the information shown in figure 6.5. bits [31:10] are the tag data; bit 2 is the hit bit; bit 1 is the valid bit, which reflects the setting of the inv bit in the ccc register; bit 0 is the writeback bit, which reflects the setting of the wb bit in the ccc register. you can ignore bits [9:3].

figure 6.5 tag test mode loaded data format

  bits [31:10]   tag data
  bits [9:3]     x
  bit 2          hit
  bit 1          v
  bit 0          wb

- invalidate mode
when the inv bit in the ccc register is set to 1, the CW4010 is in invalidate mode. because the caches contain random data on both warm and cold starts, software must invalidate all lines in the icache and dcache. executing store word instructions invalidates the addressed cache line in the enabled cache(s). after reset, 0 must be written into all tags for both sets of dcache and icache. cache flush instructions can perform the same function.
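as an illustration of the loaded data format in figure 6.5, the word returned by a tag test mode load could be unpacked as sketched below. the structure and function names are hypothetical; only the bit positions are taken from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the word returned by a tag test mode load. */
typedef struct {
    uint32_t tag_data;  /* bits [31:10]                      */
    bool     hit;       /* bit 2                             */
    bool     valid;     /* bit 1 (v)                         */
    bool     writeback; /* bit 0 (wb), meaningful for dcache */
} cw_tag_word;

static cw_tag_word cw_decode_tag_word(uint32_t w)
{
    cw_tag_word t;

    t.tag_data  = w >> 10;            /* bits [9:3] are ignored */
    t.hit       = ((w >> 2) & 1u) != 0;
    t.valid     = ((w >> 1) & 1u) != 0;
    t.writeback = (w & 1u) != 0;
    return t;
}
```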
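the cache instruction format of figure 6.4 can be sketched as a small encoder. this is a minimal sketch under stated assumptions, not an assembler: the names cw_cache_op and cw_encode_cache are hypothetical, the offset is assumed to be a 16-bit two's complement value as in other mips load/store encodings, and the base and offset fields are left at zero for the flush variants because the figure marks them as valid for wb only.

```c
#include <stdint.h>

/* op field values, bits [20:16], as listed in figure 6.4 */
enum cw_cache_op {
    CW_FLUSHI  = 0x01, /* flush icache            */
    CW_FLUSHD  = 0x02, /* flush dcache            */
    CW_FLUSHID = 0x03, /* flush icache and dcache */
    CW_WB      = 0x04  /* writeback dcache line   */
};

/* Build the 32-bit instruction word: opcode 101111 in bits [31:26],
 * base in [25:21], op in [20:16], offset in [15:0]. */
static uint32_t cw_encode_cache(enum cw_cache_op op, unsigned base, int16_t offset)
{
    uint32_t word = 0x2Fu << 26;

    word |= ((uint32_t)op & 0x1Fu) << 16;
    if (op == CW_WB) {
        /* base and offset only carry meaning for writeback */
        word |= ((uint32_t)base & 0x1Fu) << 21;
        word |= (uint16_t)offset;
    }
    return word;
}
```

whatever tool emits the word must still place three nop instructions after it, as section 6.6 requires.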
chapter 7 signals

this chapter describes the CW4010 core i/o signals. you will find this chapter useful if you are interfacing the CW4010 with other core logic or external logic. this chapter contains the following sections:

  section 7.1, signal conventions
  section 7.2, signal synchronization
  section 7.3, CW4010 modularity
  section 7.4, CW4010 shell interface signal definitions

7.1 signal conventions

the following signal conventions are used in this chapter:

- signals that are inputs have a lowercase i as part of the signal name, for example, scdip. signals that are outputs have a lowercase o, for example, scdop. lowercase characters are used to avoid confusion between the uppercase letter i and the number 1, and between the uppercase letter o and the number 0.
- active-low signals have a lowercase n at the end of the signal name, for example, resetn.
- active-high signals have a lowercase p at the end of the signal name, for example, scaop.
- the term assert means to drive a signal true or active. the term deassert means to drive a signal false or inactive.

you can use the CW4010 core in a variety of designs and with a variety of peripheral logic. for this reason, it is not always possible to identify the agent that asserts and deasserts the i/o signals. the signal descriptions in this manual indicate the states to which the core's i/o signals must be driven. you may then select the design components needed to meet the signal requirements of the core. all interface signals are input to or output from the CW4010 core.
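the suffix convention above lends itself to a simple check, for example in a netlist lint script. the sketch below is illustrative only: the names cw_polarity and cw_signal_polarity are hypothetical, and names that do not end in n or p are simply reported as unknown.

```c
#include <string.h>

typedef enum {
    CW_ACTIVE_LOW,       /* name ends in lowercase n */
    CW_ACTIVE_HIGH,      /* name ends in lowercase p */
    CW_POLARITY_UNKNOWN  /* no recognised suffix     */
} cw_polarity;

/* Classify a signal name by its last character, ignoring a trailing
 * bus range such as "[31:0]". */
static cw_polarity cw_signal_polarity(const char *name)
{
    size_t len = strcspn(name, "[");

    if (len == 0)
        return CW_POLARITY_UNKNOWN;
    switch (name[len - 1]) {
    case 'n': return CW_ACTIVE_LOW;
    case 'p': return CW_ACTIVE_HIGH;
    default:  return CW_POLARITY_UNKNOWN;
    }
}
```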
7.2 signal synchronization

all input signals must be synchronized to the rising edge of the system clock outside the CW4010. asynchronous signals, such as resets or interrupts, must be synchronized by at least two sequential flip-flops. all output signals are synchronized to the rising edge of the system clock inside the CW4010.

7.3 CW4010 modularity

the CW4010 has two module levels, CW4010 core and CW4010 shell, as shown in figure 7.1.

figure 7.1 CW4010 module
[block diagram. labels in the figure: CW4010 shell; CW4010 core; alu; isu; cp0; lsu; biu; mmu; dcache set-0; dcache set-1; icache set-0; icache set-1; writeback buffer; multiplier interface; reset, interrupts; scbus; coprocessor interface; cache invalidation interface; ocabus interface]

the CW4010 core module is a process-independent encrypted synthesizable hdl (high-level design language) model. it includes the following internal base units:

  cp0 (system coprocessor, coprocessor-0)
  alu (arithmetic and logic unit)
  isu (instruction scheduler unit)
  lsu (load store unit)
  biu (bus interface unit)
the CW4010 shell is a decrypted hdl model that includes some process dependent models and provides the following options:

- icache configuration (direct map or two-way set associative) and icache size (0, 1, 2, 4, and 8 kbyte for each set)
- dcache configuration (direct map or two-way set associative) and dcache size (0, 1, 2, 4, and 8 kbyte for each set)
- optional writeback buffer, which can be removed for writethrough dcache configuration
- optional tlb for mmu (tlb or dummy tlb)
- high/low performance multiplier or non-multiplier configuration with optional multiply/addition function

the CW4010 shell hdl model is open. you can change modules individually in the shell to design different configurations. internal models of the CW4010 core are not changed by these configuration changes. the optional modules are provided by lsi logic corporation.

7.4 CW4010 shell interface signal definitions

the CW4010 shell interface signals are divided into seven categories, as follows:

1. reset signals, which interface to the cp0 unit (see section 7.4.1, reset signals)
2. interrupt signals, which interface to the cp0 (see section 7.4.2, interrupt signals)
3. scbus interface signals, which interface to the biu (see section 7.4.3, scbus interface signals)
4. cache invalidation interface signals, which interface to the isu and lsu (see section 7.4.4, cache invalidation interface signals)
5. coprocessor interface signals, which interface to the isu and lsu (see section 7.4.5, coprocessor interface signals)
6. ocabus interface signals, which interface to the lsu (see section 7.4.6, ocabus interface signals)
7. miscellaneous signals, such as system clock input and endian input (see section 7.4.7, miscellaneous signals)

figure 7.2 shows the CW4010 core's interface signals arranged in functional groups.
figure 7.2 CW4010 interface signals
[diagram of the CW4010 core with its interface signals arranged in functional groups: resets, interrupts, coprocessor interface, miscellaneous signals, scbus interface, cache invalidation interface, and ocabus interface. signal names appearing in the figure: cresetn, wresetn, nmin, exintn[5:0], exvintn, exvap[31:2], exvapen, cpbusyn[3:1], cpsreqn[3:1], cpcondp[3:0], fpeoddn, fperrxn, cprstn[3:1], cpcodep[31:0], cpxstbn[3:1], cpxoddn, pstalln, pcancrn, pcanoddn, brlikfn, suspexn, cpfrcdp[31:0], cptocen, cptocdp[31:0], cpfrcen, cpmissn, cpfixupn, sclkp, frcmn, bendn, wstallp, scanreqp, testmp, scdoen, scdop[63:0], schgtn, scifetn, sclockn, sctben[7:0], sctbln, sctbstn, sctpwn, sctssn, scb32n, scberrn, scbpwan, scbrdyn, scbrtyn, scdip[63:0], schrqn, sctsen, scaoen, scaop[31:0], cinvap[31:5], icinvsn, dcinvsn, ocacceptp, dvaddrp[31:0], accsize[1:0], exloadp, accstorep, crvalidp]
7.4.1 reset signals

this section describes the reset signals, which interface to the cp0.

cresetn   cold system reset   input
the CW4010 is reset asynchronously when cresetn is asserted. cresetn must be deasserted synchronously on the rising edge of sclkp. all internal states are initialized by asserting cresetn. of all exception inputs, cresetn has the highest priority. when it is deasserted, the cp0 generates a cold reset exception (0xbfc00000).

wresetn   warm system reset   input
wresetn must be asserted and deasserted synchronously on the rising edge of sclkp. while it is asserted, internal states are initialized. when it is deasserted, the cp0 generates a warm reset exception (0xbfc00000).

7.4.2 interrupt signals

this section describes the interrupt signals, which interface to the cp0.

exvap[31:2]   external vectored interrupt address input   input
this is the interrupt vector address. it is accepted by the CW4010 when exvapen is asserted and is written directly into the program counter. exvap[31:2] must remain stable until exvapen is asserted. a virtual address for the exception handler can be provided directly.

exvapen   exvap enable   output
this is the enable signal for the interrupt vector address, exvap[31:2]. the CW4010 asserts exvapen to enable the address.

exintn[5:0]   external interrupts   input
external logic asserts the exintn[5:0] signals to cause the cp0 in the CW4010 to generate an interrupt exception. assertion of these inputs is indicated in the ip[7:0] field of the cause register. consequently, the interrupting logic should continue to assert the external interrupt input until the exception routine has serviced the interrupt. the interrupt inputs can be individually disabled or masked by setting the appropriate bit in the status register.
external interrupts are not recognized if the interrupt enable bit in the status register is cleared. however, the input conditions are still shown in the ip bits of the cause register.

exvintn   external vectored interrupt input   input
exvintn is an external interrupt input that is driven by an external interrupt controller. this feature was not present in previous lsi logic cores. refer to section 4.4.22, external vectored interrupt exception, for further information.

nmin   non-maskable interrupt   input
nmin is a non-maskable interrupt. when the CW4010 detects that nmin is asserted, the cp0 generates a non-maskable interrupt exception (0xbfc00000).

7.4.3 scbus interface signals

this section describes the scbus interface signals, which interface to the biu.

scaoen   address output enable   output
when asserted, scaoen indicates that the address output bus scaop[31:0] lines are valid. the CW4010 asserts the signal when the biu is performing an scbus transaction, and the signal remains active throughout the operation. scaoen also enables sctbstn, sctben, and sctpwn.

scaop[31:0]   address output bus   output
scaop[31:0] is the address output bus for instruction fetch and data read/write operations. the scaop[31:0] bus is valid only when the address output enable signal scaoen is asserted. it remains valid throughout the operation until scbrdyn, scbrtyn, or scberrn is asserted.

scb32n   32-bit bus width sizing   input
scb32n indicates that the external bus slave on the scbus needs 32-bit bus sizing. the CW4010 samples this signal on the rising edge of the clock that synchronizes scbrdyn. if the signal is asserted for a 64-bit transaction, which is a doubleword or a part of a burst transaction, the biu generates a subsequent 32-bit word transaction and packs data to 64 bits for a read
transaction or unpacks data to 32 bits for a write transaction.

scberrn   bus error   input
scberrn is asserted to terminate the current transaction when a bus error occurs. if scbrdyn or scbrtyn is asserted at the same time as scberrn, scberrn has higher priority. scberrn is reported to the cp0, and the cp0 generates an exception.

scbpwan   bus in-page write accept   input
scbpwan indicates that the external bus slave on the scbus accepts in-page write transactions. external logic asserts the signal and the CW4010 samples it on the rising edge of the clock that synchronizes scbrdyn. if the sctpwn signal is not asserted, asserting or deasserting scbpwan has no significance.

scbrdyn   bus ready   input
the system asserts scbrdyn when the current transaction is terminated. when asserted, it indicates that the scbus is available. the signal remains active (low) until the next transaction starts. the system then deasserts the signal to indicate that the scbus is not available.

scbrtyn   bus retry   input
scbrtyn is asserted when the current transaction is terminated unsuccessfully and must be retried later. the control state goes back once to the idle state, then all bus requests are arbitrated again. if there are no other higher priority requests and sctsen is asserted, there is one idle state between the first transaction and a retry transaction. if scbrdyn and scbrtyn are asserted at the same time, scbrtyn has the higher priority.

scdip[63:0]   data input bus   input
scdip[63:0] are data bus input signals for instruction fetch and data read transactions. the CW4010 samples scdip[63:0] on the rising edge of the clock when scbrdyn is asserted. byte ordering is little endian. if you are designing a big endian system, the two 32-bit halves of the bus, scdip[63:32] and scdip[31:0], must be swapped with each other outside the CW4010 shell.
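the half swap required for a big endian system is done in logic outside the shell, but its effect on a 64-bit data value can be modeled in a few lines, for example in a bus functional model or testbench. the function name cw_swap_words is hypothetical.

```c
#include <stdint.h>

/* Exchange bits [63:32] with bits [31:0] of a 64-bit bus value.
 * Bytes inside each 32-bit half keep their order. */
static uint64_t cw_swap_words(uint64_t d)
{
    return (d << 32) | (d >> 32);
}
```

applying the swap twice returns the original value, so the same model can serve for both the input and output data paths.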
scdoen   data output enable   output
scdoen indicates that the data output signals scdop[63:0] are valid. the CW4010 asserts the signal throughout the write transaction to indicate that the current transaction is a write transaction and to enable data output.

scdop[63:0]   data output bus   output
scdop[63:0] are the data output bus signals for data write operations and for data writeback to the dcache. the signals are valid throughout the write transaction. byte ordering is little endian. if you are designing a big endian system, the two 32-bit halves of the bus, scdop[63:32] and scdop[31:0], must be swapped with each other outside the CW4010 shell.

schgtn   bus hold grant   output
the biu enters the hold state and asserts the grant signal schgtn to indicate that it is releasing scbus ownership in response to a bus hold request, schrqn.

schrqn   bus hold request   input
schrqn indicates that an external bus master is requesting ownership of the scbus. the bus hold request has the highest priority during bus arbitration. a bus hold request cannot break continuous transactions of in-page writes and burst read/write transactions if those transactions are supported by an asserted sctsen signal, but must wait until sctsen is deasserted.

scifetn   instruction fetch   output
scifetn indicates, for monitoring purposes, that the biu is fetching an instruction. the CW4010 drives it low at this time and outputs it to external logic.

sclockn   bus lock   output
when the CW4010 asserts sclockn, it indicates that the processor wishes to lock the scbus to restrict bus ownership. the CW4010 asserts the signal when a read transaction is started by executing a loadlink instruction in an uncached area or a writethrough cached area. it deasserts the signal just before a write transaction is started by executing a storeconditional instruction. during the read and write transactions, it asserts the signal continuously, preventing bus ownership from
changing during one of these transactions. if a storeconditional transaction hits the dcache in a writeback cached area while sclockn is asserted, an incorrect condition exists, and the CW4010 deasserts sclockn without completing any bus transactions.

sctben[7:0]   byte enables   output
sctben[7:0] indicates which byte positions are valid for a transaction. the CW4010 asserts only one of the signals for a byte read or a byte write transaction. it asserts all the signals for a doubleword or a burst transaction. the sctben[7:0] signals are valid when the CW4010 asserts scaoen.

sctbln   burst last doubleword   output
the CW4010 deasserts sctbln while the first, second, and third doublewords of a burst transaction are being read or written. otherwise, it asserts the signal, which is valid on the rising edge of the system clock.

sctbstn   burst transaction   output
when the CW4010 asserts sctbstn, it indicates that a transaction is taking place during which four doublewords will be moved, and that currently the first doubleword is being moved. it deasserts the signal after the first word has been transferred and during singleword transactions.

sctpwn   next transaction is in-page write   output
when the CW4010 asserts sctpwn, it indicates that the next transaction is in the same dram page, as defined in the configuration register. when the CW4010 asserts this signal, a maximum of four write transactions take place one after the other, even if there is an instruction fetch request or data read request. if four write transactions are performed continuously, the CW4010 asserts sctpwn from the first through the third transaction and deasserts it for the last (fourth) transaction. the CW4010 asserts sctpwn from the beginning of one in-page write transaction to the end of that transaction. the write buffer in the lsu checks to see if the subsequent write request is in the same page.
sctsen   transaction start enable   input
sctsen enables or disables a new scbus transaction. transaction requests are arbitrated only when sctsen is asserted. the signal must be deasserted and then asserted when scbrdyn is asserted to allow an idle cycle between two transactions. during the time sctsen is deasserted, the biu repeats the idle state.

sctssn   transaction start strobe   output
when the CW4010 asserts sctssn, it indicates that a transaction has started. the CW4010 asserts the signal for one clock cycle at the beginning of a transaction. if the transaction lasts through one cycle and the next transaction begins immediately, it asserts sctssn continuously.

7.4.4 cache invalidation interface signals

this section describes the cache invalidation interface signals, which interface to the isu, lsu, and cp0.

cinvap[31:5]   cache invalidation address bus   input
the cinvap[31:5] input bus is the address input bus for dcache and icache invalidation. when an external bus master writes data into the main memory, the address must be checked in the dcache and the icache. if the address is cached, the line must be invalidated. the CW4010 samples the bus when dcinvsn or icinvsn is asserted.

dcinvsn   dcache invalidation strobe   input
when asserted, dcinvsn indicates that the cache invalidation address bus is valid and that there is need for a snooping sequence. if the cache tag does not match the higher address bits, the line is not invalidated.

icinvsn   icache invalidation strobe   input
when asserted, icinvsn indicates that the cache invalidation address bus is valid and that there is need for a snooping sequence. if the cache tag does not match the higher address bits, the line is not invalidated.
7.4.5 coprocessor interface signals

this section describes the coprocessor interface signals, which interface with the isu and lsu.

brlikfn   branchlikely of even slot is false   output
the CW4010 asserts brlikfn when the branchlikely instruction is in an even slot and it is false. if, at this time, a coprocessor has a valid instruction in the ex stage, the instruction must be cancelled. it is not necessary to check whether the instruction in the ex stage is in an even or odd slot, since the CW4010 asserts brlikfn only when the branchlikely instruction is in the even slot. if the branchlikely instruction in the even slot is not taken, the instruction in the odd slot must be nullified although it has already been started. this signal is valid at the ex stage of the pipeline.

cpbusyn[3:1]   coprocessor busy   input
these inputs are asserted when the external coprocessors are busy and cannot accept any coprocessor operations. cpbusyn3 is associated with cp3, cpbusyn2 with cp2, and cpbusyn1 with cp1. the isu does not assert the execution strobe, cpxstbn[3:1], when the related cpbusyn signal is asserted, and the CW4010 stalls until the busy signal is deasserted. each coprocessor is independent and asserts its busy signal from the ex stage. the CW4010 examines the cpbusyn signal at the rd stage of the pipeline, on the rising edge of the clock. this signal is sampled at the rd stage of the pipeline.

cpcodep[31:0]   cp instruction code bus   output
this bus outputs the entire instruction bit field at the rd stage. it is valid when the CW4010 asserts one of the cpxstbn[3:1] lines. although the CW4010 can execute two instructions per cycle, only one coprocessor instruction can be issued in one cycle. cpcodep[31:0] are the selected outputs of the even and odd instruction slot. external logic must sample the bus on the rising edge of the system clock when the CW4010 asserts strobe cpxstbn.
it is not necessary to decode all bits of an instruction because the execution strobe signal is a partial decoding signal. cpcodep[31:0] are valid at the rd stage of the pipeline.

cpcondp[3:0]   coprocessor condition   input
these inputs are used for the coprocessor condition branch instruction. the CW4010 samples the inputs in the isu at the ex stage of a conditional branch instruction. the four inputs are associated with the four possible coprocessors (cp0-cp3). cpcondp0 is for cp0. however, cp0 does not need it, and cpcondp0 is therefore used as a general purpose condition input. these signals are valid at the ex stage of the pipeline.

cpfixupn   data fixup cycle strobe for lwcz cache miss   output
the CW4010 asserts cpfixupn when correct data is output on cptocdp[31:0] during a fixup cycle. it asserts the signal during stall cycles because lwcz cache misses cause the pipeline to stall until the data is read.

cpfrcdp[31:0]   data from coprocessor   input
this bus inputs data from a coprocessor register to a CW4010 cpu general purpose register or to memory. data on the bus is valid when the CW4010 asserts the data enable signal, cpfrcen. if there are several external coprocessors, the data bus must be multiplexed outside the CW4010 shell. this signal is sampled at the cr stage of the pipeline.

cpfrcen   data from coprocessor enable   output
the CW4010 asserts this signal to enable the data input bus cpfrcdp[31:0]. coprocessors can generate the same information from the instruction code, cpcodep, by tracking the pipeline stage. external logic must decode the coprocessor number from cpcodep[31:0]. if the pipeline enters a stall condition when there is a coprocessor data movement instruction in the cr stage, the CW4010 asserts cptocen continuously until the stall condition is resolved. this signal is valid at the cr stage of the pipeline.
cpmissn   data cache miss strobe for lwcz   output
the CW4010 asserts cpmissn at the cr stage of an lwcz instruction when a dcache miss occurs. data at the cr stage is not correct, and the correct data is put on cptocdp[31:0] during a later fixup cycle. the CW4010 asserts pstalln from the wb stage of the lwcz instruction. this signal is valid at the cr stage of the pipeline.

cprstn[3:1]   coprocessor reset   output
these outputs indicate the condition of the cu bits [3:1] in the cp0 status register. if a cu bit is 0, the CW4010 asserts the corresponding cprstn[3:1] output. the CW4010 asserts the cprstn[3:1] signals when a cold reset is asserted. at this time, the cu bits are cleared. the cu bits are not cleared when a warm reset is asserted. the cprstn[3:1] outputs allow the system designer to use software resets for external coprocessors.

cpsreqn[3:1]   coprocessor stall request   input
the external coprocessors assert these signals when they need to request a pipeline stall. coprocessors can assert cpsreqn[3:1] while a previous coprocessor instruction is being executed, after decoding a coprocessor instruction, and after the rd stage. the CW4010 asserts pstalln immediately in response to one of these signals.

cptocdp[31:0]   data to coprocessor   output
this bus outputs data to a coprocessor register from a CW4010 cpu general purpose register or from memory. data on the bus is valid when the CW4010 asserts the data enable signal, cptocen. these signals are valid at the cr stage of the pipeline.

cptocen   data to coprocessor enable   output
the CW4010 asserts this signal to indicate when the data output bus, cptocdp[31:0], is valid. coprocessors can generate the same information from the instruction code, cpcodep, by tracking the pipeline stage. the coprocessor number must be decoded from
cpcodep[31:0]. if the pipeline enters a stall condition when there is a coprocessor data movement instruction in the cr stage, the CW4010 asserts cptocen continuously until the stall condition is resolved. this signal is valid at the cr stage of the pipeline.

cpxoddn   coprocessor instruction at odd slot   output
when the CW4010 asserts an execution strobe, it also asserts cpxoddn to indicate that the coprocessor instruction is in the odd slot. this information must be kept in the coprocessor pipeline until the cr stage. it is used to determine whether or not the instruction should be cancelled when the cancellation signal is asserted. this signal is valid at the rd stage of the pipeline.

cpxstbn[3:1]   coprocessor instruction execution strobe   output
these strobe signals indicate the start of a coprocessor operation that involves data movement. the CW4010 asserts only one of the signals during a clock cycle. cpxstbn[3:1] are partial decoding signals for an instruction. for example, when the CW4010 asserts cpxstbn1, cp1 turns on, and so forth. the isu also uses the signals to check for resource conflicts, including coprocessor busy signals. cpxstbn[3:1] are valid at the rd stage of the pipeline.

fpeoddn   fpu error exception in odd slot   input
fpeoddn indicates whether the instruction that caused an fpu exception (fperrxn assertion) was in an even slot (fpeoddn = high) or odd slot (fpeoddn = low) when it started at the rd stage. the CW4010 ignores the fpeoddn signal when fperrxn is deasserted. when the instruction is started at an rd stage, the cpxoddn signal informs the coprocessor whether the instruction is in an even or odd slot. to handle a pipeline cancel correctly, the coprocessor must keep this information in its pipeline registers. to execute an fpu exception precisely, the coprocessor that asserts fperrxn at the ex stage must drive fpeoddn correctly according to the even/odd status of the ex pipeline stage.
if the fperrxn exception does not need to be treated precisely, the fpeoddn input must be strapped high to cancel the pipeline, in the same way as an interrupt exception.

fperrxn  floating point unit error exception  input
fperrxn is an exception input, used specifically with an fpu coprocessor. the CW4010 samples the signal at any time in the ex stage and issues a pipeline cancel signal at the cr stage, in the same way as exintn. in the cause register, exception code 15 is shown for the exception if it is the highest priority. fperrxn can be used as a user-defined coprocessor exception input. fperrxn must be treated precisely. the fpu asserts fperrxn at the ex stage of the instruction with the fpeoddn signal assertion/deassertion. the CW4010 asserts the pipeline cancel signal at the cr stage with the correct even/odd cancel signal.

pcancrn  pipeline cancel at cr stage  output
when one or more exceptions occur, the pipeline is cancelled at the cr stage and the CW4010 asserts pcancrn. coprocessor pipelines must be cancelled to prevent a second execution of the coprocessor instruction under either one of the following conditions: when the coprocessor returns from an exception handler; or when the coprocessor has finished executing an lwcz instruction that caused a tlb miss. the wb stage is not cancelled when pcancrn is asserted. this signal is valid at the cr stage of the pipeline.

pcanoddn  pipeline cancel is for odd slot  output
pcanoddn is valid only when pcancrn is asserted. the signal informs coprocessors whether the cancellation is for an odd or even slot. when the CW4010 asserts the signal, cancellation applies to the odd slot. when it deasserts the signal, cancellation applies to both even and odd slots. the coprocessor must track which slot it is executing in based on the cpxoddn signal.
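the cancellation rule that pcancrn and pcanoddn express can be sketched as a small truth function. this is only an illustrative model, not logic taken from the manual: the function name and the boolean arguments are hypothetical, and each argument means "the signal is asserted" rather than an electrical level (the signals themselves are active low).

```python
def coprocessor_must_cancel(pcancrn_asserted: bool,
                            pcanoddn_asserted: bool,
                            in_odd_slot: bool) -> bool:
    """Return True when a coprocessor instruction at the cr stage must be cancelled.

    in_odd_slot is the value the coprocessor latched from cpxoddn at the rd stage.
    """
    if not pcancrn_asserted:
        # pcanoddn is only meaningful while pcancrn is asserted.
        return False
    if pcanoddn_asserted:
        # Cancellation applies to the odd slot only.
        return in_odd_slot
    # pcanoddn deasserted: cancellation applies to both slots.
    return True
```

for example, `coprocessor_must_cancel(True, True, False)` evaluates to `False`, because an even-slot instruction survives an odd-slot-only cancel.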
when the CW4010 asserts both pcancrn and pcanoddn and the coprocessor instruction is in the odd slot, the instruction must be cancelled. when the CW4010 asserts
pcancrn and deasserts pcanoddn, the coprocessor instruction must be cancelled regardless of which slot it is operating in. this signal is valid at the cr stage of the pipeline.

pstalln  pipeline stall broadcasting signal  output
the CW4010 asserts this signal to indicate that the entire CW4010 pipeline is stalled. coprocessor pipelines must be stalled when they are executing instructions. the CW4010 asserts pstalln for all pipeline stalls and for an lwcz instruction dcache miss.

suspexn  suspend ex stage  output
the CW4010 instruction scheduler unit (isu) asserts suspexn to request coprocessors to suspend the instruction in the ex stage. the instruction in the ex stage must be held until the isu deasserts suspexn. instructions in the cr and wb stages must be completed. this signal is valid at the ex stage of the pipeline.

7.4.6 ocabus interface signals
the CW4010 core has an on-chip access (oca) interface that allows on-chip modules to be accessed at the cr stage of the pipeline without going through an scbus transaction. this improves performance because it reduces traffic on the scbus and therefore reduces latency. if the module that is the target of the transaction can respond in one clock cycle, there is no penalty for a read or write cycle. a read access on the scbus has at least a four-clock penalty, and a write access is done through a four-deep write buffer. the on-chip modules can be accessed from the scbus. the CW4010 is the only bus master for the ocabus. instructions cannot be fetched through the ocabus. this section describes the ocabus interface signals.
accsize[1:0]  ocabus transaction size  output
these signals indicate the transaction size of an ocabus transaction. the signals are valid when the CW4010 asserts either exloadp or accstorep. these signals are valid at the ex stage of the pipeline.

accstorep  ocabus ex stage store operation  output
the CW4010 asserts this signal when a store instruction is being executed in the ex stage of the CW4010 pipeline. the CW4010 asserts the signal when dvaddrp[31:0] and accsize[1:0] are valid. dvaddrp[31:0] is decoded when the CW4010 asserts accstorep. if the resulting address is for a device on the ocabus, ocacceptp is asserted. this signal is valid at the ex stage of the pipeline.

cpfrcdp[31:0]  data from coprocessor  input
this bus inputs data from a coprocessor register to a CW4010 cpu general purpose register or to memory. it is valid when the CW4010 asserts the data enable signal, cpfrcen. if there are several coprocessors, the data bus must be multiplexed. these signals are valid at the cr stage of the pipeline.

cpfrcen  data from coprocessor enable  output
the CW4010 asserts this signal to indicate when data on the input bus, cpfrcdp[31:0], is valid. coprocessors can generate the same information from the instruction code, cpcodep[31:0], by tracking the pipeline stage, and decode the coprocessor number from cpcodep[31:0]. if the pipeline enters a stall condition when there is a coprocessor data movement instruction in the cr stage, the CW4010 asserts cptocen continuously until the stall condition is resolved. this signal is valid at the cr stage of the pipeline.

accsize[1:0] encoding:

  accsize[1:0]   transaction size
  00             one byte
  01             halfword
  10             tribyte
  11             one word
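as an illustration, the accsize[1:0] encoding listed above can be turned into a byte count with a small lookup. this is a sketch for a simulation or testbench model; the names `ACCSIZE_BYTES` and `accsize_to_bytes` are hypothetical, and the byte counts follow the usual sizes (byte 1, halfword 2, tribyte 3, word 4).

```python
# Byte count for each accsize[1:0] code, following the encoding table.
ACCSIZE_BYTES = {
    0b00: 1,  # one byte
    0b01: 2,  # halfword
    0b10: 3,  # tribyte
    0b11: 4,  # one word
}


def accsize_to_bytes(accsize: int) -> int:
    """Translate a 2-bit accsize code into the number of bytes transferred."""
    if accsize not in ACCSIZE_BYTES:
        raise ValueError(f"accsize must be a 2-bit value, got {accsize!r}")
    return ACCSIZE_BYTES[accsize]
```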
cpsreqn[3:1]  coprocessor stall request  input
the coprocessors assert these inputs when they need to request a pipeline stall. coprocessors can assert cpsreqn[3:1] while a previous coprocessor instruction is being executed, after decoding a coprocessor instruction, and after the rd stage. the CW4010 asserts pstalln immediately when one of these signals is asserted. these signals are issued at the cr stage of the pipeline.

cptocdp[31:0]  data to coprocessor  output
this bus outputs data to a coprocessor register from a CW4010 cpu general purpose register or from memory. the bus is valid when the CW4010 asserts the data enable signal, cptocen. these signals are valid at the cr stage of the pipeline.

cptocen  data to coprocessor enable  output
the CW4010 asserts this signal to indicate when the data output bus, cptocdp[31:0], is valid. coprocessors can generate the same information from the instruction code, cpcodep[31:0], by tracking the pipeline stage, and decode the coprocessor number from cpcodep[31:0]. if the pipeline enters a stall condition when there is a coprocessor data movement instruction in the cr stage, the CW4010 asserts cptocen continuously until the stall condition is resolved. this signal is valid at the cr stage of the pipeline.

crvalidp  ocabus cr stage valid  output
the CW4010 asserts this signal when the cr stage of a load or store instruction is valid after it has asserted exloadp or accstorep. if the load or store instruction is cancelled, the CW4010 deasserts crvalidp and the load/store operation must be cancelled. this signal is valid at the cr stage of the pipeline.
dvaddrp[31:0]  ocabus virtual address  output
this is the output bus for the ocabus virtual address. it is used for a load or store instruction being executed at the ex stage for the ocabus. the signal is valid when the CW4010 asserts either exloadp or accstorep. these signals are valid at the ex stage of the pipeline.

exloadp  ocabus ex stage load operation  output
the CW4010 asserts this signal when a load instruction is being executed in the ex stage of the CW4010 pipeline. it asserts the signal when dvaddrp[31:0] and accsize[1:0] are valid. dvaddrp[31:0] is decoded when the CW4010 asserts exloadp. if the resulting address is for a device on the ocabus, ocacceptp is asserted. this signal is valid at the ex stage of the pipeline.

ocacceptp  ocabus transaction accepted  input
when the oca module can accept an oca transaction, it asserts this signal. the signal is an output from the dvaddrp address decoder and it is asserted at the cr stage. when it is asserted for a read operation, the lsu selects cpfrcdp[31:0] as the data input. when it is asserted for a write operation, the data cptocdp[31:0] is written into an oca device at the cr stage. the data is therefore not written into the dcache and an scbus transaction is not requested. this signal is valid at the cr stage of the pipeline.

pstalln  pipeline stall broadcasting signal  output
the CW4010 asserts this signal to indicate that all stages of the CW4010 pipelines are stalled. pipelines must be stalled when they are executing instructions. this signal is valid at any stage of the pipeline.
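the role of the dvaddrp address decoder and ocacceptp can be sketched with a behavioural model. everything named here is an assumption made for illustration: the manual does not give an ocabus address map, so `OCA_BASE` and `OCA_SIZE` are hypothetical values, and the model ignores pipeline timing (exloadp/accstorep at the ex stage, ocacceptp at the cr stage).

```python
OCA_BASE = 0xBF000000  # hypothetical base address of the on-chip module window
OCA_SIZE = 0x00001000  # hypothetical window size in bytes


def decode_ocacceptp(dvaddrp: int, exloadp: bool, accstorep: bool) -> bool:
    """Model of an external decoder that drives ocacceptp from dvaddrp."""
    if not (exloadp or accstorep):
        # dvaddrp is only valid while a load or store is flagged.
        return False
    return OCA_BASE <= dvaddrp < OCA_BASE + OCA_SIZE


def data_path(dvaddrp: int, exloadp: bool, accstorep: bool) -> str:
    """Describe which path the access takes, following the ocacceptp description."""
    if not decode_ocacceptp(dvaddrp, exloadp, accstorep):
        return "dcache/scbus"
    # Accepted load: the lsu takes its data from cpfrcdp[31:0].
    # Accepted store: cptocdp[31:0] is written to the oca device.
    return "read via cpfrcdp" if exloadp else "write via cptocdp"
```

a real decoder would be combinational logic in the system design; the point of the sketch is only that an accepted access bypasses the dcache and the scbus.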
7.4.7 miscellaneous signals
this section describes the miscellaneous signals used by the CW4010.

bendn  big endian  input
bendn is a static input and must be tied low for big endian addressing and high for little endian addressing. bendn affects the byte positions for sizing and load/store data alignment. for big endian mode, the upper 32 bits of scdip must be swapped with the lower 32 bits, and the upper 32 bits of scdop must be swapped with the lower 32 bits.

frcmn  force cache miss  input
asserting frcmn forces a cache miss for both the icache and dcache. the CW4010 treats the transaction as an access to an uncached area. frcmn is useful for debugging the system. this is a static input. it is tied low for software debugging.

scanreqp  scan debug event  output
the CW4010 asserts this signal to indicate that the cp0 has detected a scan debug event. clock circuitry external to the core uses this information to determine when to drop out of a normal operating mode into scan debug mode.

sclkp  system clock  input
sclkp is the processor system clock input. it provides basic timing for the CW4010 core and determines the instruction cycle times. internal core logic operates synchronously with the rising edge of sclkp. since the core processor operates at 80 mhz, you must supply an 80 mhz clock. this clock input is used for all core modules.

testmp  test mode enable  input
testmp is used for testing. it is a static input and must be tied low during normal operation.

wstallp  wait interrupt stall  output
wstallp indicates that internal pipeline stages have entered a stall condition by executing a waiti (wait interrupt) instruction. the CW4010 asserts wstallp when the instruction is at the wb stage of the CW4010 pipeline, and the signal remains active until the CW4010 receives an external exception (enabled external interrupt, nmin, cold reset, or warm reset).
chapter 8 interface operation

this chapter examines various CW4010 functional timing scenarios. it does not deal with all timing cases; however, it covers the main timing related to CW4010 transactions. for details of the operation of each of the signals discussed, refer to chapter 7, signals.

this chapter has the following sections:
- section 8.1, reset and exception signals
- section 8.2, scbus interface behavior
- section 8.3, ocabus interface behavior
- section 8.4, cache interface behavior
- section 8.5, coprocessor interface behavior

in the timing diagrams shown in this chapter, all inputs and all outputs must be synchronized to the rising edge of the system clock. all inputs require setup and hold time and all outputs have valid delay times from the clock edge to the appearance of a valid level.

8.1 reset and exception signals
the CW4010 has the following reset and exception inputs that connect to coprocessor-0:
- cold reset
- warm reset
- non-maskable interrupt
- bus error
- floating point unit exception
- interrupts

the above inputs must be synchronized to the rising edge of the system clock.
8.1.1 cold reset (cresetn)
the primary purpose of a cold reset is to initialize the CW4010 core at power up. when asserted, cresetn initializes the internal states and control registers in the CW4010. cresetn does not initialize general purpose registers, icache, dcache, or the mmu tlb. cresetn can be asserted asynchronously, but it must be active for at least two system clock cycles and be deasserted on the rising edge of the system clock. the CW4010 considers cresetn a non-maskable exception and is in idle mode during the period that cresetn is asserted. figure 8.1 shows the timing for a cold reset and the start of an instruction fetch after cresetn is deasserted.

figure 8.1 cold reset and pipeline
[timing diagram not reproduced. traces: sclkp, cresetn, instruction 0, instruction 1, reset instruction 0, reset instruction 1; clock cycles t1 to t7.]
8.1.2 handling cold resets
the cpu provides a special interrupt vector (0xbfc00000) for the cresetn exception. the reset vector resides in unmapped and uncached cpu virtual address space, so the hardware does not need to initialize the tlb or the cache to handle the exception. the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state. for further information on this subject refer to section 4.4.5, cold reset exception on page 4-32.

the contents of all registers in the cpu are undefined when the cresetn exception occurs except for the following:
- in the status register, the cu[3:0] and sr bits are cleared to zero, and the erl and bev bits are set to one. the other bits in the register are undefined.
- the random register is initialized to the value of its upper bound.
- the wired register is initialized to zero.

8.1.2.1 servicing cold resets
to service the cresetn exception, you should initialize all processor registers, coprocessor registers, caches, and the memory system. you can do this by performing diagnostic tests and by bootstrapping the operating system.

8.1.3 warm reset (wresetn)
the primary purpose of the wresetn exception is to reinitialize the processor after a fatal error. when asserted, wresetn initializes the CW4010 internal states and control registers. wresetn does not initialize general purpose registers, icache, dcache, or the mmu tlb. wresetn must be asserted and deasserted on the rising edge of the system clock. it must remain active for at least two system clock cycles. wresetn is a non-maskable exception and the CW4010 is in idle mode during the period it is asserted. the start of the instruction fetch after wresetn is deasserted is the same as that of cresetn, as shown in figure 8.1.
8.1.3.1 handling warm resets
the reset exception vector (0xbfc00000) is used for the wresetn exception. the reset vector resides in unmapped and uncached cpu virtual address space, so the hardware does not need to initialize the tlb or the cache to handle the exception. the sr bit of the status register is set to distinguish the wresetn exception from the cresetn exception. unlike a non-maskable interrupt, wresetn resets bus state machines. like cresetn, it can be used on the processor in any state.

the contents of all registers are preserved when wresetn occurs, except for the following:
- the errorpc register, which contains the restart pc.
- the erl and bev bits of the status register, which are set to 1.
- the sr bit of the status register, which is set to 1.

because wresetn can abort cache and bus operations, cache and memory contents are undefined after the wresetn exception occurs. for further information on this subject refer to section 4.4.6, warm reset exception on page 4-33.

8.1.3.2 servicing warm resets
to service the wresetn exception, save the current processor state for diagnostic purposes, and reinitialize all processor registers, the coprocessor, and the memory system.
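because cold and warm resets share the vector 0xbfc00000, code at that vector has to inspect the sr bit of the status register to tell them apart. the sketch below models that check on a status value that has already been read. the bit position is an assumption: it uses bit 20, the conventional position of sr in the mips r4000 status register, which this text does not confirm for the CW4010. the function name is hypothetical.

```python
STATUS_SR_BIT = 20  # assumed position of the sr bit (conventional r4000 layout)


def reset_kind(status: int) -> str:
    """Classify the exception taken at the reset vector from a status register value.

    sr is cleared by a cold reset and set by a warm reset.
    """
    sr = (status >> STATUS_SR_BIT) & 1
    return "warm" if sr else "cold"
```

a boot routine would typically branch on this result, running full initialization for "cold" and saving diagnostic state first for "warm".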
8.1.4 non-maskable interrupt (nmin)
the non-maskable interrupt input nmin must be asserted and deasserted on the rising edge of the system clock. when nmin is sampled and found to be active on the rising edge of the clock, the cp0 provides a non-maskable exception vector (0xbfc00000). figure 8.2 shows the timing diagram for the fastest detected case. figure 8.3 shows the case in which nmin is not serviced immediately because of a pipeline stall. the CW4010 detects the falling edge of nmin and latches the signal until it is ready to service it.

figure 8.2 nmin and pipeline (nmin is detected immediately)
[timing diagram not reproduced. traces: sclkp, nmin, latched internal nmi, pcancrn, pcanoddn, pstalln, instruction 0, 1 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t6.]
figure 8.3 nmin and pipeline (nmin is not detected immediately due to stall)
[timing diagram not reproduced. traces: sclkp, nmin, latched internal nmin, pcancrn, pcanoddn, pstalln, instruction 0, 1 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t9.]

8.1.4.1 handling a non-maskable interrupt
the reset exception vector (0xbfc00000) is also used for the nmin exception. the reset vector resides in unmapped and uncached cpu address space so that the hardware does not need to initialize the tlb or the cache to handle nmin. the sr bit of the status register is set to differentiate this exception from a cresetn exception. because an nmin could occur in the middle of another exception, program execution cannot continue after nmin has been serviced. unlike cold and warm reset, but like other exceptions, a non-maskable interrupt is taken only at instruction boundaries. the nmin exception preserves the state of the caches and memory system. for further
information on this subject refer to section 4.4.6, warm reset exception on page 4-33.

the contents of all registers in the cpu are preserved when this exception occurs, except for the following:
- the errorpc register, which contains the restart pc.
- the erl and bev bits of the status register, which are set to one.
- the sr bit of the status register, which is set to one.

8.1.4.2 servicing a non-maskable interrupt
to service the nmin exception, save the current processor state for diagnostic purposes, and reinitialize the system, including all processor registers, coprocessor registers, caches, and the memory system.

8.1.5 bus error (scberrn)
a bus error exception occurs when board-level circuitry detects events such as bus time-outs, bus parity errors, and invalid physical memory accesses. the scberrn exception is not maskable. in the CW4010, bus errors are asynchronous events with respect to cpu instruction processing (much like the nmin interrupt), which means that there is no attempt to identify the instruction that was the root source of the error.

the scberrn input from the scbus interface terminates a transaction and generates an exception to inform the CW4010 that an scbus transaction has not been successfully completed. when the CW4010 is driving the scbus, it detects the assertion of scberrn. scberrn assertion should be a synchronous one-clock-cycle strobe, which is latched in the CW4010 until it is serviced. figure 8.4 shows the timing diagram in which scberrn is serviced immediately and figure 8.5 shows how the exception is serviced later because of stall cycles.
figure 8.4 bus error and pipeline (detected immediately)
[timing diagram not reproduced. traces: sclkp, sctssn, scaoen, scberrn, latched internal bus error, pcancrn, pstalln, instruction 0, 1 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t6.]
figure 8.5 bus error and pipeline (with stall cycles)
[timing diagram not reproduced. traces: sclkp, sctssn, scaoen, scberrn, latched internal bus error, pcancrn, pstalln, instruction 0, 1 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t8.]

8.1.5.1 handling bus errors
the common exception vector, shown in table 8.1, is used for the scberrn exception. the exccode field in the cause register is set to bus.
table 8.1 common exception vector

  status register   ccc register
  bev               r3000 mode    r4000 mode
  0                 0x80000080    0x80000180
  1                 0xbfc00180    0xbfc00380
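table 8.1 amounts to a two-input lookup, which can be written out as follows. this is a sketch for a simulator or test harness, not code from the manual: the function name and the mode strings are hypothetical, and the first column of the table is read here as the bev bit of the status register, which is an interpretation of the extracted table.

```python
# (bev bit, compatibility mode selected in the ccc register) -> vector address
COMMON_EXCEPTION_VECTOR = {
    (0, "r3000"): 0x80000080,
    (0, "r4000"): 0x80000180,
    (1, "r3000"): 0xBFC00180,
    (1, "r4000"): 0xBFC00380,
}


def common_exception_vector(bev: int, mode: str) -> int:
    """Return the common exception vector for a bev value and r3000/r4000 mode."""
    try:
        return COMMON_EXCEPTION_VECTOR[(bev, mode)]
    except KeyError:
        raise ValueError(f"unsupported combination: bev={bev!r}, mode={mode!r}")
```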
the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.

8.1.5.2 servicing bus errors
the physical address at which the fault occurred is not available to the exception handler. the process executing at the time of the exception must be handed a bus error signal, which is usually fatal.

8.1.6 floating-point unit (fperrxn) exceptions
the CW4010 coprocessor interface uses the fperrxn input to detect a floating-point unit (fpu) exception. refer to section 8.5, coprocessor interface behavior on page 8-46, for more details about fperrxn. figure 8.6 shows the case where fperrxn is serviced immediately.

figure 8.6 fpu exception and pipeline (detected immediately)
[timing diagram not reproduced. traces: sclkp, cpcodep, cpxstbn, fperrxn, pcancrn, pstalln, cp instruction 1, instruction 2, 3 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t5.]
if the CW4010 enters a stall condition at the ex stage and fperrxn is asserted, it should be asserted continuously until the stall condition is resolved. fperrxn is not latched in the CW4010. this condition is shown in figure 8.7.

figure 8.7 fpu exception and pipeline (with stall cycles)
[timing diagram not reproduced. traces: sclkp, cpcodep, cpxstbn, fperrxn, pcancrn, pstalln, cp instruction 1, instruction 2, 3 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t7.]
if the previous instruction cancels the pipeline (the pipeline cancel signal is asserted at the same time as fperrxn), fperrxn is ignored and must be deasserted. fperrxn should be asserted later when the coprocessor instruction is re-executed. this condition is shown in figure 8.8.

figure 8.8 fpu exception and pipeline (cancel, then not serviced)
[timing diagram not reproduced. traces: sclkp, cpcodep, cpxstbn, fperrxn, pcancrn, pstalln, cp instruction 1, instruction 2, 3, instruction 4, 5, exception (not an fpu exception) instruction 0, 1; clock cycles t1 to t5.]

8.1.6.1 handling fpu exceptions
the common exception vector is used for the fperrxn exception. the exccode field in the cause register is set to fpe (15). the coprocessor asserts fperrxn at the cr stage and the CW4010 samples it at the end of the ex stage. the coprocessor instruction in this slot causes the fperrxn exception. the epc register points at the coprocessor instruction, which starts at the rd stage just before the coprocessor asserts fperrxn. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set.
refer to section 4.4.18, floating-point exception on page 4-45 for further information on this subject.

8.1.6.2 servicing fpu exceptions
coprocessor interface signals indicate whether the coprocessor instruction is in an even or odd slot. the coprocessor must control these signals correctly to synchronize the exception and the pipeline flow. the fperrxn exception input can be used as a general purpose exception input.

8.1.7 external interrupts (extintn)
the CW4010 has six external interrupt inputs, exintn[5:0], which must be asserted and deasserted on the rising edge of the system clock. to mask all six external interrupts at once, you can clear the ie bit of the status register. to mask each interrupt individually, program the int bits in the status register (refer to section 4.3.6, status register (12) on page 4-9 for further information about this register).

the instruction fetch for the exception procedure starts two clocks after an external interrupt has been detected, provided that the pipeline is not in a stall state and there is no higher priority exception. figure 8.9 shows the timing diagram where an interrupt is immediately detected.
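the two levels of masking described for exintn[5:0] (the global ie bit and the per-interrupt int bits) combine as a simple and operation. the sketch below takes the fields as separate arguments so that no register bit positions have to be assumed. it does assume that a set mask bit enables the corresponding interrupt and that bit i corresponds to exintn[i]; neither is stated in this section. all names are hypothetical.

```python
NUM_EXT_INTERRUPTS = 6  # exintn[5:0]


def enabled_interrupt_requests(ie: bool, int_mask: int, exintn_asserted: int) -> int:
    """Return the bitmask of asserted external interrupts that are not masked.

    ie              -- state of the global interrupt enable bit
    int_mask        -- per-interrupt enable bits (assumed: 1 = enabled)
    exintn_asserted -- bit i set when exintn[i] is asserted
    """
    all_lines = (1 << NUM_EXT_INTERRUPTS) - 1
    if not ie:
        # Clearing ie masks all six interrupts at once.
        return 0
    return exintn_asserted & int_mask & all_lines
```

a nonzero result means at least one enabled request is pending; whether it is taken still depends on pipeline stalls and higher priority exceptions, as described above.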
figure 8.9 interrupt and pipeline (interrupt is detected immediately)
[timing diagram not reproduced. traces: sclkp, exintn, pcancrn, pcanoddn, pstalln, instruction 0, 1 through instruction 6, 7, exception instruction 0, 1; clock cycles t1 to t6.]

an extintn exception is similar to an nmin exception, except that external interrupts are not latched internally, and must be asserted until they are serviced. if the pipeline is in a stall cycle, the CW4010 does not service interrupts until the stall condition is resolved.

8.1.7.1 handling external interrupts
the common exception vector is used for the extintn exception. the exccode field in the cause register is set to int0. the ip field of the cause register indicates the current interrupt requests. more than one of the bits may be set at the same time. it is also possible that none of the bits is set, if an interrupt is asserted and then deasserted before the CW4010 reads the cause register.

the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. if the instruction is in a branch delay slot, the epc register points at the preceding branch instruction, and the bd bit of the cause register is set. refer to section 4.4.21, interrupt exception on page 4-47 for further information on this subject.
8.1.7.2 servicing external interrupts
if one of the two software-generated exceptions causes the interrupt, clear the corresponding cause register bit to zero to clear the interrupt condition. if the interrupt is hardware generated, correct the condition that caused the assertion of the interrupt pin to clear the interrupt condition.

8.1.8 external vectored interrupt (exvintn)
the CW4010 has an external vectored interrupt input, exvintn. the exvap[31:2] inputs provide the interrupt vector virtual address, so the common exception vector base and offset are not used. exvintn must be asserted and deasserted on the rising edge of the system clock. when exvintn has been sampled and found active on the rising edge of the clock, cp0 samples an exception vector from exvap[31:2], which is available when the enable bit evi in the ccc is set. to mask the exvintn interrupt, you can clear the ie bit of the status register. figure 8.10 shows the fastest accepted case of exvintn. if the pipeline is stalled, it requires more clock cycles. when exvapen is asserted, the system may drive exvap[31:2].
figure 8.10 fastest accepted case of external vectored interrupt
[timing diagram not reproduced. traces: sclkp, exvintn, exvap[31:2] (exception address), exvapen, pcancrn, instruction 0, 1, instruction 2, 3, instruction 4, 5, exception instruction; clock cycles t1 to t7.]

8.1.8.1 handling external vectored interrupts
the external vectored interrupt feature is available when the evi bit in the ccc register is set. exvintn has lower priority than the six external interrupts exintn[5:0], but higher priority than the debug exception. to mask exvintn, you can use the interrupt enable bit in the status register in a similar way to that used for external interrupts.

if exvintn is accepted, the cp0 reads the exception vector address on exvap[31:2] and writes it into the program counter directly. a user-defined interrupt controller provides exvap[31:2], so that the CW4010 jumps to the interrupt handler directly when it is requested. exvap[31:2] must be stable until exvapen is asserted. exvintn does not alter anything in the cause register except the bd bit.

the epc register points at the first instruction for which processing was not completed, unless this instruction is in a branch delay slot. if the
instruction is in a branch delay slot, the epc register points at the preceding branch instruction and the bd bit of the cause register is set. refer to section 4.4.22, external vectored interrupt exception on page 4-48 for further information on this subject.

8.1.9 waiti instruction and wstallp
the CW4010 uses the waiti instruction, which is one of its extended instructions, to initiate a wait state. this stalls the pipeline and reduces power consumption during the period that the CW4010 is inactive. the CW4010 wakes up when it detects an external exception input (enabled interrupt, nmin, warm reset, or cold reset). figure 8.11 shows the timing diagram for the waiti instruction.

figure 8.11 waiti and pipeline stall (wstallp)
[timing diagram not reproduced. traces: sclkp, waiti instruction, instruction 1 through instruction 4, exintn, pstalln, wstallp, pcancrn; clock cycles t1 to t9.]
the coprocessor interface signal, pstalln, is also asserted when the pipeline stage is in the stall condition. at t1, the cp0 starts executing a waiti instruction. at t3 (which occurs in the wb stage of the pipeline), the cp0 requests a pipeline stall and the CW4010 asserts wstallp. at t6, an external interrupt input is asserted and the CW4010 wakes up from t7. at t8, the instructions in the pipeline stages are cancelled, and the if stage for the exception is started from t9.

8.2 scbus interface behavior
the CW4010 generates one or more external data read/write transactions on the scbus under any of the following conditions:
- uncached area instruction fetch
- icache-miss
- uncached area data read/write
- load dcache-miss
- any store execution in writethrough mode
- dcache writeback

the scbus is a flexible address/data bus. it is demultiplexed and synchronized to the system clock. it has a data width of 64 bits, but supports one type of bus sizing, from a 64-bit width to a 32-bit width. the scbus has the following transaction data sizes: byte, halfword, tribyte, 32-bit word, 64-bit doubleword, or 8-word burst (4-doubleword burst), as shown in table 8.2.

the CW4010 has a four-line-deep write buffer for uncached, dcache miss, or writethrough store operations. each line in the buffer contains 32 bits of address and 64 bits of data. if word data is stored to a continuous same-doubleword alignment address, two words are stored in one line. the CW4010 then requests a doubleword write transaction on the scbus, which the sizing function can separate into two 32-bit write transactions.

8.2.1 scbus basic transaction
figure 8.12 shows a basic scbus transaction for a single read and write. it is a three-clock-cycle transaction, which means that the scbrdyn assertion is sampled on the third rising clock edge from the beginning of the transaction cycle. the number of clock cycles for the fastest transaction is one clock, in which case sctssn is asserted continuously if the next transaction starts just after the current one. there is no limit to the maximum number of clock cycles for a transaction. a bus watchdog timer must be designed outside the core to assert the bus error signal scberrn, if necessary, when the transaction length is longer than the specification.

table 8.2 scbus transaction types

  cause of scbus transaction                      transaction type                             no. of bytes
  uncached instruction fetch                      doubleword                                   8
  instruction cache-miss                          8 words                                      32
  data read by uncached load instruction          byte, halfword, tribyte, word                1, 2, 3, 4
  data read by dcache-miss load instruction       8 words                                      32
  data write by uncached store instruction        byte, halfword, tribyte, word, doubleword    1, 2, 3, 4, 8
  data write by dcache-miss store instruction     byte, halfword, tribyte, word, doubleword    1, 2, 3, 4, 8
  data write by writethrough store instruction    byte, halfword, tribyte, word, doubleword    1, 2, 3, 4, 8
  data write by writeback                         8 words                                      32
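the write-buffer behaviour described in section 8.2, where two word stores to the same doubleword share one buffer line, can be illustrated with a toy model. this is a sketch under stated assumptions rather than a description of the actual hardware: the class and method names are hypothetical, the model merges only into the newest line and only into an empty half, and it never drains, whereas the real buffer empties as scbus write transactions complete.

```python
class WriteBuffer:
    """Toy model of a four-line write buffer holding doubleword-wide lines."""

    DEPTH = 4  # the text describes a four-line-deep buffer

    def __init__(self):
        # Each line: {"addr": doubleword-aligned address, "words": {0 or 1: data}}
        self.lines = []

    def store_word(self, addr: int, data: int) -> bool:
        """Queue a 32-bit store. Returns False when the buffer is full."""
        if addr % 4 != 0:
            raise ValueError("word stores must be 4-byte aligned")
        dw_addr = addr & ~0x7   # doubleword-aligned line address
        half = (addr >> 2) & 1  # which 32-bit half of the doubleword
        last = self.lines[-1] if self.lines else None
        # Assumed policy: merge only into the newest line, and only into an empty half.
        if last is not None and last["addr"] == dw_addr and half not in last["words"]:
            last["words"][half] = data
            return True
        if len(self.lines) >= self.DEPTH:
            return False  # in hardware the store would have to wait
        self.lines.append({"addr": dw_addr, "words": {half: data}})
        return True
```

in this model, stores to 0x1000 and 0x1004 occupy a single line, which would then go out as one doubleword write, while a following store to 0x1008 starts a new line.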
figure 8.12 scbus basic transaction

[timing diagram not reproduced. it shows an arbitration cycle followed by clock cycles t1 to t3 for sclkp, scaop[31:0] (address), scaoen, scdip[63:0] (data, read cycle), scdop[63:0] (data, write cycle), scdoen (high = read, low = write), sctssn (asserted at the first cycle), sctbstn and sctpwn (low for in-page write), sctben[7:0], sctsen, and scbrdyn.]

at the beginning of a transaction, the transaction start strobe sctssn is asserted for one clock cycle. in addition, the address is output on the scaop[31:0] lines and the address output enable signal scaoen is asserted to indicate that scaop[31:0] is valid. the byte enable signals sctben[7:0] are also output.

if the transaction is a four-doubleword burst, sctbstn is asserted during the first transaction. if the transaction is an in-page write, which means that the next transaction is in the same page, sctpwn is asserted. it is not asserted for burst write transactions. the sctbstn and sctpwn status indication signals are valid by the end of the transaction.
if the transaction is a data write, data is output to the scdop[63:0] lines and scdoen is asserted from the beginning to the end of the transaction. if the transaction is a data read or instruction fetch, the scdip[63:0] signal lines are sampled on the clock edge at which the ready input scbrdyn is asserted. the data output enable signal scdoen then indicates the read/write direction of the transaction and controls the three-state buffers external to the CW4010.

asserting scbrdyn terminates the transaction. at the same time, the size input bus signal scb32n is sampled. according to the input, the bus interface unit (biu) of the CW4010 determines the valid byte positions for the read transaction bus sizing. if scb32n is asserted for a doubleword transaction, the bus interface generates a subsequent transaction for bus sizing.

the bus in-page write accept input scbpwan is also sampled in an in-page write transaction. if scbpwan is deasserted, the bus interface arbitrates bus requests even if the next transaction is a write transaction in the same memory page. if scbpwan is asserted, the bus interface does not arbitrate bus requests and the next transaction must be a write transaction in the same memory page. if scbpwan is asserted during the in-page write transaction but sctsen is deasserted, the next transaction is still a write transaction in the same page.

to perform an instruction fetch transaction, the CW4010 asserts scifetn during the same period as scaoen so that the transaction can be monitored.

8.2.2 scbus burst transaction

when an icache miss occurs, the instruction scheduler unit (isu) requests an 8-word (4-doubleword) block burst read. when a dcache miss occurs, the load store unit (lsu) requests an 8-word block burst read. the lsu also requests an 8-word block burst write for dcache writeback. figure 8.13 shows an eight-word burst read/write transaction that consists of four continuous transactions.
figure 8.13 scbus eight-word burst transaction timing chart

[timing diagram not reproduced. it shows clock cycles t1 to t8 for sclkp, scaop[31:0] (address, address + 8, address + 16, address + 24), scaoen, scdip[63:0] (data, read cycle), scdop[63:0] (data, write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctbln, sctben[7:0], scbrdyn, and sctsen.]

in the first transaction, the burst transaction indicator signal, sctbstn, is asserted to indicate an 8-word burst transaction. subsequent transactions are single doubleword transactions. each transaction is terminated by an assertion of the bus ready signal, scbrdyn. the transaction start signal, sctsen, is asserted for each transaction. burst transactions can be suspended if sctsen is deasserted.

the bus hold request signal, schrqn, is not accepted during a burst transaction if sctsen is not deasserted when scbrdyn is asserted. schrqn is accepted if sctsen is deasserted to insert one or more idle cycles when scbrdyn is asserted.
sctbln, which marks the last transaction of a burst or a single transaction, is deasserted (high) during the first, second, and third transactions of a four-doubleword burst transaction.

for a burst read transaction, the first address is the missed address. the addresses of the subsequent transactions rotate through the block in wrap-around order. for a burst write transaction, the first address is the beginning of the block and subsequent addresses are incremental.

bus sizing is available for burst transactions so that the scbus can perform them with 32-bit-wide devices. the scb32n input must be asserted for each transaction of the burst. if 32-bit sizing is requested for a burst transaction, eight word transactions are generated. sctbln is deasserted from the first to the sixth transaction.

an in-page write never occurs if the transaction is a burst write.

figure 8.14 shows a timing diagram for an eight-word burst transaction. if the bus slave of the transaction is a synchronous dram system, there are some wait cycles for the first data transfer, but not for subsequent transfers. for a synchronous dram system, sctssn is asserted continuously for the second, third, and fourth data transfers. the dram controller generates the addresses for these data transfers itself, although scaop also outputs addresses.
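the burst address ordering described above can be sketched as follows. the sketch assumes a sequential wrap (address, +8, +16, +24 modulo the 32-byte block), which matches the address labels in figure 8.13 but is not stated explicitly in the text; it also assumes the first read address is taken at doubleword granularity. the function and constant names are hypothetical.

```python
BLOCK_BYTES = 32   # one 8-word block
BEAT_BYTES = 8     # one doubleword per transaction

def burst_read_addresses(miss_addr):
    """Four doubleword addresses for a burst read: start at the missed
    doubleword and wrap around inside the 32-byte block (assumed order)."""
    base = miss_addr & ~(BLOCK_BYTES - 1)
    first = miss_addr & ~(BEAT_BYTES - 1)
    return [base + ((first - base + i * BEAT_BYTES) % BLOCK_BYTES)
            for i in range(4)]

def burst_write_addresses(addr):
    """Four doubleword addresses for a burst write: start at the beginning
    of the block and increment."""
    base = addr & ~(BLOCK_BYTES - 1)
    return [base + i * BEAT_BYTES for i in range(4)]
```

for a miss at 0x1010, the read sequence under these assumptions is 0x1010, 0x1018, 0x1000, 0x1008, while a writeback of the same block starts at 0x1000.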
figure 8.14 scbus eight-word burst transaction

[timing diagram not reproduced. it shows clock cycles t1 to t8 for sclkp, scaop[31:0] (address1 to address4), scaoen, scdip[63:0] (data1 to data4, read cycle), scdop[63:0] (data1 to data4, write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctbln, sctben[7:0], scbrdyn, and sctsen.]
figure 8.15 shows the first and second transactions of an eight-word burst read/write. transactions are suspended when sctsen is deasserted.

figure 8.15 scbus eight-word burst transaction timing chart

[timing diagram not reproduced. it shows clock cycles t1 to t8 for sclkp, scaop[31:0] (address, address + 8), scaoen, scdip[63:0] (data, read cycle), scdop[63:0] (data, write cycle), scdoen (high = read, low = write), sctssn, sctbstn, sctben[7:0], sctsen, and scbrdyn.]

if an individual transaction of a burst transaction is terminated while sctsen is deasserted, the next transaction cannot follow immediately. in that case, a hold request can be inserted. a hold request can also be inserted if a retry occurs while sctsen is deasserted during a burst transaction.
8.2.3 scbus in-page write transaction

an in-page write transaction is one in which continuous write accesses are made to the same row and page in a given address area. most types of dram support this type of fast access, which is used to perform burst read/write transactions.

the scbus supports continuous write transactions that have the same upper address. the external write buffer in the load store unit compares the upper address bits of the current write request with those of the next write transaction in the buffer. it provides the bus interface with the result of the comparison. the address range is defined in the configuration register of the CW4010. if the two addresses have the same upper range, the in-page write output sctpwn is asserted to inform the external bus slave.

the in-page write accept input scbpwan must be asserted if the slave is able to accept in-page write transactions. if scbpwan is asserted, the interface does not arbitrate bus requests and the next transaction must be a write transaction. if scbpwan is deasserted, the bus interface performs the next transaction according to the arbitration result. scbpwan is sampled when the bus interface samples an assertion of the scbrdyn signal. the bus interface performs a write transaction if scbpwan is deasserted and there are no higher-priority requests. the scbpwan input has no meaning if the transaction is not an in-page write, and it is ignored when sctpwn is deasserted.

the bus interface does not count the number of continuous in-page write transactions. it continues in-page writes until the write buffer is empty, a write transaction is not in the same page address area, or scbpwan is deasserted.

when the biu deasserts the transaction start enable signal, sctsen, the CW4010 inserts one or more bus idle states between two in-page write transactions. however, the bus interface does not arbitrate requests during this idle state if the slave accepts the in-page write transactions.
a hold request is allowed if the biu deasserts sctsen. the bus interface does not accept the bus hold request during in-page write transactions if the biu receives an asserted sctsen continuously. figure 8.16 shows an example of in-page write transactions.
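the in-page decision in section 8.2.3 can be sketched as an upper-address comparison followed by a check of the slave's accept input. this is an illustrative model only: `PAGE_BITS` stands in for the address range that the text says is defined in the configuration register (its real encoding is not given here), and the function names are hypothetical.

```python
PAGE_BITS = 10  # hypothetical page size of 1 kbyte; the real range is configurable

def same_page(addr_a, addr_b, page_bits=PAGE_BITS):
    """Compare the upper address bits of two write requests."""
    return (addr_a >> page_bits) == (addr_b >> page_bits)

def continue_in_page(current_addr, pending_writes, scbpwan_asserted,
                     page_bits=PAGE_BITS):
    """True when the next transaction must be the next buffered write:
    the write buffer is not empty, the next write is in the same page
    (sctpwn asserted), and the slave accepts it (scbpwan asserted)."""
    if not pending_writes:
        return False
    sctpwn_asserted = same_page(current_addr, pending_writes[0], page_bits)
    return sctpwn_asserted and scbpwan_asserted
```

the three stop conditions in the text map onto the three ways this function returns false: an empty buffer, a write outside the page, or a deasserted scbpwan.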
figure 8.16 scbus in-page write transaction timing chart (four words)

[timing diagram not reproduced. it shows clock cycles t1 to t8 for sclkp, scaop[31:0] (address1 to address4), scaoen, scdop[63:0] (data1 to data4, write cycle), scdoen, sctssn, sctben[7:0], sctpwn, sctsen, scbrdyn, and scbpwan.]
8.2.4 scbus bus hold

there are two ways to hold scbus transactions:

- external logic asserts the CW4010 bus hold input schrqn. the CW4010 acknowledges the request by issuing the bus hold grant signal, schgtn.
- external logic deasserts the transaction start enable signal, sctsen. because there is no dedicated acknowledge signal associated with sctsen, the CW4010 deasserts the address output enable signal, scaoen, after the biu deasserts sctsen to show that the bus interface does not own the bus.

the bus hold request signal, schrqn, cannot break in-page write transactions or read/write burst transactions if sctsen is asserted continuously. the biu can break the transactions when it deasserts sctsen.

to avoid a bus deadlock, a bus retry is requested with each hold request. the current scbus transaction generated by the biu is then terminated by the retry and the hold request must be accepted.

figure 8.17 shows the timing diagram for a bus hold request and the associated grant signal, schgtn. the CW4010 asserts the grant signal until the biu deasserts the request. while the bus is held, the CW4010 does not detect bus errors.
figure 8.17 scbus hold request and grant

[timing diagram not reproduced. it shows clock cycles t1 to t9 for sclkp, scaoen, sctssn, scbrdyn, schrqn, and schgtn. notes: 1. if the biu has a next transaction request, schrqn must be asserted before this cycle. 2. if the biu has no next transaction request, schgtn is asserted immediately. 3. minimum 1 sclk.]

8.2.5 scbus bus retry

the bus retry signal, scbrtyn, is an input to the biu. it is asserted to abort a transaction and to allow the transaction to be restarted later. the transaction state control goes to the idle state, then restarts the transaction when sctsen is asserted. bus retry is valid in a burst transaction. if scbrdyn and scbrtyn are asserted at the same time, scbrtyn has higher priority. if scbrtyn is asserted to hold the bus, schrqn should be asserted before or at the same time as scbrtyn.

8.2.6 scbus bus error

the external bus controller asserts the biu bus error signal, scberrn, when the current transaction must be terminated as a bus error. if scbrdyn is asserted at the same time, scberrn has higher priority. assertion of scberrn forces the CW4010 to exit the sequential transactions of in-page write and read/write burst transactions. the states of service and transaction control go to the idle state. if the transaction is a burst (cache refill or writeback), the CW4010 invalidates the cache line. when a bus error occurs, the cp0 issues a bus error exception. see section 8.1.5, bus error (scberrn), for more details. a bus error exception is a fatal error for the CW4010.
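the priorities stated in sections 8.2.5 and 8.2.6 (retry over ready, error over ready) can be sketched as a small resolver. the function name and return strings are hypothetical. the text does not say which of scberrn and scbrtyn wins when both are asserted, so this sketch refuses to guess and raises an error in that case. inputs are active-low levels, with 0 meaning asserted.

```python
def termination(scbrdyn, scbrtyn, scberrn):
    """Resolve how the current transaction ends on this clock edge.
    Returns 'error', 'retry', 'ready', or None if nothing is asserted."""
    err = scberrn == 0
    rty = scbrtyn == 0
    rdy = scbrdyn == 0
    if err and rty:
        # Not covered by sections 8.2.5/8.2.6; left undefined on purpose.
        raise ValueError("priority between scberrn and scbrtyn is not specified")
    if err:
        return "error"   # scberrn has priority over scbrdyn
    if rty:
        return "retry"   # scbrtyn has priority over scbrdyn
    if rdy:
        return "ready"
    return None
```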
8.2.7 scbus bus sizing

the scbus supports bus sizing for slaves that need sequential address access to 32-bit data. when sizing is requested, the scb32n input to the CW4010 is asserted to separate a doubleword transaction, including part of a burst transaction, into two single-word transactions. the bus interface also selects valid byte positions for a word or a partial word transaction if scb32n is asserted. in the case of a word or a partial word write transaction, the bus interface outputs word data to both the upper and lower 32 bits of the data output bus according to address bit 2. the bus interface therefore fully supports a 32-bit bus interface.

although scb32n is sampled with the assertion of the ready signal input, the bus interface behaves as a normal 32-bit data width bus if scb32n is always asserted. if 16-bit or 8-bit width bus sizing is needed, it must be supported outside the CW4010 core.

8.2.7.1 read bus sizing

when sizing occurs for a byte, halfword, tribyte, or word during a read transaction, the CW4010 biu can move the sampled 32-bit word data to the valid position according to the setting of address bit 2. if sizing is requested for a doubleword transaction, the biu samples 32-bit data at the first transaction, then generates a subsequent transaction and packs the first 32 bits and the subsequent 32 bits. the packed data is sent to the instruction scheduler unit or load store unit.

figure 8.18 shows the relationship between the valid byte positions of the first and subsequent transactions. in the case of a non-doubleword read, the behavior of a byte, a halfword, and a tribyte transaction is the same as that of a word transaction because sizing supports 32-bit mode only. the example in figure 8.18 assumes that the bus interface samples a doubleword (8 bytes).
figure 8.18 sampled bytes of first and second transaction

scbus data          bits [63:32]    bits [31:0]
1st transaction     a1 b1 c1 d1     e1 f1 g1 h1
2nd transaction     a2 b2 c2 d2     e2 f2 g2 h2

note: the 2nd transaction is generated when the transaction is a doubleword or a part of a burst with scb32n = low.

if you are reading a doubleword and doing bus sizing, you will need a second transaction. figure 8.19 shows the doubleword data that is sent to the instruction scheduler unit or load store unit.
figure 8.19 read bytes to isu and lsu with sizing

example    type                    address bit 2    scb32n    bits [63:32]    bits [31:0]
1          any                     0 or 1           high      a1 b1 c1 d1     e1 f1 g1 h1
2          byte, half, tri, word   0 or 1           low       e1 f1 g1 h1     e1 f1 g1 h1
3          doubleword              0                low       e2 f2 g2 h2     e1 f1 g1 h1

1. in example 1, one transaction is initiated. the eight bytes sampled are transferred to the isu and lsu without any change.
2. in example 2, one transaction is initiated. four bytes are sampled (bits [31:0]). they are transferred to bits [63:32] and [31:0] of the isu and lsu.
3. in example 3, two transactions are initiated. bits [31:0] of the first transaction are output on bits [31:0], and bits [31:0] of the second transaction are output on bits [63:32]. this doubleword is transferred to the isu or the lsu.

8.2.7.2 write bus sizing

in the case of a non-doubleword write transaction, the bus interface selects the upper or lower 32 bits of data from the load store unit according to address bit 2 and outputs the same 32-bit word, which contains the valid bytes, to the scbus. the data is output to the scbus before the bus interface detects the sizing input.

in the case of a doubleword write transaction, the bus interface generates a subsequent sizing transaction if the sizing input is asserted at the first transaction.

figure 8.20 shows the relationship between the doubleword data from the load store unit and the scbus. bytes shown in the shaded area of the original figure have no meaning for the scbus write transaction.
figure 8.20 write bytes to the scbus with sizing

example    type                   address bit 2    scb32n        bits [63:32]    bits [31:0]
1          byte/half/tri/word     0                high or low   a b c d         e f g h
2          byte/half/tri/word     1                high or low   a b c d         a b c d
3          doubleword             0                high          a b c d         e f g h
4 (1st)    doubleword             0                low           a b c d         e f g h
4 (2nd)    doubleword             1                low           a b c d         a b c d

1. in example 1, one transaction is initiated. a doubleword from the lsu is output on the data bus without any change.
2. in example 2, one transaction is initiated. bits [63:32] from the lsu are output on bits [63:32] and [31:0] of the data bus.
3. in example 3, one transaction is initiated. a doubleword from the lsu is output on the data bus without any change.
4. in example 4, two transactions are initiated. in the first transaction, a doubleword from the lsu is output on the data bus without any change. in the second transaction, bits [63:32] from the first transaction are output to bits [31:0] of the data bus.

as shown in figure 8.21, you can assume that the lsu sends a doubleword, regardless of the transaction type.

figure 8.21 write data bytes from lsu

bits [63:32]    bits [31:0]
a b c d         e f g h
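the read and write sizing examples in figures 8.19 and 8.20 can be sketched as two small functions. this is a behavioral illustration only, with hypothetical function names; data values are 64-bit integers with bits [63:32] in the upper half. the read side covers examples 1 to 3 of figure 8.19 and the write side covers examples 1 to 4 of figure 8.20.

```python
MASK32 = 0xFFFFFFFF

def pack_read(samples, is_doubleword, scb32n_low):
    """samples: 64-bit values sampled from scdip, one per transaction.
    Returns the doubleword handed to the isu or lsu."""
    if not scb32n_low:
        return samples[0]                 # example 1: passed through unchanged
    low = samples[0] & MASK32
    if not is_doubleword:
        return (low << 32) | low          # example 2: word copied to both halves
    high = samples[1] & MASK32            # example 3: second transaction needed
    return (high << 32) | low

def write_beats(lsu_data, addr, is_doubleword, scb32n_low):
    """Returns the scdop value for each write transaction generated."""
    upper = (lsu_data >> 32) & MASK32
    upper_on_both = (upper << 32) | upper
    if not is_doubleword:
        # examples 1 and 2: address bit 2 selects which word is the valid one
        return [upper_on_both if addr & 0x4 else lsu_data]
    if not scb32n_low:
        return [lsu_data]                 # example 3: single 64-bit transaction
    return [lsu_data, upper_on_both]      # example 4: sizing adds a second one
```

note that the non-doubleword write data does not depend on scb32n, which agrees with the statement that the data is output before the sizing input is detected.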
8.2.8 scbus bus lock

the CW4010 sclockn output signal indicates that the scbus is asking to lock ownership. the CW4010 asserts sclockn when it executes a loadlink instruction to start a read transaction in an uncached area or writethrough cached area. it deasserts the signal just before it executes a storeconditional instruction to start a write transaction. during the read transaction and the write transaction, the CW4010 asserts sclockn continuously.

if an effective address for a loadlink instruction is in the writeback cached area, the CW4010 does not assert sclockn, even if it experiences a dcache miss. the subsequent storeconditional instruction does not generate a write transaction because it may hit the dcache. if a storeconditional instruction hits the dcache in a writeback cached area when sclockn is asserted, an incorrect condition occurs, and sclockn is deasserted without any bus transactions being executed.

the effective virtual addresses of loadlink and store instructions must be in kseg1. additionally, a loadlink instruction and a storeconditional instruction must be used as a pair of instructions to the same address.

while the CW4010 asserts sclockn, the bus interface does not exhibit any special behavior; for example, it accepts hold requests. if hold requests must not be accepted while the CW4010 is asserting sclockn, outside user logic must mask the hold request while sclockn is asserted.

figure 8.22 shows the timing behavior for locked transactions. if there are other transactions between the read transaction of a loadlink and the write transaction of a storeconditional, the CW4010 asserts sclockn continuously.
figure 8.22 scbus locked transaction

[timing diagram not reproduced. it shows clock cycles t1 to t8 for sclkp, scaoen, sctssn, scdoen, sclockn, and scbrdyn across the read transaction of a loadlink and the write transaction of a storeconditional.]

8.2.9 big endian configuration

the CW4010 can support big endian address ordering, although the default configuration is little endian. to enable the big endian configuration, the bendn input is strapped low. bendn aligns byte positions in a 32-bit word in the lsu. bits [31:0] must be swapped outside the CW4010 with bits [63:32], for both scdip and scdop, as shown in table 8.3.

the isu assumes the first instruction (even slot) is fetched through scdip[31:0], and the second instruction (odd slot) is fetched through scdip[63:32]. the lsu aligns byte positions in a 32-bit word according to the state of the bendn input. the lsu assumes the following:

- full word or partial word data is accessed through input bus bits scdip[31:0] or output bus bits scdop[31:0] when address bit 2 is 0.
- full word or partial word data is accessed through scdip[63:32] or scdop[63:32] when address bit 2 is 1, regardless of the state of bendn.

the byte enable output lines sctben[7:0] indicate valid bytes according to the byte address for both big-endian and little-endian address ordering. table 8.3 shows the relationship of sctben[7:0] to the scdip and scdop bit positions.

table 8.3 sctben and valid scdp

            valid scdp bits
sctben      little endian    big endian
0           [7:0]            [63:56]
1           [15:8]           [55:48]
2           [23:16]          [47:40]
3           [31:24]          [39:32]
4           [39:32]          [31:24]
5           [47:40]          [23:16]
6           [55:48]          [15:8]
7           [63:56]          [7:0]

8.3 ocabus interface behavior

the CW4010 on-chip access (oca) bus enables access to on-chip modules at the cr stage without going through the scbus. section 7.4.6, ocabus interface signals, provides additional information about the bus. this section describes certain ocabus transactions and provides timing diagrams for them. these transactions include:

- a basic oca access
- rejection of an oca access
- an ocabus access with a stall at the ex stage of the CW4010 pipeline
- an ocabus access with a stall at the cr stage of the CW4010 pipeline
- an ocabus access with a stall request or wait state
- an ocabus access with a pipeline cancellation
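the mapping in table 8.3 (section 8.2.9) follows a regular pattern, which can be sketched as a one-line calculation. the function name is hypothetical, and the sketch only reproduces the table; it says nothing about how the external word swap is wired.

```python
def scdp_bits(sctben_index, big_endian=False):
    """Return (msb, lsb) of the scdip/scdop byte lane selected by
    sctben[sctben_index], as listed in table 8.3."""
    if not 0 <= sctben_index <= 7:
        raise ValueError("sctben index must be 0..7")
    # big endian mirrors the lane order: index 0 selects the top byte
    lane = 7 - sctben_index if big_endian else sctben_index
    return (lane * 8 + 7, lane * 8)
```

for example, sctben index 3 selects bits [31:24] in little endian and bits [39:32] in big endian.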
8.3.1 basic ocabus transaction

regardless of the type of load or store execution, address and size are output at the ex stage of the CW4010 pipeline, and exloadp or accstorep is asserted. the address bits (dvaddrp[31:0]) need to be decoded to determine whether or not ocacceptp is asserted and the oca module can accept the oca transaction. typically, oca modules should be located as uncached devices, so that the virtual address is in kseg1. this is done by setting address bits [31:29] to 101 (binary).

the address bus must be latched on the rising edge of the system clock, between the ex and cr stages, as shown in figure 8.23. the size information provided by accsize is also latched at this time. refer to the subsection entitled accsize[1:0] ocabus transaction size output for more information on this subject.

at the cr stage, write data is output on cptocdp provided that cptocen is asserted. if a read transaction is being executed, the cpfrcen signal is asserted and data on the cpfrcdp bus is sampled on the rising edge of the system clock between stages cr and wb. the ocacceptp signal must be asserted in the cr stage to inform the CW4010 that an oca transaction is in progress.

the crvalidp signal is asserted to indicate that the cr stage is valid. if it is deasserted, write data must not be written and read data must not be sampled. the transaction is executed again later.
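the address decode that drives ocacceptp can be sketched as below. the kseg1 test (address bits [31:29] equal to 101) comes from the text; the module window `OCA_BASE` and `OCA_SIZE` are invented values for illustration, since the manual leaves the placement of oca modules to the system designer.

```python
OCA_BASE = 0xBF000000   # hypothetical base address of one oca module
OCA_SIZE = 0x1000       # hypothetical size of its register window

def in_kseg1(vaddr):
    """True when virtual address bits [31:29] are 101."""
    return ((vaddr >> 29) & 0x7) == 0b101

def ocacceptp(vaddr):
    """Decoder output: accept the transaction only for addresses that are
    in kseg1 and fall inside the (assumed) module window."""
    return in_kseg1(vaddr) and OCA_BASE <= vaddr < OCA_BASE + OCA_SIZE
```

any address that fails the decode leaves ocacceptp deasserted, which is the rejected case described in section 8.3.2.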
figure 8.23 typical ocabus transaction

[timing diagram not reproduced. it shows the rd, ex, cr, and wb stages for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.3.2 ocabus transaction rejected

figure 8.24 shows the timing for an ocabus transaction that is rejected because ocacceptp is deasserted. this occurs when the virtual address is decoded and found not to be an address for an oca module. under these conditions, the CW4010 reads from the dcache, requests an scbus read transaction, and then writes data to the dcache write buffer or to a four-deep external write buffer. ocacceptp is the only signal that determines whether an oca transaction will take place.

figure 8.24 ocabus transaction rejected by address decoder

[timing diagram not reproduced. it shows the rd, ex, cr, and wb stages for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.3.3 ocabus access with stall at ex stage

figure 8.25 shows an example where pstalln is asserted at the ex stage of the CW4010 pipeline, causing all pipeline stages to enter a stall state. when this happens, dvaddrp[31:0], accsize[1:0], and exloadp or accstorep are held during the stall cycles.

figure 8.25 ocabus with stall at ex stage

[timing diagram not reproduced. it shows the stage sequence rd, ex, ex, ex, cr, wb for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.3.4 ocabus access with stall at cr stage

figure 8.26 shows an example where pstalln is asserted at the cr stage of the CW4010 pipeline, causing all pipeline stages to enter a stall state. when this happens, data on the cptocdp bus, crvalidp, and ocacceptp are held.

figure 8.26 ocabus access with stall at cr stage

[timing diagram not reproduced. it shows the stage sequence rd, ex, cr, cr, cr, wb for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.3.5 ocabus access with stall request

figure 8.27 shows an example where the oca bus device needs to insert some wait cycles before a read or write operation. to request a pipeline stall, the device asserts cpsreqn from the beginning of the cr stage, and this causes pstalln to be asserted. cpsreqn must be asserted and deasserted early in the clock cycle, since it is one of the critical path signals.

figure 8.27 ocabus access with stall request

[timing diagram not reproduced. it shows the stage sequence rd, ex, cr, cr, cr, wb for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.3.6 ocabus access with pipeline cancel

figure 8.28 shows an example where a load or store instruction is cancelled by an exception. the exception is indicated when crvalidp is deasserted. when this happens, the write data must not be written into the oca module. the read data being transferred to the CW4010 core is ignored. the cancelled load or store instruction may be executed later.

figure 8.28 ocabus access with pipeline cancel

[timing diagram not reproduced. it shows the rd, ex, cr, and wb stages for sclkp, dvaddrp[31:0] and accsize[1:0] (va), cptocdp[31:0] (wd, write cycle), cpfrcdp[31:0] (rd, read cycle), cptocen or cpfrcen, exloadp or accstorep, crvalidp, pstalln, ocacceptp, and cpsreqn[3:1].]
8.4 cache interface behavior

when an external bus master writes data into main memory, it can invalidate the dcache and icache lines to maintain coherency between the main memory and the caches. the CW4010 has three signals to support this function:

- cinvap[31:5]: cache invalidate address bus bits
- dcinvsn: dcache invalidate strobe
- icinvsn: icache invalidate strobe

when dcinvsn or icinvsn is asserted, the address on the cinvap bus is latched and the CW4010 starts an invalidation process. dcinvsn or icinvsn should be asserted for only one clock cycle. the dcache or icache line is invalidated when the physical address tag of a valid line matches the latched invalidate address. both the v bit and the wb bit are cleared.

figure 8.29 shows the timing diagram for dcache invalidation implemented by bus snooping. in the first clock cycle after dcinvsn is asserted, the load store unit asserts the stall request signal if the ex stage is a load/store instruction. in the second cycle, the dcache tag is read from the dcache and compared with the address latched from the cinvap bus. if they match and the ex stage is a load/store instruction, the pipeline stall request is asserted. to avoid timing problems, pstalln may not be deasserted during the second cycle. in the third cycle, the v bit and wb bit of the dcache line are cleared and the line is invalidated. if the addresses do not match at the third cycle, the dcache is not accessed. the stall cycle signal pstalln is asserted at the third clock cycle even if the address and dcache tag do not match and the valid bit is not cleared.
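the tag compare and invalidate step described in section 8.4 can be sketched as follows. only the use of address bits [31:5] (a 32-byte line) and the clearing of the v and wb bits come from the text. the direct-mapped organization, the `NUM_LINES` value, and the dictionary layout are assumptions chosen to keep the sketch short; they are not the CW4010 cache organization.

```python
LINE_SHIFT = 5     # cinvap carries address bits [31:5]
NUM_LINES = 256    # hypothetical number of lines in a direct-mapped cache

def invalidate(cache, cinvap):
    """cache: list of dicts with keys 'tag', 'v', 'wb'.
    cinvap: latched address bits [31:5] as an integer line address.
    Returns True if a valid matching line was invalidated."""
    index = cinvap % NUM_LINES
    tag = cinvap // NUM_LINES
    line = cache[index]
    if line["v"] and line["tag"] == tag:
        line["v"] = False    # clear the valid bit
        line["wb"] = False   # clear the writeback (dirty) bit
        return True
    return False             # no match: the line is left untouched
```

a usage example: for a byte address `addr`, external snoop logic would present `addr >> LINE_SHIFT` on cinvap and pulse the strobe for one clock.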
figure 8.29 dcache invalidation by snooping

[timing diagram not reproduced. it shows sclkp, cinvap[31:5], dcinvsn, pstalln, the dcache tag address, the dcache tag access (read and write), and the rd, ex, cr, and wb stages of instructions 1 to 3 across the first to fourth cycles, for three cases: case 1, no stall cycle; case 2, two stall cycles; case 3, one stall cycle (no invalidation).]
the load store unit does not do anything if the external bus master reads data from main memory and the address is dirty-cached by the CW4010. in this case, you can use writethrough mode for the page.

figure 8.30 shows the timing for icache invalidation brought about by bus snooping. it needs a two-cycle stall if the invalidation address hits the tag or a one-cycle stall if it does not hit the tag.

figure 8.30 icache invalidation by snooping

[timing diagram not reproduced. it shows sclkp, cinvap[31:5], icinvsn, and pstalln across the first to fourth cycles for two cases: case 1, one stall cycle (miss); case 2, two stall cycles (hit).]

8.5 coprocessor interface behavior

the CW4010 supports up to four coprocessors, including an internal system coprocessor assigned as coprocessor-0 (cp0). coprocessor-1 is usually a floating point coprocessor. coprocessors-2 and -3 are user-defined options.

the coprocessor interface has three 32-bit buses. the first bus is for an instruction code, the second one for data movement to a coprocessor, and the third for data movement from a coprocessor. the buses are common to all coprocessors. other control signals are divided into two categories: coprocessor private signals and coprocessor common signals.

there are two kinds of coprocessor instructions: coprocessor operation instructions and data movement instructions. the CW4010 outputs an instruction code on the instruction code bus and asserts an execution signal at the rd stage for both types of instructions. the CW4010 executes only one coprocessor instruction at a time, even if a pair of instructions has no conflict. a coprocessor data movement instruction and a load/store instruction are not executed at the same time.
a coprocessor receives a data movement instruction at the rd stage, but data is moved at the cr stage, resulting in one delay slot for coprocessor data movement instructions. since the nop code is usually a shift instruction, a nop and a coprocessor instruction are executed in the same clock cycle. to have one delay slot between two coprocessor operations, three nop codes must be inserted. otherwise, each coprocessor needs to have data bypassing for the cr to ex stage or wb to cr stage (wb to cr gives better results).

if a coprocessor needs to support an lwcz (load word to coprocessor) instruction, a dcache miss may occur. the coprocessor cache-miss signal is asserted at the cr stage of the lwcz instruction and the pipeline enters a stall condition until valid data is read from the scbus. during the stall cycles, correct data is placed on the data bus with the fixup cycle signal assertion. for simple coprocessor register access, the lwcz should not be used to perform memory data accesses. an lw (load word) instruction and a subsequent mtcz (move to coprocessor) can accomplish the same operation.

coprocessor pipeline stages must enter a stall state when the CW4010 enters a stall state if the coprocessor is executing an instruction in the ex or cr stage. after the wb stage cycle of the CW4010 pipeline, the coprocessor does not need to enter a stall because the instruction can no longer be cancelled.

instruction executions must be cancelled if the instructions are in the ex or cr stage and the pipeline cancel signal is asserted. in this case, the even/odd slot instruction must be issued.

if the coprocessor cannot accept subsequent instructions for a time after the ex stage, the coprocessor channel cpbusyn[3:1] signal should be asserted at the ex stage. the next coprocessor instruction is not started until cpbusyn[3:1] is deasserted.
if a coprocessor asserts cpsreqn[3:1] (pipeline stall request), the CW4010 enters the stall condition immediately until cpsreqn[3:1] is deasserted.

coprocessor related instructions that include cp0 instructions have exclusive instruction bits in the opcode field. bits [31:28] of the opcode field contain 0100, 1100, or 1110 (binary). bits [27:26] define the coprocessor number (0 to 3).
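the opcode test described above can be sketched as a short decoder. the bit positions come from the text; the class labels copz, lwcz, and swcz are taken from the instruction names used in figures 8.31 and 8.33, and the function name is hypothetical.

```python
COP_CLASSES = {
    0b0100: "copz",   # coprocessor operation and register movement
    0b1100: "lwcz",   # load word to coprocessor
    0b1110: "swcz",   # store word from coprocessor
}

def decode_coprocessor(instr):
    """Return (class, coprocessor number) for a 32-bit instruction word,
    or None if bits [31:28] do not mark a coprocessor instruction."""
    kind = COP_CLASSES.get((instr >> 28) & 0xF)
    if kind is None:
        return None
    return kind, (instr >> 26) & 0x3
```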
8.5.1 coprocessor functional instruction

there is only one coprocessor operation instruction. as shown in figure 8.31, 25 bits of the coprocessor function field define coprocessor operations.

figure 8.31 coprocessor functional instruction

  bits     31:26           25       24:0
  field    copz (0100zz)   co (1)   cofun
  operation: coprocessor-z operation (copz)

8.5.2 data movement to and from cpu general purpose register instructions

each coprocessor (apart from cp0) can have up to 32 general-purpose registers and 32 control registers. cp0 does not have any control registers. data in a cpu general purpose register selected by the rt field can be moved from or to a coprocessor register selected by the rd field and by bit 22, which selects between the general purpose and control register groups. the rt field is always for a cpu register and the rd field for a coprocessor register, regardless of the direction of data movement. figure 8.32 shows the instructions used for data movement.

figure 8.32 cpu general purpose register data movement instructions

  instruction    31:26           25:21        20:16   15:11   10:0   operation
  mtcz rt, rd    copz (0100zz)   mt (00100)   rt      rd      0      cpr[rd] ← gpr[rt]
  mfcz rt, rd    copz (0100zz)   mf (00000)   rt      rd      0      gpr[rt] ← cpr[rd]
  cfcz rt, rd    copz (0100zz)   cf (00010)   rt      rd      0      gpr[rt] ← ccr[rd]
  ctcz rt, rd    copz (0100zz)   ct (00110)   rt      rd      0      ccr[rd] ← gpr[rt]
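the field layout of figure 8.32 can be exercised with a small encoder and decoder. the sketch below assumes only the bit positions shown in the figure; the function names are hypothetical and chosen for illustration.

```python
# Sub-opcodes in bits [25:21] for the four register move instructions.
MOVE_SUBOPS = {"mfcz": 0b00000, "cfcz": 0b00010, "mtcz": 0b00100, "ctcz": 0b00110}
SUBOP_NAMES = {code: name for name, code in MOVE_SUBOPS.items()}


def encode_move(mnemonic, z, rt, rd):
    """Build the 32-bit word for a move between gpr[rt] and coprocessor z register rd."""
    if not (0 <= z <= 3 and 0 <= rt <= 31 and 0 <= rd <= 31):
        raise ValueError("field out of range")
    major = 0b010000 | z  # copz major opcode, bits [31:26] = 0100zz
    return (major << 26) | (MOVE_SUBOPS[mnemonic] << 21) | (rt << 16) | (rd << 11)


def decode_move(word):
    """Return (mnemonic, z, rt, rd), or None if the word is not a register move."""
    if (word >> 28) & 0xF != 0b0100 or word & 0x7FF:
        return None
    name = SUBOP_NAMES.get((word >> 21) & 0x1F)
    if name is None:
        return None
    return name, (word >> 26) & 0x3, (word >> 16) & 0x1F, (word >> 11) & 0x1F


def uses_control_register(word):
    """Bit 22 selects the control register group (cfcz, ctcz)."""
    return bool((word >> 22) & 1)
```

for instance, `encode_move("mtcz", 0, 8, 12)` yields 0x40886000, and `decode_move` applied to that word recovers the same four fields.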
8.5.3 instructions moving data from or to memory

coprocessor general purpose registers (not control registers) can move data directly to or from memory. the data is accessed through the dcache if the address is in a cacheable area and it achieves a cache hit. in addition, the effective virtual address is translated to a physical address if the system supports an mmu.

instructions for data movement, shown in figure 8.33, are for coprocessors 1, 2, and 3 (cp1, cp2, and cp3) only. they are not supported for cp0.

because the CW4010 load/store unit accesses the dcache and jtlb, there is a possibility of a dcache miss and a jtlb miss. when a dcache miss occurs for lwcz, the pipeline enters a stall condition until the data is available from the scbus. when a jtlb miss occurs, the pipeline cancel signal is asserted, and the coprocessor has to cancel the load operation at the cr stage. the CW4010 may also cancel the pipeline stages for other reasons.

figure 8.33 memory data movement instructions

  instruction              31:26           25:21   20:16   15:0     operation
  lwcz rt, offset(base)    lwcz (1100zz)   base    rt      offset   cpr[rt] ← mem(offset + gpr[base])
  swcz rt, offset(base)    swcz (1110zz)   base    rt      offset   mem(offset + gpr[base]) ← cpr[rt]
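a minimal sketch of the figure 8.33 encodings and of the address calculation follows. the function names are hypothetical, and the sign extension of the 16-bit offset is an assumption taken from the standard mips load/store convention; the figure itself only shows offset + gpr[base].

```python
MEM_MAJOR = {"lwcz": 0b110000, "swcz": 0b111000}  # bits [31:26] before adding z


def encode_mem(mnemonic, z, rt, base, offset):
    """Build the 32-bit word for lwcz/swcz rt, offset(base)."""
    if not 1 <= z <= 3:
        raise ValueError("lwcz/swcz are defined for cp1, cp2, and cp3 only")
    if not (0 <= rt <= 31 and 0 <= base <= 31 and -0x8000 <= offset <= 0x7FFF):
        raise ValueError("field out of range")
    major = MEM_MAJOR[mnemonic] | z
    return (major << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)


def effective_address(word, gpr):
    """Return offset + gpr[base] as a 32-bit address (offset assumed signed)."""
    offset = word & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # assumed sign extension of the 16-bit offset
    base = (word >> 21) & 0x1F
    return (gpr[base] + offset) & 0xFFFFFFFF
```

here `gpr` is any sequence of 32 register values; with gpr[4] = 0x1000, the word built by `encode_mem("swcz", 2, 5, 4, -4)` addresses 0xFFC.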
8.5.4 branch on coprocessor condition instructions

the CW4010 can branch to a target address, depending on the coprocessor condition input. the coprocessor interface has condition inputs for each coprocessor, and the branch unit examines these inputs. no coprocessor execution strobes are asserted. both false and true condition branch instruction sets are defined. the mips-ii instruction set has a non-branch slot type called likely. however, branching of mips-i instructions always requires a branch slot. the instructions associated with branching on coprocessor conditions are shown in figure 8.34.

figure 8.34 branch on coprocessor condition instructions

  instruction     31:26           25:21        20:16          15:0     operation
  bczf offset     copz (0100zz)   bc (01000)   bcf (00000)    offset   if cpcondp = low (false) then pc ← pc + offset
  bczfl offset    copz (0100zz)   bc (01000)   bcfl (00010)   offset   if cpcondp = low (false) then pc ← pc + offset
  bczt offset     copz (0100zz)   bc (01000)   bct (00001)    offset   if cpcondp = high (true) then pc ← pc + offset
  bcztl offset    copz (0100zz)   bc (01000)   bctl (00011)   offset   if cpcondp = high (true) then pc ← pc + offset
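the four encodings in figure 8.34 differ only in bits [20:16], where bit 16 selects the true condition and bit 17 selects the likely form. the sketch below decodes these fields and evaluates the branch decision; the function names are hypothetical, and the condition input is modeled as a plain boolean (true for a high cpcondp level).

```python
def decode_cop_branch(word):
    """Return (z, on_true, likely) for bczf/bczfl/bczt/bcztl, else None."""
    if (word >> 28) & 0xF != 0b0100 or (word >> 21) & 0x1F != 0b01000:
        return None
    cond = (word >> 16) & 0x1F
    if cond > 0b00011:
        return None  # only the four encodings of figure 8.34 are handled
    z = (word >> 26) & 0x3
    return z, bool(cond & 0b01), bool(cond & 0b10)


def branch_taken(word, cpcond_high):
    """True when the coprocessor condition level matches the branch sense."""
    fields = decode_cop_branch(word)
    if fields is None:
        raise ValueError("not a branch on coprocessor condition")
    _, on_true, _ = fields
    return cpcond_high == on_true
```

for example, 0x45010000 decodes as a branch-on-true for coprocessor 1, and it is taken only when the condition input is high.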
8.5.5 coprocessor operation (copz)

when a coprocessor instruction is in the rd stage, the CW4010 coprocessor interface asserts one of the execution strobes, cpxstbn[3:1], with a valid instruction code on cpcodep[31:0]. the result of the operation must be written at the wb stage or later in the coprocessor pipeline, because the instruction in the ex and cr stages may be cancelled by an exception. if the cancel signal, pcancrn, is asserted before the wb stage of a coprocessor instruction, the instruction operation must be cancelled.

the coprocessor busy signal must be asserted if the operating coprocessor cannot accept another instruction after the first instruction begins executing. it is not necessary for the coprocessor to enter the stall condition when pstalln is asserted, except to keep the pipeline flow of the ex, cr, and wb stages. if pstalln is asserted before the wb stage and during data movement, the coprocessor pipeline must enter a stall cycle.

figure 8.35 is a timing diagram for a normal copz instruction execution. the busy signal may be asserted. the next cpxstbn[3:1] is not asserted if the same coprocessor cpbusyn[3:1] is asserted. if cpbusyn[3:1] is not asserted, the execution strobe cpxstbn[3:1] of the same coprocessor is asserted continuously to ensure continuous execution of coprocessor instructions. in the example shown in figure 8.35, the second coprocessor instruction is placed in a slip condition by the busy signal.

figure 8.35 copz execution
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], and cpbusyn[3:1] across the rd, ex, cr, and wb stages for two instructions, cop2-1 and cop2-2.]

instead of asserting cpbusyn[3:1], a coprocessor can assert a pipeline stall request signal, cpsreqn[3:1], when the coprocessor cannot continue execution according to the results of instruction decoding.
cpsreqn[3:1] can be asserted at the ex or cr stage, but the CW4010 needs enough setup time for the stall request assertion/deassertion because it is a critical path. if a coprocessor asserts cpsreqn[3:1] at the cr stage, there may be another coprocessor instruction in the ex stage. when cpsreqn[3:1] is asserted, pstalln is also immediately asserted. similarly, when cpsreqn[3:1] is deasserted, pstalln is also immediately deasserted. the ex, cr, and wb stages of all execution units, including all coprocessors, enter a stall condition.

8.5.6 data movement to and from cpu registers

the coprocessor interface asserts one of the execution strobe signals, cpxstbn[3:1], with a valid instruction code on the cpcodep[31:0] lines. at the cr stage, cptocen is asserted with valid data on the cptocdp[31:0] bus when a mtcz or ctcz instruction is executed. in the case of a mfcz or cfcz instruction, cpfrcen is asserted, then the coprocessor outputs data on the cpfrcdp[31:0] lines at the cr stage.

coprocessor data movement instructions to or from cpu registers always have one delay slot even if the busy signal is not asserted at the ex stage. when moving data from a coprocessor, data is valid at the cr stage and not valid at the ex stage. when moving data to a coprocessor, the CW4010 assumes the written data in the coprocessor is valid from the end of the cr stage, but it must be written into a coprocessor register at the wb stage.

when the assertion of the pstalln signal indicates a stall, the coprocessor pipeline stage from the ex and cr stages must stall in the same manner as the CW4010 core pipeline stage. figure 8.36 shows timing for a data movement instruction, with data transferring to and from the coprocessor. one data bus enable signal is asserted for each cycle.
figure 8.36 data movement to/from cpu registers without stall cycles
[timing diagram: sclkp, cpcodep[31:0], cpxstbn, cpbusyn[3:1], cptocdp[31:0], cptocen, cpfrcdp[31:0], and cpfrcen across the rd, ex, cr, and wb stages of a mf/tcp instruction. callouts: 1. data to coprocessor. 2. move to coprocessor. 3. data from coprocessor. 4. move from coprocessor.]

figure 8.37 shows the timing for the assertion of the stall signal at the ex stage. the coprocessor pipeline is in a stall state.
figure 8.37 data movement to/from cpu registers with stall cycles at ex stage
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], cpbusyn[3:1], cptocdp[31:0], cptocen, cpfrcdp[31:0], cpfrcen, and pstalln across the rd, ex, ex, ex, cr, and wb stages of a mf/tcp instruction. callouts: 1. data to coprocessor. 2. move to coprocessor. 3. data from coprocessor. 4. move from coprocessor.]

figure 8.38 shows the timing for a stall signal assertion at the cr stage. the data to the coprocessor on the cptocdp[31:0] lines is valid at the clock edge between the cr and wb stages. the coprocessor must output valid data at the last cr stage when the stall condition is resolved.
figure 8.38 data movement to/from cpu registers with stall cycles at cr stage
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], cpbusyn[3:1], cptocdp[31:0], cptocen, cpfrcdp[31:0], cpfrcen, and pstalln across the rd, ex, cr, cr, cr, and wb stages of a mf/tcp instruction. callouts: 1. data to coprocessor. 2. move to coprocessor. 3. data from coprocessor. 4. move from coprocessor.]

8.5.7 data movement to or from memories

the lwcz instruction moves data directly from memory to a coprocessor register. the swcz instruction moves data from the coprocessor register to memory. the CW4010 accesses the dcache if the address is cacheable. the CW4010 also accesses the jtlb if the address is mappable. jtlb hit/miss is checked at the cr stage. in the case of a tlb miss, the pipeline is cancelled at the cr stage for both lwcz and swcz. the data from memory must not be written into a coprocessor register until the wb stage, because the pipeline may be cancelled at the cr stage by a jtlb miss.

in the case of an lwcz instruction, the read data from dcache is output on the cptocdp[31:0] lines when cptocen is asserted at the cr stage. in the same cycle, dcache hit/miss is examined. if there is a cache hit, the data is written into the coprocessor register at the wb stage. if there is a cache miss, the coprocessor cache miss signal, cpmissn, is asserted at the cr stage and the pipeline stall signal, pstalln, is asserted from the wb stage. the pipeline then enters a stall state. when data is read from the memory, the data is output on the data bus cptocdp[31:0] lines with the fixup cycle signal cpfixupn asserted.

in the case of an swcz instruction, the coprocessor needs to output data on the cpfrcdp[31:0] lines at the cr stage. the timing is also indicated by the assertion of cpfrcen. dcache hit/miss is checked at the same cr stage. if there is a cache hit, data is stored in the dcache. if there is a cache miss, data is stored in the external write buffer in the load store unit. the CW4010 then generates a write transaction on the scbus.

data transferred from the coprocessor interface to a coprocessor at the cr stage for an lwcz instruction needs at least one delay slot. if the data is not valid until the wb stage and the coprocessor does not have register bypassing hardware like the CW4010 cpu, the busy signal should be asserted to stop the next coprocessor operation.

figure 8.39 shows timing for the lwcz instruction dcache hit and miss. the first lwcz instruction hits the dcache, but the second instruction misses.
figure 8.39 lwcz dcache-hit and miss
[timing diagram: sclkp, cpcodep[31:0], cpxstbn 1, cpxstbn 2, cptocdp[31:0], cptocen, pstalln, cpmissn, and cpfixupn for instruction 1 (lwc1, dcache hit; rd, ex, cr, wb) and instruction 2 (lwc2, dcache miss; rd, ex, cr, then wb repeated during the stall). callouts: 1. dcache hit (d1). 2. dcache miss (d2). 3. valid data for instruction 2 (d2).]
figure 8.40 shows the timing diagram for two swcz instructions. there is no difference in the coprocessor interface signals for a dcache hit or dcache miss.

figure 8.40 swcz timing
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], cpfrcdp[31:0], and cpfrcen across the rd, ex, cr, and wb stages of instruction 1 (swc2, data d1) and instruction 2 (swc2, data d2).]

8.5.8 coprocessor conditions

coprocessor conditions are sampled on the rising edge of the clock at the end of the ex stage. condition inputs are active-high and must be synchronized with the system clock.

8.5.9 even/odd slot and pipeline cancel

when an exception occurs, the CW4010 cancels pipeline stages at the cr stage. if a cancel occurs, the pipelines of the coprocessors must also be cancelled. since the CW4010 executes two instructions in even and odd slots, it must find out whether the cause of the exception is in an even slot or odd slot. if the even slot is the cause of the exception, both even and odd instructions in the cr stage must be cancelled. if the odd slot is the cause, the even instruction in the cr stage must be completed and the odd instruction must be cancelled. instructions in the ex stage even and odd slots must be cancelled.

coprocessors must know whether an executing instruction is in an even or odd slot at the rd stage. at the rd stage, the CW4010 issues only one coprocessor instruction even if there are two coprocessor instructions in even and odd slots. when the coprocessor execution signal, cpxstbn[3:1], is asserted, the coprocessor interface outputs the cpxoddn signal, which indicates whether this instruction is in an even or odd slot. the coprocessor needs to keep this information in the ex and cr pipeline stages.
when the pipeline cancel signal, pcancrn, is asserted, pcanoddn is valid and indicates whether the even instruction or the odd instruction at the cr stage is the cause of the cancellation. coprocessors have to compare the pcanoddn signal and the information kept at the cr pipeline stage. if the instruction in a coprocessor cr stage is an even instruction (cpxoddn high at rd) and pcanoddn is asserted low (odd slot), the instruction must be completed (go to the wb stage). otherwise, the instruction must be cancelled. instructions in the ex stage must be cancelled. the CW4010 does not cancel instructions in the wb stage or later cycles.

figure 8.41 shows an example of even/odd slot and pipeline cancellation. case 1 shows the instruction in the cr stage not being cancelled, and case 2 shows the instruction in the cr stage being cancelled.

figure 8.41 even/odd slot and pipeline cancel
[timing diagram: sclkp, cpxstbn[3:1], cpcodep[31:0], cpxoddn, pcancrn, and pcanoddn over cycles t1 to t4 for cp instruction 1 and cp instruction 2. case 1: instruction in cr stage is not cancelled (instruction 1: rd, ex, cr, wb; instruction 2: rd, ex, cancelled). case 2: instruction in cr stage is cancelled (instruction 1: rd, ex, cr, cancelled; instruction 2: rd, ex, cancelled).]
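the cancellation rule above reduces to a small decision function. the sketch below is an illustration rather than a description of any actual coprocessor implementation: the function name is hypothetical, signals are modeled as integer levels (0 or 1), and pcancrn is assumed to be active low, as its n suffix suggests.

```python
def coprocessor_cancel(stage, cpxoddn_at_rd, pcancrn, pcanoddn):
    """Return True if the coprocessor instruction in `stage` must be cancelled.

    stage          -- "ex", "cr", or "wb"
    cpxoddn_at_rd  -- level latched at the rd stage: 1 = even slot, 0 = odd slot
    pcancrn        -- pipeline cancel, assumed active low (0 = cancel asserted)
    pcanoddn       -- 0 = odd slot caused the cancel, 1 = even slot caused it
    """
    if pcancrn != 0:
        return False  # no cancel in this cycle
    if stage == "wb":
        return False  # the wb stage and later are never cancelled
    if stage == "ex":
        return True   # ex-stage instructions are always cancelled
    if stage == "cr":
        # Only an even-slot instruction survives, and only when the odd slot
        # is the cause of the cancellation.
        completes = cpxoddn_at_rd == 1 and pcanoddn == 0
        return not completes
    raise ValueError("unknown stage: %r" % (stage,))
```

calling the function for each stage every cycle mirrors the comparison the text asks the coprocessor to perform between pcanoddn and the slot information it latched at the rd stage.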
8.5.10 branchlikely in even slot is false

the mips-ii instruction set adds the branchlikely instructions. the instruction in the branch delay slot is executed only when the branch condition is true. if the branch condition is false, the delay slot instruction must be nullified. in the case of a normal branch instruction, the instruction in the branch delay slot is executed regardless of the branch condition.

because the CW4010 tries to execute two instructions in one clock cycle, the instruction in a branch delay slot is executed at the same time as a branchlikely instruction, if the branchlikely instruction is in an even slot. if the branch condition is false, the delay slot instruction, which is in the odd slot, must be cancelled.

the coprocessor interface provides a signal, brlikfn, to cancel the instruction that is in a delay slot of a branchlikely instruction. this type of instruction resides in an even slot. if brlikfn is asserted, coprocessors need to cancel any instruction in the ex stage. although coprocessors keep information about even and odd slots so that they can cancel the instruction at the cr stage correctly, this information is not needed to examine the even and odd slots for brlikfn. if a branchlikely is in an odd slot and it is false, brlikfn is not asserted because the execution strobe of the branch delay slot is not asserted at the rd stage.

figure 8.42 shows an example of brlikfn assertion. assuming that cp inst1, branchlikely, and cp inst2 are fetched continuously and that the branchlikely instruction is in an even slot, if the branch is not taken, cp inst2 must not be executed, and must be cancelled in the coprocessor according to the assertion of brlikfn.
figure 8.42 branchlikely in even slot is false
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], cpxoddn, and brlikfn over cycles t1 to t4 for cp instruction 1 (rd, ex, cr, wb), the branchlikely instruction (rd, ex, (cr)), and cp instruction 2 (rd, ex, cancelled).]

8.5.11 ex stage suspension

the CW4010 may request that the ex stage of a coprocessor be suspended when an icache miss occurs just after a cp instruction is executed. if the suspension signal, suspexn, is asserted and there is a valid instruction in the ex stage of the coprocessor, the ex stage must be suspended until suspexn is deasserted. instructions in other pipeline stages should not be stopped. figure 8.43 shows an example where two coprocessor instructions are executed continuously, but the second one is suspended at the ex stage.
figure 8.43 ex stage suspension
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], suspexn, and pcancrn for cp instruction 1 (rd, ex, cr, wb) and cp instruction 2 (rd, ex, ex, ex, ex, cr, wb).]

if the pipeline is cancelled by asserting pcancrn, the instruction in the ex stage must be cancelled even if suspexn is asserted on the same clock cycle, as shown in figure 8.44.

figure 8.44 ex stage suspension (cancelled)
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], suspexn, and pcancrn for cp instruction 1 (rd, ex, cr, cancelled) and cp instruction 2 (rd, ex, cancelled).]
8.5.12 floating-point unit exception

the coprocessor interface has an exception input, fperrxn, assigned as a floating-point unit (fpu) exception. fperrxn is sampled at the ex stage and issued at the cr stage. it has its own exception code (15) in the cause register of cp0. it can also be used as a user-defined coprocessor exception input if necessary. although an fpu is usually assigned to cp1, the fpu exception is not related to the coprocessor number.

fpeoddn is the coprocessor interface input for the fpu exception and is issued when fperrxn is asserted at the ex stage. when a coprocessor instruction is started at an rd stage, cpxoddn indicates whether the instruction is in the even slot (cpxoddn high) or the odd slot (cpxoddn low). the signal must be held in the ex and cr stages of the coprocessor pipeline so that the pipeline may be cancelled correctly. in addition, if the fperrxn signal is asserted in the ex stage of the coprocessor instruction, fpeoddn must be driven according to the even/odd status of the ex stage.

if fpeoddn is high (even) when fperrxn is asserted, the CW4010 cancels instructions in both even and odd slots of the cr stage. if fpeoddn is low (odd), only the odd instruction is cancelled. if fperrxn is used for a user-defined coprocessor exception and is not asserted at the ex stage, the pipeline is not cancelled precisely and is treated in the same way as an interrupt exception if fpeoddn is strapped high.

figure 8.45 shows timing for a coprocessor instruction start, fpu exception, and pipeline cancel.
figure 8.45 fpu exception and pipeline cancel timing
[timing diagram: sclkp, cpcodep[31:0], cpxstbn[3:1], cpxoddn, fperrxn, fpeoddn, pcancrn, and pcanoddn across the rd, ex, cr, and (wb) stages of two cop1 instructions. callouts: 1. cop1: coprocessor instruction is in the even slot. 2. cop1: coprocessor instruction is in the odd slot.]
chapter 9 specifications

the CW4010 core specifications are available in an addendum entitled minirisc CW4010 superscalar microprocessor core technical manual addendum. the addendum includes information on ac timing, input and output loading, driver type, and power consumption.
appendix a programmers notes

this appendix contains information that will be useful if you are writing software for the CW4010 core. the information is arranged in functional groups: instruction related, cp0 or tlb related, and cache related.

a.1 instruction related

the following are instruction related notes:

the instruction prior to an eret must not generate an exception. you can use a nop (no operation) to make sure that this restriction is met.

the waiti instruction must be followed by at least one nop.

trap instructions must not be placed in branch delay slots.

a.2 cp0 or tlb related

the following are cp0 or tlb related notes:

when the CW4010 is operating in r3000 exception compatibility mode, the rfe (restore from exception) instruction clears the ll (load linked) bit. this is consistent with r4000 mode eret operation.

if a tlb is not present or enabled in the system, cp0 will reflect a coprocessor unusable exception if an attempt is made to execute any of the tlb maintenance instructions: tlbp, tlbr, tlbwi, tlbwr.

tlb instructions (tlbp, tlbr, tlbwi, tlbwr) cannot be preceded or followed by a data access instruction (load or store) that requires target address translation, that is, an access to kuseg or kseg2.

the instruction prior to a tlbwi or tlbwr instruction must not generate an exception. you can use a nop to make sure that this restriction is met.
three instructions are required between a mtc0 instruction that targets any of the tlb support registers (that is, entryhi, entrylo, pagemask, and index) and a subsequent tlbwi or tlbwr instruction. this ensures that the results of the prior mtc0 instruction will be seen by the tlb write operation.

five instructions are required between a mtc0 status register operation that updates the coprocessor usability field (status[31:28]) and a subsequent coprocessor instruction that expects to see the updated value.

seven instructions are required between a mtc0 epc register operation and a subsequent eret instruction that expects to see the updated value.

a.3 cache related

the following is a cache related note:

when the CW4010 is operating in isolate cache mode, load and store operations to the cache are not allowed in the delay slot of branchlikely instructions.

a.4 cw33300 compatible debug extensions

the following is a cw33300 compatible debug extension note:

the existing cw33300 has some extensions to the cp0 that provide enhanced debugging and exception handler support. the CW4010 core remains compatible with these enhancements. refer to the cw33300 enhanced self-embedding processor core users manual for further information.
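the mtc0 spacing rules in section a.2 lend themselves to a simple check in an assembler helper or code generator. the sketch below is one hypothetical way to compute how many nop instructions still have to be inserted; the producer and consumer labels are invented for this example, and only the three spacing values come from the notes above.

```python
# Minimum number of instructions required between a producer and a consumer,
# taken from the a.2 notes. The key names are illustrative labels only.
REQUIRED_GAP = {
    ("mtc0_tlb_reg", "tlbwi"): 3,
    ("mtc0_tlb_reg", "tlbwr"): 3,
    ("mtc0_status_cu", "cop"): 5,
    ("mtc0_epc", "eret"): 7,
}


def nops_needed(producer, consumer, instructions_between):
    """Return how many nops must be added to satisfy the spacing rule."""
    gap = REQUIRED_GAP.get((producer, consumer), 0)
    return max(0, gap - instructions_between)
```

for example, if two unrelated instructions already sit between a mtc0 to epc and the following eret, `nops_needed("mtc0_epc", "eret", 2)` reports that five more are required.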
glossary

big-endian
this is a method of data formatting in which each field is addressed by referring to its most significant byte. this means that if you are accessing a four-byte, singleword, the most significant byte is byte 03, and the most significant bit is bit 31. see also little-endian.

bus sizing
refers to the ability of the processor to support and interface with data buses of different sizes.

bus snooping
this is the method used by the cache controller to monitor memory accesses performed by other bus masters.

direct-map caching
in a direct-mapped cache, each memory location is mapped to one position in the cache. direct mapping is useful if you are storing small loops and sequential operations. you can use this type of caching for both the dcache and the icache.

encrypted
encrypted files are source code files that have been processed in a language such as hdl or verilog so that they are only machine readable. this process enables you to have access to the behavior of the files but not to the intellectual property associated with them.

fixup cycle
this is a clock cycle during which the load miss data is funneled back to the instruction that requires it.

little-endian
this is a method of data formatting in which each field is addressed by referring to its least significant byte. this means that if you are accessing a four-byte, singleword, the least significant byte is byte 00, and the least significant bit is bit 00. the CW4010 supports little-endian format. see also big-endian.

placement algorithms
information is placed in a cache using placement algorithms. these algorithms define the positions in the cache where the information from a particular memory location may be stored. the CW4010 uses two types of algorithm, direct mapping and two-way set associative mapping.

slip condition
a slip condition occurs when the pipeline stalls after the ex stage. in this situation, the previous instruction is executed and clears the pipeline. however, the earlier stages of the pipeline are stalled.

two-way set associative caching
in a two-way set associative cache, each memory location is stored in one of two possible positions. two-way set associative mapping is well-suited for data references, which tend to be more scattered than instructions. you can use this type of caching for both the dcache and the icache.

unencrypted
files that are unencrypted have not been subjected to the processing described in the entry encrypted. these files are human readable and can be written using a text editor.

verilog model
verilog is an open standard language. a verilog model represents a design in the language. it provides no indication of the level of abstraction.
customer feedback

we would appreciate your feedback on this document. please copy the following page, add your comments, and fax it to us at the address on the following page. if appropriate, please also fax copies of any marked-up pages from this document.

important: please include your name, phone number, fax number, and company address so that we may contact you directly for clarification or additional information.

thank you for your help in improving the quality of our documents.

readers comments

fax your comments to: lsi logic corporation technical publications m/s f-112 fax: 408.433.4333

please tell us how you rate this document: minirisc CW4010 superscalar microprocessor core technical manual. place a check mark in the appropriate blank for each category.

                                           excellent  good  average  fair  poor
  completeness of information              ____       ____  ____     ____  ____
  clarity of information                   ____       ____  ____     ____  ____
  ease of finding information              ____       ____  ____     ____  ____
  technical content                        ____       ____  ____     ____  ____
  usefulness of examples and illustrations ____       ____  ____     ____  ____
  overall manual                           ____       ____  ____     ____  ____

what could we do to improve this document?

if you found errors in this document, please specify the error and page number. if appropriate, please fax a marked-up copy of the page(s).

please complete the information below so that we may contact you directly for clarification or additional information.

name, date, telephone, title, company name, street, city, state, zip, department, mail stop, fax
u.s. distributors by state alabama huntsville hamilton hallmark tel: 800.633.2918 wyle electronics tel: 800.964.9953 arizona phoenix hamilton hallmark tel: 800.528.8471 wyle electronics tel: 602.804.7000 tempe hamilton hallmark tel: 602.414.7705 california culver city hamilton hallmark tel: 310.558.2000 irvine hamilton hallmark tel: 714.789.4100 wyle electronics tel: 714.789.9953 los angeles wyle electronics tel: 818.880.9000 rocklin hamilton hallmark tel: 916.624.9781 sacramento wyle electronics tel: 916.638.5282 san diego hamilton hallmark tel: 619.571.7540 wyle electronics tel: 619.565.9171 san jose hamilton hallmark tel: 408.435.3500 santa clara wyle electronics tel: 408.727.2500 woodland hills hamilton hallmark tel: 818.594.0404 colorado colorado springs hamilton hallmark tel: 719.637.0055 denver wyle electronics tel: 303.457.9953 englewood hamilton hallmark tel: 303.790.1662 connecticut cheshire hamilton hallmark tel: 203.271.2844 florida fort lauderdale hamilton hallmark tel: 305.484.5482 wyle electronics tel: 305.420.0500 largo hamilton hallmark tel: 800.282.9350 orlando wyle electronics tel: 407.740.7450 tampa/n. 
florida wyle electronics tel: 800.395.9953 winter park hamilton hallmark tel: 407.657.3317 georgia atlanta wyle electronics tel: 800.876.9953 duluth hamilton hallmark tel: 800.241.8182 illinois arlington heights hamilton hallmark tel: 708.797.7300 chicago wyle electronics tel: 708.620.0969 iowa carmel hamilton hallmark tel: 800.829.0146 kansas overland park hamilton hallmark tel: 800.332.4375 kentucky lexington hamilton hallmark tel: 800.235.6039 maryland baltimore wyle electronics tel: 410.312.4844 columbia hamilton hallmark tel: 800.638.5988 massachusetts boston wyle electronics tel: 800.444.9953 peabody hamilton hallmark tel: 508.532.3701 michigan plymouth hamilton hallmark tel: 313.416.5800 minnesota bloomington hamilton hallmark tel: 612.881.2600 minneapolis wyle electronics tel: 800.860.9953 missouri earth city hamilton hallmark tel: 314.291.5350 new jersey mt. laurel hamilton hallmark tel: 609.222.6400 no. new jersey wyle electronics tel: 201.882.8358 parsippany hamilton hallmark tel: 201.515.1641 new mexico alburquerque hamilton hallmark tel: 505293.5119 new york hauppauge hamilton hallmark tel: 516.737.7400 long island wyle electronics tel: 516.293.8446 rochester hamilton hallmark tel: 800.462.6440 north carolina raleigh hamilton hallmark tel: 919.872.0712 wyle electronics tel: 919.469.1502 ohio cleveland wyle electronics tel: 216.248.9996 dayton hamilton hallmark tel: 800.423.4688 wyle electronics tel: 513.436.9953 solon hamilton hallmark tel: 216.498.1100 toledo wyle electronics tel: 419.861.2622 worthington hamilton hallmark tel: 614.888.3313 oklahoma tulsa hamilton hallmark tel: 918.254.6110 oregon beaverton hamilton hallmark tel: 503.526.6200 portland wyle electronics tel: 503.643.7900 pennsylvania philadelphia wyle electronics tel: 800.871.9953 texas austin hamilton hallmark tel: 512.258.8848 wyle electronics tel: 800.365.9953 dallas hamilton hallmark tel: 214.553.4302 wyle electronics tel: 800.955.9953 houston hamilton hallmark tel: 713.787.8300 
wyle electronics tel: 713.784.9953 san antonio wyle electronics tel: 210.697.2816 utah salt lake city hamilton hallmark tel: 801.266.2022 wyle electronics tel: 801.974.9953 washington redmond hamilton hallmark tel: 206.881.6697 seattle wyle electronics tel: 800.248.9953 wisconsin milwaukee wyle electronics tel: 800.867.9953 new berlin hamilton hallmark tel: 414.780.7200 distributors with design resource centers
sales of?ces and design resource centers printed in usa 796.1k.tp.g printed on recycled paper iso 9000 certified lsi logic corporation corporate headquarters tel: 408.433.8000 fax: 408.433.8989 united states california irvine tel: 714.553.5600 fax: 714.474.8101 san diego tel: 619.635.1300 fax: 619.635.1350 silicon valley sales of?ce tel: 408.433.8000 fax: 408.433.7783 design center tel: 408.433.8000 fax: 408.433.2820 colorado boulder tel: 303.447.3800 fax: 303.541.0641 florida boca raton tel: 407.395.6200 fax: 407.394.2865 georgia atlanta tel: 404.395.3800 fax: 404.395.3811 illinois schaumburg tel: 708.995.1600 fax: 708.995.1622 kentucky bowling green tel: 502.793.0010 fax: 502.793.0040 maryland bethesda tel: 301.897.5800 fax: 301.897.8389 massachusetts waltham tel: 617.890.0180 fax: 617.890.6158 minnesota minneapolis tel: 612.921.8300 fax: 612.921.8399 new jersey edison tel: 908.549.4500 fax: 908.549.4802 new york new york tel: 716.223.8820 fax: 716.223.8822 north carolina raleigh tel: 919.783.8833 fax: 919.783.8909 oregon beaverton tel: 503.645.9882 fax: 503.645.6612 texas austin tel: 512.388.7294 fax: 512.388.4171 dallas tel: 214.788.2966 fax: 214.233.9234 houston tel: 713.379.7800 fax: 713.379.7818 washington bellevue tel: 206.822.4384 fax: 206.827.2884 international australia reptechnic pty ltd new south wales tel: 612.9953.9844 fax: 612.9953.9683 canada lsi logic corporation of canada inc alberta tel: 403.262.9292 fax: 403.262.9494 ontario ottawa tel: 613.592.1263 fax: 613.592.3253 toronto tel: 416.620.7400 fax: 416.620.5005 quebec pointe claire tel: 514.694.2417 fax: 514.694.2699 france lsi logic s.a. 
paris tel: 33.1.34.63.13.13 fax: 33.1.34.63.13.19 germany lsi logic gmbh munich tel: 49.89.4.58.33.0 fax: 49.89.4.58.33.108 stuttgart tel: 49.711.13.96.90 fax: 49.711.86.61.428 hong kong avt industrial ltd hong kong tel: 852.2428.0888 fax: 852.2401.2105 india logicad india private ltd bangalore tel: 91.80.526.2500 fax: 91.80.338.6591 israel lsi logic ramat hashron tel: 972.3.5.403741 fax: 972.3.5.403747 netanya tel: 972.9.657190 fax: 972.9.657194 italy lsi logic s.p.a. milano tel: 39.39.687371 fax: 39.39.6057867 japan lsi logic k.k. tokyo tel: 81.3.5463.7821 fax: 81.3.5463.7820 osaka tel: 81.6.947.5281 fax: 81.6.947.5287 korea lsi logic corporation of korea ltd seoul tel: 82.2.561.2921 fax: 82.2.554.9327 singapore desner electronics pte ltd singapore tel: 65.285.1566 fax: 65.284.9466 electronic resources ltd tel: 65.298.0888 fax: 65.298.1111 spain lsi logic s.a. madrid tel: 34.1.3672200 fax: 34.1.3673151 sweden lsi logic ab stockholm tel: 46.8.7034680 fax: 46.8.7506647 switzerland lsi logic sulzer ag brugg/biel tel: 41.32.536363 fax: 41.32.536367 taiwan lsi logic asia-paci?c regional of?ce taipei tel: 886.2.718.7828 fax: 886.2.718.8869 jeilin technology corporation tel: 886.2.248.4828 fax: 886.2.248.9765 united kingdom lsi logic europe plc bracknell tel: 44.1344.426544 fax: 44.1344.481039 sales of?ces with design resource centers
