User's Manual
© MIPS Technologies, Inc. 1994
Printed in Japan
64-bit Microprocessor
VR4300™, VR4305™, VR4310™
μPD30200
μPD30210
Document No. U10504EJ7V0UMJ1 (7th edition)
Date Published August 2000 N CP(K)
1996, 1998
Notes for CMOS Devices

1 Precaution against ESD for semiconductors
Note: A strong electric field, when exposed to a MOS device, can cause destruction of the gate oxide and ultimately degrade the device operation. Steps must be taken to stop the generation of static electricity as much as possible, and to dissipate it quickly once it has occurred. Environmental control must be adequate. When it is dry, a humidifier should be used. It is recommended to avoid using insulators that easily build up static electricity. Semiconductor devices must be stored and transported in an anti-static container, static shielding bag, or conductive material. All test and measurement tools, including the workbench and floor, should be grounded. The operator should be grounded using a wrist strap. Semiconductor devices must not be touched with bare hands. Similar precautions need to be taken for PW boards with semiconductor devices on them.

2 Handling of unused input pins for CMOS
Note: Leaving CMOS device inputs unconnected can cause malfunction. If no connection is provided to the input pins, an internal input level may be generated by noise, etc., causing malfunction. CMOS devices behave differently from bipolar or NMOS devices. Input levels of CMOS devices must be fixed high or low by using pull-up or pull-down circuitry. Each unused pin should be connected to VDD or GND through a resistor if there is any possibility of it being an output pin. All handling of unused pins must be judged device by device, following the specifications governing each device.

3 Status before initialization of MOS devices
Note: Power-on does not necessarily define the initial status of a MOS device. The production process of MOS devices does not define the initial operation status of the device. Immediately after the power source is turned on, devices with a reset function have not yet been initialized.
Hence, power-on does not guarantee output pin levels, I/O settings, or contents of registers. A device is not initialized until the reset signal is received. The reset operation must be executed immediately after power-on for devices that have a reset function.

VR Series, VR4300 Series, VR3000, VR4000, VR4100, VR4200, VR4300, VR4305, VR4310, and VR4400 are trademarks of NEC Corporation.
UNIX is a registered trademark licensed by X/Open Company Limited in the US and other countries.
MC68000 is a trademark of Motorola Inc.
IBM370 is a trademark of International Business Machines Corporation.
iAPX is a trademark of Intel Corporation.
DEC VAX is a trademark of Digital Equipment Corporation.
MIPS is a registered trademark of MIPS Technologies, Inc. in the U.S.A.

Exporting this product or equipment that includes this product may require a governmental license from the U.S.A. for some countries because this product utilizes technologies limited by the export control regulations of the U.S.A.

The information in this document is current as of October, 1999. The information is subject to change without notice. For actual design-in, refer to the latest publications of NEC's data sheets or data books, etc., for the most up-to-date specifications of NEC semiconductor products. Not all products and/or types are available in every country. Please check with an NEC sales representative for availability and additional information.

No part of this document may be copied or reproduced in any form or by any means without prior written consent of NEC. NEC assumes no responsibility for any errors that may appear in this document.

NEC does not assume any liability for infringement of patents, copyrights or other intellectual property rights of third parties by or arising from the use of NEC semiconductor products listed in this document or any other liability arising from the use of such products.
No license, express, implied or otherwise, is granted under any patents, copyrights or other intellectual property rights of NEC or others.

Descriptions of circuits, software and other related information in this document are provided for illustrative purposes in semiconductor product operation and application examples. The incorporation of these circuits, software and information in the design of customer's equipment shall be done under the full responsibility of customer. NEC assumes no responsibility for any losses incurred by customers or third parties arising from the use of these circuits, software and information.

While NEC endeavours to enhance the quality, reliability and safety of NEC semiconductor products, customers agree and acknowledge that the possibility of defects thereof cannot be eliminated entirely. To minimize risks of damage to property or injury (including death) to persons arising from defects in NEC semiconductor products, customers must incorporate sufficient safety measures in their design, such as redundancy, fire-containment, and anti-failure features.

NEC semiconductor products are classified into the following three quality grades: "Standard", "Special" and "Specific". The "Specific" quality grade applies only to semiconductor products developed based on a customer-designated "quality assurance program" for a specific application. The recommended applications of a semiconductor product depend on its quality grade, as indicated below. Customers must check the quality grade of each semiconductor product before using it in a particular application.

"Standard": Computers, office equipment, communications equipment, test and measurement equipment, audio and visual equipment, home electronic appliances, machine tools, personal electronic equipment and industrial robots
"Special": Transportation equipment (automobiles, trains, ships, etc.), traffic control systems, anti-disaster systems, anti-crime systems, safety equipment and medical equipment (not specifically designed for life support)
"Specific": Aircraft, aerospace equipment, submersible repeaters, nuclear reactor control systems, life support systems and medical equipment for life support, etc.

The quality grade of NEC semiconductor products is "Standard" unless otherwise expressly specified in NEC's data sheets or data books, etc. If customers wish to use NEC semiconductor products in applications not intended by NEC, they must contact an NEC sales representative in advance to determine NEC's willingness to support a given application.

(Note)
(1) "NEC" as used in this statement means NEC Corporation and also includes its majority-owned subsidiaries.
(2) "NEC semiconductor products" means any semiconductor product developed or manufactured by or for NEC (as defined above).

Regional Information

Some information contained in this document may vary from country to country. Before using any NEC product in your application, please contact the NEC office in your country to obtain a list of authorized representatives and distributors. They will verify:

• Device availability
• Ordering information
• Product release schedule
• Availability of related technical literature
• Development environment specifications (for example, specifications for third-party tools and components, host computers, power plugs, AC supply voltages, and so forth)
• Network requirements

In addition, trademarks, registered trademarks, export restrictions, and other legal issues may also vary from country to country.

NEC Electronics Inc. (U.S.), Santa Clara, California. Tel: 408-588-6000, 800-366-9782. Fax: 408-588-6130, 800-729-9288
NEC Electronics (Germany) GmbH, Duesseldorf, Germany. Tel: 0211-65 03 02. Fax: 0211-65 03 490
NEC Electronics (UK) Ltd., Milton Keynes, UK. Tel: 01908-691-133. Fax: 01908-670-290
NEC Electronics Italiana s.r.l., Milano, Italy. Tel: 02-66 75 41. Fax: 02-66 75 42 99
NEC Electronics (Germany) GmbH, Benelux Office, Eindhoven, The Netherlands. Tel: 040-2445845. Fax: 040-2444580
NEC Electronics (France) S.A., Velizy-Villacoublay, France. Tel: 01-30-67 58 00. Fax: 01-30-67 58 99
NEC Electronics (France) S.A., Madrid Office, Madrid, Spain. Tel: 91-504-2787. Fax: 91-504-2860
NEC Electronics (Germany) GmbH, Scandinavia Office, Taeby, Sweden. Tel: 08-63 80 820. Fax: 08-63 80 388
NEC Electronics Hong Kong Ltd., Hong Kong. Tel: 2886-9318. Fax: 2886-9022/9044
NEC Electronics Hong Kong Ltd., Seoul Branch, Seoul, Korea. Tel: 02-528-0303. Fax: 02-528-4411
NEC Electronics Singapore Pte. Ltd., United Square, Singapore. Tel: 65-253-8311. Fax: 65-250-3583
NEC Electronics Taiwan Ltd., Taipei, Taiwan. Tel: 02-2719-2377. Fax: 02-2719-5951
NEC do Brasil S.A., Electron Devices Division, Guarulhos-SP Brasil. Tel: 55-11-6462-6810. Fax: 55-11-6462-6829

Major Revisions in This Edition

The mark shows major revised points.
Page            Description
p.33            1.1 Characteristics: correction of description
p.35            1.4.1 Internal block configuration: correction of description
p.166           6.3.5 Status register (12): correction of description
p.198           6.4.17 Watch exception: correction and addition of description
p.244           8.2.7 Unimplemented operation exception (E): addition of description
p.254           9.3.1 Power modes: correction of description
pp.259, 260     10.2 Basic system clocks: correction of description
p.264           10.4 Low power mode operation: correction of description
p.360           15.1 Features: correction of description
p.360           15.1.2 Low power mode: correction of description
pp.568, 570, 574, 576, 578, 580, 587, 589, 600, 602, 610, 612
                17.5 FPU instructions: addition of description to the following instructions:
                CEIL.L.fmt, CEIL.W.fmt, CVT.D.fmt, CVT.L.fmt, CVT.S.fmt, CVT.W.fmt, FLOOR.L.fmt, FLOOR.W.fmt, ROUND.L.fmt, ROUND.W.fmt, TRUNC.L.fmt, TRUNC.W.fmt
p.628           Table A-1 Differences between the VR4300, VR4305, and VR4310: correction of description
p.630           B.1.3 Status register: correction of description
p.632           Table B-1 Differences in software: correction of description
p.634           B.2.2 System interface: correction of description
p.635           Table B-2 Differences in system design: correction of description
p.639           Table B-3 Other differences: correction of description
p.644           C.2.2 Clock: correction of description
pp.647, 648     Appendix D Restrictions of VR4300: addition

Preface

Readers: This manual targets users who intend to understand the functions of the VR4300, VR4305 (μPD30200), and VR4310 (μPD30210) and to design application systems using these microprocessors.

Purpose: This manual introduces the architecture functions of the VR4300, VR4305, and VR4310 to users, following the organization described below.
Organization: This manual consists of the following contents:

• Introduction
• Pipeline operation
• Memory management system and cache
• Exception processing
• Floating-point operation
• Hardware
• Instruction set details

How to read this manual: It is assumed that the readers of this manual have a general knowledge of electrical engineering, logic circuits, and microcomputers.

Unless otherwise specified, the VR4300 is described as a representative product in this manual. When using this manual as that for the VR4305 or VR4310, read as follows.

VR4300 → VR4305
VR4300 → VR4310

The VR4400™ in this manual represents the VR4000™.
The VR4000 Series in this manual represents the VR4100™, VR4200™, VR4300, VR4305, VR4310, and VR4400.

To learn about the detailed function of a specific instruction,
→ Refer to Chapter 3 CPU Instruction Set Summary, Chapter 7 Floating-Point Operations, and Chapter 17 FPU Instruction Set Details.
To learn about the overall functions of the VR4300,
→ Read this manual in sequential order.
To learn about the electrical specifications of the VR4300,
→ Refer to the data sheet, which is separately available.

Conventions:
Data significance: higher digits on the left and lower digits on the right
Active-low representation: overscore over the pin or signal name
*: footnote for an item marked with * in the text
Caution: information requiring particular attention
Remark: supplementary information
Numerical representation: binary or decimal ...; hexadecimal ... 0
Prefixes indicating a power of 2 (address space, memory capacity):
K (kilo)  2^10 = 1024
M (mega)  2^20 = 1024^2
G (giga)  2^30 = 1024^3
T (tera)  2^40 = 1024^4
P (peta)  2^50 = 1024^5
E (exa)   2^60 = 1024^6

Related documents: See also the following documents. The related documents indicated in this publication may include preliminary versions. However, preliminary versions are not marked as such.

Document name                                        Document number
VR4300, VR4305, VR4310 User's Manual                 This manual
μPD30200, 30210 Data Sheet                           U10116E
VR Series Application Note - Programming Guide       U10710E
VR4000 Series Application Note - Simulation Guide    U11788J (Japanese only)

Contents

Chapter 1 General
1.1 Characteristics
1.2 Ordering Information
1.3 64-bit Architecture
1.4 VR4300 Processor
    1.4.1 Internal Block Configuration
    1.4.2 CPU Registers
    1.4.3 CPU Instruction Set Overview
    1.4.4 Data Formats and Addressing
    1.4.5 System Control Coprocessor (CP0)
    1.4.6 Floating-Point Unit (FPU), CP1
    1.4.7 Internal Cache
1.5 Memory Management System (MMU)
    1.5.1 Translation Lookaside Buffer (TLB)
    1.5.2 Operating Modes
1.6 Instruction Pipeline

Chapter 2 Pin Functions
2.1 Pin Configuration (Top View)
2.2 Pin Functions
    2.2.1 System Interface Signals
    2.2.2 Clock/Control Interface Signals
    2.2.3 Interrupt Interface Signals
    2.2.4 Joint Test Action Group (JTAG) Interface Signals
    2.2.5 Initialization Interface Signals

Chapter 3 CPU Instruction Set Summary
3.1 CPU Instruction Formats
3.2 Instruction Classes
    3.2.1 Load/Store Instructions
    3.2.2 Computational Instructions
    3.2.3 Jump/Branch Instructions
    3.2.4 Special Instructions
    3.2.5 Coprocessor Instructions
    3.2.6 System Control Coprocessor (CP0) Instructions

Chapter 4 Pipeline
4.1 General
    4.1.1 Pipeline Operations
4.2 Branch Delay
4.3 Load Delay
4.4 Pipeline Operation
4.5 Interlock and Exception Handling
4.6 Pipeline Interlocks and Exceptions
    4.6.1 Pipeline Interlocks
    4.6.2 Instruction TLB Miss (ITM)
    4.6.3 Instruction Cache Busy (ICB)
    4.6.4 Multicycle Instruction Interlock (MCI)
    4.6.5 Load Interlock (LDI)
    4.6.6 Data Cache Miss (DCM)
    4.6.7 Data Cache Busy (DCB)
    4.6.8 Cache Operation (COP)
    4.6.9 Coprocessor 0 Bypass Interlock (CP0I)
4.7 Pipeline Exceptions
    4.7.1 Instruction-Independent Exceptions (Reset, NMI, and Interrupt)
    4.7.2 Instruction-Dependent Exceptions
    4.7.3 Interactions between Interlocks and Exceptions
    4.7.4 Exception and Interlock Priorities
    4.7.5 WB-Stage Interlock and Exception Priorities
    4.7.6 DC-Stage Interlock and Exception Priorities
    4.7.7 EX-Stage Interlock and Exception Priorities
    4.7.8 RF-Stage Interlock and Exception Priorities
    4.7.9 Bypassing
4.8 Code Compatibility
4.9 Write Buffer

Chapter 5 Memory Management System
5.1 Translation Lookaside Buffer (TLB)
5.2 Memory Management System Architecture
    5.2.1 Operating Modes
    5.2.2 Virtual Addressing in User Mode
    5.2.3 Virtual Addressing in Supervisor Mode
    5.2.4 Virtual Addressing in Kernel Mode
5.3 System Control Coprocessor
    5.3.1 Format of a TLB Entry
5.4 CP0 Registers
    5.4.1 Index Register (0)
    5.4.2 Random Register (1)
    5.4.3 EntryHi (10), EntryLo0 (2), EntryLo1 (3), and PageMask (5) Registers
    5.4.4 Wired Register (6)
    5.4.5 Processor Revision Identifier (PRId) Register (15)
    5.4.6 Config Register (16)
    5.4.7 Load Linked Address (LLAddr) Register (17)
    5.4.8 Cache Tag Registers [TagLo (28) and TagHi (29)]
    5.4.9 Virtual-to-Physical Address Translation Process
    5.4.10 TLB Misses
    5.4.11 TLB Instructions

Chapter 6 Exception Processing
6.1 Exception Processing Operation
6.2 Precision of Exceptions
6.3 Exception Processing Registers
    6.3.1 Context Register (4)
    6.3.2 BadVAddr Register (8)
    6.3.3 Count Register (9)
    6.3.4 Compare Register (11)
    6.3.5 Status Register (12)
    6.3.6 Cause Register (13)
    6.3.7 Exception Program Counter (EPC) Register (14)
    6.3.8 WatchLo (18) and WatchHi (19) Registers
    6.3.9 XContext Register (20)
    6.3.10 Parity Error (PErr) Register (26)
    6.3.11 Cache Error (CacheErr) Register (27)
    6.3.12 Error Exception Program Counter (ErrorEPC) Register (30)
6.4 Exception Details
    6.4.1 Exception Types
    6.4.2 Exception Vector Locations
    6.4.3 Priority of Exceptions
    6.4.4 Cold Reset Exception
    6.4.5 Soft Reset Exception
    6.4.6 Non-Maskable Interrupt (NMI) Exception
    6.4.7 Address Error Exception
    6.4.8 TLB Exceptions
    6.4.9 Bus Error Exception
    6.4.10 System Call Exception
    6.4.11 Breakpoint Exception
    6.4.12 Coprocessor Unusable Exception
    6.4.13 Reserved Instruction Exception
    6.4.14 Trap Exception
    6.4.15 Integer Overflow Exception
    6.4.16 Floating-Point Exception
    6.4.17 Watch Exception
    6.4.18 Interrupt Exception
6.5 Exception Handling and Servicing Flowcharts

Chapter 7 Floating-Point Operations
7.1 Overview
7.2 FPU Programming Model
    7.2.1 Floating-Point General Purpose Register (FGR)
    7.2.2 Floating-Point Registers (FPR)
    7.2.3 Floating-Point Control Registers (FCRs)
    7.2.4 Control/Status Register (FCR31)
    7.2.5 Implementation/Revision Register (FCR0)
7.3 Floating-Point Formats
7.4 Fixed-Point Format
7.5 FPU Set Overview
    7.5.1 Floating-Point Load/Store/Transfer Instructions
    7.5.2 Convert Instructions
    7.5.3 Computational Instructions
    7.5.4 Compare Instructions
    7.5.5 FPU Branch Instructions
    7.5.6 FPU Instruction Execution Time
7.6 FPU Pipeline Synchronization

Chapter 8 Floating-Point Exceptions
8.1 Types of Exceptions
8.2 Exception Processing
    8.2.1 Flags
    8.2.2 Inexact Exception (I)
    8.2.3 Invalid Operation Exception (V)
    8.2.4 Divide-by-Zero Exception (Z)
    8.2.5 Overflow Exception (O)
    8.2.6 Underflow Exception (U)
    8.2.7 Unimplemented Operation Exception (E)
8.3 Saving and Returning State
8.4 Handling of IEEE754 Exceptions

Chapter 9 Initialization Interface
9.1 Functional Overview
9.2 Reset Signal Description
    9.2.1 Power-On Reset
    9.2.2 Cold Reset
    9.2.3 Soft Reset
9.3 VR4300 Processor Modes
    9.3.1 Power Modes
    9.3.2 Privilege Modes
    9.3.3 Floating-Point Registers
    9.3.4 Reverse Endianness
    9.3.5 Instruction Trace Support
    9.3.6 Bootstrap Exception Vector (BEV)
    9.3.7 Interrupt Enable (IE)

Chapter 10 Clock Interface
10.1 Signal Terminology
10.2 Basic System Clocks
10.3 System Timing Parameters
    10.3.1 Synchronization with SClock
    10.3.2 Synchronization with MasterClock
    10.3.3 Phase-Locked Loop (PLL)
10.4 Low Power Mode Operation
10.5 Connecting Clocks to a Phase-Locked System
10.6 Connecting Clocks to a System without Phase Locking
    10.6.1 Connecting to a Gate-Array Device
    10.6.2 Connecting to a CMOS Discrete Device

Chapter 11 Cache Memory
11.1 Memory Organization
11.2 Cache Organization
    11.2.1 Organization of the Instruction Cache (I-Cache)
    11.2.2 Organization of the Data Cache (D-Cache)
    11.2.3 Accessing the Caches
11.3 Cache Operations
    11.3.1 Cache Write Policy
    11.3.2 Data Cache Line Replacement
    11.3.3 Instruction Cache Line Replacement
11.4 Cache States
11.5 Cache State Transition Diagrams
    11.5.1 Data Cache State Transition
    11.5.2 Instruction Cache State Transition
11.6 Manipulation of the Caches by an External Agent

Chapter 12 System Interface
12.1 Terminology
12.2 System Interface Description
    12.2.1 Physical Addresses
    12.2.2 Interface Buses
    12.2.3 Address and Data Cycles
    12.2.4 Issue Cycles
    12.2.5 Handshake Signals
12.3 System Interface Protocols
    12.3.1 Master and Slave States
    12.3.2 Moving from Master to Slave State
    12.3.3 External Arbitration
    12.3.4 Uncompelled Change to Slave State
12.4 Processor and External Requests
    12.4.1 Processor Requests
    12.4.2 Processor Read Request
    12.4.3 Processor Write Request
    12.4.4 External Requests
    12.4.5 External Write Request
    12.4.6 Read Response
12.5 Handling Requests
    12.5.1 Fetch Miss
    12.5.2 Load Miss
    12.5.3 Store Miss
    12.5.4 Loads or Stores to Uncached Area
    12.5.5 Cache Instructions
12.6 Processor Request and External Request Protocols
    12.6.1 Processor Request Protocols
    12.6.2 Processor Read Request Protocol
    12.6.3 Processor Write Request Protocol
    12.6.4 Flow Control of Processor Request
    12.6.5 External Request Protocols
    12.6.6 External Arbitration Protocol
    12.6.7 External Write Request Protocol
    12.6.8 External Read Response Protocol
12.7 Successive Processing of Request
    12.7.1 Successive Processor Write Requests
    12.7.2 Processor Write Request Followed by Processor Read Request
    12.7.3 Processor Read Request Followed by Processor Write Request
    12.7.4 Processor Write Request Followed by External Write Request
12.8 Discarding and Re-Executing Commands
    12.8.1 Re-Execution of Processor Commands
    12.8.2 Discarding and Re-Executing Write Command
    12.8.3 Discarding and Re-Executing Read Command
    12.8.4 Executing and Discarding Command
12.9 Data Flow Control
    12.9.1 Independent Transfer on SysAD(31:0) Bus
    12.9.2 System Endianness
12.10 System Interface Cycle Time
    12.10.1 Release Latency Time
12.11 System Interface Commands and Data Identifiers
    12.11.1 Command and Data Identifier Syntax
    12.11.2 System Interface Command Syntax
    12.11.3 Read Requests
    12.11.4 Write Requests
    12.11.5 System Interface Data Identifier Syntax
    12.11.6 Data Identifier Bit Definitions
12.12 System Interface Addresses
    12.12.1 Addressing Conventions
    12.12.2 Sequential and Subblock Ordering

chapter 13 jtag interface
13.1 principles of boundary scanning
13.2 signal summary
13.3 jtag controller and registers
13.3.1 instruction register
13.3.2 bypass register
13.3.3 boundary-scan register
13.3.4 test access port (tap)
13.3.5 tap controller
13.3.6 controller reset
13.3.7 controller states
13.4 notes on implementation

chapter 14 interrupts
14.1 non-maskable interrupt
14.2 external normal interrupts
14.3 software interrupts
14.4 timer interrupt
14.5 generation of interrupt request signal
14.5.1 detection of hardware interrupts
14.5.2 masking of interrupt request signals

chapter 15 power management
15.1 features
15.1.1 normal power mode
15.1.2 low power mode
15.1.3 power off mode

chapter 16 cpu instruction set details
16.1 instruction notation conventions
16.2 load and store instructions
16.3 jump and branch instructions
16.4 coprocessor instructions
16.5 system control coprocessor (cp0) instructions
16.6 cpu instructions
16.7 cpu instruction opcode bit encoding

chapter 17 fpu instruction set details
17.1 instruction formats
17.2 instruction notation conventions
17.3 load and store instructions
17.4 floating-point computational instructions
17.5 fpu instructions
17.6 fpu instruction opcode bit encoding

chapter 18 pll passive elements

chapter 19 coprocessor 0 hazards

appendix a differences between the VR4300, VR4305, and VR4310

appendix b differences from VR4400
b.1 differences in software
b.1.1 cache instruction
b.1.2 cache parity
b.1.3 status register
b.1.4 config register
b.1.5 status of fcr31 on occurrence of unimplemented operation exception
b.1.6 integer zero division
b.1.7 cache parity error exception
b.2 differences in system design
b.2.1 initialization of processor
b.2.2 system interface
b.3 other differences
b.3.1 cache size
b.3.2 tlb
b.3.3 floating-point unit
b.3.4 pipeline
b.3.5 interrupt
b.3.6 kernel physical address segment configuration
b.3.7 jtag

appendix c differences from VR4200
c.1 differences in software
c.1.1 cache parity
c.1.2 status register
c.1.3 config register
c.1.4 cache parity error exception
c.2 differences in system design
c.2.1 system interface
c.2.2 clock
c.2.3 package
c.3 other differences
c.3.1 physical address
c.3.2 write buffer
c.3.3 reset
c.3.4 status(3:0) pins

appendix d restrictions of VR4300

appendix e index

list of figures
figure no. title
1-1 internal block diagram
1-2 cpu registers
1-3 cpu instruction formats
1-4 big-endian byte ordering
1-5 little-endian byte ordering
1-6 big-endian data in a doubleword
1-7 little-endian data in a doubleword
1-8 misaligned word addressing
1-9 cp0 registers
3-1 cpu instruction formats
3-2 byte access within a doubleword
4-1 pipeline stages
4-2 instruction execution in the pipeline
4-3 pipeline operations
4-4 branch delay
4-5 add instruction pipeline operations
4-6 jump and link register instruction pipeline operations
4-7 branch on equal instruction pipeline operations
4-8 trap if less than instruction pipeline operations
4-9 load word instruction pipeline operations
4-10 store word instruction pipeline operations
4-11 interlocks, exceptions, and faults
4-12 correspondence of pipeline stage to interlock and exception condition
4-13 instruction tlb miss interlock
4-14 example of an instruction cache busy interlock
4-15 example of a multicycle instruction interlock
4-16 example of a load interlock
4-17 example of a data cache miss followed by a load interlock
4-18 example of a coprocessor 0 bypass interlock (cp0i)
4-19 execution and interlock priorities
4-20 write buffer format
5-1 overview of a virtual-to-physical address translation
5-2 32-bit mode virtual address translation
5-3 64-bit mode virtual address translation
5-4 user mode virtual address space
5-5 supervisor mode address space
5-6 kernel mode address space
5-7 details of xkphys field
5-8 cp0 registers and the tlb
5-9 tlb entry format
5-10 tlb entry registers
5-11 index register
5-12 random register
5-13 wired register boundary
5-14 wired register
5-15 processor revision identifier register
5-16 config register
5-17 lladdr register
5-18 taglo and taghi register
5-19 tlb address translation
6-1 context register
6-2 badvaddr register
6-3 count register
6-4 compare register
6-5 status register
6-6 self-diagnostic status field
6-7 cause register
6-8 epc register
6-9 watchlo and watchhi registers
6-10 xcontext register
6-11 perr register
6-12 cacheerr register
6-13 errorepc register
6-14 general purpose exception handler
6-15 tlb/xtlb miss exception handler
6-16 cold reset, soft reset & nmi exception handler
7-1 fpu registers
7-2 control/status register bit assignments
7-3 control/status register (fcr31) cause, enable, and flag bit fields
7-4 implementation/revision register
7-5 single-precision floating-point format
7-6 double-precision floating-point format
7-7 32-bit fixed-point format
7-8 64-bit fixed-point format
7-9 dc-to-ex hardware interlock bypass
8-1 fcr31 cause/enable/flag bits
9-1 power-on reset
9-2 cold reset
9-3 soft reset
10-1 signal transitions
10-2 clock-to-q delay
10-3 when frequency ratio of masterclock to pclock is 1:1.5
10-4 when frequency ratio of masterclock to pclock is 1:2
10-5 phase-locked system
10-6 gate-array system without phase lock, using the VR4300 processor
10-7 gate-array and cmos system without phase lock, using the VR4300 processor
11-1 logical hierarchy of memory
11-2 VR4300 cache support
11-3 VR4300 8-word i-cache line format
11-4 VR4300 4-word data cache line format
11-5 cache data and tag organization
11-6 data cache state diagram
11-7 instruction cache state diagram
12-1 data sequence on instruction cache read request
12-2 data sequence on data cache read request
12-3 system interface buses
12-4 eok signal status of processor request
12-5 address cycle extended by eok signal
12-6 system interface register-to-register operation
12-7 requests and system events
12-8 processor request flow
12-9 external request flow
12-10 read response
12-11 unforcible transition by processor read request
12-12 delayed processor read request
12-13 processor block write request (write data pattern: d)
12-14 processor block write request (write data pattern: dxx)
12-15 delayed processor read request
12-16 delayed second processor write request
12-17 arbitration of external request
12-18 bus arbitration of processor
12-19 external write request protocol
12-20 read request/read response protocol
12-21 block read response in slave status
12-22 external write request following read response
12-23 when external write request takes precedence while processor read request is pending
12-24 successive block write requests (write data pattern: d)
12-25 successive single write requests (write data pattern: dxx)
12-26 processor write request followed by processor read request (write data pattern: d)
12-27 processor single read request followed by block write request (write data pattern: d)
12-28 successive processor write requests followed by external write request (write data pattern: d)
12-29 discarding and re-executing processor single write request
12-30 discarding and re-executing processor single read request
12-31 discarding bus mastership by external agent by processor request
12-32 system interface command syntax bit definition
12-33 read request syscmd(4:0) bus bit definition
12-34 write request syscmd(4:0) bus bit definition
12-35 data identifier syscmd(4:0) bus bit definition
13-1 jtag boundary-scan cells
13-2 jtag interface signals and registers
13-3 instruction register
13-4 bypass register operation
13-5 output enable bit of boundary-scan register
13-6 jtag test access port
14-1 nmi signal
14-2 interrupt register bits and enables bits
14-3 hardware interrupt request signals
14-4 masking of interrupt requests
16-1 VR4300 opcode bit encoding
17-1 load and store instruction format
17-2 computational instruction format
17-3 bit encoding for fpu instructions
18-1 connection example of pll passive elements
18-2 layout example of qfp and capacitor on pwb

list of tables
table no. title
1-1 frequency ratio between pclock and masterclock
1-2 system control coprocessor (cp0) register definitions
2-1 system interface signals
2-2 clock/control interface signals
2-3 interrupt interface signals
2-4 jtag interface signals
2-5 initialization interface signals
3-1 number of cycles for load and store instruction delay slot
3-2 load/store instructions
3-3 load/store instructions (extended isa)
3-4 alu immediate instructions
3-5 alu immediate instruction (extended isa)
3-6 three-operand type instruction
3-7 three-operand type instructions (extended isa)
3-8 shift instructions
3-9 shift instructions (extended isa)
3-10 multiply/divide instructions
3-11 multiply/divide instructions (extended isa)
3-12 number of cycles stalled by multiply/divide instruction
3-13 number of delay slot cycles of jump/branch instruction
3-14 jump instructions
3-15 branch instructions
3-16 branch instructions (extended isa)
3-17 special instructions
3-18 special instructions (extended isa)
3-19 coprocessor instructions
3-20 coprocessor instructions (extended isa)
3-21 system control coprocessor (cp0) instructions
4-1 description of pipeline showing stage in which operations commence
4-2 description of pipeline exceptions
4-3 description of pipeline interlocks
5-1 32-bit and 64-bit user mode segments
5-2 32-bit and 64-bit supervisor mode segments
5-3 32-bit kernel mode segments
5-4 64-bit kernel mode segments
5-5 use of cache and xkphys address space
5-6 cache algorithm
5-7 mask field values for page sizes
6-1 cp0 exception processing registers
6-2 cause register exccode field
6-3 64-bit mode exception vector base addresses
6-4 32-bit mode exception vector base addresses
6-5 exception priority order
7-1 floating-point control register assignments
7-2 flush values of denormalized number results
7-3 rounding mode control bits
7-4 equations for calculating values in single- and double-precision floating-point format
7-5 floating-point format parameter values
7-6 minimum and maximum floating-point values
7-7 load/store/transfer instructions
7-8 convert instruction
7-9 computational instructions
7-10 compare instruction
7-11 mnemonics and definitions of compare instruction conditions
7-12 fpu branch instructions
7-13 number of load/store/transfer instruction execution cycles
7-14 number of fpu instruction delay cycles
8-1 default fpu ieee754 exception values
8-2 fpu internal results and flag status
10-1 frequency ratio between pclock and masterclock
11-1 stall cycle count for data cache miss
11-2 stall cycle count for instruction cache miss
12-1 system interface requests
12-2 release latency time for external requests
12-3 encoding of syscmd3 for system interface commands
12-4 encoding of syscmd2 for read requests
12-5 encoding of syscmd(1:0) for block read requests
12-6 encoding of syscmd(1:0) for single read requests
12-7 encoding of syscmd2 for write requests
12-8 encoding of syscmd(1:0) for block write requests
12-9 encoding of syscmd(1:0) for single write requests
12-10 processor data identifier encoding of syscmd(3:0)
12-11 external data identifier encoding of syscmd(3:0)
13-1 jtag instruction register bit encoding
13-2 jtag scan order
16-1 cpu instruction operation notations
16-2 load and store instruction common functions
16-3 access type specifications for load/store instructions
17-1 valid fpu instruction formats
17-2 logical reverse of predicates by condition true/false
17-3 load and store instructions common functions
17-4 format field decoding
17-5 floating-point computational instructions and operations
19-1 coprocessor 0 hazards
19-2 example of calculating number of cp0 hazards and number of instructions inserted
a-1 differences between the VR4300, VR4305, and VR4310
b-1 differences in software
b-2 differences in system design
b-3 other differences
c-1 differences in software
c-2 differences in system design
c-3 other differences
Chapter 1 General

This chapter outlines the 64-bit RISC microprocessors VR4300, VR4305 (μPD30200), and VR4310 (μPD30210).

1.1 Characteristics

The VR4300, VR4305, and VR4310 are members of the NEC VR Series of RISC (reduced instruction set computer) microprocessors. They are high-performance 64-bit microprocessors employing the RISC architecture developed by MIPS. Their instructions are upward-compatible with those of the VR3000 Series and completely compatible with those of the VR4400 and VR4200. Existing applications can therefore be used as is with the VR4300, VR4305, and VR4310.

The VR4300, VR4305, and VR4310 have the following features:

- Internal operating frequency: 80 MHz max. (μPD30200-80), 100 MHz max. (μPD30200-100), 133 MHz max. (μPD30200-133, 30210-133), 167 MHz max. (μPD30210-167)
- 64-bit architecture supporting 64-bit data processing
- Optimized 5-stage pipeline processing
- High-speed translation lookaside buffer (TLB) supporting virtual addresses (32 double entries)
- Address space
  Physical: 32 bits
  Virtual: 40 bits (64-bit mode), 31 bits (32-bit mode)
- Supports single-precision and double-precision floating-point operations
- On-chip cache memories
  Instruction: 16 KB
  Data: 8 KB
- Employs a write-back cache system, which reduces store operations over the system bus
- 32-bit external bus interface facilitating system development
- Multiplies the external operating frequency (input clock and bus interface) to create the internal operating frequency. The multiple is selected on power application.
  μPD30200-80: 1, 2, or 3
  μPD30200-100: 1.5, 2, or 3
  μPD30200-133: 2, 3, or 4
  μPD30210-133: 2, 2.5, 3, or 4
  μPD30210-167: 2, 2.5, 3, 4, 5, or 6
- Write buffer
- Low power mode (μPD30200-80, 30200-100 only): reduces the internal and system bus clocks to 1/4 of the normal level, which also reduces power consumption
- Software-compatible with the VR4400 and VR4200 and upward-compatible with the VR3000 Series
- Supply voltage: 3.3 V ±0.3 V (μPD30200-80, 30200-100), 3.0 to 3.5 V (μPD30200-133, 30210- )

1.2 Ordering Information

Part number | Package | Maximum operating frequency (MHz)
μPD30200GD-80-LBB | 120-pin plastic QFP (28 × 28 mm) | 80
μPD30200GD-100-MBB | 120-pin plastic QFP (28 × 28 mm) | 100
μPD30200GD-133-MBB | 120-pin plastic QFP (28 × 28 mm) | 133
μPD30210GD-133-MBB | 120-pin plastic QFP (28 × 28 mm) | 133
μPD30210GD-167-MBB | 120-pin plastic QFP (28 × 28 mm) | 167

1.3 64-Bit Architecture

The VR4300 is a 64-bit high-performance microprocessor. It can also execute 32-bit applications even when it operates as a 64-bit microprocessor.

1.4 VR4300 Processor

Figure 1-1 shows the internal block diagram of the VR4300. In addition to a high-performance integer operation unit, the VR4300 is equipped with a fully associative high-speed translation lookaside buffer (TLB) that has 32 entries with two pages corresponding to each entry, a data cache, an instruction cache, and an FPU.

Figure 1-1 Internal Block Diagram (blocks: system interface, clock generator, instruction cache, data cache, instruction address, execution unit, CP0, TLB, pipeline control; signals: data/address, control, MasterClock)

1.4.1 Internal Block Configuration

System interface
Allows the processor to access external resources such as memories. It contains a 32-bit multiplexed address/data bus, with per-byte parity, clock signals, interrupt request signals, and various control signals. It is not compatible with the system interface bus used on the VR4400 and VR4200.

Clock generator
Generates a pipeline clock (PClock) based on an externally input clock (MasterClock). The frequency of the PClock is selected by setting the frequency ratio between the MasterClock and the PClock. This ratio is set using the DivMode pins on power application.
(For the setting of the DivMode pins, refer to Table 2-2 Clock/Control Interface Signals.) Table 1-1 indicates the selectable frequency ratios. The system interface clock (SClock) usually has the same frequency as the MasterClock.

Table 1-1 Frequency Ratio Between PClock and MasterClock

Product name | DivMode pin | Selectable frequency ratio (MasterClock : PClock)
VR4300 | DivMode(1:0) | 1 : 1.5 *1, 1 : 2, 1 : 3, 1 : 4 *2
VR4305 | DivMode(1:0) | 1 : 1, 1 : 2, 1 : 3
VR4310 | DivMode(2:0) | 1 : 2, 1 : 2.5 *3, 1 : 3, 1 : 4, 1 : 5, 1 : 6

*1. Selectable with the 100 MHz model only (with the 133 MHz model, this setting is reserved).
*2. Selectable with the 133 MHz model only (with the 100 MHz model, this setting is reserved).
*3. Selectable with the 167 MHz model only (with the 133 MHz model, this setting is reserved).

If the RP bit of the Status register is set to 1 during operation, the frequencies of the PClock and SClock can be reduced to 1/4 of the normal frequency*. Because the PLL (phase-locked loop) technique is employed, the skew (phase difference) between the external clock and the internal operation clock can be minimized.

* 100 MHz model of the VR4300 and the VR4305 only

Instruction cache
Direct-mapped, virtually indexed, and physically tagged. The capacity is 16 KB.

Execution unit
Has the hardware resources to execute integer and floating-point instructions. It has a 64-bit register file, a 64-bit integer/mantissa datapath, and a 12-bit exponent datapath. It is provided with a dedicated multiplexer in order to process multiply instructions at high speed.

Coprocessor 0 (CP0)
Contains the memory management unit (MMU) and handles exception processing. The MMU handles address translation and checks memory accesses that occur between different memory segments (user, supervisor, or kernel). The translation lookaside buffer (TLB) is used to translate virtual addresses to physical addresses.
Data cache
A direct-mapped, virtually indexed, and physically tagged write-back cache. The capacity is 8 KB.

Instruction address
Calculates the effective address of the next instruction to be fetched. It contains the incrementer for the program counter (PC), the target address adder, and the conditional branch address selector.

Pipeline control
Ensures that the instruction pipeline operates properly when a pipeline stall or an exception occurs.

1.4.2 CPU Registers

The processor provides the following registers:

- 32 64-bit general purpose registers, GPRs
- 32 64-bit floating-point operation registers, FPRs

In addition, the processor provides the following special registers:

- 64-bit program counter, the PC register
- 64-bit HI register, containing the high-order doubleword of the integer multiply and divide result
- 64-bit LO register, containing the low-order doubleword of the integer multiply and divide result
- 1-bit load/link LLbit register
- 32-bit floating-point implementation/revision register, FCR0
- 32-bit floating-point control/status register, FCR31

Two of the general purpose registers have assigned functions:

- r0 is hardwired to a value of zero and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.
- r31 is the link register used by the JAL and JALR instructions. It can be used by other instructions, but make sure that other data used in calculations does not overlap with the register used by the JAL/JALR instruction.

Furthermore, the processor contains registers in the system control coprocessor (CP0) that perform exception processing and address management.

CPU registers can operate as either 32-bit or 64-bit registers, depending on the VR4300 processor mode of operation.

Figure 1-2 shows the CPU registers.
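The special roles of r0 and r31 can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `RegisterFile` and its methods are hypothetical and do not come from this manual, and the model covers nothing beyond the 32 general purpose registers, the hardwired zero in r0, and the link register written by JAL/JALR.

```python
MASK64 = (1 << 64) - 1  # general purpose registers are 64 bits wide


class RegisterFile:
    """Toy model of the 32 general purpose registers (hypothetical helper)."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        # r0 always reads as zero, whatever was written to it.
        return 0 if n == 0 else self._regs[n]

    def write(self, n, value):
        # Writes to r0 are discarded; other registers keep 64 bits.
        if n != 0:
            self._regs[n] = value & MASK64

    def link(self, return_address):
        # JAL and JALR place their return address in r31.
        self.write(31, return_address)
```

A result that is not needed can be "written" to r0 without side effects, while code that keeps live data in r31 loses it on the next JAL or JALR, which is the overlap the text above warns about.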
Figure 1-2 CPU Registers (general purpose registers r0 through r31, 64 bits each, with r0 = 0 and r31 = link address; multiply and divide registers HI and LO, 64 bits each; program counter PC, 64 bits; floating-point registers, 64 bits each; load/link register LLbit; floating-point control registers FCR0 = implementation/revision and FCR31 = control/status, 32 bits each)

The VR4300 processor has no program status word (PSW) register as such; this function is covered by the Status and Cause registers incorporated in the system control coprocessor (CP0). For the CP0 registers, refer to 1.4.5 System Control Coprocessor (CP0).

1.4.3 CPU Instruction Set Overview

Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are three instruction formats:

- Immediate (I-type)
- Jump (J-type)
- Register (R-type)

Figure 1-3 CPU Instruction Formats

I-type (immediate): op (bits 31-26) | rs (25-21) | rt (20-16) | immediate (15-0)
J-type (jump): op (bits 31-26) | target (25-0)
R-type (register): op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | sa (10-6) | funct (5-0)

The instruction set can be further divided into the following groupings:

- Load and store instructions move data between memory and general purpose registers. They are all immediate (I-type) instructions, since the only addressing mode supported is base register plus 16-bit signed immediate offset.
- Computational instructions perform arithmetic, logical, shift, multiply, and divide operations on values in registers. They include register (R-type, in which both the operands and the result are stored in registers) and immediate (I-type, in which one operand is a 16-bit signed immediate value) formats.
- Jump and branch instructions change the control flow of a program. Jumps are always made to an address formed by combining a 26-bit target address with the high-order bits of the program counter (J-type format) or to a register address (R-type format). Branches are made to a 16-bit offset address relative to the program counter (I-type). Jump and link instructions save their return address in register 31.
- Coprocessor instructions (CPz) perform operations in the coprocessors. Coprocessor load and store instructions are I-type. As opposed to CP0 instructions, CPz instructions are not specific to any coprocessor. (Refer to Chapter 7 Floating-Point Operations.)
- Coprocessor 0 (system coprocessor, CP0) instructions perform operations on CP0 registers to control the memory management and exception handling facilities of the processor.
- Special instructions perform system call exception and breakpoint exception operations, or cause a branch to the general exception handling vector based upon the result of a comparison. These instructions occur in both R-type (both the operands and the result are registers) and I-type (one operand is a 16-bit immediate value) formats.

For each instruction, refer to Chapter 3 CPU Instruction Set Summary and Chapter 16 CPU Instruction Set Details.

1.4.4 Data Formats and Addressing

The VR4300 processor uses four data formats: a 64-bit doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte ordering within all of the larger data formats (halfword, word, doubleword) can be configured as either big-endian or little-endian.

When the VR4300 processor is configured as a big-endian system, byte 0 is the most-significant (leftmost) byte, thereby providing compatibility with MC68000 and IBM 370 conventions. Figure 1-4 shows this configuration.

Figure 1-4 Big-Endian Byte Ordering

Word address | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0
12 | 12 | 13 | 14 | 15
8 | 8 | 9 | 10 | 11
4 | 4 | 5 | 6 | 7
0 | 0 | 1 | 2 | 3

Remarks 1. The most-significant byte is the lowest address.
2. A word is addressed by the address of the most-significant byte.

When configured as a little-endian system, byte 0 is always the least-significant (rightmost) byte, which is compatible with iAPX x86 and DEC VAX conventions.
Figure 1-5 shows this configuration. Unless otherwise specified, little endian is used throughout this manual.

Figure 1-5 Little-Endian Byte Ordering

Word address | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0
12 | 15 | 14 | 13 | 12
8 | 11 | 10 | 9 | 8
4 | 7 | 6 | 5 | 4
0 | 3 | 2 | 1 | 0

Remarks 1. The least-significant byte is the lowest address.
2. A word is addressed by the address of the least-significant byte.

Figure 1-6 Big-Endian Data in a Doubleword (doublewords at addresses 0, 8, and 16; within each doubleword the bytes run from the lowest byte address at bits 63-56 to the highest at bits 7-0)

Remarks 1. The most-significant byte is the lowest address.
2. A word is addressed by the address of the most-significant byte.

Figure 1-7 Little-Endian Data in a Doubleword (doublewords at addresses 0, 8, and 16; within each doubleword the bytes run from the highest byte address at bits 63-56 to the lowest at bits 7-0)

Remarks 1. The least-significant byte is the lowest address.
2. A word is addressed by the address of the least-significant byte.

The CPU uses byte addressing for halfword, word, and doubleword accesses with the following alignment constraints:

- Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).
- Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).
- Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).

The following special instructions load and store words that are not aligned on 4-byte (word) or 8-byte (doubleword) boundaries:

LWL LWR SWL SWR
LDL LDR SDL SDR

These instructions are always used in pairs to access data that is not aligned on a boundary. Accessing such data requires one additional PCycle compared with accessing data aligned on a boundary.
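The alignment constraints and byte ordering rules above can be checked with a few lines of code. This is a minimal sketch, not taken from the manual: the function names `is_aligned` and `byte_lane` are hypothetical, and `byte_lane` only reports where a byte sits inside its aligned 32-bit word under each byte ordering.

```python
def is_aligned(address, size):
    """True if `address` is a multiple of `size` (2, 4, or 8 bytes)."""
    return address % size == 0


def byte_lane(address, big_endian):
    """Lowest bit position of the byte at `address` within its aligned word.

    Big-endian: byte offset 0 occupies bits 31-24.
    Little-endian: byte offset 0 occupies bits 7-0.
    """
    offset = address & 3
    return (3 - offset) * 8 if big_endian else offset * 8
```

For example, address 6 satisfies the halfword constraint but not the word constraint, so a word at that address would have to be accessed with an LWL/LWR pair rather than a single word load.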
Figure 1-8 illustrates how a misaligned word with byte address 3 is accessed in big endian and little endian.

Figure 1-8 Misaligned Word Addressing

Byte order | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0
Big-endian | 3 | 4 | 5 | 6
Little-endian | 6 | 5 | 4 | 3

1.4.5 System Control Coprocessor (CP0)

The MIPS ISA defines four coprocessors (CP0 through CP3).

- CP0 is an internal system control coprocessor and supports the virtual memory system and exception processing.
- CP1 is an internal floating-point unit.
- CP2 is reserved for future definition.
- CP3 is also reserved for expansion. If a CP3 instruction is executed, a reserved instruction exception occurs.

CP0 converts virtual addresses into physical addresses, selects an operating mode (kernel, supervisor, or user mode), and controls exceptions. It also controls the cache subsystem to analyze causes and return execution from error processing.

The CP0 registers of the VR4300 are the same as those of the VR4200. Because the VR4300 does not have a parity check function, however, its Parity Error register (26) and Cache Error register (27) are not actually used. These registers are defined to maintain compatibility with the VR4200.

Figure 1-9 shows the CP0 registers, and Table 1-2 briefly explains each register. For the details of the registers related to the virtual memory system, refer to Chapter 5 Memory Management System, and for the details of the registers used for exception processing, refer to Chapter 6 Exception Processing.

Figure 1-9 CP0 Registers (register names and numbers 0 through 31, grouped into memory management registers, exception processing registers, and registers reserved for future use)
Table 1-2 System Control Coprocessor (CP0) Register Definitions

Number | Register | Description
0 | Index | Programmable pointer into TLB array
1 | Random | Pseudorandom pointer into TLB array (read only)
2 | EntryLo0 | Low half of TLB entry for even virtual address (VPN)
3 | EntryLo1 | Low half of TLB entry for odd virtual address (VPN)
4 | Context | Pointer to kernel virtual page table entry (PTE) in 32-bit mode
5 | PageMask | Page size specification
6 | Wired | Number of wired TLB entries
7 | - | Reserved for future use
8 | BadVAddr | Virtual address at which the most recent error occurred
9 | Count | Timer count
10 | EntryHi | High half of TLB entry (including ASID)
11 | Compare | Timer compare value
12 | Status | Operation status setting
13 | Cause | Cause of the last exception
14 | EPC | Exception program counter
15 | PRId | Processor revision identifier
16 | Config | Memory system mode setting
17 | LLAddr | Load linked instruction address display
18 | WatchLo | Memory reference trap address low bits
19 | WatchHi | Memory reference trap address high bits
20 | XContext | Pointer to kernel virtual PTE table in 64-bit mode
21-25 | - | Reserved for future use
26 | Parity Error* | Cache parity bits
27 | Cache Error* | Cache error and status register
28 | TagLo | Cache tag register low
29 | TagHi | Cache tag register high
30 | ErrorEPC | Error exception program counter
31 | - | Reserved for future use

* These registers are defined to maintain compatibility with the VR4200 and are not used by the hardware of the VR4300.

1.4.6 Floating-Point Unit (FPU), CP1

The floating-point unit (FPU) operates as a coprocessor for the CPU and performs arithmetic operations on floating-point values. The FPU, with associated system software, fully conforms to the requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.

The FPU includes:

- Full 64-bit operation. The FPU provides 16 64-bit registers that hold single-precision or double-precision values. Another sixteen floating-point registers can be used by setting the FR bit of the Status register to 1. In addition, a 32-bit control/status register is provided, conforming to the IEEE exception processing standard.
- Load and store instruction set. Like the CPU, the FPU uses a load- and store-based instruction set. Floating-point operations are started in a single cycle; however, the execution of a floating-point operation is not allowed to overlap other operations.
- Shared hardware. There is no separate FPU on the VR4300; floating-point operations are processed by the same hardware that is used for integer instructions.

1.4.7 Internal Cache

The VR4300 has an instruction cache and a data cache to enhance the efficiency of pipelining. Each cache has a data width of 64 bits and can be accessed in 1 clock. The instruction cache and data cache can be accessed in parallel.

The instruction cache has a capacity of 16 KB, while the data cache has a capacity of 8 KB.

For the details of the caches, refer to Chapter 11 Cache Memory.

1.5 Memory Management System (MMU)

The VR4300 processor has a 32-bit physical addressing range of 4 GB. However, since it is rare for systems to implement a physical memory space this large, the CPU provides a logical expansion of memory space to the programmer by translating addresses into the large virtual address space. The VR4300 processor supports the following two addressing modes:

- 32-bit mode, in which the virtual address space is divided into 2 GB per user process and 2 GB for the kernel.
- 64-bit mode, in which the virtual address is expanded to 1 TB (2^40 bytes) of user virtual address space.

A detailed description of these address spaces is given in Chapter 5 Memory Management System.
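The address space sizes quoted above follow directly from the address widths. The short sketch below works through the arithmetic; the helper name `space_size` and the constants `GB` and `TB` are illustrative names introduced here, not terms from the manual.

```python
GB = 1 << 30  # 2^30 bytes
TB = 1 << 40  # 2^40 bytes


def space_size(address_bits):
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


physical = space_size(32)     # 32-bit physical address: 4 GB
user_32bit = space_size(31)   # 31-bit user virtual address in 32-bit mode: 2 GB
user_64bit = space_size(40)   # 40-bit user virtual address in 64-bit mode: 1 TB
```

In 32-bit mode the 4 GB virtual space splits into two 2 GB halves, one for the user process and one for the kernel, which is why the user virtual address is 31 bits wide.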
1.5.1 Translation Lookaside Buffer (TLB)

Virtual memory mapping is assisted by a translation lookaside buffer, which holds virtual-to-physical address translations. This fully associative, on-chip TLB contains 32 entries, each of which maps a pair of variable-sized pages of either 4 KB or 16 MB.

Joint TLB (JTLB)

The TLB can hold both instruction and data addresses, and is thus also referred to as a joint TLB (JTLB).

An address translation value is tagged with the high-order bits of its virtual address (the number of these bits depends upon the size of the page) and a per-process identifier. If there is no matching entry in the TLB, an exception occurs and software writes the entry contents to the on-chip TLB from a page table in memory. The JTLB entry to be rewritten is selected by a value in either the Random or the Index register.

Instruction micro-TLB (ITLB)

The VR4300 processor has a two-entry instruction micro-TLB (ITLB), which assists in instruction address translation. The ITLB cannot be operated directly by software. Instruction fetches access this TLB, while data accesses use the joint TLB; a miss in the micro-TLB stalls the pipeline until the micro-TLB is refilled from the joint TLB. The micro-TLB is fully associative and uses the least-recently-used (LRU) replacement algorithm.

Each micro-TLB entry maps 4 KB of virtual space to physical space. This ensures that each ITLB entry is a subset of a single JTLB entry.

1.5.2 Operating Modes

The VR4300 processor has three operating modes:

- User mode
- Supervisor mode
- Kernel mode

The manner in which memory addresses are translated or mapped depends on the operating mode of the CPU; this is described in Chapter 5 Memory Management System.

1.6 Instruction Pipeline

The VR4300 has a 5-stage instruction pipeline. This pipeline is used for floating-point operations as well as for integer operations. In a normal environment, the pipeline executes one instruction in 1 cycle.
The pipeline of the VR4300 operates at a frequency determined by the setting of the DivMode(1:0)* pins.

For details, refer to Chapter 4 Pipeline.

* In the VR4300 and VR4305. In the VR4310, DivMode(2:0).

Chapter 2 Pin Functions

2.1 Pin Configuration (Top View)

120-pin plastic QFP (28 × 28 mm)
μPD30200GD-80-LBB
μPD30200GD-100-MBB
μPD30200GD-133-MBB
μPD30210GD-133-MBB
μPD30210GD-167-MBB

Pins 1 to 30: 1 VDD, 2 GND, 3 SysAD22, 4 SysAD21, 5 VDD, 6 GND, 7 SysAD20, 8 VDD, 9 VDDP, 10 GNDP, 11 PLLCap0, 12 PLLCap1, 13 VDDP, 14 GNDP, 15 VDD (DivMode2), 16 MasterClock, 17 GND, 18 TClock, 19 VDD, 20 GND, 21 SyncOut, 22 SysAD19, 23 VDD, 24 SyncIn, 25 GND, 26 SysAD18, 27 SysAD17, 28 Int4, 29 VDD, 30 GND

Pins 31 to 60: 31 GND, 32 VDD, 33 SysAD16, 34 SysAD15, 35 GND, 36 VDD, 37 SysAD14, 38 SysAD13, 39 GND, 40 VDD, 41 SysAD12, 42 SysAD11, 43 GND, 44 VDD, 45 SysAD10, 46 Int0, 47 SysAD9, 48 GND, 49 VDD, 50 SysAD8, 51 SysAD7, 52 JTMS, 53 GND, 54 VDD, 55 SysAD6, 56 SysAD5, 57 JTCK, 58 Int1, 59 GND, 60 VDD

Pins 61 to 90: 61 GND, 62 VDD, 63 JTDI, 64 SysAD4, 65 JTDO, 66 SysAD3, 67 GND, 68 VDD, 69 SysAD2, 70 SysAD1, 71 GND, 72 VDD, 73 SysAD0, 74 PReq, 75 GND, 76 VDD, 77 SysAD31, 78 PValid, 79 GND, 80 VDD, 81 SysAD30, 82 EOK, 83 SysAD29, 84 GND, 85 VDD, 86 SysAD28, 87 SysAD27, 88 Int2, 89 GND, 90 VDD

Pins 91 to 120: 91 VDD, 92 GND, 93 NMI, 94 SysAD26, 95 PMaster, 96 VDD, 97 GND, 98 SysAD25, 99 EReq, 100 SysCmd0, 101 VDD, 102 GND, 103 SysCmd1, 104 Reset, 105 EValid, 106 SysCmd2, 107 VDD, 108 GND, 109 SysCmd3, 110 ColdReset, 111 SysCmd4, 112 DivMode1, 113 VDD, 114 GND, 115 SysAD24, 116 DivMode0, 117 SysAD23, 118 Int3, 119 VDD, 120 GND

Remark ( ): Pin name of the μPD30210-xxx
Pin name

ColdReset : Cold reset
DivMode(1:0)* : Divide mode
EOK : External OK
EReq : External request
EValid : External valid
Int(4:0) : Interrupt request
JTCK : JTAG clock input
JTDI : JTAG data in
JTDO : JTAG data out
JTMS : JTAG command signal
MasterClock : Master clock
NMI : Non-maskable interrupt request
PLLCap(1:0) : Phase-locked loop capacitance
PMaster : Processor master
PReq : Processor request
PValid : Processor valid
Reset : Reset
SyncIn : Synchronization clock input
SyncOut : Synchronization clock output
SysAD(31:0) : System address/data bus
SysCmd(4:0) : System command/data ID bus
TClock : Transmit clock
VDD : Power supply
GND : Ground
VDDP : VDD for PLL
GNDP : GND for PLL

* In the μPD30200- . DivMode(2:0) in the μPD30210- .

2.2 Pin Functions

2.2.1 System Interface Signals

The system interface signals are used when the VR4300 is connected to an external device in the system. Table 2-1 indicates the functions of these signals.

Table 2-1 System Interface Signals

Signal name | Definition | I/O | Function
SysAD(31:0) | System address/data bus | I/O | 32-bit address/data bus. Used to transmit or receive data or addresses between the processor and the external agent.
SysCmd(4:0) | System command/data ID bus | I/O | 5-bit bus. Used to transfer commands or data identifiers between the processor and the external agent.
EReq | External request | Input | Asserted when the external agent requests the system interface from the processor.
PReq | Processor request | Output | Asserted when the processor requests the system interface from the external agent. If a protocol error is detected in the system interface, this signal oscillates in synchronization with MasterClock, with a period that is a multiple of the SClock cycle.
EValid | External agent valid | Input | Asserted when the external agent drives a valid address or valid data onto the SysAD bus and a valid command/data identifier onto the SysCmd bus.
PValid | Processor valid | Output | Asserted when the processor drives a valid address or valid data onto the SysAD bus and a valid command/data identifier onto the SysCmd bus.
PMaster | Processor master | Output | Asserted when the processor is the master of the system interface bus.
EOK | External ready | Input | Asserted when the external agent is ready to accept a processor request.

2.2.2 Clock/Control Interface Signals

These interface signals are used to supply or control clocks. Table 2-2 shows the functions of the signals.

Table 2-2 Clock/Control Interface Signals

Signal name | Definition | I/O | Function
MasterClock | Master clock | Input | Inputs the MasterClock. The internal operating speed is determined by the frequency of this signal and the contents of the DivMode signals.
TClock | Transmit/receive clock | Output | Outputs the transmit/receive clock at the same frequency as the MasterClock.
SyncOut | Synchronization clock output | Output | Outputs a synchronization clock. Connect this pin to SyncIn; the connection should model the interconnection between TClock and the external agent.
SyncIn | Synchronization clock input | Input | Inputs a synchronization clock.
VDDP | Static VDD for PLL | - | Static VDD for the internal PLL circuit.
GNDP | Static GND for PLL | - | Static GND for the internal PLL circuit.
PLLCap(1:0) | Adjusting PLL | - | Connect a capacitor for adjusting the internal PLL circuit of the processor to these pins.
DivMode | Internal operating frequency mode | Input | Indicates the ratio at which the internal PClock is generated from the MasterClock. Normally, the frequency of the TClock is the same as that of the MasterClock. Do not change the value of these pins after setting it on power application; otherwise, operation is not guaranteed. The relationship between the DivMode values and the frequency ratio of each product is shown below.

Remark The maximum value of PClock is the same as the maximum internal operating frequency of each product regardless of the frequency ratio. (Refer to 1.2 Ordering Information.)

VR4300

μPD30200-100
DivMode(1:0) | MasterClock : PClock : TClock frequency ratio | Example [MHz]
00 | RFU | -
01 | 2 : 3 : 2 | 66.7 : 100 : 66.7
10 | 1 : 2 : 1 | 50 : 100 : 50
11 | 1 : 3 : 1 | 33.3 : 100 : 33.3

μPD30200-133
DivMode(1:0) | MasterClock : PClock : TClock frequency ratio | Example [MHz]
00 | 1 : 4 : 1 | 33.3 : 133 : 33.3
01 | RFU | -
10 | 1 : 2 : 1 | 66.7 : 133 : 66.7
11 | 1 : 3 : 1 | 44.3 : 133 : 44.3

VR4305

μPD30200-80
DivMode(1:0) | MasterClock : PClock : TClock frequency ratio | Example [MHz]
00 | 1 : 1 : 1 | 66.7 : 66.7 : 66.7
01 | RFU | -
10 | 1 : 2 : 1 | 40 : 80 : 40
11 | 1 : 3 : 1 | 20 : 60 : 20

VR4310

μPD30210-133
DivMode(2:0) | MasterClock : PClock : TClock frequency ratio | Example [MHz]
000 | 1 : 5 : 1 | 26.7 : 133 : 26.7
001 | 1 : 6 : 1 | 22.2 : 133 : 22.2
010 | RFU | -
011 | 1 : 3 : 1 | 33.3 : 100 : 33.3
100 | 1 : 4 : 1 | 33.3 : 133 : 33.3
101 | RFU | -
110 | 1 : 2 : 1 | 50 : 100 : 50
111 | 1 : 3 : 1 | 33.3 : 100 : 33.3

μPD30210-167
DivMode(2:0) | MasterClock : PClock : TClock frequency ratio | Example [MHz]
000 | 1 : 5 : 1 | 33.3 : 167 : 33.3
001 | 1 : 6 : 1 | 27.8 : 167 : 27.8
010 | 2 : 5 : 2 | 66.7 : 167 : 66.7
011 | 1 : 3 : 1 | 33.3 : 100 : 33.3
100 | 1 : 4 : 1 | 33.3 : 133 : 33.3
101 | RFU | -
110 | 1 : 2 : 1 | 50 : 100 : 50
111 | 1 : 3 : 1 | 33.3 : 100 : 33.3

2.2.3 Interrupt Interface Signals

These signals are used by the external device to issue interrupt requests to the VR4300. Table 2-3 shows the functions of these signals.

Table 2-3 Interrupt Interface Signals

Signal name | Definition | I/O | Function
Int(4:0) | Interrupt request acknowledge | Input | General purpose interrupt request pins. These pins are ORed with bits 4 through 0 of the internal interrupt register.
NMI | Non-maskable interrupt | Input | Accepts the non-maskable interrupt signal. It is ORed with bit 6 of the internal interrupt register.

2.2.4 Joint Test Action Group (JTAG) Interface Signals

These signals are for interfacing with the JTAG boundary scan. Table 2-4 shows the functions of these signals.

Table 2-4 JTAG Interface Signals

Signal name | Definition | I/O | Function
JTDI | JTAG data input | Input | Inputs data to be scanned serially.
JTCK | JTAG clock input | Input | Inputs a serial clock. JTDI and JTMS are read simultaneously at the rising edge of this signal. Fix this signal to the low level when the JTAG interface is not used.
JTDO | JTAG data output | Output | Outputs serially scanned data.
JTMS | JTAG command | Input | Input a high level to this pin if the serial data to be input next is a JTAG command.

2.2.5 Initialization Interface Signals

These signals are used when the external device initializes the operation parameters of the processor. Table 2-5 shows the functions of these signals.

Table 2-5 Initialization Interface Signals

Signal name | Definition | I/O | Function
ColdReset | Cold reset | Input | Asserted at cold reset. SClock and TClock start cycling at the rising edge of this signal. This signal does not need to be asserted or deasserted in synchronization with the MasterClock signal.
Reset | Reset | Input | At cold reset, assert or deassert this pin in synchronization with MasterClock, or keep it inactive. At soft reset, assert or deassert this pin in synchronization with MasterClock.
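The DivMode encodings listed in Table 2-2 behave like a lookup table from pin setting to clock multiplier. The sketch below shows how the PClock frequency follows from the MasterClock frequency for the two VR4300 parts only. The names `DIVMODE_RATIO` and `pclock_mhz` are hypothetical, the part-number strings are just dictionary keys, and reserved (RFU) settings are simply rejected; none of this is an interface defined by the manual.

```python
# PClock / MasterClock multipliers per DivMode(1:0) setting, copied from the
# uPD30200-100 and uPD30200-133 entries of Table 2-2. RFU settings are omitted.
DIVMODE_RATIO = {
    "uPD30200-100": {0b01: 1.5, 0b10: 2, 0b11: 3},
    "uPD30200-133": {0b00: 4, 0b10: 2, 0b11: 3},
}


def pclock_mhz(part, divmode, masterclock_mhz):
    """Return the PClock frequency in MHz for a part and DivMode setting."""
    ratios = DIVMODE_RATIO[part]
    if divmode not in ratios:
        raise ValueError("DivMode setting is reserved (RFU) for this part")
    return masterclock_mhz * ratios[divmode]
```

With a 50 MHz MasterClock and DivMode(1:0) = 10, the μPD30200-100 runs its pipeline at 100 MHz, matching the example column of the table. The result must still stay within the maximum internal operating frequency of the part.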
Chapter 3  CPU Instruction Set Summary

This chapter is an overview of the central processing unit (CPU) instruction set; refer to Chapter 16 CPU Instruction Set Details for detailed descriptions of individual CPU instructions. Because the FPU instructions depend on the structure of the coprocessor, refer to Chapter 7 Floating-Point Operations and Chapter 17 FPU Instruction Set Details.

3.1 CPU Instruction Formats

Each CPU instruction consists of a single 32-bit word, aligned on a word boundary. There are three instruction formats, immediate (I-type), jump (J-type), and register (R-type), as shown in Figure 3-1. Limiting the instruction formats to three simplifies instruction decoding. Complicated and less frequently used operations and addressing modes are implemented by the compiler, which combines two or more instructions.

Figure 3-1 CPU Instruction Formats

  I-type (Immediate):  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  J-type (Jump):       op [31:26] | target [25:0]
  R-type (Register):   op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | sa [10:6] | funct [5:0]

  op          6-bit operation code
  rs          5-bit source register number
  rt          5-bit target (source/destination) register number or branch condition
  immediate   16-bit immediate value, branch displacement, or address displacement
  target      26-bit unconditional branch target address
  rd          5-bit destination register number
  sa          5-bit shift amount
  funct       6-bit function field

Support of the MIPS ISA

Even though the VR4300 processor does not support a multiprocessor operating environment, the synchronization support instructions defined in the MIPS II and MIPS III ISA (the Load Linked and Store Conditional instructions) are processed correctly, in order to maintain compatibility with the VR4400 and VR4200. The load link bit (LLbit) is set by the LL instruction, cleared by an ERET, and tested by the SC instruction.
Apart from these, the only operation that can be performed on the LLbit is a reset due to cache invalidation.

Caution  All load/store instructions in this processor are executed in program order, since the SYNC instruction is handled as a NOP.

3.2 Instruction Classes

The CPU instructions can be classified into six classes.

3.2.1 Load/Store Instructions

Load and store instructions are immediate (I-type) instructions that move data between memory and the general-purpose registers. The only addressing mode available for load/store instructions adds a 16-bit signed immediate offset to the base register.

Scheduling a load delay slot

A load instruction whose result cannot be used by the immediately following instruction is called a delayed load instruction. The instruction slot immediately after a delayed load instruction is called a load delay slot.

With the VR4000 series, an instruction that uses the load destination register can be written immediately after a load instruction. In this case, however, an interlock is generated for the required number of cycles. Therefore, although any instruction can be written there, scheduling the load delay slot is recommended to improve the performance of the VR4300 and to maintain compatibility with the VR3000 series (for details, refer to Chapter 4 Pipeline).

Store delay slot

In the VR4300 processor, a store instruction that writes to the data cache keeps the data cache busy during both its DC and WB stages. If the immediately following instruction needs to access the data cache in its DC stage (e.g. a load instruction), the hardware interlocks. Consequently, scheduling store delay slots can be desirable for performance. The number of delay slot cycles for load and store instructions is given in Table 3-1.

Defining access types

The access type is the size of the data loaded or stored by the processor. The op code of the load/store instruction determines the access type.
Figure 3-2 shows the access types and the data to be loaded/stored. Regardless of the access type and byte ordering (endianness), the address used by a load/store instruction is the lowest byte address of the accessed data. This is the address of the most significant byte in big endian and of the least significant byte in little endian.

The byte ordering within the doubleword of the data to be accessed is determined by the access type and the low-order 3 bits of the address, as shown in Figure 3-2. Combinations of an access type and low-order address bits other than those shown in Figure 3-2 are prohibited; using such a combination causes an address error exception.

Table 3-2 lists the load/store instructions defined by the ISA, and Table 3-3 lists the instructions of the extended ISA.

Table 3-1 Number of Cycles for Load and Store Instruction Delay Slot

  Instruction   PCycles required
  Load          1
  Store         1

Figure 3-2 Byte Access within a Doubleword

  Access type        Low-order address   Bytes accessed
  mnemonic (value)   bits 2 1 0          Big endian (63 ... 0)   Little endian (63 ... 0)
  Doubleword (7)     0 0 0               0 1 2 3 4 5 6 7         7 6 5 4 3 2 1 0
  Septibyte (6)      0 0 0               0 1 2 3 4 5 6           6 5 4 3 2 1 0
                     0 0 1               1 2 3 4 5 6 7           7 6 5 4 3 2 1
  Sextibyte (5)      0 0 0               0 1 2 3 4 5             5 4 3 2 1 0
                     0 1 0               2 3 4 5 6 7             7 6 5 4 3 2
  Quintibyte (4)     0 0 0               0 1 2 3 4               4 3 2 1 0
                     0 1 1               3 4 5 6 7               7 6 5 4 3
  Word (3)           0 0 0               0 1 2 3                 3 2 1 0
                     1 0 0               4 5 6 7                 7 6 5 4
  Triplebyte (2)     0 0 0               0 1 2                   2 1 0
                     0 0 1               1 2 3                   3 2 1
                     1 0 0               4 5 6                   6 5 4
                     1 0 1               5 6 7                   7 6 5
  Halfword (1)       0 0 0               0 1                     1 0
                     0 1 0               2 3                     3 2
                     1 0 0               4 5                     5 4
                     1 1 0               6 7                     7 6
  Byte (0)           0 0 0               0                       0
                     0 0 1               1                       1
                     0 1 0               2                       2
                     0 1 1               3                       3
                     1 0 0               4                       4
                     1 0 1               5                       5
                     1 1 0               6                       6
                     1 1 1               7                       7

Table 3-2 Load/Store Instructions (1/2)

Instruction: Format and description

Load Byte: LB rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Sign-extends the contents of the byte specified by the address and loads the result to register rt.

Load Byte Unsigned: LBU rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base.
    Zero-extends the contents of the byte specified by the address and loads the result to register rt.

Load Halfword: LH rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Sign-extends the contents of the halfword specified by the address and loads the result to register rt.

Load Halfword Unsigned: LHU rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Zero-extends the contents of the halfword specified by the address and loads the result to register rt.

Load Word: LW rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Sign-extends the contents of the word specified by the address (in the 64-bit mode) and loads the result to register rt.

Load Word Left: LWL rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the word specified by the address to the left, so that the byte specified by the address is at the leftmost position of the word. Sign-extends (in the 64-bit mode), merges the result of the shift with the contents of register rt, and loads the result to register rt.

Load Word Right: LWR rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the word specified by the address to the right, so that the byte specified by the address is at the rightmost position of the word. Sign-extends (in the 64-bit mode), merges the result of the shift with the contents of register rt, and loads the result to register rt.

Format: op | base | rt | offset

Store Byte: SB rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Stores the contents of the low-order byte of register rt to the memory specified by the address.
Store Halfword: SH rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Stores the contents of the low-order halfword of register rt to the memory specified by the address.

Store Word: SW rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Stores the contents of the low-order word of register rt to the memory specified by the address.

Store Word Left: SWL rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the contents of register rt to the right so that the leftmost byte of the word is at the position of the byte specified by the address. Stores the result of the shift to the lower portion of the word in memory.

Store Word Right: SWR rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the contents of register rt to the left so that the rightmost byte of the word is at the position of the byte specified by the address. Stores the result of the shift to the higher portion of the word in memory.

Format: op | base | rt | offset

Table 3-3 Load/Store Instructions (Extended ISA) (1/2)

Instruction: Format and description

Load Doubleword: LD rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Loads the contents of the doubleword specified by the address to register rt.

Load Doubleword Left: LDL rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the doubleword specified by the address to the left so that the byte specified by the address is at the leftmost position of the doubleword. Merges the result of the shift with the contents of register rt, and loads the result to register rt.
Load Doubleword Right: LDR rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the doubleword specified by the address to the right so that the byte specified by the address is at the rightmost position of the doubleword. Merges the result of the shift with the contents of register rt, and loads the result to register rt.

Load Linked: LL rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Loads the contents of the word specified by the address to register rt and sets the LL bit to 1.

Load Linked Doubleword: LLD rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Loads the contents of the doubleword specified by the address to register rt and sets the LL bit to 1.

Load Word Unsigned: LWU rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Zero-extends the contents of the word specified by the address, and loads the result to register rt.

Format: op | base | rt | offset

Store Conditional: SC rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. If the LL bit is 1, stores the contents of the low-order word of register rt to the memory specified by the address, and sets register rt to 1. If the LL bit is 0, does not store the contents of the word, and clears register rt to 0.

Store Conditional Doubleword: SCD rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. If the LL bit is 1, stores the contents of register rt to the memory specified by the address, and sets register rt to 1. If the LL bit is 0, does not store the contents of the register, and clears register rt to 0.

Store Doubleword: SD rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base.
    Stores the contents of register rt to the memory specified by the address.

Store Doubleword Left: SDL rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the contents of register rt to the right so that the leftmost byte of the doubleword is at the position of the byte specified by the address. Stores the result of the shift to the lower portion of the doubleword in memory.

Store Doubleword Right: SDR rt, offset (base)
    Generates an address by adding a sign-extended offset to the contents of register base. Shifts the contents of register rt to the left so that the rightmost byte of the doubleword is at the position of the byte specified by the address. Stores the result of the shift to the higher portion of the doubleword in memory.

Format: op | base | rt | offset

3.2.2 Computational Instructions

Computational instructions execute arithmetic, multiply/divide, logical, and shift operations on the values of registers. These instructions are classified into two types: R-type and I-type. R-type instructions use registers for both source operands, and I-type instructions use an immediate value as one of the source operands.

The computational instructions are divided into the following four groups by operation.

(1) ALU immediate instructions (refer to Tables 3-4 and 3-5.)
(2) 3-operand type instructions (refer to Tables 3-6 and 3-7.)
(3) Shift instructions (refer to Tables 3-8 and 3-9.)
(4) Multiply/divide instructions (refer to Tables 3-10 and 3-11.)

If data must be compatible between the 64-bit and 32-bit modes, 32-bit operands must be correctly sign-extended. Otherwise, the 32-bit value of the result of the operation will be meaningless.
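The sign-extension requirement above can be illustrated with a minimal Python sketch. It is only a model for illustration, not part of the manual: the helper names `sign_extend32`, `is_sign_extended32`, and `addu` are hypothetical, and `addu` assumes the behavior summarized in Table 3-6 (32-bit add, no overflow exception, result sign-extended in the 64-bit mode).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend32(value):
    """Return the 64-bit register image of a 32-bit value, sign-extended from bit 31."""
    value &= MASK32
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended32(reg64):
    """True if a 64-bit register image holds a correctly sign-extended 32-bit operand."""
    return sign_extend32(reg64) == (reg64 & MASK64)


def addu(rs, rt):
    """Hypothetical model of ADDU in 64-bit mode: 32-bit sum, sign-extended to 64 bits."""
    return sign_extend32((rs + rt) & MASK32)
```

Under this model, a register holding 0x0000000080000000 is not a valid 32-bit operand, because bits 63 through 32 do not repeat bit 31, whereas 0xFFFFFFFF80000000 is.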
Table 3-4 ALU Immediate Instructions

Instruction: Format and description

Add Immediate: ADDI rt, rs, immediate
    Sign-extends the 16-bit immediate and adds it to register rs. Stores the 32-bit result to register rt (sign-extends the result in the 64-bit mode). Generates an exception if a 2's complement integer overflow occurs.

Add Immediate Unsigned: ADDIU rt, rs, immediate
    Sign-extends the 16-bit immediate and adds it to register rs. Stores the 32-bit result to register rt (sign-extends the result in the 64-bit mode). Does not generate an exception even if an integer overflow occurs.

Set On Less Than Immediate: SLTI rt, rs, immediate
    Sign-extends the 16-bit immediate and compares it with register rs as a signed integer. If rs is less than the immediate, stores 1 to register rt; otherwise, stores 0 to register rt.

Set On Less Than Immediate Unsigned: SLTIU rt, rs, immediate
    Sign-extends the 16-bit immediate and compares it with register rs as an unsigned integer. If rs is less than the immediate, stores 1 to register rt; otherwise, stores 0 to register rt.

And Immediate: ANDI rt, rs, immediate
    Zero-extends the 16-bit immediate, ANDs it with register rs, and stores the result to register rt.

Or Immediate: ORI rt, rs, immediate
    Zero-extends the 16-bit immediate, ORs it with register rs, and stores the result to register rt.

Exclusive Or Immediate: XORI rt, rs, immediate
    Zero-extends the 16-bit immediate, exclusive-ORs it with register rs, and stores the result to register rt.

Load Upper Immediate: LUI rt, immediate
    Shifts the 16-bit immediate 16 bits to the left, and clears the low-order 16 bits of the word to 0. Stores the result to register rt (sign-extending the result in the 64-bit mode).

Format: op | rs | rt | immediate
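The difference between sign-extended and zero-extended immediates in Table 3-4 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the processor's implementation: registers are modeled as 64-bit unsigned integers, and the function names (`sext`, `addiu`, `andi`, `ori`, `lui`, `sltiu`) are hypothetical helpers that mirror the table descriptions.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit register image."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64


def addiu(rs, imm16):
    """ADDIU: sign-extended immediate, 32-bit sum, result sign-extended to 64 bits."""
    return sext((rs + sext(imm16, 16)) & MASK32, 32)


def andi(rs, imm16):
    """ANDI: the immediate is zero-extended before the AND."""
    return rs & (imm16 & 0xFFFF)


def ori(rs, imm16):
    """ORI: the immediate is zero-extended before the OR."""
    return rs | (imm16 & 0xFFFF)


def lui(imm16):
    """LUI: immediate shifted left 16 bits, low 16 bits cleared, result sign-extended."""
    return sext((imm16 & 0xFFFF) << 16, 32)


def sltiu(rs, imm16):
    """SLTIU: immediate is sign-extended, then compared as an unsigned integer."""
    return 1 if (rs & MASK64) < sext(imm16, 16) else 0
```

For example, `addiu(5, 0xFFFF)` treats the immediate as -1 and yields 4, while `andi` with the same immediate keeps only the low-order 16 bits of rs. A 32-bit constant is typically built with a LUI followed by an ORI, as in `ori(lui(0x1234), 0x5678)`.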
Table 3-5 ALU Immediate Instructions (Extended ISA)

Instruction: Format and description

Doubleword Add Immediate: DADDI rt, rs, immediate
    Sign-extends the 16-bit immediate to 64 bits, and adds it to register rs. Stores the 64-bit result to register rt. Generates an exception if an integer overflow occurs.

Doubleword Add Immediate Unsigned: DADDIU rt, rs, immediate
    Sign-extends the 16-bit immediate to 64 bits, and adds it to register rs. Stores the 64-bit result to register rt. Does not generate an exception even if an integer overflow occurs.

Format: op | rs | rt | immediate

Table 3-6 Three-Operand Type Instructions

Instruction: Format and description

Add: ADD rd, rs, rt
    Adds the contents of registers rs and rt, and stores (sign-extends in the 64-bit mode) the 32-bit result to register rd. Generates an exception if an integer overflow occurs.

Add Unsigned: ADDU rd, rs, rt
    Adds the contents of registers rs and rt, and stores (sign-extends in the 64-bit mode) the 32-bit result to register rd. Does not generate an exception even if an integer overflow occurs.

Subtract: SUB rd, rs, rt
    Subtracts the contents of register rt from register rs, and stores (sign-extends in the 64-bit mode) the 32-bit result to register rd. Generates an exception if an integer overflow occurs.

Subtract Unsigned: SUBU rd, rs, rt
    Subtracts the contents of register rt from register rs, and stores (sign-extends in the 64-bit mode) the 32-bit result to register rd. Does not generate an exception even if an integer overflow occurs.

Set On Less Than: SLT rd, rs, rt
    Compares the contents of registers rs and rt as signed integers. If the contents of register rs are less than those of rt, stores 1 to register rd; otherwise, stores 0 to rd.

Set On Less Than Unsigned: SLTU rd, rs, rt
    Compares the contents of registers rs and rt as unsigned integers. If the contents of register rs are less than those of rt, stores 1 to register rd; otherwise, stores 0 to rd.
And: AND rd, rs, rt
    ANDs the contents of registers rs and rt in bit units, and stores the result to register rd.

Or: OR rd, rs, rt
    ORs the contents of registers rs and rt in bit units, and stores the result to register rd.

Exclusive Or: XOR rd, rs, rt
    Exclusive-ORs the contents of registers rs and rt in bit units, and stores the result to register rd.

Nor: NOR rd, rs, rt
    NORs the contents of registers rs and rt in bit units, and stores the result to register rd.

Format: op | rs | rt | rd | sa | funct

Table 3-7 Three-Operand Type Instructions (Extended ISA)

Instruction: Format and description

Doubleword Add: DADD rd, rs, rt
    Adds the contents of registers rs and rt, and stores the 64-bit result to register rd. Generates an exception if an integer overflow occurs.

Doubleword Add Unsigned: DADDU rd, rs, rt
    Adds the contents of registers rs and rt, and stores the 64-bit result to register rd. Does not generate an exception even if an integer overflow occurs.

Doubleword Subtract: DSUB rd, rs, rt
    Subtracts the contents of register rt from register rs, and stores the 64-bit result to register rd. Generates an exception if an integer overflow occurs.

Doubleword Subtract Unsigned: DSUBU rd, rs, rt
    Subtracts the contents of register rt from register rs, and stores the 64-bit result to register rd. Does not generate an exception even if an integer overflow occurs.

Format: op | rs | rt | rd | sa | funct

Table 3-8 Shift Instructions

Instruction: Format and description

Shift Left Logical: SLL rd, rt, sa
    Shifts the contents of register rt sa bits to the left, and inserts 0 to the low-order bits. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.

Shift Right Logical: SRL rd, rt, sa
    Shifts the contents of register rt sa bits to the right, and inserts 0 to the high-order bits. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Right Arithmetic: SRA rd, rt, sa
    Shifts the contents of register rt sa bits to the right, and sign-extends the high-order bits. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.

Shift Left Logical Variable: SLLV rd, rt, rs
    Shifts the contents of register rt to the left and inserts 0 to the low-order bits. The number of bits by which the register contents are shifted is specified by the low-order 5 bits of register rs. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.

Shift Right Logical Variable: SRLV rd, rt, rs
    Shifts the contents of register rt to the right, and inserts 0 to the high-order bits. The number of bits by which the register contents are shifted is specified by the low-order 5 bits of register rs. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.

Shift Right Arithmetic Variable: SRAV rd, rt, rs
    Shifts the contents of register rt to the right and sign-extends the high-order bits. The number of bits by which the register contents are shifted is specified by the low-order 5 bits of register rs. Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.

Format: op | rs | rt | rd | sa | funct

Table 3-9 Shift Instructions (Extended ISA) (1/2)

Instruction: Format and description

Doubleword Shift Left Logical: DSLL rd, rt, sa
    Shifts the contents of register rt sa bits to the left, and inserts 0 to the low-order bits. Stores the 64-bit result to register rd.

Doubleword Shift Right Logical: DSRL rd, rt, sa
    Shifts the contents of register rt sa bits to the right, and inserts 0 to the high-order bits. Stores the 64-bit result to register rd.

Doubleword Shift Right Arithmetic: DSRA rd, rt, sa
    Shifts the contents of register rt sa bits to the right, and sign-extends the high-order bits. Stores the 64-bit result to register rd.
Doubleword Shift Left Logical Variable: DSLLV rd, rt, rs
    Shifts the contents of register rt to the left, and inserts 0 to the low-order bits. The number of bits by which the register contents are shifted is specified by the low-order 6 bits of register rs. Stores the 64-bit result to register rd.

Doubleword Shift Right Logical Variable: DSRLV rd, rt, rs
    Shifts the contents of register rt to the right, and inserts 0 to the high-order bits. The number of bits by which the register contents are shifted is specified by the low-order 6 bits of register rs. Stores the 64-bit result to register rd.

Doubleword Shift Right Arithmetic Variable: DSRAV rd, rt, rs
    Shifts the contents of register rt to the right, and sign-extends the high-order bits. The number of bits by which the register contents are shifted is specified by the low-order 6 bits of register rs. Stores the 64-bit result to register rd.

Doubleword Shift Left Logical + 32: DSLL32 rd, rt, sa
    Shifts the contents of register rt 32+sa bits to the left, and inserts 0 to the low-order bits. Stores the 64-bit result to register rd.

Doubleword Shift Right Logical + 32: DSRL32 rd, rt, sa
    Shifts the contents of register rt 32+sa bits to the right, and inserts 0 to the high-order bits. Stores the 64-bit result to register rd.

Doubleword Shift Right Arithmetic + 32: DSRA32 rd, rt, sa
    Shifts the contents of register rt 32+sa bits to the right, and sign-extends the high-order bits. Stores the 64-bit result to register rd.

Format: op | rs | rt | rd | sa | funct

Table 3-10 Multiply/Divide Instructions

Instruction: Format and description

Multiply: MULT rs, rt
    Multiplies the contents of register rs by the contents of register rt as 32-bit signed integers. Stores the 64-bit result to special registers HI and LO (sign-extended in the 64-bit mode).
Multiply Unsigned: MULTU rs, rt
    Multiplies the contents of register rs by the contents of register rt as 32-bit unsigned integers. Stores the 64-bit result to special registers HI and LO (sign-extended in the 64-bit mode).

Divide: DIV rs, rt
    Divides the contents of register rs by the contents of register rt. The operands are treated as 32-bit signed integers. Stores the 32-bit quotient to special register LO and the 32-bit remainder to special register HI (sign-extended in the 64-bit mode).

Divide Unsigned: DIVU rs, rt
    Divides the contents of register rs by the contents of register rt. The operands are treated as 32-bit unsigned integers. Stores the 32-bit quotient to special register LO and the 32-bit remainder to special register HI (sign-extended in the 64-bit mode).

Move From HI: MFHI rd
    Transfers the contents of special register HI to register rd.

Move From LO: MFLO rd
    Transfers the contents of special register LO to register rd.

Move To HI: MTHI rs
    Transfers the contents of register rs to special register HI.

Move To LO: MTLO rs
    Transfers the contents of register rs to special register LO.

Format: op | rs | rt | rd | sa | funct

When an integer multiply or divide instruction is executed, the VR4300 stalls the entire pipeline. The number of processor cycles (PCycles) stalled at this time is shown in Table 3-12.

Table 3-11 Multiply/Divide Instructions (Extended ISA)

Instruction: Format and description

Doubleword Multiply: DMULT rs, rt
    Multiplies the contents of register rs by the contents of register rt as signed integers. Stores the 128-bit result to special registers HI and LO.

Doubleword Multiply Unsigned: DMULTU rs, rt
    Multiplies the contents of register rs by the contents of register rt as unsigned integers. Stores the 128-bit result to special registers HI and LO.
Doubleword Divide: DDIV rs, rt
    Divides the contents of register rs by the contents of register rt. The operands are treated as signed integers. Stores the 64-bit quotient to special register LO, and the 64-bit remainder to special register HI.

Doubleword Divide Unsigned: DDIVU rs, rt
    Divides the contents of register rs by the contents of register rt. The operands are treated as unsigned integers. Stores the 64-bit quotient to special register LO, and the 64-bit remainder to special register HI.

Format: op | rs | rt | rd | sa | funct

Table 3-12 Number of Cycles Stalled by Multiply/Divide Instruction

  Instruction   Number of required cycles
  MULT          5
  MULTU         5
  DIV           37
  DIVU          37
  DMULT         8
  DMULTU        8
  DDIV          69
  DDIVU         69

3.2.3 Jump/Branch Instructions

The jump and branch instructions change the flow of the program. All jump and branch instructions generate one delay slot. The instruction immediately following a jump or branch instruction (i.e., the instruction in the delay slot) is executed while the first instruction at the destination is fetched from memory. Instructions involving link, such as JAL and BLTZAL, store the return address to register r31. Table 3-13 shows the number of delay slot cycles of the jump and branch instructions.

Outline of jump instruction

A subroutine call written in a high-level language usually uses the J or JAL instruction. The J and JAL instructions are J-type instructions. An instruction of this type shifts the 26-bit target address 2 bits to the left and combines it with the high-order 4 bits of the current program counter to generate a 32- or 64-bit absolute address.

To return, dispatch, or jump between pages, the JR or JALR instruction is usually used. Both of these instructions are R-type and reference the 32- or 64-bit byte address held in a general-purpose register. For details, refer to Chapter 16 CPU Instruction Set Details.

Outline of branch instruction

The branch instruction has a signed 16-bit offset relative to the program counter.
Instructions involving link, such as JAL and BLTZAL, store the return address to register r31.

Table 3-14 lists the jump instructions, and Table 3-15 shows the branch instructions. Table 3-16 lists the branch instructions of the extended ISA.

Table 3-13 Number of Delay Slot Cycles of Jump/Branch Instruction

  Instruction   Number of required cycles
  Branch        1
  Jump          1

The following common limits apply to Tables 3-15 and 3-16.

Branch address
    The branch address of every branch instruction is calculated by adding the 16-bit offset (shifted 2 bits to the left and sign-extended to 64 bits) to the address of the instruction in the delay slot. All branch instructions generate one delay slot.

Operation during no branch (Table 3-16)
    If the branch condition of a branch likely instruction is not satisfied, the instruction in the delay slot is invalidated. For all other branch instructions, the instruction in the delay slot is executed unconditionally.

Remark  The instruction at the branch destination is fetched in the EX stage of the branch instruction. Comparison for the branch and calculation of the target address are executed in phase 2 of the RF stage and phase 1 of the EX stage of the branch instruction. One cycle of the branch delay slot defined by the architecture is necessary. One cycle of the delay slot is also necessary for the jump instruction. If the branch condition of a branch likely instruction is not satisfied, the instruction in the branch delay slot is invalidated.

Table 3-14 Jump Instructions

Instruction: Format and description

Jump: J target
    Shifts the 26-bit target address 2 bits to the left, and jumps to the address formed by combining it with the high-order 4 bits of the PC, delayed by one instruction.

Jump And Link: JAL target
    Shifts the 26-bit target address 2 bits to the left, and jumps to the address formed by combining it with the high-order 4 bits of the PC, delayed by one instruction. Stores the address of the instruction following the delay slot to r31 (link register).
Format: op | target

Instruction: Format and description

Jump Register: JR rs
    Jumps to the address in register rs, delayed by one instruction.

Jump And Link Register: JALR rs, rd
    Jumps to the address in register rs, delayed by one instruction. Stores the address of the instruction following the delay slot to register rd.

Format: op | rs | rt | rd | sa | funct

The following symbols in the instruction formats in Table 3-15 through Table 3-21 are special.

  REGIMM : op code
  sub    : sub operation code
  CO     : sub operation identifier
  BC     : BC sub operation code
  br     : branch condition identifier
  cofun  : coprocessor function area
  op     : operation code

Table 3-15 Branch Instructions

Instruction: Format and description

Branch On Equal: BEQ rs, rt, offset
    Branches to the branch address if register rs is equal to rt.

Branch On Not Equal: BNE rs, rt, offset
    Branches to the branch address if register rs is not equal to rt.

Branch On Less Than Or Equal To Zero: BLEZ rs, offset
    Branches to the branch address if register rs is less than or equal to 0.

Branch On Greater Than Zero: BGTZ rs, offset
    Branches to the branch address if register rs is greater than 0.

Format: op | rs | rt | offset

Instruction: Format and description

Branch On Less Than Zero: BLTZ rs, offset
    Branches to the branch address if register rs is less than 0.

Branch On Greater Than Or Equal To Zero: BGEZ rs, offset
    Branches to the branch address if register rs is greater than or equal to 0.

Branch On Less Than Zero And Link: BLTZAL rs, offset
    Stores the address of the instruction following the delay slot to register r31 (link register), and branches to the branch address if register rs is less than 0.

Branch On Greater Than Or Equal To Zero And Link: BGEZAL rs, offset
    Stores the address of the instruction following the delay slot to register r31 (link register), and branches to the branch address if register rs is greater than or equal to 0.

Format: REGIMM | rs | sub | offset
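The address rules stated for the jump and branch instructions can be sketched in Python as follows. This is a hedged illustration, not the manual's definition: the function names are hypothetical, `jump_target` models only 32-bit addresses, and it takes the high-order 4 bits from the address of the delay-slot instruction, which is an assumption based on the general MIPS architecture definition (the tables here say only "the high-order 4 bits of the PC").

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def branch_target(branch_pc, offset16):
    """Branch address: delay-slot address + (16-bit offset sign-extended, shifted left 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000          # interpret the offset as a signed 16-bit value
    delay_slot_pc = branch_pc + 4  # the instruction in the delay slot
    return (delay_slot_pc + (offset << 2)) & MASK64


def jump_target(jump_pc, target26):
    """J/JAL address (32-bit model): target << 2 combined with the high-order 4 bits."""
    delay_slot_pc = (jump_pc + 4) & 0xFFFFFFFF
    return (delay_slot_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def link_address(pc):
    """Return address stored by link instructions: the instruction after the delay slot."""
    return (pc + 8) & MASK64
```

With these helpers, a branch at 0x1000 with offset 0xFFFF (that is, -1) branches back to itself, because the offset is applied to the delay-slot address 0x1004 rather than to the branch address.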
Table 3-16 Branch Instructions (Extended ISA)

Instruction: Format and description

Branch On Equal Likely: BEQL rs, rt, offset
    Branches to the branch address if registers rs and rt are equal. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Not Equal Likely: BNEL rs, rt, offset
    Branches to the branch address if registers rs and rt are not equal. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Less Than Or Equal To Zero Likely: BLEZL rs, offset
    Branches to the branch address if register rs is less than or equal to 0. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Greater Than Zero Likely: BGTZL rs, offset
    Branches to the branch address if register rs is greater than 0. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Instruction: Format and description

Branch On Less Than Zero Likely: BLTZL rs, offset
    Branches to the branch address if register rs is less than 0. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Greater Than Or Equal To Zero Likely: BGEZL rs, offset
    Branches to the branch address if register rs is greater than or equal to 0. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Less Than Zero And Link Likely: BLTZALL rs, offset
    Stores the address of the instruction following the delay slot to register r31 (link register). Branches to the branch address if register rs is less than 0. If the branch condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Greater Than Or Equal To Zero And Link Likely: BGEZALL rs, offset
    Stores the address of the instruction following the delay slot to register r31 (link register). Branches to the branch address if register rs is greater than or equal to 0.
if the branch condition is not satisfied, the instruction in the branch delay slot is discarded.
instruction formats (field order): op, rs, rt, offset; regimm, rs, sub, offset

3.2.4 special instructions
the special instructions generate an exception by software. the instruction type is r-type (syscall, break). the trap instructions are invalid with the vr3000 series. all the other instructions are valid with all the vr series.

table 3-17 special instructions
instruction | format | description
synchronize | sync | completes the load/store instruction currently in the pipeline before the new load/store instruction is executed.
system call | syscall | generates a system call exception and transfers control to the exception processing program.
breakpoint | break | generates a breakpoint exception and transfers control to the exception processing program.

table 3-18 special instructions (extended isa) (1/2)
instruction | format | description
trap if greater than or equal | tge rs, rt | compares registers rs and rt as signed integers. if register rs is greater than or equal to rt, generates an exception.
trap if greater than or equal unsigned | tgeu rs, rt | compares registers rs and rt as unsigned integers. if register rs is greater than or equal to rt, generates an exception.
trap if less than | tlt rs, rt | compares registers rs and rt as signed integers. if register rs is less than rt, generates an exception.
trap if less than unsigned | tltu rs, rt | compares registers rs and rt as unsigned integers. if register rs is less than rt, generates an exception.
trap if equal | teq rs, rt | generates an exception if registers rs and rt are equal.
trap if not equal | tne rs, rt | generates an exception if registers rs and rt are not equal.
instruction format (field order): special, rs, rt, sa, rd, funct
table 3-18 special instructions (extended isa) (2/2)
instruction | format | description
trap if greater than or equal immediate | tgei rs, immediate | compares the contents of register rs with the 16-bit sign-extended immediate as signed integers. if the rs contents are greater than or equal to the immediate, generates an exception.
trap if greater than or equal immediate unsigned | tgeiu rs, immediate | compares the contents of register rs with the 16-bit sign-extended immediate as unsigned integers. if the rs contents are greater than or equal to the immediate, generates an exception.
trap if less than immediate | tlti rs, immediate | compares the contents of register rs with the 16-bit sign-extended immediate as signed integers. if the rs contents are less than the immediate, generates an exception.
trap if less than immediate unsigned | tltiu rs, immediate | compares the contents of register rs with the 16-bit sign-extended immediate as unsigned integers. if the rs contents are less than the immediate, generates an exception.
trap if equal immediate | teqi rs, immediate | generates an exception if the contents of register rs are equal to the immediate.
trap if not equal immediate | tnei rs, immediate | generates an exception if the contents of register rs are not equal to the immediate.
instruction format (field order): regimm, rs, sub, immediate

3.2.5 coprocessor instructions
the coprocessor instructions are used to operate each coprocessor. the coprocessor load and store instructions are i-type. the format of the operation instructions differs from coprocessor to coprocessor. table 3-19 shows the coprocessor instructions valid for all the vr series. table 3-20 lists the coprocessor instructions valid only with the vr4000 series, which are defined as the extended isa.

table 3-19 coprocessor instructions (1/2)
instruction | format | description
load word to coprocessor z | lwcz rt, offset (base) | sign-extends and adds offset to register base to generate an address.
loads the contents of the word specified by the address to the general purpose register rt of coprocessor z.
store word from coprocessor z | swcz rt, offset (base) | sign-extends and adds offset to register base to generate an address. stores the contents of the general purpose register rt of coprocessor z to the memory position specified by the address.
move to coprocessor z | mtcz rt, rd | transfers the contents of cpu register rt to the general purpose register rd of coprocessor z.
move from coprocessor z | mfcz rt, rd | transfers the contents of the general purpose register rd of coprocessor z to cpu register rt.
move control to coprocessor z | ctcz rt, rd | transfers the contents of cpu register rt to the coprocessor control register rd of coprocessor z.
move control from coprocessor z | cfcz rt, rd | transfers the contents of the coprocessor control register rd of coprocessor z to cpu register rt.
coprocessor z operation | copz cofun | coprocessor z executes an operation defined for each coprocessor. the status of the cpu is not changed by the operation of the coprocessor.
instruction formats (field order): op, base, rt, offset; copz, sub, rt, rd, 0; copz, co, cofun

table 3-19 coprocessor instructions (2/2)
instruction | format | description
branch on coprocessor z true | bczt offset | shifts the 16-bit offset 2 bits to the left and sign-extends it to 32 bits. adds the result to the address of the instruction in the delay slot to calculate the branch address. if the condition signal of coprocessor z is true, branches to the branch address, delayed by one instruction.
branch on coprocessor z false | bczf offset | shifts the 16-bit offset 2 bits to the left and sign-extends it to 32 bits. adds the result to the address of the instruction in the delay slot to calculate the branch address. if the condition signal of coprocessor z is false, branches to the branch address, delayed by one instruction.
instruction format (field order): copz, bc, br, offset

table 3-20 coprocessor instructions (extended isa) (1/2)
instruction | format | description
doubleword move to coprocessor z | dmtcz rt, rd | transfers the contents of the general purpose register rt of the cpu to the general purpose register rd of coprocessor z.
doubleword move from coprocessor z | dmfcz rt, rd | transfers the contents of the general purpose register rd of coprocessor z to the general purpose register rt of the cpu.
load doubleword to coprocessor z | ldcz rt, offset (base) | sign-extends and adds offset to register base to generate an address. loads the contents of the doubleword specified by the address to the general purpose register (rt if fr = 1, and rt and rt+1 if fr = 0) of coprocessor z.
store doubleword from coprocessor z | sdcz rt, offset (base) | sign-extends and adds offset to register base to generate an address. stores the contents of the doubleword of the general purpose register (rt if fr = 1, and rt and rt+1 if fr = 0) of coprocessor z to the memory position specified by the address.
instruction formats (field order): copz, sub, rt, rd, 0; op, base, rt, offset

table 3-20 coprocessor instructions (extended isa) (2/2)
instruction | format | description
branch on coprocessor z true likely | bcztl offset | shifts the 16-bit offset 2 bits to the left and sign-extends it. adds the result to the address of the instruction in the delay slot to calculate the branch address. if the condition signal of coprocessor z is true, branches to the branch address, delayed by one instruction. if the branch condition is not satisfied, the instruction in the branch delay slot is discarded.
branch on coprocessor z false likely | bczfl offset | shifts the 16-bit offset 2 bits to the left and sign-extends it. adds the result to the address of the instruction in the delay slot to calculate the branch address.
if the condition signal of coprocessor z is false, branches to the branch address, delayed by one instruction. if the branch condition is not satisfied, the instruction in the branch delay slot is discarded.
instruction format (field order): copz, bc, br, offset

3.2.6 system control coprocessor (cp0) instructions
the system control coprocessor (cp0) instructions operate on the cp0 registers to control the memory of the processor and to perform exception processing.

table 3-21 system control coprocessor (cp0) instructions (1/2)
instruction | format | description
move to system control coprocessor | mtc0 rt, rd | loads the contents of the word of the general purpose register rt of the cpu to the general purpose register rd of cp0.
move from system control coprocessor | mfc0 rt, rd | loads the contents of the word of the general purpose register rd of cp0 to the general purpose register rt of the cpu.
doubleword move to system control coprocessor | dmtc0 rt, rd | loads the contents of the doubleword of the general purpose register rt of the cpu to the general purpose register rd of cp0.
doubleword move from system control coprocessor | dmfc0 rt, rd | loads the contents of the doubleword of the general purpose register rd of cp0 to the general purpose register rt of the cpu.
read indexed tlb entry | tlbr | loads the tlb entry indicated by the index register to the entry hi, entry lo0, entry lo1, and page mask registers.
write indexed tlb entry | tlbwi | loads the contents of the entry hi, entry lo0, entry lo1, and page mask registers to the tlb entry indicated by the index register.
write random tlb entry | tlbwr | loads the contents of the entry hi, entry lo0, entry lo1, and page mask registers to the tlb entry indicated by the random register.
probe tlb for matching entry | tlbp | loads the address of the tlb entry coinciding with the contents of the entry hi register to the index register.
return from exception | eret | returns from an exception, interrupt, or error trap.
instruction formats (field order): cop0, sub, rt, rd, 0; cop0, co, funct

table 3-21 system control coprocessor (cp0) instructions (2/2)
instruction | format | description
cache operation | cache op, offset (base) | sign-extends the 16-bit offset to 32 bits and adds it to register base to generate a virtual address. the virtual address is converted into a physical address by using the tlb, and the cache operation indicated by the 5-bit sub op code is executed on that address.
instruction format (field order): cache, base, op, offset

chapter 4 pipeline
this chapter describes the operation of the vr4300 processor pipeline.

4.1 general
the vr4300 uses a 5-stage pipeline. the pipeline is usually controlled by the pipeline clock, which is determined by the value of the divmode(1:0) * pins. this pipeline clock is called pclock, and one cycle of it is called a pcycle. each stage of the pipeline is executed in 1 pcycle. the pcycle has two phases, Φ1 and Φ2, as shown in figure 4-1. at least 5 pcycles are therefore required to execute an instruction. if the necessary data is not in the cache and must be fetched from the main memory, more cycles are necessary. when the pipeline flows smoothly, five instructions are executed simultaneously.
* in the vr4300 and vr4305. in the vr4310, divmode(2:0).

figure 4-1 pipeline stages
[figure: one masterclock cycle and the pclock phases Φ1 and Φ2 of each pcycle across the ic, rf, ex, dc, and wb stages]
the five pipeline stages are:
ic - instruction cache fetch
rf - register fetch
ex - execution
dc - data cache fetch
wb - write back

figure 4-2 outlines the pipeline. the horizontal rows in this figure indicate the execution processes of instructions, and the vertical columns indicate the five processes executed at the same time.
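The overlap described above can be sketched in a few lines of Python. This is only an illustration under ideal conditions: it assumes one instruction enters the ic stage per pcycle and that no interlock or exception occurs. The names `STAGES`, `stage_at`, `occupancy`, and `total_pcycles` are hypothetical and do not come from the manual.

```python
# Idealized model of the 5-stage pipeline: no stalls, one instruction issued per PCycle.
STAGES = ["IC", "RF", "EX", "DC", "WB"]

def stage_at(instr_index, pcycle):
    """Stage occupied by instruction `instr_index` during `pcycle`, or None if it is not in the pipeline."""
    offset = pcycle - instr_index  # instruction i enters IC in PCycle i
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

def occupancy(num_instrs, pcycle):
    """Map of instruction index -> stage for every instruction in flight during `pcycle`."""
    result = {}
    for i in range(num_instrs):
        stage = stage_at(i, pcycle)
        if stage is not None:
            result[i] = stage
    return result

def total_pcycles(num_instrs):
    """PCycles needed to retire `num_instrs` instructions when the pipeline never stalls."""
    return 0 if num_instrs <= 0 else len(STAGES) + num_instrs - 1

if __name__ == "__main__":
    for cycle in range(total_pcycles(6)):
        print(cycle, occupancy(6, cycle))
```

In this model a single instruction needs 5 pcycles, and from the fifth pcycle onward five instructions are in flight at once, one per stage.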
figure 4-2 instruction execution in the pipeline
[figure: five instructions, each offset by one pcycle, overlapping in the ic, rf, ex, dc, and wb stages; the pipeline is 5 deep at the current cpu cycle]

4.1.1 pipeline operations
figure 4-3 shows the operations that can occur during each pipeline stage; table 4-1 describes these pipeline activities.

figure 4-3 pipeline operations
[figure: timing of the operations icf, itlb, itc, rfr, idec, iva, bcmp, alu, dva, dcr, dtlb, dtc, la, rfw, and dcw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle), grouped by instruction fetch, computational, load/store, and branch]

table 4-1 description of pipeline showing stage in which operations commence
cycle | begins during this phase | mnemonic | description
ic | Φ1 | - | -
ic | Φ2 | icf | instruction cache fetch
ic | Φ2 | itlb | instruction micro-tlb read
rf | Φ1 | itc | instruction cache tag check
rf | Φ2 | rfr | register file read
rf | Φ2 | idec | instruction decode
rf | Φ2 | iva | instruction virtual address calculation
ex | Φ1 | bcmp | branch compare
ex | Φ1 | alu | arithmetic logic operation
ex | Φ1 | dva | data virtual address calculation
dc | Φ1 | dcr | data cache read
dc | Φ1 | dtlb | data joint-tlb read
dc | Φ2 | la | load data alignment
dc | Φ2 | dtc | data cache tag check
wb | Φ1 | dcw | data cache write
wb | Φ1 | rfw | register file write
wb | Φ2 | - | -

4.2 branch delay
the pipeline of the vr4300 generates a branch delay of one cycle in the following cases:
- when a target address is calculated with a jump instruction
- when the branch condition of a branch instruction is satisfied and a target address is calculated
the instruction address generated in the ex stage of a jump/branch instruction cannot be used until the ic stage of the instruction to be executed after the next instruction. figure 4-4 illustrates the branch delay and the location of the branch delay slot.
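The effect of the one-cycle branch delay on the fetch order can be sketched as follows. This is a hedged illustration, not the processor's implementation: it assumes 32-bit addressing, a non-likely branch (so the delay slot always executes), and the usual mips rule that the branch address is the delay slot address plus the sign-extended 16-bit offset shifted left by 2 bits. The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(branch_pc, offset16):
    """Branch address = delay slot address + (sign-extended offset << 2), wrapped to 32 bits."""
    delay_slot = branch_pc + 4
    return (delay_slot + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def fetch_sequence(branch_pc, offset16, taken):
    """Addresses fetched for a non-likely branch: the branch, its delay slot, then the next pc."""
    delay_slot = branch_pc + 4
    next_pc = branch_target(branch_pc, offset16) if taken else branch_pc + 8
    return [branch_pc, delay_slot, next_pc]

if __name__ == "__main__":
    print([hex(a) for a in fetch_sequence(0x1000, 0x0003, taken=True)])
```

The delay slot instruction at `branch_pc + 4` is fetched in both cases; only the third address depends on the branch outcome.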
figure 4-4 branch delay
[figure: a branch instruction, the single instruction in the branch delay slot that follows it, and the branch target instruction, showing a branch delay of one cycle]

4.3 load delay
a load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction. the instruction slot immediately following this delayed load instruction is referred to as the load delay slot. in the vr4300 processor, the instruction immediately following a load instruction can use the contents of the loaded register; in such cases, however, hardware interlocks insert additional delay cycles. consequently, scheduling load delay slots can be desirable, both for performance and for vr-series processor compatibility.

4.4 pipeline operation
the operation of the pipeline is illustrated by the following examples, which describe how typical instructions are executed. the instructions described are: add, jalr, beq, tlt, lw, and sw. each instruction is taken through the pipeline, and the operations that occur in each relevant stage are described. floating-point instructions are executed in the pipeline in the same manner as multicycle integer instructions.

add instruction
add rd,rs,rt
ic stage: in phase 2 of the ic stage, the fourteen low-order bits of the virtual address are used to address the instruction cache. the two high-order bits of this virtual address select one of four instruction cache banks, and the remaining bits address the selected bank. the itlb selects the page.
rf stage: in phase 1 of the rf stage, the cache index is compared with the page frame number from the itlb and the cache data is read out. the cache hit/miss signal is valid late in phase 1 of the rf stage, and the virtual pc is incremented by 4 so that the next instruction can be fetched.
during phase 2, the rs and rt fields of the 2-port register file are accessed and the register data is valid at the register file output. at the same time, bypass multiplexers select inputs from either the ex- or dc-stage output in addition to the register file output, depending on the need for an operand bypass.
ex stage: the alu controls are set to do an a+b operation. the operands flow into the alu inputs, and the alu operation is started. the result of the alu operation is latched into the alu output latch during phase 2.
dc stage: this stage is a nop for this instruction. the data from the output of the ex stage (the alu) is moved into the output latch of the dc.
wb stage: during phase 1, the wb latch feeds the data to the inputs of the register file, which is addressed by the rd field. the file write strobe is enabled. by the end of phase 1, the data is written into the register file.

figure 4-5 add instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, alu, and rfw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

jump and link register instruction
jalr rd,rs
ic stage: same as the ic stage for the add instruction.
rf stage: during phase 2 of the rf stage, the register addressed by the rs field is read out of the file.
ex stage: during phase 1 of the ex stage, the value of register rs is clocked into the virtual pc latch. this value is used in phase 2 to fetch the next instruction. the value of the virtual pc incremented during the rf stage is incremented again to produce the link address pc+8, where pc is the address of the jalr instruction. the resulting value is the pc to which the program will eventually return from the jump destination. this value is placed in the link output latch of the instruction address unit.
dc stage: the pc+8 value is moved from the link output latch to the output latch of the dc pipeline stage.
wb stage: refer to the add instruction.
note that if no value is explicitly provided for rd, register 31 is used as the default. if rd is explicitly specified, it cannot be the same register as the one addressed by rs; if it is, the result of executing such an instruction is undefined.

figure 4-6 jump and link register instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, iva, alu, and rfw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

branch on equal instruction
beq rs,rt,offset
ic stage: same as the ic stage for the add instruction.
rf stage: during phase 2, the register file is addressed with the rs and rt fields and the contents of these registers are placed in the register file output latch.
ex stage: during phase 1, a check is performed to determine whether each corresponding bit position of these two operands has equal values. if they are equal, the pc is set to pc+target, where target is the sign-extended offset field. if they are not equal, the pc is set to pc+4. the next pc resulting from the branch comparison is valid at the beginning of phase 2 for instruction fetch.
dc stage: this stage is a nop for this instruction.
wb stage: this stage is a nop for this instruction.

figure 4-7 branch on equal instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, iva, and bcmp across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

trap if less than instruction
tlt rs,rt
ic stage: same as the ic stage for the add instruction.
rf stage: same as the rf stage for the add instruction.
ex stage: during phase 1, the bypass multiplexers select inputs from the rf-, ex-, or dc-stage output latch, depending on the need for an operand bypass. the alu controls are set to do an a-b operation. the operands flow into the alu inputs, and the alu operation is started. the result of the alu operation is latched into the alu output latch during phase 2.
dc stage: the sign bits of the operands and of the alu output latch are checked to determine whether a less than condition is true. if this condition is true, a trap exception occurs. this, as with all pipeline exceptions, implies a 2-cycle stall. the pc register is loaded with the value of the exception vector, and the following instructions, which are in the earlier pipeline stages, are killed.
wb stage: the exception code is set in the excode field of the cause register if the less than condition was met in the dc stage. the pc value of this instruction is stored in the epc register, and the bd bit is updated appropriately, according to the contents of the exl bit of the status register. if the less than condition was not met in the dc stage, no activity occurs in the wb stage.

figure 4-8 trap if less than instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, iva, alu, and rfw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

load word instruction
lw rt,offset(base)
ic stage: same as the ic stage for the add instruction.
rf stage: same as the rf stage for the add instruction. note that the base field is in the same position as the rs field.
ex stage: refer to the ex stage for the add instruction. for lw, the inputs to the alu come from gpr[base] through the bypass multiplexer and from the sign-extended offset field. the result of the alu operation that is latched into the alu output latch in phase 2 represents the effective virtual address of the operand (dva).
dc stage: the data cache is accessed in parallel with the tlb, and the cache tag field is compared with the page frame number (pfn) field of the tlb entry. after passing through the load aligner, aligned data is placed in the dc output latch during phase 2.
wb stage: during phase 1, the cache read data is written into the file addressed by the rt field.
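The address calculation performed in the ex stage of lw (dva) can be written out as a short sketch. It assumes 32-bit addressing for brevity; `effective_address` and `is_word_aligned` are hypothetical helper names used only for this illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(gpr_base, offset16, bits=32):
    """Effective virtual address = GPR[base] + sign-extended offset, wrapped to `bits` bits."""
    mask = (1 << bits) - 1
    return (gpr_base + sign_extend16(offset16)) & mask

def is_word_aligned(address):
    """A word access expects the two low-order address bits to be zero."""
    return address & 0x3 == 0

if __name__ == "__main__":
    # lw rt, -4(base) with GPR[base] = 0x80001000
    addr = effective_address(0x80001000, 0xFFFC)
    print(hex(addr), is_word_aligned(addr))
```

A negative offset is encoded in two's complement in the 16-bit field, which is why the sign extension has to happen before the addition.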
figure 4-9 load word instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, dva, dcr, dtlb, dtc, la, and rfw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

store word instruction
sw rt,offset(base)
ic stage: same as the ic stage for the add instruction.
rf stage: same as the rf stage for the lw instruction.
ex stage: refer to the lw instruction for the calculation of the effective address. from the rf output latch, gpr[rt] is sent through the bypass multiplexer and into the main shifter, where the shifter performs the byte-alignment operation for the operand. the results of the alu and the shift operations are latched in the output latches during phase 2.
dc stage: refer to the lw instruction for a description of the cache access. additionally, the merged data from the load aligner is moved into the store data output latch during phase 2.
wb stage: if there was a cache hit, the content of the store data output latch is written into the data cache at the appropriate word location. note that all store instructions use the data cache for two consecutive pcycles. if the following instruction requires use of the data cache, the pipeline is stalled for one pcycle to complete the write of the aligned store data.

figure 4-10 store word instruction pipeline operations
[figure: operations icf, itlb, itc, rfr, idec, dva, dcr, dtlb, dtc, la, and dcw across the ic, rf, ex, dc, and wb stages (phases Φ1 and Φ2 of each pcycle)]

4.5 interlock and exception handling
smooth pipeline flow is interrupted when cache misses or exceptions occur, or when data dependencies are detected. interruptions handled by hardware, such as cache misses, are referred to as interlocks, while those handled by software are called exceptions. as shown in figure 4-11, all interlock and exception conditions are collectively referred to as faults.
figure 4-11 interlocks, exceptions, and faults
[figure: faults divide into interlocks (handled by hardware; the pipeline stalls) and exceptions (handled by software; instructions are aborted)]

at each cycle, exception and interlock conditions are checked for all active instructions. because each exception or interlock condition corresponds to a particular pipeline stage, a condition can be traced back to the particular instruction in the exception/interlock stage, as shown in figure 4-12. for instance, an ldi interlock is raised in the execution (ex) stage. tables 4-2 and 4-3 describe the pipeline interlocks and exceptions listed in figure 4-12.

figure 4-12 correspondence of pipeline stage to interlock and exception condition
[figure: interlock and exception conditions placed under the pipeline stage (ic, rf, ex, dc, wb) in which they are raised. interlocks: itm, icb, ldi, mci, dcm, dcb, cop, cp0i. exceptions: iade, sysc, rst, itlb, brpt, nmi, ibe, cpu, ovfl, rsvd, trap, fpe, dade, dtlb, wat, intr, dbe]
remark: the conditions of the exceptions are shown starting from the exception with the highest priority.

table 4-2 description of pipeline exceptions
exception | description
iade | instruction address error exception
itlb | instruction tlb exception
ibe | instruction bus error exception
sysc | syscall instruction exception
brpt | breakpoint instruction exception
cpu | coprocessor unusable exception
rsvd | reserved instruction exception
rst | external reset exception
nmi | external nmi exception
ovfl | integer overflow exception
trap | trap instruction exception
fpe | floating-point exception
dade | data address error exception
dtlb | data tlb exception
wat | reference to watch address exception
intr | interrupt exception
dbe | data bus error exception

table 4-3 description of pipeline interlocks
interlock | description
itm | instruction tlb miss
icb | instruction cache busy
ldi | load interlock
mci | multi-cycle interlock
dcm | data cache miss
dcb | data cache busy
cop | cache op
cp0i | cp0 bypass interlock

4.6 pipeline interlocks and exceptions
when an interlock or exception condition arises, pipeline flow is interrupted. depending upon whether the condition is an interlock or an exception, one of the following occurs:
- if an interlock condition arises, the pipeline remains stalled until the interlock is corrected by hardware.
- if an exception occurs, the exception-causing instruction and all instructions that follow it in the pipeline are aborted, the exception is resolved by software, and the pipeline is restarted and reloaded.
pipeline interlocks and pipeline exceptions are described in the following sections. the exceptions themselves are described in chapter 6 exception processing. bypassing, which allows data and conditions produced in the ex, dc, and wb stages of the pipeline to be made available to the ex stage of the next cycle, is also described in this section.

4.6.1 pipeline interlocks
when an interlock condition occurs, the pipeline stalls and remains stalled until the interlock is corrected. should pipeline stall requests from different stages arise simultaneously, the pipeline control unit prioritizes the stall requests. for instance, a stall request from the dc stage is always allowed to be resolved before a simultaneous rf-stage stall request, since both may require the same resource (tlb, memory) to be resolved. the ex stage is allowed to stall in order to complete a multicycle instruction as long as there is no load dependency between itself (the ex stage) and the dc stage. interlock conditions for each pipeline stage are shown in figure 4-12 and described in table 4-3.
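The stall prioritization can be pictured with a small lookup table. The sketch below maps each interlock of table 4-3 to the stage in which this chapter says it is detected, and orders simultaneous stall requests so that dc-stage requests come before rf-stage requests. Placing ex-stage requests between the two is an assumption made only for this illustration; the text states only the dc-before-rf rule. All names are hypothetical.

```python
# Stage in which each interlock is raised, as described in the subsections of this chapter.
INTERLOCK_STAGE = {
    "ITM": "RF", "ICB": "RF",
    "LDI": "EX", "MCI": "EX",
    "DCM": "DC", "DCB": "DC", "COP": "DC", "CP0I": "DC",
}

# Lower number = resolved first. DC before RF is stated in the text;
# the position of EX between them is an assumption for this sketch.
STAGE_PRIORITY = {"DC": 0, "EX": 1, "RF": 2}

def resolve_order(active_interlocks):
    """Order simultaneous stall requests by the priority of the stage that raised them."""
    return sorted(active_interlocks, key=lambda name: STAGE_PRIORITY[INTERLOCK_STAGE[name]])

if __name__ == "__main__":
    print(resolve_order(["ITM", "DCM"]))
```

Because `sorted` is stable, requests raised by the same stage keep the order in which they were passed in.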
the remainder of this section describes in detail the following pipeline interlocks:
- instruction tlb miss (itm)
- instruction cache busy (icb)
- load interlock (ldi)
- multicycle instruction interlock (mci)
- data cache miss (dcm)
- data cache busy (dcb)
- cache operation (cop)
- cp0 bypass interlock (cp0i)

4.6.2 instruction tlb miss (itm)
a pipeline stall due to an instruction tlb miss occurs when the virtual address of the next instruction to be fetched is not found in the instruction micro-tlb (itlb). the pipeline stalls when the micro-tlb miss is detected in the rf stage, whereupon the pipeline controller notifies the micro-tlb to proceed in servicing the stall. the pipeline starts running again when the micro-tlb has been updated from the jtlb. a miss penalty of 3 pcycles is incurred when the micro-tlb is updated from the jtlb. if the virtual address also misses in the jtlb, an exception is taken, which overrides the stall to allow the handler to update the jtlb. once the update is completed, the instruction fetch is re-executed. this initiates a repeat of the itm stall until the micro-tlb is updated from the jtlb, which was just updated by the exception handler.

figure 4-13 instruction tlb miss interlock
[figure: an itlb miss detected in the rf stage raises the itm interlock; the pipeline stalls while the jtlb is accessed and the itlb is updated, then runs again]

4.6.3 instruction cache busy (icb)
a pipeline stall due to an instruction cache busy interlock occurs when the next instruction is not found in the instruction cache, and the cache cannot service the instruction fetch. the pipeline stalls when the instruction cache miss is detected in the rf stage. after detecting the stall, the pipeline controller notifies the instruction cache to proceed in servicing the stall. the pipeline begins running again after the entire cache line has been written into the instruction cache.
when the instruction cache is busy with a cache instruction and the instruction fetch cannot be serviced, a cache operation (cop) interlock is taken, not icb.

figure 4-14 example of an instruction cache busy interlock
[figure: an i-cache miss detected in the rf stage raises the icb interlock; the pipeline stalls while the i-cache is refilled and updated, then runs again]

4.6.4 multicycle instruction interlock (mci)
a pipeline stall due to a multicycle interlock occurs when an instruction with an execution latency of more than one pipeline clock enters the ex stage. the pipeline begins running again during the multicycle instruction's last clock of operation in the ex stage.

figure 4-15 example of a multicycle instruction interlock
[figure: a mult a,b instruction occupies the ex stage for multiple cycles (mci), stalling the instructions that follow it until its last ex cycle]

4.6.5 load interlock (ldi)
a pipeline stall due to a load interlock occurs when data fetched by a load instruction is required by the immediately following instruction. the pipeline stalls when the load-use instruction (the instruction using the load data) enters the ex stage. the pipeline begins running again in the clock after the target of the load is read from the data cache (in the dc stage of the "load b" instruction in figure 4-16). the load interlock is normally active for only one pclock cycle, when the load instruction is in the dc stage and the load-use instruction is in the ex stage. the data returned from the data cache at the end of the dc stage is input into the ex stage using the bypass multiplexers. if the data cache misses, the data cache busy interlock extends the stall until the data cache has been updated with the missing data.
the ldi is still active during this time and extends the stall one clock beyond the data cache interlock while the data is bypassed from the data cache into the ex stage. this case is illustrated in figure 4-17.

figure 4-16 example of a load interlock
[figure: load a, load b, and add a,b; the ldi is detected when add a,b is in the ex stage and load b is in the dc stage, the pipeline stalls for one cycle, and the load data is bypassed into the ex stage]

4.6.6 data cache miss (dcm)
if a data cache miss occurs in the dc stage, the pipeline stalls for the 1 pcycle in which the miss is detected. the pipeline stalls regardless of whether a load or a store instruction is being executed. the data cache busy interlock (explained next) continues the stall until a new cache line is read. when the requested word data has been read from the cache, the pipeline begins running again. figure 4-17 illustrates dcm.

4.6.7 data cache busy (dcb)
a pipeline stall due to the data cache being busy can occur in the following two situations:
- if the instruction immediately after a store instruction requires use of the data cache, the pipeline is stalled in its dc stage while the store writes the data to the cache during its wb stage. on a cache store hit, the pipeline stalls for only one pclock while the data is written to the data cache. on a cache store miss, the pipeline stalls with the store in the dc stage until the cache line has been updated. once the line has been updated, the pipeline restarts and moves the store instruction into the wb stage. if the instruction following the store (i.e. the instruction currently in the dc stage) also requires access to the data cache, the pipeline then stalls for one pcycle while the store data is being written to the cache.
- when a miss occurs on a load, the data cache signals that it is busy while it fetches the missed data word from external memory. refer to figure 4-17.
The pipeline begins running again on a load when the missed data word is available from the data cache.

Figure 4-17 Example of a Data Cache Miss Followed by a Load Interlock

4.6.8 Cache Operation (COP)

A pipeline stall due to a cache operation can occur in the following two situations:

- When an instruction cache operation instruction enters the DC stage, the instruction cache operation continues to be serviced while the pipeline stalls. The pipeline begins running again when the instruction cache operation is complete, allowing the next instruction fetch to proceed.
- When a data cache operation instruction that requires 2 PCycles of data cache operation enters the DC stage.

4.6.9 Coprocessor 0 Bypass Interlock (CP0I)

A pipeline stall due to a CP0 bypass interlock occurs when an instruction that caused an exception reaches the WB stage and the subsequent instruction in the DC stage requests a read of any CP0 register. This interlock stalls the pipeline for one PCycle so that the CP0 register can be written in the WB stage before any CP0 register is read in the DC stage.
Figure 4-18 Example of a Coprocessor 0 Bypass Interlock (CP0I)

4.7 Pipeline Exceptions

When a pipeline exception condition occurs, the pipeline stalls for 2 PCycles, and the instruction causing the exception, as well as all those that follow it in the pipeline, are aborted. Accordingly, any stall conditions and any later exception conditions from any aborted instruction are inhibited; there is no benefit in servicing stalls for an aborted instruction.

After the instructions are aborted, execution starts at a predefined exception vector. System control coprocessor (CP0) registers are loaded with information that identifies the type of exception, as well as auxiliary information such as the virtual address at which translation exceptions occur.

Exception conditions for each pipeline stage are shown in Figure 4-12 and described in Table 4-2.

Exceptions can be split into two groups:

- Those that occur independently of instruction execution (reset, NMI, and interrupt exceptions)
- Those that result from the execution of a particular instruction (instruction-dependent exceptions). This category includes all other exceptions.

Exceptions are logically precise.

4.7.1 Instruction-Independent Exceptions (Reset, NMI, and Interrupt)

Reset, NMI, and interrupt exceptions are identified and processed as follows:

- The reset exception has the highest priority of all possible exceptions; when a reset exception is asserted, instructions in all pipeline stages except WB are aborted regardless of any interlocks or other exceptions that may be active.
- NMI and interrupt exception requests are accepted only if the previous PCycle was a run cycle. When an NMI or interrupt exception occurs, all pipeline stages except WB are aborted.
4.7.2 Instruction-Dependent Exceptions

Priority between instruction-dependent exceptions and interlocks is determined according to these rules:

- An exception request from a particular pipeline stage is processed only if no stall condition from a later pipeline stage is active.
- An exception request from a later pipeline stage always has higher priority than an exception from an earlier pipeline stage.
- An exception request from a pipeline stage always has higher priority than any stall request from the same or earlier pipeline stages.

4.7.3 Interactions Between Interlocks and Exceptions

With the VR4300, processing in the EX and RF stages can continue while the pipeline is stalled. The interaction between interlocks in these two stages and exceptions is relatively simple.

Interaction between EX and RF stages

An EX exception occurs only when an instruction that causes the EX exception has entered a pipeline stage. Because processing to resolve the RF interlock has not yet started at this time, the EX exception takes precedence over the stall request from the RF stage. Interactions in various cases are described next.

- When an EX exception is stalled by a DC interlock: the EX exception takes precedence over the RF stall request, because the RF interlock is not resolved during the DC stall period.
- If an instruction cache busy interlock and a multicycle instruction interlock occur simultaneously: both the RF and EX stages resolve their respective interlocks. The cause of a floating-point exception is detected before the instruction cache busy (ICB) stall ends, but the exception occurs after execution has entered the DC stage. Therefore, the exception condition is retained in the EX stage until the RF interlock is resolved, and the related stage is deleted.
- If an exception from the EX stage and an RF interlock occur simultaneously: the EX exception takes precedence.
This is because the instruction that caused the RF interlock is canceled and no request is issued to the external memory.

Interaction between RF and DC stages

If stall requests are made at the same time in the RF and DC stages, the pipeline controller gives priority to the DC stage. In other words, RF stall processing starts after the DC stall has been resolved. This is because the same resources (such as the system interface and TLB) are needed to resolve both the RF interlock and the DC interlock.

4.7.4 Exception and Interlock Priorities

The priority for processing exceptions and interlocks within the same clock cycle is listed below.

- Exception and interlock requests from the WB stage always have priority over exception and interlock requests from the DC stage.
- Exception and interlock requests from the DC stage always have priority over exception and interlock requests from the EX stage.
- EX-stage exception and interlock requests in turn always have priority over any exception and interlock requests from the RF stage.

Figure 4-19 Execution and Interlock Priorities

In the case of multiple exception requests from the same pipeline stage, the highest-priority exception is processed first. The priorities of the instruction-dependent exceptions and interlocks are shown in the following sections.

4.7.5 WB-Stage Interlock and Exception Priorities

Because the WB stage has only the following exception or interlock, no priority applies.

- CP0 bypass interlock

4.7.6 DC-Stage Interlock and Exception Priorities

Following is a prioritized list of the exceptions and interlocks processed in the DC pipeline stage.
- Reset exception (highest)
- NMI exception
- Integer overflow exception
- Trap exception
- Floating-point exception
- Data address error exception
- Data TLB miss exception
- Data TLB invalid exception
- Data TLB modification exception
- Watch exception
- Interrupt exception
- Data cache miss interlock
- Data cache busy interlock
- Cache op interlock
- Data bus error exception

4.7.7 EX-Stage Interlock and Exception Priorities

Following is a prioritized list of the exceptions and interlocks processed in the EX stage.

- System call exception
- Breakpoint exception
- Coprocessor unusable exception
- Reserved instruction exception
- Load interlock
- Multicycle instruction interlock

4.7.8 RF-Stage Interlock and Exception Priorities

Following is a prioritized list of the exceptions and interlocks processed in the RF pipeline stage.

- Instruction address error exception
- Instruction TLB miss exception
- Instruction TLB invalid exception
- Instruction TLB miss interlock
- Instruction cache busy interlock
- Instruction bus error exception

If an instruction bus error exception occurs during a cache refill, while an instruction cache busy interlock is active, the instruction cache signals the exception to the pipeline controller only after the cache refill is complete, and therefore no stall is active.

Individual exceptions are described in detail in Chapter 6 Exception Processing.

4.7.9 Bypassing

In some cases, data and conditions produced in the EX, DC, and WB stages of the pipeline are made available to the EX stage (only) through the bypass datapath.

Operand bypass allows an instruction in the EX stage to continue without having to wait for data or conditions to be written to the register file at the end of the WB stage. Instead, the bypass control unit ensures that data and conditions from later pipeline stages are available at the appropriate time for instructions earlier in the pipeline.
The bypass control unit also controls the source and destination register addresses supplied from the register file.

4.8 Code Compatibility

The VR4300 can execute any program that can be executed on the VR3000 series and VR4000 series*, but the reverse is not necessarily true. Standard MIPS compilers produce code that runs on both. When hand-coding assembly code, it is strongly advised to maintain compatibility with the VR series. For more information, refer to each product's user's manual.

* The instruction set of the VR4100 differs partially from the other products. (For example, FPU instructions are not supported.)

4.9 Write Buffer

The VR4300 processor contains an on-chip write buffer, used as temporary storage for outgoing data. The write buffer stores one doubleword (8 bytes) of data per PCycle and can buffer a total of eight words (32 bytes) of data, equal to the data cache line size. Data of any length can therefore be stored. The write buffer can accept data as long as it has a vacancy. The format of the write buffer is shown below.

Figure 4-20 Write Buffer Format

The write buffer can store the following:

- Four 32-bit physical addresses
- 4-bit size field indicating four types of transfer data size
- Data of up to 4 doublewords

During an uncached store operation, data is held in this buffer until it can be retrieved by the external interface. The processor pipeline continues to execute while data is stored in the write buffer.

During either a load miss or a store miss to a cache line in the dirty state (refer to Chapter 11 Cache Memory for a description of cache line states), dirty data is stored in this buffer until the requested data is returned from the external interface. The processor pipeline continues to run while the write buffer waits (for a response from the external interface) to empty its contents to the external interface/memory.
If the processor executes a load or store instruction requiring external resources when the write buffer is full, the pipeline is stalled until the write buffer has space for the data to be stored.

Chapter 5 Memory Management System

The VR4300 processor provides a full-featured memory management unit (MMU), which uses an on-chip translation lookaside buffer (TLB) to translate virtual addresses into physical addresses.

This chapter describes the operation of the TLB, the system control coprocessor (CP0) registers that provide the software interface to the TLB, and the memory mapping method that translates a virtual address to a physical address.

5.1 Translation Lookaside Buffer (TLB)

A virtual address is converted into a physical address by using the internal TLB*. The internal TLB is a fully associative memory with 32 entries, and each entry maps a pair of pages, one odd and one even. The page size can be 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, or 16 MB, and can be specified for each entry. When a virtual address is given, all 32 TLB entries are checked to see whether their virtual address matches the given virtual address extended with the ASID field stored in the EntryHi register.

If the addresses match (a hit), a physical address is generated from the physical address in the TLB and an offset.

If the addresses do not match (a miss), an exception occurs, and software writes the TLB entry from a page table in memory. The software either writes the TLB entry over the entry selected by the Index register, or writes it to a random entry indicated by the Random register.

If two or more TLB entries match, the TLB operation is not executed correctly.
In this case, the TLB-Shutdown (TS) bit of the Status register is set to 1, and the TLB can no longer be used.

5.2 Memory Management System Architecture

The memory management system expands the address space of the CPU by converting a large virtual memory space into physical addresses. The physical address space of the VR4300 is 4 GB, using 32-bit addresses.

A virtual address is 32 bits wide in 32-bit mode, and the maximum user area is 2 GB (2^31 bytes). In 64-bit mode, the address is 64 bits wide, and the maximum user area is 1 TB (2^40 bytes). For the TLB entry format in each mode, refer to 5.3.1.

The virtual address is extended by the address space ID (ASID) (refer to Figures 5-2 and 5-3). The ASID reduces the number of TLB flushes required when the context is switched. The ASID field is 8 bits wide and is located in the EntryHi register of CP0. The global bit (G) is in the EntryLo0 and EntryLo1 registers.

* Some virtual-to-physical address translations occur outside of the TLB. For example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these spaces, the physical address is derived by subtracting the base address of the space from the virtual address.

Figure 5-1 Overview of a Virtual-to-Physical Address Translation

1. The virtual address (VA), represented by the virtual page number (VPN, the high-order bits of the address), is compared with the indicated area in the TLB.
2. If there is a match, the page frame number (PFN) representing the high-order bits of the physical address (PA) is output from the TLB.
3. The offset, which does not pass through the TLB, is then concatenated to the PFN.
Virtual-to-Physical Address Translation

Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual addresses in the TLB; there is a match when the virtual page number (VPN) of the address is the same as the VPN field of the entry, and either:

- the global (G) bit of the TLB entry is set, or
- the ASID field of the virtual address is the same as the ASID field of the TLB entry.

This match is referred to as a TLB hit. If there is no match, a TLB miss exception is taken by the processor, and software is allowed to reference a page table of virtual/physical addresses in memory and to write its contents to the TLB.

If there is a virtual address match in the TLB, the physical address is output from the TLB and concatenated with the offset, which represents an address within the page frame space. The offset does not pass through the TLB; the lower bits of the virtual address are output as is. For details, refer to 5.4.9 Virtual-to-Physical Address Translation Process.

The next two sections describe the 32-bit and 64-bit address translations.

32-Bit Mode Address Translation

Figure 5-2 shows the virtual-to-physical address translation of a 32-bit mode address. This figure illustrates two of the seven possible page sizes: a 4 KB page (12 bits) and a 16 MB page (24 bits).

- The top portion of Figure 5-2 shows a virtual address with a 12-bit, or 4 KB, page size, labelled Offset. The remaining 20 bits of the address, excluding the ASID, represent the VPN and index the 1M-entry page table.
- The bottom portion of Figure 5-2 shows a virtual address with a 24-bit, or 16 MB, page size, labelled Offset. The remaining 8 bits of the address, excluding the ASID, represent the VPN and index the 256-entry page table.
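The match rule and the PFN/offset concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware algorithm: the `TlbEntry` class and the `translate` function are hypothetical names, each entry is simplified to map a single page (the real TLB maps even/odd page pairs and also has valid and dirty bits), and only 32-bit mode addresses are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    """Hypothetical, simplified TLB entry: one page per entry."""
    vpn: int          # virtual page number held in the entry
    asid: int         # 8-bit address space ID held in the entry
    global_bit: bool  # G bit: when set, the ASID comparison is skipped
    pfn: int          # page frame number (high-order physical address bits)
    offset_bits: int  # 12 for a 4 KB page ... 24 for a 16 MB page


def translate(entries: List[TlbEntry], vaddr: int, asid: int) -> Optional[int]:
    """Return the physical address on a TLB hit, or None on a TLB miss."""
    for entry in entries:
        vpn = vaddr >> entry.offset_bits
        # Hit: VPN matches, and either G is set or the ASIDs are equal.
        if vpn == entry.vpn and (entry.global_bit or asid == entry.asid):
            # The offset bypasses the TLB and is concatenated to the PFN.
            offset = vaddr & ((1 << entry.offset_bits) - 1)
            return (entry.pfn << entry.offset_bits) | offset
    return None  # software would now refill the TLB from its page table


tlb = [TlbEntry(vpn=0x00400, asid=5, global_bit=False, pfn=0x12345, offset_bits=12)]
hit = translate(tlb, 0x00400ABC, asid=5)
miss = translate(tlb, 0x00400ABC, asid=6)
```

With the single 4 KB entry above, the lookup under ASID 5 hits and keeps the low 12 bits of the virtual address unchanged, while the same address under ASID 6 misses because the entry is not global.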
Figure 5-2 32-Bit Mode Virtual Address Translation
(The figure shows a virtual address with 1M (2^20) 4 KB pages and a virtual address with 256 (2^8) 16 MB pages, each translated through the TLB into a 32-bit physical address. Bits 31, 30, and 29 of the virtual address select user, supervisor, or kernel address spaces. The offset is passed unchanged to physical memory.)

64-Bit Mode Address Translation

Figure 5-3 shows the virtual-to-physical address translation of a 64-bit mode address. This figure illustrates two of the seven possible page sizes: a 4 KB page (12 bits) and a 16 MB page (24 bits).

- The top portion of Figure 5-3 shows a virtual address with a 12-bit, or 4 KB, page size, labelled Offset. The remaining 28 bits of the address, excluding the ASID, represent the VPN and index the 256M-entry page table.
- The bottom portion of Figure 5-3 shows a virtual address with a 24-bit, or 16 MB, page size, labelled Offset. The remaining 16 bits of the address, excluding the ASID, represent the VPN and index the 64K-entry page table.

Figure 5-3 64-Bit Mode Virtual Address Translation
(The figure shows a virtual address with 256M (2^28) 4 KB pages and a virtual address with 64K (2^16) 16 MB pages, each translated through the TLB into a 32-bit physical address. Bits 62 and 63 of the virtual address select user, supervisor, or kernel address spaces. The offset is passed unchanged to physical memory.)
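The VPN widths quoted for the two translation modes follow from simple arithmetic, which the sketch below reproduces for all seven page sizes. It assumes 32 translated address bits in 32-bit mode and 40 in 64-bit mode, as implied by the 20-bit and 28-bit VPNs given for 4 KB pages; the names `OFFSET_BITS` and `vpn_bits` are illustrative only.

```python
# Offset width in bits for each of the seven page sizes.
OFFSET_BITS = {
    "4 KB": 12, "16 KB": 14, "64 KB": 16, "256 KB": 18,
    "1 MB": 20, "4 MB": 22, "16 MB": 24,
}


def vpn_bits(offset_bits: int, mode64: bool) -> int:
    """VPN width = translated address width minus the page offset width."""
    address_bits = 40 if mode64 else 32  # assumption taken from the text above
    return address_bits - offset_bits


for name, bits in OFFSET_BITS.items():
    v32 = vpn_bits(bits, mode64=False)
    v64 = vpn_bits(bits, mode64=True)
    # Number of page-table entries indexed by the VPN is 2**vpn_bits.
    print(f"{name:>6}: 32-bit VPN {v32:2d} bits ({1 << v32} pages), "
          f"64-bit VPN {v64:2d} bits ({1 << v64} pages)")
```

For 4 KB pages this gives 20 and 28 VPN bits (1M and 256M pages), and for 16 MB pages 8 and 16 VPN bits (256 and 64K pages), matching the figures.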
5.2.1 Operating Modes

The processor has three operating modes that function in both 32- and 64-bit operations:

- User mode
- Supervisor mode
- Kernel mode

User mode and kernel mode are common to all VR series members. Generally, the operating system runs in kernel mode, and application programs run in user mode. The VR4000 series provides a third mode. This mode, called supervisor mode, is intermediate between user and kernel modes and is used to build a high-security system.

If an exception occurs, the CPU enters kernel mode and remains in this mode until an exception return instruction (ERET) is executed. The ERET instruction restores the mode in which the processor was operating before the exception occurred.

5.2.2 Virtual Addressing in User Mode

In user mode, a single virtual address space (useg) of 2 GB (2^31 bytes) can be used in 32-bit mode, and a 1 TB (2^40 bytes) virtual address space (xuseg) can be used in 64-bit mode.

As shown in Figures 5-2 and 5-3, each virtual address is extended by an 8-bit address space ID (ASID) into a separate virtual address for each of up to 256 user processes. The system assigns each process an ASID so that the contents of the TLB can be retained across a context switch. useg and xuseg are referenced via the TLB. Whether the cache can be used is determined for each page by the TLB entry (the C bit of the TLB entry determines whether the cache can be used).

The user segment starts at address 0, and the currently valid user process resides in useg (in 32-bit mode) or xuseg (in 64-bit mode).

The VR4300 operates in user mode when the bits in the Status register have the following values:

- KSU bits = 10
- EXL = 0
- ERL = 0

In conjunction with these bits, the UX bit in the Status register selects between 32- and 64-bit user mode addressing as follows:
- UX = 0: selects 32-bit useg. A TLB miss is processed by the 32-bit TLB miss exception handler.
- UX = 1: selects 64-bit xuseg. A TLB miss is processed by the 64-bit XTLB miss exception handler.

Table 5-1 lists the characteristics of the two user mode segments, useg and xuseg.

Table 5-1 32-Bit and 64-Bit User Mode Segments

Mode    Address bit values  KSU  EXL  ERL  UX  Segment  Virtual address range                                Segment size
32-bit  A(31) = 0           10   0    0    0   useg     0x0000 0000 through 0x7FFF FFFF                      2 GB (2^31 bytes)
64-bit  A(63:40) = 0        10   0    0    1   xuseg    0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  1 TB (2^40 bytes)

Figure 5-4 User Mode Virtual Address Space

* The VR4300 internally uses 64-bit addresses. In kernel mode, the processor saves and restores each register to initialize it before switching the context. In 32-bit mode, a 32-bit value is used as an address, with bit 31 sign-extended to bits 32 through 63. Usually, a program in 32-bit mode does not generate invalid addresses. If the context is switched and the processor enters kernel mode, however, a value other than a previously sign-extended 32-bit address may be stored in a 64-bit register. In this case, the program in user mode may generate invalid addresses.

useg (32-bit mode)

When the UX bit of the Status register is 0 and the most significant bit of the virtual address is 0, the virtual address space is referred to as useg. If an attempt is made to reference an address whose most significant bit is 1, an address error exception occurs (refer to Chapter 6 Exception Processing).
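The address checks in Table 5-1 can be expressed as a small classifier. The sketch below is illustrative only: `user_segment` is a hypothetical helper, `None` stands in for the address error exception, and in 32-bit mode the address is taken as a plain 32-bit value rather than its sign-extended 64-bit form.

```python
from typing import Optional


def user_segment(vaddr: int, ux: int) -> Optional[str]:
    """Classify a user-mode virtual address per Table 5-1.

    Returns the segment name, or None where the processor would raise
    an address error exception.
    """
    if ux == 1:
        vaddr &= (1 << 64) - 1
        # xuseg: bits 63:40 must all be 0 (1 TB, 2**40 bytes).
        return "xuseg" if (vaddr >> 40) == 0 else None
    vaddr &= 0xFFFF_FFFF
    # useg: bit 31 must be 0 (2 GB, 2**31 bytes).
    return "useg" if (vaddr >> 31) == 0 else None


top_of_useg = user_segment(0x7FFF_FFFF, ux=0)
above_useg = user_segment(0x8000_0000, ux=0)
top_of_xuseg = user_segment(0x0000_00FF_FFFF_FFFF, ux=1)
above_xuseg = user_segment(0x0000_0100_0000_0000, ux=1)
```

The last address of each segment is accepted, and the first address past it is reported as an error.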
xuseg (64-bit mode)

If the UX bit of the Status register is 1 and bits 63:40 of the virtual address are all 0, the virtual address space is referred to as xuseg. A user address space of 1 TB (2^40 bytes) can be used. If an attempt is made to reference an address that has a 1 in any of bits 63:40, an address error exception occurs (refer to Chapter 6 Exception Processing).

5.2.3 Virtual Addressing in Supervisor Mode

Supervisor mode, shown in Figure 5-5, is intended for hierarchical execution of the operating system. The kernel operating system at the top of the hierarchy runs in kernel mode, and the other parts of the operating system run in supervisor mode.

References to suseg, sseg, xsuseg, xsseg, and csseg (i.e., all spaces) are made via the TLB. Whether the cache can be used is determined by the TLB entry of each page (the C bit of the TLB entry determines whether the cache can be used).

The processor operates in supervisor mode when the bits of the Status register have the following values:

- KSU = 01
- EXL = 0
- ERL = 0

In addition, the addressing mode in supervisor mode is determined by the SX bit of the Status register.

- SX = 0: 32-bit supervisor space. A TLB miss is processed by the 32-bit TLB miss exception handler.
- SX = 1: 64-bit supervisor space. A TLB miss is processed by the 64-bit XTLB miss exception handler.

Table 5-2 shows the features of each segment in supervisor mode.

* The VR4300 internally uses 64-bit addresses. In 32-bit mode, a 32-bit value with bits 32 through 63 sign-extended is used as an address. Normally, a program in 32-bit mode does not generate an invalid address. However, an integer overflow may occur as a result of the base register + offset operation used to calculate an address. The address calculated in that case is invalid, and the result is undefined. The two causes of the overflow are listed below.
- When bit 15 of the offset = 0, bit 31 of the base register = 0, and bit 31 of (base register + offset) = 1
- When bit 15 of the offset = 1, bit 31 of the base register = 1, and bit 31 of (base register + offset) = 0

Figure 5-5 Supervisor Mode Address Space

Table 5-2 32-Bit and 64-Bit Supervisor Mode Segments

Mode    Address bit values  KSU  EXL  ERL  SX  Segment  Virtual address range                                Segment size
32-bit  A(31) = 0           01   0    0    0   suseg    0x0000 0000 through 0x7FFF FFFF                      2 GB (2^31 bytes)
32-bit  A(31:29) = 110      01   0    0    0   sseg     0xC000 0000 through 0xDFFF FFFF                      512 MB (2^29 bytes)
64-bit  A(63:62) = 00       01   0    0    1   xsuseg   0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  1 TB (2^40 bytes)
64-bit  A(63:62) = 01       01   0    0    1   xsseg    0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF  1 TB (2^40 bytes)
64-bit  A(63:62) = 11       01   0    0    1   csseg    0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF  512 MB (2^29 bytes)

32-bit supervisor mode, user space (suseg)

In supervisor mode, when SX = 0 in the Status register and the most significant bit of the virtual address is 0, the suseg virtual address space is selected; it covers the full 2^31 bytes (2 GB) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

32-bit supervisor mode, supervisor space (sseg)

In supervisor mode, when SX = 0 in the Status register and the three high-order bits of the virtual address are 110, the sseg virtual address space is selected; it covers 2^29 bytes (512 MB) of the current supervisor address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

64-bit supervisor mode, user space (xsuseg)

In supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address are 00, the xsuseg virtual address space is selected; it covers the full 2^40 bytes (1 TB) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

64-bit supervisor mode, current supervisor space (xsseg)

In supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address are 01, the xsseg current supervisor virtual address space is selected; it covers the full 2^40 bytes (1 TB) of the current supervisor address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

64-bit supervisor mode, separate supervisor space (csseg)

In supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual address are 11, the csseg separate supervisor virtual address space is selected. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.
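The supervisor-mode segment boundaries in Table 5-2 can be checked with a classifier of the same shape. This is a sketch under stated assumptions: `supervisor_segment` is a hypothetical helper, `None` represents an address error, and the range limits are taken directly from the virtual address ranges in the table.

```python
from typing import Optional


def supervisor_segment(vaddr: int, sx: int) -> Optional[str]:
    """Classify a supervisor-mode virtual address per Table 5-2."""
    if sx == 1:
        vaddr &= (1 << 64) - 1
        top2 = vaddr >> 62                # bits 63:62 select the region
        low = vaddr & ((1 << 62) - 1)     # bits 61:0 within the region
        if top2 == 0b00 and low < (1 << 40):
            return "xsuseg"               # 1 TB user space
        if top2 == 0b01 and low < (1 << 40):
            return "xsseg"                # 1 TB current supervisor space
        if 0xFFFF_FFFF_C000_0000 <= vaddr <= 0xFFFF_FFFF_DFFF_FFFF:
            return "csseg"                # 512 MB separate supervisor space
        return None
    vaddr &= 0xFFFF_FFFF
    if (vaddr >> 31) == 0:
        return "suseg"                    # A(31) = 0, 2 GB
    if (vaddr >> 29) == 0b110:
        return "sseg"                     # A(31:29) = 110, 512 MB
    return None


samples = {
    "sseg base": supervisor_segment(0xC000_0000, sx=0),
    "kernel-only 32-bit": supervisor_segment(0x8000_0000, sx=0),
    "xsseg top": supervisor_segment(0x4000_00FF_FFFF_FFFF, sx=1),
    "csseg base": supervisor_segment(0xFFFF_FFFF_C000_0000, sx=1),
}
```

Addresses such as 0x8000 0000 in 32-bit mode fall outside every supervisor segment, so the sketch reports them as errors.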
5.2.4 Virtual Addressing in Kernel Mode

The processor operates in kernel mode when the Status register contains one or more of the following values:

- KSU = 00
- EXL = 1
- ERL = 1

In conjunction with these bits, the KX bit in the Status register selects between 32- and 64-bit kernel mode addressing space:

- When KX = 0, 32-bit kernel space is selected. A TLB miss is processed by the 32-bit TLB miss exception handler.
- When KX = 1, 64-bit kernel space is selected. A TLB miss is processed by the 64-bit XTLB miss exception handler.

The processor enters kernel mode whenever an exception is detected, and it remains in kernel mode until an exception return (ERET) instruction is executed and results in ERL and/or EXL = 0. The ERET instruction restores the processor to the mode existing prior to the exception.

Kernel mode virtual address space is divided into regions differentiated by the high-order bits of the virtual address, as shown in Figure 5-6. Table 5-3 lists the characteristics of the 32-bit kernel mode segments, and Table 5-4 lists the characteristics of the 64-bit kernel mode segments.

* The VR4300 internally uses 64-bit addresses. In 32-bit mode, a 32-bit value with bits 32 through 63 sign-extended is used as an address. Normally, the program in the 32-bit mode uses 64-bit instructions. However, an integer overflow may occur as a result of the base register + offset operation used to calculate an address. The address calculated in that case is invalid, and the result is undefined. The two causes of the overflow are listed below.
- When bit 15 of the offset = 0, bit 31 of the base register = 0, and bit 31 of (base register + offset) = 1
- When bit 15 of the offset = 1, bit 31 of the base register = 1, and bit 31 of (base register + offset) = 0

Figure 5-6 Kernel Mode Address Space
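The two overflow conditions above can be tested directly on the sign bits. The sketch below is an illustration with a hypothetical function name, `address_overflows_32`; it models the base register as a 32-bit value and the offset as the 16-bit immediate, sign-extended before the addition.

```python
def address_overflows_32(base: int, offset16: int) -> bool:
    """Return True if base + offset matches either overflow condition."""
    base &= 0xFFFF_FFFF
    offset16 &= 0xFFFF
    # Sign-extend the 16-bit offset (bit 15 is its sign bit).
    signed_offset = offset16 - 0x1_0000 if offset16 & 0x8000 else offset16
    total = (base + signed_offset) & 0xFFFF_FFFF

    offset_sign = (offset16 >> 15) & 1
    base_sign = (base >> 31) & 1
    sum_sign = (total >> 31) & 1

    # Case 1: positive + positive yields bit 31 = 1.
    # Case 2: negative + negative yields bit 31 = 0.
    return ((offset_sign == 0 and base_sign == 0 and sum_sign == 1) or
            (offset_sign == 1 and base_sign == 1 and sum_sign == 0))


crosses_up = address_overflows_32(0x7FFF_FFFC, 0x0008)    # sum 0x8000_0004
crosses_down = address_overflows_32(0x8000_0000, 0xFFFC)  # sum 0x7FFF_FFFC
ordinary = address_overflows_32(0x0000_1000, 0x0010)
```

The first two calls cross the 0x7FFF FFFF / 0x8000 0000 boundary in opposite directions and are flagged; the third stays within one half of the address space and is not.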
figure 5-7 details of xkphys field

  0xb800 0001 0000 0000 through 0xbfff ffff ffff ffff   address error
  0xb800 0000 0000 0000 through 0xb800 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0xb000 0001 0000 0000 through 0xb7ff ffff ffff ffff   address error
  0xb000 0000 0000 0000 through 0xb000 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0xa800 0001 0000 0000 through 0xafff ffff ffff ffff   address error
  0xa800 0000 0000 0000 through 0xa800 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0xa000 0001 0000 0000 through 0xa7ff ffff ffff ffff   address error
  0xa000 0000 0000 0000 through 0xa000 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0x9800 0001 0000 0000 through 0x9fff ffff ffff ffff   address error
  0x9800 0000 0000 0000 through 0x9800 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0x9000 0001 0000 0000 through 0x97ff ffff ffff ffff   address error
  0x9000 0000 0000 0000 through 0x9000 0000 ffff ffff   4 gb, tlb unmapped, uncached
  0x8800 0001 0000 0000 through 0x8fff ffff ffff ffff   address error
  0x8800 0000 0000 0000 through 0x8800 0000 ffff ffff   4 gb, tlb unmapped, cacheable
  0x8000 0001 0000 0000 through 0x87ff ffff ffff ffff   address error
  0x8000 0000 0000 0000 through 0x8000 0000 ffff ffff   4 gb, tlb unmapped, cacheable

table 5-3 32-bit kernel mode segments

status register bit values for all rows: ksu = 00 or exl = 1 or erl = 1, and kx = 0

address bit values  segment name  virtual address                   physical address                  segment size
a(31) = 0           kuseg         0x0000 0000 through 0x7fff ffff   tlb map                           2 gb (2^31 bytes)
a(31:29) = 100      kseg0         0x8000 0000 through 0x9fff ffff   0x0000 0000 through 0x1fff ffff   512 mb (2^29 bytes)
a(31:29) = 101      kseg1         0xa000 0000 through 0xbfff ffff   0x0000 0000 through 0x1fff ffff   512 mb (2^29 bytes)
a(31:29) = 110      ksseg         0xc000 0000 through 0xdfff ffff   tlb map                           512 mb (2^29 bytes)
a(31:29) = 111      kseg3         0xe000 0000 through 0xffff ffff   tlb map                           512 mb (2^29 bytes)

32-bit kernel mode, user space (kuseg)

in kernel mode, when kx = 0 in the status register and the most-significant bit of the virtual address is cleared, the kuseg virtual address space is selected; it covers the current 2^31-byte (2 gb) user address space. the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

if the erl bit of the status register is 1, the user address area becomes a 2 gb area that is neither tlb mapped nor cached (i.e., the virtual addresses are used as physical addresses as is). however, this is a function used by the vr4400 to process an ecc error in an exception handler. the vr4300 has no ecc or parity function, so this function is defined only to maintain the compatibility of the vr4300 with the vr4400.

32-bit kernel mode, kernel space 0 (kseg0)

in kernel mode, when kx = 0 in the status register and the high-order three bits of the virtual address are 100, the kseg0 virtual address space is selected; it covers the current 2^29-byte (512 mb) address space. references to kseg0 are not mapped through the tlb; the physical address selected is defined by subtracting 0x8000 0000 from the virtual address. the k0 field of the config register controls cacheability. (refer to chapter 6 exception processing.)

32-bit kernel mode, kernel space 1 (kseg1)

in kernel mode, when kx = 0 in the status register and the high-order three bits of the virtual address are 101, the kseg1 virtual address space is selected; it covers the current 2^29-byte (512 mb) address space. references to kseg1 are not mapped through the tlb; the physical address selected is defined by subtracting 0xa000 0000 from the virtual address. caches are disabled for accesses to these addresses, and physical memory (or memory-mapped i/o device registers) is accessed directly.
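the kseg0 and kseg1 translation described above is a fixed subtraction, so it can be sketched in a few lines of c. this is a minimal sketch, not part of the manual: the names KSEG0_BASE, KSEG1_BASE, va_is_unmapped, va_is_kseg1, and unmapped_va_to_pa are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment bases taken from table 5-3 (32-bit kernel mode, kx = 0).
 * All identifiers here are illustrative, not names from the manual. */
#define KSEG0_BASE 0x80000000u
#define KSEG1_BASE 0xA0000000u

/* True when the high-order three bits are 100 (kseg0) or 101 (kseg1). */
static bool va_is_unmapped(uint32_t va)
{
    uint32_t top3 = va >> 29;
    return top3 == 4u || top3 == 5u;
}

/* True for kseg1, which is always accessed without the cache. */
static bool va_is_kseg1(uint32_t va)
{
    return (va >> 29) == 5u;
}

/* Physical address for a kseg0/kseg1 virtual address:
 * subtract the base of the segment the address lies in. */
static uint32_t unmapped_va_to_pa(uint32_t va)
{
    assert(va_is_unmapped(va));
    return va_is_kseg1(va) ? va - KSEG1_BASE : va - KSEG0_BASE;
}
```

both segments resolve to the same 512 mb physical range, so a given physical address has one kseg0 alias and one kseg1 alias; only the cache behavior differs.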
32-bit kernel mode, supervisor space (ksseg)

in kernel mode, when kx = 0 in the status register and the high-order three bits of the virtual address are 110, the ksseg virtual address space is selected; it covers the current 2^29-byte (512 mb) virtual address space. the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

32-bit kernel mode, kernel space 3 (kseg3)

in kernel mode, when kx = 0 in the status register and the high-order three bits of the virtual address are 111, the kseg3 virtual address space is selected; it covers the current 2^29-byte (512 mb) virtual address space. the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

table 5-4 64-bit kernel mode segments

status register bit values for all rows: ksu = 00 or exl = 1 or erl = 1, and kx = 1

address bit values              segment name  virtual address                                               physical address                  segment size
a(63:62) = 00                   xkuseg        0x0000 0000 0000 0000 through 0x0000 00ff ffff ffff           tlb map                           1 tb (2^40 bytes)
a(63:62) = 01                   xksseg        0x4000 0000 0000 0000 through 0x4000 00ff ffff ffff           tlb map                           1 tb (2^40 bytes)
a(63:62) = 10                   xkphys        0x8000 0000 0000 0000 through 0xbfff ffff ffff ffff           0x0000 0000 through 0xffff ffff   2^32 bytes
a(63:62) = 11                   xkseg         0xc000 0000 0000 0000 through 0xc000 00ff 7fff ffff           tlb map                           2^40 - 2^31 bytes
a(63:62) = 11, a(61:31) = -1    ckseg0        0xffff ffff 8000 0000 through 0xffff ffff 9fff ffff           0x0000 0000 through 0x1fff ffff   512 mb (2^29 bytes)
a(63:62) = 11, a(61:31) = -1    ckseg1        0xffff ffff a000 0000 through 0xffff ffff bfff ffff           0x0000 0000 through 0x1fff ffff   512 mb (2^29 bytes)
a(63:62) = 11, a(61:31) = -1    cksseg        0xffff ffff c000 0000 through 0xffff ffff dfff ffff           tlb map                           512 mb (2^29 bytes)
a(63:62) = 11, a(61:31) = -1    ckseg3        0xffff ffff e000 0000 through 0xffff ffff ffff ffff           tlb map                           512 mb (2^29 bytes)

for xkphys, refer to 64-bit kernel mode, physical spaces (xkphys).

64-bit kernel mode, user space (xkuseg)

in kernel mode, when kx = 1 in the status register and bits 63:62 of the virtual address are 00, the xkuseg virtual address space is selected; it covers the current 2^40-byte (1 tb) user address space. the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

if the erl bit of the status register is 1, the user address area becomes a 2 gb area that is neither tlb mapped nor cached (i.e., the virtual addresses are used as physical addresses as is). however, this is a function used by the vr4400 to process an ecc error in an exception handler. the vr4300 has no ecc or parity function, so this function is defined only to maintain the compatibility of the vr4300 with the vr4400.

64-bit kernel mode, current supervisor space (xksseg)

in kernel mode, when kx = 1 in the status register and bits 63:62 of the virtual address are 01, the xksseg virtual address space is selected; it covers the current supervisor virtual space. the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.
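the first-level selection in table 5-4 depends only on bits 63:62 of the virtual address, and the mapped user and supervisor segments are valid only in their low 2^40 bytes. the following is a rough sketch of that check, assuming kx = 1 and kernel mode; the enum and function names are hypothetical and do not appear in the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* First-level classification by bits 63:62 (table 5-4).
 * Identifiers are illustrative only. */
typedef enum {
    SEG_XKUSEG,          /* a(63:62) = 00 */
    SEG_XKSSEG,          /* a(63:62) = 01 */
    SEG_XKPHYS,          /* a(63:62) = 10 */
    SEG_XKSEG_OR_COMPAT  /* a(63:62) = 11: xkseg or a compatibility space */
} seg64_t;

static seg64_t classify_top_bits(uint64_t va)
{
    switch (va >> 62) {
    case 0:  return SEG_XKUSEG;
    case 1:  return SEG_XKSSEG;
    case 2:  return SEG_XKPHYS;
    default: return SEG_XKSEG_OR_COMPAT;
    }
}

/* For xkuseg and xksseg, only offsets below 2^40 (1 tb) are valid;
 * the rest of each region is shown as "address error" in figure 5-6. */
static bool offset_within_1tb(uint64_t va)
{
    uint64_t offset = va & 0x3FFFFFFFFFFFFFFFull; /* drop bits 63:62 */
    return offset <= 0x000000FFFFFFFFFFull;
}
```

the sketch stops at the first level on purpose: distinguishing xkseg from the four compatibility spaces needs the additional check on bits 61:31 listed in the table.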
64-bit kernel mode, physical spaces (xkphys)

in kernel mode, when kx = 1 in the status register and bits 63:62 of the virtual address are 10, one of the eight unmapped xkphys address spaces is selected, either cached or uncached. bits 31:0 of the virtual address are used as they are as the physical address. an access in which any of address bits 58:32 is 1 causes an address error. use of the cache is indicated by bits 61 through 59 of the virtual address. table 5-5 shows the eight address spaces and the use of the cache for each.

table 5-5 use of cache and xkphys address space

bits 61:59  use of cache  address
0           used          0x8000 0000 0000 0000 through 0x8000 0000 ffff ffff
1           used          0x8800 0000 0000 0000 through 0x8800 0000 ffff ffff
2           not used      0x9000 0000 0000 0000 through 0x9000 0000 ffff ffff
3           used          0x9800 0000 0000 0000 through 0x9800 0000 ffff ffff
4           used          0xa000 0000 0000 0000 through 0xa000 0000 ffff ffff
5           used          0xa800 0000 0000 0000 through 0xa800 0000 ffff ffff
6           used          0xb000 0000 0000 0000 through 0xb000 0000 ffff ffff
7           used          0xb800 0000 0000 0000 through 0xb800 0000 ffff ffff

64-bit kernel mode, kernel space (xkseg)

in kernel mode, when kx = 1 in the status register and bits 63:62 of the virtual address are 11, the address space selected is one of the following:

  kernel virtual space, xkseg, the current kernel virtual space; the virtual address is extended with the contents of the 8-bit asid field to form a unique virtual address. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

  one of the four 32-bit kernel compatibility spaces, as described in the next section.
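the xkphys rules above (bits 61:59 select cache use, bits 58:32 must be 0, bits 31:0 are the physical address) can be illustrated with a small decoder. this is a sketch under those stated rules only; xkphys_result_t, va_in_xkphys, and decode_xkphys are hypothetical names, and the real hardware raises an address error exception rather than returning a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of decoding an xkphys virtual address (illustrative type). */
typedef struct {
    bool     address_error; /* true when any of bits 58:32 is 1 */
    bool     cached;        /* false only for bits 61:59 = 2 (table 5-5) */
    uint32_t pa;            /* bits 31:0 of the virtual address */
} xkphys_result_t;

/* True when bits 63:62 are 10, i.e. the address selects xkphys. */
static bool va_in_xkphys(uint64_t va)
{
    return (va >> 62) == 2u;
}

static xkphys_result_t decode_xkphys(uint64_t va)
{
    xkphys_result_t r;
    uint32_t attr = (uint32_t)((va >> 59) & 0x7u);       /* bits 61:59 */

    r.address_error = ((va >> 32) & 0x07FFFFFFu) != 0u;  /* bits 58:32 */
    r.cached = (attr != 2u);
    r.pa = (uint32_t)(va & 0xFFFFFFFFu);
    return r;
}
```

for example, 0x9000 0000 1234 5678 decodes to physical address 0x1234 5678 accessed without the cache, while 0x8000 0001 0000 0000 is flagged as an address error because bit 32 is set.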
64-bit kernel mode, compatibility spaces (ckseg1:0, cksseg, ckseg3)

in kernel mode, when kx = 1 in the status register, bits 63:62 of the 64-bit virtual address are 11, and bits 61:32 of the virtual address are all 1 (so that bits 63:32 are 0xffff ffff), bits 31:16 of the virtual address in the 64-bit mode, which are 0x8000 through 0xffff as shown in figure 5-6, select one of the following 512 mb compatibility spaces.

  ckseg0. this space is an unmapped region, compatible with the kseg0 space in 32-bit mode. the k0 field of the config register controls cacheability and coherency.

  ckseg1. this space is an unmapped and uncached region, compatible with the kseg1 space in 32-bit mode.

  cksseg. this space is the current supervisor virtual space, compatible with the ksseg space in 32-bit mode. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

  ckseg3. this space is the current supervisor virtual space, compatible with the kseg3 space in 32-bit mode. this space is referenced via the tlb. whether the cache can be used is determined by the value of the c bits of the tlb entry of each page.

5.3 system control coprocessor

the system control coprocessor (cp0) is implemented as an integral part of the cpu, and supports memory management, address translation, exception handling, and other privileged operations. cp0 contains the registers shown in figure 5-8 plus a 32-entry tlb. the sections that follow describe how the processor uses each of the tlb-related registers.

remark: each register is assigned a number called a register number. for details, refer to chapter 1 general. for the relations among the cp0 function, exception processing, and registers, refer to chapter 6 exception processing.
figure 5-8 cp0 registers and the tlb

(the figure lists the cp0 registers with their register numbers, marked *, and shows the tlb with entries 0 through 31, each 128 or 256 bits wide. the "safe" entries are those bounded by the wired register; refer to 5.4.4 wired register (6).)

registers used with the memory management system:
  index 0, random 1, entrylo0 2, entrylo1 3, pagemask 5, wired 6, entryhi 10, prid 15, config 16, lladdr 17, taglo 28, taghi 29

registers used with exception processing:
  context 4, badvaddr 8, count 9, compare 11, status 12, cause 13, epc 14, watchlo 18, watchhi 19, xcontext 20, parity error 26, cacheerr 27, errorepc 30

5.3.1 format of a tlb entry

figure 5-9 shows the tlb entry formats for both 32- and 64-bit modes. each field of an entry has a corresponding field in the entryhi, entrylo0, entrylo1, or pagemask registers.

figure 5-9 tlb entry format

32-bit mode (128-bit entry)
  bits 127:121  0
  bits 120:109  mask
  bits 108:96   0
  bits 95:77    vpn2
  bit 76        g
  bits 75:72    0
  bits 71:64    asid
  bits 63:58    0
  bits 57:38    pfn
  bits 37:35    c
  bit 34        d
  bit 33        v
  bit 32        0
  bits 31:26    0
  bits 25:6     pfn
  bits 5:3      c
  bit 2         d
  bit 1         v
  bit 0         0

64-bit mode (256-bit entry)
  bits 255:217  0
  bits 216:205  mask
  bits 204:192  0
  bits 191:190  r
  bits 189:168  0
  bits 167:141  vpn2
  bit 140       g
  bits 139:136  0
  bits 135:128  asid
  bits 127:90   0
  bits 89:70    pfn
  bits 69:67    c
  bit 66        d
  bit 65        v
  bit 64        0
  bits 63:26    0
  bits 25:6     pfn
  bits 5:3      c
  bit 2         d
  bit 1         v
  bit 0         0

the formats of the entryhi, entrylo0, entrylo1, and pagemask registers are almost the same as the tlb entry. however, the bit of the entryhi register that corresponds to the g bit of the tlb entry is undefined.

figure 5-10 tlb entry registers (1/2)

pagemask register
  bits 31:25  0
  bits 24:13  mask
  bits 12:0   0

  mask : page comparison mask. determines the virtual page size of the corresponding entry.
  0    : reserved for future use (rfu). must be written as zeroes, and returns zeroes when read.

entryhi register, 32-bit mode
  bits 31:13  vpn2
  bits 12:8   0
  bits 7:0    asid

entryhi register, 64-bit mode
  bits 63:62  r
  bits 61:40  fill
  bits 39:13  vpn2
  bits 12:8   0
  bits 7:0    asid

  vpn2 : virtual page number divided by two (maps to two pages).
  asid : address space id field. an 8-bit field that lets multiple processes share the tlb; each process can use the same virtual addresses.
  r    : region (00: user, 01: supervisor, 11: kernel). used to match bits 63:62 of the virtual address.
  fill : rfu. data written to this area is ignored. 0 is returned when this area is read.
  0    : rfu. must be written as zeroes, and returns zeroes when read.

figure 5-10 tlb entry registers (2/2)

entrylo0 and entrylo1 registers, 32-bit mode
  bits 31:26  0
  bits 25:6   pfn
  bits 5:3    c
  bit 2       d
  bit 1       v
  bit 0       g

entrylo0 and entrylo1 registers, 64-bit mode
  bits 63:26  0
  bits 25:6   pfn
  bits 5:3    c
  bit 2       d
  bit 1       v
  bit 0       g

  pfn : page frame number; the high-order bits of the physical address.
  c   : specifies the tlb page attribute; refer to table 5-6.
  d   : dirty. if this bit is set, the page is marked as dirty and, therefore, writable. this bit is actually a write-protect bit that software can use to prevent alteration of data.
  v   : valid. if this bit is set, it indicates that the tlb entry is valid; otherwise, a tlbl or tlbs miss occurs.
  g   : global. if this bit is set in both entrylo0 and entrylo1, then the processor ignores the asid during tlb lookup.
  0   : rfu. must be written as zeroes, and returns zeroes when read.

whether the cache is used when a page is referenced is specified by the page coherency attribute (c) bits of the tlb. the page attribute is either "cache is used" or "cache is not used", depending on the algorithm selected. table 5-6 shows the page attributes selected by the c bits.

table 5-6 cache algorithm

value of c bits  cache algorithm
0                cache is used
1                cache is used
2                cache is not used
3                cache is used
4                cache is used
5                cache is used
6                cache is used
7                cache is used
5.4 cp0 registers

the following sections describe the cp0 registers that are accessed by the memory management system and by software (each register name is followed by its register number in parentheses).

5.4.1 index register (0)

the index register is a 32-bit, read/write register containing six bits to index an entry in the tlb. the most-significant bit of the register shows the success or failure of a tlb probe (tlbp) instruction. the index register also specifies the tlb entry affected by tlb read (tlbr) or tlb write index (tlbwi) instructions.

although the index field of the index register is six bits wide, only the five least-significant bits (4:0) are used in tlb operations, since the vr4300 tlb has 32 entries. bit 5 is readable and writable, but is ignored during tlb operations.

the value of the index register on reset is undefined. therefore, initialize the index register in software.

figure 5-11 index register

  bit 31     p
  bits 30:6  0
  bits 5:0   index

  p     : probe success or failure. set to 1 when the previous tlb probe (tlbp) instruction was unsuccessful; set to 0 when successful.
  index : index to the tlb entry affected by the tlb read and tlb write index instructions.
  0     : rfu. must be written as zeroes, and returns zeroes when read.

5.4.2 random register (1)

the random register is a read-only register of which six bits are used for referring to the tlb entry. although the random field is six bits wide, only the five low-order bits (4:0) are used in tlb operations, since the vr4300 tlb has 32 entries. bit 5 is readable and writable by software, but is ignored during tlb operations.

this register decrements as each instruction executes, and its values range between an upper and a lower bound, as follows:

  the lower bound is indicated by the contents of the wired register.
  the upper bound is 31.
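the bounds of the random register can be modeled in software as follows. this is only a behavioral sketch: the manual states that the register decrements between the upper bound (31) and the lower bound (the wired register), and the wrap from the lower bound back to 31 is an assumption made here for illustration. tlb_field and random_next are hypothetical names.

```c
#include <stdint.h>

#define TLB_ENTRIES 32u

/* Only bits 4:0 of the random and wired fields take part in tlb
 * operations; bit 5 is ignored. */
static uint32_t tlb_field(uint32_t reg)
{
    return reg & (TLB_ENTRIES - 1u);
}

/* Model of the random field after one more instruction executes.
 * Assumption: on reaching the lower bound (wired), the value wraps
 * back to the upper bound of 31. */
static uint32_t random_next(uint32_t random, uint32_t wired)
{
    uint32_t r = tlb_field(random);
    uint32_t w = tlb_field(wired);

    return (r <= w) ? (TLB_ENTRIES - 1u) : (r - 1u);
}
```

with wired = 8, the modeled value cycles through 31, 30, ..., 9, 8 and back to 31, so entries 0 through 7 are never selected.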
the random register specifies the entry in the tlb that is affected by the tlb write random (tlbwr) instruction. the register does not need to be read for this purpose; however, the register is readable to verify proper operation of the processor.

to simplify testing, the random register is set to the value of the upper bound upon cold reset. this register is also set to the upper bound when the wired register is written. figure 5-12 shows the format of the random register.

figure 5-12 random register

  bits 31:6  0
  bits 5:0   random

  random : tlb random index.
  0      : rfu. must be written as zeroes, and returns zeroes when read.

5.4.3 entryhi (10), entrylo0 (2), entrylo1 (3), and pagemask (5) registers

these registers are used to rewrite the tlb or to check for a matching tlb entry when addresses are translated. if a tlb exception occurs, information on the address that has caused the exception is loaded to these registers. figure 5-10 shows the formats of the entryhi, entrylo0, entrylo1, and pagemask registers.

the values of these registers on reset are undefined. therefore, initialize the registers by software.

entryhi register

the entryhi register is a read/write register and is used to access the high-order bits of the internal tlb. the entryhi register retains the contents of the high-order bits of a tlb entry when a tlb read or write operation is executed. if a tlb miss, tlb invalid, or tlb modification exception occurs, the virtual page number (vpn2) of the virtual address that has caused the exception and the asid are set to the entryhi register. for the details of the tlb exception, refer to chapter 6 exception processing.

asid is used to write or read the asid area of the tlb entry. when an address is translated, it is compared with the asid of the tlb entry as the asid of the virtual address.

to access this register, use the tlbp, tlbwr, tlbwi, or tlbr instruction.
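as an illustration of the entryhi layout in 32-bit mode (vpn2 in bits 31:13, asid in bits 7:0, per figure 5-10), the following sketch builds and splits an entryhi value. the function and macro names are hypothetical, and the sketch assumes a 4 kb page size so that no vpn2 bits are masked by the pagemask register.

```c
#include <stdint.h>

/* Field masks for the 32-bit mode entryhi layout (figure 5-10).
 * Names are illustrative, not taken from the manual. */
#define ENTRYHI_VPN2_MASK 0xFFFFE000u  /* bits 31:13 */
#define ENTRYHI_ASID_MASK 0x000000FFu  /* bits 7:0   */

/* Build an entryhi value from a virtual address and an asid.
 * Bit 12 of the address is dropped because one tlb entry maps an
 * even/odd pair of pages; bits 12:8 of the register stay 0. */
static uint32_t entryhi_make(uint32_t va, uint32_t asid)
{
    return (va & ENTRYHI_VPN2_MASK) | (asid & ENTRYHI_ASID_MASK);
}

/* Extract the 19-bit vpn2 field. */
static uint32_t entryhi_vpn2(uint32_t entryhi)
{
    return entryhi >> 13;
}

/* Extract the 8-bit asid field. */
static uint32_t entryhi_asid(uint32_t entryhi)
{
    return entryhi & ENTRYHI_ASID_MASK;
}
```

a value built this way is what software would move to entryhi before a tlbp or tlbwi instruction; the move itself (mtc0) is outside the scope of this sketch.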
entrylo0 and entrylo1 registers

entrylo consists of two registers: entrylo0 for even virtual pages and entrylo1 for odd virtual pages. the entrylo0 and entrylo1 registers are read/write registers and are used to access the low-order bits of the internal tlb. when a tlb read/write operation is executed, entrylo0 and entrylo1 access the contents of the low-order bits of the tlb entry for the even and odd pages, respectively.

pagemask register

the pagemask register is a read/write register used for reading from or writing to the tlb; it holds a comparison mask that sets the page size for each tlb entry, as shown in table 5-7. seven page sizes are selectable.

tlb read and write operations use this register as either a destination or a source; when virtual addresses are presented for translation into physical addresses, the corresponding bits among bits 24:13 that are used in the comparison are masked. when the mask field is not one of the values shown in table 5-7, the operation of the tlb is undefined.

table 5-7 mask field values for page sizes

page size  bits 24 23 22 21 20 19 18 17 16 15 14 13
4 kb            0  0  0  0  0  0  0  0  0  0  0  0
16 kb           0  0  0  0  0  0  0  0  0  0  1  1
64 kb           0  0  0  0  0  0  0  0  1  1  1  1
256 kb          0  0  0  0  0  0  1  1  1  1  1  1
1 mb            0  0  0  0  1  1  1  1  1  1  1  1
4 mb            0  0  1  1  1  1  1  1  1  1  1  1
16 mb           1  1  1  1  1  1  1  1  1  1  1  1

5.4.4 wired register (6)

the wired register is a read/write register that specifies the boundary between the wired and random entries of the tlb as shown in figure 5-13. wired entries are fixed, nonreplaceable entries, which cannot be overwritten by a tlbwr (tlb write random) operation. they can, however, be overwritten by a tlbwi (tlb write indexed) instruction. random entries can be overwritten.

figure 5-13 wired register boundary

although the wired field is six bits wide, only the five low-order bits are used in tlb operations, since the vr4300 tlb has 32 entries. bit 5 is readable and writable by software, but is ignored during tlb operations.

the wired register is set to 0 upon cold reset.
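the mask values in table 5-7 follow a regular pattern: each step up in page size multiplies the size by four and sets two more mask bits. a small sketch of that relation is shown below, assuming the pagemask layout of figure 5-10 (mask in bits 24:13). pagemask_for_size and PAGEMASK_INVALID are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Returned for a page size that is not listed in table 5-7. */
#define PAGEMASK_INVALID 0xFFFFFFFFu

/* Value to write to the pagemask register for a given page size in
 * bytes. Only the seven sizes of table 5-7 (4 kb, 16 kb, ..., 16 mb)
 * are accepted, since other mask values make tlb operation undefined. */
static uint32_t pagemask_for_size(uint32_t page_size)
{
    uint32_t size;

    for (size = 4096u; size <= 16u * 1024u * 1024u; size <<= 2) {
        if (size == page_size) {
            /* (size / 4 kb) - 1 gives the run of 1 bits; the mask
             * field starts at bit 13 of the register. */
            return ((size >> 12) - 1u) << 13;
        }
    }
    return PAGEMASK_INVALID;
}
```

for example, a 16 kb page yields 0x0000 6000 (bits 14:13 set), which matches the second row of the table.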
writing this register also sets the random register to the value of its upper bound of 31 (refer to 5.4.2 random register (1)). figure 5-14 shows the format of the wired register.

(figure 5-13 shows tlb entries 0 through 31: the entries from the value of the wired register up to 31 are the range of random entries, and the entries below the value of the wired register are the range of wired entries.)

figure 5-14 wired register

  bits 31:6  0
  bits 5:0   wired

  wired : tlb wired boundary.
  0     : rfu. must be written as zeroes, and returns zeroes when read.

5.4.5 processor revision identifier (prid) register (15)

the 32-bit, read-only processor revision identifier (prid) register contains information identifying the implementation and revision level of the cpu and cp0. figure 5-15 shows the format of the prid register.

figure 5-15 processor revision identifier register

the processor revision number is a value in the format of yx. y is the major revision number contained in bits 7:4, and x is the minor revision number contained in bits 3:0.

the processor revision number identifies the revision of the chip. however, a revision of the chip is not always reflected in the prid register. conversely, a change in the revision number does not always reflect an actual change of the chip. therefore, develop your program so that it does not depend on the processor revision number area.

5.4.6 config register (16)

this register displays or sets various processor statuses of the vr4300. although consideration is given to maintaining compatibility of this register with the config register of the vr4400, some bits of this register are fixed to 0.

the ep and be areas are initialized on cold reset. these areas can be read or written by software. the default values of these areas are as follows:

  ep: 0000
  be: 1

the cu bit and k0 area can be read or written by software. however, because this bit and area are not initialized, the user must set default values to them after reset.
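the yx revision format of the prid register described in 5.4.5 can be split into its fields as sketched below. the position of the imp field (bits 15:8) is taken from figure 5-15; the function names are hypothetical. as the text warns, a program should not depend on the revision number, so a decoder like this is suitable for diagnostics or logging only.

```c
#include <stdint.h>

/* imp: processor id number, bits 15:8 (0x0b for the vr4300 series). */
static uint32_t prid_imp(uint32_t prid)
{
    return (prid >> 8) & 0xFFu;
}

/* y: major revision number, bits 7:4 of the rev field. */
static uint32_t prid_rev_major(uint32_t prid)
{
    return (prid >> 4) & 0xFu;
}

/* x: minor revision number, bits 3:0 of the rev field. */
static uint32_t prid_rev_minor(uint32_t prid)
{
    return prid & 0xFu;
}
```

the value 0x0b22 used in the checks for this sketch is an arbitrary example, not a revision number documented for any particular chip.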
prid register fields (figure 5-15)

  bits 31:16  0
  bits 15:8   imp
  bits 7:0    rev

  imp : processor id number (0x0b for the vr4300 series)
  rev : processor revision number
  0   : rfu. must be written as zeroes, and returns zeroes when read.

the values of the ep and be areas can be changed only when initialization is executed in the non-cache area immediately after cold reset and before a store instruction is executed. the operation is not guaranteed if the values of these areas are changed at any other time.

figure 5-16 shows the format of the config register.

figure 5-16 config register

  bit 31      0
  bits 30:28  ec
  bits 27:24  ep
  bits 23:16  00000110
  bit 15      be
  bits 14:4   11001000110
  bit 3       cu
  bits 2:0    k0

ec : operating frequency ratio (read-only). the value displayed corresponds to the frequency ratio set by the divmode pins on power application. (for details of divmode pin setting, refer to table 2-2 clock/control interface signals.) the ratios are masterclock:pclock.

  µpd30200-80 (vr4305)
    110: 1:1
    111: rfu
    000: 1:2
    001: 1:3
    others: rfu

  µpd30200-100 (vr4300)
    110: rfu
    111: 1:1.5
    000: 1:2
    001: 1:3
    others: rfu

  µpd30200-133 (vr4300)
    110: 1:4
    111: rfu
    000: 1:2
    001: 1:3
    others: rfu

  µpd30210-133 (vr4310)
    010: 1:5
    011: 1:6
    100: rfu
    101: 1:3
    110: 1:4
    111: rfu
    000: 1:2
    001: 1:3

  µpd30210-167 (vr4310)
    010: 1:5
    011: 1:6
    100: 1:2.5
    101: 1:3
    110: 1:4
    111: rfu
    000: 1:2
    001: 1:3

ep : sets transfer data pattern (single/block write request).
    0: d (default on cold reset)
    6: dxxdxx: 2 doublewords/6 cycles
    others: rfu

be : sets bigendianmem (endianness).
    0: little endian
    1: big endian (default on cold reset)

cu : rfu. however, can be read or written by software.

k0 : sets coherency algorithm of kseg0 (refer to table 5-6 cache algorithm).
    010: cache is not used
    others: cache is used

1 : returns 1 when read.
0 : returns 0 when read.

caution: if the be bit of this register is changed by using the mtc0 instruction, insert two or more nop instructions, or instructions other than load/store instructions, between the mtc0 instruction and the following load/store instruction.

5.4.7 load linked address (lladdr) register (17)

the read/write load linked address (lladdr) register contains the physical address read by the most recent load linked instruction. this register is for diagnostic purposes only.

figure 5-17 shows the format of the lladdr register. the paddr area holds bits 31:4 of the physical address, pa(31:4), read on execution of the ll instruction, zero-extended in the high-order four bits. the contents of the lladdr register are undefined on reset.

figure 5-17 lladdr register

  bits 31:0  paddr

  paddr : stores bits 31 through 4 of the physical address read by the last ll instruction to bits 27 through 0, and 0 to bits 31 through 28.

5.4.8 cache tag registers [taglo (28) and taghi (29)]

the taglo and taghi registers are 32-bit read/write registers that hold the primary cache tag for cache initialization, cache diagnostics, or cache error processing. the tag registers are written by the cache and mtc0 instructions.

figure 5-18 shows the format of these registers. the contents of these registers are undefined on reset.

cautions
1. if 10 is written to pstate by using the cache (index_store_tag) instruction, the cache is clean. however, 11 is read when the pstate value is read by using the cache (index_load_tag) instruction.
2. if 01 is written to pstate by using the cache (index_store_tag) instruction, the cache operation is not guaranteed.
3. if 11 is written to pstate by using the cache (index_store_tag) instruction, the cache is dirty.
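the field layout of the config register in figure 5-16 (ec in bits 30:28, ep in bits 27:24, be in bit 15, cu in bit 3, k0 in bits 2:0) lends itself to a few small accessors. the following is a sketch only; the function names are hypothetical, and the test value used with it is constructed from the fixed bit patterns in the figure rather than read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* ec: operating frequency ratio code, bits 30:28 (read-only). */
static uint32_t config_ec(uint32_t config)
{
    return (config >> 28) & 0x7u;
}

/* ep: transfer data pattern, bits 27:24. */
static uint32_t config_ep(uint32_t config)
{
    return (config >> 24) & 0xFu;
}

/* be: 1 = big endian (default on cold reset), 0 = little endian. */
static bool config_big_endian(uint32_t config)
{
    return ((config >> 15) & 1u) != 0u;
}

/* k0: coherency algorithm of kseg0, bits 2:0. */
static uint32_t config_k0(uint32_t config)
{
    return config & 0x7u;
}

/* kseg0 bypasses the cache only when k0 = 010; every other
 * value means "cache is used" (table 5-6). */
static bool config_kseg0_cached(uint32_t config)
{
    return config_k0(config) != 2u;
}
```

because k0 is not initialized on reset, software would normally write a chosen k0 value before relying on the result of a check like config_kseg0_cached.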
figure 5-18 taglo and taghi register

taglo
  bits 31:28  0
  bits 27:8   ptaglo
  bits 7:6    pstate
  bits 5:0    0

taghi
  bits 31:0   0

  ptaglo : physical address bits 31:12
  pstate : specifies the primary cache state
           data cache: 11 = valid, 00 = invalid
           instruction cache: 10 = valid, 00 = invalid
           others = undefined
  0      : rfu. must be written as zeroes; returns zeroes when read.

5.4.9 virtual-to-physical address translation process

during virtual-to-physical address translation, the cpu compares the 8-bit asid (if the global bit, g, is not set) of the virtual address to the asid of the tlb entry to see if there is a match. one of the following comparisons is also made:

  in 32-bit mode, the high-order bits* of the virtual address are compared to the contents of the tlb entry, vpn2 (virtual page number divided by two).

  in 64-bit mode, the high-order bits* of the virtual address are compared to the contents of the tlb entry, vpn2 (virtual page number divided by two).

if a tlb entry matches, the physical address and access control bits (c, d, and v) are retrieved from the matching tlb entry. while the v bit of the entry must be set for a valid translation to take place, it is not involved in the determination of a matching tlb entry.

figure 5-19 illustrates the tlb address translation process.

* the number of bits differs depending on the page size. here are examples where the page size is 16 mb and 4 kb:

mode         page size 16 mb           page size 4 kb
32-bit mode  a(31:25)                  a(31:13)
64-bit mode  a63, a62, and a(39:25)    a63, a62, and a(39:13)

figure 5-19 tlb address translation

(flowchart. input: virtual address with vpn and asid. the operating mode (user, supervisor, or kernel) and the legality of the address are checked first; an illegal address causes an address error exception. for a mapped address, the tlb is searched for a vpn match, with either g = 1 (global) or an asid match; if no entry matches, a tlb miss exception occurs for a 32-bit address and an xtlb miss exception otherwise. if v = 0 in the matching entry, a tlb invalid exception occurs. if the access is a write and d = 0, a tlb mod exception occurs. finally, an uncached access goes to main memory and any other access goes to the cache. output: physical address.)

5.4.10 tlb misses

if there is no tlb entry that matches the virtual address, a tlb miss exception occurs.* if the access control bits (d and v) indicate that the access is not valid, a tlb modification exception or tlb invalid exception occurs. if the c bits equal 010, the physical address that is retrieved accesses main memory, bypassing the cache.

* tlb miss exceptions are described in chapter 6 exception processing.

5.4.11 tlb instructions

the following instructions are used to control the tlb.

tlbp (translation lookaside buffer probe)
loads the number of the tlb entry that matches the contents of the entryhi register to the index register. if no tlb entry matches, the most significant bit of the index register is set.

tlbr (translation lookaside buffer read)
writes the contents of the tlb entry indicated by the index register to the entryhi, entrylo0, entrylo1, and pagemask registers.

tlbwi (translation lookaside buffer write index)
writes the contents of the entryhi, entrylo0, entrylo1, and pagemask registers to the tlb entry indicated by the contents of the index register.

tlbwr (translation lookaside buffer write random)
writes the contents of the entryhi, entrylo0, entrylo1, and pagemask registers to the tlb entry indicated by the contents of the random register.

chapter 6 exception processing

this chapter describes the exception processing and the hardware used for the exception processing. for the fpu exception, refer to chapter 8 floating-point exceptions.
6.1 exception processing operation

the processor receives exceptions from a number of sources, including translation lookaside buffer (tlb) misses, arithmetic overflows, i/o interrupts, and system calls. when the cpu detects an exception, the normal sequence of instruction execution is suspended and the processor enters kernel mode (refer to chapter 5 memory management system for a description of system operating modes).

the processor then disables interrupts and forces execution of a software exception process (called an exception handler) located at a fixed address. the handler saves the context of the processor, including the contents of the program counter, the current operating mode (user or supervisor), and the status of the interrupts (enabled or disabled). this context is saved so it can be restored when the exception processing has been performed.

when an exception occurs, the cpu loads the exception program counter (epc) register with a location where execution can restart after the exception processing has been performed. the restart location in the epc register is the address of the instruction that caused the exception. if the instruction was executing in a branch delay slot, the cpu loads the epc register with the address of the branch instruction immediately preceding the branch delay slot.

for the exception processing, the following modes can be set.

  interrupt enable (ie)
  base operating mode (user, supervisor, or kernel)
  exception level (normal or exception, as indicated by the exl bit in the status register)
  error level (normal or error, as indicated by the erl bit in the status register)

each setting condition is described below.

interrupt enable

interrupts are enabled if the following conditions are satisfied.
  ie (interrupt enable bit) = 1
  exl bit = 0, erl bit = 0
  bit of corresponding im area in status register = 1

base operating mode

the operating mode that is the basis when the exception level is normal (0) is specified by the ksu area of the status register.

exception/error level

the kernel mode is set when either the exl bit or the erl bit is set to 1.

when execution returns from exception processing, the exception level is reset to normal (0) (for details, refer to the eret instruction in chapter 16 cpu instruction set details).

in addition to the above, registers that hold information on addresses, causes, and statuses during exception processing are provided. for details, refer to 6.3 exception processing registers. for details of the exception processing, refer to 6.4 exception details.

6.2 precision of exceptions

vr4300 exceptions are logically precise; the instruction that causes an exception and all those that follow it are aborted and can be re-executed after servicing the exception. when succeeding instructions are killed, exceptions associated with those instructions are also killed. exceptions are not taken in the order detected, but in instruction fetch order.

6.3 exception processing registers

this section describes the cp0 registers that are used in exception processing. table 6-1 lists these registers, along with their numbers; each register has a unique identification number that is referred to as its register number. the remaining cp0 registers are used in memory management, as described in chapter 5 memory management system.

software examines the cp0 registers to determine the cause of the exception and the state of the cpu at the time the exception occurred. the registers in table 6-1 are used in exception processing, and are described in the sections that follow.
Table 6-1 CP0 Exception Processing Registers

  Register name                                Reg. no.
  Context                                      4
  BadVAddr (Bad Virtual Address)               8
  Count                                        9
  Compare                                      11
  Status                                       12
  Cause                                        13
  EPC (Exception Program Counter)              14
  WatchLo                                      18
  WatchHi                                      19
  XContext                                     20
  PErr*                                        26
  CacheErr (Cache Error)*                      27
  ErrorEPC (Error Exception Program Counter)   30

  * This register is defined to maintain compatibility between the VR4300 and the VR4200, and is not used by the VR4300 hardware.

CP0 hazards

With the general-purpose registers of the CPU, when the result of an operation is to be used by the next instruction, the hardware generates a stall and waits until the result can be used. The CP0 registers and the TLB, however, do not generate a stall. A value stored to a CP0 register may not be visible to the immediately following instruction, because the value is written to the register several cycles later. Take this into account when writing programs that set values in the CP0 registers and the TLB (for details, refer to Chapter 19 Coprocessor 0 Hazards).

6.3.1 Context Register (4)

The Context register is a read/write register containing a pointer to an entry in the page table entry (PTE) array in memory; this array is an operating system data structure that stores virtual-to-physical address translations. When there is a TLB miss, the operating system loads the TLB with the missing translation from the PTE array. The Context register is used by the TLB miss exception handler to load the TLB entry. The Context register duplicates some of the information provided in the BadVAddr register, but the information is arranged in a form that is more useful for a software TLB exception handler. Figure 6-1 shows the format of the Context register.

Figure 6-1 Context Register

The Context register bit fields are described below.

The BadVPN2 field is written by hardware on a TLB miss.
It contains the virtual page number, divided by 2 (VPN2), of the most recent virtual address that did not have a valid translation.

The PTEBase area can be read or written and is controlled by the operating system. It is used only by software, as a pointer to the current PTE array in memory.

The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that caused the TLB miss; bit 12 is excluded because a single TLB entry maps an even-odd page pair. For a 4 KB page size, this format can be used directly as a pointer into a table of paired 8-byte PTEs. For page sizes of 16 KB or larger, shifting and masking this value produces the correct PTE reference address.

Context register format (Figure 6-1), 32-bit mode:

  Bits    Width  Field
  31:23   9      PTEBase
  22:4    19     BadVPN2
  3:0     4      0

Context register format (Figure 6-1), 64-bit mode:

  Bits    Width  Field
  63:23   41     PTEBase
  22:4    19     BadVPN2
  3:0     4      0

  PTEBase : Base address of the page table entry array
  BadVPN2 : Page number of the virtual address whose translation is invalid, divided by 2
  0       : RFU. Must be written as zeroes; returns zeroes when read.

6.3.2 BadVAddr Register (8)

The Bad Virtual Address (BadVAddr) register is a read-only register that holds the virtual address that most recently failed address translation, or the virtual address at which an addressing error occurred. Figure 6-2 shows the format of the BadVAddr register.

Caution: A bus error exception does not update this register, because a bus error is not an address error exception.

Figure 6-2 BadVAddr Register

6.3.3 Count Register (9)

The read/write Count register acts as a timer, incrementing at a constant rate (half the PClock speed) whether or not instructions are being executed. This register is free-running: when it reaches all ones, it rolls over to zero and continues counting. It can be used for diagnostic purposes, system initialization, or synchronization between processes. Figure 6-3 shows the format of the Count register.
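The Context register layout in 6.3.1 (32-bit mode) can be illustrated by modelling the value software would read after a TLB miss. This is only a sketch under stated assumptions: the function name `context_after_miss` is hypothetical, and it simply combines a PTEBase value chosen by the operating system with the BadVPN2 bits that hardware derives from the faulting address.

```c
#include <stdint.h>

/* Field masks for the 32-bit mode Context register (Figure 6-1). */
#define CTX_PTEBASE_MASK 0xFF800000u  /* bits 31:23 */
#define CTX_BADVPN2_MASK 0x007FFFF0u  /* bits 22:4  */

/* Hypothetical model: Context value after a TLB miss at vaddr,
 * given the PTEBase bits previously written by the operating system. */
static uint32_t context_after_miss(uint32_t ptebase, uint32_t vaddr)
{
    /* Virtual address bits 31:13 land in bits 22:4, which is a right
     * shift by 9; bit 12 (even/odd page select) is masked away. */
    uint32_t badvpn2 = (vaddr >> 9) & CTX_BADVPN2_MASK;

    return (ptebase & CTX_PTEBASE_MASK) | badvpn2;
}
```

Because BadVPN2 starts at bit 4, consecutive even-odd page pairs are 16 bytes apart, which matches a table holding two 8-byte PTEs per pair when the page size is 4 KB. The even and the odd page of one pair yield the same Context value.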
Figure 6-3 Count Register

  Bits    Width  Field
  31:0    32     Count

  Count : Latest count value (incremented at half the PClock frequency)

BadVAddr register format (Figure 6-2):

  32-bit mode:  bits 31:0 (32 bits)  Bad virtual address
  64-bit mode:  bits 63:0 (64 bits)  Bad virtual address

  BadVAddr : Virtual address at which an address error occurred last, or which failed in address translation

6.3.4 Compare Register (11)

The Compare register is used to generate a timer interrupt; it holds a stable value that does not change on its own. When the value of the Compare register equals the value of the Count register (refer to 6.3.3), interrupt bit IP(7) in the Cause register is set. This causes an interrupt in the DF stage as soon as the interrupt is enabled.

Writing a value to the Compare register clears the timer interrupt as a side effect.

For diagnostic purposes, the Compare register is a read/write register. In normal use, however, it is only written. Figure 6-4 shows the format of the Compare register.

Figure 6-4 Compare Register

  Bits    Width  Field
  31:0    32     Compare

  Compare : Value to be compared with the Count register

6.3.5 Status Register (12)

The Status register (SR) is a read/write register that contains the operating mode, interrupt enabling, and diagnostic states of the processor. Figure 6-5 shows the format of the entire register.

Figure 6-5 Status Register

  Bits    Width  Field
  31:28   4      CU (CU3:CU0)
  27      1      RP
  26      1      FR
  25      1      RE
  24:16   9      DS
  15:8    8      IM(7:0)
  7       1      KX
  6       1      SX
  5       1      UX
  4:3     2      KSU
  2       1      ERL
  1       1      EXL
  0       1      IE

  * The low power mode is supported only in the 100 MHz model of the VR4300 and in the VR4305. Fix the RP bit to 0 in the 133 MHz model of the VR4300 and in the VR4310.

  CU      : Controls the usability of each of the four coprocessor unit numbers (0: unusable, 1: usable). CP0 is always usable in Kernel mode, regardless of the setting of the CU0 bit. CP2 and CP3 are reserved for future expansion.
  RP      : Enables low-power operation by reducing the internal clock frequency and the system interface clock frequency to one-quarter speed (0: normal, 1: low power mode)*. For details, refer to 15.1.2 Low Power Mode.
  FR      : Enables additional floating-point registers (0: 16 registers, 1: 32 registers).
  RE      : Reverse-endian bit; enables reversal of the system endianness in User mode (0: disabled, 1: reversed).
  DS      : Diagnostic status field (see Figure 6-6 for details).
  IM(7:0) : Interrupt mask field; enables external, internal, coprocessor, or software interrupts (0: disabled, 1: enabled).
            IM(7)   : Mask bit for the timer interrupt
            IM(6:2) : Mask bits for external interrupts Int[4:0] or external write requests
            IM(1:0) : Mask bits for software interrupts and IP(1:0) of the Cause register
  KX      : Enables 64-bit addressing in Kernel mode. When this bit is set, an XTLB miss exception is generated on TLB misses in the Kernel mode address space (0: 32-bit, 1: 64-bit). 64-bit operation is always valid in Kernel mode.
  SX      : Enables 64-bit addressing and operations in Supervisor mode. When this bit is set, an XTLB miss exception is generated on TLB misses in the Supervisor mode address space (0: 32-bit, 1: 64-bit).
  UX      : Enables 64-bit addressing and operations in User mode. When this bit is set, an XTLB miss exception is generated on TLB misses in the User mode address space (0: 32-bit, 1: 64-bit).
  KSU     : Specifies and indicates the mode bits (10: User, 01: Supervisor, 00: Kernel).
  ERL     : Specifies and indicates the error level (0: normal, 1: error).
  EXL     : Specifies and indicates the exception level (0: normal, 1: exception).
  IE      : Specifies and indicates global interrupt enable (0: disable interrupts, 1: enable interrupts).

Figure 6-6 shows the format of the self-diagnostic status (DS) area. All the bits in the DS area, except the TS bit, can be read or written.
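The KSU, EXL, ERL, KX, SX, and UX descriptions above can be combined into a small decoder for the current operating mode and addressing width. The following is an illustrative sketch only: the names `cpu_mode`, `current_mode`, and `addressing_is_64bit` are hypothetical, and the KSU = 11 encoding, which the field description does not list, is treated as reserved by assumption.

```c
#include <stdint.h>

enum cpu_mode { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_RESERVED };

/* Decode the operating mode from a Status register value.
 * EXL (bit 1) or ERL (bit 2) forces Kernel mode; otherwise KSU (bits 4:3)
 * selects the base operating mode. */
static enum cpu_mode current_mode(uint32_t status)
{
    uint32_t exl = (status >> 1) & 1u;
    uint32_t erl = (status >> 2) & 1u;
    uint32_t ksu = (status >> 3) & 3u;

    if (exl || erl)
        return MODE_KERNEL;

    switch (ksu) {
    case 0:  return MODE_KERNEL;      /* 00 */
    case 1:  return MODE_SUPERVISOR;  /* 01 */
    case 2:  return MODE_USER;        /* 10 */
    default: return MODE_RESERVED;    /* 11: not listed; assumption */
    }
}

/* Returns 1 when 64-bit addressing is enabled for the current mode:
 * KX (bit 7) for Kernel, SX (bit 6) for Supervisor, UX (bit 5) for User. */
static int addressing_is_64bit(uint32_t status)
{
    switch (current_mode(status)) {
    case MODE_KERNEL:     return (int)((status >> 7) & 1u);
    case MODE_SUPERVISOR: return (int)((status >> 6) & 1u);
    case MODE_USER:       return (int)((status >> 5) & 1u);
    default:              return 0;
    }
}
```

Note that a process with KSU = 10 and UX = 1 stops using 64-bit addressing the moment EXL is set, unless KX is also set, because the processor is then in Kernel mode.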
Figure 6-6 Self-Diagnostic Status Field

  Bits    Width  Field
  24      1      ITS
  23      1      0
  22      1      BEV
  21      1      TS
  20      1      SR
  19      1      0
  18      1      CH
  17      1      CE
  16      1      DE

  ITS    : Enables instruction trace support. For details, refer to 9.3.5 Instruction Trace Support.
  BEV    : Controls the location of the TLB miss and general-purpose exception vectors (0: normal, 1: bootstrap).
  TS     : Indicates that a TLB shutdown has occurred (read-only); used to avoid damage to the TLB if more than one TLB entry matches a single virtual address (0: has not occurred, 1: has occurred). After a TLB shutdown, the processor must be reset to restart. A TLB shutdown can occur even when a matching TLB entry is marked invalid (the V bit of the entry is cleared).
  SR     : 0: a soft reset or NMI has not occurred. 1: a soft reset or NMI has occurred.
  CH     : CP0 condition bit (0: false, 1: true). Read/write access by software only; not accessible by hardware.
  CE, DE : These bits are defined to maintain compatibility with the VR4200, and are not used by the VR4300 hardware.
  0      : RFU. Must be written as zeroes; returns zeroes when read.

Fields of the Status register set the modes and access states described in the sections that follow.

Instruction trace support

The VR4300 can output the physical address of the branch destination on SysAD(31:0) when the instruction address is changed internally by a branch or jump instruction or by the occurrence of an exception. To use this function, set the ITS bit to 1.

To output the physical address of the branch destination, an instruction cache miss is forcibly generated in the following cases:

  - The branch condition is satisfied when a branch instruction is executed.
  - The value of the PC is changed by a jump instruction or by the occurrence of an exception.

When the instruction cache miss is generated, a processor block read request is issued on SysAD(31:0), which allows an external device to observe the change of address.
Return response data for this processor block read request in the same manner as for an ordinary request. The address that is output is not the value of the PC (a virtual address) but a physical address.

Interrupt enable

Interrupts are enabled when all of the following conditions are satisfied:

  - IE = 1
  - EXL = 0
  - ERL = 0
  - The corresponding bit of IM is set to 1

Operating modes

The following Status register bit settings are required for User, Kernel, and Supervisor modes.

  - The processor is in User mode when KSU = 10, EXL = 0, and ERL = 0.
  - The processor is in Supervisor mode when KSU = 01, EXL = 0, and ERL = 0.
  - The processor is in Kernel mode when KSU = 00, or EXL = 1, or ERL = 1.

32- and 64-bit modes

The following Status register bit settings select 32- or 64-bit operation for the User, Kernel, and Supervisor operating modes. Enabling 64-bit operation permits the execution of 64-bit opcodes and the translation of 64-bit addresses. 64-bit operation can be set independently for User, Kernel, and Supervisor modes.

  - 64-bit addressing for Kernel mode is enabled when KX = 1. 64-bit operations are always valid in Kernel mode.
  - 64-bit addressing and operations are enabled for Supervisor mode when SX = 1.
  - 64-bit addressing and operations are enabled for User mode when UX = 1.

Kernel address space accesses

Access to the kernel address space is allowed when the processor is in Kernel mode.

Supervisor address space accesses

Access to the supervisor address space is allowed when the processor is in Kernel or Supervisor mode.

User address space accesses

Access to the user address space is allowed in any of the three operating modes.

Status on reset

The contents of the Status register on reset are undefined except for the following bits:

  - TS and RP = 0
  - ERL and BEV = 1
  - SR = 0 on cold reset; SR = 1 on soft reset or NMI

Inverting endian

The VR4300 is set to big endian at reset.
After that, the endian setting can be changed by using the BE bit of the Config register.

  - When RE bit = 1: The endian setting in Kernel and Supervisor modes is specified by the BE bit of the Config register. The endian setting in User mode is the opposite of the specified setting.
  - When RE bit = 0: The endian setting in Kernel, Supervisor, and User modes is specified by the BE bit of the Config register.

6.3.6 Cause Register (13)

The Cause register is a 32-bit read/write register that holds the cause of the most recent exception. The 5 bits in the exception code area of this register indicate the cause of the exception (refer to Table 6-2). The remaining areas hold detailed information on a specific exception. All bits except IP1 and IP0 are read-only. The IP1 and IP0 bits are used to generate software interrupts.

Figure 6-7 shows the format of the Cause register, and Table 6-2 describes the exception code area.

Figure 6-7 Cause Register

  Bits    Width  Field
  31      1      BD
  30      1      0
  29:28   2      CE
  27:16   12     0
  15:8    8      IP(7:0)
  7       1      0
  6:2     5      ExcCode
  1:0     2      0

  BD      : Indicates whether the last exception occurred in a branch delay slot (1: delay slot, 0: normal).
  CE      : Coprocessor unit number referenced when a coprocessor unusable exception has occurred. Undefined for any other exception.
  IP(7:0) : Indicates that an interrupt is pending (1: interrupt pending, 0: no interrupt).
            IP(7)   : Timer interrupt
            IP(6:2) : External normal interrupts, controlled by Int[4:0] or external write requests
            IP(1:0) : Software interrupts. Only these bits can cause an interrupt exception by being set to 1 by software.
  ExcCode : Exception code field (refer to Table 6-2 for details).
  0       : RFU. Must be written as zeroes; returns zeroes when read.
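A general exception handler typically starts by splitting the Cause register into the fields shown in Figure 6-7. The sketch below illustrates that decoding in C; the structure and function names are hypothetical, and `pending_unmasked` assumes the usual handler practice of combining IP(7:0) with IM(7:0) of the Status register, both of which occupy bits 15:8.

```c
#include <stdint.h>

/* Decoded Cause register fields (bit positions from Figure 6-7). */
struct cause_fields {
    unsigned bd;       /* bit 31: exception occurred in a branch delay slot */
    unsigned ce;       /* bits 29:28: coprocessor unit number */
    unsigned ip;       /* bits 15:8: pending interrupt requests IP(7:0) */
    unsigned exccode;  /* bits 6:2: exception code */
};

/* Hypothetical helper: split a raw Cause value into its fields. */
static struct cause_fields decode_cause(uint32_t cause)
{
    struct cause_fields f;

    f.bd      = (cause >> 31) & 0x1u;
    f.ce      = (cause >> 28) & 0x3u;
    f.ip      = (cause >> 8)  & 0xFFu;
    f.exccode = (cause >> 2)  & 0x1Fu;
    return f;
}

/* Interrupt requests that are both pending (IP) and unmasked (IM).
 * IP in Cause and IM in Status share bit positions 15:8. */
static unsigned pending_unmasked(uint32_t cause, uint32_t status)
{
    return (unsigned)((cause & status & 0xFF00u) >> 8);
}
```

A handler would dispatch on `exccode` and, for code 0 (interrupt), scan the result of `pending_unmasked` to find which request to service.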
Table 6-2 Cause Register ExcCode Field

  Exception
  code value  Mnemonic  Description
  0           Int       Interrupt
  1           Mod       TLB modification exception
  2           TLBL      TLB miss exception (load or instruction fetch)
  3           TLBS      TLB miss exception (store)
  4           AdEL      Address error exception (load or instruction fetch)
  5           AdES      Address error exception (store)
  6           IBE       Bus error exception (instruction fetch)
  7           DBE       Bus error exception (data reference: load or store)
  8           Sys       Syscall exception
  9           Bp        Breakpoint exception
  10          RI        Reserved instruction exception
  11          CpU       Coprocessor unusable exception
  12          Ov        Arithmetic overflow exception
  13          Tr        Trap exception
  14                    RFU
  15          FPE       Floating-point exception
  16-22                 RFU
  23          WATCH     Watch exception
  24-31                 RFU

The VR4300 has eight interrupt requests, IP7 through IP0. These interrupt requests are used for the following purposes.

IP7
  Indicates whether a timer interrupt request has been issued. This interrupt request is set when the contents of the Count register become equal to those of the Compare register.

IP6 through IP2
  Reflect the logical sum (OR) of two internal registers of the VR4300. One is the register that latches the status of the interrupt request pins in each cycle; the other is a register that is written by an external write request from the system interface.

IP1 and IP0
  Set or clear a software interrupt request when software manipulates each bit.

For details, refer to Chapter 14 Interrupts.

The floating-point exception uses the exception code contained in the floating-point control/status register (refer to Chapter 8 Floating-Point Exceptions).

6.3.7 Exception Program Counter (EPC) Register (14)

The Exception Program Counter (EPC) is a read/write register that contains the address at which processing resumes after an exception has been serviced.
The EPC register contains either:

  - the virtual address of the instruction that was the direct cause of the exception, or
  - the virtual address of the immediately preceding branch or jump instruction (when the instruction that was the direct cause of the exception is in a branch delay slot, and the branch delay bit in the Cause register is set).

The EXL bit in the Status register is set to 1 to keep the processor from overwriting the address of the exception-causing instruction held in the EPC register if another exception occurs.

Figure 6-8 shows the format of the EPC register.

Figure 6-8 EPC Register

  32-bit mode:  bits 31:0 (32 bits)  EPC
  64-bit mode:  bits 63:0 (64 bits)  EPC

  EPC : Address from which program execution resumes after exception processing

6.3.8 WatchLo (18) and WatchHi (19) Registers

The VR4300 provides a debugging feature that detects references to a selected physical address; load and store operations to that address cause a Watch exception. Figure 6-9 shows the format of the WatchLo and WatchHi registers. Initialize these registers in software, because their values are undefined on reset.

Figure 6-9 WatchLo and WatchHi Registers

  WatchLo register:

  Bits    Width  Field
  31:3    29     PAddr0
  2       1      0
  1       1      R
  0       1      W

  WatchHi register:

  Bits    Width  Field
  31:4    28     0
  3:0     4      PAddr1

  PAddr1 : Bits 35:32 of a physical address. Because the most significant bit of a physical address handled by the VR4300 is bit 31, the value in this area is not used. This area is provided to maintain software compatibility of the VR4300 with the VR4400 and VR4200, and all 4 bits of this area can be read.
  PAddr0 : Bits 31:3 of the physical address
  R      : If set to 1, an exception occurs when a load instruction is executed.
  W      : If set to 1, an exception occurs when a store instruction is executed.
  0      : RFU. Must be written as zeroes; returns zeroes when read.
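The WatchLo fields in Figure 6-9 imply a simple match rule that can be sketched in software, for example in a simulator or a debugger front end. The code below is an illustration under assumptions: the function name `watch_matches` is hypothetical, and because PAddr0 holds only bits 31:3, the comparison is assumed to ignore the low three address bits, so any access within the same doubleword matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* WatchLo bit assignments (Figure 6-9). */
#define WATCH_W           (1u << 0)     /* trap on store */
#define WATCH_R           (1u << 1)     /* trap on load */
#define WATCH_PADDR0_MASK 0xFFFFFFF8u   /* physical address bits 31:3 */

/* Hypothetical model: would an access to physical address paddr
 * raise a Watch exception for the given WatchLo value? */
static bool watch_matches(uint32_t watchlo, uint32_t paddr, bool is_store)
{
    /* Compare only bits 31:3 (assumed doubleword granularity). */
    bool same_doubleword = ((watchlo ^ paddr) & WATCH_PADDR0_MASK) == 0;
    uint32_t enable = is_store ? WATCH_W : WATCH_R;

    return same_doubleword && (watchlo & enable) != 0;
}
```

With WatchLo set to 0x1002 (PAddr0 selecting 0x1000, R = 1, W = 0), a load from 0x1004 matches, while a store to the same address or a load from 0x1008 does not.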
6.3.9 XContext Register (20)

The XContext register is a read/write register that contains a pointer to an entry in the page table entry (PTE) array in memory. The PTE array is an operating system data structure that holds the table used to translate virtual addresses into physical addresses. If a TLB miss occurs, software takes remedial action: the operating system loads the translation that caused the miss from the PTE array into the TLB. The XContext register is used by the XTLB miss exception handler, which loads a TLB entry in 64-bit addressing mode.

Although this register contains several pieces of information that overlap with those of the BadVAddr register, its format is easier for the XTLB exception handler to use. This register is used by the operating system only. The PTEBase area of this register is set as necessary.

Figure 6-10 shows the format of the XContext register.

Figure 6-10 XContext Register

  Bits    Width  Field
  63:33   31     PTEBase
  32:31   2      R
  30:4    27     BadVPN2
  3:0     4      0

  PTEBase : Base address of the page table entry array
  R       : Space identifier (bits 63 and 62 of the virtual address) (00: user, 01: supervisor, 11: kernel)
  BadVPN2 : Virtual address whose translation is invalid (bits 39:13)
  0       : Must be written as zeroes; returns zeroes when read.

Each bit area of the XContext register is described next.

BadVPN2 area
  The BadVPN2 area is written by hardware on a TLB miss.

R area
  The R area is written by hardware on a TLB miss.

PTEBase area
  The PTEBase area is a read/write area and is used by the operating system.

The 27-bit BadVPN2 area holds bits 39:13 of the virtual address that caused the TLB miss. Bit 12 is not included, because a TLB entry consists of a pair of an even page and an odd page. When the page size is 4 KB, this register can be used as is as a pointer into a table of paired 8-byte PTEs.
With a page size of 16 KB or more, an appropriate PTE reference address can be generated by shifting and masking the value of this register.

6.3.10 Parity Error (PErr) Register (26)

The Parity Error register is a read/write register. This register is defined to maintain software compatibility of the VR4300 with the VR4200. Because the VR4300 has no parity, this register is not used by the hardware.

Figure 6-11 shows the format of the Parity Error register.

Figure 6-11 PErr Register

  Bits    Width  Field
  31:8    24     0
  7:0     8      Diagnostic

  Diagnostic : 8-bit self-diagnosis area
  0          : RFU. Must be written as zeroes; returns zeroes when read.

6.3.11 Cache Error (CacheErr) Register (27)

The Cache Error register is a read-only register. This register is defined to maintain compatibility of the VR4300 with the VR4200. Because the VR4300 does not generate cache errors, this register is not used by the hardware.

Figure 6-12 shows the format of the Cache Error register.

Figure 6-12 CacheErr Register

  Bits    Width  Field
  31:0    32     0

  0 : RFU. Must be written as zeroes; returns zeroes when read.

6.3.12 Error Exception Program Counter (ErrorEPC) Register (30)

The ErrorEPC register is similar to the EPC register. It is also used to store the program counter (PC) on cold reset, soft reset, and nonmaskable interrupt (NMI) exceptions.

The read/write ErrorEPC register contains the virtual address at which instruction processing can resume after an error has been serviced. This address can be:

  - the virtual address of the instruction that caused the exception, or
  - the virtual address of the immediately preceding branch or jump instruction, when the instruction that caused the error exception is in a branch delay slot.

There is no branch delay slot indication for the ErrorEPC register.

Figure 6-13 shows the format of the ErrorEPC register.
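Returning briefly to the XContext layout in 6.3.9, the way the R and BadVPN2 fields are derived from a faulting 64-bit virtual address can be sketched as follows. This is a hypothetical model for illustration, not hardware-accurate code: the function name `xcontext_after_miss` is invented here, and PTEBase is simply whatever the operating system wrote earlier.

```c
#include <stdint.h>

/* Hypothetical model: XContext value after an XTLB miss at vaddr.
 * Field positions follow Figure 6-10:
 *   PTEBase 63:33, R 32:31, BadVPN2 30:4, zero 3:0. */
static uint64_t xcontext_after_miss(uint64_t ptebase, uint64_t vaddr)
{
    uint64_t base    = ptebase & 0xFFFFFFFE00000000ull; /* keep bits 63:33 */
    uint64_t r       = (vaddr >> 62) & 0x3u;            /* VA bits 63:62 */
    uint64_t badvpn2 = (vaddr >> 13) & 0x7FFFFFFu;      /* VA bits 39:13 */

    return base | (r << 31) | (badvpn2 << 4);
}
```

As with the Context register, BadVPN2 starts at bit 4, so each even-odd page pair corresponds to a 16-byte slot, that is, two 8-byte PTEs when the page size is 4 KB.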
Figure 6-13 ErrorEPC Register

  32-bit mode:  bits 31:0 (32 bits)  ErrorEPC
  64-bit mode:  bits 63:0 (64 bits)  ErrorEPC

  ErrorEPC : Indicates the program counter on cold reset, soft reset, or an NMI exception.

6.4 Exception Details

This section describes the processor exceptions (cause, processing, and servicing).

6.4.1 Exception Types

This section gives sample exception handler operations for the following exception types:

  - Cold reset
  - Soft reset
  - Nonmaskable interrupt (NMI)
  - Remaining processor exceptions

In normal operation, when the EXL and ERL bits in the Status register are both 0, the operating mode (User, Supervisor, or Kernel) is specified by the KSU bits in the Status register. If either the EXL or the ERL bit is 1, the processor is in Kernel mode.

When an exception occurs, the EXL bit is set to 1 and the system enters Kernel mode. In most cases, the exception handler resets the EXL bit to 0 after it has saved the necessary information. The handler sets the EXL bit to 1 again while it restores that information, so that the saved information is not lost if another exception occurs during the restore. When execution exits from exception processing, the EXL bit is reset to 0. For details, refer to the ERET instruction in Chapter 16 CPU Instruction Set Details.

6.4.2 Exception Vector Locations

The cold reset, soft reset, and NMI exceptions are always vectored to:

  - location 0xBFC0 0000 in 32-bit mode
  - location 0xFFFF FFFF BFC0 0000 in 64-bit mode

These addresses are in an uncached, TLB-unmapped area. Addresses for the remaining exceptions are a combination of a vector offset and a base address. The 64-bit mode and 32-bit mode exception vectors and their offsets are shown in Table 6-3 (64-bit Mode Exception Vector Base Addresses) and Table 6-4 (32-bit Mode Exception Vector Base Addresses).

For example:
TLB miss vector (EXL = 0): When BEV = 0, the vector base address for this exception vector is in kseg0 (TLB unmapped space) (0x8000 0000 in 32-bit mode, 0xFFFF FFFF 8000 0000 in 64-bit mode). When BEV = 1, the vector base address for this exception vector is in kseg1 (uncached, TLB unmapped space) (0xBFC0 0200 in 32-bit mode, 0xFFFF FFFF BFC0 0200 in 64-bit mode). This is a TLB unmapped space, allowing the exception to bypass the TLB.

General exception vector: When BEV = 0, the vector address for this exception vector is in kseg0 (TLB unmapped space) (0x8000 0180 in 32-bit mode, 0xFFFF FFFF 8000 0180 in 64-bit mode). When BEV = 1, the vector address for this exception vector is in kseg1 (uncached, TLB unmapped space) (0xBFC0 0380 in 32-bit mode, 0xFFFF FFFF BFC0 0380 in 64-bit mode). This space is uncached and TLB unmapped, allowing the exception handler to bypass the cache and the TLB.

Table 6-3 64-bit Mode Exception Vector Base Addresses

  Exception                        Vector base address                    Vector offset
  Cold reset, soft reset, and NMI  0xFFFF FFFF BFC0 0000                  0x0000
                                   (BEV bit is automatically set to 1.)
  TLB miss, EXL = 0                0xFFFF FFFF 8000 0000 (BEV = 0)        0x0000
  XTLB miss, EXL = 0               0xFFFF FFFF BFC0 0200 (BEV = 1)        0x0080
  Other                                                                   0x0180

Table 6-4 32-bit Mode Exception Vector Base Addresses

  Exception                        Vector base address                    Vector offset
  Cold reset, soft reset, and NMI  0xBFC0 0000                            0x0000
                                   (BEV bit is automatically set to 1.)
  TLB miss, EXL = 0                0x8000 0000 (BEV = 0)                  0x0000
  XTLB miss, EXL = 0               0xBFC0 0200 (BEV = 1)                  0x0080
  Other                                                                   0x0180

  The two base addresses selected by BEV apply to the TLB miss, XTLB miss, and other exception rows alike.

6.4.3 Priority of Exceptions

While more than one exception can occur for a single instruction, only the exception with the highest priority is reported. Generally speaking, the exceptions described in the following sections are first handled by hardware ("processing") and then handled by software ("servicing"). The priority order is as follows.

Table 6-5 Exception Priority Order
  Cold reset (highest priority)
  Soft reset
  Nonmaskable interrupt (NMI)
  Address error: instruction fetch
  TLB/XTLB miss: instruction fetch
  TLB invalid: instruction fetch
  Bus error: instruction fetch
  System call
  Breakpoint
  Coprocessor unusable
  Reserved instruction
  Trap
  Integer overflow
  Floating-point exception
  Address error: data access
  TLB/XTLB miss: data access
  TLB invalid: data access
  TLB modification: data write
  Watch
  Bus error: data access
  Interrupt (lowest priority)

6.4.4 Cold Reset Exception

Cause

The cold reset exception occurs when the ColdReset signal is asserted and then deasserted. This exception is not maskable.

Processing

The CPU provides a special interrupt vector for this reset exception:

  - location 0xBFC0 0000 in 32-bit mode
  - location 0xFFFF FFFF BFC0 0000 in 64-bit mode

The cold reset vector resides in unmapped and uncached CPU address space, so the hardware need not initialize the TLB or the cache to process this exception. It also means that the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state.

The contents of all registers in the CPU are undefined when this exception occurs, except for the following register fields:

  - The TS, SR, and RP bits of the Status register and the EP(3:0) bits of the Config register are cleared to 0.
  - The ERL and BEV bits of the Status register and the BE bit of the Config register are set to 1.
  - The Random register is set to the upper-limit value (31).
  - The EC(2:0) bits of the Config register are set to the contents of the DivMode(1:0)* pins.

  * In the VR4300 and VR4305. In the VR4310, DivMode(2:0).

Servicing

The cold reset exception is serviced by:

  - initializing all processor registers, coprocessor registers, the TLB, the caches, and the memory system
  - performing diagnostic tests
  - bootstrapping the operating system
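The reset vector above, together with the base address and offset rules in 6.4.2, can be summarized as a small lookup for 32-bit mode. The following is a sketch for illustration only; the enum and function names are hypothetical, and the 64-bit mode addresses are assumed to be the same values with 0xFFFF FFFF in the upper 32 bits, as in Table 6-3.

```c
#include <stdbool.h>
#include <stdint.h>

enum exc_kind { EXC_RESET, EXC_TLB_MISS, EXC_XTLB_MISS, EXC_OTHER };

/* Hypothetical helper: 32-bit mode exception vector address.
 * EXC_RESET covers cold reset, soft reset, and NMI. */
static uint32_t vector_address32(enum exc_kind kind, bool bev, bool exl)
{
    if (kind == EXC_RESET)
        return 0xBFC00000u;  /* fixed vector; BEV is set to 1 automatically */

    uint32_t base   = bev ? 0xBFC00200u : 0x80000000u;
    uint32_t offset = 0x180u;  /* common (general) exception vector */

    /* The dedicated TLB/XTLB miss vectors apply only when EXL = 0. */
    if (!exl && kind == EXC_TLB_MISS)
        offset = 0x000u;
    else if (!exl && kind == EXC_XTLB_MISS)
        offset = 0x080u;

    return base + offset;
}
```

For instance, a TLB miss taken while EXL = 1 resolves to the common vector at offset 0x180 rather than to the TLB miss vector.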
6.4.5 Soft Reset Exception

Cause

A soft reset (sometimes called a warm reset) occurs when the Reset pin is asserted for more than 16 MasterClock cycles and then deasserted while the ColdReset signal remains deasserted.

A soft reset immediately resets all state machines and sets the SR bit of the Status register. Execution begins at the reset vector when a soft reset occurs. This exception is not maskable.

Processing

The CPU provides a special interrupt vector for this exception (the same location as for cold reset):

  - location 0xBFC0 0000 in 32-bit mode
  - location 0xFFFF FFFF BFC0 0000 in 64-bit mode

This vector is located within unmapped and uncached address space, so the cache and TLB need not be initialized to process this exception. When a soft reset occurs, the SR bit of the Status register is set to distinguish this exception from a cold reset exception.

When this exception occurs, the contents of all registers are preserved except for the following:

  - The program counter value at the time of the exception is stored in the ErrorEPC register, when the ERL bit of the Status register is 0.
  - The TS and RP bits of the Status register are cleared to 0.
  - The ERL, SR, and BEV bits of the Status register are set to 1.

Because a soft reset can abort cache operations and accesses to the system interface, the cache and memory states are undefined when this exception occurs.

Servicing

The soft reset exception is serviced by saving the current processor state for self-diagnostic purposes, and by reinitializing the system in the same manner as for the cold reset exception.

6.4.6 Non-maskable Interrupt (NMI) Exception

Cause

The non-maskable interrupt (NMI) exception occurs in response to the falling edge of the NMI pin. An NMI can also be set by externally writing 1 to bit 6 of the internal interrupt register through SysAD6.
Unlike all other interrupts, this interrupt is not maskable; it occurs regardless of the settings of the EXL, ERL, and IE bits in the Status register.

Processing

The CPU provides a special interrupt vector for this exception (the same location as for cold reset):

  - location 0xBFC0 0000 in 32-bit mode
  - location 0xFFFF FFFF BFC0 0000 in 64-bit mode

This vector is located within unmapped and uncached address space, so the cache and TLB need not be initialized to process this exception. When an NMI exception occurs, the SR bit of the Status register is set to differentiate this exception from a reset exception.

Unlike cold reset and soft reset, but like other exceptions, an NMI is taken only at instruction boundaries. The state of the caches and the memory system is preserved by this exception.

When this exception occurs, the contents of all registers are preserved except for the following:

  - The program counter value at the time of the exception is stored in the ErrorEPC register.
  - The TS bit of the Status register is cleared to 0.
  - The ERL, SR, and BEV bits of the Status register are set to 1.

Servicing

The NMI exception is serviced by saving the current processor state for self-diagnostic purposes, and by reinitializing the system in the same manner as for the cold reset exception.

6.4.7 Address Error Exception

Cause

The address error exception occurs when an attempt is made to do one of the following:

  - Execute an LW or SW instruction on word data that is not located at a word boundary.
  - Execute an LH or SH instruction on halfword data that is not located at a halfword boundary.
  - Execute an LD or SD instruction on doubleword data that is not located at a doubleword boundary.
  - Reference the kernel address space from User or Supervisor mode.
  - Reference the supervisor address space from User mode.
  - Reference an address that is not in the kernel, supervisor, or user space in 64-bit Kernel, Supervisor, or User mode.

This exception is not maskable.
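The three alignment causes listed above reduce to one test on the low address bits. The following minimal sketch shows that test; the function name `is_misaligned` is hypothetical, and the access size is assumed to be a power of two (2 for LH/SH, 4 for LW/SW, 8 for LD/SD). The address space protection causes are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when an access of size_bytes at vaddr is not
 * naturally aligned. size_bytes must be a power of two (2, 4, or 8). */
static bool is_misaligned(uint64_t vaddr, unsigned size_bytes)
{
    /* For a power-of-two size, the low log2(size) bits must all be zero. */
    return (vaddr & (uint64_t)(size_bytes - 1u)) != 0;
}
```

For example, address 0x1004 is acceptable for LW (size 4) but would cause an address error for LD (size 8).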
Processing

The common exception vector is used for this exception. The AdEL or AdES code in the Cause register is set, indicating whether the instruction caused the exception with an instruction reference (AdEL), a load operation (AdEL), or a store operation (AdES).

When this exception occurs, the BadVAddr register retains the virtual address that was not properly aligned or that referenced protected address space. The contents of the VPN field of the Context and EntryHi registers are undefined, as are the contents of the EntryLo register.

The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is in a branch delay slot, the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing

The kernel hands the process executing at the time a UNIX SIGSEGV (segmentation violation) signal. This error is usually fatal to the process incurring the exception.

6.4.8 TLB Exceptions

Three types of TLB exceptions can occur:

  - A TLB miss exception occurs when there is no TLB entry that matches an attempted reference to a mapped address space.
  - A TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid (V bit = 0).
  - A TLB modification exception occurs when a store operation's virtual address reference to memory matches a TLB entry that is marked valid but is not dirty (the entry is not writable, D bit = 0). This exception therefore occurs only for data accesses, which gives it a lower priority.

These TLB exceptions are described below.

TLB Miss Exception (32-bit Mode)/XTLB Miss Exception (64-bit Mode)

Cause

The TLB (XTLB) miss exception occurs when there is no TLB entry that matches the address to be referenced. This exception is not maskable.

Processing

There are two special vectors for this exception.
One is for the 32-bit mode, and the other is for the 64-bit mode. The UX, SX, and KX bits of the Status register determine whether the user, supervisor, or kernel address spaces referenced are 32-bit or 64-bit spaces. All TLB Miss exceptions use these two special vectors when the EXL bit of the Status register is 0, and they use the common exception vector when the EXL bit is 1.
This exception sets the TLBL or TLBS code in the ExcCode field of the Cause register. If the cause of the exception is an instruction reference or load operation, the TLBL code is set; if the cause is a store operation, the TLBS code is set.
When this exception occurs, the BadVAddr, Context, XContext, and EntryHi registers hold the virtual address that failed address translation. The EntryHi register also contains the ASID from which the translation fault occurred. The Random register normally contains a valid location in which to place the replacement TLB entry. The contents of the EntryLo register are undefined.
The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
To service this exception, the contents of the Context or XContext register are used as a virtual address to load the memory words containing the physical page frame and access control bits for a pair of TLB entries. The memory words are written into the TLB through the EntryLo0/EntryLo1/EntryHi registers.
It is possible that the page frame and access control bits are placed on a page whose virtual address is not resident in the TLB. This condition is processed by allowing a TLB Miss exception in the TLB Miss exception handler.
This second exception goes to the common exception vector because the EXL bit of the Status register is set.

TLB Invalid Exception

Cause
The TLB Invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid (TLB valid bit cleared). This exception is not maskable.

Processing
The common exception vector is used for this exception. The TLBL or TLBS code is set in the ExcCode field of the Cause register. If the cause of the exception is an instruction reference or load operation, the TLBL code is set; if the cause is a store operation, the TLBS code is set.
When this exception occurs, the BadVAddr, Context, XContext, and EntryHi registers contain the virtual address that failed address translation. The EntryHi register also contains the ASID from which the translation fault occurred. The contents of the EntryLo register are undefined.
The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
A TLB entry is typically marked invalid when one of the following is true:
- The virtual address does not exist.
- The virtual address exists, but is not in main memory (a page fault).
- A trap is desired on any reference to the page (for example, to maintain a reference bit).
After the cause of a TLB Invalid exception has been removed, use the TLB Probe (TLBP) instruction to locate the TLB entry where the exception occurred, then replace it with an entry whose V bit is set to 1.

TLB Modification Exception

Cause
The TLB Modification exception occurs if the TLB entry that matches the virtual address referenced by a store instruction is valid (V bit is 1) but is not writable (D bit is 0).
This exception occurs only when an attempt is made to write the data cache. Note, however, that the priority of this exception is low.

Processing
The common exception vector is used for this exception, and the Mod code is set in the ExcCode field of the Cause register.
When this exception occurs, the BadVAddr, Context, XContext, and EntryHi registers contain the virtual address that failed address translation. The EntryHi register also contains the ASID from which the translation fault occurred. The contents of the EntryLo register are undefined.
The EPC register contains the address of the instruction that caused the exception unless that instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The kernel uses the failed virtual address or virtual page number to identify the corresponding access control bits. The page identified may or may not permit write accesses; if writes are not permitted, a write protection violation occurs.
If write accesses are permitted, the page frame is marked dirty/writable by the kernel in its own data structures. The TLBP instruction places the index of the TLB entry that must be altered into the Index register. The EntryLo register is loaded with a word containing the physical page frame and access control bits (with the D bit set), and the contents of the EntryHi and EntryLo registers are written into the TLB.

6.4.9 Bus Error Exception

Cause
A Bus Error exception is raised by board-level circuitry for events such as bus time-out, local bus parity errors, and invalid physical memory addresses or access types. This exception is not maskable.
A Bus Error exception occurs only when a cache miss refill, uncached reference, or unbuffered write occurs synchronously. In concrete terms, a Bus Error exception occurs if SysCmd(0) indicates that the data contains an error when it is transferred on the system bus, regardless of the direction of the transfer between the system and the processor. A local bus error in the system that results from a buffered write transaction must instead be reported through the Interrupt exception.

Processing
The common exception vector is used for a Bus Error exception. The IBE or DBE code is set in the ExcCode field of the Cause register. If the cause of the exception is an instruction reference (instruction fetch), the IBE code is set. If the cause is a data reference (load/store), the DBE code is set.
The EPC register contains the address of the instruction that caused the exception, unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The physical address at which the fault occurred can be computed from information available in the system control coprocessor registers.
- If the IBE code in the Cause register is set (indicating an instruction fetch), the virtual address is contained in the EPC register (or is 4 + the contents of the EPC register if the BD bit of the Cause register is set).
- If the DBE code is set (indicating a load or store), the virtual address of the instruction that caused the exception is contained in the EPC register (or is 4 + the contents of the EPC register if the BD bit of the Cause register is set). The virtual address of the load or store reference can then be obtained by interpreting the instruction.
The physical address can be obtained by using the TLBP instruction and reading the EntryLo register to compute the physical page number.
The process executing at the time of this exception is handed a UNIX SIGBUS (bus error) signal, which is usually fatal.

6.4.10 System Call Exception

Cause
A System Call exception occurs during an attempt to execute the SYSCALL instruction. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the Sys code is set in the ExcCode field of the Cause register.
The EPC register contains the address of the SYSCALL instruction unless it is in a branch delay slot. If the SYSCALL instruction is in a branch delay slot, the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set; otherwise this bit is cleared.

Servicing
When this exception occurs, control is transferred to the applicable system routine.
To resume execution, the EPC register must be altered so that the SYSCALL instruction does not re-execute; this is accomplished by adding 4 to the EPC register (EPC register + 4) before returning.
If the SYSCALL instruction is in a branch delay slot, adding 4 is not sufficient: the branch instruction must be decoded to determine the branch destination at which execution resumes.

6.4.11 Breakpoint Exception

Cause
A Breakpoint exception occurs when an attempt is made to execute the BREAK instruction. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the Bp code is set in the ExcCode field of the Cause register.
The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot. If the BREAK instruction is in a branch delay slot, the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set; otherwise the bit is cleared.

Servicing
When the Breakpoint exception occurs, control is transferred to the applicable system routine.
Additional information can be passed using the unused bits of the BREAK instruction (bits 25:6). This information can be obtained by reading, as data, the instruction word at the address indicated by the EPC register. (If the instruction resides in a branch delay slot, 4 must be added to the contents of the EPC register (EPC register + 4) to locate it.)
To resume execution, the EPC register must be altered so that the BREAK instruction does not re-execute; this is accomplished by adding 4 to the EPC register (EPC register + 4) before returning.
If the BREAK instruction is in a branch delay slot, decode the branch instruction to obtain the branch destination and resume execution there.

6.4.12 Coprocessor Unusable Exception

Cause
The Coprocessor Unusable exception occurs when an attempt is made to execute a coprocessor instruction in either of the following cases:
- The corresponding coprocessor unit has not been marked usable (the corresponding CU bit (3:1) of the Status register = 0).
- A CP0 instruction is executed in User or Supervisor mode when CP0 has not been marked usable (CU0 bit of the Status register = 0).
This exception is not maskable.

Processing
The common exception vector is used for this exception, and the CpU code is set in the ExcCode field of the Cause register. The CE bits of the Cause register indicate which of the four coprocessors was referenced.
The EPC register indicates the coprocessor instruction that caused the exception. If that instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The coprocessor unit to which the attempted reference was made is identified by the CE bits of the Cause register. The handler then proceeds in one of the following ways:
- If the process is entitled access to the coprocessor, the coprocessor is marked usable and execution resumes.
- If the process is entitled access to the coprocessor, but the coprocessor does not exist or has failed, the coprocessor instruction can be interpreted in software. If the BD bit is set in the Cause register, the branch instruction must be decoded; the coprocessor instruction can then be emulated and execution resumed with the contents of the EPC register advanced past the coprocessor instruction.
- If the process is not entitled access to the coprocessor, the kernel sends the current process a UNIX SIGILL/ILL_PRIVIN_FAULT (illegal instruction/privileged instruction fault) signal. This exception is usually fatal.

6.4.13 Reserved Instruction Exception

Cause
The Reserved Instruction exception occurs when one of the following conditions occurs:
- An attempt is made to execute an instruction with an undefined opcode (bits 31:26).
- An attempt is made to execute a SPECIAL instruction with an undefined sub-opcode (bits 5:0).
- An attempt is made to execute a REGIMM instruction with an undefined sub-opcode (bits 20:16).
- An attempt is made to execute 64-bit operations in 32-bit mode when in User or Supervisor mode.
64-bit operations are always valid in Kernel mode regardless of the value of the KX bit in the Status register.
This exception is not maskable.

Processing
The common exception vector is used for this exception, and the RI code is set in the ExcCode field of the Cause register.
The EPC register indicates the instruction that caused the exception unless the reserved instruction is in a branch delay slot, in which case the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
All instructions in the MIPS ISA that are currently defined can be executed. The process executing at the time of this exception is handed a UNIX SIGILL/ILL_RESOP_FAULT (illegal instruction/reserved operand fault) signal. This exception is usually fatal.
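The three bit ranges named in the Reserved Instruction conditions above (31:26, 5:0, and 20:16) can be extracted from a 32-bit instruction word with shifts and masks. The following C sketch shows only the field extraction; the function names are hypothetical, and deciding which field values are actually undefined would require the opcode tables, which are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper names; bit positions follow the text above. */

/* Major opcode, bits 31:26. */
static inline uint32_t insn_opcode(uint32_t insn)
{
    return (insn >> 26) & 0x3Fu;
}

/* SPECIAL sub-opcode, bits 5:0. */
static inline uint32_t insn_special_subop(uint32_t insn)
{
    return insn & 0x3Fu;
}

/* REGIMM sub-opcode, bits 20:16. */
static inline uint32_t insn_regimm_subop(uint32_t insn)
{
    return (insn >> 16) & 0x1Fu;
}
```

A handler that emulates or diagnoses a faulting instruction would typically read the word at the address in the EPC register (plus 4 if the BD bit is set) and pass it to helpers of this kind.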
6.4.14 Trap Exception

Cause
The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEIU, TLTI, TLTIU, TEQI, or TNEI instruction results in a true condition. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the Tr code is set in the ExcCode field of the Cause register.
The EPC register indicates the trap instruction that caused the exception. If the instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The process executing at the time of a Trap exception is handed a UNIX SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal by the kernel. This exception is usually a fatal error.

6.4.15 Integer Overflow Exception

Cause
An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, DADDI, or DSUB instruction results in a 2's complement overflow. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the Ov code is set in the ExcCode field of the Cause register.
The EPC register indicates the instruction that caused the exception. If the instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The process executing at the time of the exception is handed a UNIX SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal by the kernel. This exception is usually a fatal error to the current process.

6.4.16 Floating-Point Exception

Cause
The Floating-Point exception is generated by the floating-point coprocessor. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the FPE code is set in the ExcCode field of the Cause register.
The contents of the floating-point Control/Status register indicate the cause of this exception.
The EPC register indicates the instruction that caused the exception if the instruction is not in a branch delay slot. If the instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
This exception is cleared by clearing the appropriate bit in the floating-point Control/Status register.
For an unimplemented instruction exception, the kernel must emulate the instruction; for other exceptions, the kernel should pass the exception to the user program that caused it.

6.4.17 Watch Exception

Cause
A Watch exception occurs when a load or store instruction references the physical address specified in the WatchLo/WatchHi registers. The exception is caused by the following instructions:
- A load instruction when the R bit is set in the WatchLo register
- A store instruction when the W bit is set in the WatchLo register
- A load or store instruction when both the R and W bits are set in the WatchLo register
The CACHE instruction never causes a Watch exception.
The Watch exception is postponed if the EXL bit is set in the Status register. It can therefore be masked by setting the EXL bit in the Status register to 1 or by clearing the R and W bits in the WatchLo register to 0.

Processing
The common exception vector is used for this exception, and the Watch code is set in the ExcCode field of the Cause register.
The EPC register indicates the load or store instruction if it is not in a branch delay slot. If the instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The Watch exception is a debugging aid; typically the exception handler transfers control to a debugger, allowing the user to examine the situation.
To continue, the Watch exception must be masked while the faulting instruction is executed. The Watch exception must then be reenabled.
Because the contents of the WatchLo/WatchHi registers are undefined after reset, initialize the registers by software (in particular, clear the R and W bits to 0). If they are not initialized, an unintended Watch exception may occur.

6.4.18 Interrupt Exception

Cause
The Interrupt exception occurs when one of the eight interrupt conditions (one for the timer interrupt, five for hardware interrupts, and two for software interrupts) is asserted. The significance of these interrupts depends on the specific system implementation.
An interrupt request signal from a pin is level-detected.
Each of the eight interrupts can be masked by clearing the corresponding bit in the Int-Mask field of the Status register, and all eight interrupts can be masked at once by clearing the IE bit, setting the EXL bit, or setting the ERL bit of the Status register.

Processing
The common exception vector is used for this exception, and the Int code is set in the ExcCode field of the Cause register.
The IP field of the Cause register indicates the current interrupt requests. Before this register is read, more than one of its bits may be set simultaneously if interrupt request signals are asserted, or cleared simultaneously if they are deasserted.
If the instruction that causes the exception is not in a branch delay slot, the EPC register indicates that instruction. If the instruction is in a branch delay slot, the EPC register indicates the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
If the interrupt is caused by one of the two software-generated exceptions (SW1 or SW0), the interrupt condition is cleared by setting the corresponding Cause register bit to 0.
If an interrupt is generated by hardware, the interrupt is cleared by deasserting the interrupt request signal that caused it.
If a timer interrupt request is generated, either clear the IP7 bit of the Cause register or change the contents of the Compare register to clear the interrupt.

6.5 Exception Handling and Servicing Flowcharts
The remainder of this chapter contains flowcharts for the following exceptions and guidelines for their handlers:
- General purpose exception handling, and a guideline for the exception handler
- TLB/XTLB Miss exception handling, and a guideline for the exception handler
- Cold Reset, Soft Reset, and NMI exception handling, and a guideline for the handler
Generally speaking, the exceptions are first handled ("processing") by hardware; they are then handled ("servicing") by software.

Figure 6-14 General Purpose Exception Handler (1/2)
(a) Handling (hardware) of exceptions other than Cold Reset, Soft Reset, NMI, or TLB/XTLB Miss
The flowchart proceeds as follows:
- Start.
- Set the FP Control/Status register. (It is set only if the respective exception occurs.)
- EnHi <- VPN2, ASID; X/Context <- VPN2. (EnHi and X/Context are set only for TLB Invalid, Modification, and Miss exceptions.)
- Set the Cause register (ExcCode, CE) and the BadVAddr register. (It is not set by Bus Error exceptions, however.)
- Check for multiple exceptions: is EXL = 1 (SR bit 1)?
  - No: if the instruction is in a branch delay slot, BD bit of Cause register <- 1 and EPC <- (PC - 4); otherwise BD bit of Cause register <- 0 and EPC <- PC.
  - Yes: EPC and the BD bit are not updated.
- EXL <- 1. (The processor moves to Kernel mode and interrupts are disabled.)
- If BEV = 1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + 0x180 (unmapped, uncached).
  If BEV = 0 (normal): PC <- 0xFFFF FFFF 8000 0000 + 0x180 (unmapped, cached).
- Continue to the general purpose exception servicing guidelines.
Remark: Interrupts can be masked by IE or IMs, and Watch is postponed if EXL = 1.
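The hardware part of Figure 6-14 (EPC/BD update, EXL set, vector selection by BEV) can be modeled in a few lines of C. This is only a sketch under stated assumptions: the struct `cp0_state_t` and the function `take_general_exception` are hypothetical names, only the EPC, BD, and EXL state is modeled, and the 64-bit vector bases and the 0x180 offset are the values shown in the figure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced model of the CP0 state touched here. */
typedef struct {
    uint64_t epc;  /* EPC register                */
    bool     bd;   /* BD bit of the Cause register */
    bool     exl;  /* EXL bit of the Status register */
} cp0_state_t;

/* Models the hardware steps for a general purpose exception and
 * returns the new PC (the exception vector address). */
static uint64_t take_general_exception(cp0_state_t *s, uint64_t pc,
                                       bool in_delay_slot, bool bev)
{
    if (!s->exl) {
        /* First-level exception: record where to resume. */
        s->bd  = in_delay_slot;
        s->epc = in_delay_slot ? pc - 4 : pc;
    }
    /* With EXL already 1, EPC and BD keep their earlier values. */
    s->exl = true;

    uint64_t base = bev ? 0xFFFFFFFFBFC00200ull   /* bootstrap, uncached */
                        : 0xFFFFFFFF80000000ull;  /* normal, cached      */
    return base + 0x180u;
}
```

The nested case is the reason a handler must save EPC before clearing EXL: a second exception taken while EXL = 1 does not overwrite EPC, so the original return address stays intact only until EXL is cleared.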
Figure 6-14 General Purpose Exception Handler (2/2)
(b) General purpose exception servicing guidelines (software)
Conditions on entry:
- Using an unmapped area prevents TLB Modification, TLB Invalid, and TLB Miss exceptions from occurring.
- EXL = 1, so Watch and Interrupt exceptions are disabled.
- The OS/system must avoid all other exceptions.
- Only Cold Reset, Soft Reset, and NMI exceptions are possible.
The flowchart contains the following elements:
- MFC0 instruction executed: X/Context, EPC, Status, Cause.
- MFC0 instruction executed (set Status bits): KSU <- 00, EXL <- 0, IE = 1. (Optional: interrupts are enabled in Kernel mode. After EXL = 0, all exceptions are allowed, except interrupts if masked by IE or IM.)
- TS bit of Status register = 0? If no, reset the processor. (Optional: check only if double TLB miss.)
- Check the Cause register and jump to the appropriate service routine. (Save the register file.)
- Service each exception routine.
- EXL = 1.
- MFC0 instruction executed: EPC, Status.
- ERET. (ERET is not allowed in the branch delay slot of another jump instruction. The processor does not execute the instruction in the ERET instruction's branch delay slot. PC <- EPC, EXL <- 0, LLbit <- 0.)

Figure 6-15 TLB/XTLB Miss Exception Handler (1/2)
(a) Hardware
The flowchart proceeds as follows:
- Start.
- EnHi <- VPN2, ASID; X/Context <- VPN2; Cause register setting (ExcCode); BadVAddr register setting.
- Check for multiple exceptions: is EXL = 0 (SR bit 1)?
  - Yes: if the instruction is in a branch delay slot, BD bit of Cause register <- 1 and EPC <- (PC - 4); otherwise BD bit of Cause register <- 0 and EPC <- PC.
- Select the vector offset (Vec. Off.):
  - EXL = 0 and XTLB exception: XTLB Miss exception, Vec. Off. = 0x080
  - EXL = 0 and not an XTLB exception: TLB Miss exception, Vec. Off. = 0x000
- EXL <- 1. (The processor moves to Kernel mode and interrupts are disabled.)
- If BEV (SR bit 22) = 1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + Vec. Off. (unmapped, uncached).
  If BEV = 0 (normal): PC <- 0xFFFF FFFF 8000 0000 + Vec. Off. (unmapped, cached).
- Continue to the TLB/XTLB exception servicing guidelines.
When EXL = 1, the vector offset is that of the general purpose exception: Vec. Off.
= 0x180

Figure 6-15 TLB/XTLB Miss Exception Handler (2/2)
(b) TLB/XTLB exception servicing guidelines (software)
Conditions on entry:
- Using an unmapped area prevents TLB Modification, TLB Invalid, and TLB Miss exceptions from occurring.
- EXL = 1, so Watch and Interrupt exceptions are disabled.
- The OS/system must avoid all other exceptions.
- Only Cold Reset, Soft Reset, and NMI exceptions are possible.
The flowchart contains the following elements:
- MFC0 instruction executed: Context.
- Service the exception routine: load the physical address corresponding to the virtual address held in the X/Context register into the EntryLo register and write it into the TLB. A TLB miss could occur again during the mapping of the data or instruction address. The processor may jump to the general purpose exception vector since EXL is 1. (Either the TLB miss is processed in the general purpose exception handler, or the handler returns to the user program with the ERET instruction and the TLB Miss exception is generated again.)
- ERET. (ERET is not allowed in the branch delay slot of another jump instruction. The processor does not execute the instruction in the ERET instruction's branch delay slot. PC <- EPC, EXL <- 0, LLbit <- 0.)

Figure 6-16 Cold Reset, Soft Reset & NMI Exception Handler
Cold Reset, Soft Reset & NMI exception processing guidelines (hardware):
- Cold Reset exception: Random <- 31; Wired <- 0; Config register updated; Status: RP <- 0, BEV <- 1, TS <- 0, SR <- 0, ERL <- 1.
- Soft Reset or NMI exception: Status: RP <- 0 (Soft Reset), BEV <- 1, TS <- 0, SR <- 1, ERL <- 1.
- ErrorEPC <- PC.
- PC <- 0xFFFF FFFF BFC0 0000.
Cold Reset, Soft Reset & NMI exception servicing guidelines (software):
- Check the SR bit of the Status register.
  - SR = 0: service the Cold Reset exception routine.
  - SR = 1: NMI? If yes, service the NMI exception routine; if no, service the Soft Reset exception routine.
- ERET (optional).
Comment: There is no indication from the processor to differentiate between NMI and Soft Reset; there must be a system-level indication.
Chapter 7 Floating-Point Operations

7.1 Overview
All floating-point instructions, as defined in the MIPS ISA for the floating-point coprocessor, CP1, can be processed by the VR4300. Logically, the floating-point arithmetic unit (FPU) exists as an individual coprocessor; however, unlike that of the VR4400, the VR4300 FPU is physically integrated into the integer arithmetic unit (CPU). The CPU and the FPU use a common datapath, and FPU instructions are fully implemented in the CPU hardware. Unlike in the VR4400, integer instructions on the VR4300 cannot be executed until a multicycle floating-point instruction has completed.
The execution of floating-point instructions can be disabled by the coprocessor usability (CU) bit defined in the system control coprocessor (CP0) Status register.

7.2 FPU Programming Model
This section describes the structure of the registers, memory, and data, and the usable general purpose registers. The FPU registers are also described in detail.

7.2.1 Floating-Point General Purpose Registers (FGRs)
The FPU has one set of floating-point general purpose registers (FGRs) and two control registers (Control/Status register: FCR31; Implementation/Revision register: FCR0). The general purpose registers can be used in the following three ways.
- As 32 general purpose registers (32 FGRs), each of which is 32 bits wide when the FR bit in the Status register equals 0, or 64 bits wide when FR equals 1. The CPU accesses these registers through load, store, and transfer instructions.
- As 16 floating-point registers (FPRs) (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the Status register equals 0.
  The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to adjacently numbered FGRs as shown in Figure 7-1.
- As 32 floating-point registers (FPRs) (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the Status register equals 1. The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to an individual FGR as shown in Figure 7-1.

Figure 7-1 FPU Registers
[Diagram. FR bit = 0: the 32-bit registers FGR0 to FGR31 are paired, with the even-numbered FGR as the low-order half and the odd-numbered FGR as the high-order half, to form FPR0, FPR2, ..., FPR28, FPR30. FR bit = 1: the 64-bit registers FGR0 to FGR31 correspond one-to-one to FPR0 to FPR31. Floating-point control registers (FCR): Implementation/Revision register (FCR0) and Control/Status register (FCR31), each with bits 31 to 0.]

7.2.2 Floating-Point Registers (FPRs)
CP1 provides:
- 16 floating-point registers (FPRs) when the FR bit in the Status register equals 0, or
- 32 floating-point registers (FPRs) when the FR bit in the Status register equals 1.
The FPRs are logical 64-bit registers that hold floating-point values during floating-point operations. They are accessed through floating-point arithmetic instructions.
The FPRs are physically formed from the general purpose registers (FGRs). When the FR bit in the Status register equals 0, each FPR is formed from two 32-bit FGRs. When the FR bit in the Status register equals 1, each FPR is formed from a single 64-bit FGR.
The FPRs hold values in either single- or double-precision floating-point format.
If the FR bit equals 0, only even numbers (the lower-numbered register of each pair, as shown in Figure 7-1) can be used to address FPRs. When the FR bit equals 1, all FPR register numbers are valid.
If the FR bit equals 0 during a double-precision floating-point operation, the FGRs are used in pairs. Thus, in a double-precision operation, selecting floating-point register 0 (FPR0) actually uses the adjacent floating-point general purpose registers FGR0 and FGR1.

7.2.3 Floating-Point Control Registers (FCRs)
The FPU in the VR4000 series (excluding the VR4100) has 32 control registers. With the VR4300, the following two FCRs are valid.
- The Control/Status register (FCR31) controls and monitors exceptions, holds the result of compare operations, and establishes rounding modes.
- The Implementation/Revision register (FCR0) holds revision information about the FPU.
Table 7-1 lists the assignments of the FCRs.

Table 7-1 Floating-Point Control Register Assignments

FCR number       Use
FCR0             Coprocessor Implementation/Revision register
FCR1 to FCR30    Reserved
FCR31            Rounding mode, cause, exception enables, and flags

7.2.4 Control/Status Register (FCR31)
The Control/Status register (FCR31) is a read/write register that holds control data and status data. FCR31 controls the rounding mode and enables the occurrence of floating-point exceptions. It also indicates the exceptions caused by the most recently executed instruction, as well as the exceptions that were masked and therefore did not occur. Figure 7-2 shows the configuration of FCR31.

Figure 7-2 Control/Status Register Bit Assignments

Control/Status register (FCR31)
Bits     Width   Field
31:25    7       0
24       1       FS
23       1       C
22:18    5       0
17:12    6       Cause (E, V, Z, O, U, I)
11:7     5       Enables (V, Z, O, U, I)
6:2      5       Flags (V, Z, O, U, I)
1:0      2       RM
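The bit assignments of Figure 7-2 translate directly into shift-and-mask accessors. The following C sketch assumes the FCR31 value has already been read into a 32-bit variable (for example with the CFC1 instruction); all macro and function names are hypothetical. The helper `fcr31_raises_exception` encodes the rule stated in the enable-bit description of this section: a set cause bit with its enable bit set generates an exception, and the E bit has no enable and always does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; positions follow Figure 7-2. */
#define FCR31_FS  (1u << 24)   /* flush denormalized results  */
#define FCR31_C   (1u << 23)   /* compare condition bit       */

static inline uint32_t fcr31_cause(uint32_t v)   { return (v >> 12) & 0x3Fu; } /* E V Z O U I */
static inline uint32_t fcr31_enables(uint32_t v) { return (v >> 7)  & 0x1Fu; } /*   V Z O U I */
static inline uint32_t fcr31_flags(uint32_t v)   { return (v >> 2)  & 0x1Fu; } /*   V Z O U I */
static inline uint32_t fcr31_rm(uint32_t v)      { return v & 0x3u; }

/* True if this FCR31 value would generate a floating-point exception:
 * E (bit 5 of the cause field) is always enabled; V, Z, O, U, I need
 * the matching enable bit. */
static inline bool fcr31_raises_exception(uint32_t v)
{
    uint32_t cause = fcr31_cause(v);
    return (cause & 0x20u) != 0 ||
           (cause & 0x1Fu & fcr31_enables(v)) != 0;
}
```

Because the V, Z, O, U, I bits occupy the same relative positions in the cause, enable, and flag fields, a single 5-bit mask can be reused across all three.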
Figure 7-3 Control/Status Register (FCR31) Cause, Enable, and Flag Bit Fields

Field          E    V    Z    O    U    I
Cause bits     17   16   15   14   13   12
Enable bits    -    11   10   9    8    7
Flag bits      -    6    5    4    3    2

E: unimplemented operation, V: invalid operation, Z: division by zero, O: overflow, U: underflow, I: inexact operation

The contents of FCR31 and FCR0 can be read by using the CFC1 instruction. The bits of FCR31 can be set or cleared by using the CTC1 instruction. FCR0 is a read-only register.
The contents of a register being written are undefined for the instruction that immediately follows the instruction that writes the register. The pipeline does not interlock.
IEEE754 specifies the detection of exceptions during floating-point operations, the setting of flags, and the calling of an exception handler when an exception occurs. In the MIPS architecture, these specifications are realized by the cause, enable, and flag bits of the Control/Status register. The flag bits correspond to the exception status flags of IEEE754, and the cause and enable bits correspond to the exception handler mechanism of IEEE754. Each bit of FCR31 is described next.

FS bit
The FS bit enables a value that cannot be normalized (a denormalized number) to be flushed. When the FS bit is set and the enable bits are not set for the underflow and inexact exceptions, a denormalized result does not cause the unimplemented operation exception, but is flushed. Whether the flushed result is 0 or the minimum normalized value depends on the rounding mode (refer to Table 7-2). If the result is flushed, the flag and cause bits are set for the underflow and inexact exceptions.

C bit
When a floating-point compare operation takes place, the result is stored at bit 23, the condition bit.
the c bit is set to 1 if the condition is true; the bit is cleared to 0 if the condition is false. bit 23 is affected only by compare and ctc1 instructions. cause, flag, and enable fields figure 7-3 illustrates the cause , enable , and flag fields of the fcr31 . the cause and flag fields are updated by all conversion, computational (except mov.fmt), ctc1, reserved, and unimplemented operation instructions. all other instructions have no affect on these fields. cause bits bits 17:12 in the fcr31 contain cause bits which reflect the results of the most recently executed floating-point instruction. the cause bits are a logical extension of the cp0 cause register; they identify the exceptions raised by the last floating-point operation; and generate exceptions if the corresponding enable bit is set. if more than one exception occurs on a single instruction, each appropriate bit is set. denormalized number result flushed result rounding mode rn rz rp rm positive +0 +0 +2 emin +0 negative -0 -0 -0 -2 emin chapter 7 214 user? manual u10504ej7v0um00 the cause bits are updated by the floating-point operations (except load, store, and transfer instructions). the unimplemented operation instruction ( e ) bit is set to a 1 if software emulation is required, otherwise it remains 0. the other bits are set to 0 or 1 to indicate the occurrence or non-occurrence (respectively) of an ieee754 exception. if the floating-point operation exception occurs, the operation result is not stored, and only the cause bit is influenced. the type of the exception that has been caused by the most-recently-executed floating-point operation can be identified by reading the cause bit. enable bits a floating-point exception is generated any time a cause bit and the corresponding enable bit are set. as soon as the cause bit enabled through the floating-point operation, an exception occurs. when both cause and enable bits are set by the ctc1 instruction, an exception also occurs. 
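The field positions in Figures 7-2 and 7-3 and the cause/enable rule can be sketched in Python as follows. This is only an illustrative model, not VR4300 code: the helper names `decode_fcr31` and `exception_pending` are hypothetical, and the model assumes that a set E cause bit always signals an exception, because bits 11:7 provide no enable bit for it.

```python
# Bit positions follow Figures 7-2 and 7-3. Helper names are hypothetical.
FS_BIT = 24          # flush denormalized results
C_BIT = 23           # condition bit written by compare instructions
CAUSE_SHIFT = 12     # cause bits 17:12 (E V Z O U I)
ENABLE_SHIFT = 7     # enable bits 11:7 (V Z O U I)
FLAG_SHIFT = 2       # flag bits 6:2 (V Z O U I)
E_IN_CAUSE = 0x20    # E is the top bit of the 6-bit cause field


def decode_fcr31(value):
    """Split a 32-bit FCR31 value into its documented fields."""
    return {
        "fs": (value >> FS_BIT) & 1,
        "c": (value >> C_BIT) & 1,
        "cause": (value >> CAUSE_SHIFT) & 0x3F,
        "enables": (value >> ENABLE_SHIFT) & 0x1F,
        "flags": (value >> FLAG_SHIFT) & 0x1F,
        "rm": value & 0x3,
    }


def exception_pending(value):
    """True if a cause bit is set together with its enable bit, or E is set."""
    fields = decode_fcr31(value)
    # V Z O U I occupy the same relative positions in the cause and enable fields.
    ieee_cause = fields["cause"] & 0x1F
    unimplemented = bool(fields["cause"] & E_IN_CAUSE)
    return unimplemented or bool(ieee_cause & fields["enables"])
```

For example, a value with the Z cause bit (bit 15) and the Z enable bit (bit 10) set reports a pending exception, while the Z cause bit on its own does not.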
There is no enable bit for the unimplemented operation (E) exception. An unimplemented operation always generates a floating-point exception.

Before returning from a floating-point exception, software must first clear the cause bits that are enabled to generate exceptions, to prevent the exception from being repeated. Consequently, user-mode programs can never observe enabled cause bits in the set state. If a handler running in user mode needs this information, save the value of the status register before calling the user-mode handler.

If a cause bit is set but the corresponding enable bit is not set, no floating-point exception occurs and the default result defined by IEEE754 is stored. In this case, whether an exception was caused by the immediately preceding floating-point operation can be determined by reading the cause bits.

Flag bits

The flag bits are cumulative and indicate the exceptions that have been raised since reset. A flag bit is set to 1 if an IEEE754 exception is raised while that exception is disabled. Otherwise, the flag bits remain unchanged. The flag bits are never cleared as a side effect of floating-point operations; however, they can be set or cleared by writing a new value to FCR31 with a CTC1 instruction.

Rounding mode control bits

Bits 1 and 0 of FCR31 constitute the rounding mode (RM) bits. These bits specify the rounding mode that the FPU uses for all floating-point operations.

Table 7-3 Rounding Mode Control Bits

  RM bits
  (bit 1, bit 0)   Mnemonic   Description
  00               RN         Round result to nearest representable value; round to the value with least-significant bit 0 when the two nearest representable values are equally near.
  01               RZ         Round toward 0: round to the value closest to and not greater in magnitude than the infinitely precise result.
  10               RP         Round toward +∞: round to the value closest to and not less than the infinitely precise result.
  11               RM         Round toward -∞: round to the value closest to and not greater than the infinitely precise result.

7.2.5 Implementation/Revision Register (FCR0)

The implementation/revision register (FCR0) is a read-only register that holds the implementation identification number and the implementation revision number of the FPU. This information can be used to identify the coprocessor revision, to determine the performance level, and to execute self-diagnosis. Figure 7-4 shows the layout of the register.

Figure 7-4 Implementation/Revision Register

  Bits     Field   Width   Contents
  31:16    0       16      RFU. Returns zeroes when read.
  15:8     Imp     8       Implementation number (0x0B)
  7:0      Rev     8       Revision number in the form y.x

The implementation revision number is a value in the format y.x, where y is the major revision number stored in bits 7:4 and x is the minor revision number stored in bits 3:0.

The revision of the chip can be identified by the implementation revision number. However, a change to the chip is not always reflected in the revision number, and conversely, a change in the revision number does not always reflect an actual change to the chip. Therefore, design programs so that they do not depend on the revision number in this register.

7.3 Floating-Point Formats

The FPU supports both 32-bit (single-precision) and 64-bit (double-precision) IEEE754 floating-point operations.

The 32-bit single-precision format has a 24-bit signed fraction field (s+f) and an 8-bit exponent (e), as shown in Figure 7-5.

Figure 7-5 Single-Precision Floating-Point Format

The double-precision format has a 53-bit signed fraction field (s+f) and an 11-bit exponent, as shown in Figure 7-6.
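The following Python sketch shows one way to split a value into the sign, exponent, and fraction fields of the two formats. It is an illustration only: the function names are hypothetical, the field positions (bit 31, bits 30:23, bits 22:0 for single precision; bit 63, bits 62:52, bits 51:0 for double precision) are taken from Figures 7-5 and 7-6, and the standard `struct` module is used on the assumption that the host stores floats in IEEE754 binary32/binary64 form.

```python
import struct


def decode_single(x):
    """Return (sign, biased exponent, fraction) of x as a 32-bit single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30:23, biased
    fraction = bits & 0x7FFFFF         # bits 22:0
    return sign, exponent, fraction


def decode_double(x):
    """Return (sign, biased exponent, fraction) of x as a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62:52, biased
    fraction = bits & ((1 << 52) - 1)  # bits 51:0
    return sign, exponent, fraction
```

The exponent returned here is the stored (biased) value, so 1.0 decodes to an exponent of 127 in single precision and 1023 in double precision.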
Figure 7-6 Double-Precision Floating-Point Format

Bit layouts shown in Figures 7-5 and 7-6:

  Format    Sign (s)   Exponent (e)            Fraction (f)
  Single    bit 31     bits 30:23 (8 bits)     bits 22:0 (23 bits)
  Double    bit 63     bits 62:52 (11 bits)    bits 51:0 (52 bits)

As shown in the above figures, numbers in floating-point format are composed of three fields:

- Sign field, s
- Exponent, $e = E + \mathrm{bias}$
- Fraction, $f = .b_1 b_2 \ldots b_{p-1}$ (the digits at the first place after the binary point and beyond)

The range of the unbiased exponent E includes every integer between the two values $E_{min}$ and $E_{max}$ inclusive, together with two other reserved values:

- $E_{min} - 1$ (to encode ±0 and denormalized numbers)
- $E_{max} + 1$ (to encode ±∞ and NaNs [Not a Number])

For single- and double-precision formats, each representable nonzero numerical value has just one encoding. For single- and double-precision formats, the value of a number, v, is determined by the equations shown in Table 7-4.

Table 7-4 Equations for Calculating Values in Single- and Double-Precision Floating-Point Format

  Type                    Equation
  NaN (Not a Number)      If $E = E_{max} + 1$ and $f \neq 0$, then v is NaN, regardless of s
  ±∞ (infinite number)    If $E = E_{max} + 1$ and $f = 0$, then $v = (-1)^s \infty$
  Normalized number       If $E_{min} \leq E \leq E_{max}$, then $v = (-1)^s 2^E (1.f)$
  Denormalized number     If $E = E_{min} - 1$ and $f \neq 0$, then $v = (-1)^s 2^{E_{min}} (0.f)$
  ±0 (zero)               If $E = E_{min} - 1$ and $f = 0$, then $v = (-1)^s 0$

NaN (Not a Number)

IEEE754 specifies a floating-point value called NaN (Not a Number). A NaN is not a numeric value and therefore is neither greater nor smaller than anything.

For all floating-point formats, if v is NaN, the most-significant bit of f determines whether the value is a signaling or quiet NaN: v is a signaling NaN if the most-significant bit of f is set; otherwise, v is a quiet NaN.

Table 7-5 defines the values for the format parameters.

Table 7-5 Floating-Point Format Parameter Values

  Parameter                  Single    Double
  $E_{max}$                  +127      +1023
  $E_{min}$                  -126      -1022
  Exponent bias              +127      +1023
  Exponent width in bits     8         11
  Integer bit                Hidden    Hidden
  Fraction width in bits     24        53
  Format width in bits       32        64

The minimum and maximum values that can be expressed in these floating-point formats are shown in Table 7-6.

Table 7-6 Minimum and Maximum Floating-Point Values

  Type                                                Value
  Single-precision floating-point minimum             1.40129846e-45
  Single-precision floating-point minimum (normal)    1.17549435e-38
  Single-precision floating-point maximum             3.40282347e+38
  Double-precision floating-point minimum             4.9406564584124654e-324
  Double-precision floating-point minimum (normal)    2.2250738585072014e-308
  Double-precision floating-point maximum             1.7976931348623157e+308

7.4 Fixed-Point Format

Fixed-point values are held in 2's complement format. Unsigned fixed-point values are not directly provided by the floating-point instruction set. Figure 7-7 illustrates the 32-bit fixed-point format and Figure 7-8 illustrates the 64-bit fixed-point format.

Figure 7-7 32-Bit Fixed-Point Format

  Bits     Field         Width   Contents
  31       s (sign)      1       Sign bit
  30:0     i (integer)   31      Integer value (2's complement)

Figure 7-8 64-Bit Fixed-Point Format

  Bits     Field         Width   Contents
  63       s (sign)      1       Sign bit
  62:0     i (integer)   63      Integer value (2's complement)

7.5 FPU Instruction Set Overview

All FPU instructions are 32 bits long and aligned on a word boundary. They can be divided into the following groups:

- Load/store/transfer instructions move data between the FPU general purpose registers, the FPU control registers, the CPU, and memory.
- Conversion instructions perform conversion operations between the various data formats.
- Computational instructions perform arithmetic operations on floating-point values in FPU registers.
- Compare instructions perform comparisons of the contents of registers and set the results in the condition bit of FCR31.
- FPU branch instructions perform a branch to the specified target if the specified coprocessor condition is met.

For details of each instruction, refer to Chapter 17 FPU Instruction Set Details.

7.5.1 Floating-Point Load/Store/Transfer Instructions

Loads/stores from/to CP1 and memory

Loads and stores between CP1 and memory are accomplished by using one of the following instructions:

- Load Word to Coprocessor 1 (LWC1) or Store Word from Coprocessor 1 (SWC1) instructions, which reference a single 32-bit word of the FP general registers
- Load Doubleword (LDC1) or Store Doubleword (SDC1) instructions, which reference a 64-bit doubleword

These load and store operations are unformatted; no format conversions are performed, and therefore no floating-point exceptions can occur due to these operations.

Transfers between CP1 and CPU

Data can also be moved directly between the CP1 general purpose registers and the CPU by using one of the following instructions:

- Move to Coprocessor 1 (MTC1)
- Move from Coprocessor 1 (MFC1)
- Doubleword Move to Coprocessor 1 (DMTC1)
- Doubleword Move from Coprocessor 1 (DMFC1)

Like the floating-point load and store operations, these operations perform no format conversions and never cause floating-point exceptions.

Data transfer between the CP1 control registers and the CPU is accomplished with the following instructions:

- Move Control Word to Coprocessor 1 (CTC1)
- Move Control Word from Coprocessor 1 (CFC1)

Load delay and hardware interlocks

The instruction immediately following a load or an MTC1 can use the contents of the loaded register. In such cases the hardware interlocks, which costs additional real cycles; for this reason, scheduling other instructions into load delay slots is desirable to avoid the interlocks.

Data alignment

All coprocessor loads and stores reference the following aligned data items:

- For word loads and stores, the access type is always word, and the low-order 2 bits of the address must always be 0.
- For doubleword loads and stores, the access type is always doubleword, and the low-order 3 bits of the address must always be 0.

Endianness

Regardless of the byte-numbering order (endianness) of the data, the address specifies the byte that has the smallest byte address in the addressed field. For a big-endian system, it is the leftmost byte; for a little-endian system, it is the rightmost byte.

Table 7-7 lists the load, store, and transfer instructions.

Table 7-7 Load/Store/Transfer Instructions

  Load Word to FPU: LWC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Loads the contents of the word specified by the address to the FPU general purpose register ft.

  Store Word from FPU: SWC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Stores the contents of the FPU general purpose register ft to the memory position specified by the address.

  Load Doubleword to FPU: LDC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Loads the contents of the doubleword specified by the address to the FPU general purpose registers ft and ft+1 when FR = 0, or to the FPU general purpose register ft when FR = 1.

  Store Doubleword from FPU: SDC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Stores the contents of the FPU general purpose registers ft and ft+1 to the memory position specified by the address when FR = 0, or the contents of the FPU general purpose register ft when FR = 1.

  Move Word to FPU: MTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU general purpose register fs.

  Move Word from FPU: MFC1 rt, fs
    Transfers the contents of FPU general purpose register fs to CPU general purpose register rt.
  Move Control Word to FPU: CTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU control register fs.

  Move Control Word from FPU: CFC1 rt, fs
    Transfers the contents of FPU control register fs to CPU general purpose register rt.

  Doubleword Move to FPU: DMTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU general purpose register fs.

  Doubleword Move from FPU: DMFC1 rt, fs
    Transfers the contents of FPU general purpose register fs to CPU general purpose register rt.

  Instruction formats: load/store instructions use the fields op, base, ft, offset; transfer instructions use the fields COP1, sub, rt, fs, 0.

7.5.2 Convert Instructions

Convert instructions perform conversions between the various data formats, such as single- or double-precision and fixed- or floating-point formats. Table 7-8 lists the conversion instructions.

When converting a long integer to a single- or double-precision floating-point number (CVT.[S,D].L), bits 63:55 of the 64-bit integer must be all zeroes or all ones; otherwise the VR4300 processor raises a floating-point instruction exception. The floating-point instruction exception allows these cases to be handled by software.

Table 7-8 Convert Instruction (1/2)

  Instruction format: COP1, fmt, 0, fs, fd, funct

  Floating-Point Convert to Single Floating-Point Format: CVT.S.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to a single-precision floating-point format. Stores the rounded result to floating-point register fd.

  Floating-Point Convert to Double Floating-Point Format: CVT.D.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to a double-precision floating-point format. Stores the rounded result to floating-point register fd.

  Floating-Point Convert to Long Fixed-Point Format: CVT.L.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to a 64-bit fixed-point format. Stores the rounded result to floating-point register fd.
  Floating-Point Convert to Single Fixed-Point Format: CVT.W.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to a 32-bit fixed-point format. Stores the rounded result to floating-point register fd.

  Floating-Point Round to Long Fixed-Point Format: ROUND.L.fmt fd, fs
    Rounds the contents of floating-point register fs to the nearest value and converts them from the specified format (fmt) to a 64-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Round to Single Fixed-Point Format: ROUND.W.fmt fd, fs
    Rounds the contents of floating-point register fs to the nearest value and converts them from the specified format (fmt) to a 32-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Truncate to Long Fixed-Point Format: TRUNC.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward 0 and converts them from the specified format (fmt) to a 64-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Truncate to Single Fixed-Point Format: TRUNC.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward 0 and converts them from the specified format (fmt) to a 32-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Ceiling to Long Fixed-Point Format: CEIL.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward +∞ and converts them from the specified format (fmt) to a 64-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Ceiling to Single Fixed-Point Format: CEIL.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward +∞ and converts them from the specified format (fmt) to a 32-bit fixed-point format. Stores the result to floating-point register fd.
  Floating-Point Floor to Long Fixed-Point Format: FLOOR.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward -∞ and converts them from the specified format (fmt) to a 64-bit fixed-point format. Stores the result to floating-point register fd.

  Floating-Point Floor to Single Fixed-Point Format: FLOOR.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward -∞ and converts them from the specified format (fmt) to a 32-bit fixed-point format. Stores the result to floating-point register fd.

7.5.3 Computational Instructions

Computational instructions perform arithmetic operations on floating-point values in registers. Table 7-9 lists the computational instructions. There are two categories of computational instructions:

- 3-operand register-type instructions, which perform floating-point add, subtract, multiply, and divide operations
- 2-operand register-type instructions, which perform floating-point absolute value, move, square root, and negate operations

Table 7-9 Computational Instructions

  Instruction format: COP1, fmt, ft, fs, fd, funct

  Floating-Point Add: ADD.fmt fd, fs, ft
    Arithmetically adds the contents of floating-point registers fs and ft in the specified format (fmt). Stores the rounded result to floating-point register fd.

  Floating-Point Subtract: SUB.fmt fd, fs, ft
    Arithmetically subtracts the contents of floating-point register ft from those of fs in the specified format (fmt). Stores the rounded result to floating-point register fd.

  Floating-Point Multiply: MUL.fmt fd, fs, ft
    Arithmetically multiplies the contents of floating-point registers fs and ft in the specified format (fmt). Stores the rounded result to floating-point register fd.

  Floating-Point Divide: DIV.fmt fd, fs, ft
    Arithmetically divides the contents of floating-point register fs by those of ft in the specified format (fmt).
    Stores the rounded result to floating-point register fd.

  Floating-Point Absolute Value: ABS.fmt fd, fs
    Calculates the arithmetic absolute value of the contents of floating-point register fs in the specified format (fmt). Stores the result to floating-point register fd.

  Floating-Point Move: MOV.fmt fd, fs
    Copies the contents of floating-point register fs to floating-point register fd in the specified format (fmt).

  Floating-Point Negate: NEG.fmt fd, fs
    Arithmetically negates the contents of floating-point register fs in the specified format (fmt). Stores the result to floating-point register fd.

  Floating-Point Square Root: SQRT.fmt fd, fs
    Calculates the arithmetic positive square root of the contents of floating-point register fs in the specified format (fmt). Stores the rounded result to floating-point register fd.

The fmt suffix appended to the op code of an arithmetic or compare instruction indicates the data format: S indicates single-precision floating-point, D indicates double-precision floating-point, L indicates 64-bit fixed-point, and W indicates 32-bit fixed-point. For example, ADD.D means that the operands of the addition instruction are double-precision floating-point values. If the FR bit is 0, an odd-numbered register cannot be specified.

7.5.4 Compare Instructions

The floating-point compare (C.cond.fmt) instructions interpret the contents of two FPU registers (fs, ft) in the specified format (fmt) and arithmetically compare them. The result is determined by the comparison and the condition (cond) specified in the instruction.

Table 7-10 lists the compare instructions. Table 7-11 lists the mnemonics for the compare instruction conditions.
Table 7-10 Compare Instruction

  Instruction format: COP1, fmt, ft, fs, 0, funct

  Floating-Point Compare: C.cond.fmt fs, ft
    Interprets and arithmetically compares the contents of FPU registers fs and ft in the specified format (fmt). The result is determined by the comparison and the specified condition (cond). After a delay of one instruction, the comparison result can be used by the FPU branch instruction of the CPU.

Table 7-11 Mnemonics and Definitions of Compare Instruction Conditions

  Mnemonic  Definition                                Mnemonic  Definition
  T         True                                      F         False
  UN        Unordered                                 OR        Ordered
  EQ        Equal                                     NEQ       Not equal
  UEQ       Unordered or equal                        OLG       Ordered or less than or greater than
  OLT       Ordered less than                         UGE       Unordered or greater than or equal
  ULT       Unordered or less than                    OGE       Ordered greater than or equal
  OLE       Ordered less than or equal                UGT       Unordered or greater than
  ULE       Unordered or less than or equal           OGT       Ordered greater than
  SF        Signaling false                           ST        Signaling true
  NGLE      Not greater than or less than or equal    GLE       Greater than, or less than or equal
  SEQ       Signaling equal                           SNE       Signaling not equal
  NGL       Not greater than or less than             GL        Greater than or less than
  LT        Less than                                 NLT       Not less than
  NGE       Not greater than or equal                 GE        Greater than or equal
  LE        Less than or equal                        NLE       Not less than or equal
  NGT       Not greater than                          GT        Greater than

7.5.5 FPU Branch Instructions

Table 7-12 lists the FPU branch instructions. These instructions can be used to test the result of the compare (C.cond.fmt) instruction. The delay slot in this table is the instruction that immediately follows a branch instruction. For details, refer to Chapter 4 Pipeline.

Table 7-12 FPU Branch Instructions

  Branch on FPU True: BC1T offset
    Adds the instruction address in the delay slot and a 16-bit offset (shifted 2 bits to the left and sign-extended) to calculate the branch target address.
    If the FPU condition line is true, branches to the target address (delay of one instruction).

  Branch on FPU False: BC1F offset
    Adds the instruction address in the delay slot and a 16-bit offset (shifted 2 bits to the left and sign-extended) to calculate the branch target address. If the FPU condition line is false, branches to the target address (delay of one instruction).

  Branch on FPU True Likely: BC1TL offset
    Adds the instruction address in the delay slot and a 16-bit offset (shifted 2 bits to the left and sign-extended) to calculate the branch target address. If the FPU condition line is true, branches to the target address (delay of one instruction). If the conditional branch does not take place, the instruction in the delay slot is invalidated.

  Branch on FPU False Likely: BC1FL offset
    Adds the instruction address in the delay slot and a 16-bit offset (shifted 2 bits to the left and sign-extended) to calculate the branch target address. If the FPU condition line is false, branches to the target address (delay of one instruction). If the conditional branch does not take place, the instruction in the delay slot is invalidated.

  Instruction format: COP1, BC, br, offset

7.5.6 FPU Instruction Execution Time

Unlike the CPU, which executes almost all instructions in a single cycle, the FPU needs more time to execute its instructions.

All data transfer between the floating-point coprocessor and memory is accomplished by coprocessor load and store operations. Data may be moved directly between the floating-point coprocessor and the integer processor by move to and move from coprocessor instructions. The execution cycles of these instructions are shown below.

Table 7-13 Number of Load/Store/Transfer Instruction Execution Cycles

  Instruction   Cycles
  LWC1          2/1 *
  SWC1          1
  LDC1          2/1 *
  SDC1          1
  MTC1          1
  MFC1          1
  DMTC1         1
  DMFC1         1
  CTC1          1
  CFC1          1

  * The hardware interlocks for one cycle if the load result is used by the instruction in the load delay slot.

To obtain optimum performance, the VR4300 pipeline does not perform a bypass from the EX stage to the EX stage of the next instruction for the floating-point result of a compare, computational, LWC1, or LDC1 instruction. If the subsequent floating-point instruction depends on the result of the floating-point instruction currently in the EX stage, the current instruction completes, its EX-stage result is registered in the DC stage, and the bypass is enabled. Meanwhile, the floating-point instruction in the RF stage advances to the EX stage, where it is stalled for one pipeline clock to wait for the result to be bypassed from DC to EX before it begins execution.

Caution  This limitation on the bypass from the EX stage to the EX stage of the next instruction does not apply to integer operations or to floating-point load/store/transfer instructions (except LWC1 and LDC1).

Figure 7-9 DC-to-EX Hardware Interlock Bypass

The execution unit of the VR4300 can shorten the delay time of almost all floating-point instructions, depending on the circumstances. This feature improves performance and simplifies design. Changes in the delay time are kept as simple as possible.

If an exception is detected by checking the source operands when a multicycle instruction is executed (that is, if a source exception occurs), the multicycle instruction is executed for only 2 cycles and exception processing is started. Similarly, if checking the operands shows that the result of an operation is a value that does not cause an exception (zero or infinity), the result (e.g., a value other than 0) is written back 2 cycles later and the operation ends.

For floating-point exceptions other than source exceptions, the instruction is not aborted until its execution is completed. In other words, an exception is reported not when it is found, but when instruction execution has been completed.
Next, the execution time of each instruction is described.

Floating-point add/subtract instructions

Floating-point add and subtract terminate on the second cycle if a source exception occurs, or if at least one operand is zero or infinity. The instruction completes on the third cycle in all other cases.

Floating-point multiply instruction

A floating-point multiply completes in two cycles if a source exception is detected, or if, during the first cycle, the result can be determined to be zero or infinity. A floating-point multiply also finishes in the second cycle if at least one of the operands is a power of 2. In all other cases it takes the full number of cycles (the maximum specified for each format) to complete. Thus, a multiply does not finish early merely because the remaining bits are zero. Also, there can be no overlap between multiply and add.

Floating-point divide/square root instructions

Floating-point divide and square root complete in the second cycle either on a source exception or if, during the first cycle, the result can be determined to be zero or infinity. Otherwise they continue and take the maximum number of cycles.

Floating-point convert instruction

Floating-point convert instructions also complete in the second cycle for trivial cases.

Execution cycle numbers of floating-point instructions are listed in Table 7-14. If the floating-point result of one of these instructions is needed by the subsequent instruction, the latency is the execution rate plus one, because an EX-to-RF bypass is not performed for the results of these instructions. All CPU/FPU instruction delay times that are not mentioned in these tables have a latency of one pipeline clock cycle (1 PClock).
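The "execution rate plus one" rule can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not a cycle-accurate description of the VR4300: the cycle counts are copied from the S and D columns of Table 7-14 for the multicycle arithmetic instructions only, early termination for trivial operands is ignored, and the names `CYCLES` and `result_latency` are hypothetical.

```python
# Pipeline cycles per instruction and format, copied from Table 7-14.
# Only the multicycle arithmetic instructions are modelled here.
CYCLES = {
    "add": {"s": 3, "d": 3},
    "sub": {"s": 3, "d": 3},
    "mul": {"s": 5, "d": 8},
    "div": {"s": 29, "d": 58},
    "sqrt": {"s": 29, "d": 58},
}


def result_latency(op, fmt):
    """Pipeline clocks before a dependent instruction can use the result.

    Models the rule that a result needed by the subsequent instruction
    costs the execution rate plus one extra pipeline clock for the
    hardware interlock bypass.
    """
    return CYCLES[op][fmt] + 1
```

Under this model a double-precision multiply followed immediately by an instruction that uses its result has a latency of 9 pipeline clocks, which is one reason to schedule independent instructions after long floating-point operations.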
7.6 FPU Pipeline Synchronization

Since the integer and floating-point units share a common hardware pipeline, a CFC1 instruction is not needed to synchronize pipeline operation.

Table 7-14 Number of FPU Instruction Delay Cycles *1

  Pipeline cycles *2 by format (S, D, W, L):

  Instruction     S     D     W     L
  ADD.fmt         3     3
  SUB.fmt         3     3
  MUL.fmt         5     8
  DIV.fmt         29    58
  SQRT.fmt        29    58
  ABS.fmt         1     1
  MOV.fmt         1     1
  NEG.fmt         1     1
  ROUND.W.fmt     5     5
  TRUNC.W.fmt     5     5
  CEIL.W.fmt      5     5
  FLOOR.W.fmt     5     5
  ROUND.L.fmt     5     5
  TRUNC.L.fmt     5     5
  CEIL.L.fmt      5     5
  FLOOR.L.fmt     5     5
  CVT.S.fmt       -     2     5     5
  CVT.D.fmt       1     -     5     5
  CVT.W.fmt       5     5
  CVT.L.fmt       5     5
  C.cond.fmt      1     1

  BC1T *3, BC1F *3, BC1TL *3, and BC1FL *3 each take 1 pipeline cycle.

  *1. If the result of a floating-point instruction is needed by the subsequent instruction, one additional pipeline clock is required to perform a hardware interlock bypass.
  *2. Multicycle floating-point operation instructions whose results are obvious are not described in this table; they take two pipeline clocks to complete.
  *3. The architecturally defined branch delay slot of one cycle also applies to all FPU branch instructions.

Chapter 8 Floating-Point Exceptions

This chapter explains how the FPU handles floating-point exceptions.

8.1 Types of Exceptions

A floating-point exception occurs if a floating-point operation, or the result of the operation, cannot be handled by the ordinary method. The FPU performs one of the following two operations when an exception occurs.

- When the exception is enabled: sets the cause bit of the control/status register (FCR31) of the FPU and transfers servicing to the exception handler routine (software servicing).
- When the exception is disabled: stores an appropriate value (default value) to the destination register of the FPU, sets the cause bit and flag bit of FCR31, and continues execution.
The FPU supports the five IEEE754 exceptions:

- Inexact (I)
- Overflow (O)
- Underflow (U)
- Division by zero (Z)
- Invalid operation (V)

Cause bits, enable bits, and flag bits (status flags) are used to support these exceptions.

The FPU has an unimplemented operation (E) as a sixth exception cause, which is used when a floating-point operation cannot be executed with the standard MIPS architecture (including when the FPU cannot correctly process exceptions). This exception requires service by software. The E bit does not exist in the enable or flag fields. When this exception occurs, unimplemented exception processing is executed (when interrupt input by the FPU to the CPU is enabled).

Figure 8-1 shows the bits of FCR31 used to support the exceptions.

Remark  The unimplemented operation exception is not one of the exceptions defined by the IEEE754 standard. With the VR4300, it is an exception that occurs if an operation not supported by the hardware is executed.

Figure 8-1 FCR31 Cause/Enable/Flag Bits

  Exception                        Cause bit   Enable bit   Flag bit
  E  Unimplemented operation       17          (none)       (none)
  V  Invalid operation             16          11           6
  Z  Division by zero              15          10           5
  O  Overflow                      14          9            4
  U  Underflow                     13          8            3
  I  Inexact operation             12          7            2

Each of the five IEEE754 exceptions (V, Z, O, U, and I) is enabled when its enable bit is set. When an exception occurs, the corresponding cause bit is set. If the corresponding enable bit is set, the FPU generates an interrupt to the CPU and exception processing starts. If the exception is disabled, both the cause bit and the flag bit corresponding to the exception are set.

8.2 Exception Processing

When a floating-point exception is taken, the cause register of CP0 indicates that the FPU is the cause of the exception. The floating-point exception (FPE) code is used, and the cause bits of FCR31 indicate the reason for the floating-point exception. These bits are, in effect, an extension of the CP0 cause register.
8.2.1 Flags

Flag bits corresponding to the respective IEEE754 exceptions are provided. A flag bit is set when the corresponding exception is disabled and the condition of the exception is detected. The flag bit can be reset by writing a new value to the status register with the CTC1 instruction.

If an exception is disabled by the corresponding enable bit, the FPU performs predetermined processing. This processing supplies a default value as the result, instead of the result of the floating-point operation. The default value is determined by the type of the exception. For the overflow and underflow exceptions, the default value also depends on the rounding mode in use at that time. Table 8-1 shows the default values supplied for the respective IEEE754 exceptions of the FPU.

Table 8-1 Default FPU IEEE754 Exception Values

Field  Description        Rounding mode  Default value
V      Invalid operation  -              Supply a quiet Not a Number (Q-NaN)
Z      Division by zero   -              Supply a properly signed ∞
O      Overflow           RN             ∞ signed with intermediate result
                          RZ             Maximum normal number signed with intermediate result
                          RP             Negative overflow: maximum negative normal number
                                         Positive overflow: +∞
                          RM             Positive overflow: maximum positive normal number
                                         Negative overflow: -∞
U      Underflow          RN             0 signed with intermediate result
                          RZ             0 signed with intermediate result
                          RP             Positive underflow: minimum positive normal number
                                         Negative underflow: 0
                          RM             Negative underflow: minimum negative normal number
                                         Positive underflow: 0
I      Inexact exception  -              Supply a rounded result

The FPU detects nine exception causes internally. When the FPU detects one of these unusual situations, it causes either an IEEE754 exception or an unimplemented operation exception (E). Table 8-2 lists the exception-causing situations and compares the contents of the cause bits of the FPU with the IEEE754 standard when each exception occurs.
Table 8-2 FPU Internal Results and Flag Status

FPU internal result              IEEE754   Exception   Exception   Remarks
                                           enabled     disabled
Inexact result                   I         I           I           Loss of accuracy
Exponent overflow                O, I *1   O, I        O, I        Normalized exponent > Emax
Division by zero                 Z         Z           Z           Zero is (exponent = Emin-1, mantissa = 0)
Overflow on convert to integer   V         E           E           Source out of integer range
Signaling NaN (S-NaN) source     V         V           V
Invalid operation                V         V           V           0/0, etc.
Exponent underflow               U         E           U, I *2     Normalized exponent < Emin
Denormalized source              None      E           E           Exponent = Emin-1 and mantissa ≠ 0
Q-NaN                            None      E           E

*1. Under IEEE754, an overflow raises the inexact operation exception only when the overflow exception is disabled. The VR4300, however, always generates both the overflow exception and the inexact operation exception when an overflow occurs.
*2. If both the underflow exception and the inexact operation exception are disabled when the exponent underflow occurs, and if the FS bit of FCR31 is set, the cause bits and flag bits of the underflow exception and inexact operation exception are set. Otherwise, the cause bit of the unimplemented operation exception is set.

Next, each FPU exception is described.

8.2.2 Inexact Exception (I)

The FPU generates the inexact operation exception in the following cases.

If the accuracy of the rounded result drops
If the rounded result overflows
If the rounded result underflows and the FS bit of FCR31 is set with the underflow and illegal operation exceptions disabled

If exception is enabled: The destination register is not modified, the source registers are preserved, and an inexact operation exception occurs.
If exception is not enabled: The rounded result or the underflowed/overflowed result is delivered to the destination register if no other exception occurs.

8.2.3 Invalid Operation Exception (V)

The invalid operation exception is generated if one or both of the operands are invalid.
When the exception is not enabled, the MIPS ISA defines the result as a quiet Not a Number (Q-NaN). The invalid operations are:

Add or subtract: addition or subtraction of infinities, such as (+∞) + (-∞) or (-∞) - (-∞)
Multiply: 0 × ∞
Divide: 0 ÷ 0, or ∞ ÷ ∞
Compare of predicates involving < or > without ?, when the operands are unordered
Any arithmetic operation, when one or both operands is an S-NaN. A transfer (MOV) operation is not considered to be an arithmetic operation, but absolute value (ABS) and negate (NEG) are.
Compare or convert to floating-point operation when the operand is an S-NaN
Square root: √x, where x is less than zero

Software can simulate the invalid operation exception for other operations that are invalid for the given source operands. Examples of these operations include IEEE754-specified functions implemented in software, such as remainder x REM y, where y is 0 or x is infinite; conversion of a floating-point number to a decimal format whose value causes an overflow, is infinity, or is NaN; and transcendental functions, such as ln(-5) or cos^-1(3). Refer to Chapter 17 FPU Instruction Set Details. Refer to Appendix B for examples or for routines to handle these cases.

If exception is enabled: The destination register is not modified, the source registers are preserved, and the invalid operation exception occurs.
If exception is not enabled: If no other exception occurs, Q-NaN is stored to the destination register.

8.2.4 Divide-by-Zero Exception (Z)

The division-by-zero exception occurs if the divisor is zero and the dividend is a finite nonzero number. This exception also occurs for other operations that produce a signed infinity, such as ln(0), sec(π/2), or 0^-1.

If exception is enabled: The contents of the destination register are not changed, the contents of the source registers are preserved, and the division-by-zero exception occurs.
If exception is not enabled: If no other exception occurs, the infinite number (±∞) determined by the sign of the operand is stored to the destination register.

8.2.5 Overflow Exception (O)

The overflow exception occurs when the magnitude of the rounded floating-point result, computed with an unbounded exponent range, is larger than the largest finite number of the destination format. (The inexact exception cause and flag bits are also set.)

If exception is enabled: The contents of the destination register are not modified, the source registers are preserved, and the overflow exception occurs.
If exception is not enabled: If no other exception occurs, the default value determined by the rounding mode is stored to the destination register (refer to Table 8-1 Default FPU IEEE754 Exception Values).

8.2.6 Underflow Exception (U)

Two related events generate the underflow exception:

If the operation result lies between -2^Emin and +2^Emin (other than 0)
Extraordinary loss of accuracy during the arithmetic operation of such tiny numbers by denormalized numbers

IEEE754 provides several methods of underflow detection. Note, however, that the same detection method must be used for all processing. The following two methods are used to detect an underflow.

After rounding (when a nonzero result, computed as though the exponent range were unbounded, would lie strictly between ±2^Emin)
Before rounding (when a nonzero result, computed as though the exponent range and the precision were unbounded, would lie strictly between ±2^Emin)

The MIPS architecture detects an underflow after rounding.

To detect a drop in the accuracy, the following two methods are used.
Denormalize loss (if a given result differs from the result calculated when the exponent range is infinite)
Inexact result (if a given result differs from the result calculated when the exponent range and accuracy are infinite)

The MIPS architecture detects a drop in the accuracy as an inexact result.

If exception is enabled: If the underflow exception or inexact operation exception is enabled, or if the FS bit of the FCR31 register is not set, the unimplemented operation exception (E) occurs. At this time, the contents of the destination register are not changed.
If exception is not enabled: If the underflow exception and inexact operation exception are disabled, and if the FS bit of the FCR31 register is set, the default value determined by the rounding mode is stored to the destination register (refer to Table 8-1 Default FPU IEEE754 Exception Values).

8.2.7 Unimplemented Operation Exception (E)

If an attempt is made to execute an instruction with an operation code or format code reserved for future expansion, the E bit is set and an exception occurs. The operand and the contents of the destination register are not changed. Usually, such instructions are emulated by software. If IEEE754 exceptions occur in an emulated operation, simulate those exceptions.

The unimplemented operation exception also occurs in the following cases. These are cases where an abnormal operand that cannot be handled correctly by hardware, or an abnormal result, is detected.
If the operand is a denormalized number (except compare instruction)
If the operand is Q-NaN (except compare instruction)
If the result is a denormalized number or underflows when the underflow/inexact operation exception is enabled and when the FS bit of the FCR31 register is set
If a reserved instruction is executed
If an unimplemented format is used
If a format whose operation is invalid is used (e.g., CVT.S.S)

Caution: If a type conversion or arithmetic operation instruction is executed and the operand is a denormalized number or NaN, the exception occurs. The exception does not occur when a transfer instruction is executed, even if the operand is a denormalized number or NaN.

How the unimplemented operation exception is used is determined arbitrarily by the system. To maintain complete compatibility with IEEE754, the unimplemented operation exception can be handled by software if it occurs.

If exception is enabled: The contents of the destination register are not changed, the contents of the source registers are preserved, and the unimplemented operation exception occurs.
If exception is not enabled: This exception cannot be disabled because there is no corresponding enable bit.

Restrictions: An unimplemented operation exception occurs in response to the execution of a type conversion instruction in the following cases.

If an overflow occurs during conversion to integer format
If the source operand is an infinite number
If the source operand is NaN

The type conversion instructions affected by this restriction are as follows.

CEIL.L.fmt fd, fs      FLOOR.L.fmt fd, fs
CEIL.W.fmt fd, fs      FLOOR.W.fmt fd, fs
CVT.D.fmt fd, fs       ROUND.L.fmt fd, fs
CVT.L.fmt fd, fs       ROUND.W.fmt fd, fs
CVT.S.fmt fd, fs       TRUNC.L.fmt fd, fs
CVT.W.fmt fd, fs       TRUNC.W.fmt fd, fs

8.3 Saving and Returning State

Sixteen doubleword* LDC1 or SDC1 operations save or return the coprocessor floating-point register state in memory.
The information in the control/status register can be saved to or returned from a CPU register through the CFC1 and CTC1 instructions. Normally, the control/status register is saved first and returned last.

When state is returned, state information in the control/status register indicates the exceptions that are pending. Writing a zero value to the cause field of the FCR31 register clears all pending exceptions, permitting normal processing to restart after the floating-point register state is returned.

* 32 doublewords if the FR bit is set to 1.

8.4 Handling of IEEE754 Exceptions

IEEE754 recommends an exception handler for each of the five standard exceptions; the exception handler can compute and restore a substitute result in the destination register.

By retrieving the instruction using the processor exception program counter (EPC) register, the exception handler determines:

Exceptions occurring during the operation
The operation being performed
The destination format

To obtain the correct rounded result if the overflow, underflow (except when a conversion instruction is executed), or inexact operation exception occurs, develop software that checks the source registers or that simulates the instruction while the exception handler is executed.

On invalid operation and divide-by-zero exceptions, on conversions, and on overflow or underflow exceptions that occurred in floating-point operations, the exception handler gains access to the operand values by examining the source registers of the instruction.

IEEE754 recommends that, if enabled, the overflow and underflow exceptions take precedence over a separate inexact exception. This prioritization is accomplished in software; hardware sets the bits for both the overflow or underflow exception and the inexact exception.
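The software prioritization described in 8.4 can be sketched as follows. This is an illustrative model only: the function name `exception_to_report` is hypothetical, cause and enable bits are represented as sets of the letters V, Z, O, U, and I rather than as register fields, and the order among exceptions other than O/U versus I is an arbitrary choice of this sketch.

```python
def exception_to_report(cause, enabled):
    """Pick the IEEE754 exception a handler should service first.

    cause and enabled are sets of letters from "VZOUI". Hardware sets
    both O (or U) and I, so the preference of O/U over I is applied here
    in software, as recommended by IEEE754.
    """
    pending = [c for c in "VZOUI" if c in cause and c in enabled]
    for preferred in ("O", "U"):
        if preferred in pending:
            return preferred
    # No enabled overflow/underflow: report whatever else is pending.
    return pending[0] if pending else None
```

With both O and I in `cause` and both enabled, the function returns "O"; if only I is enabled, it returns "I"; if nothing pending is enabled, it returns `None`.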
Chapter 9 Initialization Interface

This chapter describes the VR4300 initialization interface and the processor modes. It covers the reset signals and reset types, the initialization sequence with its signals and timing dependencies, and the user-selectable VR4300 processor modes.

9.1 Functional Overview

The VR4300 processor has the following three types of reset; they use the ColdReset and Reset signals.

Power-on reset: Starts when the ColdReset signal is asserted active after power is applied and has become stable. All clocks are restarted. A power-on reset completely initializes the internal state of the processor without saving any state information.
Cold reset: Starts when the ColdReset signal is asserted active while the processor is operating. All clocks are restarted. A cold reset completely initializes the internal state of the processor without saving any state information.
Soft reset: Restarts the processor, but does not affect the clocks. Most of the processor's state can be retained across a soft reset.

After reset, the processor is bus master and drives the SysAD(31:0) bus. Care must be taken to coordinate system reset with other system elements. In general, bus errors immediately before, during, or after a reset may result in undefined operations. Because a reset initializes only part of the internal state of the VR4300 processor, make sure to completely initialize the processor through software.

The operation of each type of reset is described in the sections that follow. Refer to Figures 9-1 to 9-3 later in this chapter for timing diagrams of the power-on, cold, and soft resets.

9.2 Reset Signal Description

This section describes the two reset signals, ColdReset and Reset.
ColdReset signal
The ColdReset signal must be asserted active to initialize the processor using power-on reset or cold reset. At this time, the Reset signal can be either active or inactive. Set DivMode(1:0)* before the power-on reset. Once ColdReset has been asserted active, do not deassert it for at least 64000 MasterClock cycles. The ColdReset signal need not be synchronized with MasterClock. When the ColdReset signal is deasserted, the SClock, TClock, and SyncOut clock signals start operating in synchronization with MasterClock.

* In VR4300 and VR4305. In VR4310, DivMode(2:0).

Reset signal
Assert this pin active or inactive in synchronization with MasterClock, or keep it inactive, at power-on reset or cold reset. Assert this pin active or inactive in synchronization with MasterClock at soft reset.

9.2.1 Power-On Reset

Power-on reset is used to completely reset the processor. As a result:

The TS, SR, and RP bits of the Status register and the EP(3:0) bits of the Config register are cleared to 0.
The ERL and BEV bits of the Status register and the BE bit of the Config register are set to 1.
The upper-limit value (31) is assigned to the Random register.
The EC(2:0) bits of the Config register are assigned the contents of the DivMode(1:0)* pins.
All other internal states are undefined.

* In VR4300 and VR4305. In VR4310, DivMode(2:0).

After the power supply to the processor has stabilized, assert the ColdReset signal active for 64000 MasterClock cycles or more (0.96 ms during external 66.7-MHz operation).

Fix the DivMode signal before the ColdReset signal is asserted active. The DivMode signal cannot be changed after that. If the DivMode signal is changed after the ColdReset signal has been asserted active, the operation of the processor is not guaranteed.
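The 0.96 ms figure follows directly from the 64000-cycle requirement. The short calculation below is a sketch for checking that arithmetic at other MasterClock frequencies; the function name `min_coldreset_ms` is hypothetical.

```python
def min_coldreset_ms(masterclock_hz, cycles=64000):
    """Minimum ColdReset assertion time in milliseconds.

    cycles defaults to the 64000 MasterClock cycles required by the text.
    """
    return cycles / masterclock_hz * 1e3


# 64000 cycles at 66.7 MHz is about 0.96 ms, as stated in the text.
print(round(min_coldreset_ms(66.7e6), 2))
```

A slower MasterClock lengthens the minimum assertion time proportionally, so a board-level reset circuit should be sized for the lowest MasterClock frequency it may see.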
When the ColdReset signal is asserted active, the Reset signal may be active or inactive. However, do not change the value of the Reset signal during the reset sequence.

Keep the Reset signal active for 16 MasterClock cycles immediately after the ColdReset signal has been deasserted.

The output signals of the system interface are as follows during the reset period.

PValid signal : 1
PReq signal : 1
PMaster signal : 0
SysAD(31:0) : Undefined
SysCmd(4:0) : Undefined

When resetting has been completed, the processor serves as the bus master and drives SysAD(31:0). The processor branches to the reset exception vector and starts executing the reset exception code.

9.2.2 Cold Reset

A cold reset is used to completely reset the processor.

The TS, SR, and RP bits of the Status register and the EP(3:0) bits of the Config register are cleared to 0.
The ERL and BEV bits of the Status register and the BE bit of the Config register are set to 1.
The upper-limit value (31) is assigned to the Random register.
All other states are undefined.

When executing a cold reset, keep the ColdReset signal active for 64000 MasterClock cycles or more (0.96 ms during external 66.7-MHz operation).

When the ColdReset signal is asserted active, the Reset signal may be active or inactive. However, do not change the value of the Reset signal during the reset sequence.

Keep the Reset signal active for 16 MasterClock cycles immediately after the ColdReset signal has been deasserted.

The output signals of the system interface are as follows during the reset period.

PValid signal : 1
PReq signal : 1
PMaster signal : 0
SysAD(31:0) : Undefined
SysCmd(4:0) : Undefined

When resetting has been completed, the processor serves as the bus master and drives SysAD(31:0). The processor branches to the reset exception vector and starts executing the reset exception code.
9.2.3 Soft Reset

A soft reset is used to reset the processor without affecting the output clocks; in other words, a soft reset is a logic reset. In a soft reset, the processor retains as much state information as possible; all state information except for the following is retained:

The BEV, SR, and ERL bits of the Status register are set to 1.
The TS and RP bits of the Status register are cleared to 0.

Because a soft reset is executed as soon as the Reset signal is asserted active, undefined data may remain if the reset arrives while a multicycle operation, such as a cache miss or a floating-point instruction, is being executed.

Keep the Reset signal asserted active for at least 16 MasterClock cycles. At this time, satisfy the setup and hold times with respect to MasterClock.

After the reset is completed, the processor becomes bus master and drives the SysAD(31:0) bus. The processor then branches to the reset exception vector and begins executing the reset exception code.

If the Reset signal is asserted in the middle of a SysAD(31:0) transaction, care must be taken to reset all external agents to avoid SysAD(31:0) bus contention.

[Figure 9-1 Power-On Reset: timing diagram of MasterClock (input), Reset (input), ColdReset (input), DivMode(1:0)* (input), SyncOut (output), and TClock (output), marked with tDS, tDH, ≥ 64000 MasterClock cycles, and ≥ 16 MasterClock cycles.]

* Determine the DivMode signal before the ColdReset signal is asserted active. DivMode(1:0) applies to VR4300 and VR4305. In VR4310, DivMode(2:0).

[Figure 9-2 Cold Reset: timing diagram of MasterClock (input), Reset (input), ColdReset (input), SyncOut (output), and TClock (output), marked with tDS, tDH, ≥ 64000 MasterClock cycles, and ≥ 16 MasterClock cycles.]

[Figure 9-3 Soft Reset: timing diagram of MasterClock (input), Reset (input), ColdReset (input), SyncOut (output), and TClock (output), marked with tDS, tDH, and ≥ 16 MasterClock cycles.]
9.3 VR4300 Processor Modes

The VR4300 processor supports several user-selectable modes. All modes except DivMode are set/reset by writing to the Config register.

9.3.1 Power Modes

The VR4300 supports three power modes: normal power, low power (100 MHz model of the VR4300 and the VR4305 only), and power-off.

Normal power mode
Normally the processor clock (PClock) is generated from the input clock (MasterClock). The frequency ratio of PClock to MasterClock is set by DivMode(1:0)*. For the setting, refer to Table 2-2 Clock/Control Interface Signals. The frequency of the system interface clock (SClock) is the same as that of MasterClock. The default state is normal clocking, and the processor returns to the default state after any reset.

* In VR4300 and VR4305. In VR4310, DivMode(2:0).

Low power mode (100 MHz model of VR4300 and VR4305 only)
The user may set the processor to low power mode by setting the RP bit of the Status register to 1. In RP mode, the processor stalls the pipeline and goes into a quiescent state: the store buffers are empty and all cache misses are resolved. However, RP mode operation is guaranteed only when MasterClock is 40 MHz or more. The frequency of PClock drops to 1/4 of the normal level. The speeds of SClock and TClock also drop to 1/4 of the normal level. This feature reduces the power consumed by the processor chip to 25% of its normal value.

Software must guarantee the proper operation of the system when setting or clearing the RP bit.

1. The behavior of circuits such as the DRAM refresh counter changes if the operating frequency changes. Therefore, write new values to the registers of the external agent that are directly affected by changes in frequency.
2. Set the system interface to the inactive state. For example, execute a read instruction to the non-cache area, and make the write buffer empty before completion of the instruction execution.
Then the RP bit can be set or cleared.
3. Make sure that the eight instructions before and after the MTC0 instruction that sets or clears the RP bit do not generate exceptions such as cache miss and TLB miss.

Power-off mode
Before entering power-off mode, the system retains as much information as possible by writing the contents of the CP0 registers, the floating-point registers, and the program counter to memory. Dirty data cache lines are also written out to memory.

9.3.2 Privilege Modes

The VR4300 supports three modes of system privilege: kernel, supervisor, and user extended addressing. This section describes these three modes.

Kernel extended addressing
When the KX bit of the Status register is set to 1, the expansion TLB miss exception vector is used if a TLB miss exception occurs on a kernel address. In kernel mode, the MIPS III instruction set can always be used regardless of the KX bit.

Supervisor extended addressing
If the SX bit of the Status register is set to 1, the MIPS III instruction set can be used in supervisor mode, and the expansion TLB miss exception vector is used if a TLB miss exception occurs on a supervisor address. If this bit is cleared, the MIPS I and II instruction sets and 32-bit virtual addresses are used.

User extended addressing
If the UX bit of the Status register is set to 1, the MIPS III instruction set can be used in user mode, and the expansion TLB miss exception vector is used if a TLB miss exception occurs on a user address. If this bit is cleared, the MIPS I and II instruction sets and 32-bit virtual addresses are used.

9.3.3 Floating-Point Registers

If the FR bit of the Status register is set to 1, all thirty-two 64-bit floating-point registers defined by the MIPS III architecture can be accessed. If this bit is cleared, the processor accesses the sixteen 64-bit floating-point registers defined by the MIPS II architecture.
9.3.4 Reverse Endianness

If the RE bit of the Status register is set to 1, the endianness in user mode is reversed.

9.3.5 Instruction Trace Support

If the ITS bit of the Status register is set to 1, the physical address of the branch destination can be output from SysAD(31:0) when the instruction address is changed by execution of a jump or branch instruction or by occurrence of an exception. This function is disabled when the ITS bit is cleared.

Use this function to forcibly generate an instruction cache miss in the following cases.

If the branch condition is satisfied when a branch instruction is executed
If the contents of the PC are changed by execution of a jump instruction or by occurrence of an exception

When the instruction cache miss occurs, a processor block read request is issued on SysAD(31:0), which reports the address change externally. Return the response data to the processor block read request in the same manner as for a normal request. The address that is output is not a PC value (virtual address) but a physical address.

9.3.6 Bootstrap Exception Vector (BEV)

This bit is used when diagnostic tests cause exceptions to occur prior to verifying proper operation of the cache and main memory system.

The bootstrap exception vector (BEV) bit is automatically set to 1 at cold reset or soft reset and on occurrence of the NMI exception. This bit can also be set by software.

When set, the BEV bit in the Status register causes the TLB miss exception vector to be relocated to virtual address 0xFFFF FFFF BFC0 0200 and the general exception vector to be relocated to address 0xFFFF FFFF BFC0 0380.

When BEV is cleared, these vectors are located at 0xFFFF FFFF 8000 0000 (TLB refill) and 0xFFFF FFFF 8000 0180 (general).

9.3.7 Interrupt Enable (IE)

When the IE bit in the Status register is cleared, interrupts are not allowed, with the exception of reset and the non-maskable interrupt.
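The effect of the BEV bit on the two vectors named in 9.3.6 can be written as a small lookup. This is a sketch for illustration; the table and function names are hypothetical, and only the TLB refill and general exception vectors given in the text are covered.

```python
# Vector addresses as given in 9.3.6, keyed by the state of the BEV bit.
EXCEPTION_VECTORS = {
    True: {"tlb_refill": 0xFFFFFFFFBFC00200, "general": 0xFFFFFFFFBFC00380},
    False: {"tlb_refill": 0xFFFFFFFF80000000, "general": 0xFFFFFFFF80000180},
}


def exception_vector(bev, kind):
    """Return the 64-bit virtual address of the vector for kind.

    bev is the state of the Status register BEV bit; kind is
    "tlb_refill" or "general".
    """
    return EXCEPTION_VECTORS[bool(bev)][kind]


print(hex(exception_vector(1, "general")))
```

Because BEV is set automatically at cold reset, soft reset, and NMI, the bootstrap addresses are the ones in effect until software clears the bit.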
Chapter 10 Clock Interface

This chapter describes the clock signals ("clocks") used in the VR4300 processor.

10.1 Signal Terminology

The following terminology is used in this chapter (and book) when describing signals:

Rising edge indicates a low-to-high transition.
Falling edge indicates a high-to-low transition.
Clock-to-Q delay is the amount of time taken for a signal to move from the input of a device (clock) to the output of the device (Q).

Figures 10-1 and 10-2 illustrate these terms.

[Figure 10-1 Signal Transitions: a single clock cycle showing the low-to-high and high-to-low transitions.]

[Figure 10-2 Clock-to-Q Delay: clock input, data in, and data out (Q), showing the clock-to-Q delay.]

10.2 Basic System Clocks

The various clock signals used in the VR4300 processor are described below.

MasterClock
The internal and external (system interface) clocks of the VR4300 are generated and operate based on MasterClock.

SyncIn/SyncOut
The VR4300 processor generates SyncOut at the same frequency as MasterClock and aligns SyncIn with MasterClock.
SyncOut must be connected to SyncIn either directly or through an external buffer. The processor can compensate for both output driver and input buffer delays when aligning SyncIn with MasterClock. When SyncOut is connected to SyncIn through an external buffer as illustrated in Figure 10-7, delay caused by external buffers connected to clock outputs can also be compensated.

PClock
PClock is selected by setting the frequency ratio between PClock and MasterClock. This ratio is set by the DivMode pins at power application. Table 10-1 indicates the selectable frequency ratios. For details of the DivMode pin settings, refer to Table 2-2 Clock/Control Interface Signals.
When the low power mode (100 MHz model of the VR4300 and the VR4305 only) is set by setting the RP bit of the Status register, the frequency of PClock decreases to 1/4 of the normal level.
All the internal registers and latches use PClock.

Table 10-1 Frequency Ratio Between PClock and MasterClock

Product name   DivMode pin      Selectable frequency ratio (MasterClock : PClock)
VR4300         DivMode(1:0)     1:1.5 *1, 1:2, 1:3, 1:4 *2
VR4305         DivMode(1:0)     1:1, 1:2, 1:3
VR4310         DivMode(2:0)     1:2, 1:2.5 *3, 1:3, 1:4, 1:5, 1:6

*1. Selectable with the 100 MHz model only (with the 133 MHz model, this setting is reserved).
*2. Selectable with the 133 MHz model only (with the 100 MHz model, this setting is reserved).
*3. Selectable with the 167 MHz model only (with the 133 MHz model, this setting is reserved).

SClock
The frequency of the system interface clock (SClock) is equal to that of MasterClock, and SClock is synchronized with MasterClock. Because SClock is generated from PClock, the frequency of SClock also drops to 1/4 of the normal level, like the frequency of PClock, when the low power mode (100 MHz model of the VR4300 and the VR4305 only) is set.
The outputs of the VR4300 are driven at the edge of SClock.
SClock rises in synchronization with the first rising edge of MasterClock immediately after ColdReset is deasserted.

TClock
TClock (transmit/receive clock) is the reference clock of the output and input registers of the external agent. It can also be used as the global clock of the external agent, supplying a clock to all the logic circuits in the external agent.
TClock is the same as SClock in frequency, and its edge is accurately synchronized with that of SClock. When SyncIn is connected to SyncOut, TClock can also be synchronized with MasterClock.
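The relation between MasterClock, the DivMode ratio, and the resulting PClock and SClock frequencies can be checked with a short calculation. This is a sketch under stated assumptions: the names `RATIOS`, `pclock_hz`, and `sclock_hz` are hypothetical, the ratio list is taken from Table 10-1, and the per-model restrictions in the table footnotes (100/133/167 MHz models) are not enforced.

```python
from fractions import Fraction

# PClock multipliers per product, from Table 10-1 (footnote limits ignored).
RATIOS = {
    "VR4300": [Fraction(3, 2), 2, 3, 4],
    "VR4305": [1, 2, 3],
    "VR4310": [2, Fraction(5, 2), 3, 4, 5, 6],
}


def pclock_hz(masterclock_hz, ratio, product="VR4300", low_power=False):
    """PClock frequency for a MasterClock : PClock ratio of 1 : ratio."""
    if ratio not in RATIOS[product]:
        raise ValueError(f"ratio 1:{ratio} is not selectable on {product}")
    freq = float(masterclock_hz * ratio)
    # Low power (RP) mode drops PClock to 1/4 of the normal level.
    return freq / 4 if low_power else freq


def sclock_hz(masterclock_hz, low_power=False):
    """SClock equals MasterClock, or 1/4 of it in low power mode."""
    return masterclock_hz / 4 if low_power else float(masterclock_hz)
```

For instance, `pclock_hz(50e6, 2)` gives 100 MHz, and with `low_power=True` a 40 MHz MasterClock yields a 10 MHz SClock.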
[Figure 10-3 When Frequency Ratio of MasterClock to PClock Is 1:1.5: timing diagram over cycles 1 to 4 of MasterClock (input), PClock (internal), SClock (internal), TClock (output), SysAD(31:0) driven by the processor, and SysAD(31:0) received by the processor, marked with tMCKHigh, tMCKLow, tMCKP, tDO, tDS, and tDH.]

[Figure 10-4 When Frequency Ratio of MasterClock to PClock Is 1:2: timing diagram over cycles 1 to 4 of MasterClock (input), PClock (internal), SClock (internal), TClock (output), SysAD(31:0) driven by the processor, and SysAD(31:0) received by the processor, marked with tMCKHigh, tMCKLow, tMCKP, tDO, tDS, and tDH.]

10.3 System Timing Parameters

As shown in Figures 10-3 and 10-4, data provided to the processor must be stable a minimum of tDS nanoseconds (ns) before the rising edge of SClock and be held valid for a minimum of tDH ns after the rising edge of SClock.

10.3.1 Synchronization with SClock

Processor data becomes stable tDO ns after the rising edge of SClock. This drive time is the sum of the maximum delay through the processor output drivers and the maximum clock-to-Q delay of the processor output registers.

10.3.2 Synchronization with MasterClock

Certain processor inputs (specifically Reset) are sampled based on MasterClock. The same setup, hold, and off times, tDS, tDH, and tDO, shown in Figures 10-3 and 10-4, apply to these inputs, measured against MasterClock.

10.3.3 Phase-Locked Loop (PLL)

The processor synchronizes SyncOut, PClock, SClock, and TClock with internal phase-locked loop (PLL) circuits that generate aligned clocks based on SyncOut/SyncIn. By their nature, PLL circuits are only capable of generating synchronized clocks for MasterClock frequencies within a limited range.
clocks generated using pll circuits contain some inherent inaccuracy, or jitter; a clock synchronized with masterclock by the pll can lead or trail masterclock by as much as the related maximum jitter (t_mcjitter).

10.4 low power mode operation

usually, pclock is generated based on masterclock at the frequency ratio set by the divmode(1:0) *1 pins (for the setting, refer to table 2-2 clock/control interface signals). the frequency of the system interface clock (sclock) is the same as that of masterclock.

to set the low power mode (rp) *2, set the rp bit of the status register by using a transfer instruction. when the rp mode has been set, the processor stalls the pipeline, which then enters the pause (quiescent) status (in other words, the store buffer becomes empty and all cache misses are resolved). next, the frequency of pclock drops to 1/4 of that in the normal mode. the frequency of sclock also drops to 1/4 of the normal level (10 mhz). the normal clocks can be restored by executing reset. for the procedure to set or clear the rp bit, refer to low power mode in 9.3.1.

*1. in the vr4300 and vr4305. in the vr4310, divmode(2:0).
*2. 100 mhz model of the vr4300 and the vr4305 only.

10.5 connecting clocks to a phase-locked system

when the processor is used in a phase-locked system, the external agent must phase lock its operation to a common masterclock. in such a system, the transmission of data and data sampling have common characteristics, even if the components have different delay values. for example, transmission time (the amount of time a signal takes to move from one component to another along a trace on the board) between any two components a and b of a phase-locked system can be calculated from the following equation:

transmission time = (sclock period)
                    - (t_do for a)
                    - (t_ds for b)
                    - (clock jitter for a max)
                    - (clock jitter for b max)

figure 10-5 shows a block diagram of a phase-locked system using the vr4300 processor.

figure 10-5 phase-locked system
[block diagram; labels: masterclock, vr4300 (masterclock, syncout, syncin, tclock), external agent (masterclock), sysad(31:0), syscmd(4:0)]

10.6 connecting clocks to a system without phase locking

when the vr4300 processor is used in a system in which the external agent cannot lock its phase to a common masterclock, the output clock tclock can clock the remainder of the system. two clocking methodologies are described in this section: connecting to a gate-array device or connecting to cmos discrete devices.

10.6.1 connecting to a gate-array device

when the processor is connected to a gate-array device, tclock is used as the transmit/receive clock in the gate array. figure 10-6 is a block diagram of a system without phase lock, using the vr4300 processor with an external agent implemented as a gate array.

figure 10-6 gate-array system without phase lock, using the vr4300 processor
[block diagram; labels: masterclock, vr4300 (masterclock, syncout, syncin, tclock), sysad(31:0), syscmd(4:0), gate array with output registers and input registers (ce)]
signal transmission time from processor to external agent

in a system without phase lock, the transmission time for a signal from the processor to an external agent composed of gate arrays can be calculated from the following equation:

transmission time = (1 tclock period)
                    - (t_do for vr4300)
                    + (minimum external clock buffer delay)
                    - (external input register setup time)
                    - (maximum clock jitter for vr4300 internal clocks)
                    - (maximum clock jitter for tclock)

signal transmission time from external agent to processor

the transmission time for a signal from an external agent composed of gate arrays to the processor in a system without phase lock can be calculated from the following equation:

transmission time = (1 tclock period)
                    - (t_ds for vr4300)
                    - (maximum external clock buffer delay)
                    - (maximum external output register clock-to-q delay)
                    - (maximum clock jitter for tclock)
                    - (maximum clock jitter for vr4300 internal clocks)

10.6.2 connecting to a cmos discrete device

to supply a synchronous clock to an external cmos discrete device, the processor is used with a delay-correcting clock buffer. this buffer is inserted into the syncout/syncin synchronization path of the processor. it adjusts the skew of syncout and tclock by delaying the pclock that is synchronized with masterclock, so that syncout and tclock are advanced from masterclock by the buffer delay. when tclock with its buffer delay corrected in this way is used, other delay-correcting clock buffers can also be used.

the phase error of the buffered tclock can be obtained by adding up the maximum delay error of the delay-correcting clock buffer and the maximum clock jitter of tclock.

functioning as the global clock of the cmos discrete devices that form the external agent, the buffered tclock supplies a clock to the register that samples the processor output and the register that drives the processor input.
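the two gate-array equations of 10.6.1 can be evaluated with a short python sketch. the function names and all numeric values below are assumptions chosen only to show the arithmetic; they are not taken from the electrical specifications of any vr4300 model.

```python
def time_to_external_agent(tclock_period, t_do, min_buffer_delay,
                           ext_setup, jitter_internal, jitter_tclock):
    """Board transmission time budget, processor -> gate array (ns)."""
    return (tclock_period - t_do + min_buffer_delay
            - ext_setup - jitter_internal - jitter_tclock)

def time_to_processor(tclock_period, t_ds, max_buffer_delay,
                      ext_clock_to_q, jitter_tclock, jitter_internal):
    """Board transmission time budget, gate array -> processor (ns)."""
    return (tclock_period - t_ds - max_buffer_delay
            - ext_clock_to_q - jitter_tclock - jitter_internal)

# Made-up example: 20 ns tclock period (50 mhz), all values in ns.
out_budget = time_to_external_agent(20, 8, 1, 2, 0.5, 0.5)
in_budget = time_to_processor(20, 4, 3, 5, 0.5, 0.5)
print(out_budget, in_budget)
```

a result that is zero or negative would mean that, under the assumed parameters, no time is left for the signal to travel across the board.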
the transmission time for a signal from the processor to an external agent composed of cmos discrete devices can be calculated from the following equation:

transmission time = (1 tclock period)
                    - (t_do for vr4300)
                    - (external input register setup time)
                    - (maximum external clock buffer delay mismatch)
                    - (maximum clock jitter for vr4300 internal clocks)
                    - (maximum clock jitter for tclock)

figure 10-7 is a block diagram of a system without phase lock, employing the vr4300 processor and an external agent composed of both a gate array and cmos discrete devices.

figure 10-7 gate-array and cmos system without phase lock, using the vr4300 processor
[block diagram; labels: masterclock, vr4300 (masterclock, syncout, syncin, tclock), sysad(31:0), syscmd(4:0), memory, gate array (control), input register, output register (ce)]

the transmission time for a signal from an external agent composed of cmos discrete devices to the processor can be calculated from the following equation:

transmission time = (1 tclock period)
                    - (t_ds for vr4300)
                    - (maximum external output register clock-to-q delay)
                    - (maximum external clock buffer delay mismatch)
                    - (maximum clock jitter for vr4300 internal clocks)
                    - (maximum clock jitter for tclock)

in this clocking methodology, the hold time of data driven from the processor to an external input register is an important parameter. to guarantee hold time, the minimum output delay of the processor, t_do, must be greater than the sum of:

  minimum hold time for the external input register
+ maximum clock jitter for vr4300 internal clocks
+ maximum clock jitter for tclock
+ maximum delay mismatch of the external clock buffers

chapter 11 cache memory

this chapter describes in detail the cache memory: its place in the vr4300 memory organization, and the individual organization of the caches.
this chapter uses the following terminology:
  the data cache may also be referred to as the d-cache.
  the instruction cache may also be referred to as the i-cache.
these terms are used interchangeably throughout this book.

11.1 memory organization

figure 11-1 shows the vr4300 system memory hierarchy. in the logical memory hierarchy, the caches lie between the cpu and main memory. they are designed to make the speedup of memory accesses transparent to the user.

each functional block in figure 11-1 has the capacity to hold more data than the block above it. for instance, physical main memory has a larger capacity than the caches. at the same time, each functional block takes longer to access than any block above it. for instance, it takes longer to access data in main memory than in the cpu on-chip registers.

figure 11-1 logical hierarchy of memory
[diagram; levels from top to bottom: registers (vr4300 cpu), caches (i-cache, d-cache), memory (main memory), peripherals (disk, cd-rom, tape, etc.); data capacity increases toward the bottom, access time is faster toward the top]

the vr4300 processor has two on-chip caches: one holds instructions (the instruction cache), the other holds data (the data cache). the instruction and data caches can be read in one pclock cycle.

data writes take two pclock cycles. in the first cycle, the store address is generated and the tag is checked; in the second cycle, the data is written into the data ram.

11.2 cache organization

this section describes the organization of the on-chip data and instruction caches. figure 11-2 provides a block diagram of the vr4300 cache and memory model.

figure 11-2 vr4300 cache support

cache line lengths

a cache line is the smallest unit of information that can be fetched from main memory for the cache, and that is represented by a single tag. the line size for the instruction cache is 8 words (32 bytes) and the line size for the data cache is 4 words (16 bytes).
for cache tags, refer to 11.2.1 organization of the instruction cache (i-cache) and 11.2.2 organization of the data cache (d-cache).

cache sizes

the vr4300 instruction cache is 16 kb; the data cache is 8 kb.

[figure 11-2 block diagram; labels: vr4300, i-cache, d-cache, cache controller, caches, main memory]

11.2.1 organization of the instruction cache (i-cache)

each line of i-cache data (although it is actually an instruction, it is referred to as data to distinguish it from its tag) has an associated 21-bit tag that contains a 20-bit physical address and a valid bit.

the vr4300 processor i-cache has the following characteristics:
  direct-mapping method
  indexed with a virtual address
  checked with a physical tag
  organized with an 8-word (32-byte) cache line

figure 11-3 shows the format of an 8-word (32-byte) i-cache line.

figure 11-3 vr4300 8-word i-cache line format

  field   bits     width
  v       20       1
  ptag    19:0     20
  data    255:0    256

  ptag : physical tag (bits 31:12 of the physical address)
  v    : valid bit
  data : cache data

11.2.2 organization of the data cache (d-cache)

each line of d-cache data has an associated 22-bit tag that contains a 20-bit physical address, a valid bit, and a dirty bit.

the vr4300 processor d-cache has the following characteristics:
  write-back
  direct-mapping method
  indexed with a virtual address
  checked with a physical tag
  organized with a 4-word (16-byte) cache line

figure 11-4 shows the format of a 4-word (16-byte) d-cache line.

figure 11-4 vr4300 4-word data cache line format

  field   bits     width
  v       21       1
  d       20       1
  ptag    19:0     20
  data    127:0    128

  v    : valid bit
  d    : dirty bit (refer to 11.4 cache states)
  ptag : physical tag (bits 31:12 of the physical address)
  data : d-cache data

11.2.3 accessing the caches

figure 11-5 shows the virtual address (va) index into the caches. the number of virtual address bits used to index the instruction and data caches depends on the cache size.
data cache addressing

va(12:4) is used. since the cache size is 8 kb, the most-significant bit is va12. furthermore, since the line size is 4 words (16 bytes), the least-significant bit is va4.

instruction cache addressing

va(13:5) is used. since the cache size is 16 kb, the most-significant bit is va13. furthermore, since the line size is 8 words (32 bytes), the least-significant bit is va5.

figure 11-5 cache data and tag organization
[diagram; labels: va(12:4) for 8 kb d-cache and va(13:5) for 16 kb i-cache, tags, data, tag line (tag, v, d), data line (data, 64)]

11.3 cache operations

as described earlier, caches provide temporary data storage, and they make the speedup of memory accesses transparent to the user. in general, the processor accesses cache-resident instructions or data through the following procedure:

1. the processor, through the on-chip cache controller, attempts to access the next instruction or data in the appropriate cache.
2. the cache controller checks to see if this requested instruction or data is present in the cache.
   if the instruction/data is present, the processor retrieves it. this is called a cache hit.
   if the instruction/data is not present in the cache, the cache controller must retrieve it from main memory. this is called a cache miss.
3. the processor retrieves the instruction/data from the cache and operation continues.

it is possible for the same data to be in two places simultaneously: main memory and cache. this data is kept consistent through the use of a write-back methodology; that is, modified data is not written back to main memory until the cache line is to be replaced.

instruction and data cache line replacement operations are described in the following sections.

11.3.1 cache write policy

the vr4300 processor manages its data cache by using a write-back policy; that is, it stores write data into the cache, instead of writing it directly to the main memory.
* some time later this data is independently transferred into the main memory. in the vr4300 implementation, a modified cache line is not written back to the main memory until the cache line is to be replaced, either in the course of satisfying a cache miss or during the execution of a write-back cache instruction.

when a cache miss occurs and the processor writes the contents of a cache line back to the main memory, it does not ordinarily retain a copy of the cache line, and the state of the cache line is changed to clean.

11.3.2 data cache line replacement

since the data cache uses a write-back methodology, a cache line load is issued to main memory on a load or store miss, as described below. after the data from the main memory is written to the data cache, the pipeline resumes execution.

the line replacement sequence is based on a "critical doubleword first" scheme (refer to subblock ordering in 12.2.1 physical addresses). the processor restarts its pipeline as soon as the main memory supplies the desired word in the first doubleword of a block transfer. this sequence is summarized as follows:

1. move the data physical address to sysad(31:0). at the same time, move the dirty cache line to the write buffer.
2. at the rising edge of sclock, read the data from the main memory, receiving the desired doubleword first, as two words of data.
3. receive the remaining doubleword in word data units.

for all loads, move the data to the target register. for byte, halfword, and word stores, it is necessary to do a read of the main memory followed by a write procedure: read the 64-bit data, write the new data into this read data, then write the 64-bit data to the cache. while this is being done, interlock the data cache to prevent it from being accessed by any subsequent instruction that tries to access this particular cache line.

rules for replacement on data load and data store misses are given below.
* an alternative to this is a write-through cache, in which information is written simultaneously to cache and memory.

data load miss

if the missed cache line is not dirty, it is replaced with a new line.
if the missed line is dirty, it is moved to the write buffer. a new line replaces the missed line, and the data in the write buffer is written to the main memory.

data store miss

if the missed cache line is not dirty, it is replaced with the new cache line merged with the store data.
if the missed cache line is dirty, it is moved to the write buffer. a new cache line is merged with the store data and written to the cache, and the data in the write buffer is written to the memory. the data is written sequentially, starting from the first address of the block (refer to sequential ordering in 12.2.1 physical addresses).

the data cache miss stall in number of pclock cycles is:

table 11-1 stall cycle count for data cache miss

number of cycles   operation
1                  dc stage stall
1                  transfer address to write buffer and wait for the pipeline start signal
1 to 2             synchronize with sclock and transfer address to internal sysad bus
2                  transfer to external sysad bus
m                  time needed to access memory, measured in pclock cycles
2                  transfer the cache line from memory to the sysad bus
1                  transfer the cache line from the external to internal bus and to d-cache bus
0                  restart the dc stage

11.3.3 instruction cache line replacement

for an instruction cache miss, refill is done using sequential ordering, reading from the first word of the requested cache line.

during an instruction cache miss, a memory read request is issued by the processor. that is, the requested cache line is read from the main memory and written to the instruction cache. at this time the pipeline resumes execution, and the instruction cache is reaccessed.

the replacement sequence for an instruction cache miss is:

1. move the instruction physical address to sysad(31:0).
2. read the instruction data from the main memory at the rising edge of sclock and write it out to the instruction cache.
3. restart the pipeline operation.

the instruction cache miss stall in number of pclock cycles is:

table 11-2 stall cycle count for instruction cache miss

number of cycles   operation
1                  rf stage stall
1                  transfer address to write buffer and wait for the pipeline start signal
1 to 2             synchronize with sclock and transfer address to internal sysad bus
2                  transfer to external sysad bus
m                  time needed to access memory, measured in pclock cycles
8                  transfer the cache line from memory to the sysad bus
1                  transfer the cache line from the external to internal bus and to i-cache bus
0                  restart the rf stage

11.4 cache states

cache line

the four terms below are used to describe the state of a cache line:

valid   : a cache line that contains valid information.
dirty   : a cache line containing data that has changed in valid status since it was loaded from memory.
clean   : a cache line containing data that has not changed in valid status since it was loaded from the main memory.
invalid : a cache line that does not contain valid information must be marked invalid, and cannot be used. for example, after a soft reset, software sets all cache lines to invalid. a cache line in any other state than invalid is assumed to contain valid information. neither a cold reset nor a soft reset makes the state of a cache invalid. software invalidates it.

data cache

the data cache supports three cache states:
  invalid
  clean
  dirty

instruction cache

the instruction cache supports two cache states:
  invalid
  valid

the cache line that contains valid information may be changed when the processor executes the cache operation. for cache operation, refer to chapter 16 cpu instruction set details.
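the stall cycle counts of tables 11-1 and 11-2 can be totaled with a short python sketch. the row values are copied from the two tables; the function and variable names, and the example memory access time m, are assumptions made for illustration.

```python
# Each row is (operation, minimum cycles, maximum cycles), copied from
# tables 11-1 and 11-2. The memory access time m is added separately.
D_CACHE_MISS = [
    ("dc stage stall", 1, 1),
    ("address to write buffer, wait for pipeline start", 1, 1),
    ("synchronize with sclock, address to internal sysad", 1, 2),
    ("transfer to external sysad bus", 2, 2),
    ("cache line from memory to sysad bus", 2, 2),
    ("external to internal bus and d-cache bus", 1, 1),
    ("restart the dc stage", 0, 0),
]

I_CACHE_MISS = [
    ("rf stage stall", 1, 1),
    ("address to write buffer, wait for pipeline start", 1, 1),
    ("synchronize with sclock, address to internal sysad", 1, 2),
    ("transfer to external sysad bus", 2, 2),
    ("cache line from memory to sysad bus", 8, 8),
    ("external to internal bus and i-cache bus", 1, 1),
    ("restart the rf stage", 0, 0),
]

def stall_cycles(rows, m):
    """Return (minimum, maximum) stall in pclock cycles for memory time m."""
    low = sum(lo for _, lo, _ in rows) + m
    high = sum(hi for _, _, hi in rows) + m
    return low, high

# Example with an assumed memory access time of 10 pclock cycles.
print(stall_cycles(D_CACHE_MISS, 10))
print(stall_cycles(I_CACHE_MISS, 10))
```

in other words, the tables amount to 8 to 9 cycles plus m for a data cache miss and 14 to 15 cycles plus m for an instruction cache miss.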
11.5 cache state transition diagrams

the following section describes the cache state diagrams for the data and instruction caches. these state diagrams do not cover the initial state of the system, since the initial state is system-dependent.

11.5.1 data cache state transition

the following diagram illustrates the data cache state transition sequence. a load or store operation may include one or more of the atomic read and/or write operations shown in the state diagram below, which may cause cache state transitions.

  read(1) indicates a read operation from memory to cache, inducing a cache state transition.
  write(1) indicates a write operation from the processor to cache, inducing a cache state transition.
  read(2) indicates a read operation from cache to the processor, which induces no cache state transition.
  write(2) indicates a write operation from the processor to cache, which induces no cache state transition.

figure 11-6 data cache state diagram
[state diagram; states: invalid, clean, dirty; transition labels: read(1), write(1), read(2), write(2), write back, cache instruction]

11.5.2 instruction cache state transition

the following diagram illustrates the instruction cache state transition sequence.

  read(1) indicates a read operation from the main memory to cache, inducing a cache state transition.
  read(2) indicates a read operation from cache to the processor, which induces no cache state transition.

figure 11-7 instruction cache state diagram
[state diagram; states: valid, invalid; transition labels: read(1), read(2), cache instruction]

11.6 manipulation of the caches by an external agent

the vr4300 does not provide any mechanisms for an external agent to examine and manipulate the state and contents of the caches.
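as a closing illustration for this chapter, the d-cache behavior described in 11.2.2, 11.2.3, 11.3.2, and 11.5.1 can be approximated by a small python model of a direct-mapped, virtually indexed, physically tagged, write-back cache. this is a sketch under stated assumptions, not the actual implementation: the class and method names are hypothetical, no data is stored, cache instructions are not modeled, and the write-back address is rebuilt on the assumption that pa(11:4) equals va(11:4) (page offset bits are not translated).

```python
from enum import Enum

class State(Enum):
    INVALID = "invalid"
    CLEAN = "clean"
    DIRTY = "dirty"

LINE_BYTES = 16                       # 4-word d-cache line
NUM_LINES = 8 * 1024 // LINE_BYTES    # 8 kb cache -> 512 lines

class DCacheModel:
    def __init__(self):
        self.state = [State.INVALID] * NUM_LINES
        self.ptag = [0] * NUM_LINES
        self.writebacks = []          # physical line addresses written back

    @staticmethod
    def index(va):
        return (va >> 4) & 0x1FF      # va(12:4)

    def access(self, va, pa, is_store):
        """Model one load or store; return True on a cache hit."""
        i = self.index(va)
        tag = pa >> 12                # ptag = pa(31:12)
        hit = self.state[i] is not State.INVALID and self.ptag[i] == tag
        if not hit:
            if self.state[i] is State.DIRTY:
                # Dirty victim goes to the write buffer, then to memory.
                # Assumption: pa(11:4) of the victim equals its index bits 7:0.
                self.writebacks.append((self.ptag[i] << 12) | ((i & 0xFF) << 4))
            self.ptag[i] = tag
            self.state[i] = State.CLEAN   # read(1): memory -> cache
        if is_store:
            self.state[i] = State.DIRTY   # write(1), or write(2) if already dirty
        return hit

cache = DCacheModel()
first = cache.access(0x1000, 0x00401000, is_store=False)    # load miss
second = cache.access(0x1004, 0x00401004, is_store=True)    # store hit
third = cache.access(0x3000, 0x00803000, is_store=False)    # conflicting line
print(first, second, third, [hex(a) for a in cache.writebacks])
```

the third access maps to the same index as the first two, so the dirty line is replaced and its address appears in the write-back list.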
chapter 12 system interface

the system interface allows the processor to access external resources needed to perform processing of cache misses and uncached areas, while permitting an external agent to access some of the processor internal resources. this chapter describes the system interface between the processor and the external agent.

the vr4300 uses a subset of the system interface contained on the vr4400 and vr4200.

12.1 terminology

the following terms are used in this chapter:

an external agent is any device connected to the processor, over the system interface, that processes requests issued by the processor.

a system event is an event that occurs within the processor and requires access to external resources. system events include:
  an instruction fetch that misses in the instruction cache;
  a load/store instruction that misses in the data cache;
  an uncached load or store instruction;
  an execution of cache instructions.

sequence refers to the series of requests that a processor generates to process a system event.

protocol refers to the cycle-by-cycle signal transitions that occur on the system interface pins to issue a processor request or an external request.

syntax refers to the definition of bit patterns on encoded buses, such as the command bus.

block indicates any data transfer of 8 bytes or longer across the system interface.

single indicates any data transfer of 7 bytes or shorter across the system interface.

fetch refers to the read of information from the instruction cache.

load refers to the read of information from the data cache.

12.2 system interface description

the processor uses the system interface to access external resources required for performing cache misses and uncached area processing.

12.2.1 physical addresses

physical addresses are output to sysad(31:0) in the address cycle.
when a single read request or a single write request is issued, the address is determined by the data length as follows.

  if the data is a word (4 bytes), the low-order 2 bits of the address are 0.
  if the data is a halfword (2 bytes), the low-order 1 bit of the address is 0.
  if the data is 1, 3, 5, 6, or 7 bytes, the supplied address is a byte address (5-, 6-, or 7-byte data is divided into two single write requests).

when a doubleword (2 words), 4 words, or 8 words are transferred, a block request is issued. the block read request and block write request differ as follows in the physical address to be output.

block write request

the physical address when the block write request is issued is always aligned with the first word address of the block (sequential ordering).

block read request

instruction cache read request

when a block read request is issued because of a miss in the instruction cache, the physical address is aligned with the address of the 8-word data that includes the requested word (the low-order 5 bits are 0) and is then output. figure 12-1 shows the sequence in which data are transferred from the main memory when a block read request is issued to the instruction cache. when an instruction cache read request is issued, data is always read starting from w0 (sequential ordering).

figure 12-1 data sequence on instruction cache read request

data cache read request

if a block read request is issued when a miss occurs in the data cache, the physical address is aligned with the address of the doubleword that includes the requested data (the low-order 3 bits are 0) and is then output. figure 12-2 shows the sequence in which data is transferred from the main memory when a block read request is issued to the data cache. when a data cache read request is issued, reading starts, in word units, with the doubleword that includes the necessary data (w2 in this case) (refer to subblock ordering in 12.12.2 sequential and subblock ordering).
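the address alignment and word transfer order described above for block read requests can be sketched in python. the function names are assumptions; the data cache order follows the w2, w3, w0, w1 sequence that the text gives for a request falling in the second doubleword of a 4-word line.

```python
def icache_block_read(pa):
    """Output address and word order for an i-cache miss (sequential)."""
    base = pa & ~0x1F                        # low-order 5 bits are 0
    return base, [base + 4 * i for i in range(8)]

def dcache_block_read(pa):
    """Output address and word order for a d-cache miss (subblock)."""
    start = pa & ~0x7                        # low-order 3 bits are 0
    base = pa & ~0xF                         # start of the 4-word line
    order = [base + ((start - base + 4 * i) % 16) for i in range(4)]
    return start, order

# A miss on the word at offset 0xC (w3) of the line at 0x1000.
address, order = dcache_block_read(0x100C)
print(hex(address), [hex(a) for a in order])
```

with only two doublewords per data cache line, wrapping around the line in this way gives the same order whether the remaining doubleword is chosen by wraparound or by toggling address bit 3.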
figure 12-2 data sequence on data cache read request

transfer sequence for figure 12-1 (sequential ordering):

  word                w0  w1  w2  w3  w4  w5  w6  w7
  transfer sequence   1   2   3   4   5   6   7   8

transfer sequence for figure 12-2 (subblock ordering):

  word                w0  w1  w2  w3
  transfer sequence   3   4   1   2

[both figures also mark the output physical address and the requested word]

12.2.2 interface buses

figure 12-3 shows the primary communication buses for the system interface: a 32-bit address/data bus, sysad(31:0), and a 5-bit command bus, syscmd(4:0). the sysad and syscmd buses are bidirectional; that is, they are driven by the processor to issue a processor request, and by the external device to issue an external request (refer to 12.4 processor and external requests).

a request through the system interface consists of:
  an address
  a system interface command that specifies the nature of the request
  response data to a read request, and write data to a write request

figure 12-3 system interface buses
[block diagram; labels: vr4300, external agent, sysad(31:0), syscmd(4:0)]

12.2.3 address and data cycles

the syscmd(4:0) bus identifies the contents of the sysad(31:0) bus during any cycle in which it is valid. cycles in which the sysad(31:0) bus contains a valid address are called address cycles. cycles in which the sysad(31:0) bus contains valid data are called data cycles. the most significant bit of the syscmd(4:0) bus is always used to indicate whether the current cycle is an address cycle or a data cycle. validity is determined by the state of the evalid and pvalid signals (described in 12.2.2 interface buses).

when the vr4300 processor is driving the sysad(31:0) and syscmd(4:0) buses, the system interface is in master state. when the external agent is driving them, the system interface is in slave state.

when the processor is master, it asserts the pvalid signal when the sysad(31:0) and syscmd(4:0) buses are valid.
when the processor is slave, an external agent asserts the evalid signal when the sysad(31:0) and syscmd(4:0) buses are valid.

syscmd(4:0) indicates the following contents if the pvalid or evalid signal is active.

  during address cycles [syscmd4 = 0], the remainder of the syscmd(4:0) bus, syscmd(3:0), contains a system interface command (the encoding of system interface commands is detailed in 12.11 system interface commands and data identifiers).
  during data cycles [syscmd4 = 1], the remainder of the syscmd(4:0) bus, syscmd(3:0), contains a data identifier (the encoding of data identifiers is detailed in 12.11 system interface commands and data identifiers).

12.2.4 issue cycles

processor request

there are two types of processor issue cycles:
  processor read request
  processor write request

the issuance cycle of the processor read/write request is determined by the status of the eok signal. the issuance cycle is a cycle that becomes valid in the address cycle of each processor request. only one issuance cycle exists for one processor request.

to define the issuance cycle of the address cycle, the external agent asserts the eok signal one cycle before the address cycle of the processor read/write request, as shown in figure 12-4. to define the address cycle as the issuance cycle, do not deassert the eok signal until the address cycle is started.

figure 12-4 eok signal status of processor request
[timing diagram; signals: sclock (internal), sysad(31:0) (i/o) carrying addr, eok (input); scycles 1 to 6; the issuance cycle is marked]

the processor repeatedly outputs the address cycle until the address cycle of the processor request becomes the issuance cycle. with the vr4300, therefore, the address cycle next to the cycle in which the eok signal has become active is the issuance cycle, and the address cycle is repeated up to that cycle. figure 12-5 illustrates how the address cycle is extended by the eok signal.
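the rule that the address cycle following a cycle with eok active becomes the issuance cycle can be sketched in python. this is a simplified model under stated assumptions: eok is represented as a list of booleans sampled once per sclock cycle (true meaning asserted, regardless of the electrical polarity of the pin), and the function name and the sample waveform are hypothetical.

```python
def issuance_cycle(eok_asserted, first_address_cycle):
    """Return the index of the issuance cycle, or None if none occurs.

    eok_asserted[c] is True when eok is asserted in sclock cycle c.
    The processor starts driving the address in first_address_cycle and
    repeats it until a cycle that directly follows an eok-asserted cycle.
    Cycle 0 has no preceding sample, so it is never reported.
    """
    for cycle in range(max(first_address_cycle, 1), len(eok_asserted)):
        if eok_asserted[cycle - 1]:
            return cycle
    return None

# eok is first asserted in cycle 3, so an address driven from cycle 1
# is repeated through cycle 4, which is the issuance cycle.
eok = [False, False, False, True, True, False]
print(issuance_cycle(eok, 1))
```

when eok is already asserted in the cycle before the first address cycle, the model reports that first address cycle itself, which corresponds to the case without extension.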
figure 12-5 address cycle extended by eok signal
[timing diagram; signals: sclock (internal), sysad(31:0) (i/o) carrying addr, eok (input); scycles 1 to 7; the issuance cycle is marked]

processor and external requests

the processor accepts external requests, even while attempting to issue a processor request, by releasing the system interface to slave state in response to the ereq signal from the external agent.

when the issuance of a processor request and an external request compete with each other, the processor either:
  completes the issuance of the processor request before the external request is accepted, or
  releases the system interface to slave state without completing the issuance of the processor request.

in the latter case, the processor issues the processor request (provided the processor request is still necessary) after the external request is completed.

12.2.5 handshake signals

the processor manages the flow of requests through the following six control signals:

eok signal
  this signal is used by the external agent to indicate whether it can accept a new read or write transaction.

ereq, pmaster, and preq signals
  these signals are used to transfer control of the sysad(31:0) and syscmd(4:0) buses. the ereq signal is used by an external agent to indicate a need to control the interface. the pmaster signal is deasserted by the processor when it transfers control of the system interface to the external agent. the preq signal is used by the processor to request control of the system interface from the external agent that currently holds it.

pvalid and evalid signals
  the vr4300 processor uses the pvalid signal, and the external agent uses the evalid signal, to indicate valid command/data on the syscmd(4:0)/sysad(31:0) buses.

12.3 system interface protocols

figure 12-6 shows the register-to-register operation of the system interface.
that is, output signals of the processor come directly from output registers and begin to change in synchronization with the rising edge of sclock. input signals to the processor are fed directly to input registers that latch these input signals at the rising edge of sclock.

figure 12-6 system interface register-to-register operation
[block diagram; labels: vr4300, input data, output data, sclock]

12.3.1 master and slave states

when the vr4300 processor is driving the sysad(31:0) and syscmd(4:0) buses, the system interface is in master state. when the external agent is driving these buses, the system interface is in slave state.

in master state, the processor asserts the pvalid signal whenever the sysad(31:0) and syscmd(4:0) buses are valid.

in slave state, the external agent asserts the evalid signal whenever the sysad(31:0) and syscmd(4:0) buses are valid.

12.3.2 moving from master to slave state

the processor is the default master of the system interface. an external agent becomes master of the system interface through external arbitration, or after a processor read request. the external agent returns mastership to the processor after an external request completes.

the system interface remains in master state unless one of the following occurs:
  the external agent requests and is granted the system interface control (external arbitration).
  the processor issues a read request (uncompelled change to slave state).

the following sections describe these two cases.

12.3.3 external arbitration

the system interface must be in slave state for the external agent to issue an external request through the system interface. the transition from master state to slave state is arbitrated by the processor using the system interface handshake signals ereq and pmaster. this transition is described by the following procedure:

1. an external agent notifies the processor that it wishes to issue an external request by asserting the ereq signal.
2. when the processor is ready to accept an external request, it releases the system interface from master to slave state by deasserting the pmaster signal.
3. the system interface returns to master state as soon as the issue of the external request is completed.

this process is described in 12.6.6 external arbitration protocol.

12.3.4 uncompelled change to slave state

an uncompelled change to slave state is the transition of the system interface from master state to slave state, performed by the processor itself when a processor read request is pending. the pmaster signal is deasserted automatically after a read request. an uncompelled change to slave state occurs in the first cycle after the issue cycle of a processor read request.

the time at which the processor returns from the uncompelled transition depends on the cache status. the processor returns to the master status when the following external request (read response or other external request) is completed after the uncompelled transition to the slave status.

an external agent must confirm that the processor has performed an uncompelled change to slave state, and begin driving the sysad(31:0) bus along with the syscmd(4:0) bus. as long as the system interface is in slave state, the external agent can begin an external request without arbitrating for the system interface; that is, without asserting the ereq signal.

if ereq is inactive at the time the external request is completed, the system interface automatically returns to master state.

12.4 processor and external requests

there are two categories of requests: processor requests and external requests.

when a system event occurs, the processor issues a request through the system interface to access some external resource necessary to service this event. for this to occur, the system interface must be connected to an external agent that coordinates the access to system resources.
An external agent requesting access to an internal resource of the processor issues an external request.

Processor requests include the following:

- Read requests, which provide a read address to an external agent
- Write requests, which provide an address and a single item or block of data to be written to an external agent

External requests include the following:

- Read responses, which provide a block or single transfer of data from an external agent in response to read requests
- Write requests, which provide an address and a word of data to be written to a processor resource

When an external agent receives a read request, it accesses the specified resource and returns the response data as a read response, which may be returned at any time after the read request is issued. A processor read request is completed after the last response data has been received from the external agent. A processor write request is completed after the last word of data has been transferred.

The processor will not issue another request while a read request is pending (that is, after issuing the read request and before receiving the response data).

System events and requests are shown in Figure 12-7.

Figure 12-7 Requests and system events
[Block diagram. VR4300 and external agent. Processor requests: read, write. External requests: read response, write. System events: fetch miss, load miss, store miss, load/store to uncached area, cache instructions.]

12.4.1 Processor requests

A processor request is a request through the system interface to access some external resource. Processor requests are either read or write requests.

Outline of requests

- Read request: asks for a block, word, or partial word of data either from main memory or from another system resource.
- Write request: provides a block, word, or partial word of data to be written either to main memory or to another system resource.
Request issuance

The processor issues requests in strict sequential order; that is, the processor is only allowed to have one request pending at any time. For example, the processor issues a read request and waits for a read response before issuing any subsequent requests. The processor issues a write request only if there are no read requests pending.

Request control

The processor has the input signal eok to allow an external agent to control the flow of processor requests.

The processor request cycle sequence is shown in Figure 12-8.

Figure 12-8 Processor request flow
[Block diagram. VR4300 and external agent. 1. Processor issues read or write request. 2. External system controls acceptance of requests by asserting eok signal.]

12.4.2 Processor read request

When the processor issues a read request, the external agent must access the specified resource and return the requested data.

A processor read request can be split from the external agent's response data; in other words, the external agent can initiate an unrelated external request before it returns the response data for a processor read. A processor read request is completed after the last word of response data has been received from the external agent.

Processor read requests that have been issued, but for which data has not yet been returned, are said to be pending. A read request remains pending until the requested read data is returned.

Note that the data identifier associated with the response data can indicate that the response data is erroneous, causing the processor to generate a bus error exception.

The external agent must be capable of accepting a new processor read request at any time when the following two conditions are met:

- No processor read request is currently pending.
- The eok signal has been asserted for two or more cycles.

12.4.3 Processor write request

When the processor issues a write request, the specified external resource is accessed and the data is written to it.
A processor write request is completed after the last word of data has been transferred to the external agent.

The external agent must be capable of accepting a new processor write request at any time the following two conditions are met:

- No processor read request is currently pending.
- The eok signal has been asserted for two or more cycles.

12.4.4 External requests

External requests include read responses and write requests.

Outline of requests

- Read response: returns data in response to a processor read request.
- Write request: provides data to be written to the processor's internal resource.

Request control

The processor controls the flow of external requests through the arbitration signals ereq and pmaster, as shown in Figure 12-9. The external agent must acquire mastership of the system interface before it issues an external request; the external agent acquires mastership of the system interface by asserting the ereq signal and then waiting for the processor to deassert the pmaster signal for one cycle.

Figure 12-9 External request flow
[Block diagram. VR4300 and external agent. 1. External system requests mastership by asserting ereq signal. 2. Processor grants mastership by deasserting pmaster signal. 3. External system issues an external request. 4. Processor regains mastership when ereq signal becomes inactive.]

Mastership of the system interface always returns to the processor when the ereq signal becomes inactive after an external request is issued. The processor does not accept a subsequent external request until it has completed the current request.

Request issuance

If there are no processor requests pending, the processor decides, based on its internal state, whether to accept the external request or to issue a new processor request. The processor can issue a new processor request even if the external agent is requesting access to the system interface.

The external agent asserts the ereq signal to indicate that it wishes to begin an external request. The processor releases mastership of the system interface by deasserting the pmaster signal. An external request can be accepted based on the criteria listed below.

- The processor completes any processor request in execution.
- While the processor is waiting for the assertion of the eok signal to issue a processor read/write request, the ereq signal is input to the processor one or more cycles before the eok signal is asserted.
- The processor is waiting for the response to a read request after it has made an uncompelled change to slave state (the external agent can issue an external request before providing the read response data).

12.4.5 External write request

When an external agent issues a write request, the specified processor resource is accessed and the data is written to it. An external write request is completed after the word of data has been transferred to the processor.

The only processor resource available to an external write request is the interrupt register.

12.4.6 Read response

A read response returns data in response to a processor read request. While a read response is an external request, it has one characteristic that differentiates it from all other external requests: it does not perform system interface arbitration (that is, it does not request mastership of the system interface using the ereq signal).

Figure 12-10 Read response
[Block diagram. VR4300 and external agent. 1. Read request. 2. Read response.]

12.5 Handling requests

This section details the sequence, protocol, and syntax (refer to 12.1 Terminology for definitions of these terms) of both processor and external requests. The following system events are discussed here:

- Fetch miss
- Load miss
- Store miss
- Loads/stores to uncached area
- Cache instructions

12.5.1 Fetch miss

When the processor misses in the instruction cache on an instruction fetch, it issues a read request to acquire the cache line. An external agent returns data as a read response.
12.5.2 Load miss

When the processor misses in the data cache on a load, it issues a read request to acquire the cache line. An external agent returns data as a read response.

If the cache data to be replaced is in the dirty state, this data is written to memory. The read operation above must be completed before the dirty data is written.

12.5.3 Store miss

If a processor store misses in the data cache, the processor issues a read request to retrieve the target cache line. After the target line has been retrieved by the external agent, it is updated with the store data and written into the cache.

If the cache data to be replaced is in the dirty state, this data is written to memory. The read operation above must be completed before the dirty data is written.

When it is necessary to guarantee that cached data written by a store instruction is consistent with main memory contents, the corresponding cache line must be written back from the cache to main memory using a cache instruction. Cache instructions are described in Chapter 16 CPU instruction set details.

12.5.4 Loads or stores to uncached area

When the processor performs a load from an uncached area, it issues a read request. An external agent returns a single or block transfer of data as the read response.

When the processor performs a store to an uncached area, it issues a write request and provides a single or block transfer of data to the external agent.

12.5.5 Cache instructions

The processor provides a variety of cache operations to maintain the state and contents of the caches. The processor can issue write requests unrelated to the cache instruction during the execution of cache instructions.

12.6 Processor request and external request protocols

The following sections contain a cycle-by-cycle description of the bus arbitration protocols for each type of processor and external request.
Table 12-1 lists the definitions and abbreviations for each of the buses that are used in the timing diagrams that follow.

Table 12-1 System interface requests

Scope             Abbreviation   Meaning
global            unsd           unused
sysad(31:0) bus   addr           physical address
sysad(31:0) bus   data

12.6.1 Processor request protocols

Processor request protocols described in this section include:

- Read
- Write

12.6.2 Processor read request protocol

A processor read request is issued by outputting a read command on the syscmd(4:0) bus and a read address on the sysad(31:0) bus, and asserting pvalid. Only one processor read request may be pending at a time; the processor must wait for an external read response before starting a subsequent read request.

The processor makes an uncompelled change to slave state in the cycle after the read request by deasserting the pmaster signal. An external agent then returns the requested data through a read response.

Once the processor enters slave state (starting at cycle 5 in Figure 12-11), the external agent can return the requested data through a read response. The read response either returns the requested data or, if the requested data could not be retrieved successfully, indicates on the syscmd(4:0) bus that the returned data is erroneous. If the returned data is erroneous, the processor generates a bus error exception.

Figure 12-11 illustrates a processor read request, coupled with an uncompelled change to slave state, that occurs as the read request is issued. Figure 12-12 shows the processor read request delayed by the eok signal.

The following sequence describes the protocol for a processor read request (the numbered steps below correspond to Figures 12-11 and 12-12).

1. The processor is in master state. It outputs a read command to syscmd(4:0) and a read address to sysad(31:0) to issue a read request. After the read request is issued, the processor enters the pending state. Only one read request can be pending at a time.
2. The processor asserts the pvalid signal to indicate that the current contents of syscmd(4:0) and sysad(31:0) are valid.
3. The external agent asserts the eok signal for two consecutive cycles to enable issuance of a processor read request. If the eok signal is deasserted, the issuance cycle of the read request is delayed.
4. The processor deasserts the pmaster signal in the first cycle after the read request is accepted, and makes an uncompelled change to slave state.
5. The processor releases syscmd(4:0) and sysad(31:0) at the same time as the pmaster signal is deasserted.
6. An external agent can drive syscmd(4:0) and sysad(31:0) from the first cycle after the pmaster signal is deasserted.

Figure 12-11 Uncompelled transition by processor read request
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output), evalid (input). scycle 1 to 12. Labels: addr, read, hi-z, master, slave. Callouts 1 to 6.]

Figure 12-12 Delayed processor read request
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output), evalid (input). scycle 1 to 12. Labels: addr, read, hi-z, master, slave. Callouts 1 to 6.]

12.6.3 Processor write request protocol

A processor write request is issued by outputting a write command on the syscmd(4:0) bus and a write address on the sysad(31:0) bus, and asserting the pvalid signal. After that, a data identifier is output to syscmd(4:0), write data is output to sysad(31:0), and the pvalid signal is asserted for the cycles necessary to transfer the data. The transfer rate at this time is set by the ep bit of the config register.

The data cycle differs depending on the size of the write request:
- 1 to 4 bytes: single data cycle
- 5 to 7 bytes: divided into two single write requests (one is 4 bytes long, and the other is 1 to 3 bytes long)
- 8 bytes or more: block data cycle in 4-byte units

The last data is accompanied by the data identifier eod (end of data).

Figure 12-13 shows a processor block write request with write data pattern d, and Figure 12-14 shows a processor block write request with write data pattern dxx.

The following sequence describes the protocol of the processor write request (the numbers correspond to the numbers in Figures 12-13 and 12-14).

1. The processor is in master state. It outputs a write command to syscmd(4:0) and a write address to sysad(31:0) to issue a write request.
2. The processor asserts the pvalid signal to indicate that the current contents of syscmd(4:0) and sysad(31:0) are valid.
3. The external agent asserts the eok signal for two consecutive cycles to enable issuance of a processor write request. If the eok signal is deasserted, the issuance cycle of the write request is delayed.
4. The processor outputs a data identifier to syscmd(4:0) and write data to sysad(31:0).
5. The processor asserts the pvalid signal for the cycles necessary for data transfer, and transfers the data.
6. The last data is accompanied by the data identifier eod.

Figure 12-13 Processor block write request (write data pattern: d)
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output). scycle 1 to 12. sysad: addr, data0, data1, data2, data3. syscmd: write, data, data, data, eod. State: master. Callouts 1 to 6.]

Figure 12-14 Processor block write request (write data pattern: dxx)
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output). scycle 1 to 12. sysad: addr, data0, data1. syscmd: write, data, eod. State: master. Callouts 1 to 6.]
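The size rule for processor write requests in 12.6.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the VR4300 documentation: the function name `write_data_cycles` and the tuple layout are hypothetical, block sizes are assumed to be whole multiples of 4 bytes, and the order of the two single writes in the 5-to-7-byte case (4 bytes first) is an assumption, since the text does not state it.

```python
def write_data_cycles(size_bytes):
    """Split a processor write of size_bytes into write requests.

    Returns a list of (kind, data_cycles, byte_count) tuples, one per
    write request. Names and layout are illustrative assumptions.
    """
    if size_bytes < 1:
        raise ValueError("size must be at least 1 byte")
    if size_bytes <= 4:
        # 1 to 4 bytes: one single data cycle
        return [("single", 1, size_bytes)]
    if size_bytes <= 7:
        # 5 to 7 bytes: two single writes, 4 bytes plus 1 to 3 bytes
        # (assumption: the 4-byte write comes first)
        return [("single", 1, 4), ("single", 1, size_bytes - 4)]
    if size_bytes % 4 != 0:
        # Assumption: block transfers are whole multiples of 4 bytes
        raise ValueError("block size assumed to be a multiple of 4")
    # 8 bytes or more: block data cycles in 4-byte units
    return [("block", size_bytes // 4, size_bytes)]


if __name__ == "__main__":
    for size in (3, 6, 16):
        print(size, write_data_cycles(size))
```

In every case the final data cycle of each request would carry the eod data identifier; the sketch only counts cycles and does not model the identifiers themselves.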
12.6.4 Flow control of processor request

The external agent uses the eok signal to control the flow of processor requests. The processor repeats the current address cycle until the eok signal is asserted. This address cycle continues for one cycle after the eok signal has been asserted, and then the issuance cycle ends. The eok signal must be asserted for at least two consecutive cycles.

Figures 12-15 and 12-16 show how to use the eok signal (the numbers in the description below correspond to the numbers in Figures 12-15 and 12-16).

1. Because the eok signal was inactive one cycle earlier, the processor request is delayed, and the address cycle does not end.
2. Because the eok signal was active one cycle earlier, the processor request is not delayed, and the address cycle ends.

Figure 12-15 Delayed processor read request
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output). scycle 1 to 12. Labels: addr, read, hi-z. Callouts 1 and 2.]

Figure 12-16 Delayed second processor write request

12.6.5 External request protocols

External requests can only be issued with the system interface in slave state. The external agent must assert the ereq signal to arbitrate for the system interface (refer to 12.6.6 External arbitration protocol), and then wait for the processor to release the system interface to slave state. If the system interface is already in slave state, that is, the processor has previously performed an uncompelled change to slave state, the external agent can begin an external request immediately.

After issuing an external request, the external agent must return mastership of the system interface to the processor, as described below.
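As a side note to the eok flow control in 12.6.4, the rule can be sketched as a small Python model. This is an illustrative sketch only: the function names are hypothetical, eok is represented as a boolean meaning "asserted" (electrical polarity is not modeled), and the "discarded" outcome follows the statement in 12.8.1 that a command is discarded when eok is active for only one cycle.

```python
def classify_address_cycle(eok_prev, eok_now):
    """Classify one address cycle from two eok samples (True = asserted).

    eok_prev: eok in the cycle before the address cycle
    eok_now:  eok in the address cycle itself
    """
    if not eok_prev:
        # eok was inactive one cycle earlier: the address cycle repeats
        return "delayed"
    if eok_now:
        # eok asserted for two consecutive cycles: request is issued
        return "issued"
    # eok was active for one cycle only: the command is discarded
    return "discarded"


def first_issued_cycle(eok_trace):
    """Return the index of the first cycle whose address cycle is issued.

    eok_trace is a list of booleans, one per cycle. Returns None if no
    request is issued within the trace.
    """
    for n in range(1, len(eok_trace)):
        if classify_address_cycle(eok_trace[n - 1], eok_trace[n]) == "issued":
            return n
    return None


if __name__ == "__main__":
    # eok asserted in cycles 2 and 3 (0-based): issuance in cycle 3
    print(first_issued_cycle([False, False, True, True]))
```

The model deliberately ignores what the processor does after a discard (reissuing a write, or a temporary change to slave state for a read); it only shows how two consecutive eok samples decide the fate of an address cycle.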
Following the description of the arbitration protocol, this section also describes the following external request protocols:

- Write
- Read response

[Figure 12-16 timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), eok (input), pmaster (output). scycle 1 to 12. sysad: addr, data, addr, data. syscmd: write, eod, write, eod. Callouts 1 and 2.]

12.6.6 External arbitration protocol

Usually, the processor is the bus master. However, the processor relinquishes control of the bus and enters slave state in the following cases:

- The external agent issues a request and the system interface responds to that request.
- The processor has issued a read request.

Arbitration to move the processor from master state to slave state is carried out using the handshake signals (ereq, preq, and pmaster) of the system interface.

Status transition on read response

While the processor read request is pending, the processor enters slave state by deasserting the pmaster signal, and the external agent returns read response data. If the ereq signal is deasserted, the processor remains in slave state until the read response data is returned, and then returns to master state by asserting the pmaster signal. The external agent can remain the bus master as long as the ereq signal remains active when the read response is returned.

Acquiring bus mastership by ereq signal

If the processor is in master state when the external agent needs to issue an external request, the external agent asserts the ereq signal and waits until the processor deasserts the pmaster signal. When the processor deasserts the pmaster signal, the external agent acquires the bus mastership. Once the external agent has become the bus master, it can remain the bus master as long as the ereq signal is asserted.
When the ereq signal is deasserted, the processor acquires the bus mastership two cycles later.

Figure 12-17 shows the arbitration protocol for an external request issued by the external agent. The following sequence describes the arbitration protocol (the numbers in the sequence correspond to the numbers in Figure 12-17).

1. The external agent keeps the ereq signal asserted to issue an external request.
2. When the processor is ready to process the external request, it deasserts the pmaster signal.
3. The processor sets sysad(31:0) and syscmd(4:0) to the high-impedance state.
4. The external agent should drive sysad(31:0) and syscmd(4:0) one cycle after the pmaster signal has been deasserted.
5. The external agent should deassert the ereq signal in the last cycle of the external request (two cycles before the external agent relinquishes the bus mastership), except when it executes another external request.
6. The external agent should set sysad(31:0) and syscmd(4:0) to the high-impedance state on completion of the external request.

Figure 12-17 Arbitration of external request
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), evalid (input), pmaster (output), ereq (input). scycle 1 to 12. Labels: master, slave, master; hi-z; external: address/data; external: command. Callouts 1 to 6.]

If the external agent has become the bus master as the result of a processor read request, it must always return the read response data. If the external agent has become the bus master by using the ereq signal, it can issue any command and data in accordance with the arbitration process. This means that the processor always satisfies any request from the external agent.

Restoring bus mastership by preq signal

Once the external agent has become the bus master, the processor cannot stop the operation of the external agent.
However, the processor can request bus mastership by asserting the preq signal. In that case, the external agent must deassert the ereq signal in response to the processor's request, taking the priority of mastership into consideration. The processor asserts the pmaster signal two cycles after the ereq signal has been deasserted to inform the external agent that the processor has regained the bus mastership.

Figure 12-18 illustrates how the processor requests the bus mastership and how the external agent releases the bus in response.

At reset (when the reset or coldreset signal is active), the processor becomes the bus master, and the external agent becomes the slave.

Figure 12-18 Bus arbitration of processor
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), eok (input), preq (output), pmaster (output), ereq (input). scycle 1 to 12. Labels: master, slave; hi-z; processor: address/data; processor: command; external: data; external: command.]

12.6.7 External write request protocol

External write requests are similar in operation to a processor single write except that the evalid signal is asserted in place of the pvalid signal.

An external write request is issued, with the processor in slave state, by outputting a write command on the syscmd(4:0) bus and a write address on the sysad(31:0) bus and asserting the evalid signal for one cycle. This is followed by outputting a data identifier on the syscmd(4:0) bus and data on the sysad(31:0) bus and asserting the evalid signal for one more cycle. The data identifier of the data cycle must contain an end of data cycle indication. Keep the ereq signal active while the external write request is issued.

After the data cycle is issued, the write request is completed and the external agent releases the syscmd(4:0) and sysad(31:0) buses and allows the system interface to return to master state.
An external write request with the processor initially in master state is illustrated in Figure 12-19. Figure 12-22 shows an example in which the external agent issues an external write request following a read response.

The external write request cannot be issued while read response data is being transferred. It can be issued before the first data response or after the last data response.

Figure 12-19 External write request protocol
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), ereq (input), pmaster (output), pvalid (output), evalid (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr, write, data, eod.]

The only processing that the processor can perform in response to an external write request is interrupt processing.

12.6.8 External read response protocol

An external agent returns data to the processor in response to a processor read request by waiting for the processor to move to slave state, and then returning the data through a single data cycle or a number of data cycles sufficient for the requested data size. The syscmd(4:0) and sysad(31:0) buses are released after the last data cycle is issued. If the ereq signal is inactive at this time, the processor returns to master state at the end of two cycles after the last data cycle.

The data identifier associated with a data cycle may indicate that data transferred during this cycle is erroneous; however, an external agent must return the specified data block whether or not the data is erroneous. If a read response includes one or more erroneous data cycles, the processor generates a bus error exception.

Read response data can be transferred to the processor only when a processor read request is pending. If a read response is transferred to the processor while no processor read request is pending, the operation of the processor is undefined.

A processor single read request followed by a read response is illustrated in Figure 12-20.
A read response for a processor block read with the processor already in slave state is illustrated in Figure 12-21.

Figure 12-20 Read request/read response protocol
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), eok (input), pmaster (output), pvalid (output), ereq (input), evalid (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr, read, data, eod.]

Figure 12-21 Block read response in slave status
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output). scycle 1 to 12. sysad: data0, data1, data2, data3. syscmd: data, data, data, eod. Labels: slave, master; hi-z.]

Figure 12-22 shows the case where an external write request is issued following a read response to a processor single read request. The following sequence describes the protocol (the numbers in the following description correspond to the numbers in Figure 12-22).

1. The external agent returns response data for the processor single read request.
2. To issue an external request following the read response, the external agent asserts the ereq signal in the cycle in which eod is returned. In this case, the pmaster signal remains inactive two cycles after eod.
3. Because the external agent is the bus master, it can issue the external write request.
4. The external agent deasserts the ereq signal by the data cycle of the external write request. In this case, the pmaster signal is asserted two cycles after eod, and the bus mastership is returned to the processor.

Figure 12-22 External write request following read response
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), eok (input), pmaster (output), pvalid (output), ereq (input), evalid (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr, read, data, eod, addr, write, data, eod. Callouts 1 to 4.]
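The mastership timing described for Figure 12-22 (pmaster is reasserted two cycles after an eod cycle in which ereq is inactive, and stays inactive if ereq is asserted in that cycle) can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `pmaster_reassert_cycle`, the argument layout, and the cycle numbers in the usage example are hypothetical and are not read from the figure.

```python
def pmaster_reassert_cycle(eod_cycles, ereq_active):
    """Return the cycle in which the processor reasserts pmaster.

    eod_cycles:  cycle numbers of the eod data cycles of successive
                 external requests, in order
    ereq_active: dict mapping a cycle number to True when ereq is
                 asserted in that cycle (missing cycles count as False)

    Returns None if ereq is asserted in every eod cycle, meaning the
    external agent keeps the bus mastership.
    """
    for eod in eod_cycles:
        if not ereq_active.get(eod, False):
            # ereq inactive at eod: mastership returns two cycles later
            return eod + 2
        # ereq asserted at eod: the external agent stays bus master
    return None


if __name__ == "__main__":
    # Hypothetical timeline: read response eod in cycle 6 with ereq
    # asserted, then an external write whose eod is in cycle 8 with
    # ereq already deasserted.
    print(pmaster_reassert_cycle([6, 8], {6: True, 8: False}))
```

With a lone read response and ereq never asserted, the same function returns the eod cycle plus two, which matches the statement in 12.6.8 that the processor returns to master state two cycles after the last data cycle.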
Figure 12-23 shows an example in which an external write request interrupts a read response to a processor single read request. Cycle 5 in the figure is the write data for the external write request in cycle 4, and cycle 7 is the read response data.

Figure 12-23 When external write request takes precedence while processor read request is pending
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output), eok (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr, read, addr, write, data, eod, data, eod.]

As shown in this figure, even if the external request interrupts the processor read request, the processor remains in slave state until the read response data is returned.

12.7 Successive processing of request

12.7.1 Successive processor write requests

Processor write requests may be processed in succession as follows.

- In the case of data pattern d: the processor write requests are processed without wait states, as shown in Figure 12-24.
- In the case of data pattern dxx: the processing is separated by a wait of two cycles, as shown in Figure 12-25.

Processor write requests may be issued in succession in the following four cases.

1. Successive single write requests
2. Successive block write requests
3. Block write request after single write request
4. Single write request after block write request

For the timing of the processor single write request, refer to 12.6.3 Processor write request protocol.

Figure 12-24 Successive block write requests (write data pattern: d)
[Cycle sequence: addr, data0, data1 (processor block write); addr, data0, data1 (processor block write).]

Figure 12-25 Successive single write requests (write data pattern: dxx)
[Cycle sequence: addr, data (processor single write); wait, wait (processor wait); addr, data (processor single write).]
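The cycle sequences of Figures 12-24 and 12-25 can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the function name `successive_write_cycles` is hypothetical, only the wait cycles between successive requests are modeled, and any spacing inside a block transfer for data pattern dxx is deliberately left out because this section does not describe it.

```python
def successive_write_cycles(data_counts, pattern):
    """List the bus cycles of successive processor write requests.

    data_counts: number of data cycles of each write request, in order
    pattern:     "d"   (no wait between requests) or
                 "dxx" (two wait cycles between requests)
    """
    waits = {"d": 0, "dxx": 2}[pattern]
    cycles = []
    for index, count in enumerate(data_counts):
        if index > 0:
            # Wait cycles separate one request from the next
            cycles += ["wait"] * waits
        cycles.append("addr")
        if count == 1:
            cycles.append("data")
        else:
            cycles += ["data%d" % k for k in range(count)]
    return cycles


if __name__ == "__main__":
    # Two block writes of two data cycles each, pattern d
    print(successive_write_cycles([2, 2], "d"))
    # Two single writes, pattern dxx
    print(successive_write_cycles([1, 1], "dxx"))
```

The same function covers the mixed cases (block after single, single after block) by passing different counts, for example `[1, 2]` or `[2, 1]`.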
12.7.2 Processor write request followed by processor read request

Figure 12-26 shows the case where a processor read request follows a processor write request.

Figure 12-26 Processor write request followed by processor read request (write data pattern: d)
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output), eok (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr, data0, data1, addr, data; write, data, eod, read, eod.]

12.7.3 Processor read request followed by processor write request

Figure 12-27 shows the case where a processor read request is followed by a processor write request.

Figure 12-27 Processor single read request followed by block write request (write data pattern: d)
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output), eok (input). scycle 1 to 11. Labels: master, slave, master; hi-z; addr, data, addr, data0, data1; read, eod, write, data, eod.]

12.7.4 Processor write request followed by external write request

Figure 12-28 shows the case where processor write requests are followed by an external write request.

Figure 12-28 Successive processor write requests followed by external write request (write data pattern: d)
[Timing diagram. Signals: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output), eok (input), ereq (input). scycle 1 to 12. Labels: master, slave, master; hi-z; addr0, data, addr1, data, addr, data; write, eod, write, eod, write, eod.]

12.8 Discarding and re-executing commands

12.8.1 Re-execution of processor commands

The external agent controls the execution of processor commands by using the eok signal. When the processor is the master, it cannot issue a command until the eok signal has been active for at least two cycles.
if the eok signal is active for only one cycle before the processor issues a command and then becomes inactive in the next cycle, in which the command is issued, this processor command is discarded. at this time, the external agent should ignore the discarded command.

if write command is discarded
    the processor issues write data and then the write command again. at this time, the external agent should ignore the write data following the discarded write command.

if read command is discarded
    the processor enters the slave status in the cycle following the address cycle of a read request. if the ereq signal is inactive at this time, the processor returns to the master status again one cycle later, and reissues a read request.

12.8.2 discarding and re-executing write command

figure 12-29 illustrates how a processor single write request is discarded and re-executed. the following sequence describes the protocol (the numbers in the following description correspond to the numbers in figure 12-29).

1. because the eok signal is active one cycle before (cycle 2) the write request of data0, this cycle is the issuance cycle.
2. because the eok signal is active in the write request cycle of data0 (cycle 3), the next cycle is a normal data cycle.
3. because the eok signal is active one cycle before (cycle 4) the write request of data1, this cycle is the issuance cycle.
4. because the eok signal is inactive in the write request cycle of data1 (cycle 5), the data of the next cycle is discarded. at this time, data/command is output to sysad(31:0) and syscmd(4:0), which should be ignored by the external agent.
5. because the eok signal is inactive one cycle before (cycle 6) the write request of the second data1, the write request is delayed.
6. because the eok signal is active one cycle before (cycle 9) the write request of the second data1, this cycle is the issuance cycle.
7. because the eok signal is active in the write request cycle (cycle 10) of the second data1, the next cycle is a normal data cycle.

figure 12-29 discarding and re-executing processor single write request
[timing diagram: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), pmaster (output), eok (input); scycle 1 to 12; callouts 1. to 7. mark the steps above]

12.8.3 discarding and re-executing read command

figure 12-30 illustrates how a processor single read request is discarded and re-executed. the following sequence describes the protocol (the numbers in the following description correspond to the numbers in figure 12-30).

1. because the eok signal is low in cycle 5, the processor tries to issue an address (cycle 6).
2. if the eok signal is high at this point, the processor discards this read request and enters the slave status in the next cycle.
3. because the ereq signal is inactive, the processor returns to the master status again and reissues a read request. because the eok signal is low in both the cycles 7 and 8, the issuance cycle of the read request is determined.
4. the external agent outputs data at the requested address.

figure 12-30 discarding and re-executing processor single read request
[timing diagram: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), evalid (input), pmaster (output), eok (input), ereq (input); scycle 1 to 12; callouts 1. to 4. mark the steps above]

12.8.4 executing and discarding command

when external agent requests bus mastership
    the external agent requests the bus mastership by asserting the ereq signal active. at this time, the external agent can acquire the bus mastership after it has accepted one processor read/write request only, or without accepting any request.
    if the ereq signal is asserted active while the external agent delays the processor request by deasserting the eok signal inactive, the external agent can forcibly acquire the bus mastership.

when processor requests bus mastership
    the processor requests the bus mastership by asserting the preq signal active. at this time, the external agent should transfer the bus mastership to the processor, giving consideration to the priority of the system. if the external agent keeps the ereq signal inactive for more than one cycle, the bus is released. the processor acquires the bus mastership by asserting the pmaster signal active two cycles after the ereq signal has become inactive. if the eok signal is active at this time, the processor can issue a request.

figure 12-31 shows an example where the external agent has entered the slave status (the ereq signal is inactive) from the master status, and then acquires the bus mastership again after accepting one processor request.

figure 12-31 discarding bus mastership by external agent by processor request
[timing diagram: sclock (internal), sysad(31:0) (i/o), syscmd(4:0) (i/o), pvalid (output), preq (output), pmaster (output), eok (input), ereq (input); scycle 1 to 12]

12.9 data flow control

the system interface supports a maximum data rate of one word per cycle.

read response
    an external agent may transfer data to the processor at the maximum data rate of the system interface. the rate at which data is transferred to the processor can be controlled by the external agent, which asserts the evalid signal in each cycle in which data is transferred. the processor accepts cycles as valid only when the evalid signal is asserted and the syscmd(4:0) bus contains a data identifier; thereafter, the processor continues to accept data until it receives the data word tagged as the last one.
    data identifier eod must be attached to the last data word. without this, the system interface hangs up as a protocol error. in this case, the protocol error state can be identified by the preq signal, which oscillates at double the cycle of sclock in synchronization with masterclock; the processor should then be reset and initialized.

write request
    the rate at which the processor transfers data to an external agent is programmable through the ep bit of the config register (the setting at reset is d). data patterns are defined using the letters d and x, where d indicates a data cycle and x indicates an unused cycle. for example, a dxx data pattern indicates a data rate of one word every three cycles. the vr4300 has two data transfer rates: d and dxx. during the x cycles, while it is in the master status, the processor continues to output the data that was output in the immediately preceding d cycle. a processor block write request with a dxx data pattern (one word every three cycles) is shown in figure 12-14.

12.9.1 independent transfer on sysad(31:0) bus

in general applications, the sysad(31:0) bus is a point-to-point connection, running from the processor to a bidirectional register transceiver residing in an external agent. for these applications, only two devices can be connected to the sysad(31:0) bus: the processor and the external agent.

certain applications may require connection of additional drivers and receivers to the sysad(31:0) bus, to allow transfers over the sysad(31:0) bus that the processor is not involved in. these are called independent transfers. to effect an independent transfer, the external agent must coordinate mastership of the sysad(31:0) bus by using the arbitration handshake signals (ereq, pmaster and preq signals).

an independent transfer on the sysad(31:0) bus follows this procedure:

1. the external agent asserts the ereq signal, and requests mastership of the sysad(31:0) bus, to issue an external request.
2. the processor deasserts the pmaster signal, and releases the system interface to slave state.
3. the external agent then allows the independent transfer to take place on the sysad(31:0) bus, making sure that the evalid signal is not asserted during the transfer.
4. when the transfer is completed, the external agent deasserts the ereq signal to return the system interface to master state.

to connect multiple devices, separate input/output enable signals for each device are required to allow the non-processor chips to communicate.

12.9.2 system endianness

the endianness of the system is set by the be bit of the config register: byte order is big endian when this bit is set to 1, and little endian when this bit is set to 0. this bit is set to 1 at cold reset. in a little endian system, set this bit first in the initial sequence.

software can set the reverse endian (re) bit in the status register to 1 to reverse the user mode byte ordering during operation.

12.10 system interface cycle time

the processor specifies minimum and maximum cycle counts for the time required for various processor transactions and for the processor response time to external requests. processor requests themselves are constrained by the system interface protocol, and request cycle counts can be determined by examining the protocol.

the following system interface interactions can vary within minimum and maximum cycle counts:

    waiting period for the processor to release the system interface to slave state in response to an external request (release latency).

the remainder of this section describes and tabulates the minimum and maximum cycle counts for these system interface interactions.
12.10.1 release latency time

release latency time is defined as the number of cycles the processor can wait to release the system interface to slave state for an external request. when no processor requests are in progress, internal activity can cause the processor to wait some number of cycles before releasing the system interface. release latency time is therefore the number of cycles from when the ereq signal becomes active until the pmaster signal becomes inactive.

there are two categories of release latency time:

category 1: when the ereq signal is asserted by one cycle before the last cycle of a processor request.
category 2: when the ereq signal is not asserted during a processor request, or is asserted during the last cycle of a processor request.

table 12-2 shows the minimum and maximum release latency time for requests that fall into categories 1 and 2. note that the maximum and minimum cycle counts are subject to change.

table 12-2 release latency time for external requests

    category    minimum pcycles    maximum pcycles
    1           4                  6
    2           4                  24

12.11 system interface commands and data identifiers

system interface commands specify the types and attributes of any system interface request; this specification is made during the address cycle for the request. system interface data identifiers specify the attributes of data transferred during a system interface data cycle.

the following sections describe the syntax, that is, the bitwise encoding of system interface commands and data identifiers.

reserved bits and reserved fields should be set to 1 for system interface commands and data identifiers associated with external requests. for system interface commands and data identifiers associated with processor requests, reserved bits and reserved fields in the commands and data identifiers are undefined.
12.11.1 command and data identifier syntax

system interface commands and data identifiers are encoded in 5 bits and are transferred on the syscmd(4:0) bus from the processor to an external agent, or from an external agent to the processor, during address and data cycles.

bit 4 (the most-significant bit) of the syscmd(4:0) bus determines whether the current content of the syscmd bus is a command or a data identifier and, therefore, whether the current cycle is an address cycle or a data cycle. for system interface commands, syscmd4 must be set to 0. for system interface data identifiers, syscmd4 must be set to 1.

    bit        meaning
    syscmd4    attributes.
               0: command (address)
               1: data identifier

12.11.2 system interface command syntax

this section describes the syscmd(4:0) bus encoding for system interface commands. figure 12-32 shows a common encoding used for all system interface commands.

figure 12-32 system interface command syntax bit definition
[bit 4: 0; bit 3: request type; bits 2:0: request details]

syscmd4 must be set to 0 for all system interface commands.

syscmd3 specifies the system interface request type, which may be read or write.

table 12-3 encoding of syscmd3 for system interface commands

    bit        meaning
    syscmd3    command.
               0: read request
               1: write request

syscmd(2:0) are specific to each type of request and are defined in each of the following sections.

12.11.3 read requests

for read requests, the encoding of syscmd(2:0) is as follows. figure 12-33 shows the format of a syscmd read request.

figure 12-33 read request syscmd(4:0) bus bit definition
[bit 4: 0; bit 3: 0; bits 2:0: read request details (see tables)]

tables 12-4 through 12-6 list the encodings of the syscmd(2:0) read attributes for read requests.

table 12-4 encoding of syscmd2 for read requests

    bit        meaning
    syscmd2    read attributes.
               0: single read
               1: block read

table 12-5 encoding of syscmd(1:0) for block read requests

    bit            meaning
    syscmd(1:0)    read block size.
                   0: 2 words
                   1: 4 words (d-cache only)
                   2: 8 words (i-cache only)
                   3: reserved

table 12-6 encoding of syscmd(1:0) for single read requests

    bit            meaning
    syscmd(1:0)    read data size.
                   0: 1 byte valid (byte)
                   1: 2 bytes valid (halfword)
                   2: 3 bytes valid
                   3: 4 bytes valid (word)

12.11.4 write requests

the encoding of syscmd(2:0) for write requests is shown below. figure 12-34 shows the format of a syscmd write request.

table 12-7 lists the write attributes encoded in bit syscmd2. table 12-8 lists the block write replacement attributes encoded in bits syscmd(1:0). table 12-9 lists the single write request attributes encoded in bits syscmd(1:0).

figure 12-34 write request syscmd(4:0) bus bit definition
[bit 4: 0; bit 3: 1; bits 2:0: write request details (see tables)]

table 12-7 encoding of syscmd2 for write requests

    bit        meaning
    syscmd2    write attributes.
               0: single write
               1: block write

table 12-8 encoding of syscmd(1:0) for block write requests

    bit            meaning
    syscmd(1:0)    write block size.
                   0: 2 words
                   1: 4 words (for d-cache only)
                   2: 8 words (for i-cache only) (for test)
                   3: reserved

table 12-9 encoding of syscmd(1:0) for single write requests

    bit            meaning
    syscmd(1:0)    write data size.
                   0: 1 byte valid (byte)
                   1: 2 bytes valid (halfword)
                   2: 3 bytes valid
                   3: 4 bytes valid (word)

12.11.5 system interface data identifier syntax

this section defines the encoding of the syscmd(4:0) bus for system interface data identifiers. figure 12-35 shows a common encoding used for all system interface data identifiers.

figure 12-35 data identifier syscmd(4:0) bus bit definition
[bit 4: 1; bit 3: last data; bit 2: response data; bit 1: error data; bit 0: data check enable]

syscmd4 must be set to 1 for all system interface data identifiers.

12.11.6 data identifier bit definitions

bit definitions of syscmd(3:0) are described next.

syscmd3 marks the last data element.

syscmd2 indicates whether or not the data is response data. response data is data returned in response to a read request.

syscmd1 indicates whether or not the data element is error free. erroneous data contains an uncorrectable error and is returned to the processor, resulting in a bus error exception. because the vr4300 does not have a parity check function, the processor never sets the error bit to 1 when it transfers data.

syscmd0 enables data check (reserved function). because the vr4300 does not have a data check function, the processor outputs 1 (data check disabled) when it transfers data. when the external agent transfers data, the processor ignores this bit; nevertheless, set this bit to 1 to disable checking.

table 12-10 lists the encodings of syscmd(3:0) for processor data identifiers. table 12-11 lists the encodings of syscmd(3:0) for external data identifiers.

table 12-10 processor data identifier encoding of syscmd(3:0)

    bit        meaning
    syscmd3    last data element indication.
               0: last data element, or data element on single transfer
               1: not the last data element
    syscmd2    reserved
    syscmd1    reserved: error data indication.
               the processor outputs 0 (error free).
    syscmd0    reserved: data check enable.
               the processor outputs 1 (data check disabled).

table 12-11 external data identifier encoding of syscmd(3:0)

    bit        meaning
    syscmd3    last data element indication.
               0: last data element, or data element on single transfer
               1: not the last data element
    syscmd2    response data indication.
               0: data is response data
               1: data is not response data
    syscmd1    error data indication.
               0: data is error free
               1: data is erroneous
    syscmd0    reserved: data check enable.
               the processor ignores this bit. (the external agent transfers 1.)

12.12 system interface addresses

system interface addresses are full 32-bit physical addresses output to the sysad(31:0) bus during address cycles.
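because every address cycle pairs an address on sysad(31:0) with a command on syscmd(4:0), and every data cycle carries a data identifier there, it may help to see the encodings of section 12.11 gathered in one place. the following is a minimal sketch in python, not part of the original manual; the function name decode_syscmd, the external flag, and the returned dictionary keys are all hypothetical and chosen only for illustration.

```python
# Sketch only: names and return format are assumptions, not from the manual.
BLOCK_SIZES = {0: "2 words", 1: "4 words", 2: "8 words", 3: "reserved"}
SINGLE_SIZES = {0: "1 byte", 1: "2 bytes", 2: "3 bytes", 3: "4 bytes"}


def decode_syscmd(value, external=False):
    """Decode a 5-bit syscmd(4:0) value following tables 12-3 to 12-11."""
    if not 0 <= value <= 0x1F:
        raise ValueError("syscmd is a 5-bit field")
    low2 = value & 0x3
    if (value & 0x10) == 0:
        # syscmd4 = 0: command, so this is an address cycle
        request = "write" if value & 0x08 else "read"  # syscmd3
        block = bool(value & 0x04)                     # syscmd2
        size = BLOCK_SIZES[low2] if block else SINGLE_SIZES[low2]
        return {"cycle": "address", "request": request,
                "block": block, "size": size}
    # syscmd4 = 1: data identifier, so this is a data cycle
    info = {"cycle": "data", "last": (value & 0x08) == 0}  # syscmd3: 0 = last
    if external:
        # only external data identifiers define syscmd2 and syscmd1
        info["response"] = (value & 0x04) == 0  # 0 = response data
        info["error"] = bool(value & 0x02)      # 1 = erroneous data
    return info
```

for example, decode_syscmd(0b00110) describes a block read of 8 words, while decode_syscmd(0b10010, external=True) describes the last element of erroneous response data.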
12.12.1 addressing conventions

addresses associated with word or partial word data transfers are aligned for the size of the data element. the system uses the following address conventions:

    addresses associated with block requests are aligned to requested doubleword boundaries; that is, the low-order 3 bits of address are 0.
    word requests set the low-order 2 bits of address to 0.
    halfword requests set the low-order bit of address to 0.
    byte, tribyte requests use the byte address.

12.12.2 sequential and subblock ordering

sequential ordering
    an instruction cache read request returns data in sequential order, starting with the first word (dw0) of the 8-word block, no matter which word is requested.

subblock ordering
    when a read request is issued to the data cache, the low-order word of the doubleword that includes the word required by the cpu is returned first, followed by the high-order word of that doubleword, then the low-order word of the remaining doubleword, and finally its high-order word (for details, refer to 12.2.1 physical addresses).

chapter 13 jtag interface

the vr4300 processor is provided with a boundary-scan interface that is compatible with joint test action group (jtag) specifications, conforming to the industry-standard jtag protocol (ieee standard 1149.1/d6).

this chapter describes the functions related to jtag interface.

13.1 principles of boundary scanning

with the evolution of integrated circuits (ics), surface-mounted devices, double-sided component mounting on printed-circuit boards (pcbs), and via hole technology, in-circuit tests connected to boards and chips have become more and more difficult to perform. the greater complexity of ics has also meant that the test patterns needed to test all the circuits in a chip have become much larger and more difficult to write.
one solution to this difficulty has been the development of a testing method using boundary-scan circuits. a boundary-scan circuit is a shift register made up of a series of connected cells placed between each pin of the chip and the internal circuitry of the ic, as shown in figure 13-1. in normal operation these boundary-scan cells are bypassed; in the test mode, however, the scan cells are directed by the test program to pass data along the shift register path and perform various diagnostic tests. to accomplish this, the tests use the four signals described in the next section: jtdi, jtdo, jtms, and jtck.

figure 13-1 jtag boundary-scan cells
[diagram: integrated circuit chip with boundary-scan cells between each ic external pin and the internal circuitry]

13.2 signal summary

the jtag interface signals used are listed below.

    jtdi    jtag serial data input
    jtdo    jtag serial data output
    jtms    jtag test mode select
    jtck    jtag serial clock input

caution: when the jtag interface is not used, keep the jtck signal low.

figure 13-2 jtag interface signals and registers
[block diagram: jtdi, jtms and jtck pins; tap controller; instruction register; boundary-scan register; bypass register; jtdo pin]

the jtag boundary-scan mechanism (referred to as jtag mechanism in this chapter) allows testing of the connections between the processor, the printed circuit board to which it is attached, and the other devices on the board.

the jtag mechanism does not provide any capability for testing the processor itself.

13.3 jtag controller and registers

the processor contains the following registers and jtag controller:

    instruction register
    boundary-scan register
    bypass register
    test access port (tap) controller

the processor executes the standard jtag extest operation associated with external test function testing.
the basic operation of jtag is for the tap controller state machine to monitor the jtms input signal, as shown in table 13-1. when it starts, the tap controller determines the test function to be implemented. this includes either loading an instruction register (ir), or beginning a serial data scan through a data register (dr). as the data is scanned in, the state of the jtms pin transmits each new data word, and indicates the end of the data stream. the data register to be selected is determined by the contents of the instruction register.

13.3.1 instruction register

the jtag instruction register consists of three cells organized as a shift register; this register is used to select the test to be performed and the test data register to be accessed. as listed in table 13-1, the register value setting selects either the boundary-scan register or the bypass register.

table 13-1 jtag instruction register bit encoding

    msb ... lsb    data register
    0 0 0          boundary-scan register (external test only)
    0 1 1          setting prohibited
    others         bypass register

the instruction register has two stages: shift register, and parallel output latch. refer to 13.3.7 controller states for detail. figure 13-3 shows the format of the instruction register.

figure 13-3 instruction register
[bit 2 (msb), bit 1, bit 0 (lsb)]

13.3.2 bypass register

the bypass register is 1 bit wide. when the tap controller is in the shift-dr (bypass) state, the data on the jtdi pin is shifted into the bypass register, and the data on the bypass register output shifts to the jtdo output pin.

in effect, the bypass register is a short circuit that allows board-level devices in the boundary-scan chain that do not require a specific test to be bypassed. the logical location of the bypass register in the boundary-scan chain is shown in figure 13-4. use of the bypass register speeds up access to boundary-scan registers in those ics that remain active in the board-level test data path.
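the register selection of table 13-1 and the one-bit delay of the bypass register can be sketched as follows. this is an illustrative python model only, not taken from the manual; the names select_data_register and BypassRegister are hypothetical, and the initial bypass value of 0 is an assumption.

```python
# Sketch only: function and class names are assumptions for illustration.
def select_data_register(instruction):
    """Map the 3-bit instruction register value to a data register
    following table 13-1."""
    instruction &= 0b111
    if instruction == 0b000:
        return "boundary-scan"
    if instruction == 0b011:
        raise ValueError("setting prohibited by table 13-1")
    return "bypass"


class BypassRegister:
    """1-bit register: each shift moves jtdi in and the old bit out to jtdo."""

    def __init__(self):
        self.bit = 0  # assumed initial content

    def shift(self, jtdi):
        jtdo = self.bit       # the previously stored bit appears on jtdo
        self.bit = jtdi & 1   # the new bit is taken from jtdi
        return jtdo
```

in this model, a bit stream passing through a bypassed device is delayed by exactly one shift, which is why bypassing shortens the path to the boundary-scan registers of the other ics on the board.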

figure 13-4 bypass register operation
[diagram: board input and board output; ic packages chained jtdi to jtdo; each package contains a bypass register and a boundary-scan register with pad cells]

13.3.3 boundary-scan register

the boundary-scan register retains the states of all of the input and output pins of the vr4300 processor, except for some clock and phase lock loop signals. the external pins of the vr4300 can be configured to drive any arbitrary pattern by scanning the pattern into the boundary-scan register in the shift-dr state. incoming data to the processor is examined by shifting while in the capture-dr state with the boundary-scan register enabled.

the boundary-scan register is a single 58-bit shift register, each bit of which is connected to one of the input and output pads on the vr4300 processor. figure 13-5 shows the most-significant bit of the boundary-scan register; this one bit controls the output enable signals on the various bidirectional buses.

figure 13-5 output enable bit of boundary-scan register
[bit 57: oe1; bits 56:0: signal pads]

oe1 (jsysaden) is the jtag output enable bit for all outputs of the processor. output is enabled when this bit is set to 1 (default state).

the remaining 57 bits correspond to 57 signal pads. table 13-2 lists the scan order of these scan bits.

13.3.4 test access port (tap)

the test access port (tap) consists of the four signal pins: jtdi, jtdo, jtms, and jtck. these pins control the test to be executed.

as figure 13-6 shows, data is serially scanned into one of the three registers (instruction register, bypass register, or the boundary-scan register) from the jtdi pin, or it is scanned from one of these three registers onto the jtdo pin.
data is input to the jtdi pin from the least-significant bit (lsb) of the selected register, whereas the most-significant bit (msb) of the selected register appears on the jtdo pin output.

the jtms signal controls the state transitions of the main tap controller state machine.

the jtck signal is a dedicated test clock that allows serial jtag data to be shifted synchronously, independent of any chip-specific or system clock.

figure 13-6 jtag test access port
[block diagram: data scanned in serially from the jtdi pin into the instruction register, boundary-scan register or bypass register, and scanned out serially onto the jtdo pin; jtms and jtdi sampled at rising edge of jtck; jtdo changes at falling edge of jtck]

the jtdi and jtms signals are sampled in synchronization with the rising edge of the jtck signal. the state of the jtdo signal changes in synchronization with the falling edge of the jtck signal.

13.3.5 tap controller

the processor incorporates a 16-state tap controller conforming to the ieee jtag standard.

13.3.6 controller reset

the tap controller can be reset by one of the following:

    assert the coldreset signal
    keep the jtms signal asserted and input five rising edges of the jtck signal

in either case, keeping the jtms signal asserted maintains the reset state.

13.3.7 controller states

the tap controller has four states: reset, capture, shift, and update. they can be further classified, for example as shift-ir state or capture-dr state, depending on whether the type of signal is instruction or data.

reset state (tap controller)
    the value 0x7 is loaded into the parallel output latch, selecting the bypass register as default. the most-significant bit of the boundary-scan register is cleared to 0, disabling the outputs.

capture ir state
    the value 0x4 is loaded into the shift register stage.

capture dr (boundary scan) state
    the data currently on the processor input and i/o pins is latched into the boundary-scan register. in this state, the boundary-scan register bits corresponding to output pins are undefined and cannot be checked during the scan out processing.

shift ir state
    data is loaded serially into the shift register stage of the instruction register from the jtdi input pin, and the msb of the instruction register's shift register stage is shifted out to the jtdo pin.

shift dr (boundary scan) state
    data is serially shifted into the boundary-scan register from the jtdi pin, and the contents of the boundary-scan register are serially shifted onto the jtdo pin.

update ir state
    the current data in the shift register stage is loaded into the parallel output latch.

update dr (boundary scan) state
    data in the boundary-scan register is latched into the register parallel output latch. bits corresponding to output pins, and those i/o pins whose outputs are enabled by the msb (oe1) of the boundary-scan register, are loaded onto the processor pins.

table 13-2 shows the boundary scan order of the processor signals.

table 13-2 jtag scan order

    no.  signal name    no.  signal name    no.  signal name             no.  signal name
    1    sysad4         16   sysad26        31   sysad23                 46   sysad14
    2    sysad3         17   pmaster        32   int3                    47   sysad13
    3    sysad2         18   sysad25        33   sysad22                 48   sysad12
    4    sysad1         19   ereq           34   sysad21                 49   sysad11
    5    sysad0         20   syscmd0        35   sysad20                 50   sysad10
    6    preq           21   syscmd1        36   rfu (input: always 1)   51   int0
    7    sysad31        22   reset          37   rfu (input: always 1)   52   sysad9
    8    pvalid         23   evalid         38   tclock                  53   sysad8
    9    sysad30        24   syscmd2        39   syncout                 54   sysad7
    10   eok            25   syscmd3        40   sysad19                 55   sysad6
    11   sysad29        26   coldreset      41   sysad18                 56   sysad5
    12   sysad28        27   syscmd4        42   sysad17                 57   int1
    13   sysad27        28   divmode1       43   int4                    58   jsysaden
    14   int2           29   sysad24        44   sysad16
    15   nmi            30   divmode0       45   sysad15
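the scan order of table 13-2 lends itself to a lookup table in test software. the following python sketch is an illustration only, not part of the manual; the names SCAN_ORDER, scan_numbers and build_scan_vector are hypothetical. it also assumes that entry no. n of the table sits at list index n - 1, which the manual does not state explicitly and which says nothing about which bit is shifted first.

```python
# Sketch only: signal names are copied from table 13-2; all Python names
# and the index convention (no. n -> index n - 1) are assumptions.
SCAN_ORDER = [
    "sysad4", "sysad3", "sysad2", "sysad1", "sysad0", "preq", "sysad31",
    "pvalid", "sysad30", "eok", "sysad29", "sysad28", "sysad27", "int2",
    "nmi", "sysad26", "pmaster", "sysad25", "ereq", "syscmd0", "syscmd1",
    "reset", "evalid", "syscmd2", "syscmd3", "coldreset", "syscmd4",
    "divmode1", "sysad24", "divmode0", "sysad23", "int3", "sysad22",
    "sysad21", "sysad20", "rfu", "rfu", "tclock", "syncout", "sysad19",
    "sysad18", "sysad17", "int4", "sysad16", "sysad15", "sysad14",
    "sysad13", "sysad12", "sysad11", "sysad10", "int0", "sysad9", "sysad8",
    "sysad7", "sysad6", "sysad5", "int1", "jsysaden",
]


def scan_numbers(signal):
    """Return the 1-based table numbers at which a signal name appears."""
    return [i + 1 for i, name in enumerate(SCAN_ORDER) if name == signal]


def build_scan_vector(pins, default=0):
    """Build a list of 58 bits in table order from a {signal: bit} mapping.
    Signals missing from the mapping take the default value."""
    return [pins.get(name, default) & 1 for name in SCAN_ORDER]
```

for example, build_scan_vector({"jsysaden": 1}) yields a vector in which only the last entry (no. 58, the output enable bit) is 1.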
13.4 notes on implementation

this section describes points to be noted about jtag boundary-scan operation that are specific to the processor.

    the masterclock, syncin, and syncout signal pads do not support jtag.
    the update function occurs on the falling edge of the jtck signal after the tap controller enters the update-dr state. this conforms to the ieee standard. the vr4200 generates the update function at the next rising edge. in other words, it is 1/2 jtck cycle late as compared with the vr4300.

chapter 14 interrupts

four types of interrupt are available on the vr4300. these are:

    one non-maskable interrupt, nmi
    five external normal interrupts
    two software interrupts
    one timer interrupt

these are described in this chapter.

14.1 non-maskable interrupt

the non-maskable interrupt request is accepted by asserting the nmi signal (low), forcing the processor to branch to the reset exception vector. the nmi signal is latched into an internal register in synchronization with the rising edge of the sclock signal, as shown in figure 14-1. the nmi signal is edge-triggered, and an nmi request is acknowledged when the nmi signal is kept low for more than one cycle. this signal must be high after an exception occurs.

an nmi request can also be set by an external write request through the sysad(31:0) bus. on the data cycle, sysad6 acts as the nmi request bit (1: requested) and sysad22 acts as the write enable bit (1: enable) for sysad6.

nmi only takes effect when the processor pipeline is running. thus nmi can be used to recover the processor from a software hang up (for example, in an infinite loop) but cannot be used to recover the processor from a hardware hang up (for example, no read response from an external device). nmi cannot cause drive contention on the sysad(31:0) bus and no reset of external agents is required.

this interrupt cannot be masked.
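as a small illustration of the external write path described above, the data word that an external agent would drive on sysad(31:0) can be composed from the two bits named in the text. this python sketch is not from the manual; the constant and function names are hypothetical.

```python
# Sketch only: names are assumptions; bit positions come from section 14.1.
NMI_REQUEST_BIT = 1 << 6        # sysad6: 1 = nmi requested
NMI_WRITE_ENABLE_BIT = 1 << 22  # sysad22: 1 = write to sysad6 is enabled


def nmi_write_data(request):
    """Compose the sysad(31:0) data word that writes the nmi request bit."""
    word = NMI_WRITE_ENABLE_BIT
    if request:
        word |= NMI_REQUEST_BIT
    return word
```

with these assumptions, nmi_write_data(True) gives 0x00400040 and nmi_write_data(False) gives 0x00400000; all other data bits are left at 0 here.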

figure 14-1 shows the internal processing of the nmi signal. the low-level signal input to the nmi pin is latched into an internal register in synchronization with the rising edge of sclock. bit 6 of the internal register is then ored with the inverted value of the latched nmi signal, and the result is transferred internally as the non-maskable interrupt request.

figure 14-1 nmi signal
[diagram: nmi pin latched by sclock into an internal register and inverted; bit 6 of the interrupt request register, set by external write request; both feed an or gate that produces the nmi request]

14.2 external normal interrupts

these interrupt requests are accepted by asserting the int(4:0) signals (low). the int(4:0) signals are level-triggered, and these signals must be kept low until an external interrupt exception is generated. after an external interrupt exception occurs, the int(4:0) signals must be high before the processor returns to its normal routine, or before multiple interrupts are enabled.

this interrupt request can be set by an external write request through the sysad(31:0) bus. during the data cycle, sysad(4:0) act as the external interrupt request bits (1: requested) and sysad(20:16) act as the write enable bits (1: enable) for sysad(4:0). after an external interrupt exception occurs, an external write request must be issued to clear the corresponding bit of the interrupt register to 0 before the processor returns to its normal routine, or before multiple interrupts are enabled.

these interrupt requests can be masked with the im(6:2), ie, exl, and erl fields of the status register.

14.3 software interrupts

these interrupt requests are accepted by setting bit 1 or 0 of the interrupt pending, ip, field in the cause register to 1. these bits can be written by software, but there is no hardware mechanism to set or clear these bits.
After a software interrupt exception occurs, the corresponding bit of the IP field in the Cause register must be cleared to 0 before the processor returns to its normal routine, or before multiple interrupts are enabled.

These interrupt requests can be masked with the IM(1:0), IE, EXL, and ERL fields of the Status register.

14.4 Timer Interrupt

This interrupt request uses bit 7 of the IP (interrupt pending) field in the Cause register. The timer interrupt is automatically set and accepted whenever the value of the Count register equals the value of the Compare register. To clear this interrupt request, either clear the IP7 bit of the Cause register or change the contents of the Compare register.

This interrupt request can be masked with the IM7 bit and the IE, EXL, and ERL fields of the Status register.

14.5 Generation of Interrupt Request Signal

When an external agent issues an external write request, the data is written to the Interrupt register. This register can be accessed in an external write cycle, but not in an external read cycle. When data is written to the Interrupt register, the processor ignores the address issued by the external agent. Unlike the CP0 registers, this register cannot be read or written by software.

In the data cycle, bits SysAD20 through SysAD16 are used as individual write enable bits for the five bits of the Interrupt register, and the values of SysAD4 through SysAD0 are written to the enabled bits. Bits 0 through 4 of the Interrupt register can therefore all be set or cleared by issuing a single external write request. Figure 14-2 illustrates this, along with the NMI described earlier.

Figure 14-2 Interrupt Register Bits and Enables

[Figure: SysAD(4:0) supply the interrupt set values and SysAD(20:16) the write enables for bits 4:0 of the Interrupt register (refer to Figures 14-3 and 14-4). SysAD6 and SysAD22 supply the non-maskable interrupt request for bit 6 and its write enable (refer to Figure 14-1).]
Bit            Meaning                                 Setting
SysAD(4:0)     External interrupt request Int(4:0)     1: requested, 0: no request (for each bit)
SysAD(20:16)   Write enable bits for SysAD(4:0)        1: enable, 0: disable (for each bit)
SysAD6         NMI                                     1: requested, 0: no request
SysAD22        Write enable bit for SysAD6             1: enable, 0: disable

14.5.1 Detection of Hardware Interrupts

Figure 14-3 shows how the VR4300 hardware interrupt causes are detected through the Cause register.

The timer interrupt signal, IP7, is detected directly as bit 15 of the Cause register. The other hardware interrupt signals are detected as follows: bits 4:0 of the Interrupt register are ORed bit by bit with the corresponding signals of the interrupt pins Int(4:0), and the result is input to bits 14:10 of the Cause register.

IP(1:0) of the Cause register are the software interrupts. (Refer to Chapter 6 Exception Processing for details.) There is no hardware mechanism for setting or clearing the software interrupts.

Figure 14-3 Hardware Interrupt Request Signals

[Figure: Interrupt register bits 4:0 (internal register) are ORed with pins Int4 through Int0 to give IP6 through IP2, which are Cause register bits 14:10. The timer interrupt gives IP7, Cause register bit 15. Refer to Figure 14-4.]

14.5.2 Masking of Interrupt Request Signals

Figure 14-4 shows the masking of the VR4300 interrupt request signals.

Cause register bits 15:8 (IP7-IP0) are AND-ORed with Status register interrupt mask bits 15:8 (IM7-IM0) to mask individual interrupt signals. Status register bit 0 is the global interrupt enable (IE) bit. The output of this bit is ANDed with the output of the AND-OR logic block to produce the VR4300 interrupt signal, as shown in Figure 14-4. The EXL bit in the Status register also enables these interrupts.
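The flow described in 14.5 through 14.5.2 can be sketched in C. This is only an illustrative software model, not part of the manual: the type `interrupt_reg` and the functions `external_write`, `cause_ip`, and `interrupt_taken` are hypothetical names, the interrupt pins are modeled as "1 = asserted" even though the physical Int(4:0) pins are active low, and the EXL/ERL bit positions (Status bits 1 and 2) are assumed from the standard Status register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the Interrupt register: bits 4:0 plus NMI bit 6. */
typedef struct {
    uint8_t bits; /* bit n = 1: request pending */
} interrupt_reg;

/* Data cycle of an external write request.
 * SysAD(4:0) and SysAD6 carry the new values; SysAD(20:16) and SysAD22
 * are the per-bit write enables. Bits whose enable is 0 are unchanged. */
static void external_write(interrupt_reg *r, uint32_t sysad)
{
    uint32_t value  = sysad & 0x5Fu;         /* bits 4:0 and bit 6 */
    uint32_t enable = (sysad >> 16) & 0x5Fu; /* bits 20:16 and bit 22 */
    r->bits = (uint8_t)((r->bits & ~enable) | (value & enable));
}

/* Build IP(7:0), i.e. Cause(15:8).
 * int_pins: bit n = 1 when Int(n) is asserted (assumption of this model). */
static uint8_t cause_ip(const interrupt_reg *r, uint8_t int_pins,
                        bool timer, uint8_t sw_ip)
{
    uint8_t hw = (uint8_t)((r->bits | int_pins) & 0x1Fu); /* IP6..IP2 */
    return (uint8_t)((timer ? 0x80u : 0u) | (hw << 2) | (sw_ip & 0x03u));
}

/* Status register bits used by the masking logic (assumed positions). */
#define SR_IE  0x01u
#define SR_EXL 0x02u
#define SR_ERL 0x04u

/* AND-OR of IP with IM, gated by IE and blocked while EXL or ERL is set. */
static bool interrupt_taken(uint32_t status, uint8_t ip)
{
    uint8_t im = (uint8_t)(status >> 8); /* IM(7:0) = Status(15:8) */
    bool enabled = (status & SR_IE) && !(status & (SR_EXL | SR_ERL));
    return enabled && (ip & im) != 0;
}
```

For instance, writing `0x00010001` sets Interrupt register bit 0, while writing `0x00010000` clears it again; a data word with no enable bits set leaves the register unchanged.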
Figure 14-4 Masking of Interrupt Requests

[Figure: Status register SR(15:8), IM7 through IM0, and Cause register bits 15:8, IP7 through IP0, feed the AND-OR block. IP1 and IP0 are the software interrupts, IP6 through IP2 the external normal interrupts, and IP7 the timer interrupt. The AND-OR output is ANDed with IE (Status register SR0) in the AND block to produce the VR4300 interrupt.]

Bit        Meaning                  Setting
IE         Enable all interrupts    1: enable, 0: disable
IM(7:0)    Mask interrupts          1: enable, 0: disable (for each bit)
IP(7:0)    Interrupt requests       1: request pending, 0: no request pending (for each bit)

Chapter 15 Power Management

One of the design objectives of the VR4300 processor is to minimize power consumption, making the processor suitable for battery-operated systems as well as for environments where low power consumption and low heat dissipation are desirable. To accomplish this, the VR4300 has power management features that dynamically reduce power consumption. They are described in this chapter.

15.1 Features

The VR4300 has three processor-level operation modes: normal, low power (100 MHz model of the VR4300 and the VR4305 only), and power off. These modes allow processor power consumption to be managed by system logic. A notebook system generally has many different levels of power management. It is the responsibility of system logic to switch the processor between the three available modes to reflect the power management state of the system.

15.1.1 Normal Power Mode

The normal pipeline clock (PClock) is generated from the input clock (MasterClock). The ratio of the frequency of PClock to that of MasterClock is set by the DivMode(1:0)* pins. For details of the setting, refer to 2.2.2 Clock/Control Interface Signals. The frequency of the system interface clock (SClock) is the same as that of MasterClock.

The processor operates in normal mode by default.
The processor enters this default state after reset.

* DivMode(1:0) in the VR4300 and VR4305; DivMode(2:0) in the VR4310.

15.1.2 Low Power Mode

The low power mode is supported only in the 100 MHz model of the VR4300 and in the VR4305.

The processor operates in low power mode when the RP bit of the Status register is set. On entering this mode, the processor first stalls the pipeline and enters the quiescent state, in which the store buffer is empty and all cache misses have been processed. The frequency of PClock then drops to 1/4 of the normal level. The frequencies of SClock and TClock also drop to 1/4 of the normal level.

Example: DivMode(1:0) = 10 in the 100 MHz model of the VR4300

                  MasterClock   PClock     SClock, TClock
Normal mode       50 MHz        100 MHz    50 MHz
Low power mode    50 MHz        25 MHz     12.5 MHz

The low power mode can reduce the power consumption of the processor to about 25% of the normal level.

Software must guarantee normal operation of the system when the RP bit is set or cleared. Also keep the following points in mind.

1. The functions of circuits such as the DRAM refresh counter change if the operating frequency changes. Consequently, first write new values to the registers of the external agent that are directly affected by the change in frequency.
2. Make sure that the system interface is inactive. For example, execute an instruction that reads the non-cache area, and let the write buffer empty after the instruction has executed. After that, the RP bit can be set or cleared.
3. Make sure that the eight instructions before and after the MTC0 instruction that sets or clears the RP bit do not cause an exception, such as a cache miss or TLB miss exception.

15.1.3 Power Off Mode

In power off mode, the power supply to the processor is cut off entirely and operation of the processor stops completely. Before power off mode is entered, the state of the processor is written to non-volatile memory.
When the processor returns to normal mode, all registers are restored to their previous state.

To support power off mode, all internal state information necessary for restoring the processor from the power off state can be both read and written. Before power off, this information must be saved into externally connected non-volatile memory. It is the system's responsibility to power off the chip when the system is in the idle state. The load link (LL) bit does not need to be saved, since it is automatically cleared by the cache start-up.

Cache contents are not retained. The cache should therefore be written back to memory during the power-off routine and invalidated during the power-on routine. The VR4300 supports cache instructions and TLB operation instructions that invalidate all cache and TLB contents.

Chapter 16 CPU Instruction Set Details

This chapter provides a detailed description of the function of each VR4300 CPU instruction in both 32- and 64-bit modes. The instructions are listed in alphabetical order.

For details of the FPU instruction set, refer to Chapter 17 FPU Instruction Set Details.

16.1 Instruction Notation Conventions

In this chapter, all variable subfields in an instruction format (such as rs, rt, immediate, etc.) are shown in lowercase characters. Instruction names (such as ADD, SUB, etc.) are shown in uppercase characters.

For the sake of clarity, an alias is sometimes used for a subfield in specific instructions. For example, rs = base is used for load and store instructions. Such an alias is always written in lowercase characters, since it also refers to a subfield.

The actual encodings for all the mnemonics are given in 16.7 CPU Instruction Opcode Bit Encoding, and the bit encoding also accompanies each instruction description.
In the instruction descriptions, the Operation section describes the operation performed by each instruction using a high-level language notation. The VR4300 can operate in either 32- or 64-bit mode. Differences between the operations in each mode are shown in the Operation section. Special symbols used in the notation are described in Table 16-1.

Table 16-1 CPU Instruction Operation Notations

Symbol          Meaning
←               Substitution.
||              Bit string concatenation.
x^y             Repetition of bit value x to form a y-bit string. x is always a single-bit value.
x_y...z         Selection of bits y through z of bit string x. Little-endian bit notation is always used. If y is less than z, this expression is an empty (zero length) bit string.
+               2's complement or floating-point addition.
-               2's complement or floating-point subtraction.
*               2's complement or floating-point multiplication.
div             2's complement integer division.
mod             2's complement remainder.
/               Floating-point division.
<               2's complement less than comparison.
and             Bit-wise logical AND.
or              Bit-wise logical OR.
xor             Bit-wise logical XOR.
nor             Bit-wise logical NOR.
GPR[x]          General purpose register x. The content of GPR[0] is always zero. Attempts to alter the content of GPR[0] have no effect.
CPR[z,x]        Coprocessor unit z, general purpose register x.
CCR[z,x]        Coprocessor unit z, control register x.
COC[z]          Coprocessor unit z, condition signal.
BigEndianMem    Endian mode as configured at reset (0 → little, 1 → big). Specifies the endianness of the memory interface (see LoadMemory and StoreMemory), and the endianness of Kernel and Supervisor modes.
ReverseEndian   Signal to reverse the endianness of load and store instructions. This feature is available in User mode only, and is effected by setting the RE bit of the Status register. Thus, ReverseEndian is set to 1 only when the RE bit is set in User mode.
BigEndianCPU    The endianness for load and store instructions (0 → little, 1 → big).
                In User mode, this endianness is reversed by setting the RE bit. Thus, BigEndianCPU is calculated as BigEndianMem XOR ReverseEndian.
LLbit           Bit showing the synchronized state of instructions. Set by the LL instruction, cleared by the ERET instruction, and read by the SC instruction.
T+i:            Indicates the time steps between operations. The statements within a time step are defined to be executed in sequential order (the execution order may be changed by conditional branches and loops). Operations marked T+i: are executed at instruction cycle i from the start of execution of the instruction. Thus, an instruction that starts at time j executes operations marked T+i: at cycle i + j. The order is not defined between instructions, or between operations, that are executed at the same time.

Instruction Notation Examples

The following are examples of the instruction notation:

Example #1:

    GPR[rt] ← immediate || 0^16

Sixteen zero bits are concatenated with a low-order immediate value (normally 16 bits), and the resulting 32-bit string is substituted into CPU general purpose register rt.

Example #2:

    (immediate_15)^16 || immediate_15...0

Bit 15 (the sign bit) of an immediate value is extended by 16 bit positions, and the result is concatenated with bits 15 through 0 of the immediate value to generate a 32-bit sign-extended value.

16.2 Load and Store Instructions

In the VR4300, the instruction immediately following a load instruction may use the loaded register contents. In such cases the hardware interlocks for one PCycle only, so scheduling load delay slots is desirable to improve performance, although it is not required for the code to function correctly.

Two special instructions are provided in the VR4300 implementation of the MIPS ISA: the Load Link and Store Conditional instructions.
These instructions are used in carefully coded sequences to implement one of several synchronization primitives, including test-and-set, bit-level locks, semaphores, and sequencers/event counters. Such synchronization is essential in multiprocessor systems. The functionality is included in the VR4300 primarily to maintain compatibility with the VR4000 and VR4200.

In the load and store instruction descriptions, the functions listed below are used to simplify the handling of virtual addresses and physical memory.

Table 16-2 Load and Store Instruction Common Functions

Function            Meaning
AddressTranslation  Uses the TLB to find a physical address from a virtual address. If the TLB does not contain the requested translation, this function fails and a TLB miss exception occurs.
LoadMemory          Searches the cache and main memory for the contents of the specified data length stored at a specified physical address. If the specified data length is less than a word, the contents of the data position are loaded, taking the endian mode and reverse endian mode of the processor into consideration. The low-order 3 bits of the address and the access type field determine the data position in a data word. The data is loaded into the cache if the cache is enabled.
StoreMemory         Searches the cache, write buffer, and main memory to store the contents of a specified data length at a specified physical address. If the specified data length is less than a word, the contents of the data position are stored, taking the endian mode and reverse endian mode of the processor into consideration. The low-order 3 bits of the address and the access type field determine the data position in a data word.

The access type field indicates the size of the data to be loaded or stored. Regardless of access type or byte order (endianness), the address specifies the byte that has the smallest byte address in the field accessed.
For a big-endian system, this is the leftmost byte and contains the sign for a 2's complement value; for a little-endian system, this is the rightmost byte.

The bytes within the accessed doubleword can be determined directly from the access type and the low-order three bits of the address.

Table 16-3 Access Type Specifications for Load/Store Instructions

Access type   SysCmd(2:0)   Meaning
Doubleword    7             8 bytes (64 bits)
Septibyte     6             7 bytes (56 bits)
Sextibyte     5             6 bytes (48 bits)
Quintibyte    4             5 bytes (40 bits)
Word          3             4 bytes (32 bits)
Triplebyte    2             3 bytes (24 bits)
Halfword      1             2 bytes (16 bits)
Byte          0             1 byte (8 bits)

16.3 Jump and Branch Instructions

All jump and branch instructions have a structural delay of exactly one instruction. That is, the instruction immediately following a jump or branch instruction (the instruction occupying the delay slot) is executed while the target instruction is being fetched from the cache. A jump or branch instruction must not be placed in a delay slot; if one is, the error is not detected and the results of the operation are undefined.

If an exception or interrupt prevents the completion of an instruction while it is in a delay slot, the hardware sets the EPC register to the virtual address of the jump or branch instruction that precedes it. When exception or interrupt processing is completed and the program is resumed, both the jump or branch instruction and the instruction in the delay slot are reexecuted.

Because jump and branch instructions may be reexecuted after exception or interrupt processing, register 31 (the register in which the link address is stored) should not be used as a source register in jump and link or branch and link instructions.

Since instructions must be word-aligned, a Jump Register or Jump and Link Register instruction must use a register that contains an address whose low-order two bits are zero.
If these low-order two bits are not zero, an address exception occurs when the jump destination instruction is fetched.

16.4 Coprocessor Instructions

Coprocessors are alternate execution units that have register files separate from the CPU. The MIPS architecture provides four coprocessor units, and each coprocessor has two register spaces, each containing thirty-two 32-bit registers.

- The first space, the coprocessor general purpose registers, can be directly loaded from and stored into main memory, and their contents can be transferred between the coprocessor and the processor.
- The second space, the coprocessor control registers, can only have their contents transferred between the coprocessor and the processor. Coprocessor instructions may alter registers in either space.

16.5 System Control Coprocessor (CP0) Instructions

Some limitations are imposed on operations involving CP0, which is incorporated within the CPU. Although the MIPS architecture generally permits load and store instructions that transfer data to and from coprocessors, and instructions that exchange control codes with coprocessors, CP0 is given a somewhat protected status because it is responsible for exception handling and memory management. The coprocessor transfer instructions are therefore the only valid way of writing to and reading from the CP0 registers.

Some CP0 instructions are defined to directly read, write, and probe TLB entries, and to change the operating modes in preparation for returning to User mode or to interrupt-enabled states.

16.6 CPU Instructions

This section describes in detail the function of each CPU instruction in 32- and 64-bit modes.

The exceptions that may be caused by executing an instruction are listed at the end of the description of that instruction. Refer to Chapter 6 Exception Processing for details of exceptions and their processing.
ADD    Add

Bits     Field      Value
31:26    SPECIAL    000000
25:21    rs
20:16    rt
15:11    rd
10:6     0          00000
5:0      ADD        100000

Format: ADD rd, rs, rt

Description: The contents of general purpose register rs and the contents of general purpose register rt are added, and the result is stored in general purpose register rd. In 64-bit mode, the operands must be sign-extended 32-bit values. An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2's complement overflow). The contents of destination register rd are not modified when an integer overflow exception occurs.

Operation:
  32  T:   GPR[rd] ← GPR[rs] + GPR[rt]
  64  T:   temp ← GPR[rs] + GPR[rt]
           GPR[rd] ← (temp_31)^32 || temp_31...0

Exceptions: Integer overflow exception


ADDI    Add Immediate

Bits     Field       Value
31:26    ADDI        001000
25:21    rs
20:16    rt
15:0     immediate

Format: ADDI rt, rs, immediate

Description: The 16-bit immediate is sign-extended and added to the contents of general purpose register rs, and the result is stored in general purpose register rt. In 64-bit mode, the operand must be a sign-extended 32-bit value. An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2's complement overflow). The contents of destination register rt are not modified when an integer overflow exception occurs.

Operation:
  32  T:   GPR[rt] ← GPR[rs] + (immediate_15)^16 || immediate_15...0
  64  T:   temp ← GPR[rs] + (immediate_15)^48 || immediate_15...0
           GPR[rt] ← (temp_31)^32 || temp_31...0

Exceptions: Integer overflow exception


ADDIU    Add Immediate Unsigned

Bits     Field       Value
31:26    ADDIU       001001
25:21    rs
20:16    rt
15:0     immediate

Format: ADDIU rt, rs, immediate

Description: The 16-bit immediate is sign-extended and added to the contents of general purpose register rs, and the result is stored in general purpose register rt. No integer overflow exception occurs under any circumstance. In 64-bit mode, the operand must be a sign-extended 32-bit value. The only difference between this instruction and the ADDI instruction is that the ADDIU instruction never causes an integer overflow exception.

Operation:
  32  T:   GPR[rt] ← GPR[rs] + (immediate_15)^16 || immediate_15...0
  64  T:   temp ← GPR[rs] + (immediate_15)^48 || immediate_15...0
           GPR[rt] ← (temp_31)^32 || temp_31...0

Exceptions: None


ADDU    Add Unsigned

Bits     Field      Value
31:26    SPECIAL    000000
25:21    rs
20:16    rt
15:11    rd
10:6     0          00000
5:0      ADDU       100001

Format: ADDU rd, rs, rt

Description: The contents of general purpose register rs and the contents of general purpose register rt are added, and the result is stored in general purpose register rd. No integer overflow exception occurs under any circumstance. In 64-bit mode, the operands must be sign-extended 32-bit values. The only difference between this instruction and the ADD instruction is that the ADDU instruction never causes an integer overflow exception.

Operation:
  32  T:   GPR[rd] ← GPR[rs] + GPR[rt]
  64  T:   temp ← GPR[rs] + GPR[rt]
           GPR[rd] ← (temp_31)^32 || temp_31...0

Exceptions: None


AND    And

Bits     Field      Value
31:26    SPECIAL    000000
25:21    rs
20:16    rt
15:11    rd
10:6     0          00000
5:0      AND        100100

Format: AND rd, rs, rt

Description: The contents of general purpose register rs are combined with the contents of general purpose register rt in a bit-wise logical AND operation. The result is stored in general purpose register rd.

Operation:
  32  T:   GPR[rd] ← GPR[rs] and GPR[rt]
  64  T:   GPR[rd] ← GPR[rs] and GPR[rt]

Exceptions: None


ANDI    And Immediate

Bits     Field       Value
31:26    ANDI        001100
25:21    rs
20:16    rt
15:0     immediate

Format: ANDI rt, rs, immediate

Description: The 16-bit immediate is zero-extended and combined with the contents of general purpose register rs in a bit-wise logical AND operation. The result is stored in general purpose register rt.

Operation:
  32  T:   GPR[rt] ← 0^16 || (immediate and GPR[rs]_15...0)
  64  T:   GPR[rt] ← 0^48 || (immediate and GPR[rs]_15...0)

Exceptions: None


BCzF    Branch On Coprocessor z False

Bits     Field      Value
31:26    COPz       0100xx*
25:21    BC         01000
20:16    BCF        00000
15:0     offset

* Refer to the opcode bit encoding table below, or to 16.7 CPU Instruction Opcode Bit Encoding.

Format: BCzF offset

Description: A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If CPz's condition signal (CpCond), as sampled during the previous instruction execution, is false, the program branches to the branch address with a delay of one instruction. Because the condition signal is sampled during the previous instruction execution, there must be at least one instruction between this instruction and a coprocessor instruction that changes the condition signal.

Operation:
  32  T-1: condition ← not COC[z]
      T:   target ← (offset_15)^14 || offset || 0^2
      T+1: if condition then
               PC ← PC + target
           endif
  64  T-1: condition ← not COC[z]
      T:   target ← (offset_15)^46 || offset || 0^2
      T+1: if condition then
               PC ← PC + target
           endif

Exceptions: Coprocessor unusable exception

Opcode bit encoding (bits 27:26 of the opcode are the coprocessor number):

Instruction   Opcode (31:26)   BC sub-opcode (25:21)   Branch condition (20:16)
BC0F          010000           01000                   00000
BC1F          010001           01000                   00000
BC2F          010010           01000                   00000
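The ADD and ADDU descriptions above differ only in the overflow check, which can be illustrated with a small C sketch. The helper names `sext32`, `add32`, and `addu32` are hypothetical, and the model assumes, as the descriptions require for 64-bit mode, that both operands are sign-extended 32-bit values. It is meant as an illustration of the notation, not as a description of the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* (temp_31)^32 || temp_31...0: sign-extend a 32-bit value to 64 bits. */
static int64_t sext32(uint32_t x)
{
    return (int64_t)x - ((x & 0x80000000u) ? 4294967296LL : 0);
}

/* Hypothetical model of ADD. Returns false, leaving *rd unmodified,
 * when a 2's complement overflow (integer overflow exception) occurs. */
static bool add32(int64_t rs, int64_t rt, int64_t *rd)
{
    uint32_t a = (uint32_t)rs;
    uint32_t b = (uint32_t)rt;
    uint32_t sum = a + b; /* temp_31...0, wraps modulo 2^32 */

    /* The carries out of bits 30 and 31 differ exactly when both
     * operands have the same sign and the sum has the opposite sign. */
    if (((a ^ sum) & (b ^ sum)) >> 31)
        return false;

    *rd = sext32(sum);
    return true;
}

/* Hypothetical model of ADDU: same sum, but it never traps. */
static int64_t addu32(int64_t rs, int64_t rt)
{
    return sext32((uint32_t)rs + (uint32_t)rt);
}
```

With these helpers, adding 1 to 0x7FFFFFFF is reported as an overflow by `add32`, whereas `addu32` returns the wrapped, sign-extended result.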
BCzFL    Branch On Coprocessor z False Likely

Bits     Field      Value
31:26    COPz       0100xx*
25:21    BC         01000
20:16    BCFL       00010
15:0     offset

* Refer to the opcode bit encoding table below, or to 16.7 CPU Instruction Opcode Bit Encoding.

Format: BCzFL offset

Description: A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If CPz's condition signal (CpCond), as sampled during the previous instruction execution, is false, the program branches to the branch address with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded. Because the condition signal is sampled during the previous instruction execution, there must be at least one instruction between this instruction and a coprocessor instruction that changes the condition signal.

Operation:
  32  T-1: condition ← not COC[z]
      T:   target ← (offset_15)^14 || offset || 0^2
      T+1: if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif
  64  T-1: condition ← not COC[z]
      T:   target ← (offset_15)^46 || offset || 0^2
      T+1: if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif

Exceptions: Coprocessor unusable exception

Opcode bit encoding (bits 27:26 of the opcode are the coprocessor number):

Instruction   Opcode (31:26)   BC sub-opcode (25:21)   Branch condition (20:16)
BC0FL         010000           01000                   00010
BC1FL         010001           01000                   00010
BC2FL         010010           01000                   00010
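The opcode bit encoding tables for the BCz instructions follow one pattern, which the following C sketch assembles into a 32-bit instruction word. The function `encode_bcz` and the `bc_cond` enumeration are hypothetical names used for illustration only; the field values are taken from the encoding tables in this chapter (BCF = 00000, BCT = 00001, BCFL = 00010, BCTL = 00011).

```c
#include <stdint.h>

/* Branch condition field (bits 20:16) of the BCz instructions. */
enum bc_cond { BCF = 0x00, BCT = 0x01, BCFL = 0x02, BCTL = 0x03 };

/* Hypothetical encoder for BCzF, BCzT, BCzFL, and BCzTL.
 * z is the coprocessor number (bits 27:26 of the opcode);
 * offset is the signed 16-bit word offset. */
static uint32_t encode_bcz(unsigned z, enum bc_cond cond, int16_t offset)
{
    uint32_t opcode = 0x10u | (z & 0x3u);  /* COPz: 0100xx */

    return (opcode << 26)
         | (0x08u << 21)                    /* BC sub-opcode: 01000 */
         | ((uint32_t)cond << 16)           /* branch condition */
         | (uint16_t)offset;                /* bits 15:0 */
}
```

For example, `encode_bcz(1, BCF, 0)` yields `0x45000000`, which matches the BC1F row (010001 01000 00000) followed by a zero offset.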
manual u10504ej7v0um00 381 cpu instruction set details format: bczt offset description: a branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted two bits left and sign-extended. if the cpz? condition signal (cpcond) sampled during the previous instruction execution is true, then the program branches to the branch address with a delay of one instruction. because the condition signal is sampled during the previous instruction execution, there must be at least one instruction between this instruction and a coprocessor instruction that changes the condition signal. operation: * refer to the table opcode bit encoding on the next page, or 16.7 cpu instruction opcode bit encoding . bczt branch on coprocessor z true 5 16 15 bc 31 25 26 copz 6 0 16 offset bct 21 20 5 0 1 0 0 x x * 0 1 0 0 0 0 0 0 0 1 bczt t: target ? (offset 15 ) 14 || offset || 0 2 32 t?: condition ? coc[z] t+1: if condition then pc ? pc + target endif t: target ? (offset 15 ) 46 || offset || 0 2 64 t?: condition ? coc[z] t+1: if condition then pc ? pc + target endif chapter 16 382 user? manual u10504ej7v0um00 exceptions: coprocessor unusable exception opcode bit encoding: bczt (continued) branch on coprocessor z true bczt bczt 31 30 29 28 27 26 bit # 25 0 bc0t 24 23 22 21 coprocessor number branch condition bc sub-opcode 20 19 18 17 16 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 31 30 29 28 27 26 bit # 25 0 bc1t 24 23 22 21 20 19 18 17 16 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 31 30 29 28 27 26 bit # 25 0 bc2t 24 23 22 21 20 19 18 17 16 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 opcode user? manual u10504ej7v0um00 383 cpu instruction set details format: bcztl offset description: a branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted two bits left and sign-extended. if the cpz? 
condition signal (cpcond), as sampled during the previous instruction execution, is true, the program branches to the branch address with a delay of one instruction. if it does not branch, the instruction in the branch delay slot is discarded. because the condition signal is sampled during the previous instruction execution, there must be at least one instruction between this instruction and a coprocessor instruction that changes the condition signal. operation: * refer to the table opcode bit encoding on the next page, or 16.7 cpu instruction opcode bit encoding . bcztl branch on coprocessor z 5 16 15 bc 31 25 26 copz 6 0 16 offset bctl 21 20 5 0 1 0 0 x x * 0 1 0 0 0 0 0 0 1 1 bcztl true likely t: target ? (offset 15 ) 14 || offset || 0 2 32 t?: condition ? coc[z] t+1: if condition then pc ? pc + target endif else nullifycurrentinstruction t: target ? (offset 15 ) 46 || offset || 0 2 64 t?: condition ? coc[z] t+1: if condition then pc ? pc + target endif else nullifycurrentinstruction chapter 16 384 user? manual u10504ej7v0um00 exceptions: coprocessor unusable exception opcode bit encoding: bcztl (continued) branch on coprocessor z bcztl true likely bcztl 31 30 29 28 27 26 bit # 25 0 bc0tl 24 23 22 21 coprocessor number branch condition bc sub-opcode 20 19 18 17 16 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 31 30 29 28 27 26 bit # 25 0 bc1tl 24 23 22 21 20 19 18 17 16 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 31 30 29 28 27 26 bit # 25 0 bc2tl 24 23 22 21 20 19 18 17 16 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 opcode user? manual u10504ej7v0um00 385 cpu instruction set details format: beq rs, rt, offset description: a branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted two bits left and sign-extended. the contents of general purpose register rs and the contents of general purpose register rt are compared. if the two registers are equal, then the program branches to the branch address with a delay of one instruction. 
operation: exceptions: none beq branch on equal beq 31 25 26 20 21 15 16 0 beq rs rt offset 655 16 0 0 0 1 0 0 32 t: target ? (offset 15 ) 14 || offset || 0 2 condition ? (gpr[rs] = gpr[rt]) t+1: if condition then pc ? pc + target endif 64 t: target ? (offset 15 ) 46 || offset || 0 2 condition ? (gpr[rs] = gpr[rt]) t+1: if condition then pc ? pc + target endif chapter 16 386 user? manual u10504ej7v0um00 format: beql rs, rt, offset description: a branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. the contents of general purpose register rs and the contents of general purpose register rt are compared. if the two registers are equal, the program branches to the branch address with a delay of one instruction. if it does not branch, the instruction in the branch delay slot is discarded. operation: exceptions: none beql branch on equal likely 31 25 26 20 21 15 16 0 beql rs rt offset 655 16 0 1 0 1 0 0 beql 32 t: target ? (offset 15 ) 14 || offset || 0 2 condition ? (gpr[rs] = gpr[rt]) t+1: if condition then pc ? pc + target else endif nullifycurrentinstruction 64 t: target ? (offset 15 ) 46 || offset || 0 2 condition ? (gpr[rs] = gpr[rt]) t+1: if condition then pc ? pc + target else endif nullifycurrentinstruction user? manual u10504ej7v0um00 387 cpu instruction set details format: bgez rs, offset description: a branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted two bits left and sign-extended. if the contents of general purpose register rs are equal to or larger than 0, then the program branches to the branch address with a delay of one instruction. operation: exceptions: none bgez or equal to zero branch on greater than 31 25 26 20 21 15 16 0 regimm rs bgez offset 655 16 0 0 0 0 0 1 0 0 0 0 1 bgez 32 t: target ? (offset 15 ) 14 || offset || 0 2 condition ? 
(GPR[rs]_31 = 0)
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
      T+1:  if condition then
                PC ← PC + target
            endif

BGEZAL  Branch On Greater Than Or Equal To Zero And Link

Format:
  BGEZAL rs, offset

Encoding:
  Bits    31-26    25-21   20-16    15-0
  Field   REGIMM   rs      BGEZAL   offset
  Value   000001           10001
  Width   6        5       5        16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. Unconditionally, the address of the instruction next to the delay slot is stored in the link register, r31. If the contents of general purpose register rs are equal to or larger than 0, then the program branches to the branch address, with a delay of one instruction.

  Generally, general purpose register r31 should not be specified as general purpose register rs, because storing the link address destroys the contents of rs, and the instruction may then not be re-executable. An attempt to execute such an instruction does not cause an exception, however.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None

Format:
  BGEZALL rs, offset

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. Unconditionally, the address of the instruction next to the delay slot is stored in the link register, r31. If the contents of general purpose register rs are equal to or larger than 0, then the program branches to the branch address, with a delay of one instruction.
  If it does not branch, the instruction in the branch delay slot is discarded.

  Generally, general purpose register r31 should not be specified as general purpose register rs, because storing the link address destroys the contents of rs, and the instruction may then not be re-executable. An attempt to execute such an instruction does not cause an exception, however.

Encoding (BGEZALL, Branch On Greater Than Or Equal To Zero And Link Likely):
  Bits    31-26    25-21   20-16     15-0
  Field   REGIMM   rs      BGEZALL   offset
  Value   000001           10011
  Width   6        5       5         16

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None

BGEZL  Branch On Greater Than Or Equal To Zero Likely

Format:
  BGEZL rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   REGIMM   rs      BGEZL   offset
  Value   000001           00011
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are equal to or larger than 0, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None
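The difference between a plain branch and its "likely" variant shows up only when the branch is not taken. The following C sketch models that behavior for a hypothetical interpreter; the type `branch_outcome` and the function `resolve_branch` are illustrative names, not part of the manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outcome of one conditional branch in a hypothetical interpreter. */
typedef struct {
    uint64_t next_pc;        /* address executed after the delay slot      */
    bool     run_delay_slot; /* false when the delay slot is nullified     */
} branch_outcome;

branch_outcome resolve_branch(uint64_t branch_addr, uint16_t offset,
                              bool condition, bool likely)
{
    uint64_t delay_slot_addr = branch_addr + 4;
    branch_outcome out;

    if (condition) {
        /* Taken: the delay slot always executes, then control transfers. */
        out.next_pc = delay_slot_addr + (uint64_t)((int64_t)(int16_t)offset * 4);
        out.run_delay_slot = true;
    } else {
        /* Not taken: fall through past the delay slot. A "likely" branch
         * discards (nullifies) the delay slot; a plain branch executes it. */
        out.next_pc = branch_addr + 8;
        out.run_delay_slot = !likely;
    }
    return out;
}

/* Usage example: an untaken BGEZL-style branch skips its delay slot. */
void resolve_branch_examples(void)
{
    branch_outcome o = resolve_branch(0x2000, 4, false, true);
    assert(o.next_pc == 0x2008 && !o.run_delay_slot);
}
```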
BGTZ  Branch On Greater Than Zero

Format:
  BGTZ rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BGTZ     rs      0       offset
  Value   000111           00000
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are larger than zero, then the program branches to the branch address, with a delay of one instruction.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None

BGTZL  Branch On Greater Than Zero Likely

Format:
  BGTZL rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BGTZL    rs      0       offset
  Value   010111           00000
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are larger than 0, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None
BLEZ  Branch On Less Than Or Equal To Zero

Format:
  BLEZ rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BLEZ     rs      0       offset
  Value   000110           00000
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are equal to or smaller than 0, then the program branches to the branch address, with a delay of one instruction.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None

BLEZL  Branch On Less Than Or Equal To Zero Likely

Format:
  BLEZL rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BLEZL    rs      0       offset
  Value   010110           00000
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are equal to or smaller than zero, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None
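The compare-with-zero conditions above are all built from the sign bit of rs plus a test for zero. The C sketch below restates the 64-bit forms; the function names are hypothetical, and the register value is passed as an unsigned 64-bit integer so that bit 63 can be tested directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of a 64-bit register value (the sign bit). */
static unsigned sign_bit(uint64_t rs) { return (unsigned)(rs >> 63); }

/* BGEZ / BGEZL: sign bit clear. */
bool cond_bgez(uint64_t rs) { return sign_bit(rs) == 0; }

/* BGTZ / BGTZL: sign bit clear AND value not zero. */
bool cond_bgtz(uint64_t rs) { return sign_bit(rs) == 0 && rs != 0; }

/* BLEZ / BLEZL: sign bit set OR value zero. The two tests can never both
 * hold, which is why they must be joined with "or". */
bool cond_blez(uint64_t rs) { return sign_bit(rs) == 1 || rs == 0; }

/* Usage examples. */
void zero_compare_examples(void)
{
    assert(cond_blez(0));
    assert(cond_blez(UINT64_MAX));   /* bit pattern of -1 */
    assert(!cond_blez(1));
    assert(cond_bgtz(1) && !cond_bgtz(0));
}
```

Note that `cond_blez` is the exact complement of `cond_bgtz` for every input.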
BLTZ  Branch On Less Than Zero

Format:
  BLTZ rs, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   REGIMM   rs      BLTZ    offset
  Value   000001           00000
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are smaller than 0, then the program branches to the branch address, with a delay of one instruction.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None

BLTZAL  Branch On Less Than Zero And Link

Format:
  BLTZAL rs, offset

Encoding:
  Bits    31-26    25-21   20-16    15-0
  Field   REGIMM   rs      BLTZAL   offset
  Value   000001           10000
  Width   6        5       5        16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. Unconditionally, the address of the instruction next to the delay slot is stored in the link register, r31. If the contents of general purpose register rs are smaller than 0, then the program branches to the branch address, with a delay of one instruction.

  Generally, general purpose register r31 should not be specified as general purpose register rs, because storing the link address destroys the contents of rs, and the instruction is then not re-executable. An attempt to execute such an instruction does not generate an exception, however.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None
BLTZALL  Branch On Less Than Zero And Link Likely

Format:
  BLTZALL rs, offset

Encoding:
  Bits    31-26    25-21   20-16     15-0
  Field   REGIMM   rs      BLTZALL   offset
  Value   000001           10010
  Width   6        5       5         16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. Unconditionally, the address of the instruction next to the delay slot is stored in the link register, r31. If the contents of general purpose register rs are smaller than 0, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.

  Generally, general purpose register r31 should not be specified as general purpose register rs, because storing the link address destroys the contents of rs, and the instruction is then not re-executable. An attempt to execute such an instruction does not cause an exception, however.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None

Format:
  BLTZL rs, offset

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the contents of general purpose register rs are smaller than 0, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.
Encoding (BLTZL, Branch On Less Than Zero Likely):
  Bits    31-26    25-21   20-16   15-0
  Field   REGIMM   rs      BLTZL   offset
  Value   000001           00010
  Width   6        5       5       16

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

Exceptions:
  None

BNE  Branch On Not Equal

Format:
  BNE rs, rt, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BNE      rs      rt      offset
  Value   000101
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. The contents of general purpose register rs and the contents of general purpose register rt are compared. If the two registers are not equal, then the program branches to the branch address, with a delay of one instruction.

Operation:
  32  T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            endif

Exceptions:
  None

BNEL  Branch On Not Equal Likely

Format:
  BNEL rs, rt, offset

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   BNEL     rs      rt      offset
  Value   010101
  Width   6        5       5       16

Description:
  A branch address is calculated from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted two bits left and sign-extended. The contents of general purpose register rs and the contents of general purpose register rt are compared. If the two registers are not equal, then the program branches to the branch address, with a delay of one instruction. If it does not branch, the instruction in the branch delay slot is discarded.

Exceptions:
  None

Operation:
  32  T:    target ←
(offset_15)^14 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
  64  T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif

BREAK  Breakpoint

Format:
  BREAK

Encoding:
  Bits    31-26     25-6    5-0
  Field   SPECIAL   code    BREAK
  Value   000000            001101
  Width   6         20      6

Description:
  A breakpoint exception occurs after execution of this instruction, transferring control to the exception handler.

  The code area is available for passing parameters to the exception handler. The handler can retrieve the parameter only by loading the contents of the memory word containing the instruction as data.

Operation:
  32, 64  T:  BreakpointException

Exceptions:
  Breakpoint exception

Format:
  CACHE op, offset(base)

Description:
  The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The virtual address is translated to a physical address using the TLB, and the 5-bit sub-opcode op specifies the cache operation to perform for that address.

  If the CP0 enable bit CU0 in the Status register is cleared in user or supervisor mode, CP0 is not usable, and a coprocessor unusable exception occurs after execution of this instruction. The execution of this instruction on any cache/operation combination not listed below, or on a secondary cache, which is not supplied to the VR4300, is undefined. The execution of this instruction in an uncached area is also undefined.

  The index operation uses a part of the virtual address to specify a cache block. For example, for a cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag, vAddr_CACHEBITS...LINEBITS specifies the block.
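The index selection described above can be sketched in C. This is an illustration under stated assumptions rather than the processor's actual logic: it assumes a direct-mapped cache with one tag per line, takes the index from virtual address bits CACHEBITS-1 down to LINEBITS, and uses a hypothetical function name `cache_block_index`. The sizes in the usage example (a 16 KB cache with 32-byte lines) are chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the cache block selected by vaddr for a
 * cache of 2^cache_bits bytes with 2^line_bits bytes per tag. */
uint32_t cache_block_index(uint64_t vaddr, unsigned cache_bits, unsigned line_bits)
{
    assert(line_bits < cache_bits && cache_bits < 32);

    /* Number of blocks in the cache; always a power of two. */
    uint64_t blocks = (uint64_t)1 << (cache_bits - line_bits);

    /* Drop the byte-within-line bits, then keep only the index bits. */
    return (uint32_t)((vaddr >> line_bits) & (blocks - 1));
}

/* Usage example with illustrative sizes: 2^14-byte cache, 2^5-byte lines. */
void cache_block_index_examples(void)
{
    assert(cache_block_index(0x80001234u, 14, 5) == 0x91);
    /* An address one cache size (0x4000 bytes) higher selects the same block. */
    assert(cache_block_index(0x80005234u, 14, 5) == 0x91);
}
```

Only the index bits matter for an index operation, which is why two addresses that differ by a multiple of the cache size select the same block.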
  The hit operation accesses the cache as a normal data reference does, and performs the specified cache operation only if the cache contains valid data for the specified physical address (a hit). If the data is not in the cache (a miss), the cache operation is not executed.

Encoding (CACHE, Cache Operation):
  Bits    31-26    25-21   20-16   15-0
  Field   CACHE    base    op      offset
  Value   101111
  Width   6        5       5       16

  Write back from a cache goes to the main memory. The address in the main memory to be written is the address in the cache tag, not the physical address translated by using the TLB.

  The TLB miss exception and TLB invalid exception may occur when any cache operation is performed. An index* operation executed on an address in the unmapped area can be used to prevent the occurrence of TLB exceptions. The index operation never generates the TLB change exception.

  * Although a physical address is used to index the cache, it does not have to coincide with the cache tag.

  Bits 17 and 16 of the instruction code indicate the cache subject to the operation as follows.

  Code   Symbol   Cache
  0      I        Instruction cache
  1      D        Data cache
  2               Reserved
  3               Reserved

  Bits 20:18 of this instruction specify the contents of the cache operation, as listed in the following table.

  op_4...2   Caches   Cache operation
  0          I        Index_Invalidate
                      Set the cache state of the cache block to invalid.
  0          D        Index_Write_Back_Invalidate
                      Examine the cache state of the data cache block at the index specified by the virtual address. If the state is not invalid, write back the block to main memory. The address to write is taken from the cache tag. Set the cache state of the cache block to invalid.
  1          I, D     Index_Load_Tag
                      Read the tag for the cache block at the specified index and place it into the TagLo register of the CP0.
  2          I, D     Index_Store_Tag
                      Write the contents of the TagLo register of the CP0 to the tag for the cache block at the specified index.
  3          D        Create_Dirty_Exclusive
                      This operation is used to load as little data as possible from main memory when writing new data into an entire cache block while keeping coherency. If the cache block does not contain the specified address, and the block is dirty, write it back to main memory. In all cases, set the cache block tag to the specified physical address and set the cache state to dirty.
  4          I, D     Hit_Invalidate
                      If the cache block contains the specified address, set the cache block state to invalid.
  5          D        Hit_Write_Back_Invalidate
                      If the cache block contains the specified address, write back the data if it is dirty, and set the cache block state to invalid.
  5          I        Fill
                      Fill the instruction cache block with the data from main memory.
  6          D        Hit_Write_Back
                      If the cache block contains the specified address and the cache state is dirty, write back the data to main memory.
  6          I        Hit_Write_Back
                      If the cache block contains the specified address, write back the data unconditionally.

Operation:
  32, 64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
              (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
              CacheOp (op, vAddr, pAddr)

Exceptions:
  Coprocessor unusable exception
  TLB invalid exception
  TLB miss exception
  Bus error exception
  Address error exception

Format:
  CFCz rt, rd

Description:
  The contents of coprocessor control register rd of CPz are loaded into general purpose register rt. This instruction is not valid for CP0.

Exceptions:
  Coprocessor unusable exception

Opcode Bit Encoding:
  * Refer to 16.7 CPU Instruction Opcode Bit Encoding.
Encoding (CFCz, Move Control From Coprocessor z):
  Bits    31-26      25-21   20-16   15-11   10-0
  Field   COPz       CF      rt      rd      0
  Value   0100xx*    00010                   00000000000
  Width   6          5       5       5       11

Operation:
  32  T:    data ← CCR[z, rd]
      T+1:  GPR[rt] ← data
  64  T:    data ← (CCR[z, rd]_31)^32 || CCR[z, rd]
      T+1:  GPR[rt] ← data

  Instruction   Bits 31-26                       Bits 25-21
                (opcode, coprocessor number)     (coprocessor sub-opcode)
  CFC1          010001                           00010
  CFC2          010010                           00010

COPz  Coprocessor z Operation

Format:
  COPz cofun

Encoding:
  Bits    31-26      25   24-0
  Field   COPz       CO   cofun
  Value   0100xx*    1
  Width   6          1    25

Description:
  A coprocessor operation is performed. The operation may specify and reference internal coprocessor registers, and may change the state of the coprocessor condition line, but does not modify state within the processor or the cache/main memory. For details of coprocessor operations, refer to Chapter 17 FPU Instruction Set Details.

Operation:
  32, 64  T:  CoprocessorOperation (z, cofun)

Exceptions:
  Coprocessor unusable exception
  Floating-point exception (CP1 only)

Opcode Bit Encoding:
  * Refer to 16.7 CPU Instruction Opcode Bit Encoding.

  Instruction   Bits 31-26                       Bit 25
                (opcode, coprocessor number)     (coprocessor sub-opcode)
  COP0          010000                           1
  COP1          010001                           1
  COP2          010010                           1

CTCz  Move Control To Coprocessor z

Format:
  CTCz rt, rd

Encoding:
  Bits    31-26      25-21   20-16   15-11   10-0
  Field   COPz       CT      rt      rd      0
  Value   0100xx*    00110                   00000000000
  Width   6          5       5       5       11

Description:
  The contents of general purpose register rt are loaded into coprocessor control register rd of CPz. This instruction is not valid for CP0.

Exceptions:
  Coprocessor unusable exception

Opcode Bit Encoding:
  * Refer to 16.7 CPU Instruction Opcode Bit Encoding.

Operation:
  32, 64  T:    data ← GPR[rt]
          T+1:  CCR[z, rd] ←
data

  Instruction   Bits 31-26                       Bits 25-21
                (opcode, coprocessor number)     (coprocessor sub-opcode)
  CTC1          010001                           00110
  CTC2          010010                           00110

DADD  Doubleword Add

Format:
  DADD rd, rs, rt

Encoding:
  Bits    31-26     25-21   20-16   15-11   10-6    5-0
  Field   SPECIAL   rs      rt      rd      0       DADD
  Value   000000                            00000   101100
  Width   6         5       5       5       5       6

Description:
  The contents of general purpose register rs and the contents of general purpose register rt are added, and the result is stored in general purpose register rd. An integer overflow exception occurs if the carries out of bits 62 and 63 differ (2's complement overflow). The contents of the destination register rd are not modified when an integer overflow exception occurs.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Operation:
  64  T:  GPR[rd] ← GPR[rs] + GPR[rt]

  Remark: Same operation in the 32-bit kernel mode.

Exceptions:
  Integer overflow exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Format:
  DADDI rt, rs, immediate

Description:
  The 16-bit immediate is sign-extended and added to the contents of general purpose register rs, and the result is stored in general purpose register rt. An integer overflow exception occurs if the carries out of bits 62 and 63 differ (2's complement overflow). The contents of the destination register rt are not modified when an integer overflow exception occurs.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Operation:
  Remark: Same operation in the 32-bit kernel mode.
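The overflow rule stated above (the carries out of bits 62 and 63 differ) can be checked directly in C. The sketch below is illustrative only; `dadd_overflows` and `dadd` are hypothetical names, register values are held as unsigned 64-bit integers, and the boolean return value of `dadd` stands in for raising the integer overflow exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a 64-bit 2's complement addition overflows, detected by
 * comparing the carry out of bit 62 with the carry out of bit 63. */
bool dadd_overflows(uint64_t a, uint64_t b)
{
    const uint64_t low63 = UINT64_MAX >> 1;   /* mask for bits 62..0 */

    /* Carry out of bit 62: add the low 63 bits and inspect bit 63. */
    unsigned carry62 = (unsigned)(((a & low63) + (b & low63)) >> 63);

    /* Carry out of bit 63: the full unsigned sum wrapped around. */
    unsigned carry63 = (uint64_t)(a + b) < a;

    return carry62 != carry63;
}

/* Models the add: on overflow the destination is left unmodified and
 * false is returned (standing in for the exception). */
bool dadd(uint64_t *rd, uint64_t rs, uint64_t rt)
{
    if (dadd_overflows(rs, rt))
        return false;
    *rd = rs + rt;
    return true;
}

/* Usage example. */
void dadd_examples(void)
{
    uint64_t rd = 7;
    assert(!dadd(&rd, (uint64_t)INT64_MAX, 1) && rd == 7);
    assert(dadd(&rd, 2, 3) && rd == 5);
}
```

DADDU and DADDIU compute the same sum but skip the overflow check.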
Exceptions:
  Integer overflow exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Encoding (DADDI, Doubleword Add Immediate):
  Bits    31-26    25-21   20-16   15-0
  Field   DADDI    rs      rt      immediate
  Value   011000
  Width   6        5       5       16

  64  T:  GPR[rt] ← GPR[rs] + (immediate_15)^48 || immediate_15...0

DADDIU  Doubleword Add Immediate Unsigned

Format:
  DADDIU rt, rs, immediate

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   DADDIU   rs      rt      immediate
  Value   011001
  Width   6        5       5       16

Description:
  The 16-bit immediate is sign-extended and added to the contents of general purpose register rs, and the result is stored in general purpose register rt.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

  The only difference between this instruction and the DADDI instruction is that the DADDIU instruction never causes an integer overflow exception.

Operation:
  64  T:  GPR[rt] ← GPR[rs] + (immediate_15)^48 || immediate_15...0

  Remark: Same operation in the 32-bit kernel mode.

Exceptions:
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Format:
  DADDU rd, rs, rt

Description:
  The contents of general purpose register rs and the contents of general purpose register rt are added, and the result is stored in general purpose register rd.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

  The only difference between this instruction and the DADD instruction is that the DADDU instruction never causes an integer overflow exception.

Operation:
  Remark: Same operation in the 32-bit kernel mode.
Exceptions:
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Encoding (DADDU, Doubleword Add Unsigned):
  Bits    31-26     25-21   20-16   15-11   10-6    5-0
  Field   SPECIAL   rs      rt      rd      0       DADDU
  Value   000000                            00000   101101
  Width   6         5       5       5       5       6

  64  T:  GPR[rd] ← GPR[rs] + GPR[rt]

DDIV  Doubleword Divide

Format:
  DDIV rs, rt

Encoding:
  Bits    31-26     25-21   20-16   15-6         5-0
  Field   SPECIAL   rs      rt      0            DDIV
  Value   000000                    0000000000   011110
  Width   6         5       5       10           6

Description:
  The contents of general purpose register rs are divided by the contents of general purpose register rt, treating both operands as signed integers. An integer overflow exception never occurs, and the result of this operation is undefined when the divisor is zero.

  This instruction is usually executed after additional instructions to check for a zero divisor and for overflow.

  When the operation completes, the quotient word of the double result is loaded into special register LO, and the remainder word of the double result is loaded into special register HI.

  If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. To obtain the correct result, insert two or more additional instructions between the MFHI or MFLO and the DDIV instruction.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Operation:
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    LO ← GPR[rs] div GPR[rt]
            HI ← GPR[rs] mod GPR[rt]

  Remark: Same operation in the 32-bit kernel mode.

Exceptions:
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

DDIVU  Doubleword Divide Unsigned

Format:
  DDIVU rs, rt

Description:
  The contents of general purpose register rs are divided by the contents of general purpose register rt, treating both operands as unsigned integers.
  An integer overflow exception never occurs, and the result of this operation is undefined when the divisor is zero.

  This instruction is usually executed after instructions that check for a zero divisor.

  When the operation completes, the quotient (doubleword) is stored into special register LO, and the remainder (doubleword) is stored into special register HI.

  If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. To obtain the correct result, insert two or more instructions between the MFHI or MFLO and the DDIVU instruction.

  This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Encoding (DDIVU, Doubleword Divide Unsigned):
  Bits    31-26     25-21   20-16   15-6         5-0
  Field   SPECIAL   rs      rt      0            DDIVU
  Value   000000                    0000000000   011111
  Width   6         5       5       10           6

Operation:
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    LO ← (0 || GPR[rs]) div (0 || GPR[rt])
            HI ← (0 || GPR[rs]) mod (0 || GPR[rt])

  Remark: Same operation in the 32-bit kernel mode.

Exceptions:
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

DIV  Divide

Format:
  DIV rs, rt

Description:
  The contents of general purpose register rs are divided by the contents of general purpose register rt, treating both operands as signed integers. An overflow exception never occurs, and the result of this operation is undefined when the divisor is zero.

  In 64-bit mode, the operands must be sign-extended 32-bit values.

  This instruction is usually executed after instructions that check for a zero divisor and for overflow.

  When the operation completes, the quotient (doubleword) is stored into special register LO, and the remainder (doubleword) is stored into special register HI.
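Because a zero divisor yields an undefined result and no exception is raised, software is expected to screen the operands before dividing. The C sketch below shows one way a 32-bit signed divide with those checks might be modeled. The names `div_result` and `div32` are hypothetical, the sign extension of quotient and remainder into 64-bit LO and HI values reflects the sign-extended 32-bit convention of 64-bit mode, and C's truncating `/` and `%` operators are assumed to match the processor's rounding, which this section does not itself state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the special registers LO and HI. */
typedef struct {
    uint64_t lo;   /* quotient, sign-extended to 64 bits  */
    uint64_t hi;   /* remainder, sign-extended to 64 bits */
} div_result;

/* Signed 32-bit divide with the software checks performed up front.
 * Returns false, leaving *out untouched, for the two cases that must be
 * screened out: a zero divisor and INT32_MIN / -1 (quotient overflow). */
bool div32(int32_t rs, int32_t rt, div_result *out)
{
    if (rt == 0)
        return false;
    if (rs == INT32_MIN && rt == -1)
        return false;

    int32_t q = rs / rt;   /* truncates toward zero in C */
    int32_t r = rs % rt;   /* takes the sign of the dividend in C */

    out->lo = (uint64_t)(int64_t)q;
    out->hi = (uint64_t)(int64_t)r;
    return true;
}

/* Usage example: -7 / 2 gives quotient -3 and remainder -1. */
void div32_examples(void)
{
    div_result r;
    assert(div32(-7, 2, &r));
    assert(r.lo == 0xFFFFFFFFFFFFFFFDULL);
    assert(r.hi == 0xFFFFFFFFFFFFFFFFULL);
}
```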
  If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. To obtain the correct result, insert two or more additional instructions between the MFHI or MFLO and the DIV instruction.

Encoding (DIV, Divide):
  Bits    31-26     25-21   20-16   15-6         5-0
  Field   SPECIAL   rs      rt      0            DIV
  Value   000000                    0000000000   011010
  Width   6         5       5       10           6

Operation:
  32  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    LO ← GPR[rs] div GPR[rt]
            HI ← GPR[rs] mod GPR[rt]
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    q ← GPR[rs]_31...0 div GPR[rt]_31...0
            r ← GPR[rs]_31...0 mod GPR[rt]_31...0
            LO ← (q_31)^32 || q_31...0
            HI ← (r_31)^32 || r_31...0

Exceptions:
  None

DIVU  Divide Unsigned

Format:
  DIVU rs, rt

Encoding:
  Bits    31-26     25-21   20-16   15-6         5-0
  Field   SPECIAL   rs      rt      0            DIVU
  Value   000000                    0000000000   011011
  Width   6         5       5       10           6

Description:
  The contents of general purpose register rs are divided by the contents of general purpose register rt, treating both operands as unsigned integers. An integer overflow exception never occurs, and the result of this operation is undefined when the divisor is zero.

  In 64-bit mode, the operands must be sign-extended 32-bit values.

  This instruction is usually executed after instructions that check for a zero divisor.

  When the operation completes, the quotient (doubleword) is stored into special register LO, and the remainder (doubleword) is stored into special register HI.

  If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. To obtain the correct result, insert two or more additional instructions between the MFHI or MFLO and the DIVU instruction.

Exceptions:
  None

Operation:
  32  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    HI ← (0 || GPR[rs]) mod (0 || GPR[rt])
            LO ←
(0 || GPR[rs]) div (0 || GPR[rt])
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    q ← (0 || GPR[rs]_31...0) div (0 || GPR[rt]_31...0)
            r ← (0 || GPR[rs]_31...0) mod (0 || GPR[rt]_31...0)
            LO ← (q_31)^32 || q_31...0
            HI ← (r_31)^32 || r_31...0

DMFC0  Doubleword Move From System Control Coprocessor

Format:
  DMFC0 rt, rd

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-0
  Field   COP0     DMF     rt      rd      0
  Value   010000   00001                   00000000000
  Width   6        5       5       5       11

Description:
  The contents of coprocessor register rd of the CP0 are loaded into general purpose register rt.

  This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

  The contents of the source coprocessor register rd are written to the 64-bit destination general purpose register rt. The operation of the DMFC0 instruction on a 32-bit register of the CP0 is undefined.

Operation:
  64  T:    data ← CPR[0, rd]
      T+1:  GPR[rt] ← data

  Remark: Same operation in the 32-bit kernel mode.

Exceptions:
  Coprocessor unusable exception (VR4300 in 64-/32-bit user mode and supervisor mode if CP0 is disabled)
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Format:
  DMTC0 rt, rd

Description:
  The contents of general purpose register rt are loaded into coprocessor register rd of the CP0.

  This operation is defined for the VR4300 operating in 64-bit mode or in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

  The contents of the source general purpose register rt are written to the 64-bit destination coprocessor register rd. The operation of the DMTC0 instruction on a 32-bit register of the CP0 is undefined.
because the state of the virtual address translation system may be altered by this instruction, the operation of load instructions, store instructions, and tlb operations immediately before and after this instruction is undefined.

encoding (dmtc0): cop0 (bits 31-26) = 010000 | dmt (25-21) = 00101 | rt (20-16) | rd (15-11) | 0 (10-0) = 00000000000

operation (dmtc0):
  64  t:   data ← gpr[rt]
      t+1: cpr[0,rd] ← data
  remark: same operation in the 32-bit kernel mode.

exceptions:
  coprocessor unusable exception (vr4300 in 64-/32-bit user and supervisor mode if cp0 is disabled)
  reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dmult  doubleword multiply

format: dmult rs, rt

description: the contents of general purpose registers rs and rt are multiplied, treating both operands as signed integers. an integer overflow exception never occurs. when the operation completes, the low-order doubleword is stored in special register lo, and the high-order doubleword is stored in special register hi. if either of the two preceding instructions is mfhi or mflo, the results of these instructions are undefined. to obtain the correct result, insert two or more other instructions between the mfhi or mflo instruction and the dmult instruction. this operation is defined only for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dmult): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | 0 (15-6) = 0000000000 | dmult (5-0) = 011100

operation (dmult):
  64  t-2: lo ← undefined; hi ← undefined
      t-1: lo ← undefined; hi ← undefined
      t:   t  ← gpr[rs] * gpr[rt]
           lo ← t_63...0
           hi ← t_127...64
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dmultu  doubleword multiply unsigned

format: dmultu rs, rt

description: the contents of general purpose register rs and the contents of general purpose register rt are multiplied, treating both operands as unsigned integers. an overflow exception never occurs. when the operation completes, the low-order doubleword is stored in special register lo, and the high-order doubleword is stored in special register hi. if either of the two preceding instructions is mfhi or mflo, the results of these instructions are undefined. to obtain the correct result, insert two or more other instructions between the mfhi or mflo instruction and the dmultu instruction. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dmultu): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | 0 (15-6) = 0000000000 | dmultu (5-0) = 011101

operation (dmultu):
  64  t-2: lo ← undefined; hi ← undefined
      t-1: lo ← undefined; hi ← undefined
      t:   t  ← (0 || gpr[rs]) * (0 || gpr[rt])
           lo ← t_63...0
           hi ← t_127...64
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsll  doubleword shift left logical

format: dsll rd, rt, sa

description: the contents of general purpose register rt are shifted left by sa bits, inserting zeros into the low-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

remark: same operation in the 32-bit kernel mode.
encoding (dsll): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsll (5-0) = 111000

operation (dsll):
  64  t: s ← 0 || sa
         gpr[rd] ← gpr[rt]_(63-s)...0 || 0^s

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsllv  doubleword shift left logical variable

format: dsllv rd, rt, rs

description: the contents of general purpose register rt are shifted left by the number of bits specified by the low-order six bits of general purpose register rs, inserting zeros into the low-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsllv): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | rd (15-11) | 0 (10-6) = 00000 | dsllv (5-0) = 010100

operation (dsllv):
  64  t: s ← gpr[rs]_5...0
         gpr[rd] ← gpr[rt]_(63-s)...0 || 0^s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsll32  doubleword shift left logical + 32

format: dsll32 rd, rt, sa

description: the contents of general purpose register rt are shifted left by 32+sa bits, inserting zeros into the low-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsll32): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsll32 (5-0) = 111100

operation (dsll32):
  64  t: s ← 1 || sa
         gpr[rd] ← gpr[rt]_(63-s)...0 || 0^s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsra  doubleword shift right arithmetic

format: dsra rd, rt, sa

description: the contents of general purpose register rt are shifted right by sa bits, sign-extending the high-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsra): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsra (5-0) = 111011

operation (dsra):
  64  t: s ← 0 || sa
         gpr[rd] ← (gpr[rt]_63)^s || gpr[rt]_63...s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsrav  doubleword shift right arithmetic variable

format: dsrav rd, rt, rs

description: the contents of general purpose register rt are shifted right by the number of bits specified by the low-order six bits of general purpose register rs, sign-extending the high-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsrav): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | rd (15-11) | 0 (10-6) = 00000 | dsrav (5-0) = 010111

operation (dsrav):
  64  t: s ← gpr[rs]_5...0
         gpr[rd] ← (gpr[rt]_63)^s || gpr[rt]_63...s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsra32  doubleword shift right arithmetic + 32

format: dsra32 rd, rt, sa

description: the contents of general purpose register rt are shifted right by 32+sa bits, sign-extending the high-order bits. the result is stored in general purpose register rd.
this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsra32): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsra32 (5-0) = 111111

operation (dsra32):
  64  t: s ← 1 || sa
         gpr[rd] ← (gpr[rt]_63)^s || gpr[rt]_63...s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsrl  doubleword shift right logical

format: dsrl rd, rt, sa

description: the contents of general purpose register rt are shifted right by sa bits, inserting zeros into the high-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsrl): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsrl (5-0) = 111010

operation (dsrl):
  64  t: s ← 0 || sa
         gpr[rd] ← 0^s || gpr[rt]_63...s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsrlv  doubleword shift right logical variable

format: dsrlv rd, rt, rs

description: the contents of general purpose register rt are shifted right by the number of bits specified by the low-order six bits of general purpose register rs, inserting zeros into the high-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

remark: same operation in the 32-bit kernel mode.
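the doubleword right shifts described above (dsrl, dsrlv, dsra, dsra32, dsrl32) differ only in how the shift amount is formed and in what fills the vacated high-order bits. the following is a minimal python sketch of that behavior, not a model of the processor: the function names mirror the mnemonics, registers are modeled as plain integers masked to 64 bits, and the mode checks and reserved instruction exception are deliberately left out.

```python
MASK64 = (1 << 64) - 1  # a 64-bit general purpose register


def _signed64(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dsrl(rt: int, sa: int) -> int:
    """Logical right shift by sa (0..31); zeros enter from the left."""
    return (rt & MASK64) >> (sa & 0x1F)


def dsrl32(rt: int, sa: int) -> int:
    """Logical right shift by 32 + sa, i.e. s = 1 || sa."""
    return (rt & MASK64) >> (32 + (sa & 0x1F))


def dsrlv(rt: int, rs: int) -> int:
    """Logical right shift by the low-order six bits of rs."""
    return (rt & MASK64) >> (rs & 0x3F)


def dsra(rt: int, sa: int) -> int:
    """Arithmetic right shift by sa (0..31); bit 63 is replicated."""
    return (_signed64(rt) >> (sa & 0x1F)) & MASK64


def dsra32(rt: int, sa: int) -> int:
    """Arithmetic right shift by 32 + sa."""
    return (_signed64(rt) >> (32 + (sa & 0x1F))) & MASK64
```

python's `>>` on a negative integer is an arithmetic shift, which is why the arithmetic variants convert to a signed value first and mask back to 64 bits afterwards.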
encoding (dsrlv): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | rd (15-11) | 0 (10-6) = 00000 | dsrlv (5-0) = 010110

operation (dsrlv):
  64  t: s ← gpr[rs]_5...0
         gpr[rd] ← 0^s || gpr[rt]_63...s

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsrl32  doubleword shift right logical + 32

format: dsrl32 rd, rt, sa

description: the contents of general purpose register rt are shifted right by 32+sa bits, inserting zeros into the high-order bits. the result is stored in general purpose register rd. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsrl32): special (bits 31-26) = 000000 | 0 (25-21) = 00000 | rt (20-16) | rd (15-11) | sa (10-6) | dsrl32 (5-0) = 111110

operation (dsrl32):
  64  t: s ← 1 || sa
         gpr[rd] ← 0^s || gpr[rt]_63...s
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsub  doubleword subtract

format: dsub rd, rs, rt

description: the contents of general purpose register rt are subtracted from the contents of general purpose register rs, and the result is stored in general purpose register rd. an integer overflow exception occurs if the carries out of bits 62 and 63 differ (2's complement overflow). the contents of destination register rd are not modified when an integer overflow exception occurs. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

remark: same operation in the 32-bit kernel mode.
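the overflow rule in the dsub description can be illustrated with a short python sketch. it assumes the usual equivalence that the carries out of bits 62 and 63 differ exactly when the true signed difference does not fit in 64 bits; `IntegerOverflow`, `dsub`, and `dsubu` are hypothetical names chosen for the illustration, and registers are modeled as plain integers.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception."""


def _signed64(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dsub(rs: int, rt: int) -> int:
    """Signed 64-bit subtract; raises instead of returning on overflow,
    so the caller's destination value is left unmodified."""
    diff = _signed64(rs) - _signed64(rt)
    if not -(1 << 63) <= diff < (1 << 63):
        raise IntegerOverflow("dsub: result does not fit in 64 bits")
    return diff & MASK64


def dsubu(rs: int, rt: int) -> int:
    """Same subtraction, but the result simply wraps modulo 2**64."""
    return (rs - rt) & MASK64
```

for example, subtracting 1 from the most negative 64-bit value raises in `dsub` but wraps to `0x7FFFFFFFFFFFFFFF` in `dsubu`.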
encoding (dsub): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | rd (15-11) | 0 (10-6) = 00000 | dsub (5-0) = 101110

operation (dsub):
  64  t: gpr[rd] ← gpr[rs] - gpr[rt]

exceptions:
  integer overflow exception
  reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

dsubu  doubleword subtract unsigned

format: dsubu rd, rs, rt

description: the contents of general purpose register rt are subtracted from the contents of general purpose register rs, and the result is stored in general purpose register rd. the only difference between this instruction and the dsub instruction is that the dsubu instruction never causes an integer overflow exception. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (dsubu): special (bits 31-26) = 000000 | rs (25-21) | rt (20-16) | rd (15-11) | 0 (10-6) = 00000 | dsubu (5-0) = 101111

operation (dsubu):
  64  t: gpr[rd] ← gpr[rs] - gpr[rt]
  remark: same operation in the 32-bit kernel mode.

exceptions: reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

eret  return from exception

format: eret

description: eret is the vr4300 instruction for returning from an interrupt, exception, or error exception. unlike a branch or jump instruction, eret does not execute the next instruction. the eret instruction must not itself be placed in a branch delay slot. if the erl bit of the status register is set (sr_2 = 1), the contents of the errorepc register are loaded into the pc and the erl bit is cleared to zero. otherwise (sr_2 = 0), the pc is loaded from the epc register, and the exl bit of the status register is cleared to zero (sr_1 = 0). an eret instruction executed between an ll instruction and an sc instruction also causes the sc instruction to fail, because the eret instruction clears the ll bit to zero.
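the eret description above amounts to a two-way choice on the erl bit. the sketch below restates it in python under stated assumptions: `Cp0State` is a hypothetical container that holds only the fields named in the description, and coprocessor usability checks are not modeled.

```python
from dataclasses import dataclass

ERL = 1 << 2  # sr bit 2, error level
EXL = 1 << 1  # sr bit 1, exception level


@dataclass
class Cp0State:
    """Only the state the eret description refers to."""
    pc: int
    status: int
    epc: int
    error_epc: int
    ll_bit: int


def eret(state: Cp0State) -> Cp0State:
    """Return from an exception or error exception."""
    if state.status & ERL:
        # error level: resume at errorepc and clear erl
        state.pc = state.error_epc
        state.status &= ~ERL
    else:
        # normal exception: resume at epc and clear exl
        state.pc = state.epc
        state.status &= ~EXL
    # eret always clears the ll bit, so a pending sc will fail
    state.ll_bit = 0
    return state
```

note that only one of erl and exl is cleared per eret; with both set, the first eret clears erl and leaves exl in place.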
encoding (eret): cop0 (bits 31-26) = 010000 | co (25) = 1 | 0 (24-6) = 0000000000000000000 | eret (5-0) = 011000

operation (eret):
  32, 64  t: if sr_2 = 1 then
               pc ← errorepc
               sr ← sr_31...3 || 0 || sr_1...0
             else
               pc ← epc
               sr ← sr_31...2 || 0 || sr_0
             endif
             llbit ← 0

exceptions: coprocessor unusable exception

j  jump

format: j target

description: the 26-bit target is shifted left two bits and combined with the high-order four bits of the address of the delay slot to calculate the target address. the program unconditionally jumps to this calculated address with a delay of one instruction.

encoding (j): j (bits 31-26) = 000010 | target (25-0)

operation (j):
  32  t:   temp ← target
      t+1: pc ← pc_31...28 || temp || 0^2
  64  t:   temp ← target
      t+1: pc ← pc_63...28 || temp || 0^2

exceptions: none

jal  jump and link

format: jal target

description: the 26-bit target is shifted left two bits and combined with the high-order four bits of the address of the delay slot to calculate the target address. the program unconditionally jumps to this calculated address with a delay of one instruction. the address of the instruction after the delay slot is placed in the link register, r31.

encoding (jal): jal (bits 31-26) = 000011 | target (25-0)

operation (jal):
  32  t:   temp ← target
           gpr[31] ← pc + 8
      t+1: pc ← pc_31...28 || temp || 0^2
  64  t:   temp ← target
           gpr[31] ← pc + 8
      t+1: pc ← pc_63...28 || temp || 0^2

exceptions: none

jalr  jump and link register

format: jalr rs
        jalr rd, rs

description: the program unconditionally jumps to the address contained in general purpose register rs, with a delay of one instruction. the address of the instruction after the delay slot is stored in general purpose register rd. the default value of rd, if omitted in the assembly language instruction, is 31.
register numbers rs and rd should not be equal, because such an instruction does not have the same effect when re-executed: if they are equal, the contents of rs are destroyed when the link address is stored. executing such an instruction does not cause an exception, but its result is undefined. since instructions must be word-aligned, a jump and link register instruction must specify a target register (rs) that contains an address whose low-order two bits are zero. if these low-order two bits are not zero, an address exception occurs when the jump target instruction is fetched.

encoding (jalr): special (bits 31-26) = 000000 | rs (25-21) | 0 (20-16) = 00000 | rd (15-11) | 0 (10-6) = 00000 | jalr (5-0) = 001001

operation (jalr):
  32, 64  t:   temp ← gpr[rs]
               gpr[rd] ← pc + 8
          t+1: pc ← temp

exceptions: none

jr  jump register

format: jr rs

description: the program unconditionally jumps to the address contained in general purpose register rs, with a delay of one instruction. since instructions must be word-aligned, a jump register instruction must specify a target register (rs) that contains an address whose low-order two bits are zero. if these low-order two bits are not zero, an address exception occurs when the jump target instruction is fetched.

encoding (jr): special (bits 31-26) = 000000 | rs (25-21) | 0 (20-6) = 000000000000000 | jr (5-0) = 001000

operation (jr):
  32, 64  t:   temp ← gpr[rs]
          t+1: pc ← temp

exceptions: none

lb  load byte

format: lb rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the contents of the byte at the memory location specified by the address are sign-extended and loaded into general purpose register rt.
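the two sign extensions in the lb description (of the 16-bit offset and of the loaded byte) can be sketched as follows. this is a simplified illustration only: memory is assumed to be a flat python `bytes` object indexed directly by the virtual address, so address translation, endianness handling, and all exceptions are omitted, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(base: int, offset16: int) -> int:
    """vaddr = sign-extended 16-bit offset + gpr[base]."""
    return (base + sign_extend(offset16, 16)) & MASK64


def lb(memory: bytes, base: int, offset16: int) -> int:
    """Load one byte and sign-extend it to a 64-bit register value."""
    vaddr = effective_address(base, offset16)
    return sign_extend(memory[vaddr], 8) & MASK64
```

an offset field of `0xFFFF` therefore addresses the byte just below `base`, and a loaded byte of `0x80` fills the upper register bits with ones.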
encoding (lb): lb (bits 31-26) = 100000 | base (25-21) | rt (20-16) | offset (15-0)

operation (lb):
  32  t: vaddr ← ((offset_15)^16 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         mem ← loadmemory (uncached, byte, paddr, vaddr, data)
         byte ← vaddr_2...0 xor bigendiancpu^3
         gpr[rt] ← (mem_7+8*byte)^24 || mem_7+8*byte...8*byte
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         mem ← loadmemory (uncached, byte, paddr, vaddr, data)
         byte ← vaddr_2...0 xor bigendiancpu^3
         gpr[rt] ← (mem_7+8*byte)^56 || mem_7+8*byte...8*byte

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception

lbu  load byte unsigned

format: lbu rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the contents of the byte at the memory location specified by the address are zero-extended and loaded into general purpose register rt.

encoding (lbu): lbu (bits 31-26) = 100100 | base (25-21) | rt (20-16) | offset (15-0)

operation (lbu):
  32  t: vaddr ← ((offset_15)^16 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         mem ← loadmemory (uncached, byte, paddr, vaddr, data)
         byte ← vaddr_2...0 xor bigendiancpu^3
         gpr[rt] ← 0^24 || mem_7+8*byte...8*byte
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         mem ← loadmemory (uncached, byte, paddr, vaddr, data)
         byte ← vaddr_2...0 xor bigendiancpu^3
         gpr[rt] ← 0^56 || mem_7+8*byte...8*byte

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception

ld  load doubleword

format: ld rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the contents of the 64-bit doubleword at the memory location specified by the address are loaded into general purpose register rt. if any of the low-order three bits of the address are not zero, an address error exception occurs. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

encoding (ld): ld (bits 31-26) = 110111 | base (25-21) | rt (20-16) | offset (15-0)

operation (ld):
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         mem ← loadmemory (uncached, doubleword, paddr, vaddr, data)
         gpr[rt] ← mem
  remark: in the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception
  reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

ldcz  load doubleword to coprocessor z

format: ldcz rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the processor loads a doubleword from the addressed memory location to cpz. the manner in which each coprocessor uses the data is defined by the individual coprocessor specifications. if any of the low-order three bits of the address are not zero, an address error exception occurs. this instruction is not valid for use with cp0. when cp1 is specified, the fr bit of the status register is zero, and the least-significant bit of the rt field is not zero, the operation of the instruction is undefined.
if the fr bit is one, rt may specify either an odd or an even register.

encoding (ldcz): ldcz (bits 31-26) = 1101xx | base (25-21) | rt (20-16) | offset (15-0)
  the low-order two opcode bits (xx) are the coprocessor number; refer to the opcode bit encoding table below, or to 16.7 cpu instruction opcode bit encoding.

operation (ldcz):
  32  t: vaddr ← ((offset_15)^16 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         mem ← loadmemory (uncached, doubleword, paddr, vaddr, data)
         copzld (rt, mem)
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         mem ← loadmemory (uncached, doubleword, paddr, vaddr, data)
         copzld (rt, mem)

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception
  coprocessor unusable exception

opcode bit encoding (bits 31-26; bits 27-26 are the coprocessor number):
  ldc1   1 1 0 1 0 1
  ldc2   1 1 0 1 1 0

ldl  load doubleword left

format: ldl rt, offset(base)

description: this instruction is used in combination with the ldr instruction to load doubleword data that is not aligned on a doubleword boundary in memory into general purpose register rt. the ldl instruction loads the high-order portion of the data into the register, while the ldr instruction loads the low-order portion. the 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address that can specify any byte. of the doubleword data in memory whose most-significant byte is specified by the generated address, only the data within the same doubleword boundary as the target address is loaded and stored in the high-order portion of general purpose register rt. the remaining portion of the register is not affected. depending on the address specified, the number of bytes loaded varies from 1 to 8.
in other words, the addressed byte is first stored in the most-significant byte position of general purpose register rt. if lower-order bytes follow within the same doubleword boundary, each is stored in turn in the next byte of general purpose register rt. the remaining low-order bytes of the register are not affected.

encoding (ldl): ldl (bits 31-26) = 011010 | base (25-21) | rt (20-16) | offset (15-0)

example (big-endian): memory addresses 0 through 15 hold the byte values 0 through 15, and register $24 holds a b c d e f g h before execution. after ldl $24,3($0), register $24 holds 3 4 5 6 7 f g h.

the contents of general purpose register rt are internally bypassed within the processor, so no nop instruction is needed between an immediately preceding load instruction that targets general purpose register rt and a subsequent ldl (or ldr) instruction. an address error exception does not occur even if the specified address is not on a doubleword boundary. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

operation (ldl):
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         if bigendianmem = 0 then
           paddr ← paddr_psize-1...3 || 0^3
         endif
         byte ← vaddr_2...0 xor bigendiancpu^3
         mem ← loadmemory (uncached, byte, paddr, vaddr, data)
         gpr[rt] ← mem_7+8*byte...0 || gpr[rt]_55-8*byte...0
  remark: in the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

the relationship between the address given to the ldl instruction and the result (bytes in the register) is shown below. the register initially holds a b c d e f g h, and the addressed doubleword in memory holds i j k l m n o p.

              bigendiancpu = 0                      bigendiancpu = 1
  vaddr_2...0 destination      type  offset         destination      type  offset
                                     lem  bem                              lem  bem
  0           p b c d e f g h  0     0    7         i j k l m n o p  7     0    0
  1           o p c d e f g h  1     0    6         j k l m n o p h  6     0    1
  2           n o p d e f g h  2     0    5         k l m n o p g h  5     0    2
  3           m n o p e f g h  3     0    4         l m n o p f g h  4     0    3
  4           l m n o p f g h  4     0    3         m n o p e f g h  3     0    4
  5           k l m n o p g h  5     0    2         n o p d e f g h  2     0    5
  6           j k l m n o p h  6     0    1         o p c d e f g h  1     0    6
  7           i j k l m n o p  7     0    0         p b c d e f g h  0     0    7

remark:
  type: access type output to memory (refer to figure 3-2 byte access within a doubleword.)
  offset: paddr_2...0 output to memory
    lem: little-endian memory (bigendianmem = 0)
    bem: big-endian memory (bigendianmem = 1)

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception
  reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

ldr  load doubleword right

format: ldr rt, offset(base)

description: this instruction is used in combination with the ldl instruction to load doubleword data that is not aligned on a doubleword boundary in memory into general purpose register rt. the ldl instruction loads the high-order portion of the data into the register, while the ldr instruction loads the low-order portion. the 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address that can specify any byte. of the doubleword data in memory whose least-significant byte is specified by the generated address, only the data within the same doubleword boundary as the target address is loaded and stored in the low-order portion of general purpose register rt. the remaining portion of the register is not affected. depending on the address specified, the number of bytes loaded varies from 1 to 8.
in other words, the addressed byte is first stored in the least-significant byte position of general purpose register rt. if higher-order bytes follow within the same doubleword boundary, each is stored in turn in the next byte of general purpose register rt. the remaining high-order bytes of the register are not affected.

encoding (ldr): ldr (bits 31-26) = 011011 | base (25-21) | rt (20-16) | offset (15-0)

example (big-endian): memory addresses 0 through 15 hold the byte values 0 through 15, and register $24 holds a b c d e f g h before execution. after ldr $24,4($0), register $24 holds a b c 0 1 2 3 4.

the contents of general purpose register rt are bypassed within the processor, so no nop instruction is needed between an immediately preceding load instruction that targets general purpose register rt and a subsequent ldr (or ldl) instruction. an address error exception does not occur even if the specified address is not on a doubleword boundary. this operation is defined for the vr4300 operating in 64-bit mode and in 32-bit kernel mode. execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

operation (ldr):
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor reverseendian^3)
         if bigendianmem = 1 then
           paddr ← paddr_31...3 || 0^3
         endif
         byte ← vaddr_2...0 xor bigendiancpu^3
         mem ← loadmemory (uncached, doubleword - byte, paddr, vaddr, data)
         gpr[rt] ← gpr[rt]_63...64-8*byte || mem_63...8*byte
  remark: in the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

the relationship between the address given to the ldr instruction and the result (bytes in the register) is shown below. the register initially holds a b c d e f g h, and the addressed doubleword in memory holds i j k l m n o p.

              bigendiancpu = 0                      bigendiancpu = 1
  vaddr_2...0 destination      type  offset         destination      type  offset
                                     lem  bem                              lem  bem
  0           i j k l m n o p  7     0    0         a b c d e f g i  0     7    0
  1           a i j k l m n o  6     1    0         a b c d e f i j  1     6    0
  2           a b i j k l m n  5     2    0         a b c d e i j k  2     5    0
  3           a b c i j k l m  4     3    0         a b c d i j k l  3     4    0
  4           a b c d i j k l  3     4    0         a b c i j k l m  4     3    0
  5           a b c d e i j k  2     5    0         a b i j k l m n  5     2    0
  6           a b c d e f i j  1     6    0         a i j k l m n o  6     1    0
  7           a b c d e f g i  0     7    0         i j k l m n o p  7     0    0

remark:
  type: access type output to memory (refer to figure 3-2 byte access within a doubleword.)
  offset: paddr_2...0 output to memory
    lem: little-endian memory (bigendianmem = 0)
    bem: big-endian memory (bigendianmem = 1)

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception
  reserved instruction exception (vr4300 in 32-bit user or supervisor mode)

lh  load halfword

format: lh rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the contents of the halfword at the memory location specified by the address are sign-extended and loaded into general purpose register rt. if the least-significant bit of the address is not zero, an address error exception occurs.

encoding (lh): lh (bits 31-26) = 100001 | base (25-21) | rt (20-16) | offset (15-0)

operation (lh):
  32  t: vaddr ← ((offset_15)^16 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor (reverseendian^2 || 0))
         mem ← loadmemory (uncached, halfword, paddr, vaddr, data)
         byte ← vaddr_2...0 xor (bigendiancpu^2 || 0)
         gpr[rt] ← (mem_15+8*byte)^16 || mem_15+8*byte...8*byte
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor (reverseendian^2 || 0))
         mem ← loadmemory (uncached, halfword, paddr, vaddr, data)
         byte ← vaddr_2...0 xor (bigendiancpu^2 || 0)
         gpr[rt] ← (mem_15+8*byte)^48 || mem_15+8*byte...8*byte

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception

lhu  load halfword unsigned

format: lhu rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. the contents of the halfword at the memory location specified by the address are zero-extended and loaded into general purpose register rt. if the least-significant bit of the address is not zero, an address error exception occurs.

encoding (lhu): lhu (bits 31-26) = 100101 | base (25-21) | rt (20-16) | offset (15-0)

operation (lhu):
  32  t: vaddr ← ((offset_15)^16 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor (reverseendian^2 || 0))
         mem ← loadmemory (uncached, halfword, paddr, vaddr, data)
         byte ← vaddr_2...0 xor (bigendiancpu^2 || 0)
         gpr[rt] ← 0^16 || mem_15+8*byte...8*byte
  64  t: vaddr ← ((offset_15)^48 || offset_15...0) + gpr[base]
         (paddr, uncached) ← addresstranslation (vaddr, data)
         paddr ← paddr_psize-1...3 || (paddr_2...0 xor (reverseendian^2 || 0))
         mem ← loadmemory (uncached, halfword, paddr, vaddr, data)
         byte ← vaddr_2...0 xor (bigendiancpu^2 || 0)
         gpr[rt] ← 0^48 || mem_15+8*byte...8*byte

exceptions:
  tlb miss exception
  tlb invalid exception
  bus error exception
  address error exception

ll  load linked

format: ll rt, offset(base)

description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address.
The contents of the word at the memory location specified by the address are loaded into general purpose register rt. In 64-bit mode, the loaded word is sign-extended. In addition, the physical address of the accessed memory is stored in the LLAddr register, and the LLbit is set to 1. Afterward, the processor checks whether the location whose address is stored in the LLAddr register has been rewritten by another processor or device.

Load linked (LL) and store conditional (SC) instructions can be used to atomically update memory:

  L1:  ll   t1, (t0)
       add  t2, t1, 1
       sc   t2, (t0)
       beq  t2, 0, L1
       nop

This atomically increments the word addressed by t0. Changing the add instruction to an or instruction changes this to an atomic bit set.

This instruction is available in user mode, and it is not necessary to enable CP0. This instruction is defined to maintain software compatibility with the VR4400.

If the specified address is in the non-cache area, the operation of the LL instruction is undefined. A cache miss that occurs between the LL and SC instructions hinders execution of the SC instruction. Usually, therefore, do not use a load or store instruction between the LL and SC instructions; otherwise, the operation of the SC instruction is not guaranteed. If exceptions occur frequently, they also hinder execution of the SC instruction, so it is necessary to disable exceptions temporarily. If either of the low-order two bits of the address is not zero, an address error exception occurs.

LL  Load Linked

  Bits:   31-26        25-21  20-16  15-0
  Field:  LL (110000)  base   rt     offset
  Width:  6            5      5      16

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^2))
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          byte ← vAddr_{2...0} xor (BigEndianCPU || 0^2)
          GPR[rt] ← mem_{31+8*byte...8*byte}
          LLbit ← 1
          LLAddr ← pAddr
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^2))
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          byte ← vAddr_{2...0} xor (BigEndianCPU || 0^2)
          GPR[rt] ← (mem_{31+8*byte})^32 || mem_{31+8*byte...8*byte}
          LLbit ← 1
          LLAddr ← pAddr

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

LLD  Load Linked Doubleword

  Bits:   31-26         25-21  20-16  15-0
  Field:  LLD (110100)  base   rt     offset
  Width:  6             5      5      16

Format: LLD rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of the doubleword at the memory location specified by the address are loaded into general purpose register rt. In addition, the physical address of the accessed memory is stored in the LLAddr register, and the LLbit is set to 1. Afterward, the processor checks whether the location whose address is stored in the LLAddr register has been rewritten by another processor or device.

The load linked doubleword (LLD) and store conditional doubleword (SCD) instructions can be used to atomically update memory:

  L1:  lld   t1, (t0)
       dadd  t2, t1, 1
       scd   t2, (t0)
       beq   t2, 0, L1
       nop

This atomically increments the doubleword addressed by t0. Changing the dadd instruction to an or instruction changes this to an atomic bit set.

This instruction is defined to maintain software compatibility with the VR4400.

If the specified address is in the non-cache area, the operation of the LLD instruction is undefined. A cache miss that occurs between the LLD and SCD instructions hinders execution of the SCD instruction. Usually, therefore, do not use a load or store instruction between the LLD and SCD instructions.
Otherwise, the operation of the SCD instruction is not guaranteed. If exceptions occur frequently, they also hinder execution of the SCD instruction, so it is necessary to disable exceptions temporarily.

This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
          GPR[rt] ← mem
          LLbit ← 1
          LLAddr ← pAddr
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
          GPR[rt] ← mem
          LLbit ← 1
          LLAddr ← pAddr

Remark  In the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

LUI  Load Upper Immediate

  Bits:   31-26         25-21      20-16  15-0
  Field:  LUI (001111)  0 (00000)  rt     immediate
  Width:  6             5          5      16

Format: LUI rt, immediate

Description: The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of zeros. The result is placed into general purpose register rt. In 64-bit mode, the loaded word is sign-extended to 64 bits.

Operation:
  32  T:  GPR[rt] ← immediate || 0^16
  64  T:  GPR[rt] ← (immediate_15)^32 || immediate || 0^16

Exceptions:
  None

Format: LW rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address.
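Because the load and store offset is sign-extended, an absolute 32-bit address built with LUI followed by an offset access needs its upper half adjusted whenever bit 15 of the lower half is set. The following is a minimal sketch of that arithmetic in 32-bit mode only; the function names are hypothetical and are not part of this manual.

```python
from typing import Tuple


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def split_address(addr: int) -> Tuple[int, int]:
    """Split a 32-bit address into (LUI immediate, signed 16-bit offset).

    Hypothetical helper: the upper half is chosen so that adding the
    sign-extended offset gives back the original address.
    """
    lo = sign_extend16(addr)
    hi = ((addr - lo) >> 16) & 0xFFFF
    return hi, lo


def effective_address(hi: int, lo: int) -> int:
    """Model LUI rt, hi followed by an access at lo(rt), with 32-bit wraparound."""
    base = (hi << 16) & 0xFFFFFFFF      # LUI: immediate || 0^16
    return (base + lo) & 0xFFFFFFFF     # sign-extended offset + GPR[base]


hi, lo = split_address(0x80018000)
print(hex(hi), lo, hex(effective_address(hi, lo)))
```

For the address 0x80018000, the lower half 0x8000 sign-extends to a negative offset, so the LUI immediate becomes 0x8002 rather than 0x8001.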
The contents of the word at the memory location specified by the address are loaded into general purpose register rt. In 64-bit mode, the loaded word is sign-extended to 64 bits. If either of the low-order two bits of the address is not zero, an address error exception occurs.

LW  Load Word

  Bits:   31-26        25-21  20-16  15-0
  Field:  LW (100011)  base   rt     offset
  Width:  6            5      5      16

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          GPR[rt] ← mem
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          GPR[rt] ← mem

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

LWCz  Load Word to Coprocessor z

  Bits:   31-26            25-21  20-16  15-0
  Field:  LWCz (1100xx*)   base   rt     offset
  Width:  6                5      5      16

  * Refer to the Opcode Bit Encoding table below, or 16.7 CPU Instruction Opcode Bit Encoding.

Format: LWCz rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The processor loads the word at the addressed memory location into general purpose register rt of CPz. The manner in which each coprocessor uses the data is defined by the individual coprocessor specifications. If either of the low-order two bits of the address is not zero, an address error exception occurs. This instruction is not valid for use with CP0.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^2))
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          byte ← vAddr_{2...0} xor (BigEndianCPU || 0^2)
          COPzLW (byte, rt, mem)
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^2))
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          byte ← vAddr_{2...0} xor (BigEndianCPU || 0^2)
          COPzLW (byte, rt, mem)

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception
  Coprocessor unusable exception

Opcode Bit Encoding:

  Instruction  Bits 31-26
  LWC1         110001
  LWC2         110010

  The high-order four bits are the opcode; the low-order two bits are the coprocessor number.

Format: LWL rt, offset(base)

Description: This instruction is used in combination with the LWR instruction to load a word that is not aligned on a word boundary into general purpose register rt. The LWL instruction loads the high-order portion of the data into the register, while the LWR instruction loads the low-order portion.

The 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address that can specify any byte. Of the word in memory whose most-significant byte is specified by the generated address, only the bytes that lie within the same aligned word as the target address are loaded, and they are stored in the high-order portion of general purpose register rt. The remaining portion of the register is not affected. Depending on the address specified, the number of bytes loaded ranges from 1 to 4.

In other words, the addressed byte is first stored in the most-significant byte position of general purpose register rt. If lower-order bytes follow within the same aligned word, each is stored in the next byte of general purpose register rt in turn. The remaining low-order bytes are not affected.
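The byte selection described above can be sketched in a few lines. This is an illustrative model only, assuming 32-bit mode, a big-endian CPU and memory, and no address translation; `lwl_big_endian` is a hypothetical name, not something defined by this manual.

```python
def lwl_big_endian(memory: bytes, reg: int, addr: int) -> int:
    """Model LWL for a 32-bit big-endian configuration (illustrative only).

    memory: flat byte array standing in for physical memory
    reg:    current 32-bit contents of general purpose register rt
    addr:   virtual address of the most-significant byte to load
    """
    offset = addr & 3                  # byte position within the aligned word
    word_end = (addr & ~3) + 4         # first address past the aligned word
    # Bytes from the addressed byte up to the end of the aligned word.
    loaded = int.from_bytes(memory[addr:word_end], "big")
    keep_bits = 8 * offset             # low-order register bits left unchanged
    keep_mask = (1 << keep_bits) - 1
    return ((loaded << keep_bits) | (reg & keep_mask)) & 0xFFFFFFFF


memory = bytes(range(8))               # byte at address n holds the value n
print(hex(lwl_big_endian(memory, 0xAABBCCDD, 1)))
```

With the register initially holding 0xAABBCCDD, loading from address 1 takes bytes 1, 2, and 3 into the three high-order byte positions and leaves the lowest byte (0xDD) untouched.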
LWL  Load Word Left

  Bits:   31-26         25-21  20-16  15-0
  Field:  LWL (100010)  base   rt     offset
  Width:  6             5      5      16

  Memory (big-endian):           address 0:  0 1 2 3
                                 address 4:  4 5 6 7
  Register $24 before loading:   A B C D
  Instruction:                   LWL $24,1($0)
  Register $24 after loading:    1 2 3 D

The contents of general purpose register rt are bypassed within the processor so that no NOP instruction is needed between an immediately preceding load instruction which targets general purpose register rt and a subsequent LWL (or LWR) instruction.

The address error exception does not occur even if the specified address is not located at a word boundary.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_{PSIZE-1...2} || 0^2
          endif
          byte ← vAddr_{1...0} xor BigEndianCPU^2
          word ← vAddr_2 xor BigEndianCPU
          mem ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
          temp ← mem_{32*word+8*byte+7...32*word} || GPR[rt]_{23-8*byte...0}
          GPR[rt] ← temp
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_{PSIZE-1...2} || 0^2
          endif
          byte ← vAddr_{1...0} xor BigEndianCPU^2
          word ← vAddr_2 xor BigEndianCPU
          mem ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
          temp ← mem_{32*word+8*byte+7...32*word} || GPR[rt]_{23-8*byte...0}
          GPR[rt] ← (temp_31)^32 || temp

The relationship between the address given to the LWL instruction and the result (bytes for registers) is shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

                 BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_{2...0}  Destination      Type  Offset        Destination      Type  Offset
                                        LEM  BEM                             LEM  BEM
  0              S S S S P F G H  0     0    7        S S S S I J K L  3     4    0
  1              S S S S O P G H  1     0    6        S S S S J K L H  2     4    1
  2              S S S S N O P H  2     0    5        S S S S K L G H  1     4    2
  3              S S S S M N O P  3     0    4        S S S S L F G H  0     4    3
  4              S S S S L F G H  0     4    3        S S S S M N O P  3     0    4
  5              S S S S K L G H  1     4    2        S S S S N O P H  2     0    5
  6              S S S S J K L H  2     4    1        S S S S O P G H  1     0    6
  7              S S S S I J K L  3     4    0        S S S S P F G H  0     0    7

Remark  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
        Offset:  pAddr_{2...0} output to memory
                 LEM  Little-endian memory (BigEndianMem = 0)
                 BEM  Big-endian memory (BigEndianMem = 1)
        S:       Sign-extension of destination bit 31

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

Format: LWR rt, offset(base)

Description: This instruction is used in combination with the LWL instruction to load a word that is not aligned on a word boundary into general purpose register rt. The LWL instruction loads the high-order portion of the data into the register, while the LWR instruction loads the low-order portion.

The 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address that can specify any byte. Of the word in memory whose least-significant byte is specified by the generated address, only the bytes that lie within the same aligned word as the target address are loaded, and they are stored in the low-order portion of general purpose register rt. The remaining portion of the register is not affected. Depending on the address specified, the number of bytes loaded ranges from 1 to 4.

In other words, the addressed byte is first stored in the least-significant byte position of general purpose register rt. If higher-order bytes precede it within the same aligned word, each is stored in the next byte of general purpose register rt in turn. The remaining high-order bytes are not affected.
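How LWR complements LWL can be sketched as follows. This is an illustrative model only, assuming 32-bit mode, a big-endian CPU and memory, and no address translation; all function names are hypothetical. A compact LWL model is included so that the pairing can be shown in one place.

```python
def lwl_big_endian(memory: bytes, reg: int, addr: int) -> int:
    """LWL model: load from addr to the end of its aligned word into the high bytes."""
    keep_bits = 8 * (addr & 3)
    loaded = int.from_bytes(memory[addr:(addr & ~3) + 4], "big")
    return ((loaded << keep_bits) | (reg & ((1 << keep_bits) - 1))) & 0xFFFFFFFF


def lwr_big_endian(memory: bytes, reg: int, addr: int) -> int:
    """LWR model: load from the start of the aligned word through addr into the low bytes."""
    count = (addr & 3) + 1                       # number of bytes loaded (1 to 4)
    loaded = int.from_bytes(memory[addr & ~3:addr + 1], "big")
    load_mask = (1 << (8 * count)) - 1           # low-order register bits replaced
    return ((reg & ~load_mask) | loaded) & 0xFFFFFFFF


def load_unaligned_word(memory: bytes, addr: int) -> int:
    """Pair the two models the way LWL rt, 0(base) / LWR rt, 3(base) would be paired."""
    reg = lwl_big_endian(memory, 0, addr)        # high-order portion
    return lwr_big_endian(memory, reg, addr + 3) # low-order portion


memory = bytes(range(8))                         # byte at address n holds the value n
print(hex(lwr_big_endian(memory, 0xAABBCCDD, 4)))
print(hex(load_unaligned_word(memory, 1)))
```

In this model, LWR at address 4 replaces only the lowest register byte, and the LWL/LWR pair starting at address 1 assembles bytes 1 through 4 into one word even though they span two aligned words.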
LWR  Load Word Right

  Bits:   31-26         25-21  20-16  15-0
  Field:  LWR (100110)  base   rt     offset
  Width:  6             5      5      16

  Memory (big-endian):           address 0:  0 1 2 3
                                 address 4:  4 5 6 7
  Register $24 before loading:   A B C D
  Instruction:                   LWR $24,4($0)
  Register $24 after loading:    A B C 4

The contents of general purpose register rt are bypassed within the processor so that no NOP instruction is needed between an immediately preceding load instruction which targets general purpose register rt and a following LWL (or LWR) instruction.

The address error exception does not occur even if the specified address is not located at a word boundary.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          if BigEndianMem = 1 then
              pAddr ← pAddr_{PSIZE-1...3} || 0^3
          endif
          byte ← vAddr_{1...0} xor BigEndianCPU^2
          word ← vAddr_2 xor BigEndianCPU
          mem ← LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
          temp ← GPR[rt]_{31...32-8*byte} || mem_{31+32*word...32*word+8*byte}
          GPR[rt] ← temp
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          if BigEndianMem = 1 then
              pAddr ← pAddr_{PSIZE-1...3} || 0^3
          endif
          byte ← vAddr_{1...0} xor BigEndianCPU^2
          word ← vAddr_2 xor BigEndianCPU
          mem ← LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
          temp ← GPR[rt]_{31...32-8*byte} || mem_{31+32*word...32*word+8*byte}
          GPR[rt] ← (temp_31)^32 || temp

The relationship between the address given to the LWR instruction and the result (bytes for registers) is shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

                 BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_{2...0}  Destination      Type  Offset        Destination      Type  Offset
                                        LEM  BEM                             LEM  BEM
  0              S S S S M N O P  3     0    4        X X X X E F G I  0     7    0
  1              X X X X E M N O  2     1    4        X X X X E F I J  1     6    0
  2              X X X X E F M N  1     2    4        X X X X E I J K  2     5    0
  3              X X X X E F G M  0     3    4        S S S S I J K L  3     4    0
  4              S S S S I J K L  3     4    0        X X X X E F G M  0     3    4
  5              X X X X E I J K  2     5    0        X X X X E F M N  1     2    4
  6              X X X X E F I J  1     6    0        X X X X E M N O  2     1    4
  7              X X X X E F G I  0     7    0        S S S S M N O P  3     0    4

Remark  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
        Offset:  pAddr_{2...0} output to memory
                 LEM  Little-endian memory (BigEndianMem = 0)
                 BEM  Big-endian memory (BigEndianMem = 1)
        S:       Sign-extension of destination bit 31
        X:       Not affected (in 32-bit mode)
                 Sign-extension of destination bit 31 (in 64-bit mode)

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

LWU  Load Word Unsigned

  Bits:   31-26         25-21  20-16  15-0
  Field:  LWU (100111)  base   rt     offset
  Width:  6             5      5      16

Format: LWU rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of the word at the memory location specified by the address are loaded into general purpose register rt. The loaded word is zero-extended in 64-bit mode. If either of the low-order two bits of the effective address is not zero, an address error exception occurs.

This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Remark  In the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          GPR[rt] ← mem
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ←
              AddressTranslation (vAddr, DATA)
          mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          GPR[rt] ← 0^32 || mem

Exceptions:
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

MFC0  Move from System Control Coprocessor

  Bits:   31-26          25-21       20-16  15-11  10-0
  Field:  COP0 (010000)  MF (00000)  rt     rd     0 (00000000000)
  Width:  6              5           5      5      11

Format: MFC0 rt, rd

Description: The contents of register rd of CP0 are loaded into general purpose register rt.

Operation:
  32  T:    data ← CPR[0, rd]
      T+1:  GPR[rt] ← data
  64  T:    data ← CPR[0, rd]
      T+1:  GPR[rt] ← (data_31)^32 || data_{31...0}

Exceptions:
  Coprocessor unusable exception (VR4300 in 64-/32-bit user and supervisor mode if CP0 is disabled)

MFCz  Move from Coprocessor z

  Bits:   31-26            25-21       20-16  15-11  10-0
  Field:  COPz (0100xx*)   MF (00000)  rt     rd     0 (00000000000)
  Width:  6                5           5      5      11

  * Refer to the Opcode Bit Encoding table below, or 16.7 CPU Instruction Opcode Bit Encoding.

Format: MFCz rt, rd

Description: The contents of register rd of CPz are loaded into general purpose register rt.

Operation:
  32  T:    data ← CPR[z, rd]
      T+1:  GPR[rt] ← data
  64  T:    if rd_0 = 0 then
                data ← CPR[z, rd_{4...1} || 0]_{31...0}
            else
                data ← CPR[z, rd_{4...1} || 0]_{63...32}
            endif
      T+1:  GPR[rt] ← (data_31)^32 || data

Exceptions:
  Coprocessor unusable exception

Opcode Bit Encoding:

  Instruction  Bits 31-26  Bits 25-21
  MFC0         010000      00000
  MFC1         010001      00000
  MFC2         010010      00000

  Bits 31-28 are the opcode, bits 27-26 are the coprocessor number, and bits 25-21 are the coprocessor sub-opcode.

MFHI  Move from HI

  Bits:   31-26             25-16           15-11  10-6       5-0
  Field:  SPECIAL (000000)  0 (0000000000)  rd     0 (00000)  MFHI (010000)
  Width:  6                 10              5      5          6

Format: MFHI rd

Description: The contents of special register HI are loaded into general purpose register rd. To ensure proper operation in the event of interruptions, the two instructions which follow an MFHI instruction may not be any of the instructions which modify the HI register: MULT, MULTU, DIV, DIVU, MTHI, DMULT, DMULTU, DDIV, DDIVU.

Operation:
  32, 64  T:  GPR[rd] ← HI

Exceptions:
  None

MFLO  Move from LO

  Bits:   31-26             25-16           15-11  10-6       5-0
  Field:  SPECIAL (000000)  0 (0000000000)  rd     0 (00000)  MFLO (010010)
  Width:  6                 10              5      5          6

Format: MFLO rd

Description: The contents of special register LO are loaded into general purpose register rd. To ensure proper operation in the event of interruptions, the two instructions which follow an MFLO instruction may not be any of the instructions which modify the LO register: MULT, MULTU, DIV, DIVU, MTLO, DMULT, DMULTU, DDIV, DDIVU.

Operation:
  32, 64  T:  GPR[rd] ← LO

Exceptions:
  None

Format: MTC0 rt, rd

Description: The contents of general purpose register rt are loaded into register rd of CP0.

Because the contents of the TLB may be altered by this instruction, the operation of load instructions, store instructions, and TLB operations immediately prior to and after this instruction is undefined. If the register manipulated by this instruction is used by an instruction before or after this instruction, place that instruction at an appropriate position by referring to Chapter 19 Coprocessor 0 Hazards.
MTC0  Move to System Control Coprocessor

  Bits:   31-26          25-21       20-16  15-11  10-0
  Field:  COP0 (010000)  MT (00100)  rt     rd     0 (00000000000)
  Width:  6              5           5      5      11

Operation:
  32, 64  T:    data ← GPR[rt]
          T+1:  CPR[0, rd] ← data

Exceptions:
  Coprocessor unusable exception (VR4300 in 64-/32-bit user and supervisor mode if CP0 is disabled)

MTCz  Move to Coprocessor z

  Bits:   31-26            25-21       20-16  15-11  10-0
  Field:  COPz (0100xx*)   MT (00100)  rt     rd     0 (00000000000)
  Width:  6                5           5      5      11

  * Refer to the Opcode Bit Encoding table below, or 16.7 CPU Instruction Opcode Bit Encoding.

Format: MTCz rt, rd

Description: The contents of general purpose register rt are loaded into register rd of CPz.

Operation:
  32  T:    data ← GPR[rt]
      T+1:  CPR[z, rd] ← data
  64  T:    data ← GPR[rt]_{31...0}
      T+1:  if rd_0 = 0 then
                CPR[z, rd_{4...1} || 0] ← CPR[z, rd_{4...1} || 0]_{63...32} || data
            else
                CPR[z, rd_{4...1} || 0] ← data || CPR[z, rd_{4...1} || 0]_{31...0}
            endif

Exceptions:
  Coprocessor unusable exception

Opcode Bit Encoding:

  Instruction  Bits 31-26  Bits 25-21
  MTC0         010000      00100
  MTC1         010001      00100
  MTC2         010010      00100

  Bits 31-28 are the opcode, bits 27-26 are the coprocessor number, and bits 25-21 are the coprocessor sub-opcode.

MTHI  Move to HI

  Bits:   31-26             25-21  20-6                 5-0
  Field:  SPECIAL (000000)  rs     0 (000000000000000)  MTHI (010001)
  Width:  6                 5      15                   6

Format: MTHI rs

Description: The contents of general purpose register rs are loaded into special register HI. If the MTHI instruction is executed following the MULT, MULTU, DIV, or DIVU instruction, the operation is performed normally. However, if the MFLO, MFHI, MTLO, or MTHI instruction is executed following the MTHI instruction, the contents of special register LO are undefined.

Operation:
  32, 64  T-2:  HI ← undefined
          T-1:  HI ← undefined
          T:    HI ← GPR[rs]

Exceptions:
  None

MTLO  Move to LO

  Bits:   31-26             25-21  20-6                 5-0
  Field:  SPECIAL (000000)  rs     0 (000000000000000)  MTLO (010011)
  Width:  6                 5      15                   6

Format: MTLO rs

Description: The contents of general purpose register rs are loaded into special register LO. If the MTLO instruction is executed following the MULT, MULTU, DIV, or DIVU instruction, the operation is performed normally. However, if the MFLO, MFHI, MTLO, or MTHI instruction is executed following the MTLO instruction, the contents of special register HI are undefined.

Operation:
  32, 64  T-2:  LO ← undefined
          T-1:  LO ← undefined
          T:    LO ← GPR[rs]

Exceptions:
  None

MULT  Multiply

  Bits:   31-26             25-21  20-16  15-6            5-0
  Field:  SPECIAL (000000)  rs     rt     0 (0000000000)  MULT (011000)
  Width:  6                 5      5      10              6

Format: MULT rs, rt

Description: The contents of general purpose registers rs and rt are multiplied, treating both operands as 32-bit signed integers. An integer overflow exception never occurs. In 64-bit mode, the operands must be valid 32-bit, sign-extended values.

When the operation completes, the low-order word of the doubleword result is loaded into special register LO, and the high-order word of the doubleword result is loaded into special register HI. In 64-bit mode, the respective results are sign-extended and stored.

If either of the two instructions immediately preceding this instruction is MFHI or MFLO, the execution result of that transfer instruction is undefined. To obtain the correct result, insert two or more other instructions between the MFHI or MFLO instruction and the MULT instruction.

Exceptions:
  None

Operation:
  32  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    t ← GPR[rs] * GPR[rt]
            LO ← t_{31...0}
            HI ← t_{63...32}
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    t ←
                GPR[rs]_{31...0} * GPR[rt]_{31...0}
            LO ← (t_31)^32 || t_{31...0}
            HI ← (t_63)^32 || t_{63...32}

MULTU  Multiply Unsigned

  Bits:   31-26             25-21  20-16  15-6            5-0
  Field:  SPECIAL (000000)  rs     rt     0 (0000000000)  MULTU (011001)
  Width:  6                 5      5      10              6

Format: MULTU rs, rt

Description: The contents of general purpose register rs and the contents of general purpose register rt are multiplied, treating both operands as 32-bit unsigned values. An overflow exception never occurs. In 64-bit mode, the operands must be valid 32-bit, sign-extended values.

When the operation completes, the low-order word of the doubleword result is loaded into special register LO, and the high-order word of the doubleword result is loaded into special register HI. In 64-bit mode, these results are sign-extended and loaded.

If either of the two preceding instructions is MFHI or MFLO, the execution result of that transfer instruction is undefined. To obtain the correct result, insert two or more additional instructions between the MFHI or MFLO instruction and the MULTU instruction.

Operation:
  32  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    t ← (0 || GPR[rs]) * (0 || GPR[rt])
            LO ← t_{31...0}
            HI ← t_{63...32}
  64  T-2:  LO ← undefined
            HI ← undefined
      T-1:  LO ← undefined
            HI ← undefined
      T:    t ← (0 || GPR[rs]_{31...0}) * (0 || GPR[rt]_{31...0})
            LO ← (t_31)^32 || t_{31...0}
            HI ← (t_63)^32 || t_{63...32}

Exceptions:
  None

NOR  Nor

  Bits:   31-26             25-21  20-16  15-11  10-6       5-0
  Field:  SPECIAL (000000)  rs     rt     rd     0 (00000)  NOR (100111)
  Width:  6                 5      5      5      5          6

Format: NOR rd, rs, rt

Description: A bitwise logical NOR of the contents of general purpose registers rs and rt is computed. The result is stored in general purpose register rd.

Operation:
  32, 64  T:  GPR[rd] ← GPR[rs] nor GPR[rt]

Exceptions:
  None

OR  Or

  Bits:   31-26             25-21  20-16  15-11  10-6       5-0
  Field:  SPECIAL (000000)  rs     rt     rd     0 (00000)  OR (100101)
  Width:  6                 5      5      5      5          6

Format: OR rd, rs, rt

Description: A bitwise logical OR of the contents of general purpose registers rs and rt is computed. The result is stored in general purpose register rd.

Operation:
  32, 64  T:  GPR[rd] ← GPR[rs] or GPR[rt]

Exceptions:
  None

ORI  Or Immediate

  Bits:   31-26         25-21  20-16  15-0
  Field:  ORI (001101)  rs     rt     immediate
  Width:  6             5      5      16

Format: ORI rt, rs, immediate

Description: A bitwise logical OR of the zero-extended 16-bit immediate and the contents of general purpose register rs is computed. The result is stored in general purpose register rt.

Operation:
  32  T:  GPR[rt] ← GPR[rs]_{31...16} || (immediate or GPR[rs]_{15...0})
  64  T:  GPR[rt] ← GPR[rs]_{63...16} || (immediate or GPR[rs]_{15...0})

Exceptions:
  None

SB  Store Byte

  Bits:   31-26        25-21  20-16  15-0
  Field:  SB (101000)  base   rt     offset
  Width:  6            5      5      16

Format: SB rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The least-significant byte of register rt is stored at the memory location specified by the address.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          byte ← vAddr_{2...0} xor BigEndianCPU^3
          data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
          StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^3)
          byte ← vAddr_{2...0} xor BigEndianCPU^3
          data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
          StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SC  Store Conditional

  Bits:   31-26        25-21  20-16  15-0
  Field:  SC (111000)  base   rt     offset
  Width:  6            5      5      16

Format: SC rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of general purpose register rt are stored at the memory location specified by the address only when the LLbit is set.

If another processor or device has changed the contents of the physical address since the previous LL instruction was executed, or if an ERET instruction is executed between the LL and SC instructions, the register contents are not stored to memory, and the store fails.

The success or failure of the SC operation is indicated by the contents of general purpose register rt after execution of the instruction. A successful SC instruction sets the contents of general purpose register rt to 1; an unsuccessful SC instruction sets it to 0.

The operation of SC is undefined when the address is different from the address used in the last LL instruction.

This instruction is available in user mode; it is not necessary for CP0 to be enabled.

If either of the low-order two bits of the address is not zero, an address error exception occurs.

If this instruction both fails and causes an exception, the exception takes precedence.

This instruction is defined to maintain software compatibility with the VR4400.

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]_{31...0}
          if LLbit then
              StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
          endif
          GPR[rt] ← 0^31 || LLbit
  64  T:  vAddr ←
              ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]_{31...0}
          if LLbit then
              StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
          endif
          GPR[rt] ← 0^63 || LLbit

SCD  Store Conditional Doubleword

  Bits:   31-26         25-21  20-16  15-0
  Field:  SCD (111100)  base   rt     offset
  Width:  6             5      5      16

Format: SCD rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of general purpose register rt are stored at the memory location specified by the address only when the LLbit is set.

If another processor or device has changed the contents of the target address since the previous LLD instruction was executed, or if an ERET instruction is executed between the LLD and SCD instructions, the register contents are not stored to memory, and the store fails.

The success or failure of the SCD operation is indicated by the contents of general purpose register rt after execution of the instruction. A successful SCD instruction sets the contents of general purpose register rt to 1; an unsuccessful SCD instruction sets it to 0.

The operation of SCD is undefined when the address is different from the address used in the last LLD instruction.

This instruction is available in user mode; it is not necessary for CP0 to be enabled.

If any of the low-order three bits of the address is not zero, an address error exception occurs.

If this instruction both fails and causes an exception, the exception takes precedence.

This instruction is defined in 64-bit mode and 32-bit kernel mode. If this instruction is executed in 32-bit user or supervisor mode, a reserved instruction exception occurs.

This instruction is defined to maintain software compatibility with the VR4400.

Operation:
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]
          if LLbit then
              StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
          endif
          GPR[rt] ← 0^63 || LLbit

Remark  In the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception
  Reserved instruction exception (32-bit user or supervisor mode)

SD  Store Doubleword

  Bits:   31-26        25-21  20-16  15-0
  Field:  SD (111111)  base   rt     offset
  Width:  6            5      5      16

Format: SD rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of general purpose register rt are stored at the memory location specified by the address. If any of the low-order three bits of the address is not zero, an address error exception occurs.

This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor mode causes a reserved instruction exception.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_{15...0}) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

Remark  In the 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.
manual u10504ej7v0um00 exceptions: tlb miss exception tlb invalid exception tlb modification exception bus error exception address error exception reserved instruction exception (32-bit user or supervisor mode) sd store doubleword sd (continued) user? manual u10504ej7v0um00 493 cpu instruction set details format: sdcz rt, offset(base) description: the 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. register rt of coprocessor unit z sources a doubleword, which the processor writes to the addressed memory location. the stored data is defined by individual coprocessor specifications. if any of the low-order three bits of the address is not zero, an address error exception takes place. this instruction is not valid for use with cp0. when the cp1 is specified, the fr bit of the status register equals 0, and the least- significant bit in the rt field is not 0, the operation of this instruction is undefined. if the fr bit equals 1, both odd and even registers can be specified by rt . * refer to the table, opcode bit encoding on next page, or 16.7 cpu instruction opcode bit encoding . sdcz store doubleword 31 25 26 20 21 15 16 0 sdcz base rt offset 655 16 1 1 1 1 x x * sdcz from coprocessor z chapter 16 494 user? manual u10504ej7v0um00 operation: exceptions: tlb miss exception tlb invalid exception tlb modification exception bus error exception address error exception coprocessor unusable exception opcode bit encoding: (continued) sdcz store doubleword sdcz from coprocessor z 32 t: vaddr ? ((offset 15 ) 16 || offset 15...0 ) + gpr[base] (paddr, uncached) ? addresstranslation (vaddr, data) data ? gpr(rt), storememory (uncached, doubleword, data, paddr, vaddr, data) 64 t: vaddr ? ((offset 15 ) 48 || offset 15...0 ) + gpr[base] (paddr, uncached) ? addresstranslation (vaddr, data) data ? 
gpr(rt), storememory (uncached, doubleword, data, paddr, vaddr, data) sdcz 31 30 29 28 27 26 bit # 0 sdc1 coprocessor number opcode 1 1 1 1 0 1 31 30 29 28 27 26 bit # 0 sdc2 1 1 1 1 1 0 user? manual u10504ej7v0um00 495 cpu instruction set details format: sdl rt, offset(base) description: this instruction is used in combination with the sdr instruction to store the doubleword data in the register to the doubleword in the memory that is not at the doubleword boundary. the sdl instruction stores the high-order portion of the data to the memory, while the sdr instruction stores the low-order portion. the 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address. of the doubleword data in the memory whose most-significant byte is specified by the generated address, only the high- order portion of general purpose register rt is stored to the memory at the same doubleword boundary as the target address. depending on the address specified, the number of bytes to be stored changes from 1 to 8. in other words, first the most-significant byte position of general purpose register rt is stored to the bytes in the addressed memory. if there is data of the low-order byte that follows the same doubleword boundary, the operation to store this data to the next byte of the memory is repeated. sdl store doubleword left 31 25 26 20 21 15 16 0 sdl base rt offset 655 16 1 0 1 1 0 0 sdl 14 sdl $24,1($0) after address 0 address 8 memory register $24 (big-endian) before 1 0234567 9 8 101112131415 abcdefgh address 0 address 8 0 9 81011121315 bcdefg a storing storing chapter 16 496 user? manual u10504ej7v0um00 the address error exception does not occur even if the specified address is not located at the doubleword boundary. this operation is defined in the 64-bit mode and 32-bit kernel mode. if this instruction is executed in the 32-bit user or supervisor mode, the reserved instruction exception occurs. 
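The byte selection described for SDL can be sketched as a small model. This is only an illustration, not part of the manual: the function name sdl_destination and the letter strings are hypothetical, and it assumes the convention that register bytes are written A to H and memory bytes I to P, most-significant byte first.

```python
# Hypothetical model of which bytes SDL writes (names are illustrative only).
REGISTER = "ABCDEFGH"  # register bytes, most-significant first
MEMORY = "IJKLMNOP"    # doubleword in memory, most-significant first


def sdl_destination(vaddr_low3, big_endian_cpu):
    """Return (memory doubleword after SDL, access type) for vAddr bits 2...0."""
    # byte <- vAddr_2...0 xor BigEndianCPU^3
    byte = vaddr_low3 ^ (0b111 if big_endian_cpu else 0)
    # The high-order (byte + 1) bytes of the register replace the low-order
    # (byte + 1) bytes of the memory doubleword; access type equals byte.
    return MEMORY[:7 - byte] + REGISTER[:byte + 1], byte


if __name__ == "__main__":
    for low3 in range(8):
        print(low3, sdl_destination(low3, False), sdl_destination(low3, True))
```

Running the loop prints one row per value of vAddr bits 2...0, first for BigEndianCPU = 0 and then for BigEndianCPU = 1.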
Operation:
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_31...3 || 0^3
          endif
          byte ← vAddr_2...0 xor BigEndianCPU^3
          data ← 0^(56-8*byte) || GPR[rt]_63...56-8*byte
          StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)

Remark: In 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

The relationships between the address given to the SDL instruction and the result (bytes of the doubleword in memory) are shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_2...0  Destination  Type  Offset           Destination  Type  Offset
                                  LEM  BEM                            LEM  BEM
  0            IJKLMNOA     0     0    7           ABCDEFGH     7     0    0
  1            IJKLMNAB     1     0    6           IABCDEFG     6     0    1
  2            IJKLMABC     2     0    5           IJABCDEF     5     0    2
  3            IJKLABCD     3     0    4           IJKABCDE     4     0    3
  4            IJKABCDE     4     0    3           IJKLABCD     3     0    4
  5            IJABCDEF     5     0    2           IJKLMABC     2     0    5
  6            IABCDEFG     6     0    1           IJKLMNAB     1     0    6
  7            ABCDEFGH     7     0    0           IJKLMNOA     0     0    7

Remark:
  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
  Offset:  pAddr_2...0 output to memory
  LEM:     Little-endian memory (BigEndianMem = 0)
  BEM:     Big-endian memory (BigEndianMem = 1)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception
  Reserved instruction exception (32-bit user or supervisor mode)

SDR    Store Doubleword Right

Format: SDR rt, offset(base)

Description: This instruction is used in combination with the SDL instruction to store the doubleword in a register to a doubleword in memory that is not aligned on a doubleword boundary. The SDL instruction stores the high-order portion of the data to memory, while the SDR instruction stores the low-order portion.
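As a rough illustration of how the SDL and SDR pair covers a doubleword that straddles a doubleword boundary, the following sketch models both stores on a byte array. It assumes a big-endian CPU and big-endian memory only; the helper names sdl, sdr, and store_unaligned_doubleword are hypothetical and do not come from the manual.

```python
# Hypothetical big-endian model of SDL/SDR acting on a bytearray "memory".

def sdl(mem, addr, value):
    """Store the high-order bytes of value from addr up to the doubleword end."""
    reg = value.to_bytes(8, "big")
    count = 8 - (addr & 7)              # 1 to 8 bytes, depending on the address
    mem[addr:addr + count] = reg[:count]


def sdr(mem, addr, value):
    """Store the low-order bytes of value from the doubleword start up to addr."""
    reg = value.to_bytes(8, "big")
    count = (addr & 7) + 1              # 1 to 8 bytes, depending on the address
    start = addr & ~7                   # start of the containing doubleword
    mem[start:addr + 1] = reg[8 - count:]


def store_unaligned_doubleword(mem, addr, value):
    """Equivalent of the pair: SDL rt, 0(base) followed by SDR rt, 7(base)."""
    sdl(mem, addr, value)
    sdr(mem, addr + 7, value)


if __name__ == "__main__":
    memory = bytearray(16)
    store_unaligned_doubleword(memory, 1, 0x4142434445464748)
    print(memory)
```

In this model SDL writes bytes 1 to 7 and SDR writes byte 8, so the two stores together place all eight register bytes at addresses 1 to 8 without touching the neighboring bytes.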
The 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address. Of the doubleword in memory whose least-significant byte is specified by the generated address, only the low-order portion of general purpose register rt is stored to memory within the same doubleword boundary as the target address. Depending on the address specified, the number of bytes stored varies from 1 to 8.
In other words, the least-significant byte of general purpose register rt is first stored to the addressed byte in memory. As long as higher-order bytes remain within the same doubleword boundary, the next byte of the register is stored to the next byte of memory.

Encoding (SDR, Store Doubleword Right):
  Bits    31-26          25-21   20-16   15-0
  Field   SDR (101101)   base    rt      offset
  Width   6              5       5       16

[Figure: example of SDR $24,10($0) in big-endian mode, showing register $24 (A B C D E F G H) and the memory doublewords at addresses 0 and 8 before and after the store.]

The address error exception does not occur even if the specified address is not located at a doubleword boundary.
This operation is defined in 64-bit mode and 32-bit kernel mode. If this instruction is executed in 32-bit user or supervisor mode, a reserved instruction exception occurs.

Operation:
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_PSIZE-1...3 || 0^3
          endif
          byte ← vAddr_2...0 xor BigEndianCPU^3
          data ← GPR[rt]_63-8*byte...0 || 0^(8*byte)
          StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA)

Remark: In 32-bit kernel mode, the high-order 32 bits are ignored during virtual address creation.

The relationships between the address given to the SDR instruction and the result (bytes of the doubleword in memory) are shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_2...0  Destination  Type  Offset           Destination  Type  Offset
                                  LEM  BEM                            LEM  BEM
  0            ABCDEFGH     7     0    0           HJKLMNOP     0     7    0
  1            BCDEFGHP     6     1    0           GHKLMNOP     1     6    0
  2            CDEFGHOP     5     2    0           FGHLMNOP     2     5    0
  3            DEFGHNOP     4     3    0           EFGHMNOP     3     4    0
  4            EFGHMNOP     3     4    0           DEFGHNOP     4     3    0
  5            FGHLMNOP     2     5    0           CDEFGHOP     5     2    0
  6            GHKLMNOP     1     6    0           BCDEFGHP     6     1    0
  7            HJKLMNOP     0     7    0           ABCDEFGH     7     0    0

Remark:
  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
  Offset:  pAddr_2...0 output to memory
  LEM:     Little-endian memory (BigEndianMem = 0)
  BEM:     Big-endian memory (BigEndianMem = 1)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception
  Reserved instruction exception (32-bit user or supervisor mode)

SH    Store Halfword

Format: SH rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The least-significant halfword of register rt is stored at the memory location specified by the address. If the least-significant bit of the address is not zero, an address error exception occurs.

Encoding (SH, Store Halfword):
  Bits    31-26         25-21   20-16   15-0
  Field   SH (101001)   base    rt      offset
  Width   6             5       5       16

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
          byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
          data ← GPR[rt]_63-8*byte...0 || 0^(8*byte)
          StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
          byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
          data ← GPR[rt]_63-8*byte...0 || 0^(8*byte)
          StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SLL    Shift Left Logical

Format: SLL rd, rt, sa

Description: The contents of general purpose register rt are shifted left by sa bits, inserting zeros into the low-order bits. The result is stored in general purpose register rd.
In 64-bit mode, the shifted 32-bit value is sign-extended and stored as the result. If the shift value is 0, the low-order 32 bits of the 64-bit value are sign-extended. This instruction can therefore generate a 64-bit value that sign-extends a 32-bit value.

Encoding (SLL, Shift Left Logical):
  Bits    31-26              25-21       20-16   15-11   10-6   5-0
  Field   SPECIAL (000000)   0 (00000)   rt      rd      sa     SLL (000000)
  Width   6                  5           5       5       5      6

Operation:
  32  T:  GPR[rd] ← GPR[rt]_31-sa...0 || 0^sa
  64  T:  s ← 0 || sa
          temp ← GPR[rt]_31-s...0 || 0^s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

Caution: If the shift value of this instruction is 0, the assembler may treat this instruction as a NOP. When using this instruction for sign extension, check the specifications of the assembler.

SLLV    Shift Left Logical Variable

Format: SLLV rd, rt, rs

Description: The contents of general purpose register rt are shifted left by the number of bits specified by the low-order five bits of general purpose register rs, inserting zeros into the low-order bits. The result is stored in general purpose register rd.
In 64-bit mode, the shifted 32-bit value is sign-extended and stored as the result. If the shift value is 0, the low-order 32 bits of the 64-bit value are sign-extended.
This instruction can therefore generate a 64-bit value that sign-extends a 32-bit value.

Encoding (SLLV, Shift Left Logical Variable):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SLLV (000100)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  s ← GPR[rs]_4...0
          GPR[rd] ← GPR[rt]_(31-s)...0 || 0^s
  64  T:  s ← 0 || GPR[rs]_4...0
          temp ← GPR[rt]_(31-s)...0 || 0^s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

Caution: If the shift value of this instruction is 0, the assembler may treat this instruction as a NOP. When using this instruction for sign extension, check the specifications of the assembler.

SLT    Set On Less Than

Format: SLT rd, rs, rt

Description: The contents of general purpose register rt are subtracted from the contents of general purpose register rs. Treating both register contents as signed integers, if the contents of general purpose register rs are less than the contents of general purpose register rt, one is stored in general purpose register rd; otherwise zero is stored in general purpose register rd.
An integer overflow exception never occurs. The comparison is valid even if the subtraction used during the comparison overflows.

Encoding (SLT, Set On Less Than):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SLT (101010)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  if GPR[rs] < GPR[rt] then
              GPR[rd] ← 0^31 || 1
          else
              GPR[rd] ← 0^32
          endif
  64  T:  if GPR[rs] < GPR[rt] then
              GPR[rd] ← 0^63 || 1
          else
              GPR[rd] ← 0^64
          endif

Exceptions: None

SLTI    Set On Less Than Immediate

Format: SLTI rt, rs, immediate

Description: The 16-bit immediate is sign-extended and subtracted from the contents of general purpose register rs. Treating both values as signed integers, if the contents of rs are less than the sign-extended immediate, one is stored in general purpose register rt; otherwise zero is stored in general purpose register rt.
An integer overflow exception never occurs.
The comparison is valid even if the subtraction overflows.

Encoding (SLTI, Set On Less Than Immediate):
  Bits    31-26           25-21   20-16   15-0
  Field   SLTI (001010)   rs      rt      immediate
  Width   6               5       5       16

Operation:
  32  T:  if GPR[rs] < (immediate_15)^16 || immediate_15...0 then
              GPR[rt] ← 0^31 || 1
          else
              GPR[rt] ← 0^32
          endif
  64  T:  if GPR[rs] < (immediate_15)^48 || immediate_15...0 then
              GPR[rt] ← 0^63 || 1
          else
              GPR[rt] ← 0^64
          endif

Exceptions: None

SLTIU    Set On Less Than Immediate Unsigned

Format: SLTIU rt, rs, immediate

Description: The 16-bit immediate is sign-extended and subtracted from the contents of general purpose register rs. Treating both values as unsigned integers, if the contents of rs are less than the sign-extended immediate, one is stored in general purpose register rt; otherwise zero is stored in general purpose register rt.
An integer overflow exception never occurs. The comparison is valid even if the subtraction overflows.

Encoding (SLTIU, Set On Less Than Immediate Unsigned):
  Bits    31-26            25-21   20-16   15-0
  Field   SLTIU (001011)   rs      rt      immediate
  Width   6                5       5       16

Operation:
  32  T:  if (0 || GPR[rs]) < (0 || (immediate_15)^16 || immediate_15...0) then
              GPR[rt] ← 0^31 || 1
          else
              GPR[rt] ← 0^32
          endif
  64  T:  if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15...0) then
              GPR[rt] ← 0^63 || 1
          else
              GPR[rt] ← 0^64
          endif

Exceptions: None

SLTU    Set On Less Than Unsigned

Format: SLTU rd, rs, rt

Description: The contents of general purpose register rt are subtracted from the contents of general purpose register rs. Treating both values as unsigned integers, if the contents of general purpose register rs are less than the contents of general purpose register rt, one is stored in general purpose register rd; otherwise zero is stored in general purpose register rd.
An integer overflow exception never occurs. The comparison is valid even if the subtraction overflows.
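The difference between the signed and unsigned set-on-less-than instructions, and the fact that SLTIU still sign-extends its immediate before comparing it as unsigned, can be illustrated with a small model. This is a sketch for 64-bit mode only; the Python function names simply mirror the mnemonics and the helpers sign_extend16 and to_signed64 are hypothetical.

```python
# Hypothetical 64-bit-mode model of SLT, SLTU, SLTI and SLTIU.
MASK64 = (1 << 64) - 1


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed64(value):
    """Interpret a 64-bit register pattern as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def slt(rs, rt):
    return int(to_signed64(rs) < to_signed64(rt))


def sltu(rs, rt):
    return int((rs & MASK64) < (rt & MASK64))


def slti(rs, imm):
    return int(to_signed64(rs) < sign_extend16(imm))


def sltiu(rs, imm):
    # The immediate is sign-extended first, then compared as unsigned.
    return int((rs & MASK64) < (sign_extend16(imm) & MASK64))
```

For example, with rs = 5 and immediate = 0xFFFF, slti compares 5 against -1 and yields 0, while sltiu compares 5 against 0xFFFFFFFFFFFFFFFF and yields 1.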
Encoding (SLTU, Set On Less Than Unsigned):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SLTU (101011)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  if (0 || GPR[rs]) < (0 || GPR[rt]) then
              GPR[rd] ← 0^31 || 1
          else
              GPR[rd] ← 0^32
          endif
  64  T:  if (0 || GPR[rs]) < (0 || GPR[rt]) then
              GPR[rd] ← 0^63 || 1
          else
              GPR[rd] ← 0^64
          endif

Exceptions: None

SRA    Shift Right Arithmetic

Format: SRA rd, rt, sa

Description: The contents of general purpose register rt are shifted right by sa bits, sign-extending the high-order bits. The result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.

Encoding (SRA, Shift Right Arithmetic):
  Bits    31-26              25-21       20-16   15-11   10-6   5-0
  Field   SPECIAL (000000)   0 (00000)   rt      rd      sa     SRA (000011)
  Width   6                  5           5       5       5      6

Operation:
  32  T:  GPR[rd] ← (GPR[rt]_31)^sa || GPR[rt]_31...sa
  64  T:  s ← 0 || sa
          temp ← (GPR[rt]_31)^s || GPR[rt]_31...s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

SRAV    Shift Right Arithmetic Variable

Format: SRAV rd, rt, rs

Description: The contents of general purpose register rt are shifted right by the number of bits specified by the low-order five bits of general purpose register rs, sign-extending the high-order bits. The result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.

Encoding (SRAV, Shift Right Arithmetic Variable):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SRAV (000111)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  s ← GPR[rs]_4...0
          GPR[rd] ← (GPR[rt]_31)^s || GPR[rt]_31...s
  64  T:  s ← GPR[rs]_4...0
          temp ← (GPR[rt]_31)^s || GPR[rt]_31...s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

SRL    Shift Right Logical

Format: SRL rd, rt, sa

Description: The contents of general purpose register rt are shifted right by sa bits, inserting zeros into the high-order bits.
The result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.

Encoding (SRL, Shift Right Logical):
  Bits    31-26              25-21       20-16   15-11   10-6   5-0
  Field   SPECIAL (000000)   0 (00000)   rt      rd      sa     SRL (000010)
  Width   6                  5           5       5       5      6

Operation:
  32  T:  GPR[rd] ← 0^sa || GPR[rt]_31...sa
  64  T:  s ← 0 || sa
          temp ← 0^s || GPR[rt]_31...s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

SRLV    Shift Right Logical Variable

Format: SRLV rd, rt, rs

Description: The contents of general purpose register rt are shifted right by the number of bits specified by the low-order five bits of general purpose register rs, inserting zeros into the high-order bits. The result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.

Encoding (SRLV, Shift Right Logical Variable):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SRLV (000110)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  s ← GPR[rs]_4...0
          GPR[rd] ← 0^s || GPR[rt]_31...s
  64  T:  s ← GPR[rs]_4...0
          temp ← 0^s || GPR[rt]_31...s
          GPR[rd] ← (temp_31)^32 || temp

Exceptions: None

SUB    Subtract

Format: SUB rd, rs, rt

Description: The contents of general purpose register rt are subtracted from the contents of general purpose register rs, and the result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.
An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2's complement overflow). The destination register rd is not modified when an integer overflow exception occurs.

Encoding (SUB, Subtract):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SUB (100010)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  GPR[rd] ← GPR[rs] - GPR[rt]
  64  T:  temp ← GPR[rs] - GPR[rt]
          GPR[rd] ← (temp_31)^32 || temp_31...0

Exceptions:
  Integer overflow exception

SUBU    Subtract Unsigned

Format: SUBU rd, rs, rt

Description: The contents of general purpose register rt are subtracted from the contents of general purpose register rs, and the result is stored in general purpose register rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result.
The only difference between this instruction and the SUB instruction is that SUBU never causes an integer overflow exception.

Encoding (SUBU, Subtract Unsigned):
  Bits    31-26              25-21   20-16   15-11   10-6        5-0
  Field   SPECIAL (000000)   rs      rt      rd      0 (00000)   SUBU (100011)
  Width   6                  5       5       5       5           6

Operation:
  32  T:  GPR[rd] ← GPR[rs] - GPR[rt]
  64  T:  temp ← GPR[rs] - GPR[rt]
          GPR[rd] ← (temp_31)^32 || temp_31...0

Exceptions: None

SW    Store Word

Format: SW rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of general purpose register rt are stored at the memory location specified by the address. If either of the low-order two bits of the address is not zero, an address error exception occurs.

Encoding (SW, Store Word):
  Bits    31-26         25-21   20-16   15-0
  Field   SW (101011)   base    rt      offset
  Width   6             5       5       16

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]_31...0
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← GPR[rt]_31...0
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SWCz    Store Word from Coprocessor z

Format: SWCz rt, offset(base)

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. Coprocessor register rt of CPz is stored in the addressed memory.
The data to be stored is defined by the individual coprocessor specifications.
This instruction is not valid for use with CP0.
If either of the low-order two bits of the address is not zero, an address error exception occurs.

Encoding (SWCz, Store Word from Coprocessor z):
  Bits    31-26            25-21   20-16   15-0
  Field   SWCz (1110xx)*   base    rt      offset
  Width   6                5       5       16

  * Refer to the table Opcode Bit Encoding below, or to 16.7 CPU Instruction Opcode Bit Encoding.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
          byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
          data ← COPzSW (byte, rt)
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
          byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
          data ← COPzSW (byte, rt)
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception
  Coprocessor unusable exception

Opcode Bit Encoding (bits 31-28: opcode, bits 27-26: coprocessor number):
  Instruction   Bit 31  30  29  28  27  26
  SWC1               1   1   1   0   0   1
  SWC2               1   1   1   0   1   0

SWL    Store Word Left

Format: SWL rt, offset(base)

Description: This instruction is used in combination with the SWR instruction to store the word in a register to a word in memory that is not aligned on a word boundary. The SWL instruction stores the high-order portion of the data to memory, while the SWR instruction stores the low-order portion.
The 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address.
Of the word in memory whose most-significant byte is specified by the generated address, only the high-order portion of general purpose register rt is stored to memory within the same word boundary as the target address. Depending on the address specified, the number of bytes stored varies from 1 to 4.
In other words, the most-significant byte of general purpose register rt is first stored to the addressed byte in memory. As long as lower-order bytes remain within the same word boundary, the next byte of the register is stored to the next byte of memory.
No address error exception occurs when the specified address is not located at a word boundary.

Encoding (SWL, Store Word Left):
  Bits    31-26          25-21   20-16   15-0
  Field   SWL (101010)   base    rt      offset
  Width   6              5       5       16

[Figure: example of SWL $24,1($0) in big-endian mode with register $24 = A B C D. Memory bytes 0 to 3 change from 0 1 2 3 to 0 A B C; bytes 4 to 7 are unchanged.]

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_31...2 || 0^2
          endif
          byte ← vAddr_1...0 xor BigEndianCPU^2
          if (vAddr_2 xor BigEndianCPU) = 0 then
              data ← 0^32 || 0^(24-8*byte) || GPR[rt]_31...24-8*byte
          else
              data ← 0^(24-8*byte) || GPR[rt]_31...24-8*byte || 0^32
          endif
          StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_31...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_31...2 || 0^2
          endif
          byte ← vAddr_1...0 xor BigEndianCPU^2
          if (vAddr_2 xor BigEndianCPU) = 0 then
              data ← 0^32 || 0^(24-8*byte) || GPR[rt]_31...24-8*byte
          else
              data ← 0^(24-8*byte) || GPR[rt]_31...24-8*byte || 0^32
          endif
          StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)

The relationships between the register contents given to the SWL instruction and the result (bytes of the word in memory) are shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_2...0  Destination  Type  Offset           Destination  Type  Offset
                                  LEM  BEM                            LEM  BEM
  0            IJKLMNOE     0     0    7           EFGHMNOP     3     4    0
  1            IJKLMNEF     1     0    6           IEFGMNOP     2     4    1
  2            IJKLMEFG     2     0    5           IJEFMNOP     1     4    2
  3            IJKLEFGH     3     0    4           IJKEMNOP     0     4    3
  4            IJKEMNOP     0     4    3           IJKLEFGH     3     0    4
  5            IJEFMNOP     1     4    2           IJKLMEFG     2     0    5
  6            IEFGMNOP     2     4    1           IJKLMNEF     1     0    6
  7            EFGHMNOP     3     4    0           IJKLMNOE     0     0    7

Remark:
  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
  Offset:  pAddr_2...0 output to memory
  LEM:     Little-endian memory (BigEndianMem = 0)
  BEM:     Big-endian memory (BigEndianMem = 1)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SWR    Store Word Right

Format: SWR rt, offset(base)

Description: This instruction is used in combination with the SWL instruction to store the word in a register to a word in memory that is not aligned on a word boundary. The SWL instruction stores the high-order portion of the data to memory, while the SWR instruction stores the low-order portion.
The 16-bit offset is sign-extended and added to the contents of general purpose register base to generate a virtual address. Of the word in memory whose least-significant byte is specified by the generated address, only the low-order portion of general purpose register rt is stored to memory within the same word boundary as the target address. Depending on the address specified, the number of bytes stored varies from 1 to 4.
In other words, the least-significant byte of general purpose register rt is first stored to the addressed byte in memory. As long as higher-order bytes remain within the same word boundary, the next byte of the register is stored to the next byte of memory.
No address error exception occurs when the specified address is not located at a word boundary.

Encoding (SWR, Store Word Right):
  Bits    31-26          25-21   20-16   15-0
  Field   SWR (101110)   base    rt      offset
  Width   6              5       5       16

[Figure: example of SWR $24,4($0) in big-endian mode with register $24 = A B C D. Memory bytes 4 to 7 change from 4 5 6 7 to D 5 6 7; bytes 0 to 3 are unchanged.]

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_31...2 || 0^2
          endif
          byte ← vAddr_1...0 xor BigEndianCPU^2
          if (vAddr_2 xor BigEndianCPU) = 0 then
              data ← 0^32 || GPR[rt]_31-8*byte...0 || 0^(8*byte)
          else
              data ← GPR[rt]_31-8*byte...0 || 0^(8*byte) || 0^32
          endif
          StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
          if BigEndianMem = 0 then
              pAddr ← pAddr_31...2 || 0^2
          endif
          byte ← vAddr_1...0 xor BigEndianCPU^2
          if (vAddr_2 xor BigEndianCPU) = 0 then
              data ← 0^32 || GPR[rt]_31-8*byte...0 || 0^(8*byte)
          else
              data ← GPR[rt]_31-8*byte...0 || 0^(8*byte) || 0^32
          endif
          StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)

The relationships between the register contents given to the SWR instruction and the result (bytes of the word in memory) are shown below:

  Register:  A B C D E F G H
  Memory:    I J K L M N O P

               BigEndianCPU = 0                     BigEndianCPU = 1
  vAddr_2...0  Destination  Type  Offset           Destination  Type  Offset
                                  LEM  BEM                            LEM  BEM
  0            IJKLEFGH     3     0    4           HJKLMNOP     0     7    0
  1            IJKLFGHP     2     1    4           GHKLMNOP     1     6    0
  2            IJKLGHOP     1     2    4           FGHLMNOP     2     5    0
  3            IJKLHNOP     0     3    4           EFGHMNOP     3     4    0
  4            EFGHMNOP     3     4    0           IJKLHNOP     0     3    4
  5            FGHLMNOP     2     5    0           IJKLGHOP     1     2    4
  6            GHKLMNOP     1     6    0           IJKLFGHP     2     1    4
  7            HJKLMNOP     0     7    0           IJKLEFGH     3     0    4

Remark:
  Type:    Access type output to memory (refer to Figure 3-2 Byte Access within a Doubleword.)
  Offset:  pAddr_2...0 output to memory
  LEM:     Little-endian memory (BigEndianMem = 0)
  BEM:     Big-endian memory (BigEndianMem = 1)

Exceptions:
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SYNC    Synchronize

Format: SYNC

Description: The SYNC instruction is executed as a NOP on the VR4300. It is defined to maintain software compatibility with code written for the VR4400.

Encoding (SYNC, Synchronize):
  Bits    31-26              25-6                         5-0
  Field   SPECIAL (000000)   0 (00000000000000000000)     SYNC (001111)
  Width   6                  20                           6

Operation:
  32, 64  T:  SyncOperation ()

Exceptions: None

SYSCALL    System Call

Format: SYSCALL

Description: A system call exception occurs after this instruction is executed, unconditionally transferring control to the exception handler.
A parameter can be passed to the exception handler by using the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

Encoding (SYSCALL, System Call):
  Bits    31-26              25-6    5-0
  Field   SPECIAL (000000)   code    SYSCALL (001100)
  Width   6                  20      6

Operation:
  32, 64  T:  SystemCallException

Exceptions:
  System call exception

TEQ    Trap If Equal

Format: TEQ rs, rt

Description: The contents of general purpose register rt are compared with the contents of general purpose register rs. If the contents of general purpose register rs are equal to the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

TEQ  Trap If Equal
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TEQ (110100), bits 5...0

Operation:

32, 64  T:  if GPR[rs] = GPR[rt] then
                TrapException
            endif

Exceptions:
    Trap exception

Format:
    TEQI rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs. If the contents of general purpose register rs are equal to the sign-extended immediate, a trap exception occurs.

TEQI  Trap If Equal Immediate
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TEQI (01100), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if GPR[rs] = (immediate_15)^16 || immediate_15...0 then
            TrapException
        endif

64  T:  if GPR[rs] = (immediate_15)^48 || immediate_15...0 then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    TGE rs, rt

Description:
The contents of general purpose register rt are compared with the contents of general purpose register rs. Treating both register contents as signed integers, if the contents of general purpose register rs are greater than or equal to the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

TGE  Trap If Greater Than Or Equal
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TGE (110000), bits 5...0

Exceptions:
    Trap exception

Operation:

32, 64  T:  if GPR[rs] ≥ GPR[rt] then
                TrapException
            endif
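The trap and system call descriptions state that a handler wanting the code parameter must load the instruction word as data. The following is a minimal sketch of that extraction, assuming the handler has already read the 32-bit instruction word from memory. The function name `extract_code` and the lookup table are hypothetical helpers introduced for illustration; the field positions and function values are those given in the encodings of this chapter.

```python
# Hypothetical helper: pull the "code" field out of a trap or SYSCALL word.
SPECIAL = 0b000000
SYSCALL_FUNCTION = 0b001100

# Function-field values of the register-form trap instructions in this chapter.
TRAP_FUNCTIONS = {
    0b110000: "TGE",
    0b110001: "TGEU",
    0b110010: "TLT",
    0b110011: "TLTU",
    0b110100: "TEQ",
    0b110110: "TNE",
}


def extract_code(instruction_word):
    """Return (mnemonic, code) for SYSCALL or a register-form trap, else None."""
    opcode = (instruction_word >> 26) & 0x3F
    function = instruction_word & 0x3F
    if opcode != SPECIAL:
        return None
    if function == SYSCALL_FUNCTION:
        # SYSCALL: code occupies bits 25...6 (20 bits).
        return "SYSCALL", (instruction_word >> 6) & 0xFFFFF
    if function in TRAP_FUNCTIONS:
        # Register-form traps: code occupies bits 15...6 (10 bits).
        return TRAP_FUNCTIONS[function], (instruction_word >> 6) & 0x3FF
    return None


# Usage: a TEQ word with rs = 4, rt = 5, code = 7.
teq_word = (4 << 21) | (5 << 16) | (7 << 6) | 0b110100
print(extract_code(teq_word))
```

The immediate-form traps (TEQI, TGEI, and so on) have no code area, so the sketch returns None for them.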
Format:
    TGEI rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs. Treating both values as signed integers, if the contents of general purpose register rs are greater than or equal to the sign-extended immediate, a trap exception occurs.

TGEI  Trap If Greater Than Or Equal Immediate
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TGEI (01000), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if GPR[rs] ≥ (immediate_15)^16 || immediate_15...0 then
            TrapException
        endif

64  T:  if GPR[rs] ≥ (immediate_15)^48 || immediate_15...0 then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    TGEIU rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs. Treating both values as unsigned integers, if the contents of general purpose register rs are greater than or equal to the sign-extended immediate, a trap exception occurs.

TGEIU  Trap If Greater Than Or Equal Immediate Unsigned
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TGEIU (01001), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if (0 || GPR[rs]) ≥ (0 || (immediate_15)^16 || immediate_15...0) then
            TrapException
        endif

64  T:  if (0 || GPR[rs]) ≥ (0 || (immediate_15)^48 || immediate_15...0) then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    TGEU rs, rt

Description:
The contents of general purpose register rt are compared with the contents of general purpose register rs. Treating both values as unsigned integers, if the contents of general purpose register rs are greater than or equal to the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area.
If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

TGEU  Trap If Greater Than Or Equal Unsigned
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TGEU (110001), bits 5...0

Operation:

32, 64  T:  if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
                TrapException
            endif

Exceptions:
    Trap exception

Format:
    TLBP

Description:
Searches for a TLB entry that matches the contents of the EntryHi register and sets the number of that TLB entry in the Index register. If no matching TLB entry is found, the most-significant bit of the Index register is set.
The architecture does not specify the operation of memory references associated with the instruction immediately after a TLBP instruction, nor is the operation specified if more than one TLB entry matches.

TLBP  Probe TLB For Matching Entry
Encoding: COP0 (010000), bits 31...26 | CO (1), bit 25 | 0 (19 bits), bits 24...6 | TLBP (001000), bits 5...0

Operation:

32  T:  Index ← 1 || 0^25 || undefined^6
        for i in 0...TLBEntries-1
            if (TLB[i]_95...77 = EntryHi_31...13) and
               (TLB[i]_76 or (TLB[i]_71...64 = EntryHi_7...0)) then
                Index ← 0^26 || i_5...0
            endif
        endfor

64  T:  Index ← 1 || 0^25 || undefined^6
        for i in 0...TLBEntries-1
            if (TLB[i]_167...141 and not (0^15 || TLB[i]_216...205)) =
               (EntryHi_39...13 and not (0^15 || TLB[i]_216...205)) and
               (TLB[i]_140 or (TLB[i]_135...128 = EntryHi_7...0)) then
                Index ← 0^26 || i_5...0
            endif
        endfor

Exceptions:
    Coprocessor unusable exception

Format:
    TLBR

Description:
The EntryHi and EntryLo registers are loaded with the contents of the TLB entry pointed at by the contents of the Index register. The G bit (which controls ASID matching) read from the TLB is written into both the EntryLo0 and EntryLo1 registers.
The operation is invalid if the contents of the Index register are greater than the number of TLB entries in the processor.

TLBR  Read Indexed TLB Entry
Encoding: COP0 (010000), bits 31...26 | CO (1), bit 25 | 0 (19 bits), bits 24...6 | TLBR (000001), bits 5...0

Operation:

32  T:  PageMask ← TLB[Index_5...0]_127...96
        EntryHi ← TLB[Index_5...0]_95...64 and not TLB[Index_5...0]_127...96
        EntryLo1 ← TLB[Index_5...0]_63...33 || TLB[Index_5...0]_76
        EntryLo0 ← TLB[Index_5...0]_31...1 || TLB[Index_5...0]_76

64  T:  PageMask ← TLB[Index_5...0]_255...192
        EntryHi ← TLB[Index_5...0]_191...128 and not TLB[Index_5...0]_255...192
        EntryLo1 ← TLB[Index_5...0]_127...65 || TLB[Index_5...0]_140
        EntryLo0 ← TLB[Index_5...0]_63...1 || TLB[Index_5...0]_140

Exceptions:
    Coprocessor unusable exception

Format:
    TLBWI

Description:
The TLB entry pointed at by the Index register is loaded with the contents of the EntryHi and EntryLo registers. The G bit of the TLB entry is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 registers.
The operation is invalid if the contents of the Index register are greater than the number of TLB entries in the processor.

TLBWI  Write Indexed TLB Entry
Encoding: COP0 (010000), bits 31...26 | CO (1), bit 25 | 0 (19 bits), bits 24...6 | TLBWI (000010), bits 5...0

Operation:

32, 64  T:  TLB[Index_5...0] ← PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0

Exceptions:
    Coprocessor unusable exception

Format:
    TLBWR

Description:
The TLB entry pointed at by the Random register is loaded with the contents of the EntryHi and EntryLo registers. The G bit of the TLB entry is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 registers.
TLBWR  Write Random TLB Entry
Encoding: COP0 (010000), bits 31...26 | CO (1), bit 25 | 0 (19 bits), bits 24...6 | TLBWR (000110), bits 5...0

Operation:

32, 64  T:  TLB[Random_5...0] ← PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0

Exceptions:
    Coprocessor unusable exception

Format:
    TLT rs, rt

Description:
The contents of general purpose register rt are compared with the contents of general purpose register rs. Treating both values as signed integers, if the contents of general purpose register rs are less than the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

TLT  Trap If Less Than
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TLT (110010), bits 5...0

Operation:

32, 64  T:  if GPR[rs] < GPR[rt] then
                TrapException
            endif

Exceptions:
    Trap exception

Format:
    TLTI rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs. Treating both values as signed integers, if the contents of general purpose register rs are less than the sign-extended immediate, a trap exception occurs.

TLTI  Trap If Less Than Immediate
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TLTI (01010), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if GPR[rs] < (immediate_15)^16 || immediate_15...0 then
            TrapException
        endif

64  T:  if GPR[rs] < (immediate_15)^48 || immediate_15...0 then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    TLTIU rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs.
Treating both values as unsigned integers, if the contents of general purpose register rs are less than the sign-extended immediate, a trap exception occurs.

TLTIU  Trap If Less Than Immediate Unsigned
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TLTIU (01011), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if (0 || GPR[rs]) < (0 || (immediate_15)^16 || immediate_15...0) then
            TrapException
        endif

64  T:  if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15...0) then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    TLTU rs, rt

Description:
The contents of general purpose register rt are compared with the contents of general purpose register rs. Treating both values as unsigned integers, if the contents of general purpose register rs are less than the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.

TLTU  Trap If Less Than Unsigned
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TLTU (110011), bits 5...0

Operation:

32, 64  T:  if (0 || GPR[rs]) < (0 || GPR[rt]) then
                TrapException
            endif

Exceptions:
    Trap exception

Format:
    TNE rs, rt

Description:
The contents of general purpose register rt are compared with the contents of general purpose register rs. If the contents of general purpose register rs are not equal to the contents of general purpose register rt, a trap exception occurs.
A parameter can be passed to the exception handler through the code area. If the exception handler uses this parameter, the contents of the memory word containing the instruction must be loaded as data.
TNE  Trap If Not Equal
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | code (10 bits), bits 15...6 | TNE (110110), bits 5...0

Operation:

32, 64  T:  if GPR[rs] ≠ GPR[rt] then
                TrapException
            endif

Exceptions:
    Trap exception

Format:
    TNEI rs, immediate

Description:
The 16-bit immediate is sign-extended and compared with the contents of general purpose register rs. If the contents of general purpose register rs are not equal to the sign-extended immediate, a trap exception occurs.

TNEI  Trap If Not Equal Immediate
Encoding: REGIMM (000001), bits 31...26 | rs, bits 25...21 | TNEI (01110), bits 20...16 | immediate, bits 15...0

Operation:

32  T:  if GPR[rs] ≠ (immediate_15)^16 || immediate_15...0 then
            TrapException
        endif

64  T:  if GPR[rs] ≠ (immediate_15)^48 || immediate_15...0 then
            TrapException
        endif

Exceptions:
    Trap exception

Format:
    XOR rd, rs, rt

Description:
The contents of general purpose register rs and the contents of general purpose register rt are combined in a bit-wise logical exclusive OR operation. The result is stored in general purpose register rd.

XOR  Exclusive OR
Encoding: SPECIAL (000000), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | rd, bits 15...11 | 0 (00000), bits 10...6 | XOR (100110), bits 5...0

Operation:

32, 64  T:  GPR[rd] ← GPR[rs] xor GPR[rt]

Exceptions:
    None

Format:
    XORI rt, rs, immediate

Description:
The 16-bit zero-extended immediate and the contents of general purpose register rs are combined in a bit-wise logical exclusive OR operation. The result is stored in general purpose register rt.

XORI  Exclusive OR Immediate
Encoding: XORI (001110), bits 31...26 | rs, bits 25...21 | rt, bits 20...16 | immediate, bits 15...0

Operation:

32  T:  GPR[rt] ← GPR[rs] xor (0^16 || immediate)

64  T:  GPR[rt] ← GPR[rs] xor (0^48 || immediate)

Exceptions:
    None

16.7 CPU Instruction Opcode Bit Encoding

Figure 16-1 lists the VR4300 opcode bit encoding.
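Each table in Figure 16-1 is indexed by the upper bits of a field as the row and the lower three bits as the column. As a minimal sketch of that lookup, the following splits an instruction word into (row, column) pairs for the opcode field and, where applicable, the SPECIAL function or REGIMM rt field. The function name `figure_16_1_indices` and the dictionary keys are hypothetical; only the bit ranges come from the figure.

```python
# Hypothetical helper: locate an instruction word in the Figure 16-1 tables.
def figure_16_1_indices(word):
    """Return (row, column) pairs for the encoding fields of a 32-bit word."""
    opcode = (word >> 26) & 0x3F
    # Opcode table: row = bits 31...29, column = bits 28...26.
    indices = {"opcode": (opcode >> 3, opcode & 0x7)}
    if opcode == 0b000000:  # SPECIAL
        function = word & 0x3F
        # SPECIAL function table: row = bits 5...3, column = bits 2...0.
        indices["special_function"] = (function >> 3, function & 0x7)
    elif opcode == 0b000001:  # REGIMM
        rt = (word >> 16) & 0x1F
        # REGIMM rt table: row = bits 20...19, column = bits 18...16.
        indices["regimm_rt"] = (rt >> 3, rt & 0x7)
    return indices


# Usage: XOR has SPECIAL function 100110, i.e. row 4, column 6.
print(figure_16_1_indices(0b100110))
```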
Figure 16-1 VR4300 Opcode Bit Encoding (1/2)

[Figure: encoding tables for Opcode (rows bits 31...29, columns bits 28...26), SPECIAL function (rows bits 5...3, columns bits 2...0), REGIMM rt (rows bits 20...19, columns bits 18...16), and COPz rs (rows bits 25...24, columns bits 23...21).]

Figure 16-1 VR4300 Opcode Bit Encoding (2/2)

Key:
    *  Operation codes marked with an asterisk cause a reserved instruction exception when executed on the current VR4300. These codes are reserved for future expansion.
    γ  Operation codes marked with a gamma cause a reserved instruction exception. They are reserved for future expansion.
    δ  Operation codes marked with a delta are valid only for VR4000 processors with CP0 enabled, and cause a reserved instruction exception on other processors.
    φ  Operation codes marked with a phi are invalid but do not cause reserved instruction exceptions in VR4300 operation.
    ξ  Operation codes marked with a xi cause a reserved instruction exception on only VR4300 processors.
    χ  Operation codes marked with a chi are valid only on VR4000 series processors.
    ε  Operation codes marked with an epsilon are valid in the 64-bit mode and 32-bit kernel mode.
       In the 32-bit user or supervisor mode, these codes generate the reserved instruction exception.

[Figure: encoding tables for COPz rt (rows bits 20...19, columns bits 18...16), containing BCF, BCT, BCFL, and BCTL, and for CP0 function (rows bits 5...3, columns bits 2...0), containing TLBR, TLBWI, TLBWR, TLBP, and ERET.]

Chapter 17  FPU Instruction Set Details

This chapter provides a detailed description of each floating-point unit (FPU) instruction in alphabetical order.

17.1 Instruction Formats

There are three basic instruction format types:
    I-type, or immediate format, which includes load and store instructions
    R-type, or register format, which includes the two- and three-register floating-point instructions
    Other, which includes branch instructions and transfer to and from instructions

The instruction description subsections that follow show how these three basic instruction formats are used by:
    Load and store instructions
    Transfer instructions
    Floating-point arithmetic instructions
    Floating-point branch instructions

Floating-point instructions are mapped onto the MIPS coprocessor instructions, defining coprocessor unit number one (CP1) as the floating-point unit.
Each operation is valid only for certain formats. Implementations may support some of these formats and operations through emulation, but they only need to support combinations that are valid (marked V in Table 17-1). Combinations marked R in Table 17-1 are not currently specified by this architecture, and cause an unimplemented instruction exception. They will be available for future extensions of the architecture.
Table 17-1 Valid FPU Instruction Formats

                 Source Format
    Operation    Single   Double   Word   Longword
    ADD          V        V        R      R
    SUB          V        V        R      R
    MUL          V        V        R      R
    DIV          V        V        R      R
    SQRT         V        V        R      R
    ABS          V        V        R      R
    MOV          V        V
    NEG          V        V        R      R
    TRUNC.L      V        V
    ROUND.L      V        V
    CEIL.L       V        V
    FLOOR.L      V        V
    TRUNC.W      V        V
    ROUND.W      V        V
    CEIL.W       V        V
    FLOOR.W      V        V
    CVT.S                 V        V      V
    CVT.D        V                 V      V
    CVT.W        V        V
    CVT.L        V        V
    C            V        V        R      R

Because the FPU branch instructions can test a condition with its logic reversed, only 16 comparisons are needed to cover all 32 conditions, as shown in Table 17-2.

Table 17-2 Logical Reverse of Predicates by Condition True/False

    Condition                  Relations                               Invalid Operation
    Mnemonic                   Greater   Less                          Exception If
    True    False    Code      Than      Than    Equal   Unordered     Unordered
    F       T        0         F         F       F       F             No
    UN      OR       1         F         F       F       T             No
    EQ      NEQ      2         F         F       T       F             No
    UEQ     OGL      3         F         F       T       T             No
    OLT     UGE      4         F         T       F       F             No
    ULT     OGE      5         F         T       F       T             No
    OLE     UGT      6         F         T       T       F             No
    ULE     OGT      7         F         T       T       T             No
    SF      ST       8         F         F       F       F             Yes
    NGLE    GLE      9         F         F       F       T             Yes
    SEQ     SNE      10        F         F       T       F             Yes
    NGL     GL       11        F         F       T       T             Yes
    LT      NLT      12        F         T       F       F             Yes
    NGE     GE       13        F         T       F       T             Yes
    LE      NLE      14        F         T       T       F             Yes
    NGT     GT       15        F         T       T       T             Yes

    Remark  F: False
            T: True

Floating-Point Loads, Stores, and Transfers

All movement of data between the floating-point unit (FPU) and memory is accomplished by load and store instructions, which reference the FPU general purpose registers. These instructions are unformatted; no format conversions are performed and, therefore, no floating-point exceptions can occur due to these instructions.
Data may also be moved directly between the FPU and the processor by move to coprocessor (MTC) and move from coprocessor (MFC) instructions. Like the floating-point load and store instructions, these instructions perform no format conversions and never cause floating-point exceptions.
In addition, two floating-point control registers are available as FPU registers. These registers support only the CTC1 and CFC1 instructions.

Floating-Point Operations

The floating-point unit instruction set includes:
    Floating-point add
    Floating-point subtract
    Floating-point multiply
    Floating-point divide
    Floating-point square root
    Convert between fixed-point and floating-point formats
    Convert between floating-point formats
    Floating-point compare

These operations satisfy the accuracy requirements of IEEE Standard 754. Specifically, each operation obtains a result identical to an infinite-precision result rounded to the specified format, using the current rounding mode.
Instructions must specify the format of their operands. Except for conversion functions, mixed-format operations cannot be performed.

17.2 Instruction Notation Conventions

In this chapter, all variable subfields in an instruction format (such as fs, ft, immediate, and so on) are shown in lowercase. Instruction names (such as ADD, SUB, and so on) are shown in uppercase.
For the sake of clarity, we sometimes use an alias for a variable subfield in the formats of specific instructions. For example, we use rs = base in the format for load and store instructions. Such an alias is always lowercase, since it refers to a variable subfield.
In some instructions, the instruction subfields op and function have fixed 6-bit values. These instructions use uppercase mnemonics. For instance, in the floating-point ADD instruction we use op = COP1 and function = FADD. In other cases, a single field has both fixed and variable subfields, so the name contains both uppercase and lowercase characters.
The actual codes of all the mnemonics and the codes in the function fields are given in 17.6 FPU Instruction Opcode Bit Encoding.
The description of the operation of each instruction explains the operation it executes, using a high-level language representation. For the meanings of the special symbols in the description, refer to Table 16-1 CPU Instruction Operation Notations.

Instruction Notation Examples

The following examples illustrate the application of some of the instruction notations:

Example #1:
    GPR[rt] ← immediate || 0^16
    Sixteen zero bits are concatenated to the low-order side of an immediate value (typically 16 bits), and the 32-bit string is assigned to general purpose register rt.

Example #2:
    (immediate_15)^16 || immediate_15...0
    Bit 15 (the sign bit) of an immediate value is extended for 16 bit positions, and the result is concatenated with bits 15 through 0 of the immediate value to form a 32-bit sign-extended value.

Example #3:
    CPR[1, ft] ← data
    Data is assigned to general purpose register ft of CP1, in other words to floating-point general purpose register FGR.

17.3 Load and Store Instructions

In the VR4300 implementation, the instruction immediately following a load may use the contents of the register being loaded. In such cases, the hardware interlocks for the number of cycles required for the read. Scheduling load delay slots is therefore not required for functional code, but it is still desirable when performance is regarded as the most significant factor or when compatibility with the VR3000 series is required.
The operation of the load and store instructions depends on the width of the FGRs.
When the FR bit in the Status register equals zero, the floating-point general purpose registers (FGRs) are 32 bits wide. To retain single-precision floating-point format data, the sixteen even-numbered registers of the thirty-two FGRs can be accessed.
To retain double-precision floating-point format data, even-numbered registers are used for the low-order bits of the data and odd-numbered registers for the high-order bits. The registers are used as even-odd pairs and can retain sixteen double-precision format data items.
When the FR bit in the Status register equals one, the floating-point general purpose registers (FGRs) are 64 bits wide. To retain single-precision floating-point format data, the low-order 32 bits of the thirty-two FGRs are used. To retain double-precision floating-point format data, the thirty-two FGRs are used.
In the load and store operation descriptions, the functions listed in Table 17-3 are used to summarize the handling of virtual addresses and physical memory.
Figure 17-1 shows the I-type instruction format used by load and store instructions.

Figure 17-1 Load and Store Instruction Format

All coprocessor loads and stores reference data that is aligned on its natural boundary. Thus, for word loads and stores, the access type field is always WORD, and the low-order two bits of the address must always be zero. For doubleword loads and stores, the access type field is always DOUBLEWORD, and the low-order three bits of the address must always be zero.
Regardless of byte-numbering order (endianness), the address specifies the byte that has the smallest byte address in the accessed field. For a big-endian system, this is the leftmost byte; for a little-endian system, this is the rightmost byte.

Table 17-3 Load and Store Instructions Common Functions

    Function             Meaning
    AddressTranslation   Uses the TLB to find the physical address corresponding to the virtual address. The function fails and a TLB miss exception occurs if the required translation is not present in the TLB.
    LoadMemory           Searches the cache and main memory for the contents of the specified physical address at the specified data length (doubleword or word), and loads the contents. If the cache is enabled, the contents are loaded to the cache.
    StoreMemory          Searches the cache, write buffer, and main memory, and stores the contents at the specified physical address with the specified data length (doubleword or word).

I-Type (Immediate) format fields:
    op, bits 31...26 | base, bits 25...21 | ft, bits 20...16 | offset, bits 15...0

    op      is a 6-bit opcode
    base    is the 5-bit base register specifier
    ft      is a 5-bit source (for stores) or destination (for loads) FPU register specifier
    offset  is the 16-bit signed immediate offset

17.4 Floating-Point Computational Instructions

Computational instructions include all of the floating-point computational operations performed by the FPU. Table 17-5 lists all floating-point computational instructions.
Figure 17-2 shows the R-type instruction format used for computational operations.

Figure 17-2 Computational Instruction Format

R-Type (Register) format fields:
    COP1, bits 31...26 | fmt, bits 25...21 | ft, bits 20...16 | fs, bits 15...11 | fd, bits 10...6 | function, bits 5...0

    COP1      is a 6-bit opcode
    fmt       is a 5-bit format specifier
    fs        is a 5-bit source1 register
    ft        is a 5-bit source2 register
    fd        is a 5-bit destination register
    function  is a 6-bit function field

The function field indicates the floating-point operation to be performed.
Each floating-point instruction can be applied to a number of operand formats. The operand format for an instruction is specified by the 5-bit format field (fmt); decoding for this field is shown in Table 17-4.

Table 17-4 Format Field Decoding

    Code       Mnemonic   Size               Format
    16         S          Single (32 bits)   Binary floating-point
    17         D          Double (64 bits)   Binary floating-point
    18                    Reserved
    19                    Reserved
    20         W          32 bits            Binary fixed-point
    21         L          64 bits            Binary fixed-point
    22 to 31              Reserved
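The R-type field layout of Figure 17-2 and the fmt decoding of Table 17-4 can be combined into a small decoder. The following is a minimal sketch, assuming a 32-bit instruction word is supplied as an integer; the names `FpuRType` and `decode_fpu_rtype` are hypothetical, and the sketch rejects fmt codes that Table 17-4 lists as reserved or does not list at all.

```python
from typing import NamedTuple

COP1 = 0b010001  # 6-bit COP1 opcode used in the FPU encodings of this chapter

# fmt codes from Table 17-4; other codes are treated as invalid here.
FMT_MNEMONICS = {16: "S", 17: "D", 20: "W", 21: "L"}


class FpuRType(NamedTuple):
    fmt: str
    ft: int
    fs: int
    fd: int
    function: int


def decode_fpu_rtype(word):
    """Split a COP1 R-type word into its Figure 17-2 fields."""
    if (word >> 26) & 0x3F != COP1:
        raise ValueError("opcode is not COP1")
    fmt_code = (word >> 21) & 0x1F
    if fmt_code not in FMT_MNEMONICS:
        raise ValueError("fmt code %d is not a defined operand format" % fmt_code)
    return FpuRType(
        fmt=FMT_MNEMONICS[fmt_code],
        ft=(word >> 16) & 0x1F,
        fs=(word >> 11) & 0x1F,
        fd=(word >> 6) & 0x1F,
        function=word & 0x3F,
    )


# Usage: ADD.S with fd = 2, fs = 4, ft = 6 (function code 0).
add_s = (COP1 << 26) | (16 << 21) | (6 << 16) | (4 << 11) | (2 << 6) | 0
print(decode_fpu_rtype(add_s))
```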
Table 17-5 Floating-Point Computational Instructions and Operations

    Code (5:0)   Mnemonic   Operation
    0            ADD        Add
    1            SUB        Subtract
    2            MUL        Multiply
    3            DIV        Divide
    4            SQRT       Square root
    5            ABS        Absolute value
    6            MOV        Transfer
    7            NEG        Sign reverse
    8            ROUND.L    Convert to 64-bit fixed-point, rounded to nearest/even
    9            TRUNC.L    Convert to 64-bit fixed-point, rounded toward zero
    10           CEIL.L     Convert to 64-bit fixed-point, rounded to +∞
    11           FLOOR.L    Convert to 64-bit fixed-point, rounded to -∞
    12           ROUND.W    Convert to 32-bit fixed-point, rounded to nearest/even
    13           TRUNC.W    Convert to 32-bit fixed-point, rounded toward zero
    14           CEIL.W     Convert to 32-bit fixed-point, rounded to +∞
    15           FLOOR.W    Convert to 32-bit fixed-point, rounded to -∞
    16 to 31                Reserved
    32           CVT.S      Convert to single floating-point
    33           CVT.D      Convert to double floating-point
    34                      Reserved
    35                      Reserved
    36           CVT.W      Convert to 32-bit fixed-point
    37           CVT.L      Convert to 64-bit fixed-point
    38 to 47                Reserved
    48 to 63     C          Floating-point compare

In the following pages, the notation FGR means the 32 FPU general purpose registers FGR0 through FGR31, and FPR refers to the floating-point registers of the FPU.
An FGR (in some places described as a CPR instead) is used by the load/store instructions and by the data transfer instructions to and from the CPU. An FPR is used by the transfer, arithmetic, and conversion instructions within CP1.
When the FR bit in the Status register (bit 26) equals zero, only the even-numbered floating-point registers are valid and the 32 FGRs are 32 bits wide.
When the FR bit in the Status register (bit 26) equals one, both odd- and even-numbered FPRs can be used and the 32 FGRs are 64 bits wide.
The following routines are used in the description of the floating-point operations to retrieve the value of an FPR or to change the value of an FGR:

32-bit mode

    value ← ValueFPR(fpr, fmt)
        /* undefined for odd fpr */
        case fmt of
            S, W:
                value ← FGR[fpr+0]
            D:
                value ← FGR[fpr+1] || FGR[fpr+0]
        end

    StoreFPR(fpr, fmt, value):
        /* undefined for odd fpr */
        case fmt of
            S, W:
                FGR[fpr+1] ← undefined
                FGR[fpr+0] ← value
            D:
                FGR[fpr+1] ← value_63...32
                FGR[fpr+0] ← value_31...0
        end

64-bit mode

    value ← ValueFPR(fpr, fmt)
        case fmt of
            S, W:
                value ← FGR[fpr]_31...0
            D, L:
                value ← FGR[fpr]
        end

    StoreFPR(fpr, fmt, value):
        case fmt of
            S, W:
                FGR[fpr] ← undefined^32 || value
            D, L:
                FGR[fpr] ← value
        end

17.5 FPU Instructions

This section describes the floating-point (FPU) instructions in detail.
The exceptions that may occur as a result of executing each instruction are listed at the end of the description of each instruction. For the details of the exceptions and exception processing, refer to Chapter 8 Floating-Point Exceptions.

Format:
    ABS.fmt fd, fs

Description:
The absolute value of the contents of floating-point register fs is taken and the result is stored in floating-point register fd. The operand is processed in the floating-point format fmt.
The absolute value operation is performed arithmetically. Therefore, if the operand is NaN, the invalid operation exception occurs.
This instruction is valid only in the single- and double-precision floating-point formats. If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both odd and even register numbers are valid.
ABS.fmt  Floating-Point Absolute Value
Encoding: COP1 (010001), bits 31...26 | fmt, bits 25...21 | 0 (00000), bits 20...16 | fs, bits 15...11 | fd, bits 10...6 | ABS (000101), bits 5...0

Operation:

32, 64  T:  StoreFPR (fd, fmt, AbsoluteValue (ValueFPR (fs, fmt)))

Exceptions:
    Coprocessor unusable exception
    Floating-point exception

Floating-Point Exceptions:
    Unimplemented operation exception
    Invalid operation exception

Format:
    ADD.fmt fd, fs, ft

Description:
The contents of floating-point registers fs and ft are added, and the result is stored in floating-point register fd. The operand is processed in the floating-point format fmt. The operation is executed as if the accuracy were infinite, and the result is rounded according to the current rounding mode.
This instruction is valid only in the single- and double-precision floating-point formats. If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both odd and even register numbers are valid.

ADD.fmt  Floating-Point Add
Encoding: COP1 (010001), bits 31...26 | fmt, bits 25...21 | ft, bits 20...16 | fs, bits 15...11 | fd, bits 10...6 | ADD (000000), bits 5...0

Operation:

32, 64  T:  StoreFPR (fd, fmt, ValueFPR (fs, fmt) + ValueFPR (ft, fmt))

Exceptions:
    Coprocessor unusable exception
    Floating-point exception

Floating-Point Exceptions:
    Unimplemented operation exception
    Invalid operation exception
    Inexact operation exception
    Overflow exception
    Underflow exception

Format:
    BC1F offset

Description:
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
If the CPz condition signal sampled while the immediately preceding instruction is being executed is false (0), the program branches to the branch target address, with a delay of one instruction.
Because the result of the comparison is sampled while the immediately preceding instruction is executed, at least one instruction must be inserted between the floating-point compare instruction and this instruction.

BC1F  Branch On FPU False (Coprocessor 1)
Encoding: COP1 (010001), bits 31...26 | BC (01000), bits 25...21 | BCF (00000), bits 20...16 | offset, bits 15...0

Operation:

32  T-1:  condition ← not COC[1]
    T:    target ← (offset_15)^14 || offset || 0^2
    T+1:  if condition then
              PC ← PC + target
          endif

64  T-1:  condition ← not COC[1]
    T:    target ← (offset_15)^46 || offset || 0^2
    T+1:  if condition then
              PC ← PC + target
          endif

Exceptions:
    Coprocessor unusable exception

Format:
    BC1FL offset

Description:
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the CPz condition signal sampled while the immediately preceding instruction is being executed is false (0), the program branches to the branch target address, with a delay of one instruction.
If the branch is not taken, the instruction in the branch delay slot is nullified.
Because the result of the comparison is sampled while the immediately preceding instruction is executed, at least one instruction must be inserted between the floating-point compare instruction and this instruction.

BC1FL  Branch On FPU False Likely (Coprocessor 1)
Encoding: COP1 (010001), bits 31...26 | BC (01000), bits 25...21 | BCFL (00010), bits 20...16 | offset, bits 15...0

Exceptions:
    Coprocessor unusable exception

Operation:

64  T-1:  condition ← not COC[1]
    T:    target ← (offset_15)^46 || offset || 0^2
    T+1:  if condition then
              PC ← PC + target
          else
              NullifyCurrentInstruction
          endif

32  T-1:  condition ← not COC[1]
    T:    target ←
(offset 15 ) 14 || offset || 0 2 t+1: if condition then pc ? pc + target else nullifycurrentinstruction endif user? manual u10504ej7v0um00 563 fpu instruction set details format: bc1t offset description: a branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted left two bits and sign-extended. if the cpz condition signal sampled while the instruction immediately preceding is being executed is true (1), the program branches to the branch target address, with a delay of one instruction. because the result of comparison is sampled while the instruction immediately preceding is executed, at least one instruction must be inserted in between the floating-point compare instruction and this instruction. operation: exceptions: coprocessor unusable exception 25 bc1t branch on fpu true 31 26 cop1 6 0 16 offset (coprocessor 1) 0 1 0 0 0 1 bc1t 16 15 5 bc bct 5 21 20 0 1 0 0 0 0 0 0 0 1 32 t?: condition ? coc[1] t: target ? (offset 15 ) 14 || offset || 0 2 t+1: if condition then pc ? pc + target endif 64 t?: condition ? coc[1] t: target ? (offset 15 ) 46 || offset || 0 2 t+1: if condition then pc ? pc + target endif chapter 17 564 user? manual u10504ej7v0um00 format: bc1tl offset description: a branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset , shifted left two bits and sign-extended. if the result of the last floating-point compare is true (1), the program branches to the branch target address, with a delay of one instruction. if the branch is not taken, the instruction in the branch delay slot is nullified. because the result of comparison is sampled while the instruction immediately preceding is executed, at least one instruction must be inserted in between the floating-point compare instruction and this instruction. 
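The target computation shared by the four FPU branch instructions can be illustrated with a small model. This is only a sketch under stated assumptions: the function names are hypothetical, instructions are taken to be 4 bytes wide (so the delay slot sits at the branch address plus 4), and the result is simply wrapped to the address width.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc: int, offset: int, bits: int = 32) -> int:
    """Model of the target address: delay-slot address + (sign-extended offset << 2).

    branch_pc is the address of the branch instruction itself (assumption:
    the delay slot is the next 4-byte instruction).
    """
    delay_slot_pc = branch_pc + 4
    displacement = sign_extend_16(offset) << 2  # offset counts instructions, not bytes
    return (delay_slot_pc + displacement) & ((1 << bits) - 1)


# Forward branch by 3 instructions, measured from the delay slot.
print(hex(branch_target(0x80001000, 0x0003)))
# Offset -1 lands back on the branch instruction itself.
print(hex(branch_target(0x80001000, 0xFFFF)))
```

Because the offset is relative to the delay slot rather than to the branch, an offset of -1 targets the branch instruction again.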
bc1tl: branch on FPU true likely (coprocessor 1)
    bits    31..26   25..21   20..16   15..0
    field   cop1     bc       bctl     offset
    value   010001   01000    00011    -

Operation:
    32      t-1:  condition ← coc[1]
            t:    target ← (offset_15)^14 || offset || 0^2
            t+1:  if condition then
                      pc ← pc + target
                  else
                      nullifycurrentinstruction
                  endif
    64      t-1:  condition ← coc[1]
            t:    target ← (offset_15)^46 || offset || 0^2
            t+1:  if condition then
                      pc ← pc + target
                  else
                      nullifycurrentinstruction
                  endif

Exceptions:
    coprocessor unusable exception

Format: c.cond.fmt fs, ft

Description:
Compares the contents of floating-point register fs with those of floating-point register ft based on compare condition cond, and sets the result to condition signal coc[1]. The operands are processed in the floating-point format fmt. If one of the values is NaN and the most-significant bit of compare condition cond is set, the invalid operation exception occurs. The result of the comparison is tested by the FPU branch instructions. At least one instruction is necessary between this instruction and the FPU branch instruction.
Comparison is performed normally, and does not overflow or underflow. One of four mutually exclusive relations results: "less than", "equal to", "greater than", or "cannot be compared". If one or both of the operands are NaN, the result of the comparison is always "cannot be compared".
During comparison, the sign of 0 is ignored (+0 = -0).
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
* See 17.6 FPU instruction opcode bit encoding.
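The way the 4-bit cond field selects among the relations can be sketched as follows. This is a minimal model that mirrors the operation pseudocode given for c.cond.fmt; the function name and the returned tuple are assumptions made for illustration, and Python floats stand in for the FPU operands.

```python
import math
from typing import Tuple


def c_cond(fs: float, ft: float, cond: int) -> Tuple[bool, bool]:
    """Return (condition, invalid_operation) for a 4-bit cond field.

    cond bit 2 selects "less than", bit 1 "equal to", bit 0 "cannot be
    compared" (unordered); bit 3 requests an invalid operation exception
    when either operand is NaN.
    """
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        invalid = bool(cond & 0b1000)
    else:
        less, equal, unordered = fs < ft, fs == ft, False
        invalid = False
    condition = (
        (bool(cond & 0b0100) and less)
        or (bool(cond & 0b0010) and equal)
        or (bool(cond & 0b0001) and unordered)
    )
    return condition, invalid


# "less than or equal" is cond bits 2 and 1 together.
print(c_cond(1.0, 2.0, 0b0110))
# The sign of zero is ignored, so +0 compares equal to -0.
print(c_cond(0.0, -0.0, 0b0010))
```

The "greater than" relation has no select bit of its own: it is the case in which less, equal, and unordered are all false.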
c.cond.fmt: floating-point compare
    bits    31..26   25..21   20..16   15..11   10..6   5..4   3..0
    field   cop1     fmt      ft       fs       0       fc*    cond*
    value   010001   -        -        -        00000   11     -

Operation:
    32, 64  t:    if nan (valuefpr (fs, fmt)) or nan (valuefpr (ft, fmt)) then
                      less ← false
                      equal ← false
                      unordered ← true
                      if cond_3 then
                          signal invalidoperationexception
                      endif
                  else
                      less ← valuefpr (fs, fmt) < valuefpr (ft, fmt)
                      equal ← valuefpr (fs, fmt) = valuefpr (ft, fmt)
                      unordered ← false
                  endif
                  condition ← (cond_2 and less) or (cond_1 and equal) or (cond_0 and unordered)
                  fcr[31]_23 ← condition
                  coc[1] ← condition

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    unimplemented operation exception
    invalid operation exception

Format: ceil.l.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a 64-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded toward the +∞ direction, regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^63 - 1 is returned.
This operation is defined in the 64-bit mode and 32-bit kernel mode. If this instruction is executed during 32-bit user/supervisor mode, a reserved instruction exception occurs.

ceil.l.fmt: floating-point ceiling to long fixed-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      ceil.l
    value   010001   -        00000    -        -       001010

Operation:
    32, 64  t:    storefpr (fd, l, convertfmt (valuefpr (fs, fmt), fmt, l))

Exceptions:
    coprocessor unusable exception
    floating-point exception
    reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

Format: ceil.w.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a 32-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded toward the +∞ direction, regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^31 - 1 is returned.

ceil.w.fmt: floating-point ceiling to single fixed-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      ceil.w
    value   010001   -        00000    -        -       001110

Operation:
    32, 64  t:    storefpr (fd, w, convertfmt (valuefpr (fs, fmt), fmt, w))

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

Format: cfc1 rt, fs

Description:
The contents of floating-point control register fs are loaded into general purpose register rt. This instruction is only defined when fs equals 0 or 31.
The contents of general purpose register rt are undefined while the instruction immediately following this load instruction is being executed.

cfc1: move control word from FPU (coprocessor 1)
    bits    31..26   25..21   20..16   15..11   10..0
    field   cop1     cf       rt       fs       0
    value   010001   00010    -        -        00000000000

Operation:
    32      t:    temp ← fcr[fs]
            t+1:  gpr[rt] ← temp
    64      t:    temp ← fcr[fs]
            t+1:  gpr[rt] ← (temp_31)^32 || temp

Exceptions:
    coprocessor unusable exception

Format: ctc1 rt, fs

Description:
The contents of general purpose register rt are loaded to floating-point control register fs. This instruction is defined only if fs is 0 or 31.
If a cause bit of the floating-point control/status register (fcr31) and the corresponding enable bit are both set by writing data to fcr31, the floating-point exception occurs. The data is written to the register before the exception occurs.
The contents of floating-point control register fs are undefined while the instruction immediately following this instruction is executed.

ctc1: move control word to FPU (coprocessor 1)
    bits    31..26   25..21   20..16   15..11   10..0
    field   cop1     ct       rt       fs       0
    value   010001   00110    -        -        00000000000

Operation:
    32      t:    temp ← gpr[rt]
            t+1:  fcr[fs] ← temp
                  coc[1] ← fcr[31]_23
    64      t:    temp ← gpr[rt]_31...0
            t+1:  fcr[fs] ← temp
                  coc[1] ← fcr[31]_23

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    division by zero exception
    inexact operation exception
    overflow exception
    underflow exception

Format: cvt.d.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a double-precision floating-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
This instruction is valid only for conversion from the single-precision floating-point format, and the 32-bit or 64-bit fixed-point format. When the source is in the single-precision floating-point format or the 32-bit fixed-point format, the conversion is executed exactly, without loss of accuracy.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.

cvt.d.fmt: floating-point convert to double floating-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      cvt.d
    value   010001   -        00000    -        -       100001

Operation:
    32, 64  t:    storefpr (fd, d, convertfmt (valuefpr (fs, fmt), fmt, d))

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Conversion from floating-point format to fixed-point format:
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.
Conversion from fixed-point format to floating-point format:
Essentially, if 64-bit fixed-point format data in which any of bits 55 to 62 is 1 is converted to floating-point format data, an unimplemented operation exception will occur.

Format: cvt.l.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a 64-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^63 - 1 is returned.
This operation is defined in the 64-bit mode and 32-bit kernel mode. If this instruction is executed during 32-bit user/supervisor mode, a reserved instruction exception occurs.

cvt.l.fmt: floating-point convert to long fixed-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      cvt.l
    value   010001   -        00000    -        -       100101

Operation:
    64      t:    storefpr (fd, l, convertfmt (valuefpr (fs, fmt), fmt, l))
Remark: Same operation in the 32-bit kernel mode.

Exceptions:
    coprocessor unusable exception
    floating-point exception
    reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

Format: cvt.s.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a single-precision floating-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt. The result of the conversion is rounded according to the current rounding mode.
This instruction is valid only for conversion from the double-precision floating-point format, and the 32-bit or 64-bit fixed-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.

cvt.s.fmt: floating-point convert to single floating-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      cvt.s
    value   010001   -        00000    -        -       100000

Operation:
    32, 64  t:    storefpr (fd, s, convertfmt (valuefpr (fs, fmt), fmt, s))

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception
    underflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Conversion from floating-point format to fixed-point format:
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.
Conversion from fixed-point format to floating-point format:
Essentially, if 64-bit fixed-point format data in which any of bits 55 to 62 is 1 is converted to floating-point format data, an unimplemented operation exception will occur.

Format: cvt.w.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a 32-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^31 - 1 is returned.
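The range rule in the cvt.w.fmt description can be illustrated with a short model. This is a sketch under several assumptions, not the processor's behavior in full: the function name and the invalid_enabled parameter are hypothetical, Python's round() (round half to even) stands in for a round-to-nearest rounding mode, and either an infinite/NaN source or an out-of-range result is treated as the invalid case. It follows only the description paragraph; the restrictions listed for this instruction state that the VR4300 raises an unimplemented operation exception in these cases.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def cvt_w(value: float, invalid_enabled: bool = False) -> int:
    """Model conversion of a float to a 32-bit fixed-point (integer) value."""
    if math.isnan(value) or math.isinf(value):
        in_range = False
        rounded = 0  # placeholder, never returned
    else:
        rounded = round(value)  # assumption: round-to-nearest-even mode
        in_range = INT32_MIN <= rounded <= INT32_MAX
    if not in_range:
        if invalid_enabled:
            # Stand-in for the invalid operation exception.
            raise ArithmeticError("invalid operation exception")
        return INT32_MAX  # 2^31 - 1 is returned when the exception is disabled
    return rounded


print(cvt_w(2.5))           # tie rounds to the even neighbor
print(cvt_w(1e10))          # out of range, exception disabled
```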
cvt.w.fmt: floating-point convert to single fixed-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      cvt.w
    value   010001   -        00000    -        -       100100

Operation:
    32, 64  t:    storefpr (fd, w, convertfmt (valuefpr (fs, fmt), fmt, w))

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

Format: div.fmt fd, fs, ft

Description:
The contents of floating-point register fs are divided by those of floating-point register ft, and the result is stored to floating-point register fd. The operands are processed in the floating-point format fmt. The operation is executed as if the accuracy were infinite, and the result is rounded according to the current rounding mode.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
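The register-pairing rule that recurs in these descriptions can be sketched as follows. This is an illustrative model only: both function names are hypothetical, and the assignment of the low-order word to the even-numbered register and the high-order word to the odd-numbered register is taken from the ldc1 description in this chapter.

```python
import struct
from typing import Tuple


def register_number_is_valid(reg: int, fr: int) -> bool:
    """With FR = 0 only even register numbers are valid; with FR = 1 all 32 are."""
    if not 0 <= reg <= 31:
        raise ValueError("floating-point register numbers are 0 to 31")
    return fr == 1 or reg % 2 == 0


def split_double(value: float) -> Tuple[int, int]:
    """Split a double into (even-register word, odd-register word) for FR = 0.

    The even-numbered register holds the low-order 32 bits and the
    odd-numbered register holds the high-order 32 bits.
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


low, high = split_double(1.0)
print(hex(low), hex(high))
print(register_number_is_valid(3, fr=0))
```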
div.fmt: floating-point divide
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      ft       fs       fd      div
    value   010001   -        -        -        -       000011

Operation:
    32, 64  t:    storefpr (fd, fmt, valuefpr (fs, fmt) / valuefpr (ft, fmt))

Exceptions:
    coprocessor unusable exception
    floating-point exception

Floating-point exceptions:
    unimplemented operation exception
    invalid operation exception
    division-by-zero exception
    inexact operation exception
    overflow exception
    underflow exception

Format: dmfc1 rt, fs

Description:
The contents of floating-point general purpose register fs are stored into CPU general purpose register rt.
The contents of general purpose register rt are undefined while the instruction immediately following this instruction is being executed.
The FR bit of the status register indicates whether all 32 registers of the processor can be specified. If the FR bit is 0 and the least-significant bit of fs is 1 (that is, an odd-numbered register is specified), the operation is undefined. If the FR bit is 1, both the odd-numbered and even-numbered registers are valid.
This operation is defined in 64-bit mode or 32-bit kernel mode.

dmfc1: doubleword move from FPU (coprocessor 1)
    bits    31..26   25..21   20..16   15..11   10..0
    field   cop1     dmf      rt       fs       0
    value   010001   00001    -        -        00000000000

Operation:
    64      t:    if sr_26 = 1 then
                      data ← fgr[fs]
                  elseif fs_0 = 0 then
                      data ← fgr[fs+1] || fgr[fs]
                  else
                      data ← undefined^64
                  endif
            t+1:  gpr[rt] ← data
Remark: Same operation in the 32-bit kernel mode.

Exceptions:
    coprocessor unusable exception
    floating-point exception
    reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
    unimplemented operation exception

Format: dmtc1 rt, fs

Description:
The contents of general purpose register rt are loaded into floating-point general purpose register fs.
The contents of fs are undefined while the instruction immediately following this instruction is being executed.
The FR bit of the status register indicates whether all 32 registers of the processor can be specified. If the FR bit is 0 and the least-significant bit of fs is 1 (that is, an odd-numbered register is specified), the operation is undefined. If the FR bit is 1, both the odd-numbered and even-numbered registers are valid.
This operation is defined in 64-bit mode or 32-bit kernel mode.

dmtc1: doubleword move to FPU (coprocessor 1)
    bits    31..26   25..21   20..16   15..11   10..0
    field   cop1     dmt      rt       fs       0
    value   010001   00101    -        -        00000000000

Operation:
    64      t:    data ← gpr[rt]
            t+1:  if sr_26 = 1 then
                      fgr[fs] ← data
                  elseif fs_0 = 0 then
                      fgr[fs+1] ← data_63..32
                      fgr[fs] ← data_31..0
                  else
                      undefined_result
                  endif
Remark: Same operation in the 32-bit kernel mode.

Exceptions:
    coprocessor unusable exception
    floating-point exception
    reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
    unimplemented operation exception

Format: floor.l.fmt fd, fs

Description:
The contents of floating-point register fs are arithmetically converted into a 64-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded toward the -∞ direction, regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^63 - 1 is returned.
This operation is defined in the 64-bit mode and 32-bit kernel mode. If this instruction is executed during 32-bit user/supervisor mode, a reserved instruction exception occurs.

floor.l.fmt: floating-point floor to long fixed-point format
    bits    31..26   25..21   20..16   15..11   10..6   5..0
    field   cop1     fmt      0        fs       fd      floor.l
    value   010001   -        00000    -        -       001011

Operation:
    64      t:    storefpr (fd, l, convertfmt (valuefpr (fs, fmt), fmt, l))
Remark: Same operation in the 32-bit kernel mode.

Exceptions:
    coprocessor unusable exception
    floating-point exception
    reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
    invalid operation exception
    unimplemented operation exception
    inexact operation exception
    overflow exception

Restrictions:
An unimplemented operation exception will occur in the following cases.
    if an overflow occurs during conversion to integer format
    if the source operand is an infinite number
    if the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.
manual u10504ej7v0um00 format: floor.w.fmt fd, fs description: the contents of floating-point register fs are arithmetically converted into a 32-bit fixed-point format, and the result is stored to floating-point register fd . the source operand is processed in the floating-point format fmt . the result of the conversion is rounded toward the ? direction, regardless of the current rounding mode. this instruction is valid only for conversion from the single- or double-precision floating-point format. if the fr bit of the status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point registers. if an odd number is specified, the operation is undefined. if the fr bit of the status register is 1, both the odd and even register numbers are valid. if the source operand is infinite or nan, and if the rounded result is outside the range of 2 31 e1 to e2 31 , the invalid operation exception occurs. if the invalid operation exception is not enabled, the exception does not occur, and 2 31 e1 is returned. floor.w.fmt floating-point 31 0 655556 cop1 fmt 0 fs fd floor.w 11 10 21 20 16 15 26 25 fixed-point format 65 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0 0 floor to single floor.w.fmt user? manual u10504ej7v0um00 589 fpu instruction set details operation: exceptions: coprocessor unusable exception floating-point exception floating-point exceptions: invalid operation exception unimplemented operation exception inexact operation exception overflow exception restrictions: an unimplemented operation exception will occur in the following cases. if an over?w occurs during conversion to integer format if the source operand is an in?ite number if the source operand is nan essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. 
This includes cases when there is an overflow during conversion.

Operation:
  32, 64  T:  StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))

LDC1  Load Doubleword to FPU (Coprocessor 1)

Format: LDC1 ft, offset (base)

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   LDC1     base    ft      offset
  Value   110101
  Width   6        5       5       16

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address.
If the FR bit of the Status register is 0, the contents of the doubleword at the memory location specified by the virtual address are loaded to floating-point registers ft and ft+1. The high-order 32 bits of the doubleword are stored to the odd-numbered register specified by ft+1, and the low-order 32 bits are stored to the even-numbered register specified by ft. The operation is undefined if the least significant bit of the ft field is not 0.
If the FR bit is 1, the contents of the doubleword at the memory location specified by the virtual address are loaded to floating-point register ft.
If any of the low-order three bits of the address is not zero, an address error exception occurs.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
          if SR_26 = 1 then
              FGR[ft] ← data
          elseif ft_0 = 0 then
              FGR[ft+1] ← data_63...32
              FGR[ft] ← data_31...0
          else
              undefined_result
          endif
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
          if SR_26 = 1 then
              FGR[ft] ← data
          elseif ft_0 = 0 then
              FGR[ft+1] ← data_63...32
              FGR[ft] ← data_31...0
          else
              undefined_result
          endif

Exceptions:
  Coprocessor unusable exception
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

LWC1  Load Word to FPU (Coprocessor 1)

Format: LWC1 ft, offset (base)

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   LWC1     base    ft      offset
  Value   110001
  Width   6        5       5       16

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of the word at the memory location specified by the virtual address are loaded to floating-point register ft.
If the FR bit of the Status register is 0 and the least significant bit of the ft field is 0, the contents of the word are stored to the low-order 32 bits of floating-point register ft. If the least significant bit of the ft field is 1, the contents of the word are stored to the high-order 32 bits of floating-point register ft-1.
If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore, the contents of the word are stored to floating-point register ft. The value of the high-order 32 bits is undefined.
If either of the low-order two bits of the address is not zero, an address error exception occurs.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          if SR_26 = 1 then
              FGR[ft] ← undefined^32 || data
          else
              FGR[ft] ← data
          endif
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
          if SR_26 = 1 then
              FGR[ft] ← undefined^32 || data
          else
              FGR[ft] ← data
          endif

Exceptions:
  Coprocessor unusable exception
  TLB miss exception
  TLB invalid exception
  Bus error exception
  Address error exception

MFC1  Move Word from FPU (Coprocessor 1)

Format: MFC1 rt, fs

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-0
  Field   COP1     MF      rt      fs      0
  Value   010001   00000                   00000000000
  Width   6        5       5       5       11

Description: The contents of floating-point general purpose register fs are stored to general purpose register rt of the CPU. The contents of general purpose register rt are undefined while the instruction immediately following this instruction is being executed.
If the FR bit of the Status register is 0 and the least significant bit of the fs field is 0, the low-order 32 bits of floating-point register fs are stored to general purpose register rt. If the least significant bit of the fs field is 1, the high-order 32 bits of floating-point register fs-1 are stored to general purpose register rt.
If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore, the low-order 32 bits of floating-point register fs are stored to general purpose register rt.

Operation:
  32  T:    data ← FGR[fs]_31...0
      T+1:  GPR[rt] ← data
  64  T:    data ← FGR[fs]_31...0
      T+1:  GPR[rt] ← (data_31)^32 || data

Exceptions:
  Coprocessor unusable exception

MOV.fmt  Floating-point Move

Format: MOV.fmt fd, fs

Description: The contents of floating-point register fs are stored to floating-point register fd. The operand is processed in the floating-point format fmt. This instruction is not executed arithmetically, and no IEEE754 exception occurs.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined.
If the FR bit of the Status register is 1, both the odd and even register numbers are valid.

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      MOV
  Value   010001           00000                   000110
  Width   6        5       5       5       5       6

Operation:
  32, 64  T:  StoreFPR (fd, fmt, ValueFPR (fs, fmt))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Unimplemented operation exception

MTC1  Move to FPU (Coprocessor 1)

Format: MTC1 rt, fs

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-0
  Field   COP1     MT      rt      fs      0
  Value   010001   00100                   00000000000
  Width   6        5       5       5       11

Description: The contents of general purpose register rt of the CPU are loaded into floating-point general purpose register fs. The contents of floating-point register fs are undefined while the instruction immediately following this instruction is being executed.
The FR bit of the Status register specifies the method of access to the floating-point general purpose registers. If the FR bit is 0, all 32 floating-point general purpose registers can be accessed. When transferring double-precision data in the format of the floating-point operation instructions, access an odd-numbered register for the high-order 32 bits and an even-numbered register for the low-order 32 bits.
If the FR bit is 1, all 32 floating-point general purpose registers can be accessed, but only the low-order 32 bits of the register are accessed for data.

Operation:
  32, 64  T:    data ← GPR[rt]_31...0
          T+1:  if SR_26 = 1 then
                    FGR[fs] ← undefined^32 || data
                else
                    FGR[fs] ← data
                endif

Exceptions:
  Coprocessor unusable exception

MUL.fmt  Floating-point Multiply

Format: MUL.fmt fd, fs, ft

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     ft      fs      fd      MUL
  Value   010001                                   000010
  Width   6        5       5       5       5       6

Description: The contents of floating-point register fs are multiplied by those of floating-point register ft, and the result is stored to floating-point register fd. The operand is processed in the floating-point format fmt.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.

Operation:
  32, 64  T:  StoreFPR (fd, fmt, ValueFPR (fs, fmt) * ValueFPR (ft, fmt))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Unimplemented operation exception
  Invalid operation exception
  Inexact operation exception
  Overflow exception
  Underflow exception

NEG.fmt  Floating-point Negate

Format: NEG.fmt fd, fs

Description: The sign of the contents of floating-point register fs is inverted, and the result is stored to floating-point register fd. The operand is processed in the floating-point format fmt. The sign is inverted arithmetically. Therefore, the instruction is invalid if NaN is specified as the operand.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      NEG
  Value   010001           00000                   000111
  Width   6        5       5       5       5       6

Operation:
  32, 64  T:  StoreFPR (fd, fmt, Negate (ValueFPR (fs, fmt)))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Unimplemented operation exception
  Invalid operation exception

ROUND.L.fmt  Floating-point Round to Long Fixed-point Format

Format: ROUND.L.fmt fd, fs

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      ROUND.L
  Value   010001           00000                   001000
  Width   6        5       5       5       5       6

Description: The contents of floating-point register fs are converted into the 64-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded to the nearest value (or to the even value when two values are equally near), regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^63 - 1 is returned.
This operation is defined in the 64-bit mode and 32-bit kernel mode. If this instruction is executed in the 32-bit user/supervisor mode, a reserved instruction exception occurs.

Operation:
  64  T:  StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt), fmt, L))

Remark  Same operation in the 32-bit kernel mode.

Exceptions:
  Coprocessor unusable exception
  Floating-point exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
  Invalid operation exception
  Unimplemented operation exception
  Inexact operation exception
  Overflow exception

Restrictions: An unimplemented operation exception will occur in the following cases.
  If an overflow occurs during conversion to integer format
  If the source operand is an infinite number
  If the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

ROUND.W.fmt  Floating-point Round to Single Fixed-point Format

Format: ROUND.W.fmt fd, fs

Description: The contents of floating-point register fs are converted into the 32-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded to the nearest value (or to the even value when two values are equally near), regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^31 - 1 to -2^31, the invalid operation exception occurs.
If the invalid operation exception is not enabled, the exception does not occur, and 2^31 - 1 is returned.

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      ROUND.W
  Value   010001           00000                   001100
  Width   6        5       5       5       5       6

Operation:
  32, 64  T:  StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Invalid operation exception
  Unimplemented operation exception
  Inexact operation exception
  Overflow exception

Restrictions: An unimplemented operation exception will occur in the following cases.
  If an overflow occurs during conversion to integer format
  If the source operand is an infinite number
  If the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

SDC1  Store Doubleword from FPU (Coprocessor 1)

Format: SDC1 ft, offset (base)

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   SDC1     base    ft      offset
  Value   111101
  Width   6        5       5       16

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address.
If the FR bit of the Status register is 0, the contents of floating-point registers ft and ft+1 are stored as a doubleword to the memory location specified by the virtual address. The contents of the odd-numbered register specified by ft+1 correspond to the high-order 32 bits of the doubleword, and the contents of the even-numbered register specified by ft correspond to the low-order 32 bits. If the least significant bit of the ft field is not 0, this instruction is not defined.
If the FR bit is 1, the contents of floating-point register ft are stored as a doubleword to the memory location specified by the virtual address.
If any of the low-order three bits of the address is not zero, an address error exception occurs.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          if SR_26 = 1 then
              data ← FGR[ft]_63...0
          elseif ft_0 = 0 then
              data ← FGR[ft+1]_31...0 || FGR[ft]_31...0
          else
              data ← undefined^64
          endif
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          if SR_26 = 1 then
              data ← FGR[ft]_63...0
          elseif ft_0 = 0 then
              data ← FGR[ft+1]_31...0 || FGR[ft]_31...0
          else
              data ← undefined^64
          endif
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

Exceptions:
  Coprocessor unusable exception
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

SQRT.fmt  Floating-point Square Root

Format: SQRT.fmt fd, fs

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      SQRT
  Value   010001           00000                   000100
  Width   6        5       5       5       5       6

Description: The positive arithmetic square root of the contents of floating-point register fs is calculated, and the result is stored to floating-point register fd. The operand is processed in the floating-point format fmt. The result is calculated as if to infinite precision and then rounded according to the current rounding mode. If the value of the source operand is -0, the result will be -0.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.

Operation:
  32, 64  T:  StoreFPR (fd, fmt, SquareRoot (ValueFPR (fs, fmt)))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Unimplemented operation exception
  Invalid operation exception
  Inexact operation exception

SUB.fmt  Floating-point Subtract

Format: SUB.fmt fd, fs, ft

Description: The contents of floating-point register ft are subtracted from those of floating-point register fs, and the result is stored to floating-point register fd. The result is calculated as if to infinite precision and then rounded according to the current rounding mode.
This instruction is valid only in the single- and double-precision floating-point formats.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
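The register-numbering rule above recurs in the description of most FPU arithmetic instructions. As a minimal sketch, assuming the FR bit (bit 26 of the Status register, written SR_26 in the operation notation) has already been read into a plain 0/1 value, the rule can be expressed as a single check. The function name fpr_operand_defined is hypothetical and does not appear in this manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the manual).
 * fr:  value of the FR bit of the Status register (0 or 1)
 * reg: 5-bit floating-point register number taken from an fs, ft, or fd field
 * Returns true when the operation is defined for that register number. */
static bool fpr_operand_defined(unsigned fr, unsigned reg)
{
    assert(reg < 32);          /* register fields are 5 bits wide */
    if (fr != 0) {
        return true;           /* FR = 1: odd and even numbers are both valid */
    }
    return (reg & 1u) == 0;    /* FR = 0: only even numbers are defined */
}
```

An assembler or emulator could apply such a check to every register field of an instruction before treating the result as meaningful; with FR = 0 an odd register number is not trapped by the processor, the operation is simply undefined.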
Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     ft      fs      fd      SUB
  Value   010001                                   000001
  Width   6        5       5       5       5       6

Operation:
  32, 64  T:  StoreFPR (fd, fmt, ValueFPR (fs, fmt) - ValueFPR (ft, fmt))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Unimplemented operation exception
  Invalid operation exception
  Inexact operation exception
  Overflow exception
  Underflow exception

SWC1  Store Word from FPU (Coprocessor 1)

Format: SWC1 ft, offset (base)

Encoding:
  Bits    31-26    25-21   20-16   15-0
  Field   SWC1     base    ft      offset
  Value   111001
  Width   6        5       5       16

Description: The 16-bit offset is sign-extended and added to the contents of general purpose register base to form a virtual address. The contents of floating-point general purpose register ft are stored at the memory location specified by that address.
If the FR bit of the Status register is 0 and the least significant bit of the ft field is 0, the contents of the low-order 32 bits of floating-point register ft are stored. If the least significant bit of the ft field is 1, the contents of the high-order 32 bits of floating-point register ft-1 are stored.
If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore, the contents of the low-order 32 bits of floating-point register ft are stored.
If either of the low-order two bits of the address is not zero, an address error exception occurs.

Operation:
  32  T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← FGR[ft]_31...0
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
  64  T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          data ← FGR[ft]_31...0
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

Exceptions:
  Coprocessor unusable exception
  TLB miss exception
  TLB invalid exception
  TLB modification exception
  Bus error exception
  Address error exception

TRUNC.L.fmt  Floating-point Truncate to Long Fixed-point Format

Format: TRUNC.L.fmt fd, fs

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      TRUNC.L
  Value   010001           00000                   001001
  Width   6        5       5       5       5       6

Description: The contents of floating-point register fs are converted into the 64-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded toward 0, regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid operation exception is not enabled, the exception does not occur, and 2^63 - 1 is returned.
This operation is defined in the 64-bit mode and 32-bit kernel mode. If this instruction is executed in the 32-bit user/supervisor mode, a reserved instruction exception occurs.

Operation:
  64  T:  StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt), fmt, L))

Remark  Same operation in the 32-bit kernel mode.

Exceptions:
  Coprocessor unusable exception
  Floating-point exception
  Reserved instruction exception (VR4300 in 32-bit user or supervisor mode)

Floating-point exceptions:
  Invalid operation exception
  Unimplemented operation exception
  Inexact operation exception
  Overflow exception

Restrictions: An unimplemented operation exception will occur in the following cases.
  If an overflow occurs during conversion to integer format
  If the source operand is an infinite number
  If the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

TRUNC.W.fmt  Floating-point Truncate to Single Fixed-point Format

Format: TRUNC.W.fmt fd, fs

Description: The contents of floating-point register fs are arithmetically converted into the 32-bit fixed-point format, and the result is stored to floating-point register fd. The source operand is processed in the floating-point format fmt.
The result of the conversion is rounded toward 0, regardless of the current rounding mode.
This instruction is valid only for conversion from the single- or double-precision floating-point format.
If the FR bit of the Status register is 0, only an even number can be specified as a register number because adjacent even-numbered and odd-numbered registers are used in pairs as a floating-point register. If an odd number is specified, the operation is undefined. If the FR bit of the Status register is 1, both the odd and even register numbers are valid.
If the source operand is infinite or NaN, or if the rounded result is outside the range of 2^31 - 1 to -2^31, the invalid operation exception occurs.
If the invalid operation exception is not enabled, the exception does not occur, and 2^31 - 1 is returned.

Encoding:
  Bits    31-26    25-21   20-16   15-11   10-6    5-0
  Field   COP1     fmt     0       fs      fd      TRUNC.W
  Value   010001           00000                   001101
  Width   6        5       5       5       5       6

Operation:
  32, 64  T:  StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))

Exceptions:
  Coprocessor unusable exception
  Floating-point exception

Floating-point exceptions:
  Invalid operation exception
  Unimplemented operation exception
  Inexact operation exception
  Overflow exception

Restrictions: An unimplemented operation exception will occur in the following cases.
  If an overflow occurs during conversion to integer format
  If the source operand is an infinite number
  If the source operand is NaN
Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point format to a fixed-point format is 1, an unimplemented operation exception will occur. This includes cases when there is an overflow during conversion.

17.6 FPU Instruction Opcode Bit Encoding

Figure 17-3 lists the bit encoding for FPU instructions.

Figure 17-3 Bit Encoding for FPU Instructions (1/2)

opcode (rows: bits 31...29, columns: bits 28...26)
       0      1      2      3      4      5      6      7
  2           COP1
  6           LWC1                        LDC1
  7           SWC1                        SDC1

sub (rows: bits 25...24, columns: bits 23...21)
       0      1      2      3      4      5      6      7
  0    MF     DMF η  CF     γ      MT     DMT η  CT     γ
  1    BC     γ      γ      γ      γ      γ      γ      γ
  2    S      D      γ      γ      W      L η    γ      γ
  3    γ      γ      γ      γ      γ      γ      γ      γ

br (rows: bits 20...19, columns: bits 18...16)
       0      1      2      3      4      5      6      7
  0    BCF    BCT    BCFL   BCTL   *      *      *      *
  1    *      *      *      *      *      *      *      *
  2    *      *      *      *      *      *      *      *
  3    *      *      *      *      *      *      *      *

Figure 17-3 Bit Encoding for FPU Instructions (2/2)

function (rows: bits 5...3, columns: bits 2...0)
       0          1          2         3          4        5         6       7
  0    ADD        SUB        MUL       DIV        SQRT     ABS       MOV     NEG
  1    ROUND.L η  TRUNC.L η  CEIL.L η  FLOOR.L η  ROUND.W  TRUNC.W   CEIL.W  FLOOR.W
  2    γ          γ          γ         γ          γ        γ         γ       γ
  3    γ          γ          γ         γ          γ        γ         γ       γ
  4    CVT.S      CVT.D      γ         γ          CVT.W    CVT.L η   γ       γ
  5    γ          γ          γ         γ          γ        γ         γ       γ
  6    C.F        C.UN       C.EQ      C.UEQ      C.OLT    C.ULT     C.OLE   C.ULE
  7    C.SF       C.NGLE     C.SEQ     C.NGL      C.LT     C.NGE     C.LE    C.NGT

Key:
  *  When the operation code marked with an asterisk is executed, the reserved instruction exception occurs. This code is reserved for future expansion.
  γ  Operation codes marked with a gamma cause unimplemented operation exceptions in all current implementations and are reserved for future expansion.
  η  When the operation code marked with an eta is executed, the result is valid only when use of the MIPS III instruction set is enabled. If the operation code is executed when use of the instruction set is disabled (in the 32-bit user/supervisor mode), the unimplemented operation exception occurs.

Chapter 18 PLL Passive Elements

Connect several passive elements externally to the VR4300 so that the processor can operate normally. Connect the elements to the PLLCAP0, PLLCAP1, VDDP, and GNDP pins.
Figure 18-1 shows the connections of the passive elements for PLL.

Figure 18-1 Connection Example of PLL Passive Elements

Remarks 1. C1, C2, C3, Cp%1, Cp%2, R, and L are mounted on the board.
        2. Either R or L alone is sufficient in a system where experiment has confirmed that noise is not superimposed on VDDP and GNDP.
        3. The value of each element differs depending on the system. Find the appropriate values for each system through experiment.

Figure 18-2 shows a layout example of the 120-pin plastic QFP and capacitors on the PWB.

Figure 18-2 Layout Example of QFP and Capacitor on PWB

Remarks X: GND-VDD bypass capacitors
        C2: GNDP-VDDP bypass capacitors
        %1, %2: PLL capacitors

Separate the wiring of the power (VDDP) and ground (GNDP) for PLL from the normal power (VDD) and ground (GND) wiring.
Here is an example of the value of each element.
  R = 5 Ω
  C1 = 1 nF
  C2 = 82 nF
  C3 = 10 µF
  Cp = 470 pF
Because the optimum values of the filter elements differ depending on the application and noise environment of the system, the above values are given for reference only. Find the optimum values for the user's application through trial and error.
A choke element (inductor: L) may be used instead of the resistor (R) used as a power filter.

Chapter 19 Coprocessor 0 Hazards

If a conflict of internal resources takes place between instructions (such as when the contents of the destination register are used as the source for the next instruction), the VR4300 interlocks the pipeline to prevent the conflict. Therefore, it is not necessary to insert a NOP instruction between such instructions.
However, the CP0 registers and the TLB are not interlocked. When developing a program that uses the CP0 registers and the TLB, therefore, take conflicts of internal resources into consideration. A CP0 hazard defines the number of NOP instructions, or the number of instructions independent of the conflict, that must be inserted between instructions to avoid a conflict of internal resources. This chapter explains the CP0 hazards.
The CP0 hazards of the VR4300 are equal to or less than those of the VR4400; Table 19-1 lists the VR4300 CP0 hazards. Code which complies with these hazards will run without modification on the VR4400 or VR4200.
When a CP0 register or bit appears in the source column of the following table, the instruction uses its data as a source. When a CP0 register or bit appears in the destination column, the instruction stores data to it as a destination.
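As a minimal sketch of how the source and destination hazard counts of Table 19-1 are combined, the C function below applies the expression used in this chapter: the destination hazard count of the earlier instruction minus the source hazard count of the later instruction plus one. The function name cp0_required_gap is hypothetical, and clamping a negative result to zero is an assumption of this sketch, not a statement from the manual. Where the table gives a range (for example 5-8), the caller must choose which value to pass; the sketch does not decide this.

```c
#include <assert.h>

/* Hypothetical helper (not from the manual).
 * dest_hazard_a: destination hazard count of the earlier instruction A
 * src_hazard_b:  source hazard count of the later instruction B
 * Returns the number of NOPs, or instructions independent of the
 * conflict, to place between A and B. */
static int cp0_required_gap(int dest_hazard_a, int src_hazard_b)
{
    assert(dest_hazard_a >= 0 && src_hazard_b >= 0);
    int gap = dest_hazard_a - (src_hazard_b + 1);
    /* Assumption of this sketch: a result below zero means that
     * no intervening instruction is needed. */
    return gap > 0 ? gap : 0;
}
```

For example, MTC0 has a destination hazard count of 7 and MFC0 a source hazard count of 4, so cp0_required_gap(7, 4) gives the two instructions that the chapter's own worked example arrives at.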
The number of NOP instructions, or the number of instructions independent of the conflict, required between two instructions related to the CP0 registers and the TLB can be calculated from the following expression, using this table.

  (Number of destination hazards of instruction A) - {(Number of source hazards of instruction B) + 1}

As an example, the number of instructions required between an MTC0 instruction and a subsequent MFC0 instruction is:

  (7) - (4 + 1) = 2 instructions

Caution  CP0 hazards do not cause a pipeline interlock. The program must therefore supply the required number of instructions.

Table 19-1 Coprocessor 0 Hazards

Operation                     Source                                        Destination
                              Name                               Hazards    Name                               Hazards
MTC0                          -                                  -          CPR rd                             7
MFC0                          CPR rd                             4          -                                  -
TLBR                          Index, TLB                         5-7        PageMask, EntryHi,                 8
                                                                            EntryLo0, EntryLo1
TLBWI, TLBWR                  Index or Random, PageMask,         5-8        TLB                                8
                              EntryHi, EntryLo0, EntryLo1
TLBP                          PageMask, EntryHi                  3-6        Index                              7
ERET                          EPC or ErrorEPC, Status, TLB       4 γ        Status.EXL, Status.ERL             4-8 α
                                                                            LLbit                              7
CACHE Index Load Tag          -                                  -          TagLo, TagHi, ECC                  8 β
CACHE Index Store Tag         TagLo, TagHi, ECC                  7          -                                  -
CACHE Hit ops.                -                                  -          Status.CH                          8
Coprocessor usable test       Status.CU, Status.KSU,             2          -                                  -
                              Status.EXL, Status.ERL
Instruction fetch             EntryHi.ASID, Status.KSU,          0          -                                  -
                              Status.EXL, Status.ERL,
                              Status.RE, Config.K0
                              TLB                                2
Instruction fetch exception   -                                  -          EPC, Status                        8
                                                                            Cause, BadVAddr, Context           3
Interrupt                     Cause.IP, Status.IM, Status.IE,    3          -                                  -
                              Status.EXL, Status.ERL
Load/store                    EntryHi.ASID, Status.KSU,          4          -                                  -
                              Status.EXL, Status.ERL,
                              Status.RE, Config.K0, TLB
                              WatchHi, WatchLo                   4-5
Load/store exception          -                                  -          EPC, Status, Cause,                8
                                                                            BadVAddr, Context
TLB shutdown                  -                                  -          Status.TS                          7

Remarks 1. A hazard arises when an instruction related to the bit specified as the source or destination is executed. For example, if CP1 is enabled by setting Status.CU to 1 with the MTC0 instruction, all the instructions using CP1 (FPU) are subject to the hazard.
        2. α  Status.EXL and Status.ERL are cleared in stage 8, but the effect of clearing them is visible at the time of an instruction fetch starting at the beginning of stage 4.
           β  One instruction to separate Index Load Tag and MFC0 Tag will do, even though the table above would imply three instructions.
           γ  An ERET instruction following an MTC0 instruction that sets the ERL bit in the Status register (Status.ERL) must be separated from the MTC0 instruction by three instructions.
           The instruction following an MTC0 instruction must not be an MFC0 instruction.
           The five instructions following an MTC0 instruction to the Status register that changes the KSU bit and sets the EXL or ERL bit may be executed in the new mode, and not in the kernel mode. This can be avoided by setting the EXL bit first, leaving the KSU bit set to kernel, and changing the KSU bit later.
           There must be two non-load, non-cache instructions between a store instruction and a cache instruction directed to the same cache line as the store destination.

Cautions 1. If the K0 bit of the Config register is changed to the non-cache mode by using the MTC0 instruction, the non-cache area is set when the instruction fetch two instructions after the MTC0 instruction is executed.
         2. If a jump or branch instruction is executed immediately after the ITS bit of the Status register has been set, a stall lasting for several instructions will occur.

The status in which the CP0 hazard must be taken into consideration when each instruction is executed is explained below.
(1) MTC0
    Destination: Completion of writing to the destination register (CP0) by the MTC0 instruction

(2) MFC0
    Source: Determination of the source register (CP0) of the MFC0 instruction

(3) TLBR
    Source: Determination of the TLB status and Index register before execution of the TLBR instruction

(4) TLBWI, TLBWR
    Source: Determination of the source register of the TLBWI and TLBWR instructions and the register used for TLB entry specification
    Destination: Completion of writing to the TLB by the TLBWI and TLBWR instructions

(5) TLBP
    Source: Determination of the PageMask register and EntryHi register before execution of the TLBP instruction
    Destination: Completion of writing the result of TLBP instruction execution to the Index register

(6) ERET
    Source: Determination of the register holding information necessary for ERET instruction execution
    Destination: Completion of the processor status transition due to ERET instruction execution

(7) CACHE Index Load Tag
    Destination: Completion of writing the result of executing this instruction to each register

(8) CACHE Index Store Tag
    Source: Determination of the register holding information necessary for execution of this instruction

(9) Coprocessor usable test
    Source: Determination of the mode set by the bit value of the CP0 register in the source column
    Examples 1. When accessing the CP0 register in the user mode after changing the content of the Status.CU0 bit, or when executing an instruction using the resources of CP0 (such as a TLB instruction, cache instruction, or branch instruction)
             2. When accessing the CP0 register in the operating mode used after the contents of the Status.KSU, EXL, and ERL bits have been changed
             3. When using the FPU (CP1) after the content of the Status.CU1 bit has been changed

(10) Instruction fetch
    Source: Determination of the operating mode and TLB necessary for instruction fetch
    Examples 1. When fetching instructions after the mode has been changed from user to kernel after the contents of the Status.KSU, EXL, and ERL bits have been changed
             2. When rewriting the TLB and fetching an instruction by using its TLB entry

(11) Instruction fetch exception
    Destination: Completion of writing to each register holding information related to an exception when the exception has occurred as a result of instruction fetch

(12) Interrupt
    Source: Determination of each register that identifies an exception generation condition when an interrupt cause occurs

(13) Load/store
    Source: Determination of the operating mode related to address generation by the load/store instruction, determination of the TLB entry, determination of the cache mode set by the Config.K0 bit, and determination of the register that sets a watch exception generation condition
    Example  When executing the load/store instruction in the kernel area after the mode has been changed from user to kernel

(14) Load/store exception
    Destination: Completion of writing to each register holding information related to an exception when the exception occurs as a result of a load/store operation

(15) TLB shutdown
    Destination: Completion of writing to the Status.TS bit when TLB shutdown occurs

Table 19-2 shows examples of calculating the number of CP0 hazards and the number of instructions to be inserted.

Table 19-2 Example of Calculating Number of CP0 Hazards and Number of Instructions Inserted

* The number of hazards is undefined if the execution sequence is changed by an exception. In this case, the minimum number of hazards until the value of the IE bit is determined and the maximum number of hazards until a pending and enabled interrupt occurs may be the same.
| Destination | Source | Conflicting internal resources | Number of instructions inserted | Expression |
|---|---|---|---|---|
| TLBWR/TLBWI | TLBP | TLB entry | 3 | 8 - (4 + 1) |
| TLBWR/TLBWI | Load/store using newly rewritten TLB | TLB entry | 3 | 8 - (4 + 1) |
| TLBWR/TLBWI | Instruction fetch using newly rewritten TLB | TLB entry | 5 | 8 - (2 + 1) |
| MTC0, Status [CU] | Coprocessor instruction requiring setting of CU | Status [CU] | 4 | 7 - (2 + 1) |
| TLBWR | MFC0 EntryHi | EntryHi | 3 | 8 - (4 + 1) |
| MTC0 EntryLo0 | TLBWR/TLBWI | EntryLo0 | 1 | 7 - (5 + 1) |
| TLBP | MFC0 Index | Index | 2 | 7 - (4 + 1) |
| MTC0 EntryHi | TLBP | EntryHi | 1 | 7 - (5 + 1) |
| MTC0 EPC | ERET | EPC | 2 | 7 - (4 + 1) |
| MTC0 Status | ERET | Status | 3 | 7 - (3 + 1) |
| MTC0 Status [IE]* | Instruction causing interrupt | Status [IE] | 3 | 7 - (3 + 1) |

In each expression, the number of instructions to insert is the hazard number of the destination minus (the hazard number of the source + 1).

Appendix A Differences between the VR4300, VR4305, and VR4310

The following table describes the differences between the VR4300, VR4305, and VR4310.

Table A-1 Differences between the VR4300, VR4305, and VR4310

| Parameter | VR4300 | VR4305 | VR4310 |
|---|---|---|---|
| System bus: write data transfer | Two buses (D/D ) | Two buses (D/D ) | Two buses (D/D ) |
| System bus: initial value setting pins at reset time | DivMode(1:0) (can be set on power application only) | DivMode(1:0) (can be set on power application only) | DivMode(2:0) (can be set on power application only) |
| System bus: block write access | Sequential ordering | Sequential ordering | Sequential ordering |
| System bus: state after final data write | Final data retained in transfer rate setting | Final data retained in transfer rate setting | Final data retained in transfer rate setting |
| System bus: non-cache high-speed write | Provided | Provided | Provided |
| Integer operation unit: corresponding instructions | MIPS I, II, and III instruction sets | MIPS I, II, and III instruction sets | MIPS I, II, and III instruction sets |
| Cache memory: data protection | None | None | None |
| JTAG interface | Provided | Provided | Provided |
| SyncOut-SyncIn path | Provided | Provided | Provided |
| Clock interface: input vs. internal multiplication rate | 1.5 *1, 2, 3, 4 *2 | 1, 2, 3 | 2, 2.5 *3, 3, 4, 5, 6 |
| Clock interface: internal vs. bus frequency division rate | 1.5 *1, 2, 3, 4 *2 | 1, 2, 3 | 2, 2.5 *3, 3, 4, 5, 6 |
| Power mode: low power mode | Pipeline/system bus operated at a quarter of the normal rate *4 | Pipeline/system bus operated at a quarter of the normal rate | None |
| Power mode: wait mode | None | None | None |
| PRId register | Imp = 0x0B | Imp = 0x0B | Imp = 0x0B |

*1. The 1.5 times frequency setting is allowed with the 100 MHz model only. (With the 133 MHz model, this setting is reserved.)
*2. The 4 times frequency setting is allowed with the 133 MHz model only. (With the 100 MHz model, this setting is reserved.)
*3. The 2.5 times frequency setting is allowed with the 167 MHz model only. (With the 133 MHz model, this setting is reserved.)
*4. The 133 MHz model of the VR4300 is not supported.

Appendix B Differences from VR4400

The VR4300 is slightly different from the VR4400 in terms of system design and software. This appendix describes the differences between the VR4300 and VR4400.

The major differences lie in cache handling. This is because the VR4300 supports neither a secondary cache control function nor a multi-processing function, and because it employs a 32-bit external bus interface.

B.1 Differences in software

The logical differences in software lie in the specifications of the CP0 registers. These differences are shown in Table B-1.

B.1.1 CACHE instruction

Up to 4 MB of secondary cache memory can be connected to the VR4400. By contrast, the VR4300 does not support a secondary cache. Therefore, the operations of the CACHE instructions that reference SD (secondary data cache) and SI (secondary instruction cache) are undefined. All write-back processing transfers data from the primary cache to the main memory.

The CACHE instruction Hit_Set_Virtual, which is used to access the SD and SI with the VR4400, is undefined with the VR4300. The dirty bit (W bit of the VR4400) of the data cache can be cleared by the CACHE instruction Hit_Write_Back.

The VR4300 has one cache state bit. The VR4400 has two cache state bits to support multi-processing. To manipulate this bit of the VR4300, write bit 7 of the TagLo register by using a CACHE instruction (Index_Store_Tag_D). With the VR4400, bits 6 and 7 of the TagLo register are written.
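The TagLo difference described in B.1.1 can be modeled as a small bit manipulation. The following is only a sketch of how software might prepare the value to be moved into TagLo before issuing Index_Store_Tag_D; the helper name and constants are hypothetical, and the sketch assumes that all other TagLo bits are passed through unchanged.

```python
# Hypothetical constants derived from B.1.1: the VR4300 state bit is bit 7,
# while the VR4400 uses bits 7 and 6.
VR4300_STATE_BIT = 1 << 7
VR4400_STATE_MASK = 0b11 << 6


def with_vr4300_state(taglo: int, state: int) -> int:
    """Return a 32-bit TagLo image with bit 7 forced to `state` (0 or 1).

    Every other bit of `taglo` is left as it was (assumption of this sketch).
    """
    if state not in (0, 1):
        raise ValueError("state must be 0 or 1")
    cleared = taglo & ~VR4300_STATE_BIT & 0xFFFFFFFF
    return cleared | (state << 7)
```

Code written for the VR4400 that builds the state field with a two-bit mask would need reviewing when ported, since only bit 7 is described as the state bit on the VR4300.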
B.1.2 Cache parity

Because the VR4300 does not check the cache data by using parity, the Cache Error register (27) always outputs 0, and writing to this register is ignored. The Parity Error register (26) can be used only for self-diagnosis and cannot be used to manipulate the cache.

B.1.3 Status register

The bit specifications of the Status register are slightly different between the VR4300 and VR4400.

The fixed bits (bits 24 and 27) of the Status register of the VR4400 function as the instruction trace support (ITS) bit (bit 24) and the low power mode* (RP) bit (bit 27) with the VR4300.

* The low power mode is supported only in the 100 MHz model of the VR4300 and the VR4305. Fix the RP bit of the 133 MHz model of the VR4300 and of the VR4310 to 0.

The CH bit of the VR4300 can be written only by software. With the VR4400, however, this bit is set or cleared by hardware when a secondary cache instruction is executed.

The CE and DE bits of the Status register, which are used to manipulate parity with the VR4400, do not affect the operation of the VR4300.

For details, refer to 6.3.5 Status register (12).

B.1.4 Config register

The Config register of the VR4300 supports only part of the bit functions of the Config register of the VR4400. For details, refer to 5.4.6 Config register (16).

B.1.5 Status of FCR31 on occurrence of unimplemented operation exception

If the floating-point unimplemented operation exception occurs with the VR4400, the cause bits of FCR31 for floating-point operation exceptions other than the unimplemented operation exception bit (E) are undefined. The exception handler for the unimplemented operation should ignore the cause bits other than the E bit.

The VR4300 is more strictly defined. If the unimplemented operation exception occurs, the cause bits of the other floating-point operation exceptions are not set.
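A handler that must behave the same on both processors can apply the rule in B.1.5 by masking the cause field before acting on it. The sketch below assumes the usual FCR31 layout, in which the cause field occupies bits 17 to 12 with E at bit 17; that layout comes from the FCR31 definition, not from this appendix, and the function names are hypothetical.

```python
# Assumed FCR31 cause field layout: bits 17..12 = E, V, Z, O, U, I.
CAUSE_E = 1 << 17
CAUSE_MASK = 0x3F << 12


def is_unimplemented(fcr31: int) -> bool:
    """True if the unimplemented operation (E) cause bit is set."""
    return bool(fcr31 & CAUSE_E)


def portable_cause(fcr31: int) -> int:
    """Return only the cause bits that are meaningful on both processors.

    When E is set, the other cause bits are undefined on the VR4400 and
    clear on the VR4300, so they are dropped here.
    """
    cause = fcr31 & CAUSE_MASK
    return CAUSE_E if cause & CAUSE_E else cause
```

Dropping the other cause bits whenever E is set costs nothing on the VR4300, where they are already clear, and avoids acting on undefined values on the VR4400.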
B.1.6 Integer zero division

If an integer is divided by zero, the result is undefined in the MIPS ISA (instruction set architecture). This illegal operation returns the following values to the registers of the VR4300 and VR4400.

| Processor | Dividend | LO register | HI register |
|---|---|---|---|
| VR4400 | >= 0 | 0xFFFF FFFF | Dividend |
| VR4400 | < 0 | 0x0000 0001 | Dividend |
| VR4300 | >= 0 | 0x7FFF FFFF | Dividend |
| VR4300 | < 0 | 0x8000 0001 | Dividend |

B.1.7 Cache parity error exception

Because the VR4300 does not check data by using cache parity, a parity error exception does not occur.

Table B-1 Differences in software

| Function | VR4300 | VR4400 |
|---|---|---|
| CACHE instruction: secondary cache | Not supported | Supported |
| CACHE instruction: parity | None | Provided |
| Status register: bit 27 | Low power mode * | 0 |
| Status register: bit 24 | Instruction trace support | 0 |
| Status register: CE and DE bits | Do not affect processor operation | Used for parity |
| Config register | Only part of bit functions supported | All supported |
| Unimplemented operation exception | Cause bits other than E bit cleared | Cause bits other than E bit undefined |
| Integer zero division | Value returned to register differs | Value returned to register differs |
| Cache error exception | Does not occur | Always normal operation |

* 100 MHz model of the VR4300 and the VR4305 only

B.2 Differences in system design

This section describes the differences in system design between the VR4300 and VR4400. Table B-2 shows these differences.

B.2.1 Initialization of processor

With the VR4400, many modes must be set on boot. Mode setting of the VR4300 is simpler, because the VR4300 sets modes not by software but by using external pins.

The Reset signal of the VR4300 may be active or inactive during cold reset. However, do not change the value of this signal during the reset sequence. At soft reset, assert the Reset signal of the VR4300 active for 16 MasterClock cycles or longer. With the VR4400, the Reset signal must be asserted active for at least 64 MasterClock cycles.

B.2.2 System interface

The SysAD bus of the VR4400 is 64 bits wide, but the VR4300 has a 32-bit SysAD bus without a parity check function.

Multi-processing function and secondary cache control function

The VR4300 uses the same SysAD bus protocol as the VR4400. However, because the VR4300 supports neither a multi-processing function nor a secondary cache control function, its external bus is provided with only part of the SysAD bus specifications. The operations related to the multi-processing function and secondary cache that are defined for the VR4400 are undefined with the VR4300.

Line size of cache

The line size of the cache of the VR4300 is as follows.

Instruction cache: 8 words (32 bytes)
Data cache: 4 words (16 bytes)

Data transfer rate

The VR4400 has nine data rates (D, DDx, DDxx, DxDx, DDxxx, DDxxxx, DxxDxx, DDxxxxx, and DxxxDxxx). The VR4300 has two data rates (D and Dxx). These data rates are selected by using the EP bit of the Config register.

The VR4400 requires at least 4 cycles as processor request cycles. Consequently, if successive single read requests are made, or if write requests and read requests are made successively, two idle cycles are inserted between two requests, as in "DxxAD". If writes or reads are performed successively in the fastest mode (data rate: D) of the VR4300, however, no idle cycle is needed between write/read cycles, as in "DAD".

When data is input from an external device, the VR4300 can support any data transfer via the SysAD bus. The VR4300 can input data at a data rate of "DDDDDDD" but cannot input a data stream exceeding 8 words (32 bytes).

TClock and RClock

The VR4400 has two TClock pins. The VR4300 has only one TClock pin to reduce power consumption.
The VR4400 has RClock as the reception clock of the external agent, but the VR4300 does not have RClock because it transfers and receives data by using TClock.

Effect of RP bit

With the VR4400, SClock and TClock are not affected by the RP bit. The VR4300, in contrast, can reduce the clock frequencies of SClock and TClock to 1/4 of the normal level by using the RP bit *. To use this function when there is an external circuit (such as a DRAM refresh counter) that is affected by changes in the frequency of the clock supplied by the VR4300 to external devices, incorporate into the software a process that adapts the external circuit to the changed frequency.

* 100 MHz model of the VR4300 and the VR4305 only

Table B-2 Differences in system design

| Function | VR4300 | VR4400 |
|---|---|---|
| Initialization of processor | Set by external pins | Set by software |
| System interface: bus width | 32 | 64 |
| System interface: data check | Not performed | Parity/ECC selectable |
| System interface: multi-processing and secondary cache | Not supported | Supported |
| System interface: line size of cache | Instruction: 8 words; data: 4 words | 4/8 words selectable for both instruction/data cache |
| System interface: data rate | 2 types | 9 types |
| System interface: TClock | 1 | 2 |
| System interface: RClock | None | 2 |
| System interface: effect of RP bit | Reduces frequencies of TClock and SClock to 1/4 * | Does not affect TClock and SClock |

* 100 MHz model of the VR4300 and the VR4305 only

B.3 Other differences

In addition to the above differences, the VR4300 and VR4400 differ in the following points. The differences described in this section are summarized in Table B-3.

B.3.1 Cache size

The specifications of the primary cache of the VR4000, VR4400, and VR4300 are shown in the following table. When programming routines that initialize, invalidate, or flush the cache, keep in mind the differences in cache size.

| Item | VR4300 | VR4000 | VR4400 |
|---|---|---|---|
| Cache capacity: instruction | 16 KB | 8 KB | 16 KB |
| Cache capacity: data | 8 KB | 8 KB | 16 KB |
| Line size | Instruction: 8 words (32 bytes); data: 4 words (16 bytes) | 4/8 words selectable | 4/8 words selectable |
| Method | Direct map, virtual index | Direct map, virtual index | Direct map, virtual index |

B.3.2 TLB

TLB entry

The VR4300 has a fully associative TLB with 32 entries. Each entry is mapped to the even/odd pages of a page frame number. The TLB of the VR4400 has the same structure as that of the VR4300, but has 48 entries.

Interaction between IMT and TLB manipulations

The operation of the VR4400 is undefined when a TLB instruction accesses the JTLB during the instruction TLB miss (IMT) stall, and consequently the TLB invalid exception may occur. This exception is especially likely to occur when an entry different from the one that caused the instruction TLB miss is accessed by software for read/write manipulation (TLBWI, TLBWR, or TLBR). This does not apply to the VR4300.

B.3.3 Floating-point unit

Floating-point data path

The floating-point operation of the VR4300 is executed by using the main pipeline and data path of the integer operation unit. Therefore, while a multicycle floating-point instruction is executed, the integer operation pipeline stalls.

The VR4400 has a dedicated floating-point data path in addition to the integer data path. Therefore, if a program whose floating-point and integer instructions have been optimized for the VR4400 is executed on the VR4300, not much benefit can be expected from that optimization.

Instruction execution time

The VR4300 completes execution of any multicycle instruction that has caused a source exception (an exception caused by the source operand of the instruction) in one cycle. Instead of performing the full operation, it either issues the default result in that cycle, according to the trap enable flag, or reports the occurrence of a trap exception in the next cycle. In addition, a calculation such as 0 x 0 can be executed in fewer cycles than an ordinary calculation.
The VR4400 always executes each multicycle instruction in the same number of cycles, regardless of whether or not an exception occurs.

CVT.[S,D].L instruction

When converting a 64-bit integer into a single- or double-precision floating-point number, the VR4400 generates a floating-point unimplemented operation exception unless bits 63 through 52 of the integer are all 0 or all 1. The VR4300 generates the floating-point unimplemented operation exception unless bits 63 through 55 of the 64-bit integer are all 0 or all 1.

B.3.4 Pipeline

The VR4400 uses an 8-stage super pipeline. The VR4300 uses a 5-stage pipeline like that of the VR3000. The pipeline of the VR4300 is not a super pipeline, but it does not differ from the super pipeline in terms of functions. However, if a program has been optimized for one pipeline, its performance on the other may be affected. The VR4300 generates fewer stall cycles than the VR4400.

B.3.5 Interrupt

Bit 15 of the Cause register of the VR4300 is dedicated to the timer interrupt, which occurs if the value of the Count register coincides with the value of the Compare register. Therefore, the VR4300 is not provided with the Int5 pin that the VR4400 has. Because the VR4300 does not have bit 5 in the Interrupt register *, writing data to the Interrupt register via the system interface has no effect on that bit.

With the VR4400, the user can select whether bit 15 of the Cause register is used for the timer interrupt or for bit 5 of the Interrupt register.

* This register cannot be directly written by the user via software.

B.3.6 Kernel physical address segment configuration

The VR4300 supports two algorithms (uncached and non-coherent) to maintain the coherency of the cache. While the VR4400 supports a 36-bit physical address space, the VR4300 supports a 32-bit physical address space.

These two points affect the virtual address mapping of the kernel physical address space segment (xkphys), which does not use the TLB. Both the VR4400 and the VR4300 have eight address spaces in this segment, but the size of each area in these spaces differs between the two. Each area in the address spaces of the VR4400 is 64 GB (2^36 bytes), while that of the VR4300 is 4 GB (2^32 bytes).

B.3.7 JTAG

The VR4300 conforms to IEEE 1149.1-1990. Consequently, the JTDO signal becomes active in the Shift IR and Shift DR modes. Because the VR4400 conforms to the previous version of IEEE 1149.1, the JTDO signal is not driven.

Table B-3 Other differences

| Item | VR4300 | VR4000 | VR4400 |
|---|---|---|---|
| Instruction cache size | 16 KB | 8 KB | 16 KB |
| Data cache size | 8 KB | 8 KB | 16 KB |
| TLB: TLB size | 32 entries | 48 entries | 48 entries |
| TLB: interaction between IMT and TLB manipulations | TLB operation is corrected | TLB invalid exception occurs | TLB invalid exception occurs |
| Floating-point operation: data path | Shared with integer operation pipeline | Processed by dedicated pipeline | Processed by dedicated pipeline |
| Floating-point operation: instruction execution time | All multicycle instructions are executed in 1 cycle when a source exception occurs. | Each multicycle instruction is executed in the same number of cycles regardless of whether an exception occurs. | Each multicycle instruction is executed in the same number of cycles regardless of whether an exception occurs. |
| Floating-point operation: CVT.[S,D].L instruction (checking of floating-point unimplemented operation exception) | Bits 63 to 55 are all 1 or all 0 | Bits 63 to 52 are all 1 or all 0 | Bits 63 to 52 are all 1 or all 0 |
| Effect of RP bit | Reduces operating frequency to 1/4 * | Does not affect operating frequency | Does not affect operating frequency |
| Pipeline | 5-stage basic pipeline | 8-stage super pipeline | 8-stage super pipeline |
| Interrupt: Cause register (bit 15) | Dedicated to timer interrupt | Selectable by user | Selectable by user |
| Interrupt: Interrupt register (bit 5) | None | | |
| Kernel physical address segment configuration (xkphys): physical address space supported | 32 bits | 36 bits | 36 bits |
| Kernel physical address segment configuration (xkphys): valid address space | 8 | 5 | 5 |
| JTAG | JTDO active in Shift IR and Shift DR modes | JTDO not driven in Shift IR and Shift DR modes | JTDO not driven in Shift IR and Shift DR modes |

* 100 MHz model of the VR4300 and the VR4305 only
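The CVT.[S,D].L range check from B.3.3, summarized in Table B-3, can be sketched as follows. This is an illustration of the stated bit condition only, not a model of the FPU; the function and constant names are hypothetical, and the integer is treated as a 64-bit two's-complement value.

```python
VR4300_LOWEST_CHECKED_BIT = 55  # bits 63..55 must be all 0 or all 1
VR4400_LOWEST_CHECKED_BIT = 52  # bits 63..52 must be all 0 or all 1


def conversion_is_implemented(value: int, lowest_checked_bit: int) -> bool:
    """True if bits 63..lowest_checked_bit of `value` are all 0 or all 1.

    When this returns False, the manual says the processor raises the
    floating-point unimplemented operation exception.
    """
    image = value & 0xFFFFFFFFFFFFFFFF          # 64-bit two's-complement image
    upper = image >> lowest_checked_bit          # the checked upper bits
    all_ones = (1 << (64 - lowest_checked_bit)) - 1
    return upper == 0 or upper == all_ones
```

For example, the value 2^53 passes the VR4300 check but fails the VR4400 check, so software that relies on the unimplemented operation handler for large integers sees it invoked at different thresholds on the two processors.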
Appendix C Differences from VR4200

The VR4300 is slightly different from the VR4200 in terms of system design and software. This appendix describes the differences between the VR4300 and VR4200.

The major differences are that the VR4300 employs a new 32-bit system interface and omits the parity-based data check function.

C.1 Differences in software

The logical differences in software lie in the specifications of the CP0 registers. These differences are shown in Table C-1.

C.1.1 Cache parity

Because the VR4300 does not check the cache data by using parity, the Cache Error register (27) always outputs 0, and writing to this register is ignored. The Parity Error register (26) can be used only for self-diagnosis and cannot be used to manipulate the cache.

C.1.2 Status register

The bit specifications of the Status register are slightly different between the VR4300 and VR4200.

The CE and DE bits of the Status register, which are used to manipulate parity with the VR4200, do not affect the operation of the VR4300.

C.1.3 Config register

The bit specifications differ slightly.

With the VR4200, the BE bit and EP area are set by hardware on reset to reflect the external pins BigEndian and DataRate, and can be read by software. With the VR4300, default values are set to the BE bit and EP area at the time of cold reset. The default value of the EP area is 0000 and that of the BE bit is 1. After that, the values of this area and bit can be changed by software.

Bits 18 and 19, which are 00 with the VR4200, are 01 with the VR4300.

For details, refer to 5.4.6 Config register (16).

C.1.4 Cache parity error exception

Because the VR4300 does not check data by using cache parity, it does not generate the parity error exception. The VR4200 generates the cache parity error exception (DCPE) in the WB stage.
Table C-1 Differences in software

| Function | VR4300 | VR4200 |
|---|---|---|
| Cache parity | Not supported | Supported |
| Status register: CE and DE bits | Do not function | Used to manipulate parity |
| Config register: BE bit and EP area | Set default values | Set information on external pins |
| Config register: bits 18 and 19 | 01 | 00 |

C.2 Differences in system design

This section describes the differences in system design between the VR4300 and VR4200. Table C-2 shows these differences.

C.2.1 System interface

The system interface of the VR4200 is a 64-bit bus with a parity check function, but that of the VR4300 is a 32-bit bus without a parity check function. For details, refer to Chapter 12 System interface.

During block write of an instruction, the VR4200 executes doubleword data transfer four times with one idle cycle. The VR4300 executes word data transfer eight times to write to the main memory.

During block write of data, the VR4200 executes doubleword data transfer two times. The VR4300 executes word data transfer four times to write to the main memory.

The VR4200 has two data rates, "DDx" and "Dxx". The VR4300 also has two data rates, "D" and "Dxx". The VR4200 sets the data rate by using the DataRate pin. The data rate of the VR4300 is set by software, by using the EP area of the Config register. The table below shows the transfer data patterns in the EP area.

| EP area | Transfer pattern |
|---|---|
| 0000 | D |
| 0110 | DxxDxx |

C.2.2 Clock

The VR4300 does not output the MasterOut and RClock signals.

The frequency of the pipeline clock (PClock) of the VR4400 and VR4200 is usually twice that of MasterClock. The VR4300 can change the frequency ratio by using the value of the DivMode(1:0) *1 pins. (Refer to Table 2-2 Clock/control interface signals.) The frequency ratio PClock : MasterClock can be selected from 2:1, 3:1, 4:1, or 3:2 *2.

The VR4200 usually generates SClock and TClock by dividing PClock by 2. The TClock of the VR4300 is usually at the same frequency as MasterClock.

In the low power mode *3, the speeds of PClock, SClock, and TClock of the VR4300 can be reduced to 1/4 of the normal level, as with the VR4200.

*1. In the VR4300 and VR4305. In the VR4310, DivMode(2:0).
*2. In the VR4300. In the VR4305, the frequency ratio can be set to 1:1, 2:1, or 3:1. In the VR4310, it can be set to 2:1, 3:1, 4:1, 5:1, 6:1, or 5:2.
*3. 100 MHz model of the VR4300 and the VR4305 only

C.2.3 Package

The VR4200 employs a 208-pin plastic QFP. The VR4300 is housed in a 120-pin plastic QFP.

Table C-2 Differences in system design

| Function | VR4300 | VR4200 |
|---|---|---|
| System interface: SysAD bus | No parity, 32 bits | With parity, 64 bits |
| System interface: instruction block write | Word data, 8 times | Doubleword data, 4 times |
| System interface: data block write | Word data, 4 times | Doubleword data, 2 times |
| System interface: data pattern | Set by Config register (D, Dxx) | Set by external pins (DDx, Dxx) |
| Clock: MasterOut, RClock | Not output | Output |
| Clock: PClock | Frequency ratio to MasterClock variable | Frequency two times higher than normal MasterClock |
| Clock: TClock | Same frequency as normal MasterClock | PClock divided by two |
| Package | 120-pin plastic QFP | 208-pin plastic QFP |

C.3 Other differences

In addition to the above differences, the VR4300 and VR4200 differ in the following points. The differences described in this section are summarized in Table C-3.

C.3.1 Physical address

The physical address and address space of the VR4200 are 33 bits wide, and those of the VR4300 are 32 bits wide. Consequently, the tag of the cache and the page frame number area of the TLB entry are 20 bits each at the Hi and Lo sides.

C.3.2 Write buffer

The write buffer of the VR4200 is a doubleword buffer with two entries. The VR4300 has a 4-entry word buffer to improve performance during uncached writes.

C.3.3 Reset

The VR4200 simultaneously asserts the ColdReset and Reset signals active.
With the VR4300, these signals need not be asserted active at the same time. The Reset signal of the VR4300 may be active or inactive during cold reset. However, do not change the value of this signal during the reset sequence. The ColdReset signal of the VR4300 need not be synchronized with the MasterClock signal.

C.3.4 Status(3:0) pins

The Status(3:0) pins provided on the VR4200 are not provided on the VR4300.

With the VR4300, when the ITS bit of the Status register is set, an instruction cache miss occurs when a branch instruction is executed, and the branch destination address is output to SysAD(31:0). However, because the VR4300 does not have the Status(3:0) pins, the internal status of the processor cannot be output.

Table C-3 Other differences

| Function | VR4300 | VR4200 |
|---|---|---|
| Physical address | 32 bits | 33 bits |
| Write buffer | 4-entry word buffer | 2-entry doubleword buffer |
| ColdReset signal and MasterClock | Need not be synchronized | Must be synchronized |
| Status(3:0) pins | Not provided | Provided |

Appendix D Restrictions of VR4300

An unimplemented operation exception occurs in response to the execution of a floating-point type conversion instruction in the following cases.

- If an overflow occurs during conversion to integer format
- If the source operand is an infinite number
- If the source operand is NaN

The type conversion instructions affected by this restriction are as follows.

CEIL.L.fmt fd, fs
CEIL.W.fmt fd, fs
CVT.D.fmt fd, fs
CVT.L.fmt fd, fs
CVT.S.fmt fd, fs
CVT.W.fmt fd, fs
FLOOR.L.fmt fd, fs
FLOOR.W.fmt fd, fs
ROUND.L.fmt fd, fs
ROUND.W.fmt fd, fs
TRUNC.L.fmt fd, fs
TRUNC.W.fmt fd, fs