Benchmarking the Am186EM Using the Dhrystone v2.1 as an Example

Don Gille, Sr. Field Application Engineer, Advanced Micro Devices Inc.

Table of Contents

Overview
The Evaluation Kit
    The Development Board
    The Serial UART Port
    The Memory System
    Optional User Wiring
    EMON Monitor
Quick Start
The Tool Set
    Requirements
    Paradigm PDREM Remote Server
    Paradigm PDRT186 Debugger
    Paradigm Locate
    Borland C++ 4.5
The 186EM Timing Unit
    Capabilities
    Addressing
    Programming
    Dhrystone Source Code
    Profile of Dhrystone
Installation of PDREM
    Description
    The PDREM Development Process
    Configuration File
    Source File Editing
    Downloading to EMON
Running a Benchmark
    The Benchmark Development Process
    Instrumenting the Code
    Editing Configuration Files
    Compilation
    Running Paradigm's Locate
    Running Paradigm's Debugger
    Optimization
Available Software
    File Name Listing
    BBS Access
Other Information
    Data Sheet
    Other Application Notes
    Ordering Board Level Products
    Books on the Subject
Appendix A
    Understanding the Dhrystone Benchmark
Appendix B
    Summary of Processors
Index

Overview

This application note demonstrates the processing power of the Am186EM microcontroller. By using a well-known and comparable benchmark, the Dhrystone, the evaluator can quickly determine the capability of the processor. The Dhrystone v2.1 benchmark may be replaced by another program that more closely resembles the intended application, or by the application itself.

The evaluation kit is discussed first, with a review of its major functions and features. A small amount of work is required to make the development platform ready to provide accurate timing measurements. For those who wish to get a quick start, instructions are given to get the development board running a pre-compiled Dhrystone in just a few minutes.

The tool set consists of a C++ compiler from Borland and a linker/locator from Paradigm. These software systems provide the foundation for developing benchmarks and embedded programs.

An overview is offered of the timer/counter channels on the Am186EM. It covers the capabilities, architecture and programming of each channel.

To facilitate the development process, the SD186EM board is programmed with PDREM. This flash-based software allows source-level debugging and easy loading and execution of benchmarks. The programmer is shown how to program PDREM into the flash memory on the SD186EM board.

After successfully running the Dhrystone, the programmer is guided to the point where they may run their own program. The step-by-step process of reusing the files developed for the Dhrystone with another program is explained.

The supplied files are described in terms of their use and origin. This section also explains which files are expected from other sources. Other information sources are covered in the last section.

Appendix A) The Dhrystone benchmark is completely documented to help explain the benchmarking procedure. From this example the programmer can extract the information needed to use all the Dhrystone support files with their own program.

Appendix B) A summary of processors is included to complete the Dhrystone picture. This listing contains various processors and their performance on the Dhrystone benchmark. Compiler flags and options are listed after the performance ranking, together with the site or person that ran the benchmark.

The Evaluation Kit

The Development Board

The SD186EM development board is a full-function platform for code development, architectural evaluation and benchmarking. The board includes an Am186EM processor (40 MHz), memory, an RS-232 communications port, eight LEDs and a standardized bus connection. The eight LEDs are CR2 through CR10. CR2 is optionally used to signal program start and CR3 is used to indicate program stop. Any LED can be turned on or off, using the functions provided, to indicate other program conditions.

The Serial UART Port

The Am186EM has an internal UART which is used to communicate with the board. This port can be polled or interrupt driven at user-selectable speeds from 300 to 115,200 baud. As shipped, the EMON monitor software operates the serial port at 19,200 baud. When the board is programmed with PDREM, the port operates at 115,200 baud.

The Memory System

The SD186EM is supplied with 128K x 16 of 70 ns AMD flash and 128K x 16 of 70 ns SRAM. This memory configuration allows the SD186EM platform to run at full speed, making it suitable for performance benchmarking. The Am186EM has programmable wait-state registers as a standard feature. A programmer who wishes to use the platform for benchmarking can initialize these registers before invoking the code under test. Wait-state selection inserts predictable memory delays, which allows performance evaluation of proposed targets that use less expensive memories.

Optional User Wiring

The evaluation kit, as supplied, does not connect timers 1 and 0 together. For short benchmarks with run times of less than 40 seconds, no connection is necessary. For much longer run times, solder a jumper between pin 6 and pin 8 on row B of J3 (the larger connector site at the opposite end of the board from the RS-232 connector). This connects the timer 1 output to the timer 0 input.

EMON Monitor

As delivered, the SD186EM development board powers up running the EMON monitor. EMON provides both debugging capability and a user interface. Because the user interface is transmitted from the target board, only a simple modem communication or terminal emulation program needs to run on the host.

Quick Start

To get the SD186EM up and running in the shortest amount of time, do the following:

1) Install all the required tools. For more information see the chapter titled The Tool Set.
   - Paradigm PDREM
   - Paradigm Locate
   - Paradigm PDRT186 debugger
   - Borland C++ v4.5
2) Make the changes to the files covered in the chapter Installation of PDREM, then compile, link and locate the file.
3) Program the new PDREM created above into the AMD flash memory on the SD186EM board, following the instructions found in the same chapter under the heading Downloading to EMON.
4) Make sure the port your board is connected to is the same port you specified during the installation of PDRT186. If it is not, edit the pdrt186.ini file and correct the port entry.
5) Run the debugger as described under the heading Running Paradigm's Debugger.

After following these instructions, you will have built a working software remote server, programmed it into AMD flash memory, downloaded the Dhrystone benchmark and run the Dhrystone with the results printed on your screen. You are now ready to develop your own benchmark for the fastest 186 processor on earth!

The Tool Set

Requirements

It is expected that you own the tool set:

- Borland C++ v4.5 (optionally Microsoft C v8.0 or Visual C++ v2.0)
- Paradigm Locate v5.0, PDRT186 v4.0, PDREM

If you are using older versions of these tools, you must replace each file from Paradigm and Borland with the equivalent from your version.

Note: Be sure to read all the supplied readme files from each

Paradigm PDREM Remote Server

The SD186EM flash memory should be programmed with Paradigm's remote debug server, PDREM. This ROM-based support code assumes complete control of the SD186EM target board, which allows the PC-based debugger to operate the target over a serial link. PDREM is controlled by the PDRT186 debugger running on the host computer. Actions such as single step, stop, run and read memory are supported by PDREM through the source-level debugger.

Paradigm PDRT186 Debugger

This debugger is based on the Borland Turbo Debugger with extensions for the Am186EM processor. It provides an easily understood user interface with specialized windows that make all the Am186EM internal registers visible.

Paradigm Locate

This tool is also required for real-mode code development. Functioning as a second linker/locator, Locate takes an ".exe" file and resolves jump targets to produce an executable file for an embedded design. To produce a program that will run on the SD186EM, the programmer must prepare a configuration file for use by the locator. This file tells the locator where to place each section (code/data) in the memory of the embedded system.

Borland C++ 4.5

The Borland environment, under Windows, lets the programmer develop functional DOS/Windows programs and then easily move them to the Am186EM embedded environment. Borland compilers give very good compile-time performance as well as good code optimization. Borland's tool kit, together with the Paradigm locator, allows development of more complex C++ programs.

This application note recommends running the Borland tool set from the DOS command line so that make files can be used easily.

The 186EM Timing Unit

Capabilities

The Am186EM timer has exactly the same capabilities as the standard Am80186. These features include 3 channels, 2 external pairs of inputs and outputs, programmable duty cycle, and external or internal clocking. The Am186EM timers have also been enhanced to allow clocking at the full 40 MHz speed of the CPU. The timers are used to measure the benchmarks run in this application note.

Note: If your benchmark run time is under 15 us, place a loop around the code and run through it a number of times.

Addressing

The Am186EM uses a programmable address register to fix the upper 8 bits of the address of the internal register array. It is assumed that this register is left in its default (reset) state, FFh. Each timer control register is addressed with the address register as the upper 8 bits and the offset into the Am186EM's register array as the lower 8 bits.

The following C code is an example which could be placed in a header file and included in any code using the timer unit. The supplied programs use the header file "am186em.h" for all addressing.

    // example header file for timer addressing
    // values use assembler hex notation (0ff50h) because they are
    // referenced from inline assembly statements
    #define t0cnt   0ff50h  // timer 0 count register
    #define t1cnt   0ff58h  // timer 1 count register
    #define t2cnt   0ff60h  // timer 2 count register
    #define t0mode  0ff56h  // timer 0 mode register
    #define t1mode  0ff5eh  // timer 1 mode register
    #define t2mode  0ff66h  // timer 2 mode register
    #define t0maxa  0ff52h  // timer 0 primary maximum count compare register
    #define t1maxa  0ff5ah  // timer 1 primary maximum count compare register
    #define t2maxa  0ff62h  // timer 2 primary maximum count compare register
    #define t0maxb  0ff54h  // timer 0 secondary maximum count compare register
    #define t1maxb  0ff5ch  // timer 1 secondary maximum count compare register

Programming

Operating the timer unit is as simple as setting the correct registers. The code in this section connects timer 2 to timer 1, which in turn drives timer 0. The clock source for timer 2 is the pre-scaled (divide by 4) CPU clock, or 10 MHz. Timer 2 divides this by 65535 and clocks timer 1. Timer 1 divides this by 65535 and clocks timer 0. This long chain allows benchmarks to run for a number of hours.

This code should be added to, and called by, your benchmark code. It was written in assembly language to make it as small and fast as possible. No call overhead is incurred because the compiler will inline the functions containing the assembly language. Also, compiler optimizations do not affect assembly-coded routines.

The supplied programs use the C language as the framework and insert assembly code into the program to do the actual work. Most C compilers allow mixed-language programming, but their syntax may differ. The timer operation code is in a file called timer.c and the header code is in a file called timer.h.

    /* timer.h
       timer counter function prototypes and structure type defs */
    #ifndef timer_h
    #define timer_h

    // definitions
    #define maxcnt1a 0xffff  //maximum count for register a of timer 1
    #define maxcnt1b 0xffff
    #define maxcnt0a 0xffff
    #define maxcnt0b 0xffff
    #define maxcnt2a 0xffff

    // type definitions
    typedef struct {
        unsigned int t2;  //fastest changing count
        unsigned int t1;
        unsigned int t0;  //slowest count
    } t_time;

    // function prototypes
    void timer_init(void);
    void timer_start(void);
    void timer_stop(void);
    void read_time(t_time *);
    float elapsed_time(float cpu_clock,t_time * end);

    #endif //timer_h

    /* timer.c
       timer functions for benchmarking */
    #pragma inline
    #include "asmrules.h"
    #include "am186em.h"
    #include "timer.h"
    #define timer_c

    /* initialize the timers */
    void timer_init(void){
        asm mov dx,t2mode                //point to mode register
        asm mov ax,0100000000000001b     //prescale mode cpu/4 as input
        asm out dx,ax
        asm mov dx,t1mode                //point to mode register
        asm mov ax,1100000000001001b     //t2 internal select as source (tie input pin low)
        asm out dx,ax
        asm mov dx,t0mode                //point to mode register
        asm mov ax,1100000000000101b     //t1 as input (wire t1 out to t0 in)
        asm out dx,ax
        // timer 2 stopped, timers 1 and 0 enabled waiting on t2 to start

        // initialize count and max registers
        // zero the current count registers
        asm xor ax,ax                    // 0 for storage
        asm mov dx,t2cnt
        asm out dx,ax
        asm mov dx,t1cnt
        asm out dx,ax
        asm mov dx,t0cnt
        asm out dx,ax

        // setup maximum count registers
        asm mov ax,maxcnt2a
        asm mov dx,t2maxa
        asm out dx,ax
        asm mov ax,maxcnt1a
        asm mov dx,t1maxa
        asm out dx,ax
        asm mov ax,maxcnt0a
        asm mov dx,t0maxa
        asm out dx,ax
        asm mov ax,maxcnt1b
        asm mov dx,t1maxb
        asm out dx,ax
        asm mov ax,maxcnt0b
        asm mov dx,t0maxb
        asm out dx,ax

        // setup pios for t1 -> t0
        // set p11 to input (tmrin0) and set p1 to output as tmrout1
        asm mov dx,pio0en
        asm in ax,dx
        asm mov bx,1111011111111101b
        asm and ax,bx
        asm out dx,ax
    }

    void timer_start(void){
        asm mov dx,t2mode
        asm mov ax,1100000000000001b     // start up counter 2
        asm out dx,ax
    }

    void timer_stop(void){
        asm mov dx,t2mode
        asm mov ax,0100000000000001b     // clear the enable bit to stop counter 2
        asm out dx,ax
    }

    void read_time(t_time * emtime){
        asm les_ di,emtime               //get pointer
        asm mov dx,t2cnt                 //point to count register
        asm in ax,dx
        asm mov es_ [di],ax              //first member
        // use all three counters in a serial chain
        asm mov dx,t1cnt
        asm in ax,dx
        asm mov es_ [di+2],ax            //second member
        asm mov dx,t0cnt
        asm in ax,dx
        asm mov es_ [di+4],ax            //third member
    #endif  /* the matching #if does not appear in this listing */
    }

    #ifdef et
    /* elapsed_time
       calculate the time in seconds from a t-type structure
       as a floating point number of seconds
       assumes timer initialized to 0 before start. */
    float elapsed_time(float cpu_clock,t_time * end){
        float t2clx,t1clx,t2time;
        float t0clx,t1time,t0time;

        t2clx = cpu_clock/4.0;           // calculate clock to each timer stage
        t1clx = t2clx/(float)maxcnt2a;
        t0clx = t1clx/(float)maxcnt1a;
        t2time = (1/t2clx) * (float)end->t2;  // in rev e timers chained inline
        t1time = (1/t1clx) * (float)end->t1;
        t0time = (1/t0clx) * (float)end->t0;
        return(t2time+t1time+t0time);
    }
    #endif

Dhrystone Source Code

    /* modified timer start/stop for amd demo */
    /*
     ****************************************************************************
     *
     * "dhrystone" benchmark program
     * -----------------------------
     *
     * version: c, version 2.1
     *
     * file: dhry_1.c (part 2 of 3)
     *
     * date: may 25, 1988
     *
     * author: reinhold p. weicker
     *
     ****************************************************************************
     */

    #include "dhry.h"
    // amd includes
    #include "timer.h"
    #include "eleds.h"

    /* global variables: */
    rec_pointer ptr_glob, next_ptr_glob;
    int int_glob;
    boolean bool_glob;
    char ch_1_glob, ch_2_glob;
    #if 0
    int arr_1_glob[50];
    int arr_2_glob[50][50];
    #else
    unsigned int arr_1_glob[50];
    unsigned int arr_2_glob[50][50];
    #endif
    boolean reg = def_reg;

    /* variables for time measurement: */
    /* end of variables for time measurement */

    void main()
    /*****/
    /* main program, corresponds to procedures */
    /* main and proc_0 in the ada version */
    {
        one_fifty int_1_loc;
        reg one_fifty int_2_loc;
        one_fifty int_3_loc;
        reg char ch_index;
        enumeration enum_loc;
        str_30 str_1_loc;
        str_30 str_2_loc;
    #if 0
        // reg int run_index;
        // reg int number_of_runs;
    #else
        // reg unsigned int run_index;
        // reg unsigned int number_of_runs;
    #endif
        long number_of_runs,run_index;

        // added by amd
    #ifdef am186em
        // variables used
        t_time ttest_time;
        float loop_time,total_time,bench_time;
        long amd;
        //initialization
        ledinit();     //initialize outputs for am186em, amd
        nowait();      // set wait states to 0 for sram
        timer_init();  //setup timer and zero
    #endif

        /* initializations */
        next_ptr_glob = (rec_pointer) malloc(sizeof(rec_type));
        ptr_glob = (rec_pointer) malloc(sizeof(rec_type));

        ptr_glob->ptr_comp = next_ptr_glob;
        ptr_glob->discr = ident_1;
        ptr_glob->variant.var_1.enum_comp = ident_3;
        ptr_glob->variant.var_1.int_comp = 40;
        strcpy(ptr_glob->variant.var_1.str_comp, "dhrystone program, some string");
        strcpy(str_1_loc, "dhrystone program, 1'st string");

        arr_2_glob[8][7] = 10;
        /* was missing in published program. without this statement, */
        /* arr_2_glob [8][7] would have an undefined value. */
        /* warning: with 16-bit processors and number_of_runs > 32000, */
        /* overflow may occur for this array element. */

        printf("\ndhrystone benchmark, version 2.1 (language: c)\n");
        if (reg) {
            printf("program compiled with 'register' attribute\n");
        } else {
            printf("program compiled without 'register' attribute\n");
        }

        number_of_runs = 6000;  //set to your liking
        /* number_of_runs is a long, so it is printed with %ld */
        printf("\nexecution starts, %ld runs through dhrystone\n", number_of_runs);

        /***************/
        /* start timer */
        /***************/
    #ifdef am186em
        //test loop time
        amd = 0;
        timer_start();
        // could also use a pragma here to disable optimization
        for (run_index = 1; run_index <= number_of_runs; ++run_index){
            amd = amd +1;  // keeps optimizer at bay
        }
        timer_stop();
        amd = amd+2;  // keeps optimizer at bay
        read_time(&ttest_time);  //get counters into loop_time struct
        loop_time = elapsed_time(40.0e6,&ttest_time);
        timer_init();  //reset the timer

        //now start the program
        cr2_on();       // signal start of program
        timer_start();  //start time chain
    #endif
        /***************/

        for (run_index = 1; run_index <= number_of_runs; ++run_index)
        {
            proc_5();
            proc_4();
            /* ch_1_glob == 'a', ch_2_glob == 'b', bool_glob == true */
            int_1_loc = 2;
            int_2_loc = 3;
            strcpy(str_2_loc, "dhrystone program, 2'nd string");
            enum_loc = ident_2;
            bool_glob = !func_2(str_1_loc, str_2_loc);
            /* bool_glob == 1 */
            while (int_1_loc < int_2_loc)  /* loop body executed once */
            {
                int_3_loc = 5 * int_1_loc - int_2_loc;
                /* int_3_loc == 7 */
                proc_7(int_1_loc, int_2_loc, &int_3_loc);
                /* int_3_loc == 7 */
                int_1_loc += 1;
            } /* while */
            /* int_1_loc == 3, int_2_loc == 3, int_3_loc == 7 */
            proc_8(arr_1_glob, arr_2_glob, int_1_loc, int_3_loc);
            /* int_glob == 5 */
            proc_1(ptr_glob);
            for (ch_index = 'a'; ch_index <= ch_2_glob; ++ch_index)
                /* loop body executed twice */
            {
                if (enum_loc == func_1(ch_index, 'c'))
                    /* then, not executed */
                {
                    proc_6(ident_1, &enum_loc);
                    strcpy(str_2_loc, "dhrystone program, 3'rd string");
                    int_2_loc = run_index;
                    int_glob = run_index;
                }
            }
            /* int_1_loc == 3, int_2_loc == 3, int_3_loc == 7 */
            int_2_loc = int_2_loc * int_1_loc;
            int_1_loc = int_2_loc / int_3_loc;
            int_2_loc = 7 * (int_2_loc - int_3_loc) - int_1_loc;
            /* int_1_loc == 1, int_2_loc == 13, int_3_loc == 7 */
            proc_2(&int_1_loc);
            /* int_1_loc == 5 */
        } /* loop "for run_index" */

        /**************/
        /* stop timer */
        /**************/
    #ifdef am186em
        timer_stop();  //stop timer
        cr3_on();      // signal program stop
        read_time(&ttest_time);
        total_time = elapsed_time(40.0e6,&ttest_time);
    #endif
        /**************/

        /****************/
        /* calculations */
        /****************/
    #ifdef am186em
        bench_time = total_time-loop_time;
    #endif
        /****************/

        printf("execution ends\n\n");
    #ifdef details  // must be defined to print all this stuff
        printf("final values of the variables used in the benchmark:\n\n");
        printf("int_glob: %d\n", int_glob);
        printf(" should be: %d\n", 5);
        printf("bool_glob: %d\n", bool_glob);
        printf(" should be: %d\n", 1);
        printf("ch_1_glob: %c\n", ch_1_glob);
        printf(" should be: %c\n", 'a');
        printf("ch_2_glob: %c\n", ch_2_glob);
        printf(" should be: %c\n", 'b');
        printf("arr_1_glob[8]: %d\n", arr_1_glob[8]);
        printf(" should be: %d\n", 7);
        printf("arr_2_glob[8][7]: %d\n", arr_2_glob[8][7]);
        printf(" should be: number_of_runs + 10\n");
        printf("ptr_glob->\n");
        printf(" ptr_comp: %d\n", (int) ptr_glob->ptr_comp);
        printf(" should be: (implementation-dependent)\n");
        printf(" discr: %d\n", ptr_glob->discr);
        printf(" should be: %d\n", 0);
        printf(" enum_comp: %d\n", ptr_glob->variant.var_1.enum_comp);
        printf(" should be: %d\n", 2);
        printf(" int_comp: %d\n", ptr_glob->variant.var_1.int_comp);
        printf(" should be: %d\n", 17);
        printf(" str_comp: %s\n", ptr_glob->variant.var_1.str_comp);
        printf(" should be: dhrystone program, some string\n");
        printf("next_ptr_glob->\n");
        printf(" ptr_comp: %d\n", (int) next_ptr_glob->ptr_comp);
        printf(" should be: (implementation-dependent), same as above\n");
        printf(" discr: %d\n", next_ptr_glob->discr);
        printf(" should be: %d\n", 0);
        printf(" enum_comp: %d\n", next_ptr_glob->variant.var_1.enum_comp);
        printf(" should be: %d\n", 1);
        printf(" int_comp: %d\n", next_ptr_glob->variant.var_1.int_comp);
        printf(" should be: %d\n", 18);
        printf(" str_comp: %s\n", next_ptr_glob->variant.var_1.str_comp);
        printf(" should be: dhrystone program, some string\n");
        printf("int_1_loc: %d\n", int_1_loc);
        printf(" should be: %d\n", 5);
        printf("int_2_loc: %d\n", int_2_loc);
        printf(" should be: %d\n", 13);
        printf("int_3_loc: %d\n", int_3_loc);
        printf(" should be: %d\n", 7);
        printf("enum_loc: %d\n", enum_loc);
        printf(" should be: %d\n", 1);
        printf("str_1_loc: %s\n", str_1_loc);
        printf(" should be: dhrystone program, 1'st string\n");
        printf("str_2_loc: %s\n", str_2_loc);
        printf(" should be: dhrystone program, 2'nd string\n\n");
    #endif
        printf("dhrystone time = %f\n",bench_time);
        /* runs divided by time gives dhrystones per second */
        printf("dhrystones / second = %8.0f\n",(float)number_of_runs/bench_time);
        printf("dhrystone mips = %4.1f\n",((float)number_of_runs/bench_time)/1757.0);
        exit(0);
    }

    void proc_1(ptr_val_par)
    /******************/
    reg rec_pointer ptr_val_par;
    /* executed once */
    {
        reg rec_pointer next_record = ptr_val_par->ptr_comp;
        /* == ptr_glob_next */
        /* local variable, initialized with ptr_val_par->ptr_comp, */
        /* corresponds to "rename" in ada, "with" in pascal */

        structassign(*ptr_val_par->ptr_comp, *ptr_glob);
        ptr_val_par->variant.var_1.int_comp = 5;
        next_record->variant.var_1.int_comp = ptr_val_par->variant.var_1.int_comp;
        next_record->ptr_comp = ptr_val_par->ptr_comp;
        proc_3(&next_record->ptr_comp);
        /* ptr_val_par->ptr_comp->ptr_comp == ptr_glob->ptr_comp */
        if (next_record->discr == ident_1)
            /* then, executed */
        {
            next_record->variant.var_1.int_comp = 6;
            proc_6(ptr_val_par->variant.var_1.enum_comp,
                   &next_record->variant.var_1.enum_comp);
            next_record->ptr_comp = ptr_glob->ptr_comp;
            proc_7(next_record->variant.var_1.int_comp, 10,
                   &next_record->variant.var_1.int_comp);
        }
        else /* not executed */
            structassign(*ptr_val_par, *ptr_val_par->ptr_comp);
    } /* proc_1 */

    void proc_2(int_par_ref)
    /******************/
    /* executed once */
    /* *int_par_ref == 1, becomes 4 */
    one_fifty *int_par_ref;
    {
        one_fifty int_loc;
        enumeration enum_loc;

        int_loc = *int_par_ref + 10;
        do /* executed once */
            if (ch_1_glob == 'a')
                /* then, executed */
            {
                int_loc -= 1;
                *int_par_ref = int_loc - int_glob;
                enum_loc = ident_1;
            } /* if */
        while (enum_loc != ident_1); /* true */
    } /* proc_2 */

    void proc_3(ptr_ref_par)
    /******************/
    /* executed once */
    /* ptr_ref_par becomes ptr_glob */
    rec_pointer *ptr_ref_par;
    {
        if (ptr_glob != null)
            /* then, executed */
            *ptr_ref_par = ptr_glob->ptr_comp;
        proc_7(10, int_glob, &ptr_glob->variant.var_1.int_comp);
    } /* proc_3 */

    void proc_4() { /* without parameters */
    /*******/
    /* executed once */
        boolean bool_loc;

        bool_loc = ch_1_glob == 'a';
        bool_glob = bool_loc | bool_glob;
        ch_2_glob = 'b';
    } /* proc_4 */

    void proc_5() { /* without parameters */
    /*******/
    /* executed once */
        ch_1_glob = 'a';
        bool_glob = false;
    } /* proc_5 */

    /* procedure for the assignment of structures, */
    /* if the c compiler doesn't support this feature */
    #ifdef nostructassign
    memcpy(d, s, l)
    register char *d;
    register char *s;
    register int l;
    {
        while (l--) *d++ = *s++;
    }
    #endif

    /*
     ****************************************************************************
     *
     * "dhrystone" benchmark program
     * -----------------------------
     *
     * version: c, version 2.1
     *
     * file: dhry_2.c (part 3 of 3)
     *
     * date: may 25, 1988
     *
     * author: reinhold p. weicker
     *
     ****************************************************************************
     */

    #ifndef reg
    #define reg
    /* reg becomes defined as empty */
    /* i.e. no register variables */
    #ifdef _am29k
    #undef reg
    #define reg register /* define reg; saves room on 127-char ms-dos cmd line */
    #endif
    #endif

    #include "dhry.h"

    extern int int_glob;
    extern char ch_1_glob;

    void proc_6(enum_val_par, enum_ref_par)
    /*********************************/
    /* executed once */
    /* enum_val_par == ident_3, enum_ref_par becomes ident_2 */
    enumeration enum_val_par;
    enumeration *enum_ref_par;
    {
        *enum_ref_par = enum_val_par;
        if (!func_3(enum_val_par))
            /* then, not executed */
            *enum_ref_par = ident_4;
        switch (enum_val_par)
        {
        case ident_1:
            *enum_ref_par = ident_1;
            break;
        case ident_2:
            if (int_glob > 100)
                /* then */
                *enum_ref_par = ident_1;
            else
                *enum_ref_par = ident_4;
            break;
        case ident_3: /* executed */
            *enum_ref_par = ident_2;
            break;
        case ident_4:
            break;
        case ident_5:
            *enum_ref_par = ident_3;
            break;
        } /* switch */
    } /* proc_6 */

    void proc_7(int_1_par_val, int_2_par_val, int_par_ref)
    /**********************************************/
    /* executed three times */
    /* first call: int_1_par_val == 2, int_2_par_val == 3, */
    /* int_par_ref becomes 7 */
    /* second call: int_1_par_val == 10, int_2_par_val == 5, */
    /* int_par_ref becomes 17 */
    /* third call: int_1_par_val == 6, int_2_par_val == 10, */
    /* int_par_ref becomes 18 */
    one_fifty int_1_par_val;
    one_fifty int_2_par_val;
    one_fifty *int_par_ref;
    {
        one_fifty int_loc;

        int_loc = int_1_par_val + 2;
        *int_par_ref = int_2_par_val + int_loc;
    } /* proc_7 */

    void proc_8(arr_1_par_ref, arr_2_par_ref, int_1_par_val, int_2_par_val)
    /*********************************************************************/
    /* executed once */
    /* int_par_val_1 == 3 */
    /* int_par_val_2 == 7 */
    arr_1_dim arr_1_par_ref;
    arr_2_dim arr_2_par_ref;
    int int_1_par_val;
    int int_2_par_val;
    {
        reg one_fifty int_index;
        reg one_fifty int_loc;

        int_loc = int_1_par_val + 5;
        arr_1_par_ref[int_loc] = int_2_par_val;
        arr_1_par_ref[int_loc + 1] = arr_1_par_ref[int_loc];
        arr_1_par_ref[int_loc + 30] = int_loc;
        for (int_index = int_loc; int_index <= int_loc + 1; ++int_index)
            arr_2_par_ref[int_loc][int_index] = int_loc;
        arr_2_par_ref[int_loc][int_loc - 1] += 1;
        arr_2_par_ref[int_loc + 20][int_loc] = arr_1_par_ref[int_loc];
        int_glob = 5;
    } /* proc_8 */

    enumeration func_1(ch_1_par_val, ch_2_par_val)
    /*************************************************/
    /* executed three times */
    /* first call: ch_1_par_val == 'h', ch_2_par_val == 'r' */
    /* second call: ch_1_par_val == 'a', ch_2_par_val == 'c' */
    /* third call: ch_1_par_val == 'b', ch_2_par_val == 'c' */
    capital_letter ch_1_par_val;
    capital_letter ch_2_par_val;
    {
        capital_letter ch_1_loc;
        capital_letter ch_2_loc;

        ch_1_loc = ch_1_par_val;
        ch_2_loc = ch_1_loc;
        if (ch_2_loc != ch_2_par_val)
            /* then, executed */
            return (ident_1);
        else /* not executed */
        {
            ch_1_glob = ch_1_loc;
            return (ident_2);
        }
    } /* func_1 */

    boolean func_2(str_1_par_ref, str_2_par_ref)
    /*************************************************/
    /* executed once */
    /* str_1_par_ref == "dhrystone program, 1'st string" */
    /* str_2_par_ref == "dhrystone program, 2'nd string" */
    str_30 str_1_par_ref;
    str_30 str_2_par_ref;
    {
        reg one_thirty int_loc;
        capital_letter ch_loc;

        int_loc = 2;
        while (int_loc <= 2) /* loop body executed once */
            if (func_1(str_1_par_ref[int_loc],
                       str_2_par_ref[int_loc + 1]) == ident_1)
                /* then, executed */
            {
                ch_loc = 'a';
                int_loc += 1;
            } /* if, while */
        if (ch_loc >= 'w' && ch_loc < 'z')
            /* then, not executed */
            int_loc = 7;
        if (ch_loc == 'r')
            /* then, not executed */
            return (true);
        else /* executed */
        {
            if (strcmp(str_1_par_ref, str_2_par_ref) > 0)
                /* then, not executed */
            {
                int_loc += 7;
                int_glob = int_loc;
                return (true);
            }
            else /* executed */
                return (false);
        } /* if ch_loc */
    } /* func_2 */

    boolean func_3(enum_par_val)
    /***************************/
    /* executed once */
    /* enum_par_val == ident_3 */
    enumeration enum_par_val;
    {
        enumeration enum_loc;

        enum_loc = enum_par_val;
        if (enum_loc == ident_3)
            /* then, executed */
            return (true);
        else /* not executed */
            return (false);
    } /* func_3 */
25 profile of dhrystone the borland profiler tool tracks the percentage of time spent in each function of the program under test. using dhrystone as an example, the profilers output gives a good idea where time is spent in the benchmark. if you are looking for areas of improvement, the profiler tool can help you focus your efforts for maximum return. to use the profiler, you must compile with borland c++ under windows so that it can be run under windows. to do this select the option for easy win from the new project menu. this option allows you to run non-windows programs under the windows operating system.
Below is the output of the Borland C++ 4.5 profiler. The length of each bar (built from asterisks) indicates the amount of CPU time consumed by the function.

Turbo Profiler for Windows Version 4.5    Tue Mar 28 19:44:07 1995

Program: c:\e86\prog\dhry21\dry21.exe

Execution Profile
Total time: 6.9416 sec
% of total: 98 %
Run: 1 of 1

Filter: All    Show: Time    Sort: Frequency

_main     3.1417 sec   45% |**********************************************
_proc_1   1.4323 sec   20% |********************
_proc_6   0.5511 sec    8% |********
_func_2   0.4962 sec    7% |*******
_proc_3   0.3859 sec    5% |*****
_proc_7   0.2761 sec    4% |****
_func_1   0.2759 sec    4% |****
_proc_2   0.0553 sec   <1% |
_proc_8   0.0552 sec   <1% |
_func_3   0.0552 sec   <1% |
_proc_5   0.0552 sec   <1% |
_proc_4   0.0552 sec   <1% |
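The percentages in the listing can be reproduced from the times. The sketch below is an observation drawn from the numbers, not documented profiler behavior: the displayed values are consistent with each function's time divided by the sum of the twelve listed times (about 6.8353 sec), truncated to a whole percent, and that sum is about 98% of the 6.9416 sec total. The names profiled_total, percent_share and dhry_profile are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The twelve per-function times from the profile listing, in listing order. */
static const double dhry_profile[12] = {
    3.1417, 1.4323, 0.5511, 0.4962, 0.3859, 0.2761,
    0.2759, 0.0553, 0.0552, 0.0552, 0.0552, 0.0552
};

/* Sum of the per-function times listed in a profile. */
double profiled_total(const double *seconds, size_t count)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < count; ++i)
        sum += seconds[i];
    return sum;
}

/* Whole-percent share of one entry, truncated toward zero. */
int percent_share(double part_seconds, double whole_seconds)
{
    assert(whole_seconds > 0.0);
    return (int)(100.0 * part_seconds / whole_seconds);
}
```

Calling percent_share(dhry_profile[0], profiled_total(dhry_profile, 12)) reproduces the 45% shown for _main. Together _main and _proc_1 account for roughly two thirds of the profiled time, so they are the first places to look for improvement.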
Installation of PDREM

Description

PDREM is the remote, ROM-based software that controls the operation of the SD186EM development board. Actions such as single step, break, and register dump are controlled directly by PDREM. This remote server communicates with the debugger running on the PC through an RS-232 port, COM1 or COM2. Communication is carried out using binary packets, which limits traffic on the serial cable and makes the debugger faster.

As supplied, the SD186EM board does not have PDREM installed. To use PDREM on the SD186EM, the software must be compiled, linked and located, then programmed into the AMD flash memory. The information below details the changes needed to accomplish this task. Be sure to look over all the readme files on the installation disk.

The PDREM Development Process

It is assumed that the Paradigm software was installed for the Am186EM.

1) Edit the configuration file as shown below.
2) Edit the source files as shown below.
3) Use make to re-compile, re-link and re-locate PDREM.
4) Bring up EMON using the Windows Terminal package, or another modem communications package, and download the hex file you just made. More instructions are included in the SD186EM kit.
5) Cycle the power on your SD186EM board. After a delay of about 15 seconds you should see the LEDs step one at a time toward the power connector. You are now ready to run Paradigm Debug PDRT186.

Configuration File

This configuration file directs LOCATE to place PDREM below EMON and to intercept the CPU as it jumps out of EMON. Please see the SD186EM kit documentation for details. Please make sure your configuration matches the one provided, especially the bolded lines. This file is found in the PDREM directory built by the installation of the Paradigm tools.

File: pdrem.cfg

//
// paradigm locate configuration file for building the pdrem kernel.
// copyright (c) 1994 paradigm systems. all rights reserved.
//
// this is an example configuration for the amd am186em.
//
// please ensure that the example values shown are appropriate for
// your target hardware.
//
//
hexfile intel86
listfile segments publics by address // create a segment map [publics]
cputype am186em // select the processor type
initcode noreset // reset vector [reset]
// chip select initializations emon
initcode umcs = 0xe004 \ // 128k rom, 0 wait states
         lmcs = 0x9fbc \ // 128k ram, 0 wait states
         mpcs = 0x80bf \
         mmcs = 0x11f8 \
         pacs = 0x007f
dup data romdata // make a copy of the initialized data
class code = 0xf600 // modify to set the eprom address [fe00]
class data = 0x0040 // and the ram address
class ??locate = 0xf7ff // chip select initialization code [fff0]
// want class ??boot = 0xf7ff to startup with emon boot jump vector (new)
order data \ // fix the class ordering of dgroup
      const \
      bss \
      bssend \
      stack
order code \ // code, initialized data in eprom
      romdata \
      endromdata
output code \ // write to the output file
       ??locate \
       romdata endromdata
// sd186em below flash = 0xe0000

Source File Editing

To prepare your PDREM port for use on the SD186EM you must edit the following source files. These files are found in the PDREM directory built by the installation of the Paradigm tools.

File: dcomms.c

Pay close attention to the bolded information, editing it into your file if it is not already there. You may set your baud rate as high as 115,200 baud. The clock speed must also be selected (the SD186EM runs at 40 MHz). The Am186EM UART will be run in interrupt mode.
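As an illustration of how a serial driver might turn the clock and baud settings mentioned above into a divisor value, here is a minimal sketch. It is not the dcomms.c code. The register model (bit rate = clk / (oversample * (divisor + 1))) and the oversample factor of 32 used in the usage note are assumptions that must be checked against the Am186EM data sheet, and the function names are hypothetical.

```c
#include <assert.h>

/* Divisor for a UART whose bit rate is clk / (oversample * (divisor + 1)).
   ASSUMPTION: this register model is illustrative only; verify it and the
   oversample factor against the Am186EM data sheet before use. */
unsigned long baud_divisor(unsigned long clk, unsigned long baud,
                           unsigned long oversample)
{
    unsigned long denom = oversample * baud;
    unsigned long ratio;

    assert(denom > 0);
    ratio = (clk + denom / 2) / denom;   /* round to the nearest integer */
    return ratio > 0 ? ratio - 1 : 0;
}

/* Bit rate actually produced by a given divisor, in bits per second. */
unsigned long actual_baud(unsigned long clk, unsigned long divisor,
                          unsigned long oversample)
{
    assert(oversample > 0);
    return clk / (oversample * (divisor + 1));
}
```

Under these assumptions, a 40 MHz clock and a requested 115,200 baud with an oversample factor of 32 give a divisor of 10 and an actual rate of about 113,636 baud, roughly 1.4% low. A check like this shows whether a requested rate is close enough for reliable communication.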
/*
// paradigm pdremote/rom and tdrem remote target system interface
//
// amd am186em serial driver (polled/interrupt mode)
//
// copyright (c) 1994 paradigm systems. all rights reserved.
//
// this module provides support for the internal serial port on the
// amd am186em microprocessor. it can be operated in either polled
// or interrupt mode (by defining the symbol commint in target.h).
//
// p1 does not require null modem cable.
//
// select desired baud rate and crystal frequency below.
//
//
//
// functions possibly requiring customization are
//   comminit       hardware initialization (sio/pic)
//   commsendchar   write a character
//   commrcvchar    read a character
//   commrcvint     receive interrupt handler
//   commgotpacket  tests if a complete packet has been received
*/

/* select xtal frequency and desired baud rate */
#define clk 40000000ul /* x1/x2 crystal input frequency */
#define baud 115200ul /* desired baud rate */
#define iob186em 0xff00 /* location of pcb */

#include /* enable/disable functions */
#include "typedefs.h" /* common definitions and prototypes */
#include "target.h" /* target system specific information */
#include "helpers.h" /* pdremote/rom helper functions */
#include "am186em.h" /* am186em definitions */
...

File: target.h

Edit the bold changes into your file.

/*
// paradigm systems pdremote/rom interface
// copyright (c) 1993, 1994 paradigm systems. all rights reserved.
//
// this file defines the target system dependent portions of the
// paradigm paradigm debug remote interface.
*/
#if !defined(_target)
#define _target

#if 0 /* disabled: include if pdremote/rom ui is used */
#include "kernel.h" /* application interface extensions */
#endif

#if 1 /* 0 = no, 1 = yes: define if interrupt-driven serial i/o */
#define commint 0x14 /* change to desired interrupt vector */
#endif
...

File: custom.mac

Bolded paths should point to your tools.

#
# macros customized for use with borland c++ 4.5 and tasm
# these must be customized to match your compiler/assembler/linker
#
asm = tasm
cc = bcc
link = tlink
libs =
include = c:\bc45\include
model = s # please do not change this!!!
cflags = -c -m$(model) -i$(include) -z -od -f- -w -dbcpp40 -1-
aflags = /mx
lflags =
lterm =
...

File: pdrem.cfg

Edit this file to match the bolded data.

//
// paradigm locate configuration file for building the pdremote/rom kernel.
// copyright (c) 1993 paradigm systems. all rights reserved.
//
hexfile intel86 // intel hex output
listfile segments // create a segment map
map 0x00000 to 0x003ff as reserved // interrupt vector table
map 0x00400 to 0x00fff as rdwr // pdremote/rom data area
map 0x01000 to 0xdffff as reserved // reserved for applications
map 0xe0000 to 0xfffff as rdonly // pdremote/rom kernel eprom
cputype am186em // select the processor type
initcode inbyte 0x1000 // inbyte is here to create ??locate class
/* chip select initialization not necessary (flash utility sets up
// chip selects). however, there is a memory limitation on how many
// initcode parameters can be used at address 0xf7ff0 (will run into
// flash downloader). since the flash downloader will jump to address
// 0xf7ff0, the initcode is used here to serve as a jump to pdremote's
// code. the class, ??locate, will contain the jump automatically.
//
// umcs = 0xfe38 \
// lmcs = 0x0ff8 \
// mpcs = 0x80bf \
// pacs = 0x003f \
*/
dup data romdata // make a copy of the initialized data
class code = 0xe000 // modify to set the eprom address
class data = 0x0040 // and the ram address
class ??locate = 0xf7ff // initialization code
order data \ // fix the class ordering of dgroup
      const \
      bss bssend \
      stack
order code \ // code, initialized data in eprom
      romdata endromdata
output code \ // write to the output file
       ??locate \
       romdata endromdata

Notes:
- You must make this change to enable printing to your screen in the debugger!
- The chip select values are commented out because EMON has already set them up before jumping to PDREM.

Downloading to EMON

The SD186EM development board is supplied with documentation on EMON. That documentation includes instructions for sending the pdrem.hex file to the target. It is recommended that you use the Windows Terminal package as the communications program. Supplied on the disk or in the BBS files is a file named emon.trm. This file configures the Windows Terminal package to communicate with EMON and download the PDREM hex file. Click on the Windows Terminal icon and the terminal emulator should open on your screen. Click on Open and select emon.trm. The target board should be plugged into COM2.
Cycle the power on the board. The LEDs should bounce back and forth once. Press Ctrl-Break on your keyboard to signal EMON to begin operation. Once communication is established, the emon: prompt will appear on your screen. If you type h for help, the following screen should appear.

(screen capture: EMON help screen)

You must set up the Windows Terminal package to transfer the hex file to the target board. Click on Settings, then on Text Transfers, to set up the handshaking protocol. Setting a delay gives the Am186EM a comfortable margin of time to write into memory and be ready for the next character sent. Set your Text Transfers box to look like this:

(screen capture: Text Transfers dialog box)

Once the terminal package is set up, type a p command at the EMON prompt. This tells the monitor to receive a hex file and program that file into flash memory. Once you hit the Enter key, your screen should look like this.

(screen capture: EMON waiting for the hex file)

Now that EMON is ready, transmit the hex file to the target using the Windows Terminal emulator. To do this, click on Transfers and select Send Text File from the pull-down menu. Next you will be asked for a file name. To send the pdrem.hex file, click in the Directories box until you are in the PDREM directory, then select pdrem.hex from the File Name list.

The file is first sent to the target. EMON then reports on your screen that the file was received and that programming has started. You will see the LEDs flashing randomly as the programming operation starts. When the LED activity stops, programming should be complete. EMON will indicate on your screen that the programming process is complete.
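Because the file is sent as plain text, a corrupted line is easiest to catch by checking the hex records themselves before downloading. The sketch below is a hypothetical helper, not part of EMON or the Paradigm tools. It checks one Intel hex record using the standard rule that all bytes of a record, including the trailing checksum, sum to zero modulo 256. It assumes the line terminator has already been stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Return 1 if the record is well formed and its checksum is correct. */
int ihex_record_ok(const char *record)
{
    size_t len, i, nbytes;
    unsigned int sum = 0;
    unsigned int count = 0;

    assert(record != NULL);
    len = strlen(record);
    /* ':' plus at least count, address (2), type and checksum bytes */
    if (len < 11 || record[0] != ':' || (len - 1) % 2 != 0)
        return 0;
    nbytes = (len - 1) / 2;
    for (i = 0; i < nbytes; ++i) {
        int hi = hex_nibble(record[1 + 2 * i]);
        int lo = hex_nibble(record[2 + 2 * i]);

        if (hi < 0 || lo < 0)
            return 0;
        if (i == 0)
            count = (unsigned int)(hi * 16 + lo);   /* data byte count */
        sum += (unsigned int)(hi * 16 + lo);
    }
    if (nbytes != (size_t)count + 5)
        return 0;                                   /* length mismatch */
    return (sum & 0xFFu) == 0;
}
```

For example, the end-of-file record ":00000001FF" passes the check, while the same record with its last byte changed to FE does not.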
Running a Benchmark

The Benchmark Development Process

Follow these steps to prepare your program for benchmarking:

1) Copy the AMD-supplied files to your working directory:
   - am186em.h    - asmrules.h   - efast.c      - support.h
   - timer.c      - timer.h      - eleds.c      - makefile
   - dhry21.cfg   - dhry21.mkf   - dhry21.rt    - dhry.h
   - dhry_1.c     - dhry_2.c     - eleds.h      - paradigm.mkf

2) Copy the following files to your working directory from the "locate\bcpp45\helpers" (1), "pdrem\demo\console" (2) and "locate\bcpp45\examples\fpdemo" (3) directories:
   From (1): bcpp45.asm, bcpp45.inc, bcppdmm.c, typedefs.h
   From (2): bcppsio.c, console.c, console.h, v8250.h
   From (3): bcppflt.asm, bcppflt.inc, bcpprtl.asm, startup.inc, dosemu.c, dosemu.h, fardata.asm, fardata.cfg

3) Instrument your main program.
4) Rename dhry21.rt to .rt and edit the file as shown.
5) Edit the file named "makefile" as shown.
6) Edit "paradigm.mkf" as shown.
7) To invoke the tools, simply type "make" at the DOS command line: c:\>make
8) Load and run the code using Paradigm PDRT186.
As a final note, the benchmark is not allowed to use file I/O or the timers themselves. The current development tools do not support disk file transactions such as fopen() or fgetc(). The timers are off-limits because they are being used to time the benchmark.

Instrumenting the Code

As can be seen in the dhry_1.c source code, all that is needed is to follow this simple template. The AMD-supplied files include a directory called bench. This directory contains a set of files that let you simply edit your benchmark source code into an existing file, bench.c (edit out what is already there). After you copy the needed Paradigm and Borland files into this directory, you should be able to type make at the DOS prompt to build a completed benchmark named bench.axe.

In your include section:

#include "timer.h" // required for operation of the counter timers
#include "eleds.h" // optional, allows you to see start and stop of your program

In your main() section:

t_time ttest_time; // structure to hold the contents of the counter timer registers
float loop_time,total_time,bench_time; // times stored in seconds

In your initialization code section (inside main()):

ledinit(); // initialize the led outputs
nowait(); // set the am186em to 0 wait states
timer_init(); // initialize the counter timers and set all counts to 0

Just before you start into the timed code:

/***************/
/* start timer */
/***************/
#ifdef am186em // optional, conditional compilation
// if your benchmark requires multiple passes, you must insert this code.
// the time consumed in the loop overhead must be subtracted out of the
// total running time (as done here). if you do not have an outer loop
// like this, omit this section of the code.
//test loop time
// could use a #pragma here to turn off the optimizer
amd = 0; // any variable will do here (unused one please)
timer_start(); // init the timers and set to 0
// now do as many loops as the timed program will run. if only 1 pass skip this
// section of code.
for (run_index = 1; run_index <= number_of_runs; ++run_index){
  amd = amd +1; // keeps optimizer at bay, it will try to take out code
}
timer_stop(); // stop the timer
amd = amd+2; // keeps optimizer at bay
read_time(&ttest_time); //get counters into ttest_time struct
loop_time = elapsed_time(40.0,&ttest_time); // produce a floating point
                                            // number of seconds of execution
timer_init(); //reset the timer back to 0
//now start the program
cr2_on(); // signal start of program turn on the led
timer_start(); //start timer again
#endif // optional, use if you used #ifdef
/***************/

***************** your program *********************

At the end of the program being timed:

/**************/
/* stop timer */
/**************/
#ifdef am186em
timer_stop(); //stop timer
cr3_on(); // signal program stop
read_time(&ttest_time); // read the contents of the timers into a structure
total_time = elapsed_time(40.0,&ttest_time); // calculate number of secs.
#endif
/**************/

/****************/
/* calculations */
/****************/
#ifdef am186em
// subtracts the loop overhead from the total time in the program. bench_time is
// the correct benchmark time.
bench_time = total_time-loop_time;
#endif
/****************/
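Once bench_time is known, it is usually converted into a rate. The sketch below shows one way to do that, under stated assumptions: the function names are hypothetical, and the divisor 1757 is the conventional Dhrystones-per-second figure attributed to the VAX 11/780 when quoting "VAX MIPS". The values in the usage note are made-up inputs, not measured results.

```c
#include <assert.h>

/* Net benchmark time: total measured time minus the empty-loop overhead. */
double net_bench_time(double total_time, double loop_time)
{
    assert(total_time > loop_time);
    return total_time - loop_time;
}

/* Dhrystones per second for a given number of passes through the loop. */
double dhrystones_per_second(unsigned long number_of_runs, double bench_time)
{
    assert(bench_time > 0.0);
    return (double)number_of_runs / bench_time;
}

/* "VAX MIPS" rating, using the conventional 1757 Dhrystones/sec reference. */
double vax_mips(double dhry_per_sec)
{
    return dhry_per_sec / 1757.0;
}
```

As an illustration with invented inputs: a total time of 2.5 sec with 0.5 sec of loop overhead gives a net time of 2.0 sec, and 20000 runs in that time is 10000 Dhrystones per second.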
Editing Configuration Files

File: dhry21.rt

Rename this file to .rt and edit as shown. It configures the Paradigm locator for programs that are debugged in Paradigm Debug.

//
// paradigm locate configuration file for debugging the borland c++
// application using paradigm debug/rt.
//
#include "fardata.cfg" // access the far data definitions
absfile axe86 // output for paradigm debug/rt
listfile segments // generate a segment map
map 0x01000 to 0x0ffff as rdwr // system ram area (60kb ram)
map 0x00000 to 0x00fff as reserved // pdremote and interrupt vector table
map 0x10000 to 0x3ffff as reserved // no access allowed
map 0x40000 to 0x7ffff as rdonly // simulated eprom area (256kb ram)
map 0x80000 to 0xfffff as reserved // no access allowed
cputype am186em
dup data romdata // make a copy of initialized data
class code = 0x0500 // loading at address
class data = 0x0100 // data address
order data \ // ram class organization
      bss bssend \
      stack farheap \
      _fardataclasses
order code \ // eprom class organization
      initdata exitdata \
      romdata endromdata \
      _romfardataclasses
output code \ // classes in the output file(s)
       initdata exitdata \
       romdata endromdata \
       _romfardataclasses

As your code and data grow, the class code and class data addresses may need to be changed to reflect the new size of both the code and data sections of your program. LOCATE will tell you when they overlap.
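The class addresses in this file are segment (paragraph) values, so class data = 0x0100 starts at physical address 0x01000 and class code = 0x0500 starts at 0x05000, leaving 16 KB for the data classes before the code begins. The sketch below shows that arithmetic. The function names are hypothetical, and it only estimates what LOCATE reports authoritatively.

```c
#include <assert.h>

/* Physical address of a real-mode segment (paragraph) value: segment * 16. */
unsigned long seg_to_phys(unsigned int segment)
{
    return (unsigned long)segment << 4;
}

/* Bytes available to the class starting at low_seg before the class
   starting at high_seg begins. */
unsigned long room_between(unsigned int low_seg, unsigned int high_seg)
{
    assert(high_seg >= low_seg);
    return seg_to_phys(high_seg) - seg_to_phys(low_seg);
}

/* Returns 1 if size_bytes placed at low_seg would run into high_seg. */
int classes_overlap(unsigned int low_seg, unsigned long size_bytes,
                    unsigned int high_seg)
{
    return size_bytes > room_between(low_seg, high_seg);
}
```

With the values above, room_between(0x0100, 0x0500) is 16384 bytes. Since the makefile's stack setting alone is 8192 bytes, the remaining data classes have about 8 KB before class code must be moved higher.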
Compilation

Compilation requires command line operation in DOS. (If you wish, the entire process can be done in the Windows IDE.) To invoke the process, just type make on the DOS command line. Make will read a file in the current directory called makefile. This file is used to prepare the benchmark for execution. Of note is the selection of optimization level (none). The optimizations called out on this line have a great impact on performance and sometimes on code correctness. Always consult the compiler manual to understand the effect of the specified optimization. You may wish to see if code performance improves with other selections (recommended as an exercise).

File: makefile

#
# sample makefile for building the floating point arithmetic demonstration
# using the borland c++ compiler.
#
# some of the macros used in this makefile that can be customized are
#
# model       selected memory model
# cpu         cpu instruction selection
# float       floating point library selection
# exceptions  exception handling selection
# fardata     enables use of class far_data, with optional compression
# debug       enable/disable debug information
# optimize    enable/disable optimization
# warnings    enable/disable compiler warning messages
#
compdir = c:\bc45 # compiler home directory
bccfg = turboc.cfg # bc++ compiler configuration file
mkf = makefile # build everything if the makefile is changed
model = s # s, m, c, l, h
cpu = 1 # 0 - 8086, 1 - 80186, 2 - 80286, 3 -80386
float = 2 # 0 - none, 2 - emulator, 3 - coprocessor
optimize = 0 # 0 - none, 1 - size, 2 - speed
exceptions = 0 # 0 - disabled, 1 - enabled
fardata = 1 # 0 - none, 1 - normal, 2 - compressed
# above must be 1 if float > 0
debug = 2 # 0 - none, 1 - debug eprom, 2 - debug pdremote
warnings = 1 # 0 - none, 1 - all
codestring = 0 # put string literals in code segment
dupstring = 1 # duplicate string merged
checkstack = 0 # check for stack overflow
stack = 8192 # application stack size (in bytes)
!include paradigm.mkf # compiler/linker customization options
#
# these are the implicit rules for building c, c++, and assembly
# language source modules.
#
.autodepend # autodependency checking
.suffixes: .cpp .c .asm # rules when target's dependent is ambiguous

Notes on the settings above:
- You must make sure that the paths point to your tool set directories!
- cpu must be 80186.
- To use the timer, float must be set to 2.
- Floating point requires fardata, so set fardata to 1.
- Select your own optimization levels (per text).
- Select debug = 2 so you can use PDRT186.
- Watch your stack size!
.cpp.obj:
  bcc {$*.cpp }

.c.obj:
  bcc {$*.c }

.asm.obj:
  tasm $(aflags) $*.asm

#
# here we have the list of object files that will form the
# application. startup is special - it is used to make sure that
# the startup code is linked first to set the order and alignment of
# segments in the application.
#
startup = bcpp45.obj fardata.obj
objs = dosemu.obj dhry_1.obj dhry_2.obj console.obj bcppsio.obj bcpprtl.obj bcppflt.obj bcppdmm.obj efast.obj eleds.obj timer.obj

#
# the remainder of the make file is the targets and dependencies
#
dhry21.$(out): dhry21.rom dhry21.$(cfg) fardata.cfg
  locate -cdhry21.$(cfg) $(pflags) $*

dhry21.rom: tlink.cfg $(startup) $(fardataobj) $(objs)
  tlink @&&!
$(startup) $(fardataobj) $(objs)
$*.rom
$*.map
$(libs)
!

dhry_1.obj: dhry_1.c dhry.h $(mkf)
dhry_2.obj: dhry_2.c dhry.h $(mkf)
timer.obj: timer.c timer.h am186em.h $(mkf)
efast.obj: efast.c am186em.h $(mkf)
eleds.obj: eleds.c am186em.h $(mkf)
dosemu.obj: dosemu.c typedefs.h dosemu.h $(bccfg) $(mkf)
bcppflt.obj: bcppflt.asm bcppflt.inc bcpp45.inc startup.inc $(mkf)
bcppsio.obj: bcppsio.c bcpp45.inc startup.inc $(mkf)
bcpprtl.obj: bcpprtl.asm bcpp45.inc startup.inc $(mkf)
fardata.obj: fardata.asm bcpp45.inc startup.inc $(mkf)
bcpp45.obj: bcpp45.asm bcpp45.inc startup.inc $(mkf)
console.obj: console.c $(mkf)

#
# update the compiler/linker configuration files. this ensures that a
# default bc++ configuration file is never used to build an embedded
# application.
#
turboc.cfg: $(mkf)
  copy &&|
$(cflags) $(optflags)
| turboc.cfg

tlink.cfg: $(mkf)
  copy &&|
$(lflags)
| tlink.cfg

Notes on the rules above:
- Edit the object file list to match your object file names.
- Edit the target and dependency lines also to match your file names.
- The dependency lines cover both the AMD support files and the Paradigm support files.

--------------------------------------------------------------------------------------

File: paradigm.mkf

Keep the options shown in this file to enable the correct options.

#
# borland c++ 4.5 compiler/linker command line option macros
# copyright (c) 1994 paradigm systems. all rights reserved.
#
mkf = $(mkf) paradigm.mkf # include in dependancies
#
# libs      list of libraries to be linked with the application
# cflags    compiler command line options
# aflags    assembler command line options
# lflags    linker command line options
# pflags    paradigm locate command line options
# optflags  compiler optimization options and other defines ....
#
libs = rc$(model).lib
cflags = -c -m$(model) -i. -i$(compdir)\include
aflags = /mx /d__$(model)__ /dstksize=$(stack)
lflags = /c /lc:\bc45\lib
pflags =
optflags = -dam186em -det
#
# process the debug options. this step will optionally add debug
# information to the compiler, assembler, and linker, plus select the
# paradigm locate configuration file to build the application.
#
out = hex
cfg = rm
!if $(debug) > 0
aflags = $(aflags) /zd
# cflags = $(cflags) -v -r-
cflags = $(cflags) -v
lflags = $(lflags) /v
!if $(debug) == 2
aflags = $(aflags) /dpdremote
cflags = $(cflags) -dpdremote
out = axe
cfg = rt
!elif $(debug) > 2
!error invalid debug option selected
!endif
!endif
#
# process the selected cpu option
#
!if $(cpu) == 0
cflags = $(cflags) -1-
!elif $(cpu) == 1
cflags = $(cflags) -1
!elif $(cpu) == 2
cflags = $(cflags) -2
!elif $(cpu) == 3
cflags = $(cflags) -3
!else
!error invalid cpu option selected
!endif
#
# process the selected exception handling option
#
!if $(exceptions) == 0
cflags = $(cflags) -x-
libs = noeh$(model).lib $(libs)
!elif $(exceptions) == 1
cflags = $(cflags) -x
aflags = $(aflags) /dexceptions
!else
!error invalid exception option selected
!endif
#
# process the selected floating point option
#
!if $(float) == 0
cflags = $(cflags) -f-
!else
libs = remu.lib rmath$(model).lib $(libs)
!if $(float) == 2
cflags = $(cflags) -f
aflags = $(aflags) /dfloat=$(float) /e
!elif $(float) == 3
cflags = $(cflags) -f87
aflags = $(aflags) /dfloat=$(float) /r
!else
!error invalid floating point option selected
!endif
!endif
#
# process the fardata option. if enabled, we include the module
# fardata.obj in the object file list to copy or decompress class
# far_data at run-time.
#
!if $(fardata) > 0
fardataobj = fardata.obj
pflags = $(pflags) -dhasfardata
!if $(fardata) == 2
aflags = $(aflags) /dcompressed
cflags = $(cflags) -dcompressed
pflags = $(pflags) -dcompressed
!elif $(fardata) > 2
!error invalid fardata option selected
!endif
!endif
#
# process the advanced compiler options.
#
!if $(codestring) == 1
cflags = $(cflags) -dc
!endif
!if $(dupstring) == 1
cflags = $(cflags) -d
!endif
!if $(checkstack) == 1
cflags = $(cflags) -n
!endif
!if $(warnings) == 0
cflags = $(cflags) -w-
!elif $(warnings) == 1
cflags = $(cflags) -w
!else
!error invalid warnings option selected
!endif
!if $(optimize) == 0
optflags = $(optflags) -od
!elif $(optimize) == 1
optflags = $(optflags) -o1
!elif $(optimize) == 2
optflags = $(optflags) -o2
!else
!error invalid optimize option selected
!endif

Carefully select these options, or edit and change them, only after you have read this document and your compiler manual carefully! (You could use -g for speed.)

The following files are created by the make process, and are shown for completeness. You need not edit these files.

----------------------------------------------------------------------------------------------------

File: turboc.cfg

-c -ms -i. -ic:\bc45\include -v -dpdremote -1 -x- -f -d -w
-dam186em -drev_e -det -od

------------------------------------------------------------------------------------------------------

File: tlink.cfg

/c /lc:\bc45\lib /v

--------------------------------------------------------------------------------------------------

To make your own Dhrystone executable file, just type the following at the DOS command line:

c:\>make

You should see the following information on your screen as the make process proceeds. If the order of compilation or assembly is different, this does not indicate an error.

The warning messages in the compiler output are present because the Dhrystone benchmark is not fully ANSI compliant. The Dhrystone source does not include prototypes for its functions, which results in these warnings.

------------------------------------------------------------------------------------------------------

make version 3.7 copyright (c) 1987, 1994 borland international
tasm /mx /d__s__ /dstksize=8192 /zd /dpdremote /dfloat=2 /e bcpp45.asm
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: bcpp45.asm
assembling for the small memory model
paradigm locate borland c++ 4.5 startup support
building paradigm pdremote-compatible startup code
error messages: none
warning messages: none
passes: 1
remaining memory: 222k
tasm /mx /d__s__ /dstksize=8192 /zd /dpdremote /dfloat=2 /e fardata.asm
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: fardata.asm
assembling for the small memory model
error messages: none
warning messages: none
passes: 1
remaining memory: 228k
bcc dosemu.c dhry_1.c dhry_2.c console.c bcppsio.c
borland c++ 4.5 copyright (c) 1987, 1994 borland international
dosemu.c:
dhry_1.c:
warning dhry_1.c 81: call to function 'nowait' with no prototype in function main
warning dhry_1.c 88: call to function 'malloc' with no prototype in function main
warning dhry_1.c 89: call to function 'malloc' with no prototype in function main
warning dhry_1.c 96: call to function 'strcpy' with no prototype in function main
warning dhry_1.c 97: call to function 'strcpy' with no prototype in function main
warning dhry_1.c 113: constant is long in function main
warning dhry_1.c 144: call to function 'proc_5' with no prototype in function main
warning dhry_1.c 145: call to function 'proc_4' with no prototype in function main
warning dhry_1.c 149: call to function 'strcpy' with no prototype in function main
warning dhry_1.c 151: call to function 'func_2' with no prototype in function main
warning dhry_1.c 156: call to function 'proc_7' with no prototype in function main
warning dhry_1.c 161: call to function 'proc_8' with no prototype in function main
warning dhry_1.c 163: call to function 'proc_1' with no prototype in function main
warning dhry_1.c 167: call to function 'func_1' with no prototype in function main
warning dhry_1.c 170: call to function 'proc_6' with no prototype in function main
warning dhry_1.c 171: call to function 'strcpy' with no prototype in function main
warning dhry_1.c 172: conversion may lose significant digits in function main
warning dhry_1.c 173: conversion may lose significant digits in function main
warning dhry_1.c 181: call to function 'proc_2' with no prototype in function main
warning dhry_1.c 266: call to function 'exit' with no prototype in function main
warning dhry_1.c 286: call to function 'proc_3' with no prototype in function proc_1
warning dhry_1.c 295: call to function 'proc_6' with no prototype in function proc_1
warning dhry_1.c 298: call to function 'proc_7' with no prototype in function proc_1
warning dhry_1.c 338: call to function 'proc_7' with no prototype in function proc_3
dhry_2.c:
warning dhry_2.c 44: call to function 'func_3' with no prototype in function proc_6
warning dhry_2.c 155: call to function 'func_1' with no prototype in function func_2
warning dhry_2.c 168: call to function 'strcmp' with no prototype in function func_2
console.c:
bcppsio.c:
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: bcppsio.asm
error messages: none
warning messages: none
passes: 1
remaining memory: 213k
tasm /mx /d__s__ /dstksize=8192 /zd /dpdremote /dfloat=2 /e bcpprtl.asm
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: bcpprtl.asm
assembling for the small memory model
building paradigm pdremote-compatible library support
error messages: none
warning messages: none
passes: 1
remaining memory: 226k
tasm /mx /d__s__ /dstksize=8192 /zd /dpdremote /dfloat=2 /e bcppflt.asm
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: bcppflt.asm
assembling for the small memory model
building borland c++ emulator floating point support
error messages: none
warning messages: none
passes: 1
remaining memory: 219k
bcc bcppdmm.c efast.c eleds.c timer.c
borland c++ 4.5 copyright (c) 1987, 1994 borland international
bcppdmm.c:
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: bcppdmm.asm
error messages: none
warning messages: none
passes: 1
remaining memory: 222k
efast.c:
eleds.c:
timer.c:
turbo assembler version 4.0 copyright (c) 1988, 1993 borland international
assembling file: timer.asm
error messages: none
warning messages: none
passes: 1
remaining memory: 217k
tlink @make0000.@@@
turbo link version 7.00 copyright (c) 1987, 1994 borland international
locate -cdhry21.rt -dhasfardata dhry21
paradigm locate - version 5.00a
copyright (c) 1987-1994 paradigm systems. all rights reserved.
warning w1032: segment '_text/code' is output to a read/write region
warning w1032: segment 'e87_prog/code' is output to a read/write region
warning w1032: segment 'emu_prog/code' is output to a read/write region
warning w1032: segment '_init_/initdata' is output to a read/write region
warning w1032: segment '_initend_/initdata' is output to a read/write region
warning w1032: segment '_exit_/exitdata' is output to a read/write region
warning w1032: segment '_exitend_/exitdata' is output to a read/write region
warning w1032: segment '_rd/romdata' is output to a read/write region
warning w1032: segment '_data/romdata' is output to a read/write region
warning w1032: segment '_cvtseg/romdata' is output to a read/write region
warning w1032: segment '_scnseg/romdata' is output to a read/write region
warning w1032: segment '_erd/endromdata' is output to a read/write region
warning w1032: segment '_bfd/romfardata' is output to a read/write region
warning w1032: segment '_brfd/romfardata' is output to a read/write region
warning w1032: segment '_erfd/endromfardata' is output to a read/write region

The warnings from LOCATE indicate that you are using the SRAM area for loading your code. Since it is intended that the program run in SRAM, these messages can be ignored.
Running Paradigm's LOCATE

Operation of this tool is automatic. The supplied make files automatically invoke LOCATE to change the .exe file into a located image loadable by the debugger. In the process, you may see LOCATE warn you that some or all of your sections are in read/write areas of your target memory. This is completely normal, because your program is being loaded into the SRAM on the SD186EM board.

If LOCATE warns you that sections or classes overlap, take this seriously! This message is telling you that you have not allowed enough space between code and data. To correct this problem, edit your .rt file. The entries for class code and class data need to be changed so that the overlap messages go away. Take care that you do not push the program to load beyond the SRAM's address range.

Running Paradigm's Debugger

Invoke the debugger from the DOS command line. Use the File|Open command to request the loading of your program into the SD186EM target. Most programs use printf() and scanf(). To support user input and output, you must enable the debugger's stdio window, which allows the target to communicate with you on your monitor and allows you to send information to the target with your PC keyboard.

To operate PDRT186 follow these steps. At the DOS (or Windows DOS box) command line, enter the command:

c:\>pdrt186

You should see the following screen after clicking the File menu item.

(screen capture: PDRT186 File menu)
After you click the Open menu item, you will be asked for the file name. In the input line provided, enter the name of the program you wish to run. To run Dhrystone, copy the information shown above (your current directory must be the one that holds the AMD-supplied files), then click OK. Remember that the AMD-supplied files include a precompiled Dhrystone program to test.

Once the code is loaded into the SD186EM target (Dhrystone is shown here), you should see a screen like this one, showing your source code.
To enable printing to your screen and keyboard communications, select the View menu and click on Paradigm. For programs that need keyboard input, do the following. Note that enabling dynamic mode will slow down execution of the program under test. Next, click on the Debug Controls menu selection.
Paradigm has added a number of features to its version of Turbo Debugger. Of interest here is the Dynamic option in the menu shown below. As can be seen, the debugger starts with dynamic mode disabled. Click on Dynamic to make the changes needed to enable communications. Enabling dynamic mode allows the SD186EM to communicate bidirectionally with the debugger. Make your dialog box match the one shown below, then click OK.
Once you have entered all the information correctly, you should see that dynamic mode is enabled. To close the Debug Control box, click on the square in the upper left-hand corner of the window frame.

Next, you must open a remote console window. This is the area in which the SD186EM communicates with you and through which you send keyboard input to the board. Simply click on the Remote Console menu item.
Your screen should now show the remote console window. Click on the upward-pointing arrow on the frame of the remote console window to enlarge the window to full screen. Run the benchmark by pressing the F9 key. You should see the following results:

After running Dhrystone, try editing paradigm.mkf, changing the -O2 to -G. Also add -DREG=register to the optflags line. Recompile the program and run it again. Did it get faster?
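To compare two builds, it helps to reduce each run to a single figure. The sketch below is not part of the AMD-supplied files; the function names are hypothetical. It converts a run count and a measured loop time into Dhrystones per second, scales that by the VAX 11/780 reference of 1757 quoted in Appendix B to obtain Dhrystone MIPS, and expresses one build's score relative to another's.

```c
#include <assert.h>

/* VAX 11/780 reference score (Dhrystones per second), as quoted in Appendix B. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Dhrystones per second from the number of runs and the measured loop time. */
double dhrystones_per_second(unsigned long runs, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);   /* a zero time means the timer resolution was too coarse */
    return (double)runs / elapsed_seconds;
}

/* Dhrystone MIPS: the score relative to the VAX 11/780 reference. */
double dhrystone_mips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}

/* Ratio of a new build's score to a baseline build's score (above 1.0 means faster). */
double speedup(double baseline_dps, double new_dps)
{
    assert(baseline_dps > 0.0);
    return new_dps / baseline_dps;
}
```

With purely illustrative numbers, 17570 runs in 2.0 seconds gives 8785 Dhrystones per second, or 5.0 Dhrystone MIPS. Record the compiler switches alongside each figure, since the ratio is only meaningful when you know what changed between the two builds.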
Optimization

As you benchmark your programs, keep in mind that optimization can greatly improve a program's performance, but it can also inflate code size, cause erratic behavior, and even make the program run slower! Take careful note of the compiler selections you use (-O1, -G, -O2, etc.), as they determine the optimizations the compiler will perform on your code. Below are simple explanations of each major optimization. Be sure you know the effect of each before you call any timing optimal.

Conventional wisdom says there are three components to generating good code on the 80x86 processors: register allocation, register allocation, and register allocation.

Global register allocation

Because memory references are so expensive on these processors, it is extremely important to minimize those references through the intelligent use of registers. Global register allocation both increases the speed and decreases the size of your application. You should always use global register allocation when compiling your application with optimizations on.

Dead-code elimination

Although you may never intentionally write code that does unnecessary things, the optimizer may find opportunities to eliminate stores into variables that are not needed.

Common subexpression elimination

Common subexpression elimination is the process of finding duplicate expressions within the target scope and storing the calculated value of those expressions once, so as to avoid recalculating them. Although in theory this optimization could reduce code size, in practice it is a speed optimization and will only rarely result in size reductions. You should also use global common subexpression analysis if you like to reuse expressions rather than create explicit stack locations for them.

Loop invariant code motion

Moving invariant code out of loops is a speed optimization. The optimizer uses the information about all the expressions in the function, gathered during common subexpression elimination, to find expressions whose values do not change inside a loop. To prevent the calculation from being done many times inside the loop, the optimizer moves the code outside the loop so that it is calculated only once. The optimizer then reuses the calculated value inside the loop. You should use loop invariant code motion whenever you are compiling for speed and you have used global common subexpressions, since moving code out of loops can result in enormous speed gains.
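Common subexpression elimination and loop invariant code motion can be pictured by applying them by hand. The sketch below is illustrative only: the function names are hypothetical and the second function is a hand-written approximation of the effect, not actual compiler output. The first function recomputes scale * offset twice on every pass; the second computes it once before the loop and reuses the result.

```c
#include <assert.h>
#include <stddef.h>

/* As written: scale * offset appears twice and is recomputed on every pass. */
long sum_naive(const int *v, size_t n, int scale, int offset)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        total += v[i] + scale * offset;
        total -= (scale * offset) / 2;
    }
    return total;
}

/* Hand-optimized equivalent: the common subexpression is computed once,
 * and both invariant values are hoisted out of the loop. */
long sum_hoisted(const int *v, size_t n, int scale, int offset)
{
    const int k = scale * offset;   /* common subexpression, evaluated once */
    const int half = k / 2;         /* also invariant inside the loop */
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        total += v[i] + k;
        total -= half;
    }
    return total;
}
```

Both functions return the same value for the same inputs; only the amount of work inside the loop differs. Comparing the assembly listings of the two forms at different optimization settings is one way to check what the compiler actually did.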
Copy propagation

Propagating copies is primarily a speed optimization, but since it never increases the size of your code, it is safe to use if you have enabled -Og. Like loop invariant code motion, copy propagation relies on the analysis performed during common subexpression elimination. Copy propagation means that the optimizer remembers the values assigned to expressions and uses those values instead of loading the value of the assigned expressions. Copies of constants, expressions, and variables may be propagated.

Pointer aliasing

Pointer aliasing is not an optimization in itself, but it does affect the way the optimizer performs common subexpression elimination and copy propagation. When pointer aliasing is turned on, it allows the optimizer to maintain copy propagation information across function calls and to maintain common subexpression information across some stores. Otherwise, the optimizer must discard information about copies and subexpressions in these situations.

Induction-variable analysis and strength reduction

Creating induction variables and performing strength reduction are speed optimizations performed on loops. The optimizer uses a mathematical technique called induction to create new variables out of expressions used inside a loop. These variables are called induction variables. The optimizer ensures that the operations performed on these new variables are computationally less expensive (reduced in strength) than those used by the original variables. Opportunities for these optimizations are common if you use array indexing inside loops, since a multiplication is required to calculate the position in the array indicated by the index.

Loop compaction

Loop compaction takes advantage of the string move instructions on the 80x86 processors by replacing the code for a loop with such an instruction. Depending on the complexity of the operands, the compacted loop code may also be smaller than the corresponding non-compacted loop. You may wish to experiment with this optimization if you are compiling for size and have loops of this nature.

Structure copy inlining

The most visible optimization performed when compiling for speed as opposed to size is the inlining of structure copies. When you enable -Ot, the compiler determines whether it can safely generate code to perform a rep movsw instruction instead of calling a helper function to do the copy. For structures and unions more than eight bytes long, this optimization produces faster structure copies than the corresponding helper function call.

Code compaction

The most visible optimization performed when compiling for size is code compaction. In code compaction, the optimizer scans the generated code for duplicate sequences. When such sequences warrant it, the optimizer replaces one sequence of code with a jump to the other, thereby eliminating the first piece of code. Switch statements contain the most opportunities for code compaction.
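The induction-variable and strength-reduction transformation described above can be sketched by hand. The example is an illustration under stated assumptions: record_t and both function names are hypothetical, and the second function only approximates what an optimizer might produce. In the indexed form, each access implies multiplying the index by the record size; in the reduced form, a pointer serves as the induction variable and advances by addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    int id;
    int value;
    int flags;
} record_t;

/* Indexed form: each recs[i] implies base + i * sizeof(record_t). */
long total_indexed(const record_t *recs, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += recs[i].value;
    return total;
}

/* Strength-reduced form: the pointer p is the induction variable, so the
 * per-pass multiplication is replaced by a pointer increment. */
long total_reduced(const record_t *recs, size_t n)
{
    long total = 0;
    const record_t *p;
    const record_t *end = recs + n;
    for (p = recs; p != end; p++)
        total += p->value;
    return total;
}
```

The two functions compute the same sum. Whether the hand-written pointer form is any faster than the indexed form depends on the optimization switches in use, which is exactly what a benchmark run can reveal.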
Redundant load suppression

Load suppression is both a speed and a size optimization. When -Z is enabled, the optimizer keeps track of the values it loads into registers and suppresses loads of values that it already has in a register.

Intrinsic function inlining

There are times when you would like to use one of the common string or memory functions, such as strcpy or memcmp, but you do not want to incur the overhead of a function call. With -Oi, the compiler generates the code for these functions within your function's scope, eliminating the need for a function call. The resulting code will execute faster than a call to the same function, but it will also be larger.

Register parameter passing

The Borland C++ compiler has two calling conventions in which parameters are passed in registers instead of on the stack: Object Data and fastcall.
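The trade-off behind intrinsic function inlining can be pictured with a hand-written equivalent. This is a sketch, not the code the compiler emits: both function names are hypothetical, and the second function simply places the copy loop where the call would otherwise be.

```c
#include <assert.h>
#include <string.h>

/* Library call: each use pays the call and return overhead. */
void copy_with_call(char *dst, const char *src)
{
    strcpy(dst, src);
}

/* Roughly what inlining amounts to: the copy loop appears in place of the
 * call, trading extra code bytes at every use for the saved call overhead. */
void copy_inlined(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;   /* the copy, including the terminating NUL, happens in the condition */
}
```

Both routines leave the same bytes in the destination buffer. Since strcpy sits inside the Dhrystone measurement loop, this is one of the switches whose effect is worth measuring separately.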
Available software

File name listing

If you have received the AMD-supplied diskette containing the file amdfiles.zip, you should see the following files after decompressing it:

am186em.h dhry21.loc dhry_2.c paradigm.mkf
asmrules.h dhry21.map efast.c support.h
dhry.h eleds.c timer.c dhry21.axe
dhry21.rom eleds.h timer.h dhry21.cfg
dhry21.rt emon.trm dhry21.hex dhry_1.c
makefile

In directory bench:

am186em.h bench.loc paradigm.mkf asmrules.h
bench.map efast.c eleds.c timer.c
support.h eleds.h timer.h bench.cfg
bench.rt emon.trm bench.hex bench.axe
makefile

Use an unzip utility to decompress the files. Be sure to use the option that creates and decompresses subdirectories.

BBS access

If the disk was not included, or if you would like a newer revision of this material, contact the AMD 29K BBS system by dialing 1-800-2929-AMD. After the automated answering system connects, have your modem signal the number 1 to select the BBS system.

Other information

Data sheet

To obtain a data sheet on the Am186EM, call 1-408-749-5703 and ask for PID number 19168.

Other application notes

"Building Embedded C-Language Applications for the Am186EM Microcontroller" by Becky Cavanaugh and Melanie Typaldos. To request this note, call 1-800-2929-AMD or your local sales office.
Ordering board-level products

You may order an SD186EM development board through your local AMD office. If you need the local number, call 1-408-732-2400.

Books on the subject

Mastering Turbo Debugger by Tom Swan, Hayden Books, ISBN 0-672-48454-4
Appendix A

Understanding the Dhrystone benchmark

The text below was taken from the source code of the Dhrystone benchmark. Of interest is the information about the testing done on the benchmark. This data helps the programmer determine how similar the benchmark is to the actual product code. Programs with a large degree of similar operations and memory utilization should see matching performance gains.

/***********************************************************************
 *
 *  "DHRYSTONE" Benchmark Program
 *  -----------------------------
 *
 *  Version:  C
 *
 *  Author:   Reinhold P. Weicker
 *            Siemens AG, E STE 35
 *            Postfach 3240
 *            8520 Erlangen
 *            Germany (West)
 *            Phone:  [xxx-49]-9131-7-20330
 *                    (8-17 Central European Time)
 *            Usenet: ..!mcvax!unido!estevax!weicker
 *
 *  Original version (in Ada) published in
 *  "Communications of the ACM" vol. 27., no. 10 (Oct. 1984),
 *  pp. 1013 - 1030, together with the statistics
 *  on which the distribution of statements etc. is based.
 *
 *  In this C version, the following C library functions are used:
 *  - strcpy, strcmp (inside the measurement loop)
 *  - printf, scanf (outside the measurement loop)
 *  In addition, Unix system calls "times ()" or "time ()"
 *  are used for execution time measurement. For measurements
 *  on other systems, these calls have to be changed.
 *
 *  Collection of results:
 *            Reinhold Weicker (address see above) and
 *
 *            Rick Richardson
 *            PC Research. Inc.
 *            94 Apple Orchard Drive
 *            Tinton Falls, NJ 07724
 *            Phone:  (201) 834-1378 (9-17 EST)
 *            Usenet: ...!seismo!uunet!pcrat!rick
 *
 *  Please send results to Reinhold Weicker and/or Rick Richardson.
 *  - Complete information should be given on hardware and software used.
 *  - Hardware information includes: machine type, CPU, type and size
 *    of caches; for microprocessors: clock frequency, memory speed
 *    (number of wait states).
 *  - Software information includes: compiler (and runtime library)
 *    manufacturer and version, compilation switches, OS version.
 *  - The operating system version may give an indication about the
 *    compiler; Dhrystone itself performs no OS calls in the
 *    measurement loop.
 *
 *  History:  This version has been made for two reasons:
 *
 *    1) There is an obvious need for a common C version of
 *       Dhrystone, since C is at present the most popular system
 *       programming language for the class of processors
 *       (microcomputers, minicomputers) where Dhrystone is used most.
 *       There should be, as far as possible, only one C version of
 *       Dhrystone such that results can be compared without
 *       restrictions. In the past, the C versions distributed
 *       by Rick Richardson (Version 1.1) and by Reinhold Weicker
 *       had small (though not significant) differences.
 *
 *    2) As far as it is possible without changes to the Dhrystone
 *       statistics, optimizing compilers should be prevented from
 *       removing significant statements.
 *
 *    This C version has been developed in cooperation with
 *    Rick Richardson (Tinton Falls, NJ), it incorporates many
 *    ideas from the "Version 1.1" distributed previously by
 *    him over the UNIX network Usenet.
 *    I also thank Chaim Benedelac (National Semiconductor),
 *    David Ditzel (SUN), Earl Killian and John Mashey (MIPS),
 *    Alan Smith and Rafael Saavedra-Barrera (UC at Berkeley)
 *    for their help with comments on earlier versions of the
 *    benchmark.
 *
 *  Changes:  In the initialization part, this version follows mostly
 *    Rick Richardson's version distributed via Usenet, not the
 *    version distributed earlier via floppy disk by Reinhold Weicker.
 *    As a concession to older compilers, names have been made
 *    unique within the first 8 characters.
 *    Inside the measurement loop, this version follows the
 *    version previously distributed by Reinhold Weicker.
 *
 *    At several places in the benchmark, code has been added,
 *    but within the measurement loop only in branches that
 *    are not executed. The intention is that optimizing compilers
 *    should be prevented from moving code out of the measurement
 *    loop, or from removing code altogether. Since the statements
 *    that are executed within the measurement loop have not been
 *    changed, the numbers defining the "Dhrystone distribution"
 *    (distribution of statements, operand types and locality)
 *    still hold. Except for sophisticated optimizing compilers,
 *    execution times for this version should be the same as
 *    for previous versions.
 *
 *    Since it has proven difficult to subtract the time for the
 *    measurement loop overhead in a correct way, the loop check
 *    has been made a part of the benchmark. This does have
 *    an impact - though a very minor one - on the distribution
 *    statistics which have been updated for this version.
 *
 *    All changes within the measurement loop are described
 *    and discussed in the companion paper "Rationale for
 *    Dhrystone version 2".
 *
 *    Because of the self-imposed limitation that the order and
 *    distribution of the executed statements should not be
 *    changed, there are still cases where optimizing compilers
 *    may not generate code for some statements. To a certain
 *    degree, this is unavoidable for small synthetic benchmarks.
 *    Users of the benchmark are advised to check code listings
 *    whether code is generated for all statements of Dhrystone.
 *
 **********************************************************************
 *
 *  Defines:  The following "Defines" are possible:
 *
 *    -DREG=register          (default: Not defined)
 *        As an approximation to what an average C programmer
 *        might do, the "register" storage class is applied
 *        (if enabled by -DREG=register)
 *        - for local variables, if they are used (dynamically)
 *          five or more times
 *        - for parameters if they are used (dynamically)
 *          six or more times
 *        Note that an optimal "register" strategy is
 *        compiler-dependent, and that "register" declarations
 *        do not necessarily lead to faster execution.
 *
 *    -DNOSTRUCTASSIGN        (default: Not defined)
 *        Define if the C compiler does not support
 *        assignment of structures.
 *
 *    -DNOENUMS               (default: Not defined)
 *        Define if the C compiler does not support
 *        enumeration types.
 *
 *    -DTIMES                 (default)
 *    -DTIME
 *        The "times" function of UNIX (returning process times)
 *        or the "time" function (returning wallclock time)
 *        is used for measurement.
 *        For single user machines, "time ()" is adequate. For
 *        multi-user machines where you cannot get single-user
 *        access, use the "times ()" function. If you have
 *        neither, use a stopwatch in the dead of night.
 *        "printf"s are provided marking the points "Start Timer"
 *        and "Stop Timer". DO NOT use the UNIX "time(1)"
 *        command, as this will measure the total time to
 *        run this program, which will (erroneously) include
 *        the time to allocate storage (malloc) and to perform
 *        the initialization.
 *
 *    -DHZ=nnn                (default: 60)
 *        The function "times" returns process times in
 *        1/HZ seconds, with HZ = 60 for most systems.
 *        CHECK YOUR SYSTEM DESCRIPTION BEFORE YOU JUST APPLY
 *        THE DEFAULT VALUE.
 *
 **********************************************************************
 *
 *  Compilation model and measurement (IMPORTANT):
 *
 *  This C version of Dhrystone consists of three files:
 *  - dhry_global.h (this file, containing global definitions and comments)
 *  - dhry_pack_1.c
 *  - dhry_pack_2.c
 *
 *  The following "ground rules" apply for measurements:
 *  - Separate compilation
 *  - No procedure merging
 *  - Otherwise, compiler optimizations are allowed but should be indicated
 *  - Default results are those without register declarations
 *  See the companion paper "Rationale for Dhrystone Version 2" for a more
 *  detailed discussion of these ground rules.
 *
 *  For 16-bit processors (e.g. 80186, 80286), times for all compilation
 *  models ("small", "medium", "large" etc.) should be given if possible,
 *  together with a definition of these models for the compiler system used.
 *
 **********************************************************************
 *
 *  Dhrystone (C version) statistics:
 *
 *  The following program contains statements of a high level programming
 *  language (here: C) in a distribution considered representative:
 *
 *    assignments                  52 (51.0 %)
 *    control statements           33 (32.4 %)
 *    procedure, function calls    17 (16.7 %)
 *
 *  103 statements are dynamically executed. The program is balanced with
 *  respect to the three aspects:
 *
 *    - statement type
 *    - operand type
 *    - operand locality
 *         operand global, local, parameter, or constant.
 *
 *  The combination of these three aspects is balanced only approximately.
 *
 *  1. Statement type:
 *  -----------------             number   total
 *
 *     V1 = V2                     9
 *       (incl. V1 = F(..)
 *     V = Constant               12
 *     Assignment,                 7
 *       with array element
 *     Assignment,                 6
 *       with record component
 *                                --
 *                                34       34
 *
 *     X = Y +|-|"&&"|"|" Z        5
 *     X = Y +|-|"==" Constant     6
 *     X = X +|- 1                 3
 *     X = Y *|/ Z                 2
 *     X = Expression,             1
 *           two operators
 *     X = Expression,             1
 *           three operators
 *                                --
 *                                18       18
 *
 *     if ....                    14
 *       with "else"      7
 *       without "else"   7
 *           executed        3
 *           not executed    4
 *     for ...                     7  |  counted every time
 *     while ...                   4  |  the loop condition
 *     do ... while                1  |  is evaluated
 *     switch ...                  1
 *     break                       1
 *     declaration with            1
 *       initialization
 *                                --
 *                                34       34
 *
 *     P (...)  procedure call    11
 *       user procedure      10
 *       library procedure    1
 *     X = F (...)
 *             function call       6
 *       user function        5
 *       library function     1
 *                                --
 *                                17       17
 *                                        ---
 *                                        103
 *
 *  The average number of parameters in procedure or function calls
 *  is 1.82 (not counting the function values as implicit parameters).
 *
 *  2. Operators
 *  ------------
 *                          number    approximate
 *                                    percentage
 *
 *    Arithmetic             32          50.8
 *       +                   21          33.3
 *       -                    7          11.1
 *       *                    3           4.8
 *       / (int div)          1           1.6
 *
 *    Comparison             27          42.8
 *       ==                   9          14.3
 *       /=                   4           6.3
 *       >                    1           1.6
 *       <                    3           4.8
 *       >=                   1           1.6
 *       <=                   9          14.3
 *
 *    Logic                   4           6.3
 *       && (AND-THEN)        1           1.6
 *       |  (OR)              1           1.6
 *       !  (NOT)             2           3.2
 *                           --         -----
 *                           63         100.1
 *
 *  3. Operand type (counted once per operand reference):
 *  ---------------
 *                          number    approximate
 *                                    percentage
 *
 *     Integer               175        72.3 %
 *     Character              45        18.6 %
 *     Pointer                12         5.0 %
 *     String30                6         2.5 %
 *     Array                   2         0.8 %
 *     Record                  2         0.8 %
 *                           ---       -------
 *                           242       100.0 %
 *
 *  When there is an access path leading to the final operand (e.g. a record
 *  component), only the final data type on the access path is counted.
 *
 *  4. Operand locality:
 *  -------------------
 *                                number    approximate
 *                                          percentage
 *
 *     local variable              114        47.1 %
 *     global variable              22         9.1 %
 *     parameter                    45        18.6 %
 *        value                        23         9.5 %
 *        reference                    22         9.1 %
 *     function result               6         2.5 %
 *     constant                     55        22.7 %
 *                                 ---       -------
 *                                 242       100.0 %
 *
 *  The program does not compute anything meaningful, but it is syntactically
 *  and semantically correct. All variables have a value assigned to them
 *  before they are used as a source operand.
 *
 *  There has been no explicit effort to account for the effects of a
 *  cache, or to balance the use of long or short displacements for code or
 *  data.
 *
 **********************************************************************
 */
Appendix B

Summary of processors

Below is a summary of computer systems and their processors running both Dhrystone 1.1 and Dhrystone 2.1. The header text defines the data and explains how an updated version of this report can be obtained.

Dhrystone 2.1 and 1.1 MIPS results (language: C).

The Dhrystone programs are by Reinhold Weicker. His email address is: weicker.muc@sni.de (in general), or weicker.muc@sni-usa.com (from North America).

Dhrystone is a short synthetic benchmark program intended to be representative for system (integer) programming. Based on published statistics on use of programming language features: see original publication in CACM 27,10 (Oct 1984). Originally published in Ada, now mostly used in C. Version 2 (in C) published in SIGPLAN Notices 23,8 (Aug 1988), together with measurement rules. Version 1 is no longer recommended since state-of-the-art compilers can eliminate too much 'dead code' from the benchmark (however, quoted MIPS numbers are often based on version 1).

Problems: Due to its small size (100 HLL statements, 1-1.5 KB code), the memory system outside the cache is not tested; compilers can too easily optimize for Dhrystone; string operations are somewhat over-represented.

Recommendation: Use it for controlled experiments only; don't blindly trust single Dhrystone MIPS numbers quoted somewhere (don't do this for any benchmark).

The Dhrystone C programs (dhry.shar), and latest table of results (dhry.tbl) are available via anonymous ftp from 'ftp.nosc.mil' in directory 'pub/aburto'. The ftp.nosc.mil IP address is: 128.49.192.51. Please send new results (systems, machines, compilers, compiler options) to 'aburto@marlin.nosc.mil'. I will keep results updated, post to 'comp.benchmarks' periodically, and send results to Reinhold Weicker. Please read the 'dhry.doc' file first for information regarding running the programs and submitting results. If you need the individual files please send email to aburto@marlin.nosc.mil.
These Dhrystone source and results are also available via ftp from netlib@ornl.gov.

The VAX 11/780 MIPS reference is 1757 for both Dhrystone V1.1 and V2.1. I used the same VAX 11/780 MIPS reference (1757) for both V1.1 and V2.1 since the results for unoptimized code in general were very similar, if not identical, in this case. Reinhold Weicker recommended that both V1.1 and V2.1 MIPS results be collected in this table as this would give a reference on how much more V1.1 can be optimized relative to V2.1. When quoting Dhrystone VAX MIPS ratings it is preferable to use the V2.1 numbers.

Reinhold Weicker has asked me to add the following statement: "Dhrystone author says: relying on MIPS V1.1 numbers can be hazardous to your professional health" ...

Some compiler results which appear 'too high' compared to other compiler results on the same machine are indicated by an '*' --- just my way of keeping track of exceptional results from certain compilers.

Results as of 05 Nov 1994:

### system os/compiler cpu cpu(mhz) mips-v1.1 mips-v2.1 ref
### ---------------------- ------------ ----------- ----- ------ ------ ---
001 dec 10000/610 axp openvms v1.0 dec 21064 200.0 194.9 214.8 6
002 dec 7000/600 axp osf/1 v1.3a dec 21064 200.0 195.5 203.3 33
003 dec 7000/610 axp openvms v1.0 dec 21064 182.0 177.3 195.6 6
004 dec 3000/800 axp osf/1 v1.3a dec 21064 200.0 192.9 189.7 33
005 dec 4000/710 axp osf/1 v1.3a dec 21064 190.0 188.7 189.7 33
006 dec 4000/610 axp openvms v1.0 dec 21064 160.0 159.0 173.0 6
007 dec 3000/600 axp osf/1 v1.3a dec 21064 175.0 168.5 167.4 33
008 dec 3000/500 axp openvms v1.0 dec 21064 150.0 146.7 160.1 6
009 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 ------ 146.5 52
010 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 ------ 143.0 46
011 dec 3000/400 axp openvms v1.0 dec 21064 133.0 129.9 142.1 6
012 dec 3000/500 axp osf/1 t1.3-3 dec 21064 150.0 211.0 137.1* 25
013 dec 3000/500 axp osf/1 t1.3-3 dec 21064 150.0 157.1 133.7 25
014 dec 2000/300 axp osf/1 v1.3a dec 21064 150.0 140.9 129.4 33
015 dec 4000/710 osf dec 21064 190.0 186. 126. 51
016 gateway p5-90 dos 6.2 pentium 90.0 172.5 124.4 53
017 ibm rs/6000 590 aix 3.2.5 power2 ----- 134.2 124.3 47
018 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 ------ 116.6 32
019 powermac 8100/80 system 7.1.2 powerpc 601 80.0 157.7 113.3 56
020 hp 9000/735 hp-ux 9.0 pa-risc7100 99.0 122.7 110.5 5
021 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 122.7 110.1 8
022 sgi indigo2 extreme irix 4.0.5h r4000 100.0 ------ 108.9 60
023 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 118.1 106.6 8
024 intel pentium/66 unix svr 4.2 pentium 66.0 111.6 101.2 30
025 hp 9000/755 hp-ux 9.01 pa-risc7100 99.0 112.0 99.2 55
026 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 106.2 97.1 8
027 sgi indigo2 extreme irix 4.0.5h r4000 100.0 ------ 95.3 60
028 hp 9000/755 hp-ux 9.01 pa-risc7100 99.0 108.4 95.2 55
029 hp 9000/735 hp-ux 9.0 pa-risc7100 99.0 104.2 95.0 8
030 hp 9000/750 hp-ux 9.05 pa7000 66.0 110.7 94.2 61
031 powermac 7100/66 system 7.1.2 powerpc 601 66.0 129.6 93.1 56
032 alr pentium/60 ms dos 5.0 pentium 60.0 112.9 91.4 29
033 intel pentium/60 unix svr 4.2 pentium 60.0 100.0 91.4 30
034 hp 9000/712 hp-ux 9.05 pa7100lc 60.0 118.6 88.7 61
035 ibm rs/6000 250 aix 3.2.5 powerpc 601 66.0 96.1 83.6 54
036 ibm rs/6000 250 aix 3.2.5 powerpc 601 66.0 96.1 82.8 54
037 sun sparcserver 20/612 solaris 2.3 supersparc 60.0 93.6 82.2 58
038 ibm rs/6000 model 340 aix 3.2 power risc 33.0 ------ 76.3 60
039 gateway pentium p5-90 linux 1.1.35 pentium 90.0 80.6 75.9 59
040 datel pentium p5-90 ms dos 6.22 pentium 90.0 73.0 70.0 62
041 zeos pentium p5-90 ms dos 6.22 pentium 90.0 72.5 70.0 62
042 hp 9000/750 hp-ux 9.0 pa-risc 66.0 76.1 69.6 5
043 sgi indigo elan irix 4.0.5h r4000 100.0 78.8 69.2 43
044 sgi crimson elan irix 4.0.5 r4000 50.0 77.5 69.2 43
045 hp 9000/730 hp-ux 9.0 pa-risc 66.0 76.1 69.2 5
046 sparccenter 1000 solaris 2.3 supersparc 50.0 77.8 68.5 50
047 sgi indigo2 extreme irix 4.0.5h r4000 100.0 76.8 67.4 43
048 sparcstation 10/51 sunos 4.1.3 supersparc 50.0 74.9 67.0 49
049 sgi indy pc irix 5.1.1 r4000 100.0 65.1 67.0 43
050 hp 9000/730 hp-ux 8.07 pa-risc 66.0 ------ 65.2 3
051 ibm rs/6000 model 550 aix 3.2.2 power risc 42.7 80.8 62.9 2
052 sparcstation 10/41 sunos 4.1.3 supersparc 40.0 63.7 58.3 26
053 hp 9000/715 hp-ux 9.0 pa-risc7100 50.0 62.3 55.2 5
054 sparcstation voyager solaris 2.3 usparc ii 60.0 61.5 54.8 50
055 ibm rs/6000 model 550 aix 3.2.2 power risc 42.7 61.7 54.7 2
056 hp 9000/735 hp-ux 9.01 pa-risc7100 99.0 53.3 53.4 8
057 sparcstation 10/41 solaris 2.3 supersparc 40.0 59.2 52.7 49
058 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 58.0 52.7 26
059 hp 9000/720 hp-ux 9.0 pa-risc 50.0 57.4 52.7 5
060 sparcstation 10/51 sunos 4.1.3 supersparc 50.0 55.2 52.6 49
061 hp 9000/710 hp-ux 9.0 pa-risc 50.0 57.4 52.5 5
062 intel 486dx2/66 unix svr 4.2 80496dx2 66.0 54.0 51.6* 30
063 sparcstation 2 (80) sunos 4.1.2 weitek 80.0 50.1 51.3 41
064 sparcstation 10/41 sunos 4.1.3 supersparc 40.0 64.7 51.2 26
065 hp 9000/715t hp-ux 9.05 pa7100 33.0 60.3 50.5 61
066 hp 9000/705 hp-ux 9.05 pa7000 35.0 58.3 49.5 61
067 alr evolution v/1 os/2 2.1 pentium 60.0 52.3 49.0 38
068 hp 9000/715 hp-ux 9.0 pa-risc7100 50.0 ------ 47.9 7
069 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 58.7 46.5 26
070 sparcstation 10/20 sunos 4.1.2 supersparc 33. 37.5 43.5 13
071 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 56.2 42.8 17
072 ibm rs/6000 53h aix 3.2.5 risc 33. 44.9 42.7 54
073 sparcstation 10/20 sunos 4.1.2 supersparc 33. 43.2 42.5 13
074 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 54.6 40.9 17
075 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 43.4 37.9 17
076 sparcstation lx solaris 2.2 sparc 0.0 45.7 37.7 26
077 isa/vlb at clone ms dos 6.0 80486dx2 66.7 41.1 37.7 34
078 sparcstation 10/30 sunos 4.1.3 supersparc 36.0 39.3 37.4 21
079 hp 9000/705 hp-ux 9.0 pa-risc7100 35.0 40.0 36.7 5
080 hp 9000/715 hp-ux 9.0 pa-risc7100 33.0 41.4 36.3 5
081 486dx2/72 linux0.99.15 80486dx2 72.0 38.3 35.4 48
082 sparcstation lx solaris 2.2 sparc 0.0 44.5 34.2 26
083 intel 486dx/33 ------------ 80486dx 33.3 35.7 33.5* 29
084 ibm rs/6000 model 530 aix 3.1.5 power risc 25.0 37.3 32.3 10
085 ibm rs/6000 model 530 ------------ power risc 25.0 37.1 32.2 5
086 cornell eisa/isa/vlb linux 1.1.23 80486dx2 66.7 35.3 32.2 57
087 lijn 486dx2/66 linux0.99pl6 80486dx2 66.7 ------ 31.3 22
088 gateway 486dx2/66 linux 0.99 80486dx2 66.7 ------ 30.9 3
089 intel 486dx2/66 os/2 2.1 80486dx2 66.7 32.9 30.2 35
090 decstation 5000/133 ultrix 4.3 r3000 33. 31.6 29.0 12
091 ibm rs/6000 model 550 aix 3.2.2 power risc 42.7 28.5 28.8 2
092 sparcstation 2 (40) sunos 4.1.3 sparc 40.0 32.1 28.7 42
093 at clone ms dos 5.0 80486dx 33.3 33.6 28.3* 15
094 ---------------------- 4.3 bsd r3000 33. ------ 28.3 37
095 sparcstation 2 (50) sunos 4.1.3 sparc 50.0 38.5 27.5 42
096 sgi indigo xz irix 4.0.5h r3000 33. 31.0 27.3 43
097 sparcstation 2 (40) sunos 4.1.3 sparc 40.0 31.0 27.3 26
098 sparcstation 2 (40) sunos 4.1.3 sparc 40.0 30.1 27.2 26
099 decstation 5000/133 ultrix 4.3 r3000 33. 29.3 26.9 12
100 decstation 5000/133 ultrix 4.3 r3000 33. 30.0 26.6 12
101 decstation 5000/133 ultrix 4.3 r3000 33. 29.5 26.6 12
102 intel 486dx/33 unix svr 4.2 80486dx 33.3 27.6 26.3* 30
103 ibm rs/6000 model 320 ------------ power risc 20.0 29.5 25.8 5
104 sun sparcserver 690mp sunos 4.1.2 sparc 40.0 27.9 25.2 2
105 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 28.0 25.2 9
106 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 27.5 25.2 9
107 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 28.0 25.1 9
108 sparcstation 670 sunos 4.1.3 sparc 40.0 27.8 25.1 45
109 at clone ms dos 5.0 80486dx 33.3 28.3 25.0* 15
110 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 28.0 25.0 9
111 pt-sys5k sun4.1.3b-t1 lsi sparc 40.0 28.9 24.9 16
112 interpro 2430 sys v r3.1 ----------- 0.0 ------ 24.8 36
113 insight 486dx/50, isa ibm os/2 2.1 80486dx 50.0 25.7 24.0 28
114 decstation 5000/133 ultrix 4.3 r3000 33. 26.4 23.8 12
115 insight 486dx/50, isa ibm os/2 2.0 80486dx 50.0 25.3 23.7 18
116 sparcstation 2 (40) sunos 4.1.2 sparc 40.0 26.3 23.6 2
117 self-assembled pc 386bsd 0.1 80486dx 50.0 25.4 23.4 24
118 hitachi ex/60 osf/1 v1.0.0 hitachi 370 0.0 23.1 22.9 39
119 mips 3230 ------------ r3000 25.0 24.5 22.2 5
120 decstation 5000/200 ------------ r3000 25.0 24.2 22.1 5
121 amiga 2000/fusionforty amigados 2.1 68040 28.0 23.3 21.6 23
122 sun sparcserver 690mp sunos 4.1.2 sparc 40.0 23.7 21.3 2
123 sparcstation 2 (40) sunos 4.1.3 sparc 40.0 19.8 21.3 42
124 sun sparcserver 690mp sunos 4.1.2 sparc 40.0 21.1 20.9 2
125 sparcstation 2 (40) ------------ sparc 40.0 28.5 20.3 5
126 sparcstation 670 sunos 4.1.3 sparc 40.0 21.7 20.1 45
127 amiga 2000/fusionforty amigaos 2.1 68040 33.3 23.3 19.5 20
128 nextstation turbo nextstep 3.0 68040 33.3 20.0 19.5 11
129 80486dx/33 sco unix 3.2 80486dx 33.3 ------ 19.3 60
130 interpro 2430 sys v r3.1 ----------- 0.0 ------ 19.2 36
131 decstation 5000/133 ultrix 4.3 r3000 33. 19.5 19.0 12
132 nextstation/33 turbo nextstep 3.0 68040 33.3 20.0 18.9 11
133 amiga 4000/25 amigaos 68040 25.0 22.2 18.7 4
134 club falcon 433, isa ms dos 5.0 80486dx 33.3 20.4 18.6 9
135 club falcon 433, isa ms dos 5.0 80486dx 33.3 20.3 18.6 9
136 lan technologies esix 4.0.3a 80486dx 33.3 20.2 18.6 14
137 club falcon 433, isa ms dos 5.0 80486dx 33.3 20.2 18.6 9
138 lan technologies esix 4.0.3a 80486dx 33.3 20.4 18.1 14
139 lan technologies esix 4.0.3a 80486dx 33.3 20.1 18.1 14
140 vega 486dx/33 ms dos 5.0 80486dx 33.3 19.8 18.1 2
141 vega 486dx/33 ms dos 5.0 80486dx 33.3 19.7 18.1 2
142 lan technologies esix 4.0.3a 80486dx 33.3 20.1 18.0 14
143 vega 486dx/33 ms dos 5.0 80486dx 33.3 19.4 17.9 45
144 lan technologies esix 4.0.3a 80486dx 33.3 20.3 17.8 14
145 vega 486dx/33 ms dos 5.0 80486dx 33.3 19.6 17.6 45
146 opus 5120, sparc 1+ sunos 4.1.2 sparc 7 25.0 19.3 17.4 40
147 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 20.6 17.3 9
148 ami 486dx2/66, eisa ms dos 5.0 80486dx2 66.7 18.8 16.9 9
149 ami 486dx/33, isa linux0.99p17 80486dx 33.3 18.4 16.9 19
150 sun sparcstation 1+ sunos 4.1.3 sparc 25.0 18.6 16.8 31
151 vega 486dx/33 ms dos 5.0 80486dx 33.3 18.2 16.8 2
152 vega 486dx/33 ms dos 5.0 80486dx 33.3 18.4 16.7 2
153 vega 486dx/33 ms dos 5.0 80486dx 33.3 17.9 16.3 2
154 vega 486dx/33 ms dos 5.0 80486dx 33.3 17.9 16.3 2
155 vega 486dx/33 ms dos 5.0 80486dx 33.3 17.8 16.3 2
156 sgi personal iris 25g irix 4.0.5h r3000 20.0 17.4 15.6 43
157 opus 5120, sparc 1+ sunos 4.1.2 sparc 7 25.0 16.0 14.3 40
158 opus 5120, sparc 1+ sunos 4.1.2 sparc 7 25.0 15.9 14.2 40
159 opus 5120, sparc 1+ sunos 4.1.2 sparc 7 25.0 15.8 14.2 40
160 lan technologies esix 4.0.3a 80486dx 33.3 14.0 13.8 14
161 lan technologies esix 4.0.3a 80486dx 33.3 13.9 13.8 14
162 decstation 3100 ------------ r3000 16.7 14.9 13.4 5
163 sun sparcstation 1+ sunos 4.1.2 sparc 25.0 14.7 13.2 2
164 amdahl 580 uts 5.0 r1.2 ----------- ----- 13.1 ----- 0
165 sun ipc ------------ sparc 25.0 17.6 13.0 5
166 club falcon 433, isa ms dos 5.0 80486dx 33.3 13.5 12.2 9
167 sun sparcstation 1+ sunos 4.1.2 sparc 25.0 11.8 12.1 2
168 club falcon 433, isa ms dos 5.0 80486dx 33.3 12.2 11.9 9
169 vega 486dx/33 ms dos 5.0 80486dx 33.3 12.6 11.8 2
170 vega 486dx/33 ms dos 5.0 80486dx 33.3 12.1 11.7 2
171 80386 sco unix 3.2 80386 40.0 ------ 11.5 60
172 vega 486dx/33 ms dos 5.0 80486dx 33.3 12.3 11.3 2
173 vega 486dx/33 ms dos 5.0 80486dx 33.3 11.6 11.2 2
174 sun 4/280 sunos 4.1.2 mb86900 16.7 12.5 10.5 17
175 sun sparcserver 690mp sunos 4.1.2 sparc 40.0 13.1 10.4 2
176 sparcstation 2 (40) sunos 4.1.2 sparc 40.0 12.6 9.8 2
177 vax 8650 4.3 bsd ----------- 18.0 6.3 6.2 2
178 sun sparcstation 1+ sunos 4.1.2 sparc 25.0 7.0 5.9 2
179 vax 8650 4.3 bsd ----------- 18.0 5.4 5.4 2
180 vax 8650 4.3 bsd ----------- 18.0 5.0 5.0 2
181 sony nws-1750 ------------ 68030 25.0 4.8 4.6 27
182 amiga 2000/g-force netbsd 0.9a 68030 25.0 5.0 4.4 44
183 vax 8600 vms ----------- ----- 4.1 ----- 0
184 amiga 2000/csa-mmr amigaos 1.3 68030 33.3 3.4 3.6 2
185 amiga 2000/csa-mmr amigaos 1.3 68030 33.3 3.3 3.4 2
186 amiga 2000/csa-mmr amigaos 1.3 68030 33.3 3.0 3.1 2
187 masscomp 5600/5700 rtu v3.0 68020 16.7 2.7 ----- 0
188 gould pn9080 utx-32 1.1c ----------- ----- 2.6 ----- 0
189 atari mega st 4 mint/tos1.04 68030 25.0 2.2 2.6 40
190 atari mega st 4 mint/tos1.04 68030 25.0 2.2 2.6 40
191 sun 3/160c sun 3.0 68020 16.7 2.4 ----- 0
192 masscomp 5400 rtu v3.0 68020 16.7 2.3 ----- 0
193 sun 3/180 sun 4.2 68020 16.7 2.2 ----- 0
194 sun 3/75 sun 4.2 v3 68020 16.7 2.0 ----- 0
195 pyramid 90x osx 2.5 ----------- ----- 1.8 ----- 0
196 pyramid 90x osx 2.3 ----------- ----- 1.3 ----- 0
197 vax 11/785 vms ----------- ----- 1.2 ----- 0
198 vax 11/785 unix 5.2 ----------- ----- 1.2 ----- 0
199 gould pn6005 unix 4.2 bsd ----------- ----- 1.1 ----- 0
200 vax 11/785 unix 4.2 bsd ----------- ----- 1.03 ----- 1
201 hp 9000/500 hp-ux 4.02 b seriescpu ----- 0.98 ----- 0
202 att 3b20 unix 5.2 ----------- ----- 0.98 ----- 0
203 vax 11/780 unix 5.0.1 ----------- 5.0 0.93 ----- 1
204 microvax ii ------------ ----------- ----- 0.92 ----- 0
205 vax 11/780 unix 5.2 ----------- 5.0 0.89 ----- 0
206 cyber datamate uniplus 5.0 68010 12.5 0.89 ----- 0
207 att pc6300+ msdos 3.1 80286 6.0 0.81 ----- 0
208 att 3b2/400 unix 5.2 we32100 ----- 0.75 ----- 0
209 sun 2/120 standalone 68010 10.0 0.75 ----- 0
210 ibm pc/at venix/86 2.1 80286 7.5 0.75 ----- 0
211 ibm pc/at pcdos 3.1 80286 6.0 0.71 ----- 0
212 pdp 11/70 unix 5.2 ----------- ----- 0.71 ----- 0
213 cyber datamate uniplus 5.0 68010 12.5 0.71 ----- 0
214 masscomp 500 rtu v3.0 68010 10.0 0.70 ----- 0
215 ibm pc/at pcdos 3.0 80286 6.0 0.69 ----- 0
216 sun 2/120 sun 4.2 bsd 68010 10.0 0.69 ----- 0
217 att pc6300+ msdos 3.1 80286 6.0 0.69 ----- 0
218 att pc7300 unix 5.2 68010 10.0 0.63 ----- 0
219 vax 11/750 vms ----------- ----- 0.62 ----- 0
220 ibm pc/at pcdos 3.0 80286 6.0 0.62 ----- 0
221 ibm pc/at pcdos 3.0 80286 6.0 0.61 ----- 0
222 ibm pc/at venix/86 2.1 80286 6.0 0.57 ----- 0
223 iris-1400 unix sys v 68010 10.0 0.57 ----- 0
224 amiga 1000 amigados 68000 7.16 0.54 ----- 1
225 macintosh mac rom 68000 7.8 0.52 ----- 0
226 fast mac ------------ 68000 7.7 0.51 ----- 0
227 vax 11/750 unix 4.2 bsd ----------- ----- 0.50 ----- 0
228 ibm pc/at pcdos 3.0 80286 6.0 0.48 ----- 0
229 att 3b2/300 unix 5.0.2 we32000 ----- 0.46 ----- 0
230 att pc6300 msdos 2.11 8086 8.0 0.44 ----- 0
231 nec pc9801f pcdos 2.11 8086 8.0 0.44 ----- 0
232 macintosh ------------ 68000 7.7 0.40 ----- 0
233 macintosh mac rom 68000 7.8 0.40 ----- 0
234 ibm pc/at pcdos 3.0 80286 6.0 0.39 ----- 0
235 att pc6300 msdos 2.11 8086 8.0 0.39 ----- 0
236 onyx c8002 is/1 1.1(v7) z8000 4.0 0.29 ----- 0
237 pdp 11/34 unix v7m ----------- ----- 0.25 ----- 0
238 ibm pc/xt pcdos 2.1 8088 4.77 0.23 ----- 0
239 ibm pc msdos 2.0 8088 4.77 0.22 ----- 0
240 ibm pc msdos 2.0 8088 4.77 0.19 ----- 0
241 ibm pc/xt venix/86 2.0 8088 4.77 0.18 ----- 0
242 cosmos unisoft 68000 8.0 0.18 ----- 0
243 ibm pc/xt pc/ix 8088 4.77 0.16 ----- 0
244 cromemco z2 cromix 11.26 z80 4.0 0.052 ---- 1
245 apple iie dos 3.3 65c02 1.02 0.021 ---- 1
246 commodore 64 c64 rom 6510 1.0 0.0205 --- 1

compiler summary:
---
###
001 compiler/options unknown
002 4mb cache, dec osf/1 axp c x2.0, options unknown.
003 compiler/options unknown
004 2mb cache, dec osf/1 axp c x2.0, options unknown.
005 4mb cache, dec osf/1 axp c x2.0, options unknown.
006 compiler/options unknown
007 2mb cache, dec osf/1 axp c x2.0, options unknown.
008 compiler/options unknown
009 hp c 9.65, cc -dunix +o4 +oall
010 hp c 9.61, cc -dunix +o4 +oaggressive +opipeline +olibcalls -wl,-a,archive.
    new compiler improves performance considerably.
011 compiler/options unknown
012 gem c, gem c appears to break v1.1 ...
013 cc -dunix
014 512kb cache, dec osf/1 axp c x2.0, options unknown.
015 compiler/options unknown
016 watcom c32 v9.5 /oneatx /zp4 /5r
017 xlc 1.3.0.0, cc -dunix -o3
018 cc -dunix +p +o3 +om1 -j -wl,-a,archive
019 xlc v1.0.2, using gary karmarcik's aix emulator (paix), xlc -o3 -q
020 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
021 hp92453-01 a.09.19, cc -dunix -j +o3 -wl,-a,archive
022 cc 3.10.1, cc -dunix -o -mips2
023 hp92453-01 a.09.19, cc -dunix -j +o3
024 intel pentium processor performance brief, release 1.0, march 1993
    the unix compiler was: intel reference c compiler (january 27th, 1993 beta release).
    no other information given.
025 gcc 2.5.8, gcc -dunix -o2
026 hp92453-01 a.09.19, cc -dunix +o3
027 cc 3.10.1, cc -dunix -o -mips1
028 cc -dunix +o3
029 cc -dunix -o
030 hp c 9.61, cc -dunix +o4 +oall
031 xlc v1.0.2, using gary karmarcik's aix emulator (paix), xlc -o3 -q
032 metaware high c with '-pentium' and '-o4' options, 8 mb 64-bit ram
033 intel pentium processor performance brief, release 1.0, march 1993
    the unix compiler was: intel reference c compiler (january 27th, 1993 beta release).
    no other information given.
034 hp c 9.61, cc -dunix +o4 +oall
035 powerpc 601, cc -o3 -dunix -dropt
036 powerpc 601, cc -o3 -dunix
037 gcc 2.5.8, gcc -o2 -dunix_old -fomit-frame-pointer -finline-functions -fexpensive-optimizations -funroll-loops -fschedule-insns2 -fdelayed-branch
038 xl c 3.2, cc -dunix -o
039 gcc 2.6.0, gcc -o2 -fexpensive-optimizations
040 gcc 2.5.4, gcc -dunix -dropt -o2 -finline-functions -fomit-frame-pointer
041 gcc 2.5.4, gcc -dunix -dropt -o2 -finline-functions -fomit-frame-pointer
042 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
043 cc 3.10.1, cc -dunix -o2 -mips2 -sopt,-inline,
    8 kbyte i/d caches, 1 mbyte external (secondary) cache.
044 cc 3.10.1, cc -dunix -o2 -mips2 -sopt,-inline
045 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
046 gcc 2.5.8, gcc -dunix_old -o2 -fomit-frame-pointer -funroll-loops -fexpensive-optimizations -fschedule-insns2 -fdelayed-branch
047 cc 3.10.1, cc -dunix -o2 -mips2 -sopt,-inline
048 gcc 2.5.8, gcc -dunix -o
049 cc 3.10.1, cc -dunix -o2 -mips2 -sopt,-inline,
    8 kbyte i/d caches, no external cache.
    note: 81.9 mips using cc 3.17 compilers on v1.1.
050 cc +os +o3
051 cc -dunix -o -q
052 gcc 2.4.5, gcc -dunix -static -o2 -mv8
053 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
054 gcc 2.5.8, gcc -dunix_old -o2 -fomit-frame-pointer -finline-functions -fexpensive-optimizations -funroll-loops -fschedule-insns2 -fdelayed-branch
055 cc -dunix -o
056 hp92453-01 a.09.19, cc -dunix
057 gcc 2.5.8, gcc -dunix_old -o
058 gcc 2.4.5, gcc -dunix -static -o2 -mv8
059 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
060 cc -dunix -o
061 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
062 intel pentium processor performance brief, release 1.0, march 1993
    metaware high c v2.4b, compiler options not provided.
063 sun c 2.0.1, acc -dunix -dropt -bstatic -fast -o4
    the weitek cpu runs at 2 times the sparcstation 2 bus speed (40 mhz).
064 sun c 2.0.1, acc -dunix -bstatic -fast -o4
065 hp c 9.61, cc -dunix +o4 +oall
066 hp c 9.61, cc -dunix +o4 +oall
067 borland c++ v1.0, 256k cache, 16 mb 80 ns dram
068 cc -dunix -o
069 sc 2.0.1, acc -dunix -bstatic -fast -o4
070 cc -dunix -dropt -o3
071 sun c 1.0, using cg89, cc -dunix -dropt -fast -o4 -bstatic
072 cc -o3 -dunix
073 cc -dunix -dropt -bstatic -o4
074 sun c 1.0, using cg89, cc -dunix -fast -o4
075 sun c 1.0, using cg89, cc -dunix -o
076 gcc 2.4.5, gcc -dunix -static -o2 -mv8 -funroll-all-loops
077 256k write-back cache, 4 mbyte ram, umc chipset, gcc 2.2.2 (djgpp)
    gcc -dunix -o2 -m486 -fomit-frame-pointer -finline-functions
078 cc -dunix -o
079 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
080 dhrystone v1.1: cc -j +o3, dhrystone v2.0: cc -j +o3 -wl,-aarchive
081 gcc 2.5.7, gcc -dunix -o6 -m486 -funroll-loops -fomit-frame-pointer -finline-functions
082 gcc 2.4.5, gcc -dunix -static -o2 -mv8
083 metaware high c with '-486' and '-o4' options, 8 mb 64-bit ram
084 cc -dunix -o -q
085 compiler/options unknown
086 gcc 2.5.8, gcc -dunix -m486 -o2 -fomit-frame-pointer -static -finline-functions -funroll-loops
087 gcc 2.3.3, gcc -dunix -m486 -o6 -fomit-frame-pointer -wall
088 gcc 2.3.3, gcc -dunix -m486 -o6 -finline-functions -funroll-all-loops
089 gcc 2.3.3, gcc -dunix -m486 -o2 -finline-functions
090 cc -dunix -o4
091 cc -dunix
092 gcc 2.5.6, gcc -dunix -static -o2
093 watcom c32 9.5la, options: -oeatx -zp4, msc timer, -oe == inline
    this compiler appears to do quite well in optimizing dhrystone.
094 cc -o2
095 sun c 2.0.1, cc -dunix -dropt -fast -o4,
    original 40 mhz motherboard replaced with 50 mhz motherboard.
096 cc 3.10.1, cc -dunix -o2 -sopt,-inline
097 gcc 2.4.5, gcc -dunix -static -o2
098 sun c 2.0.1, acc -dunix -bstatic -fast -o4
099 cc -dunix -o3
100 gcc 2.3.3, gcc -dunix -o1
101 gcc 2.3.3, gcc -dunix -o2
102 intel pentium processor performance brief, release 1.0, march 1993
    metaware high c v2.4b. compiler options not provided.
103 compiler/options unknown
104 cc -dunix -o4 -bstatic
105 gcc 2.2.2, gcc -dunix -m486 -o -finline-functions -fomit-frame-pointer
106 gcc 2.2.2, gcc -dunix -dropt -m486 -o2 -finline-functions -fomit-frame-pointer
107 gcc 2.2.2, gcc -dunix -dropt -m486 -o -finline-functions -fomit-frame-pointer
108 athens.nosc.mil, /bin/cc, cc -dunix -o4 -bstatic
109 watcom c32 9.5la, options: -oatx -zp4, msc timer
    this compiler appears to do quite well in optimizing dhrystone.
110 gcc 2.2.2, gcc -dunix -m486 -o -finline-functions -fexpensive-optimizations
111 sun c 1.0, cc -dunix -bstatic -o4
112 acc -dctimer -o4 -atarg=c400
113 ibm c set/2 v1.0, csd level 50, icc /g4 /o+ /gs+ /dmsc /dropt,
    256kb sram external cache, 16mb ram.
114 cc -dunix -o2
115 ibm c set/2 v1.0, csd level cs00042, /dmsc /dropt /g4 /o+ /c+,
    256kb sram external cache, 16mb ram.
116 cc -dunix -o4 -bstatic
117 gcc 2.3.3, gcc -o9
118 cc v1.1, cc -o
119 compiler/options unknown
120 compiler/options unknown
121 sas c 6.2, global & peephole optimizers & inline enabled.
122 cc -dunix -o4
123 gcc 2.5.6, gcc -dunix -o2
124 cc -dunix -o
125 compiler/options unknown
126 athens.nosc.mil, /bin/cc, cc -dunix -o4
127 sas c 6.2, global & peephole optimizers enabled.
128 gcc 2.3.1, gcc -o
129 cc 3.2.4, cc -dunix_old -o
130 acc -dctimer -o4 -atarg=c100
131 cc -dunix -o1
132 gcc 1.93, gcc -o
133 sas/c 6.2, global and peephole optimizers enabled.
134 gcc 2.2.2, gcc -dunix -dropt -m486 -o -finline-functions -fomit-frame-pointer
135 gcc 2.2.2, gcc -dunix -m486 -o -finline-functions -fomit-frame-pointer
136 gcc 2.3.3, gcc -dunix -m486 -o2 -finline-functions -funroll-all-loops -fstrength-reduce
137 gcc 2.2.2, gcc -dunix -dropt -m486 -o2 -finline-functions -fomit-frame-pointer
138 gcc 2.3.3, gcc -dunix -o -static
139 gcc 2.3.3, gcc -dunix -o2
140 gcc 2.2.2, gcc -dunix -m486 -o2 -finline-functions -fomit-frame-pointer
141 gcc 2.2.2, gcc -dunix -m486 -o2 -finline-functions -funroll-loops -fomit-frame-pointer
142 gcc 2.3.3, gcc -dunix -m486 -o2
143 gcc 2.4.1, gcc -dunix -dropt -m486 -o -finline-functions -fomit-frame-pointer
144 gcc 2.3.3, gcc -dunix -o
145 gcc 2.4.1, gcc -dunix -m486 -o2 -finline-functions -fomit-frame-pointer
146 gcc 2.4.5, gcc -dunix -o2 -fomit-frame-pointer -static
147 gcc 2.2.2, gcc -dunix -dropt -m486 -finline-functions -fstrength-reduce -fomit-frame-pointer
148 gcc 2.2.2, gcc -dunix -m486
149 gcc 2.3.3, gcc -dunix -m486 -o -finline-functions -fstrength-reduce -fomit-frame-pointer
150 gcc 2.4.5, gcc -dunix -o6 -funroll-loops -static
151 gcc 2.2.2, gcc -dunix -m486 -finline-functions -o2
152 gcc 2.2.2, gcc -dunix -m486 -finline-functions -o
153 gcc 2.2.2, gcc -dunix -m486 -o
154 gcc 2.2.2, gcc -dunix -dropt -m486 -o
155 gcc 2.2.2, gcc -dunix -dropt -m486 -o2
156 cc 3.10.1, cc -dunix -o2 -sopt,-inline
157 /bin/cc, cc -o4 -dalign -bstatic -dropt
158 /bin/cc, cc -o4 -dalign -bstatic
159 /bin/cc, cc -o4 -qoption iropt -l9 -bstatic
160 gcc 2.3.3, gcc -dunix -bstatic -o
161 gcc 2.3.3, gcc -dunix -o
162 compiler/options unknown
163 cc -dunix -o4 -bstatic
164 cc 1.5, -dropt
165 compiler/options unknown
166 gcc 2.2.2, gcc -dunix -m486 -finline-functions -fstrength-reduce -fomit-frame-pointer
167 cc -dunix -o
168 gcc 2.2.2, gcc -dunix -m486
169 gcc 2.2.2, gcc -dunix -dropt -m486
170 gcc 2.2.2, gcc -dunix -m486
171 cc 3.2.4, cc -dunix_old -o
172 gcc 2.2.2, gcc -dunix -dropt
173 gcc 2.2.2, gcc -dunix
174 sun c 1.0, cc -dunix -fast -o4 -bstatic
175 cc -dunix
176 cc -dunix
177 gcc 1.37.1, gcc -dunix -o
178 cc -dunix
179 cc -dunix -o
180 cc -dunix
181 gcc 2.4.5, gcc -dunix -o2 -finline-functions -funroll-loops -lgcc -dropt -freg-struct-return
182 gcc 2.5.6, gcc -dunix -o2 -fomit-frame-pointer -finline-functions
183 vax-11 c 2.0, -dropt
184 aztec c 5.0a, cc -damiga -c2 -fm -sabfmnpr
    not a great optimizing compiler ('-s' options) since a 'hand' optimized
    version of the assembly code produced 7.7 v1.1 mips.
185 aztec c 5.0a, cc -damiga -dropt -c2 -fm -sabfmnpu
186 aztec c 5.0a, cc -damiga -c2 -fm -me
187 cc v4.0, -dropt
188 cc, -dropt
189 gcc 2.3.3, 4 mb ram (16-bit wide), gcc -o2 -fomit-frame-pointer -finline-functions -m68020
190 gcc 2.3.3, 4 mb ram (16-bit wide), gcc -o2 -fomit-frame-pointer -finline-functions -m68020 -dropt
191 ---
192 cc v4.0, -dropt
193 cc, -dropt
194 cc, -dropt
195 cc, -dropt
196 cc, -dropt
197 vax-11 c 2.0, -dropt
198 cc, -dropt
199 cc, -dropt
200 cc, -dropt
201 cc, -dropt
202 cc, -dropt
203 cc 4.1.1.31, -dropt
204 cc, -dropt
205 cc, -dropt
206 unisoft cc, -dropt, used 'short' instead of 'int'.
207 ci-c86 2.20m, -dropt
208 cc, -dropt
209 cc, -dropt
210 cc, -dropt
211 lattice c 2.15, -dropt
212 cc, -dropt
213 unisoft cc, -dropt, used 'int'.
214 cc v3.2, -dropt
215 ci-c86 2.20m, -dropt
216 cc, -dropt
217 b16cc 2.0, -dropt
218 cc, -dropt
219 vax-11 c 2.0, -dropt
220 ms 3.0, small memory model
221 b16cc 2.0, -dropt
222 cc, -dropt
223 cc, -dropt
224 manx c 2.30a, -dropt, short (16 bit) ints.
225 mac c, -dropt, used 'short' instead of 'int', short = 16 bits.
226 megamax c 2.0, -dropt
227 cc, -dropt
228 ms 3.0, large memory model, all other 80x8x results are small model under this reference.
229 cc, -dropt
230 ci-c86 2.20m, -dropt
231 lattice 2.15, -dropt
232 megamax c 2.0, -dropt
233 mac c, -dropt, 32 bit int
234 ci-c86 2.1, -dropt
235 b16cc 2.0, -dropt
236 cc, -dropt
237 cc, -dropt
238 lattice c 2.15
239 ci-c86 2.20m, -dropt
240 b16cc 2.0, -dropt
241 cc, -dropt
242 cc, -dropt
243 cc, -dropt
244 cc, -dropt
245 aztec cii v1.05i, -dropt
246 c power 2.8, -dropt

details of who ran the benchmark:
---
### ref:
0 rick richardson, rer@vaximile.uucp, 23 oct 1985
    these are all early v1.0 results collected by rick.
1 motorola 1986 mc68020 benchmark report, br353.
    includes table of results from rick richardson, pc research, inc.
2 al aburto, aburto@marlin.nosc.mil, 03 jan 1993
3 bill broadley, broadley@neurocog.lrdc.pitt.edu, 07 jan 1993
4 roger uzun, uzun@crash.cts.com, 23 jan 1993
5 clark l. coleman, clc5q@virginia.edu, 19 feb 1993
    hp apollo 9000 series 700 performance brief, november 1992
6 bob supnik, supnik@human.enet.dec.com, 20 feb 1993
    alpha axp server family, performance brief - openvms, 20 nov 1992,
    second edition, eb-n0104-51
7 huub van baal, huubb@neth.hp.com, 21 feb 1993
8 huub van baal, huubb@neth.hp.com, 23 feb 1993
9 al aburto, aburto@marlin.nosc.mil, 01 mar 1993
10 marc gingold, marc@tobit.saclay.cea.fr, 05 mar 1993
11 eric s. boltz, eboltz@nist.gov, 05 mar 1993
12 simon gornall, sjg@phlim.ph.kcl.ac.uk, 09 mar 1993
13 andrew mutz, mutz@kodak.com, 15 mar 1993
14 paul sutcliffe jr., paul@devon.lns.pa.us, 15 mar 1993
15 p. jeffrey ungar, ungar@embezzle.stanford.edu, 16 mar 1993
16 fred liu, mliu@pt.com, 20 mar 1993
    pt-sys5k is a sparc system designed for real time application.
17 al aburto, a@cs.umr.edu, 22 mar 1993
19 mario a. guerra, mguerra@inforisc.cr, 01 apr 1993
20 benno senoner, senoner@ghost.sm.dsi.unimi.it, 02 apr 1993
21 peter valkenburg, valke@cca.vu.nl, 15 apr 1993
22 maurice janssen, mhmj@chem.vu.nl, 16 apr 1993
23 benno senoner, senoner@ghost.sm.dsi.unimi.it, 28 apr 1993
24 piero serini, piero@strider.st.dsi.unimi.it, 1virginia.edu, 23 jul 1993
28 john e. stone, johns@cs.umr.edu, 27 jul 1993
29 roger uzun, uzun@crash.cts.com, 19 aug 1993
30 al aburto, aburto@marlin.nosc.mil, 06 sep 1993
31 michael chapman, mike@hopper.acs.virginia.edu, 15 oct 1993ar, bhandarkar@msbcs.enet.dec.com, 15 oct 1993
    posted to comp.benchmarks.
34 henrik harmsen, d9hh@dtek.chalmers.se, 16 oct 1993
35 skip sauls, skip@cy.cs.olemiss.edu, 18 oct 1993
36 18 oct 1993
37 lionel lacassagne, lacass_l@boson.epita.fr, 23 oct 1993
38 roger uzun, uzun@crash.cts.com, 31 oct 1993
39 ronald schalk, r.schalk@uci.kun.nl, 04 nov 1993
40 michael ritzert, mjr@dyn.geo.uni-bonn.de, 13 nov 1993
41 al aburto, aburto@ariel.nosc.mil, 28 nov 1993
42 al aburto, aburto@octopus.nosc.mil, 12 dec 1993
43 kristian wedberg, wedberg@mednet.gu.se, 16 dec 1993
44 petri nordlund, petrin@mits.mdata.fi, 18 dec 1993
45 al aburto, aburto@marlin.nosc.mil, 08 jan 1994
46 bo thide', bt@irfu.se, 17 jan 1994
47 pekka järveläinen, jarvelai@csc.fi, 15 feb 1994
48 nsa@link.hacktic.nl, 23 feb 1994
49 wolfram wagner, ww@mpi-sb.mpg.de, 10 mar 1994
50 wolfram wagner, ww@mpi-sb.mpg.de, 28 apr 1994
51 dennis j robinson, cs370_4@eecs.uic.edu, 06 may 1994
52 bill broadley, broadley@neurocog.lrdc.pitt.edu, 08 may 1994
53 harlan w stockman, hwstock@saix531.energylan.sandia.gov, 18 may 1994
54 georg wambach, gw@informatik.uni-koeln.de, 25 may 1994
55 bor-wen jeng, bwjeng@beta.wsl.sinica.edu.tw, 16 jun 1994
56 evan torrie, torrie@cs.stanford.edu, 21 jun 1994
57 pawel potocki, ppotocki@panix.com, 29 jun 1994
58 wolfram wagner, ww@mpi-sb.mpg.de, 19 jul 1994
59 mario guerra, mguerra@cariari.ucr.ac.cr, 10 aug 1994
60 dani, dani@morgana.uab.es, 23 aug 1994
61 zygmunt krawczyk, zkr@ita.pwr.wroc.pl, 09 sep 1994
62 al aburto, aburto@marlin.nosc.mil, 05 nov 1994

al aburto
aburto@marlin.nosc.mil
-------
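
The system and os fields in the results listing contain spaces, so a row cannot be split on whitespace from the left. The last four fields (mhz, v1.1 mips, v2.1 mips, ref) are always single tokens, which suggests reading each row from the right. The following is a minimal sketch of that idea in C. The names `RowTail`, `parse_field`, `parse_row_tail` and `mips_per_mhz` are hypothetical helpers introduced here for illustration, and the mips-per-mhz figure is an assumed convenience for comparing rows, not a metric used by the listing itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Numeric tail of one results row; a negative value marks a missing field. */
typedef struct {
    double mhz;      /* clock rate in MHz                    */
    double v11_mips; /* Dhrystone v1.1 MIPS column           */
    double v21_mips; /* Dhrystone v2.1 MIPS column           */
    int    ref;      /* reference number (last column)       */
} RowTail;

/* Convert one token; tokens made only of dashes ("-----") count as missing. */
static double parse_field(const char *tok)
{
    char *end;
    double v;

    if (tok[0] == '-' && strspn(tok, "-") == strlen(tok))
        return -1.0;
    v = strtod(tok, &end);
    return (end == tok) ? -1.0 : v;
}

/* Fill *out from the last four whitespace-separated tokens of `row`.
 * Assumes rows shorter than 256 characters and 64 tokens (an assumption
 * that holds for the listing above). Returns 0 on success, -1 otherwise. */
int parse_row_tail(const char *row, RowTail *out)
{
    char buf[256];
    char *tok[64];
    char *p;
    int n = 0;

    strncpy(buf, row, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (p = strtok(buf, " \t\n"); p != NULL && n < 64; p = strtok(NULL, " \t\n"))
        tok[n++] = p;
    if (n < 4)
        return -1;

    out->mhz      = parse_field(tok[n - 4]);
    out->v11_mips = parse_field(tok[n - 3]);
    out->v21_mips = parse_field(tok[n - 2]);
    out->ref      = atoi(tok[n - 1]);
    return 0;
}

/* MIPS per MHz, or -1.0 when either input is missing or the clock is 0. */
double mips_per_mhz(double mips, double mhz)
{
    if (mips < 0.0 || mhz <= 0.0)
        return -1.0;
    return mips / mhz;
}
```

Passing row 140 (vega 486dx/33, 33.3 mhz, 18.1 v2.1 mips) to `parse_row_tail` and then to `mips_per_mhz` gives roughly 0.54 mips per mhz. Rows with a dashed or 0.0 clock entry, such as 118 or 164, come back as -1.0 rather than a misleading ratio.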
________________________________________________________________ index ________________________________________________________________
revision history:
rev 1.0, 4/1/95, dhg, initial writing
rev 1.1, 5/1/95, dhg, found correct formula = template for user benchmarks
rev 1.2, 5/5/95, dhg, error correction
rev 1.3, 5/8/95, dhg, added more locate info
rev 1.4, 5/9/95, dhg, format changes, made internet stuff and appendix
rev 1.5, 5/10/95, dhg, added screens for pdrt186, clean up
rev 1.6, 5/12/95, dhg, added screens for make/compile/assemble/locate
rev 1.7, 5/17/95, dhg, added quick start
rev 1.8, 5/18/95, dhg, amd formatting applied
rev 1.9, 5/19/95, dhg, clean-up, error correction
rev 2.0, 5/20/95, dhg, addition of problem reports from alpha sites, index
release to epd for final editing.

