Subject: Re: RCX Firmware speed versus legOS
Newsgroups: lugnet.robotics
Date: Thu, 11 Apr 2002 23:47:19 GMT
Reply-To: Dick Swan <DICKSWA@SBCGLOBAL.NETstopspam>
  
"Michael Obenland" <obenland@t-online.de> wrote in message
news:GuEtHr.M39@lugnet.com...
<<snip>>
----------------------------------------------------------

It took 0.211 sec. running empty, i.e. without reading the sensor but only
doing calculations, and 1.113 sec. with reading the sensor. So I can assume
that reading 10000 light values takes about 1 second.
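
As a rough check of the quoted figures, the two timings can be converted to a per-iteration cost. This is only a sketch: the loop count of 10000 is taken from the "10000 light values" remark, and the helper name perIterationMicros is made up for this illustration. The difference between the two runs covers the sensor call plus the extra compare and store instructions it brings along, not the sensor read alone.

```c
/* Hypothetical helper: average cost of one loop iteration in microseconds. */
double perIterationMicros(double totalSeconds, long iterations)
{
    return totalSeconds * 1.0e6 / (double)iterations;
}

/* Hypothetical helper: extra cost per iteration of the run with sensor
 * reads compared with the empty run, in microseconds. */
double extraPerIterationMicros(double withSensorSeconds,
                               double emptySeconds,
                               long iterations)
{
    return perIterationMicros(withSensorSeconds - emptySeconds, iterations);
}
```

Called with the quoted values (0.211, 1.113 and 10000), the first function gives the cost of one empty iteration and the second the added cost per iteration once the sensor read is included.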


I was surprised at the wide disparity between the 'empty loop' and the
loop with 'reading the sensor'. I suspect there is significant
compile-time code optimization at play here. I use Hitachi's H8 tool
chain and not GCC. Hitachi's compiler did a lot of compile-time
optimization on the empty loop by recognizing that many of the
variables have constant values.
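
To make the constant-value point concrete, here is a minimal C sketch of the 'empty loop' as I read it from the listings further down. The loop count, the LoopResult struct and the function name emptyLoop are assumptions for illustration, not the original program. Because 'value' is never written after its initialization, both comparisons reduce to tests against zero and 'mean' can never leave zero.

```c
/* Hypothetical reconstruction of the 'empty loop' test (case [1]). */
enum { kNumberOfLoops = 10000 };   /* assumed loop count */

typedef struct {
    int max;
    int min;
    int mean;
} LoopResult;

LoopResult emptyLoop(void)
{
    LoopResult result;
    int max = 0;
    int min = 5000;
    int mean = 0;
    int value = 0;   /* never written again: a compile-time constant */
    int index2;

    for (index2 = 1; index2 <= kNumberOfLoops; ++index2) {
        /* value = sensorRead(2);   left out in the empty loop */
        if (max < value) max = value;   /* reduces to: if (max < 0) max = 0 */
        if (min > value) min = value;   /* reduces to: if (min > 0) min = 0 */
        mean = (mean * 3 - mean + value) / 3;   /* stays 0 on every pass */
    }

    /* mean is returned here only so the result can be checked; a value
     * that is never used after the loop is one a compiler may drop. */
    result.max = max;
    result.min = min;
    result.mean = mean;
    return result;
}
```

An optimizer that tracks these constants has very little real work left in the loop body, which fits the short instruction sequence shown for case [1].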

I compiled three different versions of the program to see how many
instructions were included in the main loop. Results as follows:
Case  In loop  Executed  Description
[1]      9        7      'empty loop'
[2]     13       11      sensor read added to loop
[3]     22       20      compile-time code optimization eliminated
Case [1] has nine instructions in the loop. Seven of these
instructions are executed; the other two are part of 'if..then..' code
that is never executed. For case [3], the time taken by the call to
the 'divide' subroutine must be added on top of the instruction count.
Case [3] also generated inline code for the multiply by 3.

Note that a smarter compiler could generate even more efficient code
and get the loop code down to two instructions by eliminating the code
between labels L541 and L544 in the snippet below.



The remainder of this note contains the actual code generated by
compiler for any 'hard core' readers.

The code generated by Hitachi's H8 compiler for [1] is next. I
manually inserted '//' comments.

max = 0;
      SUB.W       R4,R4        // compiler used R4 as a register variable
min = 5000;
      MOV.W       #5000:16,R3  // compiler used R3 as a register variable
mean = 0;                      // compiler generated no code.
value = 0;                     // compiler generated no code.
for (index2 = 1; index2 <= kNumberOfLoops; ++index2)
      MOV.W       #1:16,R0     // compiler used R0 as a register variable
      BRA         L542
{
  //value = sensor2.sensorRaw();
  if( max < value ) max = value;
L541: MOV.W       R4,R4
      BGE         L543
      SUB.W       R4,R4
L543:
  if( min > value ) min = value;
      MOV.W       R3,R3
      BLE         L544
      SUB.W       R3,R3
L544: ADDS.W      #1,R0
L542: CMP.W       R1,R0
      BLE         L541
  mean = ( mean * 3 - mean + value ) / 3; // compiler generated no code.
}

I recompiled with the sensor read instruction included, i.e. [2]
above. The same optimizations occurred as shown below.

max = 0;
      SUB.W       R4,R4
min = 5000;
      MOV.W       #5000:16,R0
      MOV.W       R0,@SP
mean = 0;
value = 0;
for (index2 = 1; index2 <= kNumberOfLoops; ++index2)
      MOV.W       #1:16,R5
      BRA         L547
{
  value = sensorRead(2);
L546: MOV.W       #2:16,R0
      BSR         _sensorRead
      MOV.W       R0,R6
  if( max < value ) max = value;
      CMP.W       R0,R4
      BGE         L548
      MOV.W       R0,R4
  if( min > value ) min = value;
L548: MOV.W       @SP,R0
      CMP.W       R6,R0
      BLE         L549
      MOV.W       R6,@SP
L549: ADDS.W      #1,R5
L547: CMP.W       R3,R5
      BLE         L546
  mean = ( mean * 3 - mean + value ) / 3;
}

Finally, I forced all of the code to be generated by adding, outside
the loop, the statement 'extVariable = mean + value', i.e. case [3].
Much more code was generated.

max = 0;
      SUB.W       R4,R4
min = 5000;
      MOV.W       #5000:16,R0
      MOV.W       R0,@(2:16,SP)
mean = 0;
value = 0;
      SUB.W       R6,R6
      MOV.W       R6,R5
for (index2 = 1; index2 <= kNumberOfLoops; ++index2)
      MOV.W       #1:16,R3
      BRA         L548
{
value = sensorRead(2);
L547: MOV.W       #2:16,R0
      BSR         _sensorRead
      MOV.W       R0,R6
if( max < value ) max = value;
      CMP.W       R0,R4
      BGE         L549
      MOV.W       R0,R4
if( min > value ) min = value;
L549: MOV.W       @(2:16,SP),R0
      CMP.W       R6,R0
      BLE         L550
      MOV.W       R6,@(2:16,SP)
L550:
mean = ( mean * 3 - mean + value ) / 3;
      MOV.W       R5,R0
      MOV.W       R0,R1
      ADD.W       R0,R0
      ADD.W       R1,R0    // inline code for '*3'
      SUB.W       R5,R0
      ADD.W       R6,R0
      MOV.W       #3:16,R1
      JSR         @$DIVI$3:16
      MOV.W       R0,R5
      ADDS.W      #1,R3
L548: MOV.W       @SP,R0
      CMP.W       R0,R3
      BLE         L547
}
extVariable = mean + value;  // to 'disable' compile time code optimization
      ADD.W       R6,R5
      MOV.W       R5,@_extVariable:16
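
The trick used in case [3] can be sketched in portable C as follows. This is an illustration under assumptions, not the original test program: sensorRead here is a stub that returns a fixed raw value of 600, the function name sensorLoop and its parameters are made up, and extVariable is declared volatile, which is stricter than the plain external variable the listing suggests. Storing 'mean + value' into a variable the compiler cannot see the end of keeps the mean calculation from being discarded.

```c
/* Assumption: a global the compiler must treat as observable. */
volatile int extVariable;

/* Stub standing in for the real sensor read; the value 600 is arbitrary. */
static int sensorRead(int port)
{
    (void)port;
    return 600;
}

/* Hypothetical reconstruction of the case [3] loop. */
void sensorLoop(int numberOfLoops, int *maxOut, int *minOut)
{
    int max = 0;
    int min = 5000;
    int mean = 0;
    int value = 0;
    int index2;

    for (index2 = 1; index2 <= numberOfLoops; ++index2) {
        value = sensorRead(2);
        if (max < value) max = value;
        if (min > value) min = value;
        /* Same expression as in the listing: multiply by 3, subtract
         * the old mean, add the new value, then divide by 3. */
        mean = (mean * 3 - mean + value) / 3;
    }

    /* The store that keeps 'mean' and 'value' live after the loop. */
    extVariable = mean + value;

    *maxOut = max;
    *minOut = min;
}
```

With the store in place the compiler has to emit the multiply, the subtract and add, and the divide on every pass, which is the extra code visible in the case [3] listing.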



