Subject: Re: RCX Firmware Speed
Newsgroups: lugnet.robotics
Date: Wed, 10 Apr 2002 13:06:07 GMT
  
Dick,

  Nice post, and an interesting topic.

  My timings differ a little for v2.0 beta, but they are basically the same.
My timings use an internal timer, as do yours.

  My multi-task measurement shows that the 3 msec framework does not apply to
every instruction executed for v2.0.  This is consistent with your findings.
My measurement loop contains a single bytecode in a for loop executed 2000
times.  The cost of executing an empty for loop 2000 times is then
subtracted from the time of the measurement loop.
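
The subtraction can be sketched in a few lines of Python (not RCX code).
The function name and the timer readings in the example call are
hypothetical and chosen only to show the arithmetic; only the
2000-iteration count comes from the description above.

```python
def per_bytecode_ms(loop_ms, empty_loop_ms, iterations=2000):
    """Time per bytecode: measurement loop minus empty loop, per iteration."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return (loop_ms - empty_loop_ms) / iterations


# Hypothetical timer readings in milliseconds, for illustration only.
print(per_bytecode_ms(loop_ms=18000.0, empty_loop_ms=12000.0))
```

Subtracting the empty loop removes the cost of the loop control itself,
so only the bytecode under test remains in the result.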

  I modified my timing test to measure the timing of repeated CHKL while
1 to 6 other tasks each perform a while(1) loop.  I got these results.
Times are milliseconds per bytecode executed in the measurement loop.

  #tasks time   diff
      1     3.00
      2     3.26  0.26
      3     3.58  0.32
      4     3.80  0.22
      5     4.12  0.32
      6     4.44  0.32
      7     4.75  0.31

These timings confirm that adding new tasks gives an incremental increase in
execution time.
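
The diff column and the average increment per added task can be
recomputed from the table. A small Python sketch follows; the variable
names are illustrative, and the timings are the ones listed above.

```python
# Milliseconds per bytecode, keyed by the #tasks column of the table.
timings_ms = {1: 3.00, 2: 3.26, 3: 3.58, 4: 3.80, 5: 4.12, 6: 4.44, 7: 4.75}

tasks = sorted(timings_ms)

# Increase in time from each row to the next, rounded to table precision.
diffs = [round(timings_ms[b] - timings_ms[a], 2)
         for a, b in zip(tasks, tasks[1:])]

# Average increase per added task across the whole table.
mean_increment = round(
    (timings_ms[tasks[-1]] - timings_ms[tasks[0]]) / len(diffs), 2)

print(diffs)
print(mean_increment)
```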

Making each new task perform the same code as the original measurement task
would give measurements of the actual "few assembly instructions" needed to
perform the CHKL execution.  As it stands, the difference reflects the time
it takes to perform the while(1) loop.

Kevin

  I did a slight modification to my measurement code, and ran
"Dick Swan" <dickswan@sbcglobal.net> wrote in message
news:GuC9L5.BMn@lugnet.com...
This is a follow up to many of the recent posts on the speed of Lego's
RCX byte code interpreter. It's a long post but the salient items are
in the first few paragraphs.

The interpreter is very slow because of all the overhead required to
execute a single interpreter instruction. The scheduler overhead takes
up about 90% of the time and the actual interpreter instruction is the
remainder.

The fastest opcode is an 'alive' [i.e. opcode 0x10] which is
essentially a NoOp at 2.73 msec [milliseconds]. 'Add' opcode is 2.84
msec and 'multiply' or 'divide' [i.e. likely the worst case] at 2.93
msec.

It is just a coincidence that opcode execution is about 3 msec and
sensors are scanned at 3 msec. The Opcode interpreter runs
asynchronously with the sensor interrupt. Opcodes are executed
whenever there is no system task -- sensor handler, display handler,
battery monitoring handler, etc -- ready to run.

The current 2.0 firmware is about 50% slower than the original 1.0
[version 0309] firmware.

Program execution speed can be improved if you can use events rather
than a code loop to check for a condition. The code for checking for
events is included in system overhead.

Running multiple tasks simultaneously will slow down individual tasks,
but it shouldn't be very noticeable with just two or three tasks.

More details on each of the above follow.

2.0 Firmware Byte Code Interpreter Speed:
========================================
Several people have reported current 2.0 firmware as taking about 3
msecs per opcode. My measurements -- using the internal RCX timers and
not a stopwatch for measurement -- are similar at:
   2.836 msec   Add opcode
   2.930 msec   Multiply opcode [0xFFFF * 0xFF]
   2.927 msec   Divide opcode [0xFFFF / 0xFF]
   2.633 msec   Alive [basically a NoOp]

There really is very little difference between basic and complicated
opcodes. This is because the 'system overhead' dominates the time per
opcode. For an understanding of this, see the scheduler pseudo code at
the end of this post.

Systems Counters:
================
2.0 firmware introduced three system counters which are the same as
global variables 0..2. Writing to these variables takes a little
longer because the firmware does some event checking on these
special variables. This adds another 0.16 msec to the opcode time.

2.0 vs 1.0 Firmware Byte Code Interpreter Speed:
===============================================
The current 2.0 firmware is about 50% slower than the original 1.0
firmware. I posted time measurements on 9 Feb 99 on the 1.0 firmware.
A single opcode took about 2 msecs to execute. About 1.75 msec was
overhead to get to the 'opcode execution' code and the remainder was
time to actually execute. An extract from that post gave 'opcode
execution' times -- without overhead -- for several opcodes.
  0.17 msec   CPU time used to execute an assignment ("j = 0")
  0.35 msec   CPU time to execute a motor off command.
  0.26 msec   CPU time to execute a cleartimer() command.

This post also included a simple program to measure the execution time
using the RCX timer. It subtracted the time to execute an "empty" loop
versus the time with a loop containing 20 identical instructions. Each
loop was run 500 times.

The difference between the 1.0 and 2.0 firmware is likely due to
feature enhancements:
  - The addition of support for events.
  - Opcodes can now take any source type, whereas in 1.0 sources
    were often restricted to variables or constants. Inline
    'if..then..else' code was replaced by a call to a more
    flexible common subroutine.

Impact of Multiple Tasks:
========================
I did measurements a long time ago on 1.0 firmware comparing execution
time of a single task vs five identical copies running at once. The
time to execute one opcode in all running tasks was 2.0 msec for one
task and 3.0 msec for five tasks. This is consistent with the
scheduler algorithm given at the end of this post.
   1.75 msec -- time to cycle through all ten possible tasks
                checking to see if they need to execute a single
                opcode
   0.25 msec -- avg time to execute a single opcode
So for one task time is 2.0 msec [1.75 + 0.25] and for five tasks it
is 3.0 msec [1.75 + 5 x 0.25].
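
The same arithmetic as a short Python sketch. The function and parameter
names are illustrative; the 1.75 and 0.25 msec figures and the ten task
slots are the 1.0 firmware values quoted above.

```python
def cycle_time_ms(running_tasks, scan_ms=1.75, opcode_ms=0.25):
    """Time for every running task to execute one opcode (1.0 firmware)."""
    if not 0 <= running_tasks <= 10:
        raise ValueError("the RCX has ten possible tasks")
    return scan_ms + running_tasks * opcode_ms


for n in (1, 5):
    print(n, cycle_time_ms(n))  # 2.0 msec for one task, 3.0 msec for five
```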

I haven't measured 2.0 firmware, but results should be similar after
you add the 50% extra time for 2.0 over 1.0.

For details on the 1.75 msec, see the pseudo code below for the system
scheduling algorithm.

System Scheduler Pseudo Code:
============================
The RCX RAM firmware scheduler algorithm is roughly the following:

do forever [loop in main scheduler]
{
   select the highest priority task waiting to run.
   These tasks are:
     sensors - scheduled every 3 msec in 1 msec interrupt handler
     LCD display - scheduled every 120 msec
     button keypress manager - scheduled every 120 msec
     motor - scheduled after stop/start/brake/change direction
             opcode
     battery voltage - scheduled every 120 msec
     opcode handler - scheduled lowest priority task, always ready
                      to run

   execute the task just selected
}

opcodeHandler()
{
  if 10 msec tick has occurred since last called
    update each event tick counter

  if 100 msec tick has just occurred
    do a bunch of work to check if any events have happened.

  if a message is waiting
    executeSingleOpcode() corresponding to the message
  else
  {
    move 'current task' to next task. i.e. cycle through 0..9

    if 'current task' is 0
    {
      check all tasks to see if any was waiting for an event that
      has just happened

      adjust the resources controlled by tasks if pre-empted by
      higher priority task.
    }

    if 'current task' is waiting and task timer has expired
      set task state to running

    if 'current task' state is running
      executeSingleOpcode() [finally!]

    if opcode from a message
      generate a reply message

    Note: task states are 'undefined', 'waiting for timer',
          'waiting for event', 'running', 'stopped'
  }
  return;
}

executeSingleOpcode()
{
  switch statement on opcode
  {
    one 'case' statement for each opcode
  }
}

If you've waded through the above pseudo-code, then you can see
there's an awful lot of 'overhead' to get to the half-dozen assembly
instructions actually required to interpret a single byte code
instruction.
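
To make the task cycling concrete, here is a much simplified Python
simulation of the opcodeHandler() loop above. It is only a sketch: it
ignores messages, events, timers and the higher priority system tasks,
and the class and function names are hypothetical rather than anything
taken from the firmware.

```python
NUM_TASK_SLOTS = 10  # the pseudo code cycles 'current task' through 0..9


class Scheduler:
    def __init__(self, running):
        self.running = set(running)          # slots whose state is 'running'
        self.current = NUM_TASK_SLOTS - 1    # first call moves to slot 0
        self.handler_calls = 0
        self.opcodes_executed = 0

    def opcode_handler(self):
        """One handler call: move to the next slot, run one opcode if running."""
        self.handler_calls += 1
        self.current = (self.current + 1) % NUM_TASK_SLOTS
        if self.current in self.running:
            self.opcodes_executed += 1       # stands in for executeSingleOpcode()


def calls_per_opcode(running):
    """Handler calls needed per executed opcode over one pass of all slots."""
    sched = Scheduler(running)
    for _ in range(NUM_TASK_SLOTS):
        sched.opcode_handler()
    if sched.opcodes_executed == 0:
        raise ValueError("no task is running")
    return sched.handler_calls / sched.opcodes_executed


print(calls_per_opcode({0}))              # one running task
print(calls_per_opcode({0, 1, 2, 3, 4}))  # five running tasks
```

In this simplified model a single running task gets one opcode per ten
handler calls, which is one way to picture where the overhead comes from.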








