Subject: RCX Firmware Speed
Newsgroups: lugnet.robotics
Date: Wed, 10 Apr 2002 06:20:55 GMT
Reply-To: Dick Swan <dickswa@sbcglobal=nomorespam=.net>

This is a follow-up to many of the recent posts on the speed of Lego's
RCX byte code interpreter. It's a long post, but the salient items are
in the first few paragraphs.
The interpreter is very slow because of all the overhead required to
execute a single interpreter instruction. The scheduler overhead takes
up about 90% of the time and the actual interpreter instruction is the
remainder.
The fastest opcode is 'alive' [i.e. opcode 0x10], which is essentially
a NoOp, at 2.73 msec [milliseconds]. The 'add' opcode takes 2.84 msec,
and 'multiply' or 'divide' [likely the worst case] take 2.93 msec.
It is just a coincidence that opcode execution takes about 3 msec and
sensors are scanned every 3 msec. The opcode interpreter runs
asynchronously with the sensor interrupt. Opcodes are executed
whenever no system task -- sensor handler, display handler, battery
monitoring handler, etc. -- is ready to run.
The current 2.0 firmware is about 50% slower than the original 1.0
[version 0309] firmware.
Program execution speed can be improved if you can use events rather
than a code loop to check for a condition. The code for checking for
events is included in system overhead.
Running multiple tasks simultaneously will slow down each individual
task, but it shouldn't be very noticeable with just two or three
tasks.
More details on each of the above follow.
2.0 Firmware Byte Code Interpreter Speed:
========================================
Several people have reported current 2.0 firmware as taking about 3
msec per opcode. My measurements -- taken with the internal RCX timers
rather than a stopwatch -- are similar:
  2.836 msec   Add opcode
  2.930 msec   Multiply opcode [0xFFFF * 0xFF]
  2.927 msec   Divide opcode [0xFFFF / 0xFF]
  2.633 msec   Alive [basically a NoOp]
There really is very little difference between basic and complicated
opcodes. This is because the 'system overhead' dominates the time per
opcode. For an understanding of this, see the scheduler pseudo code at
the end of this post.
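
As a rough cross-check of the overhead figure, the sketch below treats the Alive time as pure overhead and asks what share of each opcode's time that leaves for the actual operation. That baseline is an assumption (Alive still does a little work), and the function names are illustrative, not anything from the firmware.

```python
# Measured 2.0 firmware times per opcode, in msec (from the table above).
MEASURED_MSEC = {
    "add": 2.836,
    "multiply": 2.930,
    "divide": 2.927,
    "alive": 2.633,
}


def overhead_share(opcode, baseline="alive"):
    """Fraction of an opcode's time attributed to overhead.

    Assumption: the baseline opcode (Alive, basically a NoOp) is all overhead.
    """
    return MEASURED_MSEC[baseline] / MEASURED_MSEC[opcode]


def opcodes_per_second(opcode):
    """Throughput if a single task ran nothing but this opcode."""
    return 1000.0 / MEASURED_MSEC[opcode]


for name in ("add", "multiply", "divide"):
    print(f"{name:8s} overhead {overhead_share(name):.1%}, "
          f"{opcodes_per_second(name):.0f} opcodes/sec")
```

Under that assumption the overhead share comes out at roughly 90 to 93 percent, and a single task manages only about 340 to 350 opcodes per second.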
System Counters:
===============
2.0 firmware introduced three system counters, which are the same as
global variables 0..2. Writing to these variables takes a little
longer because the firmware runs some event checking code for these
special variables. This adds another 0.16 msec to the opcode time.
2.0 vs 1.0 Firmware Byte Code Interpreter Speed:
===============================================
The current 2.0 firmware is about 50% slower than the original 1.0
firmware. I posted time measurements for the 1.0 firmware on 9 Feb 99.
A single opcode took about 2 msecs to execute. About 1.75 msec was
overhead to get to the 'opcode execution' code and the remainder was
time to actually execute. An extract from that post gave 'opcode
execution' times -- without overhead -- for several opcodes.
  0.17 msec   CPU time to execute an assignment ("j = 0")
  0.35 msec   CPU time to execute a motor off command
  0.26 msec   CPU time to execute a cleartimer() command
That post also included a simple program to measure the execution time
using the RCX timer. It subtracted the time for an "empty" loop from
the time for a loop containing 20 identical instructions. Each loop
was run 500 times.
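
The arithmetic behind that measurement can be sketched as follows. This is not the original program, only the calculation it implies; the function name, the parameter names, and the sample readings are made up for illustration, and the timer tick length is passed in rather than assumed.

```python
def msec_per_opcode(loaded_ticks, empty_ticks, tick_msec,
                    opcodes_per_pass=20, passes=500):
    """Per-opcode time from two timer readings.

    loaded_ticks: timer reading for the loop containing the test opcodes
    empty_ticks:  timer reading for the same loop with an empty body
    tick_msec:    length of one timer tick in msec
    """
    # Subtracting the empty loop removes the cost of the loop itself.
    extra_msec = (loaded_ticks - empty_ticks) * tick_msec
    return extra_msec / (opcodes_per_pass * passes)


# Made-up readings for illustration only, with a 100 msec timer tick.
print(msec_per_opcode(loaded_ticks=230, empty_ticks=30, tick_msec=100))
```

Spreading the difference over 20 x 500 = 10,000 opcodes is what makes a coarse timer usable for sub-millisecond figures.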
The difference between the 1.0 and 2.0 firmware is likely due to
feature enhancements:
- The addition of support for events.
- Opcodes can now take any source type, whereas in 1.0 sources were
  often restricted to variables or constants. Inline
  'if..then..else' code was replaced by a call to a more flexible
  common subroutine.
Impact of Multiple Tasks:
========================
I did measurements a long time ago on 1.0 firmware comparing execution
time of a single task vs five identical copies running at once. The
time to execute one opcode in all running tasks was 2.0 msec for one
task and 3.0 msec for five tasks. This is consistent with the
scheduler algorithm given at the end of this post.
  1.75 msec -- time to cycle through all ten possible tasks, checking
               whether each needs to execute a single opcode
  0.25 msec -- average time to execute a single opcode
So for one task the time is 2.0 msec [1.75 + 0.25] and for five tasks
it is 3.0 msec [1.75 + 5 x 0.25].
I haven't measured 2.0 firmware, but results should be similar after
you add the 50% extra time for 2.0 over 1.0.
For details on the 1.75 msec, see the pseudo code below for the system
scheduling algorithm.
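
The timing model above is simple enough to write down as a small function. This is only a sketch: the constants are the 1.0 firmware figures quoted above, and the `slowdown` parameter is a guess that applies the 50% figure uniformly to estimate 2.0 firmware, which was not measured.

```python
SCAN_MSEC = 1.75    # time to cycle through all ten task slots (1.0 firmware)
OPCODE_MSEC = 0.25  # average time to execute a single opcode (1.0 firmware)


def round_msec(running_tasks, slowdown=1.0):
    """Time for every running task to execute one opcode.

    slowdown=1.5 is an unmeasured estimate for 2.0 firmware.
    """
    return slowdown * (SCAN_MSEC + running_tasks * OPCODE_MSEC)


for n in (1, 2, 3, 5):
    print(n, "tasks:", round_msec(n), "msec on 1.0,",
          round_msec(n, slowdown=1.5), "msec estimated on 2.0")
```

With two or three tasks the round takes 2.25 or 2.5 msec instead of 2.0, which is why a small number of extra tasks is hardly noticeable.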
System Scheduler Pseudo Code:
============================
The RCX RAM firmware scheduler algorithm is roughly the following:
do forever [loop in main scheduler]
{
    select the highest priority task waiting to run.
    These tasks are:
        sensors                 - scheduled every 3 msec in 1 msec
                                  interrupt handler
        LCD display             - scheduled every 120 msec
        button keypress manager - scheduled every 120 msec
        motor                   - scheduled after stop/start/brake/change
                                  direction opcode
        battery voltage         - scheduled every 120 msec
        opcode handler          - scheduled lowest priority task, always
                                  ready to run
    execute the task just selected
}

opcodeHandler()
{
    if 10 msec tick has occurred since last called
        update each event tick counter
    if 100 msec tick has just occurred
        do a bunch of work to check if any events have happened.
    if a message is waiting
        executeSingleOpcode() corresponding to the message
    else
    {
        move 'current task' to next task. i.e. cycle through 0..9
        if 'current task' is 0
        {
            check all tasks to see if any was waiting for an event that
            has just happened
            adjust the resources controlled by tasks if pre-empted by
            a higher priority task.
        }
        if 'current task' is waiting and task timer has expired
            set task state to running
        if 'current task' state is running
            executeSingleOpcode() [finally!]
        if opcode from a message
            generate a reply message
        Note: task states are 'undefined', 'waiting for timer',
              'waiting for event', 'running', 'stopped'
    }
    return;
}

executeSingleOpcode()
{
    switch statement on opcode
    {
        one 'case' statement for each opcode
    }
}
If you've waded through the above pseudo-code, then you can see
there's an awful lot of 'overhead' to get to the half-dozen assembly
instructions actually required to interpret a single byte code
instruction.
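
To make the round-robin part of opcodeHandler() concrete, here is a toy model of it. It is one reading of the pseudo-code above, not the firmware: messages, events, task timers, and the higher priority system tasks are all left out, and the class and function names are invented for the example.

```python
from enum import Enum


class State(Enum):
    UNDEFINED = 0
    WAITING_FOR_TIMER = 1
    WAITING_FOR_EVENT = 2
    RUNNING = 3
    STOPPED = 4


NUM_SLOTS = 10  # task slots 0..9


class OpcodeHandler:
    """Toy model of the task-cycling branch of opcodeHandler()."""

    def __init__(self, states):
        assert len(states) == NUM_SLOTS
        self.states = list(states)
        self.current = NUM_SLOTS - 1      # so the first call moves to slot 0
        self.executed = [0] * NUM_SLOTS   # opcodes executed per slot

    def call(self):
        """One call from the main scheduler; visits exactly one slot."""
        self.current = (self.current + 1) % NUM_SLOTS
        if self.states[self.current] is State.RUNNING:
            self.executed[self.current] += 1  # stands in for executeSingleOpcode()
            return True
        return False


def run(running_tasks, calls=100):
    """Opcodes executed per slot after the given number of handler calls."""
    states = ([State.RUNNING] * running_tasks
              + [State.UNDEFINED] * (NUM_SLOTS - running_tasks))
    handler = OpcodeHandler(states)
    for _ in range(calls):
        handler.call()
    return handler.executed


print(run(1)[0], sum(run(5)))
```

In this model a single running task gets one opcode per ten handler calls, while five running tasks get five opcodes from the same ten calls. That matches the fixed cost of cycling through all ten slots plus a small per-opcode cost described earlier.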