Subject: Re: RCX Firmware Speed
Newsgroups: lugnet.robotics
Date: Wed, 10 Apr 2002 13:06:07 GMT

Dick,
Nice post, and an interesting topic.
My timings differ a little for v2.0 beta, but they are basically the same.
My timings use an internal timer, as do yours.
My multi-task measurement shows that the 3 msec framework does not apply to
every instruction executed under v2.0. This is consistent with your findings.
My measurement loop contains a single bytecode in a for loop executed 2000
times. The cost of executing an empty for loop 2000 times is then
subtracted from the time of the measurement loop.
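The empty-loop subtraction above reduces to one line of arithmetic. Below is a minimal Python sketch of it; the function name per_bytecode_ms and the example timer readings are hypothetical, chosen only so the result matches the 3.00 msec single-task figure, not values read from the RCX.

```python
def per_bytecode_ms(measure_total_ms: float, empty_total_ms: float,
                    iterations: int, ops_per_iteration: int = 1) -> float:
    """Per-opcode time from an 'empty loop subtracted' benchmark.

    measure_total_ms:  timer reading for the loop containing the opcode(s)
    empty_total_ms:    timer reading for the same loop with an empty body
    iterations:        how many times each loop ran (2000 in the test above)
    ops_per_iteration: identical opcodes in the loop body (1 in the test above)
    """
    if iterations <= 0 or ops_per_iteration <= 0:
        raise ValueError("iterations and ops_per_iteration must be positive")
    # Loop-control cost cancels out; what remains is the opcodes themselves.
    return (measure_total_ms - empty_total_ms) / (iterations * ops_per_iteration)


# Hypothetical readings: 10000 msec with the bytecode, 4000 msec empty.
print(per_bytecode_ms(10_000, 4_000, 2000))  # 3.0
```

The 1.0 measurement in the quoted post below uses the same idea with 20 identical opcodes in the loop body and 500 iterations, which corresponds to ops_per_iteration=20 and iterations=500.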
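I modified my timing test to measure repeated CHKL bytecodes while 1 to 6
other tasks, each executing a while(1) loop, ran alongside the measurement
task. I got these results. "#tasks" counts every running task, so the first
row is the measurement task alone; "time" is milliseconds per bytecode
executed in the measurement loop, and "diff" is the increase over the
previous row.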
#tasks   time (msec)   diff (msec)
   1         3.00
   2         3.26          0.26
   3         3.58          0.32
   4         3.80          0.22
   5         4.12          0.32
   6         4.44          0.32
   7         4.75          0.31
These timings confirm that adding new tasks gives an incremental increase in
execution time.
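To put a single number on the increment, the table can be fitted with a straight line. This is a Python sketch that assumes the growth is linear in the number of tasks (the diff column suggests it is close); the helper names are hypothetical.

```python
# CHKL timings from the table above (msec per bytecode) vs. total running tasks.
TASKS = [1, 2, 3, 4, 5, 6, 7]
TIMES_MS = [3.00, 3.26, 3.58, 3.80, 4.12, 4.44, 4.75]


def row_diffs(times):
    """Increase in per-bytecode time from one row to the next."""
    return [round(b - a, 2) for a, b in zip(times, times[1:])]


def least_squares_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = least_squares_fit(TASKS, TIMES_MS)
print(row_diffs(TIMES_MS))  # [0.26, 0.32, 0.22, 0.32, 0.32, 0.31]
print(f"{slope:.3f} msec per added task, intercept {intercept:.2f} msec")
```

The fitted slope comes out at roughly 0.29 msec per extra while(1) task, essentially the mean of the diff column, (4.75 - 3.00) / 6.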
Making each new task run the same code as the original measurement task
would measure the actual "few assembly instructions" needed to execute
CHKL. As it stands, the differences reflect the time it takes to execute
the while(1) loop.
Kevin
"Dick Swan" <dickswan@sbcglobal.net> wrote in message
news:GuC9L5.BMn@lugnet.com...
> This is a follow up to many of the recent posts on the speed of Lego's
> RCX byte code interpreter. It's a long post but the salient items are
> in the first few paragraphs.
>
> The interpreter is very slow because of all the overhead required to
> execute a single interpreter instruction. The scheduler overhead takes
> up about 90% of the time and the actual interpreter instruction is the
> remainder.
>
> The fastest opcode is an 'alive' [i.e. opcode 0x10] which is
> essentially a NoOp at 2.73 msec [milliseconds]. 'Add' opcode is 2.84
> msec and 'multiply' or 'divide' [i.e. likely the worst case] at 2.93
> msec.
>
> It is just a coincidence that opcode execution is about 3 msec and
> sensors are scanned at 3 msec. The Opcode interpreter runs
> asynchronously with the sensor interrupt. Opcodes are executed
> whenever there is no system task -- sensor handler, display handler,
> battery monitoring handler, etc -- ready to run.
>
> The current 2.0 firmware is about 50% slower than the original 1.0
> [version 0309] firmware.
>
> Program execution speed can be improved if you can use events rather
> than a code loop to check for a condition. The code for checking for
> events is included in system overhead.
>
> Simultaneous execution of multiple tasks running will slow down
> individual tasks but it shouldn't be very noticeable with just two or
> three tasks.
>
> More details on each of the above follow.
>
> 2.0 Firmware Byte Code Interpreter Speed:
> ========================================
> Several people have reported current 2.0 firmware as taking about 3
> msecs per opcode. My measurements -- using the internal RCX timers and
> not a stopwatch for measurement -- are similar at:
> 2.836 msec Add opcode
> 2.930 msec Multiply opcode [0xFFFF * 0xFF]
> 2.927 msec Divide opcode [0xFFFF / 0xFF]
> 2.633 msec Alive [basically a NoOp]
>
> There really is very little difference between basic and complicated
> opcodes. This is because the 'system overhead' dominates the time per
> opcode. For an understanding of this, see the scheduler pseudo code at
> the end of this post.
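One way to check the "overhead dominates" claim against the numbers above is to treat the Alive time (the 2.633 msec figure from the list) as pure overhead and see what is left for each opcode. A Python sketch, assuming the NoOp does no opcode-specific work of its own, which is only approximately true; the names are hypothetical:

```python
# Per-opcode times (msec) from the 2.0 firmware measurements above.
OPCODE_MS = {
    "add": 2.836,
    "multiply": 2.930,
    "divide": 2.927,
    "alive": 2.633,  # essentially a NoOp
}


def opcode_work_fraction(times, baseline="alive"):
    """Fraction of each opcode's time left after subtracting the NoOp time.

    Assumes the baseline opcode's time is entirely scheduler/interpreter
    overhead, so the remainder approximates the opcode-specific work.
    """
    base = times[baseline]
    return {name: (t - base) / t for name, t in times.items() if name != baseline}


for name, frac in opcode_work_fraction(OPCODE_MS).items():
    print(f"{name:8s} {frac:5.1%} opcode-specific, {1 - frac:5.1%} overhead")
```

With these figures the opcode-specific share comes out at roughly 7% for Add and 10% for Multiply and Divide, in line with the "about 90%" overhead figure given earlier.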
>
> Systems Counters:
> ================
> 2.0 firmware introduced three system counters which are the same as
> global variables 0..2. Writing to these variables takes a little
> longer because the firmware runs some event-checking code on these
> special variables. This adds another 0.16 msec to the opcode time.
>
> 2.0 vs 1.0 Firmware Byte Code Interpreter Speed:
> ===============================================
> The current 2.0 firmware is about 50% slower than the original 1.0
> firmware. I posted time measurements on 9 feb 99 on the 1.0 firmware.
> A single opcode took about 2 msecs to execute. About 1.75 msec was
> overhead to get to the 'opcode execution' code and the remainder was
> time to actually execute. An extract from that post gave 'opcode
> execution' times -- without overhead -- for several opcodes.
> 0.17 msec CPU time used to execute an assignment ("j = 0")
> 0.35 msec CPU time to execute a motor off command.
> 0.26 msec CPU time to execute a cleartimer() command.
>
> This post also included a simple program to measure the execution time
> using the RCX timer. It subtracted the time to execute an "empty" loop
> versus the time with a loop containing 20 identical instructions. Each
> loop was run 500 times.
>
> The difference between the 1.0 and 2.0 firmware is likely due to
> feature enhancements:
> - The addition of support for events.
> - opcodes can now take any source type, whereas in 1.0
> sources were often restricted to variables or constants. Inline
> 'if..then..else' code was replaced by a call to a more
> flexible common subroutine.
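The second change can be pictured with a hypothetical sketch contrasting an inline operand test with a shared operand-fetch routine. Everything here is invented for illustration: the Source names, the state layout and the function names are not the firmware's real source-type codes or routines, and the actual firmware is H8 machine code, not Python.

```python
from enum import Enum


class Source(Enum):
    # Hypothetical subset of operand source types, for illustration only.
    VARIABLE = "variable"
    CONSTANT = "constant"
    TIMER = "timer"
    SENSOR = "sensor"


def get_value(source, arg, state):
    """Common operand fetch: every opcode calls this for every operand."""
    if source is Source.VARIABLE:
        return state["vars"][arg]
    if source is Source.CONSTANT:
        return arg
    if source is Source.TIMER:
        return state["timers"][arg]
    if source is Source.SENSOR:
        return state["sensors"][arg]
    raise ValueError(f"unsupported source {source}")


def op_add_v1(state, var, source, arg):
    # 1.0 style: operand limited to variable or constant, tested inline.
    # Any non-variable source is treated as a constant here.
    value = state["vars"][arg] if source is Source.VARIABLE else arg
    state["vars"][var] += value


def op_add_v2(state, var, source, arg):
    # 2.0 style: any source type, but each operand costs a general dispatch.
    state["vars"][var] += get_value(source, arg, state)
```

Both versions compute the same result for variables and constants; the point is only that the 2.0 style pays for a general dispatch on every operand, which is one plausible contributor to the slowdown described above.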
>
> Impact of Multiple Tasks:
> ========================
> I did measurements a long time ago on 1.0 firmware comparing execution
> time of a single task vs five identical copies running at once. The
> time to execute one opcode in all running tasks was 2.0 msec for one
> task and 3.0 msec for five tasks. This is consistent with the
> algorithm given below.
>   1.75 msec -- time to cycle through all ten possible tasks
>                checking to see if they need to execute a single
>                opcode
>   0.25 msec -- avg time to execute a single opcode
> So for one task time is 2.0 msec [1.75 + 0.25] and for five tasks it
> is 3.0 msec [1.75 + 5 x 0.25].
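The one-task and five-task figures follow from a simple linear model. A Python sketch of that model, using the 1.75 msec and 0.25 msec numbers above; the function name and the optional scale factor (for the rough "+50%" 2.0 estimate) are assumptions of this sketch:

```python
def round_time_ms(running_tasks, overhead_ms=1.75, per_opcode_ms=0.25, scale=1.0):
    """Time for every running task to execute one opcode (1.0 firmware model).

    overhead_ms:   cost of one sweep through all ten task slots
    per_opcode_ms: average cost of executing a single opcode
    scale:         1.5 gives the rough '50% slower' 2.0 extrapolation
    """
    if not 0 <= running_tasks <= 10:
        raise ValueError("the firmware has ten task slots")
    return scale * (overhead_ms + running_tasks * per_opcode_ms)


print(round_time_ms(1))             # 2.0
print(round_time_ms(5))             # 3.0
print(round_time_ms(5, scale=1.5))  # 4.5
```

With scale=1.5 the result is only the "add 50%" extrapolation mentioned below, not a measurement of 2.0 firmware.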
>
> I haven't measured 2.0 firmware, but results should be similar after
> you add the 50% extra time for 2.0 over 1.0.
>
> For details on the 1.75 msec, see the pseudo code below for the system
> scheduling algorithm.
>
> System Scheduler Pseudo Code:
> ============================
> The RCX RAM firmware scheduler algorithm is roughly the following:
>
> do forever [loop in main scheduler]
> {
>     select the highest priority task waiting to run.
>     These tasks are:
>         sensors - scheduled every 3 msec in 1 msec interrupt handler
>         LCD display - scheduled every 120 msec
>         button keypress manager - scheduled every 120 msec
>         motor - scheduled after stop/start/brake/change direction
>             opcode
>         battery voltage - scheduled every 120 msec
>         opcode handler - scheduled lowest priority task, always ready
>             to run
>
>     execute the task just selected
> }
>
> opcodeHandler()
> {
>     if 10 msec tick has occurred since last called
>         update each event tick counter
>
>     if 100 msec tick has just occurred
>         do a bunch of work to check if any events have happened.
>
>     if a message is waiting
>         executeSingleOpcode() corresponding to the message
>     else
>     {
>         move 'current task' to next task, i.e. cycle through 0..9
>
>         if 'current task' is 0
>         {
>             check all tasks to see if any was waiting for an event
>             that has just happened
>
>             adjust the resources controlled by tasks if pre-empted
>             by a higher priority task.
>         }
>
>         if 'current task' is waiting and task timer has expired
>             set task state to running
>
>         if 'current task' state is running
>             executeSingleOpcode() [finally!]
>
>         if opcode from a message
>             generate a reply message
>
>         Note: task states are 'undefined', 'waiting for timer',
>         'waiting for event', 'running', 'stopped'
>     }
>     return;
> }
>
> executeSingleOpcode()
> {
>     switch statement on opcode
>     {
>         one 'case' statement for each opcode
>     }
> }
>
> If you've waded through the above pseudo-code, then you can see
> there's an awful lot of 'overhead' to get to the half-dozen assembly
> instructions actually required to interpret a single byte code
> instruction.
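For readers who want to experiment, here is a small Python simulation of just the round-robin part of opcodeHandler(). It is a sketch under simplifying assumptions: the class and field names are hypothetical, the higher-priority system tasks, event ticks, slot-0 event checks, resource arbitration and reply messages are left out, and a task simply stops after its last opcode.

```python
from collections import deque
from dataclasses import dataclass, field

NUM_TASK_SLOTS = 10  # 'current task' cycles through 0..9


@dataclass
class Task:
    # States from the pseudo code: 'undefined', 'waiting for timer',
    # 'waiting for event', 'running', 'stopped'.
    state: str = "undefined"
    program: list = field(default_factory=list)
    pc: int = 0
    wake_at_ms: int = 0


class OpcodeHandler:
    """Round-robin part of opcodeHandler(): at most one opcode per call."""

    def __init__(self, tasks):
        assert len(tasks) == NUM_TASK_SLOTS
        self.tasks = tasks
        self.current = NUM_TASK_SLOTS - 1  # first call advances to slot 0
        self.messages = deque()            # opcodes received as messages
        self.trace = []                    # (slot or 'msg', opcode) executed

    def run_once(self, now_ms):
        if self.messages:
            # A waiting message is served instead of advancing the task slot.
            self.trace.append(("msg", self.messages.popleft()))
            return
        self.current = (self.current + 1) % NUM_TASK_SLOTS
        task = self.tasks[self.current]
        if task.state == "waiting for timer" and now_ms >= task.wake_at_ms:
            task.state = "running"
        if task.state == "running":
            self.execute_single_opcode(self.current, task)

    def execute_single_opcode(self, slot, task):
        # Stand-in for the big switch statement: just record the opcode.
        self.trace.append((slot, task.program[task.pc]))
        task.pc += 1
        if task.pc == len(task.program):
            task.state = "stopped"  # simplification: task ends after last opcode


def demo(calls=30):
    tasks = [Task() for _ in range(NUM_TASK_SLOTS)]
    tasks[0] = Task("running", ["a1", "a2"])
    tasks[3] = Task("running", ["b1", "b2", "b3"])
    tasks[5] = Task("waiting for timer", ["c1"], wake_at_ms=15)
    handler = OpcodeHandler(tasks)
    for now_ms in range(calls):  # hypothetical: one handler call per msec
        handler.run_once(now_ms)
    return handler.trace


print(demo())
# [(0, 'a1'), (3, 'b1'), (0, 'a2'), (3, 'b2'), (5, 'c1'), (3, 'b3')]
```

Even a task running alone gets only one opcode per sweep of all ten slots, which is the per-sweep cost the quoted post puts at 1.75 msec on 1.0 firmware.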