To LUGNET HomepageTo LUGNET News HomepageTo LUGNET Guide Homepage
 Help on Searching
 
Post new message to lugnet.robotics.rcx.legosOpen lugnet.robotics.rcx.legos in your NNTP NewsreaderTo LUGNET News Traffic PageSign In (Members)
 Robotics / RCX / legOS / 3032
3031  |  3033
Subject: 
Re: sys_time (long post)
Newsgroups: 
lugnet.robotics.rcx.legos
Date: 
Thu, 12 Dec 2002 23:23:06 GMT
Viewed: 
2879 times
  
Mark,

Excellent Post !

Ok, I will admit it; I was the one that put the sys_time handler on the
NMI.  8-)

I guess I will want to think about this a bit; but I see that you have
done a good bit of that yourself.

...

Could the 'Get' function do the following? (I wonder)

Read sys_time into A;
Read sys_time into B;
return the larger of the two;

Again, it is pretty much a hack, but it would allow us to leave existing
code pretty much the same (except to call the function instead of
reading it directly) and also keep the sys_time handler the same.

Another similar approach would be...

disable irqs;
Clear sys_time_flag;
Read sys_time into A;
check sys_time_flag, if non-zero, reread sys_time into A;
enable irqs;
return A;

The sys_time handler would update sys_time and set the sys_time flag to
non-zero.

I doubt that the second read would take so long that the sys_time
handler could be invoked again.

...

I guess it comes down to:

sys_time is as accurate as possible, but it takes a bit of work to read it.
-- or --
sys_time is easy to read, but can be kept from updating by masking
interrupts off.

I am leaning toward the first situation myself.

Again, Good one Mark!
// Joe


Mark Riley wrote:
As I was poking around in the BrickOS kernel, it
occurred to me that using the sys_time variable isn't
entirely safe.  Since sys_time is 32-bits and the
processor can only read 16-bits at a time into a
register, there is a period of time between the
instructions that read the upper and lower halves
of sys_time where the interrupt that increments
sys_time could fire and cause an inconsistent reading
of sys_time to occur.

The opportunity for the error to occur only happens
every 65 seconds or so when the lower 16-bits of
sys_time overflow (i.e. every 65536 milliseconds).
And, even then it'll only manifest itself if the interrupt
fires between the two instructions that read the upper
and lower 16-bits of the sys_time variable.  So, the
chances of this occurring are exceedingly remote, but
nonetheless possible.

The bug is subtle, but if you were able to watch sys_time
closely in my example program (below) you would see it
increment as follows:

0x0014FFFE
0x0014FFFF
0x00140000
0x00150000
0x00150001

Normally, sys_time would be safe to use by temporarily
disabling interrupts like so:

  disable_irqs();
  end_time = sys_time + TIMEOUT;
  enable_irqs();

And, using sys_time in an ISR would not be a problem
since interrupts are already disabled at that time.

However, the kicker is that in the current version of
BrickOS, the sys_time variable is updated by the
watchdog timer on an NMI so the above solution won't
work.  It makes using sys_time in a normal ISR unsafe,
as well.

Possible solutions include:

1) Move the sys_time increment back to a normal
IRQ and use the above code snippet when IRQs
happen to be enabled.

2) Use a function that is immune to the problem to
read sys_time (even when using sys_time in an ISR).
Here is an example (though it's kind of a hack):

extern time_t get_sys_time();
__asm__("
.text
.align 1
.global _get_sys_time
_get_sys_time:
  mov.w @_sys_time+2,r1
  mov.w @_sys_time,r0
  mov.w @_sys_time+2,r2
  cmp r2,r1
  bne _get_sys_time
  rts
");

BTW, I did test to see if this problem can actually
happen.  The following is a test program that
will demonstrate the problem.  You may need to run
it 10-20 minutes (or more) for it to fail.

#include <conio.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
  {
  time_t a, b;

  cputs("test");

  do
    {
    a = sys_time;
    b = sys_time;
    }
  while (a <= b);

  cputs("fail");

  // so we don't power down (use AC power!)
  while (1);

  return 0;
  }

The assembly code for the above is as follows.
You can see the places where it would be bad
for an NMI to occur.

.file "sys_time_fail.c"
.section .rodata
.LC0:
.ascii "test\0"
.LC1:
.ascii "fail\0"
.section .text
.align 1
.global _main
_main:
push r4
mov.w #.LC0,r0
jsr @_cputs
.L23:
mov.w @_sys_time,r0
; please no NMI here!
mov.w @_sys_time+2,r1
mov.w @_sys_time,r2
; please no NMI here!
mov.w @_sys_time+2,r3
mov.w r2,r4
cmp.w r4,r0
bhi .L21
bne .L23
mov.w r3,r2
cmp.w r2,r1
bls .L23
.L21:
mov.w #.LC1,r0
jsr @_cputs
.L27:
bra .L27
.end

Thanks for reading if you made it this far!

Cheers,

Mark






Message has 1 Reply:
  Re: sys_time (long post)
 
(...) It turns out that the nature of the glitch depends on the order you read the upper and lower 16-bits of sys_time. If the compiler generates code that reads the high word first (which it seems to do), then your example will work. However, if (...) (22 years ago, 13-Dec-02, to lugnet.robotics.rcx.legos)

Message is in Reply To:
  sys_time (long post)
 
As I was poking around in the BrickOS kernel, it occurred to me that using the sys_time variable isn't entirely safe. Since sys_time is 32-bits and the processor can only read 16-bits at a time into a register, there is a period of time between the (...) (22 years ago, 12-Dec-02, to lugnet.robotics.rcx.legos)

6 Messages in This Thread:


Entire Thread on One Page:
Nested:  All | Brief | Compact | Dots
Linear:  All | Brief | Compact

This Message and its Replies on One Page:
Nested:  All | Brief | Compact | Dots
Linear:  All | Brief | Compact
    

Custom Search

©2005 LUGNET. All rights reserved. - hosted by steinbruch.info GbR