|
As I was poking around in the BrickOS kernel, it
occurred to me that using the sys_time variable isn't
entirely safe. Since sys_time is 32-bits and the
processor can only read 16-bits at a time into a
register, there is a period of time between the
instructions that read the upper and lower halves
of sys_time where the interrupt that increments
sys_time could fire and cause an inconsistent reading
of sys_time to occur.
The opportunity for the error to occur only happens
every 65 seconds or so when the lower 16-bits of
sys_time overflow (i.e. every 65536 milliseconds).
And, even then it'll only manifest itself if the interrupt
fires between the two instructions that read the upper
and lower 16-bits of the sys_time variable. So, the
chances of this occurring are exceedingly remote, but
nonetheless possible.
The bug is subtle, but if you were able to watch sys_time
closely in my example program (below) you would see it
increment as follows:
0x0014FFFE
0x0014FFFF
0x00140000
0x00150000
0x00150001
Normally, sys_time would be safe to use by temporarily
disabling interrupts like so:
disable_irqs();
end_time = sys_time + TIMEOUT;
enable_irqs();
And, using sys_time in an ISR would not be a problem
since interrupts are already disabled at that time.
However, the kicker is that in the current version of
BrickOS, the sys_time variable is updated by the
watchdog timer on an NMI so the above solution won't
work. It makes using sys_time in a normal ISR unsafe,
as well.
Possible solutions include:
1) Move the sys_time increment back to a normal
IRQ and use the above code snippet when IRQs
happen to be enabled.
2) Use a function that is immune to the problem to
read sys_time (even when using sys_time in an ISR).
Here is an example (though it's kind of a hack):
extern time_t get_sys_time();
__asm__("
.text
.align 1
.global _get_sys_time
_get_sys_time:
mov.w @_sys_time+2,r1
mov.w @_sys_time,r0
mov.w @_sys_time+2,r2
cmp r2,r1
bne _get_sys_time
rts
");
BTW, I did test to see if this problem can actually
happen. The following is a test program that
will demonstrate the problem. You may need to run
it 10-20 minutes (or more) for it to fail.
#include <conio.h>
#include <unistd.h>
#include <time.h>
int main(int argc, char **argv)
{
time_t a, b;
cputs("test");
do
{
a = sys_time;
b = sys_time;
}
while (a <= b);
cputs("fail");
// so we don't power down (use AC power!)
while (1);
return 0;
}
The assembly code for the above is as follows.
You can see the places where it would be bad
for an NMI to occur.
.file "sys_time_fail.c"
.section .rodata
.LC0:
.ascii "test\0"
.LC1:
.ascii "fail\0"
.section .text
.align 1
.global _main
_main:
push r4
mov.w #.LC0,r0
jsr @_cputs
.L23:
mov.w @_sys_time,r0
; please no NMI here!
mov.w @_sys_time+2,r1
mov.w @_sys_time,r2
; please no NMI here!
mov.w @_sys_time+2,r3
mov.w r2,r4
cmp.w r4,r0
bhi .L21
bne .L23
mov.w r3,r2
cmp.w r2,r1
bls .L23
.L21:
mov.w #.LC1,r0
jsr @_cputs
.L27:
bra .L27
.end
Thanks for reading if you made it this far!
Cheers,
Mark
|
|
Message has 1 Reply: | | Re: sys_time (long post)
|
| Mark, Excellent Post ! Ok, I will admit it; I was the one that put the sys_time handler on the NMI. 8-) I guess I will want to think about this a bit; but I see that you have done a good bit of that yourself. ... Could the 'Get' function do the (...) (22 years ago, 12-Dec-02, to lugnet.robotics.rcx.legos)
|
6 Messages in This Thread:
- Entire Thread on One Page:
- Nested:
All | Brief | Compact | Dots
Linear:
All | Brief | Compact
|
|
|
|