Geeking : 2641



Subject:	Re: 8-bit floating-point number representations?
Newsgroups:	lugnet.off-topic.geek
Date:	Thu, 4 Jan 2001 04:52:03 GMT
Viewed:	101 times

Frank Filz wrote:
>
> Todd Lehman wrote:
> >
> > Anyone know of a good C library that efficiently implements conversion from
> > a processor-native 'double' floating-point precision number to some standard
> > form of an 8-bit floating-point number and back? And a Perl5 library to go
> > with it?
>
> Hmm, 8 bit floating point numbers can't implement a very large range of
> numbers. How many bits of exponent were you planning on?
>
> I think conversion would probably actually be pretty easy since you will
> just take e bits of the exponent from the double and 8-e bits of the
> rest (mantissa? gosh I haven't done math in ages). You of course should
> check that the exponent is in range. I guess you might also want to
> handle the case where the exponent is out of range, but the number can
> still be expressed in your 8 bit float as a denormalized number, but
> this only works for small numbers and you lose precision.

After thinking about this, I realized most compact binary floating point
forms don't allow denormalized numbers. The reason is that if one
ensures that all numbers are normalized, the bit to the left of the
binary point is always a 1, and therefore need not be stored, thus
increasing the precision by 1 bit (which will be extremely significant
for an 8 bit float).

I'm not sure if the C double matches the Intel double (I think it does -
I think both use IEEE standards). If so, the format of the double is:

bit 63: sign bit
bits 62-52: 11 bit exponent biased by +1023 (i.e. an exponent of +1 is
            expressed by the bit value 100 0000 0000)
bits 51-0: 52 bits of precision (plus the implied leading 1)

On an Intel machine, you could use something like the following to
produce an 8 bit float:

#include <stdint.h>
#include <string.h>

/* define MIN and MAX exponent for 4 bits of exponent, we will have
   4 bits of precision (3 stored bits plus the implied leading 1)
   and 1 sign bit */
/* due to the small range of exponents, we will not provide for all
   the special values, a byte of all 0s represents the value 0 */
/* an exponent of 0 (-8) is not allowed but could be used with other
   non-zero bits to specify other special values */
#define MINEXP -7
#define MAXEXP 7
#define BIAS 8
/* constant to multiply by to shift exponent into correct position */
#define SHIFT 8
/* mask for how many bits we want of significand */
#define SIGMASK 0xE000
/* constant to divide by to shift significand into correct bits */
#define SIGDIV 0x2000

int Make8bit(const double *src, unsigned char *res)
{
    uint64_t bits;         /* raw bit pattern of the double */
    uint16_t top;          /* sign, exponent and top 4 significand bits */
    int exp;
    unsigned char result;
    uint16_t significand;

    if (*src == 0.0) {     /* a byte of all 0s represents the value 0 */
        *res = 0;
        return 0;
    }
    memcpy(&bits, src, sizeof bits);      /* assumes a 64 bit IEEE double */
    top = (uint16_t)(bits >> 48);
    result = (top & 0x8000) ? 0x80 : 0;   /* copy sign bit into result */
    exp = (top & 0x7FF0) / 16 - 1023;     /* get exponent */
    if ((exp < MINEXP) || (exp > MAXEXP))
        return -1;                        /* error - can't express number */
    result = (unsigned char)(result + (exp + BIAS) * SHIFT);
    significand = (uint16_t)(bits >> 36); /* extract the 16 high order bits of significand */
    /* add code here to do rounding if desired */
    significand = (significand & SIGMASK) / SIGDIV; /* shift it into the correct bits */
    result = (unsigned char)(result + significand);
    *res = result;
    return 0;
}

Now that code can be really cleaned up and optimized - I'm trying to be
a little bit general to show all the steps. Also, if the choice is
actually going to be 4 bits exponent and 4 bits precision, significand
only needs to be a char, and can be manipulated at the same time the
exponent is.

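
The original question also asked for the conversion "and back". As a rough sketch only, and not part of Frank's post, the reverse direction might look like the following. It assumes the layout implied by the constants above: bit 7 is the sign, bits 6-3 are the exponent biased by 8, and bits 2-0 are the stored significand with an implied leading 1. The name Expand8bit is hypothetical, and treating every byte with an exponent field of 0 as zero is an assumption, since the post leaves those encodings open for special values.

```c
#include <stdint.h>

#define BIAS 8   /* exponent bias, matching the encoder's constants */

/* Expand8bit is a hypothetical name. Assumed layout: bit 7 sign,
   bits 6-3 biased exponent, bits 2-0 significand (implied leading 1). */
double Expand8bit(uint8_t b)
{
    int field = (b >> 3) & 0x0F;   /* 4 bit biased exponent */
    int frac = b & 0x07;           /* 3 stored significand bits */
    int e;
    double value;

    /* Assumption: exponent field 0 is reserved, so decode it as zero.
       This covers the all-zero byte that represents 0. */
    if (field == 0)
        return 0.0;

    value = 1.0 + frac / 8.0;      /* restore the implied leading 1 */

    /* Scale by 2^(field - BIAS) with exact multiplies or divides. */
    for (e = field - BIAS; e > 0; e--)
        value *= 2.0;
    for (e = field - BIAS; e < 0; e++)
        value /= 2.0;

    return (b & 0x80) ? -value : value;
}
```

Under these assumptions the byte 0x40 expands to 1.0, 0x41 to 1.125, and 0x7F to 240.0, which is the largest magnitude this layout can hold.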
If you only need positive exponents, change the constants to:

#define MINEXP 0
#define MAXEXP 14
#define BIAS 1

You still need a bias so a byte with all bits 0 can still represent 0
(with a BIAS of 0, a byte of all 0s would represent the integer 1).

The code may actually be cleaner to write in assembler.

I've probably made some misteaks but the above gives an idea of what
needs to be done at least.

Frank
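
To put numbers on the two sets of constants in the post, here is a small sketch, not from the original message, that works out the smallest and largest positive magnitudes each choice allows. It assumes 3 stored significand bits with an implied leading 1, so the largest significand is binary 1.111 (1.875) and the smallest is 1.000. The function names are hypothetical.

```c
/* Hypothetical helper: returns 2^e using exact multiplies or divides. */
static double TwoToThe(int e)
{
    double v = 1.0;
    for (; e > 0; e--)
        v *= 2.0;
    for (; e < 0; e++)
        v /= 2.0;
    return v;
}

/* Largest magnitude: significand 1.111 binary (1.875) at MAXEXP. */
double MaxMagnitude(int maxexp)
{
    return 1.875 * TwoToThe(maxexp);
}

/* Smallest nonzero magnitude: significand 1.000 binary at MINEXP. */
double MinMagnitude(int minexp)
{
    return TwoToThe(minexp);
}
```

With MINEXP -7 and MAXEXP 7 this gives a range of 0.0078125 to 240. With MINEXP 0 and MAXEXP 14 it gives 1 to 30720, so the positive-only variant trades away all fractions below 1 for a larger top end.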


