Subject: Re: 8-bit floating-point number representations?
Newsgroups: lugnet.off-topic.geek
Date: Thu, 4 Jan 2001 04:52:03 GMT
  
Frank Filz wrote:
> Todd Lehman wrote:
> > Anyone know of a good C library that efficiently implements conversion from
> > a processor-native 'double' floating-point precision number to some standard
> > form of an 8-bit floating-point number and back?  And a Perl5 library to go
> > with it?
>
> Hmm, 8 bit floating point numbers can't implement a very large range of
> numbers. How many bits of exponent were you planning on?
>
> I think conversion would probably actually be pretty easy since you will
> just take e bits of the exponent from the double and 8-e bits of the
> rest (mantissa? gosh I haven't done math in ages). You of course should
> check that the exponent is in range. I guess you might also want to
> handle the case where the exponent is out of range, but the number can
> still be expressed in your 8 bit float as a denormalized number, but
> this only works for small numbers and you lose precision.

After thinking about this, I realized most compact binary floating point
forms don't allow denormalized numbers. The reason is that if every
number is guaranteed to be normalized, the bit to the left of the binary
point is always a 1 and therefore need not be stored, which increases
the precision by 1 bit (extremely significant for an 8 bit float).
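
To make the implied bit concrete, here is a tiny sketch. It assumes, purely as an example, that 3 bits are stored for the significand; the function name is made up for illustration.

```c
/* Hypothetical helper: value of a normalized significand when only the
   3 fraction bits are stored (an assumed width). The leading 1 is
   implied, so the stored bits select 1.000, 1.125, ..., 1.875. */
static double SignificandValue(unsigned stored) {
    return 1.0 + (stored & 0x07) / 8.0;
}
```

With the leading 1 stored explicitly, the same 3 bits would leave only 2 fraction bits, so the step between neighbouring values would be 0.25 instead of 0.125.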

I'm not sure if the C double matches the Intel double (I think it does -
I think both use the IEEE 754 standard). If so, the format of the double is:

bit 63:     sign bit
bits 62-52: 11 bit exponent biased by +1023 (i.e. an exponent of +1 is
            expressed by the bit value 100 0000 0000)
bits 51-0:  52 bits of significand (the leading 1 is implied, not stored)
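
As a rough sketch of pulling those three fields apart in C: the struct and function names below are made up for illustration, and the code assumes the platform's double really is the 64-bit IEEE format described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 64-bit IEEE double. */
struct double_fields {
    unsigned sign;      /* bit 63 */
    int      exponent;  /* bits 62-52 with the 1023 bias removed */
    uint64_t fraction;  /* bits 51-0, implied leading 1 not included */
};

static struct double_fields split_double(double d) {
    uint64_t bits;
    struct double_fields f;

    /* memcpy copies the raw bits without pointer-cast aliasing problems */
    memcpy(&bits, &d, sizeof bits);

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (int)((bits >> 52) & 0x7FF) - 1023;
    f.fraction = bits & (((uint64_t)1 << 52) - 1);
    return f;
}
```

For example, split_double(2.0) gives sign 0, exponent 1 and fraction 0, which matches the stored bit value 100 0000 0000 mentioned above.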

On an Intel machine, you could use something like the following to
produce an 8 bit float:

#include <stdint.h>
#include <string.h>

/* define MIN and MAX exponent for 4 bits of exponent; we will have 4
   bits of precision (3 stored plus the implied leading 1) and 1 sign bit */
/* due to the small range of exponents, we will not provide for all the
   special values; a byte of all 0s represents the value 0 */
/* a stored exponent of 0 (-8) is not allowed but could be used with other
   non-zero bits to specify other special values */
#define MINEXP -7
#define MAXEXP 7
#define BIAS   8
/* constant to multiply by to shift exponent into correct position */
#define SHIFT  8
/* mask for how many bits we want of significand */
#define SIGMASK 0xE000
/* constant to divide by to shift significand into correct bits */
#define SIGDIV  0x2000

/* returns 0 on success, -1 if the number can't be expressed */
int Make8bit(const double *src, unsigned char *res) {

   uint64_t      bits;
   uint16_t      top;
   int           exp;
   unsigned char result;
   uint16_t      significand;

   memcpy(&bits, src, sizeof bits); /* raw bits; assumes IEEE 754 double */

   if ((bits << 1) == 0) { /* +0.0 or -0.0 maps to the all 0s byte */
      *res = 0;
      return 0;
   }

   top = (uint16_t)(bits >> 48); /* sign, exponent, top 4 fraction bits */

   result = (top & 0x8000) ? 0x80 : 0; /* copy sign bit into result */

   exp = (top & 0x7FF0) / 16 - 1023; /* get exponent, remove 1023 bias */

   if ((exp < MINEXP) || (exp > MAXEXP)) return -1; /* error - can't
                                                       express number */

   result = (unsigned char)(result + (exp + BIAS) * SHIFT);

   significand = (uint16_t)(bits >> 36); /* extract the 16 high order
                                            bits of significand */
   /* add code here to do rounding if desired */
   significand = (significand & SIGMASK) / SIGDIV; /* keep 3 bits and
                                                      shift them into the
                                                      correct bits */
   result = (unsigned char)(result + significand);
   *res = result;
   return 0;
}
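
Going the other way (the "and back" part of the question) could look something like the sketch below. It assumes the layout intended above: 1 sign bit, 4 exponent bits biased by 8, and 3 stored significand bits with an implied leading 1. The name Decode8bit is made up, and the reserved patterns with a stored exponent of 0 are not given any special meaning here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical inverse of Make8bit for the assumed layout
   s eeee mmm (bias 8). Rebuilds the bits of an IEEE 754 double. */
static double Decode8bit(unsigned char b) {
    uint64_t sign, exponent, fraction, bits;
    double value;

    if (b == 0) return 0.0; /* a byte of all 0s represents 0 */

    sign     = (uint64_t)(b >> 7);
    /* remove the 8 bit float's bias, then add the double's bias */
    exponent = (uint64_t)(((b >> 3) & 0x0F) - 8 + 1023);
    fraction = (uint64_t)(b & 0x07);

    /* the 3 stored bits become the top 3 of the 52 fraction bits */
    bits = (sign << 63) | (exponent << 52) | (fraction << 49);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions the byte 0x40 decodes to 1.0, 0x44 to 1.5 and 0xC8 to -2.0. No precision is lost in this direction, since every 8 bit value fits exactly in a double.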

Now that code can be really cleaned up and optimized - I'm trying to be
a little bit general to show all the steps. Also, if the choice really
is 4 bits of exponent and 4 bits of precision, significand only needs
to be a char, and can be manipulated at the same time as the exponent.
If you only need positive exponents, change the constants to:

#define MINEXP 0
#define MAXEXP 14
#define BIAS   1

You still need a bias so a byte with all bits 0 can still represent 0
(with a BIAS of 0, a byte of all 0s would represent the integer 1).
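
To see what the two choices of constants buy you, here is a small sketch that computes the smallest and largest normalized magnitudes for a given exponent range. It assumes 3 stored significand bits (so the largest significand is 1.875), and both function names are made up for illustration.

```c
/* Largest magnitude for a given MAXEXP, assuming 3 stored significand
   bits: all three set gives 1.875, scaled by 2^maxexp. */
static double LargestValue(int maxexp) {
    double v = 1.875;
    int i;
    for (i = 0; i < maxexp; i++) v *= 2.0;
    return v;
}

/* Smallest normalized magnitude: 1.0 scaled by 2^minexp
   (minexp may be negative or zero). */
static double SmallestValue(int minexp) {
    double v = 1.0;
    int i;
    for (i = 0; i < minexp; i++) v *= 2.0;
    for (i = 0; i > minexp; i--) v /= 2.0;
    return v;
}
```

With MINEXP -7 and MAXEXP 7 the magnitudes run from 0.0078125 to 240; with MINEXP 0 and MAXEXP 14 they run from 1 to 30720. Either way the stored exponent runs from 1 to 15, which keeps the all 0s byte free for zero.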

The code may actually be cleaner to write in assembler.

I've probably made some misteaks but the above gives an idea of what
needs to be done at least.

Frank


