Subject: Re: 8-bit floating-point number representations?
Newsgroups: lugnet.off-topic.geek
Date: Sun, 7 Jan 2001 19:05:23 GMT

In lugnet.off-topic.geek, Frank Filz writes:
>> Frank Filz wrote:
>>> What are the range and characteristics of the numbers you are dealing
>>> with?
>>
>> Well, basically, whatever is fast and flexible -- whatever that turns out
>> to be.  It might even turn out that a fixed-point representation is flexible
>> enough and maybe even faster, since it would involve only a single floating-
>> point multiply by an arbitrary value and then a divide by a power-of-two
>> constant, plus the overhead of conversion from an integer to a floating-point
>> value (I imagine that's negligible on today's superscalar CPUs, but I haven't
>> checked).
>
> I was thinking about this some more, and am really wondering what you
> will be doing with the numbers.

For use in the news search engine query processor, I want to multiply one of
these 8-bit floating-point numbers, looked up in a memory-mapped file, by an
80-bit floating-point number, at a rate of about 5-10 million per second
on a 400 MHz Pentium II, in C code compiled with gcc -O3.  The engine
currently analyzes about 1 million articles per CPU second; slowing it down
by 10%-20% would be acceptable.

The idea is that, after a query-dependent score has been computed for each
article, that score gets multiplied by a final weighting factor defined by
some external table prior to ranking the results that get presented to the
user.  This allows for some articles to be raised or lowered artificially
in their ranking relative to other articles.

External tables could be any of:

a) All 1's -- i.e., no effect (in implementation, this would probably simply
   bypass the actual multiply) -- table lookup disabled.
b) Current article scores based on member feedback.
c) Predicted article scores based on collaborative filtering.
d) How close an article is to the top of its thread (might be 1.0's and 0.5's
   or any values 0.0 to 1.0).
e) Some user-defined, user-specific function based on who posted an article,
   potentially useful for killfile-style filtering.
etc.
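Whichever table is in effect, the re-weighting step itself is the same.  A
minimal C sketch (all names are illustrative, not the engine's actual code):

```c
#include <stdlib.h>

/* Hypothetical sketch: each article's query-dependent score is
 * multiplied by an external per-article weight, then the results
 * are ranked by the adjusted score. */
typedef struct {
    int    id;      /* article number, indexes the weight table */
    double score;   /* query-dependent score */
} Result;

static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const Result *)a)->score;
    double sb = ((const Result *)b)->score;
    return (sa < sb) - (sa > sb);  /* sort descending */
}

void rank_weighted(Result *r, size_t n, const double *weight)
{
    for (size_t i = 0; i < n; i++)
        r[i].score *= weight[r[i].id];  /* external table lookup */
    qsort(r, n, sizeof *r, by_score_desc);
}
```

With an all-1's table the multiply is a no-op, which is why that case could
simply bypass the loop entirely.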

The expressible domain of values isn't actually as important as the overall
throughput speed of conversion from 8-bit format (whether that's fixed-point
or floating-point) to the native 80-bit (or 64-bit) internal floating-point
representation just prior to multiplication.  The size of the mantissa is
also more important than the size of the exponent, since normalization isn't
really needed.

Given that some bit-twiddling and shifting would be needed to convert an
8-bit floating-point number into a 64-bit floating-point number, I'm
beginning to wonder whether simply a fixed-point representation would be
the best after all.  It could multiply by the unsigned 8-bit number that it
looks up, then divide by 255.0.  If that's too slow, it might be faster to
divide by 128.0 and restrict the range of 8-bit unsigned values to [0,128]
instead of [0,255].
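In C, that fixed-point conversion might look like this (a minimal sketch; the
function name is illustrative):

```c
#include <stdint.h>

/* Sketch of the fixed-point idea: the looked-up byte w in [0,255]
 * scales the score by w/255.0.  Dividing by 128.0 instead, being a
 * power of two, could be cheaper, at the cost of restricting w to
 * [0,128]. */
double apply_weight_fixed(double score, uint8_t w)
{
    return score * ((double)w / 255.0);
}
```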

I guess doing some speed tests on actual data will be the only real way to
know.  I was hoping that there was some magic way to load a low-precision
floating-point number faster than loading and converting an integer and
dividing by a constant.

Hey, what if an unsigned 8-bit integer in the range [0,255] were loaded and
used as an offset into a 1024-byte table of 256 32-bit floating-point numbers
or into a 2048-byte table of 256 64-bit floating-point numbers?  That would
work at the same speed for 8-bit fixed-point numbers or 8-bit floating-point
numbers, and probably be very, very fast if the table is small enough.
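A sketch of that lookup-table approach, assuming a 2048-byte table of 256
doubles initialized once at startup (here filled with a simple linear
mapping, but any 8-bit encoding could be baked in):

```c
#include <stdint.h>

/* 256-entry table mapping each 8-bit code directly to a double
 * weight; one load plus one multiply per article, regardless of
 * whether the code is interpreted as fixed- or floating-point. */
static double weight_table[256];

void init_weight_table(void)
{
    for (int i = 0; i < 256; i++)
        weight_table[i] = (double)i / 255.0;
}

double apply_weight_lut(double score, uint8_t code)
{
    return score * weight_table[code];
}
```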

Heck, if the integer range were restricted to [0,15], it could even be a
4-bit fixed-point number that gets looked up in an itty-bitty 64-byte table
of 16 32-bit floating-point numbers or a 128-byte table of 16 64-bit
floating-point numbers.  As a bonus, that would also chop the size of the
large memory-mapped file in half without sacrificing too much (I would hope)
precision.
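The 4-bit variant would pack two weights per byte in the memory-mapped file
and unpack on lookup.  One way that might be sketched (the nibble ordering
here is an arbitrary choice):

```c
#include <stddef.h>
#include <stdint.h>

/* 16-entry table (128 bytes of doubles); two 4-bit codes packed
 * per byte, halving the memory-mapped file. */
static double weight16[16];

void init_weight16(void)
{
    for (int i = 0; i < 16; i++)
        weight16[i] = (double)i / 15.0;
}

/* Fetch the weight for article index n from the packed array:
 * even indices use the low nibble, odd indices the high nibble. */
double weight_at(const uint8_t *packed, size_t n)
{
    uint8_t byte = packed[n >> 1];
    uint8_t nib  = (n & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
    return weight16[nib];
}
```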

My gut instincts tell me that a 16-level significance factor would be plenty
for these purposes inside of a search.

--Todd


