Subject:
|
Re: Bitstring vector to list conversion in Perl5
|
Newsgroups:
|
lugnet.off-topic.geek
|
Date:
|
Thu, 4 Jan 2001 06:45:35 GMT
|
Viewed:
|
108 times
|
| |
| |
In lugnet.off-topic.geek, Dan Boger writes:
> $veclength = length unpack("b*",$v); # only because I don't know how many
> # bits are in it - not efficient
> @n = ();
> for ($index=0;$index<$veclength,$index++) {
> push @n, $index if vec($v,$index,1);
> }
>
> -=-=-=-=-=-=-
>
> $veclength is terribly inefficient, but I could think of no way of finding
> out the length of the vector - I'm sure you have that little bit of
> information stored someplace...
$veclength = length($v) * 8;
It isn't precise, but it's fast -- and a few bits of slop would be perfectly
OK here since the slop would be 0-bits. :-)
> this isn't the fastest way... you might be able to do something with
> map, but you need to supply the broken down list to it as an argument,
> so you'll still end up with the non-sparse array loaded in memory.
Well, plus, map{} is also almost always slower than looping (especially with
use integer, which would be important here too).
I was imagining there must be a way to do it that's O(n) in low-level CPU
performance but O(1) in high-level Perl performance -- something that takes
advantage of CPU string-searching hardware or at least Perl's fast regex
engine.
What about something sneaky using a regex and pos?
This seems to run at a pretty decent clip...about ten million bits per second
on a string that has goodies at the beginning and end and is empty everywhere
else...
-=-=-=-=-=-=-
sub furl (@) {
use integer;
my $bits = "";
foreach (@_) { vec($bits, $_, 1) = 1; }
$bits;
}
sub unfurl ($) {
use integer;
my $bits = shift;
my @nums;
my $ibyte;
while ($bits =~ m/[^\0]/g) {
my $byte = unpack("b*", substr($bits, ($ibyte=pos($bits)-1), 1));
push(@nums, $ibyte*8+pos($byte)-1) while $byte =~ m/1/g;
}
@nums;
}
-=-=-=-=-=-=-
my $bits = furl(qw(
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101
103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197
199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311
313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431
433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557
563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661
673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937
941 947 953 967 971 977 983 991 997
));
$bits = $bits . "\0" x 1_000_000 . $bits;
my @nums = unfurl($bits);
print "@nums\n";
-=-=-=-=-=-=-
I wonder what instructions /[^\0]/ compiles down to. Is it smart enough to
know to scan full-steam-ahead for the first byte not equal to zero using
hardware instructions or would it construct a 256-byte lookup table and
do it the hard way? Oh well, it's probably not the bottleneck either way.
--Todd
|
|
Message is in Reply To:
3 Messages in This Thread:
- Entire Thread on One Page:
- Nested:
All | Brief | Compact | Dots
Linear:
All | Brief | Compact
|
|
|
|