Set bitcounts in 32bit/64bit value

All,

Does WDK compiler provides any set bitcount intrinsic builtin like GCC does
(__builtin_popcount)?
Or Should I implement one?

Regards
Deepak

__popcnt16, __popcnt, __popcnt64

Support depends on the platform. See the VS docs.

Good luck,

mm

I forgot: I don’t know if the current (or any other) WDK compiler supports these or not - are far as the docs are concerned, this is a VS thing. Easy enough to figure out, though.

Good luck,

mm

This requires hardware (SSE4) support. Otherwise RtlNumberOfSetBits in a
RTL_BITMAP will do.

//Daniel

wrote in message news:xxxxx@ntdev…
> popcnt16, popcnt, __popcnt64
>
> Support depends on the platform. See the VS docs.
>
>
> Good luck,
>
> mm
>

Indeed it does - I mean to say ‘architecture.’

mm

Thanks MM for the information.

Regards
Deepak

On Fri, May 28, 2010 at 10:26 PM, wrote:

> Indeed it does - I mean to say ‘architecture.’
>
> mm
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

Seeing as how you can do this with a 4-instructions-per-set-bit highly branch-predictable loop, it
hardly seems worth all this effort unless you’re doing it more than a billion times a minute.

LONGLONG test = 0x0102030405060708;
int bits = 0;
for (; test; bits++) test &= test-1; // loops 13 times and leaves bits=13

You *did* profile your code to determine that this was a bottleneck before embarking on a quixotic
quest for an SSE4.2-only popcnt instruction, right? Premature optimization is the root of many evils.

On 5/28/2010 9:36 AM, Deepak Gupta wrote:

All,

Does WDK compiler provides any set bitcount intrinsic builtin like GCC
does (__builtin_popcount)?
Or Should I implement one?

Regards
Deepak

Obviously I had the same method.
And yes I wanted to reduce worst case no. of loops.

On Thu, Jun 3, 2010 at 2:50 AM, Ray Trent wrote:

> Seeing as how you can do this with a 4-instructions-per-set-bit highly
> branch-predictable loop, it hardly seems worth all this effort unless you’re
> doing it more than a billion times a minute.
>
> LONGLONG test = 0x0102030405060708;
> int bits = 0;
> for (; test; bits++) test &= test-1; // loops 13 times and leaves bits=13
>
> You did profile your code to determine that this was a bottleneck before
> embarking on a quixotic quest for an SSE4.2-only popcnt instruction, right?
> Premature optimization is the root of many evils.
>
>
> On 5/28/2010 9:36 AM, Deepak Gupta wrote:
>
>> All,
>>
>> Does WDK compiler provides any set bitcount intrinsic builtin like GCC
>> does (__builtin_popcount)?
>> Or Should I implement one?
>>
>> Regards
>> Deepak
>>
>
>
>

Then try this .

Byte counts = {0, 1, 1, 2, 1, 2, 2, 3, 1};

LONGLONG test = 0x0102030405060708;

Byte *pTest=&test;

Int bits = count[pTest[7]] +

count[pTest[6] ]+

count[pTest[5] ]+

count[pTest[4] ]+

count[pTest[3] ]+

count[pTest[2] ]+

count[pTest[1]] +

count[pTest[0]] ;

To get 0 - 255 you would need a 256 byte array with each element of the
array giving the bit count for that index into the array where 1 is one bit,
5 is 2 bits, 7 is 3 bits and 255 is 8 bits.

Gary G. Little

H (952) 223-1349

C (952) 454-4629

xxxxx@comcast.net

From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Deepak Gupta
Sent: Wednesday, June 02, 2010 9:39 PM
To: Windows System Software Devs Interest List
Cc: Windows System Software Devs Interest List
Subject: Re:[ntdev] Set bitcounts in 32bit/64bit value

Obviously I had the same method.
And yes I wanted to reduce worst case no. of loops.

On Thu, Jun 3, 2010 at 2:50 AM, Ray Trent
wrote:

Seeing as how you can do this with a 4-instructions-per-set-bit highly
branch-predictable loop, it hardly seems worth all this effort unless you’re
doing it more than a billion times a minute.

LONGLONG test = 0x0102030405060708;
int bits = 0;
for (; test; bits++) test &= test-1; // loops 13 times and leaves bits=13

You did profile your code to determine that this was a bottleneck before
embarking on a quixotic quest for an SSE4.2-only popcnt instruction, right?
Premature optimization is the root of many evils.

On 5/28/2010 9:36 AM, Deepak Gupta wrote:

All,

Does WDK compiler provides any set bitcount intrinsic builtin like GCC
does (builtin_popcount)?
Or Should I implement one?

Regards
Deepak

— NTDEV is sponsored by OSR For our schedule of WDF, WDM, debugging and
other seminars visit: http://www.osr.com/seminars To unsubscribe, visit the
List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

________ Information from ESET Smart Security, version of virus signature
database 5169 (20100603) __________

The message was checked by ESET Smart Security.

http://www.eset.com

Or even faster and less space (from the Wikipedia Hamming Weight article):

//This uses fewer arithmetic operations than any other known
//implementation on machines with fast multiplication.
//It uses 12 arithmetic operations, one of which is a multiply.
const uint64 m1 = 0x5555555555555555; //binary: 0101…
const uint64 m2 = 0x3333333333333333; //binary: 00110011…
const uint64 m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones …
const uint64 h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3…
int popcount_3(uint64 x) {
x -= (x >> 1) & m1; //put count of each 2 bits into those 2 bits
x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8 bits
return (x * h01)>>56; //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + …
}

On 6/3/2010 10:10 AM, Gary G. Little wrote:

Then try this …

Byte counts = {0, 1, 1, 2, 1, 2, 2, 3, 1};

LONGLONG test = 0x0102030405060708;

Byte *pTest=&test;

Int bits = count[pTest[7]] +

count[pTest[6] ]+

count[pTest[5] ]+

count[pTest[4] ]+

count[pTest[3] ]+

count[pTest[2] ]+

count[pTest[1]] +

count[pTest[0]] ;

To get 0 – 255 you would need a 256 byte array with each element of the
array giving the bit count for that index into the array where 1 is one
bit, 5 is 2 bits, 7 is 3 bits and 255 is 8 bits.

Gary G. Little

H (952) 223-1349

C (952) 454-4629

xxxxx@comcast.net

*From:* xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] *On Behalf Of *Deepak Gupta
*Sent:* Wednesday, June 02, 2010 9:39 PM
*To:* Windows System Software Devs Interest List
*Cc:* Windows System Software Devs Interest List
*Subject:* Re:[ntdev] Set bitcounts in 32bit/64bit value

Obviously I had the same method.
And yes I wanted to reduce worst case no. of loops.

On Thu, Jun 3, 2010 at 2:50 AM, Ray Trent > mailto:xxxxx> wrote:
>
> Seeing as how you can do this with a 4-instructions-per-set-bit highly
> branch-predictable loop, it hardly seems worth all this effort unless
> you’re doing it more than a billion times a minute.
>
> LONGLONG test = 0x0102030405060708;
> int bits = 0;
> for (; test; bits++) test &= test-1; // loops 13 times and leaves bits=13
>
> You did profile your code to determine that this was a bottleneck
> before embarking on a quixotic quest for an SSE4.2-only popcnt
> instruction, right? Premature optimization is the root of many evils.
>
>
>
> On 5/28/2010 9:36 AM, Deepak Gupta wrote:
>
> All,
>
> Does WDK compiler provides any set bitcount intrinsic builtin like GCC
> does (builtin_popcount)?
> Or Should I implement one?
>
> Regards
> Deepak
>
>
> — NTDEV is sponsored by OSR For our schedule of WDF, WDM, debugging
> and other seminars visit: http://www.osr.com/seminars To unsubscribe,
> visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>
>
________ Information from ESET Smart Security, version of virus
> signature database 5169 (20100603)
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
>
Information from ESET Smart Security, version of virus
> signature database 5170 (20100603) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com</mailto:xxxxx>

Thanks people, already went for hamming weight method.

Regards
Deepak

On Fri, Jun 4, 2010 at 2:27 AM, Ray Trent wrote:

> Or even faster and less space (from the Wikipedia Hamming Weight article):
>
> //This uses fewer arithmetic operations than any other known
> //implementation on machines with fast multiplication.
> //It uses 12 arithmetic operations, one of which is a multiply.
> const uint64 m1 = 0x5555555555555555; //binary: 0101…
> const uint64 m2 = 0x3333333333333333; //binary: 00110011…
> const uint64 m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones …
> const uint64 h01 = 0x0101010101010101; //the sum of 256 to the power of
> 0,1,2,3…
> int popcount_3(uint64 x) {
> x -= (x >> 1) & m1; //put count of each 2 bits into those 2
> bits
> x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4
> bits
> x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8
> bits
> return (x * h01)>>56; //returns left 8 bits of x + (x<<8) + (x<<16) +
> (x<<24) + …
>
> }
>
>
> On 6/3/2010 10:10 AM, Gary G. Little wrote:
>
>> Then try this ?
>>
>> Byte counts = {0, 1, 1, 2, 1, 2, 2, 3, 1};
>>
>> LONGLONG test = 0x0102030405060708;
>>
>> Byte *pTest=&test;
>>
>> Int bits = count[pTest[7]] +
>>
>> count[pTest[6] ]+
>>
>> count[pTest[5] ]+
>>
>> count[pTest[4] ]+
>>
>> count[pTest[3] ]+
>>
>> count[pTest[2] ]+
>>
>> count[pTest[1]] +
>>
>> count[pTest[0]] ;
>>
>> To get 0 ? 255 you would need a 256 byte array with each element of the
>> array giving the bit count for that index into the array where 1 is one
>> bit, 5 is 2 bits, 7 is 3 bits and 255 is 8 bits.
>>
>> Gary G. Little
>>
>> H (952) 223-1349
>>
>> C (952) 454-4629
>>
>> xxxxx@comcast.net
>>
>> From: xxxxx@lists.osr.com
>> [mailto:xxxxx@lists.osr.com] *On Behalf Of *Deepak Gupta
>> Sent: Wednesday, June 02, 2010 9:39 PM
>> To: Windows System Software Devs Interest List
>> Cc: Windows System Software Devs Interest List
>> Subject: Re:[ntdev] Set bitcounts in 32bit/64bit value
>>
>> Obviously I had the same method.
>> And yes I wanted to reduce worst case no. of loops.
>>
>> On Thu, Jun 3, 2010 at 2:50 AM, Ray Trent >> mailto:xxxxx> wrote:
>>
>> Seeing as how you can do this with a 4-instructions-per-set-bit highly
>> branch-predictable loop, it hardly seems worth all this effort unless
>> you’re doing it more than a billion times a minute.
>>
>> LONGLONG test = 0x0102030405060708;
>> int bits = 0;
>> for (; test; bits++) test &= test-1; // loops 13 times and leaves bits=13
>>
>> You did profile your code to determine that this was a bottleneck
>> before embarking on a quixotic quest for an SSE4.2-only popcnt
>> instruction, right? Premature optimization is the root of many evils.
>>
>>
>>
>> On 5/28/2010 9:36 AM, Deepak Gupta wrote:
>>
>> All,
>>
>> Does WDK compiler provides any set bitcount intrinsic builtin like GCC
>> does (builtin_popcount)?
>> Or Should I implement one?
>>
>> Regards
>> Deepak
>>
>>
>> — NTDEV is sponsored by OSR For our schedule of WDF, WDM, debugging
>> and other seminars visit: http://www.osr.com/seminars To unsubscribe,
>> visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>>
>>
________ Information from ESET Smart Security, version of virus
>> signature database 5169 (20100603)
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>>
>>
>>
Information from ESET Smart Security, version of virus
>> signature database 5170 (20100603) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
></mailto:xxxxx>