Friday, October 17, 2014

32-bit IEEE floating-point number


What is a 32-bit IEEE floating-point number?

Refer here to see the original post.


There is not much material on the cray_32bit_ieee option. Here is how the GrADS documentation describes it: "Indicates the data file contains 32-bit IEEE floats created on a cray. May be used with gridded or station data."
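If you just need to pull one of these files into Python for a quick look, here is a minimal sketch. The file name is made up, and I'm assuming a plain flat binary of 32-bit IEEE floats in big-endian byte order (Crays were big-endian machines); swap ">" for "<" if the bytes turn out to be little-endian.

import struct

# Read a flat binary file of 32-bit IEEE floats.
# "cray_data.bin" is a made-up name; ">" assumes big-endian bytes.
with open("cray_data.bin", "rb") as f:
    raw = f.read()

count = len(raw) // 4
values = struct.unpack(">%df" % count, raw[:count * 4])
print(values[:10])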

A 32-bit IEEE floating-point number has 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.
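If you want to see those three fields for a value you have in hand, here is a small sketch (plain Python, nothing beyond the standard library) that reinterprets the float's 4 bytes as an unsigned integer and masks out each part:

import struct

def fields(x):
    # Pack as a 32-bit IEEE float, then reread the same 4 bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign     = bits >> 31            # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa

print(fields(5.75))   # (0, 129, 3670016) -- mantissa 3670016 is hex 380000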

The following description is a good reference for reading this kind of data:
We're looking at single precision floating point numbers here. Double precision uses the same scheme, just more bits. Here's what the output looks like:

0       00000000        0 00000000 00000000000000000000000
1       3F800000        0 01111111 00000000000000000000000
2       40000000        0 10000000 00000000000000000000000
4       40800000        0 10000001 00000000000000000000000
8       41000000        0 10000010 00000000000000000000000
16      41800000        0 10000011 00000000000000000000000
32      42000000        0 10000100 00000000000000000000000
64      42800000        0 10000101 00000000000000000000000
128     43000000        0 10000110 00000000000000000000000
256     43800000        0 10000111 00000000000000000000000
512     44000000        0 10001000 00000000000000000000000
1024    44800000        0 10001001 00000000000000000000000
2048    45000000        0 10001010 00000000000000000000000
4096    45800000        0 10001011 00000000000000000000000
8192    46000000        0 10001100 00000000000000000000000
5.75    40B80000        0 10000001 01110000000000000000000
-.1     BDCCCCCD        1 01111011 10011001100110011001101

The first column is the value itself; the second is what the stored format looks like in hex. After that come the actual bits; I've separated them in this odd way for a very good reason (which will become clear later). The value "5.75" is stored as "01000000101110000000000000000000" or "40B80000" (hex).
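If you'd like to reproduce a table like the one above yourself, a short sketch along these lines prints the value, the hex, and the bits split into the same three groups:

import struct

def row(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    b = format(bits, "032b")
    # sign | exponent | mantissa, the same odd grouping as above
    return "%-8g%08X        %s %s %s" % (x, bits, b[0], b[1:9], b[9:])

for v in (0, 1, 2, 4, 8, 16, 5.75, -.1):
    print(row(v))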

You might easily guess that the first bit is the sign bit. I think that's what I first grokked back in 1983 too. The next 8 bits are used for the exponent, and the last 23 are the value. As you will no doubt notice, the value bits from 0 to 8192 are all empty, so I must be crazy and there's no point in reading this trash any farther.

Well, actually there is. There's a hidden bit there that isn't stored but is always assumed. If you are really compulsive and counted the bits, you saw that only 23 bits are there. The hidden bit makes it 24 bits, and it is always 1. So, if we add the hidden bit, the bits would look like:

0        0 00000000 100000000000000000000000
1        0 01111111 100000000000000000000000
2        0 10000000 100000000000000000000000
4        0 10000001 100000000000000000000000
8        0 10000010 100000000000000000000000
16       0 10000011 100000000000000000000000
32       0 10000100 100000000000000000000000
64       0 10000101 100000000000000000000000
128      0 10000110 100000000000000000000000
256      0 10000111 100000000000000000000000
512      0 10001000 100000000000000000000000
1024     0 10001001 100000000000000000000000
2048     0 10001010 100000000000000000000000
4096     0 10001011 100000000000000000000000
8192     0 10001100 100000000000000000000000
5.75     0 10000001 101110000000000000000000
-.1      1 01111011 110011001100110011001101

But remember, it's what I showed above that is really there. (Zero is actually a special case: when the exponent bits are all zero, no hidden bit is assumed, which is how a pattern of all zeros can mean exactly zero.)

One more thing: there's an implied decimal point after that hidden bit. To get the value of bits after the decimal point, start dividing by two: so the first bit after the (implied) decimal point is .5, the next is .25 and so on. We don't have to worry about any of that for the powers of two, because obviously those are whole numbers and the bits will be all 0. But down at the 5.75 we see it at work:

First, looking at the exponent for 5.75, we see that it is 129. Subtracting 127 gives us 2. So 1.0111 times 2^2 becomes 101.11 (simply shift the point 2 places to the right to multiply by 4). So now we have 101 binary, which is 5, plus .5 plus .25 (.11), or 5.75 in total. Too quick?

Taking it in detail:

Exponent: 10000001, which is 129 (use the Javascript Bit Twiddler if you like). Subtracting 127 leaves us with 2.

Mantissa: 01110000000000000000000

Add in the implied bit and we have 101110000000000000000000; with the implied decimal point, that's 1.01110000000000000000000

Multiply that by 2^2 to get 101.110000000000000000000

That is 4 + 1 + .5 + .25 or 5.75
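If you'd rather let the computer do that arithmetic, the same steps written out in a couple of lines of Python give the same answer:

exponent = 0b10000001                    # 129
mantissa = 0b01110000000000000000000     # the 23 stored bits
value = (1 + mantissa / 2.0**23) * 2 ** (exponent - 127)
print(value)                             # 5.75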

Look at 2048. The exponent is 128 + 8 + 2, or 138; subtract 127 and we get 11. Use the Bit Twiddler if you don't see that. The mantissa is all 0's, which with the implied bit makes 1.00000000000000000000000, times 2^11. What's 2^11? It's 2048, of course.
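The same recipe works for every value in the table, so you can wrap it up in a tiny decoder and check them all at once. This is only a sketch for normal numbers: zero, denormals, infinities and NaNs are special cases it deliberately ignores.

def decode(bits):
    # Normal numbers only: (-1)^sign * (1 + mantissa/2^23) * 2^(exponent-127)
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    value = (1 + mantissa / 2.0**23) * 2.0 ** (exponent - 127)
    return -value if sign else value

print(decode(0x45000000))  # 2048.0
print(decode(0x40B80000))  # 5.75
print(decode(0xBDCCCCCD))  # -0.10000000149011612, the closest we can get to -.1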

Now the -.1. This value actually can't be stored precisely, but the method is still the same. The exponent is 64 + 32 + 16 + 8 + 2 + 1, or 123. Subtract 127 and we get -4, which means the decimal point moves 4 places to the left, making our value .000110011001100110011001101. Now you understand why the exponent is stored after adding 127 - it's so we can end up with negative exponents. If we calculate out the binary, that's .0625 + .03125 + .00390625 and on to ever smaller numbers, which get us very, very close to .1 (but off slightly). The sign bit was set, so it's -.1.
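You can see that slight error directly if you round-trip -.1 through a 32-bit float; what comes back is the nearest value that can actually be stored:

import struct

stored = struct.unpack(">f", struct.pack(">f", -0.1))[0]
print(repr(stored))                      # -0.10000000149011612
print(-0.1 - stored)                     # the tiny leftover, about 1.49e-09
print("%08X" % struct.unpack(">I", struct.pack(">f", -0.1))[0])  # BDCCCCCD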
