Java: memory representation of primitive data types

So I was working on a GIF decoder, and it required some very rapid manipulation of binary data: converting byte to short, bit shifts, that sort of thing. Obviously, that’s easily done in C. You just cast a pointer to something else, cause a GPF, and call it a day. Java is very Big Brother-y about this stuff. It really doesn’t want you doing any C tricks. So if you need to convert a byte[] to an int[] – would you kindly iterate over it and do a[i] = b[i] for every one of the million elements. There are of course reasons for that, but it really limits your options.
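To be fair, java.nio gets you part of the way there without the element-by-element loop. Here’s a sketch (my own example, not from the decoder) that wraps a byte[] and reads it through an int view – the closest Java gets to C-style reinterpretation:

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class BulkConvert {
    public static void main(String[] args) {
        byte[] raw = {
            0x00, 0x00, 0x01, 0x01,                          // 257
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFC // -4
        };
        // View the byte[] as ints -- no per-element loop needed.
        // Default byte order is big-endian.
        IntBuffer ints = ByteBuffer.wrap(raw).asIntBuffer();
        int[] data = new int[ints.remaining()];
        ints.get(data);
        System.out.println(data[0]); // 257
        System.out.println(data[1]); // -4
    }
}
```

It still copies the data, but at least the copying happens inside the library instead of in your loop.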

Anyway, one of the issues I ran into was a construction like this:

int[] data = new int[256];
byte index = readIndex();
int entry = data[index];

You’re probably going “ha-ha” now, but shut up, I didn’t know that. So, Java, being a great language, has only signed types (well, almost – more on that below). Incidentally, byte is signed too. Therefore, 0xFF as a byte is not 255, it’s -1. And this code should have been:

int[] data = new int[256];
int index = ((int)readIndex()) & 0xFF;
int entry = data[index];
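Here’s a self-contained demo of both versions, with a stand-in value where readIndex() would be. The broken lookup throws; the masked one works. (Note the mask alone is enough – the byte is promoted to int automatically, so the explicit cast is optional.)

```java
public class IndexFix {
    public static void main(String[] args) {
        int[] data = new int[256];
        byte index = (byte) 0xFF; // pretend readIndex() returned 0xFF

        // Broken: the byte widens to int -1, so this throws
        // ArrayIndexOutOfBoundsException.
        try {
            int entry = data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index was " + index); // index was -1
        }

        // Fixed: mask off the sign-extended bits to get 0..255.
        int fixed = index & 0xFF;
        System.out.println(fixed); // 255
        int entry = data[fixed];   // fine now
    }
}
```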

So why the & 0xFF? It’s an extra operation. Turns out, here’s how type casts really work.

int n
00000000000000000000000100000001 257
11111111111111111111111111111100 -4
11111111111111111111110000000000 -1024

byte b = (byte)n
                        00000001 1
                        11111100 -4
                        00000000 0
int nb = (int)b                
00000000000000000000000000000001 1
11111111111111111111111111111100 -4
00000000000000000000000000000000 0

char c = (char)n
                1111111111111100 65532
                1111110000000000 64512
                0000000100000001 257
int nc = (int)c
00000000000000001111111111111100 65532
00000000000000001111110000000000 64512
00000000000000000000000100000001 257

As you can see, when casting a bigger type to a smaller one (int to byte), Java just cuts off everything that doesn’t fit, so whatever lands in the most significant bit becomes the sign. When casting byte back to int, it stretches the sign bit across all the added digits (sign extension). So from a numeric perspective, it’s still the same number. But the entire problem is that in the example above we want to speak binary, not numeric. And from a binary perspective, the value is screwed up beyond recognition (binary being unaware of minus signs). So you have to re-cut the initial part (mask with 0xFF) to make it the same. With unsigned types this all works as expected, without the mental gymnastics.
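The same round trip from the table, as runnable code:

```java
public class SignExtend {
    public static void main(String[] args) {
        int n = -4;             // ...11111111111111111111111111111100
        byte b = (byte) n;      // low 8 bits kept: 11111100, still -4
        int nb = b;             // sign bit stretched left: still -4
        System.out.println(nb); // -4
        System.out.println(Integer.toBinaryString(nb));
        // 11111111111111111111111111111100

        // Re-cut the original 8 bits to speak binary, not numeric:
        int masked = b & 0xFF;
        System.out.println(masked); // 252
    }
}
```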

Char is an interesting exception: it’s Java’s one unsigned type. So it’s always padded with zeroes instead.
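Which means char can serve as a poor man’s unsigned short – widening it back to int zero-pads rather than sign-extends:

```java
public class CharPad {
    public static void main(String[] args) {
        int n = -4;
        char c = (char) n;      // low 16 bits kept: 1111111111111100
        int nc = c;             // char is unsigned, so it zero-pads
        System.out.println(nc); // 65532
    }
}
```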

This whole system is not without its benefits I guess, but why did they make the byte signed? I mean come on.

