The P-Bug

From Andrey

Revision as of 05:36, 3 February 2006

Dear Colleagues,

Once in a while people put their names on things that they discover. We have Rorschach spots, Alzheimer's disease and the Eiffel Tower. Let me introduce Potekhin's Bug.

Put this value into the database:

130 

Retrieve it into a char using our usual ADO call:

char c = (char)(short)f->Item["MyField"]->Value; 

In the above code, we need to cast to short since ADO doesn't know how to cast to char directly. Then we need to cast to char to avoid the compiler's 'possible loss of data' warning. Given that we only store values below 256, we are sure that no data is lost.

Unfortunately, the retrieved value, as shown by the debugger, is not 130, but -126.

If you think that this is the bug I'm talking about, it is not. That's all right, this is just the usual signed/unsigned issue. Since c is declared as a (signed) char, it becomes a negative value the moment a 1 lands in its upper bit. This is not a bug; this is how it is supposed to be. The bits are all the same. It is still 130 'under the hood'. Let's continue.

Retrieve the same value as an int:

int i = (short)f->Item["MyField"]->Value; 

As you probably guessed, this time the debugger shows 130.
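The two retrievals can be reproduced without ADO. This is only a minimal sketch: the helper names are hypothetical, and a plain short stands in for whatever (short)f->Item["MyField"]->Value yields. It spells out signed char because the signedness of plain char depends on the compiler and its settings, and it assumes a two's-complement compiler, where converting 130 to an 8-bit signed type wraps around.

```cpp
// Hypothetical stand-in for the value ADO hands back after the (short) cast.
short fieldValue() { return 130; }

// Mirrors: char c = (char)(short)f->Item["MyField"]->Value;
signed char readAsChar() { return static_cast<signed char>(fieldValue()); }

// Mirrors: int i = (short)f->Item["MyField"]->Value;
int readAsInt() { return fieldValue(); }
```

Under those assumptions readAsChar() gives -126 and readAsInt() gives 130, matching what the debugger shows.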

Now, what do you think of this code:

 if (c == i)
 {
     AfxMessageBox("Heaven");
 }
 else
 {
     AfxMessageBox("Hell");
 }

Which message will we get? On the one hand, this is a case of -126 vs. 130. On the other:

- Both values have the same bits - 10000010, read from the same database field.

- Both are signed values, so there is no signed/unsigned mismatch here.

- When evaluating the expression, both values are converted to the same type, int. So when compared, they are of the same size.

Which message will you see? I have to tell you. You'll be getting the else branch.

This is what we call Potekhin's bug.
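To try the comparison outside of MFC, here is a small sketch. The function whichMessage is a hypothetical name, it returns the message text instead of calling AfxMessageBox, and it again assumes a two's-complement compiler with an explicitly signed 8-bit char.

```cpp
#include <string>

// Stores the same value in a signed char and in an int, then compares them
// exactly as the snippet above does.
std::string whichMessage(short stored) {
    signed char c = static_cast<signed char>(stored);
    int i = stored;
    return (c == i) ? "Heaven" : "Hell";
}
```

Calling whichMessage(130) takes the else branch, while any value that fits in a signed char, such as whichMessage(127), takes the first one.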


Explanation of the trick

One may say, of course, 130 is not -126, and that's why they don't match. However, this does not explain how we ended up with such results. Here's the explanation. Compare:

Before conversion to int:

(int) 130 == 10000010
(char) -126 == 10000010 

After conversion to int:

(int) 130 == 00000000000000000000000010000010
(int) -126 == 11111111111111111111111110000010 

When a signed char gets converted to an int, its sign bit gets propagated all the way to the left (sign extension).
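The bit patterns above can be printed with std::bitset. This is a sketch only: bits8 and bits32 are hypothetical helper names, and the 32-bit width assumes a platform where int is 32 bits.

```cpp
#include <bitset>
#include <string>

// Bit pattern of the value while it is still an 8-bit char.
std::string bits8(signed char c) {
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}

// Bit pattern after the conversion to int that the == operator performs.
std::string bits32(signed char c) {
    int promoted = c;  // sign extension happens here
    return std::bitset<32>(static_cast<unsigned int>(promoted)).to_string();
}
```

For -126, bits8 returns 10000010 and bits32 returns the same eight bits preceded by twenty-four ones.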

Well, I knew it. Kind of. I knew that it gets propagated. The problem is, I didn't realize that it could lead to scenarios like the one described above.

Conclusion? Same old rule. Never use a signed char to store anything above 127. Or if you do, don't then compare it to an int...
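One way to follow that rule, sketched with hypothetical helper names: either keep the byte in an unsigned char from the start, or convert an existing signed char to unsigned char before it meets an int. In both cases the conversion to int fills the upper bits with zeros instead of copies of the sign bit.

```cpp
// Option 1: store the byte as unsigned char, so 130 stays 130 when promoted.
bool matchesUnsigned(unsigned char c, int i) { return c == i; }

// Option 2: the byte is already in a signed char; view it as unsigned
// before the comparison so that no sign extension takes place.
bool matchesAfterCast(signed char c, int i) {
    return static_cast<unsigned char>(c) == i;
}
```

With the retrieval from the text, the first option would mean declaring c as unsigned char and casting to (unsigned char) instead of (char); whether that suits the surrounding code is a judgment call.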