ROBLOX’s Encounter with a One-in-Four-Billion .NET Bug

On July 28, 2012, user Yaman100 did something thousands of users do without a second thought every day: put a new t-shirt on his ROBLOX character. Only, this time it was special. It set off a bug, with roughly a one-in-four-billion chance of occurring, that would end up breaking any ROBLOX page on which Yaman100’s character thumbnail appeared.

An innocent action, with such far-reaching effects.

The bug came to light via Spetsnaz, a group whose page on ROBLOX simply wouldn’t load. It was an isolated incident – there were no problems with other ROBLOX groups – so we dug further to find the bug was related to a particular group rank. A little more digging, and we found a single user, Yaman100, was unknowingly the cause of the bug; the group page wouldn’t load due to a post by Yaman100 on its wall.

What happened?

Understanding the bug requires a look at how ROBLOX stores and fetches user data.

Every ROBLOX user has a unique record. Each article of clothing he or she is wearing – in Yaman100’s case, a t-shirt – has a unique record. There’s also a unique record for a specific article of clothing on a specific user. We store all of these records as rows in a table in our database.

If this were all we did, fetching data would be painstakingly slow. To improve performance, we keep the results of recent database requests in our cache. We look up the records in the cache by assigning each one a unique string key, like “Wearing_67583028”, where the number identifies the unique record. This makes the process of fetching and displaying data on Roblox.com much faster.
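The lookup pattern described above can be sketched as follows. This is a minimal illustration, not ROBLOX's actual code: the `WearingStore` class, its dictionary-backed "database", and the record contents are all hypothetical stand-ins. Only the key format ("Wearing_" plus the record number) comes from the description above.

```python
class WearingStore:
    """Hypothetical cache-in-front-of-a-database sketch (names are assumptions)."""

    def __init__(self, database):
        self.database = database  # stand-in for the database table: id -> row
        self.cache = {}           # stand-in for the web cache: string key -> row
        self.db_reads = 0         # counts how often we had to go to the database

    @staticmethod
    def cache_key(record_id):
        # Build the unique string key for a record, e.g. "Wearing_67583028".
        return f"Wearing_{record_id}"

    def fetch(self, record_id):
        key = self.cache_key(record_id)
        if key in self.cache:
            # Cache hit: no database round trip needed.
            return self.cache[key]
        # Cache miss: read from the database, then remember the result.
        self.db_reads += 1
        record = self.database[record_id]
        self.cache[key] = record
        return record


# Placeholder row contents; the real schema is not described in this post.
store = WearingStore({67583028: {"user_id": 1, "asset_id": 2}})
first = store.fetch(67583028)   # goes to the database
second = store.fetch(67583028)  # served from the cache
```

After the two calls, `store.db_reads` is 1: only the first request touched the database.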

Roblox.com runs on Microsoft’s ASP.NET framework. The ASP.NET Web Cache takes those string keys and uses a hash function to turn them into 32-bit numbers. Looking up the number is even faster than searching for the string.

Signed 32-bit numbers range from -2,147,483,648 to 2,147,483,647. Do the math and that’s a range of about 4.29 billion numbers. As you can see, the range is lopsided: the smallest 32-bit number is the negative of the largest, minus one, so it has no positive counterpart.

When storing data in the cache, ASP.NET uses the absolute value of the hash. In the rare instance that a record’s string key hashes to -2,147,483,648, ASP.NET chokes: the absolute (positive) value would be 2,147,483,648 (2^31), which is one more than the largest 32-bit number and therefore cannot be represented.
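The arithmetic can be checked with a short simulation. Python integers never overflow, so this sketch models signed 32-bit behavior by hand; the helper names (`to_int32`, `wrapping_abs32`, `checked_abs32`) are made up for illustration and say nothing about how ASP.NET is implemented internally. It shows the two ways a 32-bit absolute value can go wrong for this one input: silently wrapping back to a negative number, or reporting an overflow.

```python
INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   #  2,147,483,647

# Number of distinct signed 32-bit values: 4,294,967,296 (about 4.29 billion).
INT32_COUNT = INT32_MAX - INT32_MIN + 1


def to_int32(value):
    """Wrap an arbitrary Python int into signed 32-bit two's complement."""
    return (value + 2**31) % 2**32 - 2**31


def wrapping_abs32(value):
    """Absolute value that silently wraps around on overflow."""
    return to_int32(abs(value))


def checked_abs32(value):
    """Absolute value that refuses to return an unrepresentable result."""
    result = abs(value)
    if result > INT32_MAX:
        raise OverflowError(f"abs({value}) does not fit in 32 bits")
    return result


# Every other negative value has a positive counterpart...
ordinary = checked_abs32(INT32_MIN + 1)   # 2,147,483,647

# ...but the smallest value wraps straight back to itself (still negative).
wrapped = wrapping_abs32(INT32_MIN)

# The checked variant reports the problem instead.
try:
    checked_abs32(INT32_MIN)
    overflowed = False
except OverflowError:
    overflowed = True
```

Either outcome is trouble for code that assumes the result of an absolute value is always non-negative.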

The string key for Yaman100’s character wearing a specific t-shirt hashed to the infamous, smallest-possible 32-bit number. Wherever his auto-generated character thumbnail appeared on Roblox.com, the pages threw an error.

How did we fix it?

The fix was actually less interesting than the circumstances of the bug. We left Yaman100’s account as it was for a short while so we had a test case. We implemented a quick check to determine, going forward, whether any hashed numbers are set to the smallest-possible 32-bit number. If so, we change them to something that’s… Well, not the smallest-possible 32-bit number.
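A guard of this kind might look like the sketch below. It is an illustration under assumptions, not the code we shipped: the function name `safe_abs_hash` is hypothetical, and substituting the largest 32-bit number is an arbitrary choice made here, since any value other than the smallest one would do.

```python
INT32_MIN = -2**31      # the one hash value with no 32-bit absolute value
INT32_MAX = 2**31 - 1


def safe_abs_hash(hash_code):
    """Return a non-negative value that always fits in a signed 32-bit integer.

    Assumes hash_code is already a valid signed 32-bit value.
    """
    if hash_code == INT32_MIN:
        # Assumed replacement: swap in some other value before taking abs().
        hash_code = INT32_MAX
    return abs(hash_code)


patched = safe_abs_hash(INT32_MIN)   # the formerly problematic input
normal = safe_abs_hash(-12345)       # other inputs behave as before
```

The trade-off is that two distinct hash values now map to the same number, which is harmless for a hash table since it already has to handle collisions.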

Yaman100 is the only ROBLOX user who has caused – and ever will cause – this bug to rear its head on ROBLOX. He inadvertently helped us squash an ASP.NET bug with slim chances of occurring and gave us an interesting story. For that, he goes down as a small part of ROBLOX history.

Likelihood

To put this all in perspective, before our fix, you were more likely to get struck by lightning (one in a million in a given year) or experience a plane crash (one in 11 million) than cause this bug yourself. But ROBLOX deals with such a wealth of data that the bug, despite its rarity, was statistically certain to show up one day.
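To see why a one-in-four-billion event becomes likely at scale, here is a rough back-of-the-envelope sketch. It assumes every cache key hashes uniformly and independently across all 2^32 possible values, which is an idealization and not a measured property of the real hash function; the function name is likewise hypothetical. Under that assumption, the chance that at least one of n keys lands on the smallest 32-bit number is 1 - (1 - 1/2^32)^n.

```python
import math

P_SINGLE = 1 / 2**32   # chance that one key hashes to the smallest 32-bit number


def chance_of_at_least_one_hit(key_count):
    """Probability that at least one of key_count keys hits the bad hash value.

    Computes 1 - (1 - P_SINGLE) ** key_count using log1p/expm1, which stays
    accurate when P_SINGLE is tiny.
    """
    return -math.expm1(key_count * math.log1p(-P_SINGLE))


# With as many keys as there are 32-bit values, the chance is roughly 1 - 1/e.
at_four_billion = chance_of_at_least_one_hit(2**32)

# Ten times as many keys makes a hit close to inevitable.
at_forty_billion = chance_of_at_least_one_hit(10 * 2**32)
```

For a single key the chance is negligible, but it climbs to about 63% once the number of distinct keys reaches 2^32 and keeps approaching 100% as more keys are hashed.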