I was recently involved with a project which had 4 nodes running PHP code and a sharded MySQL database in the backend. The PHP nodes create data objects in a distributed manner and then persist them to the database shards.
Now, each object needs a unique identifier (key). MySQL auto-increment Ids could not be used here since two database shards could pick the same Ids resulting in a collision. We needed an efficient scheme for simultaneously generating unique identifiers on multiple hosts. A few additional constraints apply:
Low Collision Probability:
- The UUIDs should offer strong guarantees of uniqueness. Even if two nodes create UUIDs at the same instant of time, there would be a very, very low probability of a collision.
Scalable:
- There would be no central node / authority to generate them. A central node would be a single point of failure. It would also be a bottleneck as our cluster grows.
- The UUIDs should be generated in a distributed manner. Any web-layer node can generate the UUID locally and this should be efficient to generate.
Length of the UUID:
- We were not trying to achieve a theoretical uniqueness here. Yet we were looking at 20-50 million unique objects in our database.
- An 18-20 digit UUID seemed reasonable.
Not Cryptographically Secure:
- We were not particularly looking for something which is hard to predict or something cryptographically secure. So if the Ids were monotonically increasing or somewhat easier to predict, it was acceptable.
Non-strict Timestamp Synchronization:
- Nodes which generate UUIDs might not be strictly synchronized in time – however they would be within 5-10 seconds of each other.
The PHP Solution
PHP supports the uniqid() method (since PHP 4), which generates a unique id based on the current microsecond timestamp of the local system.
1
2
3
4
| for($i=0; $i<10; $i++){
echo uniqid();
echo "<br/>";
} |
It returns a 13 character hex string – monotonically increasing timestamp values:
4aa5c7f2c2d33
4aa5c7f2c2d4a
4aa5c7f2c2d4e
4aa5c7f2c2d51…
In a scenario where multiple nodes simultaneously generate the unique Id, there is a small probability of a collision. We add a 4 digit random prefix, to reduce the collision probability further.
1
2
3
4
5
6
7
| // Generate a random prefix.
$rand = md5(rand());
$randomPrefix = substr($rand, 0, 4);
// Generate a UniqId having the random prefix.
// (4 digit prefix + 13 digit Uniq) is good enough for us.
$uniq = uniqid($randomPrefix); |
Modulo and Shard Numbers
In our scheme, the shard number of an object was determined based on the modulus of the unique Id. Thus:
Shard Number = UniqId % TOTAL_SHARDS
For very large numbers the PHP mod (%) operator results in an overflow. So we consider only the 5 LSB digits for computing the modulo value.
1
2
3
4
5
6
7
8
9
10
11
12
| function getModulo($uniqId) {
// Take 5 LSB digits only.
$uuidLsb = substr($uniqId, -5);
// Convert Hex string to Int. Mod only works on Ints.
$intUuidLsb = (int) hexdec($uuidLsb);
// Compute Mod
$mod = $intUuidLsb % TOTAL_SHARDS;
return $mod;
} |
UUID Generation – Final Cut
We extend the UUID scheme a bit further in order to help us quickly determine the shard number from a given UUID string. We prefix the shard number (mod value) to every UUID. The final version of our scheme looks like this:
- Generate random number.
- Generate UniqId with the random number prefix.
- Generate UUID = mod(UniqId) + UniqId
Here is the final version of our code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
| public function generateUUID() {
// Generate a random prefix.
$rand = md5(rand());
$randomPrefix = substr($rand, 0, 4);
// Generate a UniqId having the random prefix.
// (4 digit prefix + 13 digit Uniq) is good enough for us.
$uniq = uniqid($randomPrefix);
// Compute modulo.
$modVal = $this->getModulo($uniq);
// UUID = (1 digit mod val + 4 digit prefix + 13 digit Uniq).
$uuid = $modVal.$uniq;
return $uuid;
}
private function getModulo($uuid) {
// Take 5 LSB digits only for the modulo computation.
$uuidLsb = substr($uuid, -5);
// Convert Hex string to Int. Mod only works on Ints.
$intUuidLsb = (int) hexdec($uuidLsb);
// Compute Mod
$mod = $intUuidLsb % TOTAL_SHARDS;
return $mod;
} |
Please Note that this scheme is not compliant with the UUID RFC 4122, but should suffice for web applications in general.