<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Roshan Kulkarni &#187; webapps</title>
	<atom:link href="http://roshankulkarni.info/tag/webapps/feed/" rel="self" type="application/rss+xml" />
	<link>http://roshankulkarni.info</link>
	<description>On Technology, Web and Consulting</description>
	<lastBuildDate>Sun, 30 May 2010 18:28:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Generating Unique Ids in PHP &#8211; A scheme for UUIDs generation in a distributed application</title>
		<link>http://roshankulkarni.info/2009/09/generating-uuids-in-php-a-scalable-scheme-for-generating-unique-identifiers-in-a-distributed-php-application/</link>
		<comments>http://roshankulkarni.info/2009/09/generating-uuids-in-php-a-scalable-scheme-for-generating-unique-identifiers-in-a-distributed-php-application/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 02:42:37 +0000</pubDate>
		<dc:creator>roshan</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalable Applications]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[webapps]]></category>

		<guid isPermaLink="false">http://roshankulkarni.info/?p=3</guid>
		<description><![CDATA[I was recently involved with a project which had 4 nodes running PHP code and a sharded MySQL database in the backend. The PHP nodes create data objects in a distributed manner and then persist them to the database shards. Now, each object needs a unique identifier (key). MySQL auto-increment Ids could not be used [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently involved in a project that had 4 nodes running PHP code and a sharded MySQL database on the back end. The PHP nodes create data objects in a distributed manner and then persist them to the database shards.</p>
<p>Each object needs a unique identifier (key). MySQL auto-increment IDs could not be used here, since two database shards could independently pick the same ID, resulting in a collision. We needed an efficient scheme for generating unique identifiers simultaneously on multiple hosts. A few additional constraints applied:</p>
<p><strong>Low Collision Probability:</strong></p>
<ul>
<li> The UUIDs should offer strong guarantees of uniqueness. Even if two nodes create UUIDs at the same instant, the probability of a collision should be very low.</li>
</ul>
<p><strong>Scalable:</strong></p>
<ul>
<li> There should be no central node or authority that generates them. A central node would be a single point of failure, and it would become a bottleneck as our cluster grows.</li>
<li> The UUIDs should be generated in a distributed manner: any web-layer node should be able to generate a UUID locally and cheaply.</li>
</ul>
<p><strong>Length of the UUID:</strong></p>
<ul>
<li> We were not aiming for theoretical uniqueness here, but we did expect 20-50 million unique objects in our database.</li>
<li> An 18-20 character UUID seemed reasonable.</li>
</ul>
<p><strong>Not Cryptographically Secure:</strong></p>
<ul>
<li> We did not need IDs that are hard to predict or cryptographically secure, so it was acceptable if the IDs were monotonically increasing or fairly easy to predict.</li>
</ul>
<p><strong>Non-strict Timestamp Synchronization:</strong></p>
<ul>
<li> Nodes that generate UUIDs might not have strictly synchronized clocks; however, they would be within 5-10 seconds of each other.</li>
</ul>
<h2>The PHP Solution</h2>
<p>PHP provides the <a href="http://in.php.net/uniqid">uniqid()</a> function (available since PHP 4), which generates an ID from the local system's current time in microseconds.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">&lt;</span><span style="color: #cc66cc;">10</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">echo</span> <span style="color: #990000;">uniqid</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;br/&gt;&quot;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>It returns a 13-character hex string derived from the current timestamp, so successive values generated on one host increase monotonically:<br />
4aa5c7f2c2d33<br />
4aa5c7f2c2d4a<br />
4aa5c7f2c2d4e<br />
4aa5c7f2c2d51&#8230;</p>
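<p>The 13 characters are two fixed-width hex fields: 8 digits of Unix time in seconds followed by 5 digits of microseconds. The sketch below splits an ID back into those two fields to make the timestamp structure visible. decodeUniqid() is a hypothetical helper, not a PHP built-in, and it assumes an ID generated without a prefix or the more_entropy option.</p>

```php
<?php
// Hypothetical helper: split a plain 13-char uniqid() value into its two hex fields.
// Assumes no prefix and no more_entropy suffix.
function decodeUniqid(string $id): array {
    $seconds = hexdec(substr($id, 0, 8)); // first 8 hex digits: Unix time in seconds
    $micros  = hexdec(substr($id, 8, 5)); // last 5 hex digits: microseconds (0..999999)
    return [$seconds, $micros];
}

[$sec, $usec] = decodeUniqid('4aa5c7f2c2d33');
echo gmdate('Y-m-d H:i:s', $sec), sprintf('.%06d', $usec), " UTC\n";
```

<p>For the first sample above this prints 2009-09-08 02:56:50.798003 UTC. Later calls carry later timestamps, which is why the values increase.</p>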
<p>When multiple nodes generate IDs simultaneously, two of them can read the same microsecond timestamp and produce the same value, so there is a small probability of a collision. To reduce it further, we add a random 4-hex-character prefix:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Generate a random prefix.</span>
<span style="color: #000088;">$rand</span> <span style="color: #339933;">=</span> <span style="color: #990000;">md5</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">rand</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$randomPrefix</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rand</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">4</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Generate a UniqId having the random prefix.</span>
<span style="color: #666666; font-style: italic;">// (4 hex-char prefix + 13 hex-char uniqid = 17 chars) is good enough for us.</span>
<span style="color: #000088;">$uniq</span> <span style="color: #339933;">=</span> <span style="color: #990000;">uniqid</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$randomPrefix</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>
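<p>To put a rough number on the benefit of the prefix: a collision now requires two IDs with the same uniqid() part and also the same prefix. Assuming the 4-hex-character prefixes are independent and uniformly distributed (65,536 possible values), the birthday-problem estimate below gives the probability that at least two of k IDs sharing one timestamp also share a prefix. The function name is illustrative, and the independence assumption would not hold if, say, several processes started with the same random seed.</p>

```php
<?php
// Probability that at least two of $k IDs generated with the same uniqid()
// timestamp also draw the same random prefix (birthday problem).
// Assumes independent, uniformly distributed prefixes.
function prefixCollisionProbability(int $k, int $prefixSpace = 65536): float {
    $pAllDistinct = 1.0;
    for ($i = 1; $i < $k; $i++) {
        // The (i+1)-th prefix must avoid the i prefixes already drawn.
        $pAllDistinct *= 1 - $i / $prefixSpace;
    }
    return 1 - $pAllDistinct;
}

// Example: all 4 web nodes happen to read the same microsecond.
printf("%.2e\n", prefixCollisionProbability(4));
```

<p>With four nodes this comes to about 0.009%, and it only applies in the already rare case where the timestamps match.</p>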

<h2>Modulo and Shard Numbers</h2>
<p>In our scheme, an object's shard number was determined by taking its unique ID modulo the total number of shards:</p>
<blockquote><p>Shard Number = UniqId % TOTAL_SHARDS</p></blockquote>
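<p>The ID is a hex string, so it has to be converted to an integer before PHP's mod (%) operator can be applied. However, the full 17-character value needs up to 68 bits, more than a PHP integer can hold, so hexdec() would return an imprecise float. We therefore use only the 5 least significant hex digits when computing the modulo value.</p>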
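<p>The following sketch illustrates the problem with a full 17-character ID. The prefix a3f2 is made up for the example; the remaining 13 characters are the first sample uniqid shown earlier.</p>

```php
<?php
// A full 17-char ID: hypothetical 4-char prefix + 13-char uniqid.
$fullId = 'a3f2' . '4aa5c7f2c2d33';

// 17 hex digits can need up to 68 bits, more than a 64-bit PHP int holds,
// so hexdec() falls back to a float and the low-order digits lose precision.
$full = hexdec($fullId);
var_dump(is_float($full));

// The last 5 hex digits are at most 0xFFFFF (1048575) and always fit in an int.
$lsb = hexdec(substr($fullId, -5));
var_dump($lsb);
```

<p>The first var_dump() reports bool(true), and the second reports int(798003), a value the % operator can work with directly.</p>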

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">  <span style="color: #000000; font-weight: bold;">function</span> getModulo<span style="color: #009900;">&#40;</span><span style="color: #000088;">$uniqId</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Take the 5 least significant hex digits only.</span>
    <span style="color: #000088;">$uuidLsb</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$uniqId</span><span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #cc66cc;">5</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Convert Hex string to Int. Mod only works on Ints.</span>
    <span style="color: #000088;">$intUuidLsb</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span> <span style="color: #990000;">hexdec</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$uuidLsb</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Compute Mod</span>
    <span style="color: #000088;">$mod</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$intUuidLsb</span> <span style="color: #339933;">%</span> TOTAL_SHARDS<span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000088;">$mod</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<h2>UUID Generation &#8211; Final Cut</h2>
<p>We extend the scheme a bit further so that the shard number can be determined quickly from a given UUID string: we prefix every UUID with its shard number (the mod value). The final version of our scheme looks like this:</p>
<ul>
<li>Generate a random 4-character prefix.</li>
<li>Generate a UniqId with that random prefix.</li>
<li>Build the UUID by prepending mod(UniqId) to the UniqId.</li>
</ul>
<p>Here is the final version of our code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> generateUUID<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Generate a random prefix. </span>
    <span style="color: #000088;">$rand</span> <span style="color: #339933;">=</span> <span style="color: #990000;">md5</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">rand</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$randomPrefix</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rand</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">4</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Generate a UniqId having the random prefix. </span>
    <span style="color: #666666; font-style: italic;">// (4 hex-char prefix + 13 hex-char uniqid = 17 chars) is good enough for us.</span>
    <span style="color: #000088;">$uniq</span> <span style="color: #339933;">=</span> <span style="color: #990000;">uniqid</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$randomPrefix</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Compute modulo.</span>
    <span style="color: #000088;">$modVal</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">getModulo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$uniq</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
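    <span style="color: #666666; font-style: italic;">// UUID = 1-digit mod value + 4-char prefix + 13-char uniqid (18 chars).</span>
    <span style="color: #666666; font-style: italic;">// A 1-digit mod value assumes TOTAL_SHARDS &lt;= 10.</span>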
    <span style="color: #000088;">$uuid</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$modVal</span><span style="color: #339933;">.</span><span style="color: #000088;">$uniq</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000088;">$uuid</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">function</span> getModulo<span style="color: #009900;">&#40;</span><span style="color: #000088;">$uniqId</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Take the 5 least significant hex digits only for the modulo computation.</span>
    <span style="color: #000088;">$uuidLsb</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$uniqId</span><span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #cc66cc;">5</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Convert Hex string to Int. Mod only works on Ints.</span>
    <span style="color: #000088;">$intUuidLsb</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span> <span style="color: #990000;">hexdec</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$uuidLsb</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #666666; font-style: italic;">// Compute Mod</span>
    <span style="color: #000088;">$mod</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$intUuidLsb</span> <span style="color: #339933;">%</span> TOTAL_SHARDS<span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000088;">$mod</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Note that this scheme is not compliant with the UUID specification, <a href="http://www.rfc-archive.org/getrfc.php?rfc=4122">RFC 4122</a>, but it should be sufficient for typical web applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://roshankulkarni.info/2009/09/generating-uuids-in-php-a-scalable-scheme-for-generating-unique-identifiers-in-a-distributed-php-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

