Tuesday, March 8, 2011

CUrioUS CaSe of dUpliCate variableS

source

Puzzle: An array of length n+1 is populated with integers in the range [1, n]. Find a duplicate integer (just one, not all) in linear time with O(1) space. The array is read-only and may not be modified. Variation: what if the array may be written into but must be left unmodified by the algorithm?

Source: Heard from a fellow Googler sometimes in 2007.

Solution: (by Ovidiu Gheorghioiu at Google) Treat the values of the array as pointers into the same array. Let us call a chain of pointers a “linked list”. The linked list starting with the last entry must have the shape of a “lollipop” with a stick and a cycle. Our goal is to identify that node in the linked list that joins the stick to the cycle. We do so in two phases:

Phase I: maintain two pointers, one ‘slow’ and one ‘fast’. In one round, the slow pointer advances one step and the fast pointer advances two steps along the linked list — both pointers advance simultaneously. Phase I comes to an end when the two pointers become equal at the end of a round.

Phase II: maintain two pointers, both ‘slow’. The first slow pointer starts where Phase I ended. The second slow pointer starts at the beginning of the linked list. Phase II ends when these two pointers become equal at the end of a round. This node, magically, is the duplicate (in other words, that node in the linked list where the stick joins the cycle)!

Proof: Let s and c denote the lengths of the stick and the cycle portions of the lollipop. If the slow pointer in Phase I ends in s + x rounds, then 2(s+x) = s+x + mc, where m is a positive integer. In other words, s+x = mc. So in Phase II, by the time the second slow pointer (which starts at the node where Phase I ended) traverses s nodes, the first slow pointer traverses mc – x nodes. So both slow pointers become equal at that node where the stick attaches to the cycle.

Alternative Solution: (by Sreeram Ramachandran at Google) Phase I is the same as above. In Phase II, we compute c, the length of the cycle, by advancing the slow pointer until it reaches the same node where Phase I ended. In Phase III, we have two slow pointers, both at the start of the list. We first advance the second slow pointer by c steps, then advance both pointers in tandem until they collide. The collision is guaranteed to be at the node where the stick portion of the lollipop is attached to the cycle.

Another Solution: If the array may be modified, we start reversing the linked list. We shall reach the head of the linked list in 2s + c steps. Reverse the list again so that the cycle is in the correct orientation. During the second reversal, when s + c/2 steps have been taken, start a second “read-only” ‘backward pointer’ that travels the cycle in the backwards direction, in tandem with the ‘forward pointer’ that is busy reversing the list. The forward and the backward pointer shall collide at the node where the stick attaches to the cycle.

My Analysis: I fail to understand the Proof. However the alternate solution is more intuitive for me to understand. The total length of the linked list is S + C. By end of phase 2 we have the value C. If we walk total S + C steps we will reach the intersection (duplicate element). Hence in phase 3 we use this property and take a head start of C steps for pointer 1. We know when the pointers meet it will point of intersection (duplicate element) since pointer 2 has already consumed C out of the S + C length and pointer 1 will take S steps to reach point of intersection.


No comments:

Post a Comment