tHe puZZlingG wOrlD: facebook

Showing posts with label facebook. Show all posts

Sunday, January 29, 2012

Regular Expression Matching

Implement regular expression matching with support for ‘.’ and ‘*’.

‘.’ Matches any single character.
‘*’ Matches zero or more of the preceding element. The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
Some examples:
isMatch(“aa”,”a”) → false
isMatch(“aa”,”aa”) → true
isMatch(“aaa”,”aa”) → false
isMatch(“aa”, “a*”) → true
isMatch(“aa”, “.*”) → true
isMatch(“ab”, “.*”) → true
isMatch(“aab”, “c*a*b”) → true

Online Judge
This problem is available at Online Judge. Head over there and it will judge your solution. Currently only able to compile C++/Java code. If you are using other languages, you can still verify your solution by looking at the judge’s test cases and its expected output.
Background:
This interesting problem has been asked in Facebook and Google interviews. It might seem deceptively easy even you know the general idea, but programming it correctly with all the details require careful thought.
Edit:
It seems that some readers are confused about why the regex pattern “.*” matches the string “ab”. “.*” means repeat the preceding element 0 or more times. Here, the “preceding” element is the dot character in the pattern, which can match any characters. Therefore, the regex pattern “.*” allows the dot to be repeated any number of times, which matches any string (even an empty string).
Hints:
Think carefully how you would do matching of ‘*’. Please note that ‘*’ in regular expression is different from wildcard matching, as we match the previous character 0 or more times. But, how many times? If you are stuck, recursion is your friend.
This problem is a tricky one. Due to the huge number of edge cases, many people would write lengthy code and have numerous bugs on their first try. Try your best getting your code correct first, then refactor mercilessly to as clean and concise as possible!

A sample diagram of a deterministic finite state automata (DFA). DFAs are useful for doing lexical analysis and pattern matching. An example is UNIX’s grep tool. Please note that this post does not attempt to descibe a solution using DFA.

Solution:
This looks just like a straight forward string matching, isn’t it? Couldn’t we just match the pattern and the input string character by character? The question is, how to match a ‘*’?
A natural way is to use a greedy approach; that is, we attempt to match the previous character as many as we can. Does this work? Let us look at some examples.
s = “abbbc”, p = “ab*c”
Assume we have matched the first ‘a’ on both s and p. When we see “b*” in p, we skip all b’s in s. Since the last ‘c’ matches on both side, they both match.
s = “ac”, p = “ab*c”
After the first ‘a’, we see that there is no b’s to skip for “b*”. We match the last ‘c’ on both side and conclude that they both match.
It seems that being greedy is good. But how about this case?
s = “abbc”, p = “ab*bbc”
When we see “b*” in p, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.
One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is smaller or equal to the number of consecutive b’s after “b*” in p, we conclude they both match and continue from there. For the opposite, we conclude there is not a match.
This seem to solve the above problem, but how about this case:
s = “abcbcd”, p = “a.*c.*d”
Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p matches the first or second ‘c’ in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.
We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.
The recursion mainly breaks down elegantly to the following two cases:

If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.
If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.

You would need to consider the base case carefully too. That would be left as an exercise to the reader.

Below is the extremely concise code (Excluding comments and asserts, it’s about 10 lines of code).

bool isMatch(const char *s, const char *p) {
  assert(s && p);
  if (*p == '\0') return *s == '\0';
 
  // next char is not '*': must match current character
  if (*(p+1) != '*') {
    assert(*p != '*');
    return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);
  }
  // next char is '*'
  while ((*p == *s) || (*p == '.' && *s != '\0')) {
    if (isMatch(s, p+2)) return true;
    s++;
  }
  return isMatch(s, p+2);
}

Further Thoughts:
Some extra exercises to this problem:

If you think carefully, you can exploit some cases that the above code runs in exponential complexity. Could you think of some examples? How would you make the above code more efficient?
Try to implement partial matching instead of full matching. In addition, add ‘^’ and ‘$’ to the rule. ‘^’ matches the starting position within the string, while ‘$’ matches the ending position of the string.
Try to implement wildcard matching where ‘*’ means any sequence of zero or more characters.

For the interested reader, real world regular expression matching (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read this article.

Tuesday, January 3, 2012

Studious Student Problem Analysis

Source

You’ve been given a list of words to study and memorize. Being a diligent student of language and the arts, you’ve decided to not study them at all and instead make up pointless games based on them. One game you’ve come up with is to see how you can concatenate the words to generate the lexicographically lowest possible string.
Input
As input for playing this game you will receive a text file containing an integer N, the number of word sets you need to play your game against. This will be followed by N word sets, each starting with an integer M, the number of words in the set, followed by M words. All tokens in the input will be separated by some whitespace and, aside from N and M, will consist entirely of lowercase letters.

Output
Your submission should contain the lexicographically shortest strings for each corresponding word set, one per line and in order.

Constraints
1 <= N <= 100
1 <= M <= 9
1 <= all word lengths <= 10

Here is my problem analysis for Facebook Hacker Cup Qualification Round: Studious Student.
Studious Student Problem Analysis:
As I mentioned, this problem is not as straight forward as you think it might be. The first most natural way to approach this problem is sorting. Most people will reason that you can sort and concatenate each individual word together to form the lexicographically smallest string. This is incorrect, as illustrated in one of the sample inputs:

jibw ji jp bw jibw

By sorting and concatenate, the answer is:

bwjijibwjibwjp,

while the correct answer should be:

bwjibwjibwjijp.

Lexicographical order is also known as dictionary order, since it is how the words are ordered in the dictionary.

Notice in the above words, “ji” is the prefix of “jibw“. The “sort and concatenate” method definitely does not work when there is a case where a word is a prefix of one or more other words.
Well, one might try a naive way of doing a brute force. Although it is highly inefficient, it works for this problem. This is because each input would be at most 9 words, and it turns out there are only a total of 9! = 362880 possible permutations of words being concatenated together. Therefore, one can generate all possible permutations and find the answer.
We make an easy observation that if all words in the list are of equal length, then sort + concatenate must yield the smallest dictionary order. In fact, a better argument would be:

If no word appears to be a prefix of any other words, then the simple sort + concatenate must yield the smallest dictionary order string.

To solve this problem correctly, we must also handle the special case where a word appears as a prefix of other words. One efficient and easy (non-trivial to prove but easy to reason) solution for this problem is to re-define the order relation of two words, s1 and s2, as:

s1 is less than s2 iff (if and only if)

s1 + s2 < s2 + s1.

Then, by sorting and concatenating the words using the above ordering, it must yield the lexicographically smallest string. Why?
Here is a concrete example why this works. We use an example where the list of words are:

ji jibw jijibw

By the definition of s1 is less than s2 iff s1+s2 < s2+s1, we found that the lowest ordered word in the list is “jibw“. This is because “jibwji” < “jijibw” and “jibwjijibw” < “jijibwjibw“.
Now, the key to understand why the order relation s1+s2 < s2+s1 yields the smallest dictionary order is:

We have found the smallest-ordered word such that s1+x < x+s1. Therefore, it is impossible to swap the words to yield a smaller dictionary order.
For a case with more words, then this order relation holds: s1+x < x+s1, and x+y < y+x. As swapping at any point could not possibly yield a smaller dictionary order, therefore s1+x+y must yield the smallest dictionary order.
This result can be generalized to all M words by induction, due to the transitive property mentioned above.

I do not claim that this is a rigid or even a correctly constructed proof. However, it does help me in convincing myself this is a valid solution.

bool compareSort(const string &s1, const string &s2) {
  return s1 + s2 < s2 + s1;
}
 
int main() {
  string words[10];
  int N, M;
  cin >> N;
 
  for (int i = 0; i < N; i++) {
    cin >> M;
 
    for (int j = 0; j < M; j++)
      cin >> words[j];
 
    sort(words, words+M, compareSort);
 
    for (int j = 0; j < M; j++)
      cout << words[j];
    cout << endl;
  }
}
 

Friday, December 23, 2011

Finding all unique triplets that sums to zero

Source

Given a set S of n integers, are there elements a, b, c in S such that a + b + c = 0? Find all unique triplets in the set which gives the sum of zero.
For example, given set S = {-1 0 1 2 -1 -4},
One possible solution set is:
(-1, 0, 1)
(-1, 2, -1)
Note that (0, 1, -1) is not part of the solution above, because (0, 1, -1) is the duplicate of (-1, 0, 1). Same with (-1, -1, 2).
For a set S, there is probably no “the” solution, another solution could be:
(0, 1, -1)
(2, -1, -1)

This is also known as the 3sum problem. The 3sum problem is the extension of the problem below:

Given a set S of n integers, find all pairs of integers of a and b in S such that a + b = k?

The above problem can be solved in O(n) time, assuming that the set S is already sorted. By using two index first and last, each pointing to the first and last element, we look at the element pointed by first, which we call A. We know that we need to find B = k – A, the complement of A. If the element pointed by last is less than B, we know that the choice is to increment pointer first by one step. Similarly, if the element pointed by last is greater than B, we decrement pointer last by one step. We are progressively refining the sum step by step. Since each step we move a pointer one step, there are at most n steps, which gives the complexity of O(n).
By incorporating the solution above, we can solve the 3sum problem in O(n^2) time, which is a straight forward extension.

set<vector<int> > find_triplets(vector<int> arr) {
  sort(arr.begin(), arr.end());
  set<vector<int> > triplets;
  vector<int> triplet(3);
  int n = arr.size();
  for (int i = 0;i < n; i++) {
    int j = i + 1;
    int k = n - 1;
    while (j < k) {
      int sum_two = arr[i] + arr[j];
      if (sum_two + arr[k] < 0) {
        j++;
      } else if (sum_two + arr[k] > 0) {
        k--;
      } else {
        triplet[0] = arr[i];
        triplet[1] = arr[j];
        triplet[2] = arr[k];
        triplets.insert(triplet);
        j++;
        k--;
      }
    }
  }
  return triplets;
}

Note that a set is chosen to store the triplets, because we are only interested in unique triplets. Since the set S is already sorted, and we don’t look back as it progresses forward, we can guarantee there will be no duplicate triplets (Even though the set might have duplicate elements.)

Thursday, April 7, 2011

Hacking a Google interview (From MIT)

Source

» Hacking a Google Interview Course Website

Directly from MIT’s course website,

Learn the tricks. Beat the system.
Ever wanted to work at a company like Google, Apple, or Facebook? There’s just one thing standing in your way: the interview. But there’s no need to fear. We’ve mastered the interview questions and topics, and we want to show you how you can nail every programming question. Whether you’re a beginning programmer or a seasoned expert, this class is for you.

There are a total 5 handouts available for download, with the first few handouts discussing basic data structures and common interview questions with complete solutions. 5 stars and highly recommended!

Monday, March 14, 2011

Print All Combinations of a Number as a Sum of Candidate Numbers

Source

Problem

Given a target number, and a series of candidate numbers, print out all combinations, so that the sum of candidate numbers equals to the target.

Here order is not important, so don’t print the duplicated combination.

e.g. target is 7, candidate is 2,3,6,7
output should be 7 and 3+2+2 (but not print 2+3+2, 2+2+3

My Attempt

void Sum(int arr[], int reminder, int index, int visited[]) {

if (index LessThen 0) { return }

if (difference == 0) { printSum(arr, visited) , return; }

if (reminder - arr[index] LessThen 0) { return }

visited[index]++;

Sum(arr, reminder - arr[index] , index, visited) // duplicate number

Sum(arr, reminder - arr[index], index - 1, visited) // next number

visted[index]--;

Sum(arr, reminder, index - 1, visited) // skip

}

Find all possible combination of numbers using a pre-defined candidate set.
Each number from the candidate set may be used only once in the combination.

For example,

Candidate Set = {10, 1, 2, 7, 6, 1, 5}
Target Number = 8

One possible output could be:
1+1+6
1+2+5
1+7
2+6

e.g. target is 7, candidate is 2,3,6,7
output should be 7 and 3+2+2 (but not print 2+3+2, 2+2+3e.g. target is 7, candidate is 2,3,6,7

output should be 7 and 3+2+2 (but not print 2+3+2, 2+2+3e.g. target is 7, candidate is 2,3,6,7

output should be 7 and 3+2+2 (but not print 2+3+2, 2+2+3

// repeat the current index for possible duplicates

}

Saturday, March 12, 2011

Sliding Window Maximum

Source

Problem

A long array A[] is given to you. There is a sliding window of size w which is moving from the very left of the array to the very right. You can only see the w numbers in the window. Each time the sliding window moves rightwards by one position. Following is an example:
The array is [1 3 -1 -3 5 3 6 7], and w is 3.

Window position                Max ---------------               ----- [1  3  -1] -3  5  3  6  7       3  1 [3  -1  -3] 5  3  6  7       3  1  3 [-1  -3  5] 3  6  7       5  1  3  -1 [-3  5  3] 6  7       5  1  3  -1  -3 [5  3  6] 7       6  1  3  -1  -3  5 [3  6  7]      7

Input: A long array A[], and a window width w
Output: An array B[], B[i] is the maximum value of from A[i] to A[i+w-1]
Requirement: Find a good optimal way to get B[i]