Tuesday, January 3, 2012

Studious Student Problem Analysis

Source
You’ve been given a list of words to study and memorize. Being a diligent student of language and the arts, you’ve decided to not study them at all and instead make up pointless games based on them. One game you’ve come up with is to see how you can concatenate the words to generate the lexicographically lowest possible string.
Input
As input for playing this game you will receive a text file containing an integer N, the number of word sets you need to play your game against. This will be followed by N word sets, each starting with an integer M, the number of words in the set, followed by M words. All tokens in the input will be separated by some whitespace and, aside from N and M, will consist entirely of lowercase letters.

Output

Your submission should contain the lexicographically shortest strings for each corresponding word set, one per line and in order.

Constraints

1 <= N <= 100
1 <= M <= 9
1 <= all word lengths <= 10
Here is my problem analysis for Facebook Hacker Cup Qualification Round: Studious Student.
Studious Student Problem Analysis:
As I mentioned, this problem is not as straight forward as you think it might be. The first most natural way to approach this problem is sorting. Most people will reason that you can sort and concatenate each individual word together to form the lexicographically smallest string. This is incorrect, as illustrated in one of the sample inputs:
jibw ji jp bw jibw
By sorting and concatenate, the answer is:
bwjijibwjibwjp,
while the correct answer should be:
bwjibwjibwjijp.

Lexicographical order is also known as dictionary order, since it is how the words are ordered in the dictionary.
Notice in the above words, “ji” is the prefix of “jibw“. The “sort and concatenate” method definitely does not work when there is a case where a word is a prefix of one or more other words.
Well, one might try a naive way of doing a brute force. Although it is highly inefficient, it works for this problem. This is because each input would be at most 9 words, and it turns out there are only a total of 9! = 362880 possible permutations of words being concatenated together. Therefore, one can generate all possible permutations and find the answer.
We make an easy observation that if all words in the list are of equal length, then sort + concatenate must yield the smallest dictionary order. In fact, a better argument would be:
If no word appears to be a prefix of any other words, then the simple sort + concatenate must yield the smallest dictionary order string.
To solve this problem correctly, we must also handle the special case where a word appears as a prefix of other words. One efficient and easy (non-trivial to prove but easy to reason) solution for this problem is to re-define the order relation of two words, s1 and s2, as:
    s1 is less than s2   iff (if and only if)

   
s1 + s2 < s2 + s1.
Then, by sorting and concatenating the words using the above ordering, it must yield the lexicographically smallest string. Why?
Here is a concrete example why this works. We use an example where the list of words are:
ji jibw jijibw
By the definition of s1 is less than s2 iff s1+s2 < s2+s1, we found that the lowest ordered word in the list is “jibw“. This is because “jibwji” < “jijibw” and “jibwjijibw” < “jijibwjibw“.
Now, the key to understand why the order relation s1+s2 < s2+s1 yields the smallest dictionary order is:
  • We have found the smallest-ordered word such that s1+x < x+s1. Therefore, it is impossible to swap the words to yield a smaller dictionary order. 
  • For a case with more words, then this order relation holds: s1+x < x+s1, and x+y < y+x. As swapping at any point could not possibly yield a smaller dictionary order, therefore s1+x+y must yield the smallest dictionary order. 
  • This result can be generalized to all M words by induction, due to the transitive property mentioned above.
I do not claim that this is a rigid or even a correctly constructed proof. However, it does help me in convincing myself this is a valid solution.
bool compareSort(const string &s1, const string &s2) {
  return s1 + s2 < s2 + s1;
}
 
int main() {
  string words[10];
  int N, M;
  cin >> N;
 
  for (int i = 0; i < N; i++) {
    cin >> M;
 
    for (int j = 0; j < M; j++)
      cin >> words[j];
 
    sort(words, words+M, compareSort);
 
    for (int j = 0; j < M; j++)
      cout << words[j];
    cout << endl;
  }
}
 

No comments:

Post a Comment