This week’s algorithm problem
Each week we are finding a new algorithm problem to present as a challenge to the class (and in a blog). Since my choice last week was way too big and had to be boiled down a lot to create the actual challenge, I backed off a little and went with a more interview-style algorithm problem this week.
So you have a bag, filled with all the numbers between 1 and 100. You take one at random, and throw it away without looking at it. How do you know which one is missing?
The “sorted” variation
There are many variations on the problem, and many ways to solve it. Let’s look first at the case where we assume there is exactly one number missing and these numbers are already in a sorted array.
Since the array is sorted, the most direct approach would be to go through the array starting at the beginning, check each number, and compare it to the previous number. If they differ by anything other than 1, you can assume that the number between them is the missing number. This seems like a naive, brute-force solution, but actually turns out to not be too bad. It has a worst-case of O(n), and a potentially much better best-case because you can usually break before traversing the entire loop.
The “unsorted” variation
The next version is a little harder — there is still exactly one number missing, but the numbers are now in random order.
Now, in order to use our iterative solution, we’d have to first sort the array, which automatically increases the time efficiency to O(n log n) since even the best sorting algorithms are really pretty slow.
It turns out there’s a better way using math. There is something called an arithmetic series that looks a whole lot like what we are working with here:
1, 2, 3, 4,…,100
There’s a formula to find the sum of an arithmetic series. The numbers don’t even have to count by one. As long as they count by a predictable amount, you can find the sum of the series. Here it is:
The sum of an arithmetic series (from Wikipedia)
Armed with this formula, we can find the sum of the complete series easily, and then subtract the actual sum of our incomplete series and we have our missing number!
missing = ( 100 * ( 1 + 100 ) / 2 ) - ( sum_of_incomplete_series )
This solution is nice and elegant. But there’s a little problem — the time complexity is always going to be O(n) due to the fact that in order to find the sum of the incomplete series we still need to iterate through the entire array. In fact, any solution to this problem will have a worst-case of O(n) — you must traverse the array at least once in order to find which one is missing.
One interesting thing I found while working on this problem is that for a sorted array, the naive solution will almost always be faster. This is because as long as the missing number is not the last number, you can break your iteration before you iterate through the entire loop. I confirmed this with benchmarks — sometimes it is much, much, faster.
For an unsorted array, the math way will always be better because then you don’t have to sort the array first! Using benchmarks, I found a small but consistent and significant improvement over sort-and-iterate.
More than one missing
What if there is more than one missing? If they are sorted, this really only requires a little adjustment for the iterative approach to work. You simply keep going until you have the number of missing numbers you need. If you don’t know how many are missing, you can get the array length and subtract that from the length of a “full” array. If the array is unsorted, we of course have to sort it first just like before.
What about doing this mathematically? Well, there’s a way! It’s not an easy way, but it’s a way. I’m not going to go into it here, but check out this stack overflow question if you are interested. As noted in this question, most non-math-majors would have some difficulty coming up with this on their own off the top of their heads. But it’s pretty cool anyway.
So that’s the “missing number” problem. I hope you found it enlightening.