"CPU Heat Sink" by Michael Dziedzic is licenced under Unsplash License
This kata focuses on practicing TDD with algorithms.
TDD does not 'magically' generate non-trivial algorithms.
TDD can help in a different way though. You can use it to validate that your algorithm works, and then use the tests to refactor your algorithm.
Here is the process (roughly):
- Think about the algorithm that will solve your problem.
- Think of an order of tests that will allow you to build this algorithm incrementally.
- Use TDD to write a crude algorithm that solves your problem, and that can later be re-factored into our target algorithm. This usually involves laying down the main blocks we will need, while using very low performance algorithms as much as possible.
- With the tests in place, refactor the algorithm to what we want.
We have a cluster of machines that each holds a very large list of integers. We want to get the median of all these numbers, but we cannot bring all the numbers on a single machine.
The central question is: how to compute the median value of a list of lists without concatenating all the lists? (A Median value does not need to be an element of the list, and can be a float value even if the lists only contain integers).
The definition of a median is that there are as many values that are greater as values that are smaller.
A simple way to find the median, is to search for a value that fits the definition by asking for the count of greater and smaller values to all lists, and then summing.
Here is a starting skeleton of the algorithm, that leaves place for incremental optimizations:
List<List<int>> values = ...
isMedian = function(n) {
totalCountSmaller = sum(countSmaller(n, values[0]), countSmaller(n, values[1]), countSmaller(n, values[2]), ...)
totalCountGreater = sum(countGreater(n, values[0]), countGreater(n, values[1]), countGreater(n, values[2]), ...)
return totalCountSmaller == totalCountGreater
}
maxValue = max(max(values[0]), max(values[1]), max(values[2]), ...)
minValue = min(min(values[0]), min(values[1]), min(values[2]), ...)
return findByPredicateBetween(minValue, maxValue, isMedian)
At first, we can implement all the bricks (min, max, findByPredicateBetween, countSmaller, countGreater) in a very basic brute force way.
The current solution is working, but it is slow! Your task now is to reach a faster solution! Therefore, do not attempt this step before completing part 1 (with a greedy solution).
For this, we will be using two techniques:
- Big(O) analysis of each function
- Code profiling: For this technique, you rely on tools to help you analyze the space and time complexity of your program. Refer to the below pages for more details:
Here are the steps we suggest you follow:
- Introduce a benchmark test
- Do a big(O) analysis. For each function, compute the Big(O) (comment the result at the top of the function)
- Run the profiler and compare the result to the results you got from the previous step.
- Based on the analysis you've done in previous steps, think of optimizations you can do
- Apply the optimizations
- Re-run the profiler and analyze the output. Was there any improvement in time?
- Plan further possible optimization (if any)
- Repeat steps 4-7
- Use a dichotomous search
- Sort the lists to speed up max and min
- Use a binary search for countSmaller and countGreater
- Compute countSmaller and countGreater in a single call (remember the lists are supposed to be on a different machine)
- Parallelize the computations happening on the different lists (BONUS: how to abstract this to keep the unit tests deterministic and single threaded?)
You can fill it from here
- TCR (Test && Commit || Revert) wrapper utility
- Collaborative timer for pairing or mobbing: mobti.me or agility timer
- 2-hour Randori in Pairs
- TDD on algorithms
- Acceptance tests
- Greedy or not?
- ⚠ Premature optimization
Kata-MedianListOfLists
and the accompanying materials are made available
under the terms of the MIT License which accompanies this
distribution, and is available at the Open Source site
See ACKNOWLEDGEMENTS.md for more information.
Damien Menanteau |
Ahmad Atwi |
Philippe Bourgau |
AntoineMx |