- Many companies quiz you on these
- On the job, knowledge of algorithms beyond the basics is unnecessary
- It does tend to be a common staple of eng interviews
- Phone screen with recruiter
- Maybe two technical phone screens
- Often a coding test
- In-office interviews
- Often there's an obvious 'naive' solution
- I start by describing that, in case I crash and burn trying to think of something better
- I then start thinking about how the constraints of this particular problem might let me do better
- Talk through the problem so the interviewer sees what you're thinking
- These problems are expected to be tough; the interviewer will talk to you as you discuss the problem, so listen for hints
- Find an item in a list
- See solution
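- The solution itself isn't reproduced in these notes; a minimal Ruby sketch (hypothetical) of a straightforward linear scan might look like this:

```ruby
# Hypothetical sketch: scan the list from front to back until we find the target.
def find_item(items, target)
  items.each_with_index do |item, i|
    return i if item == target  # found it: return its index
  end
  nil  # not found
end
```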
- We'll be answering the question: How long does this solution take?
- Big-Oh expresses solution time as a function of problem complexity
- Problem complexity usually means the number of items
- Big-Oh is not about absolute solution time (5sec, 10min, 1hr)
- Big-Oh is about scalability: how the solution time scales as the problem instance gets harder
- What is the problem complexity a function of?
- The number of items in the list, which we call n
- As a function of n, how long does this algorithm take?
- In the worst case, we must consider every item in the list
- There are n items
- We say the time complexity is O(n), or linear
- We usually mostly care about the worst case time complexity
- Time complexity is not measured in wall-clock time, per se, but in the number of primitive operations
- Primitive operations are basic math (+, -, *, /), comparison, swapping
- We assume all primitive operations take constant time, which we call O(1)
- If our algorithm takes on the order of n constant-time operations, we call it O(n)
- We don't care about the exact number of operations, but about the scalability of a solution
- Not about fast or slow, so much as: will your algorithm explode when I throw big problems at it?
- For instance, O(n) time complexity means that if you double your problem size, your solution time will approximately double
- Asymptotic behavior is about behavior in the limit
- For "large enough" problem instances, an asymptotically superior solution will eventually be faster
- We throw away all constants
- If algo A always takes twice as long as algo B, they have the same asymptotic time complexity
- We throw away low order terms
- O(n^2 + n) = O(n^2)
- We care about the dominant behavior; for big enough problems, the lower order terms are negligible relative to higher order terms
- Ex: when n=100, n^2=10,000, so the n term adds only about 1% on top of the n^2 term
- What parameter expresses the problem size?
- What is the time complexity?
- Hint: time complexity normally comes from loops, since loops take longer as you iterate over more items
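- The find_min these questions refer to isn't reproduced in these notes; a hypothetical sketch of what such a function might look like:

```ruby
# Hypothetical sketch: one pass over the items, tracking the smallest seen so far.
def find_min(items)
  min = items.first
  items.each do |item|
    min = item if item < min  # constant work per item => O(n) overall
  end
  min
end
```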
- What parameter expresses the problem size?
- Still n; 2 is a fixed number
- How does this compare, by wall clock, to find_min?
- How does this compare, in Big-Oh, to find_min?
- Both loop over n items
- find_two_min does about twice as much work per item
- But more importantly, they both do a constant amount of work per item
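- Likewise, a hypothetical sketch of a find_two_min that does roughly twice the per-item work but is still O(n):

```ruby
# Hypothetical sketch: track the two smallest items in a single pass.
def find_two_min(items)
  min1, min2 = nil, nil  # invariant: min1 <= min2 (nil means "not set yet")
  items.each do |item|
    if min1.nil? || item < min1
      min2 = min1
      min1 = item
    elsif min2.nil? || item < min2
      min2 = item
    end
  end
  [min1, min2]
end
```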
- What parameters express the problem size?
- Both n, the number of elements, and k, the number of mins to find
- Outer loop: still n times
- How much work inside the loop?
- min_items.each_with_index.max hides a loop of size k
- Complexity: O(nk) time
- We would say this algorithm is linear in both n and k
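- A hypothetical sketch of the k-min version, showing where the hidden inner loop lives:

```ruby
# Hypothetical sketch: keep an array of the k smallest items seen so far.
def find_k_min(items, k)
  min_items = []
  items.each do |item|                        # outer loop: n iterations
    if min_items.length < k
      min_items << item
    else
      # min_items.each_with_index.max hides a loop of size k: it scans
      # min_items to find the largest of the current k minimums (and its index)
      max_val, max_idx = min_items.each_with_index.max
      min_items[max_idx] = item if item < max_val
    end
  end
  min_items
end
```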
- A doubly nested for loop, where each level considers all n elements
- Means O(n^2) time complexity
- As a practical matter, O(n^2) is not particularly scalable; it quickly becomes too slow as n grows
- Algorithms that, in the worst case, consider every pair of items are O(n^2)
- But there's a trick!
- Two unnested for loops
- How much work is done inside the for loops?
- Sneak peek: O(1) (amortized) work
- Total work: O(n)
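- The specific problem isn't reproduced in these notes; as an assumed illustration, a "does any pair sum to a target?" problem shows the pattern of trading a nested loop for hash lookups:

```ruby
# Hypothetical illustration only; the original problem from these notes isn't shown.

# Naive: doubly nested loop over every pair => O(n^2)
def pair_sums_to?(items, target)
  items.each_with_index do |a, i|
    items.each_with_index do |b, j|
      return true if i != j && a + b == target
    end
  end
  false
end

# The trick: two unnested loops; hash insertion/lookup is O(1) amortized => O(n)
def pair_sums_to_fast?(items, target)
  index_of = {}
  items.each_with_index { |item, i| index_of[item] = i }   # loop 1: build the hash
  items.each_with_index do |item, i|                       # loop 2: look up complements
    j = index_of[target - item]
    return true if j && j != i
  end
  false
end
```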
- Primitive operations are the most basic building blocks
- Data structures are the bigger building blocks
- Ex: vector, list, hash and tree maps, heap
- Those are all the data structures you need to know for interviews
- On the job it's even easier: heaps rarely come up
- Operations:
- index: find the _i_th item
- push_front: put an element on the front
- push_back: push an element to the back
- Vectors/lists have different time complexities for these operations
- You'll choose one over the other based on what you're going to do with it
- A list is a chain of nodes
- Each node holds an item, and possibly a reference to a next node
- You iterate through the list by starting from the head, or first node, and following the references
- index: worst-case O(n), since reaching the last element means visiting every node before it
- push_front: O(1), since you just create a new head node, and link it to the old head
- push_back: O(n), since it takes O(n) time just to traverse to the end before appending
- There's a list variant that holds a reference to both front and back of the list, in which case both push_front and push_back take O(1); index still is O(n) (why?)
- Question: how would you insert an item into the middle of a list?
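- A hypothetical sketch of a singly linked list, to make the node-chasing concrete (it also sketches the insert-in-the-middle idea):

```ruby
# Hypothetical sketch of a singly linked list.
class Node
  attr_accessor :item, :next_node

  def initialize(item, next_node = nil)
    @item = item
    @next_node = next_node
  end
end

class LinkedList
  def initialize
    @head = nil
  end

  # O(1): the new node simply points at the old head.
  def push_front(item)
    @head = Node.new(item, @head)
  end

  # O(n) worst case: follow next references from the head, one node at a time.
  def index(i)
    node = @head
    i.times { node = node && node.next_node }
    node && node.item
  end

  # Insert into the middle (assumes i is in bounds):
  # O(n) to walk to position i, then O(1) to relink.
  def insert_at(i, item)
    return push_front(item) if i == 0
    prev = @head
    (i - 1).times { prev = prev.next_node }
    prev.next_node = Node.new(item, prev.next_node)
  end
end
```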
- index was slow for lists because we had to follow every link
- A vector holds all its items in a single block of memory (store), one next to the other
- Each entry has a fixed width, or size, which is the number of bytes to store the object
- If the store starts at memory address p, then the _i_th element is at address p + (i * size(item))
- Any address can be loaded in O(1) time, so index is O(1)
- It's not vital to know the details of a vector implementation, but a little understanding will help you remember its performance
- Because the memory just past the store may already be in use, it is difficult to resize the store in place
- We have to allocate an entirely new store, then copy everything over
- Resize has what time complexity?
- If the store is at full capacity, and we want to push_front or push_back, we have to do a resize
- To keep resizes rare, it's typical to double the store size on every resize
- So if we have to copy n elements on this resize, we increase the store size to 2n, which buys us another n insertions before the next resize
- Which brings us to amortized time complexity: the O(n) copy buys us n cheap insertions, so averaged over many pushes each one costs O(1)
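- A hypothetical sketch of a vector built on a fixed-capacity store, with the doubling resize:

```ruby
# Hypothetical sketch of a vector backed by a fixed-capacity store.
class Vector
  def initialize
    @capacity = 4
    @length = 0
    @store = Array.new(@capacity)   # stand-in for a raw block of memory
  end

  # O(1): simple address arithmetic into the store.
  def index(i)
    @store[i]
  end

  # O(1) amortized: usually just a write; occasionally an O(n) resize.
  def push_back(item)
    resize! if @length == @capacity
    @store[@length] = item
    @length += 1
  end

  private

  # O(n): allocate a store twice as big and copy everything over.
  def resize!
    @capacity *= 2
    new_store = Array.new(@capacity)
    @length.times { |i| new_store[i] = @store[i] }
    @store = new_store
  end
end
```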
- A set has three basic operations:
- Insertion
- Lookup
- Iteration
- Two major kinds: tree set and hash set
- A tree is like a list, but a parent can have multiple children
- Nodes with no children are called leaves; the parent of everything is the root
- The relationships between parents, children, and siblings define the tree's properties
- In a tree set, each parent has at most two children: a left and a right child
- The left child must be less than the parent; the right child must be greater than the parent
- How would you search/insert into the tree?
- The cost of these operations is determined by the depth of the tree
- Trivial worst case: every node has one child, O(n) depth
- Best case: every node has two children; how deep?
- O(log(n))
- Algorithms that consider half the items at each step are typically logarithmic
- Self-balancing trees do some magic to ensure the tree depth is O(log(n))
- You don't need to know how these work; just that they exist
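- To make the search/insert question above concrete, a hypothetical sketch of a plain (non-self-balancing) tree set:

```ruby
# Hypothetical sketch of a tree set (no self-balancing).
class TreeNode
  attr_accessor :item, :left, :right

  def initialize(item)
    @item = item
  end
end

class TreeSet
  def initialize
    @root = nil
  end

  # Walk down the tree, going left or right at each node; O(depth).
  def include?(item)
    node = @root
    while node
      return true if item == node.item
      node = item < node.item ? node.left : node.right
    end
    false
  end

  # Same walk, but attach a new leaf where the search falls off; O(depth).
  def insert(item)
    return @root = TreeNode.new(item) if @root.nil?
    node = @root
    loop do
      return if item == node.item              # already present
      if item < node.item
        return node.left = TreeNode.new(item) if node.left.nil?
        node = node.left
      else
        return node.right = TreeNode.new(item) if node.right.nil?
        node = node.right
      end
    end
  end
end
```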
- A hash function maps an item to an effectively random number, called its hash id
- The hash id is not guaranteed unique, but hash collisions should be rare
- A hash set has a store, each element in the store is called a bucket
- Values are mapped to buckets by their hash id, modulo the size of the store
- Buckets may contain more than one item, if hashes collide
- As we add more items, buckets fill up
- It is typical to keep the hash set at a fixed load factor, where load = num_items / num_buckets
- Periodic resizes take O(n) time, but we repeat the doubling trick
- Hash calculation takes O(1), bucket lookup takes O(1)
- On average, a bucket contains load items (a constant), so scanning a bucket is O(1) on average
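- A hypothetical sketch of a hash set along these lines (resizing omitted for brevity):

```ruby
# Hypothetical sketch of a hash set with a fixed number of buckets.
class HashSet
  def initialize(num_buckets = 8)
    @buckets = Array.new(num_buckets) { [] }
  end

  def insert(item)
    bucket = bucket_for(item)
    bucket << item unless bucket.include?(item)
  end

  # On average a bucket holds ~load items, so this scan is O(1) on average.
  def include?(item)
    bucket_for(item).include?(item)
  end

  private

  # Computing the hash is O(1); modulo picks the bucket in O(1).
  def bucket_for(item)
    @buckets[item.hash % @buckets.length]
  end
end
```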
- Both a tree set and hash set can be transformed into a map
- Maps keys to values
- Store the keys as usual, and keep an extra reference to the value
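- As a sketch of the set-to-map idea, the toy hash set above could store [key, value] pairs and hash only the key (hypothetical):

```ruby
# Hypothetical sketch: a hash map stores [key, value] pairs, bucketed by the key's hash.
class HashMap
  def initialize(num_buckets = 8)
    @buckets = Array.new(num_buckets) { [] }
  end

  def set(key, value)
    bucket = @buckets[key.hash % @buckets.length]
    pair = bucket.find { |k, _| k == key }
    if pair
      pair[1] = value          # key already present: overwrite the value
    else
      bucket << [key, value]   # new key: append the pair to its bucket
    end
  end

  def get(key)
    bucket = @buckets[key.hash % @buckets.length]
    pair = bucket.find { |k, _| k == key }
    pair && pair[1]
  end
end
```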
- O(1); "constant" time; ideal; ex: lookup in a hash map
- O(log(n)); "logarithmic" time; fast; ex: search in a sorted list
- consider half the items each time
- O(n); "linear" time; good; ex: search in an unsorted list, consider every item
- O(n log(n)); "linearithmic" time; slow; ex: sorting a list
- O(n^2); "quadratic" time; bad; ex: considering every pair
- O(2^n); "exponential" time; intractable; ex: consider every subset
- O(n!); "factorial" time; the worst; ex: consider every permutation