Why Algorithms?

  • Many companies quiz you on these
  • On the job, knowledge of algorithms beyond the basics is unnecessary
  • It does tend to be a staple of engineering interviews

Typical Interview process

  1. Phone screen with recruiter
  2. Maybe two technical phone screens
  3. Often a coding test
  4. In-office interviews

Algorithms interview advice

  • Often there's an obvious 'naive' solution
  • I start by describing that, in case I crash and burn trying to think of something better
  • I then start thinking about how the constraints of this particular problem might let me do better
  • Talk through the problem so the interviewer sees what you're thinking
  • These problems are expected to be tough; the interviewer will talk with you as you work through the problem, so listen for hints

find_item

  • Find an item in a list
  • See solution
  • We'll be answering the question: How long does this solution take?
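
The linked solution isn't reproduced in this outline, but a minimal linear-scan sketch might look like this (method name and signature assumed):

```ruby
# A minimal sketch of find_item; the actual posted solution may differ.
def find_item(items, target)
  items.each_with_index do |item, i|
    return i if item == target # found it: return the index
  end
  nil # not in the list
end

find_item([3, 7, 2], 7) # => 1
```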

Big-Oh (Asymptotic Complexity)

  • Expresses solution time as a function of problem size
  • Problem size usually means the number of items
  • Big-Oh is not about absolute solution time (5sec, 10min, 1hr)
  • Big-Oh is about scalability: how the solution time scales as the problem instance gets harder

find_item complexity

  • What is the problem size a function of?
  • The number of items in the list, which we call n
  • As a function of n, how long does this algorithm take?
  • In the worst case, we must consider every item in the list
  • There are n items
  • We say the time complexity is O(n), or linear

Big-Oh (cont.)

  • We usually care most about the worst-case time complexity
  • Time complexity is not measured in time, per se, but in the number of primitive operations
    • Primitive operations are basic math (+, -, *, /), comparison, swapping
    • We assume all primitive operations take constant time, which we call O(1)
  • If our algorithm takes on the order of n constant-time operations, we call it O(n)

Asymptotic behavior

  • We don't care about the exact number of operations, but the scalability of a solution
    • Not about fast or slow, so much as whether your algo will explode when you throw big problems at it
  • For instance, O(n) time complexity means that if you double your problem size, your solution time will approximately double
  • Asymptotic behavior is about behavior in the limit
    • For "large enough" problem instances, an asymptotically superior solution will eventually be faster

Asymptotic behavior (cont.)

  • We throw away all constants
    • If algo A always takes twice as long as algo B, they have the same asymptotic time complexity
  • We throw away low order terms
    • O(n^2 + n) = O(n^2)
    • We care about the dominant behavior; for big enough problems, the lower order terms are negligible relative to higher order terms
    • Ex: when n=100, n^2=10,000, so the extra n term adds only 1%

find_min

  • What parameter expresses the problem size?
  • What is the time complexity?
  • Hint: time complexity is normally in loops, since loops will take longer as you loop through more stuff
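
For reference, a find_min sketch in the same spirit (assuming the posted solution is a single pass):

```ruby
# One loop, constant work per item: O(n).
def find_min(items)
  min = items.first
  items.each { |item| min = item if item < min }
  min
end

find_min([5, 1, 9]) # => 1
```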

find_two_min

  • What parameter expresses the problem size?
  • Still n; 2 is a fixed number
  • How does this compare, by wall clock, to find_min?
  • How does this compare, in Big-Oh, to find_min?
  • Both loop over n items
  • find_two_min does about twice as much work per item
  • But more importantly, they both do a constant amount of work per item
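
A hypothetical find_two_min, to make the "twice the work, same Big-Oh" point concrete:

```ruby
# Still one loop over n items; each item does a constant (if larger) amount
# of comparison work, so the algorithm is still O(n).
def find_two_min(items)
  min1, min2 = nil, nil # invariant: min1 <= min2 (once both are set)
  items.each do |item|
    if min1.nil? || item < min1
      min1, min2 = item, min1
    elsif min2.nil? || item < min2
      min2 = item
    end
  end
  [min1, min2]
end

find_two_min([4, 8, 1, 6]) # => [1, 4]
```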

find_k_min

  • What parameters express the problem size?
  • Both n, the number of elements, and k, the number of mins to find
  • Outer loop: still n times
  • How much work inside the loop?
  • min_items.each_with_index.max hides a loop of size k
  • Complexity: O(nk) time
  • We would say this algorithm is linear in both n and k
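
A sketch consistent with the bullets above (the real solution may differ): keep the k smallest items seen so far, and use min_items.each_with_index.max to find the candidate to evict.

```ruby
# Outer loop runs n times; the hidden max scan inside is O(k), so O(nk) total.
def find_k_min(items, k)
  min_items = []
  items.each do |item|
    if min_items.length < k
      min_items << item
    else
      cur_max, max_idx = min_items.each_with_index.max # hidden O(k) loop
      min_items[max_idx] = item if item < cur_max
    end
  end
  min_items
end

find_k_min([5, 1, 9, 3], 2) # => [3, 1]
```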

find_zero_sum_pair

  • Doubly nested for loops; each considers all n elements
  • Means O(n^2) time complexity
  • As a practical matter, O(n^2) is not particularly scalable; it quickly becomes intractable
  • Algorithms that, in the worst case, consider every pair are all O(n^2)
  • But there's a trick!
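
A naive sketch (method name assumed): compare every pair of positions.

```ruby
# Doubly nested loop over indices: roughly n^2 / 2 comparisons, so O(n^2).
def find_zero_sum_pair(items)
  items.each_index do |i|
    ((i + 1)...items.length).each do |j|
      return [items[i], items[j]] if items[i] + items[j] == 0
    end
  end
  nil
end

find_zero_sum_pair([3, -5, 2, -2]) # => [2, -2]
```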

find_zero_sum_pair2

  • Two unnested for loops
  • How much work is done inside the for loops?
  • Sneak peek: O(1) (amortized) work
  • Total work: O(n)
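
A hash-based sketch of the trick (the posted solution may differ in details, e.g. how it handles a lone 0):

```ruby
# Two unnested loops, each O(n); the hash work inside is O(1) amortized,
# so the whole thing is O(n).
def find_zero_sum_pair2(items)
  counts = Hash.new(0)
  items.each { |item| counts[item] += 1 }           # first loop: count items

  items.each do |item|                              # second loop: check complements
    needed = item.zero? ? 2 : 1                     # a 0 needs a second 0 to pair with
    return [item, -item] if counts[-item] >= needed # O(1) amortized lookup
  end
  nil
end

find_zero_sum_pair2([3, -5, 2, -2]) # => [2, -2]
```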

Data structures

  • Primitive operations are the most basic building blocks
  • Data structures are the bigger building blocks
  • Ex: vector, list, hash and tree maps, heap
  • Those are all the data structures you need to know for interviews
  • On the job it's even easier: heaps rarely come up

Vectors and lists

  • Operations:
  • index: find the _i_th item
  • push_front: put an element on the front
  • push_back: push an element to the back
  • Vectors/lists have different time complexities for these operations
  • You'll choose one over the other based on what you're going to do with it

List

  • A list is a chain of nodes
  • Each node holds an item, and possibly a reference to a next node
  • You iterate through the list by starting from the head, or first node, and following the references
  • index: takes time worst-case O(n) to visit the last element, because you must visit every node
  • push_front: O(1), since you just create a new head node, and link it to the old head
  • push_back: O(n), since it already takes O(n) time to traverse to the end
  • There's a list variant that holds a reference to both front and back of the list, in which case both push_front and push_back take O(1); index still is O(n) (why?)
  • Question: how would you insert an item into the middle of a list?
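
A minimal sketch of the two-ended variant (class and method names assumed), to make the O(1) pushes and O(n) index concrete:

```ruby
# Singly linked list with head and tail references: push_front and push_back
# are O(1); index still has to walk the chain, so it is O(n).
class Node
  attr_accessor :value, :next_node
  def initialize(value)
    @value = value
  end
end

class LinkedList
  def initialize
    @head = nil
    @tail = nil
  end

  def push_front(value)
    node = Node.new(value)
    node.next_node = @head # new node points at the old head
    @head = node
    @tail ||= node         # the first node is also the tail
  end

  def push_back(value)
    node = Node.new(value)
    if @tail
      @tail.next_node = node
    else
      @head = node
    end
    @tail = node
  end

  def index(i)
    node = @head
    i.times { node = node.next_node } # no shortcut to the middle: O(n)
    node.value
  end
end
```

The tail reference only helps you reach the last node; it gives no shortcut to an arbitrary middle position, which is why index stays O(n).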

Vector

  • index was slow for lists because we had to follow every link
  • A vector holds all its items in a single block of memory (store), one next to the other
  • Each entry has a fixed width, or size, which is the number of bytes to store the object
  • If the store starts at memory address p, then the _i_th element is at address p + (i * size(item))
  • Any address can be loaded in O(1) time, so index is O(1)
  • It's not vital to know the details of a vector implementation, but a little understanding will help you remember its performance

Vector mutations

  • Because of how computers work, it is difficult to resize the store
  • We have to allocate an entirely new store, then copy everything over
  • Resize has what time complexity?
  • If the store is at full capacity, and we want to push_front or push_back, we have to do a resize
  • To avoid resizes, it's typical to double the store size on every resize
  • So if we have to copy n elements on this resize, we increase the store size to 2n, which buys us another n insertions before the next resize
  • This brings us to amortized time complexity: each push is O(1) amortized
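
A toy sketch of the doubling strategy (Ruby's built-in Array already handles this internally; the class and method names here are illustrative):

```ruby
class Vector
  def initialize
    @store = Array.new(2) # fixed-capacity backing store
    @length = 0
  end

  def index(i)
    @store[i] # O(1): a direct offset into the store
  end

  def push_back(value)
    resize! if @length == @store.length # occasional O(n) copy
    @store[@length] = value
    @length += 1                        # O(1) amortized over many pushes
  end

  private

  def resize!
    new_store = Array.new(@store.length * 2)       # double the capacity
    @length.times { |i| new_store[i] = @store[i] } # copy everything over
    @store = new_store
  end
end
```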

Sets

  • A set has three basic operations:
  • Insertion
  • Lookup
  • Iteration
  • Two major kinds: tree set and hash set

Tree set

  • A tree is like a list, but a parent can have multiple children
  • Nodes with no children are called leaves; the parent of everything is the root
  • The relationships between parents, children, and siblings define the tree's properties
  • In a tree set, each parent has at most two children: a left and a right child
  • The left child must be less than the parent; the right child must be greater than the parent
  • How would you search/insert into the tree?
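
One possible answer, as a sketch (unbalanced; the class name is assumed). Each step descends one level, so the cost is proportional to the tree's depth:

```ruby
class TreeSet
  Node = Struct.new(:value, :left, :right)

  def initialize
    @root = nil
  end

  # Search: compare at each node and go left or right.
  def include?(value)
    node = @root
    until node.nil?
      return true if value == node.value
      node = value < node.value ? node.left : node.right
    end
    false
  end

  # Insert: descend the same way and attach a new leaf.
  def insert(value)
    @root = insert_into(@root, value)
  end

  private

  def insert_into(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_into(node.left, value)
    elsif value > node.value
      node.right = insert_into(node.right, value)
    end
    node # equal values are already in the set
  end
end
```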

Tree set performance

  • Operation costs are determined by the depth of the tree
  • Trivial worst case: every node has one child, O(n) depth
  • Best case: every node has two children; how deep?
  • O(log(n))
    • Algorithms that consider half the items at each step are typically logarithmic
  • Self-balancing trees do some magic to ensure the tree depth is O(log(n))
    • You don't need to know how these work; just that they exist

Hash set

  • A hash function maps an item to an almost random number
  • The hash id is not guaranteed unique, but hash collisions should be rare
  • A hash set has a store; each element in the store is called a bucket
  • Values are mapped to buckets by their hash id, modulo the size of the store
  • Buckets may contain more than one item, if hashes collide
  • As we add more items, buckets fill up

Hash set performance

  • It is typical to keep the hash set at a fixed load factor: num_items / num_buckets
  • Periodic resizes take O(n) time, but we repeat the doubling trick
  • Hash calculation takes O(1), bucket lookup takes O(1)
  • On average, a bucket contains load-factor items, so lookup is O(1) on average
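
A toy sketch pulling the last two sections together (the bucket count, load threshold, and class name are made up for illustration):

```ruby
class HashSet
  MAX_LOAD = 1.0 # average items per bucket before we resize

  def initialize
    @store = Array.new(8) { [] } # each entry in the store is a bucket
    @count = 0
  end

  def include?(item)
    bucket_for(item).include?(item) # scans ~load items on average: O(1)
  end

  def insert(item)
    return if include?(item)
    resize! if (@count + 1.0) / @store.length > MAX_LOAD # occasional O(n)
    bucket_for(item) << item
    @count += 1 # O(1) amortized, thanks to the doubling trick
  end

  private

  def bucket_for(item)
    @store[item.hash % @store.length] # hash the item, then mod the store size
  end

  def resize!
    old_items = @store.flatten(1)
    @store = Array.new(@store.length * 2) { [] } # double the bucket count
    old_items.each { |item| bucket_for(item) << item }
  end
end
```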

Sets vs maps

  • Both a tree set and a hash set can be transformed into a map
  • A map associates keys with values
  • Store the keys as usual, and keep an extra reference to each value

A quick overview of common time complexities

  • O(1); "constant" time; ideal; ex: lookup in a hash map
  • O(log(n)); "logarithmic" time; fast; ex: search in a sorted list
    • consider half the items each time
  • O(n); "linear" time; good; ex: search in an unsorted list, consider every item
  • O(n log(n)); "linearithmic" time; slow; ex: sorting a list
  • O(n^2); "quadratic" time; bad; ex: considering every pair
  • O(2^n); "exponential" time; intractable; ex: consider every subset
  • O(n!); "factorial" time; the worst; ex: consider every permutation