Interview Questions from Data Science Prep
There is a fair coin (one side heads, one side tails) and an unfair coin (both sides tails). You pick one at random, flip it 5 times, and observe that it comes up as tails all five times. Whatis the chance that you are flipping the unfair coin?
Question 1 in the pdf
file.
Say we are given a list of several categories (for example, the strings: A, B, C, and D) and want to sample from a list of such categories according to a particular weighting scheme. Such an example would be: for 100 items total, we want to see A 20% of the time, B 15% of the time, C 35% of the time, and D 30% of the time. How do we simulate this? What if we care about an arbitrary number of categories and about memory usage?
Code is here.
This problem was asked by Lyft.
What is the expected number of coin flips needed to get two consecutive heads?
Analytical solution in the Question 2 section in the pdf
file. Empirical evaluation is here.
You are drawing from a normally distributed random variable X ~ N(0, 1) once a day. What is the approximate expected number of days until you get a value of more than 2?
Analytical solution in the Question 3 section of the pdf
file. Empirical evaluation is here.
Assume you have the below events table on app analytics. Write a query to get the click-through rate per app in 2019.
column_name type
app_id integer
event_id string ("impression", "click")
timestamp datetime
SQL query is here.
A coin was flipped 1000 times, and 550 times it showed up heads. Do you think the coin is biased? Why or why not?
Solution is in the Question 4 section of the pdf
file. The computation is here. I've also added an additional Bayesian Modeling approach to this problem using pymc3
here.
What is the expected number of rolls needed to see all 6 sides of a fair die?
Solution is in the Question 5 section of the pdf
file.
There are two games involving dice that you can play. In the first game, you roll two die at once and get the dollar amount equivalent to the product of the rolls. In the second game, you roll one die and get the dollar amount equivalent to the square of that value. Which has the higher expected value and why?
Solution is in the Question 6 section of the pdf
file.
Say you are given an unfair coin, with an unknown bias towards heads or tails. How can you generate fair odds using this coin?
Solution is in the Question 7 section of the pdf
file. Code is here.
Three ants are sitting at the corners of an equilateral triangle. Each ant randomly picks a direction and starts moving along the edge of the triangle. What is the probability that none of the ants collide? Now, what if it is k ants on all k corners of an equilateral polygon?
Solution is in the Question 8 section of the pdf
file.
Write a program to generate the partitions for a number n
. A partition for n
is a list of positive integers that sum up to n.
For example: if n = 4
, we want to return the following partitions: [1,1,1,1], [1,1,2], [2,2], [1,3]
, and [4]
. Note that a partition[1,3]
is the same as [3,1]
so only the former is included.
Code is here.
Say you need to produce a binary classifier for fraud detection. What metrics would you look at, how is each defined, and what is the interpretation of each one?
Question 9 of the pdf
file.
Say you are given a random Bernoulli trial generator. How would you generate values from a standard normal distribution?
Code is here.
This problem was asked by Robinhood.
Write a program to calculate correlation (without any libraries except for math) for two lists X and Y.
Code is here.
You and your friend are playing a game. The two of you will continue to toss a coin until the sequence HH or TH shows up. If HH shows up first, you win. If TH shows up first, your friend wins. What is the probability of you winning?
Question 10 of the pdf
file and a simulation is here
A and B are playing the following game: a number k from 1-6 is chosen, and A and B will toss a die until the first person sees the side k, and that person gets $100. How much is A willing to pay to play first in this game?
Question 11 of the pdf
file.
Given a list of positive integers, return the maximum increasing subsequence, that is, the largest increasing subsequence within the array that has the maximum sum. Examples: if the input is [5, 4, 3, 2, 1] then return 5 (since no subsequence is increasing), if the input is [3, 2, 5, 7, 6] return 15 = 3 + 5 + 7, etc.
Code is here.
A and B are playing a game where A has n+1 coins, B has n coins, and they each flip all of their coins. What is the probability that A will have more heads than B?
Question 12 of the pdf
file.
Facebook has a content team that labels pieces of content on the platform as spam or not spam. 90% of them are diligent raters and will label 20% of the content as spam and 80% as non-spam. The remaining 10% are non-diligent raters and will label 0% of the content as spam and 100% as non-spam. Assume the pieces of content are labeled independently from one another, for every rater. Given that a rater has labeled 4 pieces of content as good, what is the probability that they are a diligent rater?
Question 13 of the pdf
file.
Say you have an unfair coin which will land on heads 60% of the time. How many coin flips are needed to detect that the coin is unfair?
Question 14 of the pdf
file.
You have the entire social graph of Facebook users, with nodes representing users and edges representing friendships between users. Given the edges of the graph and the number of nodes, write a function to return the smallest number of friendships in-between two users.
Code is here.
A fair die is rolled n
times. What is the probability that the largest number rolled is r
, for each r
in 1..6
?
Question 15 of the pdf
file.
Given a binary tree, write a function to determine whether the tree is a mirror image of itself. Two trees are a mirror image if their root values are the same and the left subtree is a mirror image of the right subtree.
Code is here.
Say you model the lifetime for a set of customers using an exponential distribution with parameter λ
, and you have the lifetime history (in months) of n
customers. What is the MLE for λ
?
Question 16 in the pdf
file
Dropbox has just started and there are two servers that service users: a faster server and a slower server. When a user is on the website, they are routed to either server randomly, and the wait time is exponentially distributed with two different parameters. What is the probability density of a random user's waiting time?
Question 17 in the pdf
file.
This problem was asked by Stripe.
Estimate π
using a Monte Carlo method. Hint: think about throwing darts on a square and seeing where they land within a circle.
Code is here.
This problem was asked by Lyft.
A fair coin is tossed n times. Given that there were k heads in the n tosses, what is the probability that the first toss was heads?
Question 18 in the pdf
file.
Say that there are n topics on Twitter and there is a notion of topics being related. Specifically, if topic A is related to topic B, and topic B is related to topic C, then topic A is indirectly related to topic C.
Define a topic group to be any group of topics that either directly or indirectly related. Given an n by n adjacency matrix N, where N[i][j] = 1
if topic i
and topic are j
related and 0 otherwise, write a function to determine how many topic groups are there.
Code is here.
A biased coin, with probability p
of landing on heads, is tossed n
times. Write a recurrence relation for the probability that the total number of heads after n
tosses is even.
Question 19 of the pdf
file has the solution and an implementation of the solution is here.
Given n
distinct integers, write a function to generate all permutations of those integers.
Code is here.
Say that you are pushing a new feature X
out. You have 1000
users and each user is either a fan or not a fan of X, at random. There are 50
users of 1000
that do not like X.
You will decide whether to ship the feature or not based on sampling 5 users independently and if they all like the feature, you will ship it. What is the probability that you will ship the feature?
Question 20 of the pdf
file.
Given an integer n and an integer k, output a list of all of the combinations of k numbers from 1 to n.
For example, if the n = 3
, and k = 2
then return: [1, 2], [1, 3], [2, 3]
.
Code is here.
You are given an m
by n
matrix with 0s
and 1s
, where a 1
represents an obstacle and a 0
represents no obstacle. Determine the number of ways to navigate from the top-left corner of the matrix to the bottom right corner given that at any point in time there is only a move down or to the right as long as there is not an obstacle in that spot.
For example, if the matrix is given by: [[0, 0, 0], [1, 1, 0], [0, 1, 0]]
then you should return 1
since there is exactly one path.
Code is here.
You are testing a new feature with various sample groups of three people. Assume that each person is equally likely to be a fan or not a fan of the feature. What is the probability that a randomly chosen group has exactly one fan, given that there is a fan among the three?
Question 21 of the pdf
file.
Before a show is released, it is shown to several in-house raters. You assume there are two types of shows: hits, which have an 80%
chance of being liked by any viewer, and misses, which have a 20%
chance of being liked by any viewer. There is currently a new show which you believe has a prior distribution of 60%
being a hit, and 40%
being a miss. Given that 8
raters rated the show and 6
of the 8
liked the show, what is the new posterior distribution of being a hit or miss?
Question 22 of the pdf
file.
Given a string, return the count of substrings within the string that are palindromes.
For example, if input is aba
: return 4
, since the palindromes are: a
, b
, a
, and aba
.
Code is here.
Given two arrays, write a function to get the intersection of the two.
For example, if A = [2, 4, 1, 5, 0]
, and B = [3, 4, 5]
then you should return [4, 5].
Code is here.
You are modeling the wait time a customer has for a support call as exponentially distributed with a mean of 10
minutes. Suppose a customer calls in and is told that all lines are currently busy, and the most recent last spot was occupied 5
minutes ago. What is the probability that the current customer will need to wait no more than another 5
minutes?
Question 23 of the pdf
file.
Alice and Bob are choosing their top 3
shows from a list of 50
shows. Assume that they choose independently of one another. Being relatively new to Hulu, assume also that they choose randomly within the 50
shows. What is the expected number of shows they have in common, and what is the probability that they do not have any shows in common?
Question 24 of the pdf
file and the simulation is here.
Given a string with lowercase characters and left and right parentheses, remove the minimum number of parentheses so that the string is valid.
For example, if the string is )a(b((cd)e(f)g)
then return ab((cd)e(f)g)
Code is here.
Consider a Bernoulli random variable with parameter p.
Say you observe the following samples: [1, 0, 1, 1, 1]
. What is the log likelihood function for p
and what is the MLE of p
?
Question 25 of the pdf
file.
Given a number x
, define a palindromic subset as any subsequence within x
that is a palindrome. Write a function that returns the number of digits of the longest palindromic subset.
For example, if x
is 93567619
then you should return 5
since the longest subset would be 96769
, which is a 5
digit number.
Code is here.