
Some clarification for Chapter 8 Observer Bias model formulation #13

Open
alexklibisz opened this issue Jan 8, 2018 · 2 comments

@alexklibisz

Chapter 8 makes an interesting point about Observer Bias on the Red Line, but it took me a while to understand why the distribution of wait times observed by passengers sits above the true distribution of times between trains. After some thought, it turned out I was assuming a more complicated model than the text does. I don't think either model is unreasonable; my intuition just wasn't on the same page, and I didn't find an explicit statement in the text that ruled my model out. The intended model may be obvious to most readers, but perhaps the clarification below will help someone in the future:

The text reads:

The average time between trains, as seen by a random passenger, is substantially higher than the true average.
Why? Because a passenger is more like (sic) to arrive during a large interval than a small one. Consider a simple example: suppose that the time between trains is either 5 minutes or 10 minutes with equal probability. In that case the average time between trains is 7.5 minutes.
But a passenger is more likely to arrive during a 10 minute gap than a 5 minute gap; in fact, twice as likely. If we surveyed arriving passengers, we would find that 2/3 of them arrived during a 10 minute gap, and only 1/3 during a 5 minute gap. So the average time between trains, as seen by an arriving passenger, is 8.33 minutes.
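The arithmetic in the quoted passage can be checked directly. This is just a sketch of the length-biased weighting the book describes, not code from the book:

```python
import numpy as np

gaps = np.array([5.0, 10.0])    # the two possible times between trains
p_true = np.array([0.5, 0.5])   # each gap occurs with equal probability

# True average time between trains: 0.5*5 + 0.5*10 = 7.5 minutes.
true_mean = np.dot(gaps, p_true)

# A passenger's chance of landing in a gap is proportional to its length,
# so the observed probabilities are length-biased: [1/3, 2/3].
p_seen = gaps * p_true / np.dot(gaps, p_true)

# Average gap as seen by an arriving passenger: 25/3, about 8.33 minutes.
seen_mean = np.dot(gaps, p_seen)
print(true_mean, seen_mean)
```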

For this to be true, I believe we have to assume that a passenger arriving 0 minutes after the previous train has the same observed wait time as a passenger arriving any n > 0 minutes into the gap: every passenger is assigned the full length of the gap. In other words, a passenger who just missed the previous train and waited the full gap is treated the same as a passenger who just barely made the train.

My intuition was as follows: In reality, a passenger can arrive at the 9th minute of a 10 minute gap or the 4th minute of a 5 minute gap. Both passengers wait 1 minute. If you model it this way, the biased distribution actually shifts to the left. Why? Let's say there are two passengers arriving per minute (lam = 2). For a 2 minute gap, you might have the following wait times for 4 passengers: [0, 0, 1, 1]. For a 3 minute gap, you might have the following wait times for 6 passengers: [0, 0, 1, 1, 2, 2]. A passenger who waits 0 has arrived just before the train departs. For an n minute gap, a wait time of n-1 indicates the passenger arrived within the first minute after the previous train departed. From the 2-minute and 3-minute gaps above, you can deduce that across all trains P(wait n) < P(wait n-1). I.e., there is always a chance for a passenger to wait 0 minutes, but in a 5 minute gap, for example, it's impossible to wait 6 minutes.
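The two-gap counting argument above can be checked with a tiny enumeration (using the same hypothetical numbers: one 2-minute and one 3-minute gap, two passengers per minute):

```python
from collections import Counter

# Integer wait times under my formulation: in an n-minute gap with
# two passengers arriving per minute, waits 0..n-1 each occur twice.
waits_2min = [0, 0, 1, 1]
waits_3min = [0, 0, 1, 1, 2, 2]

# Pooled across both gaps: wait 0 and wait 1 occur 4 times each,
# wait 2 only twice, so longer waits are never more common.
counts = Counter(waits_2min + waits_3min)
print(sorted(counts.items()))
```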

Here is some code to simulate the process and plot the resulting histograms.

from math import floor
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)

n = 50000  # Number of trains.
l = 2     # Passengers arriving per minute.
T = np.random.normal(10, 2, n) # True time between trains.
W1 = []   # Passengers' observed waiting time (my initial formulation).
W2 = []   # Passengers' observed waiting time (Think Bayes Formulation).

for t in T:
    size = int(floor(t * l)) # This many passengers will end up on the next train.
    W1 += list(np.random.uniform(0, floor(t), size))
    W2 += list(np.ones(size) * t)

bins = int(T.max() - T.min())
plt.hist(T, color='red', bins=bins, alpha=0.3, density=True, label=r'True wait $\mu=%.3f$' % T.mean())
plt.hist(W1, color='blue', bins=bins, alpha=0.3, density=True, label=r'Observed wait $\mu=%.3f$' % np.mean(W1))
plt.hist(W2, color='green', bins=bins, alpha=0.3, density=True, label=r'Observed wait simplified $\mu=%.3f$' % np.mean(W2))
plt.legend(fontsize=8)
plt.show()

[figure_1: overlaid histograms of the true gap distribution and the two observed-wait formulations]

@AllenDowney
Owner

AllenDowney commented Jan 9, 2018 via email

@alexklibisz
Author

@AllenDowney It's nothing urgent and not explicitly a problem. I just figured I'd post it in case someone else overcomplicates the problem like I did and gets confused in Chapter 8. Thanks!
