The solutions for problems 3, 5, 6 and 7 in Exercise-4 appear to be missing plot data for the years that have no actress roles (viz. 1900, 1905, 1907, 1909).
This can be verified by plotting a subset (using head() with plot()).
It can be fixed (see below) by calling fillna(0) when unstacking the series into a DataFrame.
Surprisingly, the area plot (kind='area') used in problem 4 is not affected by the NaNs; this matches the pandas documentation, which notes that area plots fill NaN with 0 automatically.
# Plot the number of actor roles each year
# and the number of actress roles each year
# over the history of film.
c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type') # Causing missing data in plot for NaNs
c = c.unstack('type').fillna(0) # No missing data
c.plot() # Verify by c.head(10).plot()
# Plot the difference between the number of actor roles each year
# and the number of actress roles each year over the history of film.
c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type') # Missing data
c = c.unstack('type').fillna(0) # No missing data
(c.actor - c.actress).plot()
# Plot the fraction of roles that have been 'actor' roles
# each year in the history of film.
c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type') # Missing data
c = c.unstack('type').fillna(0) # No missing data
# Plot the full history; to inspect only the earliest years, use c.head(100) instead.
(c.actor / (c.actor + c.actress)).plot(ylim=[0,1])
# Plot the fraction of supporting (n=2) roles
# that have been 'actor' roles
# each year in the history of film.
c = cast
c = c[c.n == 2]
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type') # Missing data
c = c.unstack('type').fillna(0) # No missing data
(c.actor / (c.actor + c.actress)).plot(ylim=[0,1])
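For anyone who wants to reproduce the gap without the full dataset, here is a minimal sketch on a made-up table. The `cast` frame below is a hypothetical stand-in with only `year` and `type` columns, not the tutorial's real data, and the years and counts in it are invented for illustration. It also shows `unstack(..., fill_value=0)`, which may be an alternative to `fillna(0)` that keeps the counts as integers.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tutorial's `cast` table.
# Years 1900, 1905 and 1907 deliberately have no 'actress' rows.
cast = pd.DataFrame({
    "year": [1900, 1905, 1906, 1906, 1907, 1908, 1908],
    "type": ["actor", "actor", "actor", "actress",
             "actor", "actor", "actress"],
})

counts = cast.groupby(["year", "type"]).size()

# Plain unstack leaves NaN wherever a (year, type) pair never occurred.
with_gaps = counts.unstack("type")

# Either of these replaces the missing combinations with zero counts.
filled = counts.unstack("type").fillna(0)
filled_alt = counts.unstack("type", fill_value=0)

# Arithmetic on the unfilled frame propagates NaN, so those years
# drop out of a line plot; the filled frame yields a value for every year.
diff_gaps = with_gaps.actor - with_gaps.actress
diff_filled = filled.actor - filled.actress

print(with_gaps)
print(diff_gaps)
print(diff_filled)
```

Calling `.plot()` on `diff_gaps` and on `diff_filled` should show the same difference as in the real data: the first skips the years that have no actress roles, while the second draws a value for each year.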
Thanks for pointing out this edge case in some of the oldest data! I'm not sure when I'll next go in to the tutorial to make revisions, but I'll keep this issue open so that when I do I can try making these adjustments.