Problem no. 3, 5, 6 and 7 in Exercise-4 #37

Open
saurabhp75 opened this issue Nov 27, 2017 · 1 comment
@saurabhp75

The solutions for problems 3, 5, 6 and 7 in Exercise-4 appear to omit data points for the years with no actress roles (viz. 1900, 1905, 1907, 1909).
This can be verified by plotting a subset (using head() with plot()).
It can be fixed (see below) by calling "fillna(0)" after unstacking the series into a DataFrame.
Surprisingly, the area plot (kind='area') used in problem 4 is not affected by the NaNs.
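
To make the gap concrete, here is a minimal sketch on made-up data. The tiny `cast` frame below is a hypothetical stand-in for the tutorial's dataset, not the real file; it only reproduces the situation where one year has actor roles but no actress roles.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tutorial's `cast` DataFrame:
# 1900 has two actor roles and no actress roles.
cast = pd.DataFrame({
    'year': [1900, 1900, 1901, 1901, 1901],
    'type': ['actor', 'actor', 'actor', 'actress', 'actress'],
})

counts = cast.groupby(['year', 'type']).size()

with_gaps = counts.unstack('type')          # 1900/actress becomes NaN
filled = counts.unstack('type').fillna(0)   # 1900/actress becomes 0

# Arithmetic on a NaN yields NaN, so the 1900 point is absent from a line plot.
diff_with_gaps = with_gaps.actor - with_gaps.actress
diff_filled = filled.actor - filled.actress

print(with_gaps)
print(diff_with_gaps)
print(diff_filled)
```

With the NaN left in place, the difference for 1900 is itself NaN; after fillna(0) it is simply the actor count for that year.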

# Plot the number of actor roles each year
# and the number of actress roles each year
# over the history of film.

c = cast 
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')              # Causing missing data in plot for NaNs
c = c.unstack('type').fillna(0)     # No missing data 
c.plot()                            # Verify by c.head(10).plot()




# Plot the difference between the number of actor roles each year
# and the number of actress roles each year over the history of film.

c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0)   # No missing data  
(c.actor - c.actress).plot()



# Plot the fraction of roles that have been 'actor' roles
# each year in the history of film.
c = cast
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0)   # No missing data
c1 = c.head(100)                  # First 100 years in the data only; plot c itself for the full history
(c1.actor / (c1.actor + c1.actress)).plot(ylim=[0,1])



# Plot the fraction of supporting (n=2) roles
# that have been 'actor' roles
# each year in the history of film.

c = cast
c = c[c.n == 2]
c = c.groupby(['year', 'type']).size()
#c = c.unstack('type')            # Missing data 
c = c.unstack('type').fillna(0) # No missing data
(c.actor / (c.actor + c.actress)).plot(ylim=[0,1]) 
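
Since the same three lines repeat in each solution, one possible variation is to wrap them in a small helper. This is only a sketch: `roles_by_year` and the `sample` frame are hypothetical names that do not come from the tutorial. It uses the `fill_value` argument of `unstack` in place of a separate `fillna(0)` call, which also keeps the counts as integers rather than converting them to floats.

```python
import pandas as pd

def roles_by_year(cast):
    """Hypothetical helper: roles per year and type, with 0 where a type is absent."""
    return cast.groupby(['year', 'type']).size().unstack('type', fill_value=0)

# Made-up sample data: 1900 has no actress roles.
sample = pd.DataFrame({
    'year': [1900, 1900, 1901, 1901],
    'type': ['actor', 'actor', 'actor', 'actress'],
})

c = roles_by_year(sample)
fraction = c.actor / (c.actor + c.actress)  # defined for every year in the data

print(c)
print(fraction)
```

In the solutions above this would be called as `roles_by_year(cast)`, or `roles_by_year(cast[cast.n == 2])` for the supporting-role variant.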
@brandon-rhodes brandon-rhodes self-assigned this Dec 5, 2017
@brandon-rhodes
Owner

Thanks for pointing out this edge case in some of the oldest data! I'm not sure when I'll next go in to the tutorial to make revisions, but I'll keep this issue open so that when I do I can try making these adjustments.

@brandon-rhodes brandon-rhodes removed their assignment Jul 23, 2018
@brandon-rhodes brandon-rhodes self-assigned this Feb 2, 2020