Exploratory data analysis to identify key ride sharing metrics

Wamuza1/PyBer_Analysis

PyBer_Analysis

Using Matplotlib, we created line, bar, scatter, bubble, pie, and box-and-whisker plots. We also determined the mean, median, and mode using Pandas, NumPy, and SciPy.

Overview of the analysis:

PyBer is a Python-based ride-sharing app company. We performed exploratory data analysis on the CSV files and created several types of visualizations to present complex findings in a way that is informative and engaging to all stakeholders. We used Pandas, SciPy, and NumPy to perform calculations and statistical analysis that demonstrate the relevance of the data, such as:

  1. The number of rides for each city
  2. Which city types are generating the most money
  3. Which types of cities need more driver support

Using Python and Pandas, we created a summary DataFrame of the ride-sharing data by city type. Then, using Pandas and Matplotlib, we created a multiple-line graph that shows the total weekly fares for each city type. Finally, we submitted a written report that summarizes how the data differs by city type and how those differences can be used by decision-makers at PyBer.

Results:

We used Python, Pandas, and Matplotlib in a Jupyter notebook and obtained a variety of outputs, such as:

• The total rides for each city type: Rural 125, Suburban 625, Urban 1625. Urban cities have the most rides.

• The total drivers for each city type: Rural 78, Suburban 490, Urban 2405. Urban cities have the most drivers.

• The total amount of fares for each city type: Rural 4327.93, Suburban 19356.33, Urban 39854.38. Urban cities have the highest total fares.

• The average fare per ride for each city type: Rural 34.623440, Suburban 30.970128, Urban 24.525772. Rural cities have the highest average.

• The average fare per driver for each city type: Rural 55.486282, Suburban 39.502714, Urban 16.571468. Rural cities again have the highest average.
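
As a quick check on the figures above, the two averages can be recomputed from the reported totals. This is a minimal sketch, not the notebook's actual code: the column labels are our own choice for illustration, and only the totals are taken from the results listed above.

```python
import pandas as pd

# Totals per city type, copied from the results listed above.
# The column labels are illustrative, not taken from the notebook.
summary = pd.DataFrame(
    {
        "Total Rides": [125, 625, 1625],
        "Total Drivers": [78, 490, 2405],
        "Total Fares": [4327.93, 19356.33, 39854.38],
    },
    index=["Rural", "Suburban", "Urban"],
)

# Each average is the fare total divided by the ride or driver count.
summary["Average Fare per Ride"] = summary["Total Fares"] / summary["Total Rides"]
summary["Average Fare per Driver"] = summary["Total Fares"] / summary["Total Drivers"]

print(summary.round(2))
```

The recomputed averages agree with the reported values, which confirms that the average fare per driver is the total fares divided by the total drivers for each city type.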

From these values, we created the PyBer summary DataFrame:

Screen Shot 2022-05-15 at 10 44 06 PM

Next, we removed the index name ("type") and formatted the PyBer summary DataFrame to look like this:

Screen Shot 2022-05-15 at 10 45 27 PM
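
One way to remove the index name and format the columns is sketched below. The exact format strings used in the notebook are only visible in the screenshot, so the thousands separators and dollar signs here are assumptions, as are the column labels.

```python
import pandas as pd

# A two-column slice of the summary, with the index named "type".
# Column labels are illustrative.
summary = pd.DataFrame(
    {"Total Rides": [125, 625, 1625], "Total Fares": [4327.93, 19356.33, 39854.38]},
    index=pd.Index(["Rural", "Suburban", "Urban"], name="type"),
)

# Drop the index name so no "type" label is shown above the row labels.
summary.index.name = None

# Format counts with thousands separators and fares as dollar amounts.
# Formatting turns the values into strings, so work on a copy.
formatted = summary.copy()
formatted["Total Rides"] = formatted["Total Rides"].map("{:,}".format)
formatted["Total Fares"] = formatted["Total Fares"].map("${:,.2f}".format)

print(formatted)
```

Keeping the unformatted `summary` around is useful because the formatted columns hold strings and can no longer be used in calculations.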

• Using groupby(), we created a new DataFrame showing the sum of the fares for each date, where the indices are the city type and the date.

Screen Shot 2022-05-15 at 10 57 48 PM
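
The groupby() step could look roughly like the sketch below. The rides are made-up sample rows, and the column names "type", "date", and "fare" are assumptions about the merged ride data.

```python
import pandas as pd

# Hypothetical rides; the values and column names are for illustration only.
rides = pd.DataFrame(
    {
        "type": ["Urban", "Urban", "Rural", "Urban", "Rural"],
        "date": ["2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
        "fare": [10.0, 15.0, 40.0, 20.0, 35.0],
    }
)

# Grouping by two keys produces a MultiIndex of (city type, date);
# selecting [["fare"]] keeps the result a DataFrame rather than a Series.
fares_by_type_date = rides.groupby(["type", "date"])[["fare"]].sum()

print(fares_by_type_date)
```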

• After resetting the index on the DataFrame:

Screen Shot 2022-05-15 at 10 57 48 PM
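
Resetting the index turns the city type and date back into ordinary columns, which makes it possible to build the pivot table used in the next step. The sketch below uses made-up values and assumed column names.

```python
import pandas as pd

# Hypothetical grouped fares with a (type, date) MultiIndex.
fares_by_type_date = pd.DataFrame(
    {"fare": [40.0, 35.0, 25.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Rural", "2019-01-01"),
            ("Rural", "2019-01-02"),
            ("Urban", "2019-01-01"),
            ("Urban", "2019-01-02"),
        ],
        names=["type", "date"],
    ),
)

# reset_index() moves both index levels into regular columns.
flat = fares_by_type_date.reset_index()

# Pivot so that each date is a row and each city type is a column.
fares_pivot = flat.pivot(index="date", columns="type", values="fare")

print(fares_pivot)
```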

• We created a new DataFrame from the pivot table DataFrame using loc on the given dates, then used the resample() function by week ('W') to get the sum of the fares for each week.

Screen Shot 2022-05-15 at 11 04 10 PM
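
The date selection and weekly resampling can be sketched as follows. The date range and fare values are placeholders, not the dates used in the analysis, and resample() requires the index to be a DatetimeIndex.

```python
import pandas as pd

# Hypothetical pivot table: one row per day, one column per city type.
dates = pd.date_range("2019-01-01", periods=28, freq="D")
fares_pivot = pd.DataFrame({"Rural": 1.0, "Urban": 2.0}, index=dates)

# loc with a date slice keeps both endpoints (placeholder dates).
selected = fares_pivot.loc["2019-01-07":"2019-01-20"]

# "W" groups rows into weeks ending on Sunday and labels each week by that Sunday.
weekly = selected.resample("W").sum()

print(weekly)
```

If the dates were read from a CSV file as strings, they would need to be converted first, for example with `pd.to_datetime`, before resample() can be applied.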

• We created a multiple-line chart of the total weekly fares for each city type.

Screen Shot 2022-05-15 at 11 11 04 PM
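
A chart of this kind can be drawn directly from the weekly DataFrame, with one line per column. The sketch below uses made-up weekly totals, and the title and axis label are illustrative rather than copied from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly fare totals, one column per city type.
weekly = pd.DataFrame(
    {
        "Rural": [100.0, 120.0, 90.0],
        "Suburban": [400.0, 450.0, 420.0],
        "Urban": [900.0, 950.0, 1000.0],
    },
    index=pd.date_range("2019-01-06", periods=3, freq="W"),
)

# DataFrame.plot draws one line per column on the given axes.
fig, ax = plt.subplots(figsize=(10, 4))
weekly.plot(ax=ax)
ax.set_title("Total Fare by City Type")
ax.set_ylabel("Fare ($USD)")
ax.legend(title="type")
```

In a Jupyter notebook the figure is displayed automatically after the cell runs; in a script, `fig.savefig(...)` or `plt.show()` would be needed.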

  1. The end of February is a high point for all three city types.
  2. Rural and urban cities show an increase in April, whereas suburban cities show an increase at the end of April.
  3. Urban cities have higher volumes of rides, drivers, and fares than rural and suburban cities.

Three business recommendations for the CEO:

  1. Based on the graph and analysis, we expect that PyBer can generate more money in rural and suburban cities by offering promotions to attract new customers and retain existing ones.

  2. Hiring more drivers in rural and suburban cities may also help to increase business in each of these city types.

  3. It will be helpful to survey other factors, such as the other transportation services provided in these city types. A slight increase in fare rates in rural and suburban cities could also help to generate company revenue.
