This section should also be completed on HackerRank. The information below gives a surface-level introduction to Pandas, a widely used data science library in Python.
Pandas is a Python library for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Pandas provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
The name is derived from the term "Panel data", an econometric term for multidimensional structured data sets.
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
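The "dict of Series" description can be made concrete with a minimal sketch. The column names and values here are hypothetical sample data, not taken from jeopardy.csv.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
names = pd.Series(["Ada", "Grace", "Linus"])
ages = pd.Series([36, 45, 28])
scores = pd.Series([91.5, 88.0, 79.25])

# Build a DataFrame from a dict of Series; each key becomes a column label.
df = pd.DataFrame({"name": names, "age": ages, "score": scores})

print(df.dtypes)   # each column keeps its own dtype
print(df["age"])   # selecting a single column returns a Series
```

Each column holds a different type (text, integer, float), which is what "columns of potentially different types" refers to.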
Review the Jupyter Notebook 1_pandas_jeopardy_example.ipynb, which uses the jeopardy.csv data.
- groupby objects
- applying functions
- indexing
- conditional selecting; filtering
- selecting rows and columns: .loc, .iloc
- working with missing data: NaN, None
- sorting
- merge, join
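The topics listed above can each be sketched in a line or two. The example below is a rough illustration on a small hypothetical table; the column names (`category`, `value`, `round`) and the `multipliers` lookup table are assumptions made for the example and may not match the columns in jeopardy.csv or the approach taken in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical quiz-show data; column names are illustrative only.
shows = pd.DataFrame({
    "category": ["HISTORY", "SCIENCE", "HISTORY", "SCIENCE", "SPORTS"],
    "value": [200.0, 400.0, np.nan, 800.0, None],
    "round": ["Jeopardy!", "Jeopardy!", "Double Jeopardy!",
              "Double Jeopardy!", "Jeopardy!"],
})

# Missing data: None in a numeric column is stored as NaN.
n_missing = shows["value"].isna().sum()
filled = shows.fillna({"value": 0.0})

# Conditional selection (filtering) with a boolean mask.
science = shows[shows["category"] == "SCIENCE"]

# Label-based (.loc) versus position-based (.iloc) selection.
by_label = shows.loc[shows["value"] > 300, ["category", "value"]]
by_position = shows.iloc[0:2, 0:2]

# groupby: split rows by category, then aggregate each group.
mean_by_category = shows.groupby("category")["value"].mean()

# Applying a function to every element of a column.
lowered = shows["category"].apply(str.lower)

# Sorting; NaN values are placed last by default.
ranked = shows.sort_values("value", ascending=False)

# merge: SQL-style join on a shared key column.
# `multipliers` is a hypothetical lookup table.
multipliers = pd.DataFrame({
    "round": ["Jeopardy!", "Double Jeopardy!"],
    "multiplier": [1, 2],
})
merged = shows.merge(multipliers, on="round", how="left")

print(mean_by_category)
print(merged)
```

`merge` matches rows on column values, while `DataFrame.join` matches on the index by default; the sketch shows only `merge`.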