
Commit cf098dc

Python for data science 1e
1 parent 343108c commit cf098dc

26 files changed: +206 −92 lines

Dockerfile

Lines changed: 0 additions & 2 deletions
@@ -30,8 +30,6 @@ RUN mamba env create -f environment.yml
 # Make RUN commands use the new environment:
 SHELL ["conda", "run", "-n", "python4DS", "/bin/bash", "-c"]
 
-RUN pip install --pre -U seaborn
-
 RUN mamba list
 
 # Copy the current directory contents into the container at /app

boolean-data.ipynb

Lines changed: 1 addition & 1 deletion
@@ -715,7 +715,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

command-line.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ jupytext:
 extension: .md
 format_name: myst
 kernelspec:
-display_name: Python4DS
+display_name: py4ds2e
 language: python
 name: python3
 ---

data-import.ipynb

Lines changed: 2 additions & 4 deletions
@@ -94,9 +94,7 @@
 "```python\n",
 "import os\n",
 "\n",
-"# get current working directory (cwd)\n",
-"os.getcwd()\n",
-"\n",
+"os.getcwd() # get current working directory (cwd)\n",
 "```\n",
 "\n",
 "Say this comes back with 'python4DS', then your downloaded data should be in 'python4DS/data/students.csv'."
@@ -441,7 +439,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

data-tidy.ipynb

Lines changed: 1 addition & 1 deletion
@@ -366,7 +366,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

data-visualise.ipynb

Lines changed: 1 addition & 2 deletions
@@ -118,8 +118,7 @@
 "\n",
 "In this context, a variable refers to an attribute of all the penguins, and an observation refers to all the attributes of a single penguin.\n",
 "\n",
-"Type the name of the data frame in the interactive window and Python will print a preview of its contents.\n",
-"Note that it says `shape` on top of this preview: that's the shape of your data (344 rows, 8 columns)."
+"Type the name of the data frame in the interactive window and Python will print a preview of its contents."
 ]
 },
 {

dates-and-times.ipynb

Lines changed: 1 addition & 1 deletion
@@ -1093,7 +1093,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

functions.ipynb

Lines changed: 1 addition & 1 deletion
@@ -496,7 +496,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

introduction.ipynb

Lines changed: 4 additions & 4 deletions
@@ -120,15 +120,15 @@
 "\n",
 "Another possibility is that your big data problem is actually a large number of small data problems in disguise. Each individual problem might fit in memory, but you have millions of them. For example, you might want to fit a model to each person in your dataset. This would be trivial if you had just 10 or 100 people, but instead you have a million. Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like [Hadoop](https://hadoop.apache.org/) or [Spark](https://spark.apache.org/)) that allows you to send different datasets to different computers for processing. Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **pyspark** to solve it for the full dataset.\n",
 "\n",
-"### R, Julia, and friends\n",
+"### Julia and R\n",
 "\n",
-"In this book, you won't learn anything about R, Julia, or any other programming language useful for data science. This isn't because we think these tools are bad. They're not! And in practice, most data science teams use a mix of languages. However, you may find it easier to learn one set of tools at a time. In this book you'll see what we think of as the three critical tools for data science:\n",
+"In this book, you won't learn anything about R or Julia, which are both sometimes used for data science. This isn't because we think these tools are bad. They're not! In this book you'll see what we think of as the three critical tools for data science:\n",
 "\n",
 "- Python\n",
 "- SQL\n",
 "- command line scripting\n",
 "\n",
-"This book predominantly uses Python, which is usually ranked as the first or second most popular programming language in the world and, just as importantly, it’s also one of the easiest to learn. It’s a general purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). But a language that covers such a lot of ground is also very useful; and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n",
+"These are the three languages that will get you a job as a data scientist, and that's a very good reason to focus on them. We'll spend most of our time with Python, and for good reason. Python is usually ranked as the first or second most popular programming language in the world and, just as importantly, it’s also one of the easiest to learn. It’s a general purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). But a language that covers such a lot of ground is also very useful; and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n",
 "\n",
 "We think Python is a great place to start your data science journey because it is the most popular tool for data science and programming more generally, with a large community behind it.\n",
 "\n",
@@ -191,7 +191,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },

iteration.ipynb

Lines changed: 1 addition & 1 deletion
@@ -688,7 +688,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.10.12"
 },
 "toc-showtags": true
 },
