|
120 | 120 | "\n", |
121 | 121 | "Another possibility is that your big data problem is actually a large number of small data problems in disguise. Each individual problem might fit in memory, but you have millions of them. For example, you might want to fit a model to each person in your dataset. This would be trivial if you had just 10 or 100 people, but instead you have a million. Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like [Hadoop](https://hadoop.apache.org/) or [Spark](https://spark.apache.org/)) that allows you to send different datasets to different computers for processing. Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **pyspark** to solve it for the full dataset.\n", |
122 | 122 | "\n", |
123 | | - "### R, Julia, and friends\n", |
| 123 | + "### Julia and R\n", |
124 | 124 | "\n", |
125 | | - "In this book, you won't learn anything about R, Julia, or any other programming language useful for data science. This isn't because we think these tools are bad. They're not! And in practice, most data science teams use a mix of languages. However, you may find it easier to learn one set of tools at a time. In this book you'll see what we think of as the three critical tools for data science:\n", |
| 125 | + "In this book, you won't learn anything about R or Julia, which are both sometimes used for data science. This isn't because we think these tools are bad. They're not! In this book you'll see what we think of as the three critical tools for data science:\n", |
126 | 126 | "\n", |
127 | 127 | "- Python\n", |
128 | 128 | "- SQL\n", |
129 | 129 | "- command line scripting\n", |
130 | 130 | "\n", |
131 | | - "This book predominantly uses Python, which is usually ranked as the first or second most popular programming language in the world and, just as importantly, it’s also one of the easiest to learn. It’s a general purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). But a language that covers such a lot of ground is also very useful; and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n", |
| 131 | + "These are the three languages that will get you a job as a data scientist, and that's a very good reason to focus on them. We'll spend most of our time with Python, which is usually ranked as the first or second most popular programming language in the world and, just as importantly, is also one of the easiest to learn. It’s a general-purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). A language that covers so much ground is very useful, and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n", |
132 | 132 | "\n", |
133 | 133 | "We think Python is a great place to start your data science journey because it is one of the most popular tools for data science, and for programming more generally, with a large community behind it.\n", |
134 | 134 | "\n", |
|
191 | 191 | "name": "python", |
192 | 192 | "nbconvert_exporter": "python", |
193 | 193 | "pygments_lexer": "ipython3", |
194 | | - "version": "3.9.12" |
| 194 | + "version": "3.10.12" |
195 | 195 | }, |
196 | 196 | "toc-showtags": true |
197 | 197 | }, |
|