Software Engineering for Data Scientists in Python

Introduction

  • We'll cover three topics in particular: modularity, documentation, and testing.
  • Modularity means dividing code into shorter functional units, which are more readable, maintainable, and portable
    • We can write modular code in Python by leveraging packages, classes, and methods
  • Documentation includes using comments, docstrings, and self-documenting code to document your data science Python projects
  • Testing can be both manual and automated:
    • It's definitely worthwhile to perform manual tests
    • But tools like the pytest package can automatically run and re-run your tests to ensure your code works as intended even after you add new functionality
  • The Python Package Index (PyPI) gives us an easy platform to leverage published packages
  • Because packages are modular, we can easily install them from PyPI using a tool called pip
  • pip is a recursive acronym that stands for 'Pip Installs Packages', and it does just that
  • To read the documentation of a package, function, or type, use help(object_name)
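
The three ideas above can be seen together in a minimal sketch. The function name `column_mean` and its test are hypothetical, chosen only for illustration: a small reusable function (modularity), a docstring (documentation), and a test written in the plain-assert style that pytest collects (testing).

```python
def column_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def test_column_mean():
    # pytest collects functions named test_* and reports any failing assert
    assert column_mean([1, 2, 3]) == 2


# The docstring is the text that help(column_mean) displays
print(column_mean.__doc__)

# Running the test by hand; pytest would discover and run it automatically
test_column_mean()
```

Saved in a file whose name starts with `test_`, the same test function would be picked up by running `pytest` in that directory.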

Conventions and PEP 8

  • PEP 8 is the de facto style guide for Python code
  • It lets us know how to format our code to be as readable as possible, and to quote PEP 8, 'code is read much more often than it is written'
  • To ensure your code keeps up with PEP 8, you can use:
    • an IDE that flags lines that violate PEP 8
    • the pycodestyle package, which checks multiple files at once and prints a description of each violation along with the file and line where it occurs
  • pycodestyle can also be used from within an editor
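
As a rough sketch of the pycodestyle check described above, the snippet below writes a one-line file that breaks PEP 8 and runs the checker on it through the package's Python API. The file name `example.py` and its contents are made up for illustration, and the code assumes pycodestyle is installed (`pip install pycodestyle`); if it is not, the check is skipped.

```python
import tempfile
from pathlib import Path

# A line that breaks PEP 8: no spaces around the assignment operator
bad_source = "x=1\n"

# Hypothetical file to check, written to a temporary directory
path = Path(tempfile.mkdtemp()) / "example.py"
path.write_text(bad_source)

try:
    import pycodestyle
except ImportError:
    # pycodestyle is a third-party package and may not be installed
    violation_count = None
else:
    style_guide = pycodestyle.StyleGuide()
    # check_files takes a list of paths and prints each violation it finds
    report = style_guide.check_files([str(path)])
    violation_count = report.total_errors

print(violation_count)
```

The same check can be run from the command line as `pycodestyle example.py`.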

Writing a Python Module

Writing Python Packages

  • A minimal Python package consists of 2 elements: a directory and a Python file
  • The name of the directory will be the name of the package, but how should you name it?
    • PEP 8 states that packages should have short, all-lowercase names
    • The use of underscores in a package name is discouraged; PEP 8 allows them in module names when they improve readability
    • It's ideal to pick a name that conveys the functionality of the package
  • The file in our newly named directory doesn't have any flexibility in naming: it must be called __init__.py
  • This file lets Python know that the directory we created is a package
  • With this structure we've created a package that we can import just like we would import numpy or any other package
  • To import a local package, Python must be able to locate it on its path
  • If the package and your script are in the same directory, a plain import of the package name works
  • To add functionality to the package, start by adding a .py file (a module) to the package directory
  • Functions defined in that module are then accessed through the package and module names
  • One alternative is to import the functionality inside __init__.py, which makes it accessible directly from the package
  • The package structure can be extended with further modules
  • You can also extend the package structure by building packages inside your package (subpackages)
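
The steps above can be sketched end to end. The snippet builds a throwaway package in a temporary directory, so it runs without touching your project. The package name `my_package` is taken from these notes; the module `utils`, the subpackage `text`, and the functions `double` and `strip_spaces` are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout being created:
#   my_package/__init__.py
#   my_package/utils.py
#   my_package/text/__init__.py   (subpackage)
#   my_package/text/cleaning.py
root = Path(tempfile.mkdtemp())
pkg = root / "my_package"
sub = pkg / "text"
sub.mkdir(parents=True)

# A module that adds functionality to the package
(pkg / "utils.py").write_text("def double(x):\n    return 2 * x\n")
# __init__.py marks the directory as a package; importing here
# exposes double directly on the package
(pkg / "__init__.py").write_text("from .utils import double\n")
# A subpackage is a package directory nested inside the package
(sub / "__init__.py").write_text("")
(sub / "cleaning.py").write_text(
    "def strip_spaces(s):\n    return s.strip()\n"
)

# Make the parent directory of the package visible to the import system
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import my_package
import my_package.utils
from my_package.text.cleaning import strip_spaces

full_path_result = my_package.utils.double(4)   # via package.module
short_path_result = my_package.double(4)        # via __init__.py import
cleaned = strip_spaces("  hello  ")
print(full_path_result, short_path_result, repr(cleaned))
```

When a script sits next to the package directory, the `sys.path` line is unnecessary, because the script's own directory is already searched on import.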

Making your package portable

  • Now that you have a functional package, you might want to share it with your colleagues
  • The two main steps to sharing a Python package are creating setup.py and requirements.txt
  • These two files provide information on how to install your package and recreate its required environment
  • They list the dependencies you've used and let you describe your package with additional metadata
  • requirements.txt lists the package's dependencies, one per line, optionally with version specifiers
  • Running pip install -r requirements.txt installs all the listed packages at the specified versions
  • Note that this doesn't install our package itself; it only recreates its environment
  • setup.py calls setup() with the package's metadata and dependencies
  • Some less obvious arguments in our example are install_requires and packages
  • packages in essence lists the location of all the __init__.py files in our package. Our package has a single __init__.py file, in the directory 'my_package'
  • More complex packages might include subpackages with their own __init__.py files; if this were the case, we would also list their locations here
  • Until you start writing more complex packages, the contents of the packages list will likely be the same as the name argument
  • install_requires might look familiar: in the case of our package, its contents are the same as our requirements file
  • There are cases where install_requires may differ from requirements.txt
  • Now that we've completed our setup.py, we can install our package by running pip install . from the directory that contains setup.py
  • This installs our package at the environment level, so we can import it into any Python script that uses the same environment
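
A minimal sketch of what the two files might contain is shown below. The version number, author, description, and dependency list are placeholders rather than values from the original notes; only the package name `my_package` and the `packages` and `install_requires` arguments come from the text. The snippet writes both files to a temporary project directory and checks that the generated setup.py is valid Python, without installing anything.

```python
import tempfile
from pathlib import Path

# Placeholder dependencies, one per line, with optional version specifiers
REQUIREMENTS = """\
numpy>=1.20
matplotlib
"""

# Placeholder metadata; setup() comes from the setuptools package
SETUP_PY = '''\
from setuptools import setup

setup(
    name="my_package",
    version="0.0.1",
    description="An example package",
    author="Your Name",
    packages=["my_package"],
    install_requires=["numpy>=1.20", "matplotlib"],
)
'''

project = Path(tempfile.mkdtemp())
(project / "requirements.txt").write_text(REQUIREMENTS)
(project / "setup.py").write_text(SETUP_PY)

# Syntax check only: compile() parses the file without running setup()
compile(SETUP_PY, "setup.py", "exec")

print(sorted(p.name for p in project.iterdir()))
```

With the `my_package` directory placed next to these files, `pip install -r requirements.txt` recreates the environment and `pip install .` installs the package itself.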