The data accessible via this interface was kindly provided by the BC Safety Authority, under the conditions of the disclaimer below, in association with their lecture in the BC Data Colloquium.
The interface used to access this data is written in Python and runs in a Jupyter notebook, available through https://ubc.syzygy.ca/ and https://sfu.syzygy.ca. To learn more about Jupyter notebooks, check out the Jupyter documentation; for an introduction to Jupyter notebooks and Python for data science, see this description and the associated mini-tutorials.
Although the download script is written in Python, Python is not the only option for exploring the data: Jupyter notebooks allow the user to load different kernels. Once the data has been downloaded to a user's Syzygy account, it can be loaded into a Jupyter notebook running Python, R or Julia. See the above tutorials for instructions on switching kernels.
Many resources for learning Python are freely available online. For a tutorial on using Python, see for example [1], [2] or [3]. To brush up on rusty coding habits, check out projects like Project Euler or Advent of Code.
- Log in to https://ubc.syzygy.ca or https://sfu.syzygy.ca, whichever matches your institution, using your CWL.
- Press the "Start My Server" button.
- Open a new Terminal by clicking New > Terminal.
- In the terminal, navigate to a directory of your choice, then enter the following commands, one per line:
    git clone https://github.com/bcdataca/bcsa-bcdata.git
    cd bcsa-bcdata
    pip install --user --upgrade boto3 botocore
- Close the terminal window and, in the main Jupyter window, navigate to the directory you chose above.
- Once inside the bcsa-bcdata directory, click to open Pull and Import BCSA Data.ipynb.
- Click on the Kernel menu item and click Run All.
- Verify that the download was successful by checking:
  - there are no error messages in the notebook
  - the last cell contains the output
      {'2015': (6305, 34), '2016': (8172, 34)}
      {'incident': (6390, 143)}
- The files are stored in the directory bcsa-data/tmp/ by default. To change this behaviour, set the target variable to a different directory.
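The verification step above can also be scripted. The following is a minimal sketch, not part of the provided notebook: it compares a set of (rows, columns) shapes against the values quoted above and lists whatever files exist under the download directory. The names `EXPECTED_SHAPES`, `find_shape_mismatches` and `list_downloaded_files` are hypothetical, and merging the two printed dictionaries into one mapping is a simplification made here for convenience.

```python
from pathlib import Path

# Shapes quoted in the verification step above, merged into one mapping
# of name -> (rows, columns). The merge is an assumption made for this sketch.
EXPECTED_SHAPES = {
    "2015": (6305, 34),
    "2016": (8172, 34),
    "incident": (6390, 143),
}


def find_shape_mismatches(actual, expected=EXPECTED_SHAPES):
    """Return {name: (expected_shape, actual_shape)} for every disagreement.

    A name that is missing from `actual` is reported with actual shape None.
    """
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got is not None:
            got = tuple(got)  # accept lists as well as tuples
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


def list_downloaded_files(target="bcsa-data/tmp"):
    """Return the sorted relative paths of all files under `target`.

    An empty list is returned when the directory does not exist.
    """
    root = Path(target)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Shapes copied by hand from the notebook's last cell.
reported = {"2015": (6305, 34), "2016": (8172, 34), "incident": (6390, 143)}
print(find_shape_mismatches(reported))  # prints {} when everything matches
print(len(list_downloaded_files()), "file(s) found in the default directory")
```

An empty dictionary from `find_shape_mismatches` means the reported shapes agree with the ones listed above. If the target variable in the notebook was changed, pass the same directory to `list_downloaded_files`.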
To get familiar with accessing data using boto3, play around with the data and notebooks accessible via the above instructions. To access the data and reference material for the image classification project, please refer to the documents inside the workshop-info folder.
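For readers new to boto3, the sketch below shows one common pattern for listing and downloading objects. It assumes the data is served from an S3-style bucket, which the instructions above do not state explicitly. The helper names (`make_s3_client`, `list_keys`, `download_all`) are hypothetical and are not taken from the provided notebook; only `boto3.client`, `list_objects_v2` and `download_file` are real boto3 calls.

```python
from pathlib import Path


def make_s3_client():
    """Create a boto3 S3 client using whatever credentials are configured.

    boto3 is imported here so that the helpers below can be tried with a
    stand-in client on machines where boto3 is not installed.
    """
    import boto3

    return boto3.client("s3")


def list_keys(client, bucket, prefix=""):
    """Return every object key in `bucket` that starts with `prefix`."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Results are paginated; ask for the next page.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def download_all(client, bucket, prefix, target):
    """Download each object under `prefix` into the local directory `target`."""
    target = Path(target)
    saved = []
    for key in list_keys(client, bucket, prefix):
        if key.endswith("/"):
            continue  # skip "folder" placeholder keys
        destination = target / key
        destination.parent.mkdir(parents=True, exist_ok=True)
        client.download_file(bucket, key, str(destination))
        saved.append(destination)
    return saved


# Hypothetical usage. "example-bucket" and "example-prefix/" are placeholders,
# not the real location of the BCSA data:
#
#     client = make_s3_client()
#     download_all(client, "example-bucket", "example-prefix/", "bcsa-data/tmp")
```

Passing the client in as an argument keeps the helpers independent of how credentials are configured, which makes them easy to try out with a stand-in object before touching real data.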
The following is a disclaimer on the use of the data.
This information was provided expressly to those in attendance at the BC Data colloquium. This information is confidential and may not be disclosed without the prior written consent of BC Safety Authority.