
Course material for GRA 4157 - (Big) Data Curation, Pipelines, and Management


BI-DS/GRA4157


GRA4157

Mid-term grading

The maximum score was 12 points, with one point given per subtask. Each total was converted to a score between 0 and 100%, and grades were set according to the grading scale below:

  • A = 92 - 100
  • B = 77 - 91
  • C = 58 - 76
  • D = 46 - 57
  • E = 40 - 45
  • F = 0 - 39
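
The conversion from points to a letter grade can be sketched as follows. This is only an illustration, not the script used for the actual grading: the function names are hypothetical, and rounding the percentage to the nearest whole number is an assumption, since the scale above lists whole percentages only.

```python
# Lower bound (in percent) of each letter grade, taken from the scale above.
GRADE_FLOORS = [("A", 92), ("B", 77), ("C", 58), ("D", 46), ("E", 40), ("F", 0)]
MAX_POINTS = 12  # one point per subtask


def to_percent(points, max_points=MAX_POINTS):
    """Convert a point total to a whole-number percentage.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    if not 0 <= points <= max_points:
        raise ValueError(f"points must be between 0 and {max_points}")
    return round(100 * points / max_points)


def to_grade(percent):
    """Return the letter grade whose lower bound the percentage reaches."""
    for letter, floor in GRADE_FLOORS:
        if percent >= floor:
            return letter
    raise ValueError("percent must be non-negative")


if __name__ == "__main__":
    for points in range(MAX_POINTS + 1):
        percent = to_percent(points)
        print(f"{points:2d} points = {percent:3d}% = {to_grade(percent)}")
```

Running the file prints the percentage and letter grade for every possible point total under these assumptions.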

Course material for GRA 4157 - (Big) Data Curation, Pipelines, and Management.

Exams

04-10-2024 - Mid-term exam (40%), 09:00 - 11:00, Room D3-141. Covers technical knowledge and concepts from programming with data.

07-11-2024 - The final exam (60%) is a written report based on two group presentations (1 - 3 students per group) given during the semester.

Lectures

Lectures will be held each Friday, 12:00 - 13:45, between August 23rd and November 8th. You may contact me at [email protected].

Syllabus

https://rl.talis.com/3/binorway/lists/4D39CD33-F47E-E95D-1F5B-0511BBC9B6BF.html

Topics

Part 1

  • Basic Python lists, dictionaries and operations.
  • Reading from and writing to files, flexible solutions.
  • Numerical Python with NumPy: arrays and array slicing for vectorized computations.
  • Code standards, version control and code-collaboration.

Part 2

  • Working with the pandas library
  • Reading data from websites
  • Data visualisation

Part 3

  • Cleaning data, combining data sets
  • Machine learning workflows with scikit-learn
  • Assessing machine learning models under various assumptions about the data (outliers, etc.)

Preliminary lecture plan

For a given lecture, the reading gives an approximate overview of what you are expected to know after the lecture. I expect you to solve the exercises after the lecture. Each week, we start the lecture with a student presentation of an exercise of their choice. Send an email to [email protected] to volunteer for an exercise. For exercises on pandas, we refer to w3resource (W3): https://www.w3resource.com/python-exercises/pandas/index-dataframe.php

| Date | Topic | Reading | Exercises | Student presentation |
| --- | --- | --- | --- | --- |
| Aug. 23 | Course introduction. Python recap, lists and dictionaries. Testing. Decorators. | Sundnes: Chap 1, 2, 3 (and 7) | Sundnes: 2.7, 2.8, 2.9, 2.15, 2.18, 3.3, 3.6, 3.17 | |
| Aug. 30 | Reading and writing to file. User input. Exceptions. More on command line arguments. | Sundnes: Chap 5 | Sundnes: 4.4, 4.9, 4.10, 4.12, 4.13, 4.17, 4.23 | Yulin; Vera: 2.15 |
| Sep. 06 | Numerical Python and plotting | Sundnes: Chap 6 | Sundnes: 5.1, 5.2, 5.3, 5.4, 5.10, 5.12, 5.14, 5.28, 5.46, 5.54 | Shan Xu: 4.4; Bohdan: 4.23 |
| Sep. 13 | Pandas | McKinney: Chap 5 | W3: DataFrames: 2.-22., 73 | Yurou: 5.2; Nhung: 5.46 |
| Sep. 20 | Web scraping | McKinney: Chap 6 | W3: Pandas Performance: 1.-20. (select 5-10 exercises) + GitHub exercises. Note: some changes were made to the exercises on Sep. 24. | |
| Sep. 27 | GitHub, pipelines, GitHub Actions | | | Selena: 1; Ái Linh, Eirik: 2; Ilia: 3; Narges: 4; Johannes: 4 |
| Oct. 1 | Q & A Mid-term, 08:00 - 09:45, Room C2-055 | Previous lectures | | |
| Oct. 04 | Mid-term, 09:00 - 11:00, Room D3-141 | | | |
| Oct. 11 | Machine learning part 1 | | Project 1 | |
| Oct. 18 | Group presentations | | Project 2 | |
| Oct. 25 | Machine learning part 2 | | | |
| Nov. 01 | Group presentations | | | |
| Nov. 08 | Final lecture | | | |
