Merge pull request #1 from mlatcl/filter-target

Filter target

lawrennd authored Nov 15, 2023
2 parents 7787a5f + bac3cac commit 8494049
Showing 8 changed files with 518 additions and 0 deletions.
27 changes: 27 additions & 0 deletions _dsa/_config.yml
@@ -0,0 +1,27 @@
author:
- given: Neil D.
  family: Lawrence
  institution: University of Cambridge
  gscholar: r3SJcvoAAAAJ
  twitter: lawrennd
  orcid: 0000-0001-9258-1030
  url: http://inverseprobability.com
layout: lecture
venue: Virtual (Zoom)
ipynb: True
talkcss: https://inverseprobability.com/assets/css/talks.css
postsdir: ../../../mlatcl/dsa/_lectures/
slidesdir: ../../../mlatcl/dsa/slides/
notesdir: ../../../mlatcl/dsa/_notes/
notebooksdir: ../../../mlatcl/dsa/_notebooks/
writediagramsdir: .
diagramsdir: ./slides/diagrams/
baseurl: "dsa/" # the subpath of your site, e.g. /blog/
url: "https://mlatcl.github.io/" # the base hostname & protocol for your site
transition: None
ghub:
- organization: lawrennd
  repository: talks
  branch: gh-pages
  directory: _dsa

53 changes: 53 additions & 0 deletions _dsa/bayesian-methods-abuja.md
@@ -0,0 +1,53 @@
---
session: 3
title: "Bayesian Methods"
subtitle: Probabilistic Machine Learning
abstract: >
  In this session we review the *probabilistic* approach to machine
  learning. We start with a review of probability, and introduce the
  concepts of probabilistic modelling. We then apply the approach in
  practice to naive Bayesian classification. Along the way we review
  the probabilistic formulation of a classification model, covering
  maximum likelihood and the naive Bayes model.
author:
- family: Lawrence
  given: Neil D.
  gscholar: r3SJcvoAAAAJ
  institute: Amazon Cambridge and University of Sheffield
  twitter: lawrennd
  url: http://inverseprobability.com
- family: Koyejo
  given: Oluwasanmi
  institute: Google and University of Illinois
  url: https://sanmi.cs.illinois.edu/
  gscholar: EaaOeJwAAAAJ
date: 2018-11-14
venue: DSA, Abuja
transition: None
---

\include{talk-macros.tex}

\include{_ml/includes/what-is-ml.md}
\include{_ml/includes/nigeria-nmis-data.md}
\include{_ml/includes/probability-intro.md}
\include{_ml/includes/probabilistic-modelling.md}

\include{_ml/includes/graphical-models.md}
\include{_ml/includes/classification-intro.md}
\include{_ml/includes/classification-examples.md}
\include{_ml/includes/bayesian-reminder.md}
\include{_ml/includes/bernoulli-distribution.md}
\include{_ml/includes/bernoulli-maximum-likelihood.md}
\include{_ml/includes/bayes-rule-reminder.md}
\include{_ml/includes/naive-bayes.md}
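
As a small illustration of the maximum likelihood ideas above, here is a sketch (not drawn from the session material; the data are synthetic) of fitting a Bernoulli distribution by maximum likelihood in Python.

```python
import numpy as np

# Synthetic binary labels standing in for real data.
y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# The maximum likelihood estimate of the Bernoulli parameter pi is the
# sample mean, since it maximises prod_i pi^y_i (1 - pi)^(1 - y_i).
pi_ml = y.mean()

# Log likelihood of the data at the estimate.
log_lik = np.sum(y * np.log(pi_ml) + (1 - y) * np.log(1 - pi_ml))
print(pi_ml, log_lik)
```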

\subsection{Other Reading}

* Chapter 5 of @Rogers:book11 up to pg 179 (Section 5.1, and 5.2 up to 5.2.2).

\references

\thanks
95 changes: 95 additions & 0 deletions _dsa/gaussian-processes.md
@@ -0,0 +1,95 @@
---
session: 4
title: Gaussian Processes
abstract: >
  Classical machine learning and statistical approaches to learning, such as neural networks and linear regression, assume a parametric form for functions. Gaussian process models are an alternative approach that assumes a probabilistic prior over functions. This brings benefits, in that uncertainty of function estimation is sustained throughout inference, and some challenges: algorithms for fitting Gaussian processes tend to be more complex than those for parametric models.
  In this session I will introduce Gaussian processes and explain why sustaining uncertainty is important.
date: 2020-11-13
venue: Virtual Data Science Nigeria
time: "15:00 (West Africa Standard Time)"
transition: None
---

\include{talk-macros.tex}
\include{_mlai/includes/mlai-notebook-setup.md}

\include{_gp/includes/gp-book.md}
\include{_ml/includes/first-course-book.md}
<!--include{_gp/includes/what-is-a-gp.md}-->

\include{_health/includes/malaria-gp.md}
\include{_ml/includes/what-is-ml.md}
\include{_ml/includes/overdetermined-inaugural.md}
\include{_ml/includes/univariate-gaussian-properties.md}


\include{_ml/includes/multivariate-gaussian-properties.md}
\notes{\include{_ml/includes/linear-regression-log-likelihood.md}
\include{_ml/includes/olympic-marathon-linear-regression.md}
\include{_ml/includes/linear-regression-multivariate-log-likelihood.md}
\define{designVector}{\basisVector}
\define{designVariable}{Phi}
\define{designMatrix}{\basisMatrix}
\include{_ml/includes/linear-regression-direct-solution.md}}
\include{_ml/includes/linear-regression-objective-optimisation.md}
\include{_ml/includes/movie-body-count-linear-regression.md}

\include{_ml/includes/underdetermined-system.md}
\include{_ml/includes/two-d-gaussian.md}

\include{_ml/includes/basis-functions-nn.md}
\include{_ml/includes/relu-basis.md}

\subsection{Gaussian Processes}
\slides{
* Basis function models give non-linear predictions.
* Need to choose number and location of basis functions.
* Gaussian processes are a general framework (basis functions are a special case).
* Within the framework you can consider models with infinitely many basis functions.
}
\notes{Models where we model the entire joint distribution of our training data, $p(\dataVector, \inputMatrix)$, are sometimes described as *generative models*, because we can use sampling to generate data sets that represent all our assumptions. However, as we discussed in the sessions on \refnotes{logistic regression}{logistic-regression} and \refnotes{naive Bayes}{naive-bayes}, this can be a bad idea: if our assumptions are wrong then we can make poor predictions. We can try to make more complex assumptions about the data to alleviate the problem, but this typically leads to challenges in the tractable application of the sum and product rules of probability that are needed to compute the relevant marginal and conditional densities. If we know the form of the question we wish to answer, then we typically try to represent it directly, through $p(\dataVector|\inputMatrix)$. In practice, we have also been making assumptions of conditional independence given the model parameters,}
$$
p(\dataVector|\inputMatrix, \mappingVector) =
\prod_{i=1}^{\numData} p(\dataScalar_i | \inputVector_i, \mappingVector)
$$
\notes{Gaussian processes are *not* normally considered to be *generative models*, but we will be much more interested in the principles of conditioning in Gaussian processes, because we will use conditioning to make predictions between our test and training data. We will avoid the conditional independence assumption on the data in favour of a richer assumption: in a Gaussian process we assume the data are *jointly Gaussian* with a particular mean and covariance,}
$$
\dataVector|\inputMatrix \sim \gaussianSamp{\mathbf{m}(\inputMatrix)}{\kernelMatrix(\inputMatrix)},
$$
\notes{where the conditioning is on the inputs $\inputMatrix$, which are used for computing the mean and covariance. For this reason they are known as the mean and covariance functions.}
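
\notes{To make the jointly Gaussian assumption concrete, the following is a minimal numpy sketch (an illustration, not part of the included material) that evaluates a zero mean function and an exponentiated quadratic covariance function on a grid of inputs, then draws samples from the resulting joint Gaussian. The mean, variance and lengthscale choices here are assumptions made for the example.}

```python
import numpy as np

def eq_cov(X, X2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic covariance between two sets of 1-D inputs."""
    sqdist = (X[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

X = np.linspace(-3.0, 3.0, 100)   # inputs we condition on
m = np.zeros_like(X)              # zero mean function m(X)
K = eq_cov(X, X)                  # covariance matrix K(X)

# Draw three functions from y|X ~ N(m(X), K(X)); the small jitter on the
# diagonal keeps the covariance numerically positive definite.
samples = np.random.multivariate_normal(m, K + 1e-8 * np.eye(len(X)), size=3)
print(samples.shape)  # (3, 100)
```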



\include{_ml/includes/linear-model-overview.md}

\include{_ml/includes/radial-basis.md}

\include{_gp/includes/gp-from-basis-functions.md}

\include{_gp/includes/non-degenerate-gps.md}
\include{_gp/includes/gp-function-space.md}

\include{_gp/includes/gptwopointpred.md}

\include{_gp/includes/gp-covariance-function-importance.md}
\include{_gp/includes/gp-numerics-and-optimization.md}

\include{_gp/includes/gp-optimize.md}
\include{_kern/includes/eq-covariance.md}
\include{_gp/includes/gp-summer-school.md}
\include{_gp/includes/gpy-software.md}
\include{_gp/includes/gpy-tutorial.md}

\subsection{Review}

\include{_gp/includes/other-gp-software.md}

\reading

\thanks

\references



43 changes: 43 additions & 0 deletions _dsa/ml-systems-kimberley.md
@@ -0,0 +1,43 @@
---
title: "Introduction to Machine Learning Systems"
abstract: "This session introduces some of the challenges of building machine learning data systems. It will introduce you to concepts around joining of databases together. The storage and manipulation of data is at the core of machine learning systems and data science. The goal of this notebook is to introduce the reader to these concepts, not to authoritatively answer any questions about the state of Nigerian health facilities or Covid19, but it may give you ideas about how to try and do that in your own country."
author:
- given: Eric
  family: Meissner
  url: https://www.linkedin.com/in/meissnereric/
  twitter: meissner_eric_7
- given: Andrei
  family: Paleyes
  url: https://www.linkedin.com/in/andreipaleyes/
- given: Neil D.
  family: Lawrence
  twitter: lawrennd
  url: http://inverseprobability.com
date: 2021-10-06
ipynb: true
venue: Virtual DSA, Kimberley
transition: None
---


\slides{\section{AI via ML Systems}

\include{_ai/includes/supply-chain-system.md}
\include{_ai/includes/aws-soa.md}
\include{_ai/includes/dsa-systems.md}
}

\notes{
\include{_systems/includes/nigeria-health-intro.md}
\include{_systems/includes/nigeria-nmis-installs.md}
\include{_systems/includes/databases-and-joins.md}
\include{_systems/includes/nigeria-nmis-data-systems.md}
\include{_systems/includes/nigeria-nmis-spatial-join.md}
\define{databaseType}{sqlite}
\include{_systems/includes/nigeria-nmis-sql.md}
\include{_systems/includes/nigeria-nmis-covid-join.md}
}
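
As a minimal sketch of the kind of database join the session builds towards, the following uses Python's built-in sqlite3 module. The table names and columns are hypothetical stand-ins, not the real NMIS or Covid-19 schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two toy tables standing in for the facility and case data.
cur.execute("CREATE TABLE facilities (id INTEGER, name TEXT, state TEXT)")
cur.execute("CREATE TABLE cases (state TEXT, confirmed INTEGER)")
cur.executemany("INSERT INTO facilities VALUES (?, ?, ?)",
                [(1, "Clinic A", "Kano"), (2, "Clinic B", "Lagos")])
cur.executemany("INSERT INTO cases VALUES (?, ?)",
                [("Kano", 10), ("Lagos", 25)])

# An inner join links the two tables through the shared state key.
for row in cur.execute("SELECT f.name, c.confirmed "
                       "FROM facilities f JOIN cases c ON f.state = c.state"):
    print(row)
```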

\thanks

\references
45 changes: 45 additions & 0 deletions _dsa/ml-systems.md
@@ -0,0 +1,45 @@
---
session: 2
title: "Introduction to Machine Learning Systems"
abstract: "This notebook introduces some of the challenges of building machine learning data systems. It will introduce you to concepts around joining of databases together. The storage and manipulation of data is at the core of machine learning systems and data science. The goal of this notebook is to introduce the reader to these concepts, not to authoritatively answer any questions about the state of Nigerian health facilities or Covid19, but it may give you ideas about how to try and do that in your own country."
author:
- given: Eric
  family: Meissner
  url: https://www.linkedin.com/in/meissnereric/
  twitter: meissner_eric_7
- given: Andrei
  family: Paleyes
  url: https://www.linkedin.com/in/andreipaleyes/
- given: Neil D.
  family: Lawrence
  twitter: lawrennd
  url: http://inverseprobability.com
date: 2020-07-24
ipynb: true
venue: Virtual DSA
transition: None
---

\include{talk-macros.tex}

\slides{\section{AI via ML Systems}

\include{_ai/includes/supply-chain-system.md}
\include{_ai/includes/aws-soa.md}
\include{_ai/includes/dsa-systems.md}
}

\notes{
\include{_systems/includes/nigeria-health-intro.md}
\include{_systems/includes/nigeria-nmis-installs.md}
\include{_systems/includes/databases-and-joins.md}
\include{_systems/includes/nigeria-nmis-data-systems.md}
\include{_systems/includes/nigeria-nmis-spatial-join.md}
\define{databaseType}{sqlite}
\include{_systems/includes/nigeria-nmis-sql.md}
\include{_systems/includes/nigeria-nmis-covid-join.md}
}
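
The same kind of join can also be expressed in pandas, a common choice for in-memory data manipulation. This is a sketch with hypothetical columns, not the real NMIS or Covid-19 data.

```python
import pandas as pd

# Hypothetical stand-ins for the facility and case tables.
facilities = pd.DataFrame({
    "state": ["Kano", "Lagos", "Oyo"],
    "num_doctors": [12, 40, 18],
})
cases = pd.DataFrame({
    "state": ["Kano", "Lagos"],
    "confirmed": [10, 25],
})

# A left join keeps every facility row, filling missing case counts with NaN.
joined = facilities.merge(cases, on="state", how="left")
print(joined)
```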

\thanks

\references
52 changes: 52 additions & 0 deletions _dsa/probabilistic-machine-learning.md
@@ -0,0 +1,52 @@
---
session: 3
title: "Probabilistic Machine Learning"
abstract: >
  In this session we review the *probabilistic* approach to machine
  learning. We start with a review of probability, and introduce the
  concepts of probabilistic modelling. We then apply the approach in
  practice to naive Bayesian classification. We also review the
  Bayesian formalism in the context of linear models, starting from
  maximum likelihood and introducing basis functions as a way of
  driving non-linearity in the model.
ipynb: True
reveal: True
author:
- family: Lawrence
  given: Neil D.
  gscholar: r3SJcvoAAAAJ
  institute: Amazon Cambridge and University of Sheffield
  twitter: lawrennd
  url: http://inverseprobability.com
date: 2018-11-16
venue: DSA, Abuja
transition: None
---

%%%%%%%%%%%% LOCAL DATA %%%%%%%%%%%%%%%%%%%%
https://www.kaggle.com/alaowerre/nigeria-nmis-health-facility-data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\include{talk-macros.tex}

\include{_ml/includes/what-is-ml.md}
\include{_ml/includes/probability-intro.md}
\include{_ml/includes/probabilistic-modelling.md}

\include{_ml/includes/graphical-models.md}
\include{_ml/includes/classification-intro.md}
\include{_ml/includes/classification-examples.md}
\include{_ml/includes/bayesian-reminder.md}
\include{_ml/includes/bernoulli-distribution.md}
\include{_ml/includes/bernoulli-maximum-likelihood.md}
\include{_ml/includes/bayes-rule-reminder.md}
\include{_ml/includes/naive-bayes.md}
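
As a small worked example of Bayes' rule from the session (the numbers here are chosen purely for illustration), the posterior over two competing hypotheses follows from the prior and likelihood via the sum and product rules.

```python
# Bayes' rule: p(theta|y) = p(y|theta) p(theta) / p(y).
prior = {"theta_1": 0.7, "theta_2": 0.3}        # p(theta), illustrative
likelihood = {"theta_1": 0.2, "theta_2": 0.9}   # p(y|theta), illustrative

# Sum rule gives the evidence p(y); product rule gives the joint.
evidence = sum(prior[t] * likelihood[t] for t in prior)
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}
print(posterior)  # {'theta_1': 0.34..., 'theta_2': 0.66...}
```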

### Other Reading

* Chapter 5 of @Rogers:book11 up to pg 179 (Section 5.1, and 5.2 up to 5.2.2).

### References
