diff --git a/_dsa/_config.yml b/_dsa/_config.yml
new file mode 100755
index 0000000..529d800
--- /dev/null
+++ b/_dsa/_config.yml
@@ -0,0 +1,27 @@
+author:
+- given: Neil D.
+  family: Lawrence
+  institution: University of Cambridge
+  gscholar: r3SJcvoAAAAJ
+  twitter: lawrennd
+  orcid: 0000-0001-9258-1030
+  url: http://inverseprobability.com
+layout: lecture
+venue: Virtual (Zoom)
+ipynb: True
+talkcss: https://inverseprobability.com/assets/css/talks.css
+postsdir: ../../../mlatcl/dsa/_lectures/
+slidesdir: ../../../mlatcl/dsa/slides/
+notesdir: ../../../mlatcl/dsa/_notes/
+notebooksdir: ../../../mlatcl/dsa/_notebooks/
+writediagramsdir: .
+diagramsdir: ./slides/diagrams/
+baseurl: "dsa/" # the subpath of your site, e.g. /blog/
+url: "https://mlatcl.github.io/" # the base hostname & protocol for your site
+transition: None
+ghub:
+- organization: lawrennd
+  repository: talks
+  branch: gh-pages
+  directory: _dsa
diff --git a/_dsa/bayesian-methods-abuja.md b/_dsa/bayesian-methods-abuja.md
new file mode 100755
index 0000000..1e4cbe3
--- /dev/null
+++ b/_dsa/bayesian-methods-abuja.md
@@ -0,0 +1,53 @@
+---
+session: 3
+title: "Bayesian Methods"
+subtitle: Probabilistic Machine Learning
+abstract: >
+  In this session we review the *probabilistic* approach to machine
+  learning. We start with a review of probability, and introduce the
+  concepts of probabilistic modelling. We then apply the approach in
+  practice to Naive Bayesian classification.
+
+  We review the probabilistic formulation of a classification model,
+  initially covering maximum likelihood and then the naive Bayes model.
+author:
+- family: Lawrence
+  given: Neil D.
+  gscholar: r3SJcvoAAAAJ
+  institute: Amazon Cambridge and University of Sheffield
+  twitter: lawrennd
+  url: http://inverseprobability.com
+- family: Koyejo
+  given: Oluwasanmi
+  institute: Google and University of Illinois
+  url: https://sanmi.cs.illinois.edu/
+  gscholar: EaaOeJwAAAAJ
+date: 2018-11-14
+venue: DSA, Abuja
+transition: None
+---
+
+\include{talk-macros.tex}
+
+\include{_ml/includes/what-is-ml.md}
+\include{_ml/includes/nigeria-nmis-data.md}
+\include{_ml/includes/probability-intro.md}
+\include{_ml/includes/probabilistic-modelling.md}
+
+\include{_ml/includes/graphical-models.md}
+\include{_ml/includes/classification-intro.md}
+\include{_ml/includes/classification-examples.md}
+\include{_ml/includes/bayesian-reminder.md}
+\include{_ml/includes/bernoulli-distribution.md}
+\include{_ml/includes/bernoulli-maximum-likelihood.md}
+\include{_ml/includes/bayes-rule-reminder.md}
+\include{_ml/includes/naive-bayes.md}
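+
+\notes{To make the classification pipeline concrete, here is a minimal
+sketch of a Bernoulli naive Bayes classifier on a synthetic binary data
+set. It is an illustration only, not the session's notebook code: the
+data, the smoothing constant and all names are assumptions made for the
+example.}
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+X = rng.integers(0, 2, size=(200, 5))   # 200 points, 5 binary features
+y = X[:, 0] | X[:, 1]                   # labels depend on two features
+
+def fit_naive_bayes(X, y, alpha=1.0):
+    """Estimate class priors and per-class Bernoulli feature rates."""
+    params = {}
+    for c in np.unique(y):
+        Xc = X[y == c]
+        prior = Xc.shape[0] / X.shape[0]
+        # Laplace-smoothed maximum likelihood estimate of p(x_j = 1 | c).
+        rate = (Xc.sum(axis=0) + alpha) / (Xc.shape[0] + 2 * alpha)
+        params[c] = (np.log(prior), np.log(rate), np.log(1 - rate))
+    return params
+
+def predict(params, X):
+    """Assign each row to the class with the highest joint log probability."""
+    classes = list(params)
+    scores = np.column_stack(
+        [lp + X @ lr + (1 - X) @ lnr
+         for lp, lr, lnr in (params[c] for c in classes)])
+    return np.array(classes)[np.argmax(scores, axis=1)]
+
+params = fit_naive_bayes(X, y)
+print((predict(params, X) == y).mean())  # training accuracy
+```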
+
+\subsection{Other Reading}
+
+* Chapter 5 of @Rogers:book11 up to pg 179 (Section 5.1, and 5.2 up to 5.2.2).
+
+\references
+
+\thanks
diff --git a/_dsa/gaussian-processes.md b/_dsa/gaussian-processes.md
new file mode 100755
index 0000000..76c7338
--- /dev/null
+++ b/_dsa/gaussian-processes.md
@@ -0,0 +1,95 @@
+---
+session: 4
+title: Gaussian Processes
+abstract: >
+  Classical machine learning and statistical approaches to learning, such as neural networks and linear regression, assume a parametric form for functions. Gaussian process models are an alternative approach that assumes a probabilistic prior over functions. This brings benefits, in that uncertainty of function estimation is sustained throughout inference, and some challenges: algorithms for fitting Gaussian processes tend to be more complex than those for parametric models.
+
+  In this session I will introduce Gaussian processes and explain why sustaining uncertainty is important.
date: 2020-11-13
+venue: Virtual Data Science Nigeria
+time: "15:00 (West Africa Standard Time)"
+transition: None
+---
+
+\include{talk-macros.tex}
+\include{_mlai/includes/mlai-notebook-setup.md}
+
+\include{_gp/includes/gp-book.md}
+\include{_ml/includes/first-course-book.md}
+
+\include{_health/includes/malaria-gp.md}
+\include{_ml/includes/what-is-ml.md}
+\include{_ml/includes/overdetermined-inaugural.md}
+\include{_ml/includes/univariate-gaussian-properties.md}
+
+\include{_ml/includes/multivariate-gaussian-properties.md}
+\notes{\include{_ml/includes/linear-regression-log-likelihood.md}
+\include{_ml/includes/olympic-marathon-linear-regression.md}
+\include{_ml/includes/linear-regression-multivariate-log-likelihood.md}
+\define{designVector}{\basisVector}
+\define{designVariable}{Phi}
+\define{designMatrix}{\basisMatrix}
+\include{_ml/includes/linear-regression-direct-solution.md}}
+\include{_ml/includes/linear-regression-objective-optimisation.md}
+\include{_ml/includes/movie-body-count-linear-regression.md}
+
+\include{_ml/includes/underdetermined-system.md}
+\include{_ml/includes/two-d-gaussian.md}
+
+\include{_ml/includes/basis-functions-nn.md}
+\include{_ml/includes/relu-basis.md}
+
+\subsection{Gaussian Processes}
+\slides{
+* Basis function models give non-linear predictions.
+* Need to choose number and location of basis functions.
+* Gaussian processes are a general framework (basis function models are a special case).
+* Within the framework you can consider models with infinitely many basis functions.
+}
+\notes{Models where we model the entire joint distribution of our training data, $p(\dataVector, \inputMatrix)$, are sometimes described as *generative models*, because we can use sampling to generate data sets that represent all our assumptions. However, as we discussed in the sessions on \refnotes{logistic regression}{logistic-regression} and \refnotes{naive Bayes}{naive-bayes}, this can be a bad idea, because if our assumptions are wrong then we can make poor predictions. We can try to make more complex assumptions about the data to alleviate the problem, but this typically leads to challenges in the tractable application of the sum and product rules of probability that are needed to compute the relevant marginal and conditional densities. If we know the form of the question we wish to answer, then we typically try to represent it directly, through $p(\dataVector|\inputMatrix)$. In practice, we have also been making assumptions of conditional independence given the model parameters,}
+$$
+p(\dataVector|\inputMatrix, \mappingVector) =
+\prod_{i=1}^{\numData} p(\dataScalar_i | \inputVector_i, \mappingVector)
+$$
+\notes{Gaussian processes are *not* normally considered to be *generative models*, but we will be much more interested in the principles of conditioning in Gaussian processes, because we will use conditioning to make predictions between our test and training data. We will avoid the conditional independence assumption on the data in favour of a richer assumption: in a Gaussian process we assume the data are *jointly Gaussian* with a particular mean and covariance,}
+$$
+\dataVector|\inputMatrix \sim \gaussianSamp{\mathbf{m}(\inputMatrix)}{\kernelMatrix(\inputMatrix)},
+$$
+\notes{where the conditioning is on the inputs $\inputMatrix$, which are used for computing the mean and covariance. For this reason they are known as the mean function and the covariance function.}
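+
+\notes{The following sketch illustrates the jointly Gaussian assumption
+directly (a toy illustration, not the session's GPy-based code: the
+exponentiated quadratic kernel, its parameters and the jitter value are
+assumptions made for the example). We evaluate a covariance function on
+a grid of inputs and sample functions from the implied joint Gaussian
+density; prediction in a Gaussian process is then conditioning within
+just such a joint distribution.}
+
+```python
+import numpy as np
+
+def exponentiated_quadratic(X, X2, variance=1.0, lengthscale=1.0):
+    """Exponentiated quadratic covariance for one-dimensional inputs."""
+    sqdist = (X - X2.T) ** 2
+    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)
+
+# Mean function taken as zero; covariance evaluated on a grid of inputs.
+X = np.linspace(-3.0, 3.0, 100)[:, None]
+K = exponentiated_quadratic(X, X, lengthscale=0.8)
+
+# Draw five functions from the joint Gaussian; the jitter term keeps
+# the covariance numerically positive definite.
+rng = np.random.default_rng(0)
+samples = rng.multivariate_normal(
+    mean=np.zeros(X.shape[0]),
+    cov=K + 1e-8 * np.eye(X.shape[0]),
+    size=5)
+print(samples.shape)  # (5, 100): five sampled functions on the grid
+```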
+
+\include{_ml/includes/linear-model-overview.md}
+
+\include{_ml/includes/radial-basis.md}
+
+\include{_gp/includes/gp-from-basis-functions.md}
+
+\include{_gp/includes/non-degenerate-gps.md}
+\include{_gp/includes/gp-function-space.md}
+
+\include{_gp/includes/gptwopointpred.md}
+
+\include{_gp/includes/gp-covariance-function-importance.md}
+\include{_gp/includes/gp-numerics-and-optimization.md}
+
+\include{_gp/includes/gp-optimize.md}
+\include{_kern/includes/eq-covariance.md}
+\include{_gp/includes/gp-summer-school.md}
+\include{_gp/includes/gpy-software.md}
+\include{_gp/includes/gpy-tutorial.md}
+
+\subsection{Review}
+
+\include{_gp/includes/other-gp-software.md}
+
+\reading
+
+\thanks
+
+\references
diff --git a/_dsa/ml-systems-kimberley.md b/_dsa/ml-systems-kimberley.md
new file mode 100644
index 0000000..bb268e3
--- /dev/null
+++ b/_dsa/ml-systems-kimberley.md
@@ -0,0 +1,43 @@
+---
+title: "Introduction to Machine Learning Systems"
+abstract: "This session introduces some of the challenges of building machine learning data systems. It will introduce you to the concepts around joining databases together. The storage and manipulation of data is at the core of machine learning systems and data science. The goal of this session is to introduce the reader to these concepts, not to authoritatively answer any questions about the state of Nigerian health facilities or Covid-19, but it may give you ideas about how to try to do that in your own country."
+author:
+- given: Eric
+  family: Meissner
+  url: https://www.linkedin.com/in/meissnereric/
+  twitter: meissner_eric_7
+- given: Andrei
+  family: Paleyes
+  url: https://www.linkedin.com/in/andreipaleyes/
+- given: Neil D.
+  family: Lawrence
+  twitter: lawrennd
+  url: http://inverseprobability.com
+date: 2021-10-06
+ipynb: true
+venue: Virtual DSA, Kimberley
+transition: None
+---
+
+\slides{\section{AI via ML Systems}
+
+\include{_ai/includes/supply-chain-system.md}
+\include{_ai/includes/aws-soa.md}
+\include{_ai/includes/dsa-systems.md}
+}
+
+\notes{
+\include{_systems/includes/nigeria-health-intro.md}
+\include{_systems/includes/nigeria-nmis-installs.md}
+\include{_systems/includes/databases-and-joins.md}
+\include{_systems/includes/nigeria-nmis-data-systems.md}
+\include{_systems/includes/nigeria-nmis-spatial-join.md}
+\define{databaseType}{sqlite}
+\include{_systems/includes/nigeria-nmis-sql.md}
+\include{_systems/includes/nigeria-nmis-covid-join.md}
+}
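+
+\notes{As a pocket-sized illustration of the join operations covered in
+the includes above, the sketch below joins two small pandas data frames.
+The tables and column names are invented for the example; the session
+itself works with the NMIS health-facility data and sqlite.}
+
+```python
+import pandas as pd
+
+# Hypothetical stand-ins for a facility table and a case-count table.
+facilities = pd.DataFrame({
+    "facility_id": [1, 2, 3],
+    "name": ["Clinic A", "Clinic B", "Clinic C"],
+    "state": ["Kano", "Lagos", "Kano"],
+})
+cases = pd.DataFrame({
+    "state": ["Kano", "Lagos"],
+    "confirmed": [120, 340],
+})
+
+# A left join keeps every facility and attaches case counts by state.
+joined = facilities.merge(cases, on="state", how="left")
+print(joined)
+```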
+
+\thanks
+
+\references
diff --git a/_dsa/ml-systems.md b/_dsa/ml-systems.md
new file mode 100755
index 0000000..83d09cd
--- /dev/null
+++ b/_dsa/ml-systems.md
@@ -0,0 +1,45 @@
+---
+session: 2
+title: "Introduction to Machine Learning Systems"
+abstract: "This notebook introduces some of the challenges of building machine learning data systems. It will introduce you to the concepts around joining databases together. The storage and manipulation of data is at the core of machine learning systems and data science. The goal of this notebook is to introduce the reader to these concepts, not to authoritatively answer any questions about the state of Nigerian health facilities or Covid-19, but it may give you ideas about how to try to do that in your own country."
+author:
+- given: Eric
+  family: Meissner
+  url: https://www.linkedin.com/in/meissnereric/
+  twitter: meissner_eric_7
+- given: Andrei
+  family: Paleyes
+  url: https://www.linkedin.com/in/andreipaleyes/
+- given: Neil D.
+  family: Lawrence
+  twitter: lawrennd
+  url: http://inverseprobability.com
+date: 2020-07-24
+ipynb: true
+venue: Virtual DSA
+transition: None
+---
+
+\include{talk-macros.tex}
+
+\slides{\section{AI via ML Systems}
+
+\include{_ai/includes/supply-chain-system.md}
+\include{_ai/includes/aws-soa.md}
+\include{_ai/includes/dsa-systems.md}
+}
+
+\notes{
+\include{_systems/includes/nigeria-health-intro.md}
+\include{_systems/includes/nigeria-nmis-installs.md}
+\include{_systems/includes/databases-and-joins.md}
+\include{_systems/includes/nigeria-nmis-data-systems.md}
+\include{_systems/includes/nigeria-nmis-spatial-join.md}
+\define{databaseType}{sqlite}
+\include{_systems/includes/nigeria-nmis-sql.md}
+\include{_systems/includes/nigeria-nmis-covid-join.md}
+}
+
+\thanks
+
+\references
diff --git a/_dsa/probabilistic-machine-learning.md b/_dsa/probabilistic-machine-learning.md
new file mode 100755
index 0000000..8c80d10
--- /dev/null
+++ b/_dsa/probabilistic-machine-learning.md
@@ -0,0 +1,52 @@
+---
+session: 3
+title: "Probabilistic Machine Learning"
+abstract: >
+  In this session we review the *probabilistic* approach to machine
+  learning. We start with a review of probability, and introduce the
+  concepts of probabilistic modelling. We then apply the approach in
+  practice to Naive Bayesian classification.
+
+  We review the Bayesian formalism in the context of linear models,
+  initially covering maximum likelihood and then introducing basis
+  functions as a way of bringing non-linearity into the model.
+ipynb: True
+reveal: True
+author:
+- family: Lawrence
+  given: Neil D.
+  gscholar: r3SJcvoAAAAJ
+  institute: Amazon Cambridge and University of Sheffield
+  twitter: lawrennd
+  url: http://inverseprobability.com
+date: 2018-11-16
+venue: DSA, Abuja
+transition: None
+---
+
+%%%%%%%%%%%% LOCAL DATA %%%%%%%%%%%%%%%%%%%%
+https://www.kaggle.com/alaowerre/nigeria-nmis-health-facility-data
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\include{talk-macros.tex}
+
+\include{_ml/includes/what-is-ml.md}
+\include{_ml/includes/probability-intro.md}
+\include{_ml/includes/probabilistic-modelling.md}
+
+\include{_ml/includes/graphical-models.md}
+\include{_ml/includes/classification-intro.md}
+\include{_ml/includes/classification-examples.md}
+\include{_ml/includes/bayesian-reminder.md}
+\include{_ml/includes/bernoulli-distribution.md}
+\include{_ml/includes/bernoulli-maximum-likelihood.md}
+\include{_ml/includes/bayes-rule-reminder.md}
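+
+\notes{Bayes' rule is easy to state but its consequences can be
+counter-intuitive, so a quick numerical check is worthwhile. The numbers
+below are invented for illustration (they are not from the session
+materials): a diagnostic test with 95% sensitivity and 90% specificity,
+applied where the condition has 1% prevalence.}
+
+```python
+# Bayes' rule: p(c | +) = p(+ | c) p(c) / p(+)
+prior = 0.01          # p(condition)
+sensitivity = 0.95    # p(positive | condition)
+specificity = 0.90    # p(negative | no condition)
+
+# Marginal probability of a positive result (sum rule over both cases).
+evidence = sensitivity * prior + (1 - specificity) * (1 - prior)
+posterior = sensitivity * prior / evidence
+print(f"p(condition | positive) = {posterior:.3f}")  # about 0.088
+```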
+\include{_ml/includes/naive-bayes.md}
+
+\subsection{Other Reading}
+
+* Chapter 5 of @Rogers:book11 up to pg 179 (Section 5.1, and 5.2 up to 5.2.2).
+
+\references
diff --git a/_dsa/what-is-machine-learning-ashesi.md b/_dsa/what-is-machine-learning-ashesi.md
new file mode 100755
index 0000000..be0f574
--- /dev/null
+++ b/_dsa/what-is-machine-learning-ashesi.md
@@ -0,0 +1,106 @@
+---
+title: What is Machine Learning?
+venue: Data Science Africa Summer School, Ashesi, Ghana
+author:
+- given: Neil D.
+  family: Lawrence
+  url: http://inverseprobability.com
+  institute: University of Cambridge
+  twitter: lawrennd
+  gscholar: r3SJcvoAAAAJ
+  orcid:
+abstract: >
+  In this talk we will introduce the fundamental ideas in machine learning. We'll develop our exposition around the ideas of the prediction function and the objective function. We focus not so much on the derivation of particular algorithms as on the general principles involved, to give an idea of the machine learning *landscape*.
+date: 2019-10-21
+categories:
+- notes
+layout: talk
+geometry: ["a4paper", "margin=2cm"]
+papersize: a4paper
+transition: None
+---
+
+\include{../talk-macros.tex}
+
+\section{Introduction}
+
+\include{_data-science/includes/data-science-africa.md}
+\include{_health/includes/malaria-gp.md}
+
+\subsection{Machine Learning}
+\notes{This talk is a general introduction to machine learning; we will highlight the technical challenges and the current solutions. We will give an overview of what machine learning is and why it is important.}
+
+\subsection{Rise of Machine Learning}
+\slides{
+* Driven by data and computation
+* Fundamentally dependent on models
+}\notes{Machine learning is the combination of data and models, through computation, to make predictions.}
+$$
+\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}
+$$
+
+\subsection{Data Revolution}
+
+\notes{Machine learning has risen in prominence due to the rise in data availability, and its interconnection with computers. The high bandwidth connection between data and computer leads to a new interaction between us and data via the computer. It is that channel that is being mediated by machine learning techniques.}
+\figure{\includediagram{\diagramsDir/data-science/new-flow-of-information}{60%}}{Large amounts of data and high interconnection bandwidth mean that we receive much of our information about the world around us through computers.}{data-science-information-flow}
+
+\include{_supply-chain/includes/supply-chain-africa.md}
+\include{_ml/includes/process-emulation.md}
+\include{_ml/includes/nigeria-nmis-data.md}
+\include{_ml/includes/what-does-machine-learning-do.md}
+\include{_ml/includes/what-is-ml-2.md}
+\include{_ai/includes/ai-vs-data-science-2.md}
+\include{_ml/includes/neural-networks.md}
+
+\subsection{Machine Learning}
+\slides{
+1. observe a system in practice
+2. emulate its behavior with mathematics.
+
+* Design challenge: where to put the mathematical function.
+* Where it's placed leads to different ML domains.
+}\notes{The key idea in machine learning is to observe the system in practice, and then emulate its behavior with mathematics. That leads to a design challenge: where to place the mathematical function. Its placement leads to the different domains of machine learning.}
+
+\newslide{Types of Machine Learning}
+
+1. Supervised learning
+2. Unsupervised learning
+3. Reinforcement learning
+
+\include{_ml/includes/supervised-learning-intro.md}
+
+\include{_ml/includes/classification-intro.md}
+\include{_ml/includes/classification-examples.md}
+\include{_ml/includes/the-perceptron.md}
+\notes{\include{_ml/includes/logistic-regression.md}
+\include{_ml/includes/nigeria-nmis-data-logistic.md}}
+\include{_ml/includes/regression-intro.md}
+\include{_ml/includes/regression-examples.md}
+\include{_ml/includes/olympic-marathon-polynomial.md}
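+
+\notes{The prediction function and the objective function can be made
+concrete in a few lines. The sketch below fits a polynomial by least
+squares to synthetic data; it is a stand-in for the Olympic marathon
+example above, with invented numbers rather than the real data.}
+
+```python
+import numpy as np
+
+# Synthetic (invented) data standing in for year vs marathon pace.
+rng = np.random.default_rng(1)
+x = np.linspace(1896, 2020, 30)
+y = 5.0 - 0.015 * (x - 1896) + rng.normal(0.0, 0.1, size=x.shape)
+
+# Prediction function: a quadratic polynomial in the (rescaled) input.
+x_scaled = (x - x.mean()) / x.std()
+Phi = np.vander(x_scaled, 3, increasing=True)  # basis [1, x, x^2]
+
+# Objective function: the sum of squared errors, minimised here in
+# closed form via a least-squares solve.
+w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
+print("weights:", w)
+print("sum of squared errors:", np.sum((y - Phi @ w) ** 2))
+```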
+
+\include{_ml/includes/supervised-learning-challenges.md}
+
+\notes{
+\include{_ml/includes/unsupervised-learning.md}
+\include{_ml/includes/reinforcement-learning.md}
+
+\notes{We have introduced a range of machine learning approaches by focusing on their use of mathematical functions to replace manually coded systems of rules. The important characteristic of machine learning is that the form of these functions, as dictated by their parameters, is determined by acquiring data from the real world.}
+
+\include{_ml/includes/deployment.md}}
+
+\reading
+
+\thanks
+
+\references
diff --git a/_dsa/what-is-machine-learning.md b/_dsa/what-is-machine-learning.md
new file mode 100755
index 0000000..9d5eb49
--- /dev/null
+++ b/_dsa/what-is-machine-learning.md
@@ -0,0 +1,97 @@
+---
+session: 1
+title: What is Machine Learning?
+venue: Data Science Africa Summer School, Addis Ababa, Ethiopia
+author:
+- given: Neil D.
+  family: Lawrence
+  url: http://inverseprobability.com
+  institute: Amazon Cambridge and University of Sheffield
+  twitter: lawrennd
+  gscholar: r3SJcvoAAAAJ
+  orcid:
+abstract: >
+  In this talk we will introduce the fundamental ideas in machine learning. We'll develop our exposition around the ideas of the prediction function and the objective function. We focus not so much on the derivation of particular algorithms as on the general principles involved, to give an idea of the machine learning *landscape*.
+date: 2019-06-03
+categories:
+- notes
+geometry: ["a4paper", "margin=2cm"]
+papersize: a4paper
+transition: None
+---
+
+\include{../talk-macros.gpp}
+
+\section{Introduction}
+
+\include{_data-science/includes/data-science-africa.md}
+\include{_health/includes/malaria-gp.md}
+
+\subsection{Machine Learning}
+\notes{This talk is a general introduction to machine learning; we will highlight the technical challenges and the current solutions. We will give an overview of what machine learning is and why it is important.}
+
+\subsection{Rise of Machine Learning}
+\slides{
+* Driven by data and computation
+* Fundamentally dependent on models
+}\notes{Machine learning is the combination of data and models, through computation, to make predictions.}
+$$
+\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}
+$$
+
+\subsection{Data Revolution}
+
+\notes{Machine learning has risen in prominence due to the rise in data availability, and its interconnection with computers. The high bandwidth connection between data and computer leads to a new interaction between us and data via the computer. It is that channel that is being mediated by machine learning techniques.}
+\figure{\includediagram{\diagramsDir/data-science/new-flow-of-information}{60%}}{Large amounts of data and high interconnection bandwidth mean that we receive much of our information about the world around us through computers.}{data-science-information-flow}
+
+\include{_supply-chain/includes/supply-chain-africa.md}
+\include{_ml/includes/process-emulation.md}
+
+\newslide{Kapchorwa District}
+
+\figure{\includediagramclass{\diagramsDir/health/Kapchorwa_District_in_Uganda}{50%}}{The Kapchorwa District, home district of Stephen Kiprotich.}{kapchorwa-district-in-uganda}
+
+\notes{Stephen Kiprotich, the 2012 gold medal winner from the London Olympics, comes from Kapchorwa district, in eastern Uganda, near the border with Kenya.}
+
+\include{_ml/includes/olympic-marathon-polynomial.md}
+\include{../_ml/includes/what-does-machine-learning-do.md}
+
+\include{_ml/includes/what-is-ml-2.md}
+\include{_ai/includes/ai-vs-data-science-2.md}
+\include{_ml/includes/neural-networks.md}
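+
+\notes{A neural network is one answer to where the mathematical function
+can be placed. As a minimal sketch (with random, untrained weights
+purely to show the functional form; the sizes and names are invented for
+the example), a single hidden layer of ReLU units maps an input through
+basis-like activations to a prediction.}
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(2)
+W1, b1 = rng.normal(size=(1, 10)), rng.normal(size=10)  # hidden layer
+w2, b2 = rng.normal(size=10), 0.0                       # output layer
+
+def predict(x):
+    """Prediction function: input -> hidden ReLU activations -> output."""
+    h = np.maximum(x[:, None] * W1 + b1, 0.0)
+    return h @ w2 + b2
+
+x = np.linspace(-2.0, 2.0, 5)
+print(predict(x))  # one prediction per input
+```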
+
+\subsection{Machine Learning}
+\slides{
+1. observe a system in practice
+2. emulate its behavior with mathematics.
+
+* Design challenge: where to put the mathematical function.
+* Where it's placed leads to different ML domains.
+}\notes{The key idea in machine learning is to observe the system in practice, and then emulate its behavior with mathematics. That leads to a design challenge: where to place the mathematical function. Its placement leads to the different domains of machine learning.}
+
+\newslide{Types of Machine Learning}
+
+1. Supervised learning
+2. Unsupervised learning
+3. Reinforcement learning
+
+\include{_ml/includes/supervised-learning.md}
+
+\notes{
+\include{_ml/includes/unsupervised-learning.md}
+\include{_ml/includes/reinforcement-learning.md}
+
+\notes{We have introduced a range of machine learning approaches by focusing on their use of mathematical functions to replace manually coded systems of rules. The important characteristic of machine learning is that the form of these functions, as dictated by their parameters, is determined by acquiring data from the real world.}
+
+\include{_ml/includes/deployment.md}}
+
+\thanks
+
+\references