Update MLAPA application
phrb committed Mar 16, 2021
1 parent 0b687b3 commit 280d05e
Showing 1 changed file (journal.org) with 309 additions and 1 deletion.
gemv_mem = mem_gemv(gemv_size)
gemv_peak_perf = max(gemv_perf / peak_perf_1core,
gemv_mem / peak_mem)

fastest_time = gemv_perf / 1.160198

gemv_df = data.frame(id = "GEMV",
peak_theoretical = gemv_peak_perf,
arithmetic_intensity = gemv_perf / gemv_mem,
peak_achieved = 1.160198)
peak_achieved = fastest_time)

ggplot() +
geom_line(data = roofline_df,
ggplot() +

#+RESULTS:
[[file:img/theoretical_roofline_xeonE52630v3.pdf]]
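For reference, the roofline curves computed above bound attainable performance by the minimum of the compute roof and the memory roof. A minimal sketch follows in Python (the journal's own analysis is in R); the peak values are illustrative placeholders, not the measured Xeon E5-2630 v3 numbers.

```python
# Hypothetical peaks, standing in for the measured machine values.
PEAK_FLOPS = 2.4e9 * 16      # e.g. 2.4 GHz x 16 DP FLOP/cycle per core
PEAK_BANDWIDTH = 59e9        # hypothetical peak memory bandwidth, bytes/s

def attainable(ai):
    """Attainable FLOP/s at arithmetic intensity ai (FLOP/byte):
    the lower of the compute roof and the memory roof ai * bandwidth."""
    return min(PEAK_FLOPS, ai * PEAK_BANDWIDTH)

# GEMV does ~2 FLOPs per 8-byte element it reads, so its arithmetic
# intensity is low and it sits under the memory roof, as in the plot.
assert attainable(0.25) < PEAK_FLOPS       # memory-bound
assert attainable(100.0) == PEAK_FLOPS     # compute-bound region
```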

*** [2021-03-15 Mon]
**** Files for the Microsoft Latin America PhD Award Application
***** Application to the Microsoft Latin America PhD Award (didn't pan out)
****** Search Heuristics and Statistical Learning Methods for Autotuning HPC Programs
:PROPERTIES:
:EXPORT_DATE:
:EXPORT_TITLE: @@latex: Search Heuristics and Statistical Learning \\ Methods for Program Autotuning@@
:EXPORT_FILE_NAME: application.pdf
:EXPORT_AUTHOR: Pedro Bruel
:END:

#+latex: \vspace{-4em}

High Performance Computing has been a cornerstone of collective scientific and
industrial progress for at least five decades. At the cost of increased
complexity, advances in software and hardware engineering continue to overcome
the many challenges standing in the way of the sustained performance
improvements observed over the last fifty years. This mounting complexity means
that reaching the advertised hardware performance for a given program requires
not only expert knowledge of the target hardware architecture, but also mastery
of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as /search/ or /learning/ problems, by
converting implementation and configuration choices into /parameters/ that might
affect performance, we can draw on and adapt proven methods from search,
mathematical optimization, and statistics. The effectiveness of these adapted
methods on autotuning problems varies greatly, and hinges on practical and
mathematical properties of the problem and the corresponding /search space/.
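This framing, implementation and configuration choices converted into parameters, can be made concrete with a minimal random-search autotuner sketch. Python is used here for illustration, and the parameter space and the measure() cost model are hypothetical stand-ins for a real kernel and measurement harness.

```python
import random

# Hypothetical mixed search space: discrete, categorical, and continuous
# parameters. None of these names come from a real tool.
space = {
    "unroll":    [1, 2, 4, 8],           # discrete
    "scheduler": ["static", "dynamic"],  # categorical
    "threshold": (0.0, 1.0),             # continuous range
}

def sample(space):
    """Draw one random configuration from the mixed search space."""
    return {
        "unroll":    random.choice(space["unroll"]),
        "scheduler": random.choice(space["scheduler"]),
        "threshold": random.uniform(*space["threshold"]),
    }

def measure(cfg):
    """Hypothetical runtime model standing in for a real measurement."""
    base = 1.0 / cfg["unroll"] + (0.2 if cfg["scheduler"] == "dynamic" else 0.5)
    return base + abs(cfg["threshold"] - 0.7)

def random_search(space, budget=100, seed=0):
    """Keep the best configuration found within a fixed measurement budget."""
    random.seed(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(budget):
        cfg = sample(space)
        elapsed = measure(cfg)
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time
```

Replacing measure() with an actual compile-and-run step turns this into the simplest baseline against which structured methods are usually compared.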

When adapting methods for autotuning, we face challenges emerging from
practical properties such as restricted time and cost budgets, constraints on
feasible parameter values, and the need to mix /categorical/, /continuous/, and
/discrete/ parameters. To achieve useful results, we must also choose methods
whose hypotheses are compatible with the problem's search space, such as the
existence of discoverable, or at least exploitable, relationships between
parameters and performance. Choosing an autotuning method requires striking a
balance between the exploration of a problem, when we seek to discover and
explain relationships between parameters and performance, and the exploitation
of the best optimizations we can find, when we seek only to minimize a cost
metric such as execution time.

The effectiveness of search heuristics on autotuning can be
limited\nbsp{}\cite{seymour2008comparison,balaprakash2011can,balaprakash2012experimental},
among other factors, by underlying hypotheses about the search space, such as
the reachability of the global optimum and the smoothness of search space
surfaces, which frequently do not hold. The derivation of relationships between
parameters and performance from search heuristic optimizations is greatly
hindered, if not rendered impossible, by the biased way these methods explore
parameters. Some parametric learning methods, such as Design of Experiments,
are not widely applied to autotuning. These methods perform structured
parameter exploration, and can be used to build and validate performance
models, generating transparent and cost-effective
optimizations\nbsp{}\cite{mametjanov2015autotuning,bruel2019autotuning}. Other
methods from the parametric family are more widely used, such as Bandit
Algorithms\nbsp{}\cite{xu2017parallel}. Nonparametric learning methods, such as
Decision Trees\nbsp{}\cite{balaprakash2016automomml} and Gaussian Process
Regression\nbsp{}\cite{parsa2019pabo}, can greatly reduce model bias, at the
expense of increased prediction variance. Figure\nbsp{}\ref{fig:tree}
categorizes autotuning methods according to the key hypotheses and branching
questions underlying each one.
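As a concrete instance of the bandit algorithms mentioned above, a UCB1 sketch for tuning a single categorical parameter follows, in Python. The runtime model and flag names are hypothetical, and this is a generic UCB1 sketch, not the specific method of the cited work.

```python
import math
import random

def ucb_tune(arms, measure, budget, c=2.0, seed=0):
    """UCB1 over one categorical parameter: play the arm with the best
    mean reward plus an exploration bonus that shrinks with visits.
    Rewards are negated runtimes, so minimizing time = maximizing reward."""
    random.seed(seed)
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, budget + 1):
        if t <= len(arms):                      # play every arm once first
            arm = arms[t - 1]
        else:                                   # then balance mean vs. bonus
            arm = max(arms, key=lambda a: means[a]
                      + math.sqrt(c * math.log(t) / counts[a]))
        reward = -measure(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # running mean
    return max(arms, key=lambda a: means[a])

# Hypothetical noisy runtimes for three values of one categorical flag.
true_time = {"O1": 1.0, "O2": 0.8, "O3": 0.6}
def noisy_measure(flag):
    return true_time[flag] + random.gauss(0.0, 0.05)

best = ucb_tune(list(true_time), noisy_measure, budget=300)
# With a 0.2 s gap and small noise, UCB1 settles on "O3" almost surely.
```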

During my thesis I have adapted different search heuristics and statistical
learning methods and studied their effectiveness at optimizing performance in
several autotuning domains. At the beginning of my PhD at the University of São
Paulo (USP), I published a paper on optimizing the configuration of the CUDA
compiler\nbsp{}\cite{bruel2017autotuning}, where we achieved up to a 4-fold
performance improvement in comparison with a high-level compiler optimization.
In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo
Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis
for FPGAs\nbsp{}\cite{bruel2017autotuninghls}, where we achieved improvements of
25% on average in the performance, size, and complexity of designs.

At the end of 2017, I joined the /cotutelle/ PhD program at the University of
Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I
applied Design of Experiments to the autotuning of a source-to-source
transformation compiler\nbsp{}\cite{bruel2019autotuning}, showing that we can
achieve significant speedups under a strict budget by exploiting search space
structure. I have also collaborated with HPE on another paper, providing an
analysis of the applicability of autotuning methods to a Hardware-Software
Co-design problem\nbsp{}\cite{bruel2017generalize}. During my Teaching Assistant
internships, I published one paper\nbsp{}\cite{bruel2017openmp} on teaching
parallel programming, and collaborated on another\nbsp{}\cite{goncalves2016openmp},
where we showed that teaching lower-level programming models, despite being more
challenging at first, builds a stronger core understanding.

I continue to collaborate with HPE researchers on the application of autotuning
methods to optimize Neural Networks, hardware accelerators for Deep Learning,
and algorithms for dealing with network congestion. With my advisors, I
currently supervise one undergraduate and four master's students, who are
applying the statistical learning autotuning methods I studied and adapted to
different domains in the context of a joint USP/HPE research project. I am
strongly motivated to continue pursuing a career in Computer Science research,
aiming to produce rigorous and value-adding contributions. I hereby submit my
thesis proposal and application to the Microsoft Latin America PhD Award.
#+begin_export latex
\begin{center}
\begin{figure}[t]
\resizebox{.9\textwidth}{!}{%
\begin{forest}
for tree={%
anchor = north,
align = center,
l sep+=1em
},
[{Minimize $f: \mathcal{X} \to \mathbb{R}$,\\$Y = f(X = (x_1,\dots,x_k) \in \mathcal{X}) + \varepsilon$},
draw,
[{Constructs surrogate estimate $\hat{f}(\cdot, \theta(X))$?},
draw,
color = NavyBlue
[{Search Heuristics},
draw,
color = BurntOrange,
edge label = {node[midway, fill=white, font = \scriptsize]{No}}
[{\textbf{Random} \textbf{Sampling}}, draw]
[{Reachable Optima},
draw,
color = BurntOrange
[, phantom]
[{Underlying Hypotheses \\ \textbf{Heuristics}}, draw]]]
[{Statistical Learning},
draw,
color = BurntOrange,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}
[{Parametric Learning},
draw,
color = BurntOrange
[{$\forall{}i: x_i \in X$ is discrete\\$\hat{f}(X) \approx f_1(x_1) + \dots + f_k(x_k)$},
draw,
color = BurntOrange
[{\textbf{Independent Bandits}\\for each $x_i$:\textbf{UCB},\textbf{EXP3},$\dots$}, draw]
[, phantom]]
[{Linear Model\\$\hat{f} = \mathcal{M}(X)\theta{}(X) + \varepsilon$},
draw,
color = BurntOrange
[, phantom]
[{Check for model adequacy?},
draw,
alias = adequacy,
color = NavyBlue
[{Consider interactions?\\{$\exists x_i \neq x_j:\; \theta(x_ix_j) \neq 0$}},
draw,
alias = interactions,
color = NavyBlue,
edge label = {node[midway, fill=white, font = \scriptsize]{No}}
[{$\forall x_i \in X: x_i \in \{-1, 1\}$\\\textbf{Screening} \textbf{Designs}},
edge label = {node[midway, fill=white, font = \scriptsize]{No}},
draw
[, phantom]
[{Select $\hat{X}_{*}$, reduce dimension of $\mathcal{X}$},
edge = {-stealth, ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit}},
draw,
alias = estimate,
color = ForestGreen]]
[{\textbf{Optimal} \textbf{Design}},
draw,
alias = optimal,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}]]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[{\textbf{Space-filling} \textbf{Designs}},
draw,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}
[, phantom]
[{Model selection},
edge = {-stealth, ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Explore}},
draw,
alias = selection,
color = ForestGreen]]]]]
[{Nonparametric Learning},
draw,
color = BurntOrange
[{Splitting rules on X\\\textbf{Decision} \textbf{Trees}},
draw
[, phantom]
[{Estimate $\hat{f}(\cdot)$ and $uncertainty(\hat{f}(\cdot))$},
edge = {-stealth, ForestGreen, semithick},
draw,
alias = uncertainty,
color = ForestGreen
[{Minimize $uncertainty(\hat{f}(X))$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Explore}},
draw,
color = ForestGreen]
[{Minimize $\hat{f}(X)$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit}},
draw,
color = ForestGreen]
[{Minimize $\hat{f}(X) - uncertainty(\hat{f}(X))$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit$+$Explore}},
draw,
color = ForestGreen]]]
[{\textbf{Gaussian} \textbf{Process Regression}},
alias = gaussian,
draw]
[{\textbf{Neural} \textbf{Networks}}, draw]]]]]
\draw [-stealth, semithick, ForestGreen](selection) to [bend left=27] node[near start, fill=white, font = \scriptsize] {Exploit} (adequacy.south);
\draw [-stealth, semithick, ForestGreen](estimate.east) to [bend right=37] node[near start, fill=white, font = \scriptsize] {Explore} (adequacy.south) ;
\draw [-stealth, semithick, ForestGreen](gaussian) to (uncertainty);
\draw [-stealth, semithick, ForestGreen](optimal) to node[midway, fill=white, font = \scriptsize] {Exploit} (estimate) ;
\end{forest}
}
\caption{A high-level view of autotuning methods, where \textcolor{NavyBlue}{\textbf{blue}} boxes
denote branching questions, \textcolor{BurntOrange}{\textbf{orange}} boxes
denote key hypotheses, \textcolor{ForestGreen}{\textbf{green}} boxes
highlight exploration and exploitation choices, and \textbf{bold} boxes denote methods.}
\label{fig:tree}
\end{figure}
\end{center}
#+end_export

#+latex: \newpage

#+LATEX: \bibliographystyle{IEEEtran}
#+LATEX: \bibliography{references}
****** (Short) Search Heuristics and Statistical Learning Methods for Autotuning HPC Programs
:PROPERTIES:
:EXPORT_DATE:
:EXPORT_TITLE: @@latex: Search Heuristics and Statistical Learning \\ Methods for Program Autotuning@@
:EXPORT_FILE_NAME: short-application.pdf
:EXPORT_AUTHOR: Pedro Bruel
:END:

#+latex: \vspace{-3em}

High Performance Computing has been a cornerstone of collective scientific and
industrial progress for at least five decades. At the cost of increased
complexity, advances in software and hardware engineering continue to overcome
the many challenges standing in the way of the sustained performance
improvements observed over the last fifty years. This mounting complexity means
that reaching the advertised hardware performance for a given program requires
not only expert knowledge of the target hardware architecture, but also mastery
of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as /search/ or /learning/ problems, by
converting implementation and configuration choices into /parameters/ that might
affect performance, we can draw on and adapt proven methods from search,
mathematical optimization, and statistics. The effectiveness of these adapted
methods on autotuning problems varies greatly, and hinges on practical and
mathematical properties of the problem and the corresponding /search space/.

When adapting methods for autotuning, we face challenges emerging from
practical properties such as restricted time and cost budgets, constraints on
feasible parameter values, and the need to mix /categorical/, /continuous/, and
/discrete/ parameters. To achieve useful results, we must also choose methods
whose hypotheses are compatible with the problem's search space, such as the
existence of discoverable, or at least exploitable, relationships between
parameters and performance. Choosing an autotuning method requires striking a
balance between the exploration of a problem, when we seek to discover and
explain relationships between parameters and performance, and the exploitation
of the best optimizations we can find, when we seek only to minimize a cost
metric such as execution time.

During my thesis I have adapted different search heuristics and statistical
learning methods and studied their effectiveness at optimizing performance in
several autotuning domains. At the beginning of my PhD at the University of São
Paulo (USP), I published a paper on optimizing the configuration of the CUDA
compiler\nbsp{}\cite{bruel2017autotuning}, where we achieved up to a 4-fold
performance improvement in comparison with a high-level compiler optimization.
In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo
Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis
for FPGAs\nbsp{}\cite{bruel2017autotuninghls}, where we achieved improvements of
25% on average in the performance, size, and complexity of designs.

At the end of 2017, I joined the /cotutelle/ PhD program at the University of
Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I
applied Design of Experiments to the autotuning of a source-to-source
transformation compiler\nbsp{}\cite{bruel2019autotuning}, showing that we can
achieve significant speedups under a strict budget by exploiting search space
structure. I have also collaborated with HPE on another paper, providing an
analysis of the applicability of autotuning methods to a Hardware-Software
Co-design problem\nbsp{}\cite{bruel2017generalize}.

I continue to collaborate with HPE researchers on the application of autotuning
methods to optimize Neural Networks, hardware accelerators for Deep Learning,
and algorithms for dealing with network congestion. With my advisors, I
currently supervise one undergraduate and four master's students, who are
applying the statistical learning autotuning methods I studied and adapted to
different domains in the context of a joint USP/HPE research project. I am
strongly motivated to continue pursuing a career in Computer Science research,
aiming to produce rigorous and value-adding contributions. I hereby submit my
thesis proposal and application to the Microsoft Latin America PhD Award.

#+LATEX: \bibliographystyle{IEEEtran}
#+LATEX: \bibliography{references}
**** Arnaud and Brice: March 15th Meeting
***** Postdoc training sessions @ Inria
- Registration OK
- Project, lab, advisor?
- Can it be Arnaud, or someone at LIG?
***** Empirical Roofline
- Almost 2x peak performance when using ~march=native~
- Despite that, no improvement in the best configuration for the ~dgemv~ kernel
- Around 20% improvement for the ~-O3~ kernel version
- Good argument that the kernel is not capable of leveraging the parameters
***** Laplacian
- Looking at a specific semi-automatic example
- Factors were not always eliminated
- Later, experiments were automated
