Update MLAPA application
phrb committed Mar 16, 2021
1 parent 0b687b3 commit 280d05e
Showing 1 changed file (journal.org) with 309 additions and 1 deletion.
gemv_mem = mem_gemv(gemv_size)
gemv_peak_perf = max(gemv_perf / peak_perf_1core,
gemv_mem / peak_mem)

fastest_time = gemv_perf / 1.160198

gemv_df = data.frame(id = "GEMV",
peak_theoretical = gemv_peak_perf,
arithmetic_intensity = gemv_perf / gemv_mem,
peak_achieved = 1.160198)
peak_achieved = fastest_time)

ggplot() +
geom_line(data = roofline_df,
ggplot() +

#+RESULTS:
[[file:img/theoretical_roofline_xeonE52630v3.pdf]]
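For reference, the roofline curves computed above bound attainable performance by the minimum of the compute roof and the memory roof. A minimal sketch follows in Python (the journal's own analysis is in R); the peak values are illustrative placeholders, not the measured Xeon E5-2630 v3 numbers.

```python
# Hypothetical peaks, standing in for the measured machine values.
PEAK_FLOPS = 2.4e9 * 16      # e.g. 2.4 GHz x 16 DP FLOP/cycle per core
PEAK_BANDWIDTH = 59e9        # hypothetical peak memory bandwidth, bytes/s

def attainable(ai):
    """Attainable FLOP/s at arithmetic intensity ai (FLOP/byte):
    the lower of the compute roof and the memory roof ai * bandwidth."""
    return min(PEAK_FLOPS, ai * PEAK_BANDWIDTH)

# GEMV does ~2 FLOPs per 8-byte element it reads, so its arithmetic
# intensity is low and it sits under the memory roof, as in the plot.
assert attainable(0.25) < PEAK_FLOPS       # memory-bound
assert attainable(100.0) == PEAK_FLOPS     # compute-bound region
```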

*** [2021-03-15 Mon]
**** Files for the Microsoft Latin America PhD Award Application
***** Application to the Microsoft Latin America PhD Award (didn't pan out)
****** Search Heuristics and Statistical Learning Methods for Autotuning HPC Programs
:PROPERTIES:
:EXPORT_DATE:
:EXPORT_TITLE: @@latex: Search Heuristics and Statistical Learning \\ Methods for Program Autotuning@@
:EXPORT_FILE_NAME: application.pdf
:EXPORT_AUTHOR: Pedro Bruel
:END:

#+latex: \vspace{-4em}

High Performance Computing has been a cornerstone of collective scientific and
industrial progress for at least five decades. At the cost of increased
complexity, advances in software and hardware engineering continue to overcome
the many challenges standing in the way of the sustained performance
improvements observed over the last fifty years. This mounting complexity means
that reaching the advertised hardware performance for a given program requires
not only expert knowledge of the target hardware architecture, but also mastery
of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as /search/ or /learning/ problems, by
converting implementation and configuration choices into /parameters/ that might
affect performance, we can draw on and adapt proven methods from search,
mathematical optimization, and statistics. The effectiveness of these adapted
methods on autotuning problems varies greatly, and hinges on practical and
mathematical properties of the problem and the corresponding /search space/.
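This framing, implementation and configuration choices converted into parameters, can be made concrete with a minimal random-search autotuner sketch. Python is used here for illustration, and the parameter space and the measure() cost model are hypothetical stand-ins for a real kernel and measurement harness.

```python
import random

# Hypothetical mixed search space: discrete, categorical, and continuous
# parameters. None of these names come from a real tool.
space = {
    "unroll":    [1, 2, 4, 8],           # discrete
    "scheduler": ["static", "dynamic"],  # categorical
    "threshold": (0.0, 1.0),             # continuous range
}

def sample(space):
    """Draw one random configuration from the mixed search space."""
    return {
        "unroll":    random.choice(space["unroll"]),
        "scheduler": random.choice(space["scheduler"]),
        "threshold": random.uniform(*space["threshold"]),
    }

def measure(cfg):
    """Hypothetical runtime model standing in for a real measurement."""
    base = 1.0 / cfg["unroll"] + (0.2 if cfg["scheduler"] == "dynamic" else 0.5)
    return base + abs(cfg["threshold"] - 0.7)

def random_search(space, budget=100, seed=0):
    """Keep the best configuration found within a fixed measurement budget."""
    random.seed(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(budget):
        cfg = sample(space)
        elapsed = measure(cfg)
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time
```

Replacing measure() with an actual compile-and-run step turns this into the simplest baseline against which structured methods are usually compared.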

When adapting methods for autotuning, we face challenges emerging from
practical properties such as restricted time and cost budgets, constraints on
feasible parameter values, and the need to mix /categorical/, /continuous/, and
/discrete/ parameters. To achieve useful results, we must also choose methods
whose hypotheses are compatible with the problem's search space, such as the
existence of discoverable, or at least exploitable, relationships between
parameters and performance. Choosing an autotuning method requires striking a
balance between the exploration of a problem, when we seek to discover and
explain relationships between parameters and performance, and the exploitation
of the best optimizations we can find, when we seek only to minimize a cost
metric such as execution time.

The effectiveness of search heuristics on autotuning can be
limited\nbsp{}\cite{seymour2008comparison,balaprakash2011can,balaprakash2012experimental},
among other factors, by underlying hypotheses about the search space, such as
the reachability of the global optimum and the smoothness of search space
surfaces, which frequently do not hold. The derivation of relationships between
parameters and performance from search heuristic optimizations is greatly
hindered, if not rendered impossible, by the biased way these methods explore
parameters. Some parametric learning methods, such as Design of Experiments,
are not widely applied to autotuning. These methods perform structured
parameter exploration, and can be used to build and validate performance
models, generating transparent and cost-effective
optimizations\nbsp{}\cite{mametjanov2015autotuning,bruel2019autotuning}. Other
methods from the parametric family are more widely used, such as Bandit
Algorithms\nbsp{}\cite{xu2017parallel}. Nonparametric learning methods, such as
Decision Trees\nbsp{}\cite{balaprakash2016automomml} and Gaussian Process
Regression\nbsp{}\cite{parsa2019pabo}, can greatly reduce model bias, at the
expense of increased prediction variance. Figure\nbsp{}\ref{fig:tree}
categorizes autotuning methods according to the key hypotheses and branching
questions underlying each one.
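As a concrete instance of the bandit algorithms mentioned above, a UCB1 sketch for tuning a single categorical parameter follows, in Python. The runtime model and flag names are hypothetical, and this is a generic UCB1 sketch, not the specific method of the cited work.

```python
import math
import random

def ucb_tune(arms, measure, budget, c=2.0, seed=0):
    """UCB1 over one categorical parameter: play the arm with the best
    mean reward plus an exploration bonus that shrinks with visits.
    Rewards are negated runtimes, so minimizing time = maximizing reward."""
    random.seed(seed)
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, budget + 1):
        if t <= len(arms):                      # play every arm once first
            arm = arms[t - 1]
        else:                                   # then balance mean vs. bonus
            arm = max(arms, key=lambda a: means[a]
                      + math.sqrt(c * math.log(t) / counts[a]))
        reward = -measure(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # running mean
    return max(arms, key=lambda a: means[a])

# Hypothetical noisy runtimes for three values of one categorical flag.
true_time = {"O1": 1.0, "O2": 0.8, "O3": 0.6}
def noisy_measure(flag):
    return true_time[flag] + random.gauss(0.0, 0.05)

best = ucb_tune(list(true_time), noisy_measure, budget=300)
# With a 0.2 s gap and small noise, UCB1 settles on "O3" almost surely.
```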

During my thesis I have adapted different search heuristics and statistical
learning methods and studied their effectiveness at optimizing performance in
several autotuning domains. At the beginning of my PhD at the University of São
Paulo (USP), I published a paper on optimizing the configuration of the CUDA
compiler\nbsp{}\cite{bruel2017autotuning}, where we achieved up to a 4-fold
performance improvement in comparison with a high-level compiler optimization.
In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo
Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis
for FPGAs\nbsp{}\cite{bruel2017autotuninghls}, where we achieved improvements of
25% on average in the performance, size, and complexity of designs.

At the end of 2017, I joined the /cotutelle/ PhD program at the University of
Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I
applied Design of Experiments to the autotuning of a source-to-source
transformation compiler\nbsp{}\cite{bruel2019autotuning}, showing that we can
achieve significant speedups under a strict budget by exploiting search space
structure. I have also collaborated with HPE on another paper, providing an
analysis of the applicability of autotuning methods to a Hardware-Software
Co-design problem\nbsp{}\cite{bruel2017generalize}. During my Teaching Assistant
internships, I published one paper\nbsp{}\cite{bruel2017openmp} on teaching
parallel programming, and collaborated on another\nbsp{}\cite{goncalves2016openmp},
where we showed that teaching lower-level programming models, despite being more
challenging at first, builds a stronger core understanding.

I continue to collaborate with HPE researchers on the application of autotuning
methods to optimize Neural Networks, hardware accelerators for Deep Learning,
and algorithms for dealing with network congestion. With my advisors, I
currently supervise one undergraduate and four master's students, who are
applying the statistical learning autotuning methods I studied and adapted to
different domains in the context of a joint USP/HPE research project. I am
strongly motivated to continue pursuing a career in Computer Science research,
aiming to produce rigorous and value-adding contributions. I hereby submit my
thesis proposal and application to the Microsoft Latin America PhD Award.
#+begin_export latex
\begin{center}
\begin{figure}[t]
\resizebox{.9\textwidth}{!}{%
\begin{forest}
for tree={%
anchor = north,
align = center,
l sep+=1em
},
[{Minimize $f: \mathcal{X} \to \mathbb{R}$,\\$Y = f(X = (x_1,\dots,x_k) \in \mathcal{X}) + \varepsilon$},
draw,
[{Constructs surrogate estimate $\hat{f}(\cdot, \theta(X))$?},
draw,
color = NavyBlue
[{Search Heuristics},
draw,
color = BurntOrange,
edge label = {node[midway, fill=white, font = \scriptsize]{No}}
[{\textbf{Random} \textbf{Sampling}}, draw]
[{Reachable Optima},
draw,
color = BurntOrange
[, phantom]
[{Underlying Hypotheses \\ \textbf{Heuristics}}, draw]]]
[{Statistical Learning},
draw,
color = BurntOrange,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}
[{Parametric Learning},
draw,
color = BurntOrange
[{$\forall{}i: x_i \in X$ is discrete\\$\hat{f}(X) \approx f_1(x_1) + \dots + f_k(x_k)$},
draw,
color = BurntOrange
[{\textbf{Independent Bandits}\\for each $x_i$:\textbf{UCB},\textbf{EXP3},$\dots$}, draw]
[, phantom]]
[{Linear Model\\$\hat{f} = \mathcal{M}(X)\theta{}(X) + \varepsilon$},
draw,
color = BurntOrange
[, phantom]
[{Check for model adequacy?},
draw,
alias = adequacy,
color = NavyBlue
[{Consider interactions?\\{$\exists x_i \neq x_j:\; \theta(x_ix_j) \neq 0$}},
draw,
alias = interactions,
color = NavyBlue,
edge label = {node[midway, fill=white, font = \scriptsize]{No}}
[{$\forall x_i \in X: x_i \in \{-1, 1\}$\\\textbf{Screening} \textbf{Designs}},
edge label = {node[midway, fill=white, font = \scriptsize]{No}},
draw
[, phantom]
[{Select $\hat{X}_{*}$, reduce dimension of $\mathcal{X}$},
edge = {-stealth, ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit}},
draw,
alias = estimate,
color = ForestGreen]]
[{\textbf{Optimal} \textbf{Design}},
draw,
alias = optimal,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}]]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[, phantom]
[{\textbf{Space-filling} \textbf{Designs}},
draw,
edge label = {node[midway, fill=white, font = \scriptsize]{Yes}}
[, phantom]
[{Model selection},
edge = {-stealth, ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Explore}},
draw,
alias = selection,
color = ForestGreen]]]]]
[{Nonparametric Learning},
draw,
color = BurntOrange
[{Splitting rules on X\\\textbf{Decision} \textbf{Trees}},
draw
[, phantom]
[{Estimate $\hat{f}(\cdot)$ and $uncertainty(\hat{f}(\cdot))$},
edge = {-stealth, ForestGreen, semithick},
draw,
alias = uncertainty,
color = ForestGreen
[{Minimize $uncertainty(\hat{f}(X))$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Explore}},
draw,
color = ForestGreen]
[{Minimize $\hat{f}(X)$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit}},
draw,
color = ForestGreen]
[{Minimize $\hat{f}(X) - uncertainty(\hat{f}(X))$},
edge = {ForestGreen, semithick},
edge label = {node[midway, fill=white, font = \scriptsize]{Exploit$+$Explore}},
draw,
color = ForestGreen]]]
[{\textbf{Gaussian} \textbf{Process Regression}},
alias = gaussian,
draw]
[{\textbf{Neural} \textbf{Networks}}, draw]]]]]
\draw [-stealth, semithick, ForestGreen](selection) to [bend left=27] node[near start, fill=white, font = \scriptsize] {Exploit} (adequacy.south);
\draw [-stealth, semithick, ForestGreen](estimate.east) to [bend right=37] node[near start, fill=white, font = \scriptsize] {Explore} (adequacy.south) ;
\draw [-stealth, semithick, ForestGreen](gaussian) to (uncertainty);
\draw [-stealth, semithick, ForestGreen](optimal) to node[midway, fill=white, font = \scriptsize] {Exploit} (estimate) ;
\end{forest}
}
\caption{A high-level view of autotuning methods, where \textcolor{NavyBlue}{\textbf{blue}} boxes
denote branching questions, \textcolor{BurntOrange}{\textbf{orange}} boxes
denote key hypotheses, \textcolor{ForestGreen}{\textbf{green}} boxes
highlight exploration and exploitation choices, and \textbf{bold} boxes denote methods.}
\label{fig:tree}
\end{figure}
\end{center}
#+end_export

#+latex: \newpage

#+LATEX: \bibliographystyle{IEEEtran}
#+LATEX: \bibliography{references}
****** (Short) Search Heuristics and Statistical Learning Methods for Autotuning HPC Programs
:PROPERTIES:
:EXPORT_DATE:
:EXPORT_TITLE: @@latex: Search Heuristics and Statistical Learning \\ Methods for Program Autotuning@@
:EXPORT_FILE_NAME: short-application.pdf
:EXPORT_AUTHOR: Pedro Bruel
:END:

#+latex: \vspace{-3em}

High Performance Computing has been a cornerstone of collective scientific and
industrial progress for at least five decades. At the cost of increased
complexity, advances in software and hardware engineering continue to overcome
the many challenges standing in the way of the sustained performance
improvements observed over the last fifty years. This mounting complexity means
that reaching the advertised hardware performance for a given program requires
not only expert knowledge of the target hardware architecture, but also mastery
of programming models and languages for parallel and distributed computing.

If we state performance optimization problems as /search/ or /learning/ problems, by
converting implementation and configuration choices into /parameters/ that might
affect performance, we can draw on and adapt proven methods from search,
mathematical optimization, and statistics. The effectiveness of these adapted
methods on autotuning problems varies greatly, and hinges on practical and
mathematical properties of the problem and the corresponding /search space/.

When adapting methods for autotuning, we face challenges emerging from
practical properties such as restricted time and cost budgets, constraints on
feasible parameter values, and the need to mix /categorical/, /continuous/, and
/discrete/ parameters. To achieve useful results, we must also choose methods
whose hypotheses are compatible with the problem's search space, such as the
existence of discoverable, or at least exploitable, relationships between
parameters and performance. Choosing an autotuning method requires striking a
balance between the exploration of a problem, when we seek to discover and
explain relationships between parameters and performance, and the exploitation
of the best optimizations we can find, when we seek only to minimize a cost
metric such as execution time.

During my thesis I have adapted different search heuristics and statistical
learning methods and studied their effectiveness at optimizing performance in
several autotuning domains. At the beginning of my PhD at the University of São
Paulo (USP), I published a paper on optimizing the configuration of the CUDA
compiler\nbsp{}\cite{bruel2017autotuning}, where we achieved up to a 4-fold
performance improvement in comparison with a high-level compiler optimization.
In collaboration with researchers from Hewlett Packard Enterprise (HPE) in Palo
Alto, I wrote a paper on the autotuning of a compiler for High-Level Synthesis
for FPGAs\nbsp{}\cite{bruel2017autotuninghls}, where we achieved improvements of
25% on average in the performance, size, and complexity of designs.

At the end of 2017, I joined the /cotutelle/ PhD program at the University of
Grenoble Alpes (UGA) and became a member of the POLARIS Inria team, where I
applied Design of Experiments to the autotuning of a source-to-source
transformation compiler\nbsp{}\cite{bruel2019autotuning}, showing that we can
achieve significant speedups under a strict budget by exploiting search space
structure. I have also collaborated with HPE on another paper, providing an
analysis of the applicability of autotuning methods to a Hardware-Software
Co-design problem\nbsp{}\cite{bruel2017generalize}.

I continue to collaborate with HPE researchers on the application of autotuning
methods to optimize Neural Networks, hardware accelerators for Deep Learning,
and algorithms for dealing with network congestion. With my advisors, I
currently supervise one undergraduate and four master's students, who are
applying the statistical learning autotuning methods I studied and adapted to
different domains in the context of a joint USP/HPE research project. I am
strongly motivated to continue pursuing a career in Computer Science research,
aiming to produce rigorous and value-adding contributions. I hereby submit my
thesis proposal and application to the Microsoft Latin America PhD Award.

#+LATEX: \bibliographystyle{IEEEtran}
#+LATEX: \bibliography{references}
**** Arnaud and Brice: March 15th Meeting
***** Postdoc training sessions @ Inria
- Registration OK
- Project, lab, advisor?
- Can it be Arnaud, or someone at LIG?
***** Empirical Roofline
- Almost 2x peak performance when using ~march=native~
- Despite that, no improvement in the best configuration for the ~dgemv~ kernel
- Around 20% improvement for the ~-O3~ kernel version
- Good argument that the kernel is not capable of leveraging the parameters
***** Laplacian
- Looking at a specific semi-automatic example
- Factors were not always eliminated
- Later, experiments were automated
