You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A Markov Decision Process is a $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ where
$\mathcal{S}$ consists of states,
$\mathcal{A}$ consists of actions,
more generally, $\mathcal{A}_s$ is a set of actions permitted from $s\in\mathcal{S}$
$\mathcal{P}_a(s, s')$ is the probability, when permitted, that visiting $(s, a)$ leads to $s'$
$R_a(s, s')$ is the (expected) immediate reward if visiting $(s, a)$ leads to $s'$
We don't quite need the generality of a MDP because our transition probabilities are determinstic.
I.e., $\mathcal{P}a(s, \cdot) = \delta{s'}$ for some $s'\in\mathcal{S}$.
instead of including a reward function, we are optimizing an objective over a DFA
a DFA has a specific initial state
we may consider a DFA for each MCTS
this formally does explicitly leave room for actions sets to be restricted in any way
but this can be patched by replacing a $(\mathcal{Q}, \Sigma, \delta: \mathcal{Q}\times\Sigma\to\mathcal{Q}^\ast)$
(here, $\mathcal{Q}^\ast$ is $\mathcal{Q}$ with dud state added)
with $(\mathcal{Q}^\ast, \Sigma, \delta^\ast: \mathcal{Q}^\ast\times\Sigma\to\mathcal{Q}^\ast)$ with the dud state serving as a final object in the corresponding monoidal category
Across each MCTS, the transition dynamics are shared by a common semiautomaton.
Markov Decision Process
A Markov Decision Process is a$(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ where
We don't quite need the generality of a MDP because our transition probabilities are determinstic.$s'\in\mathcal{S}$ .
I.e., $\mathcal{P}a(s, \cdot) = \delta{s'}$ for some
Deterministic Finite Automaton
A Deterministic Finite Automaton consists of
This is closer to our formulation. Note that:
Across each MCTS, the transition dynamics are shared by a common semiautomaton.
Semiautomaton
A Semiautomaton has
As before, we may accommodate partial transition functions$T:\mathcal{Q}\times \Sigma\to\mathcal{Q}$ with the canonical monadic extension.
The text was updated successfully, but these errors were encountered: