This repository has been archived by the owner on Sep 11, 2023. It is now read-only.
noe edited this page Feb 3, 2016 · 1 revision

HMM Estimation in PyEMMA

Initialization

Initialization options include:

  • MSM given: use the active set of the given MSM for initialization
  • msm_init=None (default) or 'lcs': estimate an MSM and initialize from its largest connected set
  • msm_init='all': estimate MSM(s) on the full state space, which may be disconnected, and initialize from them
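As a rough sketch of what the 'lcs' and 'all' options mean for a count matrix, the code below computes the largest strongly connected set and row-normalizes the counts on the chosen state space. The helper names `largest_connected_set` and `init_msm` are hypothetical, and the simple row-normalized (non-reversible maximum likelihood) estimate is an assumption; PyEMMA's actual initialization may use a different MSM estimator.

```python
import numpy as np

def largest_connected_set(C):
    """Largest strongly connected set of the graph with an edge i -> j wherever C[i, j] > 0."""
    n = C.shape[0]
    R = (C > 0) | np.eye(n, dtype=bool)
    for k in range(n):                      # Warshall transitive closure: R[i, j] = j reachable from i
        R |= R[:, [k]] & R[[k], :]
    mutual = R & R.T                        # i and j reach each other
    comps = sorted({tuple(int(i) for i in np.flatnonzero(row)) for row in mutual})
    return np.array(max(comps, key=len))    # ties: component with the smallest first index

def init_msm(C, msm_init=None):
    """Hypothetical helper: active set and row-normalized transition matrix for initialization."""
    if msm_init in (None, 'lcs'):
        active = largest_connected_set(C)
    elif msm_init == 'all':
        active = np.arange(C.shape[0])      # full state space, possibly disconnected
    else:
        raise ValueError('unknown msm_init: %r' % (msm_init,))
    Cs = C[np.ix_(active, active)].astype(float)
    rows = Cs.sum(axis=1, keepdims=True)
    # States without outgoing counts keep an all-zero row (only possible with 'all').
    T = np.divide(Cs, rows, out=np.zeros_like(Cs), where=rows > 0)
    return active, T
```

With 'all', a disconnected state space still yields a transition matrix, but it may contain several closed subsets, which is why 'lcs' is the default.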

Hidden state space

  • Restrict to a subset of hidden states with sub_hmm(states=None, obs=None), passing the subset as states.
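A minimal sketch of a hidden-state restriction on the three model quantities $p_0$, $P$ and $B$. The function name `restrict_hidden` is hypothetical, and the choice to drop transitions leaving the subset and renormalize $p_0$ and the rows of $P$ is an assumption, not necessarily what sub_hmm will do.

```python
import numpy as np

def restrict_hidden(p0, P, B, states):
    """Restrict an HMM (p0, P, B) to the hidden states listed in `states` (hypothetical helper).

    Assumption: probability mass leaving the subset is discarded and p0 and the
    rows of P are renormalized; the emission rows of B are kept unchanged.
    A row of P with no transitions inside the subset would divide by zero.
    """
    states = np.asarray(states, dtype=int)
    p0_sub = p0[states] / p0[states].sum()
    P_sub = P[np.ix_(states, states)]
    P_sub = P_sub / P_sub.sum(axis=1, keepdims=True)
    B_sub = B[states, :]
    return p0_sub, P_sub, B_sub
```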

Observation space

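  • Generally, all observable states are included. We can always provide the three quantities $$p_0 \in \mathbb{R}^{n}, \qquad P \in \mathbb{R}^{n\times n}, \qquad B \in \mathbb{R}^{n\times m}$$ for all $m$ symbols, even if some symbols are never observed. Note, however, that the Bayesian inverse of $B$, $$M \in \mathbb{R}^{m\times n},$$ is not well defined for empty symbols: $M_{ki}$, the probability of hidden state $i$ given symbol $k$, is obtained by normalizing $\pi_i B_{ik}$ over $i$ (with $\pi$ a distribution over hidden states), and this normalization divides by zero when column $k$ of $B$ is zero. We can formally define $M$ to have zero rows in that case, but then those rows do not sum to one, which breaks the row-stochasticity of $M$.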
  • If we want to use an HMM as an MSM coarse-graining approach, we may want to restrict the observed symbols to those in the MSM active set. This can be done by calling the restriction function sub_hmm(states=None, obs=None) within MSM.coarse_grain().
  • TODO: remove observe_active option.
  • Optional: add an observe option that makes the estimator return a sub_hmm instead of the full model.
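The sketch below illustrates the empty-symbol problem and the 'nonempty' restriction: it builds $M$ with zero rows for symbols that carry no weight, then restricts $B$ to the symbols that actually occur and shows that the rows of the new $M$ sum to one again. The helper names are hypothetical; the hidden-state weights `pi` (for example, the stationary distribution of $P$) and the row renormalization in `restrict_obs` are assumptions.

```python
import numpy as np

def bayesian_inverse(pi, B):
    """M[k, i] = pi[i] B[i, k] / sum_j pi[j] B[j, k]: probability of hidden i given symbol k.

    Symbols whose column of B carries no weight get all-zero rows in M.
    """
    pi = np.asarray(pi, dtype=float)
    B = np.asarray(B, dtype=float)
    joint = (pi[:, None] * B).T                # shape (m, n), entries pi_i * B_ik
    norm = joint.sum(axis=1, keepdims=True)    # marginal probability of each symbol
    return np.divide(joint, norm, out=np.zeros_like(joint), where=norm > 0)

def nonempty_symbols(dtrajs, m):
    """Indices of the m symbols that occur at least once in the discrete trajectories."""
    counts = np.bincount(np.concatenate(dtrajs), minlength=m)
    return np.flatnonzero(counts > 0)

def restrict_obs(B, obs):
    """Keep the columns of B listed in obs and renormalize each row (assumption)."""
    B_sub = np.asarray(B, dtype=float)[:, obs]
    return B_sub / B_sub.sum(axis=1, keepdims=True)
```

For MSM.coarse_grain(), `obs` would be the MSM active set rather than the nonempty symbols.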

Sub-Models

Functionality to extract a sub-model, restricted in terms of hidden states, observed states, or both.

  • The model needs to remember the full state space: store active_set and observable_set as index sets into the full hidden and observed state spaces.
  • Method design (methods of the HMM class):
def sub_hmm(self, states=None, obs=None):
	""" Returns an HMM with restricted state space.

	Parameters
	----------
	states : None, str or int-array
		Hidden states to restrict the model to. In addition to specifying
		the subset, possible options are:
		* None : all states - don't restrict
		* 'populous-strong' : strongly connected subset with maximum counts
		* 'populous-weak' : weakly connected subset with maximum counts
		* 'largest-strong' : strongly connected subset with maximum size
		* 'largest-weak' : weakly connected subset with maximum size
	obs : None, str or int-array
		Observed states to restrict the model to. In addition to specifying
		the subset, possible options are:
		* None : all observed states - don't restrict
		* 'nonempty' : all states with at least one observation

	Returns
	-------
	hmm : HMM
		The restricted HMM.

	"""
	# Design stub: the restriction itself is not implemented yet.
	raise NotImplementedError

def largest_connected_hmm(self, strong=True):
	""" Returns the largest connected sub-HMM (convenience function).

	strong : bool, default=True
		Use strong connectivity if True, weak connectivity otherwise.
	"""
	if strong:
		return self.sub_hmm(states='largest-strong')
	else:
		return self.sub_hmm(states='largest-weak')

def populous_connected_hmm(self, strong=True):
	""" Returns the most populous connected sub-HMM (convenience function).

	strong : bool, default=True
		Use strong connectivity if True, weak connectivity otherwise.
	"""
	if strong:
		return self.sub_hmm(states='populous-strong')
	else:
		return self.sub_hmm(states='populous-weak')
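One way the string options of the `states` argument could be resolved into an index array is sketched below. It assumes sub_hmm has access to a hidden-state transition count matrix `C`, and it measures "population" as the total number of counts inside a component; both are assumptions, and `connected_components` and `resolve_states` are hypothetical helper names.

```python
import numpy as np

def connected_components(C, strong=True):
    """Connected components of the graph with an edge i -> j wherever C[i, j] > 0."""
    n = C.shape[0]
    G = C > 0
    if not strong:
        G = G | G.T                          # weak connectivity ignores edge direction
    R = G | np.eye(n, dtype=bool)
    for k in range(n):                       # Warshall transitive closure
        R |= R[:, [k]] & R[[k], :]
    mutual = R & R.T                         # i and j reach each other
    return sorted({tuple(int(i) for i in np.flatnonzero(row)) for row in mutual})

def resolve_states(C, states=None):
    """Translate the `states` argument of sub_hmm into an index array (hypothetical helper)."""
    n = C.shape[0]
    if states is None:
        return np.arange(n)                  # no restriction
    if isinstance(states, str):
        criterion, _, connectivity = states.partition('-')
        if criterion not in ('largest', 'populous') or connectivity not in ('strong', 'weak'):
            raise ValueError('unknown states option: %r' % states)
        comps = connected_components(C, strong=(connectivity == 'strong'))
        if criterion == 'largest':
            key = len                        # maximum number of states
        else:
            key = lambda c: C[np.ix_(c, c)].sum()   # maximum counts inside the component
        return np.array(max(comps, key=key))
    return np.asarray(states, dtype=int)     # explicit index set
```

To keep indices referring to the full state space, the restricted model could then store `active_set[resolved]`, composing the new selection with the previous index set.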