Rejection sampling variational inference #819

Open: wants to merge 43 commits into base: master

Commits (43):
4efb780  fix typos in docstring (Jan 3, 2018)
7e43d1b  add multinomial-dirichlet test, empty `RejectionSamplingKLqp` class (Jan 7, 2018)
d673763  Merge branch 'master' into rejection-sampling-variational-inference (Jan 12, 2018)
7a5f90e  remove `sample_shape=1` (Jan 12, 2018)
94a1bc3  add poisson-gamma test (Jan 14, 2018)
a4c87cc  WIP: begin to implement RSVI logic (Jan 15, 2018)
163414c  WIP: implement RSVI gradients (Jan 15, 2018)
f162135  add scrap notebook with gradient update algo (Jan 19, 2018)
2f96076  unit test gradient update algo in notebook (Jan 20, 2018)
2c1162b  unit test gradient update algo to 3 iterations (Jan 20, 2018)
ad25f6d  `test_kucukelbir_grad` passes (Jan 20, 2018)
7e4a9ce  correction: `test_kucukelbir_grad` passes (Jan 20, 2018)
8dc4f4f  cleanup (still skeptical this test works, as it seems almost stochastic (Jan 20, 2018)
0aae8ed  move `test_kucukelbir_grad` to separate file (Jan 20, 2018)
70172fb  add `KucukelbirOptimizer` (Jan 20, 2018)
929e25c  pass `n`, `s_n` into `KucukelbirOptimizer` constructor (Jan 20, 2018)
95d9774  looking forward to seeing if this passes CI. locally, i have no idea … (Jan 20, 2018)
c212858  slightly more confidence (Jan 20, 2018)
81637fb  set trainable=False (Jan 20, 2018)
7aec66c  initialize `n` to 0 (Jan 21, 2018)
dda7f26  assert in loop (Jan 21, 2018)
2a4ccc8  add dummy parameter `global_step` for temporary compatibility (Jan 21, 2018)
8f69548  add `KucukelbirOptimizer` (Jan 21, 2018)
26f8ed8  2-space indent (Jan 21, 2018)
c7f3ea1  use `KucukelbirOptimizer` (Jan 21, 2018)
435ec01  cleanup (Jan 21, 2018)
45b17b8  test `qalpha`, `qbeta` values (Jan 21, 2018)
ed6e266  delete blank line (Jan 21, 2018)
80cee16  add `GammaRejectionSampler` (Jan 23, 2018)
ef45bc3  add `log_prob_s` to `GammaRejectionSampler` (Jan 23, 2018)
b94ef73  add citation to docstring (Jan 23, 2018)
a136f9d  add guts of RSVI, integrating w.r.t. z (Jan 23, 2018)
680894b  parametrize sampler with density (Jan 24, 2018)
47ba81c  pass density to rejection sampler; return gradients (Jan 24, 2018)
26f0c32  dict_swap[z] comes from rejection sampler, not `qz` (Jan 24, 2018)
7b997e1  delete gamma_rejection_sampler_vars (Jan 24, 2018)
6108125  delete TODO (Jan 24, 2018)
77e9a6c  WIP: _test_build_rejection_sampling_loss_and_gradients (Jan 30, 2018)
3846fa6  WIP: _test_build_rejection_sampling_loss_and_gradients (Jan 30, 2018)
23c33af  WIP: _test_build_rejection_sampling_loss_and_gradients (Jan 30, 2018)
4c481a0  WIP: _test_build_rejection_sampling_loss_and_gradients (Jan 30, 2018)
00c9325  WIP: _test_build_rejection_sampling_loss_and_gradients (Jan 30, 2018)
40d3808  pep8 (Jan 30, 2018)
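
A hypothetical usage sketch (not taken from the PR) of the API this PR adds: fitting a Gamma variational posterior to a Poisson-Gamma model with the new `ed.RejectionSamplingKLqp` class, which per the diff below currently handles Gamma latent variables only. Names such as `x_train`, `qalpha`, and `qbeta` are illustrative assumptions.

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Gamma, Poisson

x_train = np.random.poisson(5., size=50).astype(np.float32)

# Model: rate ~ Gamma(2, 1); x_i ~ Poisson(rate), i = 1, ..., 50.
rate = Gamma(2.0, 1.0)
x = Poisson(rate=rate, sample_shape=50)

# Variational approximation q(rate) = Gamma(qalpha, qbeta).
qalpha = tf.nn.softplus(tf.Variable(tf.random_normal([])))
qbeta = tf.nn.softplus(tf.Variable(tf.random_normal([])))
qrate = Gamma(qalpha, qbeta)

inference = ed.RejectionSamplingKLqp({rate: qrate}, data={x: x_train})
inference.run(n_samples=1, n_iter=1000)
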
3 changes: 2 additions & 1 deletion edward/__init__.py
@@ -14,7 +14,7 @@
HMC, MetropolisHastings, SGLD, SGHMC, \
KLpq, KLqp, ReparameterizationKLqp, ReparameterizationKLKLqp, \
ReparameterizationEntropyKLqp, ScoreKLqp, ScoreKLKLqp, ScoreEntropyKLqp, \
ScoreRBKLqp, WakeSleep, GANInference, BiGANInference, WGANInference, \
ScoreRBKLqp, RejectionSamplingKLqp, WakeSleep, GANInference, BiGANInference, WGANInference, \
ImplicitKLqp, MAP, Laplace, complete_conditional, Gibbs
from edward.models import RandomVariable
from edward.util import check_data, check_latent_vars, copy, dot, \
@@ -52,6 +52,7 @@
'ScoreKLKLqp',
'ScoreEntropyKLqp',
'ScoreRBKLqp',
'RejectionSamplingKLqp',
'WakeSleep',
'GANInference',
'BiGANInference',
1 change: 1 addition & 0 deletions edward/inferences/__init__.py
@@ -42,6 +42,7 @@
'ScoreKLKLqp',
'ScoreEntropyKLqp',
'ScoreRBKLqp',
'RejectionSamplingKLqp',
'Laplace',
'MAP',
'MetropolisHastings',
2 changes: 1 addition & 1 deletion edward/inferences/inference.py
@@ -123,7 +123,6 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs):
Passed into `initialize`.
"""
self.initialize(*args, **kwargs)
Review comment (Member): add back newline? unrelated to PR

if variables is None:
init = tf.global_variables_initializer()
else:
@@ -144,6 +143,7 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs):

for _ in range(self.n_iter):
info_dict = self.update()
print(info_dict)
Review comment (Member): rm?

self.print_progress(info_dict)

self.finalize()
2 changes: 1 addition & 1 deletion edward/inferences/klpq.py
@@ -32,7 +32,7 @@ class KLpq(VariationalInference):

with respect to $\\theta$.

In conditional inference, we infer $z` in $p(z, \\beta
In conditional inference, we infer $z$ in $p(z, \\beta
Review comment (Member): This is unrelated to this PR. Can you make a new PR to fix this?

\mid x)$ while fixing inference over $\\beta$ using another
distribution $q(\\beta)$. During gradient calculation, instead
of using the model's density
146 changes: 145 additions & 1 deletion edward/inferences/klqp.py
@@ -6,7 +6,8 @@
import tensorflow as tf

from edward.inferences.variational_inference import VariationalInference
from edward.models import RandomVariable
from edward.models import RandomVariable, Gamma
from edward.samplers import GammaRejectionSampler
from edward.util import copy, get_descendants

try:
@@ -616,6 +617,62 @@ def build_loss_and_gradients(self, var_list):
return build_score_rb_loss_and_gradients(self, var_list)


class RejectionSamplingKLqp(VariationalInference):
  """Variational inference with a rejection sampling reparameterization
  gradient estimator (RSVI; Naesseth et al., 2017).
  """

def __init__(self, latent_vars=None, data=None, rejection_sampler_vars=None):
"""Create an inference algorithm.

# TODO: update me

Args:
latent_vars: list of RandomVariable or
dict of RandomVariable to RandomVariable.
Collection of random variables to perform inference on. If
list, each random variable will be implictly optimized using a
`Normal` random variable that is defined internally with a
free parameter per location and scale and is initialized using
standard normal draws. The random variables to approximate
must be continuous.
"""
if isinstance(latent_vars, list):
with tf.variable_scope(None, default_name="posterior"):
latent_vars_dict = {}
continuous = \
('01', 'nonnegative', 'simplex', 'real', 'multivariate_real')
for z in latent_vars:
if not hasattr(z, 'support') or z.support not in continuous:
raise AttributeError(
"Random variable {} is not continuous or a random "
"variable with supported continuous support.".format(z))
batch_event_shape = z.batch_shape.concatenate(z.event_shape)
loc = tf.Variable(tf.random_normal(batch_event_shape))
scale = tf.nn.softplus(
tf.Variable(tf.random_normal(batch_event_shape)))
latent_vars_dict[z] = Normal(loc=loc, scale=scale)
latent_vars = latent_vars_dict
del latent_vars_dict
super(RejectionSamplingKLqp, self).__init__(latent_vars, data)
self.rejection_sampler_vars = rejection_sampler_vars

def initialize(self, n_samples=1, *args, **kwargs):
"""Initialize inference algorithm. It initializes hyperparameters
and builds ops for the algorithm's computation graph.

Args:
n_samples: int, optional.
Number of samples from variational model for calculating
stochastic gradients.
"""
self.n_samples = n_samples
return super(RejectionSamplingKLqp, self).initialize(*args, **kwargs)

def build_loss_and_gradients(self, var_list):
return build_rejection_sampling_loss_and_gradients(self, var_list)


def build_reparam_loss_and_gradients(inference, var_list):
"""Build loss function. Its automatic differentiation
is a stochastic gradient of
@@ -1127,3 +1184,90 @@ def build_score_rb_loss_and_gradients(inference, var_list):
grads_vars.extend(model_vars)
grads_and_vars = list(zip(grads, grads_vars))
return loss, grads_and_vars


def build_rejection_sampling_loss_and_gradients(inference, var_list, epsilon=None):
  """Build loss function and gradients for rejection sampling
  variational inference (RSVI).
  """
rej_samplers = {
Gamma: GammaRejectionSampler
}

rep = [0.0] * inference.n_samples
cor = [0.0] * inference.n_samples
base_scope = tf.get_default_graph().unique_name("inference") + '/'
for s in range(inference.n_samples):
# Form dictionary in order to replace conditioning on prior or
# observed variable with conditioning on a specific value.
scope = base_scope + tf.get_default_graph().unique_name("sample")
dict_swap = {}
for x, qx in six.iteritems(inference.data):
if isinstance(x, RandomVariable):
if isinstance(qx, RandomVariable):
qx_copy = copy(qx, scope=scope)
dict_swap[x] = qx_copy.value()
else:
dict_swap[x] = qx

p_log_prob = 0.
q_log_prob = 0.
r_log_prob = 0.

for z, qz in six.iteritems(inference.latent_vars):
# Copy q(z) to obtain new set of posterior samples.
qz_copy = copy(qz, scope=scope)
sampler = rej_samplers[qz_copy.__class__](density=qz)

if epsilon is not None: # temporary
pass
else:
dict_swap[z] = qz_copy.value()
print('sample:', dict_swap[z])
epsilon = sampler.h_inverse(dict_swap[z])

dict_swap[z] = sampler.h(epsilon)
q_log_prob += tf.reduce_sum(
inference.scale.get(z, 1.0) * qz_copy.log_prob(dict_swap[z]))
r_log_prob += -tf.log(tf.gradients(dict_swap[z], epsilon))

for z in six.iterkeys(inference.latent_vars):
z_copy = copy(z, dict_swap, scope=scope)
p_log_prob += tf.reduce_sum(
inference.scale.get(z, 1.0) * z_copy.log_prob(dict_swap[z]))

for x in six.iterkeys(inference.data):
if isinstance(x, RandomVariable):
x_copy = copy(x, dict_swap, scope=scope)
p_log_prob += tf.reduce_sum(
inference.scale.get(x, 1.0) * x_copy.log_prob(dict_swap[x]))

rep[s] = p_log_prob
cor[s] = tf.stop_gradient(p_log_prob) * (q_log_prob - r_log_prob)

rep = tf.reduce_mean(rep)
cor = tf.reduce_mean(cor)
q_entropy = tf.reduce_sum([
tf.reduce_sum(qz.entropy())
for z, qz in six.iteritems(inference.latent_vars)])
reg_penalty = tf.reduce_sum(tf.losses.get_regularization_losses())

loss = -(rep + q_entropy - reg_penalty)

if inference.logging:
tf.summary.scalar("loss/reparam_objective", rep,
collections=[inference._summary_key])
tf.summary.scalar("loss/correction_term", cor,
collections=[inference._summary_key])
tf.summary.scalar("loss/q_entropy", q_entropy,
collections=[inference._summary_key])
tf.summary.scalar("loss/reg_penalty", reg_penalty,
collections=[inference._summary_key])

g_rep = tf.gradients(rep, var_list)
Review comment (Member): Can you explain why you need the multiple gradient calls and not just one? This seems inefficient.

g_cor = tf.gradients(cor, var_list)
g_entropy = tf.gradients(q_entropy, var_list)

grad_summands = zip(*[g_rep, g_cor, g_entropy])
Review comment: Can we try dropping g_cor from this summand and see if tests still pass?

Expected behavior: pass at a higher tolerance, but not blow up.

This is a possible culprit re: why gradients are exploding when running _test_poisson_gamma.

With a reasonably small step size, maybe 100 epochs.

Worth keeping an eye on g_entropy:

  • First, try g_rep and g_entropy
  • Next, try just g_rep

Print all the gradient terms from the notebook as well.

Reply (Contributor, author): "With a reasonably small step size, maybe 100 epochs." --> i.e. it should pass "with a reasonably small step size, and run for maybe 100 epochs."

grads = [tf.reduce_sum(summand) for summand in grad_summands]
grads_and_vars = list(zip(grads, var_list))
return loss, grads_and_vars
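
For reference, one reading of the estimator this function appears to implement, following Naesseth et al. (2017) and the blei-lab/ars-reparameterization demo cited in `GammaRejectionSampler` below (a sketch of the intended math, not an assertion about the code's correctness):

$$
\nabla_\theta \mathcal{L} \approx
\underbrace{\nabla_\theta\, \mathbb{E}_{\varepsilon}\big[\log p(x, h(\varepsilon; \theta))\big]}_{g_{rep}}
+ \underbrace{\mathbb{E}_{\varepsilon}\Big[\log p(x, h(\varepsilon; \theta))\, \nabla_\theta \log \frac{q(h(\varepsilon; \theta); \theta)}{r(h(\varepsilon; \theta); \theta)}\Big]}_{g_{cor}}
+ \underbrace{\nabla_\theta\, \mathbb{H}\big[q(z; \theta)\big]}_{g_{entropy}},
$$

where $z = h(\varepsilon; \theta)$ is the reparameterized rejection-sampler output, the expectations are over the (accepted) proposal noise $\varepsilon$, and $r(z; \theta) = s(h^{-1}(z; \theta))\,\big|\partial h^{-1}(z; \theta) / \partial z\big|$ is the proposal density pushed through $h$. In the code, `rep`, `cor`, and `q_entropy` are Monte Carlo estimates of the three terms; the correction term does not appear in `loss` and enters the computation only through `g_cor` in the returned gradients.
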
8 changes: 6 additions & 2 deletions edward/inferences/variational_inference.py
@@ -67,6 +67,8 @@ def initialize(self, optimizer=None, var_list=None, use_prettytensor=False,

self.loss, grads_and_vars = self.build_loss_and_gradients(var_list)

self.grads_and_vars = grads_and_vars

if self.logging:
tf.summary.scalar("loss", self.loss, collections=[self._summary_key])
for grad, var in grads_and_vars:
@@ -151,7 +153,9 @@ def update(self, feed_dict=None):
feed_dict[key] = value

sess = get_session()
_, t, loss = sess.run([self.train, self.increment_t, self.loss], feed_dict)
# _, t, loss = sess.run([self.train, self.increment_t, self.loss], feed_dict)
# TODO: delete me
_, t, loss, grads_and_vars_debug = sess.run([self.train, self.increment_t, self.loss, self.grads_and_vars], feed_dict)

if self.debug:
sess.run(self.op_check, feed_dict)
@@ -161,7 +165,7 @@
summary = sess.run(self.summarize, feed_dict)
self.train_writer.add_summary(summary, t)

return {'t': t, 'loss': loss}
return {'t': t, 'loss': loss, 'grads_and_vars_debug': grads_and_vars_debug}

def print_progress(self, info_dict):
"""Print progress to output.
15 changes: 15 additions & 0 deletions edward/samplers/__init__.py
@@ -0,0 +1,15 @@
"""
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from edward.samplers.rejection import *

from tensorflow.python.util.all_util import remove_undocumented

_allowed_symbols = [
'GammaRejectionSampler',
]

remove_undocumented(__name__, allowed_exception_list=_allowed_symbols)
34 changes: 34 additions & 0 deletions edward/samplers/rejection.py
@@ -0,0 +1,34 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import tensorflow as tf


class GammaRejectionSampler:

# As implemented in https://github.com/blei-lab/ars-reparameterization/blob/master/gamma/demo.ipynb

def __init__(self, density):
self.alpha = density.parameters['concentration']
self.beta = density.parameters['rate']

  def h(self, epsilon):
    # Marsaglia-Tsang shape augmentation: epsilon (approximately standard
    # normal) maps to a Gamma(alpha, beta) sample.
    a = self.alpha - (1. / 3)
    b = tf.sqrt(9 * self.alpha - 3)
    c = 1 + (epsilon / b)
    d = a * c**3
    return d / self.beta

  def h_inverse(self, z):
    # Inverse map: recover epsilon from a Gamma(alpha, beta) sample z.
    a = self.alpha - (1. / 3)
    b = tf.sqrt(9 * self.alpha - 3)
    c = self.beta * z / a
    d = c**(1 / 3)
    return b * (d - 1)

  @staticmethod
  def log_prob_s(epsilon):
    # Log density of the standard normal proposal s(epsilon).
    return -0.5 * (tf.log(2 * math.pi) + epsilon**2)
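
As a sanity check (not part of the PR), here is a standalone NumPy re-implementation of the same transform; function and variable names are illustrative only. It confirms that `h_inverse` inverts `h`, which is what `build_rejection_sampling_loss_and_gradients` relies on when it maps a posterior sample back to epsilon and then forward again.

import numpy as np

def h(epsilon, alpha, beta):
  # Marsaglia-Tsang shape augmentation: epsilon (approximately standard
  # normal) maps to a Gamma(alpha, beta) sample.
  a = alpha - 1. / 3
  b = np.sqrt(9 * alpha - 3)
  return a * (1 + epsilon / b) ** 3 / beta

def h_inverse(z, alpha, beta):
  # Inverse map: recover epsilon from a Gamma(alpha, beta) sample z.
  a = alpha - 1. / 3
  b = np.sqrt(9 * alpha - 3)
  return b * ((beta * z / a) ** (1. / 3) - 1)

alpha, beta = 4.0, 2.0
z = np.random.gamma(shape=alpha, scale=1. / beta, size=5)
print(np.allclose(h(h_inverse(z, alpha, beta), alpha, beta), z))  # True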