
Renyi divergence #769

Open
jbregli wants to merge 40 commits into blei-lab:master from jbregli:renyi_divergence


@jbregli jbregli commented Sep 27, 2017


Here is an implementation of Renyi divergence variational inference.
There's also an example using it on VAEs.

Here is a link to the edward forum with some more info:
https://discourse.edwardlib.org/t/renyi-divergence-variational-inference/366/3

PS: Sorry for the rather messy commit history.

@dustinvtran dustinvtran left a comment

Member

Have to go catch a flight but some preliminary comments:

Comment thread .gitignore Outdated
# IDE related
.idea/
.vscode/

Member

Can you remove changes that aren't relevant for this PR? This includes changes to .gitignore here as well as deletion of CSVs.

Comment thread edward/inferences/renyi_divergence.py Outdated
from edward.util import copy

try:
    from edward.models import Normal

Member

As convention, we use 2-space indent.

from __future__ import print_function

import six
import numpy as np

Member

As convention, we alphabetize the ordering of the import libraries.

Comment thread edward/inferences/renyi_divergence.py Outdated
"{0}. Your TensorFlow version is not supported.".format(e))


class Renyi_divergence(VariationalInference):

Member

As convention, we use CamelCase for class names.

Comment thread edward/inferences/renyi_divergence.py Outdated
To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)

# Notes:

Member

Docstrings are parsed as Markdown and formatted in a somewhat specific way as they appear on the API docs. I recommend following the other classes, where you would denote a subsection as #### Notes and when writing bullet points, do, e.g.,

#### Notes

+ bullet 1
+ bullet 2
  + maybe bulleted list in a bullet

@dustinvtran dustinvtran left a comment

Member

Great work! Some comments below. The code looks correct and only minor suggestions with respect to formatting are laid out.

Can you include a unit test? See, e.g., how KLpq is tested under the file tests/inferences/test_klpq.py.

Comment thread edward/inferences/renyi_divergence.py Outdated
$ \text{D}_{R}^{(\alpha)}(q(z)||p(z \mid x))
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $

To perform the optimization, this class uses the techniques from

Member

Periods at end of sentences. (If you'd like to look at the generated API for the class, I recommend compiling the website following the instructions in docs/.)

Comment thread edward/inferences/renyi_divergence.py Outdated
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $

To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)

Member

We use bibtex for handling references in docstrings. This is handled by adding the appropriate bib entry to docs/tex/bib.bib; make sure it's also written in the right order: we sort bib entries by their year, then alphabetically according to their citekey within each year.

When using references, you can produce (Li et al., 2016) and Li et al. (2016) by writing `[@li2016renyi]` and `@li2016renyi` respectively, assuming that `li2016renyi` is the citekey.
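The ordering rule above (year first, then citekey within each year) can be sketched as a sort key. This is only an illustration: the dictionary layout and the two `*placeholder` citekeys are made up for the example, and nothing here reads or writes docs/tex/bib.bib.

```python
def bib_sort_key(entry):
  """Order bib entries by year, then alphabetically by citekey."""
  return (entry["year"], entry["citekey"])


# Hypothetical entries; only li2016renyi comes from this thread.
entries = [
    {"citekey": "zeta2015placeholder", "year": 2015},
    {"citekey": "li2016renyi", "year": 2016},
    {"citekey": "alpha2016placeholder", "year": 2016},
    {"citekey": "beta2015placeholder", "year": 2015},
]

ordered = sorted(entries, key=bib_sort_key)
ordered_keys = [entry["citekey"] for entry in ordered]
```

With this key, a new 2016 entry lands after every 2015 entry and is then placed alphabetically among the other 2016 citekeys.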

Comment thread edward/inferences/renyi_divergence.py Outdated

# Notes:
- Renyi divergence does not have any analytic version.
- Renyi divergence does not have any version for non reparametrizable

Member

It does but the gradient estimator in @li2016variational doesn't. I recommend just stating that this inference algorithm is restricted to variational approximations whose random variables all satisfy rv.reparameterization_type == tf.contrib.distributions.FULLY_REPARAMETERIZED.

Also, instead of checking this during build_loss_and_gradients, I recommend checking it during __init__. This sort of check is done statically, before any graph construction, similar to how we check for compatible shapes in all latent variables and data during __init__.
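A minimal sketch of what such a constructor-time check could look like. Everything here is a stand-in so the snippet runs on its own: `FakeRandomVariable`, `RenyiDivergenceSketch`, and the two string constants are hypothetical, whereas the real check would compare against tf.contrib.distributions.FULLY_REPARAMETERIZED on Edward random variables.

```python
# Hypothetical stand-ins for the TensorFlow reparameterization constants.
FULLY_REPARAMETERIZED = "FULLY_REPARAMETERIZED"
NOT_REPARAMETERIZED = "NOT_REPARAMETERIZED"


class FakeRandomVariable(object):
  """Stand-in exposing only the attribute the check needs."""

  def __init__(self, name, reparameterization_type):
    self.name = name
    self.reparameterization_type = reparameterization_type


class RenyiDivergenceSketch(object):
  """Illustrative only; not the class from this PR."""

  def __init__(self, latent_vars):
    # Validate once, up front, before any loss or gradient is built.
    bad = [qz.name for qz in latent_vars.values()
           if qz.reparameterization_type != FULLY_REPARAMETERIZED]
    if bad:
      raise NotImplementedError(
          "Only fully reparameterized variational approximations are "
          "supported; offending variables: {0}".format(", ".join(bad)))
    self.latent_vars = latent_vars
```

Because the constructor raises, nothing needs to be stored on the instance and the loss-building method can assume the condition already holds.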

Comment thread edward/inferences/renyi_divergence.py Outdated

def initialize(self,
n_samples=32,
alpha=1.,

Member

As convention, we append all numerics with 0, e.g., 1.0.

Comment thread edward/inferences/renyi_divergence.py Outdated
Number of samples from variational model for calculating
stochastic gradients.
alpha: float, optional.
Renyi divergence coefficient.

Member

It could be useful to specify the domain of the coefficient, e.g., "Must be greater than 0."

Comment thread edward/inferences/renyi_divergence.py Outdated
"Variational Renyi inference only works with reparameterizable"
" models")

#########

Member

This function is only used in one location and is a one-liner; could you write that line instead of defining a new function?

Comment thread examples/vae_renyi.py Outdated
scale=Dense(d, activation='softplus')(hidden))

# Bind p(x, z) and q(z | x) to the same TensorFlow placeholder for x.
inference = Renyi_divergence({z: qz}, data={x: x_ph})

Member

This code looks exactly the same as an older version of vae.py but only differs in this line. To keep the VAE versions better synced, could you add a comment suggesting that this is also an alternative in the existing vae.py?

Ideally, we'd like a specific application where ed.RenyiDivergence produces better results by some metric than alternatives. IIRC, the paper had some interesting results for a Bayesian neural net on some specific UCI data sets. That would be great to have and reproduce some of their results.

If you don't have time for this, we can leave it off for now and raise it as a GitHub issue after merging this PR.

Comment thread edward/inferences/renyi_divergence.py Outdated
self.scale.get(x, 1.0)
* x_copy.log_prob(dict_swap[x]))

logF = [p - q for p, q in zip(p_log_prob, q_log_prob)]

Member

Instead of logF, what about something like log_ratios, which is more Pythonic in snake_case and also more semantically meaningful?

Replaced LogF by log_ratios
Fix convention errors
@jbregli

jbregli commented Sep 27, 2017

Author

Thanks for the suggestion and the very informative feedback.

> Can you include a unit test? See, e.g., how KLpq is tested under the file tests/inferences/test_klpq.py.

Will do later today.

@jbregli

jbregli commented Sep 28, 2017

Author

I've added some tests in a similar way to KLqp (both normal_normal and the Bernoulli distribution).
For each of them, I've tested most of the possible settings of Renyi VI:
KL, VR-max, VR-min, alpha<0, alpha>0.

import tensorflow as tf

from edward.models import Bernoulli, Normal
from edward.inferences.renyi_divergence import RenyiDivergence

@dustinvtran dustinvtran Sep 28, 2017

Member

We should check the import works by instead using ed.RenyiDivergence in the test.

Author

I had some issues with this, but it should be working now.

Comment thread edward/inferences/renyi_divergence.py Outdated
[@li2016renyi].

#### Notes
+ The gradient estimator used here does not have any analytic version.

Member

With Markdown formatting, you don't need the 4 spaces of indentation. E.g., you can just do

#### Notes

+ The gradient estimator ...
+ ...

Author

Done

Comment thread edward/inferences/renyi_divergence.py Outdated
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz.$

The optimization is performed using the gradient estimator as defined in
[@li2016renyi].

Member

The citekey is being used as a direct object so it should be [@li2016renyi] -> @li2016renyi.

Author

Done

Comment thread edward/inferences/renyi_divergence.py Outdated
+ See Renyi Divergence Variational Inference [@li2016renyi] for
more details.
"""
if self.is_reparameterizable:

Member

is_reparameterizable should be checked during __init__, raising an error there if needed, and since it's checked there it doesn't need to be stored in the class. This also helps to remove one layer of indentation in this function.

Author

Done

Comment thread edward/inferences/renyi_divergence.py Outdated
if self.backward_pass == 'max':
  log_ratios = tf.stack(log_ratios)
  log_ratios = tf.reduce_max(log_ratios, 0)
  loss = tf.reduce_mean(log_ratios)

Member

If I understood the code correctly, log_ratios when first created is a list of n_samples elements, where each element is a log ratio calculation per sample from q. For the min / max modes, we take the min / max of these log ratios, which is a scalar.

Is tf.reduce_mean for the loss needed? You can also remove the tf.stack line in the min and max cases in the same way you didn't use it for the self.alpha \approx 1 case.
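To make the modes concrete, here is a plain-Python sketch of how a list of per-sample log ratios could be reduced to a single variational Renyi bound estimate. It is not the PR's TensorFlow code: `vr_bound`, `log_mean_exp`, the `tol` threshold, and the `"full"` mode name are assumptions for illustration, and a loss to minimize would presumably be the negative of this value.

```python
import math


def log_mean_exp(values):
  """Numerically stable log of the mean of exp(values)."""
  m = max(values)
  return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def vr_bound(log_ratios, alpha=1.0, backward_pass="full", tol=1e-3):
  """Monte Carlo estimate of the VR bound from per-sample log ratios.

  log_ratios holds log p(x, z_k) - log q(z_k), one scalar per sample.
  """
  if backward_pass == "max":
    # VR-max: already a scalar, no further averaging is needed.
    return max(log_ratios)
  if backward_pass == "min":
    # VR-min: likewise a scalar.
    return min(log_ratios)
  if abs(alpha - 1.0) < tol:
    # alpha close to 1 falls back to the usual ELBO-style average.
    return sum(log_ratios) / len(log_ratios)
  scaled = [(1.0 - alpha) * r for r in log_ratios]
  return log_mean_exp(scaled) / (1.0 - alpha)
```

For a fixed set of samples the estimate does not increase as alpha grows, so the max and min modes bracket every finite-alpha value.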

Author

You're right. Thanks for spotting this.

Comment thread examples/vae_renyi.py Outdated
[@li2016renyi]

#### Notes
This example is almost exactly similar to example/vae.py.

Member

Sorry for the miscommunication. What I meant was that you can edit vae.py, comment out the 1-2 lines of code to use ed.RenyiDivergence, and add these notes there. This helps to compress the content in the examples, c.f., https://github.com/blei-lab/edward/blob/master/examples/bayesian_logistic_regression.py#L51.

Author

Removed vae_renyi.py and modified vae.py instead.
The version of vae.py I had wasn't running, though, so I've modified it quite a bit.

Member

Are you using the latest version of Edward? We updated a few details in vae.py so it actually runs better. For example, you should be using the observations library and a generator, which is far more transparent than the mnist_data class from that TensorFlow tutorial.

In addition, since vae.py is also our canonical VAE example, I prefer keeping it as ed.KLqp as the default, and with the renyi divergence option commented out; similarly, the top-level comments should be written in-line near the renyi divergence option instead.

If you have thoughts otherwise, happy to take alternative suggestions.


class test_renyi_divergence_class(tf.test.TestCase):

  def _test_normal_normal(self, Inference, *args, **kwargs):

@dustinvtran dustinvtran Sep 28, 2017

Member

Since RenyiDivergence is used across all tests, you don't need Inference as an arg to the test functions.

Author

I've used the same template as test_klpq, where only KLpq is used during the tests and where Inference is still an argument to the test functions.
But I think I have now modified it to be closer to what you had in mind.

@jbregli

jbregli commented Sep 29, 2017

Author

It keeps failing the Travis CI check for Python 2.7, but the failure happens before it gets to actually testing my code (it fails to install matplotlib and seaborn).
Have I done anything wrong on my side?

@dustinvtran

Member

Looks like this is happening in Travis on any build. I'll look into it.
