18 | 18 | # For instance, in autoregressive models, we cannot interpolate between two images because of the lack of a latent representation. |
19 | 19 | # We will explore and discuss these benefits and drawbacks alongside our implementation.
20 | 20 | # |
21 | | -# Our implementation will focus on the [PixelCNN](https://arxiv.org/pdf/1606.05328.pdf) [2] model which has been discussed in detail in the lecture. |
| 21 | +# Our implementation will focus on the [PixelCNN](https://arxiv.org/abs/1606.05328) [2] model which has been discussed in detail in the lecture. |
22 | 22 | # Most current SOTA models use PixelCNN as their fundamental architecture, |
23 | 23 | # and various additions have been proposed to improve the performance |
24 | | -# (e.g. [PixelCNN++](https://arxiv.org/pdf/1701.05517.pdf) and [PixelSNAIL](http://proceedings.mlr.press/v80/chen18h/chen18h.pdf)). |
| 24 | +# (e.g. [PixelCNN++](https://arxiv.org/abs/1701.05517) and [PixelSNAIL](http://proceedings.mlr.press/v80/chen18h/chen18h.pdf)). |
25 | 25 | # Hence, implementing PixelCNN is a good starting point for our short tutorial. |
26 | 26 | # |
27 | 27 | # First of all, we need to import our standard libraries. As in
@@ -173,7 +173,7 @@ def show_imgs(imgs): |
173 | 173 | # If we now want to apply this to our convolutions, we need to ensure that the prediction of pixel 1 |
174 | 174 | # is not influenced by its own "true" input, nor by any pixel to its right or in any lower row.
175 | 175 | # In convolutions, this means that we want to set those entries of the weight matrix to zero that take pixels on the right and below into account. |
176 | | -# As an example for a 5x5 kernel, see a mask below (figure credit - [Aaron van den Oord](https://arxiv.org/pdf/1606.05328.pdf)): |
| 176 | +# As an example for a 5x5 kernel, see a mask below (figure credit - [Aaron van den Oord](https://arxiv.org/abs/1606.05328)): |
177 | 177 | # |
178 | 178 | # <center width="100%" style="padding: 10px"><img src="masked_convolution.svg" width="150px"></center> |
179 | 179 | # |
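To make the masking concrete, here is a minimal sketch (plain PyTorch; the names are ours, not the tutorial's) that builds the 5x5 type "A" mask from the figure, in which the center pixel itself is blocked as well. In an actual masked convolution, such a mask is typically multiplied onto the convolution weights before the convolution is applied:

```python
import torch

kernel_size = 5
mask = torch.ones(kernel_size, kernel_size)
mask[kernel_size // 2, kernel_size // 2 + 1:] = 0  # pixels right of the center
mask[kernel_size // 2 + 1:, :] = 0                 # all rows below the center
mask[kernel_size // 2, kernel_size // 2] = 0       # the center pixel itself (mask type "A")
print(mask)
# tensor([[1., 1., 1., 1., 1.],
#         [1., 1., 1., 1., 1.],
#         [1., 1., 0., 0., 0.],
#         [0., 0., 0., 0., 0.],
#         [0., 0., 0., 0., 0.]])
```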
@@ -216,10 +216,10 @@ def forward(self, x): |
216 | 216 | # |
217 | 217 | # To build our own autoregressive image model, we could simply stack a few masked convolutions on top of each other. |
218 | 218 | # This was actually the case for the original PixelCNN model, discussed in the paper |
219 | | -# [Pixel Recurrent Neural Networks](https://arxiv.org/pdf/1601.06759.pdf), but this leads to a considerable issue. |
| 219 | +# [Pixel Recurrent Neural Networks](https://arxiv.org/abs/1601.06759), but this leads to a considerable issue. |
220 | 220 | # When sequentially applying a couple of masked convolutions, the receptive field of a pixel
221 | 221 | # turns out to have a "blind spot" on the upper right side, as shown in the figure below
222 | | -# (figure credit - [Aaron van den Oord et al. ](https://arxiv.org/pdf/1606.05328.pdf)): |
| 222 | +# (figure credit - [Aaron van den Oord et al. ](https://arxiv.org/abs/1606.05328)): |
223 | 223 | # |
224 | 224 | # <center width="100%" style="padding: 10px"><img src="pixelcnn_blind_spot.svg" width="275px"></center> |
225 | 225 | # |
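The blind spot can also be checked numerically. The sketch below is an illustrative assumption (it is not the tutorial's `show_center_recep_field` visualization that appears later): it stacks a few 3x3 masked convolutions and backpropagates from the center output pixel, so input positions whose gradient stays zero lie outside the receptive field, and the upper-right region shows up as exactly such a hole:

```python
import torch
import torch.nn.functional as F

# 3x3 mask of type "B": the center pixel is kept, everything to its right
# in the same row and every row below is blocked.
mask = torch.ones(1, 1, 3, 3)
mask[..., 1, 2:] = 0
mask[..., 2:, :] = 0
weight = torch.ones(1, 1, 3, 3)

img = torch.zeros(1, 1, 11, 11, requires_grad=True)
out = img
for _ in range(4):  # stack a few masked convolutions
    out = F.conv2d(out, weight * mask, padding=1)

# Gradient of the center output pixel w.r.t. the input marks the receptive field.
out[0, 0, 5, 5].backward()
print((img.grad[0, 0] != 0).int())  # zeros in the upper right: the blind spot
```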
@@ -445,7 +445,7 @@ def show_center_recep_field(img, out): |
445 | 445 | # For visualizing the receptive field, we assumed a very simplified stack of vertical and horizontal convolutions. |
446 | 446 | # Obviously, there are more sophisticated ways of doing it, and PixelCNN uses gated convolutions for this. |
447 | 447 | # Specifically, the Gated Convolution block in PixelCNN looks as follows |
448 | | -# (figure credit - [Aaron van den Oord et al. ](https://arxiv.org/pdf/1606.05328.pdf)): |
| 448 | +# (figure credit - [Aaron van den Oord et al. ](https://arxiv.org/abs/1606.05328)): |
449 | 449 | # |
450 | 450 | # <center width="100%"><img src="PixelCNN_GatedConv.svg" width="700px" style="padding: 15px"/></center> |
451 | 451 | # |
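The crucial ingredient of the block is the elementwise gating, where the feature map is split into a tanh "value" and a sigmoid "gate". A minimal sketch of just this part (channel sizes are illustrative, and the vertical/horizontal stack couplings from the figure are omitted):

```python
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    # Splits the channels into two halves and combines them as tanh(a) * sigmoid(b).
    def forward(self, x):
        val, gate = x.chunk(2, dim=1)
        return torch.tanh(val) * torch.sigmoid(gate)

# A (masked) convolution produces 2 * c_out channels; the gating reduces them to c_out.
conv = nn.Conv2d(3, 2 * 16, kernel_size=3, padding=1)  # masking omitted for brevity
out = GatedActivation()(conv(torch.randn(1, 3, 28, 28)))
print(out.shape)  # torch.Size([1, 16, 28, 28])
```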
@@ -506,7 +506,7 @@ def forward(self, v_stack, h_stack): |
506 | 506 | # The architecture consists of multiple stacked GatedMaskedConv blocks, where we add an additional dilation factor to a few convolutions. |
507 | 507 | # This is used to increase the receptive field of the model and allows it to take a larger context into account during generation.
508 | 508 | # As a reminder, dilation in a convolution looks as follows
509 | | -# (figure credit - [Vincent Dumoulin and Francesco Visin](https://arxiv.org/pdf/1603.07285.pdf)): |
| 509 | +# (figure credit - [Vincent Dumoulin and Francesco Visin](https://arxiv.org/abs/1603.07285)): |
510 | 510 | # |
511 | 511 | # <center width="100%"><img src="https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/dilation.gif" width="250px"></center> |
512 | 512 | # |
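In PyTorch, dilation is simply an argument of `nn.Conv2d`. As a small sketch (channel sizes are illustrative), a 3x3 kernel with `dilation=2` covers a 5x5 input region, since the effective kernel size is `dilation * (kernel_size - 1) + 1`, while `padding=dilation` keeps the spatial dimensions unchanged:

```python
import torch
import torch.nn as nn

dilated_conv = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 16, 28, 28)
print(dilated_conv(x).shape)  # torch.Size([1, 16, 28, 28])
```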
@@ -655,7 +655,7 @@ def test_step(self, batch, batch_idx): |
655 | 655 | # %% [markdown] |
656 | 656 | # The visualization shows that for predicting any pixel, we can take almost half of the image into account. |
657 | 657 | # However, keep in mind that this is the "theoretical" receptive field and not necessarily |
658 | | -# the [effective receptive field](https://arxiv.org/pdf/1701.04128.pdf), which is usually much smaller. |
| 658 | +# the [effective receptive field](https://arxiv.org/abs/1701.04128), which is usually much smaller. |
659 | 659 | # For a stronger model, we should therefore try to increase the receptive |
660 | 660 | # field even further. In particular, for the pixel on the bottom right, the
661 | 661 | # very last pixel, we would be allowed to take into account the whole |
@@ -869,7 +869,7 @@ def autocomplete_image(img): |
869 | 869 | # Interestingly, the pixel values 64, 128 and 191 also stand out, which is likely due to the quantization used during the creation of the dataset.
870 | 870 | # For RGB images, we would also see two peaks around 0 and 255, |
871 | 871 | # but the values in between would be much more frequent than in MNIST |
872 | | -# (see Figure 1 in the [PixelCNN++](https://arxiv.org/pdf/1701.05517.pdf) for a visualization on CIFAR10). |
| 872 | +# (see Figure 1 in the [PixelCNN++](https://arxiv.org/abs/1701.05517) paper for a visualization on CIFAR10).
873 | 873 | # |
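A minimal sketch of how such a histogram can be produced, assuming `imgs` is a batch of the discretized images with integer values in 0..255 (the tensor name and shape are our assumptions):

```python
import matplotlib.pyplot as plt
import torch

def plot_pixel_histogram(imgs: torch.Tensor):
    # Histogram of the discrete pixel values over all images in the batch.
    values = imgs.flatten().float().numpy()
    plt.hist(values, bins=256, range=(0, 256), density=True)
    plt.xlabel("Pixel value")
    plt.ylabel("Relative frequency")
    plt.show()
```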
874 | 874 | # Next, we can visualize the distribution our model predicts (on average):
875 | 875 |
|
|