init
as6325400 committed Apr 24, 2024
1 parent 95634c8 commit 3f19d54
Showing 12 changed files with 76 additions and 23 deletions.
6 changes: 3 additions & 3 deletions components/ImageZoom.vue
@@ -48,14 +48,14 @@ watch(() => props.options, (options) => {
border: initial;
}
img {
border-radius: 5px;
/* border-radius: 5px; */
width: 100%;
}
@media (prefers-color-scheme: light) {
img {
/* img {
border: solid 0.1rem #ddd;
}
} */
}
</style>

93 changes: 73 additions & 20 deletions components/Main.vue
@@ -34,25 +34,25 @@ function shuffle(array) {

<template>
<div class="main">
<div class="video">
<VideoComparison url="/videos/teaser" />
<div class="caption">MeDM enables temporally consistent video rendering and translation using image Diffusion Models. Slide for interactive comparison. Inputs are on the left.</div>
<div class="image">
<ImageZoom src="images/teaser.png" :options="{ background: imageOverlayColor }" />
<div class="caption">Leveraging the capabilities of the Latent Diffusion Model (LDM) and ControlNet as prior knowledge of aesthetic QR code images, coupled with our proposed Scanning-Robust (Perceptual) Guidance, we can generate custom-styled QR codes that conform to user prompts while ensuring both scannability and aesthetics.</div>
</div>

<div class="section-title">Abstract</div>
<p>
QR codes, prevalent in daily applications, lack visual appeal due to their conventional black-and-white design. Integrating aesthetics while maintaining scannability poses a challenge. In this paper, we introduce a novel diffusion-model-based aesthetic QR code generation pipeline. It uses a pre-trained ControlNet and guided iterative refinement via a novel classifier guidance (SRG) based on the proposed Scanning-Robust Loss (SRL), which is tailored to QR code mechanisms and ensures both aesthetics and scannability. To further improve scannability while preserving aesthetics, we propose a two-stage pipeline with Scanning-Robust Perceptual Guidance (SRPG). Moreover, we can further enhance the scannability of the generated QR code by post-processing it with the proposed Scanning-Robust Projected Gradient Descent (SRPGD) technique, which is based on SRL and has proven convergence. Extensive quantitative, qualitative, and subjective experiments demonstrate that the proposed approach can generate diverse aesthetic QR codes with flexibility in detail. In addition, our pipeline outperforms existing models in terms of Scanning Success Rate (SSR), reaching 86.67% (+40%) with comparable aesthetic scores. The pipeline combined with SRPGD further achieves 96.67% (+50%).
</p>
<div class="image">
<!-- <div class="image">
<ImageZoom src="/images/girl.jpg" :options="{ background: imageOverlayColor }" />
<div class="caption">We extract a 20-pixel-wide vertical segment of pixels from each generated frame and stack them horizontally. MeDM produces fluent videos which reconstruct stripe-free images.
<a @click="showGirlVideo = !showGirlVideo">
<template v-if="!showGirlVideo">Show</template>
<template v-else>Hide</template>
video.</a>
</div>
</div>
<VideoComparisonMultiple v-if="showGirlVideo"
</div> -->
<!-- <VideoComparisonMultiple v-if="showGirlVideo"
:urls="[
'/videos/girl/sde',
'/videos/girl/cn',
@@ -65,16 +65,38 @@ function shuffle(array) {
'SDEdit + MeDM',
'ControlNet + MeDM',
]"
/>
/> -->

<div class="section-title">Methodology</div>

<div class="section-title">Architecture</div>
<div class="image">
<ImageZoom src="/images/system-diagram.jpg" :options="{ background: imageOverlayColor }" />
<div class="caption">MeDM mediates independent image score estimations after every denoising step. Inspired by the fact that video pixels are essentially views to the underlying objects, we construct an explicit pixel repository <LatexR /> to represent the underlying world. For more details, please refer to our <a href="/medm.pdf" target="_blank">paper</a>.</div>
<ImageZoom src="images/loss.png" :options="{ background: imageOverlayColor }" />
<div class="caption"><span class="bold">Scanning-Robust Loss (SRL).</span> We emulate the scanning process using module pixel extraction and binarization to calculate the pixel-wise error matrix and module-wise optimization decision mask. Then we apply a Gaussian kernel to re-weight the error matrix. Finally, we mask the error matrix with the decision mask via Hadamard product, then take the average to form our SRL.</div>
</div>

<div class="section-title">Video Rendering</div>
<p>
<div class="image">
<ImageZoom src="images/one_stage_generation_pipeline.png" :options="{ background: imageOverlayColor }" />
<div class="caption"><span class="bold">Iterative refinement with Scanning-Robust Guidance (SRG).</span> First, we leverage a pre-trained ControlNet to obtain the initial score prediction conditioned on the target QR code and the user-specified prompt. During each denoising step, we approximate the original latent following the DDIM formulation, then apply the VAE decoder to obtain the original image for SRL calculation. We use the gradient of SRL as a guidance term to update the predicted score. We repeat this iterative refinement process until convergence.</div>
</div>

<div class="image">
<ImageZoom src="images/two_stage_generation_pipeline.png" :options="{ background: imageOverlayColor }" />
<div class="caption"><span class="bold">Two-stage generation pipeline with Scanning-Robust Perceptual Guidance (SRPG).</span> In Stage 1, we use the pre-trained plain ControlNet to generate an aesthetic yet unscannable sub-optimal QR code. In Stage 2, we first perform SDEdit to convert the sub-optimal QR code to latent space, then leverage Qart to merge it with the target QR code. Finally, we apply our proposed iterative refinement to produce an aesthetic and scannable QR code.</div>
</div>

<div class="section-title">Comparison Results</div>

<div class="image">
<ImageZoom src="images/comparison.png" :options="{ background: imageOverlayColor }" />
<div class="caption">Comparisons with generative-based methods. The green box represents scannable images, while the red box indicates images that cannot be scanned.</div>
</div>

<div class="image">
<ImageZoom src="images/compare.png" :options="{ background: imageOverlayColor }" />
<div class="caption">Quantitative results of generative-based methods and our proposed pipeline. Improvements marked in green are compared with QR Code Monster.</div>
</div>

<!-- <p>
MeDM is capable of efficiently rendering high quality videos solely from 3D assets, including optical flows, occlusions and position information (depth, normal). We use the lineart derived from the normal maps as the input conditions to ControlNet. 3D assets from <a href="http://sintel.is.tue.mpg.de" target="_blank">MPI Sintel</a>.
</p>
<Gallery
@@ -102,13 +124,40 @@ function shuffle(array) {
'/videos/assistive-rendering/shaman_2',
'/videos/assistive-rendering/temple_3',
])"
/>
<div class="section-title">Text-Guided Video Edit</div>
/> -->
<div class="section-title">Analytics</div>
<p>
MeDM also performs well without high precision optical flows. We demonstrate this by applying text-guided video editing on real-world videos in <a href="https://davischallenge.org/davis2016/code.html" target="_blank">DAVIS 2016</a>.
In Fig. (a), we compare the error rates of a sample with different SRG weights during the iterative refinement steps. We observe that the error plunges within the first 5 iterations with SRG, whereas without SRG it does not. Furthermore, we analyze the change in score magnitude for different SRG weights. We find that the score magnitude decreases over the iterations, indicating that the guidance effect diminishes over time. This trend is depicted in Fig. (b).
</p>

<div style="display: flex;">
<div class="image">
<ImageZoom src="images/error.png" :options="{ background: imageOverlayColor }" />
<div class="caption">(a) QR code error rate.</div>
</div>
<div class="image">
<ImageZoom src="images/gradient_norm.png" :options="{ background: imageOverlayColor }" />
<div class="caption">(b) Score magnitude.</div>
</div>
</div>

<p>
We visualize the images at different timesteps and their corresponding mismatched modules. The mismatched modules are marked in red, indicating the inconsistencies between
the scanner-decoded image and the target QR code. Initially, the image contains a plethora of mismatched modules, which makes it unscannable. However, the number of mismatched modules decreases significantly as the sampling process proceeds. Moreover, we can observe that the number of mismatched modules plunges after certain sampling steps. This indicates that the mismatch rate falls within the QR code error-correction capacity, allowing control to revert to the diffusion model to generate more appealing results.
</p>

<Gallery
<div class="image">
<ImageZoom src="images/iterative_refinement_process.png" :options="{ background: imageOverlayColor }" />
</div>

<p>
We analyze the robustness of the generated results through error analysis. Scanning robustness is maintained as long as the modules, after sampling and binarization, yield the same results as the target QR code, regardless of pixel color changes within the modules. Our aesthetic QR codes exhibit irregular colors and shapes in their modules. Even so, after sampling and binarization, the module results remain consistent with the original QR code. This suggests that our aesthetic QR codes are robust and readable by a standard QR code scanner.
</p>

<div class="image">
<ImageZoom src="images/error_analysis_qrcode_module_error.png" :options="{ background: imageOverlayColor }" />
</div>
<!-- <Gallery
:urls="[
'/videos/edit/bear',
'/videos/edit/blackswan',
@@ -121,9 +170,9 @@ function shuffle(array) {
'Prompt: a boat on fire',
'Prompt: flamingos in outer space',
]"
/>
/> -->

<div class="section-title">Video Anonymization</div>
<!-- <div class="section-title">Video Anonymization</div>
<p>
Finally, we demonstrate the versatility of MeDM. For example, MeDM can perform video anonymization out-of-the-box. We leverage the fact that human visual perception exhibits a remarkable sensitivity to human faces while our ability to detect and recognize other objects is not as specialized. We add noise to a video with a strength of 0.5T, which is strong enough to erase the identity while preserving other objects and the background scene, and perform denoising using MeDM to obtain the anonymized video. Text conditioning can also be injected to enable a more targeted identity modification. Celebrity videos from <a href="https://celebv-hq.github.io" target="_blank">CelebV-HQ</a>.
</p>
@@ -135,7 +184,7 @@ function shuffle(array) {
'/videos/anonymization/bill-gates',
'/videos/anonymization/dicaprio-n-obama',
])"
/>
/> -->

</div>
</template>
@@ -161,7 +210,7 @@ function shuffle(array) {
}
.image {
margin: 1.5rem 0 1rem 0;
line-height: 0;
/* line-height: 0; */
}
@media (max-width: 650px) {
.caption {
@@ -172,5 +221,9 @@ function shuffle(array) {
p {
margin: .5rem 0 1rem 0;
}
.bold{
font-weight: bold;
}
</style>
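
Note on the Scanning-Robust Loss (SRL) caption added in components/Main.vue: the steps it lists (module extraction and binarization, a pixel-wise error matrix, Gaussian re-weighting, a module-wise decision mask, Hadamard product, averaging) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the absolute-difference error, the 0.5 threshold, the default sigma, and the convention that 1 means a bright module are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(m, sigma):
    """Normalized m x m Gaussian centered on a module (weights sum to 1)."""
    ax = np.arange(m) - (m - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def scanning_robust_loss(image, target, m, sigma=None, threshold=0.5):
    """Hypothetical SRL sketch.

    image:  grayscale array in [0, 1], shape (n*m, n*m)
    target: module bits, shape (n, n); assumed convention: 1 = bright, 0 = dark
    m:      pixels per module side
    """
    n = target.shape[0]
    assert image.shape == (n * m, n * m)
    if sigma is None:
        sigma = m / 4.0  # assumed default, not taken from the source
    kernel = gaussian_kernel(m, sigma)

    # Pixel-wise error matrix against the upsampled target (assumed: |diff|).
    target_px = np.kron(target, np.ones((m, m)))
    error = np.abs(image - target_px)

    # Gaussian re-weighting: pixels near each module center count more.
    weighted = error * np.tile(kernel, (n, n))

    # Emulated scanning: kernel-weighted sample per module, then binarize.
    blocks = image.reshape(n, m, n, m)
    sampled = np.einsum("iajb,ab->ij", blocks, kernel)
    decoded = (sampled > threshold).astype(float)

    # Module-wise decision mask: only mismatched modules are optimized.
    mask = (decoded != target).astype(float)
    mask_px = np.kron(mask, np.ones((m, m)))

    # Hadamard product with the mask, then the average.
    return float((weighted * mask_px).mean())


target = np.array([[1.0, 0.0], [0.0, 1.0]])
clean = np.kron(target, np.ones((4, 4)))
print(scanning_robust_loss(clean, target, 4))        # all modules decode correctly
print(scanning_robust_loss(1.0 - clean, target, 4))  # all modules decode wrongly
```

Under these assumptions, an image whose modules still binarize to the target bits contributes zero loss even when its pixel colors differ from pure black and white, which matches the robustness argument in the error-analysis paragraph.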

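Note on the iterative-refinement caption added in components/Main.vue: it describes approximating the original latent with the DDIM formulation and using the SRL gradient as a guidance term on the predicted score. A minimal NumPy sketch of those two steps follows, using the standard DDIM clean-sample estimate and the usual classifier-guidance form for a noise prediction. The function names, the guidance weight, and the toy quadratic loss are hypothetical. The real pipeline would decode the latent with the VAE and differentiate SRL instead; this sketch skips the decoder and holds the noise prediction fixed.

```python
import numpy as np


def predict_x0(x_t, eps, alpha_bar_t):
    """DDIM estimate of the clean sample from x_t and the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)


def guided_eps(eps, loss_grad, alpha_bar_t, weight):
    """Classifier-guidance style update of the noise prediction.

    loss_grad is the gradient of the guidance loss w.r.t. x_t. Adding it to
    eps lowers the loss of the resulting clean-sample estimate.
    """
    return eps + weight * np.sqrt(1.0 - alpha_bar_t) * loss_grad


def toy_loss_grad(x_t, eps, alpha_bar_t, target):
    """Gradient w.r.t. x_t of 0.5 * ||x0_hat - target||^2, eps held fixed.

    Stand-in for the SRL gradient (assumption: no VAE decoder, no network).
    """
    x0_hat = predict_x0(x_t, eps, alpha_bar_t)
    return (x0_hat - target) / np.sqrt(alpha_bar_t)


rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
eps = rng.normal(size=4)
alpha_bar = 0.5
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

target = np.zeros(4)
grad = toy_loss_grad(x_t, eps, alpha_bar, target)
eps_new = guided_eps(eps, grad, alpha_bar, weight=0.5)
x0_new = predict_x0(x_t, eps_new, alpha_bar)

print(np.linalg.norm(x0 - target), np.linalg.norm(x0_new - target))
```

In this toy setting the guided estimate moves toward the target, and the size of the correction scales with the loss gradient, so the guidance term shrinks as the loss gradient shrinks.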
Binary file added public/images/compare.png
Binary file added public/images/comparison.png
Binary file added public/images/error.png
Binary file added public/images/gradient_norm.png
Binary file added public/images/iterative_refinement_process.png
Binary file added public/images/loss.png
Binary file added public/images/one_stage_generation_pipeline.png
Binary file added public/images/teaser.png
Binary file added public/images/two_stage_generation_pipeline.png
