<h1>Infinite zoom with Stable Diffusion</h1>
<p><em>Published 2023-05-13 on <a href="https://wrona.me//2023/05/13/infinite-zoom-with-stable-diffusion">wrona.me</a>, a highly correlated sequence of characters about Python, Web and AI.</em></p>
<p>Some of you might have already seen these great animations, where
someone zooms infinitely into or out of an image and it seems to have no end.
Until now, making one required a lot of artistic work.
Fortunately, with the recent advancements in Generative AI,
especially in the computer vision domain, we can generate them on our own,
with just a few lines of code.</p>
<p><em>Disclaimer: in this post, I won’t explain the details of Stable Diffusion itself.
There’s already a lot of great content on the Internet to start with.
Personally, I can recommend the fast.ai course<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> or the original paper<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</em></p>
<h2 id="inpainting-and-outpainting">Inpainting and outpainting</h2>
<p>To start with, let’s explain what inpainting and outpainting are,
as they are going to be the core components of our solution.</p>
<p>Given an image and a binary mask, we want to replace the content of the image
under that mask with something else. Obviously, we want the final image
to still be consistent with the untouched regions. This is called inpainting.</p>
<p><img src="/assets/images/posts/infinite-zoom/dog.png" alt="dog sitting on a bench" />
<img src="/assets/images/posts/infinite-zoom/dog_mask.png" alt="mask of a dog" /></p>
<p><img src="/assets/images/posts/infinite-zoom/dog_inpainted.png" alt="various inpainted results" /></p>
<p>Outpainting is a special case of inpainting in which the mask surrounds something (an object, for example)
and we’re interested in generating a scene around it. This is going to be our case here.</p>
<p><img src="/assets/images/posts/infinite-zoom/dog_mask_inverted.png" alt="inverted mask of a dog" /></p>
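<p>To make the relation between the two masks concrete, here is a minimal sketch on a toy 4×4 mask stored as nested lists. It assumes the convention used by the code later in this post, where white (255) marks the pixels to repaint and black (0) marks the pixels to keep. <code class="language-plaintext highlighter-rouge">invert_mask</code> and the toy mask are illustrative names, not part of any library.</p>

```python
from typing import List

Mask = List[List[int]]  # rows of pixel values: 255 = repaint, 0 = keep


def invert_mask(mask: Mask) -> Mask:
    """Swap the repainted and the preserved regions of a binary mask."""
    return [[255 - value for value in row] for row in mask]


# A 4x4 inpainting mask that selects a 2x2 "object" in the middle.
object_mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]

# The outpainting mask keeps the object and selects everything around it.
scene_mask = invert_mask(object_mask)
```

<p>With real images the same inversion would typically be done on a Pillow image or a tensor rather than on nested lists.</p>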
<h2 id="stable-diffusion">Stable Diffusion</h2>
<p>As the title of this post suggests, we’re going to use Stable Diffusion as our <em>painter</em>.</p>
<p>Stable Diffusion is a family of generative models based on the latent diffusion mechanism.
These models allow us to generate a broad range of images, given just a textual description of what we want (a prompt).</p>
<p>Under the hood, the diffusion process happens in the latent space, so we use a Variational Autoencoder (VAE)
to encode images into latents and to decode latents back into images. For conditioning on the prompt, there’s cross-attention inside the denoising UNet.</p>
<p><img src="/assets/images/posts/infinite-zoom/ldm.png" alt="latent diffusion model architecture" /></p>
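<p>To get a feeling for why diffusing in the latent space is cheaper, the sketch below computes the size of the latent tensor for a given image size. It assumes the usual Stable Diffusion v1/v2 configuration, in which the VAE downsamples each spatial dimension by a factor of 8 and produces 4 latent channels. <code class="language-plaintext highlighter-rouge">latent_shape</code> is a hypothetical helper, not a library function.</p>

```python
from typing import Tuple


def latent_shape(
    height: int, width: int, downscale: int = 8, channels: int = 4
) -> Tuple[int, int, int]:
    """Shape (channels, height, width) of the latent for an image size.

    The defaults are assumptions matching common Stable Diffusion configs.
    """
    if height % downscale or width % downscale:
        raise ValueError("image size must be divisible by the downscale factor")
    return (channels, height // downscale, width // downscale)


# A 512x512 RGB image has 3 * 512 * 512 = 786432 values,
# while its latent has only 4 * 64 * 64 = 16384 values.
shape = latent_shape(512, 512)
```

<p>Under these assumptions the denoising UNet works on 48 times fewer values than it would in pixel space.</p>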
<h2 id="inpainting-using-stable-diffusion">Inpainting using Stable Diffusion</h2>
<p>We’ll be using the <code class="language-plaintext highlighter-rouge">stabilityai/stable-diffusion-2-inpainting</code> model here,
which conditions the generation process on three inputs to generate an inpainted image:</p>
<ul>
<li>an image,</li>
<li>an inpainting mask,</li>
<li>a prompt.</li>
</ul>
<p>Let’s start with our first image. For this, we can either use our own image
or generate one using Stable Diffusion. We’ll do the latter.</p>
<p>Usually, we would use a non-inpainting pipeline for this, but to save memory,
we can simply trick the inpainting model into repainting an empty image completely:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span>

<span class="c1"># A black image with a fully white mask: the model repaints every pixel.
</span><span class="n">blank_image</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="s">"RGB"</span><span class="p">,</span> <span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
<span class="n">mask</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="s">"RGB"</span><span class="p">,</span> <span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">),</span> <span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">255</span><span class="p">))</span>
<span class="n">initial_image</span> <span class="o">=</span> <span class="n">pipe</span><span class="p">(</span>
    <span class="n">prompt</span><span class="o">=</span><span class="n">prompt</span><span class="p">,</span>
    <span class="n">negative_prompt</span><span class="o">=</span><span class="n">negative_prompt</span><span class="p">,</span>
    <span class="n">image</span><span class="o">=</span><span class="n">blank_image</span><span class="p">,</span>
    <span class="n">mask_image</span><span class="o">=</span><span class="n">mask</span><span class="p">,</span>
    <span class="n">num_inference_steps</span><span class="o">=</span><span class="n">num_inference_steps</span><span class="p">,</span>
    <span class="n">guidance_scale</span><span class="o">=</span><span class="n">cfg</span><span class="p">,</span>
    <span class="n">generator</span><span class="o">=</span><span class="n">generator</span><span class="p">,</span>
<span class="p">).</span><span class="n">images</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>
<p>That’s what I got after a few trials:</p>
<p><img src="/assets/images/posts/infinite-zoom/initial.png" alt="initial image" /></p>
<p>This looks like a good start.</p>
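<p>The generation snippet above relies on a few names defined elsewhere, most notably <code class="language-plaintext highlighter-rouge">pipe</code> and <code class="language-plaintext highlighter-rouge">generator</code>. A minimal setup sketch could look like the one below. It assumes the Hugging Face <code class="language-plaintext highlighter-rouge">diffusers</code> library and a CUDA GPU. The helper names, the half-precision weights and the seeding are assumptions made for this sketch, so the complete code linked at the end of this post may differ.</p>

```python
MODEL_ID = "stabilityai/stable-diffusion-2-inpainting"


def load_inpainting_pipeline(device: str = "cuda"):
    """Load the inpainting pipeline; requires `torch` and `diffusers`."""
    # Imported lazily so that this module can be imported on a machine
    # without the GPU stack installed.
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # assumption: half precision to save memory
    )
    return pipe.to(device)


def make_rng(seed: int, device: str = "cuda"):
    """Seeded random generator, so that a good result can be reproduced."""
    import torch

    return torch.Generator(device=device).manual_seed(seed)
```

<p>With these helpers, <code class="language-plaintext highlighter-rouge">pipe = load_inpainting_pipeline()</code> and <code class="language-plaintext highlighter-rouge">generator = make_rng(42)</code> would provide the two missing names. The seed value is arbitrary.</p>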
<h3 id="actual-inpainting-outpainting">Actual inpainting (outpainting)</h3>
<p>You might have already figured out how we are going to use the inpainting model
to create an infinite zoom animation.
If we can outpaint one image and generate more of the scene, we can repeat the process as many times as we want.
Then, it is just a matter of smart resizing and cropping of the images.</p>
<p>There are many ways we can outpaint an image. We’ll focus on two of them.</p>
<p>The first one is the <em>direct</em> method.
In this approach, we outpaint the reference image on all sides at the same time.</p>
<p>To do that, we first need to pad the image by <code class="language-plaintext highlighter-rouge">outpaint_size</code> (128 in my example),
then set the mask for the padded region and run the inpainting model.
It’s important to set the correct padding mode (<code class="language-plaintext highlighter-rouge">symmetric</code> or <code class="language-plaintext highlighter-rouge">reflect</code> will do),
as the region under the mask influences the inpainting process.
In my experiments, I found that if we pad with black pixels, we’ll likely
end up with some kind of a frame in the outpainted image.
This might sometimes be desirable, though.</p>
<p><img src="/assets/images/posts/infinite-zoom/outpaint_directly.png" alt="directly outpainted image" />
<em>(left: padded image, center: mask, right: outpainted image)</em></p>
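<p>To illustrate what the padding modes mentioned above do, here is a small from-scratch sketch on a single row of pixel values. It is not the torchvision implementation, only a re-creation of the same idea: <code class="language-plaintext highlighter-rouge">symmetric</code> mirrors the content including the edge value, <code class="language-plaintext highlighter-rouge">reflect</code> mirrors it without repeating the edge value, and <code class="language-plaintext highlighter-rouge">constant</code> fills with zeros (black). <code class="language-plaintext highlighter-rouge">pad_1d</code> is a hypothetical helper.</p>

```python
from typing import List


def pad_1d(values: List[int], pad: int, mode: str) -> List[int]:
    """Pad a row of pixels on both sides, illustrating three padding modes."""
    if not 0 < pad < len(values):
        raise ValueError("pad must be positive and smaller than the row length")
    if mode == "symmetric":
        # Mirror including the edge value: 2 1 | 1 2 3 | 3 2
        left, right = values[:pad][::-1], values[-pad:][::-1]
    elif mode == "reflect":
        # Mirror around the edge value without repeating it: 3 2 | 1 2 3 | 2 1
        left, right = values[1 : pad + 1][::-1], values[-pad - 1 : -1][::-1]
    elif mode == "constant":
        # Black pixels: the padded area carries no information about the image.
        left, right = [0] * pad, [0] * pad
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + values + right


row = [10, 20, 30, 40]
symmetric = pad_1d(row, 2, "symmetric")  # [20, 10, 10, 20, 30, 40, 40, 30]
reflect = pad_1d(row, 2, "reflect")      # [30, 20, 10, 20, 30, 40, 30, 20]
constant = pad_1d(row, 2, "constant")    # [0, 0, 10, 20, 30, 40, 0, 0]
```

<p>In both mirrored variants the padded area continues the colors found at the image border, which is a plausible reason why they give the model a better starting point than a black border.</p>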
<p>Here’s the code for this method:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Callable</span>

<span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span>
<span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">transforms</span>

<span class="k">def</span> <span class="nf">outpaint_direct</span><span class="p">(</span>
    <span class="n">initial_image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span>
    <span class="n">outpaint_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
    <span class="n">generator</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">],</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">],</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
    <span class="n">_to_pillow</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">ToPILImage</span><span class="p">()</span>
    <span class="n">_to_tensor</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">PILToTensor</span><span class="p">()</span>
    <span class="c1"># Pad all four sides with mirrored content instead of black pixels.
</span>    <span class="n">resize</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">Pad</span><span class="p">(</span><span class="n">outpaint_size</span><span class="p">,</span> <span class="n">padding_mode</span><span class="o">=</span><span class="s">"symmetric"</span><span class="p">)</span>
    <span class="n">image_raw</span> <span class="o">=</span> <span class="n">resize</span><span class="p">(</span><span class="n">_to_tensor</span><span class="p">(</span><span class="n">initial_image</span><span class="p">))</span>
    <span class="n">padded_image</span> <span class="o">=</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">image_raw</span><span class="p">)</span>
    <span class="c1"># The tensor layout is (channels, height, width); mask the padded border.
</span>    <span class="n">ys</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">padded_image</span><span class="p">.</span><span class="n">height</span><span class="p">)</span>
    <span class="n">xs</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">padded_image</span><span class="p">.</span><span class="n">width</span><span class="p">)</span>
    <span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">image_raw</span><span class="p">)</span>
    <span class="n">mask</span><span class="p">[:,</span> <span class="n">ys</span> <span class="o">&lt;</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
    <span class="n">mask</span><span class="p">[:,</span> <span class="p">:,</span> <span class="n">xs</span> <span class="o">&lt;</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
    <span class="n">mask</span><span class="p">[:,</span> <span class="n">ys</span> <span class="o">&gt;=</span> <span class="n">initial_image</span><span class="p">.</span><span class="n">height</span> <span class="o">+</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
    <span class="n">mask</span><span class="p">[:,</span> <span class="p">:,</span> <span class="n">xs</span> <span class="o">&gt;=</span> <span class="n">initial_image</span><span class="p">.</span><span class="n">width</span> <span class="o">+</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
    <span class="n">mask</span> <span class="o">=</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">generator</span><span class="p">(</span><span class="n">padded_image</span><span class="p">,</span> <span class="n">mask</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">image</span>
</code></pre></div></div>
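<p>The mask geometry is easy to get off by one, so here is a small, library-free sketch of the same border mask built with explicit index ranges. It also covers non-square images. <code class="language-plaintext highlighter-rouge">border_mask</code> is a hypothetical helper that works on nested lists, whereas the function above works on tensors.</p>

```python
from typing import List


def border_mask(height: int, width: int, pad: int) -> List[List[int]]:
    """Mask for a `height` x `width` image padded by `pad` on every side.

    255 marks the padded border (to repaint), 0 the original pixels (to keep).
    """
    padded_height, padded_width = height + 2 * pad, width + 2 * pad
    mask = [[255] * padded_width for _ in range(padded_height)]
    # The original image occupies rows pad..pad+height-1
    # and columns pad..pad+width-1 of the padded canvas.
    for y in range(pad, pad + height):
        for x in range(pad, pad + width):
            mask[y][x] = 0
    return mask


toy_mask = border_mask(height=3, width=4, pad=2)
```

<p>The first masked row below the image has index <code class="language-plaintext highlighter-rouge">pad + height</code>, and the first masked column to its right has index <code class="language-plaintext highlighter-rouge">pad + width</code>.</p>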
<p>The second method, <em>sequential</em>, seems to produce better results, but it requires four generations for a single image.
The idea is also very simple: we outpaint the image four times, one side at a time.</p>
<p>Starting from the left:</p>
<p><img src="/assets/images/posts/infinite-zoom/outpaint_left.png" alt="sequential outpainting - left" /></p>
<p>Top:</p>
<p><img src="/assets/images/posts/infinite-zoom/outpaint_top.png" alt="sequential outpainting - top" /></p>
<p>Right:</p>
<p><img src="/assets/images/posts/infinite-zoom/outpaint_right.png" alt="sequential outpainting - right" /></p>
<p>And to the bottom:</p>
<p><img src="/assets/images/posts/infinite-zoom/outpaint_bottom.png" alt="sequential outpainting - bottom" /></p>
<p>At each step we do the same operations as with the direct method: padding -> mask -> inpainting.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">outpaint_sequentially</span><span class="p">(</span>
    <span class="n">initial_image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span>
    <span class="n">outpaint_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
    <span class="n">generator</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">],</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">],</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
    <span class="n">_to_pillow</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">ToPILImage</span><span class="p">()</span>
    <span class="n">_to_tensor</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">PILToTensor</span><span class="p">()</span>
    <span class="c1"># torchvision padding order: [left, top, right, bottom].
</span>    <span class="n">resize_transforms</span> <span class="o">=</span> <span class="p">[</span>
        <span class="n">transforms</span><span class="p">.</span><span class="n">Pad</span><span class="p">([</span><span class="n">outpaint_size</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">padding_mode</span><span class="o">=</span><span class="s">"symmetric"</span><span class="p">),</span>
        <span class="n">transforms</span><span class="p">.</span><span class="n">Pad</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="n">outpaint_size</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">padding_mode</span><span class="o">=</span><span class="s">"symmetric"</span><span class="p">),</span>
        <span class="n">transforms</span><span class="p">.</span><span class="n">Pad</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">outpaint_size</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">padding_mode</span><span class="o">=</span><span class="s">"symmetric"</span><span class="p">),</span>
        <span class="n">transforms</span><span class="p">.</span><span class="n">Pad</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">outpaint_size</span><span class="p">],</span> <span class="n">padding_mode</span><span class="o">=</span><span class="s">"symmetric"</span><span class="p">),</span>
    <span class="p">]</span>
    <span class="k">def</span> <span class="nf">_left_mask</span><span class="p">(</span><span class="n">image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
        <span class="n">xs</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">width</span><span class="p">)</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">))</span>
        <span class="n">mask</span><span class="p">[:,</span> <span class="p">:,</span> <span class="n">xs</span> <span class="o">&lt;</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
        <span class="k">return</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">_top_mask</span><span class="p">(</span><span class="n">image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
        <span class="n">ys</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">height</span><span class="p">)</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">))</span>
        <span class="n">mask</span><span class="p">[:,</span> <span class="n">ys</span> <span class="o">&lt;</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
        <span class="k">return</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>
    <span class="c1"># By the time the right and bottom sides are padded, the left and top
</span>    <span class="c1"># margins already exist, so the new strip starts at original size + margin.
</span>    <span class="k">def</span> <span class="nf">_right_mask</span><span class="p">(</span><span class="n">image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
        <span class="n">xs</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">width</span><span class="p">)</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">))</span>
        <span class="n">mask</span><span class="p">[:,</span> <span class="p">:,</span> <span class="n">xs</span> <span class="o">&gt;=</span> <span class="n">initial_image</span><span class="p">.</span><span class="n">width</span> <span class="o">+</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
        <span class="k">return</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">_bottom_mask</span><span class="p">(</span><span class="n">image</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">:</span>
        <span class="n">ys</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="n">height</span><span class="p">)</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">))</span>
        <span class="n">mask</span><span class="p">[:,</span> <span class="n">ys</span> <span class="o">&gt;=</span> <span class="n">initial_image</span><span class="p">.</span><span class="n">height</span> <span class="o">+</span> <span class="n">outpaint_size</span><span class="p">]</span> <span class="o">=</span> <span class="mi">255</span>
        <span class="k">return</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>
    <span class="n">mask_generators</span> <span class="o">=</span> <span class="p">[</span>
        <span class="n">_left_mask</span><span class="p">,</span>
        <span class="n">_top_mask</span><span class="p">,</span>
        <span class="n">_right_mask</span><span class="p">,</span>
        <span class="n">_bottom_mask</span><span class="p">,</span>
    <span class="p">]</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">initial_image</span>
    <span class="k">for</span> <span class="n">resize</span><span class="p">,</span> <span class="n">mask_gen</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">resize_transforms</span><span class="p">,</span> <span class="n">mask_generators</span><span class="p">):</span>
        <span class="n">image_raw</span> <span class="o">=</span> <span class="n">_to_tensor</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
        <span class="n">padded_image</span> <span class="o">=</span> <span class="n">_to_pillow</span><span class="p">(</span><span class="n">resize</span><span class="p">(</span><span class="n">image_raw</span><span class="p">))</span>
        <span class="n">mask</span> <span class="o">=</span> <span class="n">mask_gen</span><span class="p">(</span><span class="n">padded_image</span><span class="p">)</span>
        <span class="n">image</span> <span class="o">=</span> <span class="n">generator</span><span class="p">(</span><span class="n">padded_image</span><span class="p">,</span> <span class="n">mask</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">image</span>
</code></pre></div></div>
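<p>Both outpainting functions take a <code class="language-plaintext highlighter-rouge">generator</code> callable that maps a padded image and a mask to an outpainted image. One way to build it, as a sketch, is to close over the pipeline call used for the initial image. <code class="language-plaintext highlighter-rouge">make_generator</code> is a hypothetical helper. <code class="language-plaintext highlighter-rouge">pipe</code> is assumed to be the inpainting pipeline, and <code class="language-plaintext highlighter-rouge">rng</code> the random generator that was passed as <code class="language-plaintext highlighter-rouge">generator=</code> in that call.</p>

```python
from typing import Any, Callable


def make_generator(
    pipe: Callable[..., Any],
    prompt: str,
    negative_prompt: str,
    num_inference_steps: int,
    cfg: float,
    rng: Any,
) -> Callable[[Any, Any], Any]:
    """Bind the fixed pipeline arguments, leaving only image and mask open."""

    def generate(image: Any, mask: Any) -> Any:
        result = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=image,
            mask_image=mask,
            num_inference_steps=num_inference_steps,
            guidance_scale=cfg,
            generator=rng,
        )
        # The pipeline returns a batch; we only ever ask for one image.
        return result.images[0]

    return generate
```

<p>The returned function can then be passed as the <code class="language-plaintext highlighter-rouge">generator</code> argument of either outpainting function.</p>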
<h3 id="lets-repeat">Let’s repeat</h3>
<p>No matter which method we choose, we end up with a pair of images: the original one and the outpainted one.
But for the animation we need more than two images.
Let’s repeat the process a few times, remembering to scale the image back to the original resolution after each outpainting:</p>
<p><img src="/assets/images/posts/infinite-zoom/repeat.png" alt="generated frames" /></p>
<p>Looks good! We can turn it into an animation right away, but it won’t be smooth:</p>
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/initial_animation.webm" type="video/webm" />
</video>
<p>Can we do better?</p>
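<p>The repeat step described above can be sketched as a short loop. <code class="language-plaintext highlighter-rouge">generate_keyframes</code> is a hypothetical helper: it assumes that <code class="language-plaintext highlighter-rouge">outpaint</code> is one of the two outpainting functions with its other arguments already bound, and that images behave like Pillow images, with a <code class="language-plaintext highlighter-rouge">size</code> attribute and a <code class="language-plaintext highlighter-rouge">resize</code> method.</p>

```python
from typing import Any, Callable, List


def generate_keyframes(
    initial_image: Any,
    outpaint: Callable[[Any], Any],
    num_keyframes: int,
) -> List[Any]:
    """Outpaint repeatedly, always starting from the latest keyframe."""
    frames = [initial_image]
    for _ in range(num_keyframes):
        outpainted = outpaint(frames[-1])
        # Scale back, so that every keyframe has the same resolution.
        frames.append(outpainted.resize(initial_image.size))
    return frames
```

<p>With a 512-pixel image and a margin of 128 pixels on every side, each keyframe shows the previous one scaled down by 512 / 768 = 2/3.</p>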
<h3 id="adding-interpolation">Adding interpolation</h3>
<p>Let’s head back to having two frames only:</p>
<p><img src="/assets/images/posts/infinite-zoom/two_frames.png" alt="two frames" />
<em>(left: initial image, right: outpainted image)</em></p>
<p>For the zoom-out effect, we want the previous image to shrink towards the middle of the frame,
while the next image starts appearing at the edges. The frame size, however, must remain fixed.</p>
<p>I came up with the solution below.
It’s not perfect, as we can see some artifacts in the animation, but I think it’s sufficient for a hobby project.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">interp</span><span class="p">(</span>
    <span class="n">src</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span> <span class="n">dst</span><span class="p">:</span> <span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">,</span> <span class="n">step</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">num_interps</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="n">Image</span><span class="p">.</span><span class="n">Image</span><span class="p">]:</span>
    <span class="n">width</span><span class="p">,</span> <span class="n">height</span> <span class="o">=</span> <span class="n">src</span><span class="p">.</span><span class="n">size</span>
    <span class="c1"># During outpainting we increased the image size by `step` on every side,
</span>    <span class="c1"># so now we need to know the margin after resizing to 512x512.
</span>    <span class="n">inner_step</span> <span class="o">=</span> <span class="p">(</span><span class="n">step</span> <span class="o">/</span> <span class="p">(</span><span class="n">width</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">step</span><span class="p">))</span> <span class="o">*</span> <span class="n">width</span>
    <span class="n">frames</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_interps</span><span class="p">):</span>
        <span class="n">canvas</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="s">"RGB"</span><span class="p">,</span> <span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
        <span class="n">padding_src</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="p">(</span><span class="n">inner_step</span> <span class="o">/</span> <span class="n">num_interps</span><span class="p">))</span>
        <span class="n">padding_dst</span> <span class="o">=</span> <span class="n">step</span> <span class="o">-</span> <span class="nb">round</span><span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="p">(</span><span class="n">step</span> <span class="o">/</span> <span class="n">num_interps</span><span class="p">))</span>
        <span class="n">src_s</span> <span class="o">=</span> <span class="n">width</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">padding_src</span>
        <span class="n">dst_s</span> <span class="o">=</span> <span class="n">width</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">padding_dst</span>
        <span class="n">resized_src</span> <span class="o">=</span> <span class="n">src</span><span class="p">.</span><span class="n">resize</span><span class="p">((</span><span class="n">src_s</span><span class="p">,</span> <span class="n">src_s</span><span class="p">))</span>
        <span class="n">resized_dst</span> <span class="o">=</span> <span class="n">dst</span><span class="p">.</span><span class="n">resize</span><span class="p">((</span><span class="n">dst_s</span><span class="p">,</span> <span class="n">dst_s</span><span class="p">))</span>
        <span class="n">canvas</span><span class="p">.</span><span class="n">paste</span><span class="p">(</span><span class="n">resized_dst</span><span class="p">,</span> <span class="p">(</span><span class="o">-</span><span class="n">padding_dst</span><span class="p">,</span> <span class="o">-</span><span class="n">padding_dst</span><span class="p">))</span>
        <span class="n">canvas</span><span class="p">.</span><span class="n">paste</span><span class="p">(</span><span class="n">resized_src</span><span class="p">,</span> <span class="p">(</span><span class="n">padding_src</span><span class="p">,</span> <span class="n">padding_src</span><span class="p">))</span>
        <span class="n">frames</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">canvas</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">frames</span>
</code></pre></div></div>
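<p>To see what the function does numerically, the sketch below repeats only its arithmetic and returns the paddings and sizes for every interpolated frame, without touching any images. <code class="language-plaintext highlighter-rouge">interp_geometry</code> is a hypothetical helper that mirrors the formulas above. The values 512, 128 and 16 are the ones used in this post.</p>

```python
from typing import List, Tuple


def interp_geometry(
    width: int, step: int, num_interps: int
) -> List[Tuple[int, int, int, int]]:
    """Return (padding_src, padding_dst, src_size, dst_size) for every frame."""
    # Margin occupied by the outpainted border once the outpainted image
    # has been scaled back to `width` pixels.
    inner_step = (step / (width + 2 * step)) * width
    geometry = []
    for i in range(num_interps):
        padding_src = round(i * (inner_step / num_interps))
        padding_dst = step - round(i * (step / num_interps))
        src_size = width - 2 * padding_src
        dst_size = width + 2 * padding_dst
        geometry.append((padding_src, padding_dst, src_size, dst_size))
    return geometry


geometry = interp_geometry(width=512, step=128, num_interps=16)
first = geometry[0]   # (0, 128, 512, 768): src fills the frame, dst is cropped
middle = geometry[8]  # (43, 64, 426, 640)
```

<p>In the middle frame, the copy of the previous image inside the resized <code class="language-plaintext highlighter-rouge">dst</code> is 640 · 512 / 768 ≈ 426.7 pixels wide, while <code class="language-plaintext highlighter-rouge">src</code> is pasted at 426 pixels. Rounding to whole pixels therefore leaves a mismatch of up to about a pixel, which is possibly one source of the visible seams.</p>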
<p>And here’s the animation with 16 steps of interpolation between frames:</p>
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/animation.webm" type="video/webm" />
</video>
<p>It’s not a perfect solution, as I see two artifacts:</p>
<ul>
<li>color shift, probably due to the interpolation while resizing,</li>
<li>sometimes there’s a frame visible in interpolated frames.</li>
</ul>
<p>Nevertheless, the effect is great.</p>
<p>Here are more examples:</p>
<div class="gallery">
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/example_1.mp4" type="video/mp4" />
</video>
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/example_2.webm" type="video/webm" />
</video>
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/example_3.mp4" type="video/mp4" />
</video>
<video autoplay="" muted="" loop="" class="w256">
<source src="/assets/images/posts/infinite-zoom/example_4.mp4" type="video/mp4" />
</video>
</div>
<p>You can find the complete code, along with a Gradio app, <a href="https://github.com/iamhatesz/fun-with-ml/tree/main/stable_diffusion/infinite_zoom">here</a>.
I strongly recommend running this using a GPU.
I usually use A40 instances on RunPod for this, paying less than $1/hr.
If you’d like to try it yourself, and you find this post useful, please consider using this <a href="https://runpod.io?ref=ptk1veb3">reflink</a>.</p>
<h3 id="references">References</h3>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://course.fast.ai/Lessons/part2.html">https://course.fast.ai/Lessons/part2.html</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://arxiv.org/abs/2112.10752">https://arxiv.org/abs/2112.10752</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>Testing unknown applications with Selenium &amp; WebDriver BiDirectional API</h1>
<p><em>Published 2022-11-04 on <a href="https://wrona.me//2022/11/04/testing-unknown-applications">wrona.me</a>.</em></p>
<p>You can find the code and examples from the talk I gave at PyConPL 2022 <a href="https://github.com/iamhatesz/pyconpl-2022">here</a>.</p>
<p>Unfortunately, the video of my talk is still missing on the organizers’ YouTube channel.</p>
<p>You can also use the preview below; however, you will miss the videos and the animated content.</p>
<object data="/assets/pdf/pyconpl2022.pdf" type="application/pdf" width="100%" height="500px"></object>You can find the code and examples from the talk I gave at PyConPL 2022 here.Monoid: algebra w służbie programisty2016-12-04T00:00:00+00:002016-12-04T00:00:00+00:00https://wrona.me//2016/12/04/monoid-algebra-w-sluzbie-programisty<p>Wiele osób zaczynających przygodę z programowaniem funkcyjnym, prędzej czy później natrafia na takie pojęcie jak <em>monoid</em>. Totalnym żółtodziobom w dziedzinie programowania funkcyjnego, takim jak ja, zrozumienie teorii za nim stojącej oraz poznanie sensu jego praktycznego wykorzystania zajmuje jakiś czas.</p>
<p>With this post I will try to save you some of that time by presenting the mathematical description of a monoid, kept minimal but sufficient to understand the topic, along with example implementations in Scala.</p>
<h3 id="zacznijmy-od-teorii">Let’s start with the theory</h3>
<p>In a nutshell, a monoid<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> is a semigroup<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> that has an identity element. A semigroup is a magma<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> whose operation is associative. A magma, in turn, is a set with a binary operation defined on it. All clear, right? So let’s start from the beginning (or, in this case, from the end).</p>
<p>A magma is an algebraic structure defined as a set \(\mathsf{M}\) with a binary operation \(\otimes\) that satisfies the following closure condition:</p>
\[\forall {a,b} \in \mathsf{M}: {a} \otimes {b} \in \mathsf{M}\]
<p><small><em>A quick reminder from high-school math: the symbol \(\forall\) is the universal quantifier, read as “for all”. There is also the existential quantifier \(\exists\), read as “there exists”.</em></small></p>
<p>A semigroup is a magma whose operation \(\otimes\) is associative, i.e.:</p>
\[\forall {a,b,c} \in \mathsf{M}: ({a} \otimes {b}) \otimes {c} = {a} \otimes ({b} \otimes {c})\]
<p>Sometimes the operation is additionally required to be commutative:</p>
\[\forall {a,b} \in \mathsf{M}: {a} \otimes {b} = {b} \otimes {a}\]
<p>We then speak of a commutative semigroup (or, more formally, an abelian semigroup). Associativity lets us group the operands freely and commutativity lets us reorder them, so the order in which we apply \(\otimes\) does not matter: every combination gives the same result. If you have come across distributed computing or MapReduce-style algorithms, pay particular attention to how useful this property is. Thanks to it, we can parallelize our computations very easily!</p>
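<p>A minimal sketch of why this matters for parallel work, assuming plain integer addition as the operation. The names <code class="language-plaintext highlighter-rouge">chunkedSum</code> and <code class="language-plaintext highlighter-rouge">chunkSize</code> are hypothetical and only serve the illustration; each chunk could be reduced by a different worker before the partial results are combined.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: reduce every chunk independently, then combine the partial results.
// This is only valid because + on Int is associative (regrouping does not change the result).
def chunkedSum(xs: List[Int], chunkSize: Int): Int =
  xs.grouped(chunkSize) // split the input into chunks
    .map(_.sum)         // each chunk could be summed by a separate worker
    .sum                // combine the partial sums

val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val sequential = numbers.sum
val chunked = chunkedSum(numbers, 3)
</code></pre></div></div>
<p>Both <code class="language-plaintext highlighter-rouge">sequential</code> and <code class="language-plaintext highlighter-rouge">chunked</code> hold the same value, regardless of the chunk size chosen.</p>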
<p>With a semigroup in hand we can finally define our monoid. It is a semigroup that has an identity (neutral) element \({e}\) such that:</p>
\[\exists {e} \in \mathsf{M} \ \ \forall {a} \in \mathsf{M}: {a} \otimes {e} = {e} \otimes {a} = {a}\]
<p>We can therefore write a monoid as a triple \(( \mathsf{M} , \otimes , {e} )\), where:</p>
<p>\(\mathsf{M}\): the set of elements,</p>
<p>\(\otimes\): an associative (and optionally commutative) operation defined on the set \(\mathsf{M}\),</p>
<p>\({e}\): the identity element.</p>
<h3 id="przykładowe-monoidy">Example monoids</h3>
<p>Probably the simplest monoid is the set of natural numbers (including zero) with addition as its operation. The identity element of this monoid is the number zero.</p>
<dl>
<dt>associativity</dt>
<dd>
\[(2 + 3) + 7 = 2 + (3 + 7) = 12\]
</dd>
<dt>identity element</dt>
<dd>
\[5 + 0 = 0 + 5 = 5\]
</dd>
</dl>
<p>Every programmer uses another monoid all the time: strings under concatenation, with the empty string as the identity element.</p>
<dl>
<dt>associativity</dt>
<dd>
\[( {ala} + {ma} ) + {kota} = {ala} + ( {ma} + {kota} ) = {ala ma kota}\]
</dd>
<dt>identity element</dt>
<dd>
\[{ala} + {""} = {""} + {ala} = {ala}\]
</dd>
</dl>
<p><small><em>Note that the order of the operands matters when concatenating strings. The operation is therefore associative but not commutative.</em></small></p>
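<p>The two examples above can be checked on concrete values. This is only a sketch: <code class="language-plaintext highlighter-rouge">associative</code> and <code class="language-plaintext highlighter-rouge">hasIdentity</code> are hypothetical helpers written for this illustration, and testing a few sample values does not prove the laws in general.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helpers that check the monoid laws on concrete sample values.
def associative[A](op: (A, A) => A)(a: A, b: A, c: A): Boolean =
  op(op(a, b), c) == op(a, op(b, c))

def hasIdentity[A](op: (A, A) => A, e: A)(a: A): Boolean =
  op(a, e) == a && op(e, a) == a

val plus: (Int, Int) => Int = _ + _
val concat: (String, String) => String = _ + _

val intAssoc = associative(plus)(2, 3, 7)
val intIdentity = hasIdentity(plus, 0)(5)
val strAssoc = associative(concat)("ala", "ma", "kota")
val strIdentity = hasIdentity(concat, "")("ala")
// Concatenation is not commutative: swapping the operands changes the result.
val strCommutes = concat("ala", "ma") == concat("ma", "ala")
</code></pre></div></div>
<p>All four law checks evaluate to <em>true</em>, while <code class="language-plaintext highlighter-rouge">strCommutes</code> is <em>false</em>.</p>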
<p>Let’s now look for an operation that does not naturally form a monoid. To keep the search short, consider the most popular statistic: the mean of a set of numbers. It is easy to show that the triple consisting of the set of real numbers, the arithmetic mean of two numbers \(\odot\) and zero is not a monoid, because \(\odot\) is not associative:</p>
\[({2} \odot {4}) \odot {6} = 4.5 \not= {2} \odot ({4} \odot {6}) = 3.5\]
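<p>The same counterexample can be replayed in Scala. This is a small sketch in which <code class="language-plaintext highlighter-rouge">mean</code> is a hypothetical helper standing in for \(\odot\):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Arithmetic mean of two numbers, used as a binary operation.
def mean(a: Double, b: Double): Double = (a + b) / 2

val left = mean(mean(2, 4), 6)  // mean(3.0, 6.0)
val right = mean(2, mean(4, 6)) // mean(2.0, 5.0)
// Zero is not an identity element either: the mean of 5 and 0 is not 5.
val withZero = mean(5, 0)
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">left</code> is 4.5 and <code class="language-plaintext highlighter-rouge">right</code> is 3.5, so the grouping changes the result.</p>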
<p>There is, however, a way to compute the mean with monoids. Before we get to it, let’s create a monoid interface in Scala.</p>
<h3 id="monoid-w-scali">Monoid in Scala</h3>
<p>Let’s create a Scala trait that represents a monoid. As defined earlier, every monoid consists of a set of elements, an operation defined on it and an identity element. Our trait is therefore generic in the type <code class="language-plaintext highlighter-rouge">T</code> and must define two methods: <code class="language-plaintext highlighter-rouge">zero</code> and <code class="language-plaintext highlighter-rouge">op</code>. For a cleaner API I also added a companion object.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">zero</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="k">def</span> <span class="nf">op</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">b</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">Monoid</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">op</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="n">a</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">b</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="nv">a</span><span class="o">.</span><span class="py">op</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Let’s now write a class that extends the <code class="language-plaintext highlighter-rouge">Monoid</code> interface and wraps the computation of the arithmetic mean in such a way that the operation is associative (and even commutative!).</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">case</span> <span class="k">class</span> <span class="nc">Average</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="n">count</span><span class="k">:</span> <span class="kt">Long</span><span class="o">,</span> <span class="n">sum</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span>
<span class="o">(</span><span class="k">implicit</span> <span class="n">numeric</span><span class="k">:</span> <span class="kt">Numeric</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span> <span class="k">extends</span> <span class="nc">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">zero</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">0</span><span class="n">l</span><span class="o">,</span> <span class="nv">numeric</span><span class="o">.</span><span class="py">zero</span><span class="o">)</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">op</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">b</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
<span class="nf">case</span> <span class="o">(</span><span class="nc">Average</span><span class="o">(</span><span class="n">countA</span><span class="o">,</span> <span class="n">sumA</span><span class="o">),</span> <span class="nc">Average</span><span class="o">(</span><span class="n">countB</span><span class="o">,</span> <span class="n">sumB</span><span class="o">))</span> <span class="k">=></span>
<span class="nc">Average</span><span class="o">(</span><span class="n">countA</span> <span class="o">+</span> <span class="n">countB</span><span class="o">,</span> <span class="nv">numeric</span><span class="o">.</span><span class="py">plus</span><span class="o">(</span><span class="n">sumA</span><span class="o">,</span> <span class="n">sumB</span><span class="o">))</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">IllegalArgumentException</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">Average</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="n">value</span><span class="k">:</span> <span class="kt">T</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">numeric</span><span class="k">:</span> <span class="kt">Numeric</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Average</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">1</span><span class="n">l</span><span class="o">,</span> <span class="n">value</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
<p>So far, so good. We have reduced computing the mean to counting the elements and summing their values; the mean itself is the sum divided by the count. Both counting and addition form monoids, so a structure built from these monoids is a monoid as well!</p>
<p>Let’s now check that the operation of our class is associative:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="nv">Monoid</span><span class="o">.</span><span class="py">op</span><span class="o">(</span><span class="nv">Monoid</span><span class="o">.</span><span class="py">op</span><span class="o">(</span><span class="nc">Average</span><span class="o">(</span><span class="mi">2</span><span class="o">),</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">4</span><span class="o">)),</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">6</span><span class="o">))</span>
<span class="n">res0</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span><span class="mi">12</span><span class="o">)</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">Monoid</span><span class="o">.</span><span class="py">op</span><span class="o">(</span><span class="nc">Average</span><span class="o">(</span><span class="mi">2</span><span class="o">),</span> <span class="nv">Monoid</span><span class="o">.</span><span class="py">op</span><span class="o">(</span><span class="nc">Average</span><span class="o">(</span><span class="mi">4</span><span class="o">),</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">6</span><span class="o">)))</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Average</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span><span class="mi">12</span><span class="o">)</span>
</code></pre></div></div>
<p>As you can see, the result is what we expected. The way the operations are grouped does not matter, because the result is the same in every case.</p>
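<p>To get the actual mean back, the accumulated sum is divided by the count once all values have been combined. Below is a minimal standalone sketch of that last step. It assumes a simplified <code class="language-plaintext highlighter-rouge">Avg</code> class with hypothetical <code class="language-plaintext highlighter-rouge">combine</code> and <code class="language-plaintext highlighter-rouge">mean</code> methods, not the <code class="language-plaintext highlighter-rouge">Average</code> class defined above.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Simplified stand-in for the Average class above: it only carries a count and a sum.
final case class Avg(count: Long, sum: Double) {
  def combine(other: Avg): Avg = Avg(count + other.count, sum + other.sum)
  // The mean of an empty Avg is reported as 0.0 here (an arbitrary choice for this sketch).
  def mean: Double = if (count == 0) 0.0 else sum / count
}

val zero = Avg(0L, 0.0)
val values = List(2.0, 4.0, 6.0)
// Fold from the identity element; any grouping of combine gives the same total.
val total = values.map(v => Avg(1L, v)).foldLeft(zero)(_ combine _)
val result = total.mean
</code></pre></div></div>
<p>For the values 2, 4 and 6 the combined element holds a count of 3 and a sum of 12, which gives a mean of 4.</p>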
<h3 id="trudniejszy-przykład">A harder example</h3>
<p>Let’s raise the bar and implement a monoid for a Bloom filter. A Bloom filter<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> is a data structure devised by Burton H. Bloom in 1970 that provides a fixed-size, probabilistic representation of an arbitrary set of data.</p>
<p>The structure consists of an \({m}\)-bit array and \(k\) hash functions, each with values in the range \([0;{m})\). Adding an element means computing its \({k}\) hash values and setting the corresponding bits in the array.</p>
<p><img src="/assets/images/posts/monoid/filtr_blooma.png" alt="Bloom filter" title="Source: https://pl.wikipedia.org/wiki/Filtr_Blooma" class="center" /></p>
<p>Checking an element also comes down to computing the same hashes as when inserting it. We then look at the values of the indicated positions in the array. If at least one of them is zero, we can be sure that the element does not belong to the set. Things are different when all the indicated positions are ones: we then say that the element belongs to the set with some probability. The probability of a mistake (a <em>false positive</em>) depends on the size of the bit array, the number (and quality) of hash functions and the number of inserted elements. It is approximately:</p>
\[{error} \approx (1 - {e}^{-kn/m})^{k}\]
<p>where \({n}\) is the number of elements inserted into the set and \({e}\) here denotes Euler’s number. Jason Davies gives a clear demonstration of how the algorithm works on <a href="https://www.jasondavies.com/bloomfilter/">his website</a>.</p>
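<p>To get a feel for the formula, here is a small sketch that evaluates it. The helper name <code class="language-plaintext highlighter-rouge">falsePositiveRate</code> and the sample values of \(m\), \(k\) and \(n\) are assumptions chosen for illustration only.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Approximate false-positive probability of a Bloom filter with
// m bits, k hash functions and n inserted elements.
def falsePositiveRate(m: Int, k: Int, n: Int): Double =
  math.pow(1.0 - math.exp(-k.toDouble * n / m), k)

val emptyFilter = falsePositiveRate(100, 4, 0)
val fewElements = falsePositiveRate(100, 4, 10)
val manyElements = falsePositiveRate(100, 4, 50)
</code></pre></div></div>
<p>With these assumed inputs the estimate is 0 for an empty filter, roughly 1% after 10 insertions and more than 50% after 50 insertions, which shows how quickly a small table fills up.</p>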
<p>Before we implement our Bloom filter, let’s create a class of hash functions. For this I used the fairly common FNV<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> algorithm (named after the first letters of its authors’ surnames: Fowler-Noll-Vo), specifically the FNV-1a variant.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">HashFunction</span><span class="o">(</span><span class="k">private</span> <span class="k">val</span> <span class="nv">base</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span> <span class="o">{</span>
<span class="k">private</span> <span class="k">val</span> <span class="nv">OffsetBasis</span> <span class="k">=</span> <span class="nc">BigInt</span><span class="o">(</span><span class="s">"14695981039346656037"</span><span class="o">)</span>
<span class="k">private</span> <span class="k">val</span> <span class="nv">Prime</span> <span class="k">=</span> <span class="nc">BigInt</span><span class="o">(</span><span class="s">"1099511628211"</span><span class="o">)</span>
<span class="k">def</span> <span class="nf">apply</span><span class="o">(</span><span class="n">data</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Byte</span><span class="o">])</span><span class="k">:</span> <span class="kt">BigInt</span> <span class="o">=</span> <span class="nv">data</span><span class="o">.</span><span class="py">foldLeft</span><span class="o">(</span><span class="n">base</span> <span class="o">*</span> <span class="nc">OffsetBasis</span><span class="o">)</span> <span class="o">{</span>
<span class="nf">case</span> <span class="o">(</span><span class="n">hash</span><span class="o">,</span> <span class="n">byte</span><span class="o">)</span> <span class="k">=></span> <span class="o">(</span><span class="n">hash</span> <span class="o">^</span> <span class="o">(</span><span class="n">byte</span> <span class="o">&</span> <span class="mh">0xff</span><span class="o">))</span> <span class="o">*</span> <span class="nc">Prime</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The algorithm itself is fairly simple: it relies mainly on multiplication by a prime number and the exclusive-or (XOR) operation. It operates on individual bytes, though. To deal with that, let’s create a <em>trait</em> Hashable that implicitly converts values of type <em>Int</em> and <em>String</em> into byte arrays.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nn">java.nio.ByteBuffer</span>
<span class="k">trait</span> <span class="nc">Hashable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">toByteArray</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Byte</span><span class="o">]</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">Hashable</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">object</span> <span class="nc">IntHashable</span> <span class="k">extends</span> <span class="nc">Hashable</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">toByteArray</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Byte</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">buffer</span> <span class="k">=</span> <span class="nv">ByteBuffer</span><span class="o">.</span><span class="py">allocate</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span>
<span class="nv">buffer</span><span class="o">.</span><span class="py">putInt</span><span class="o">(</span><span class="n">elem</span><span class="o">)</span>
<span class="nv">buffer</span><span class="o">.</span><span class="py">array</span><span class="o">()</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">implicit</span> <span class="k">object</span> <span class="nc">StringHashable</span> <span class="k">extends</span> <span class="nc">Hashable</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">toByteArray</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Byte</span><span class="o">]</span> <span class="k">=</span> <span class="nv">elem</span><span class="o">.</span><span class="py">toCharArray</span><span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">toByte</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>With all the required pieces in place, we can move on to the definition of the Bloom filter. Besides the <em>Hashable</em> objects defined a moment ago, it takes the parameters \(m\) and \(k\) as arguments.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">BloomFilter</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="k">val</span> <span class="nv">m</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="k">val</span> <span class="nv">k</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span>
<span class="o">(</span><span class="k">implicit</span> <span class="k">val</span> <span class="nv">hashable</span><span class="k">:</span> <span class="kt">Hashable</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span> <span class="o">{</span>
<span class="k">private</span> <span class="k">val</span> <span class="nv">table</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">]</span> <span class="k">=</span> <span class="nv">Array</span><span class="o">.</span><span class="py">fill</span><span class="o">(</span><span class="n">m</span><span class="o">){</span><span class="kc">false</span><span class="o">}</span>
<span class="k">private</span> <span class="k">val</span> <span class="nv">hashFns</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">HashFunction</span><span class="o">]</span> <span class="k">=</span> <span class="o">(</span><span class="mi">0</span> <span class="n">until</span> <span class="n">k</span><span class="o">).</span><span class="py">map</span><span class="o">(</span><span class="k">new</span> <span class="nc">HashFunction</span><span class="o">(</span><span class="k">_</span><span class="o">))</span>
<span class="k">def</span> <span class="nf">add</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="nf">keys</span><span class="o">(</span><span class="n">elem</span><span class="o">).</span><span class="py">foreach</span><span class="o">(</span><span class="n">key</span> <span class="k">=></span> <span class="nv">table</span><span class="o">.</span><span class="py">update</span><span class="o">(</span><span class="n">key</span><span class="o">,</span> <span class="kc">true</span><span class="o">))</span>
<span class="k">def</span> <span class="nf">contains</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="o">!</span><span class="nf">keys</span><span class="o">(</span><span class="n">elem</span><span class="o">).</span><span class="py">exists</span><span class="o">(</span><span class="n">key</span> <span class="k">=></span> <span class="o">!</span><span class="nf">table</span><span class="o">(</span><span class="n">key</span><span class="o">))</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">keys</span><span class="o">(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="n">hashFns</span>
<span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nf">_</span><span class="o">(</span><span class="nv">hashable</span><span class="o">.</span><span class="py">toByteArray</span><span class="o">(</span><span class="n">elem</span><span class="o">)))</span>
<span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="n">hash</span> <span class="k">=></span> <span class="o">(</span><span class="n">hash</span> <span class="o">%</span> <span class="n">m</span><span class="o">).</span><span class="py">toInt</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Initializing the object creates an \(m\)-element Boolean array in which every element defaults to <em>false</em>. In addition, \(k\) hash functions are created. The <em>add</em> and <em>contains</em> methods insert an element into the table and check whether an element is already in it, respectively.</p>
<p>Let’s see what using our structure looks like:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="nv">bf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">BloomFilter</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="mi">100</span><span class="o">,</span> <span class="mi">4</span><span class="o">)</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">bf</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"word"</span><span class="o">)</span>
<span class="n">res0</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">false</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">bf</span><span class="o">.</span><span class="py">add</span><span class="o">(</span><span class="s">"word1"</span><span class="o">)</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">bf</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"word"</span><span class="o">)</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">false</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">bf</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"word1"</span><span class="o">)</span>
<span class="n">res2</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">true</span>
</code></pre></div></div>
<p>As a reminder: if the <em>contains</em> method returns <em>false</em>, we can be sure that the element does not belong to the set represented by this Bloom filter. If it returns <em>true</em>, however, we cannot be certain. We can say that the element belongs to the set, but this claim carries the error whose value we estimated above.</p>
<p>The last stage of our adventure is to create a monoid that serves as a <em>wrapper</em> for our filter. As we already know, every monoid comes with an operation and an identity element. We can define them as:</p>
<dl>
<dt>operation</dt>
<dd>
\[\forall {a,b} \in \mathsf{F}, \\
{a} = \{ {a}_{0}, \dots, {a}_{m-1}, {h_1}, \dots, {h_k} \}, \\
{b} = \{ {b}_{0}, \dots, {b}_{m-1}, {h_1}, \dots, {h_k} \}: \\
{a} \otimes {b} = \{ {a}_{0} \vee {b}_{0}, \dots, {a}_{m-1} \vee {b}_{m-1}, {h_1}, \dots, {h_k} \}\]
</dd>
<dt>identity element</dt>
<dd>
\[{e} = \{ {false}, \dots, {false}, {h_1}, \dots, {h_k} \}\]
</dd>
</dl>
<p>Let’s represent our filter as a <em>case class</em> consisting of the bit array and the hash functions. These two elements uniquely characterize our structure.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">case</span> <span class="k">class</span> <span class="nc">BloomFilter</span><span class="o">[</span><span class="kt">T</span> <span class="kt">:</span> <span class="kt">Hashable</span><span class="o">](</span><span class="n">table</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">],</span> <span class="n">hashFns</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">HashFunction</span><span class="o">])</span>
<span class="k">extends</span> <span class="nc">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">zero</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="nv">BloomFilter</span><span class="o">.</span><span class="py">zero</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="nv">table</span><span class="o">.</span><span class="py">length</span><span class="o">,</span> <span class="nv">hashFns</span><span class="o">.</span><span class="py">size</span><span class="o">)</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">op</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">b</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Monoid</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
<span class="nf">case</span> <span class="o">(</span><span class="nc">BloomFilter</span><span class="o">(</span><span class="n">tableA</span><span class="o">,</span> <span class="n">hashFnsA</span><span class="o">),</span> <span class="nc">BloomFilter</span><span class="o">(</span><span class="n">tableB</span><span class="o">,</span> <span class="n">hashFnsB</span><span class="o">))</span>
<span class="k">if</span> <span class="nv">tableA</span><span class="o">.</span><span class="py">length</span> <span class="o">==</span> <span class="nv">tableB</span><span class="o">.</span><span class="py">length</span> <span class="o">&&</span> <span class="nv">hashFnsA</span><span class="o">.</span><span class="py">size</span> <span class="o">==</span> <span class="nv">hashFnsB</span><span class="o">.</span><span class="py">size</span> <span class="k">=></span>
<span class="nc">BloomFilter</span><span class="o">(</span><span class="nf">mergeTables</span><span class="o">(</span><span class="n">tableA</span><span class="o">,</span> <span class="n">tableB</span><span class="o">),</span> <span class="n">hashFnsA</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">IllegalArgumentException</span>
<span class="o">}</span>
<span class="c1">// Element-wise logical OR of two m-bit tables</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">mergeTables</span><span class="o">(</span><span class="n">tA</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">],</span> <span class="n">tB</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">])</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">]</span> <span class="k">=</span>
<span class="nv">tA</span><span class="o">.</span><span class="py">zip</span><span class="o">(</span><span class="n">tB</span><span class="o">).</span><span class="py">map</span> <span class="o">{</span>
<span class="nf">case</span> <span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">))</span> <span class="k">=></span> <span class="n">a</span> <span class="o">||</span> <span class="n">b</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
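<p>All the magic happens in the <code class="language-plaintext highlighter-rouge">op</code> method, which computes the logical OR of the tables of the two filters passed to it and returns the result as a new Bloom filter instance. Filters whose tables differ in length, or which use a different number of hash functions, cannot be merged, so in that case the method throws an <code class="language-plaintext highlighter-rouge">IllegalArgumentException</code>.</p>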
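<p>What makes the element-wise OR a valid monoid operation is that it is associative and that an all-false table acts as its neutral element. The snippet below is a minimal sketch of that check on hand-written tables; <code class="language-plaintext highlighter-rouge">OrMonoidLaws</code>, <code class="language-plaintext highlighter-rouge">merge</code> and <code class="language-plaintext highlighter-rouge">zero</code> are illustrative names rather than part of the code above, and <code class="language-plaintext highlighter-rouge">Vector</code> is used instead of <code class="language-plaintext highlighter-rouge">Array</code> only because it compares by value.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch: all names here are hypothetical, not taken from the post.
object OrMonoidLaws {
  type Table = Vector[Boolean]

  // Element-wise logical OR, mirroring what mergeTables does for arrays.
  def merge(a: Table, b: Table): Table =
    a.zip(b).map { case (x, y) => x || y }

  // Neutral element: a table of m cleared bits.
  def zero(m: Int): Table = Vector.fill(m)(false)

  val a: Table = Vector(true, false, false, false)
  val b: Table = Vector(false, true, false, false)
  val c: Table = Vector(false, true, true, false)

  // Associativity: the grouping of the merges does not matter.
  val associative: Boolean = merge(merge(a, b), c) == merge(a, merge(b, c))

  // Identity: merging with the empty table on either side changes nothing.
  val leftIdentity: Boolean = merge(zero(4), a) == a
  val rightIdentity: Boolean = merge(a, zero(4)) == a
}
</code></pre></div></div>
<p>All three values evaluate to <code class="language-plaintext highlighter-rouge">true</code> for these tables, which is exactly the pair of laws a monoid has to satisfy.</p>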
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">object</span> <span class="nc">BloomFilter</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="o">[</span><span class="kt">T</span> <span class="kt">:</span> <span class="kt">Hashable</span><span class="o">](</span><span class="n">m</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">k</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)(</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">BloomFilter</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">table</span> <span class="k">=</span> <span class="nf">getTable</span><span class="o">(</span><span class="n">m</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">hashFns</span> <span class="k">=</span> <span class="nf">getHashFns</span><span class="o">(</span><span class="n">k</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">keys</span> <span class="k">=</span> <span class="nf">getKeys</span><span class="o">(</span><span class="n">elem</span><span class="o">,</span> <span class="n">hashFns</span><span class="o">,</span> <span class="n">m</span><span class="o">)</span>
<span class="nv">keys</span><span class="o">.</span><span class="py">foreach</span><span class="o">(</span><span class="nv">table</span><span class="o">.</span><span class="py">update</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="kc">true</span><span class="o">))</span>
<span class="nc">BloomFilter</span><span class="o">(</span><span class="n">table</span><span class="o">,</span> <span class="n">hashFns</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">def</span> <span class="nf">zero</span><span class="o">[</span><span class="kt">T</span> <span class="kt">:</span> <span class="kt">Hashable</span><span class="o">](</span><span class="n">m</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">k</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">BloomFilter</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">table</span> <span class="k">=</span> <span class="nf">getTable</span><span class="o">(</span><span class="n">m</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">hashFns</span> <span class="k">=</span> <span class="nf">getHashFns</span><span class="o">(</span><span class="n">k</span><span class="o">)</span>
<span class="nc">BloomFilter</span><span class="o">(</span><span class="n">table</span><span class="o">,</span> <span class="n">hashFns</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">def</span> <span class="nf">contains</span><span class="o">[</span><span class="kt">T</span> <span class="kt">:</span> <span class="kt">Hashable</span><span class="o">](</span><span class="n">filter</span><span class="k">:</span> <span class="kt">BloomFilter</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">keys</span> <span class="k">=</span> <span class="nf">getKeys</span><span class="o">(</span><span class="n">elem</span><span class="o">,</span> <span class="nv">filter</span><span class="o">.</span><span class="py">hashFns</span><span class="o">,</span> <span class="nv">filter</span><span class="o">.</span><span class="py">table</span><span class="o">.</span><span class="py">length</span><span class="o">)</span>
<span class="nv">keys</span><span class="o">.</span><span class="py">forall</span><span class="o">(</span><span class="n">key</span> <span class="k">=></span> <span class="nv">filter</span><span class="o">.</span><span class="py">table</span><span class="o">(</span><span class="n">key</span><span class="o">))</span>
<span class="o">}</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">getTable</span><span class="o">(</span><span class="n">m</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Boolean</span><span class="o">]</span> <span class="k">=</span> <span class="nv">Array</span><span class="o">.</span><span class="py">fill</span><span class="o">(</span><span class="n">m</span><span class="o">){</span><span class="kc">false</span><span class="o">}</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">getHashFns</span><span class="o">(</span><span class="n">k</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">HashFunction</span><span class="o">]</span> <span class="k">=</span> <span class="o">(</span><span class="mi">0</span> <span class="n">until</span> <span class="n">k</span><span class="o">).</span><span class="py">map</span><span class="o">(</span><span class="k">new</span> <span class="nc">HashFunction</span><span class="o">(</span><span class="k">_</span><span class="o">))</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">getKeys</span><span class="o">[</span><span class="kt">T</span> <span class="kt">:</span> <span class="kt">Hashable</span><span class="o">](</span><span class="n">elem</span><span class="k">:</span> <span class="kt">T</span><span class="o">,</span> <span class="n">hashFns</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">HashFunction</span><span class="o">],</span> <span class="n">m</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span>
<span class="n">hashFns</span>
<span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nf">_</span><span class="o">(</span><span class="n">implicitly</span><span class="o">[</span><span class="kt">Hashable</span><span class="o">[</span><span class="kt">T</span><span class="o">]].</span><span class="py">toByteArray</span><span class="o">(</span><span class="n">elem</span><span class="o">)))</span>
<span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="n">hash</span> <span class="k">=></span> <span class="o">(</span><span class="n">hash</span> <span class="o">%</span> <span class="n">m</span><span class="o">).</span><span class="py">toInt</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
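<p>The companion object is not very complicated either. The <code class="language-plaintext highlighter-rouge">apply</code> and <code class="language-plaintext highlighter-rouge">zero</code> methods create, respectively, a filter instance for a given element and an empty structure. In addition, the <code class="language-plaintext highlighter-rouge">contains</code> function checks whether the filter contains a given element: it answers <code class="language-plaintext highlighter-rouge">true</code> only if every bit selected by the hash functions is set, so, as with any Bloom filter, a positive answer may be a false positive while a negative one is always correct.</p>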
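<p>To make the behaviour of <code class="language-plaintext highlighter-rouge">contains</code> more tangible, here is a small self-contained sketch. It does not use the post’s <code class="language-plaintext highlighter-rouge">HashFunction</code> or <code class="language-plaintext highlighter-rouge">Hashable</code>; as an assumption it substitutes <code class="language-plaintext highlighter-rouge">MurmurHash3.stringHash</code> from the standard library with <code class="language-plaintext highlighter-rouge">k</code> different seeds, and all names in it (<code class="language-plaintext highlighter-rouge">ContainsSketch</code>, <code class="language-plaintext highlighter-rouge">keys</code>, <code class="language-plaintext highlighter-rouge">add</code>) are made up for illustration.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import scala.util.hashing.MurmurHash3

// Illustrative sketch: a stand-in for the post's filter, not its actual implementation.
object ContainsSketch {
  // Assumption: k seeded MurmurHash3 hashes replace the post's hash functions.
  // floorMod keeps the index non-negative even for negative hash values.
  def keys(elem: String, k: Int, m: Int): Seq[Int] =
    (0 until k).map(seed => java.lang.Math.floorMod(MurmurHash3.stringHash(elem, seed), m))

  // Set every bit selected by the k hashes.
  def add(table: Vector[Boolean], elem: String, k: Int): Vector[Boolean] =
    keys(elem, k, table.length).foldLeft(table)((t, i) => t.updated(i, true))

  // An element may be present only if all of its bits are set.
  def contains(table: Vector[Boolean], elem: String, k: Int): Boolean =
    keys(elem, k, table.length).forall(i => table(i))

  val empty: Vector[Boolean] = Vector.fill(100)(false)
  val withAla: Vector[Boolean] = add(empty, "Ala", 4)
  val saturated: Vector[Boolean] = Vector.fill(100)(true)

  // An added element is always found: there are no false negatives.
  val foundAla: Boolean = contains(withAla, "Ala", 4)
  // Nothing is found in an empty table.
  val emptyHasAla: Boolean = contains(empty, "Ala", 4)
  // With every bit set, even an element that was never added is reported as present.
  val saturatedHasPsa: Boolean = contains(saturated, "psa", 4)
}
</code></pre></div></div>
<p>The saturated table is an extreme case, but it shows why the ratio between the table size and the number of inserted elements matters: the fuller the table, the more likely a false positive becomes.</p>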
<p>Let’s see how we can use our code in practice. First, let’s create a factory that maps strings to Bloom filters.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">bfBuilder</span> <span class="k">=</span> <span class="nc">BloomFilter</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="mi">100</span><span class="o">,</span> <span class="mi">4</span><span class="o">)(</span><span class="k">_</span><span class="o">)</span>
</code></pre></div></div>
<p>Next, let’s define our data set and run it through the factory.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">values</span> <span class="k">=</span> <span class="nc">Seq</span><span class="o">(</span><span class="s">"Ala"</span><span class="o">,</span> <span class="s">"ma"</span><span class="o">,</span> <span class="s">"kota"</span><span class="o">,</span> <span class="s">"a"</span><span class="o">,</span> <span class="s">"Tomek"</span><span class="o">,</span> <span class="s">"nie ma"</span><span class="o">)</span>
<span class="c1">// values: Seq[String] = List(Ala, ma, kota, a, Tomek, nie ma)</span>
<span class="k">val</span> <span class="nv">filters</span> <span class="k">=</span> <span class="nv">values</span><span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="n">bfBuilder</span><span class="o">)</span>
<span class="c1">// filters: Seq[BloomFilter[String]] = ...</span>
</code></pre></div></div>
<p>Now comes the most beautiful part. Combining the results into a single whole comes down to reducing our sequence with <code class="language-plaintext highlighter-rouge">reduceLeft</code> using the <code class="language-plaintext highlighter-rouge">op</code> operation we defined (we could also use <code class="language-plaintext highlighter-rouge">foldLeft</code>, additionally passing the neutral element <code class="language-plaintext highlighter-rouge">zero</code>).</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">summary</span> <span class="k">=</span> <span class="nv">filters</span><span class="o">.</span><span class="py">reduceLeft</span><span class="o">(</span><span class="nv">Monoid</span><span class="o">.</span><span class="py">op</span><span class="o">).</span><span class="py">asInstanceOf</span><span class="o">[</span><span class="kt">BloomFilter</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span>
</code></pre></div></div>
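<p>The practical difference between the two variants shows up on an empty sequence: <code class="language-plaintext highlighter-rouge">reduceLeft</code> has nothing to start from and throws, whereas <code class="language-plaintext highlighter-rouge">foldLeft</code> simply returns the neutral element. The sketch below illustrates this on plain bit tables instead of the post’s <code class="language-plaintext highlighter-rouge">BloomFilter</code> type; <code class="language-plaintext highlighter-rouge">FoldVsReduce</code> and its members are hypothetical names.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import scala.util.Try

// Illustrative sketch: plain bit tables stand in for the post's BloomFilter.
object FoldVsReduce {
  type Table = Vector[Boolean]

  // Element-wise logical OR of two tables of equal length.
  def merge(a: Table, b: Table): Table =
    a.zip(b).map { case (x, y) => x || y }

  // Neutral element for 4-bit tables.
  val zero: Table = Vector.fill(4)(false)

  val tables: Seq[Table] = Seq(
    Vector(true, false, false, false),
    Vector(false, false, true, false)
  )

  // On a non-empty sequence both variants give the same table.
  val reduced: Table = tables.reduceLeft(merge)
  val folded: Table = tables.foldLeft(zero)(merge)

  // On an empty sequence foldLeft returns zero, while reduceLeft fails.
  val foldedEmpty: Table = Seq.empty[Table].foldLeft(zero)(merge)
  val reduceEmptyFails: Boolean = Try(Seq.empty[Table].reduceLeft(merge)).isFailure
}
</code></pre></div></div>
<p>If the input may legitimately be empty, for example when a node receives no data, the <code class="language-plaintext highlighter-rouge">foldLeft</code> variant is the safer choice.</p>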
<p>By expressing our computation as a monoid, we naturally gain the ability to parallelize it. Because the operation is associative, we can split our data set and distribute it across several nodes, run the computation on each of them, and then combine the partial results using the same operation that we performed on each node. The outcome is the same as if we had processed the whole data set in a single thread. We have thus made our application <strong>scalable</strong>!</p>
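<p>The claim can be checked on a small scale without any cluster. The following sketch, again on plain bit tables and with hypothetical names (<code class="language-plaintext highlighter-rouge">ChunkedMerge</code>, <code class="language-plaintext highlighter-rouge">partials</code>), simulates three nodes by splitting the data into chunks of two, reducing each chunk separately and merging the partial results.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch: chunks of a local sequence play the role of cluster nodes.
object ChunkedMerge {
  type Table = Vector[Boolean]

  // Element-wise logical OR of two tables of equal length.
  def merge(a: Table, b: Table): Table =
    a.zip(b).map { case (x, y) => x || y }

  val tables: Seq[Table] = Seq(
    Vector(true, false, false, false, false, false),
    Vector(false, true, false, false, false, false),
    Vector(false, false, true, false, false, false),
    Vector(false, true, false, true, false, false),
    Vector(false, false, false, false, true, false),
    Vector(true, false, false, false, false, false)
  )

  // Single-threaded baseline: one pass over the whole data set.
  val sequential: Table = tables.reduceLeft(merge)

  // Each "node" reduces its own chunk of two tables...
  val partials: Seq[Table] = tables.grouped(2).map(chunk => chunk.reduceLeft(merge)).toList

  // ...and the partial results are merged with the very same operation.
  val distributed: Table = partials.reduceLeft(merge)

  val sameResult: Boolean = sequential == distributed
}
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">sameResult</code> is <code class="language-plaintext highlighter-rouge">true</code>. Since logical OR is also commutative, the order in which the partial results arrive does not matter either, although a monoid in general does not guarantee that.</p>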
<p>Finally, let’s make sure that our structure works correctly.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="nv">BloomFilter</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="n">summary</span><span class="o">,</span> <span class="s">"Ala"</span><span class="o">)</span>
<span class="n">res0</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">true</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">BloomFilter</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="n">summary</span><span class="o">,</span> <span class="s">"Tomek"</span><span class="o">)</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">true</span>
<span class="n">scala</span><span class="o">></span> <span class="nv">BloomFilter</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="n">summary</span><span class="o">,</span> <span class="s">"psa"</span><span class="o">)</span>
<span class="n">res2</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">false</span>
</code></pre></div></div>
<h3 id="podsumowanie">Summary</h3>
<p>With this post I have tried to explain, in a way accessible to everyone, the concept of monoids and their practical use in building scalable applications. If you have made it this far and feel overwhelmed by the thought of how much code you would have to write to move your application into the world of algebraic structures, don’t be. The topic is nothing new, and programmers explored it thoroughly long ago. Ready-made implementations are available in libraries such as:</p>
<ul>
<li><a href="https://github.com/twitter/algebird">Algebird</a>: a library created by Twitter; besides algebraic structures, it also contains implementations of many approximation algorithms (including the Bloom filter)</li>
<li><a href="https://github.com/scalaz/scalaz">Scalaz</a>, <a href="https://github.com/typelevel/cats">Cats</a>: libraries that fill the gaps in Scala’s standard library with structures characteristic of functional programming (including monoids)</li>
</ul>
<h3 id="odwołania">References</h3>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://pl.wikipedia.org/wiki/Monoid">https://pl.wikipedia.org/wiki/Monoid</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://pl.wikipedia.org/wiki/P%C3%B3%C5%82grupa">https://pl.wikipedia.org/wiki/P%C3%B3%C5%82grupa</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://pl.wikipedia.org/wiki/Grupoid">https://pl.wikipedia.org/wiki/Grupoid</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://pl.wikipedia.org/wiki/Filtr_Blooma">https://pl.wikipedia.org/wiki/Filtr_Blooma</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="http://isthe.com/chongo/tech/comp/fnv/#FNV-1a">http://isthe.com/chongo/tech/comp/fnv/#FNV-1a</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Many people who start their adventure with functional programming sooner or later come across the concept of a monoid. For complete greenhorns in functional programming, such as myself, understanding the theory behind it and seeing the point of using it in practice takes some time.