Differentiable Dithering
Colab source code can be found here.
Problem
Say we want to cut down the number of colors in an image. For example, consider the image of fruit below:
If we run a count of how many colors are in the above image we get a whopping 157376 (for a 900 x 450 pixel image). Are all these colors really necessary?
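Counting colors amounts to collecting the distinct RGB triples in the image. A minimal sketch, assuming the image is already in memory as rows of `(r, g, b)` tuples (the tiny `image` below is made up for illustration and is not the fruit photo):

```python
# Hypothetical 2x3 image stored as rows of (r, g, b) tuples. A real image
# would be loaded with an imaging library instead.
image = [
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(0, 0, 255), (128, 0, 128), (255, 0, 0)],
]


def count_unique_colors(img):
    """Return the number of distinct RGB triples in the image."""
    return len({pixel for row in img for pixel in row})


num_colors = count_unique_colors(image)
print(num_colors)  # 3
```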
The image above has 16 colors and the one below only has 8.
The problem of color palette reduction has been studied extensively and the usual approach works roughly as follows:

Design a reduced color palette of size N by dividing the color space up into N distinct regions, where each region is represented by one color. This is usually done by one of several standard approaches.

Dither the image. The process of dithering eliminates color banding and creates the appearance of more colors through a stippling-like effect. If you are not familiar with dithering, we will explore it in more detail later. Given a fixed color palette, there are really good algorithms for dithering, such as Floyd-Steinberg.
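For reference, the dithering step of the usual approach can be sketched in a few lines. This is a plain Floyd-Steinberg error diffusion pass on a grayscale image, assuming the palette has already been chosen. The palette design step is skipped here, and the function names and the flat gray test image are illustrative only.

```python
def nearest(value, palette):
    """Return the palette entry closest to a gray value."""
    return min(palette, key=lambda c: abs(c - value))


def floyd_steinberg(image, palette):
    """Dither a grayscale image (rows of floats in [0, 1]) to a fixed palette."""
    height, width = len(image), len(image[0])
    work = [row[:] for row in image]  # copy, since errors are pushed into it
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = nearest(old, palette)
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: 7/16 right, 3/16 down-left,
            # 5/16 down, 1/16 down-right.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


gray = [[0.5] * 8 for _ in range(8)]  # flat 50% gray test image
dithered = floyd_steinberg(gray, [0.0, 1.0])
```

Every output pixel is one of the two palette values, and the quantization error of each pixel is pushed onto neighbors that have not been visited yet, which is what produces the stippled look.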
Instead of the usual approach, we're going to solve both of these problems at the same time using gradient descent.
First of all, let's define a palette of N colors. For this article the colors will be 3-dimensional vectors in RGB space. A quick warning to graphics nerds: for portability and simplicity we do not take gamma correction into account.
Now how do we assign a discrete set of colors to pixels in a differentiable way? I decided to do this using probability distributions. Every pixel is represented by a vector containing the probabilities of each palette color being chosen for that pixel. When actually generating an image we just sample each pixel's color from its distribution.
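One way to picture this representation is sketched below. The text does not say how the probabilities are parameterized, so the softmax over per-pixel logits is an assumption, as are the names `pixel_logits` and `sample_image` and the toy three-color palette.

```python
import math
import random


def softmax(logits):
    """Turn unconstrained real numbers into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy palette of N = 3 RGB colors (illustrative values).
palette = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]

# One row of logits per pixel (assumed parameterization).
pixel_logits = [
    [2.0, 0.0, 0.0],  # mostly the first palette color
    [0.0, 0.0, 0.0],  # uniform over the palette
]
pixel_probs = [softmax(row) for row in pixel_logits]


def sample_image(probs, colors, rng):
    """Draw one palette color per pixel from that pixel's distribution."""
    return [rng.choices(colors, weights=p, k=1)[0] for p in probs]


rng = random.Random(0)
sampled = sample_image(pixel_probs, palette, rng)
```

Sampling is only needed when producing the final image. The probabilities themselves are smooth functions of the logits, which is what makes the assignment differentiable.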
The above setup is pretty general. Importantly, both the colors in the palette and the mapping of image pixels to palette colors are variables which we can optimize over simultaneously. Now all we have to do is attach any one of various loss functions.
The loss function I chose to use at first was just the squared difference between the original image and the expected value of the output image:
\(loss(output, target) = \sum_{i \in pixels} (target_i - E[output_i])^2\) (equation 1)
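Equation 1 can be evaluated directly from the per-pixel probabilities, without sampling. A small sketch, assuming the squared difference between two RGB vectors means the squared Euclidean distance (the helper names and the two-pixel example are illustrative):

```python
def expected_color(probs, palette):
    """E[output_i]: the probability-weighted average of the palette colors."""
    return tuple(
        sum(p * color[ch] for p, color in zip(probs, palette))
        for ch in range(3)
    )


def loss_eq1(pixel_probs, palette, target):
    """Sum over pixels of the squared distance between target and expected color."""
    total = 0.0
    for probs, tgt in zip(pixel_probs, target):
        exp = expected_color(probs, palette)
        total += sum((t - e) ** 2 for t, e in zip(tgt, exp))
    return total


palette = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]  # black and white
target = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]   # one gray pixel, one white pixel

dithered_probs = [[0.5, 0.5], [0.0, 1.0]]     # gray pixel is a 50/50 mix
all_black_probs = [[1.0, 0.0], [1.0, 0.0]]

dithered_loss = loss_eq1(dithered_probs, palette, target)    # 0.0
all_black_loss = loss_eq1(all_black_probs, palette, target)  # 3.75
```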
So what's the idea here? Basically, the expected output allows our image to pretend it has more colors than it actually does.
For example, let's pretend our palette has only two colors, black and white. Furthermore, say our target image is a 50% gray square. Consider the following three possible representations of the image. One is an all-black image, one is all white, and the third has 50% black and 50% white pixels randomly distributed across the image. If we look from far away, the third image will look best. This is because the black and white pixels will blur together and appear gray. This effect is called dithering.
By the above reasoning, we want to make sure that the dithered pixel assignment (every pixel has a 50% chance of being black or white) appears more correct to our loss function than the other two candidates. Taking the squared error between the target image and the expected color of every pixel does exactly this.
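To make the joint optimization concrete, here is a deliberately tiny gradient descent loop on equation 1. It is a sketch under several assumptions and not the Colab implementation: the image is a five-pixel grayscale strip, probabilities come from a softmax over per-pixel logits, gradients are written out by hand instead of using an autodiff library, and the learning rate and step count are hand-picked.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def loss_and_grads(target, palette, logits):
    """Equation 1 for a grayscale image, plus gradients for palette and logits."""
    loss = 0.0
    grad_palette = [0.0] * len(palette)
    grad_logits = []
    for t, row in zip(target, logits):
        p = softmax(row)
        expected = sum(pk * ck for pk, ck in zip(p, palette))
        residual = expected - t
        loss += residual ** 2
        for k, pk in enumerate(p):
            grad_palette[k] += 2 * residual * pk
        # For a softmax, d(expected)/d(logit_k) = p_k * (c_k - expected).
        grad_logits.append(
            [2 * residual * pk * (ck - expected) for pk, ck in zip(p, palette)]
        )
    return loss, grad_palette, grad_logits


target = [0.0, 0.25, 0.5, 0.75, 1.0]   # toy gray ramp
palette = [0.2, 0.8]                   # two gray levels, both trainable
logits = [[0.0, 0.0] for _ in target]  # start from uniform probabilities
lr = 0.05                              # hand-picked step size

initial_loss, _, _ = loss_and_grads(target, palette, logits)
for _ in range(2000):
    _, g_pal, g_log = loss_and_grads(target, palette, logits)
    palette = [c - lr * g for c, g in zip(palette, g_pal)]
    logits = [
        [z - lr * g for z, g in zip(row, g_row)]
        for row, g_row in zip(logits, g_log)
    ]
final_loss, _, _ = loss_and_grads(target, palette, logits)
```

Both `palette` and `logits` are updated in the same loop, so the palette choice and the per-pixel mixing are tuned together instead of in two separate stages.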
You might be asking, why not take the expected value of the entire squared error? This would look like:
\(loss(output, target) = E\left[\sum_{i \in pixels} (target_i - output_i)^2\right]\)
This actually does not work. To see why, let's look at the same setup as above and consider the expectation of a single pixel (for math sticklers, we can do this because expectation is linear). The loss function for a pixel denoted by the random variable \(X\) that always chooses black is:
\(E[(0.5 - X)^2] = (0.5 - 0)^2 = 0.25\)
And the loss function for a pixel denoted by the random variable \(X\) which is 50% black and 50% white is:
\(E[(0.5 - X)^2] = (0.5 - 0)^2 \cdot 0.5 + (0.5 - 1)^2 \cdot 0.5 = 0.25\)
Unfortunately the values here are the same in both cases, which rules out this loss function.
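The same arithmetic can be checked numerically for all three candidates from the gray square example. In this sketch a pixel's distribution is written as a list of `(value, probability)` pairs, with black as 0 and white as 1 (the function names are illustrative):

```python
def loss_of_expectation(target, outcomes):
    """Equation 1 for one pixel: (target - E[X])^2."""
    mean = sum(p * v for v, p in outcomes)
    return (target - mean) ** 2


def expectation_of_loss(target, outcomes):
    """The rejected alternative for one pixel: E[(target - X)^2]."""
    return sum(p * (target - v) ** 2 for v, p in outcomes)


target = 0.5  # 50% gray
candidates = {
    "always black": [(0.0, 1.0)],
    "always white": [(1.0, 1.0)],
    "50/50 dither": [(0.0, 0.5), (1.0, 0.5)],
}

for name, outcomes in candidates.items():
    print(
        name,
        loss_of_expectation(target, outcomes),
        expectation_of_loss(target, outcomes),
    )
# always black 0.25 0.25
# always white 0.25 0.25
# 50/50 dither 0.0 0.25
```

Only the first column separates the dithered pixel from the two solid ones, which is why equation 1 takes the expectation inside the square.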
Let's try using the loss from equation 1 with a palette of two colors:
Hey! Not too bad! As we can see, different shades are captured by different densities of darker pixels. For a starker example, let's try this image of a vertical black and white gradient:
Now let's try 16 colors:
The above image highlights one weakness of our current loss function. It's extremely noisy, even when it doesn't need to be.
To give an extreme example, consider an image with three colors: red, blue and purple (a mixture of 50% red and 50% blue). Say we have room for three colors in our palette. In the eyes of equation 1, both of the following solutions would have the same loss:

The red pixels are red, the blue pixels are blue and the purple pixels are purple. We're using all three colors in our palette to the best of our ability and the image is reproduced perfectly.

Every red pixel is red, every blue pixel is blue, and every purple pixel has a \(\frac{1}{2}\) chance of being red and a \(\frac{1}{2}\) chance of being blue. Notice that here we only use two out of three possible colors and the final image is clearly lower quality.
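The tie between these two solutions is easy to verify on a three-pixel image. The sketch below also reports the summed per-pixel variance of each solution, since that is the quantity in which they differ. Treating the variance of an RGB pixel as the sum of its per-channel variances is an assumption, as are the helper names.

```python
RED, BLUE, PURPLE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.0, 0.5)
palette = [RED, BLUE, PURPLE]
target = [RED, BLUE, PURPLE]  # one pixel of each color


def expected_color(probs):
    return tuple(
        sum(p * c[ch] for p, c in zip(probs, palette)) for ch in range(3)
    )


def pixel_variance(probs):
    """Sum over channels of E[c^2] - E[c]^2 (assumed definition)."""
    mean = expected_color(probs)
    second = tuple(
        sum(p * c[ch] ** 2 for p, c in zip(probs, palette)) for ch in range(3)
    )
    return sum(s - m ** 2 for s, m in zip(second, mean))


def loss_eq1(pixel_probs):
    return sum(
        sum((t - e) ** 2 for t, e in zip(tgt, expected_color(probs)))
        for probs, tgt in zip(pixel_probs, target)
    )


# Solution A: every pixel picks its exact color with probability 1.
solution_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Solution B: the purple pixel is a 50/50 mix of red and blue.
solution_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]

loss_a, loss_b = loss_eq1(solution_a), loss_eq1(solution_b)  # 0.0 and 0.0
var_a = sum(pixel_variance(p) for p in solution_a)           # 0.0
var_b = sum(pixel_variance(p) for p in solution_b)           # 0.5
```

Equation 1 cannot tell the two apart, while the variance is zero only for the solution that uses the purple palette entry.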
To control for this weakness, I added an additional term to the loss function which penalizes the sum of the pixel variances. Right now I just hand tune the coefficient of the variance penalty. A good rule of thumb seems to be that larger palettes should weigh variance more heavily. Applying this penalty (variance coefficient = 0.25) gives us the 16 color image from the top of this post:
The tradeoff is that too little variance will eliminate noise which, due to the absence of dithering, makes the final image appear to have fewer colors and also creates banding effects. The image below has 16 colors and variance coefficient = 1.0. It demonstrates both of these problems:
Note that there are quite a lot of legitimate choices of loss function here and I am not claiming mine is the best at all. For example, this article on creating optimal dither patterns blurs both images and takes the difference between these. We could try to use this idea or look for something else to replace our simple squared error. It would also be interesting to try to use an actual image quality metric like SSIM to measure image quality rather than using variance as a proxy.
Another area for improvement is that our approach is sloooow (up to several minutes). It also does not scale well memory-wise to large palettes (when I try using more than 200 colors for the 900×450 pixel fruit image my Colab notebook runs out of RAM). This is because in that case there are more than 200×900×450 variables to optimize over. We could potentially tackle these problems in two ways. To deal with speed, we could try to break the image up into mini batches. To deal with memory usage, we could try to use a neural network to output probabilities at each pixel location rather than storing all of them explicitly.
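A quick back-of-the-envelope count shows why memory becomes a problem for the 900×450 fruit image with a 200-color palette. The 4 bytes per value (32-bit floats) is an assumption about how the variables are stored.

```python
width, height, palette_size = 900, 450, 200

# One probability per (pixel, palette color) pair.
num_probabilities = width * height * palette_size  # 81,000,000

bytes_per_float = 4  # assumption: 32-bit floats
param_megabytes = num_probabilities * bytes_per_float / 1e6  # 324.0

print(num_probabilities, param_megabytes)
```

That figure covers only the variables themselves. Gradients, optimizer state and intermediate tensors each add further copies of roughly the same size.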
Why do I think this approach is interesting?
Though this approach is not state of the art by any means in either speed or quality, I think it's interesting that we can optimize both the palette selection and dithering at the same time.
As far as I know, dithering and palette selection aren't really part of state of the art lossy compression today. But it would be neat if these same ideas could be applied to something like the color space transform, discrete cosine transform and weight quantization steps of JPEG compression.
A pipedream would be a fully differentiable image compression pipeline where all the steps can be fine-tuned together to optimize a particular image with respect to any differentiable loss function.
Have questions / comments / corrections?
Get in touch: pstefek.dev@gmail.com