2/04/2014

02-04-14 - Understanding ANS - 6

Okay, let's get to streaming.

For illustration let's go back to the simple example of packing arbitrary base numbers into an integer :


// encode : put val into state
void encode(int & state, int val, int mod)
{
    ASSERT( val >= 0 && val < mod );
    state = state*mod + val;
}

// decode : remove a value from state and return it
int decode(int & state, int mod )
{
    int val = state % mod;
    state /= mod;
    return val;
}

as you encode, state grows, and eventually gets too big to fit in an integer. So we need to flush out some bits (or bytes).

But we can't just stream out bits. The problem is that the decoder does a modulo to get the next value. If we stream in and out high bits, that's equivalent to doing something like +65536 on the value. When you do a mod-3 (or whatever) on that, you have changed what you decode.

If you only ever did mod-pow2's, you could stream bits out of the top at any time, because the decoding of the low bits is not affected by the high bits. This is how the Huffman special case of ANS works. With Huffman coding you can stream in and out any bits that are above the current symbol, because they don't affect the mask at the bottom.

In general we want to stream bits (base 2) or bytes (base 256). To do ANS in general we need to mod and multiply by arbitrary values that are not factors of 2 or 256.

To ensure that we get decodability, we have to stream such that the decoder sees the exact value that the encoder made. That is :


ENCODE                      DECODE

|                           ^
V                           |

stream out                  stream in

|                           ^
V                           |

C(s,x) coding function      D(x) decoding function

|                           ^
V                           |

x'                          x'

The key thing is that the value of x' that C(s,x) makes is exactly the same that goes into D(x).

This is different from Huffman, as noted above. It's also different than arithmetic coding, which can have an encoder and decoder that are out of sync. An arithmetic decoder only uses the top bits, so you can have more or less of the rest of the stream in the low bits. While the basic ANS step (x/P + C) is a kind of arithmetic coding step, the funny trick we did to take some bits of x and mod it back down to the low bits (see earlier posts) means that ANS is *not* making a continuous arithmetic stream for the whole message that you can jump into anywhere.

Now it's possible there are multiple streaming methods that work. For example with M = a power of 2 in rANS you might be able to stream high bytes. I'm not sure, and I'm not going to talk about that in general. I'm just going to talk about one method of streaming that does work, which Duda describes.

To ensure that our encode & decode streaming produce the same value of x', we need a range to keep it in. If you're streaming in base b, this range is of the form [L, b*L-1] . So, I'll use Duda's terminology and call "I" the range we want x' to be in for decoding, that is


I = [L, b*L-1]

Decoder streams into x :

x <- x*b + get_base_b();

until x is in I

but the encoder must do something a bit funny :

stream out from x

x' = C(s,x)  , coding function

x' now in I

that is, the stream out must be done before the coding function, and you must wind up in the streaming range after the coding function. x' in the range I ensures that the encoder and decoder see exactly the same value (because any more streaming ops would take it out of I).

To do this, we must know the "precursor" ranges for C(). That is :


Is = { x : C(s,x) is in I }

that is, the values of x such that after coding with x' = C(s,x), x' is in I

these precursor ranges depend on s. So the encoder streaming is :

I'm about to encode symbol s

stream out bits from x :

put_base_b( x % b )
x <- x/b

until x is in Is

so we get into the precursor range, and then after the coding step we are in I.

Now this is actually a constraint on the coding function C (because it determines what the Is are). You must be able to encode any symbol from any state. That means you must be able to reach the Is precursor range for any symbol from any x in the output range I. For that to be true, the Is must span a power of b, just like "I" does. That is,


all Is must be of the form

Is = [ K, b*K - 1 ]

for some K

eg. to be concrete, if b = 2, we're streaming out bits, then Is = { 3,4,5 } is okay, you will be able to get there from any larger x by streaming out bits, but Is = {4,5,6} is not okay.


I = [8, 15]

Is = {4,5,6}

x = 14

x is out of Is, so stream out a bit ; 14 -> 7

x is out of Is, so stream out a bit ; 7 -> 3

x is below Is!  crap!

this constraint will be our primary guide in building the table-based version of ANS.

No comments:

old rants