
When subdividing intervals for arithmetic coding, the worst-case scenario is that the final interval will have size 2^(1-n), where n is the number of unique symbols you are encoding. With regular data types, this hits the limits of machine precision very quickly: for instance, I've made a JavaScript implementation and it breaks after encoding only around 15 characters!
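To make this concrete, here is a minimal sketch that repeatedly subdivides [0, 1) using plain doubles and counts how many halvings survive before the midpoint stops being representable:

```javascript
let low = 0.0;
let high = 1.0;
let steps = 0;

for (;;) {
  const mid = low + (high - low) / 2;
  if (mid <= low || mid >= high) break; // interval can no longer be split
  low = mid;                            // keep the upper half each time
  steps++;
}

console.log(`interval collapsed after ${steps} halvings`);
```

With IEEE-754 doubles this prints roughly 53, one per mantissa bit; a real coder burns several of those bits per symbol, so failing after around 15 characters is about what you'd expect.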

As this is pretty limiting, how do codecs get around it?

  • Encode [very] short blocks, within the machine's limits, and concatenate them together? That will add overhead.
  • Use non-native types with arbitrary precision? This will be slow.
  • Something else?...

2 Answers


According to Wikipedia:

Rather than try to simulate infinite precision, most arithmetic coders instead operate at a fixed limit of precision which they know the decoder will be able to match, and round the calculated fractions to their nearest equivalents at that precision. [...]

A process called renormalization keeps the finite precision from becoming a limit on the total number of symbols that can be encoded. Whenever the range is reduced to the point where all values in the range share certain beginning digits, those digits are sent to the output. For however many digits of precision the computer can handle, it is now handling fewer than that, so the existing digits are shifted left, and at the right, new digits are added to expand the range as widely as possible.
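Here is a rough sketch of that renormalization loop in JavaScript, using 16 bits of precision so plain numbers suffice. The interface is illustrative rather than anything the article specifies: `cumLow`/`cumHigh`/`total` stand for the symbol's cumulative-frequency slice in whatever model you use.

```javascript
const PRECISION = 16;
const TOP = 1 << PRECISION;  // 0x10000
const HALF = TOP >> 1;       // 0x8000
const QUARTER = TOP >> 2;    // 0x4000

let low = 0;
let high = TOP - 1;
let pending = 0;             // bits deferred by the "underflow" case
const output = [];

function emit(bit) {
  output.push(bit);
  while (pending > 0) {      // flush deferred opposite bits
    output.push(bit ^ 1);
    pending--;
  }
}

// Narrow [low, high] to the symbol's slice [cumLow, cumHigh) of `total`,
// then renormalize. Assumes `total` is small relative to 2^PRECISION.
function encodeSymbol(cumLow, cumHigh, total) {
  const range = high - low + 1;
  high = low + Math.floor((range * cumHigh) / total) - 1;
  low  = low + Math.floor((range * cumLow)  / total);

  for (;;) {
    if (high < HALF) {
      emit(0);                              // both bounds share leading 0
    } else if (low >= HALF) {
      emit(1);                              // both bounds share leading 1
      low -= HALF;
      high -= HALF;
    } else if (low >= QUARTER && high < HALF + QUARTER) {
      pending++;                            // bit not yet decided: defer it
      low -= QUARTER;
      high -= QUARTER;
    } else {
      break;
    }
    low = 2 * low;                          // shift left, widening the range
    high = 2 * high + 1;
  }
}
```

The third case handles intervals straddling the midpoint too tightly to settle a bit; `pending` defers those bits until the next decided one. After each renormalization the interval is always wider than a quarter of the full range, which is why encoding can continue indefinitely at fixed precision.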

  • I don't immediately understand the renormalisation process -- I'll have to read up on that -- but is this saying that we keep a buffer of compressed data, up to some precision, which (as per @Martin's answer) is streamed to the output whenever it reaches its precision limit? Commented Dec 21, 2012 at 12:54

While the "arithmetic" part of the name "arithmetic coding" is useful because it explains how the system works, we shouldn't assume that the encoded "number" is ever actually used or useful as a number. In fact, in the general case, it's just a stream of bits.

So I don't think the size of the numeric data type that can be handled natively is really an issue. You encode by pushing bits onto the end of a stream, which can be arbitrarily large, and you decode by reading that stream back.
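To illustrate, that stream can be as simple as an ever-growing byte array that bits get packed into; the `BitWriter` below is my own sketch, not anything a particular codec mandates:

```javascript
// Packs individual bits into bytes; the encoder only ever appends,
// so no arbitrarily large number is ever materialized.
class BitWriter {
  constructor() {
    this.bytes = [];
    this.current = 0;  // byte currently being filled
    this.count = 0;    // bits already in `current`
  }
  push(bit) {
    this.current = (this.current << 1) | (bit & 1);
    if (++this.count === 8) {
      this.bytes.push(this.current);
      this.current = 0;
      this.count = 0;
    }
  }
  finish() {           // pad the final partial byte with zeros
    while (this.count !== 0) this.push(0);
    return Uint8Array.from(this.bytes);
  }
}
```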
