When subdividing intervals for arithmetic coding, the worst case is that the final interval ends up with size 2^(1-n), where n is the number of unique symbols being encoded. With ordinary machine datatypes this hits the limits of floating-point precision very quickly: for instance, I wrote a JavaScript implementation and it breaks after encoding only around 15 characters!
Since this is pretty limiting, how do codecs get around it?
- Encode [very] short blocks that stay within the machine's limits and concatenate them together? That will add overhead.
- Use non-native types with arbitrary precision? This will be slow. (A sketch of what I mean is below this list.)
- Something else?...
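For the second option, what I have in mind is something like tracking the interval as exact BigInt fractions instead of doubles. Again a made-up two-symbol model, just to show the idea:

```javascript
// Sketch of the "arbitrary precision" option: keep the interval as exact
// rational bounds lowNum/den and highNum/den using BigInt.
// Toy model: 'a' owns [0/4, 1/4), 'b' owns [1/4, 4/4).
const model = { a: { lo: 0n, hi: 1n }, b: { lo: 1n, hi: 4n } };
const TOTAL = 4n;

function encodeExact(message) {
  // Interval is [lowNum/den, highNum/den); it starts as [0, 1).
  let lowNum = 0n;
  let highNum = 1n;
  let den = 1n;
  for (const sym of message) {
    const width = highNum - lowNum;   // numerator of the current width
    const { lo, hi } = model[sym];
    // Subdivide exactly: the denominator is multiplied by TOTAL each step,
    // so nothing is ever rounded -- but the integers grow without bound,
    // which is exactly where the slowness comes from.
    highNum = lowNum * TOTAL + width * hi;
    lowNum = lowNum * TOTAL + width * lo;
    den *= TOTAL;
  }
  return { lowNum, highNum, den };
}

const { den } = encodeExact("abba".repeat(50));
// The bounds stay exact for any message length, but every arithmetic
// operation now works on integers whose size grows linearly with it.
console.log(den.toString(2).length, "bits in the denominator");
```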