What I'm wondering is whether there is a way to automatically find the best encoding for the bytes, that is, to automatically find all the sequences that can be cached or put into a dictionary. I don't see why that wouldn't be possible, but I imagine it isn't, otherwise it would have been done already. It seems like a problem that would be best solved in the area of DNA sequence analysis.
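To make the question concrete, here is a naive sketch of what I mean by finding candidate sequences. The function name `repeated_sequences`, the length bounds, and the scoring are all made up for illustration. It just counts every byte substring in a length range and keeps the ones that repeat, so it is a brute-force starting point under those assumptions, not an optimal encoder.

```python
from collections import Counter


def repeated_sequences(data: bytes, min_len: int = 2, max_len: int = 8) -> list[tuple[bytes, int]]:
    """Return (sequence, count) pairs for byte sequences that occur more than once.

    Hypothetical helper for illustration. Occurrences are counted with overlap,
    so the counts are an upper bound on how many times a sequence could
    actually be replaced by a dictionary reference.
    """
    counts: Counter[bytes] = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of length n over the input and count each substring.
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1

    repeated = [(seq, c) for seq, c in counts.items() if c > 1]
    # Rough score (an assumption, not a real cost model): each repeat after the
    # first saves about len(seq) - 1 bytes if a reference costs one byte.
    repeated.sort(key=lambda p: (len(p[0]) - 1) * (p[1] - 1), reverse=True)
    return repeated


if __name__ == "__main__":
    for seq, count in repeated_sequences(b"abcabcabc", min_len=3, max_len=3):
        print(seq, count)
```

This takes time and memory roughly proportional to the input length times the number of window sizes, and it only proposes candidates. Choosing which candidates to actually put in the dictionary, given that they overlap and compete with each other, is the hard part I'm asking about.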