
JSON is quickly becoming the format of choice for quick, convenient, and reliable machine-to-machine communication. It is flexible and widely supported, but it has some drawbacks for data payloads like those from databases. A JSON payload from a database table named fruits might look like this:

[{"id":1,"abbreviation":"appl","name":"Apple"},
{"id":2,"abbreviation":"pear","name":"Pear"},
{"id":3,"abbreviation":"bana","name":"Banana"},
{"id":4,"abbreviation":"bkby","name":"Blackberry"},
{"id":5,"abbreviation":"strw","name":"Strawberry"},
{"id":6,"abbreviation":"pech","name":"Peach"},
{"id":7,"abbreviation":"plum","name":"Plum"}]

Most of the data in this payload is repeated because the object keys appear on every row. This is inefficient from a bandwidth perspective. I've come up with a way to compress the JSON by removing the keys from each row and putting them in the first row of the array, like this:

C[["id","abbreviation","name"],
[1,"appl","Apple"],
[2,"pear","Pear"],
[3,"bana","Banana"],
[4,"bkby","Blackberry"],
[5,"strw","Strawberry"],
[6,"pech","Peach"],
[7,"plum","Plum"]]

I've written the JavaScript JSON encoder/decoder below. It uses the built-in JSON methods, but is designed to recognize the 'C' at the beginning of the data and act appropriately. The methods are named the same as the built-in ones, so to adopt it you would just find-and-replace JSON with CJSON.

Note that the code currently only goes two levels deep when looking for arrays to compress.

I'm looking for answers that offer advice on making the code exception safe, and best practices. If you need more details or examples you can view the project's GitHub page.

var CJSON = (function(){
    var self = (function(){
        return {

            checkProperty: function(property){
                if(!Array.isArray(property)){
                    return false;
                }
                //if the array is empty or just one row then skip it
                if(property.length < 2){
                    return false;
                }
                return true;
            },

            compressArray: function(inArr){
                if(typeof(inArr[0]) === 'undefined'){
                    return inArr;
                }
                var keys = Object.getOwnPropertyNames(inArr[0]);
                var finalArr = [];
                finalArr[0] = keys;
                for(var i=0; i<inArr.length; i++){
                    var tempArr = [];
                    if(inArr[i] !== null){
                        for(var v=0; v<keys.length; v++){
                            if(typeof(inArr[i][keys[v]]) !== 'undefined'){
                                tempArr[v] = inArr[i][keys[v]];
                            }
                            else{
                                tempArr[v] = null;
                            }
                        }
                    }
                    else{
                        tempArr = null;
                    }
                    finalArr[i+1] = tempArr;
                }
                return finalArr;
            },

            uncompressArray: function(inArr){
                var original = new Array();
                var keys = inArr[0];
                for(var i=1; i<inArr.length; i++){
                    var row = new Object();
                    if(inArr[i] !== null){
                        for(var v=0; v<keys.length; v++){
                            var key = keys[v];
                            if(typeof(inArr[i][v]) !== 'undefined'){
                                row[key] = inArr[i][v];
                            }
                            else{
                                row[key] = null;
                            }
                        }
                    }
                    else{
                        row = null;
                    }
                    original[i-1] = row;
                }
                return original;
            }
        };
    })();

    return {
        stringify: function(inObj){
            return this.encodeJSON(inObj);
        },

        //note that this works on a deep copy, so the inObj passed in is not modified
        encodeJSON: function(inObj){
            var foundArr = false;
            var newObj = this.copyObject(inObj);
            if(self.checkProperty(newObj)){
                newObj = self.compressArray(newObj);
                foundArr = true;
            }
            else{
                var properties = Object.getOwnPropertyNames(newObj);
                for(var i=0; i<properties.length; i++){
                    if(!self.checkProperty(newObj[properties[i]])){
                        continue;
                    }
                    newObj[properties[i]] = self.compressArray(newObj[properties[i]]);
                    foundArr = true;
                }
            }
            if(foundArr){
                return "C"+JSON.stringify(newObj);
            }
            return JSON.stringify(newObj);
        },

        parse: function(jsonStr){
            return this.decodeJSON(jsonStr);
        },

        decodeJSON: function(jsonStr){
            if(!jsonStr){
                return null;
            }
            //if the first character is not a 'C' then this is regular JSON
            var firstChar = jsonStr.substring(0,1);
            if(firstChar != 'C'){
                var inObj = JSON.parse(jsonStr);
                //if the first property is a zero - array index
                if(typeof(inObj[0]) !== 'undefined'){
                    //set the length property of the object - in case it wasn't parsed to an array
                    inObj.length = Object.getOwnPropertyNames(inObj).length;
                }
                return inObj;
            }
            jsonStr = jsonStr.substring(1);
            var inObj = JSON.parse(jsonStr);
            if(self.checkProperty(inObj)){
                inObj = self.uncompressArray(inObj);
            }
            else{
                var properties = Object.getOwnPropertyNames(inObj);
                for(var i=0; i<properties.length; i++){
                    if(!self.checkProperty(inObj[properties[i]])){
                        continue;
                    }
                    inObj[properties[i]] = self.uncompressArray(inObj[properties[i]]);
                }
            }
            return inObj;
        },

        copyObject: function(obj){
            return JSON.parse(JSON.stringify(obj));
        }
    };
})();
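To see the round trip end to end without pulling in the whole module, here is a self-contained sketch of the same key-hoisting idea. The `compress`/`expand` helpers are my own condensed versions for illustration, not the CJSON code above:

```javascript
// Minimal sketch of the key-hoisting idea: one header row of keys,
// then value-only rows, with a 'C' sentinel on the wire.
function compress(rows) {
    var keys = Object.keys(rows[0]);
    return [keys].concat(rows.map(function (row) {
        return keys.map(function (k) { return row[k]; });
    }));
}

function expand(table) {
    var keys = table[0];
    return table.slice(1).map(function (vals) {
        var obj = {};
        keys.forEach(function (k, i) { obj[k] = vals[i]; });
        return obj;
    });
}

var fruits = [
    { id: 1, abbreviation: "appl", name: "Apple" },
    { id: 2, abbreviation: "pear", name: "Pear" }
];

// Sender: hoist the keys and prepend the sentinel.
var wire = "C" + JSON.stringify(compress(fruits));

// Receiver: strip the sentinel and expand the rows back out.
var restored = expand(JSON.parse(wire.substring(1)));
console.log(JSON.stringify(restored) === JSON.stringify(fruits)); // true
```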
6 Comments
  • Is the gzip/deflate transport encoding of HTTP not enough for some reason? – Commented May 21, 2016 at 7:44
  • I figured, why not do both? If you have a large amount of data, a lossless compression system like gzip isn't going to save as much as simply storing the data more efficiently. – Commented May 24, 2016 at 16:04
  • Deflate has a very good compression ratio for repetitive data. Can you do a comparison of encoding+gzip vs. gzip-only for 10 kB, 100 kB, 1000 kB (or whatever's a reasonable data set size for your use case)? It may not be worth the effort. – Commented May 24, 2016 at 16:29
  • I'll have to do some more due diligence for sure. I suppose the question then is what percentage of clients don't support gzip these days. Supporting both would give better coverage, but if gzip is in > 90% of devices and the compression differences are low, then you're right, it wouldn't be worth it. – Commented May 24, 2016 at 16:34
  • Practically all web browsers and platforms support gzip content encoding. – Commented May 24, 2016 at 16:41

1 Answer


First, you aren't really getting this for free. You are trading space (bandwidth) for processing, and adding complexity to both the sending and receiving ends. Additionally, your data isn't just "data" anymore: you now have metadata, that header you put in front. Instead of an array of things, you now have a table-like structure.

I suggest you stick with the "list of things". You can optimize elsewhere, for example by enabling gzip on your server. That way you still get compression, but it isn't your code that has to pay (or suffer) for it. Additionally, consider pagination instead of loading a bunch of stuff up front.

1 Comment

  • I get where you're coming from, but bandwidth is much more expensive than processing power these days, especially on mobile. Gzip is lossless, which means it can only compress so much; more efficient encoding combined with gzip can give even better results. Furthermore, unless I've missed something obvious in my code, the processing overhead here is negligible. – Commented May 24, 2016 at 16:10
