0
\$\begingroup\$

I have a huge fileList array where many elements of array start from sub-** pattern. I want to group these files belonging to same sub-**(for e.g.: files which start with sub-01) into one subject specific array. Thus after creating all these sub specific array they can be pushed to a large array nifti_fileList. I have written following code in JavaScript with very crude logic:

        subFilelist = [];
    var previous_sub = null;
    var capture = null;
    var file_no = 0

    async.eachOfLimit(fileList, 200, function (file, key, cb) {
        testfile = file.relativePath;
        sub_capture_re = /sub-[a-zA-Z0-9]*/gi;
        var current_sub = testfile.match(sub_capture_re)
        if (current_sub) {
            if((file_no === 0)){
                subFilelist.push(testfile)
                file_no = file_no + 1;
                previous_sub = current_sub[0];
            }

            if ((current_sub[0] === previous_sub) && (file_no > 0)){
                previous_sub = current_sub[0];
                subFilelist.push(file.relativePath)
            } else {
                nifti_fileList.push(subFilelist)
                subFilelist = [];
                file_no =0
            }
        }
        process.nextTick(cb);
    });
    console.log(nifti_fileList);

Sample input:

fileList = [/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz,
/sub-10/anat/sub-10_T1w.json,
/sub-10/anat/sub-10_T1w.nii.gz,
/sub-10/fmap/sub-10_magnitude1.nii.gz,
/sub-10/fmap/sub-10_magnitude2.nii.gz,
/sub-10/fmap/sub-10_phasediff.json,
/sub-10/fmap/sub-10_phasediff.nii.gz,
/sub-11/anat/sub-11_T1w.nii.gz]

nifti_fileList = [[/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz],
    [/sub-10/anat/sub-10_T1w.json,
    /sub-10/anat/sub-10_T1w.nii.gz,
    /sub-10/fmap/sub-10_magnitude1.nii.gz,
    /sub-10/fmap/sub-10_magnitude2.nii.gz,
    /sub-10/fmap/sub-10_phasediff.json,
    /sub-10/fmap/sub-10_phasediff.nii.gz],
    [/sub-11/anat/sub-11_T1w.nii.gz]]

Can this be made more efficient?

\$\endgroup\$
0

1 Answer 1

1
\$\begingroup\$

Observations and Remarks

  • Your code is pretty complex and hard to read, as it involves a callback with a complex outside state encoded within the global revious_sub, capture and file_no variables. Each time your callback is executed, it modifies some of those state variables = there are many side-effects, which makes things confusing.
  • The variable capture is unused.
  • Not all variables are declared and thus become global.
  • Your sample input seems to be a list of strings, but within your callback you access the path via testfile = file.relativePath; indicating that you deal with objects.
  • It doesn't seem necessary to me to perform a basic array operation without IO delays asynchronously. So I recommend a simpler synchronous function.
  • Your resulting grouped file list is not very useful, as its groups are unnamed. If we want to know the 'sub-*' prefix of a group, we would have to manually inspect a random group element. An explicit mapping from prefix or name to file would allow us to query e.g. groups.get('sub-03') in constant time.
  • Your current implementation requires a sorted input file list. By using a Map, you can guarantee the same runtime complexity without having to rely on sorted input.

Suggested Implementation

Write a general purpose groupBy helper function or use a library which already includes one. Such a function would group array elements by their label or name. The user supplies a callback which returns the label or name for each array element:

function groupBy(array, cp) {
    return array.reduce((groups, next) => {
        const name = cp(next);
        if (name !== undefined) {
            const group = groups.get(name);
            if (group) group.push(next);
            else groups.set(name, [next]);
        }
        return groups;
    }, new Map());
}

For a given list of file paths, you could then simply write:

const files = [
    '/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz',
    '/sub-10/anat/sub-10_T1w.json',
    '/sub-10/anat/sub-10_T1w.nii.gz',
    '/sub-10/fmap/sub-10_magnitude1.nii.gz',
    '/sub-10/fmap/sub-10_magnitude2.nii.gz',
    '/sub-10/fmap/sub-10_phasediff.json',
    '/sub-10/fmap/sub-10_phasediff.nii.gz',
    '/sub-11/anat/sub-11_T1w.nii.gz'
];

const groups = groupBy(files, path => {
    const match = path.match(/^\/(sub-\w+)\//);
    if (match) return match[1];
});

If your file list is made up of objects instead of strings, you could write:

const groups = groupBy(files, file => {
    const match = file.relativePath.match(/^\/(sub-\w+)\//);
    if (match) return match[1];
});

The result is a Map { "sub-09" → […], "sub-10" → […], "sub-11" → […] } which we can query via e.g. groups.get('sub-09') or transform into your array of arrays via e.g. [...groups.values()].

\$\endgroup\$
6
  • \$\begingroup\$ How should i access groups elements then..? \$\endgroup\$ Commented Dec 8, 2017 at 20:08
  • \$\begingroup\$ @learnningprogramming You can access the first file in a group by e.g. groups.get('sub-10')[0]. To print all files, use e.g. console.log(...groups.get('sub-10')). To iterate through all files in a group, use e.g. for (const file of groups.get('sub-10')) { ... }. To iterate all groups, use e.g. for (const [name, files] of groups.entries()) { ... }. \$\endgroup\$
    – le_m
    Commented Dec 8, 2017 at 20:22
  • \$\begingroup\$ the solution you provided works well however, as per ES6 return array.reduce((groups, next) => { const name = cp(next); gives lint error as follows: Parsing error: Parenthesized pattern \$\endgroup\$ Commented Dec 8, 2017 at 23:41
  • \$\begingroup\$ @learnningprogramming debug, update or change your linter then? \$\endgroup\$
    – le_m
    Commented Dec 8, 2017 at 23:49
  • \$\begingroup\$ I am using npm linter>> this code is for another huge chunk of software already written. dont want to break it. \$\endgroup\$ Commented Dec 9, 2017 at 0:00

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.