Array deduplication is such a common need that utility libraries like lodash already ship mature, complete implementations that are safe to use in production. That doesn't stop us from exploring, as a thought exercise, how deduplication can be achieved with several different ideas. This article shares several approaches to deduplicating JavaScript arrays.
The first is the conventional approach: a double-loop comparison.
function doubleLoopUniq(arr) {
  let result = [];
  for (let i = 0, len = arr.length, isExist; i < len; i++) {
    // Flag indicating whether the current element already exists in result.
    isExist = false;
    for (let j = 0, rLen = result.length; j < rLen; j++) {
      if (result[j] === arr[i]) {
        // Compare each element of result against the current element of the original array.
        isExist = true;
        break;
      }
    }
    // If it does not exist yet, push it into result.
    !isExist && result.push(arr[i]);
  }
  return result;
}
Use JavaScript's built-in indexOf to remove duplicates.
function indexOfUniq(arr) {
  let result = [];
  for (let i = 0, len = arr.length; i < len; i++) {
    // indexOf replaces the inner loop of the double-loop version.
    if (result.indexOf(arr[i]) === -1) result.push(arr[i]);
  }
  return result;
}
Sort first, then compare adjacent elements to remove duplicates.
function sortUniq(arr) {
  let result = [], last;
  // Spread into a copy first so sort() has no side effect on the original array.
  [...arr].sort().forEach(item => {
    if (item != last) {
      result.push(item);
      last = item;
    }
  });
  return result;
}
Deduplication via a hash table.
function hashUniq(arr) {
  // Note: object keys are always strings, so this only works for arrays of numbers;
  // each value is stored as a key and converted back with Number() at the end.
  let hashTable = arr.reduce((result, curr) => {
    result[curr] = true;
    return result;
  }, {});
  return Object.keys(hashTable).map(item => Number(item));
}
ES6 Set: deduplication in one line of code.
function toSetUniq(arr) {
  return Array.from(new Set(arr));
}
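As a quick sanity check, the functions defined so far all return the same result on a small sample array (a minimal usage example):

// A minimal usage example: every approach above yields [1, 2, 3] here.
let sample = [1, 2, 2, 3, 3, 3];
console.log(doubleLoopUniq(sample)); // [1, 2, 3]
console.log(indexOfUniq(sample));    // [1, 2, 3]
console.log(sortUniq(sample));       // [1, 2, 3]
console.log(hashUniq(sample));       // [1, 2, 3]
console.log(toSetUniq(sample));      // [1, 2, 3]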
Deduplication with splice (operates directly on the array itself, so it has side effects).
function inPlaceUniq(arr) {
  let idx = 0;
  while (idx < arr.length) {
    let compare = idx + 1;
    while (compare < arr.length) {
      if (arr[idx] == arr[compare]) {
        // Remove the duplicate in place; do not advance compare,
        // since the next element has shifted into this position.
        arr.splice(compare, 1);
        continue;
      }
      ++compare;
    }
    ++idx;
  }
  return arr;
}
Finally, run a simple test under Node.js to see which approach is more efficient.
let data = [];
for (var i = 0; i < 100000; i++) {
  data.push(Math.random());
}

// A simple decorator for performance testing: wraps a function and logs how long one call takes.
function performanceTest(fn, descript) {
  return function () {
    let start = new Date().getTime();
    fn.apply(this, [].slice.call(arguments, 0));
    console.log(descript, new Date().getTime() - start);
  };
}

performanceTest(hashUniq, "hashTable")(data);
performanceTest(sortUniq, "sortUniq")(data);
performanceTest(toSetUniq, "toSetUniq")(data);
performanceTest(indexOfUniq, "indexOfUniq")(data);
performanceTest(doubleLoopUniq, "doubleLoopUniq")(data);
performanceTest(inPlaceUniq, "inPlaceUniq")(data);
The results are as follows
hashTable 168ms
sortUniq 332ms
toSetUniq 80ms
indexOfUniq 4280ms
doubleLoopUniq 13303ms
inPlaceUniq 9977ms
Extended thinking: if the elements of the array are objects, how do we remove duplicates?
Since objects are reference types, some form of deep equality (deepEqual) inevitably comes to mind. That idea does answer the question, but it is not efficient enough.
The test above also shows that deduplication via new Set and a hash table is the most efficient.
So, without doubt, we should build on these two methods; here I'll go with the hash table. To avoid the cost of a deep comparison, I try using JSON.stringify to convert the reference type into a primitive (a string).
function collectionUniq(collection) {
  let hashTable = {};
  collection.forEach(item => {
    // Serialize each object to a string and use it as a hash key.
    hashTable[JSON.stringify(item)] = true;
  });
  return Object.keys(hashTable).map(item => JSON.parse(item));
}
But here comes the problem: as we all know, object properties have no guaranteed order, so if the data looks like the following, this approach falls apart.
let collection = [
  { a: 1, b: 2, c: 3 },
  { b: 2, c: 3, a: 1 }
]
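A minimal sketch of one possible workaround, assuming the objects are plain JSON-like data: serialize each object with its keys sorted recursively, so that key order no longer affects the string. sortedStringify and sortedCollectionUniq are hypothetical names introduced here for illustration only.

// Sketch: a key-order-insensitive stringify (handles plain objects, arrays and primitives only).
function sortedStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(sortedStringify).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map(key => JSON.stringify(key) + ':' + sortedStringify(value[key]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

function sortedCollectionUniq(collection) {
  let hashTable = {};
  let result = [];
  collection.forEach(item => {
    let key = sortedStringify(item);
    if (!hashTable[key]) {
      hashTable[key] = true;
      // Keep the original object instead of re-parsing the string.
      result.push(item);
    }
  });
  return result;
}

console.log(sortedCollectionUniq(collection)); // [ { a: 1, b: 2, c: 3 } ]

Sorting the keys adds a little work per object, but it avoids a full deep comparison.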
Another idea is a toHash function. After a basic deduplication of the array, to ensure accuracy we serialize each element to a JSON string and traverse it, obtaining the Unicode code of each character with charCodeAt() and adding them up to get a single total. Finally, the totals are compared pairwise; equal values indicate duplicates, which achieves the deduplication.
function toHash(obj) {
  let power = 1;
  let res = 0;
  const string = JSON.stringify(obj, null, 2);
  for (let i = 0, l = string.length; i < l; i++) {
    switch (string[i]) {
      case '{':
        // Entering a nested object doubles the weight of its characters.
        power *= 2;
        break;
      case '}':
        // Leaving the object restores the previous weight.
        power /= 2;
        break;
      case ' ':
      case '\n':
      case '\r':
      case '\t':
        // Whitespace is ignored so formatting does not affect the hash.
        break;
      default:
        res += string[i].charCodeAt(0) * power;
    }
  }
  return res;
}
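As a rough sketch of how toHash could drive the deduplication itself (objectUniq is a hypothetical name; instead of comparing the totals pairwise as described above, it uses the hash as a lookup key, which is equivalent as long as no collisions occur):

// Sketch: deduplicate objects by their toHash value.
// Assumes no hash collisions; two different objects that collide would wrongly be treated as duplicates.
function objectUniq(collection) {
  let seen = {};
  let result = [];
  collection.forEach(item => {
    let hash = toHash(item);
    if (!seen[hash]) {
      seen[hash] = true;
      result.push(item);
    }
  });
  return result;
}

console.log(objectUniq(collection)); // with the collection above: [ { a: 1, b: 2, c: 3 } ]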
This is just a basic implementation idea, and there is plenty of room for improvement. To reduce the chance of hash collisions, the weights of certain special characters could be increased or decreased. The key point is to keep the collision probability lower than the odds of winning the jackpot.