optimal algorithm grouping data in javascript
The following (simplified) json type of data defines a Contact:
{
id: number;
name: string;
phone: string;
email: string
}
There is the following group of data:
+---+----------+-------------+---------------------------+
|id | name | phone |email |
+---+----------+-------------+---------------------------+
|1 | John | 11111111 |aaaa@test.com |
|2 | Marc | 22222222 |bbbb@test.com |
|3 | Ron | 99999999 |aaaa@test.com |
|4 | Andrew | 55555555 |dddd@test.com |
|5 | Wim | 99999999 |gggg@test.com |
|6 | Marc | 33333333 |cccc@test.com |
|7 | Dan | 44444444 |cccc@test.com |
+---+----------+-------------+---------------------------+
Goal is to find the groups that belong together using javascript(optionally in lodash, but main idea is to get the algorithm clear) according to the following constraint: a contact belongs to a group when any of the following criteria are the same: name, phone or email. The results shows the id's grouped as arrays in arrays. a contact in a group of 1 is ignored.
In the example above, it means that contacts with ids 1,3,5 belong together since 1,3 share the same email and 3 and 5 share the same phonenumber. Likewise 2,6,7: 2 and 6 have the same name and 6 and 7 have the same email. 5 does not have anything in common.
The expected result therefore is:[[1,3,5], [2,6,7]]
Background:
One solution that works is iterating over every item and check for the remainder of the list if name, email, or phone is the same. If so, group them and take them out of the list (in the example we compare 1 with all the items in the list and only 3 is found). problem is that next items also need to be checked again to these groups, because in this case 5 is not yet detected as part of the group. This makes the algorithm complex, while I suspect there is a simple way of solving this in linear time. There might also be a name for this class of problems?
`
javascript algorithm grouping lodash
add a comment |
The following (simplified) json type of data defines a Contact:
{
id: number;
name: string;
phone: string;
email: string
}
There is the following group of data:
+---+----------+-------------+---------------------------+
|id | name | phone |email |
+---+----------+-------------+---------------------------+
|1 | John | 11111111 |aaaa@test.com |
|2 | Marc | 22222222 |bbbb@test.com |
|3 | Ron | 99999999 |aaaa@test.com |
|4 | Andrew | 55555555 |dddd@test.com |
|5 | Wim | 99999999 |gggg@test.com |
|6 | Marc | 33333333 |cccc@test.com |
|7 | Dan | 44444444 |cccc@test.com |
+---+----------+-------------+---------------------------+
Goal is to find the groups that belong together using javascript(optionally in lodash, but main idea is to get the algorithm clear) according to the following constraint: a contact belongs to a group when any of the following criteria are the same: name, phone or email. The results shows the id's grouped as arrays in arrays. a contact in a group of 1 is ignored.
In the example above, it means that contacts with ids 1,3,5 belong together since 1,3 share the same email and 3 and 5 share the same phonenumber. Likewise 2,6,7: 2 and 6 have the same name and 6 and 7 have the same email. 5 does not have anything in common.
The expected result therefore is:[[1,3,5], [2,6,7]]
Background:
One solution that works is iterating over every item and check for the remainder of the list if name, email, or phone is the same. If so, group them and take them out of the list (in the example we compare 1 with all the items in the list and only 3 is found). problem is that next items also need to be checked again to these groups, because in this case 5 is not yet detected as part of the group. This makes the algorithm complex, while I suspect there is a simple way of solving this in linear time. There might also be a name for this class of problems?
`
javascript algorithm grouping lodash
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:99999999
? Would it be[[1, 2, 3, 5, 6, 7]]
?
– Josep
Nov 20 '18 at 9:27
1
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53
add a comment |
The following (simplified) json type of data defines a Contact:
{
id: number;
name: string;
phone: string;
email: string
}
There is the following group of data:
+---+----------+-------------+---------------------------+
|id | name | phone |email |
+---+----------+-------------+---------------------------+
|1 | John | 11111111 |aaaa@test.com |
|2 | Marc | 22222222 |bbbb@test.com |
|3 | Ron | 99999999 |aaaa@test.com |
|4 | Andrew | 55555555 |dddd@test.com |
|5 | Wim | 99999999 |gggg@test.com |
|6 | Marc | 33333333 |cccc@test.com |
|7 | Dan | 44444444 |cccc@test.com |
+---+----------+-------------+---------------------------+
Goal is to find the groups that belong together using javascript(optionally in lodash, but main idea is to get the algorithm clear) according to the following constraint: a contact belongs to a group when any of the following criteria are the same: name, phone or email. The results shows the id's grouped as arrays in arrays. a contact in a group of 1 is ignored.
In the example above, it means that contacts with ids 1,3,5 belong together since 1,3 share the same email and 3 and 5 share the same phonenumber. Likewise 2,6,7: 2 and 6 have the same name and 6 and 7 have the same email. 5 does not have anything in common.
The expected result therefore is:[[1,3,5], [2,6,7]]
Background:
One solution that works is iterating over every item and check for the remainder of the list if name, email, or phone is the same. If so, group them and take them out of the list (in the example we compare 1 with all the items in the list and only 3 is found). problem is that next items also need to be checked again to these groups, because in this case 5 is not yet detected as part of the group. This makes the algorithm complex, while I suspect there is a simple way of solving this in linear time. There might also be a name for this class of problems?
`
javascript algorithm grouping lodash
The following (simplified) json type of data defines a Contact:
{
id: number;
name: string;
phone: string;
email: string
}
There is the following group of data:
+---+----------+-------------+---------------------------+
|id | name | phone |email |
+---+----------+-------------+---------------------------+
|1 | John | 11111111 |aaaa@test.com |
|2 | Marc | 22222222 |bbbb@test.com |
|3 | Ron | 99999999 |aaaa@test.com |
|4 | Andrew | 55555555 |dddd@test.com |
|5 | Wim | 99999999 |gggg@test.com |
|6 | Marc | 33333333 |cccc@test.com |
|7 | Dan | 44444444 |cccc@test.com |
+---+----------+-------------+---------------------------+
Goal is to find the groups that belong together using javascript(optionally in lodash, but main idea is to get the algorithm clear) according to the following constraint: a contact belongs to a group when any of the following criteria are the same: name, phone or email. The results shows the id's grouped as arrays in arrays. a contact in a group of 1 is ignored.
In the example above, it means that contacts with ids 1,3,5 belong together since 1,3 share the same email and 3 and 5 share the same phonenumber. Likewise 2,6,7: 2 and 6 have the same name and 6 and 7 have the same email. 5 does not have anything in common.
The expected result therefore is:[[1,3,5], [2,6,7]]
Background:
One solution that works is iterating over every item and check for the remainder of the list if name, email, or phone is the same. If so, group them and take them out of the list (in the example we compare 1 with all the items in the list and only 3 is found). problem is that next items also need to be checked again to these groups, because in this case 5 is not yet detected as part of the group. This makes the algorithm complex, while I suspect there is a simple way of solving this in linear time. There might also be a name for this class of problems?
`
javascript algorithm grouping lodash
javascript algorithm grouping lodash
edited Nov 20 '18 at 12:01
asked Nov 20 '18 at 9:18
Han
535
535
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:99999999
? Would it be[[1, 2, 3, 5, 6, 7]]
?
– Josep
Nov 20 '18 at 9:27
1
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53
add a comment |
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:99999999
? Would it be[[1, 2, 3, 5, 6, 7]]
?
– Josep
Nov 20 '18 at 9:27
1
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:
99999999
? Would it be [[1, 2, 3, 5, 6, 7]]
?– Josep
Nov 20 '18 at 9:27
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:
99999999
? Would it be [[1, 2, 3, 5, 6, 7]]
?– Josep
Nov 20 '18 at 9:27
1
1
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53
add a comment |
4 Answers
4
active
oldest
votes
Idea:
- Start with 0 groups
- Iterate your list of contacts
- Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.
Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
UsingMap
instead of JS objects would allow you to avoidhasOwnProperty
. You already useSet
, which has a similar API and advantages.
– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
add a comment |
Any answer that iterates over each of n
entries, and then over a growing list of m
groups to match against is going to have worst-time performance of O(n*m)
(found when there are no two entries that match on any term).
Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q
options will further have to pay a penalty of O(q)
per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m)
.
I believe this answer is O(n)
, because assuming that the number of fields to match with is a constant (in this case, 3: name
, phone
and email
), all operations in the main loop, which is run once per entry, are O(1)
.
There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m)
, with m the number of groups; with an extra O(n)
when actually copying entry IDs to the merged groups: overall, we are still in O(n)
territory.
The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.
const data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
const groups = function(inputs) {
let valuesToGroups = new Map(
['name', 'phone', 'email'].map(key => [key, new Map()]));
let groups = new Map();
let pendingMerges = ;
for (const entry of inputs) {
let group = undefined;
let found = ;
for (const [key, valueMap] of valuesToGroups) {
// look up value in values-index for current key
group = valueMap.get(entry[key]);
if (group !== undefined) {
found.push(group);
// not breaking allows groups to be merged
}
}
if (found.length === 0) {
// not found: create new group
group = groups.size;
groups.set(group, [entry.id]);
} else {
// found: add entry to chosen group
group = found[0];
groups.get(group).push(entry.id);
if (found.length > 1) {
pendingMerges.push(found);
}
}
// add entry's values to index, pointing to group
for (const [key, valueMap] of valuesToGroups) {
valueMap.set(entry[key], group);
}
}
// do pending merges; initially, all groups are stand-alone
let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));
for (const merge of pendingMerges) {
// contents will go to the lowest-numbered group
const sorted = merge.map(groupId => merges.get(groupId)).sort();
sorted.forEach(groupId => merges.set(groupId, sorted[0]));
}
const cleanGroups = new Map();
groups.forEach((value, key) => {
const k = merges.get(key);
if ( ! cleanGroups.has(k)) {
cleanGroups.set(k, );
}
value.forEach(id => cleanGroups.get(k).push(id))
})
// return only non-empty groups
return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);
}(data);
console.log(""+JSON.stringify(groups))
// output is [[1,2,3,5,6,7]]
add a comment |
Here is another suggestion of a route you could take. The idea is to use one Array.reduce
to group by id
and keep all the values (vls
) and combined results (ids
) in that accumulator object
.
This way you can easily compare the name/phone/email
using Array.some
+ Array.includes
(which is what the getGroupId
function does).
Once you have grouped and have the almost final result just prettify
it by removing the groups with length
of one and picking only the ids
array of the rest:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
One thing to note is that we do not care about the number of properties since we do Object.values
. So you can easily add another address
or fax
to that list and it would still work with zero code changes
.
As per feedback here is another version which works slightly different:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
The reason for this version is due to the testData
provided by @Mark
which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.
To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.
The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.
I would expect these to all be in one group:{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.
– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches inO(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
add a comment |
One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names
, phones
and emails
.
Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails
so that next contacts might fall into the same on.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.[4]
should not be in the result.
– Akrion
Nov 21 '18 at 6:11
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53389734%2foptimal-algorithm-grouping-data-in-javascript%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Idea:
- Start with 0 groups
- Iterate your list of contacts
- Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.
Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
UsingMap
instead of JS objects would allow you to avoidhasOwnProperty
. You already useSet
, which has a similar API and advantages.
– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
add a comment |
Idea:
- Start with 0 groups
- Iterate your list of contacts
- Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.
Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
UsingMap
instead of JS objects would allow you to avoidhasOwnProperty
. You already useSet
, which has a similar API and advantages.
– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
add a comment |
Idea:
- Start with 0 groups
- Iterate your list of contacts
- Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.
Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
Idea:
- Start with 0 groups
- Iterate your list of contacts
- Check if there is a group with the contact name, or phone, or email. Merge all the members of those groups as the same group. Then add yourself to that group. If not, begin a new group with yourself and set the name, phone and email group as yourself.
Union find is an efficient structure to handle merging of disjoint sets. Code taken from here. As it uses path compression and union by rank, you can consider that the whole code is linear in the amount of contacts.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
// UNION-FIND structure, with path comression and union by rank
var UNIONFIND = (function () {
function _find(n)
{
if(n.parent == n) return n;
n.parent = _find(n.parent);
return n.parent;
}
return {
makeset:function(id){
var newnode = {
parent: null,
id: id,
rank: 0
};
newnode.parent = newnode;
return newnode;
},
find: _find,
combine: function(n1, n2) {
var n1 = _find(n1);
var n2 = _find(n2);
if (n1 == n2) return;
if(n1.rank < n2.rank)
{
n2.parent = n2;
return n2;
}
else if(n2.rank < n1.rank)
{
n2.parent = n1;
return n1;
}
else
{
n2.parent = n1;
n1.rank += 1;
return n1;
}
}
};
})();
var groupHash = {name: {}, phone: {}, email: {}}
var groupNodes =
data.forEach(function(contact){
var group = UNIONFIND.makeset(contact.id);
var groups = new Set();
["name", "phone", "email"].forEach(function(attr){
if (groupHash[attr].hasOwnProperty(contact[attr])) groups.add(groupHash[attr][contact[attr]])
});
groups = Array.from(groups);
groups.push(group);
groupNodes.push(group);
for(var i = 1; i < groups.length; i++) {
UNIONFIND.combine(groups[0], groups[i]);
}
["name", "phone", "email"].forEach(function(attr){
groupHash[attr][contact[attr]] = groups[0];
});
})
var contactsInGroup = {}
groupNodes.forEach(function(group){
var groupId = UNIONFIND.find(group).id;
if (contactsInGroup.hasOwnProperty(groupId) == false) {
contactsInGroup[groupId] = ;
}
contactsInGroup[groupId].push(group.id);
})
var result = Object.values(contactsInGroup).filter(function(list){
return list.length > 1
})
console.log(result)
edited Nov 20 '18 at 14:59
answered Nov 20 '18 at 14:30
juvian
13.3k22026
13.3k22026
UsingMap
instead of JS objects would allow you to avoidhasOwnProperty
. You already useSet
, which has a similar API and advantages.
– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
add a comment |
UsingMap
instead of JS objects would allow you to avoidhasOwnProperty
. You already useSet
, which has a similar API and advantages.
– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
Using
Map
instead of JS objects would allow you to avoid hasOwnProperty
. You already use Set
, which has a similar API and advantages.– tucuxi
Nov 21 '18 at 11:35
Using
Map
instead of JS objects would allow you to avoid hasOwnProperty
. You already use Set
, which has a similar API and advantages.– tucuxi
Nov 21 '18 at 11:35
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
Your code seems to do mostly the same as mine, but you merge your groups on-the-fly while I wait until all are built to start processing merges. Both should be linear in the number of entries.
– tucuxi
Nov 21 '18 at 11:39
1
1
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
@tucuxi I usually only use map for object references, but it could be used yeah. Nice solution that doesn´t need an extra structure like union find. Tested with data replicated until 450k entries and yours takes 850ms while mine 1050ms, definitely both linear ^^
– juvian
Nov 21 '18 at 14:19
add a comment |
Any answer that iterates over each of n
entries, and then over a growing list of m
groups to match against is going to have worst-time performance of O(n*m)
(found when there are no two entries that match on any term).
Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q
options will further have to pay a penalty of O(q)
per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m)
.
I believe this answer is O(n)
, because assuming that the number of fields to match with is a constant (in this case, 3: name
, phone
and email
), all operations in the main loop, which is run once per entry, are O(1)
.
There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m)
, with m the number of groups; with an extra O(n)
when actually copying entry IDs to the merged groups: overall, we are still in O(n)
territory.
The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.
const data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
const groups = function(inputs) {
let valuesToGroups = new Map(
['name', 'phone', 'email'].map(key => [key, new Map()]));
let groups = new Map();
let pendingMerges = ;
for (const entry of inputs) {
let group = undefined;
let found = ;
for (const [key, valueMap] of valuesToGroups) {
// look up value in values-index for current key
group = valueMap.get(entry[key]);
if (group !== undefined) {
found.push(group);
// not breaking allows groups to be merged
}
}
if (found.length === 0) {
// not found: create new group
group = groups.size;
groups.set(group, [entry.id]);
} else {
// found: add entry to chosen group
group = found[0];
groups.get(group).push(entry.id);
if (found.length > 1) {
pendingMerges.push(found);
}
}
// add entry's values to index, pointing to group
for (const [key, valueMap] of valuesToGroups) {
valueMap.set(entry[key], group);
}
}
// do pending merges; initially, all groups are stand-alone
let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));
for (const merge of pendingMerges) {
// contents will go to the lowest-numbered group
const sorted = merge.map(groupId => merges.get(groupId)).sort();
sorted.forEach(groupId => merges.set(groupId, sorted[0]));
}
const cleanGroups = new Map();
groups.forEach((value, key) => {
const k = merges.get(key);
if ( ! cleanGroups.has(k)) {
cleanGroups.set(k, );
}
value.forEach(id => cleanGroups.get(k).push(id))
})
// return only non-empty groups
return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);
}(data);
console.log(""+JSON.stringify(groups))
// output is [[1,2,3,5,6,7]]
add a comment |
Any answer that iterates over each of n
entries, and then over a growing list of m
groups to match against is going to have worst-time performance of O(n*m)
(found when there are no two entries that match on any term).
Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q
options will further have to pay a penalty of O(q)
per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m)
.
I believe this answer is O(n)
, because assuming that the number of fields to match with is a constant (in this case, 3: name
, phone
and email
), all operations in the main loop, which is run once per entry, are O(1)
.
There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m)
, with m the number of groups; with an extra O(n)
when actually copying entry IDs to the merged groups: overall, we are still in O(n)
territory.
The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.
const data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
const groups = function(inputs) {
let valuesToGroups = new Map(
['name', 'phone', 'email'].map(key => [key, new Map()]));
let groups = new Map();
let pendingMerges = ;
for (const entry of inputs) {
let group = undefined;
let found = ;
for (const [key, valueMap] of valuesToGroups) {
// look up value in values-index for current key
group = valueMap.get(entry[key]);
if (group !== undefined) {
found.push(group);
// not breaking allows groups to be merged
}
}
if (found.length === 0) {
// not found: create new group
group = groups.size;
groups.set(group, [entry.id]);
} else {
// found: add entry to chosen group
group = found[0];
groups.get(group).push(entry.id);
if (found.length > 1) {
pendingMerges.push(found);
}
}
// add entry's values to index, pointing to group
for (const [key, valueMap] of valuesToGroups) {
valueMap.set(entry[key], group);
}
}
// do pending merges; initially, all groups are stand-alone
let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));
for (const merge of pendingMerges) {
// contents will go to the lowest-numbered group
const sorted = merge.map(groupId => merges.get(groupId)).sort();
sorted.forEach(groupId => merges.set(groupId, sorted[0]));
}
const cleanGroups = new Map();
groups.forEach((value, key) => {
const k = merges.get(key);
if ( ! cleanGroups.has(k)) {
cleanGroups.set(k, );
}
value.forEach(id => cleanGroups.get(k).push(id))
})
// return only non-empty groups
return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);
}(data);
console.log(""+JSON.stringify(groups))
// output is [[1,2,3,5,6,7]]
add a comment |
Any answer that iterates over each of n
entries, and then over a growing list of m
groups to match against is going to have worst-time performance of O(n*m)
(found when there are no two entries that match on any term).
Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q
options will further have to pay a penalty of O(q)
per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m)
.
I believe this answer is O(n)
, because assuming that the number of fields to match with is a constant (in this case, 3: name
, phone
and email
), all operations in the main loop, which is run once per entry, are O(1)
.
There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m)
, with m the number of groups; with an extra O(n)
when actually copying entry IDs to the merged groups: overall, we are still in O(n)
territory.
The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.
const data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
const groups = function(inputs) {
let valuesToGroups = new Map(
['name', 'phone', 'email'].map(key => [key, new Map()]));
let groups = new Map();
let pendingMerges = ;
for (const entry of inputs) {
let group = undefined;
let found = ;
for (const [key, valueMap] of valuesToGroups) {
// look up value in values-index for current key
group = valueMap.get(entry[key]);
if (group !== undefined) {
found.push(group);
// not breaking allows groups to be merged
}
}
if (found.length === 0) {
// not found: create new group
group = groups.size;
groups.set(group, [entry.id]);
} else {
// found: add entry to chosen group
group = found[0];
groups.get(group).push(entry.id);
if (found.length > 1) {
pendingMerges.push(found);
}
}
// add entry's values to index, pointing to group
for (const [key, valueMap] of valuesToGroups) {
valueMap.set(entry[key], group);
}
}
// do pending merges; initially, all groups are stand-alone
let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));
for (const merge of pendingMerges) {
// contents will go to the lowest-numbered group
const sorted = merge.map(groupId => merges.get(groupId)).sort();
sorted.forEach(groupId => merges.set(groupId, sorted[0]));
}
const cleanGroups = new Map();
groups.forEach((value, key) => {
const k = merges.get(key);
if ( ! cleanGroups.has(k)) {
cleanGroups.set(k, );
}
value.forEach(id => cleanGroups.get(k).push(id))
})
// return only non-empty groups
return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);
}(data);
console.log(""+JSON.stringify(groups))
// output is [[1,2,3,5,6,7]]
Any answer that iterates over each of n
entries, and then over a growing list of m
groups to match against is going to have worst-time performance of O(n*m)
(found when there are no two entries that match on any term).
Any answer that iterates over each entry, and then over groups, and uses arrays to test for matching values among q
options will further have to pay a penalty of O(q)
per match. In the worst case, with say all e-mails the same and all phones different, this will mean O(n*m)
.
I believe this answer is O(n)
, because assuming that the number of fields to match with is a constant (in this case, 3: name
, phone
and email
), all operations in the main loop, which is run once per entry, are O(1)
.
There is an extra complication to fix the fact that, late in the process, we may find a bridge between two (or even 3) groups, as entries can match on different fields with entries from different groups. This may happen several times. To avoid having to rebuild groups during the main loop, we leave merging to the very end, where we first build a map of what-group-ends-up-where, and then finally move all entry IDs to their final group. This can all be done in O(m)
, with m the number of groups; with an extra O(n)
when actually copying entry IDs to the merged groups: overall, we are still in O(n)
territory.
The last line builds arrays of ids from the merged groups, and filters out any that does not have over 1 element.
const data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'99999999',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
const groups = function(inputs) {
let valuesToGroups = new Map(
['name', 'phone', 'email'].map(key => [key, new Map()]));
let groups = new Map();
let pendingMerges = ;
for (const entry of inputs) {
let group = undefined;
let found = ;
for (const [key, valueMap] of valuesToGroups) {
// look up value in values-index for current key
group = valueMap.get(entry[key]);
if (group !== undefined) {
found.push(group);
// not breaking allows groups to be merged
}
}
if (found.length === 0) {
// not found: create new group
group = groups.size;
groups.set(group, [entry.id]);
} else {
// found: add entry to chosen group
group = found[0];
groups.get(group).push(entry.id);
if (found.length > 1) {
pendingMerges.push(found);
}
}
// add entry's values to index, pointing to group
for (const [key, valueMap] of valuesToGroups) {
valueMap.set(entry[key], group);
}
}
// do pending merges; initially, all groups are stand-alone
let merges = new Map(Array.from(groups.keys()).map(k => [k, k]));
for (const merge of pendingMerges) {
// contents will go to the lowest-numbered group
const sorted = merge.map(groupId => merges.get(groupId)).sort();
sorted.forEach(groupId => merges.set(groupId, sorted[0]));
}
const cleanGroups = new Map();
groups.forEach((value, key) => {
const k = merges.get(key);
if ( ! cleanGroups.has(k)) {
cleanGroups.set(k, );
}
value.forEach(id => cleanGroups.get(k).push(id))
})
// return only non-empty groups
return [... cleanGroups].filter(g => g[1].length>1).map(g => [... g[1]]);
}(data);
console.log(""+JSON.stringify(groups))
// output is [[1,2,3,5,6,7]]
answered Nov 21 '18 at 11:25
tucuxi
11.6k22959
11.6k22959
add a comment |
add a comment |
Here is another suggestion of a route you could take. The idea is to use one Array.reduce
to group by id
and keep all the values (vls
) and combined results (ids
) in that accumulator object
.
This way you can easily compare the name/phone/email
using Array.some
+ Array.includes
(which is what the getGroupId
function does).
Once you have grouped and have the almost final result just prettify
it by removing the groups with length
of one and picking only the ids
array of the rest:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
One thing to note is that we do not care about the number of properties since we do Object.values
. So you can easily add another address
or fax
to that list and it would still work with zero code changes
.
As per feedback here is another version which works slightly different:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
The reason for this version is due to the testData
provided by @Mark
which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.
To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.
The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.
I would expect these to all be in one group:{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.
– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches inO(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
add a comment |
Here is another suggestion of a route you could take. The idea is to use one Array.reduce
to group by id
and keep all the values (vls
) and combined results (ids
) in that accumulator object
.
This way you can easily compare the name/phone/email
using Array.some
+ Array.includes
(which is what the getGroupId
function does).
Once you have grouped and have the almost final result just prettify
it by removing the groups with length
of one and picking only the ids
array of the rest:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
One thing to note is that we do not care about the number of properties since we do Object.values
. So you can easily add another address
or fax
to that list and it would still work with zero code changes
.
As per feedback here is another version which works slightly different:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
The reason for this version is due to the testData
provided by @Mark
which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.
To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.
The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.
I would expect these to all be in one group:{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.
– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches inO(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
add a comment |
Here is another suggestion of a route you could take. The idea is to use one Array.reduce
to group by id
and keep all the values (vls
) and combined results (ids
) in that accumulator object
.
This way you can easily compare the name/phone/email
using Array.some
+ Array.includes
(which is what the getGroupId
function does).
Once you have grouped and have the almost final result just prettify
it by removing the groups with length
of one and picking only the ids
array of the rest:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
One thing to note is that we do not care about the number of properties since we do Object.values
. So you can easily add another address
or fax
to that list and it would still work with zero code changes
.
As per feedback here is another version which works slightly different:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
The reason for this version is due to the testData
provided by @Mark
which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.
To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.
The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.
Here is another suggestion of a route you could take. The idea is to use one Array.reduce
to group by id
and keep all the values (vls
) and combined results (ids
) in that accumulator object
.
This way you can easily compare the name/phone/email
using Array.some
+ Array.includes
(which is what the getGroupId
function does).
Once you have grouped and have the almost final result just prettify
it by removing the groups with length
of one and picking only the ids
array of the rest:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
One thing to note is that we do not care about the number of properties since we do Object.values
. So you can easily add another address
or fax
to that list and it would still work with zero code changes
.
As per feedback here is another version which works slightly different:
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
The reason for this version is due to the testData
provided by @Mark
which has the 2nd element not matching the first but matching the 3rd which actually matches the 1st ... so they all should be hits.
To get to that once we find a match we look for matches of that same initial match and push in the same group so we can have the maximum amount of data to match on.
The result is that once we get the first group with the first element we then find and push the 3rd as well and from there it is much easier to match the 2nd. The logic is slightly more complex and I would imagine less performant.
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r, c) => {
let values = Object.values(c), groupID = getGroupId(r, values)[0]
if(!groupID)
r[c.id] = ({ vls: values, ids: [...r[c.id] || , c.id] })
else {
r[groupID] = ({
vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id]
})
}
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r,c) => {
if(c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data)))
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
var data = [ {id:1,name:'John',phone:'11111111',email:'aaaa@test.com'}, {id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'}, {id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'}, {id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'}, {id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'}, {id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'}, {id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'} ];
var testData = [{ id: 1, name: 'John', phone: '1', email: 'a' }, { id: 2, name: 'Marc', phone: '2', email: 'b' }, { id: 3, name: 'Ron', phone: '1', email: 'b' }];
const getGroupId = (obj, vals) => Object.entries(obj)
.find(([k,v]) => v.vls.some(x => vals.includes(x))) ||
const group = d => d.reduce((r,c,i,a) => {
let values = Object.values(c), groupID = !i ? i : getGroupId(r, values)[0]
if (!groupID) {
let hits = a.filter(x =>
x.id != c.id && values.some(v => Object.values(x).includes(v)))
hits.forEach(h =>
r[c.id] = ({ vls: [...values, ...Object.values(h)], ids: [c.id, h.id] }))
}
else
r[groupID] = r[groupID].ids.includes(c.id) ? r[groupID] :
({ vls: [...r[groupID].vls, ...values], ids: [...r[groupID].ids, c.id] })
return r
}, {})
const prettify = grp => Object.values(grp).reduce((r, c) => {
if (c.ids.length > 1)
r.push(c.ids)
return r
}, )
console.log(prettify(group(data))) // OP data
console.log(prettify(group(testData))) // Test data
edited Nov 21 '18 at 7:50
answered Nov 21 '18 at 4:22
Akrion
9,39211224
9,39211224
I would expect these to all be in one group:{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.
– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches inO(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
add a comment |
I would expect these to all be in one group:{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.
– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches inO(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.
– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
I would expect these to all be in one group:
{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.– Mark Meyer
Nov 21 '18 at 5:47
I would expect these to all be in one group:
{id:1,name:'John',phone:'1',email:'a'}, {id:2,name:'Marc',phone:'2',email:'b'}{id:3,name:'Ron', phone:'1',email:'b'}
, but it seems to leave 2 out.– Mark Meyer
Nov 21 '18 at 5:47
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
I did not get it the first time due to the funky formatting in the comments section but you ware correct. I updated with another version which matches as expected. Thanks.
– Akrion
Nov 21 '18 at 7:50
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in
O(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.– tucuxi
Nov 21 '18 at 11:29
The matching code seems to iterate over all values. This is wasteful, because a map could, on the right field, report on matches in
O(1)
. Your code is brief (I have not looked into its correctness), but cannot be optimal.– tucuxi
Nov 21 '18 at 11:29
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
In order to get the correctness of the groupings you have to scan for matches once you got the initial hit. I am not saying it is perfect but it is pretty generic and could be optimized more for sure.
– Akrion
Nov 21 '18 at 19:31
add a comment |
One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names
, phones
and emails
.
Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails
so that next contacts might fall into the same on.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.[4]
should not be in the result.
– Akrion
Nov 21 '18 at 6:11
add a comment |
One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names
, phones
and emails
.
Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails
so that next contacts might fall into the same on.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.[4]
should not be in the result.
– Akrion
Nov 21 '18 at 6:11
add a comment |
One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names
, phones
and emails
.
Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails
so that next contacts might fall into the same on.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
One way to accomplish what you need, is to separate the contacts into groups.
Each group will contain a list of names
, phones
and emails
.
Then iterate through contacts, and see if the current contact falls into any of the groups. If not, create a new group and set its names/phones/emails
so that next contacts might fall into the same on.
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
var data = [
{id:1,name:'John',phone:'11111111',email:'aaaa@test.com'},
{id:2,name:'Marc',phone:'22222222',email:'bbbb@test.com'},
{id:3,name:'Ron',phone:'99999999',email:'aaaa@test.com'},
{id:4,name:'Andrew',phone:'55555555',email:'dddd@test.com'},
{id:5,name:'Wim',phone:'99999999',email:'gggg@test.com'},
{id:6,name:'Marc',phone:'33333333',email:'cccc@test.com'},
{id:7,name:'Dan',phone:'44444444',email:'cccc@test.com'}
];
var groups = ;
data.forEach(function(person){
var phone = person.phone;
var email = person.email;
var name = person.name;
var id = person.id;
var found = false;
groups.forEach(function(g){
if( g.names.indexOf(name) > -1
|| g.phones.indexOf(phone)>-1
|| g.emails.indexOf(email)>-1) {
found = true;
g.names.push(name);
g.phones.push(phone);
g.emails.push(email);
g.people.push(id);
}
});
if(!found) {
groups.push({names:[name],phones:[phone],emails:[email],people:[id]});
}
});
var output=;
groups.forEach(function(g){
output.push(g.people);
});
console.log(output); //[ [1,3,5] , [2,6,7] , [4] ]
answered Nov 20 '18 at 9:51
Ahmad
8,19743463
8,19743463
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.[4]
should not be in the result.
– Akrion
Nov 21 '18 at 6:11
add a comment |
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.[4]
should not be in the result.
– Akrion
Nov 21 '18 at 6:11
2
2
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
O(n^2*log(n)) ? I thought OP was looking for an optimal solution
– joyBlanks
Nov 20 '18 at 10:44
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
the log(n) part is ok, but the multiplication with quadratic part n^2 will cause problems for bigger datasets. However, do not have better solution myself
– Han
Nov 20 '18 at 13:47
Groups with only 1 element should not be part of the result as per OP.
[4]
should not be in the result.– Akrion
Nov 21 '18 at 6:11
Groups with only 1 element should not be part of the result as per OP.
[4]
should not be in the result.– Akrion
Nov 21 '18 at 6:11
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53389734%2foptimal-algorithm-grouping-data-in-javascript%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
That looks like a cool problem... Just to make sure that I understood it correctly: what would it be the expected result if #2 had the following phone #:
99999999
? Would it be[[1, 2, 3, 5, 6, 7]]
?– Josep
Nov 20 '18 at 9:27
1
correct, we would have one group.
– Han
Nov 20 '18 at 9:39
I posted my answer below. BTW: contact id 4 falls into its own category
– Ahmad
Nov 20 '18 at 9:52
Nice question. Upvoted and added my take on it. Hope it helps.
– Akrion
Nov 21 '18 at 4:53