Python: How to get the similar-sounding words together
I am trying to get all the similar sounding words from a list.
I tried to get them using cosine similarity but that does not fulfil my purpose.
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)
I know this is not the right approach. I cannot seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where identical labels mark the words that sound similar.
python python-3.x list
asked Mar 25 at 5:31 by Marc Stoch
edited Mar 25 at 11:19 by DirtyBit
1 Answer
First, you need the right tool for finding similar-sounding words, i.e. a phonetic string-similarity algorithm. I would suggest using jellyfish:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Next, create a function that encodes every word in the list, then sort the result so that equal codes end up next to each other:
def getSoundexList(dList):
    res = [soundex(x) for x in dList]  # encode each elem in the dataList
    # print(res)  # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
    return res

dataList = ['two','fourth','forth','dessert','to','desert']
print(sorted(getSoundexList(dataList)))
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be using fuzzy:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped, you could use groupby:
from itertools import groupby

def getSoundexList(dList):
    return sorted([soundex(x) for x in dList])

dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
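A small sketch of why the sort matters here: groupby only merges items that sit next to each other, so running it on the unsorted codes splits the groups. The codes list below is copied from the commented-out print(res) output earlier in this answer; the variable names are only illustrative.

```python
from itertools import groupby

# Soundex codes in the original list order, copied from the answer's
# commented-out print(res) line.
codes = ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']

# Without sorting, only adjacent equal codes are merged.
unsorted_groups = [list(g) for _, g in groupby(codes)]

# After sorting, every equal code is adjacent, so each code forms one group.
sorted_groups = [list(g) for _, g in groupby(sorted(codes))]

print(unsorted_groups)
print(sorted_groups)
```

The first print shows 'T000' and 'D263' each split across two separate groups, while the second gives one group per code.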
EDIT 3:
This one's for @Eric Duminil. Let's say you want both the names and their respective values, using a dict along with itemgetter:
from itertools import groupby
from operator import itemgetter

def getSoundexDict(dList):
    dict_ = {x: soundex(x) for x in dList}  # dict_ with k,v as name/val
    return sorted(dict_.items(), key=itemgetter(1))  # sort the pairs on val

dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexDict(dataList), key=itemgetter(1))])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
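If the order of the groups does not matter, a possible alternative to sorting plus groupby is to collect the words into a dict keyed by code. This is only a sketch: the codes are hard-coded from the output shown earlier in this answer (in practice they would come from soundex()), and the names data_list, codes, and groups are illustrative.

```python
from collections import defaultdict

data_list = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
# Codes copied from the answer's output; normally computed with soundex().
codes = ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']

# Map each code to the list of words that share it.
groups = defaultdict(list)
for word, code in zip(data_list, codes):
    groups[code].append(word)

print(dict(groups))
print(list(groups.values()))
```

This needs a single pass and no sort, and the groups appear in the order their first word occurs in the input.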
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
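To make the quoted description concrete, here is a minimal pure-Python sketch of the classic American Soundex rules. It is not the code used by jellyfish or fuzzy; simple_soundex and CODES are hypothetical names, and edge cases (non-ASCII letters, prefixes, and so on) are handled only crudely.

```python
# Digit assigned to each consonant group; vowels, y, h and w get no digit.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def simple_soundex(word, length=4):
    """Return a Soundex-style code: first letter plus up to three digits."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's digit still counts
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal digits
        code = CODES.get(ch, "")  # vowels map to "" and reset the run
        if code and code != prev:
            encoded += code
        prev = code
    # Pad with zeros, or truncate, to the requested length.
    return (encoded + "0" * length)[:length]


data_list = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
print([simple_soundex(w) for w in data_list])
```

For the words in the question this yields the same codes as the jellyfish output shown above, which is why similar-sounding words end up with identical values.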
@EricDuminil Pardon, but I don't quite get how isSoundex returning a boolean would do?
– DirtyBit, Mar 25 at 9:16
He means the name isSoundex is a binary statement ('is' or 'is not'), and should therefore be a boolean-returning function. Maybe consider changing the name to something like getSoundexList?
– user2397282, Mar 25 at 9:47
@user2397282 Crap, I overlooked it. Thank you. Edited! :)
– DirtyBit, Mar 25 at 9:49
@EricDuminil Done! :)
– DirtyBit, Mar 25 at 11:14
Oooooo nice answer sir ;) +1
– Matt B., Mar 26 at 13:32
answered Mar 25 at 5:34 by DirtyBit
edited Mar 26 at 6:07