How to make spaCy's statistical models faster
I am using spaCy's pretrained statistical models, such as en_core_web_md, to find similar words between two lists. The code works fine, but loading the statistical model takes a long time every time the code is run.
How can I make the models load faster?
Is there a way to save the model to disk?
Here is the code I am using:
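from operator import itemgetter

import spacy

# Loading the model is the slow step; it happens on every run.
nlp = spacy.load('en_core_web_md')

list1 = ['mango', 'apple', 'tomato', 'orange', 'papaya']
list2 = ['mango', 'fig', 'cherry', 'apple', 'dates']

s_words = []
for token1 in list1:
    list_to_sort = []
    for token2 in list2:
        list_to_sort.append((token1, token2, nlp(str(token1)).similarity(nlp(str(token2)))))
    # Keep the (token1, token2) pair with the highest similarity.
    sorted_list = sorted(list_to_sort, key=itemgetter(2), reverse=True)[0][:2]
    s_words.append(sorted_list)

# Second element of each pair: the best match from list2 for each word in list1.
similar_words = list(zip(*s_words))[1]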
python-3.x nlp spacy
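Separately from load time, the loop in the question runs the pipeline on every pair of words, so each word is processed many times. A minimal sketch of embedding each word once and comparing the cached vectors is shown here. The `embed` callable, `most_similar`, and the toy vectors are hypothetical names for illustration; with spaCy, `embed` could plausibly be `lambda w: nlp(w).vector`, on the assumption that cosine similarity over those vectors is an acceptable substitute for `Doc.similarity`.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(list1: List[str], list2: List[str],
                 embed: Callable[[str], Sequence[float]]) -> List[str]:
    """For each word in list1, return the closest word in list2."""
    # Embed every distinct word exactly once instead of once per pair.
    vectors = {word: embed(word) for word in set(list1) | set(list2)}
    return [
        max(list2, key=lambda w2: cosine(vectors[w1], vectors[w2]))
        for w1 in list1
    ]


# Toy two-dimensional vectors standing in for real word vectors (assumption).
toy_vectors = {
    "mango": [1.0, 0.0],
    "papaya": [0.8, 0.3],
    "tomato": [0.1, 0.9],
    "fig": [0.9, 0.1],
    "apple": [0.0, 1.0],
}

result = most_similar(["mango", "papaya", "tomato"], ["fig", "apple"],
                      toy_vectors.__getitem__)
print(result)  # ['fig', 'fig', 'apple']
```

This does not shorten model loading, but it reduces the number of pipeline calls from one per pair to one per distinct word.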
Model loading is IO-bound. If you want it to go faster, load a smaller model. You are using en_core_web_md, where md stands for medium; there is also en_core_web_sm.
– mbatchkarov
Nov 19 at 19:43
@mbatchkarov Thanks for your answer. But I think I need at least the medium model. I also tried disable=['parser', 'ner', 'tagger']. It definitely speeds up loading, but what if the user does need the parser, NER, etc. in some cases? Model loading would have to be faster by default in that case. I believe disabling is just a hack, and I wonder whether there is a better solution. This was the gist of my question above.
– venkatttaknev
Nov 19 at 20:15
A spaCy model is a large and complex data structure. To make deserialisation faster, you need to attack either the large part (by finding a smaller model) or the complex part (by rewriting spaCy or training your own models). Outside of these two options, we'd have to rethink the fundamentals of what you are doing. Here are a few questions. Do you have evidence that you absolutely require at least a medium model? Can you compromise on accuracy? Can you preload the models once and query them repeatedly (e.g. in a web service or a Jupyter notebook cell)? Do you have a fast SSD?
– mbatchkarov
Nov 20 at 9:05
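The "preload once, query repeatedly" idea from the comment above, combined with the `disable` option mentioned earlier, could be sketched roughly as follows. It only helps inside a long-running process (a web service, a notebook kernel), not across separate script runs. `make_pipeline_cache`, `load_with_spacy`, and `fake_loader` are hypothetical names for this sketch; the fake loader stands in for spaCy so the caching behaviour can be seen without the library installed.

```python
from functools import lru_cache
from typing import Any, Callable, Tuple


def load_with_spacy(name: str, disabled: Tuple[str, ...]) -> Any:
    """Real loader: spaCy is imported lazily so this file imports without it."""
    import spacy  # assumes spaCy and the named model are installed
    return spacy.load(name, disable=list(disabled))


def make_pipeline_cache(loader: Callable[[str, Tuple[str, ...]], Any]):
    """Return a getter that loads each (model, disabled components) pair once."""
    @lru_cache(maxsize=None)
    def get_pipeline(name: str, disabled: Tuple[str, ...] = ()) -> Any:
        return loader(name, disabled)
    return get_pipeline


# Hypothetical stand-in for the real loader (assumption, for illustration only).
# Swap in load_with_spacy for real use.
load_log = []


def fake_loader(name: str, disabled: Tuple[str, ...]) -> dict:
    load_log.append((name, disabled))
    return {"model": name, "disabled": disabled}


get_pipeline = make_pipeline_cache(fake_loader)

# Light pipeline for similarity lookups; the second call hits the cache.
light = get_pipeline("en_core_web_md", ("parser", "ner", "tagger"))
light_again = get_pipeline("en_core_web_md", ("parser", "ner", "tagger"))

# The full pipeline is loaded only when some caller actually asks for it.
full = get_pipeline("en_core_web_md")
```

With this arrangement, callers that only need vectors pay for the lighter load, and the cost of the parser and NER components is deferred until they are requested, at the price of holding more than one pipeline in memory.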
asked Nov 19 at 12:40
venkatttaknev