C++: splitting a string on non-alphabetic characters











2 votes












I am reading a file line by line, and I want to split each line on non-alphabetic characters and, if possible, remove all non-alphabetic characters at the same time so I don't have to do it later.

I would like to use isalpha, but I can't figure out how to use it with str.find() or similar functions, as those usually take a single delimiter as a string.



    while(getline(fileToOpen,str))
    {
        unsigned int pos= 0;
        string token;
        //transform(str.begin(),str.end(),str.begin(),::tolower);
        while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
        {
            token = str.substr(0, pos);
            //transform(str.begin(),str.end(),str.begin(),::tolower);

            Node<t>* ptr=search(token,root);
            if (ptr!=NULL)
            {
                ptr->count++;
                cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
            }
            else
            {
                insert(token,root);
                cout<<token<<" added to tree.\n";
            }
            ptr=NULL;
            str.erase(0, pos);
        }
    }


This is my latest attempt, which doesn't work. All the examples I could find were based on str.find("single delimiter"), which is no good to me.

I found a way to use isalpha:



    template<typename t>
    void Tree<t>::readFromFile(string filename)
    {
        string str;
        ifstream fileToOpen(filename.c_str());
        if (fileToOpen.is_open())
        {
            while(getline(fileToOpen,str))
            {
                unsigned int pos= 0;
                string token;
                //transform(str.begin(),str.end(),str.begin(),::tolower);
                while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
                {
                    token = str.substr(0, pos);
                    transform(token.begin(),token.end(),token.begin(),::tolower);
                    Node<t>* ptr=search(token,root);
                    if (ptr!=NULL)
                    {
                        ptr->count++;
                        // cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
                    }
                    else
                    {
                        insert(token,root);
                        cout<<token<<" added to tree.\n";
                    }
                    ptr=NULL;
                    str.erase(0, pos);
                }
            }
            fileToOpen.close();
        }
        else
            cout<<"Unable to open file!\n";
    }

    template<typename t>
    inline bool Tree<t>::aZCheck(char c)
    {
        return !isalpha(c);
    }


But the issue still persists: the string is split into single characters instead of words. Also, is whitespace considered valid by isalpha?


































  • I've never used it, but how did find_first_not_of() work? You should just store the pointers as split positions. Include both the uppercase and lowercase characters in the string passed to the function.
    – sln
    Nov 13 '14 at 23:32

  • It supposedly returns the location of the first character in a string that is not one of the characters indicated, but in my case it just splits into single characters.
    – Aistis Taraskevicius
    Nov 13 '14 at 23:35

  • (1 vote) You should check for while( (pos=...) != npos )
    – sln
    Nov 13 '14 at 23:36

  • Yep, I already added that. Without it, it doesn't split; with it, it splits into single characters. Something like this could be done with a single three-word statement in Java... C++ doesn't make it easy.
    – Aistis Taraskevicius
    Nov 13 '14 at 23:39

  • Ahh, like I said, I've never used it, and I do a lot of C++ too. But if I were going to beat the C++ consortium (or MS) over the head, I would try single, simple use cases first. Set a breakpoint on what pos returns.
    – sln
    Nov 13 '14 at 23:42
























c++ regex string














edited Nov 19 at 7:08 by Cœur

asked Nov 13 '14 at 23:11 by Aistis Taraskevicius


























2 Answers

















2 votes, accepted










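    #include <algorithm>
    #include <cctype>
    #include <fstream>
    #include <iostream>
    #include <string>
    ...

    // Thin wrappers around the <cctype> functions: the cast avoids undefined
    // behaviour for negative char values, and a plain function is an unambiguous
    // predicate even when "using namespace std" is in effect.
    inline bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    inline char toLower(char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }

    template<typename t>
    void Tree<t>::readFromFile(std::string filename)
    {
        std::string str;
        std::ifstream fileToOpen(filename.c_str());
        if (fileToOpen.is_open())
        {
            for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
            {
                // pos: start of the next word; prev: first non-letter after it
                for (pos = std::find_if(str.begin(), str.end(), isAlpha); pos != str.end();
                     pos = std::find_if(prev, str.end(), isAlpha))
                {
                    prev = std::find_if_not(pos, str.end(), isAlpha);
                    std::string token(pos, prev);
                    std::transform(token.begin(), token.end(), token.begin(), toLower);
                    Node<t>* ptr = search(token, root);
                    if (ptr != NULL)
                    {
                        ptr->count++;
                        // std::cout << token << " already in tree. Count " << ptr->count << "\n";
                    }
                    else
                    {
                        insert(token, root);
                        std::cout << token << " added to tree.\n";
                    }
                }
            }
            fileToOpen.close();
        }
        else
            std::cout << "Unable to open file!\n";
    }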


Online demo
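For anyone who wants to try the iterator approach outside the Tree class, here is a minimal self-contained sketch. The names isLetter and splitWords are hypothetical and not part of the answer; the sketch assumes a C++11 compiler (for std::find_if_not) and simply collects the lowercased words into a vector instead of inserting them into a tree.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: true for alphabetic characters. The cast avoids
// undefined behaviour when char is signed and the value is negative.
static bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: returns every maximal run of letters in `line`,
// lowercased. Everything that is not a letter acts as a delimiter.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::string::const_iterator first = std::find_if(line.begin(), line.end(), isLetter);
    while (first != line.end())
    {
        // `last` is one past the end of the current word
        std::string::const_iterator last = std::find_if_not(first, line.end(), isLetter);
        std::string word(first, last);
        for (std::string::size_type i = 0; i < word.size(); ++i)
            word[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(word[i])));
        words.push_back(word);
        // continue searching from the delimiter that ended this word
        first = std::find_if(last, line.end(), isLetter);
    }
    return words;
}
```

Because the inner search uses line.end() as its limit, a word at the very end of a line (or a line consisting of a single word) is still picked up.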



Also, since you say you want to save time, it would help if your insert function did a little extra work: if the value is not in the tree, insert it and set its counter to 1; if the value is already in the tree, simply increment the counter. This saves you from traversing the tree twice per word, which matters because your tree may be unbalanced.
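A rough sketch of that combined operation, assuming a plain unbalanced binary search tree. The WordNode struct and the functions insertOrIncrement and destroyTree are hypothetical stand-ins, since the actual Node<t> and Tree<t> definitions are not shown in this thread.

```cpp
#include <string>

// Hypothetical node type standing in for the Node<t> used in the question.
struct WordNode
{
    std::string word;
    int count;
    WordNode* left;
    WordNode* right;
    explicit WordNode(const std::string& w)
        : word(w), count(1), left(nullptr), right(nullptr) {}
};

// Walks the tree once. If `word` is found, its counter is incremented;
// otherwise a new node with count 1 is linked in at the point where the
// search left the tree. Returns the counter value after the update.
int insertOrIncrement(WordNode*& root, const std::string& word)
{
    WordNode** link = &root; // the pointer we may have to overwrite
    while (*link != nullptr)
    {
        if (word == (*link)->word)
            return ++(*link)->count;
        link = (word < (*link)->word) ? &(*link)->left : &(*link)->right;
    }
    *link = new WordNode(word);
    return 1;
}

// Frees every node of the tree.
void destroyTree(WordNode* node)
{
    if (node == nullptr)
        return;
    destroyTree(node->left);
    destroyTree(node->right);
    delete node;
}
```

Keeping a pointer to the link rather than to the node means the same loop serves both the lookup and the insertion, so no second descent is needed.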





























  • I like it!! Ambitious...
    – sln
    Nov 14 '14 at 0:49

  • That's exactly what I am doing: my insert function sets count to 1. I call the search function just once, and it returns a pointer to the node; using that pointer I either increase count by one, or add a new node if the pointer is null.
    – Aistis Taraskevicius
    Nov 14 '14 at 1:16

  • Your solution works, but the last word won't be counted if there is no delimiter at the end of the line; remove the dot at the end of a line to see what I mean. The same goes for a line containing only a single word: it will be ignored.
    – Aistis Taraskevicius
    Nov 14 '14 at 1:43

  • OK, I think it should be fixed now.
    – smac89
    Nov 14 '14 at 3:40

  • So I timed both solutions, and this one is about 0.03 s (roughly 30 ms) faster when processing a file of 570k characters: total time is around 340-345 ms, versus about 370 ms for @sln's solution.
    – Aistis Taraskevicius
    Nov 14 '14 at 11:52


















1 vote













Try this test case. There are two problems.

1 - pos is 0 when a delimiter is found at the start of the string (initially or after truncation). This causes the code to break out of the while loop. Use npos as the conditional check instead.

2 - You have to advance the position past the delimiter when you erase; otherwise it finds the same one over and over.



    string::size_type pos = 0; // size_type, so the comparison with npos is exact
    string token;
    string str = "Thisis(asdfasdfasdf)and!this)))";

    while ((pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM")) != string::npos)
    {
        if (pos != 0)
        {
            // Found a token
            token = str.substr(0, pos);
            cout << "Found: " << token << endl;
        }
        else
        {
            // Found another delimiter
            // Just move on to next one
        }

        str.erase(0, pos + 1); // Always remove pos+1 to get rid of delimiter
    }
    // Cover the last (or only) token
    if (str.length() > 0)
    {
        token = str;
        cout << "Found: " << token << endl;
    }


Output:

    Found: Thisis
    Found: asdfasdfasdf
    Found: and
    Found: this
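The same idea can also be written without erasing from the string, by passing a start position to the search functions. The following is only a sketch of that variant, not the answerer's code: the function name splitOnNonLetters and the constant kLetters are hypothetical, and the result is collected in a vector instead of being printed.

```cpp
#include <string>
#include <vector>

// Characters that count as part of a word (same idea as the literal above).
static const char* const kLetters =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Hypothetical helper: splits `str` on every character not in kLetters.
// The input string is left untouched; only two indices move forward.
std::vector<std::string> splitOnNonLetters(const std::string& str)
{
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_of(kLetters);
    while (start != std::string::npos)
    {
        // end: index of the first delimiter after the token, or npos
        std::string::size_type end = str.find_first_not_of(kLetters, start);
        if (end == std::string::npos)
        {
            tokens.push_back(str.substr(start)); // last token runs to the end
            break;
        }
        tokens.push_back(str.substr(start, end - start));
        // skip the run of delimiters and land on the next token, if any
        start = str.find_first_of(kLetters, end);
    }
    return tokens;
}
```

Both problems listed above are handled by construction: the loop condition tests against npos rather than 0, and each search resumes past the previous token, so the same delimiter is never found twice.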




























  • Hey, instant split()!!
    – sln
    Nov 14 '14 at 0:30

  • What is int k used for?
    – Aistis Taraskevicius
    Nov 14 '14 at 0:32

  • I think using find_if_not with isalpha as a predicate will make this faster and eliminate the need to write out the entire alphabet.
    – smac89
    Nov 14 '14 at 0:32

  • @Smac89 Any chance you could show how to use it? find_if returns an iterator instead of a position in the string.
    – Aistis Taraskevicius
    Nov 14 '14 at 0:35

  • @AistisTaraskevicius, see my answer.
    – smac89
    Nov 14 '14 at 0:48




















































Accepted answer: answered Nov 14 '14 at 0:46 by smac89, edited Nov 14 '14 at 3:40
























up vote
1
down vote













Try this test case. There are two problems in your loop.



1 - pos is 0 when a delimiter sits at the start of the string (either initially or after a previous erase). Because the loop condition tests pos itself, 0 reads as false and the while loop exits early. Compare against string::npos instead.



2 - You have to advance the position past the delimiter when you erase; otherwise find_first_not_of finds the same delimiter over and over.



    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        // size_type rather than int, so the comparison with string::npos is exact
        string::size_type pos = 0;
        string token;
        string str = "Thisis(asdfasdfasdf)and!this)))";

        while ((pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM")) != string::npos)
        {
            if (pos != 0)
            {
                // Found a token
                token = str.substr(0, pos);
                cout << "Found: " << token << endl;
            }
            // else: found another delimiter, just move on to the next one

            str.erase(0, pos + 1); // Always remove pos+1 to get rid of the delimiter
        }
        // Cover the last (or only) token
        if (str.length() > 0)
        {
            token = str;
            cout << "Found: " << token << endl;
        }
        return 0;
    }


Output:



Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
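The loop above erases the front of the string after every token, which shifts the remaining characters each time. A variant that leaves the line untouched and walks it with start offsets might look like the sketch below. The name splitAlpha and the vector return type are illustrative assumptions, not part of the original answer; the idea is only that find_first_of and find_first_not_of both accept a starting position.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the alphabetic runs of `line`, in order,
// without modifying the input string.
std::vector<std::string> splitAlpha(const std::string& line)
{
    static const std::string letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const std::string::size_type npos = std::string::npos;
    std::vector<std::string> tokens;

    // Start of the first token: the first character that IS a letter.
    std::string::size_type start = line.find_first_of(letters);
    while (start != npos)
    {
        // End of the token: first non-letter at or after start (npos if none).
        std::string::size_type end = line.find_first_not_of(letters, start);
        // substr clamps its count, so npos means "up to the end of the line".
        tokens.push_back(line.substr(start, end == npos ? npos : end - start));
        if (end == npos)
            break; // the token ran to the end of the line
        // Skip the delimiter run and look for the next letter.
        start = line.find_first_of(letters, end);
    }
    return tokens;
}
```

Because the last token is pushed inside the loop, a line with no trailing delimiter (or a single word on a line) needs no separate check after the loop.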




























  • Hey, instant split() !!
    – sln
    Nov 14 '14 at 0:30










  • What is int k used for?
    – Aistis Taraskevicius
    Nov 14 '14 at 0:32










  • I think using find_if_not with isalpha as a predicate will make this faster and eliminate the need to write out the entire alphabet
    – smac89
    Nov 14 '14 at 0:32












  • @Smac89 Any chance you could show how to use it? find_if returns an iterator instead of a position in the string.
    – Aistis Taraskevicius
    Nov 14 '14 at 0:35












  • @AistisTaraskevicius, see my answer
    – smac89
    Nov 14 '14 at 0:48
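For readers following the find_if_not suggestion in the comments, here is a minimal sketch of how the iterator-based version could be written. It is not smac89's actual answer; the names isLetter and splitWithIsalpha are hypothetical, and std::find_if_not assumes a C++11 compiler. An iterator pair can be turned into a token directly with the string(first, last) constructor, so no index is needed. In the default "C" locale, isalpha returns false for whitespace, so spaces act as delimiters like any other non-letter.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical predicate wrapper: the cast to unsigned char avoids undefined
// behaviour when a plain char holds a negative value.
inline bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: returns the lower-cased alphabetic runs of `line`.
std::vector<std::string> splitWithIsalpha(const std::string& line)
{
    std::vector<std::string> tokens;
    std::string::const_iterator it = line.begin();
    while (true)
    {
        // Skip delimiters: find the first letter at or after `it`.
        std::string::const_iterator first = std::find_if(it, line.end(), isLetter);
        if (first == line.end())
            break; // no more letters on this line
        // Find the end of the run of letters (line.end() if it runs to the end).
        std::string::const_iterator last = std::find_if_not(first, line.end(), isLetter);

        std::string token(first, last); // build the token from the iterator pair
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        tokens.push_back(token);
        it = last;
    }
    return tokens;
}
```

If a numeric position is still wanted, for example to call substr, subtracting iterators (first - line.begin()) yields the offset of first within the line.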























edited Nov 14 '14 at 0:38

























answered Nov 14 '14 at 0:25









sln

26.1k31536



































































