C++ splitting a string on non-alphabetic characters
I am reading a file line by line. I want to split each line on non-alphabetic characters and, if possible, remove all non-alphabetic characters at the same time so I don't have to do it later.
I would like to use isalpha, but I can't figure out how to use it with str.find() or similar functions, since those usually take a single delimiter as a string.
while(getline(fileToOpen,str))
{
    unsigned int pos= 0;
    string token;
    //transform(str.begin(),str.end(),str.begin(),::tolower);
    while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
    {
        token = str.substr(0, pos);
        //transform(str.begin(),str.end(),str.begin(),::tolower);
        Node<t>* ptr=search(token,root);
        if (ptr!=NULL)
        {
            ptr->count++;
            cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
        }
        else
        {
            insert(token,root);
            cout<<token<<" added to tree.\n";
        }
        ptr=NULL;
        str.erase(0, pos);
    }
}
That is my latest attempt, and it doesn't work. All of the examples I could find were based on str.find("single delimiter"), which is no good to me.
I found a way to use isalpha:
template<typename t>
void Tree<t>::readFromFile(string filename)
{
    string str;
    ifstream fileToOpen(filename.c_str());
    if (fileToOpen.is_open())
    {
        while(getline(fileToOpen,str))
        {
            unsigned int pos= 0;
            string token;
            //transform(str.begin(),str.end(),str.begin(),::tolower);
            while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
            {
                token = str.substr(0, pos);
                transform(token.begin(),token.end(),token.begin(),::tolower);
                Node<t>* ptr=search(token,root);
                if (ptr!=NULL)
                {
                    ptr->count++;
                    // cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
                }
                else
                {
                    insert(token,root);
                    cout<<token<<" added to tree.\n";
                }
                ptr=NULL;
                str.erase(0, pos);
            }
        }
        fileToOpen.close();
    }
    else
        cout<<"Unable to open file!\n";
}
template<typename t>
inline bool Tree<t>::aZCheck(char c)
{
    return !isalpha(c);
}
But the issue persists: the string is split into single characters instead of words. Also, is whitespace considered valid by isalpha?
c++ regex string
edited Nov 19 at 7:08
Cœur
asked Nov 13 '14 at 23:11
Aistis Taraskevicius
I've never used it, but how did find_first_not_of() work? Just store the returned positions as split points. Include both the upper-case and lower-case characters in the string passed to the function.
– sln
Nov 13 '14 at 23:32
It supposedly returns the position of the first character in the string that is not one of the characters given, but in my case it just splits into single characters.
– Aistis Taraskevicius
Nov 13 '14 at 23:35
You should check for while( (pos=...) != npos )
– sln
Nov 13 '14 at 23:36
Yep, already added that. Without it, it doesn't split; with it, it splits into single characters. Something like this could be done with a single three-word statement in Java... C++ doesn't make it easy.
– Aistis Taraskevicius
Nov 13 '14 at 23:39
Ahh, like I said, I've never used it, though I do a lot of C++ too. But before beating the C++ consortium (or MS) over the head, I would try single, simple use cases first. Set a breakpoint and check what pos returns.
– sln
Nov 13 '14 at 23:42
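A minimal sketch, not part of the original thread, of why the second attempt yields one-character tokens. Because `!=` binds tighter than `=` and is left-associative, the condition `pos = find_if(...) != str.end() != string::npos` parses as `pos = ((find_if(...) != str.end()) != string::npos)`. The inner comparison is a `bool` (0 or 1), which never equals `npos`, so `pos` is always 1 and `substr(0, pos)` takes a single character. On the whitespace question: in the default "C" locale `std::isalpha` is true only for A-Z and a-z, so spaces count as delimiters. The names `isNotLetter` and `buggyPos` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical free-function form of the question's aZCheck predicate.
// Casting to unsigned char avoids undefined behaviour for negative char values.
bool isNotLetter(char c) {
    return !std::isalpha(static_cast<unsigned char>(c));
}

// Reproduces the question's loop condition. It parses as
//   pos = ((find_if(...) != str.end()) != std::string::npos);
// The inner result is a bool (0 or 1), which never equals npos,
// so pos is always assigned 1, whatever the string contains.
std::size_t buggyPos(const std::string& str) {
    std::size_t pos = std::find_if(str.begin(), str.end(), isNotLetter) != str.end() != std::string::npos;
    return pos;
}
```

For example, `buggyPos("hello world")` and `buggyPos("hello")` both return 1, even though the first non-letter in "hello world" is at index 5 and "hello" has none.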
2 Answers
Accepted answer
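#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>
...
template<typename t>
void Tree<t>::readFromFile(std::string filename)
{
    // Cast to unsigned char: passing a negative char to isalpha/tolower is undefined behaviour.
    auto isLetter = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    auto toLower  = [](char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); };

    std::string str;
    std::ifstream fileToOpen(filename.c_str());
    if (fileToOpen.is_open())
    {
        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
        {
            // pos: first letter of the next word; prev: one past its last letter
            for (pos = std::find_if(str.begin(), str.end(), isLetter); pos != str.end();
                 pos = std::find_if(prev, str.end(), isLetter))
            {
                prev = std::find_if_not(pos, str.end(), isLetter);
                std::string token(pos, prev);
                std::transform(token.begin(), token.end(), token.begin(), toLower);
                Node<t>* ptr = search(token, root);
                if (ptr != NULL)
                {
                    ptr->count++;
                    // std::cout << token << " already in tree. Count " << ptr->count << "\n";
                }
                else
                {
                    insert(token, root);
                    std::cout << token << " added to tree.\n";
                }
            }
        }
        fileToOpen.close();
    }
    else
        std::cout << "Unable to open file!\n";
}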
Online demo
Also, since you say you want to save time, it would help if your insert function did a bit more: insert the value if it is not found in the tree and set its counter to 1; if the value is already in the tree, simply increment the counter. This saves you from walking the tree twice (once to search, once to insert), which matters because your tree is potentially unbalanced.
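A minimal sketch of that single-descent insert, written against a hypothetical `WordNode` type because the thread's `Node<t>`/`Tree<t>` definitions are not shown. It walks down the tree once, either bumping the count of an existing node or creating the missing node with count 1, so no separate `search` pass is needed.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type; stands in for the question's Node<t>.
struct WordNode {
    std::string word;
    int count = 1;
    std::unique_ptr<WordNode> left, right;
    explicit WordNode(std::string w) : word(std::move(w)) {}
};

// Single descent: follow left/right links until we either find the word
// (increment its count) or reach the empty slot where it belongs (create it).
// Returns the word's count after the call.
int insertOrIncrement(std::unique_ptr<WordNode>& root, const std::string& word) {
    std::unique_ptr<WordNode>* slot = &root;
    while (*slot) {
        if (word < (*slot)->word)
            slot = &(*slot)->left;
        else if ((*slot)->word < word)
            slot = &(*slot)->right;
        else
            return ++(*slot)->count;
    }
    slot->reset(new WordNode(word));
    return 1;
}
```

Holding a pointer to the `unique_ptr` slot, rather than to the node, lets the same loop both find an existing node and attach a new one without a second walk.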
edited Nov 14 '14 at 3:40
answered Nov 14 '14 at 0:46
smac89
I like it!! Ambitious..
– sln
Nov 14 '14 at 0:49
That's exactly what I am doing: my insert function sets count to 1. And I call the search function just once; it returns a pointer to the node, and using that pointer I either increase count by one or add a new node if the pointer is null.
– Aistis Taraskevicius
Nov 14 '14 at 1:16
Your solution works, but the last word won't be counted if there is no delimiter at the end of the line; remove any trailing dot at the end of a line to see what I mean. The same goes for a line containing only a single word: it is ignored.
– Aistis Taraskevicius
Nov 14 '14 at 1:43
OK, I think it should be fixed now.
– smac89
Nov 14 '14 at 3:40
So I timed both solutions, and this one is about 0.03 s (~30 ms) faster: processing a file of 570k characters takes around 340-345 ms in total, versus ~370 ms for @sln's solution.
– Aistis Taraskevicius
Nov 14 '14 at 11:52
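To check the edge cases raised in the comments (a single word on a line, no delimiter after the last word), the accepted answer's `find_if`/`find_if_not` loop can be pulled out into a standalone function. This is a sketch: `splitAlpha` is a hypothetical name, and it collects lower-cased words in a `std::vector` instead of inserting them into the tree.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: the accepted answer's find_if / find_if_not loop,
// collecting lower-cased words into a vector instead of a tree.
std::vector<std::string> splitAlpha(const std::string& line) {
    auto isLetter = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    std::vector<std::string> words;
    std::string::const_iterator pos = std::find_if(line.begin(), line.end(), isLetter);
    while (pos != line.end()) {
        // end is one past the last letter of this word, or line.end() if the
        // word is the last thing on the line (no trailing delimiter needed).
        std::string::const_iterator end = std::find_if_not(pos, line.end(), isLetter);
        std::string word(pos, end);
        for (char& c : word)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        words.push_back(word);
        pos = std::find_if(end, line.end(), isLetter);
    }
    return words;
}
```

Because `find_if_not` returns `line.end()` when no non-letter follows, a line like "Hello" still yields its word, and a line with only punctuation yields nothing.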
Try this test case. There are two problems:
1 - pos is 0 when a delimiter is found at the start of the string (initially, or after truncation). This causes the loop to exit. Compare against npos in the loop condition instead.
2 - You have to advance the position past the delimiter when you erase; otherwise it finds the same one over and over.
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string::size_type pos = 0;   // size_type, not int, so the comparison with npos is exact
    string token;
    string str = "Thisis(asdfasdfasdf)and!this)))";
    while ((pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM")) != string::npos)
    {
        if (pos != 0)
        {
            // Found a token
            token = str.substr(0, pos);
            cout << "Found: " << token << endl;
        }
        else
        {
            // Found another delimiter
            // Just move on to the next one
        }
        str.erase(0, pos + 1); // Always remove pos+1 to get rid of the delimiter
    }
    // Cover the last (or only) token
    if (str.length() > 0)
    {
        token = str;
        cout << "Found: " << token << endl;
    }
}
Output:
Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
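`str.erase(0, pos + 1)` shifts the remaining characters to the front on every iteration, so the total work can grow quadratically with the line length. A sketch of a variant (not from the original answer) that keeps the same alphabet but walks an index forward with the `pos` argument of `find_first_of` / `find_first_not_of`, leaving the string untouched. `kLetters` and `splitOnNonLetters` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Same alphabet as the answer above (hypothetical constant name).
const std::string kLetters = "abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM";

// Hypothetical index-based variant: str is never modified; start and end
// jump between the first letter and the first non-letter of each word.
std::vector<std::string> splitOnNonLetters(const std::string& str) {
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_of(kLetters);
    while (start != std::string::npos) {
        std::string::size_type end = str.find_first_not_of(kLetters, start);
        if (end == std::string::npos) {
            tokens.push_back(str.substr(start)); // last word runs to the end of the string
            break;
        }
        tokens.push_back(str.substr(start, end - start));
        start = str.find_first_of(kLetters, end); // skip the run of delimiters
    }
    return tokens;
}
```

On the answer's test string this produces the same four tokens: "Thisis", "asdfasdfasdf", "and", "this".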
edited Nov 14 '14 at 0:38
answered Nov 14 '14 at 0:25
sln
Hey, instant split() !!
– sln
Nov 14 '14 at 0:30
What is int k used for?
– Aistis Taraskevicius
Nov 14 '14 at 0:32
I think using find_if_not with isalpha as a predicate will make this faster and eliminate the need to write out the entire alphabet
– smac89
Nov 14 '14 at 0:32
@Smac89 Any chance you could show how to use it? find_if returns an iterator instead of a position in the string.
– Aistis Taraskevicius
Nov 14 '14 at 0:35
@AistisTaraskevicius, see my answer
– smac89
Nov 14 '14 at 0:48
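Following up on the find_if_not suggestion and the question about iterators: a minimal sketch, assuming the default "C" locale and a hypothetical helper named splitWithIsalpha, can stay with iterators throughout. std::find_if skips to the first letter, std::find_if_not finds the first non-letter after it, and std::string has a constructor taking an iterator pair, so no numeric positions are needed. In the "C" locale isalpha returns false for spaces, so whitespace is treated as a delimiter too.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Wrapper so std::isalpha never receives a negative char value.
static bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: splits on anything std::isalpha rejects in the
// current locale (spaces, digits, punctuation all act as delimiters).
std::vector<std::string> splitWithIsalpha(const std::string& line)
{
    std::vector<std::string> tokens;
    auto it = line.begin();
    while (true)
    {
        // Skip delimiters: first letter at or after `it`.
        auto wordBegin = std::find_if(it, line.end(), isLetter);
        if (wordBegin == line.end())
            break;
        // First non-letter after the word starts (or end()).
        auto wordEnd = std::find_if_not(wordBegin, line.end(), isLetter);
        tokens.emplace_back(wordBegin, wordEnd);   // string(first, last) constructor
        it = wordEnd;
    }
    return tokens;
}
```

If a numeric index is still wanted (for substr or erase), it - line.begin() or std::distance(line.begin(), it) converts the iterator. The cast to unsigned char matters because passing a negative char value (other than EOF) to std::isalpha is undefined behaviour.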
I never used it, but how does find_first_not_of() work? Just store the returned positions as split points. Include both the upper-case and lower-case chars in the string passed to the function.
– sln
Nov 13 '14 at 23:32
Supposedly it returns the position of the first character in the string that is not one of the characters given, but in my case it just splits into single characters.
– Aistis Taraskevicius
Nov 13 '14 at 23:35
You should check for while( (pos=...) != npos )
– sln
Nov 13 '14 at 23:36
Yep, already added that. Without it, it doesn't split; with it, it splits into single characters. Something like this could be done with a single three-word statement in Java... C++ doesn't make it easy.
– Aistis Taraskevicius
Nov 13 '14 at 23:39
Ahh, like I said, I never used it, though I do a lot of C++ too. But before beating the C++ consortium (or MS) over the head, I would try single, simple use cases first. Set a breakpoint and check what pos returns.
– sln
Nov 13 '14 at 23:42
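One plausible explanation for the single-character tokens, assuming the extra parentheses around the assignment were left out (the thread does not confirm this): != binds more tightly than =, so pos receives the bool result of the comparison (1 whenever a delimiter exists) and substr(0, pos) returns one character. The hypothetical functions below contrast the two forms. They also use std::string::size_type rather than unsigned int; on platforms where size_t is 64 bits, storing npos in an unsigned int truncates it, so a (pos = ...) != string::npos check would never become false.

```cpp
#include <string>

static const std::string kLetters =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Hypothetical: the unparenthesized condition is parsed as
//     pos = (str.find_first_not_of(kLetters) != std::string::npos);
// so pos is 1 (not the delimiter's index) whenever a delimiter exists.
std::string firstTokenBuggy(const std::string& str)
{
    std::string::size_type pos;
    if (pos = str.find_first_not_of(kLetters) != std::string::npos)
        return str.substr(0, pos);   // pos == 1 here: a single character
    return str;
}

// Hypothetical: assign first, then compare, so pos holds the real index.
std::string firstTokenFixed(const std::string& str)
{
    std::string::size_type pos;
    if ((pos = str.find_first_not_of(kLetters)) != std::string::npos)
        return str.substr(0, pos);   // text before the first delimiter
    return str;                      // no delimiter: the whole string is one token
}
```

Most compilers warn about the first form (for example GCC's -Wparentheses), which is a quick way to catch this class of bug.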