C++: splitting a string on non-alphabetic characters


I am reading a file line by line and want to split each line on non-alphabetic characters and, if possible, remove all non-alphabetic characters at the same time so I don't have to do it later.

I would like to use isalpha, but I can't figure out how to use it with str.find() or similar functions, as those usually take a single delimiter as a string.

    while(getline(fileToOpen,str))
    {
        unsigned int pos= 0;
        string token;
        //transform(str.begin(),str.end(),str.begin(),::tolower);
        while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
        {
            token = str.substr(0, pos);
            //transform(str.begin(),str.end(),str.begin(),::tolower);

            Node<t>* ptr=search(token,root);
            if (ptr!=NULL)
            {
                ptr->count++;
                cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
            }
            else
            {
                insert(token,root);
                cout<<token<<" added to tree.\n";
            }
            ptr=NULL;
            str.erase(0, pos);
        }

    }

That is my latest attempt, which doesn't work. All of the examples I could find were based on str.find("single delimiter"), which is no good to me.

I then found a way to use isalpha:

template<typename t>
void Tree<t>::readFromFile(string filename)
{
    string str;
    ifstream fileToOpen(filename.c_str());
    if (fileToOpen.is_open())
    {
        while(getline(fileToOpen,str))
        {
            unsigned int pos= 0;
            string token;
            //transform(str.begin(),str.end(),str.begin(),::tolower);
            while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
            {
                token = str.substr(0, pos);
                transform(token.begin(),token.end(),token.begin(),::tolower);
                Node<t>* ptr=search(token,root);
                if (ptr!=NULL)
                {
                    ptr->count++;
                   // cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
                }
                else
                {
                    insert(token,root);
                    cout<<token<<" added to tree.\n";
                }
                ptr=NULL;
                str.erase(0, pos);
            }

        }
        fileToOpen.close();

    }
    else
        cout<<"Unable to open file!\n";
}

template<typename t>
inline bool Tree<t>::aZCheck(char c)
{
    return !isalpha(c);
}

But the issue still persists: the string is split into single characters instead of words. Also, is whitespace considered valid by isalpha?

edited Nov 19 at 7:08 by Cœur

asked Nov 13 '14 at 23:11 by Aistis Taraskevicius

I never used it, but how does find_first_not_of() work? You should just store the positions it returns as split points. Include both the upper-case and lower-case characters in the string passed to the function.
– sln
Nov 13 '14 at 23:32

It supposedly returns the location of the first character in a string that is not one of the characters indicated, but in my case it just splits into single characters.
– Aistis Taraskevicius
Nov 13 '14 at 23:35

You should check for while( (pos=...) != npos )
– sln
Nov 13 '14 at 23:36

Yep, already added that. Without it, it doesn't split; with it, it splits into single characters. Something like this could be done with a single three-word statement in Java... C++ doesn't make it easy.
– Aistis Taraskevicius
Nov 13 '14 at 23:39

Ahh, like I said, I never used it, and I do a lot of C++ too. But if I were going to beat the C++ consortium (or MS) over the head, I would try single, simple use cases first. Set a breakpoint and look at what pos returns.
– sln
Nov 13 '14 at 23:42


c++ regex string


2 Answers

accepted

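#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>
...

// Helper names isAlpha and toLower are additions to the original answer.
// They cast to unsigned char first, because passing a negative char to
// std::isalpha or std::tolower is undefined behaviour.
inline bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
inline char toLower(char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }

template<typename t>
void Tree<t>::readFromFile(std::string filename)
{
    std::string str;
    std::ifstream fileToOpen(filename.c_str());
    if (fileToOpen.is_open())
    {
        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
        {
            // pos marks the start of a word, prev marks one past its end
            for (pos = std::find_if(str.begin(), str.end(), isAlpha); pos != str.end();
                pos = std::find_if(prev, str.end(), isAlpha))
            {
                prev = std::find_if_not(pos, str.end(), isAlpha);
                std::string token(pos, prev);
                std::transform(token.begin(), token.end(), token.begin(), toLower);
                Node<t>* ptr = search(token, root);
                if (ptr != NULL)
                {
                    ptr->count++;
                   // std::cout << token << " already in tree.Count " << ptr->count << "\n";
                }
                else
                {
                    insert(token, root);
                    std::cout << token << " added to tree.\n";
                }
            }
        }
        fileToOpen.close();

    }
    else
        std::cout << "Unable to open file!\n";
}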

Online demo
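Since the member function above depends on the Tree class, which is not shown, here is a minimal standalone sketch of the same find_if / find_if_not loop. The names isLetter and splitWords are hypothetical and chosen only for illustration; the sketch assumes C++11 and the default C locale, and it collects the lower-cased words into a vector instead of inserting them into a tree.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: true when c is a letter in the current C locale.
// The cast avoids undefined behaviour for negative char values.
inline bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: returns the lower-cased words found in one line.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::string::const_iterator first = std::find_if(line.begin(), line.end(), isLetter);
    while (first != line.end())
    {
        // [first, last) is one maximal run of letters
        std::string::const_iterator last = std::find_if_not(first, line.end(), isLetter);
        std::string token(first, last);
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
        words.push_back(token);
        // skip the delimiters that follow and look for the next word
        first = std::find_if(last, line.end(), isLetter);
    }
    return words;
}
```

Because the end of the line also ends a run of letters, a trailing word with no delimiter after it is still returned, which is the case raised in the comments below.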

Also, since you say you want to save time, it would help if your insert function did a little more: if the value is not found in the tree, insert it and set its counter to 1; if the value is already in the tree, simply increment the counter. This saves you from traversing the tree twice, which matters because your tree might be unbalanced.
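The single-traversal idea can be sketched as follows. The Node<t> and Tree<t> types from the question are not shown, so WordNode and insertOrIncrement are hypothetical stand-ins written for a plain, unbalanced binary search tree keyed on std::string; treat it as an illustration rather than a drop-in replacement.

```cpp
#include <memory>
#include <string>

// Hypothetical node type; the Node<t> in the question is not shown.
struct WordNode
{
    std::string word;
    int count;
    std::unique_ptr<WordNode> left, right;
    explicit WordNode(const std::string& w) : word(w), count(1) {}
};

// Walks down from root exactly once.
// Returns the word's count after the call (1 for a newly inserted word).
int insertOrIncrement(std::unique_ptr<WordNode>& root, const std::string& word)
{
    // link always points at the owning pointer we would fill on insertion
    std::unique_ptr<WordNode>* link = &root;
    while (*link)
    {
        if (word == (*link)->word)
            return ++(*link)->count;            // already present: bump the counter
        link = (word < (*link)->word) ? &(*link)->left : &(*link)->right;
    }
    link->reset(new WordNode(word));            // not found: insert with count 1
    return 1;
}
```

Keeping a pointer to the owning link means the position found by the search is reused for the insertion, so there is no second descent.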

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46 by smac89

I like it !! Ambitious ..
– sln
Nov 14 '14 at 0:49

That's exactly what I am doing: my insert function sets count to 1. I call the search function just once, and it returns a pointer to the node. Using that pointer I either increase count by one, or add a new node if the pointer is null.
– Aistis Taraskevicius
Nov 14 '14 at 1:16

Your solution works, but the last word won't be counted if there is no delimiter at the end of the line; remove the dot at the end of a line to see what I mean. The same goes if there is only a single word on a line: it will be ignored.
– Aistis Taraskevicius
Nov 14 '14 at 1:43

Ok I think it should be fixed now
– smac89
Nov 14 '14 at 3:40

So I timed both solutions, and this one is about 0.03 s (roughly 30 milliseconds) faster when processing a file of 570k characters: total time is around 340-345 ms, versus about 370 ms for @sln's solution.
– Aistis Taraskevicius
Nov 14 '14 at 11:52


Try this test case. There are two problems.

1 - pos is 0 when a delimiter sits at the start of the string, either initially or after truncation. Because the loop condition tests pos itself, a 0 breaks out of the while. Compare against npos instead.

2 - You have to advance the position past the delimiter when you erase; otherwise it finds the same delimiter over and over.

    // size_type (not int) so that the comparison with npos is exact
    string::size_type pos = 0;
    string token;
    string str = "Thisis(asdfasdfasdf)and!this)))";

    while ((pos=str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))!= string::npos )
    {
        if ( pos != 0 )
        {
            // Found a token
            token = str.substr(0, pos);
            cout << "Found: " << token << endl;
        }
        else
        {
            // Found another delimiter
            // Just move on to next one
        }

        str.erase(0, pos+1);  // Always remove pos+1 to get rid of delimiter
    }
    // Cover the last (or only) token
    if ( str.length() > 0 )
    {
        token = str;
        cout << "Found: " << token << endl;
    }

Outputs >>

Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
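The same two rules (test against npos, and step past the delimiter) can also be applied without erasing from the string at all, by passing a start index to the search functions. The sketch below is one way to do that; splitOnNonLetters is a hypothetical name, and the letter set is limited to ASCII as in the test case above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on every character that is not an ASCII letter.
std::vector<std::string> splitOnNonLetters(const std::string& str)
{
    static const std::string letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::vector<std::string> tokens;

    // start: index of the first letter of the next token, or npos if none is left
    std::string::size_type start = str.find_first_of(letters);
    while (start != std::string::npos)
    {
        // end: index of the first non-letter after start, or npos at end of string
        std::string::size_type end = str.find_first_not_of(letters, start);
        tokens.push_back(str.substr(start, end == std::string::npos ? end : end - start));
        // continue searching after the delimiter run; stop if the string is exhausted
        start = (end == std::string::npos) ? end : str.find_first_of(letters, end);
    }
    return tokens;
}
```

Leaving the input untouched avoids the repeated erase(0, ...) calls, each of which shifts the remaining characters of the string.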

edited Nov 14 '14 at 0:38

answered Nov 14 '14 at 0:25 by sln

Hey, instant split() !!
– sln
Nov 14 '14 at 0:30

what is int k used for ?
– Aistis Taraskevicius
Nov 14 '14 at 0:32

I think using find_if_not with isalpha as the predicate will make this faster and eliminate the need to write out the entire alphabet.
– smac89
Nov 14 '14 at 0:32

@Smac89 Any chance you could show how to use it? find_if returns an iterator instead of a position in the string.
– Aistis Taraskevicius
Nov 14 '14 at 0:35

@AistisTaraskevicius, see my answer
– smac89
Nov 14 '14 at 0:48
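On the iterator-versus-position point raised in the comments: an iterator returned by find_if can be turned into an index with std::distance. In the question's second attempt, the expression find_if(...) != str.end() != string::npos yields a bool, so pos ends up as 1 on every pass, which would explain the single-character tokens. A minimal sketch of the conversion, using a hypothetical helper named firstNonLetter and assuming C++11:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: index of the first non-letter in s, or std::string::npos.
std::string::size_type firstNonLetter(const std::string& s)
{
    std::string::const_iterator it = std::find_if(s.begin(), s.end(),
        [](unsigned char ch) { return std::isalpha(ch) == 0; });
    if (it == s.end())
        return std::string::npos;
    // the distance from begin() to the iterator is the character's index
    return static_cast<std::string::size_type>(std::distance(s.begin(), it));
}
```

In the default C locale std::isalpha returns zero for a space, so whitespace counts as a delimiter here.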

|
show 3 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f26920212%2fc-splitting-string-on-non-alphabetic-characters%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

#include <algorithm>

#include <cctype>

...



template<typename t>

void Tree<t>::readFromFile(std::string filename)

{

    std::string str;

    std::ifstream fileToOpen(filename.c_str());

    if (fileToOpen.is_open())

    {

        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )

        {                

            for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();

                pos = std::find_if(prev, str.end(), isalpha))

            {

                prev = std::find_if_not(pos, str.end(), isalpha);

                std::string token(pos, prev);

                std::transform(token.begin(), token.end(), token.begin(), ::tolower);

                Node<t>* ptr = search(token, root);

                if (ptr != NULL)

                {

                    ptr->count++;

                   // cout<< token << " already in tree.Count "<<ptr->count<<"n";

                }

                else

                {

                    insert(token, root);

                    cout << token << " added to tree.n";

                }

            }

        }

        fileToOpen.close();



    }

    else

        cout<<"Unable to open file!n";

}

Online demo

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46

smac89

11.7k43472

I like it !! Ambitious ..
– sln
Nov 14 '14 at 0:49

Thats exactly what am I doing, my insert function sets count to 1. And I am calling search function just once which returns pointer to node using that pointer I will either increase c ount by one, or add new node if pointer is null
– Aistis Taraskevicius
Nov 14 '14 at 1:16

Your solution is working , but last word wont be counted if there is no delimiter at the end of line, remove last any dot at the end of the line to see what I mean. Same goes if there is only single word on a line, it will be ignored.
– Aistis Taraskevicius
Nov 14 '14 at 1:43

Ok I think it should be fixed now
– smac89
Nov 14 '14 at 3:40

So I ran completion times on both solutions, and this one is about 0.003s or 30~ milliseconds faster. When processing file of 570k characters and total time is around 340-345~ ms, or 370~ for @sln solution.
– Aistis Taraskevicius
Nov 14 '14 at 11:52

add a comment |

up vote
2
down vote

accepted

#include <algorithm>

#include <cctype>

...



template<typename t>

void Tree<t>::readFromFile(std::string filename)

{

    std::string str;

    std::ifstream fileToOpen(filename.c_str());

    if (fileToOpen.is_open())

    {

        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )

        {                

            for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();

                pos = std::find_if(prev, str.end(), isalpha))

            {

                prev = std::find_if_not(pos, str.end(), isalpha);

                std::string token(pos, prev);

                std::transform(token.begin(), token.end(), token.begin(), ::tolower);

                Node<t>* ptr = search(token, root);

                if (ptr != NULL)

                {

                    ptr->count++;

                   // cout<< token << " already in tree.Count "<<ptr->count<<"n";

                }

                else

                {

                    insert(token, root);

                    cout << token << " added to tree.n";

                }

            }

        }

        fileToOpen.close();



    }

    else

        cout<<"Unable to open file!n";

}

Online demo

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46

smac89

11.7k43472

I like it !! Ambitious ..
– sln
Nov 14 '14 at 0:49

Thats exactly what am I doing, my insert function sets count to 1. And I am calling search function just once which returns pointer to node using that pointer I will either increase c ount by one, or add new node if pointer is null
– Aistis Taraskevicius
Nov 14 '14 at 1:16

Your solution is working , but last word wont be counted if there is no delimiter at the end of line, remove last any dot at the end of the line to see what I mean. Same goes if there is only single word on a line, it will be ignored.
– Aistis Taraskevicius
Nov 14 '14 at 1:43

Ok I think it should be fixed now
– smac89
Nov 14 '14 at 3:40

So I ran completion times on both solutions, and this one is about 0.003s or 30~ milliseconds faster. When processing file of 570k characters and total time is around 340-345~ ms, or 370~ for @sln solution.
– Aistis Taraskevicius
Nov 14 '14 at 11:52

add a comment |

up vote
2
down vote

accepted

#include <algorithm>

#include <cctype>

...



template<typename t>

void Tree<t>::readFromFile(std::string filename)

{

    std::string str;

    std::ifstream fileToOpen(filename.c_str());

    if (fileToOpen.is_open())

    {

        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )

        {                

            for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();

                pos = std::find_if(prev, str.end(), isalpha))

            {

                prev = std::find_if_not(pos, str.end(), isalpha);

                std::string token(pos, prev);

                std::transform(token.begin(), token.end(), token.begin(), ::tolower);

                Node<t>* ptr = search(token, root);

                if (ptr != NULL)

                {

                    ptr->count++;

                   // cout<< token << " already in tree.Count "<<ptr->count<<"n";

                }

                else

                {

                    insert(token, root);

                    cout << token << " added to tree.n";

                }

            }

        }

        fileToOpen.close();



    }

    else

        cout<<"Unable to open file!n";

}

Online demo

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46

smac89

11.7k43472

#include <algorithm>

#include <cctype>

...



template<typename t>

void Tree<t>::readFromFile(std::string filename)

{

    std::string str;

    std::ifstream fileToOpen(filename.c_str());

    if (fileToOpen.is_open())

    {

        for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )

        {                

            for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();

                pos = std::find_if(prev, str.end(), isalpha))

            {

                prev = std::find_if_not(pos, str.end(), isalpha);

                std::string token(pos, prev);

                std::transform(token.begin(), token.end(), token.begin(), ::tolower);

                Node<t>* ptr = search(token, root);

                if (ptr != NULL)

                {

                    ptr->count++;

                   // cout<< token << " already in tree.Count "<<ptr->count<<"n";

                }

                else

                {

                    insert(token, root);

                    cout << token << " added to tree.n";

                }

            }

        }

        fileToOpen.close();



    }

    else

        cout<<"Unable to open file!n";

}

Online demo

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46

smac89

11.7k43472

edited Nov 14 '14 at 3:40

answered Nov 14 '14 at 0:46

smac89

11.7k43472

answered Nov 14 '14 at 0:46

smac89

11.7k43472

answered Nov 14 '14 at 0:46

smac89

11.7k43472

I like it !! Ambitious ..
– sln
Nov 14 '14 at 0:49

Thats exactly what am I doing, my insert function sets count to 1. And I am calling search function just once which returns pointer to node using that pointer I will either increase c ount by one, or add new node if pointer is null
– Aistis Taraskevicius
Nov 14 '14 at 1:16

Your solution is working , but last word wont be counted if there is no delimiter at the end of line, remove last any dot at the end of the line to see what I mean. Same goes if there is only single word on a line, it will be ignored.
– Aistis Taraskevicius
Nov 14 '14 at 1:43

Ok I think it should be fixed now
– smac89
Nov 14 '14 at 3:40

So I ran completion times on both solutions, and this one is about 0.003s or 30~ milliseconds faster. When processing file of 570k characters and total time is around 340-345~ ms, or 370~ for @sln solution.
– Aistis Taraskevicius
Nov 14 '14 at 11:52


up vote
1
down vote

Try this test case. There are two problems.

1 - pos is 0 when a delimiter is found at the start of the string (initially or after truncation). This makes the while condition false, so the loop exits. Compare against npos instead.

2 - You have to advance the position past the delimiter when you erase, otherwise it finds the same one over and over.

    string::size_type pos = 0;  // same type as find_first_not_of's result and npos
    string token;
    string str = "Thisis(asdfasdfasdf)and!this)))";

    while ((pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM")) != string::npos)
    {
        if (pos != 0)
        {
            // Found a token
            token = str.substr(0, pos);
            cout << "Found: " << token << endl;
        }
        else
        {
            // Found another delimiter
            // Just move on to next one
        }

        str.erase(0, pos + 1);  // Always remove pos+1 to get rid of delimiter
    }

    // Cover the last (or only) token
    if (str.length() > 0)
    {
        token = str;
        cout << "Found: " << token << endl;
    }

Outputs:

    Found: Thisis
    Found: asdfasdfasdf
    Found: and
    Found: this
    Press any key to continue . . .
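The same two points (test against npos, and always move past the delimiter) can also be handled without erasing from the string, by passing a start index to the find functions. This is only a sketch under assumptions: `splitOnNonLetters` is a hypothetical helper name, and it collects tokens in a vector instead of printing them.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): returns every run of
// ASCII letters found in str, leaving str itself untouched.
std::vector<std::string> splitOnNonLetters(const std::string& str)
{
    static const char* const letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::vector<std::string> tokens;

    // Jump to the first letter, so a delimiter at index 0 cannot end the loop.
    std::string::size_type start = str.find_first_of(letters);
    while (start != std::string::npos)
    {
        // First non-letter after the token, or npos if the token runs to the end.
        std::string::size_type end = str.find_first_not_of(letters, start);
        if (end == std::string::npos)
        {
            // Last (or only) token: no trailing delimiter on this line.
            tokens.push_back(str.substr(start));
            break;
        }
        tokens.push_back(str.substr(start, end - start));

        // Skip the whole run of delimiters before looking for the next token.
        start = str.find_first_of(letters, end);
    }
    return tokens;
}
```

Because nothing is erased, each character is examined a bounded number of times, and the last word on a line is picked up even when no delimiter follows it.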

edited Nov 14 '14 at 0:38

answered Nov 14 '14 at 0:25

sln

26.1k31536

Hey, instant split() !!
– sln
Nov 14 '14 at 0:30

what is int k used for ?
– Aistis Taraskevicius
Nov 14 '14 at 0:32

I think using find_if_not with isalpha as a predicate will make this faster and eliminate the need to write out the entire alphabet
– smac89
Nov 14 '14 at 0:32

@Smac89 Any chance you could show how to use it? find_if returns an iterator instead of a position in the string.
– Aistis Taraskevicius
Nov 14 '14 at 0:35

@AistisTaraskevicius, see my answer
– smac89
Nov 14 '14 at 0:48
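On the iterator question raised in the comments: an iterator into a string can be turned into an index by subtracting `str.begin()`, or tokens can be built directly from iterator pairs so that no index is needed. The following is a minimal sketch, not the code from the other answer; `isLetter`, `firstNonLetterIndex` and `splitWithIsAlpha` are hypothetical names, and `std::find_if_not` requires C++11.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical wrapper: std::isalpha must be given a value representable as
// unsigned char, so cast before calling it.
static bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Converts the iterator returned by find_if_not into a string index,
// or npos when every character is a letter.
std::string::size_type firstNonLetterIndex(const std::string& str)
{
    std::string::const_iterator it =
        std::find_if_not(str.begin(), str.end(), isLetter);
    return it == str.end()
        ? std::string::npos
        : static_cast<std::string::size_type>(it - str.begin());
}

// Splits purely with iterators: find the next letter, then the end of that
// run of letters, and copy the range in between.
std::vector<std::string> splitWithIsAlpha(const std::string& str)
{
    std::vector<std::string> tokens;
    std::string::const_iterator it = str.begin();
    while ((it = std::find_if(it, str.end(), isLetter)) != str.end())
    {
        std::string::const_iterator stop =
            std::find_if_not(it, str.end(), isLetter);
        tokens.push_back(std::string(it, stop));
        it = stop;
    }
    return tokens;
}
```

Note that `std::isalpha` depends on the current C locale, so with a non-default locale it may accept characters beyond A-Z and a-z.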

