A newline/linebreak issue when writing parsed text to a .csv file in Python 3

I have an issue with parsing text extracted from HTML with BeautifulSoup and writing it to a .csv file.

I'm parsing a page with data such as Title, Date, Description, Info.

Here is an example of a Description with the exact structure parsed from a web page. It contains these
tags and double spaces:

<p>Hello World <br/>

<br/>

Key points <br/>

<br/>

 -  Point number one  <br/>

 -  Point number two    <br/>

 -  Point number three  </p>

So I managed to extract it as plain text by using .text.strip(). Now it's:

Hello World 



Key points 



 -  Point number one  

 -  Point number two    

 -  Point number three

Then I want to save results to a .csv file, each result to a new cell:

Title, Date, Description, Info

Title, Date, Description, Info

Title, Date, Description, Info

For this I create a file, write the headers, and then write to it in a for loop:

filename = "scraping.csv"
f = open(filename, "w")

headers = "Title, Date, Description, Info\n"
f.write(headers)

for article in articles:
    ...
    f.write(title + "," + date + "," + description + "," + info + "\n")

f.close()

What I end up with is the .csv file with all the information.
The problem is that when the description is written to the file, it breaks across lines:

Title, Date, 

Des

crip

tion, Info

Title, Date, 

Des

crip

tion, Info    

Title, Date, 

Des

crip

tion, Info

If I write everything except the description to the file, all is good.

How do I save this description to a single cell and ignore all unwanted newlines/line breaks?
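The breakage described above can be reproduced without any scraping. The following is a minimal sketch, not the original script: the sample values and the helper name `naive_row` are made up for illustration. It joins the fields with commas the same way the loop above does, then parses the result with the standard `csv` module to show that every unquoted line break starts a new record.

```python
import csv
import io

def naive_row(title, date, description, info):
    # Same approach as the loop in the question: plain concatenation, no quoting.
    return title + "," + date + "," + description + "," + info + "\n"

# Hypothetical sample data; the description contains embedded line breaks.
description = "Hello World \n\nKey points \n\n -  Point number one"
raw = naive_row("Title", "Date", description, "Info")

# A CSV parser treats every unquoted newline as the end of a record.
# Blank lines come back as empty lists, so they are filtered out here.
parsed = [row for row in csv.reader(io.StringIO(raw)) if row]

for row in parsed:
    print(row)
```

One logical row comes back as three records, which matches the broken layout shown in the question.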

UPDATE:

Based on the suggestion from @ewwink, this combination helped to remove unwanted line breaks:

description = re.sub(r"[\r\n]+", " ", description)

Unfortunately, the description was then written to the .csv file's cell as one single line without formatting. But I was able to make newlines in the .csv file with an invisible pilcrow symbol by replacing \r\n:

pilcrow = """

    """

description = re.sub(r"[\r\n]+", pilcrow, description)
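If the goal is to keep the bullet points on separate lines inside the cell rather than flatten them, one option is to collapse each run of line breaks into a single "\n" and trim the stray spaces. This is a sketch under that assumption; `tidy_description` is a hypothetical helper, not part of the original code, and the sample string is made up. The result still contains newlines, so it only stays in one cell if the field is quoted when it is written.

```python
import re

def tidy_description(text):
    # Collapse each run of \r and \n, plus the spaces/tabs around it, into one "\n".
    text = re.sub(r"[ \t]*[\r\n]+[ \t]*", "\n", text)
    # Collapse repeated spaces/tabs inside a line into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Trim leading and trailing whitespace of the whole description.
    return text.strip()

# Hypothetical input resembling the output of .text.strip() in the question.
sample = "Hello World \r\n\r\nKey points \n\n -  Point number one  \n -  Point number two    "
print(tidy_description(sample))
```

Note that this also drops the indentation in front of each "-", which may or may not be wanted.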

edited Nov 21 '18 at 20:19

asked Nov 21 '18 at 15:52

Imeleges

Might be a bit ugly, but for now did you try f.write(title.strip() + "," + date.strip() + "," + description.strip() + "," + info.strip() + "\n") to make sure the various strings are indeed clear of all line breaks?

– Guimoute
Nov 21 '18 at 16:16

Just before writing it to a file, can you print description?

– BlueSheepToken
Nov 21 '18 at 16:26

Yes, and it will be just fine, no tags, just text with formatting

– Imeleges
Nov 21 '18 at 16:30

@Guimoute, text is already passed from the list str(description[0].text.strip()) and it didn't help

– Imeleges
Nov 21 '18 at 16:40

Nice find, I just learned that. Maybe last time I tried it, the error came from unescaped quotes. But the secret is still the double quotes; try it, you can remove the regex replace line.

– ewwink
Nov 21 '18 at 20:46

python-3.x macos csv web-scraping beautifulsoup


1 Answer

To save it as a .csv file, you need to wrap each value in double quotes so that a comma inside a value will not break your CSV columns, and escape each " as "".

for article in articles:
    ...
    # description = re.sub(r"[\r\n]+", " ", description)
    description = description.replace('"', '""')
    rows = '"%s","%s","%s","%s"\n' % (title, date, description, info)
    f.write(rows)
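The manual quoting above is what Python's standard `csv` module does automatically. As an alternative sketch (the sample row and the `write_rows` helper are assumptions for illustration, not part of the original answer), `csv.writer` quotes any field that contains a comma, a double quote, or a line break, so a multi-line description stays in a single cell and reads back unchanged.

```python
import csv
import io

def write_rows(fileobj, rows):
    # csv.writer quotes fields that contain commas, quotes, or line breaks
    # and doubles any embedded double quotes.
    writer = csv.writer(fileobj)
    writer.writerow(["Title", "Date", "Description", "Info"])
    writer.writerows(rows)

# Hypothetical scraped row with a multi-line description.
rows = [("Title", "Date", 'Hello World\nKey points\n- Point "one", first', "Info")]

# An in-memory buffer keeps the sketch self-contained. For a real file, use
# open("scraping.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_rows(buffer, rows)

# Reading it back gives one header row and one data row with four fields.
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back[1][2])
```

Opening the real file with `newline=""` follows the recommendation in the `csv` module documentation. Most spreadsheet applications should then show the quoted line breaks inside one cell, though that depends on the application.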

edited Nov 21 '18 at 20:47

answered Nov 21 '18 at 16:56

ewwink

11.8k22239

Thank you ewwink, your suggestion is very helpful! I modified my code and tried to run it, now I have my .csv file almost correct. It did remove unwanted line breaks, but now it's (description block) written in one single line... Is there a way to keep formatting in each cell? I have text with bullet points and would like to keep it as it is, instead of manually correcting

– Imeleges
Nov 21 '18 at 18:18

You cannot have a new line in CSV, but XLS can. Maybe you can replace the newline with a symbol and later, when needed, replace it with a newline again.

– ewwink
Nov 21 '18 at 18:23

Once again, thanks for your suggestion, I managed to make newlines in my .csv file and updated my question with the solution.

– Imeleges
Nov 21 '18 at 20:21

great, and you're welcome.

– ewwink
Nov 21 '18 at 20:22
