A newline/line-break issue when writing parsed text to a .csv file in Python 3

I have an issue with parsing text extracted from HTML with BeautifulSoup and writing it to a .csv file.



I'm parsing a page with fields such as Title, Date, Description and Info.



Here is an example Description, with the exact structure parsed from the web page. It contains <br/> tags and double spaces:



<p>Hello World <br/>
<br/>
Key points <br/>
<br/>
- Point number one <br/>
- Point number two <br/>
- Point number three </p>


I managed to extract it as plain text using .text.strip(). Now it looks like this:



Hello World 

Key points

- Point number one
- Point number two
- Point number three
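
To see where those line breaks come from, here is a rough stand-in for BeautifulSoup's .text that uses only the standard library's html.parser (an assumption made so the sketch runs without bs4; TextCollector is a hypothetical name). The <br/> tags are dropped, but the newline characters that follow them in the page source stay in the string.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes and drops tags, roughly like BeautifulSoup's .text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# The Description markup from the question, including its source line breaks.
html_snippet = (
    "<p>Hello World <br/>\n"
    "<br/>\n"
    "Key points <br/>\n"
    "<br/>\n"
    "- Point number one <br/>\n"
    "- Point number two <br/>\n"
    "- Point number three </p>"
)

collector = TextCollector()
collector.feed(html_snippet)
collector.close()  # flush any buffered text
text = "".join(collector.parts).strip()

print(repr(text))
```

The printed repr shows literal newline characters inside the text, and those are what later split the CSV row.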


Then I want to save the results to a .csv file, one row per article with each field in its own cell:



Title, Date, Description, Info
Title, Date, Description, Info
Title, Date, Description, Info


To do this, I create a file, write the headers, and then write each article in a for loop:



filename = "scraping.csv"
f = open(filename, "w")

headers = "Title, Date, Description, Info\n"
f.write(headers)
for article in articles:
    ...
    f.write(title + "," + date + "," + description + "," + info + "\n")
f.close()
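
A small, self-contained way to see why this loop breaks the rows: the sketch below (with made-up sample values) builds one line the same way and reads it back with Python's csv module, which treats every unquoted newline as the end of a record.

```python
import csv
import io

# Made-up sample values; the description contains newlines like the scraped text.
title, date, info = "Title", "2018-11-21", "Info"
description = "Hello World\n\nKey points\n\n- Point number one"

# Build the line exactly the way the question's loop does.
naive_line = title + "," + date + "," + description + "," + info + "\n"

# Parse it back as CSV: each unquoted newline ends a record.
records = list(csv.reader(io.StringIO(naive_line)))
for record in records:
    print(record)
```

One logical row comes back as several records, with the description's pieces scattered across them, which matches the broken output shown next.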


At the end I have a .csv file with all the information. The problem is that when the description is written to the file, it breaks the row across several lines:



Title, Date, 
Des
crip
tion, Info
Title, Date,
Des
crip
tion, Info
Title, Date,
Des
crip
tion, Info


If I write everything except the description, the file is fine.



How do I save this description in a single cell and ignore all the unwanted newlines/line breaks?



UPDATE:

Based on the suggestion from @ewwink, this combination helped to remove the unwanted line breaks:



description = re.sub(r"[\r\n]+", " ", description)
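
For reference, the pattern [\r\n]+ matches any run of carriage returns and line feeds, so each run collapses into a single space. A quick check on a hypothetical string shaped like the scraped description:

```python
import re

# Hypothetical description with both Windows (\r\n) and Unix (\n) line breaks.
description = "Hello World \r\n\r\nKey points\n\n- Point number one\n- Point number two"

# Replace each run of \r and/or \n characters with a single space.
flattened = re.sub(r"[\r\n]+", " ", description)
print(flattened)
```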


Unfortunately, the description was then written to the cell as one single line, without formatting. However, I was able to get newlines in the .csv file with an invisible pilcrow symbol by replacing \r\n with it:



pilcrow = """
"""
description = re.sub(r"[\r\n]+", pilcrow, description)

python-3.x macos csv web-scraping beautifulsoup

edited Nov 21 '18 at 20:19
asked Nov 21 '18 at 15:52 by Imeleges

  • It might be a bit ugly, but for now, did you try f.write(title.strip() + "," + date.strip() + "," + description.strip() + "," + info.strip() + "\n") to make sure the strings are actually free of line breaks?

    – Guimoute
    Nov 21 '18 at 16:16











  • Just before writing it to a file, can you print description?

    – BlueSheepToken
    Nov 21 '18 at 16:26











  • Yes, and it looks fine: no tags, just text with formatting.

    – Imeleges
    Nov 21 '18 at 16:30













  • @Guimoute, the text is already taken from the list as str(description[0].text.strip()), and that didn't help.

    – Imeleges
    Nov 21 '18 at 16:40











  • Nice find. I just realized that when I tried it last time, the error probably came from unescaped quotes. Still, the key is the double quotes; try it, and you can remove the regex replace line.

    – ewwink
    Nov 21 '18 at 20:46
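
On the .strip() suggestion above: str.strip() only removes whitespace at the start and end of a string, so newlines inside the description survive it, which is consistent with it not helping here. A minimal check with a made-up string:

```python
description = "  Hello World \n\nKey points\n- Point number one  "

# strip() trims only the outer whitespace; the inner newlines are untouched.
stripped = description.strip()
print(repr(stripped))
```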

1 Answer

To save it as a .csv file, you need to wrap each value in double quotes, so that a comma inside a value will not break your CSV columns, and escape any " inside a value as "".



for article in articles:
    ...
    # description = re.sub(r"[\r\n]+", " ", description)
    description = description.replace('"', '""')
    rows = '"%s","%s","%s","%s"\n' % (title, date, description, info)
    f.write(rows)
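
The same rule (wrap each value in double quotes, double any embedded quote) can be applied to every field, not only the description, so a quote or comma in the title or info cannot break the row either. A minimal sketch, with hypothetical helper names quote_field and build_row:

```python
def quote_field(value):
    """Wrap a value in double quotes, doubling any embedded double quotes."""
    return '"' + str(value).replace('"', '""') + '"'


def build_row(*fields):
    """Join quoted fields with commas and end the record with a newline."""
    return ",".join(quote_field(field) for field in fields) + "\n"


# Made-up values: a quote and a newline in the description, a comma in info.
row = build_row("Title", "2018-11-21", 'Say "hi"\n- Point one', "Info, extra")
print(row)
```

Inside the loop this would become f.write(build_row(title, date, description, info)). With every field quoted, a reader that follows the usual CSV quoting rules keeps the newline inside the description instead of starting a new record.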

edited Nov 21 '18 at 20:47
answered Nov 21 '18 at 16:56 by ewwink

  • Thank you, ewwink, your suggestion is very helpful! I modified my code and ran it, and now my .csv file is almost correct. It removed the unwanted line breaks, but now the description block is written as one single line. Is there a way to keep the formatting in each cell? I have text with bullet points and would like to keep it as it is instead of correcting it manually.

    – Imeleges
    Nov 21 '18 at 18:18













  • You cannot have a newline in CSV, but XLS can. Maybe you can replace newlines with a symbol and later, when needed, replace it back with a newline.

    – ewwink
    Nov 21 '18 at 18:23











  • Once again, thanks for your suggestion, I managed to make newlines in my .csv file and updated my question with the solution.

    – Imeleges
    Nov 21 '18 at 20:21











  • Great, and you're welcome.

    – ewwink
    Nov 21 '18 at 20:22
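
Regarding keeping the formatting: the CSV format as described in RFC 4180 allows line breaks inside a double-quoted field, and Python's standard csv module applies the quoting and quote doubling from the answer automatically. A sketch assuming the scraped fields are already plain strings (the sample article below is made up):

```python
import csv

# Made-up scraped record; in the question these values come from BeautifulSoup.
articles = [
    {
        "title": "First article",
        "date": "2018-11-21",
        "description": 'Hello World\n\nKey points\n\n- Point "one"\n- Point two',
        "info": "Info, with a comma",
    },
]

# newline="" is what the csv docs recommend, so the writer controls line endings.
with open("scraping.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)  # default QUOTE_MINIMAL quotes fields with commas, quotes or newlines
    writer.writerow(["Title", "Date", "Description", "Info"])
    for article in articles:
        writer.writerow(
            [article["title"], article["date"], article["description"], article["info"]]
        )

# Reading the file back returns the description with its line breaks intact.
with open("scraping.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows[1][2])
```

Whether a particular spreadsheet application displays those line breaks inside the cell depends on the application and on how the file is imported.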