A newline/linebreak issue when writing parsed text to .csv file in python3
I have an issue with parsing text extracted from HTML with BeautifulSoup and writing it to a .csv file.
I'm parsing a page with data such as Title, Date, Description, Info.
Here is an example Description with the exact structure parsed from the web page. It has <br/> tags and double spaces:
<p>Hello World <br/>
<br/>
Key points <br/>
<br/>
- Point number one <br/>
- Point number two <br/>
- Point number three </p>
So I managed to extract it as plain text using .text.strip(). Now it's:
Hello World
Key points
- Point number one
- Point number two
- Point number three
Then I want to save the results to a .csv file, one result per row and each value in its own cell:
Title, Date, Description, Info
Title, Date, Description, Info
Title, Date, Description, Info
For this I create a file, write the headers, and start writing to it in a for loop:
filename = "scraping.csv"
f = open(filename, "w")
headers = "Title, Date, Description, Info\n"
f.write(headers)
for article in articles:
    ...
    f.write(title + "," + date + "," + description + "," + info + "\n")
f.close()
What I end up with is a .csv file with all the information.
The problem is that when the description is written to the file, it breaks across lines:
Title, Date,
Des
crip
tion, Info
Title, Date,
Des
crip
tion, Info
Title, Date,
Des
crip
tion, Info
If I write everything to the file except the description, all is good.
How do I save this description to a single cell and ignore all unwanted newlines/line breaks?
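The symptom can be reproduced without any scraping. The following is a minimal sketch, assuming made-up sample values (the variable contents and the use of io.StringIO in place of a real file are illustrative, not from the original code). It shows that raw newlines inside an unquoted description are read back as record separators:

```python
import csv
import io

# Hypothetical sample values standing in for one scraped article.
title, date, info = "Title", "Date", "Info"
description = "Hello World\nKey points\n- Point number one"

# Naive concatenation, as in the question: the newlines inside
# description are written raw, so each one ends the CSV record early.
naive = title + "," + date + "," + description + "," + info + "\n"

# Parse the text back the way a spreadsheet or csv.reader would.
rows = list(csv.reader(io.StringIO(naive, newline="")))
print(len(rows))  # number of records found for a single article
print(rows[0])    # the first record stops at the first line of the description
```

One article turns into several records, which matches the broken layout shown above.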
UPDATE:
Based on the suggestion from @ewwink, this combination helped remove the unwanted line breaks:
description = re.sub(r"[\r\n]+", " ", description)
Unfortunately the description was then written to the .csv file's cell as one single line without formatting. But I was able to make newlines in the .csv file with an invisible pilcrow symbol by replacing \r\n:
pilcrow = """
"""
description = re.sub(r"[\r\n]+", pilcrow, description)
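The triple-quoted string above is simply a string containing one line break, so the substitution collapses every run of \r and \n into a single "\n". A rough sketch of the same idea follows, assuming a made-up description with Windows-style line endings and using the standard csv module (not part of the original code) to quote the field so the line breaks stay inside one cell:

```python
import csv
import io
import re

# Hypothetical description with mixed and repeated line endings.
description = "Hello World\r\n\r\nKey points\r\n- Point number one\r\n- Point number two"

# Collapse each run of CR/LF characters into one "\n", as in the update.
normalized = re.sub(r"[\r\n]+", "\n", description)

# csv.writer quotes any field that contains a line break,
# so the whole description stays in a single cell.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["Title", "Date", normalized, "Info"])

# Read it back to confirm the record still has four fields.
buf.seek(0)
row = next(csv.reader(buf))
print(len(row))
print(row[2])
```

Whether the line breaks show up inside the cell still depends on the program that opens the file. The CSV text itself keeps them within the quoted field.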
python-3.x macos csv web-scraping beautifulsoup
Might be a bit ugly, but for now did you try f.write(title.strip() + "," + date.strip() + "," + description.strip() + "," + info.strip() + "\n")
to make sure the various strings are indeed clear of all line breaks?
– Guimoute
Nov 21 '18 at 16:16
Just before writing it to a file, can you print description?
– BlueSheepToken
Nov 21 '18 at 16:26
Yes, and it will be just fine, no tags, just text with formatting
– Imeleges
Nov 21 '18 at 16:30
@Guimoute, the text is already passed from the list as str(description[0].text.strip())
and it didn't help
– Imeleges
Nov 21 '18 at 16:40
Nice find, I just learned that. Maybe the last time I tried, the error came from unescaped quotes. But the secret is still the double quotes; you can try removing the regex replace line.
– ewwink
Nov 21 '18 at 20:46
asked Nov 21 '18 at 15:52 by Imeleges, edited Nov 21 '18 at 20:19
1 Answer
To save it as a .csv file you need to wrap each value in double quotes, so that a , inside a value will not break your CSV columns, and escape each " inside a value with "":
for article in articles:
    ...
    # description = re.sub(r"[\r\n]+", " ", description)
    description = description.replace('"', '""')
    rows = '"%s","%s","%s","%s"\n' % (title, date, description, info)
    f.write(rows)
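As an alternative to formatting the quotes by hand, Python's standard csv module applies the same quoting and quote-doubling rules. This is only a sketch under assumptions: the write_articles helper and the sample record are hypothetical, and io.StringIO stands in for a file opened with open("scraping.csv", "w", newline="").

```python
import csv
import io

def write_articles(f, articles):
    """Write (title, date, description, info) tuples to an open text file."""
    # QUOTE_ALL wraps every field in double quotes and doubles any
    # double quote inside a field, like the manual formatting above.
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(["Title", "Date", "Description", "Info"])
    writer.writerows(articles)

# Hypothetical record with a comma, double quotes and a line break.
articles = [("Title", "Date", 'Said "hi", then left\n- Point number one', "Info")]

# StringIO stands in for a real file opened with newline="".
buf = io.StringIO(newline="")
write_articles(buf, articles)
csv_text = buf.getvalue()

# Reading the text back recovers the description unchanged.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(rows[1][2] == articles[0][2])
```

Opening the real file with newline="" matters because the csv module writes its own line terminators; without it, extra blank lines can appear on some platforms.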
Thank you ewwink, your suggestion is very helpful! I modified my code and ran it, and now my .csv file is almost correct. It did remove the unwanted line breaks, but now the description block is written as one single line... Is there a way to keep the formatting in each cell? I have text with bullet points and would like to keep it as it is, instead of correcting it manually.
– Imeleges
Nov 21 '18 at 18:18
You cannot have a new line in CSV, but XLS can. Maybe you can replace each newline with a symbol, and later replace the symbol with a newline again when needed.
– ewwink
Nov 21 '18 at 18:23
Once again, thanks for your suggestion, I managed to make newlines in my .csv file and updated my question with the solution.
– Imeleges
Nov 21 '18 at 20:21
great, and you're welcome.
– ewwink
Nov 21 '18 at 20:22
answered Nov 21 '18 at 16:56 by ewwink, edited Nov 21 '18 at 20:47