How to merge rows from different CSV files in Python
I have scraped some content from a website and saved the data into several CSV files.
For example,
csv1:
row number  time        price
1           2018/01/01  12
2           2018/01/02  15
csv2:
row number  time        address
1           2018/01/01  MI
2           2018/01/02  AR
How can I merge the two CSV files into one CSV file in the format below?
row number  time        price  address
1           2018/01/01  12     MI
2           2018/01/02  15     AR
Can someone help me?
This question has confused me for several days.
Thanks a lot!
python-3.x
edited Nov 21 '18 at 5:36 hygull
asked Nov 21 '18 at 4:29 Yao Qiang
4 Answers
You may use pandas df.append(). You may reference this answer. Note that DataFrame.append() was removed in pandas 2.0; pd.concat() is its replacement.
If the CSVs have different columns, read each one into its own pandas DataFrame, then build a new DataFrame from the columns of those individual DataFrames.
answered Nov 21 '18 at 4:34 Random Nerd
Actually, I have tried this, but there is a problem. The new CSV file has all the data, but values for the same time end up in different rows according to their columns. For example, the new CSV file should have 2 rows, but after append() it has 4 rows.
– Yao Qiang
Nov 21 '18 at 14:53
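The 4-row result described in the comment can be reproduced with a small sketch. The frames below are hypothetical in-memory stand-ins for csv1 and csv2 (the names prices and addresses are not from the question), and pd.concat() is used in place of append(), since it stacks rows the same way. It is only meant to illustrate the difference between stacking and a key-based merge.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for csv1 and csv2 from the question.
prices = pd.DataFrame({
    "row number": [1, 2],
    "time": ["2018/01/01", "2018/01/02"],
    "price": [12, 15],
})
addresses = pd.DataFrame({
    "row number": [1, 2],
    "time": ["2018/01/01", "2018/01/02"],
    "address": ["MI", "AR"],
})

# Row-wise stacking: every input row becomes its own output row, and
# columns missing from one frame are filled with NaN.
stacked = pd.concat([prices, addresses], ignore_index=True)

# Key-based join: rows that share the same key values are combined.
merged = prices.merge(addresses, on=["row number", "time"])

print(stacked.shape)  # (4, 4)
print(merged.shape)   # (2, 4)
```

Stacking keeps price and address on separate rows, which matches the symptom in the comment; merging on the shared keys gives one row per (row number, time) pair.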
For your case, you can also use the pd.merge function of pandas:
In [1]: import pandas as pd
In [2]: df1 = pd.read_csv('/home/mayankp/Documents/Personal/stackoverflow/csv1.csv')
In [3]: df1
Out[3]:
   row_number        time  price
0           1  2018/01/01     12
1           2  2018/01/02     15
In [4]: df2 = pd.read_csv('/home/mayankp/Documents/Personal/stackoverflow/csv2.csv')
In [5]: df2
Out[5]:
   row_number        time address
0           1  2018/01/01      MI
1           2  2018/01/02      AR
In [6]: pd.merge(df1, df2, on=['row_number', 'time'])
Out[6]:
   row_number        time  price address
0           1  2018/01/01     12      MI
1           2  2018/01/02     15      AR
answered Nov 21 '18 at 4:50 Mayank Porwal
Very helpful! Thank you very much!
– Yao Qiang
Nov 21 '18 at 23:54
Sorry, there is another problem. In my dataset, not all columns have the same number of rows. For example, price starts from 2018/01/01, but address starts from 2017/11/01. In this situation, the new CSV file only starts from 2018/01/01 and drops the address data from 2017/11/01 to 2017/12/31. How can I deal with this problem?
– Yao Qiang
Nov 22 '18 at 0:06
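One possible way to keep the earlier address rows is an outer merge. The sketch below is not taken from the thread: the data (including the "TX" row) is made up for illustration, and it merges on time only, on the assumption that row numbers would not line up between files that start on different dates.

```python
import pandas as pd

# Hypothetical data: the address history starts earlier than the price history.
prices = pd.DataFrame({
    "time": ["2018/01/01", "2018/01/02"],
    "price": [12, 15],
})
addresses = pd.DataFrame({
    "time": ["2017/12/31", "2018/01/01", "2018/01/02"],
    "address": ["TX", "MI", "AR"],
})

# how="inner" (the default) keeps only times present in both frames.
inner = prices.merge(addresses, on="time")

# how="outer" keeps every time from either frame; values missing on one
# side become NaN. Sorting keeps the output in date order.
outer = prices.merge(addresses, on="time", how="outer").sort_values("time")

print(len(inner))  # 2
print(len(outer))  # 3
```

With how="outer", the 2017/12/31 row survives with an empty price, which should be written as an empty cell by to_csv().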
Try the following:
import pandas as pd
csv1 = pd.read_csv("file1.csv")
csv2 = pd.read_csv("file2.csv")
csv_out = csv1.merge(csv2, on=['row number','time'])
csv_out.to_csv("file_out.csv", index=False)
Hope it helps.
answered Nov 21 '18 at 4:55 TeeKea
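Since the question mentions several CSV files, the same merge can be folded over a list of frames. This is only a sketch under assumptions: the StringIO objects stand in for real file paths, and the third file with a volume column is hypothetical.

```python
from functools import reduce
from io import StringIO

import pandas as pd

# Hypothetical CSV contents; in practice these would be file paths.
sources = [
    StringIO("row number,time,price\n1,2018/01/01,12\n2,2018/01/02,15\n"),
    StringIO("row number,time,address\n1,2018/01/01,MI\n2,2018/01/02,AR\n"),
    StringIO("row number,time,volume\n1,2018/01/01,100\n2,2018/01/02,250\n"),
]
keys = ["row number", "time"]

frames = [pd.read_csv(src) for src in sources]

# Fold the list into a single frame by merging pairwise on the shared keys.
combined = reduce(lambda left, right: left.merge(right, on=keys), frames)

print(list(combined.columns))
```

The result can then be saved with combined.to_csv("file_out.csv", index=False), as in the answer above.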
Very helpful! Thank you very much!
– Yao Qiang
Nov 21 '18 at 23:53
Sorry, there is another problem. In my dataset, not all columns have the same number of rows. For example, price starts from 2018/01/01, but address starts from 2017/11/01. In this situation, the new CSV file only starts from 2018/01/01 and drops the address data from 2017/11/01 to 2017/12/31. How can I deal with this problem?
– Yao Qiang
Nov 22 '18 at 0:06
I see. Can you please update your question to include these cases?
– TeeKea
Nov 22 '18 at 0:13
I have solved the problem. Thanks a lot.
– Yao Qiang
Nov 22 '18 at 0:17
Great. Now you just need to mark the answer that best fits your needs as Accepted. Thanks.
– TeeKea
Nov 22 '18 at 0:20
I know you have CSV files, but here I create the DataFrames manually to match the data in your question.
DataFrame: https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html for more about the parameters of the merge() method defined on DataFrame.
Below is the code that you're looking for.
>>> import pandas as pd
>>>
>>> dri = pd.date_range("2018/01/01", periods=2, freq="D")
>>>
>>> df = pd.DataFrame({"time": dri, "price": [12, 15]}, index=[1, 2])
>>> df
        time  price
1 2018-01-01     12
2 2018-01-02     15
>>>
>>> df2 = pd.DataFrame({"time": dri, "address": ["MI", "AR"]}, index=[1, 2])
>>> df2
        time address
1 2018-01-01      MI
2 2018-01-02      AR
>>>
>>> # https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
...
>>>
>>> df.merge(df2, on="time", how="inner", left_index=True)
        time  price address
1 2018-01-01     12      MI
2 2018-01-02     15      AR
>>>
By default, pandas does not show a name for the index on the left of a DataFrame. If you want the index to be labelled as in your question (in your case, row number), see the statements below, run in the Python interactive terminal.
>>> df.index.name = "row number"
>>> df
                 time  price
row number
1          2018-01-01     12
2          2018-01-02     15
>>>
>>> df2.index.name = "row number"
>>>
>>> df2
                 time address
row number
1          2018-01-01      MI
2          2018-01-02      AR
>>>
>>> df.merge(df2, on="time", how="inner", left_index=True)
                 time  price address
row number
1          2018-01-01     12      MI
2          2018-01-02     15      AR
>>>
edited Nov 21 '18 at 5:56
answered Nov 21 '18 at 5:14 hygull
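As far as I know, newer pandas releases raise a MergeError when on= is combined with left_index=True, so the merge call in the session above may not run unchanged there. A minimal sketch of one alternative that keeps the row number index is shown below; it assumes the same two frames as the answer, and the names dates, keys and result are only illustrative.

```python
import pandas as pd

# Same data as in the answer, rebuilt so the example is self-contained.
dates = pd.date_range("2018/01/01", periods=2, freq="D")
df = pd.DataFrame({"time": dates, "price": [12, 15]}, index=[1, 2])
df2 = pd.DataFrame({"time": dates, "address": ["MI", "AR"]}, index=[1, 2])
df.index.name = "row number"
df2.index.name = "row number"

# Turn the index into a regular column, merge on both keys, then restore it.
keys = ["row number", "time"]
result = (
    df.reset_index()
    .merge(df2.reset_index(), on=keys, how="inner")
    .set_index("row number")
)
print(result)
```

Merging on both row number and time mirrors the key list used in the other answers, and set_index() puts the row number label back on the left.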
Very helpful! Thank you very much!
– Yao Qiang
Nov 21 '18 at 23:54
Sorry, there is another problem. In my dataset, not all columns have the same number of rows. For example, price starts from 2018/01/01, but address starts from 2017/11/01. In this situation, the new CSV file only starts from 2018/01/01 and drops the address data from 2017/11/01 to 2017/12/31. How can I deal with this problem?
– Yao Qiang
Nov 22 '18 at 0:06
I have solved the problem. Thanks a lot.
– Yao Qiang
Nov 22 '18 at 0:17
Okay @Yao, please provide the output format you expect so that I can understand your intention better. You can create a gist on GitHub and send the link with the input and output formats, or, if you prefer, add a short description to the question. Thank you for replying.
– hygull
Nov 22 '18 at 4:55
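For the follow-up about differing date ranges: how="inner" keeps only the times present in both frames, which is why the earlier address rows disappear. One possible fix, sketched here with made-up data (the "TX" row and the 2017-12-31 date are purely illustrative), is how="outer", which keeps every time from either side and fills the missing values with NaN.

```python
import pandas as pd

# Hypothetical data: the address series starts one day before the price series.
prices = pd.DataFrame(
    {"time": pd.to_datetime(["2018-01-01", "2018-01-02"]), "price": [12, 15]}
)
addresses = pd.DataFrame(
    {
        "time": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02"]),
        "address": ["TX", "MI", "AR"],
    }
)

# how="outer" keeps times that appear in either frame; price is NaN
# wherever only an address exists for that time.
merged = (
    prices.merge(addresses, on="time", how="outer")
    .sort_values("time")
    .reset_index(drop=True)
)
print(merged)
```

If only the address rows must all survive, how="right" (with addresses as the right-hand frame) would be the narrower choice.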