Finding equal values in 2 CSV files with Python [duplicate]

Question

I have 2 Excel files, each containing 2 columns, as shown below:

file1:

name     price
a        1
b        78
f        54
g        32
t        2

file2:

name     price
f        4
x        23
e        76
a        4
o        0

I want to read file1, find the rows where the name in file2 equals a name in file1, and write the matching file1 prices into the price column of file2. Names in file2 that do not appear in file1 should keep their current price.

file2 should look like this:

file2:

name     price
f        54
x        23
e        76
a        1
o        0

I have tried the following (I've saved the Excel files as CSV):

import pandas as pd
import numpy as np

df1  = pd.DataFrame(pd.read_csv('file1.csv'))
df2  = pd.DataFrame(pd.read_csv('file2.csv'))

if(df1[name] in df2[name]):
  df2[price] = df1[price]
  
df2.to_csv('file2.csv')

These look like TSV files maybe, not CSV. Can you please edit to clarify this, either by changing the delimiter to comma if that's what you actually have in the files, or else change the wording to not suggest that you have Comma-Separated Values format if you don't really. (Of course, if this misunderstanding is the cause of your problem, probably simply delete this question as unreproducible.) — tripleee, Commented Nov 12, 2023 at 11:15
Using Pandas and Numpy for simple file manipulation is severe overkill, though if you need them anyway for other reasons, I suppose they offer some limited additional conveniences over the Python standard library. — tripleee, Commented Nov 12, 2023 at 11:18
@tripleee What is the shortest way to do this? Is there any way to manipulate CSV files without converting them to a DataFrame? — Xara, Commented Nov 12, 2023 at 11:56
There are some syntax errors on your side. pd.read_csv already returns a pd.DataFrame object, so there is no need to wrap it in pd.DataFrame. See the duplicate post; it will be really helpful now and in the future. — Umar.H, Commented Nov 12, 2023 at 12:04
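
On the question raised in the comments about working without a DataFrame, a minimal sketch using only the standard-library csv module might look like the following. It assumes the files really are comma-delimited with a name,price header; the helper name update_prices and the temporary demo files are hypothetical and only there to make the example self-contained.

```python
import csv
import os
import tempfile


def update_prices(source_path, target_path):
    """Overwrite prices in target_path with prices from source_path where names match."""
    # Build a name -> price lookup from the source file.
    with open(source_path, newline='') as f:
        prices = {row['name']: row['price'] for row in csv.DictReader(f)}

    # Read every row of the target file into memory.
    with open(target_path, newline='') as f:
        rows = list(csv.DictReader(f))

    # Replace the price when the name exists in the source; otherwise keep it.
    for row in rows:
        row['price'] = prices.get(row['name'], row['price'])

    # Write the updated rows back to the target file.
    with open(target_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'price'])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demo with temporary files holding the data from the question (assumed comma-delimited).
tmpdir = tempfile.mkdtemp()
file1 = os.path.join(tmpdir, 'file1.csv')
file2 = os.path.join(tmpdir, 'file2.csv')
with open(file1, 'w', newline='') as f:
    f.write('name,price\na,1\nb,78\nf,54\ng,32\nt,2\n')
with open(file2, 'w', newline='') as f:
    f.write('name,price\nf,4\nx,23\ne,76\na,4\no,0\n')

updated = update_prices(file1, file2)
for row in updated:
    print(row['name'], row['price'])
```

Note that the csv module reads every field as a string, so the prices are copied over as text rather than converted to numbers.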

gtomer · Accepted Answer · 2023-11-12 11:33:42Z


First, you need to merge:

df2 = df2.merge(df1, on='name', how='left')

Then you can use np.where:

import numpy as np
df2['price'] = np.where(df2['price_y'].isnull(), df2['price_x'], df2['price_y'])

Later you can delete the merged columns:

df2.drop(columns=['price_x', 'price_y'], inplace=True)

This is the result:

name     price
f        54
x        23
e        76
a        1
o        0
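
Put together, the steps above can be run end to end as in the sketch below. The frames are built inline from the question's data instead of being read from files, and the names merged and result are only illustrative. One assumption beyond the answer: the left merge introduces NaN for unmatched names, which makes the price column float, so the sketch casts back to int to match the output shown.

```python
import numpy as np
import pandas as pd

# Data from the question, built inline so the example is self-contained.
df1 = pd.DataFrame({'name': list('abfgt'), 'price': [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({'name': list('fxeao'), 'price': [4, 23, 76, 4, 0]})

# Left merge keeps every row of df2, in df2's order; unmatched names get NaN in price_y.
merged = df2.merge(df1, on='name', how='left')

# Take df1's price (price_y) where present, otherwise keep df2's price (price_x).
merged['price'] = np.where(
    merged['price_y'].isnull(), merged['price_x'], merged['price_y']
).astype(int)  # cast back to int because NaN forced the merged column to float

# Drop the helper columns created by the merge.
result = merged.drop(columns=['price_x', 'price_y'])
print(result)
```

To persist the result, result.to_csv('file2.csv', index=False) avoids writing the index as an extra column.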


Panda Kim · Accepted Answer · 2023-11-12 12:50:53Z

Code

Use combine_first:

df1.set_index('name').combine_first(df2.set_index('name'))\
   .reindex(df2['name']).reset_index()

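It seems the questioner is not familiar with how pandas handles in-place modification (the expression above returns a new DataFrame and leaves df2 unchanged, so assign the result to keep it), so I attached the output as an image.

Another way: use concat and groupby + last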

pd.concat([df2, df1]).groupby('name').last()\
  .reindex(df2['name']).reset_index()

This gives the same result.

Another way: merge and pop

df2.merge(df1, on='name', how='left', suffixes=['1', ''])\
   .assign(price=lambda x: x['price'].fillna(x.pop('price1')).astype('int'))

This gives the same result.

Example Code

import pandas as pd
import io

file1 = '''name     price
a        1
b        78
f        54
g        32
t        2
'''

file2 = '''name     price
f        4
x        23
e        76
a        4
o        0
'''
# Raw strings avoid the invalid-escape warning that '\s+' triggers on recent Python versions.
df1 = pd.read_csv(io.StringIO(file1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(file2), sep=r'\s+')
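
As a quick check, the three variants from this answer can be run side by side on the same data. This is only a sketch: the wrapper function names are hypothetical, the frames are rebuilt inline so the block stands alone, and prices are cast to int before comparison because the resulting dtype may differ between variants and pandas versions.

```python
import pandas as pd

# Same data as the example code above, built inline.
df1 = pd.DataFrame({'name': list('abfgt'), 'price': [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({'name': list('fxeao'), 'price': [4, 23, 76, 4, 0]})


def via_combine_first(df1, df2):
    # df1's prices take priority; df2 fills in names missing from df1.
    return (df1.set_index('name')
               .combine_first(df2.set_index('name'))
               .reindex(df2['name'])
               .reset_index())


def via_concat_groupby(df1, df2):
    # df1 is concatenated last, so last() picks df1's price when a name is in both.
    return (pd.concat([df2, df1])
              .groupby('name').last()
              .reindex(df2['name'])
              .reset_index())


def via_merge_pop(df1, df2):
    # df2's price becomes 'price1'; df1's keeps the name 'price' and is filled from 'price1'.
    return (df2.merge(df1, on='name', how='left', suffixes=('1', ''))
               .assign(price=lambda x: x['price'].fillna(x.pop('price1')).astype('int')))


results = {
    'combine_first': via_combine_first(df1, df2),
    'concat + groupby': via_concat_groupby(df1, df2),
    'merge + pop': via_merge_pop(df1, df2),
}
for label, frame in results.items():
    print(label, frame['name'].tolist(), frame['price'].astype(int).tolist())
```

None of the variants modifies df2 itself, so the returned frame has to be assigned (for example df2 = via_combine_first(df1, df2)) before it is written back to disk.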
