
I have two Excel files, each containing two columns, like below:

file1:

name     price
a        1
b        78
f        54
g        32
t        2

file2:

name     price
f        4
x        23
e        76
a        4
o        0

I want to read file1, find the rows whose name also appears in file2, and write their prices from file1 into the price column of file2. Names in file2 that do not appear in file1 should keep their current price.

file2 should look like this:

file2:

name     price
f        54
x        23
e        76
a        1
o        0

I have tried the following (I've saved the Excel files as CSV):

import pandas as pd
import numpy as np

df1  = pd.DataFrame(pd.read_csv('file1.csv'))
df2  = pd.DataFrame(pd.read_csv('file2.csv'))

if(df1[name] in df2[name]):
  df2[price] = df1[price]
  
df2.to_csv('file2.csv')
  • These look like TSV files maybe, not CSV. Can you please edit to clarify this, either by changing the delimiter to comma if that's what you actually have in the files, or else change the wording to not suggest that you have Comma-Separated Values format if you don't really. (Of course, if this misunderstanding is the cause of your problem, probably simply delete this question as unreproducible.)
    – tripleee
    Commented Nov 12, 2023 at 11:15
  • Using Pandas and Numpy for simple file manipulation is severe overkill, though if you need them anyway for other reasons, I suppose they offer some limited additional conveniences over the Python standard library.
    – tripleee
    Commented Nov 12, 2023 at 11:18
  • @tripleee what is the shortest way to do this? is there any way to manipulate csv files without converting them to df?
    – Xara
    Commented Nov 12, 2023 at 11:56
  • There are some syntax errors on your side. pd.read_csv already returns a pd.DataFrame object, so there is no need to wrap it in pd.DataFrame. See the duplicate post; it will be really helpful now and in the future.
    – Umar.H
    Commented Nov 12, 2023 at 12:04
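
The standard-library route raised in the comments could look roughly like the sketch below. It assumes the files really are comma-separated with a name,price header; FILE1, FILE2 and update_prices are hypothetical names made up for illustration, and the file contents are held in strings so the snippet runs without touching disk.

```python
import csv
import io

# Hypothetical comma-separated contents standing in for file1.csv and file2.csv.
FILE1 = "name,price\na,1\nb,78\nf,54\ng,32\nt,2\n"
FILE2 = "name,price\nf,4\nx,23\ne,76\na,4\no,0\n"


def update_prices(source_text, target_text):
    """Return target_text as CSV, with prices replaced where the name exists in source_text."""
    # Build a name -> price lookup from the source file.
    lookup = {row["name"]: row["price"]
              for row in csv.DictReader(io.StringIO(source_text))}

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    for row in csv.DictReader(io.StringIO(target_text)):
        # Keep the existing price when the name is missing from the source.
        row["price"] = lookup.get(row["name"], row["price"])
        writer.writerow(row)
    return out.getvalue()


updated = update_prices(FILE1, FILE2)
print(updated)
```

With real files, the io.StringIO objects would be replaced by files opened with open(..., newline=''), as the csv module documentation recommends.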

2 Answers


First, you need to merge:

df2 = df2.merge(df1, on='name', how='left')

Then you can use np.where:

import numpy as np
df2['price'] = np.where(df2['price_y'].isnull(), df2['price_x'], df2['price_y'])

Later you can delete the merged columns:

df2.drop(columns=['price_x', 'price_y'], inplace=True)

This is the result:

name     price
f        54
x        23
e        76
a        1
o        0
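
Put together, the three steps can be run end to end as in the sketch below. The frames are built inline instead of being read from CSV, which is an assumption made only to keep the snippet self-contained. One detail worth noting: price_y contains NaN for names missing from df1, so the np.where result is floating point; the final cast back to int is an addition of this sketch, not part of the steps above.

```python
import numpy as np
import pandas as pd

# Inline stand-ins for file1.csv and file2.csv.
df1 = pd.DataFrame({"name": list("abfgt"), "price": [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({"name": list("fxeao"), "price": [4, 23, 76, 4, 0]})

# Left merge keeps every row of df2; price_x comes from df2, price_y from df1.
merged = df2.merge(df1, on="name", how="left")

# Take df1's price where a match exists, otherwise keep df2's price.
merged["price"] = np.where(merged["price_y"].isnull(),
                           merged["price_x"],
                           merged["price_y"])

# Drop the helper columns and cast back to int (price_y was float due to NaN).
result = merged.drop(columns=["price_x", "price_y"])
result["price"] = result["price"].astype(int)
print(result)
```

Writing the result back with result.to_csv('file2.csv', index=False) avoids adding an extra index column to the file.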

Code

Use combine_first:

df1.set_index('name').combine_first(df2.set_index('name'))\
   .reindex(df2['name']).reset_index()
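
A self-contained way to try the combine_first expression is sketched below, with the frames built inline as an assumption for reproducibility. The expression returns a new DataFrame, so the result is assigned to a variable here; the cast to int is a precaution in case index alignment upcasts the column to float.

```python
import pandas as pd

# Inline stand-ins for file1.csv and file2.csv.
df1 = pd.DataFrame({"name": list("abfgt"), "price": [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({"name": list("fxeao"), "price": [4, 23, 76, 4, 0]})

combined = (
    df1.set_index("name")
    .combine_first(df2.set_index("name"))  # df1's price wins; df2 fills the gaps
    .reindex(df2["name"])                  # keep only df2's names, in df2's order
    .reset_index()
    .astype({"price": int})                # precaution against a float upcast
)
print(combined)
```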

The asker may not be aware that these pandas operations do not modify df2 in place, so I am attaching the output as an image.

[image: output of the code above]

Another way: use concat and groupby + last

pd.concat([df2, df1]).groupby('name').last()\
  .reindex(df2['name']).reset_index()

same result
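
The order passed to concat matters in this approach: last() keeps the final row of each name group, so whichever frame is concatenated last wins. A small sketch, assuming inline frames and the hypothetical variable names df1_wins and df2_wins, makes the difference visible:

```python
import pandas as pd

# Inline stand-ins for file1.csv and file2.csv.
df1 = pd.DataFrame({"name": list("abfgt"), "price": [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({"name": list("fxeao"), "price": [4, 23, 76, 4, 0]})

# df1 comes last, so for shared names its row is the last one in the group.
df1_wins = (pd.concat([df2, df1]).groupby("name").last()
            .reindex(df2["name"]).reset_index())

# Swapping the order lets df2 win, which leaves df2's prices unchanged.
df2_wins = (pd.concat([df1, df2]).groupby("name").last()
            .reindex(df2["name"]).reset_index())

print(df1_wins)
print(df2_wins)
```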

Another way: merge and pop

df2.merge(df1, on='name', how='left', suffixes=['1', ''])\
   .assign(price=lambda x: x['price'].fillna(x.pop('price1')).astype('int'))

same result
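
The merge and pop one-liner is dense, so here is the same idea spelled out step by step. This is a sketch with inline frames (an assumption for reproducibility), and old_price is a hypothetical variable name.

```python
import pandas as pd

# Inline stand-ins for file1.csv and file2.csv.
df1 = pd.DataFrame({"name": list("abfgt"), "price": [1, 78, 54, 32, 2]})
df2 = pd.DataFrame({"name": list("fxeao"), "price": [4, 23, 76, 4, 0]})

# With suffixes=("1", ""), df2's overlapping column is renamed to "price1"
# while df1's column keeps the plain name "price".
merged = df2.merge(df1, on="name", how="left", suffixes=("1", ""))

# pop removes "price1" from the frame and returns it as a Series.
old_price = merged.pop("price1")

# Where df1 had no match, "price" is NaN: fall back to df2's old price.
merged["price"] = merged["price"].fillna(old_price).astype(int)
print(merged)
```

Because pop both returns and deletes the column, no separate drop step is needed afterwards.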


Example Code

import pandas as pd
import io

file1 = '''name     price
a        1
b        78
f        54
g        32
t        2
'''

file2 = '''name     price
f        4
x        23
e        76
a        4
o        0
'''
# Raw strings avoid the invalid-escape warning for '\s' in newer Python versions.
df1 = pd.read_csv(io.StringIO(file1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(file2), sep=r'\s+')