
Say you create a dataframe like this, where one column contains a numpy array:

import numpy as np
import pandas as pd

data = {'col_1': [np.array([1, 2, 3])], 'col_2': 3}

x = pd.DataFrame.from_dict(data)

Then you write it to csv and (in a different file) import it as a dataframe:

filename = 'test.csv'

x.to_csv(filename)

x2 = pd.read_csv('test.csv')

Now when you print the type of the numpy array:

print(type(x2['col_1'].iloc[0]))

it returns <class 'str'> instead of <class 'numpy.ndarray'>. Is there a way to avoid this and keep the numpy array type, for instance by using pickle instead of writing to a csv? Or are datatypes always lost when you write a dataframe to a file and then reload it?
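For reference, a minimal sketch of the csv round trip described above (column names taken from the question; the recovery step is an ad-hoc assumption that the cell holds numpy's default space-separated repr):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [np.array([1, 2, 3])], 'col_2': 3})
df.to_csv('test.csv', index=False)

df2 = pd.read_csv('test.csv')
cell = df2['col_1'].iloc[0]
print(type(cell))   # <class 'str'> -- the array survives only as its repr, '[1 2 3]'

# One ad-hoc way to recover it: split the space-separated repr back into numbers
recovered = np.array(cell.strip('[]').split(), dtype=int)
print(recovered)    # [1 2 3]
```

This works only for simple 1-D integer arrays; anything more complex (floats with truncated reprs, nested arrays) makes string parsing fragile, which is why a binary format is preferable.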

  • Not if you are going to use csv as the format, no, there isn't. And yes, pickle should work. Did you try it? Commented Nov 27, 2023 at 16:16
  • csv is a 2-D format: rows and columns. Your array (or list) adds a third dimension, so only the string version of the array can be written. Look at the csv file itself.
    – hpaulj
    Commented Nov 27, 2023 at 16:21
  • Tried with x.to_pickle("test.pkl"), then x3 = pd.read_pickle("test.pkl"). Printing print(type(x3.iloc[0,0])) gives <class 'numpy.ndarray'>. x.to_parquet and pd.read_parquet also work fine.
    – HMH1013
    Commented Nov 28, 2023 at 16:29
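The pickle round trip from the comment above can be sketched as follows (file name is arbitrary; parquet works similarly but needs pyarrow or fastparquet installed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [np.array([1, 2, 3])], 'col_2': 3})

# Pickle serializes the Python objects themselves, so the ndarray survives intact
df.to_pickle('test.pkl')
df3 = pd.read_pickle('test.pkl')
print(type(df3['col_1'].iloc[0]))   # <class 'numpy.ndarray'>
```

The trade-off: pickle preserves arbitrary objects but is Python-only and unsafe to load from untrusted sources, while csv is portable but flattens everything to text.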
