Say you create a dataframe like this, where one column contains a numpy array:
import numpy as np
import pandas as pd

data = {'col_1': [np.array([1, 2, 3])], 'col_2': 3}
x = pd.DataFrame.from_dict(data)
Then you write it to csv and (in a different file) import it as a dataframe:
filename = 'test.csv'
x.to_csv(filename)
x2 = pd.read_csv('test.csv')
Now when you print the type of the numpy array:
print(type(x2.iloc[0, 1]))
it returns <class 'str'> instead of a numpy array. Is there a way to avoid this and keep the numpy array type, for instance by using Pickle instead of writing to a csv? Or are datatypes always lost when you write a dataframe to a file, then reload it?
Write the dataframe with
x.to_pickle("test.pkl")
and read it back with
x3 = pd.read_pickle("test.pkl")
Then
print(type(x3.iloc[0, 0]))
gives me <class 'numpy.ndarray'>. Also x.to_parquet and pd.read_parquet work fine.
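For completeness, here is a minimal sketch of the parquet round trip. It assumes a parquet engine such as pyarrow is installed; the file name test.parquet and the variable x4 are just illustrative.

# Sketch: round-trip the same kind of dataframe through parquet.
# Assumes a parquet engine (e.g. pyarrow) is installed; names are illustrative.
import numpy as np
import pandas as pd

x = pd.DataFrame.from_dict({'col_1': [np.array([1, 2, 3])], 'col_2': 3})
x.to_parquet("test.parquet")            # write the dataframe to parquet
x4 = pd.read_parquet("test.parquet")    # read it back, e.g. in another script
print(type(x4.iloc[0, 0]))              # should print <class 'numpy.ndarray'>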