Say you create a dataframe like this, where one column contains a numpy array:
import numpy as np
import pandas as pd

data = {'col_1': [np.array([1, 2, 3])], 'col_2': 3}
x = pd.DataFrame.from_dict(data)
Then you write it to csv and (in a different file) import it as a dataframe:
filename = 'test.csv'
x.to_csv(filename)
x2 = pd.read_csv('test.csv')
Now when you print the type of the numpy array:
print(type(x2.iloc[0, 1]))
it returns <class 'str'> instead of a numpy array. Is there a way to avoid this and keep the numpy array type, for instance by using Pickle instead of writing to a csv? Or are datatypes always lost when you write a dataframe to a file, then reload it?
Write the dataframe with
x.to_pickle("test.pkl")
and read it back with
x3 = pd.read_pickle("test.pkl")
Then
print(type(x3.iloc[0, 0]))
gives me <class 'numpy.ndarray'>. Also x.to_parquet and pd.read_parquet work fine.
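For completeness, here is a minimal sketch of the parquet round trip. It assumes a parquet engine such as pyarrow is installed; the file name test.parquet and the variable x4 are just illustrative.

# Sketch: round-trip the same kind of dataframe through parquet.
# Assumes a parquet engine (e.g. pyarrow) is installed; names are illustrative.
import numpy as np
import pandas as pd

x = pd.DataFrame.from_dict({'col_1': [np.array([1, 2, 3])], 'col_2': 3})
x.to_parquet("test.parquet")            # write the dataframe to parquet
x4 = pd.read_parquet("test.parquet")    # read it back, e.g. in another script
print(type(x4.iloc[0, 0]))              # should print <class 'numpy.ndarray'>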