All Questions
2,548 questions
2
votes
2
answers
157
views
How to efficiently process a large CSV file with pandas when memory is limited?
I'm working with a very large CSV file (around 10GB) that doesn't fit into my computer's memory. When I try to load it into a pandas DataFrame using pd.read_csv(), I get a MemoryError.
What's the most ...
3
votes
2
answers
71
views
How to preprocess multivalue attributes in a dataframe?
Description:
Input is a CSV file
The CSV file contains columns of different data types: ordinal values, nominal values, numerical values and multivalue attributes.
For the multivalue columns, the minimum is 1, ...
1
vote
1
answer
62
views
How can I extract specific objects, transpose, and combine multiple, complex-nested JSON files into a CSV using python and pandas?
I'm aware that several posts already cover this topic, so I apologise in advance. I've read them and tried several times.
Here are three example JSON files that I save into fildir:
https://data.sec....
0
votes
0
answers
65
views
Is there a function to extract datetime from CSV without parsing it to a string?
I have raw measurement data in the form of a CSV file (RAW CSV data format). I have code that averages the milliseconds to the second, and then I need to apply a time correction of 1 hour 10 ...
0
votes
1
answer
54
views
Pandas adds "." + digit to the header of a csv
I would like to import a CSV file with headers into pandas. Somehow, pandas appends a ".7" to the last header's name.
The last header in the csv contains a "?" as the last character (...
0
votes
1
answer
59
views
Splitting into multiple dataframes if criteria is met
I am working on a small program and need some guidance.
Basically I am trying to read a CSV, put the attributes into a data frame and filter where "video = 1". This has been done.
What I ...
0
votes
0
answers
52
views
I cannot get all data to export to CSV
# Collect batting stats for the 2022, 2023, and 2024 seasons
try:
    print("Collecting batting stats from 2022 to 2024...")
    batting_data = batting_stats(2021, 2024, league="all&...
1
vote
2
answers
93
views
How can I clean a year column with messy values?
I have a project I'm working on for a data analysis course, where we pick a data set and go through the steps of cleaning and exploring the data with a question to answer in mind.
I want to be able to ...
2
votes
2
answers
98
views
How to Write a Pandas DataFrame to CSV With Strings Quoted and Integers/Empty Cells Unaltered Without Adding Escape Characters for Commas?
I am working on a Python script to write a DataFrame to a CSV file. My goal is to:
Enclose all string values in double quotes (").
Keep numeric values unchanged (no quotes).
Leave empty cells as ...
1
vote
1
answer
55
views
How to prevent pandas to_csv from double-quoting empty fields in the output CSV
I currently have a sample python script that reads a csv with double quotes as a text qualifier and removes ascii characters and line feeds in the fields via a dataframe. It then outputs the dataframe ...
0
votes
1
answer
40
views
Dataframe of dataframes: writing and reading
I have a set of images. In each image, a program finds objects with attributes X and type. The number of objects varies from image to image. Hence for one image I have a df_objects with N_objects rows ...
0
votes
0
answers
62
views
Error tokenizing data when merging multiple csv into one excel file
I'm running into an "Error tokenizing data" error.
I'm trying to merge multiple CSV files into one Excel file. The files are large and contain some special characters.
This is the error I ...
0
votes
1
answer
65
views
Pandas dataframe is mangled when writing to csv
I have written a pipeline to send queries to uniprot, but am having a strange issue with one of the queries. I've put this into a small test case below.
I am getting the expected dataframe (df) ...
0
votes
0
answers
48
views
How to handle a cell containing a comma when using pd.read_csv? Error tokenizing data. C error: Expected 1 fields in line 88, saw 2
I'm reading a set of .csv files and adding them to one giant data frame called 'df', but I kept getting this error in some of my files: Error tokenizing data. C error: Expected 1 fields in line 88, ...
0
votes
1
answer
272
views
Expected String or bytes-like object, got 'float'
I'm trying to build an ETL (extract, transform, load) pipeline in Python. I have an Amazon review database, but when I use the DataFrame.apply() method to apply a function that uses a regex, I get the ...