How to create multiple combined csv files out of a multi-dimensional numpy arrays

Question

I have 8 numpy arrays of the dimensionalty (number_of_buildings, number_of_weeks) which in my example is (2, 20). They look like that:

import numpy as np

result_multiple_buildings_simulationObjective_costs_Euro_OPT = np.array([
    [181.1, 172.59, 170.02, 173.83, 181.21, 160.95, 182.24, 186.1, 168.4, 153.3, 173.47, 175.36, 175.05, 171.96, 182.94, 163.36, 178.0, 174.27, 174.02, 177.51],
    [109.0, 107.78, 106.47, 104.33, 113.67, 98.19, 112.45, 114.94, 102.31, 96.45, 107.67, 107.0, 105.11, 103.22, 116.03, 101.46, 106.54, 106.89, 108.24, 108.84]
])

result_multiple_buildings_negativeScore_PhysicalLimit_OPT = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
])

result_multiple_buildings_simulationObjective_costs_Euro_CC = np.array([
    [226.29, 217.3, 214.17, 219.22, 226.2, 204.82, 226.94, 232.28, 213.46, 197.07, 218.78, 221.09, 219.95, 216.0, 228.37, 207.02, 223.6, 219.31, 219.67, 222.41],
    [148.25, 146.07, 145.98, 143.81, 151.92, 135.71, 151.11, 154.02, 141.29, 136.91, 146.03, 146.73, 145.5, 144.33, 154.31, 143.15, 144.81, 146.57, 146.92, 146.29]
])

result_multiple_buildings_negativeScore_PhysicalLimit_CC = np.array([
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
])

result_multiple_buildings_simulationObjective_costs_Euro_PSC = np.array([
    [217.38, 206.57, 203.58, 209.52, 215.38, 192.08, 216.79, 221.89, 201.99, 183.88, 207.59, 211.56, 208.42, 205.82, 218.36, 197.26, 212.61, 210.27, 209.26, 212.08],
    [137.93, 135.81, 134.24, 133.09, 140.28, 125.58, 139.31, 142.51, 130.13, 123.97, 134.83, 135.38, 133.45, 132.09, 143.84, 130.38, 132.86, 135.51, 135.75, 135.66]
])

result_multiple_buildings_negativeScore_total_overall_PSC = np.array([
    [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
])

result_multiple_buildings_simulationObjective_costs_Euro_ANN = np.array([
    [226.29, 217.3, 214.17, 219.22, 226.2, 204.82, 226.94, 232.28, 213.46, 197.07, 218.78, 221.09, 219.95, 216.0, 228.37, 207.02, 223.6, 219.31, 219.67, 222.41],
    [148.25, 146.07, 145.98, 143.81, 151.92, 135.71, 151.11, 154.02, 141.29, 136.91, 146.03, 146.73, 145.5, 144.33, 154.31, 0.0, 0.0, 0.0, 0.0, 0.0]
])

result_multiple_buildings_negativeScore_PhysicalLimit_ANN = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
])

What I want is to have a csv file for each building, that lists all values of the week for that specific building of those 8 arrays. So it should in total have 9 columns: "Week (index)", "Opt_cost", "Opt_neg", "CC_cost", "CC_neg", "PSC_cost", "PSC_neg", "ANN_cost", "ANN_neg". The rows should be the weeks, so in my case 20 entries. Further, there should be an additional row which calculates the average of all 8 columns excluding the week index (rounded, to the first decimal value).

So in my case there should be 2 csv files with the name "Building_1.csv" and "Building_2.csv" but actually the number of buildings should be variable:

I tried the following code but it did not work:

datasets = [
        ("OPT", result_multiple_buildings_simulationObjective_costs_Euro_OPT,
         result_multiple_buildings_negativeScore_PhysicalLimit_OPT),
        ("CC", result_multiple_buildings_simulationObjective_costs_Euro_CC,
         result_multiple_buildings_negativeScore_PhysicalLimit_CC),
        ("PSC", result_multiple_buildings_simulationObjective_costs_Euro_PSC,
         result_multiple_buildings_negativeScore_PhysicalLimit_PSC),
        ("ANN", result_multiple_buildings_simulationObjective_costs_Euro_ML,
         result_multiple_buildings_negativeScore_PhysicalLimit_ML)
    ]

    # Iterate through datasets and create CSV files for each building
    for label, cost_array, neg_array in datasets:
        for building_index in range(cost_array.shape[0]):
            # Create a new CSV file for each building
            csv_file_name = f"{folderPath_WholeSimulation}/Results_HH{building_index + 1}.csv"

            # Combine data of the arrays and calculate averages
            data = np.column_stack((cost_array[building_index, :], neg_array[building_index, :]))
            week_index = np.arange(1, data.shape[0] + 1).reshape(-1, 1)
            data = np.column_stack((data, week_index))
            avg_data = np.mean(data, axis=0).reshape(1, -1)

            # column descriptions
            column_descriptions = [f"{label}", f"{label}_neg", f"CC", f"CC_neg", f"PSC", f"PSC_neg", f"ANN", f"ANN_neg",
                                   "Week Index"]

            # Write data and averages to the CSV file
            with open(csv_file_name, 'w') as csv_file:
                
                csv_file.write(";".join(column_descriptions) + "\n")

                # Write data rows
                for row in data:
                    csv_file.write(";".join(map(str, row)) + "\n")

                # Calculate average row
                avg_row = np.append(avg_data, ["Average"])
                csv_file.write(";".join(map(str, avg_row)) + "\n")

Do you have any idea how I can do that?

I would probably restructure the np arrays to a single pandas dataframe first, then save to csv. But if you want to stick with the current strategy, think about separating each step into smaller functions, then loop. Right now your loop has way too many responsibilities. — rance jeffries, Commented Nov 4, 2023 at 20:01
Writing csv is normally done with a pandas method, or np.savetxt. savetxt basically iterates on the rows (of a 2d array), and formats the row (using the % python formatting) and writes that. The source code for that is relatively simple, though I don't know how burried it is now. Writing your own lines should work, but I won't get into the details. — hpaulj, Commented Nov 4, 2023 at 20:04

Andrej Kesely · Accepted Answer · 2023-11-07 22:22:20Z

You can use pandas to compute the average and save it to the CSV, e.g.:

import pandas as pd

data = {
    "Euro_OPT": result_multiple_buildings_simulationObjective_costs_Euro_OPT,
    "PhysicalLimit_OPT": result_multiple_buildings_negativeScore_PhysicalLimit_OPT,
    "Euro_CC": result_multiple_buildings_simulationObjective_costs_Euro_CC,
    "PhysicalLimit_CC": result_multiple_buildings_negativeScore_PhysicalLimit_CC,
    "Euro_PSC": result_multiple_buildings_simulationObjective_costs_Euro_PSC,
    "PhysicalLimit_PSC": result_multiple_buildings_negativeScore_total_overall_PSC,
    "Euro_ANN": result_multiple_buildings_simulationObjective_costs_Euro_ANN,
    "PhysicalLimit_ANN": result_multiple_buildings_negativeScore_PhysicalLimit_ANN,
}

df = pd.DataFrame([dict(zip(data, row)) for row in zip(*data.values())])
df = df.explode(df.columns.to_list())

for idx, g in df.groupby(level=0):
    g = g.reset_index().rename(columns={"index": "Week"})
    g["Week"] = np.arange(1, len(g) + 1)
    mean = g.mean().to_frame().T.astype(float).round(2)
    mean["Week"] = "Averages"
    g = pd.concat([g, mean])
    g.to_csv(f"Building_{idx+1}.csv", index=False)

This creates two CSV files:

Building_1.csv

Week,Euro_OPT,PhysicalLimit_OPT,Euro_CC,PhysicalLimit_CC,Euro_PSC,PhysicalLimit_PSC,Euro_ANN,PhysicalLimit_ANN
1,181.1,1.0,226.29,0.0,217.38,0.0,226.29,0.0
2,172.59,0.0,217.3,0.0,206.57,0.0,217.3,1.0
3,170.02,0.0,214.17,0.0,203.58,2.0,214.17,0.0
4,173.83,0.0,219.22,1.0,209.52,0.0,219.22,0.0
5,181.21,0.0,226.2,0.0,215.38,0.0,226.2,0.0
6,160.95,0.0,204.82,0.0,192.08,0.0,204.82,0.0
7,182.24,0.0,226.94,0.0,216.79,0.0,226.94,0.0
8,186.1,7.0,232.28,0.0,221.89,0.0,232.28,0.0
9,168.4,0.0,213.46,0.0,201.99,0.0,213.46,0.0
10,153.3,0.0,197.07,0.0,183.88,1.0,197.07,0.0
11,173.47,0.0,218.78,0.0,207.59,1.0,218.78,0.0
12,175.36,0.0,221.09,0.0,211.56,1.0,221.09,0.0
13,175.05,0.0,219.95,0.0,208.42,0.0,219.95,0.0
14,171.96,0.0,216.0,0.0,205.82,0.0,216.0,0.0
15,182.94,0.0,228.37,0.0,218.36,0.0,228.37,0.0
16,163.36,0.0,207.02,0.0,197.26,0.0,207.02,0.0
17,178.0,0.0,223.6,0.0,212.61,0.0,223.6,0.0
18,174.27,0.0,219.31,0.0,210.27,0.0,219.31,0.0
19,174.02,0.0,219.67,0.0,209.26,0.0,219.67,0.0
20,177.51,0.0,222.41,0.0,212.08,0.0,222.41,0.0
Averages,173.78,0.4,218.7,0.05,208.11,0.25,218.7,0.05

Building_2.csv

Week,Euro_OPT,PhysicalLimit_OPT,Euro_CC,PhysicalLimit_CC,Euro_PSC,PhysicalLimit_PSC,Euro_ANN,PhysicalLimit_ANN
1,109.0,0.0,148.25,0.0,137.93,0.0,148.25,0.0
2,107.78,3.0,146.07,0.0,135.81,0.0,146.07,0.0
3,106.47,0.0,145.98,0.0,134.24,0.0,145.98,0.0
4,104.33,0.0,143.81,0.0,133.09,0.0,143.81,0.0
5,113.67,0.0,151.92,0.0,140.28,0.0,151.92,0.0
6,98.19,0.0,135.71,2.0,125.58,0.0,135.71,0.0
7,112.45,0.0,151.11,0.0,139.31,0.0,151.11,0.0
8,114.94,0.0,154.02,0.0,142.51,0.0,154.02,0.0
9,102.31,0.0,141.29,0.0,130.13,0.0,141.29,3.0
10,96.45,0.0,136.91,0.0,123.97,0.0,136.91,0.0
11,107.67,0.0,146.03,0.0,134.83,0.0,146.03,0.0
12,107.0,0.0,146.73,0.0,135.38,0.0,146.73,0.0
13,105.11,2.0,145.5,0.0,133.45,0.0,145.5,0.0
14,103.22,0.0,144.33,0.0,132.09,0.0,144.33,0.0
15,116.03,1.0,154.31,2.0,143.84,0.0,154.31,0.0
16,101.46,0.0,143.15,0.0,130.38,0.0,0.0,4.0
17,106.54,0.0,144.81,0.0,132.86,0.0,0.0,0.0
18,106.89,0.0,146.57,0.0,135.51,0.0,0.0,0.0
19,108.24,0.0,146.92,0.0,135.75,0.0,0.0,0.0
20,108.84,0.0,146.29,0.0,135.66,0.0,0.0,0.0
Averages,106.83,0.3,145.99,0.2,134.63,0.0,109.6,0.35

Thanks Andrej for your answer. Actually it works but there are 2 minor issues: 1) The week index is always the same for one building (in your example 1 for the first building and 2 for the second). What I want to have is 20 different week indeces for each building. So it should be 1, 2, 3,..., 20. 2) The name of the last row should be averages instead of the (wrong) week index. So instead of the "1" or "2" there should be "Averages" as the name of the last row. Ideally the average values should also be rounded to 2 decimal digits. Do you know how I can adjust your code for this? — PeterBe, Commented Nov 6, 2023 at 8:57
Thanks for your answer. Any comments to my last comment? I'll highly appreciate every further comment from you. — PeterBe, Commented Nov 7, 2023 at 9:24
Thanks a lot Andrej for your great help. I really appreciate it. — PeterBe, Commented Nov 8, 2023 at 10:14

Collectives™ on Stack Overflow

How to create multiple combined csv files out of a multi-dimensional numpy arrays

1 Answer 1

Building_1.csv

Building_2.csv

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Building_1.csv

Building_2.csv

Related