Python plot 1D array

Question

I would like to plot the difference between each individual point.

I have a series y_test, which is one-dimensional and contains continuous values. Its index is not in order (7618, 276, 7045, 6095, 2296, 7191, 1213, 2408...).

I also have a numpy array y_pred, which is one-dimensional and contains the predictions for y_test. I would like to see, in a graph, how far each predicted value is from the actual value.

I tried this:

fig, ax1 = plt.subplots(figsize = (20,5))
ax1.bar(y_test, y_test.index color = 'tab:orange')
ax1.set_ylabel('Actual',color = 'tab:orange')
ax2 = ax1.twinx()
ax2.bar(y_pred, y_test.index, color = 'tab:blue')
ax2.set_ylabel('Predicted',color = 'tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()
plt.show()

but it raises an error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Bar, scatter, or anything else is fine; I just want to look at all the values together.

This is so that I can group the best predicted values to understand which feature values within my original data are easiest to predict with.

If, incidentally, anyone could recommend the best XGBoost way of getting that information let me know too.

Here is some data:

y_pred:
[10.410029 ,   4.4897604,  29.77089  ,  23.548471 ,  27.415161 ,
        56.28772  ,  13.083108 ,  38.086662 ,  19.128792 ,  42.49037  ,
        65.15919  ,  47.172436 ,  39.517883 ,  13.782948 , 121.52351  ,
         8.388838 ,  49.625607 ,  24.28464  ,  49.55232  ,  34.797436] 

y_test:
7618      9.88
276       2.69
7045     26.93
6095     23.49
2296     24.79
7191     57.09
1213     15.90
2408     46.26
5961     18.60
275      41.03
1707     66.25
2333     53.50
5717     40.60
1497     12.34
4937    121.93
2654      7.97
7442     53.65
7157     25.93
2141     54.28
4339     36.93

Thank you

Attempting to run your code with data provided, after fixing the blatant syntax error, gives me TypeError: only size-1 arrays can be converted to Python scalars — Mad Physicist, Commented Aug 16, 2021 at 15:26
Please post a proper minimal reproducible example with a full error traceback — Mad Physicist, Commented Aug 16, 2021 at 15:27
@MadPhysicist you are right about the syntax error! I already approved an answer but thank you. I am not sure why you are receiving that type error. — Bigboss01, Commented Aug 16, 2021 at 15:29

mozway · Accepted Answer · 2021-08-16 15:10:04Z


plt.scatter(y_test, y_pred)?

Many points close to the equality line (diagonal) means good predictions, far away means not so good.
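A minimal sketch of that idea, assuming y_test is a pandas Series and y_pred a numpy array as described in the question. Only the first five sample values are used, and the helper name plot_actual_vs_predicted is made up for illustration. The dashed line is the equality line (y = x) mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First five values from the question's sample data
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])


def plot_actual_vs_predicted(actual, predicted):
    """Scatter predicted against actual values and draw the y = x line."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(actual, predicted, color="tab:blue")

    # Equality line spanning the full range of both arrays
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())
    ax.plot([lo, hi], [lo, hi], color="tab:orange", linestyle="--", label="y = x")

    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.legend()
    return fig, ax


fig, ax = plot_actual_vs_predicted(y_test, y_pred)
```

Call plt.show() afterwards in an interactive session. Points above the dashed line are over-predictions, points below it are under-predictions.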

Thanks for your answer! That's actually the first thing I tried, and while it does the job, I want the representation to be more intuitive for documentation purposes.
– Bigboss01
Commented Aug 16, 2021 at 15:16


Zephyr · Accepted Answer · 2021-08-16 15:19:22Z


I assume y_test has a 'val' column, where the values you want to plot are stored.
Maybe this could be helpful?
The index goes on the x-axis, and the true and predicted values go on two separate y-axes.

import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(20, 5))

# Actual values on the left y-axis
ax1.plot(y_test.index, y_test['val'], color='tab:orange')
ax1.set_ylabel('Actual', color='tab:orange')

# Predicted values on a second y-axis that shares the same x-axis
ax2 = ax1.twinx()
ax2.plot(y_test.index, y_pred, color='tab:blue')
ax2.set_ylabel('Predicted', color='tab:blue')

plt.title('XGBoost Regression Performance')
fig.tight_layout()

plt.show()
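As a variation, and not what the code above does: the sketch below puts both series on one shared y-axis so the gap between them can be read directly, and orders the rows by absolute error so the best-predicted rows come first. It assumes y_test is a plain pandas Series (as in the question) rather than a frame with a 'val' column, uses only the first five sample values, and the helper names error_table and plot_errors are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First five values from the question's sample data
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])


def error_table(actual, predicted):
    """Pair actual and predicted values by position, smallest absolute error first."""
    table = pd.DataFrame(
        {"actual": actual.to_numpy(), "predicted": np.asarray(predicted)},
        index=actual.index,
    )
    table["abs_error"] = (table["actual"] - table["predicted"]).abs()
    return table.sort_values("abs_error")


def plot_errors(table):
    """Plot actual and predicted on one axis, joined by a line per row."""
    x = np.arange(len(table))  # evenly spaced positions instead of the raw index
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.vlines(x, table["actual"].to_numpy(), table["predicted"].to_numpy(),
              color="lightgray")
    ax.scatter(x, table["actual"], color="tab:orange", label="Actual")
    ax.scatter(x, table["predicted"], color="tab:blue", label="Predicted")
    ax.set_xticks(x)
    ax.set_xticklabels([str(i) for i in table.index])
    ax.set_xlabel("Row index, ordered by absolute error")
    ax.legend()
    return fig, ax


table = error_table(y_test, y_pred)
fig, ax = plot_errors(table)
```

The index labels in table can then be used to look up the matching rows of the original feature data, for example with X_test.loc[table.index[:5]], where X_test is a hypothetical name for the test features.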


Amazing! Exactly what I was looking for. I've used scatter instead of line just because the index was not in order so the line was a little odd. Anyway. Thank you! I'm clearly coding while tired cause this was a very straightforward solution :)
– Bigboss01
Commented Aug 16, 2021 at 15:27
