
I would like to plot the difference between each individual point.

I have one series y_test which is one-dimensional and contains continuous values. The index is kinda whacky (7618, 276, 7045, 6095, 2296, 7191, 1213, 2408...).

I also have a one-dimensional NumPy array, y_pred, which contains the predictions for y_test. I would like a graph that shows how far each predicted value is from the actual value.

I tried this:

fig, ax1 = plt.subplots(figsize = (20,5))
ax1.bar(y_test, y_test.index color = 'tab:orange')
ax1.set_ylabel('Actual',color = 'tab:orange')
ax2 = ax1.twinx()
ax2.bar(y_pred, y_test.index, color = 'tab:blue')
ax2.set_ylabel('Predicted',color = 'tab:blue')
plt.title('XGBoost Regression Performance')
fig.tight_layout()
plt.show()

but it returns error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Bar, scatter, anything is fine; I just want to take a look at all the values together.

This is so that I can group the best predicted values to understand which feature values within my original data are easiest to predict with.

If, incidentally, anyone could recommend the best XGBoost way of getting that information let me know too.

Here is some data:

y_pred:
[10.410029 ,   4.4897604,  29.77089  ,  23.548471 ,  27.415161 ,
        56.28772  ,  13.083108 ,  38.086662 ,  19.128792 ,  42.49037  ,
        65.15919  ,  47.172436 ,  39.517883 ,  13.782948 , 121.52351  ,
         8.388838 ,  49.625607 ,  24.28464  ,  49.55232  ,  34.797436] 

y_test:
7618      9.88
276       2.69
7045     26.93
6095     23.49
2296     24.79
7191     57.09
1213     15.90
2408     46.26
5961     18.60
275      41.03
1707     66.25
2333     53.50
5717     40.60
1497     12.34
4937    121.93
2654      7.97
7442     53.65
7157     25.93
2141     54.28
4339     36.93

Thank you

  • Attempting to run your code with data provided, after fixing the blatant syntax error, gives me TypeError: only size-1 arrays can be converted to Python scalars
    Commented Aug 16, 2021 at 15:26
  • Please post a proper minimal reproducible example with a full error traceback
    Commented Aug 16, 2021 at 15:27
  • @MadPhysicist you are right about the syntax error! I already approved an answer but thank you. I am not sure why you are receiving that type error.
    – Bigboss01
    Commented Aug 16, 2021 at 15:29

2 Answers

2

plt.scatter(y_test, y_pred)?

Many points close to the equality line (diagonal) means good predictions, far away means not so good.
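
A minimal sketch of that idea, using the first five values from the question as sample data. The helper name plot_actual_vs_predicted is made up for illustration, and the dashed y = x line is an addition to make the "equality line" visible:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First five rows of the sample data from the question.
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])


def plot_actual_vs_predicted(actual, predicted):
    """Scatter actual against predicted and draw the y = x reference line.

    Hypothetical helper, not part of matplotlib or XGBoost.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(actual, predicted, color="tab:blue")
    # Span the reference line across the full range of both arrays.
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())
    ax.plot([lo, hi], [lo, hi], color="tab:orange", linestyle="--", label="y = x")
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.legend()
    return fig, ax


fig, ax = plot_actual_vs_predicted(y_test, y_pred)
# Call plt.show() or fig.savefig(...) to view the result.
```

Points above the dashed line are over-predictions, points below it are under-predictions.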

1
  • Thanks for your answer! That's actually the first thing I tried, and while it does the job, I want the representation to be more intuitive for documentation purposes.
    – Bigboss01
    Commented Aug 16, 2021 at 15:16
1

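I assume y_test has a 'val' column where the values you want to plot are stored. (If y_test is a plain Series, as described in the question, pass y_test itself instead of y_test['val'].)
Maybe this could be helpful?
It puts the index on the x-axis and the actual and predicted values on two separate y-axes.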

import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(20, 5))

# Actual values on the left y-axis
ax1.plot(y_test.index, y_test['val'], color='tab:orange')
ax1.set_ylabel('Actual', color='tab:orange')

# Predicted values on a second y-axis that shares the same x-axis
ax2 = ax1.twinx()
ax2.plot(y_test.index, y_pred, color='tab:blue')
ax2.set_ylabel('Predicted', color='tab:blue')

plt.title('XGBoost Regression Performance')
fig.tight_layout()

plt.show()
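
A possible variant, sketched under a few assumptions: y_test is a plain Series (as the question describes), the index is unordered, and both series are drawn on one shared y-axis. Twin axes scale independently, so equal heights on the two axes need not mean equal values. The helper name plot_by_position is made up for illustration, and the data is the first five rows from the question:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First five rows of the sample data from the question.
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])


def plot_by_position(actual, predicted):
    """Plot actual and predicted values on one shared y-axis, by row position.

    Hypothetical helper: `actual` is a pandas Series, `predicted` a 1-D array
    of the same length.
    """
    positions = np.arange(len(actual))
    actual_values = actual.to_numpy()
    fig, ax = plt.subplots(figsize=(20, 5))
    ax.scatter(positions, actual_values, color="tab:orange", label="Actual")
    ax.scatter(positions, predicted, color="tab:blue", marker="x", label="Predicted")
    # A thin vertical segment per row shows the size of each error.
    ax.vlines(positions, actual_values, predicted, colors="gray", linewidth=1)
    # Use row position for x, but label the ticks with the original index.
    ax.set_xticks(positions)
    ax.set_xticklabels([str(i) for i in actual.index])
    ax.set_xlabel("Row index in y_test")
    ax.set_ylabel("Value")
    ax.set_title("XGBoost Regression Performance")
    ax.legend()
    fig.tight_layout()
    return fig, ax


fig, ax = plot_by_position(y_test, y_pred)
# Call plt.show() or fig.savefig(...) to view the result.
```

Plotting against row position rather than the raw index keeps the points evenly spaced even though the index values jump around.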


  • Amazing! Exactly what I was looking for. I've used scatter instead of line just because the index was not in order so the line was a little odd. Anyway. Thank you! I'm clearly coding while tired cause this was a very straightforward solution :)
    – Bigboss01
    Commented Aug 16, 2021 at 15:27
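
For the grouping goal stated in the question (finding the best-predicted rows), one possible starting point is to rank rows by absolute error. This is only a sketch: rank_by_abs_error is a hypothetical helper, the data is the first five rows from the question, and nothing here is XGBoost-specific:

```python
import numpy as np
import pandas as pd

# First five rows of the sample data from the question.
y_test = pd.Series([9.88, 2.69, 26.93, 23.49, 24.79],
                   index=[7618, 276, 7045, 6095, 2296])
y_pred = np.array([10.410029, 4.4897604, 29.77089, 23.548471, 27.415161])


def rank_by_abs_error(actual, predicted):
    """Return actual, predicted and absolute error per row, best rows first.

    Hypothetical helper: `actual` is a pandas Series, `predicted` a 1-D array
    in the same row order.
    """
    result = pd.DataFrame(
        {"actual": actual.to_numpy(), "predicted": predicted},
        index=actual.index,  # keep the original index so rows can be traced back
    )
    result["abs_error"] = (result["actual"] - result["predicted"]).abs()
    return result.sort_values("abs_error")


ranked = rank_by_abs_error(y_test, y_pred)
print(ranked)
```

Because the result keeps the original index, it could be joined back onto the feature table (assuming a DataFrame, called X_test here for illustration, that shares that index) with X_test.join(ranked) to inspect which feature values go with small errors.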
