457

I want to create an empty array and append items to it, one at a time.

xs = []
for item in data:
    xs.append(item)

Can I use this list-style notation with NumPy arrays?

17 Answers

608

That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array has to be copied to a new block of memory that has room for the new elements. This is very inefficient if done repeatedly.

Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:

>>> import numpy as np

>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]

>>> a
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
17
  • 161
    There is also numpy.empty() if you don't need to zero the array.
    – janneb
    Commented Apr 19, 2009 at 21:19
  • 33
    What's the benefit of using empty() over zeros()?
    – Zach
    Commented Sep 1, 2012 at 16:11
  • 62
    that if you're going to initialize it with your data straight away, you save the cost of zeroing it.
    – marcorossi
    Commented Nov 13, 2012 at 9:23
  • 31
    @maracorossi so .empty() means one can find random values in the cells, but the array is created quicker than e.g. with .zeros() ? Commented Jul 13, 2016 at 17:38
  • 14
    @user3085931 yep !
    – Nathan
    Commented Sep 30, 2016 at 15:33
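A minimal sketch of the preallocate-then-fill pattern, using numpy.empty as suggested in the comments. It assumes the number of rows is known up front (here from len(data)) and that each item is a 2-element row; data is a made-up example. Because np.empty does not zero the memory, every row must be assigned before the array is read.

```python
import numpy as np

# Hypothetical input: a list of 2-element rows whose length we know in advance.
data = [[1, 2], [3, 4], [5, 6]]

# Allocate once; np.empty leaves the contents uninitialized (arbitrary values).
a = np.empty(shape=(len(data), 2))

# Fill row by row; every row is overwritten, so no leftover garbage remains.
for i, row in enumerate(data):
    a[i] = row

print(a)
```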
149

A NumPy array is a very different data structure from a list and is designed to be used in different ways. Building an array with repeated hstack calls is potentially very inefficient: every call copies all the data in the existing array into a new one. (The append function has the same issue.) If you want to build up your matrix one column at a time, you are probably best off keeping it in a list until it is finished, and only then converting it into an array.

e.g.


import numpy

mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)

item can be a list, an array or any iterable, as long as each item has the same number of elements.
In this particular case (data is a list or other sequence holding the matrix columns) you can simply use


mat = numpy.array(data)

(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
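A small sketch of the accumulate-in-a-list pattern, assuming each item is a row of equal length; make_row is a hypothetical stand-in for whatever produces your data. For a 1-D result coming from an iterator, np.fromiter is another option that builds the array without an intermediate list.

```python
import numpy as np

def make_row(i):
    # Hypothetical producer: returns one row of 3 values.
    return [i, i * 10, i * 100]

# Grow a plain Python list (cheap amortized appends)...
rows = []
for i in range(4):
    rows.append(make_row(i))

# ...then convert once at the end: one allocation, one copy.
mat = np.array(rows)
print(mat.shape)  # (4, 3)

# For 1-D data from an iterator, np.fromiter builds the array directly.
squares = np.fromiter((i * i for i in range(5)), dtype=np.int64)
print(squares)  # [ 0  1  4  9 16]
```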

EDIT:

If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!

4
  • 1
    Are numpy arrays/matrices fundamentally different from Matlab ones?
    – levesque
    Commented Nov 11, 2010 at 3:20
  • 5
    If for some reason you need to define an empty array, but with a fixed width (e.g. for np.concatenate()), you can use np.empty((0, some_width)). With 0 rows, your first array won't contain garbage values. Commented Sep 1, 2017 at 5:56
  • I think this is the right answer to the general case. It doesn't seem very elegant, but it's the only way I have found to address this in numpy. Commented Jan 31, 2023 at 16:13
  • @NumesSanguis It's the answer!
    – Mr.Spock
    Commented Nov 29, 2023 at 11:45
90

To create an empty multidimensional array in NumPy (e.g. a 2D m*n array to store your matrix) when you don't know how many rows m you will append, and don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set the dimension you want to append along to 0: X = np.empty(shape=[0, n]).

This way you can use, for example (here m = 10, which we assume we didn't know when creating the empty matrix, and n = 2):

import numpy as np

n = 2
X = np.empty(shape=[0, n])

for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)

print(X)

which will give you:

[[ 0.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  1.]
 [ 2.  0.]
 [ 2.  1.]
 [ 3.  0.]
 [ 3.  1.]
 [ 4.  0.]
 [ 4.  1.]]
3
  • 7
    This should be the answer to the question OP asked, for the use case where you don't know #rows in advance, or want to handle the case that there are 0 rows
    – Hansang
    Commented Aug 15, 2019 at 4:52
  • While this does work as the OP asked, it is not a good answer. If you know the iteration range you know the target array size.
    – hpaulj
    Commented Apr 20, 2021 at 2:49
  • 5
    But there are of course plenty of examples where you don't know the iteration range and you don't care about the computational cost. Good answer in that case!
    – Tom Saenen
    Commented Dec 5, 2021 at 10:54
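If the row count really is unknown but the copying cost does matter, one alternative (not from the answers here; it is a geometric over-allocation strategy, similar in spirit to how Python lists grow) is to over-allocate, double the capacity when the buffer is full, and trim at the end. A rough sketch, with GrowableRows as a hypothetical helper name:

```python
import numpy as np

class GrowableRows:
    """Hypothetical helper: append fixed-width rows with amortized O(1) cost."""

    def __init__(self, width, capacity=4, dtype=float):
        self._buf = np.empty((capacity, width), dtype=dtype)
        self._n = 0  # number of rows actually filled

    def append(self, row):
        if self._n == self._buf.shape[0]:
            # Buffer full: allocate twice as many rows and copy the filled part once.
            new_cap = max(1, 2 * self._buf.shape[0])
            bigger = np.empty((new_cap, self._buf.shape[1]), dtype=self._buf.dtype)
            bigger[:self._n] = self._buf[:self._n]
            self._buf = bigger
        self._buf[self._n] = row
        self._n += 1

    def to_array(self):
        # Return a copy trimmed to the rows that were actually filled.
        return self._buf[:self._n].copy()


g = GrowableRows(width=2)
for i in range(5):
    for j in range(2):
        g.append([i, j])

X = g.to_array()
print(X.shape)  # (10, 2)
```

Each full copy doubles the capacity, so the total copying stays proportional to the final number of rows rather than growing quadratically as with np.append in a loop.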
34

I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects, and it needed to be initialized empty... I didn't find any relevant answer here on Stack Overflow, so I started doodling something.

# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)

The result will be:

In [34]: x
Out[34]: array([], dtype=float64)

Therefore you can directly initialize an np array as follows:

In [36]: x= np.array([], dtype=np.float64)

I hope this helps.

2
  • 1
    This does not work for arrays, as in the question, but it can be useful for vectors.
    – divenex
    Commented Dec 22, 2017 at 16:31
  • 2
    a=np.array([]) seems to default to float64
    – P i
    Commented Sep 7, 2019 at 9:58
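As the comment notes, an empty array made from [] defaults to float64; if you want another element type, pass dtype explicitly. A quick check:

```python
import numpy as np

x = np.array([])                  # no elements, so NumPy falls back to float64
y = np.array([], dtype=np.int64)  # explicitly request an integer dtype

print(x.dtype, x.shape)  # float64 (0,)
print(y.dtype, y.shape)  # int64 (0,)
```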
22

For creating an empty NumPy array without defining its shape you can do the following:

arr = np.array([])

This is preferable because it makes clear you will be using it as a NumPy array. NumPy converts the empty list to an np.ndarray, without an extra [] 'dimension'.

To add a new element (here a hypothetical new_element value) to the array, you can do:

arr = np.append(arr, new_element)

Note that under the hood there's no such thing as a NumPy array without a defined shape. As @hpaulj mentioned, this makes a one-dimensional array of shape (0,).

2
  • 1
    No, np.array([]) creates an array with shape (0,), a 1d array with 0 elements. There's no such thing as an array without defined shape. And 2) does the same thing as 1).
    – hpaulj
    Commented Apr 20, 2021 at 2:42
  • That's true, @hpaulj, although the whole point of the discussion is not having to think about the shape when you create one. Worth mentioning anyway.
    – Pedram
    Commented Aug 21, 2021 at 9:27
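One difference from list.append worth seeing concretely: np.append returns a new array and leaves its input unchanged, which is why the result has to be reassigned. A short illustration:

```python
import numpy as np

arr = np.array([])
result = np.append(arr, 5.0)  # builds a new array: arr's data plus 5.0

print(arr.shape)  # (0,)  -> the original is untouched
print(result)     # [5.]

arr = np.append(arr, 5.0)     # reassign to "grow" arr
```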
10

You can use the append function. For rows:

>>> import numpy as np
>>> a = np.array([[10, 20, 30]])
>>> a = np.append(a, [[1, 2, 3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])

For columns:

>>> np.append(a, [[15], [15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])

EDIT
Of course, as mentioned in other answers, unless you're doing some processing (e.g. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it, then convert it to an array.

1
  • 4
    How does this answer the question? I don't see the part about empty arrays Commented Sep 10, 2021 at 5:07
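A detail of np.append that often surprises people coming from the row/column examples in this answer: when axis is omitted, both inputs are flattened and the result is 1-D. A small sketch:

```python
import numpy as np

a = np.array([[10, 20, 30]])               # shape (1, 3)

rows = np.append(a, [[1, 2, 3]], axis=0)   # stacks as a new row -> shape (2, 3)
flat = np.append(a, [[1, 2, 3]])           # no axis: both inputs flattened -> shape (6,)

print(rows.shape, flat.shape)  # (2, 3) (6,)
```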
5

Here is a workaround to make NumPy arrays behave more like lists:

import numpy as np

np_arr = np.array([])
np_arr = np.append(np_arr, 2)
np_arr = np.append(np_arr, 24)
print(np_arr)

OUTPUT: [ 2. 24.]

1
  • 1
    Stay away from np.append. It's not a list append clone, despite the poorly chosen name.
    – hpaulj
    Commented Apr 20, 2021 at 2:45
3

If you absolutely don't know the final size of the array, you can increment the size of the array like this:

import numpy

my_arr = numpy.zeros((0, 5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1, 5))))
print(my_arr)

[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]
  • Notice the 0 in the first line.
  • numpy.append is another option. It calls numpy.concatenate.
3

You can apply the same list approach to build any kind of array, like zeros:

a = range(5)
a = [i*0 for i in a]
print(a)
[0, 0, 0, 0, 0]
1
  • 4
    If you want to do that in pure python, a= [0] * 5 is the simple solution
    – Makers_F
    Commented Dec 22, 2015 at 3:46
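For NumPy specifically, np.zeros builds the zero-filled array directly, so neither the comprehension nor [0] * 5 is needed; pass dtype=int if you want integers rather than the default floats.

```python
import numpy as np

py_zeros = [0] * 5                 # plain Python list of five zeros
np_zeros = np.zeros(5, dtype=int)  # NumPy array of five integer zeros

print(py_zeros)  # [0, 0, 0, 0, 0]
print(np_zeros)  # [0 0 0 0 0]
```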
3

Another simple way to create an empty array whose cells can hold arrays (an object array) is:

import numpy as np
np.empty((2,3), dtype=object)
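With dtype=object, each cell holds an arbitrary Python object, so whole arrays can be assigned into individual cells. A minimal sketch (the cell contents are made up for illustration):

```python
import numpy as np

grid = np.empty((2, 3), dtype=object)  # object cells start out as None

# Each cell can hold an array of any length.
grid[0, 0] = np.arange(3)
grid[1, 2] = np.array([1.5, 2.5])

print(grid[0, 1] is None)  # True
print(grid[1, 2])          # [1.5 2.5]
```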
2

Depending on what you are using this for, you may need to specify the data type (see 'dtype').

For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):

myarray = numpy.empty(shape=(H,W),dtype='u1')

For an RGB image, include the number of color channels in the shape: shape=(H,W,3)

You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note in the numpy.empty documentation: empty does not set the values to zero, so they start out arbitrary.
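A small sketch of the image case, with H and W as made-up dimensions. 'u1' is the same dtype as np.uint8, and np.zeros is used here so every pixel starts at a defined value (black).

```python
import numpy as np

H, W = 4, 6  # hypothetical image height and width

mono = np.zeros(shape=(H, W), dtype='u1')        # 8-bit grayscale image
rgb = np.zeros(shape=(H, W, 3), dtype=np.uint8)  # 8-bit RGB image

print(mono.dtype == rgb.dtype)  # True: 'u1' and np.uint8 are the same dtype
print(rgb.shape)                # (4, 6, 3)
```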

1

I think you want to handle most of the work with lists, then use the result as a matrix. Maybe this is one way:

import numpy as np

ur_list = []
for col in columns:
    ur_list.append(list(col))

mat = np.matrix(ur_list)
1

I think you can create empty numpy array like:

>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)

This format is useful when you want to append to a NumPy array in a loop.

1

Perhaps what you are looking for is something like this:

x=np.array(0)

This way you can create an array without any elements. It is similar to:

x=[]

This way you will be able to append new elements to your array later.

1
  • 1
    No, your x is a an array with shape (), and one element. It is more like 0 than []. You could call it a 'scalar array'.
    – hpaulj
    Commented Apr 20, 2021 at 2:44
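The difference hpaulj describes can be checked directly: np.array(0) is a 0-d array holding one value, while an array with no elements has shape (0,).

```python
import numpy as np

scalar_arr = np.array(0)  # 0-d "scalar array": shape (), one element
empty_arr = np.array([])  # 1-d array with zero elements: shape (0,)

print(scalar_arr.shape, scalar_arr.size)  # () 1
print(empty_arr.shape, empty_arr.size)    # (0,) 0
```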
0

The simplest way

Input:

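import numpy as np
n_cols = 3   # width of each new_data block
data = np.zeros((0, n_cols), dtype=float)   # (rows, cols)
data.shape

Output:
(0, 3)

Input:

for i in range(n_files):
    # new_data must have shape (k, n_cols) to be appended along axis 0
    data = np.append(data, new_data, axis=0)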
1
  • 1
    Please don't recommend using np.append in a loop.
    – hpaulj
    Commented Oct 6, 2022 at 5:37
0

You might be better off using vstack in the general case, where you want to add a whole array of rows at a time. For example, let's say you generate batches and accumulate them:

import numpy as np
embeddings = np.empty((0, 768), dtype=np.float32)
for i in range(10):
    batch = generate() # shape: (64, 768)
    embeddings = np.vstack((embeddings, batch))
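Each np.vstack call in that loop copies everything accumulated so far. A sketch of the alternative recommended elsewhere in this thread, collecting the batches in a list and stacking once at the end; generate is a hypothetical stand-in that returns random (64, 768) float32 batches:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate():
    # Hypothetical batch producer: 64 rows of 768 float32 features.
    return rng.standard_normal((64, 768), dtype=np.float32)

batches = []
for i in range(10):
    batches.append(generate())   # cheap list append, no array copying

embeddings = np.vstack(batches)  # single concatenation at the end
print(embeddings.shape, embeddings.dtype)  # (640, 768) float32
```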
0

If the array has a float data type, I prefer to start with an array full of NaN values rather than using zeros or empty arrays. That way, if any elements aren't assigned for some reason, you don't get zeros or some other value creeping in.

import numpy as np

data = [1.0, 2.0, 3.0]
xs = np.full_like(data, np.nan)
for i, item in enumerate(data):
    xs[i] = item
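The benefit of the NaN prefill is that unassigned slots are easy to detect afterwards with np.isnan. A small sketch in which one slot is deliberately skipped:

```python
import numpy as np

data = [1.0, 2.0, 3.0]
xs = np.full(len(data), np.nan)  # float64 array of NaNs, one slot per item

for i, item in enumerate(data):
    if i == 1:
        continue                 # simulate a slot that never gets assigned
    xs[i] = item

missing = np.flatnonzero(np.isnan(xs))
print(missing)  # [1]
```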
