Array Operations & Attributes Lesson

NumPy Array Attributes

17 min to complete · By Gilad Gressel

Arrays in NumPy come with a number of attributes. In this lesson, we will examine a small subset of them: the ones you will encounter and use most often.

  1. .dtype
  2. .shape
  3. .ndim
  4. .size
  5. .itemsize
  6. .nbytes
  7. .T
  8. .base

import numpy as np

.dtype

You have already seen this NumPy array attribute in the previous section. It returns the data type of the array.

a = np.ones((3, 3))
a.dtype
dtype('float64')
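
As a small illustration (the array names here are made up for this sketch), NumPy infers the dtype from the values you pass in unless you request one explicitly:

```python
import numpy as np

floats = np.array([1.5, 2.0, 3.25])          # Python floats become float64
flags = np.array([True, False])              # Python bools become bool
small = np.array([1, 2, 3], dtype=np.int8)   # dtype requested explicitly

print(floats.dtype)  # float64
print(flags.dtype)   # bool
print(small.dtype)   # int8
```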

.shape

.shape is one of the most essential attributes of an array. It returns a tuple containing the length of the array along each dimension. For example, a 2D array of shape (2, 3) has 2 rows and 3 columns.

print(a)
# a has 3 rows and 3 columns
a.shape
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

(3, 3)
b = np.ones((3, 3, 2))
# b has shape (3, 3, 2)
# you can think of it as 3 matrices, each with 3 rows and 2 columns
print(b)
b.shape
[[[1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]]]

(3, 3, 2)

Typically, in machine learning, we work with 2D arrays. Usually, a row is a sample or data point, and the columns represent some kind of information about the sample. This is also called "tabular data". However, you may also work with computer vision datasets, and here it gets trickier.

Computer Vision Channels

In computer vision, we work with images. An image is a 2D grid of pixels, but it has a third dimension called channels. The number of channels is 3 for RGB images and 1 for grayscale (black and white) images. So, the shape of an image is (height, width, channels). For example, a 32x32 RGB image has a shape of (32, 32, 3). Then, if we stack the images into a dataset, we get a fourth dimension, which is the number of images. So, the shape of the dataset will be (number of images, height, width, channels).

For example, if we have 1000 RGB images, each 32x32 pixels, then the shape of the array representing the images will be (1000, 32, 32, 3). If we were working with grayscale images, the number of channels would be 1, and the shape would be (1000, 32, 32, 1).

The conventions for which dimension represents what are not set in stone. They depend on the library you are using. Just make sure to investigate how others are using the dimensions and follow the same convention.
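
To make the image shapes concrete, here is a minimal sketch that uses an all-zeros array as a stand-in for a real dataset. The variable names and the uint8 dtype are assumptions for illustration. It also shows one way to move the channels dimension when a library expects it in a different position:

```python
import numpy as np

# Stand-in dataset: 1000 RGB images of 32x32 pixels, channels last.
images = np.zeros((1000, 32, 32, 3), dtype=np.uint8)
print(images.shape)          # (1000, 32, 32, 3)

# Indexing the first dimension selects a single image.
first_image = images[0]
print(first_image.shape)     # (32, 32, 3)

# Reorder the axes to (number of images, channels, height, width).
channels_first = np.transpose(images, (0, 3, 1, 2))
print(channels_first.shape)  # (1000, 3, 32, 32)
```

Checking .shape after the reordering is exactly the kind of habit that catches a misplaced dimension early.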

Regardless of what kind of data you are working with, you will always have to check the shape of the array. It is a good practice (in the beginning) to check the shape of the array after every operation to ensure that the operation is doing what you think it is doing.

.ndim

This attribute returns the number of dimensions of the array. For example, a 2D array has 2 dimensions, a 3D array has 3 dimensions, and so on. It is the same as the length of the shape tuple.

print(b.ndim)
print(b.shape)
print(len(b.shape))
print(len(b.shape) == b.ndim)
3
(3, 3, 2)
3
True

.size

This attribute returns the total number of elements in the array. It is the product of the elements of the shape tuple. It is useful as a quick sanity check, for example before reshaping an array, because a reshape must keep the total number of elements the same. Note that having the same size does not by itself make two arrays compatible for an operation such as addition; that depends on their shapes.

It is also just a convenient way to check the size of the array. For example, if you have a 2D array of shape (2, 3), then the size of the array is 6. You can also get the size of the array by multiplying the number of rows by the number of columns. In this case, 2 x 3 = 6. But we programmers are lazy, so it's nice to have an attribute that just returns the size of the array.

print(b.shape)
print(b.size)
print(b.shape[0] * b.shape[1] * b.shape[2] == b.size)
(3, 3, 2)
18
True
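
The sketch below (with made-up array names) shows what .size does and does not tell you. Two arrays can hold the same number of elements and still have shapes that cannot be added together, while arrays of different sizes can sometimes be combined:

```python
import math
import numpy as np

p = np.ones((2, 3))
q = np.ones((3, 2))
r = np.ones(3)

# .size is the product of the shape tuple.
print(p.size == math.prod(p.shape))  # True
print(p.size == q.size)              # True, both hold 6 elements

# Same size, but shapes (2, 3) and (3, 2) cannot be added.
try:
    p + q
    added = True
except ValueError:
    added = False
print(added)                         # False

# Different size, but NumPy can stretch r across the rows of p (broadcasting).
print((p + r).shape)                 # (2, 3)

# A reshape must keep the size the same: 2 * 3 == 6.
print(p.reshape(6).shape)            # (6,)
```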

.itemsize

This NumPy array attribute returns the size of each element in the array in bytes. For example, each element of an int32 array takes 4 bytes, so its itemsize is 4. Each element of a float64 array takes 8 bytes, so its itemsize is 8.

This is a useful attribute for estimating how much memory an array needs. For example, a float64 array of shape (1000, 1000, 1000) needs 1000 x 1000 x 1000 x 8 = 8,000,000,000 bytes, which is 8 GB. So, if you have a computer with 4 GB of RAM, trying to create such an array will most likely fail with a memory error.

print(b.dtype)
print(b.itemsize)
# b.itemsize is the size of each element in bytes 
# not the total size of the array
float64
8
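
You can do this estimate without allocating anything, because a dtype object has its own itemsize. This is a minimal sketch; the variable names are assumptions for illustration:

```python
import math
import numpy as np

shape = (1000, 1000, 1000)
dtype = np.dtype("float64")

# Number of elements times bytes per element, computed without creating the array.
estimated_bytes = math.prod(shape) * dtype.itemsize
print(estimated_bytes)        # 8000000000
print(estimated_bytes / 1e9)  # 8.0 (GB)
```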

.nbytes

This attribute returns the total size of the array in bytes. It is the product of the elements of the shape tuple and the itemsize of the array. In other words, it is .size multiplied by .itemsize.

print(b.nbytes)
# b.nbytes is the total size of the array in bytes
print(b.itemsize * b.size == b.nbytes)
144
True
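
Because .nbytes depends on .itemsize, choosing a smaller dtype shrinks the array in memory. The sketch below assumes float32 precision is acceptable for your data, which is not always the case:

```python
import numpy as np

b64 = np.ones((3, 3, 2))       # float64 by default, 8 bytes per element
b32 = b64.astype(np.float32)   # a copy with 4 bytes per element

print(b64.nbytes)  # 144
print(b32.nbytes)  # 72
```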

.T

The .T NumPy array attribute returns the transpose of the array. It is the same as calling the transpose function on the array.

Note that the transpose of a 1D array is still the same 1D array. The transpose of a 2D array is the same as the transpose of a matrix: the rows become columns, and the columns become rows. For arrays with more than two dimensions, .T reverses the order of all the axes, so an array of shape (2, 3, 4) has a transpose of shape (4, 3, 2). The number of dimensions never changes: the transpose of a 3D array is a 3D array, the transpose of a 4D array is a 4D array, and so on.

Also, in NumPy, a transpose is a view of the original array (whenever possible), not a copy. So, if you change the transpose, the original array will also change. We will discuss views in more detail in the next section.

c = np.array([[1, 2, 3],[4, 5, 6]])
print(c)
print()
print(c.T)
[[1 2 3]
 [4 5 6]]
    
[[1 4]
 [2 5]
 [3 6]]
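
The following sketch checks a few properties of .T directly: it matches the transpose() method, it shares memory with the original array, and for arrays with more than two dimensions it reverses the order of the axes. The names t, d, and v are made up for this example:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]])
t = c.T

# .T gives the same result as calling transpose() with no arguments.
print(np.array_equal(t, c.transpose()))  # True

# The transpose is a view: writing through it changes c as well.
t[0, 1] = 99
print(c[1, 0])    # 99

# With three dimensions, .T reverses the order of all axes.
d = np.ones((2, 3, 4))
print(d.T.shape)  # (4, 3, 2)

# A 1D array keeps its shape.
v = np.arange(3)
print(v.T.shape)  # (3,)
```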

.base

This is a bit of an advanced NumPy array attribute. If the array is a view of another array, it returns the array that the data comes from. Otherwise, it returns None. We will discuss views in more detail in later sections, but this is an important attribute to know about. It can help you debug your code if you are getting unexpected results.

x = np.arange(9)
print(x)
[0 1 2 3 4 5 6 7 8]
y = x.reshape(3, 3)
print("we reshape x to y with 3 rows and 3 columns")
print("y:")
print(y)

we reshape x to y with 3 rows and 3 columns
y:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
print("When we print the base of y, we get x, the original array that y was created from")
print(y.base)  # .reshape() creates a view
When we print the base of y, we get x, the original array that y was created from
[0 1 2 3 4 5 6 7 8]
z = y[[2, 1]]
print("we create z from y with advanced indexing, which will create a copy, not a view")
print(z)

we create z from y with advanced indexing, which will create a copy, not a view
[[6 7 8]
 [3 4 5]]
print(f"the base of z is {z.base} because it is a copy of y")
print(z.base is None)  # advanced indexing creates a copy
the base of z is None because it is a copy of y
True

The above may be confusing! We are doing a bunch of stuff we haven't shown you yet, like reshaping and fancy indexing. However, the point that we want to drive home is that .base will be None if the array is not a view of another array (put another way, it is None if the array owns its own data in memory). Because arrays are mutable, and you may sometimes create views without realizing it, you can get into situations with strange side effects. Using the .base attribute can help you debug these situations.

It's totally OK to forget about .base and just remember that "it can be used to test for views".
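
Here is a minimal sketch of the kind of side effect described above, reusing the x, y, and z arrays from this section. It also uses np.shares_memory as a second way to check whether two arrays overlap in memory, and .copy() as one way to get an independent array (the name safe is made up for this example):

```python
import numpy as np

x = np.arange(9)
y = x.reshape(3, 3)   # a view of x
z = y[[2, 1]]         # advanced indexing makes a copy

print(y.base is x)             # True
print(np.shares_memory(x, y))  # True
print(z.base is None)          # True
print(np.shares_memory(y, z))  # False

# Writing to the view silently changes the original array.
y[0, 0] = 100
print(x[0])                    # 100

# An explicit copy owns its data, so changing it leaves x alone.
safe = y.copy()
safe[0, 0] = -1
print(safe.base is None)       # True
print(x[0])                    # 100
```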

Summary: NumPy Array Attributes

  • NumPy arrays come with several useful attributes that provide information about the array's properties, such as data type (.dtype), shape (.shape), number of dimensions (.ndim), total number of elements (.size), size of each element in bytes (.itemsize), and total size in bytes (.nbytes).
  • The .T attribute returns the transpose of the array, while the .base attribute can be used to check if an array is a view of another array, which is helpful for debugging purposes when dealing with potential side effects from unintended views.
  • Understanding array shapes and dimensions is crucial, especially when working with different types of data like tabular data and image data, as the conventions for representing dimensions may vary across libraries and applications.
  • Keeping track of memory usage and being mindful of potential memory errors is essential when working with large arrays, as the total size of an array in memory is determined by its shape and the size of each element.