Module 3: Introduction to NumPy#

In this module you will learn how to:

convert between Python lists and NumPy arrays,
select array elements,
slice arrays,
fill an array with values,
perform mathematical operations on arrays,
save arrays as a txt file.

NumPy is a Python package for efficient handling of data in the form of vectors, matrices, and higher-dimensional arrays. Further details about the package and its manual can be found at https://numpy.org. NumPy arrays provide a faster and more memory-efficient alternative to Python lists [NumPy Developers, n.d.], and many statistical packages accept them directly. In this section, we will work through the fundamentals of the package.

If you have not installed NumPy yet, run this command in a terminal:

pip install numpy

Once you’ve done so, import the package in your Jupyter notebook:

# np is a common abbreviation for NumPy
import numpy as np

Introduction to NumPy arrays#

The main concept in NumPy is the array. It is similar to a Python list, but under the hood it is implemented differently and it offers additional functionality.

You can create a NumPy array from a Python list like this:

# avoid calling the variable "list", because that would hide Python's built-in list type
my_list = [1, 2, 3, 4]
# arr is a NumPy array
arr = np.array(my_list)

# or if you want to do it in one line:
arr = np.array([1, 2, 3, 4])

Each individual value in an array, such as the numbers 1 to 4 here, is referred to as an element.

If you want to convert a NumPy array to a Python list, you can just do the following:

back_to_list = arr.tolist()

# arr can be replaced with the name of any other NumPy array you created

You can determine the type of your variable (and check if the conversion worked) like this:

# run the previous cells to be able to print the messages here
print('NumPy array:')
print(type(arr))


print('Python list:')
print(type(back_to_list))

NumPy array:
<class 'numpy.ndarray'>
Python list:
<class 'list'>

So far you have seen an example of a one-dimensional array [1, 2, 3, 4], which could represent a vector. You could also easily create 2D or 3D arrays:

print("2D array")
arr2D = np.array([[1,2], [3, 4]])
print(arr2D)

print("3D array")
arr3D = np.array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],
        
       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]],

       [[19, 20, 21],
        [22, 23, 24],
        [25, 26, 27]]])

print(arr3D)

2D array
[[1 2]
 [3 4]]
3D array
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]

As you can see above, adding a dimension simply involves nesting the lists one level deeper. This way you can create N-dimensional arrays, where N-1 is the number of times the lists were nested.

You can check the size and the dimensionality of your arrays with the .shape attribute (attributes are explained in module 7):

print("The shape of the 2D array:")
print(arr2D.shape)

print("The shape of the 3D array:")
print(arr3D.shape)

The shape of the 2D array:
(2, 2)
The shape of the 3D array:
(3, 3, 3)

The number of values in the parentheses indicates the dimensionality (which can alternatively be checked with arr2D.ndim), and each value gives the number of elements along the corresponding dimension/axis. So here, for example, the (2, 2) shape means the array has 2 rows and 2 columns.
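
As a quick check of how these attributes relate to each other, here is a minimal sketch. It rebuilds the two arrays so that it runs on its own; np.arange and the .size attribute are not introduced in this module and are used here only as a shortcut for illustration.

```python
import numpy as np

arr2D = np.array([[1, 2], [3, 4]])
# shortcut (an assumption of this sketch): the numbers 1..27 arranged
# into the same 3x3x3 layout as arr3D above
arr3D = np.arange(1, 28).reshape((3, 3, 3))

for name, a in [("arr2D", arr2D), ("arr3D", arr3D)]:
    # ndim equals the number of values in shape;
    # size is the total number of elements
    print(name, "shape:", a.shape, "ndim:", a.ndim, "size:", a.size)
```

The total number of elements is always the product of the values in the shape, e.g. 3 x 3 x 3 = 27 for arr3D.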

Selecting a single array element#

Now let’s see how to select array elements we’re interested in.

To fetch a value, you need to know its index position in all dimensions of an array. For example, we would like to obtain the element with value 3 in array arr2D:

print(arr2D)

[[1 2]
 [3 4]]

This element is found in the second row and the first column. However, indexing of lists and arrays in Python starts at 0, so in Python terms the element is in row 1, column 0:

row_id = 1
column_id = 0

# let's fetch the element with value 3
print(arr2D[row_id, column_id])

Now we would like to select value 6 in arr3D. Since we have more dimensions now, rows and columns alone are not enough to describe an element’s position, so we will refer to axes instead. This is also the terminology used in the NumPy documentation.

axis0 = 0
axis1 = 1
axis2 = 2
print(arr3D[axis0, axis1, axis2])
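
To see what each index does, it can help to apply the indices one axis at a time. The sketch below rebuilds arr3D with np.arange (a shortcut not covered in this module) so that it is self-contained; the names block and row are only illustrative.

```python
import numpy as np

# same values and layout as arr3D above
arr3D = np.arange(1, 28).reshape((3, 3, 3))

block = arr3D[0]        # index 0 on axis 0: the first 3x3 block
row = arr3D[0, 1]       # then index 1 on axis 1: the row [4 5 6]
value = arr3D[0, 1, 2]  # then index 2 on axis 2: the single element 6

print(block.shape)
print(row)
print(value)
```

Each index you provide removes one dimension from the result, until a single element is left.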

Exercise

(a) Select value 4 in arr2D and value 22 in arr3D.

(b) Select value 2 in arr.

# your code goes here

Selecting a range of elements in an array#

Let us see how to select a subset of elements in an array. If you are interested in an entire row or column (or, more generally, everything along an axis), specify the index for the axis you want to fix and : for each of the other axes:

# select row 0 in arr2D
print(arr2D[0, :])

[1 2]

Here we will explore how to slice an array:

larger_array = np.array([[ 1,  2,  3,  4], 
                        [ 5,  6,  7,  8],
                        [ 9, 10, 11, 12], 
                        [13, 14, 15, 16]])
print(larger_array)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

We would like to select a subset of larger_array containing values 1, 2, 3, 5, 6, 7. These elements cover rows 0, 1 and columns 0, 1, 2. A range of positions can be selected with start:end (where the end position is not included), like here:

row_start = 0
row_end = 2
col_start = 0
col_end = 3
print(larger_array[row_start:row_end,col_start:col_end])

[[1 2 3]
 [5 6 7]]

An important thing to keep in mind is that the end number is not included in the selected range of positions when slicing arrays, so e.g. when you want to select rows 0 and 1, your end value is 2, as shown above.
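
The start and end of a slice can also be left out, in which case the slice runs from the beginning or to the end of that axis. This goes slightly beyond the example above, so treat it as an optional sketch; the variable names are only illustrative.

```python
import numpy as np

# same values as larger_array above
larger_array = np.arange(1, 17).reshape((4, 4))

# omitting start means "from position 0": same as larger_array[0:2, 0:3]
top_left = larger_array[:2, :3]

# omitting end means "up to and including the last position"
last_two_rows = larger_array[2:, :]

print(top_left)
print(last_two_rows)
```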

Now, what if you are only interested in specific elements that don’t follow a neat range like in previous examples?

# do you remember the 1D array? 
print(arr)

[1 2 3 4]

That’s how you would select elements with values 1, 3, 4:

# put all of their positions in a list or an array
select_these = [0, 2, 3]
print(arr[select_these])

[1 3 4]

And how would you select elements with values 3, 10, 16 in larger_array?

print(larger_array)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

You would just create separate lists of their positions for each dimension:

axis0_list = [0, 2, 3]
axis1_list = [2, 1, 3]
print(larger_array[axis0_list, axis1_list])

[ 3 10 16]
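
One way to read this: the two lists are paired up position by position, so the first selected element is at (0, 2), the second at (2, 1) and the third at (3, 3). The sketch below compares the indexing with an explicit loop over those pairs; the name manual is only used for illustration.

```python
import numpy as np

larger_array = np.arange(1, 17).reshape((4, 4))  # same values as above

axis0_list = [0, 2, 3]
axis1_list = [2, 1, 3]

picked = larger_array[axis0_list, axis1_list]

# the same selection written out as a loop over (row, column) pairs
manual = [int(larger_array[i, j]) for i, j in zip(axis0_list, axis1_list)]

print(picked)
print(manual)
```

Because the lists are paired, they must have the same length.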

Exercise

Select values 5, 12 in larger_array.

# your code goes here

Selecting array values according to specific criteria#

Here we will discuss a couple of examples where you can select values that meet certain rules:

# select elements with values that are below 10 in larger_array:

print(larger_array[larger_array < 10])

[1 2 3 4 5 6 7 8 9]

What if you want to select elements that need to meet more than one criterion? Put each criterion in parentheses and combine them with &, like here:

# select elements with values that are below 10 and more than 5 in larger_array:

print(larger_array[(larger_array > 5) & (larger_array < 10)])

[6 7 8 9]
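
It may help to look at what the comparison itself produces: an array of True/False values with the same shape as larger_array, often called a mask. The sketch below prints such a mask and also uses |, the "or" counterpart of &, which is not covered above.

```python
import numpy as np

larger_array = np.arange(1, 17).reshape((4, 4))  # same values as above

# each comparison gives a boolean array; & keeps positions where both are True
mask = (larger_array > 5) & (larger_array < 10)
print(mask)
print(larger_array[mask])

# | keeps positions where at least one criterion is True
either = (larger_array < 3) | (larger_array > 14)
print(larger_array[either])
```

Selecting with a mask always returns a 1D array, because the selected positions need not form a rectangle.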

Exercise

Select the values in larger_array that are equal to or larger than 2, smaller than 5, and not equal to 4.

# your code goes here

Filling out arrays with values#

Now that we know how to select elements of interest, we can populate arrays with values in a way we want. Let’s inspect how we could do this:

new_array = np.zeros((5, 5)) # this function allows us to create a 2D NumPy array of size 5x5, all filled with zeros
print("Newly initiated array:")
print(new_array)
# fill element in position [3, 4] with a value of 1:
new_array[3, 4] = 1
print("Array with one new value:")
print(new_array)

# fill multiple elements at specified positions with a value of 1:
axis0_list = [0, 2, 3]
axis1_list = [2, 1, 3]
new_array[axis0_list, axis1_list] = 1 
print("Array with more 1s:")
print(new_array)

Newly initiated array:
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
Array with one new value:
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0.]]
Array with more 1s:
[[0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1.]
 [0. 0. 0. 0. 0.]]

Things get slightly trickier when you want to put an array of values into another larger array, not just a single value of 1 as shown above. The selection of the range of positions is specified differently:

# let's say we want to put arrayA in the middle of the larger arrayB:
arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("arrayA")
print(arrayA)
arrayB = np.zeros((5, 5))
print("arrayB")
print(arrayB)

arrayA
[[1 2 3]
 [4 5 6]
 [7 8 9]]
arrayB
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

Here the first pair of square brackets, [1:4], selects rows 1 to 3, and the second pair, [:, 1:4], keeps all of those rows (the :) and selects columns 1 to 3. Because slicing with start:end returns a view of the original array rather than a copy, assigning to this selection changes arrayB itself. The same selection can be written more compactly as arrayB[1:4, 1:4].

arrayB[1:4][:, 1:4] = arrayA
print(arrayB)

[[0. 0. 0. 0. 0.]
 [0. 1. 2. 3. 0.]
 [0. 4. 5. 6. 0.]
 [0. 7. 8. 9. 0.]
 [0. 0. 0. 0. 0.]]
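
As a sanity check, the sketch below fills one array with the two-step selection used above and another with the single-bracket form arrayB[1:4, 1:4], and confirms that the results are identical. The names chained and direct are only illustrative.

```python
import numpy as np

arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# two-step selection, as in the example above
chained = np.zeros((5, 5))
chained[1:4][:, 1:4] = arrayA

# rows and columns selected inside a single pair of brackets
direct = np.zeros((5, 5))
direct[1:4, 1:4] = arrayA

print(direct)
print(np.array_equal(chained, direct))
```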

However, sometimes you might prefer to provide the positions as lists of indices instead of a start:end interval, for example because you want to write a function that accepts arguments in that format. In this case, you would do it like this:

axis0_list = [1, 2, 3]
axis1_list = [1, 2, 3]

# this time the lists of indices need to be in the array format,
# so that [:, None] can be used to turn the first one into a column
axis0_array = np.array(axis0_list)
axis1_array = np.array(axis1_list)

arrayB = np.zeros((5, 5))

arrayB[axis0_array[:, None], axis1_array] = arrayA

print(arrayB)

[[0. 0. 0. 0. 0.]
 [0. 1. 2. 3. 0.]
 [0. 4. 5. 6. 0.]
 [0. 7. 8. 9. 0.]
 [0. 0. 0. 0. 0.]]
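
Why does axis0_array[:, None] work? It turns the 1D array of row indices into a column of shape (3, 1). NumPy then combines this column with the 1D array of column indices so that every row index is paired with every column index. The sketch below makes that pairing visible with np.broadcast_arrays, a helper that is not used elsewhere in this module and appears here only for illustration.

```python
import numpy as np

axis0_array = np.array([1, 2, 3])
axis1_array = np.array([1, 2, 3])

column = axis0_array[:, None]  # shape (3,) becomes shape (3, 1)
print(column.shape)

# show the full grids of row and column indices that NumPy pairs up
rows, cols = np.broadcast_arrays(column, axis1_array)
print(rows)
print(cols)
```

Position [i, j] of the selection refers to element (rows[i, j], cols[i, j]) of arrayB, which is why the selection has the same 3x3 shape as arrayA.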

What if you want to copy a 2D slice of arrayB (with values 2, 3, 5, 6) into arrayC using lists of positions? In this case you need to select the slice in a way that preserves its 2D shape.

# the previously shown method for selecting elements puts them all into a 1D array:
axis0_list = [1, 1, 2, 2]
axis1_list = [2, 3, 2, 3]
print(arrayB[axis0_list, axis1_list])

[2. 3. 5. 6.]

# to preserve the 2D dimensionality of your selected slice to copy into arrayC, you need to choose the slice like this:
axis0_list = [1, 2]
axis1_list = [2, 3]
print(arrayB[axis0_list][:, axis1_list])

[[2. 3.]
 [5. 6.]]

arrayC = np.ones((5, 5)) # This generates a 2D array of size 5x5, all filled with ones.

axis0_array = np.array(axis0_list)
axis1_array = np.array(axis1_list)

arrayC[axis0_array[:, None], axis1_array] = arrayB[axis0_list][:, axis1_list]

print("populated arrayC")
print(arrayC)

populated arrayC
[[1. 1. 1. 1. 1.]
 [1. 1. 2. 3. 1.]
 [1. 1. 5. 6. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
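
NumPy also offers the helper np.ix_, which builds this kind of row-by-column selection directly from two lists. It is not used in the text above, so the following is an alternative sketch rather than the method of this module; the name block is hypothetical.

```python
import numpy as np

# rebuild arrayB as it looks after the steps above
arrayB = np.zeros((5, 5))
arrayB[1:4, 1:4] = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arrayC = np.ones((5, 5))

# np.ix_ combines every listed row with every listed column
block = np.ix_([1, 2], [2, 3])

print(arrayB[block])           # keeps the 2D shape
arrayC[block] = arrayB[block]  # same positions on both sides
print(arrayC)
```

The same block can be used for both reading and writing, so no conversion to arrays or [:, None] is needed.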

Joining arrays together#

Combining NumPy arrays differs from combining Python lists: the arrays must have the same number of dimensions, and their shapes must match along every axis except the one you are joining on.

To explain this, let’s try to append extra_array to arrayA:

arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("arrayA")
print(arrayA)

print("extra_array")
extra_array = np.array(["A", "A", "A"])
print(extra_array)

arrayA
[[1 2 3]
 [4 5 6]
 [7 8 9]]
extra_array
['A' 'A' 'A']

We won’t be able to concatenate both arrays immediately, because their shapes and dimensionalities don’t match:

print("arrayA")
print(arrayA.shape) # it's 2D
print("extra_array") 
print(extra_array.shape) # it's 1D

arrayA
(3, 3)
extra_array
(3,)

To make them match, we can reshape extra_array to become an additional row for arrayA.

# initially extra_array was just a 1D array, now we turn it into a 2D format.
extra_row_array = extra_array.reshape((1, -1))
# the 1 in (1, -1) means that the first axis (axis 0) has length 1, i.e. a single row,
# and -1 tells NumPy to infer the length of the other axis from the number of elements
print("As a row")
print(extra_row_array)
print("Its new shape")
print(extra_row_array.shape)

As a row
[['A' 'A' 'A']]
Its new shape
(1, 3)

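# this is how we can append extra_row_array at the end:
print(np.append(arrayA, extra_row_array, axis=0))
# We need to specify the axis in np.append to say in which direction the arrays are joined;
# without it, np.append flattens both arrays into 1D first.
# Note that the numbers became strings: all elements of a NumPy array share one data type.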

[['1' '2' '3']
 ['4' '5' '6']
 ['7' '8' '9']
 ['A' 'A' 'A']]

# we can also change the order
print(np.append(extra_row_array, arrayA, axis=0))

[['A' 'A' 'A']
 ['1' '2' '3']
 ['4' '5' '6']
 ['7' '8' '9']]

The next example shows how to append extra_array as a column:

extra_column_array = extra_array.reshape((-1, 1))
print("As a column")
print(extra_column_array)
print("Its new shape")
print(extra_column_array.shape)

As a column
[['A']
 ['A']
 ['A']]
Its new shape
(3, 1)

print(np.append(arrayA, extra_column_array, axis=1))

[['1' '2' '3' 'A']
 ['4' '5' '6' 'A']
 ['7' '8' '9' 'A']]
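
In the outputs above the numbers appear in quotes, because every element of a NumPy array has the same data type and the numbers were converted to text to match 'A'. The sketch below checks this through the .dtype attribute and shows that joining two numeric arrays keeps them numeric. It uses np.concatenate, which is not introduced above and is offered only as an alternative to np.append; numeric_row is a hypothetical example array.

```python
import numpy as np

arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
extra_array = np.array(["A", "A", "A"])

mixed = np.append(arrayA, extra_array.reshape((1, -1)), axis=0)
# kind "i" stands for integers, "U" for text (Unicode strings)
print(arrayA.dtype.kind, mixed.dtype.kind)

# joining two numeric arrays keeps the numbers as numbers
numeric_row = np.array([[10, 11, 12]])
stacked = np.concatenate([arrayA, numeric_row], axis=0)
print(stacked)
print(stacked.dtype.kind)
```

Keep this in mind before doing maths on a joined array: text elements such as '1' cannot be added or multiplied like numbers.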

Exercise

Place the following array exercise_array in the middle and right columns of arrayA.

exercise_array = np.array([[-9, -15, -9], [-3, -10, -5]])
arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# your code goes here

Mathematical operations on arrays#

Addition and subtraction of NumPy arrays work like addition and subtraction of matrices and vectors on paper: each element in the first array is combined with the element at the corresponding position in the second array. The arrays need compatible shapes for the operation to succeed; in the simplest case, used here, the shapes are identical. The same rules extend to N-dimensional arrays.

vector1 = np.array([1, 2])
vector2 = np.array([3, 4])

print("Sum of two array vectors:")
print(vector1+vector2)
print("Difference of two array vectors:")
print(vector1-vector2)

Sum of two array vectors:
[4 6]
Difference of two array vectors:
[-2 -2]

Here is an example with matrices:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[10, 2], [3, 1]])
print("Sum of two array matrices:")
print(arr1+arr2)

Sum of two array matrices:
[[11  4]
 [ 6  5]]
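
To illustrate what happens when the shapes do not fit together, the sketch below tries to add a vector of length 2 to a vector of length 3. vector3 is a hypothetical array introduced only for this example, and the try/except construct simply catches the error so that the cell keeps running.

```python
import numpy as np

vector1 = np.array([1, 2])
vector3 = np.array([1, 2, 3])  # hypothetical vector with a different length

error_message = None
try:
    vector1 + vector3
except ValueError as err:
    # NumPy refuses to combine shapes (2,) and (3,)
    error_message = str(err)

print(error_message)
```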

Multiplication and division between arrays are also done element-wise in NumPy, so, unlike the operations above, they do not correspond to matrix multiplication as written on paper:

print("arr1:")
print(arr1)

print("arr2:")
print(arr2)

print("Elementwise division")
print(arr1 / arr2)
# here element [i, j] in arr1 is divided by the corresponding element [i, j] in arr2

# same rule applies to multiplication
print("Elementwise multiplication")
print(arr1 * arr2)

print("Elementwise raising to a power")
print(arr1 ** arr2)

arr1:
[[1 2]
 [3 4]]
arr2:
[[10  2]
 [ 3  1]]
Elementwise division
[[0.1 1. ]
 [1.  4. ]]
Elementwise multiplication
[[10  4]
 [ 9  4]]
Elementwise raising to a power
[[ 1  4]
 [27  4]]

Any addition/subtraction/multiplication/division/raising to a power on an array with a single value will perform this operation on each array element. Some examples:

print("arr1:")
print(arr1)

print("Addition")
result = arr1 + 1
print(result)

print("Subtraction")
result = arr1 - 2
print(result)

print("Multiplication")
result = arr1 * 10
print(result)

print("Division")
result = arr1 / 10
print(result)

print("Raising to the second power")
result = arr1 ** 2
print(result)

arr1:
[[1 2]
 [3 4]]
Addition
[[2 3]
 [4 5]]
Subtraction
[[-1  0]
 [ 1  2]]
Multiplication
[[10 20]
 [30 40]]
Division
[[0.1 0.2]
 [0.3 0.4]]
Raising to the second power
[[ 1  4]
 [ 9 16]]

If you want to do a proper matrix multiplication or dot product, you can use @:

print("vector1:")
print(vector1)

print("vector2:")
print(vector2)

print("Dot product of vector1 and vector2")
print(vector1 @ vector2)

print("arr1:")
print(arr1)

print("arr2:")
print(arr2)

print("Matrix multiplication of arr1 and arr2")
print(arr1 @ arr2)

vector1:
[1 2]
vector2:
[3 4]
Dot product of vector1 and vector2
11
arr1:
[[1 2]
 [3 4]]
arr2:
[[10  2]
 [ 3  1]]
Matrix multiplication of arr1 and arr2
[[16  4]
 [42 10]]
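
To connect this with matrix multiplication on paper: each element of the result is the dot product of one row of arr1 and one column of arr2. The sketch below recomputes the top-left element by hand and compares @ with the element-wise *; the names product and top_left are only illustrative.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[10, 2], [3, 1]])

product = arr1 @ arr2

# element [0, 0]: row 0 of arr1 times column 0 of arr2 = 1*10 + 2*3
top_left = arr1[0, :] @ arr2[:, 0]
print(top_left)

# @ and * give different results for the same inputs
print(product)
print(arr1 * arr2)
```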

With NumPy, you can also find the maximum and minimum values in an array:

print("arr2:")
print(arr2)

print("Maximum value in arr2 is " + str(np.max(arr2)))

print("Minimum value in arr2 is " + str(np.min(arr2)))

arr2:
[[10  2]
 [ 3  1]]
Maximum value in arr2 is 10
Minimum value in arr2 is 1

If you do not specify axis in np.max or np.min, the max/min value is searched for in the entire array. Specifying an axis instead returns the max/min values along that axis:

print(np.max(arr2, axis=0)) # returns the maximum value in each column
print(np.max(arr2, axis=1)) # returns the maximum value in each row

[10  2]
[10  3]

You could also get a position of the element with max/min value:

print("vector2:")
print(vector2)

print("Positions of elements with max and min values respectively: ")
print(np.argmax(vector2))
print(np.argmin(vector2))

vector2:
[3 4]
Positions of elements with max and min values respectively: 
1
0

Things get slightly more complicated when we want to obtain positions from argmax/argmin for an array with dimensionality N > 1:

print("arr2:")
print(arr2)

print("Argmax positions?")
print(np.argmax(arr2))

arr2:
[[10  2]
 [ 3  1]]
Argmax positions?
0
0

Only one index is returned, because argmax treats the 2D array as if it were flattened to 1D. To obtain both indices, we need to convert this flat index back into a position in the array’s original shape. We can do this with the np.unravel_index function:

print(np.unravel_index(np.argmax(arr2), arr2.shape))

(np.int64(0), np.int64(0))
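
The same approach works for the minimum, and the returned position can be used directly to index the array. A short sketch, in which the names flat_index, row and col are only illustrative:

```python
import numpy as np

arr2 = np.array([[10, 2], [3, 1]])

# position of the minimum, counted as if the array were flattened to 1D
flat_index = np.argmin(arr2)

# convert the flat position back into a (row, column) pair
row, col = np.unravel_index(flat_index, arr2.shape)

print(int(flat_index))
print(int(row), int(col))
print(arr2[row, col])  # the minimum value itself
```

Wrapping the values in int() just prints them as plain Python integers.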

NumPy also conveniently provides built-in functions for common statistical operations:

print("arr2:")
print(arr2)

print("Arithmetic mean for all elements in arr2:")
print(np.mean(arr2))
print("Standard deviation for all elements in arr2:")
print(np.std(arr2))

arr2:
[[10  2]
 [ 3  1]]
Arithmetic mean for all elements in arr2:
4.0
Standard deviation for all elements in arr2:
3.5355339059327378

np.mean and np.std follow the same rules for the axis argument as np.max.
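
For example, a minimal sketch of the axis argument with np.mean and np.std on arr2 (the variable names are only illustrative):

```python
import numpy as np

arr2 = np.array([[10, 2], [3, 1]])

col_means = np.mean(arr2, axis=0)  # one mean per column
row_means = np.mean(arr2, axis=1)  # one mean per row
col_stds = np.std(arr2, axis=0)    # one standard deviation per column

print(col_means)
print(row_means)
print(col_stds)
```

As with np.max, axis=0 works down the rows and gives one value per column, while axis=1 gives one value per row.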

Saving a NumPy array as a txt file and loading it#

Some of your array computations might take a long time, so you might want to save the final array and return to it later. Here you can see how to save a NumPy array as a txt file:

# you can specify where you want to save your array, e.g. arrayA, by providing a file path
file_path = "./" # in this example, we are saving the file in the same folder as this script
np.savetxt(file_path+"array_name.txt", arrayA, delimiter="\t") # delimiter sets the separator placed between array elements

# you can alternatively save your array as a CSV file (which is comma-separated)
np.savetxt(file_path+"array_name.csv", arrayA, delimiter=",")

Tab-separated and comma-separated text files are easy to read and manipulate with code. Additionally, plain text is a widely used, open, human-readable format, which enhances the reproducibility of your analysis.
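
To load a saved array back, NumPy provides np.loadtxt, which takes the same delimiter argument. The sketch below saves arrayA and reads it back in; it writes to a temporary folder (an assumption made here so that the example leaves no files behind), whereas in your own work you would use the file path you saved to.

```python
import os
import tempfile

import numpy as np

arrayA = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# a temporary folder is used only to keep this example self-contained
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "array_name.txt")
    np.savetxt(path, arrayA, delimiter="\t")
    # the delimiter must match the one used when saving
    loaded = np.loadtxt(path, delimiter="\t")

print(loaded)
print(loaded.dtype)
print(np.array_equal(loaded, arrayA))
```

Note that np.loadtxt returns floating-point numbers by default, so the integers come back as 1.0, 2.0 and so on.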

References#

[Dat22]

DataCamp. Python vs r for data science: which should you learn? https://www.datacamp.com/blog/python-vs-r-for-data-science-whats-the-difference, 2022.

[Jup15]

Project Jupyter. What is a notebook. https://docs.jupyter.org/en/latest/#what-is-a-notebook, 2015.

[Mic25]

Microsoft. Windows subsystem for linux. https://learn.microsoft.com/en-us/windows/wsl/about, 2025.

[Pan25]

Pandas. Pandas.series — pandas documentation. https://pandas.pydata.org/docs/reference/api/pandas.Series.html, 2025. Accessed: 2025-06-25.

[NumPyDevelopers]

NumPy Developers. What is NumPy? https://numpy.org/doc/stable/user/whatisnumpy.html, n.d. Accessed: 2025-04-27.

[PythonSFoundation25]

Python Software Foundation. Beginner's guide to python. https://wiki.python.org/moin/BeginnersGuide, 2025.
