NumPy Introduction

What is NumPy?

NumPy stands for Numerical Python. It is a library for multi-dimensional arrays, and it also provides functions for linear algebra, Fourier transforms, and matrices.

Differences between Python Lists and NumPy Arrays

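NumPy arrays outperform Python lists for the following reasons:

  • Lists use much more memory than NumPy arrays, because every list element is a separate Python object (with its own type and reference-count header) reached through a pointer.
  • A list stores pointers to objects scattered around memory, while a NumPy array stores its values in one contiguous block.
  • Lists allow mixed data types, while a NumPy array has a single dtype, so NumPy does not need to check the type of each element during an operation.
  • NumPy has many more features and functions than lists, such as vectorized math operations.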

Math comparison between Python List and NumPy Array

Arithmetic operators and math functions work element-wise on NumPy arrays, but not on lists.

import numpy as np
a = np.array([1,2,3,4,5])
print(a ** 2)
print(np.sin(a))
b = [1,2,3,4,5]
b+2

[ 1  4  9 16 25]
[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
-----------------------------------------------------------------
TypeError    Traceback (most recent call last)
<ipython-input-254-29a854a496cd> in <module>
      4 print(np.sin(a))
      5 b = [1,2,3,4,5]
----> 6 b+2
TypeError: can only concatenate list (not "int") to list
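To get the same element-wise result from a plain list, you need an explicit loop or comprehension. A minimal sketch (not part of the original notes) comparing the two, and showing that `*` on a list repeats it rather than multiplying each element:

```python
import numpy as np

b = [1, 2, 3, 4, 5]

# With a plain list, "add 2 to every element" needs an explicit comprehension
b_plus_2 = [x + 2 for x in b]

# With a NumPy array, the same operation is a single vectorized expression
a_plus_2 = np.array(b) + 2

# Careful: * on a list repeats the list instead of doubling each element
repeated = b * 2

print(b_plus_2)   # [3, 4, 5, 6, 7]
print(a_plus_2)   # [3 4 5 6 7]
print(repeated)   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```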

Applications of NumPy:

  • Mathematics (MATLAB Replacement)
  • Plotting (Matplotlib)
  • Backend (Pandas, Connect 4, Digital Photography)
  • Machine Learning (tensors are similar to NumPy arrays)

NumPy Basics

import numpy as np
a = np.array([1, 2, 3], dtype='int8')
print(a)
b = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(b)
print(a.ndim)
print(b.ndim)
print(a.shape)
print(b.shape)
print(a.dtype)
print(b.dtype)
print(f'a.itemsize is {a.itemsize}')
print(f'b.itemsize is {b.itemsize}')
print(a.size * a.itemsize)
print(a.nbytes)
print(b.nbytes)

[1 2 3]
[[1. 2. 3.]
 [4. 5. 6.]]
1
2
(3,)
(2, 3)
int8
float64
a.itemsize is 1
b.itemsize is 8
3
3
48
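The relationship used above, `nbytes == size * itemsize`, holds for every dtype. A small sketch (the list of dtypes is just an illustrative choice) storing the same three values with different dtypes:

```python
import numpy as np

values = [1, 2, 3]

# Same three values, different element types
for dtype in (np.int8, np.int16, np.int32, np.int64, np.float64):
    arr = np.array(values, dtype=dtype)
    # nbytes = number of elements * bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    print(arr.dtype, arr.itemsize, arr.nbytes)
```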

Accessing/Changing item values for columns, rows

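a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
print(f'a.shape is {a.shape}\n')
# A nested Python list is indexed one level at a time: b[2][1]
b = [1, 2, [3, 4]]
b[2][1] = 5
# A NumPy array takes one index per axis, a[row, column]; negative indices count from the end
print(f'a[1,5] == a[1, -2] result is {a[1,5] == a[1, -2]}\n')
print(f'a[0,:] result is {a[0,:]}\n')
print(f'a[:, 2] result is {a[:, 2]}\n')
# start:stop:step slicing within row 0
print(f'a[0, 1:6:2] result is {a[0, 1:6:2]}\n')
print(f'a[0, 1:-1:2] result is {a[0, 1:-1:2]}\n')
# a1, a2 and a3 are NOT copies: all three names refer to the same array as a,
# so every assignment below also changes a, and later prints include earlier changes
a1 = a2 = a3 = a
a1[1, 5] = 20
print(f'a1 is {a1}\n')
# A scalar is broadcast to every element of column 2
a2[:, 2] = 50
print(f'a2 is {a2}\n')
# A sequence must match the column length (2 rows)
a3[:, 2] = [1,2]
print(f'a3 is {a3}\n')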

a.shape is (2, 7)
a[1,5] == a[1, -2] result is True
a[0,:] result is [1 2 3 4 5 6 7]
a[:, 2] result is [ 3 10]
a[0, 1:6:2] result is [2 4 6]
a[0, 1:-1:2] result is [2 4 6]
a1 is [[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 20 14]]
a2 is [[ 1  2 50  4  5  6  7]
 [ 8  9 50 11 12 20 14]]
a3 is [[ 1  2  1  4  5  6  7]
 [ 8  9  2 11 12 20 14]]
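Notice that `a2` and `a3` still contain the 20 written through `a1`: plain assignment only binds another name to the same array. A minimal sketch checking this with `is` and `np.shares_memory()`, and using `.copy()` to get an independent array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

# Chained assignment binds three more names to the SAME array object
a1 = a2 = a3 = a
a1[1, 5] = 20
print(a1 is a)                 # True: no copy was made
print(a[1, 5])                 # 20: the change is visible through every name

# .copy() allocates new memory, so changes no longer propagate
b = a.copy()
b[0, 0] = 99
print(a[0, 0])                 # 1: the original is untouched
print(np.shares_memory(a, b))  # False
```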

Indexing and Slicing

You can index and slice NumPy arrays in the same ways you can index and slice Python lists.

data = np.array([1, 2, 3])
print(data[1])
print(data[0:2])
print(data[1:])
print(data[-2:])

2
[1 2]
[2 3]
[2 3]
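One important difference from lists: a basic NumPy slice is a view that shares memory with the original array, while a list slice is a shallow copy. A short sketch of the two behaviours:

```python
import numpy as np

data = np.array([1, 2, 3])
lst = [1, 2, 3]

# NumPy slice: a view, so writing to it writes to data
view = data[1:]
view[0] = 100
print(data)   # [  1 100   3]

# List slice: a new list, so the original is unaffected
sub = lst[1:]
sub[0] = 100
print(lst)    # [1, 2, 3]
```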


Data Filtering with NumPy Array

NumPy arrays make it easy to filter data with conditions: a comparison produces a boolean mask, and indexing with that mask keeps only the elements where it is True.

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a.shape)
print(a[a < 7])
test = (a > 8)
print(test)
print(a[test])

(3, 4)
[1 2 3 4 5 6]
[[False False False False]
 [False False False False]
 [ True  True  True  True]]
[ 9 10 11 12]

a = np.array([1, 2, 3, 4, 5, 6, 7])
f = a % 3 == 0
b = a[f]
print(b)

[3 6]
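Several conditions can be combined in one mask. In NumPy this uses the element-wise operators `&`, `|` and `~` rather than the keywords `and`, `or` and `not`, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `<` and `>`. A minimal sketch reusing the 3x4 array from above:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

between = a[(a > 2) & (a < 7)]    # both conditions true
outside = a[(a < 3) | (a > 10)]   # either condition true
odd = a[~(a % 2 == 0)]            # negate a condition

print(between)   # [3 4 5 6]
print(outside)   # [ 1  2 11 12]
print(odd)       # [ 1  3  5  7  9 11]
```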

numpy.where()

Return elements chosen from x or y depending on condition.

a = np.arange(10)
print(a)
b = np.where(a < 5, a, a*2)
print(b)

[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4 10 12 14 16 18]
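`np.where()` can also be called with only the condition. In that form it is shorthand for `np.asarray(condition).nonzero()` and returns a tuple of index arrays, one per dimension. A short sketch:

```python
import numpy as np

a = np.arange(10)

# Condition only: a tuple holding one index array (a is 1-D)
idx = np.where(a < 5)
print(idx)

# The index tuple can be used directly to select the matching elements
print(a[idx])   # [0 1 2 3 4]
```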

numpy.nonzero()

Return the indices of the elements that are non-zero.

a = np.array([[0, 1, 2], [4, 2, 0], [3, 6, 0]])
a = a[np.nonzero(a)]
print(a)

[1 2 4 2 3 6]
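The example above immediately uses the indices to extract values. To see the indices themselves, unpack the result: for a 2-D array `np.nonzero()` returns row indices and column indices, and `np.argwhere()` gives the same positions as (row, column) pairs. A minimal sketch with the same array:

```python
import numpy as np

a = np.array([[0, 1, 2], [4, 2, 0], [3, 6, 0]])

# One index array per dimension
rows, cols = np.nonzero(a)
print(rows)   # [0 0 1 1 2 2]
print(cols)   # [1 2 0 1 0 1]

# Same positions, one (row, col) pair per line
print(np.argwhere(a))
```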

numpy.choose()

Construct an array from an index array and a set of arrays to choose from.

a = [[11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44]]
# result[i] is element i of the row a[index[i]]:
# 1st element: 1st item of the third row (index 2) -> 31
# 2nd element: 2nd item of the fourth row (index 3) -> 42
# 3rd element: 3rd item of the second row (index 1) -> 23
# 4th element: 4th item of the first row (index 0) -> 14
b = np.choose([2, 3, 1, 0], a)
print(b)

[31 42 23 14]
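Since `result[i] = a[index[i]][i]`, the same selection can be written with fancy indexing: the row indices come from the index array and the column indices are simply `0, 1, 2, ...`. A sketch showing the two agree for this example:

```python
import numpy as np

a = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34], [41, 42, 43, 44]])
choices = np.array([2, 3, 1, 0])

via_choose = np.choose(choices, a)

# Row index from `choices`, column index from the position i
via_indexing = a[choices, np.arange(len(choices))]

print(via_choose)    # [31 42 23 14]
print(via_indexing)  # [31 42 23 14]
```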

Create an array from existing data

You can create a new array from existing data by slicing and indexing, or with np.vstack(), np.hstack(), np.hsplit(), .view(), and .copy().

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
a1 = a[4:9]
print(a1)
a1 = a[2:]
print(a1)

[5 6 7 8 9]
[ 3  4  5  6  7  8  9 10]

You can also stack two existing arrays, both vertically and horizontally.

a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])
a3 = np.vstack((a1, a2))
print(a3)
a4 = np.hstack((a1, a2))
print(a4)

[[1 1]
 [2 2]
 [3 3]
 [4 4]]
[[1 1 3 3]
 [2 2 4 4]]
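For 2-D arrays, `vstack` joins along axis 0 (adding rows) and `hstack` along axis 1 (adding columns), so both can be expressed with `np.concatenate()`. A short sketch checking this with the arrays above:

```python
import numpy as np

a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])

# vstack == concatenate along rows, hstack == concatenate along columns (2-D case)
assert np.array_equal(np.vstack((a1, a2)), np.concatenate((a1, a2), axis=0))
assert np.array_equal(np.hstack((a1, a2)), np.concatenate((a1, a2), axis=1))

print(np.concatenate((a1, a2), axis=0).shape)  # (4, 2)
print(np.concatenate((a1, a2), axis=1).shape)  # (2, 4)
```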

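You can split an array into several smaller arrays using np.hsplit(): pass an integer to get that many equally sized pieces, or a tuple of indices to split before those positions.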

x = np.arange(1, 25)
print(f'x is {x}')
x1 = x.reshape(2, 12)
print(f'x1 is {x1}')
x2 = np.hsplit(x1, 3)
print(f'x2 is {x2}')
x3 = np.hsplit(x, (3, 4))
print(f'x3 is {x3}')

x is [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
x1 is [[ 1  2  3  4  5  6  7  8  9 10 11 12]
 [13 14 15 16 17 18 19 20 21 22 23 24]]
x2 is [array([[ 1,  2,  3,  4],
       [13, 14, 15, 16]]), array([[ 5,  6,  7,  8],
       [17, 18, 19, 20]]), array([[ 9, 10, 11, 12],
       [21, 22, 23, 24]])]
x3 is [array([1, 2, 3]), array([4]), array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
       22, 23, 24])]
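Passing an integer to `np.hsplit()` only works when it divides the number of columns evenly. When it does not, `np.array_split()` produces pieces of nearly equal size instead. A minimal sketch with the 2x12 array from above:

```python
import numpy as np

x = np.arange(1, 25).reshape(2, 12)

# 12 columns cannot be divided into 5 equal pieces
hsplit_failed = False
try:
    np.hsplit(x, 5)
except ValueError as err:
    hsplit_failed = True
    print("hsplit failed:", err)

# array_split along the columns: the first two pieces get 3 columns, the rest get 2
pieces = np.array_split(x, 5, axis=1)
print([p.shape for p in pieces])  # [(2, 3), (2, 3), (2, 2), (2, 2), (2, 2)]
```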

Initializing Different Types of Arrays

np.zeros( )

Method zeros(shape) will create an array filled with 0 values with the specified shape. The default dtype is float64.

np.zeros(5)
np.zeros((2,3))
np.zeros((2,3, 4))

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],
       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])
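The default float64 dtype mentioned above can be overridden with the `dtype` argument. A short sketch:

```python
import numpy as np

z = np.zeros((2, 3))
print(z.dtype)    # float64 (the default)

# Integer zeros instead of floating-point zeros
zi = np.zeros((2, 3), dtype=np.int32)
print(zi.dtype)   # int32
print(zi)
```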

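np.ones()

Method ones(shape, dtype) will create an array filled with 1 values with the specified shape. Like zeros(), the default dtype is float64; here dtype='int8' is passed instead.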

np.ones((3,4,5), dtype='int8')

array([[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],
       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],
       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]], dtype=int8)

np.arange()

Method arange() will create arrays with regularly incrementing values.

a = np.arange(4)
b = np.arange(2, 10, dtype=float)
c = np.arange(2, 3, 0.1)
print(f'a is {a}')
print(f'b is {b}')
print(f'c is {c}')

a is [0 1 2 3]
b is [2. 3. 4. 5. 6. 7. 8. 9.]
c is [2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]

np.linspace()

Method linspace() will create arrays with a specified number of elements, and spaced equally between the specified beginning and end values. The advantage of this creation function is that one can guarantee the number of elements and the starting and end point, which arange() generally will not do for arbitrary start, stop, and step values.

a = np.linspace(1., 4., 5)
b = np.linspace(10, 20, 10)
print(a)
print(b)

[1.   1.75 2.5  3.25 4.  ]
[10.  11.11111111 12.22222222 13.33333333 14.44444444 15.55555556 16.66666667 17.77777778 18.88888889 20.        ]
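The claim that arange() cannot guarantee the element count is easy to see with a floating-point step: arange computes the length as ceil((stop - start) / step), and rounding can push that ratio just past an integer. A sketch with illustrative values:

```python
import numpy as np

# (1.3 - 1.0) / 0.1 evaluates to slightly more than 3 in floating point,
# so arange returns 4 values and the last one is (approximately) 1.3,
# even though stop is supposed to be excluded
r = np.arange(1.0, 1.3, 0.1)
print(len(r), r[-1])

# linspace takes the element count explicitly, so the length is guaranteed
s = np.linspace(1.0, 1.3, 3, endpoint=False)   # about 1.0, 1.1, 1.2
t = np.linspace(1.0, 1.3, 4)                   # stop value included exactly
print(len(s), len(t), t[-1])
```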

np.indices()

Return an array representing the indices of a grid. Method indices() will create a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each representing variation in that dimension.

a = np.indices((2,3))
b = np.indices((3,2))
c = np.indices((2, 3, 4))
print(f'a is\n {a}\n')
print(f'b is\n {b}')
print(f'c is\n {c}')

a is
 [[[0 0 0]
  [1 1 1]]
 [[0 1 2]
  [0 1 2]]]
b is
 [[[0 0]
  [1 1]
  [2 2]]
 [[0 1]
  [0 1]
  [0 1]]]
c is
 [[[[0 0 0 0]
   [0 0 0 0]
   [0 0 0 0]]
  [[1 1 1 1]
   [1 1 1 1]
   [1 1 1 1]]]
 [[[0 0 0 0]
   [1 1 1 1]
   [2 2 2 2]]
  [[0 0 0 0]
   [1 1 1 1]
   [2 2 2 2]]]
 [[[0 1 2 3]
   [0 1 2 3]
   [0 1 2 3]]
  [[0 1 2 3]
   [0 1 2 3]
   [0 1 2 3]]]]

x = np.arange(20).reshape(5, 4)
print(x)
row, col = np.indices((3, 3))
print(row, col)
x[row, col]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[[0 0 0]
 [1 1 1]
 [2 2 2]] [[0 1 2]
 [0 1 2]
 [0 1 2]]
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

i, j = np.indices((3,3))
M = 2*i + 3*j
print(i, j)
print(M)

[[0 0 0]
 [1 1 1]
 [2 2 2]] [[0 1 2]
 [0 1 2]
 [0 1 2]]
[[ 0  3  6]
 [ 2  5  8]
 [ 4  7 10]]
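The same row and column grids can also be built from 1-D ranges with `np.meshgrid(..., indexing='ij')`; the `'ij'` option gives matrix-style (row, column) ordering that matches `np.indices()`. A short sketch evaluating the same formula:

```python
import numpy as np

i, j = np.indices((3, 3))

# meshgrid with indexing='ij' reproduces the np.indices grids
ii, jj = np.meshgrid(np.arange(3), np.arange(3), indexing='ij')
assert np.array_equal(i, ii) and np.array_equal(j, jj)

# Evaluate 2*i + 3*j over every grid cell at once
M = 2 * ii + 3 * jj
print(M)
```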

Size of objects in memory

int, floats

import sys
sys.getsizeof(1)

28

sys.getsizeof(10**100)

72

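np.dtype(int).itemsize

4

The default integer size is platform-dependent: the 4 above comes from Windows with NumPy < 2.0, while 64-bit Linux and macOS (and Windows with NumPy 2.x) give 8.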

np.dtype(np.int8).itemsize

1

np.dtype(float).itemsize

8

sys.getsizeof(1.0)

24

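Python lists are even larger (and sys.getsizeof() counts only the list and its element pointers, not the int objects themselves)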

sys.getsizeof([1])

64

np.array([1]).nbytes

4

(8 on platforms whose default integer is 64-bit.)
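For a fairer comparison, you can add the size of every int object to the list's own size. A rough sketch, assuming CPython; the size `n = 1000` is an arbitrary choice, and since CPython caches small ints (-5 to 256) and shares them, this slightly overcounts:

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)

# List object + pointer array, plus each referenced int object
list_total = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# NumPy stores n raw 8-byte integers in one block
print(list_total, arr.nbytes)   # arr.nbytes is 8000
```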

Performance is important

l = list(range(100000))
a = np.arange(100000)

%time np.sum(a ** 2)

Wall time: 0 ns
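216474736

Note that this NumPy result is wrong: the true sum is 333328333350000, as the list version below shows. The default integer here was 32-bit, so a ** 2 and its sum overflowed and wrapped around (333328333350000 mod 2**32 = 216474736).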

%time sum(x**2 for x in l)

Wall time: 29 ms
333328333350000
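The NumPy timing above returned 216474736 instead of 333328333350000 because the array used a 32-bit integer type. A minimal sketch of the fix, requesting 64-bit integers explicitly so the result no longer depends on the platform's default:

```python
import numpy as np

l = list(range(100000))

# int64 holds 99999**2 and the full sum, so nothing overflows
a = np.arange(100000, dtype=np.int64)

numpy_result = int(np.sum(a ** 2))
python_result = sum(x ** 2 for x in l)
print(numpy_result, python_result)   # 333328333350000 333328333350000
```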

Useful NumPy Functions

Random

np.random.random(size=2)

array([0.29536398, 0.16094588])

np.random.normal(size=2)

array([-0.39357   , -0.31629262])

np.random.rand(2, 4)

array([[0.48320961, 0.33642111, 0.56741904, 0.04794151],
       [0.38893703, 0.90630365, 0.16101821, 0.74362113]])
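The outputs above change on every run. For reproducible results, newer NumPy code typically creates a seeded `Generator` with `np.random.default_rng()`; the seed 42 below is arbitrary. A short sketch of the equivalents of the three calls above:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
u = rng.random(size=2)     # uniform on [0, 1), like np.random.random(size=2)
g = rng.normal(size=2)     # standard normal, like np.random.normal(size=2)
m = rng.random((2, 4))     # 2x4 uniform array, like np.random.rand(2, 4)

# A second generator with the same seed repeats the same sequence
rng2 = np.random.default_rng(seed=42)
assert np.array_equal(u, rng2.random(size=2))
```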

arange

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.arange(5,10)

array([5, 6, 7, 8, 9])

np.arange(0, 1, 0.1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

reshape

np.arange(10).reshape(2,5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

np.arange(10).reshape(5,2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
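One dimension passed to reshape can be `-1`, meaning "infer it from the total size", and the new shape must contain exactly as many elements as the original. A short sketch:

```python
import numpy as np

a = np.arange(10)

# -1 is inferred: 10 elements / 2 = 5
print(a.reshape(2, -1).shape)   # (2, 5)
print(a.reshape(-1, 2).shape)   # (5, 2)

# 3 * 4 = 12 != 10, so this shape is rejected
reshape_failed = False
try:
    a.reshape(3, 4)
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```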

linspace

np.linspace(0,1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

np.linspace(0,1, 20)

array([0., 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421, 0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211, 0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
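With `retstep=True`, linspace also returns the spacing between samples: (stop - start) / (num - 1) with the endpoint included, or (stop - start) / num without it. A short sketch:

```python
import numpy as np

samples, step = np.linspace(0, 1, 5, retstep=True)
print(samples)     # [0.   0.25 0.5  0.75 1.  ]
print(step)        # 0.25

# Excluding the endpoint divides the interval into num steps instead of num - 1
_, step_open = np.linspace(0, 1, 20, endpoint=False, retstep=True)
print(step_open)   # 0.05
```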