What is NumPy?
NumPy stands for Numerical Python. It is a library for multi-dimensional arrays. It also provides functions for linear algebra, Fourier transforms, and matrix operations.
Difference between Python Lists and NumPy
NumPy is much faster than Python lists for the following reasons:
- Lists use much more memory than NumPy arrays, because every list element is a separate Python object with its own type and reference-count overhead.
- Lists store their elements in non-contiguous memory, while NumPy uses contiguous memory.
- Lists allow different data types, while a NumPy array uses a single data type, so no per-element type check is needed.
- NumPy also has many more features/functions than lists, such as math methods.
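The single-data-type and contiguous-memory points can be checked directly. This is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

# A list can hold mixed types; each element is a separate Python object.
mixed_list = [1, 2.5, 3]
list_types = [type(v).__name__ for v in mixed_list]

# np.array converts everything to one common dtype (float64 here).
arr = np.array(mixed_list)

print(list_types)                  # ['int', 'float', 'int']
print(arr.dtype)                   # float64
print(arr.flags['C_CONTIGUOUS'])   # True: one contiguous block of memory
```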
Math comparison between Python List and NumPy Array
Math operators work element-wise on a NumPy array, but not on a list.
import numpy as np
a = np.array([1,2,3,4,5])
print(a ** 2)
print(np.sin(a))
b = [1,2,3,4,5]
b+2
[ 1  4  9 16 25]
[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
-----------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-254-29a854a496cd> in <module>
      4 print(np.sin(a))
      5 b = [1,2,3,4,5]
----> 6 b+2

TypeError: can only concatenate list (not "int") to list
Applications of NumPy:
- Mathematics (MATLAB Replacement)
- Plotting (Matplotlib)
- Backend (Pandas, Connect 4, Digital Photography)
- Machine Learning (tensors are similar to NumPy arrays)
NumPy Basics
import numpy as np
a = np.array([1, 2, 3], dtype='int8')
print(a)
b = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(b)
print(a.ndim)
print(b.ndim)
print(a.shape)
print(b.shape)
print(a.dtype)
print(b.dtype)
print(f'a.itemsize is {a.itemsize}')
print(f'b.itemsize is {b.itemsize}')
print(a.size * a.itemsize)
print(a.nbytes)
print(b.nbytes)
[1 2 3]
[[1. 2. 3.]
 [4. 5. 6.]]
1
2
(3,)
(2, 3)
int8
float64
a.itemsize is 1
b.itemsize is 8
3
3
48
Accessing/Changing item values for columns, rows
a = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])
print(f'a.shape is {a.shape}\n')
b = [1, 2, [3, 4]]
b[2][1] = 5
print(f'a[1,5] == a[1, -2] result is {a[1,5] == a[1, -2]}\n')
print(f'a[0,:] result is {a[0,:]}\n')
print(f'a[:, 2] result is {a[:, 2]}\n')
print(f'a[0, 1:6:2] result is {a[0, 1:6:2]}\n')
print(f'a[0, 1:-1:2] result is {a[0, 1:-1:2]}\n')
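# a1, a2 and a3 are just extra names for the same array as a (no copy is made),
# so every assignment that follows changes the one shared array.
a1 = a2 = a3 = a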
a1[1, 5] = 20
print(f'a1 is {a1}\n')
a2[:, 2] = 50
print(f'a2 is {a2}\n')
a3[:, 2] = [1,2]
print(f'a3 is {a3}\n')
a.shape is (2, 7)

a[1,5] == a[1, -2] result is True

a[0,:] result is [1 2 3 4 5 6 7]

a[:, 2] result is [ 3 10]

a[0, 1:6:2] result is [2 4 6]

a[0, 1:-1:2] result is [2 4 6]

a1 is [[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 20 14]]

a2 is [[ 1  2 50  4  5  6  7]
 [ 8  9 50 11 12 20 14]]

a3 is [[ 1  2  1  4  5  6  7]
 [ 8  9  2 11 12 20 14]]
Indexing and Slicing
You can index and slice NumPy arrays in the same ways you can slice Python lists.
data = np.array([1, 2, 3])
print(data[1])
print(data[0:2])
print(data[1:])
print(data[-2:])
2
[1 2]
[2 3]
[2 3]
Data Filtering with NumPy Array
NumPy arrays are very useful for filtering data with conditions: a comparison produces a boolean array, which can then be used as an index.
a = np.array([[1 , 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a.shape)
print(a[a < 7])
test = (a > 8)
print(test)
print(a[test])
(3, 4)
[1 2 3 4 5 6]
[[False False False False]
 [False False False False]
 [ True  True  True  True]]
[ 9 10 11 12]
a = np.array([1, 2, 3, 4, 5, 6, 7])
f = a % 3 == 0
b = a[f]
print(b)
[3 6]
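Several conditions can be combined into one mask. The sketch here uses illustrative variable names and shows the element-wise operators `&`, `|` and `~`, which take the place of Python's `and`, `or` and `not` for arrays.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Use & (and), | (or), ~ (not); each condition needs its own parentheses.
between = a[(a > 3) & (a < 8)]
outside = a[(a <= 3) | (a >= 8)]
not_even = a[~(a % 2 == 0)]

print(between)    # [4 5 6 7]
print(outside)    # [ 1  2  3  8  9 10]
print(not_even)   # [1 3 5 7 9]
```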
numpy.where()
Return elements chosen from x or y depending on condition.
a = np.arange(10)
print(a)
b = np.where(a < 5, a, a*2)
print(b)
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4 10 12 14 16 18]
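np.where() can also be called with the condition alone, in which case it returns the indices where the condition holds. A small sketch with made-up data:

```python
import numpy as np

a = np.array([3, 8, 1, 9, 4])

# With only a condition, np.where returns a tuple of index arrays.
idx = np.where(a > 3)
print(idx[0])     # [1 3 4]
print(a[idx])     # [8 9 4]

# The three-argument form also accepts scalars for x and y.
capped = np.where(a > 3, 3, a)
print(capped)     # [3 3 1 3 3]
```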
numpy.nonzero()
Return the indices of the elements that are non-zero.
a = np.array([[0, 1, 2], [4, 2, 0], [3, 6, 0]])
a = a[np.nonzero(a)]
print(a)
[1 2 4 2 3 6]
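The return value of np.nonzero() is a tuple with one index array per dimension. The sketch here reuses the same array to show the raw indices and one way to pair them into (row, column) coordinates.

```python
import numpy as np

a = np.array([[0, 1, 2], [4, 2, 0], [3, 6, 0]])

rows, cols = np.nonzero(a)   # one index array per dimension
print(rows)                  # [0 0 1 1 2 2]
print(cols)                  # [1 2 0 1 0 1]

# Pair the indices up as (row, col) coordinates.
coords = np.transpose(np.nonzero(a))
print(coords.tolist())       # [[0, 1], [0, 2], [1, 0], [1, 1], [2, 0], [2, 1]]
```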
numpy.choose()
Construct an array from an index array and a set of arrays to choose from.
a = [[11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44]]
'''
The index array [2, 3, 1, 0] says which row of a each result element comes from:
the 1st element of the result is the 1st element of the third (2+1) row of a: 31
the 2nd element is the 2nd element of the fourth (3+1) row of a: 42
the 3rd element is the 3rd element of the second (1+1) row of a: 23
the 4th element is the 4th element of the first (0+1) row of a: 14
'''
b = np.choose([2, 3, 1, 0], a)
print(b)
[31 42 23 14]
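For this particular case, the same selection can be written with integer array indexing, which may make the row/column logic easier to see. This is a sketch; `index` and `picked` are illustrative names.

```python
import numpy as np

a = np.array([[11, 12, 13, 14],
              [21, 22, 23, 24],
              [31, 32, 33, 34],
              [41, 42, 43, 44]])
index = np.array([2, 3, 1, 0])

# For each column j, read row index[j]; same result as np.choose(index, a) here.
picked = a[index, np.arange(a.shape[1])]
print(picked)                                        # [31 42 23 14]
print(np.array_equal(picked, np.choose(index, a)))   # True
```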
Create an array from existing data
A new array can be created from existing data by slicing and indexing, or with np.vstack(), np.hstack(), np.hsplit(), .view() and .copy().
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
a1 = a[4:9]
print(a1)
a1 = a[2:,]
print(a1)
[5 6 7 8 9]
[ 3  4  5  6  7  8  9 10]
You can also stack two existing arrays, both vertically and horizontally.
a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])
a3 = np.vstack((a1, a2))
print(a3)
a4 = np.hstack((a1, a2))
print(a4)
[[1 1]
 [2 2]
 [3 3]
 [4 4]]
[[1 1 3 3]
 [2 2 4 4]]
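For 2-D inputs like these, both stacking functions correspond to np.concatenate() along a chosen axis. A short sketch comparing the results:

```python
import numpy as np

a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])

# For 2-D inputs, vstack joins along axis 0 and hstack along axis 1.
v = np.concatenate((a1, a2), axis=0)
h = np.concatenate((a1, a2), axis=1)

print(np.array_equal(v, np.vstack((a1, a2))))   # True
print(np.array_equal(h, np.hstack((a1, a2))))   # True
print(v.shape, h.shape)                         # (4, 2) (2, 4)
```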
You can split an array into several smaller arrays using hsplit.
x = np.arange(1, 25)
print(f'x is {x}')
x1 = x.reshape(2, 12)
print(f'x1 is {x1}')
x2 = np.hsplit(x1, 3)
print(f'x2 is {x2}')
x3 = np.hsplit(x, (3, 4))
print(f'x3 is {x3}')
x is [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
x1 is [[ 1  2  3  4  5  6  7  8  9 10 11 12]
 [13 14 15 16 17 18 19 20 21 22 23 24]]
x2 is [array([[ 1,  2,  3,  4], [13, 14, 15, 16]]), array([[ 5,  6,  7,  8], [17, 18, 19, 20]]), array([[ 9, 10, 11, 12], [21, 22, 23, 24]])]
x3 is [array([1, 2, 3]), array([4]), array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])]
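The difference between a view and a copy matters when the new array is modified. In this minimal sketch (variable names are illustrative), a basic slice shares memory with the original array, while .copy() does not.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

v = a[1:4]          # basic slicing returns a view
c = a[1:4].copy()   # copy() allocates new memory

v[0] = 99           # also changes a[1]
c[1] = -1           # leaves a untouched

print(a)                        # [ 1 99  3  4  5]
print(np.shares_memory(a, v))   # True
print(np.shares_memory(a, c))   # False
```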
Initializing Different Types of Arrays
np.zeros()
Method zeros(shape) will create an array filled with 0 values with the specified shape. The default dtype is float64.
np.zeros(5)
np.zeros((2,3))
np.zeros((2,3, 4))
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])
np.ones()
np.ones((3,4,5), dtype='int8')
array([[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]], dtype=int8)
np.arange()
Method arange() will create arrays with regularly incrementing values.
a = np.arange(4)
b = np.arange(2, 10, dtype=float)
c = np.arange(2, 3, 0.1)
print(f'a is {a}')
print(f'b is {b}')
print(f'c is {c}')
a is [0 1 2 3]
b is [2. 3. 4. 5. 6. 7. 8. 9.]
c is [2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
np.linspace()
Method linspace() will create arrays with a specified number of elements, and spaced equally between the specified beginning and end values. The advantage of this creation function is that one can guarantee the number of elements and the starting and end point, which arange() generally will not do for arbitrary start, stop, and step values.
a = np.linspace(1., 4., 5)
b = np.linspace(10, 20, 10)
print(a)
print(b)
[1.   1.75 2.5  3.25 4.  ]
[10.         11.11111111 12.22222222 13.33333333 14.44444444 15.55555556
 16.66666667 17.77777778 18.88888889 20.        ]
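The guarantee about element count and end points can be seen by running both functions over the same interval. This is a sketch with arbitrarily chosen values. The arange() output is printed without an expected result, since with a fractional step it depends on floating-point rounding.

```python
import numpy as np

start, stop = 1.0, 1.3

# With a float step, rounding decides whether the last value is included,
# so the length of the arange result is hard to predict.
stepped = np.arange(start, stop, 0.1)
print(len(stepped), stepped)

# linspace fixes the count and both end points up front.
spaced = np.linspace(start, stop, 4)
print(len(spaced), spaced[0], spaced[-1])   # 4 1.0 1.3

# retstep=True also returns the spacing that was used.
_, step = np.linspace(0, 1, 5, retstep=True)
print(step)   # 0.25
```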
np.indices()
Return an array representing the indices of a grid. Method indices() will create a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each representing variation in that dimension.
a = np.indices((2,3))
b = np.indices((3,2))
c = np.indices((2, 3, 4))
print(f'a is\n {a}\n')
print(f'b is\n {b}')
print(f'c is\n {c}')
a is
 [[[0 0 0]
  [1 1 1]]

 [[0 1 2]
  [0 1 2]]]

b is
 [[[0 0]
  [1 1]
  [2 2]]

 [[0 1]
  [0 1]
  [0 1]]]
c is
 [[[[0 0 0 0]
   [0 0 0 0]
   [0 0 0 0]]

  [[1 1 1 1]
   [1 1 1 1]
   [1 1 1 1]]]


 [[[0 0 0 0]
   [1 1 1 1]
   [2 2 2 2]]

  [[0 0 0 0]
   [1 1 1 1]
   [2 2 2 2]]]


 [[[0 1 2 3]
   [0 1 2 3]
   [0 1 2 3]]

  [[0 1 2 3]
   [0 1 2 3]
   [0 1 2 3]]]]
x = np.arange(20).reshape(5, 4)
print(x)
row, col = np.indices((3, 3))
print(row, col)
x[row, col]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[[0 0 0]
 [1 1 1]
 [2 2 2]] [[0 1 2]
 [0 1 2]
 [0 1 2]]
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])
i, j = np.indices((3,3))
M = 2*i + 3*j  # explicit multiplication: "2i" is a syntax error and "3j" is a complex literal
print(i, j)
print(M)
[[0 0 0]
 [1 1 1]
 [2 2 2]] [[0 1 2]
 [0 1 2]
 [0 1 2]]
[[ 0  3  6]
 [ 2  5  8]
 [ 4  7 10]]
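A matrix whose entries depend on the row and column index can also be built in other ways. The sketch here shows two alternatives for the 2*i + 3*j pattern: np.fromfunction() and plain broadcasting. The names `M1` and `M2` are illustrative.

```python
import numpy as np

# np.fromfunction calls the function with index arrays, like np.indices.
M1 = np.fromfunction(lambda i, j: 2 * i + 3 * j, (3, 3), dtype=int)

# Broadcasting a column of row indices against a row of column indices.
i = np.arange(3).reshape(3, 1)
j = np.arange(3).reshape(1, 3)
M2 = 2 * i + 3 * j

print(M1.tolist())              # [[0, 3, 6], [2, 5, 8], [4, 7, 10]]
print(np.array_equal(M1, M2))   # True
```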
Size of objects in memory
int, floats
import sys
sys.getsizeof(1)
28
sys.getsizeof(10**100)
72
np.dtype(int).itemsize  # platform dependent: 4 on this machine, 8 on most 64-bit Linux/macOS builds
4
np.dtype(np.int8).itemsize
1
np.dtype(float).itemsize
8
sys.getsizeof(1.0)
24
Python Lists are even larger
sys.getsizeof([1])
64
np.array([1]).nbytes
4
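The gap grows with the number of elements. This rough sketch assumes CPython, where sys.getsizeof() on a list counts only the list's own pointer storage, so the int objects are added separately. The exact byte counts vary between Python versions.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores pointers; the int objects they point to are counted separately.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values: size * itemsize bytes of data.
print(arr.nbytes)                # 8000
print(list_bytes > arr.nbytes)   # the list needs several times more memory
```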
Performance is important
l = list(range(100000))
a = np.arange(100000, dtype=np.int64)  # 64-bit integers, so a ** 2 cannot overflow
%time np.sum(a ** 2)
Wall time: 0 ns
333328333350000
%time sum(x**2 for x in l)
Wall time: 29 ms
333328333350000
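%time is an IPython magic and only works in a notebook or IPython shell. A comparable measurement in a plain Python script could be sketched with the standard timeit module. The repeat count of 20 is an arbitrary choice, and the array uses an explicit 64-bit dtype so that the squares cannot overflow.

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)   # 64-bit so the squares do not overflow

# Both approaches must agree before their timings are worth comparing.
assert int(np.sum(a ** 2)) == sum(x ** 2 for x in l)

t_numpy = timeit.timeit(lambda: np.sum(a ** 2), number=20)
t_list = timeit.timeit(lambda: sum(x ** 2 for x in l), number=20)
print(f'NumPy: {t_numpy:.4f} s, list: {t_list:.4f} s')
```

The timings depend on the machine, so no expected output is shown.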
Useful NumPy Functions
Random
np.random.random(size=2)
array([0.29536398, 0.16094588])
np.random.normal(size=2)
array([-0.39357 , -0.31629262])
np.random.rand(2, 4)
array([[0.48320961, 0.33642111, 0.56741904, 0.04794151],
       [0.38893703, 0.90630365, 0.16101821, 0.74362113]])
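The values shown are different on every run. When reproducible random numbers are needed, one option is NumPy's Generator interface, created with np.random.default_rng(). This is a sketch; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

uniform = rng.random(size=2)           # floats in [0, 1)
normal = rng.normal(size=2)            # standard normal samples
ints = rng.integers(0, 10, size=5)     # integers in [0, 10)

# Re-creating the generator with the same seed repeats the sequence.
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng2.random(size=2)))   # True
```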
arange
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(5,10)
array([5, 6, 7, 8, 9])
np.arange(0, 1, 0.1)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
reshape
np.arange(10).reshape(2,5)
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
np.arange(10).reshape(5,2)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
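reshape() also accepts -1 for one dimension, which NumPy then infers from the total number of elements. A short sketch that includes the error case when the sizes do not match:

```python
import numpy as np

a = np.arange(12)

b = a.reshape(3, -1)   # -1 lets NumPy infer the missing dimension (4 here)
c = b.reshape(-1)      # flatten back to 1-D

print(b.shape)         # (3, 4)
print(c.shape)         # (12,)

# The element count must match, otherwise reshape raises ValueError.
try:
    a.reshape(5, -1)
except ValueError as err:
    print('reshape failed:', err)
```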
linspace
np.linspace(0,1, 5)
array([0. , 0.25, 0.5 , 0.75, 1. ])
np.linspace(0,1, 20)
array([0., 0.05263158, 0.10526316, 0.15789474, 0.21052632, 0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421, 0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211, 0.78947368, 0.84210526, 0.89473684, 0.94736842, 1. ])
np.linspace(0,1,20, False)
array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])