
NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis. Here are some of the things it provides:

  • ndarray, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities
  • Standard mathematical functions for fast operations on entire arrays of data without having to write loops
  • Linear algebra, random number generation, and Fourier transform capabilities
  • Tools for integrating code written in C, C++, and Fortran

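For data analysis, the main areas of functionality are listed below; some of them, such as data alignment, joins, and group-wise operations, are usually done with pandas, which builds on NumPy: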

  • Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
  • Common array algorithms like sorting, unique, and set operations
  • Efficient descriptive statistics and aggregating/summarizing data
  • Data alignment and relational data manipulations for merging and joining together heterogeneous data sets
  • Expressing conditional logic as array expressions instead of loops with if-elif-else branches
  • Group-wise data manipulations (aggregation, transformation, function application)

import numpy as np

Creating Arrays

Create a list and convert it to a NumPy array.

mylist = [1, 2, 3]
x = np.array(mylist)
x
array([1, 2, 3])

Or just pass in a list directly

y = np.array([4, 5, 6])
y
array([4, 5, 6])
type(y) 
numpy.ndarray

Pass in a list of lists to create a multidimensional array.

m = np.array([[7, 8, 9], [10, 11, 12]])
m
array([[ 7,  8,  9],
       [10, 11, 12]])

Use the np.shape function (or the array's .shape attribute) to find the dimensions of the array, given as (rows, columns).

np.shape(m)
(2, 3)

In an ndarray, all elements must have the same data type; NumPy automatically converts mixed inputs to a common type (here, strings).

l = [1, 2.5, "Dog", True] #lists can store different datatypes

for i in l:
    print(type(i))

a = np.array(l) 
print(a)

for i in a:
    print(type(i))
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
['1' '2.5' 'Dog' 'True']
<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>
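
The conversion above falls back to strings because one element is a string. As a minimal sketch (the variable names are illustrative and not part of the original notebook), purely numeric inputs are promoted to a common numeric type instead, and a dtype can also be requested explicitly:

```python
import numpy as np

# Mixed ints and floats are promoted to a floating-point dtype.
ints_and_floats = np.array([1, 2.5, 3])

# Booleans mixed with ints are promoted to an integer dtype (True becomes 1).
bools_and_ints = np.array([True, 2, False])

# An explicit dtype overrides the inferred one.
forced_float = np.array([1, 2, 3], dtype=float)

print(ints_and_floats.dtype)   # a float dtype
print(bools_and_ints.dtype)    # an integer dtype (width is platform dependent)
print(forced_float)
```

Checking .dtype right after building an array is a cheap way to notice an unintended conversion early.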

arange returns evenly spaced values within a given interval.

n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

reshape returns an array with the same data with a new shape.

n = n.reshape(3, 5) # reshape array to be 3x5
n
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

linspace returns evenly spaced numbers over a specified interval.

o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4
o
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
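
To make the difference between arange and linspace concrete, here is a small sketch (names such as `points` and `same_points` are illustrative). arange is driven by the step size and excludes the stop value, whereas linspace is driven by the number of points and includes the endpoint by default:

```python
import numpy as np

# linspace: 9 points from 0 to 4 inclusive; retstep=True also returns the spacing.
points, step = np.linspace(0, 4, 9, retstep=True)

# arange: step of 0.5, stopping before 4.5, which yields the same values.
same_points = np.arange(0, 4.5, 0.5)

print(step)
print(np.allclose(points, same_points))
```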

resize changes the shape and size of array in-place.

o.resize(3, 3)
o
array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])
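
The contrast between reshape and resize can be sketched as follows (variable names are illustrative). reshape hands back a new array object and leaves the original shape untouched, while the resize method changes the array itself and returns None:

```python
import numpy as np

original = np.linspace(0, 4, 9)

# reshape returns a new array object; `original` keeps its 1-D shape.
reshaped = original.reshape(3, 3)
print(original.shape, reshaped.shape)

# resize modifies the array in place and returns None.
in_place = np.linspace(0, 4, 9)
result = in_place.resize(3, 3)
print(result, in_place.shape)
```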

ones returns a new array of given shape and type, filled with ones.

np.ones((3, 2))
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

zeros returns a new array of given shape and type, filled with zeros.

np.zeros((2, 3))
array([[0., 0., 0.],
       [0., 0., 0.]])

eye returns a 2-D array with ones on the diagonal and zeros elsewhere.

np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

diag extracts a diagonal or constructs a diagonal array.

print(y)
np.diag(y)
[4 5 6]
array([[4, 0, 0],
       [0, 5, 0],
       [0, 0, 6]])
print(o)
np.diag(o)
[[0.  0.5 1. ]
 [1.5 2.  2.5]
 [3.  3.5 4. ]]
array([0., 2., 4.])

Create an array by repeating a list (or use np.tile).

np.array([1, 2, 3] * 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
a = np.array([1, 2, 3])
np.tile(a, 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])

Repeat elements of an array using repeat.

np.repeat([1, 2, 3], 3)
array([1, 1, 1, 2, 2, 2, 3, 3, 3])

Combining Arrays

p = np.ones([2, 3], int)
p
array([[1, 1, 1],
       [1, 1, 1]])

Use vstack to stack arrays in sequence vertically (row wise).

np.vstack([p, 2*p])
array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])

Use hstack to stack arrays in sequence horizontally (column wise).

np.hstack([p, 2*p])
array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])
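
For 2-D arrays like these, both stacking functions can be expressed with np.concatenate and an explicit axis. This is a sketch for comparison only; `rows_joined` and `cols_joined` are illustrative names:

```python
import numpy as np

p = np.ones([2, 3], int)

# axis=0 joins along rows (like vstack); axis=1 joins along columns (like hstack).
rows_joined = np.concatenate([p, 2 * p], axis=0)
cols_joined = np.concatenate([p, 2 * p], axis=1)

print(rows_joined.shape)  # (4, 3)
print(cols_joined.shape)  # (2, 6)
```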

Operations

Use +, -, *, / and ** to perform element-wise addition, subtraction, multiplication, division, and exponentiation.

print(x + y) # elementwise addition     [1 2 3] + [4 5 6] = [5  7  9]
print(x - y) # elementwise subtraction  [1 2 3] - [4 5 6] = [-3 -3 -3]
[5 7 9]
[-3 -3 -3]
print(x * y) # elementwise multiplication  [1 2 3] * [4 5 6] = [4  10  18]
print(x / y) # elementwise division        [1 2 3] / [4 5 6] = [0.25  0.4  0.5]
[ 4 10 18]
[0.25 0.4  0.5 ]
print(x**2) # elementwise power  [1 2 3] ^2 =  [1 4 9]
[1 4 9]
np.sqrt(x)
array([1.        , 1.41421356, 1.73205081])
np.exp(x)
array([ 2.71828183,  7.3890561 , 20.08553692])
np.log(x)
array([0.        , 0.69314718, 1.09861229])
np.ceil(np.log(x))
array([0., 1., 2.])
np.floor(np.log(x))
array([0., 0., 1.])
np.abs(x)
array([1, 2, 3])
np.around([-3.23, -0.76, 1.44, 2.65], decimals=0) # round each element to the given number of decimals
array([-3., -1.,  1.,  3.])
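
The element-wise operators also work when the shapes differ but are compatible, which is the broadcasting mentioned in the introduction. A minimal sketch, with illustrative arrays that are not part of the original notebook:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

print(matrix + 1)        # the scalar is applied to every element
print(matrix + row)      # the row is added to each row of the matrix
print(matrix + column)   # the column is added to each column of the matrix
```

Roughly, shapes are compared from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match the other operand.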

Dot Product

print(x, y)
x.dot(y) # dot product  1*4 + 2*5 + 3*6
[1 2 3] [4 5 6]
32
z = np.array([y, y**2])
print(z)
print(np.shape(z))
print(len(z)) # number of rows of array
[[ 4  5  6]
 [16 25 36]]
(2, 3)
2

Let’s look at transposing arrays. Transposing permutes the dimensions of the array.

z.T
array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])
z.T.shape
(3, 2)
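
The transpose is often used to make shapes line up for a matrix product. As a sketch (the names `gram` and `projected` are illustrative), the @ operator performs matrix multiplication, in contrast to *, which is element-wise:

```python
import numpy as np

y = np.array([4, 5, 6])
z = np.array([y, y**2])     # shape (2, 3)

gram = z @ z.T              # (2, 3) @ (3, 2) gives shape (2, 2)
projected = z @ y           # (2, 3) @ (3,) gives shape (2,)

print(gram)
print(projected)
```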

Use .dtype to see the data type of the elements in the array.

z.dtype
dtype('int32')

Use .astype to cast to a specific type.

z = z.astype('f')
z.dtype
dtype('float32')

Math Functions

a = np.array([-4, -2, 1, 3, 5])
a.sum()
3
a.max()
5
a.min()
-4
a.mean()
0.6
a.std()
3.2619012860600183
a.var()
10.64
np.percentile(a, 3)
-3.76
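
On multidimensional arrays, these aggregation methods accept an axis argument. The sketch below uses an illustrative 2 by 3 array to show the difference between aggregating over everything, over rows, and over columns:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.sum())          # 21, over all elements
print(grid.sum(axis=0))    # [5 7 9], one value per column
print(grid.sum(axis=1))    # [ 6 15], one value per row
print(grid.mean(axis=1))   # [2. 5.], row means
```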

Covariance

Covariance measures the extent to which two random variables vary together. A larger absolute value indicates a stronger linear relationship, but the magnitude depends on the units of the variables, so changes in scale affect covariance.

aa = np.random.random((3, 3))
aa
array([[0.38109917, 0.50598335, 0.31684724],
       [0.19499768, 0.46588364, 0.10965197],
       [0.82519221, 0.92736672, 0.90446043]])
np.cov(aa) # covariance matrix; by default each row of aa is treated as one variable
array([[0.00924947, 0.01778154, 0.00199988],
       [0.01778154, 0.03459402, 0.0048454 ],
       [0.00199988, 0.0048454 , 0.00287463]])
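
To show what one entry of the covariance matrix means, the sketch below recomputes an off-diagonal value by hand. The small `data` array is made up for illustration; the formula is the usual sample covariance, dividing by n - 1, which matches the default of np.cov:

```python
import numpy as np

# Two variables (rows), three observations each (columns).
data = np.array([[1.0, 2.0, 4.0],
                 [2.0, 4.0, 9.0]])

x, y = data
# Sample covariance: sum of products of deviations, divided by n - 1.
manual_cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

cov_matrix = np.cov(data)
print(manual_cov_xy, cov_matrix[0, 1])
```

The diagonal entries of the matrix are the sample variances of the individual rows.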

Correlation

Correlation is a statistical measure that indicates how strongly two variables are linearly related. Unlike covariance, it is normalized to the range -1 to 1, so changes in scale do not affect it.

np.corrcoef(aa) # correlation matrix; each row of aa is treated as one variable
array([[1.        , 0.99405522, 0.38784076],
       [0.99405522, 1.        , 0.48589001],
       [0.38784076, 0.48589001, 1.        ]])
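
The effect of scale can be checked directly. In this sketch (the data values are made up for illustration), multiplying one variable by 100 multiplies the covariance by 100 but leaves the correlation unchanged:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 9.0, 11.0])

cov_before = np.cov(x, y)[0, 1]
cov_after = np.cov(100 * x, y)[0, 1]        # x expressed in different units

corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(100 * x, y)[0, 1]

print(cov_before, cov_after)
print(corr_before, corr_after)
```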

argmax and argmin return the index of the maximum and minimum values in the array.

a.argmax()
4
a.argmin()
0

Indexing / Slicing

s = np.arange(0, 13, 1) ** 2
s
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144],
      dtype=int32)

Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

s[0], s[4], s[-1]
(0, 16, 144)

Use : to indicate a range. array[start:stop]

Leaving start or stop empty will default to the beginning/end of the array.

s[1:5]
array([ 1,  4,  9, 16], dtype=int32)

Use negatives to count from the back.

s[-4:]
array([ 81, 100, 121, 144], dtype=int32)

A second : can be used to indicate step-size. array[start:stop:stepsize]

Here we start at the fifth element from the end and count backwards by 3 until the beginning of the array is reached.

s[-5::-3]
array([64, 25,  4], dtype=int32)

Let’s look at a multidimensional array.

r = np.arange(36)
r.resize((6, 6))
r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

Use bracket notation to slice: array[row, column]

r[2, 2]
14

And use : to select a range of rows or columns

r[3, 3:7]
array([21, 22, 23])

Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

r[:2, :-1]
array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])

This is a slice of the last row, and only every other element.

r[-1, 0:-1:2]
array([30, 32, 34])

We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see np.where)

r[r > 30]
array([31, 32, 33, 34, 35])

Here we are assigning all values in the array that are greater than 30 to the value of 30.

r[r > 30] = 30
r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
mask1 = (r > 5) & (r < 8) # element-wise check: greater than 5 and smaller than 8 (logical and)
mask2 = (r > 5) | (r < 8) # element-wise check: greater than 5 or smaller than 8 (logical or); true for every element
mask3 = ~((r > 5) & (r < 8)) # the opposite of mask1
r[mask1]
array([6, 7])
r[mask2]
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30, 30, 30,
       30, 30])
r[mask3]
array([ 0,  1,  2,  3,  4,  5,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30, 30, 30, 30, 30])
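
The np.where function mentioned earlier offers a related way to express this kind of conditional logic. A minimal sketch, rebuilding the 6 by 6 array so the block runs on its own (`capped`, `rows`, and `cols` are illustrative names):

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

# Three-argument form: take 30 where the condition holds, otherwise the value from r.
# This returns a new array and leaves r unchanged.
capped = np.where(r > 30, 30, r)

# One-argument form: the row and column indices where the condition holds.
rows, cols = np.where(r > 30)

print(capped[-1])
print(rows, cols)
```

Unlike the assignment `r[r > 30] = 30`, the three-argument form does not modify the original array.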

Copying Data

Be careful with copying and modifying arrays in NumPy!

r2 is a slice of r

r2 = r[:3,:3]
r2
array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

Set this slice’s values to zero ([:] selects the entire array)

r2[:] = 0
r2
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

r has also been changed!

r
array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

To avoid this, use r.copy to create a copy that will not affect the original array

r_copy = r.copy()
r_copy
array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Now when r_copy is modified, r will not be changed.

r_copy[:] = 10
print(r_copy, '\n')
print(r)
[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]] 

[[ 0  0  0  3  4  5]
 [ 0  0  0  9 10 11]
 [ 0  0  0 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 30 30 30 30 30]]
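
When it is unclear whether two arrays share data, np.shares_memory can be used to check. The sketch below (with illustrative names) compares a basic slice, an explicit copy, and a boolean-indexed selection:

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]            # basic slicing gives a view onto r's data
copied = r[:3, :3].copy()   # an explicit copy owns its own data
masked = r[r > 30]          # boolean indexing also returns a copy

print(np.shares_memory(r, view))    # True
print(np.shares_memory(r, copied))  # False
print(np.shares_memory(r, masked))  # False
```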

Iterating Over Arrays

Let’s create a new 4 by 3 array of random numbers 0-9.

test = np.random.randint(0,10,(4,3))
test
array([[2, 9, 7],
       [6, 9, 0],
       [8, 4, 0],
       [5, 3, 4]])

Iterate by row:

for row in test:
    print(row)
[2 9 7]
[6 9 0]
[8 4 0]
[5 3 4]

Iterate by index:

for i in range(len(test)):
    print(test[i])
[2 9 7]
[6 9 0]
[8 4 0]
[5 3 4]

Iterate by row and index:

for i, row in enumerate(test):
    print('row', i, 'is', row)
row 0 is [2 9 7]
row 1 is [6 9 0]
row 2 is [8 4 0]
row 3 is [5 3 4]

Use zip to iterate over multiple iterables.

test2 = test**2
test2
array([[ 4, 81, 49],
       [36, 81,  0],
       [64, 16,  0],
       [25,  9, 16]], dtype=int32)
for i, j in zip(test, test2):
    print(i,'+',j,'=',i+j)
[2 9 7] + [ 4 81 49] = [ 6 90 56]
[6 9 0] + [36 81  0] = [42 90  0]
[8 4 0] + [64 16  0] = [72 20  0]
[5 3 4] + [25  9 16] = [30 12 20]
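
Explicit loops like these are mainly useful for inspection. For the arithmetic itself, a single array expression gives the same result. As a sketch, using the same values as the example above so the block runs on its own:

```python
import numpy as np

test = np.array([[2, 9, 7],
                 [6, 9, 0],
                 [8, 4, 0],
                 [5, 3, 4]])
test2 = test**2

# Row-by-row addition with zip, collected back into an array.
looped = np.array([i + j for i, j in zip(test, test2)])

# The same computation as one vectorized expression.
vectorized = test + test2

print(np.array_equal(looped, vectorized))
```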

Random Numbers

a = np.random.randint(1,101,10) # create 10 random integers from 1 (inclusive) to 101 (exclusive)
a
array([57, 33,  4, 14, 69, 92, 33, 27, 51, 85])
np.random.seed(123) #setting a seed enables reproducibility
a = np.random.randint(1,101,10)
a
array([67, 93, 99, 18, 84, 58, 87, 98, 97, 48])
np.random.normal(5, 2, 10) # create 10 normally distributed numbers with mean 5 and standard deviation 2
array([1.76139987, 2.77207117, 4.10511856, 8.33680322, 4.71325505,
       3.7616182 , 3.46113306, 6.15349204, 5.25305184, 2.39702205])
b = np.arange(1,101) #creating array b from 1 to 100
b
array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100])
np.random.shuffle(b) #randomly shuffle ndarray b
b
array([  5,   3,  96,  64,  73,  34,  37,  25,  67,  89,  17,  39,  30,
         9,   6,   1,  14,  80,  98,  87,  63,  38,  82,  33,  58,  29,
        66,  60,  32,  68,  20,  36,  74,  24,  10,  72, 100,  43,  46,
        47,  84,  75,  40,  95,  22,  12,  99,  88,  81,  61,  90,  97,
        42,  54,  45,  52,  69,  18,  79,  13,  50,  57,  21,  51,  26,
        56,  83,  44,   2,  55,  15,   7,  27,  71,  94,  31,  92,  16,
        19,  78,  23,  11,  59,  91,  76,  65,  70,   4,  41,  77,  35,
        28,  86,  53,  93,   8,  49,  62,  48,  85])
b.sort() # sort ndarray b in place (ascending)
b[::-1] # reversed view of the sorted array, i.e. descending order
array([100,  99,  98,  97,  96,  95,  94,  93,  92,  91,  90,  89,  88,
        87,  86,  85,  84,  83,  82,  81,  80,  79,  78,  77,  76,  75,
        74,  73,  72,  71,  70,  69,  68,  67,  66,  65,  64,  63,  62,
        61,  60,  59,  58,  57,  56,  55,  54,  53,  52,  51,  50,  49,
        48,  47,  46,  45,  44,  43,  42,  41,  40,  39,  38,  37,  36,
        35,  34,  33,  32,  31,  30,  29,  28,  27,  26,  25,  24,  23,
        22,  21,  20,  19,  18,  17,  16,  15,  14,  13,  12,  11,  10,
         9,   8,   7,   6,   5,   4,   3,   2,   1])
np.random.seed(123)
b1 = np.random.choice(b, 100, replace=True) # draw a random sample of 100 elements from b with replacement (replace=False samples without replacement)
b1
array([ 67,  93,  99,  18,  84,  58,  87,  98,  97,  48,  74,  33,  47,
        97,  26,  84,  79,  37,  97,  81,  69,  50,  56,  68,   3,  85,
        40,  67,  85,  48,  62,  49,   8, 100,  93,  53,  98,  86,  95,
        28,  35,  98,  77,  41,   4,  70,  65,  76,  35,  59,  11,  23,
        78,  19,  16,  28,  31,  53,  71,  27,  81,   7,  15,  76,  55,
        72,   2,  44,  59,  56,  26,  51,  85,  57,  50,  13,  19,  82,
         2,  52,  45,  49,  57,  92,  50,  87,   4,  68,  12,  22,  90,
        99,   4,  12,   4,  95,   7,  10,  88,  15])
b1.sort() #sorting b1
b1
array([  2,   2,   3,   4,   4,   4,   4,   7,   7,   8,  10,  11,  12,
        12,  13,  15,  15,  16,  18,  19,  19,  22,  23,  26,  26,  27,
        28,  28,  31,  33,  35,  35,  37,  40,  41,  44,  45,  47,  48,
        48,  49,  49,  50,  50,  50,  51,  52,  53,  53,  55,  56,  56,
        57,  57,  58,  59,  59,  62,  65,  67,  67,  68,  68,  69,  70,
        71,  72,  74,  76,  76,  77,  78,  79,  81,  81,  82,  84,  84,
        85,  85,  85,  86,  87,  87,  88,  90,  92,  93,  93,  95,  95,
        97,  97,  97,  98,  98,  98,  99,  99, 100])
np.unique(b1) #unique elements of b1
array([  2,   3,   4,   7,   8,  10,  11,  12,  13,  15,  16,  18,  19,
        22,  23,  26,  27,  28,  31,  33,  35,  37,  40,  41,  44,  45,
        47,  48,  49,  50,  51,  52,  53,  55,  56,  57,  58,  59,  62,
        65,  67,  68,  69,  70,  71,  72,  74,  76,  77,  78,  79,  81,
        82,  84,  85,  86,  87,  88,  90,  92,  93,  95,  97,  98,  99,
       100])
np.array(list(set(b1))) # same elements via a Python set; unlike np.unique, the order is not guaranteed to be sorted
array([  2,   3,   4,   7,   8,  10,  11,  12,  13,  15,  16,  18,  19,
        22,  23,  26,  27,  28,  31,  33,  35,  37,  40,  41,  44,  45,
        47,  48,  49,  50,  51,  52,  53,  55,  56,  57,  58,  59,  62,
        65,  67,  68,  69,  70,  71,  72,  74,  76,  77,  78,  79,  81,
        82,  84,  85,  86,  87,  88,  90,  92,  93,  95,  97,  98,  99,
       100])
np.unique(b1).size #how many unique elements?
66
np.unique(b1, return_index=True, return_counts=True) # np.unique can also return first-occurrence indices and counts
(array([  2,   3,   4,   7,   8,  10,  11,  12,  13,  15,  16,  18,  19,
         22,  23,  26,  27,  28,  31,  33,  35,  37,  40,  41,  44,  45,
         47,  48,  49,  50,  51,  52,  53,  55,  56,  57,  58,  59,  62,
         65,  67,  68,  69,  70,  71,  72,  74,  76,  77,  78,  79,  81,
         82,  84,  85,  86,  87,  88,  90,  92,  93,  95,  97,  98,  99,
        100]),
 array([ 0,  2,  3,  7,  9, 10, 11, 12, 14, 15, 17, 18, 19, 21, 22, 23, 25,
        26, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 40, 42, 45, 46, 47, 49,
        50, 52, 54, 55, 57, 58, 59, 61, 63, 64, 65, 66, 67, 68, 70, 71, 72,
        73, 75, 76, 78, 81, 82, 84, 85, 86, 87, 89, 91, 94, 97, 99],
       dtype=int64),
 array([2, 1, 4, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1,
        1, 1, 1, 1, 1, 2, 2, 3, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1,
        1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 2, 1, 1, 1, 2, 2, 3, 3, 2, 1],
       dtype=int64))
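
The examples in this section use the np.random module-level functions with a global seed. NumPy also provides a Generator interface through np.random.default_rng, which keeps the random state in an object rather than globally. The following is a sketch of that alternative, not the approach used in the original notebook; all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(123)          # seeded generator object
ints = rng.integers(1, 101, size=10)      # 10 integers from 1 (incl.) to 101 (excl.)
normals = rng.normal(5, 2, size=10)       # mean 5, standard deviation 2
sample = rng.choice(np.arange(1, 101), size=10, replace=False)  # without replacement

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(123)
ints_again = rng_again.integers(1, 101, size=10)

print(np.array_equal(ints, ints_again))
```

The values drawn this way differ from those produced by np.random.seed(123) with np.random.randint, because the two interfaces use different underlying generators.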

Performance

size = 1000000 #number of elements
a = np.arange(size) #ndarray 
l = list(range(size)) #list
%timeit a+2 #ndarray: measuring time for element-wise addition
1.47 ms ± 60.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [i+2 for i in l] #list: measuring time for element-wise addition
64.7 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit a*2 #multiplication
1.68 ms ± 63.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [i*2 for i in l] #multiplication
66.9 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit a**2 #square
1.67 ms ± 76.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [i**2 for i in l] #square
251 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.sqrt(a) #square root
3.37 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [i**0.5 for i in l] #square root
198 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
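
The %timeit lines above are IPython magics and only work in a notebook or IPython shell. In a plain Python script, a comparable measurement can be sketched with the standard-library timeit module. The array size and the number of repetitions below are arbitrary choices, and the timings will differ from machine to machine:

```python
import timeit
import numpy as np

size = 100_000          # smaller than above so the script finishes quickly
repeats = 20            # arbitrary number of repetitions

a = np.arange(size)     # ndarray
l = list(range(size))   # list

numpy_seconds = timeit.timeit(lambda: a + 2, number=repeats)
list_seconds = timeit.timeit(lambda: [i + 2 for i in l], number=repeats)

print(f"ndarray: {numpy_seconds / repeats * 1e3:.3f} ms per loop")
print(f"list:    {list_seconds / repeats * 1e3:.3f} ms per loop")
```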

Case Study: NumPy vs. Python Standard Library

%timeit (np.random.randint(1,11,100*10000).reshape(10000,100) == 1).sum(axis = 1).mean()
16.1 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
import random

def simulation(): # using nested loops, if statements and lists
    results = []
    for _ in range(10000):    
        l = []
        for _ in range(100):
            if random.randint(1,10) == 1:
                l.append(True)
            else:
                l.append(False)
        results.append(sum(l))
    return (sum(results) / len(results))
%timeit simulation()
757 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
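
Both versions estimate the same quantity: the average number of times a 1 is drawn in 100 draws from the integers 1 to 10, repeated over 10,000 runs. The expected value is 100 * (1/10) = 10. To make the one-line NumPy version easier to follow, here is a sketch that splits it into named steps (the intermediate names and the seed are illustrative additions):

```python
import numpy as np

np.random.seed(123)   # seed added here only to make the sketch reproducible

# 10,000 runs (rows) of 100 draws each (columns), values from 1 to 10.
draws = np.random.randint(1, 11, 100 * 10000).reshape(10000, 100)

hits = draws == 1                 # boolean array, True where a 1 was drawn
hits_per_run = hits.sum(axis=1)   # number of ones in each run of 100 draws
estimate = hits_per_run.mean()    # average count across all runs

print(estimate)   # close to the expected value of 10
```

Summing a boolean array counts its True values, which is what replaces the inner loop and if statement of the pure Python version.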


Neural Network - Computer Science Faculty of Shahid Beheshti University. Winter 2023 - Contact us at abtinmahyar@gmail.com