NumPy (array and matrix) in Machine Learning



NumPy is an extension to the Python programming language, adding support for large, multi-dimensional (numerical) arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.


import numpy as np

NumPy’s main object is the homogeneous multidimensional array.
It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
In NumPy, dimensions are called axes. The number of axes is the rank.

For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1, because it has one axis. That axis has a length of 3.
In the example below,

[[ 1., 0., 0.],
 [ 0., 1., 2.]]
the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.

NumPy’s array class is called ndarray. It is also known by the alias array.
Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality.
The more important attributes of an ndarray object are:
  • ndarray.ndim: the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.
  • ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.
  • ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type float32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
  • ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>

Create arrays


Create ndarrays from lists.
Note: all elements of an array share a single type; NumPy converts the inputs to a common type where possible.


data1 = [1, 2, 3, 4, 5] # list
arr1 = np.array(data1) # 1d array
data2 = [range(1, 5), range(5, 9)] # list of lists
arr2 = np.array(data2) # 2d array
arr2.tolist() # convert array back to list
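
The conversion mentioned in the note can be checked directly. The following is a small sketch (the variable names are illustrative, not from the original) showing how mixed inputs are converted to one common type:

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers: integer dtype
mixed = np.array([1, 2.5, 3])    # int and float: upcast to float64
as_text = np.array([1, "a"])     # number and string: everything becomes a string

print(ints.dtype.kind)      # 'i' (signed integer; the exact width is platform dependent)
print(mixed.dtype)          # float64
print(as_text.dtype.kind)   # 'U' (unicode string)
```

Checking dtype after building an array from raw data is a cheap way to catch an accidental conversion to strings or objects.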

Create special arrays

np.zeros(10) # 1d array of 10 zeros
np.zeros((3, 6)) # 2d array of zeros with shape (3, 6)
np.ones(10)
np.linspace(0, 1, 5) # 0 to 1 (inclusive) with 5 points
np.logspace(0, 3, 4) # 10^0 to 10^3 (inclusive) with 4 points
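
To make the difference between the two spacing functions concrete, here is a minimal sketch (the names lin and log are illustrative) that prints what each call returns:

```python
import numpy as np

lin = np.linspace(0, 1, 5)   # 5 evenly spaced points, both endpoints included
log = np.logspace(0, 3, 4)   # 4 points from 10**0 to 10**3, evenly spaced in the exponent

print(lin)   # [0.   0.25 0.5  0.75 1.  ]
print(log)   # [   1.   10.  100. 1000.]
```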

arange is like Python's range, except that it returns an array rather than a range object

int_array = np.arange(5)
float_array = int_array.astype(float)

Examining arrays


Arrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection.
There are three kinds of indexing available; which one occurs depends on obj:
  • field access: applies to structured arrays, where obj is a field name.
  • basic slicing: occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets), an integer, or a tuple of slice objects and integers. The simplest case, indexing with N integers, returns an array scalar representing the corresponding item.
  • advanced indexing: occurs when obj is a sequence or an ndarray of integers or Booleans.


arr1.dtype # int64 on most platforms (integer input gives an integer array)
arr2.dtype # int64 on most platforms (int32 on some)
arr2.ndim # 2
arr2.shape # (2, 4) - axis 0 is rows, axis 1 is columns
arr2.size # 8 - total number of elements
len(arr2) # 2 - size of first dimension (aka axis)


One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.


>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
>>> a[ : :-1]                                 # reversed a
array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0
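
The nan values appear because a negative NumPy number raised to a fractional power has no real-valued result. If a real cube root is what is wanted, np.cbrt is one option. This is a small sketch rebuilding the same array a as above:

```python
import numpy as np

a = np.arange(10) ** 3
a[:6:2] = -1000

roots = np.cbrt(a)   # real cube root, defined for negative inputs as well
print(roots)         # -10 for the -1000 entries, then 1, 3, 5, 6, 7, 8, 9 for the rest
```
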
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

>>> b = np.fromfunction(lambda i, j: 10 * i + j, (5, 4), dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then
  • x[1,2,...] is equivalent to x[1,2,:,:,:]
  • x[...,3] to x[:,:,:,:,3]
  • x[4,...,5,:] to x[4,:,:,5,:]
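
These equivalences can be verified on a concrete array. The sketch below uses an arbitrary rank 5 array; the shape (5, 3, 2, 6, 4) is chosen only so that every index used in the list is valid:

```python
import numpy as np

x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)   # rank 5

print(np.array_equal(x[1, 2, ...], x[1, 2, :, :, :]))      # True
print(np.array_equal(x[..., 3], x[:, :, :, :, 3]))         # True
print(np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :]))   # True
print(x[1, 2, ...].shape)                                  # (2, 6, 4)
```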

Iterating over multidimensional arrays is done with respect to the first axis:

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

If one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array:

>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43



Reshaping


An array has a shape given by the number of elements along each axis.
The shape of an array can be changed with various commands.


arr = np.arange(10, dtype=float).reshape((2, 5))
print(arr.shape) # (2, 5)
print(arr.reshape(5, 2)) # [[ 0. 1.][ 2. 3.][ 4. 5.][ 6. 7.][ 8. 9.]]

Note that the following three commands all return a modified array, but do not change the original array:

>>> a = np.array([[2., 8., 0., 6.], [4., 5., 1., 1.], [8., 9., 3., 6.]])
>>> a.ravel()  # returns the array, flattened
array([ 2.,  8.,  0.,  6.,  4.,  5.,  1.,  1.,  8.,  9.,  3.,  6.])
>>> a.reshape(6,2)  # returns the array with a modified shape
array([[ 2.,  8.],
       [ 0.,  6.],
       [ 4.,  5.],
       [ 1.,  1.],
       [ 8.,  9.],
       [ 3.,  6.]])
>>> a.T  # returns the array, transposed
array([[ 2.,  4.,  8.],
       [ 8.,  5.,  9.],
       [ 0.,  1.,  3.],
       [ 6.,  1.,  6.]])
If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

>>> a.reshape(3,-1)
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])


Add an axis

The numpy.newaxis object can be used in all slicing operations to create an axis of length one.

a = np.array([0, 1])
a.shape # (2,)
a_col = a[:, np.newaxis]
print(a_col) # [[0][1]]
a_col.shape # (2, 1)
#or
a_col = a[:, None]

Transpose

print(a_col.T) # [[0 1]]

Flatten: always returns a flat copy of the original array

arr_flt = arr.flatten()
arr_flt[0] = 33
print(arr_flt) # [ 33. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(arr) # [[ 0. 1. 2. 3. 4.][ 5. 6. 7. 8. 9.]]

Ravel: returns a view of the original array whenever possible.

arr_flt = arr.ravel()
arr_flt[0] = 33
print(arr_flt) # [ 33. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(arr) # [[ 33. 1. 2. 3. 4.][ 5. 6. 7. 8. 9.]]
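
The "whenever possible" part matters: ravel falls back to a copy when the memory layout does not allow a view. A minimal way to check this, assuming the same 2 x 5 array as above, is np.shares_memory:

```python
import numpy as np

arr = np.arange(10, dtype=float).reshape(2, 5)

print(np.shares_memory(arr, arr.flatten()))   # False: flatten always copies
print(np.shares_memory(arr, arr.ravel()))     # True: contiguous data, so ravel returns a view
print(np.shares_memory(arr, arr.T.ravel()))   # False: the transposed layout forces a copy
```

So code that relies on ravel writing through to the original should only do so for arrays known to be contiguous.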

Stack arrays


Several arrays can be stacked together along different axes:

>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])

Stack flat arrays in columns

a = np.array([0, 1])
b = np.array([2, 3])
ab = np.stack((a, b)).T
print(ab) # [[0 2][1 3]]
# or
np.hstack((a[:, None], b[:, None]))

Selection


Single item


arr = np.arange(10, dtype=float).reshape((2, 5))
arr[0] # row 0 (indexes the first axis, like a list)
arr[0, 3] # row 0, column 3: returns 3.0
arr[0][3] # alternative syntax

Slicing

Syntax: start:stop:step, with start (default 0), stop (default: end of the axis) and step (default 1)


arr[0, :] # row 0
arr[:, 0] # column 0
arr[:, :2] # the first 2 columns
arr[:, 2:] # columns from index 2 onward
arr2 = arr[:, 1:4] # columns between index 1 (included) and 4 (excluded)
print(arr2)

Advanced indexing: Integer and boolean indexing


Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
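
The practical consequence of the copy/view distinction is shown in the sketch below (variable names are illustrative): writing into a slice changes the original array, while writing into an advanced-indexing result does not.

```python
import numpy as np

arr = np.arange(6)

view = arr[1:4]          # basic slicing: a view onto arr
copy = arr[[1, 2, 3]]    # advanced indexing: an independent copy

view[0] = 100            # writes through to arr
copy[1] = -1             # leaves arr untouched

print(arr)               # [  0 100   2   3   4   5]
print(copy)              # [ 1 -1  3]
```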

Integer array indexing


Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.

From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
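
The same pattern is common in machine learning code, for instance to pick the predicted probability of the true class for each sample. The sketch below uses made-up numbers; probs and labels are hypothetical names, not part of the original example:

```python
import numpy as np

# Hypothetical predicted class probabilities: 3 samples, 2 classes.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 0])       # true class of each sample

rows = np.arange(len(labels))      # row index for each sample: [0, 1, 2]
picked = probs[rows, labels]       # probability assigned to the true class
print(picked)                      # [0.9 0.8 0.6]
```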

Boolean array indexing


This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators.

To add a constant to all negative elements:

>>> x = np.array([1., -1., -2., 3])

>>> x[x < 0]
array([-1., -2.])

>>> x[x < 0] += 20
>>> x
array([  1.,  19.,  18.,   3.])

Vectorized operations



nums = np.arange(5)

nums
Out[38]: array([0, 1, 2, 3, 4])

nums * 10 # multiply each element by 10
Out[39]: array([ 0, 10, 20, 30, 40])

nums = np.sqrt(nums) # square root of each element

nums
Out[41]: array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ])

np.ceil(nums)  # also floor, rint (round to nearest int)
Out[42]: array([ 0.,  1.,  2.,  2.,  2.])

np.isnan(nums) # checks for NaN
Out[43]: array([False, False, False, False, False], dtype=bool)

nums + np.arange(5) # add element-wise
Out[44]: array([ 0.        ,  2.        ,  3.41421356,  4.73205081,  6.        ])

np.maximum(nums, np.array([1, -2, 3, -4, 5])) # compare element-wise
Out[45]: array([ 1.        ,  1.        ,  3.        ,  1.73205081,  5.        ])

# math and stats
rnd = np.random.randn(4, 2) # random normals in a 4x2 array
rnd
array([[ 0.08279742,  1.66288772],
       [ 0.85726878, -1.21109599],
       [ 1.42106772,  0.27319021],
       [ 1.21154191, -0.23446712]])

rnd.std()
Out[48]: 0.90325103415683905

rnd.mean()
Out[50]: 0.50789883169757122
rnd.argmin() # index of the minimum element (in the flattened array)
Out[51]: 3

rnd.sum() # sum of all elements
rnd.sum(axis=0) # sum of each column
rnd.sum(axis=1) # sum of each row

# methods for boolean arrays
(rnd > 0).sum() # counts number of positive values
(rnd > 0).any() # checks if any value is True
(rnd > 0).all() # checks if all values are True
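
Because rnd is random, the effect of the axis argument is easier to see on a fixed array. The following sketch uses a small hand-written matrix m (an illustrative name) so the results can be checked by hand:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())          # 21
print(m.sum(axis=0))    # [5 7 9]  (collapses the rows: one sum per column)
print(m.sum(axis=1))    # [ 6 15]  (collapses the columns: one sum per row)
print((m > 2).sum())    # 4        (each True counts as 1)
```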

np.random.seed(12234)
np.random.rand(2, 3) # 2 x 3 matrix of uniform random values in [0, 1)
Out[54]: 
array([[ 0.00630595,  0.20303476,  0.76478993],
       [ 0.55513384,  0.74358546,  0.93777808]])

np.random.randn(10) # Return 10 samples from the “standard normal” distribution.

np.random.randint(0, 2, 10) # Return 10 random integers from 0 (inclusive) to 2 (exclusive).
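
Setting the seed is what makes random experiments repeatable. As a quick check (the seed value 42 is arbitrary), resetting the seed replays the same sequence of draws:

```python
import numpy as np

np.random.seed(42)              # arbitrary seed value
first = np.random.rand(3)

np.random.seed(42)              # resetting the seed replays the same sequence
second = np.random.rand(3)

print(np.array_equal(first, second))    # True

ints = np.random.randint(0, 2, 10)      # upper bound 2 is exclusive
print(set(ints.tolist()) <= {0, 1})     # True
```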

Broadcasting

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.

The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2.,  4.,  6.])
We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a.

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
  • they are equal, or
  • one of them is 1
When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.
Arrays do not need to have the same number of dimensions.
Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that the following pairs of shapes are compatible:

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5
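
The rule can be written out as a few lines of plain Python. The helper below, broadcast_shape, is a hypothetical function written for illustration (it is not part of NumPy); it walks both shapes from the trailing axis and is cross-checked against NumPy's own np.broadcast:

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing axis; a missing axis counts as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)
print(broadcast_shape((15, 3, 5), (3, 1)))        # (15, 3, 5)

# Cross-check against NumPy itself.
print(np.broadcast(np.empty((8, 1, 6, 1)), np.empty((7, 1, 5))).shape)   # (8, 7, 6, 5)
```

Shapes such as (3,) and (4,) fail the rule, and the helper raises ValueError for them, just as NumPy raises an error when asked to add two such arrays.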

a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
b = np.array([0, 1, 2])
print(a + b)

[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]]
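
A typical machine learning use of broadcasting is centering the columns (features) of a data matrix. The sketch below uses a made-up matrix X; it also shows that subtracting a per-row quantity needs an explicit axis of length one so that the shapes line up:

```python
import numpy as np

# Hypothetical data matrix: 4 samples (rows), 3 features (columns).
X = np.array([[1., 10., 100.],
              [2., 20., 200.],
              [3., 30., 300.],
              [4., 40., 400.]])

col_mean = X.mean(axis=0)                       # shape (3,)
X_centered = X - col_mean                       # (4, 3) - (3,) broadcasts across rows

row_mean = X.mean(axis=1)                       # shape (4,)
X_row_centered = X - row_mean[:, np.newaxis]    # (4, 3) - (4, 1) broadcasts across columns

print(X_centered.mean(axis=0))                  # [0. 0. 0.]
```

Writing X - row_mean without the added axis would fail here, because the trailing dimensions 3 and 4 are neither equal nor 1.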
