Raspberry Pi: OpenCV


Installing OpenCV


OpenCV


OpenCV stands for Open Source Computer Vision Library; it is a cross-platform computer vision library.
OpenCV can be used to develop real-time image processing, computer vision, and pattern recognition programs.
OpenCV is written in C++ and its primary interface is also C++, but it still retains a large number of C interfaces. The library also has extensive Python, Java and MATLAB/OCTAVE (as of version 2.5) interfaces.

Windows Setup


Installing OpenCV from prebuilt binaries
  • The following Python packages are to be downloaded and installed to their default locations.
    • Python-2.7.x
    • Numpy
    • https://downloads.sourceforge.net/project/numpy/NumPy/1.8.0/numpy-1.8.0-win32-superpack-python2.7.exe Numerical Python. NumPy is a general-purpose array-processing package designed to efficiently manipulate large multi-dimensional arrays of arbitrary records without sacrificing too much speed for small multi-dimensional arrays. NumPy is built on the Numeric code base and adds features introduced by numarray, as well as an extended C-API and the ability to create arrays of arbitrary type, which also makes NumPy suitable for interfacing with general-purpose database applications. Execute the downloaded package; it will find the Python folder and install NumPy under C:\Python27\Lib\site-packages.
    • Matplotlib (Matplotlib is optional, but recommended since we use it a lot in our tutorials)
    • matplotlib strives to produce publication-quality 2D graphics for interactive graphing, scientific publishing, user interface development and web application servers, targeting multiple user interfaces and hardcopy output formats. There is a 'pylab' mode which emulates MATLAB graphics. Execute the downloaded package; it will find the Python folder and install Matplotlib under C:\Python27\Lib\site-packages.
  • Install all packages into their default locations.
  • Python will be installed to C:/Python27/.
  • After installation, open Python IDLE.
  • Enter import numpy and make sure Numpy is working fine.
    
    >>> import numpy as np
    >>> x = np.array([1, 2, 3])
    >>> x
    array([1, 2, 3])
    >>> y = np.arange(10)
    >>> y
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> 
    
  • Download latest OpenCV release from sourceforge site and double-click to extract it.
  • Go to the extracted folder opencv/build/python/2.7
  • Copy cv2.pyd to C:/Python27/lib/site-packages.
  • Open Python IDLE and type following codes in Python terminal.
  • 
    >>> import cv2
    >>> print cv2.__version__
    
    
    If you see the error "ImportError: DLL load failed: %1 不是正確的 Win32 應用程式 。" ("%1 is not a valid Win32 application"), you may have downloaded the wrong binary for your x86/x64 Python. If you see the error "RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9", you need to download the NumPy build corresponding to your Python version. In my testing, the following two versions match each other:
    
    >>> import cv2
    >>> print cv2.__version__
    3.2.0
    >>> import numpy as np
    >>> print np.__version__
    1.10.2
    >>> 
    
If the results are printed out without any errors, congratulations !!! You have installed OpenCV-Python successfully.

Open and Show the Image File



>>> import cv2
>>> img=cv2.imread('D:\mini.jpg')
>>> cv2.imshow('image',img)
>>> cv2.waitKey(0)  # the window will close immediately without this line
>>> cv2.destroyAllWindows()

imread( path, flag )


where flag:
  • cv2.IMREAD_COLOR
  • cv2.IMREAD_GRAYSCALE
  • cv2.IMREAD_UNCHANGED
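
For example, a minimal sketch reading the same file as a single-channel grayscale image (reusing the path above):

>>> import cv2
>>> gray = cv2.imread('D:\mini.jpg', cv2.IMREAD_GRAYSCALE)
>>> gray.shape  # a grayscale image has no third (channel) dimension
(354, 630)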

Convert and Save Image Files



>>> import cv2
>>> img=cv2.imread('D:\mini.jpg')
>>> cv2.imwrite('D:\mini.png',img)

Accessing and Modifying pixel values


>>> import cv2
>>> import numpy as np
>>> img=cv2.imread('D:\mini.jpg')
>>> pixel = img[200,200]
>>> print pixel
[ 4  3 23]
>>> img[200,200]=[64,64,64]
>>> print img[200,200]
[64 64 64]

Accessing Image Properties


  • The number of rows, columns and channels
  • 
    >>> print img.shape
    (354, 630, 3)
    
  • Total number of pixels
  • 
    >>> print img.size
    669060
    
  • Image datatype
  • 
    >>> print img.dtype
    uint8
    

Region of Interest (often abbreviated ROI)


Move an ROI by (10,10)


>>> square = img[100:(100+50), 50:(50+50)]
>>> img[100+10:(100+50+10), 50+10:(50+50+10)] = square

Splitting and Merging Image Channels


Sometimes you will need to work separately on B,G,R channels of image. Then you need to split the BGR images to single planes. Or another time, you may need to join these individual channels to BGR image. You can do it simply by:


>>> b,g,r = cv2.split(img)
>>> img = cv2.merge((b,g,r))
Warning
cv2.split() is a costly operation (in terms of time). So do it only if you need it. Otherwise go for Numpy indexing.
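
For example, Numpy indexing can access or modify a single channel without splitting all three planes (a minimal sketch using the img loaded above):

>>> b = img[:,:,0]   # a view of the blue plane (OpenCV stores images in BGR order)
>>> img[:,:,2] = 0   # set the red value of every pixel to 0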


Storage Requirements

This installation requires more space because some packages must be rebuilt from source.
You need a large microSD card or external storage for this installation.

Raspbian knows which disks to mount at boot time by reading the file-system table ( /etc/fstab ), and we could put our /dev/sda1 in there, but if we start up with two drives plugged in, the wrong one may be selected.
Fortunately, disks (or rather, disk partitions) have unique labels known as UUIDs randomly allocated when the partition is created.
Find them with sudo blkid , which also helpfully tells you the label, if any, that often contains the make and model of external drives, or look in /dev/disk/by-uuid .

For an NTFS-formatted drive, we called sudo nano /etc/fstab and added the following to the end of the file:

/dev/disk/by-uuid/E4EE32B4EE327EBC /media/usb1t ntfs defaults 0 0
This gives the device name (yours will be different, of course), mount point, file-system type, options, and two numeric fields:
  • the first of these should be zero (it relates to the unused dump backup program)
  • the second is the order of check and repair at boot
    1 for the root file system, 2 for other permanently mounted disks for data, and 0 (no check) for all others.
man mount will tell you about possible options. For example, after creating 4 partitions:

$ sudo blkid
/dev/sr0: UUID="2016-12-13-15-39-36-00" LABEL="Debian jessie 20161213-13:58" TYPE="iso9660" PTUUID="0eddfb88" PTTYPE="dos"
/dev/loop0: TYPE="squashfs"
/dev/sda1: PARTUUID="60b5ae09-01"
/dev/sda2: PARTUUID="60b5ae09-02"
/dev/sda3: PARTUUID="60b5ae09-03"
/dev/sda4: PARTUUID="60b5ae09-04"

Installing OpenCV 3 on Mac OS


Install Xcode

The easiest method to download Xcode is to open up the App Store application on your desktop, search for “Xcode” in the search bar, and then click the “Get” button.
After installing Xcode you’ll want to open up a terminal and ensure you have accepted the developer license:

sudo xcodebuild -license
Type 'agree' to accept the terms of the software license agreements.

We also need to install the Apple Command Line Tools. These tools include programs and libraries such as GCC, make, clang, etc. You can use the following command to install the Apple Command Line Tools:

sudo xcode-select --install
Click the “Install” button to continue. The actual installation process should take less than 5 minutes to complete.

Install Homebrew


Reference: https://www.pyimagesearch.com/2016/12/05/macos-install-opencv-3-and-python-3-5/

Homebrew is a package manager for macOS. You can think of Homebrew as the macOS equivalent of Ubuntu/Debian-based apt-get.
Installing Homebrew:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
The script explains what it is going to do and pauses for your confirmation before doing it.
  • Use Homebrew to install the things you need that Apple didn't preinstall
  • brew install wget
  • Homebrew installs packages into their own directories and then symlinks the files into /usr/local.
  • 
    $ cd /usr/local
    $ find Cellar
    Cellar/wget/1.16.1
    Cellar/wget/1.16.1/bin/wget
    Cellar/wget/1.16.1/share/man/man1/wget.1
    
    $ ls -l bin
    bin/wget -> ../Cellar/wget/1.16.1/bin/wget
    

Once Homebrew is installed you should make sure the package definitions are up to date by running:

brew update
Set the Homebrew path in your ~/.bash_profile file:

# Homebrew
export PATH=/usr/local/bin:$PATH
then,

source ~/.bash_profile

Install Python 3


The system version of Python should serve exactly that — system routines. The system version of Python is located under /usr/bin.
You should install your own version of Python that is independent from the system install.

brew install python python3
After Python is installed, check your Python version:

python --version
Python 2.7.10

python3 --version
Python 3.6.5

Install OpenCV

We are now ready to install OpenCV 3.

Installing OpenCV 3 with Python 3 bindings via Homebrew

You can see the full listing of options/switches by running brew info opencv3 , the output of which I’ve included below:

opencv: stable 3.4.1 (bottled)
Open source computer vision library
https://opencv.org/
Not installed
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/opencv.rb
==> Dependencies
Build: cmake ✘, pkg-config ✘
Required: eigen ✘, ffmpeg ✘, jpeg ✘, libpng ✘, libtiff ✘, openexr ✘, python ✔, python@2 ✘, numpy ✘, tbb ✘
To start the OpenCV 3 install process, just execute the following command:

brew install opencv3 --with-contrib --with-python3
The install process:

==> Installing dependencies for opencv: eigen, lame, x264, xvid, ffmpeg, jpeg, libpng, libtiff, ilmbase, openexr, python@2, numpy, tbb
==> Installing opencv dependency: eigen
==> Downloading https://homebrew.bintray.com/bottles/eigen-3.3.4.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring eigen-3.3.4.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/eigen/3.3.4: 486 files, 6.5MB
==> Installing opencv dependency: lame
==> Downloading https://homebrew.bintray.com/bottles/lame-3.100.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring lame-3.100.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/lame/3.100: 27 files, 2.1MB
==> Installing opencv dependency: x264
==> Downloading https://homebrew.bintray.com/bottles/x264-r2854.high_sierra.bottle.1.tar.gz
######################################################################## 100.0%
==> Pouring x264-r2854.high_sierra.bottle.1.tar.gz
🍺  /usr/local/Cellar/x264/r2854: 11 files, 3.4MB
==> Installing opencv dependency: xvid
==> Downloading https://homebrew.bintray.com/bottles/xvid-1.3.5.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring xvid-1.3.5.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/xvid/1.3.5: 10 files, 1.2MB
==> Installing opencv dependency: ffmpeg
==> Downloading https://homebrew.bintray.com/bottles/ffmpeg-4.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring ffmpeg-4.0.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/ffmpeg/4.0: 246 files, 49.6MB
==> Installing opencv dependency: jpeg
==> Downloading https://homebrew.bintray.com/bottles/jpeg-9c.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring jpeg-9c.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/jpeg/9c: 21 files, 724.5KB
==> Installing opencv dependency: libpng
==> Downloading https://homebrew.bintray.com/bottles/libpng-1.6.34.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libpng-1.6.34.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/libpng/1.6.34: 26 files, 1.2MB
==> Installing opencv dependency: libtiff
==> Downloading https://homebrew.bintray.com/bottles/libtiff-4.0.9_3.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libtiff-4.0.9_3.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/libtiff/4.0.9_3: 246 files, 3.5MB
==> Installing opencv dependency: ilmbase
==> Downloading https://homebrew.bintray.com/bottles/ilmbase-2.2.1.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring ilmbase-2.2.1.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/ilmbase/2.2.1: 353 files, 5.6MB
==> Installing opencv dependency: openexr
==> Downloading https://homebrew.bintray.com/bottles/openexr-2.2.0_1.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring openexr-2.2.0_1.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/openexr/2.2.0_1: 132 files, 11MB
==> Installing opencv dependency: python@2
==> Downloading https://homebrew.bintray.com/bottles/python@2-2.7.15.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring python@2-2.7.15.high_sierra.bottle.tar.gz
==> /usr/local/Cellar/python@2/2.7.15/bin/python -s setup.py --no-user-cfg install --force --verbose --single-version-externally-managed --record=in
==> /usr/local/Cellar/python@2/2.7.15/bin/python -s setup.py --no-user-cfg install --force --verbose --single-version-externally-managed --record=in
==> /usr/local/Cellar/python@2/2.7.15/bin/python -s setup.py --no-user-cfg install --force --verbose --single-version-externally-managed --record=in
==> Caveats
Pip and setuptools have been installed. To update them
  pip install --upgrade pip setuptools

You can install Python packages with
  pip install 

They will install into the site-package directory
  /usr/local/lib/python2.7/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> Summary
🍺  /usr/local/Cellar/python@2/2.7.15: 4,669 files, 82.7MB
==> Installing opencv dependency: numpy
==> Downloading https://homebrew.bintray.com/bottles/numpy-1.14.3_1.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring numpy-1.14.3_1.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/numpy/1.14.3_1: 939 files, 24.9MB
==> Installing opencv dependency: tbb
==> Downloading https://homebrew.bintray.com/bottles/tbb-2018_U3_1.high_sierra.bottle.1.tar.gz
######################################################################## 100.0%
==> Pouring tbb-2018_U3_1.high_sierra.bottle.1.tar.gz
🍺  /usr/local/Cellar/tbb/2018_U3_1: 131 files, 2.1MB
Warning: opencv: this formula has no --with-contrib option so it will be ignored!
Warning: opencv: this formula has no --with-python3 option so it will be ignored!
==> Installing opencv
==> Downloading https://homebrew.bintray.com/bottles/opencv-3.4.1_5.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring opencv-3.4.1_5.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/opencv/3.4.1_5: 551 files, 97.8MB
Python + OpenCV 3 bindings are now installed:

ls /usr/local/Cellar/opencv/3.4.1_5/lib/py*

/usr/local/Cellar/opencv/3.4.1_5/lib/python2.7:
site-packages

/usr/local/Cellar/opencv/3.4.1_5/lib/python3.6:
site-packages
Verify OpenCV:

python3

Python 3.6.5 (default, Apr 25 2018, 14:23:58) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print(cv2.__version__)
3.4.1



Installing OpenCV 3 on Raspbian Jessie

Installation in Linux



$ sudo apt-get update

The packages can be installed using a terminal and the following commands:
  • [compiler]
  • sudo apt-get install build-essential
  • [required]
  • sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
  • [optional]
  • sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev

Install pip:


$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py

Getting OpenCV Source Code: 3.2.0


sudo install -d /usr/local/src/opencv/build
cd /usr/local/src/opencv/
sudo unzip /home/pi/Downloads/opencv-3.2.0.zip

Building OpenCV from Source Using CMake:
  • Configuring
  • 
    cd /usr/local/src/opencv/build
    sudo cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local /usr/local/src/opencv/opencv-3.2.0
    
  • Build
  • sudo make -j7
  • Install
  • sudo make install

opencv_contrib


If you see the following error:

AttributeError: module 'cv2' has no attribute 'xxx'

You may check whether it belongs to an "extra" module which has not been merged into the main modules.

There is a repository intended for the development of so-called "extra" modules: contributed functionality. New modules quite often do not have a stable API, and they are not well-tested. Thus, they shouldn't be released as part of the official OpenCV distribution, since the library maintains binary compatibility and tries to provide decent performance and stability.

So, all the new modules should be developed separately, and published in the opencv_contrib repository at first. Later, when the module matures and gains popularity, it is moved to the central OpenCV repository, and the development team provides production quality support for this module.

You can build the latest OpenCV with the extra modules included for the Raspi again.

cd /usr/local/src/opencv

sudo git clone https://github.com/opencv/opencv.git 
sudo git clone https://github.com/opencv/opencv_contrib.git

cd build
sudo rm -rf *
sudo cmake -DOPENCV_EXTRA_MODULES_PATH=/usr/local/src/opencv/opencv_contrib/modules /usr/local/src/opencv/opencv

sudo make -j2
sudo make install

If you see the build abort with the following:

[ 97%] Building CXX object modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o
...
This happens because the compiler ran out of memory while compiling the cv2.cpp file.
If you compile a single source file that contains many routines, the compiler might run out of memory or swap space.
You need to increase the swap file from the default 100 MB to 1024 MB.
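On Raspbian the swap file is managed by dphys-swapfile, so (assuming the default configuration file location) the change can be made like this:

sudo nano /etc/dphys-swapfile    # change CONF_SWAPSIZE=100 to CONF_SWAPSIZE=1024
sudo /etc/init.d/dphys-swapfile restart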
The memory usage during compilation:

pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         859          31           1          47          30
Swap:          1023         360         663
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         858          33           1          47          32
Swap:          1023         358         665
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         860          31           1          47          30
Swap:          1023         356         667
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         882          27           1          28          17
Swap:          1023         567         456
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         865          34           0          38          29
Swap:          1023         537         486

pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         877          32           0          29          22
Swap:          1023         483         540
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939         864          32           0          42          29
Swap:          1023         161         862

pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:            939          40         669           1         229         848
Swap:          1023         157         866

You can see that the swap space needs at least 600M.

Machine Learning Overview

Training Data

Training data includes several components:
  • A set of training samples
  • Each training sample is a vector of values (in Computer Vision it's sometimes referred to as feature vector). Usually all the vectors have the same number of components (features); OpenCV ml module assumes that. Each feature can be ordered (i.e. its values are floating-point numbers that can be compared with each other and strictly ordered, i.e. sorted) or categorical (i.e. its value belongs to a fixed set of values that can be integers, strings etc.).
  • Optional set of responses corresponding to the samples.
  • Training data with no responses is used in unsupervised learning algorithms, which learn the structure of the supplied data based on distances between different samples. Training data with responses is used in supervised learning algorithms, which learn the function mapping samples to responses. Usually the responses are scalar values, ordered (when we deal with a regression problem) or categorical (when we deal with a classification problem; in this case the responses are often called "labels"). Some algorithms, most notably neural networks, can handle not only scalar, but also multi-dimensional or vector responses.
  • Another optional component is the mask of missing measurements.
  • Most algorithms require all the components in all the training samples to be valid, but some other algorithms, such as decision trees, can handle cases of missing measurements.
  • In the case of a classification problem, the user may want to give different weights to different classes.
  • This is useful, for example, when:
    • the user wants to shift prediction accuracy towards a lower false-alarm rate or a higher hit-rate.
    • the user wants to compensate for significantly different amounts of training samples from different classes.
  • Each training sample may be given a weight
  • This is useful if the user wants the algorithm to pay special attention to certain training samples and adjust the training model accordingly.
  • The user may wish not to use the whole training data at once, but rather use parts of it, e.g. to do parameter optimization via a cross-validation procedure.
Training data can have rather complex structure; besides, it may be very big and/or not entirely available, so there is a need to make an abstraction for this concept. In OpenCV ml there is the cv::ml::TrainData class for that.
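
As a minimal sketch (toy data invented for illustration), samples go in as a float32 matrix with one row per sample when the layout is cv2.ml.ROW_SAMPLE:

import cv2
import numpy as np

# four 2-feature training samples, one per row, each with an integer class label
samples = np.array([[1.0, 2.0],
                    [2.0, 3.0],
                    [8.0, 9.0],
                    [9.0, 8.0]], dtype=np.float32)
responses = np.array([0, 0, 1, 1], dtype=np.int32)

data = cv2.ml.TrainData_create(samples, cv2.ml.ROW_SAMPLE, responses)
print data.getNSamples()   # 4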

Normal Bayes Classifier

This simple classification model assumes that feature vectors from each class are normally distributed (though, not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
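
A minimal usage sketch (toy data; assumes OpenCV 3's cv2.ml Python API):

import cv2
import numpy as np

samples = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]], dtype=np.float32)
labels = np.array([0, 0, 1, 1], dtype=np.int32)

nbc = cv2.ml.NormalBayesClassifier_create()
nbc.train(samples, cv2.ml.ROW_SAMPLE, labels)
ret, result = nbc.predict(np.array([[7.5, 8.0]], dtype=np.float32))
print result   # the predicted class of the new sample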

K-Nearest Neighbors

The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (K) of the nearest neighbors of the sample, using voting, calculating a weighted sum, and so on. The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
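
A sketch with random toy points, loosely following the OpenCV KNN tutorial (assumes OpenCV 3's cv2.ml Python API):

import cv2
import numpy as np

# 25 random 2D feature vectors labelled 0 or 1
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
labels = np.random.randint(0, 2, (25, 1)).astype(np.float32)

knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, labels)

newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)
ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
print results      # label voted by the 3 nearest neighbours
print neighbours   # labels of those 3 neighbours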

Support Vector Machines

Originally, support vector machines (SVM) were a technique for building an optimal binary (2-class) classifier. Later the technique was extended to regression and clustering problems. SVM is a partial case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space, or an optimal hyperplane that fits the training data. In the case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined. The solution is optimal, which means that the margin between the separating hyperplane and the nearest feature vectors from both classes (in the case of a 2-class classifier) is maximal. The feature vectors that are closest to the hyperplane are called support vectors, which means that the position of the other vectors does not affect the hyperplane (the decision function).
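
A minimal 2-class sketch with a linear kernel (toy data; assumes OpenCV 3's cv2.ml Python API):

import cv2
import numpy as np

samples = np.array([[1, 1], [2, 2], [8, 8], [9, 9]], dtype=np.float32)
labels = np.array([0, 0, 1, 1], dtype=np.int32)   # C_SVC expects integer class labels

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)

ret, result = svm.predict(np.array([[8, 9]], dtype=np.float32))
print result   # predicted class label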

Decision Trees

A decision tree is a binary tree (tree where each non-leaf node has two child nodes). It can be used either for classification or for regression. For classification, each tree leaf is marked with a class label; multiple leaves may have the same label. For regression, a constant is also assigned to each tree leaf, so the approximation function is piecewise constant.

Predicting with Decision Trees

Training Decision Trees

Variable Importance


Raspberry Pi Computer Vision Programming


Design and implement your own computer vision applications with the Raspberry Pi
by Ashwin Pajankar

Chapter 1: Introduction to Computer Vision and Raspberry Pi


Preparing your Pi for computer vision

Install OpenCV for Python by using the following command:

       sudo apt-get install python-opencv
This is the easiest way to install OpenCV for Python.
However, there is a problem with this: the Raspbian repository does not contain the latest OpenCV version.
Another method is to compile OpenCV from source.

Testing OpenCV installation with Python

On a terminal, type python, and then type the following lines:

>>> import cv2
>>> print cv2.__version__

This will show us the version of OpenCV that was installed on Pi.

NumPy


It is a matrix library for linear algebra.
It adds support for large multidimensional arrays and matrices, along with a large library of high-level mathematical functions that can be used to operate on these arrays.

Array creation


>>> import numpy as np
>>> x=np.array([1,2,3])
>>> x
array([1, 2, 3])
>>> y=np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic operations on arrays

  • linspace(start_num, end_num, count)
  • 
    >>> b=np.linspace(1,16,4)
    >>> b
    array([  1.,   6.,  11.,  16.])
    >>> c=np.linspace(0,1,4)
    >>> c
    array([ 0.        ,  0.33333333,  0.66666667,  1.        ])
    >>>
    
  • square
  • 
    >>> a=np.linspace(0,5,3)
    >>> a
    array([ 0. ,  2.5,  5. ])
    >>> a**2
    array([  0.  ,   6.25,  25.  ])
    >>> 
    
  • Linear algebra
  • 
    >>> a=np.array([[1,2],[3,4]])
    >>> a
    array([[1, 2],
           [3, 4]])
    >>> a.transpose()
    array([[1, 3],
           [2, 4]])
    >>> b=np.array([[5,6],[7,8]])
    >>> b
    array([[5, 6],
           [7, 8]])
    >>> np.dot(a,b)
    array([[19, 22],
           [43, 50]])
    

Indexing


A numpy.ndarray is a (usually fixed-size) multidimensional container of items of the same type and size. ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection.

An associated data-type object describes the format of each element in the ndarray.
The number of dimensions and items in an array is defined by its shape, which is a tuple of N positive integers that specify the sizes of each dimension.
For example, a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements:

>>> x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
>>> x.shape
(2, 3)
>>> x[1, 2] # Python's way: 1 for the second row, 2 for the third column
6
Different ndarrays can share the same data, so changes made in one ndarray may be visible in another. That is, an ndarray can be a “view” of another ndarray; slicing can produce views of the array:

>>> y = x[:,1]
>>> y
array([2, 5])
>>> y[0] = 9 # this also changes the corresponding element in x
>>> y
array([9, 5])
>>> x
array([[1, 9, 3],
       [4, 5, 6]])


Basic Slicing and Indexing
Basic slicing extends Python’s basic concept of slicing to N dimensions. Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets)

  • The basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0).
  • 
    >>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> x[1:7:2]
    array([1, 3, 5])
    
  • Negative i and j are interpreted as n + i and n + j, where n is the number of elements in the corresponding dimension (the idea of a ring buffer). Negative k makes stepping go towards smaller indices.
  • 
    >>> x[-2:10]
    array([8, 9])
    >>> x[-3:3:-1]
    array([7, 6, 5, 4])
    
  • Assume n is the number of elements in the dimension being sliced. Then, if i is not given it defaults to 0 for k > 0 and n - 1 for k < 0 . If j is not given it defaults to n for k > 0 and -n-1 for k < 0 . If k is not given it defaults to 1. Note that :: is the same as : and means select all indices along this axis.
  • 
    >>> x[5:]
    array([5, 6, 7, 8, 9])
    

Chapter 2 Working with Images, Webcams and GUI

Working with Images

cv2.imread(file_path, read_flag)

read_flag specifies the mode the image should be read in:
  • cv2.IMREAD_COLOR
  • 1 (default)
  • cv2.IMREAD_GRAYSCALE
  • 0
  • cv2.IMREAD_UNCHANGED
  • -1
Ex.,

>>> img=cv2.imread('/home/pi/Downloads/test.jpg',1)
>>> cv2.imshow('test',img)
>>> cv2.waitKey(0)
255
>>> cv2.destroyWindow('test')
We can also create a named window in advance and assign an image to that window later.

cv2.namedWindow('test', cv2.WINDOW_AUTOSIZE)
int cv::waitKey (int delay = 0) : The function waitKey waits for a key event infinitely (when delay ≤ 0) or for delay milliseconds, when it is positive. cv2.waitKey() is a keyboard function; it is the only method to fetch and handle events. We must use it when using cv2.imshow() or no image will be displayed on the screen.

cv2.imwrite(file_path, img)

Ex.,

>>> cv2.imshow('test',img)
>>> key=cv2.waitKey(0)
>>> key
99
>>> ord('c')
99
>>> if key == ord('c'):
...     cv2.imwrite('/home/pi/test_out.jpg',img)
...     cv2.destroyWindow('test')
... 
True
>>> 

Warning: Color image loaded by OpenCV is in BGR mode. But Matplotlib displays in RGB mode. So color images will not be displayed correctly in Matplotlib if image is read with OpenCV.

cv2.waitKey(0) is used to get the key event from the displayed window. Python's built-in function ord(c) returns the 8-bit value of a character: given a string of length one, it returns an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string.
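
On some platforms cv2.waitKey() returns an integer with extra high bits set, so a common defensive idiom (a sketch; not required on every system) is to mask the result to 8 bits before comparing it with ord():

>>> key = cv2.waitKey(0) & 0xFF
>>> if key == ord('c'):
...     cv2.imwrite('/home/pi/test_out.jpg', img)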

Using matplotlib

It is a 2D plotting library for Python. To install:

sudo apt-get install python3-matplotlib
The python3-matplotlib package is not available on wheezy, but on jessie. You could install it manually:

git clone https://github.com/matplotlib/matplotlib
cd matplotlib
python3 setup.py build
sudo python3 setup.py install
Ex.,

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img=mpimg.imread('/home/pi/png.png')
imgplot=plt.imshow(img)
plt.title('png')
plt.xticks([])
plt.yticks([])
plt.show()

Drawing geometric shapes


import cv2
import numpy as np

# create a 3D array of 0: a black image with dimensions 200 x 200, as (0,0,0) represents the color black:
img = np.zeros((200,200,3), np.uint8)

# draws a line with coordinates (0,199) and (199,0) in red color [(0,0,255) for BGR] with a thickness of 2
cv2.line(img,(0,199),(199,0),(0,0,255),2)

# draws a blue rectangle with (20,20) and (60,60) 
cv2.rectangle(img,(20,20),(60,60),(255,0,0),1)

# draws a green filled circle with (80,80) as center and 10 as radius:
cv2.circle(img,(80,80),10,(0,255,0),-1)

# draws a polygon with four points:
points = np.array([[100,5],[125,30],[175,20],[185,10]], np.int32)
points = points.reshape((-1,1,2))
cv2.polylines(img,[points],True,(255,255,0))

#adds text to the image with (80,180) as the bottom-left corner of the text and HERSHEY_DUPLEX as the font with the size of 1 and color pink
cv2.putText(img,'Test',(80,180), cv2.FONT_HERSHEY_DUPLEX , 1, (255,0,255))

cv2.imshow('Shapes', img)
cv2.waitKey(0)

Working with trackbar and named window

The cv2.createTrackbar() method creates a trackbar and takes the following parameters:
  • Name
  • This refers to the name of the trackbar to be created.
  • Window_name
  • This specifies the name of the named window to be associated with.
  • Value
  • This refers to the initial value of the slider when created.
  • Count
  • This is the maximum value of the slider—the minimum is always 0.
  • Onchange()
  • This function is called when the slider changes position.

import numpy as np
import cv2

def empty(z):
    pass

# Create a black background
image = np.zeros((300,512,3), np.uint8)
cv2.namedWindow('Palette')

# create trackbars for colors and associate those with the created window Pallete
cv2.createTrackbar('B','Palette',0,255,empty)
cv2.createTrackbar('G','Palette',0,255,empty)
cv2.createTrackbar('R','Palette',0,255,empty)

while(True):
    cv2.imshow('Palette',image)
    if cv2.waitKey(1) == 27:
        break

    # fetch the color value
    blue = cv2.getTrackbarPos('B','Palette')
    green = cv2.getTrackbarPos('G','Palette')
    red = cv2.getTrackbarPos('R','Palette')
    image[:] = [blue,green,red]

cv2.destroyWindow('Palette')
The empty() function does not perform any action when the slider is changed. The cv2.getTrackbarPos() function returns the current position of the specified trackbar. The palette color is updated from the slider positions repeatedly until the Esc key is pressed, ending the infinite loop and stopping the program.

Working with a webcam

Rather than using the Raspberry Pi camera module, you can use a standard USB webcam to take pictures and video on the Raspberry Pi. A list of webcams supported by the Pi is at http://elinux.org/RPi_USB_Webcams. Attach your USB webcam to the Raspberry Pi through a USB port and run the lsusb command to make sure it is listed. Install the fswebcam utility with the command:

   sudo apt-get install fswebcam
You can use the following command to capture the image:

  fswebcam -r 1280x960 --no-banner ~/book/output/camtest.jpg
  • -r
  • specify a resolution of 1280 x 960.
  • --no-banner
  • disable the timestamp banner
To record live videos using avconv, install the libav-tools package, which provides it:

  sudo apt-get install libav-tools
Use the following command to record a video:

  avconv -f video4linux2 -r 25 -s 544x288 -i /dev/video0 ~/book/output/VideoStream.avi
We can play the video using omxplayer.

Working with a USB webcam using OpenCV


import cv2

# initialize the camera
cam = cv2.VideoCapture(1) # if the video device index is 1 for the Webcam
ret, image = cam.read()

if ret:
    cv2.imshow('videoCaptureTest',image)
    cv2.waitKey(0)
    cv2.destroyWindow('videoCaptureTest')
    cv2.imwrite('videoCaptureTest.jpg',image)

# When everything done, release the capture
cam.release()

You can find out the number of cameras and associated device indexes by using the ls -l /dev/video* command. If the image capture is successful, then cam.read() returns True; otherwise, it returns False. To display a live video stream from a webcam:

import cv2

cam = cv2.VideoCapture(1)
print("Default Resolution is %s x %s\n",  str(int(cam.get(3))) ,str(int(cam.get(4))) )
w=1024
h=768
cam.set(3,w)
cam.set(4,h)

print("Now resolution is set to %x x \n", str(w),str(h) )
while(True):
    # Capture frame-by-frame
    ret, frame = cam.read()
    # Display the resulting frame
    cv2.imshow('Video Test',frame)
    # Wait for Escape Key
    if cv2.waitKey(1) == 27 :
        break


# When everything done, release the capture
cam.release()
cv2.destroyAllWindows()    

You can access the features of the video device with cam.get(propertyID). 3 stands for width and 4 stands for height. These properties can be set with cam.set(propertyID, value). To write a video to a file:

 import cv2

cap = cv2.VideoCapture(1)
w=640
h=480
cap.set(3,w)
cap.set(4,h)

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
# frame size in out.write(frame) must be the same as the size argument in the constructor VideoWriter.
out = cv2.VideoWriter('output.avi',fourcc, 20.0, (w,h))

while (cap.isOpened()):
    ret, frame = cap.read()
    if ret == True:
        # write the frame
        out.write(frame)
        cv2.imshow('VideoStream', frame )
        if cv2.waitKey(1) == 27 :
            break 
    else:
        break

# When everything done, release the capture
cap.release()
out.release()
cv2.destroyAllWindows()

cv2.VideoWriter() accepts the following parameters:
  • Filename
  • This refers to the name of the video file.
  • FourCC
  • This is a 4-byte code used to specify the video codec: DIVX, XVID, MJPG, X264, WMV1, WMV2. (XVID is preferable. MJPG results in large video files. X264 gives very small video files.) The FourCC code is passed as cv2.VideoWriter_fourcc('M','J','P','G') or cv2.VideoWriter_fourcc(*'MJPG') for MJPG. This function accepts FourCC in *'code' format.
  • Framerate
  • This refers to the rate of frames to be captured per second.
  • Resolution
  • This specifies the resolution of the video to be captured.
The preceding code records the video until the Esc key is pressed and saves it in the specified file.

Using "with as" in Python 3 (2.6)

Context managers allow you to allocate and release resources precisely when you want to. The most widely used example of context managers is the with statement.

with open('some_file', 'w') as opened_file:
    opened_file.write('Hola!')
The above code opens the file, writes some data to it and then closes it. If an error occurs while writing the data to the file, it tries to close it. The above code is equivalent to:

file = open('some_file', 'w')
try:
    file.write('Hola!')
finally:
    file.close()
The main advantage of using a with statement is that it makes sure our file is closed without paying attention to how the nested block exits.

Working with the Pi camera module

picamera is a Python package which provides a programming interface to the Pi camera module. You can install it using:
sudo apt-get install python-picamera
To capture a picture,

   import picamera
   import time

   with picamera.PiCamera() as cam:
     cam.resolution=(1024,768)
     cam.start_preview()
     time.sleep(5) # wait 5 seconds before cam.capture() captures and saves the image to the specified file
     cam.capture('/home/pi/still.jpg')

To capture an image directly into a NumPy array for use with OpenCV, use the picamera.array module:
   import picamera
   import picamera.array
   import time
   import cv2

   with picamera.PiCamera() as camera:
     rawCap=picamera.array.PiRGBArray(camera)
     camera.start_preview()
     time.sleep(3)
     camera.capture(rawCap,format="bgr")
     image=rawCap.array

   cv2.imshow("Test",image)
   cv2.waitKey(0)
   cv2.destroyAllWindows()

Chapter 3 Basic Image Processing

Retrieving image properties


   import cv2

   img = cv2.imread('/home/pi/book/test_set/lena_color_512.tif',1)
   print img.shape
   print img.size
   print img.dtype

Arithmetic operations on images

  • Images are represented as matrices in OpenCV
  • Images must be of the same size for you to perform arithmetic operations on the images
  • these operations are performed on individual pixels
  • cv2.add()
  • This function is used to add two images, where the images are passed as parameters.
  • cv2.subtract()
  • This function is used to subtract an image from another.

Blending and transitioning images

The cv2.addWeighted(img1, alpha, img2, beta, gamma) function calculates the weighted sum of two images. The output image value is calculated with the following formula:

Output = (alpha*img1) + (beta*img2) + gamma
We can create a film-style transition effect on the two images by using the same function.

   import cv2
   import numpy as np
   import time

   img1 = cv2.imread('/home/pi/book/test_set/4.2.03.tiff',1)
   img2 = cv2.imread('/home/pi/book/test_set/4.2.04.tiff',1)
   for i in np.linspace(0,1,40):
     alpha=i
     beta=1-alpha
     print 'ALPHA ='+ str(alpha)+' BETA ='+str(beta)
     cv2.imshow('Image Transition', cv2.addWeighted(img1,alpha,img2,beta,0))
     time.sleep(0.05)
     if cv2.waitKey(1) == 27 :
       break
   cv2.destroyAllWindows()
linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): Return evenly spaced numbers over a specified interval. Returns num evenly spaced samples, calculated over the interval [start, stop]. The endpoint of the interval can optionally be excluded.

Splitting and merging image colour channels

cv2.split() is used to split an image into three different intensity arrays for each color channel, whereas cv2.merge() is used to merge different arrays into a single multi-channel array, that is, a color image.

   import cv2
   img = cv2.imread('/home/pi/book/test_set/4.2.03.tiff',1)

   b,g,r = cv2.split (img)
   img=cv2.merge((b,g,r))

Creating a negative of an image

A pixel value ranges from 0 to 255; therefore, negation involves subtracting each pixel value from 255.

   import cv2
   img = cv2.imread('/home/pi/book/test_set/4.2.07.tiff')
   grayscale = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
   negative = abs(255-grayscale)

Logical operations on images

OpenCV provides bitwise logical operation functions for images.

   import cv2
   import matplotlib.pyplot as plt

   img1 = cv2.imread('/home/pi/book/test_set/Barcode_Hor.png',0)
   img2 = cv2.imread('/home/pi/book/test_set/Barcode_Ver.png',0)

   not_out=cv2.bitwise_not(img1)
   and_out=cv2.bitwise_and(img1,img2)
   or_out=cv2.bitwise_or(img1,img2)
   xor_out=cv2.bitwise_xor(img1,img2)

   titles = ['Image 1','Image 2','Image 1 NOT','AND','OR','XOR']
   images = [img1,img2,not_out,and_out,or_out,xor_out]
   for i in xrange(6):
       plt.subplot(2,3,i+1)
       plt.imshow(images[i],cmap='gray')
       plt.title(titles[i])
       plt.xticks([]),plt.yticks([])
   plt.show()
The xrange type is an immutable sequence object, whereas range creates a list. The plt.subplot() function is used to display multiple images. Typical call signature:

subplot(nrows, ncols, plot_number)
Where nrows and ncols are used to notionally split the figure into nrows * ncols sub-axes, and plot_number is used to identify the particular subplot that this function is to create within the notional grid. plot_number starts at 1, increments across rows first and has a maximum of nrows * ncols. In the preceding example, we created a grid with 2 rows and 3 columns for our images and displayed each image in every part of the grid. In the case when nrows, ncols and plot_number are all less than 10, a convenience exists, such that a 3-digit number can be given instead, where the hundreds represent nrows, the tens represent ncols and the units represent plot_number. For instance:

subplot(211)
produces a subaxes in a figure which represents the top plot (i.e. the first) in a 2 row by 1 column notional grid.

Colorspaces, Transformations, and Thresholds

Colorspaces and conversions

  • BGR
  • blue, green, and red. (OpenCV's default colorspace)
  • RGB
  • Grayscale
  • HSV
  • Hue, Saturation, and Value. Hue is expressed as a number representing hues of red, yellow, green, cyan, blue, and magenta. Saturation is the amount of gray in the color. Value works in conjunction with saturation and describes the brightness or intensity of the color.
matplotlib uses the RGB format for images, so we need to convert an image from BGR to RGB colorspace before displaying it with matplotlib. OpenCV has a function cv2.cvtColor(img, conv_flag) that allows us to change the colorspace of an image (img), where the source and target colorspaces are indicated by the conv_flag parameter. For BGR to Gray conversion we use the flag cv2.COLOR_BGR2GRAY. Similarly, for BGR to HSV, we use the flag cv2.COLOR_BGR2HSV. To get the other flags, just run the following commands in your Python terminal:

flags = [i for i in dir(cv2) if i.startswith('COLOR_')]
print flags
There are 176 Colorspace Conversion flags in OpenCV.

Tracking in real time based on color

In HSV format, it's much easier to recognize the color range.

  • Hue is expressed as a number from 0 to 360 degrees representing hues of red (which start at 0), yellow (starting at 60), green (starting at 120), cyan (starting at 180), blue (starting at 240) and magenta (starting at 300).
  • Saturation is the amount of gray from zero percent to 100 percent in the color.
  • Value (or brightness) works in conjunction with saturation and describes the brightness or intensity of the color from zero percent to 100 percent.
For HSV, the Hue range is [0,179], the Saturation range is [0,255] and the Value range is [0,255]. Different software uses different scales, so if you are comparing OpenCV values with them, you need to normalize these ranges. We can use the cv2.inRange() function to check whether a part of the image falls within the HSV color range of our interest.

  dst = cv2.inRange(InputArray src, InputArray lowerb, InputArray upperb)
Checks if array elements lie between the elements of two other arrays. dst is set to 255 (all 1 bits) if src is within the specified 1D, 2D, 3D, ... box and 0 otherwise. You can define the pixel in HSV range for tracking:

  mask = cv2.inRange(hsv_img, lower_hsv, upper_hsv)
If the pixel value falls in the given color range, the corresponding pixel in the output image is 255; otherwise it is 0, thus creating a binary mask. How do you find the HSV values to track? You can use the function cv2.cvtColor(); you just pass the BGR values you want. For example, to find the HSV value of green, try the following commands in the Python terminal:

>>> green = np.uint8([[[0,255,0 ]]])
>>> hsv_green = cv2.cvtColor(green,cv2.COLOR_BGR2HSV)
>>> print hsv_green
[[[ 60 255 255]]]
Now you take [H-10, 100, 100] and [H+10, 255, 255] as the lower bound and upper bound respectively. We can then use cv2.bitwise_and() with this binary mask to extract the color range we're interested in.

import numpy as np
import cv2

img=cv2.imread('/home/pi/png.png')
hsv=cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# blue=np.uint8([[[0,0,255]]])
# hsv_blue=cv2.cvtColor(blue, cv2.COLOR_RGB2HSV)
# print hsv_blue
# [[[120 255 255]]]

lower_blue=np.array([80, 50, 50])
upper_blue=np.array([140, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)

cv2.imshow('mask', mask)
cv2.waitKey(0)

tracked=cv2.bitwise_and(img,img,mask=mask)
cv2.imshow('tracked', tracked)
cv2.waitKey(0)

Image transformations

Scaling

Scaling is just resizing of the image. OpenCV comes with a function cv2.resize() for this purpose.


dst = cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])

  • src
  • input image.
  • dst
  • output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
  • dsize
  • output image size; if it equals zero, it is computed as: dsize = Size(round(fx*src.cols), round(fy*src.rows)). Either dsize or both fx and fy must be non-zero.
  • fx
  • scale factor along the horizontal axis; when it equals 0, it is computed as (double)dsize.width/src.cols
  • fy
  • scale factor along the vertical axis; when it equals 0, it is computed as (double)dsize.height/src.rows
  • interpolation
  • interpolation method:
    • INTER_NEAREST - a nearest-neighbor interpolation
    • INTER_LINEAR - a bilinear interpolation (used by default)
    • INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
    • INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
    • INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
The size of the image can be specified manually, or you can specify the scaling factor. Different interpolation methods are used. Preferable interpolation methods are cv2.INTER_AREA for shrinking and cv2.INTER_CUBIC (slow) & cv2.INTER_LINEAR for zooming. By default, the interpolation method used is cv2.INTER_LINEAR for all resizing purposes. You can resize an input image with either of the following methods:

import cv2
import numpy as np

img = cv2.imread('messi5.jpg')
res = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
#OR
height, width = img.shape[:2]
res = cv2.resize(img,(2*width, 2*height), interpolation = cv2.INTER_CUBIC)
The following example shows the usage for upscaling and downscaling:

   import cv2

   img = cv2.imread('/home/pi/book/test_set/house.tiff',1)
   upscale = cv2.resize(img,None,fx=1.5,fy=1.5,interpolation=cv2.INTER_CUBIC)
   downscale = cv2.resize(img,None,fx=0.5,fy=0.5, interpolation=cv2.INTER_AREA)
   cv2.imshow('upscale',upscale)
   cv2.waitKey(0)
   cv2.imshow('downscale',downscale)
   cv2.waitKey(0)
   cv2.destroyAllWindows()

Translation, rotation, and affine transformation

The cv2.warpAffine() function can be used to perform translation, rotation, and affine transformation. OpenCV provides two transformation functions, cv2.warpAffine and cv2.warpPerspective, with which you can have all kinds of transformations. cv2.warpAffine takes a 2x3 transformation matrix while cv2.warpPerspective takes a 3x3 transformation matrix as input.
  • Translation
  • Let the shift in (x,y) be (Tx,Ty), you can create the transformation matrix M as follows:
    
    1 0 Tx
    0 1 Ty
    
    You can make it into a Numpy array of type np.float32 and pass it into cv2.warpAffine() function. See below example for a shift of (100,50):
    
     rows,cols = img.shape  # for a single-channel (grayscale) image
    M = np.float32([[1,0,100],[0,1,50]])
    dst = cv2.warpAffine(img, M ,(cols,rows))
    
    The third argument of the cv2.warpAffine() function is the size of the output image, which should be in the form (width, height).
  • Rotations
  • We need to define a rotation matrix with the use of cv2.getRotationMatrix2D(), which accepts the center of the rotation, the angle of anti-clockwise rotation (in degrees), and the scale as parameters. The following example rotates the image by 45 degrees with the center of the image as the center of rotation, and scales it down to 50% of the original image:
    
     rows,cols,channel = img.shape
     R = cv2.getRotationMatrix2D((cols/2,rows/2),45,0.5)
     output = cv2.warpAffine(img,R,(cols,rows))
    
  • Affine transformations
  • The affine transformation needs any three non-collinear points (points which are not on the same line) in the original image and the corresponding points in the transformed image. These points are passed as arguments to cv2.getAffineTransform() to get the transformation matrix, and that matrix, in turn, is passed to cv2.warpAffine() as an argument.
    
     points1 = np.float32([[100,100],[300,100],[100,300]])
     points2 = np.float32([[200,150],[400,150],[100,300]])
     A = cv2.getAffineTransform(points1,points2)
     output = cv2.warpAffine(img,A,(cols,rows))
    
  • Perspective transformation
  • When human eyes see nearby things, they look bigger compared to those that are far away. This is called perspective in a general way.
    
     import cv2
     import numpy as np
     from matplotlib import pyplot as plt

     image = cv2.imread('/home/pi/book/test_set/ruler.512.tiff',1)
     # changing the colorspace from BGR->RGB
     input = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
     rows,cols,channels = input.shape
     points1 = np.float32([[0,0],[400,0],[0,400],[400,400]])
     points2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
     P = cv2.getPerspectiveTransform(points1,points2)
     output = cv2.warpPerspective(input,P,(300,300))
     plt.subplot(121),plt.imshow(input),plt.title('Input')
     plt.subplot(122),plt.imshow(output),plt.title('Perspective Transform')
     plt.show()
    

Thresholding image

In OpenCV, the cv2.threshold() function is used to threshold images. It takes a grayscale image, a threshold value, maxVal, and the threshold method as parameters, and returns the thresholded image as output. The maxVal parameter is the value assigned to the pixel if the pixel intensity is greater (or less, in some methods) than the threshold. There are five threshold methods available in OpenCV (a short example follows the list):
  • cv2.THRESH_BINARY
  • If intensity(x,y) > thresh, then set intensity(x,y) = maxVal; else set intensity(x,y) = 0.
  • cv2.THRESH_BINARY_INV
  • If intensity(x,y) > thresh, then set intensity(x,y) = 0; else set intensity(x,y) = maxVal.
  • cv2.THRESH_TRUNC
  • If intensity(x,y) > thresh, then set intensity(x,y)=threshold; else leave intensity(x,y) as it is.
  • cv2.THRESH_TOZERO
  • If intensity(x,y) > thresh; then leave intensity(x,y) as it is; else set intensity(x,y) = 0.
  • cv2.THRESH_TOZERO_INV
  • If intensity(x,y) > thresh, then set intensity(x,y) = 0; else leave intensity(x,y) as it is.
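A minimal sketch (reusing the lena file from earlier in this chapter) applying a binary threshold at 127 with maxVal 255:

   import cv2

   img = cv2.imread('/home/pi/book/test_set/lena_color_512.tif',0)
   ret,output = cv2.threshold(img,127,255,cv2.THRESH_BINARY)
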
Otsu's method
If the image has background and foreground pixels, Otsu's method is the best way to separate these two sets of pixels automatically without specifying the threshold value. This method is applied in addition to other methods and the threshold is passed as 0. Try implementing the following code:

   ret,output=cv2.threshold(image,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

Noise and filter

Kernels

In image processing, a kernel, convolution matrix, or mask is a small matrix used in some image processing operations. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image. Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel. Depending on the element values, a kernel can cause a wide range of effects. One of the main uses of kernels is to apply a low-pass filter to an image. Low-pass filters average out the rapid changes in the intensity of image pixels. This basically smoothens or blurs the image. A simple averaging kernel can be mathematically represented as follows:

 K = ones(rows, cols) / (rows * cols)
We can use the following NumPy code to create a 3x3 averaging kernel:

   K=np.ones((3,3),np.float32)/9
numpy.ones(shape, dtype=None, order='C') Return a new array of given shape and type, filled with ones.

2D convolution filtering

Each output pixel is altered by contributions from a number of adjoining input pixels. These types of operations are commonly referred to as convolution or spatial convolution. Convolution kernels typically feature an odd number of rows and columns in the form of a square, with a 3 x 3 pixel mask (convolution kernel) being the most common form, but 5 x 5 and 7 x 7 kernels are also frequently employed. OpenCV provides a function cv2.filter2D() to convolve a kernel with an image.

cv2.filter2D(src, ddepth, kernel[, dst[, anchor[, delta[, borderType]]]]) → dst
where
  • src
  • input image.
  • ddepth
  • desired depth of the destination image; if it is negative, it will be the same as src.depth(); the following combinations of src.depth() and ddepth are supported:
    • src.depth() = CV_8U, ddepth = -1/CV_16S/CV_32F/CV_64F
    • src.depth() = CV_16U/CV_16S, ddepth = -1/CV_32F/CV_64F
    • src.depth() = CV_32F, ddepth = -1/CV_32F/CV_64F
    • src.depth() = CV_64F, ddepth = -1/CV_64F
  • kernel
  • convolution kernel (or rather a correlation kernel), a single-channel floating point matrix; if you want to apply different kernels to different channels, split the image into separate color planes using split() and process them individually.
  • dst
  • output image of the same size and the same number of channels as src.
  • anchor
  • anchor of the kernel that indicates the relative position of a filtered point within the kernel; the anchor should lie within the kernel; default value (-1,-1) means that the anchor is at the kernel center.
  • delta
  • optional value added to the filtered pixels before storing them in dst.
  • borderType
  • pixel extrapolation method (see borderInterpolate for details).
As an example, we will try an averaging filter on an image. A 5x5 averaging filter kernel is used:

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
     
    img = cv2.imread('opencv_logo.png')
     
    kernel = np.ones((5,5),np.float32)/25
    dst = cv2.filter2D(img,-1,kernel)
     
    plt.subplot(121),plt.imshow(img),plt.title('Original')
    plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(dst),plt.title('Averaging')
    plt.xticks([]), plt.yticks([])
    plt.show()

Low-pass filtering

boxFilter
Blurs an image using the box filter.

dst = cv2.boxFilter(src, ddepth, ksize[, dst[, anchor[, normalize[, borderType]]]])
The cv2.boxFilter() function takes the image, ddepth, and size of the kernel as inputs and blurs the image. We can specify normalize as either true or false. The function smooths the image with a kernel of ones, scaled by 1/(ksize.width*ksize.height) when normalize is true and unscaled otherwise.
blur
Blurs an image using the normalized box filter.

dst = cv2.blur(src, ksize[, dst[, anchor[, borderType]]])
The call blur(src, dst, ksize, anchor, borderType) is equivalent to boxFilter(src, dst, src.type(), ksize, anchor, true, borderType).
GaussianBlur
Blurs an image using a Gaussian filter.

cv2.GaussianBlur(src, ksize, sigmaX[, dst[, sigmaY[, borderType]]]) 
The function convolves the source image with the specified Gaussian kernel. This filter is highly effective against Gaussian noise.
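For example, a sketch applying the three filters above with a 5x5 kernel (the file path is the one used elsewhere in this chapter):

   import cv2

   img = cv2.imread('/home/pi/book/test_set/lena_color_512.tif',1)
   box = cv2.boxFilter(img,-1,(5,5))          # normalized by default
   averaged = cv2.blur(img,(5,5))             # same result as the normalized box filter
   gaussian = cv2.GaussianBlur(img,(5,5),0)   # sigmaX=0: sigma is computed from ksize
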
medianBlur
Blurs an image using the median filter.

dst = cv2.medianBlur(src, ksize[, dst])
Parameters:
  • src
  • input 1-, 3-, or 4-channel image; when ksize is 3 or 5, the image depth should be CV_8U, CV_16U, or CV_32F; for larger aperture sizes, it can only be CV_8U.
  • ksize
  • aperture linear size; it must be odd and greater than 1, for example: 3, 5, 7 ...
  • dst
  • destination array of the same size and type as src.
It calculates the median of all the values under the kernel, and the center pixel in the kernel is replaced with the calculated median: a window slides along the image, and the median intensity value of the pixels within the window becomes the output intensity of the pixel being processed. It is highly effective against salt-and-pepper noise. The following code introduces salt-and-pepper noise into the image and then applies cv2.medianBlur() to remove the noise:

import cv2
import numpy as np
import random
from matplotlib import pyplot as plt

img = cv2.imread('/home/pi/book/test_set/lena_color_512.tif',1)
input = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
output = np.zeros(input.shape,np.uint8)
p = 0.2 # probability of noise
for i in range(input.shape[0]):
    for j in range(input.shape[1]):
        r = random.random()
        if r < p/2:
            output[i][j] = 0,0,0
        elif r < p:
            output[i][j] = 255,255,255
        else:
            output[i][j] = input[i][j]
noise_removed = cv2.medianBlur(output,3)
plt.subplot(121),plt.imshow(output),plt.title('Noisy Image')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(noise_removed),plt.title('Median Filtering')
plt.xticks([]), plt.yticks([])
plt.show()

Chapter 6 Edge, Circle, and Line Detection

High-pass filters

High-pass filters (HPF) enhance high-frequency information such as edges while attenuating low-frequency information (hence the name high-pass). These filters are also called derivative masks and are widely used in edge detection and extraction algorithms. OpenCV provides three types of gradient filters, or high-pass filters:
  • Sobel()
  • The Sobel operator is a joint Gaussian smoothing plus differentiation operation, so it is more resistant to noise. You can specify the direction of the derivative, vertical or horizontal (by the arguments yorder and xorder respectively), and the size of the kernel by the argument ksize. It is used to detect two kinds of edges in an image:
    • Vertical direction
    • THE VERTICAL MASK OF SOBEL OPERATOR:
      
      -1 0 1
      -2 0 2
      -1 0 1
      
       It works like a first-order derivative, calculating the difference of pixel intensities in an edge region. Because the center column is all zeros, it does not include the original image values; instead, it calculates the difference between the pixel values to the right and to the left of the edge. The center values of the first and third columns are -2 and 2 respectively, which gives more weight to the pixel values immediately beside the edge; this increases the edge intensity, so edges appear enhanced compared to the original image.
    • Horizontal direction
    • FOLLOWING IS THE HORIZONTAL MASK OF SOBEL OPERATOR:
      
      -1 -2 -1
      0 0 0
      1 2 1
      
       The above mask finds edges in the horizontal direction because its row of zeros runs horizontally.
    
    cv2.Sobel(src, ddepth, dx, dy[, dst[, ksize[, scale[, delta[, borderType]]]]]) → dst
    
    
    Parameters:
    • src
    • input image.
    • dst
    • output image of the same size and the same number of channels as src .
    • ddepth
    • output image depth, see combinations; in the case of 8-bit input images it will result in truncated derivatives.
    • dx
    • order of the derivative x.
    • dy
    • order of the derivative y.
    • ksize
    • size of the extended Sobel kernel; it must be 1, 3, 5, or 7.
    • scale
    • optional scale factor for the computed derivative values; by default, no scaling is applied (see cv::getDerivKernels for details).
    • delta
    • optional delta value that is added to the results prior to storing them in dst.
    • borderType
    • pixel extrapolation method, see cv::BorderTypes
    In all cases except one, a ksize × ksize separable kernel is used to calculate the derivative; when ksize = 1, a 3×1 or 1×3 kernel (without Gaussian smoothing) is used.
  • Laplacian()
  • The Laplacian operator is also a derivative operator used to find edges in an image, but it is a second-order derivative mask. This mask has two further classifications: the positive Laplacian operator and the negative Laplacian operator.
    • Positive Laplacian Operator
    • The positive Laplacian operator is used to extract outward edges in an image.
      
      0 1 0
      1 -4 1
      0 1 0
      
    • Negative Laplacian Operator
    • The negative Laplacian operator is used to extract inward edges in an image.
      
      0 -1 0
      -1 4 -1
      0 -1 0
      
    The Laplacian is a derivative operator; it highlights gray-level discontinuities in an image and de-emphasizes regions with slowly varying gray levels. Remember that we cannot apply both the positive and negative Laplacian operators to the same image; we apply just one. If we apply the positive Laplacian operator, we subtract the resulting image from the original image to get the sharpened image; similarly, if we apply the negative Laplacian operator, we add the resulting image to the original image to get the sharpened image (a sharpening sketch follows the example code below).
  • Scharr()
  • The function computes the first x- or y- spatial image derivative using the Scharr operator. The call
    
    Scharr(src, dst, ddepth, dx, dy, scale, delta, borderType)
    
    is equivalent to
    
     Sobel(src, dst, ddepth, dx, dy, CV_SCHARR, scale, delta, borderType).
    


import cv2
import matplotlib.pyplot as plt


img=cv2.imread('grid.jpg',1)

laplacian = cv2.Laplacian(img,ddepth=cv2.CV_32F, ksize=17,scale=1,delta=0,borderType=cv2.BORDER_DEFAULT)
sobel = cv2.Sobel(img,ddepth=cv2.CV_32F,dx=1,dy=0, ksize=11,scale=1,delta=0,borderType=cv2.BORDER_DEFAULT)
scharr = cv2.Scharr(img,ddepth=cv2.CV_32F,dx=1,dy=0,scale=1,delta=0,borderType=cv2.BORDER_DEFAULT)
sobelx = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=7)
sobely = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=7)
    
images=[img,laplacian,sobel,scharr,sobelx,sobely]
titles=['Original','Laplacian','Sobel','Scharr', 'Sobel-x','Sobel-y']

for i in range(6):
    plt.subplot(3,2,i+1)
    plt.imshow(images[i],cmap = 'gray')
    plt.title(titles[i]), plt.xticks([]), plt.yticks([])

plt.show()
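
The sharpening rule described above for the Laplacian can be sketched as follows (a minimal sketch, assuming an 8-bit grayscale input; 'grid.jpg' is the same placeholder image):

import cv2
import numpy as np

img = cv2.imread('grid.jpg', 0)                # read as grayscale
# ksize=1 makes cv2.Laplacian use the 3x3 positive Laplacian kernel shown above
lap = cv2.Laplacian(img, cv2.CV_32F, ksize=1)
# positive Laplacian: subtract the response from the original, then clip back to 8 bits
sharpened = np.clip(img.astype(np.float32) - lap, 0, 255).astype(np.uint8)
cv2.imwrite('grid_sharpened.jpg', sharpened)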

Canny Edge detector

The Canny Edge detector is a multistage edge detection method developed by John Canny:
  • Noise Reduction
  • Since edge detection is susceptible to noise in the image, the first step is to remove noise with a 5x5 Gaussian kernel/filter.
  • Finding Intensity Gradient of the Image
  • The smoothed image is then filtered with a Sobel kernel in both the horizontal and vertical directions to get the first derivatives in the horizontal direction (Gx) and the vertical direction (Gy). From these two images, we can find the edge gradient and direction for each pixel. The gradient direction is always perpendicular to edges; it is rounded to one of four angles representing the vertical, horizontal, and two diagonal directions.
  • Non-maximum Suppression
  • After getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted pixels which may not constitute an edge. For this, every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient.
  • Hysteresis Thresholding
  • This stage decides which candidate edges are really edges and which are not. For this, we need two threshold values, minVal and maxVal. Any edge with an intensity gradient above maxVal is sure to be an edge, and any below minVal is sure to be a non-edge and is discarded. Those lying between the two thresholds are classified as edges or non-edges based on their connectivity: if they are connected to "sure-edge" pixels, they are considered part of an edge; otherwise, they are discarded. In the usual illustration, A (above maxVal) is a sure edge, C (between the thresholds but connected to A) is a valid edge, and B (between the thresholds but unconnected) is discarded.
OpenCV puts all of the above into a single function, cv2.Canny(). The following parameters are usually passed to cv2.Canny():
  • image
  • 8-bit input image.
  • threshold1
  • first threshold for the hysteresis procedure.
  • threshold2
  • second threshold for the hysteresis procedure.
  • apertureSize
  • aperture size for the Sobel operator.
  • L2gradient
  • a flag, indicating whether a more accurate L2 norm should be used to calculate the image gradient magnitude ( L2gradient=true ), or whether the default L1 norm is enough ( L2gradient=false ).
The function returns the detected edges as a single-channel, 8-bit image of the same size as the input image.

import cv2
import matplotlib.pyplot as plt
img = cv2.imread('/home/jerry/grid.jpg',0)
edges1 = cv2.Canny(img,100,200,L2gradient=False)
edges2 = cv2.Canny(img,100,200,L2gradient=True)
images = [img,edges1,edges2]
titles = ['Original','L1 Gradient','L2 Gradient']
for i in range(3):
    plt.subplot(1,3,i+1)
    plt.imshow(images[i],cmap = 'gray')
    plt.title(titles[i]),
    plt.xticks([]), plt.yticks([])

plt.show()

Hough circle and line transforms

OpenCV has cv2.HoughCircles() to detect circles in an image; it returns the detected circles as a vector of (x, y, radius) values.

import cv2
import numpy as np

img = cv2.imread('/home/jerry/opencv.png',0) # load the image in grayscale
img = cv2.medianBlur(img,5) # reduce noise to avoid false circle detection
cimg = cv2.cvtColor(img,cv2.COLOR_GRAY2BGR) # convert back to BGR so we can draw colored circles

circles = cv2.HoughCircles(img,cv2.HOUGH_GRADIENT,1,20,
                            param1=50,param2=30,minRadius=0,maxRadius=0)

circles = np.uint16(np.around(circles))
for i in circles[0,:]:
    # draw the outer circle
    cv2.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
    # draw the center of the circle
    cv2.circle(cimg,(i[0],i[1]),2,(0,0,255),3)

cv2.imshow('detected circles',cimg)
cv2.waitKey(0)
cv2.destroyAllWindows()
The Hough circle transform is applied with the following arguments:
  • src_gray
  • An 8-bit single-channel grayscale input image
  • CV_HOUGH_GRADIENT
  • Define the detection method. Currently this is the only one available in OpenCV
  • dp
  • The inverse ratio of resolution, dp = (image resolution)/(accumulator resolution)
  • min_dist
  • Minimum distance between the centers of detected circles
  • param_1
  • Upper threshold for the internal Canny edge detector
  • param_2
  • Threshold for center detection.
  • min_radius
  • Minimum radius to be detected. If unknown, put zero as default.
  • max_radius
  • Maximum radius to be detected. If unknown, put zero as default
This function returns a vector of circles that stores a set of 3 values, (x, y, r), for each detected circle. OpenCV also has a cv2.HoughLines() function to find lines. To use the Hough line transform, the processed image should be binary; since we usually want to search for straight lines on an original color image, the most common solution is to first grayscale the image and then detect edges. The resulting edge mask can then be fed to the Hough line method, which outputs the set of straight lines found in the image. A line can be represented as y = mx + c (Cartesian coordinate system) or in parametric form (polar coordinate system), as

     (r, theta)
where r is the perpendicular distance from the origin to the line, and theta is the angle formed by this perpendicular and the horizontal axis, measured counter-clockwise (in OpenCV).

  x * cos(theta) + y * sin(theta) = r
Therefore, with a known (r, theta), you can draw a line for a given range of x (y can be calculated from the formula above). [reference] We can draw such a point in (r, theta) coordinates, in what is called Hough space. Now, in the image space, draw several lines that intersect at one common point and observe the points they produce in Hough space: it turns out that these points form a sinusoid. Finally, the most interesting effect: if we draw points which form a line in the image space, we obtain a bunch of sinusoids in Hough space, and they intersect at exactly one point. It means that, to identify candidates for straight lines, we should look for intersections in Hough space. Now let's see how the Hough transform works for lines. Any line can be represented by the two values (r, theta), so the algorithm first creates a 2D array, or accumulator, to hold the votes for the two parameters, initialized to 0. Let rows denote r and columns denote theta. The size of the array depends on the accuracy you need: if you want an angle accuracy of 1 degree, you need 180 columns; for r, the maximum distance possible is the diagonal length of the image, so with one-pixel accuracy the number of rows can be the diagonal length of the image.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2

def draw_lines(img, houghLines, color=[0, 255, 0], thickness=2):
    for line in houghLines:
        for rho,theta in line:
            a = np.cos(theta)
            b = np.sin(theta)
            x0 = a*rho
            y0 = b*rho
            x1 = int(x0 + 1000*(-b))
            y1 = int(y0 + 1000*(a))
            x2 = int(x0 - 1000*(-b))
            y2 = int(y0 - 1000*(a))
 
            cv2.line(img,(x1,y1),(x2,y2),color,thickness)   
                
 
def weighted_img(img, initial_img, α=0.8, β=1., λ=0.):
    return cv2.addWeighted(initial_img, α, img, β, λ)    

 
image = mpimg.imread("licensePlate.jpg")
gray_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
blurred_image = cv2.GaussianBlur(gray_image, (9, 9), 0)
edges_image = cv2.Canny(blurred_image, 50, 120)
   
rho_resolution = 1
theta_resolution = np.pi/180
threshold = 155
 
hough_lines = cv2.HoughLines(edges_image, rho_resolution , theta_resolution , threshold)
 
hough_lines_image = np.zeros_like(image)
draw_lines(hough_lines_image, hough_lines)
original_image_with_hough_lines = weighted_img(hough_lines_image,image)
 
plt.figure(figsize = (30,20))
plt.subplot(131)
plt.imshow(image)
plt.subplot(132)
plt.imshow(edges_image, cmap='gray')
plt.subplot(133)
plt.imshow(original_image_with_hough_lines, cmap='gray') 
plt.show()

It's worth noting that OpenCV has another version of the function to find Hough lines, named HoughLinesP, where the P suffix stands for probabilistic. It doesn't take all the points into consideration; instead it takes only a random subset of points, which is sufficient for line detection, so we just have to decrease the threshold. The Hough transform functions have to be tuned for the given sample set: if you cannot see any circles or lines in your video, or if there are a lot of false positives (that is, the program detects circles and lines even when they are not present in the input frame), you might want to play with the parameters to tune them for your sample input. The OpenCV implementation is based on Robust Detection of Lines Using the Progressive Probabilistic Hough Transform by Matas, J., Galambos, C., and Kittler, J.V. The function used is cv2.HoughLinesP(). It has two new arguments:
  • minLineLength
  • Minimum length of line. Line segments shorter than this are rejected.
  • maxLineGap
  • Maximum allowed gap between line segments to treat them as single line.
Best of all, it directly returns the two endpoints of each detected line.

import cv2
import numpy as np

img = cv2.imread('dave.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,50,150,apertureSize = 3)
minLineLength = 100
maxLineGap = 10
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=minLineLength,maxLineGap=maxLineGap)
for line in lines:
    x1,y1,x2,y2 = line[0]
    cv2.line(img,(x1,y1),(x2,y2),(0,255,0),2)

cv2.imwrite('houghlines5.jpg',img)

Chapter 7 Image Restoration, Quantization, and Depth Map

Restoring images using inpainting

Image restoration is the process of reconstructing the damaged parts of an image. OpenCV offers two inpainting algorithms through its cv2.inpaint() function. It accepts as parameters a source image; an inpaint mask, a grayscale image in which the nonzero (white) pixels denote the area to be inpainted; an inpainting neighborhood radius; and the algorithm to apply (cv2.INPAINT_TELEA or cv2.INPAINT_NS). The function then returns the inpainted image. The mask we create must be the same size as the input image, with non-zero pixels corresponding to the area to be inpainted.

import numpy as np
import cv2
img = cv2.imread('messi_2.jpg')
mask = cv2.imread('mask2.png',0)
dst = cv2.inpaint(img,mask,3,cv2.INPAINT_TELEA)
cv2.imshow('dst',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

Image segmentation

Image segmentation is the process of dividing images into multiple, relevant sections or parts based on some criteria. Thresholding the image can be considered the simplest form of segmentation.
Mean shift algorithm based segmentation
PyMeanShift is a Python module/extension for segmenting images using the mean shift algorithm. The mean shift algorithm and its C++ implementation are by Chris M. Christoudias and Bogdan Georgescu. The PyMeanShift extension provides a Python interface to the mean shift C++ implementation using NumPy arrays. Installation instructions:
  • Download the latest version from https://github.com/fjean/pymeanshift/archive/master.zip
  • Decompress the file then run the following commands to build and install it
  • sudo ./setup.py build && sudo ./setup.py install
  • verify the installation
  • import pymeanshift as pms
If you see the build error "fatal error: Python.h: No such file or directory", you need to install the python3-devel package.

   sudo dnf install python3-devel
An example:

import cv2
import pymeanshift as pms
from matplotlib import pyplot as plt

original_image = cv2.imread("licensePlate.jpg")
#changing the colorspace from BGR->RGB
input_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB )

(segmented_image, labels_image, number_regions) = pms.segment(input_image, spatial_radius=6, range_radius=4.5, min_density=50)

plt.subplot(131),plt.imshow(input_image),plt.title('input_image')
plt.xticks([]),plt.yticks([])
plt.subplot(132),plt.imshow(segmented_image),plt.title('Segmented Output')
plt.xticks([]),plt.yticks([])
plt.subplot(133),plt.imshow(labels_image),plt.title('Labeled Output')
plt.xticks([]),plt.yticks([])
plt.show()

K-means clustering and image quantization
The k-means clustering algorithm is a quantization algorithm that maps sets of values within a range into clusters, each determined by a value (the mean). It basically divides a given set of n values into k partitions; this is called clustering when it is applied to data with two or more dimensions. OpenCV provides cv2.kmeans() as an implementation of the k-means algorithm. It accepts the following input parameters:
  • samples
  • This is the data that has to be clustered. If we provide an image, the output will be a quantized (segmented) image. It should be of np.float32 data type, and each feature should be put in a single column.
  • nclusters(K)
  • This is the number of clusters(partitions) in the output set (it is the number of colors in the output if the input is an image).
  • criteria
  • This is the iteration termination criterion: when it is satisfied, the algorithm stops iterating. It is a tuple of 3 parameters, ( type, max_iter, epsilon ):
    • type of termination criteria
    • It has 3 flags: cv2.TERM_CRITERIA_EPS - stop the iteration when the specified accuracy, epsilon, is reached; cv2.TERM_CRITERIA_MAX_ITER - stop after the specified number of iterations, max_iter; cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER - stop when either of the above conditions is met.
    • max_iter
    • An integer specifying maximum number of iterations.
    • epsilon
    • Required accuracy
  • attempts
  • Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness. This compactness is returned as output.
  • flags
  • This flag is used to specify how initial centers are taken. Normally two flags are used for this : cv2.KMEANS_PP_CENTERS and cv2.KMEANS_RANDOM_CENTERS.
Output parameters
  • compactness
  • It is the sum of squared distance from each point to their corresponding centers.
  • labels
  • This is the label array where each element is marked '0', '1', and so on.
  • centers
  • This is array of centers of clusters.
Consider data with only one feature:

import numpy as np
import cv2
from matplotlib import pyplot as plt

x = np.random.randint(25,100,25)
y = np.random.randint(175,255,25)
z = np.hstack((x,y)) # Stack arrays in sequence horizontally (column wise).
z = z.reshape((50,1))
z = np.float32(z)
plt.hist(z,256,[0,256])
plt.show()
So we have ‘z’ which is an array of size 50,
  • values ranging from 0 to 255
  • 25 values are in the range [25 100] and other values are in the range [175 255]
  • 'z' is reshaped to a column vector for 1 feature
Now we apply the KMeans function.

# stop when 10 iterations (max_iter) of the algorithm have run, or an accuracy of 1.0 (epsilon) is reached, and return the answer
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans
compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)
In this case, we got 2 centers.
>>> centers
array([[  63.47999954],
       [ 213.24000549]], dtype=float32)
Labels will have the same size as the test data, with each data point labelled '0', '1', etc. according to its centroid. Now we split the data into different clusters depending on their labels.

A = z[labels==0]
B = z[labels==1]
Now we plot A in red, B in blue, and their centroids in green.

# Now plot 'A' in red, 'B' in blue, 'centers' in green
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'g')
plt.show()
Now consider data with multiple features: each feature is arranged in a column, while each row corresponds to an input test sample. Here is an example with 2 features.

import numpy as np
import cv2
from matplotlib import pyplot as plt

# create two 25 x 2 matrices
X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center=cv2.kmeans(Z,2,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

# Now separate the data. Note the ravel() return a contiguous flattened (1-D) array.
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]

# Plot the data. Make a scatter plot of (x,y).
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()
In this case, we got 2 centers.
>>> center
array([[ 72.48000336,  72.31999969],
       [ 36.15999985,  34.91999817]], dtype=float32)
Color quantization is the process of reducing the number of colors in an image. Here there are 3 features, namely R, G, and B, so we need to reshape the image to an array of size Mx3 (M is the number of pixels in the image).

import cv2
import numpy as np
import matplotlib.pyplot as plt

image=cv2.imread('licensePlate.jpg')
input = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
Z=input.reshape((-1,3))
Z=np.float32(Z)
criteria=(cv2.TERM_CRITERIA_EPS+ cv2.TERM_CRITERIA_MAX_ITER,10,1.0)

K=2
ret,label1,center1=cv2.kmeans(Z,K, None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
center1=np.uint8(center1)
res1=center1[label1.flatten()]
output1=res1.reshape((image.shape))

K=4
ret,label2,center2=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
center2=np.uint8(center2)
res2=center2[label2.flatten()]
output2=res2.reshape((image.shape))

K=8
ret,label3,center3=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and make original image
center3=np.uint8(center3)
res3=center3[label3.flatten()]
output3=res3.reshape((image.shape))
titles=['Original','K=2','K=4','K=8']
output=[input,output1,output2,output3]
for i in range(4):
    plt.subplot(2,2,i+1),plt.imshow(output[i]),plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
    
plt.show()
  • K=2
  • center1: array([[ 167.2142334 , 186.80093384, 219.57791138], [ 29.40773582, 31.39480972, 47.05055237]], dtype=float32)
  • K=4
  • center2: array([[ 62, 69, 97], [131, 149, 184], [ 21, 21, 34], [195, 216, 247]], dtype=uint8)
  • K=8
  • center3: array([[228, 239, 250], [116, 130, 158], [ 86, 94, 121], [ 55, 61, 91], [ 31, 34, 55], [167, 196, 244], [140, 160, 197], [ 17, 17, 26]], dtype=uint8)

Disparity map and depth estimation

Disparity refers to the difference in the location of an object in the two corresponding (left and right) images as seen by the left and right eye, which is created by parallax. Our brain uses this disparity to estimate depth information from the pair of two-dimensional images; in biology, this is called stereoscopic vision. OpenCV provides the cv2.StereoBM.compute() function, which takes the left image and the right image as parameters and returns the disparity map of the image pair.

import numpy as np
import cv2
from matplotlib import pyplot as plt

# Load the left and right images in gray scale
imgL = cv2.imread('tsukuba_l.png',0)
imgR = cv2.imread('tsukuba_r.png',0)
# Initialize the stereo block matching object 
stereo = cv2.StereoBM_create(numDisparities=32, blockSize=13)
# Compute the disparity image
disparity = stereo.compute(imgL,imgR)

titles=['Left','Right','Depth Map']
output=[imgL,imgR,disparity]
for i in range(3):
 plt.subplot(1,3,i+1),plt.imshow(output[i],cmap='gray')
 plt.title(titles[i])
 plt.xticks([]),plt.yticks([])
plt.show()

Chapter 8 Histograms, Contours, Morphological Transformations, and Performance Measurement

Image histograms

A histogram is a way to graphically represent the distribution of data; the histogram of an image is a graphical representation of the distribution of color or luminance values in the image. Both OpenCV and NumPy come with built-in histogram functions, and Matplotlib comes with a histogram plotting function: matplotlib.pyplot.hist()
matplotlib.pyplot.hist(
    x, bins=None, range=None, normed=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, hold=None, data=None, **kwargs
  )
Parameters:
  • x
  • Input values; this takes either a single array or a sequence of arrays which are not required to be of the same length
  • bins
  • integer or array_like or ‘auto’. If an integer is given, bins + 1 bin edges are returned. Unequally spaced bins are supported if bins is a sequence.
  • range
  • The lower and upper range of the bins. It is the range of intensity values you want to measure. If not provided, range is (x.min(), x.max()). Values outside the range are ignored.

import cv2
import matplotlib.pyplot as plt
img = cv2.imread('/home/pi/book/test_set/4.1.08.tiff',0)
plt.hist(img.ravel(),256,[0,256])
plt.show()
.ravel() is a NumPy array method that flattens the source matrix; there are other similar APIs that can be used for this purpose, such as .flatten() and .reshape(). The NumPy library also has an np.histogram() function that can be used to compute the histogram of a set of data.

 hist, bin_edges = numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)
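
For example:

>>> import numpy as np
>>> hist, bin_edges = np.histogram(np.array([1, 2, 1, 3]), bins=4, range=(0, 4))
>>> hist
array([0, 2, 1, 1])
>>> bin_edges
array([0., 1., 2., 3., 4.])
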
OpenCV also has a function to compute histograms for color images.

  cv2.calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]])
  • images : the source image, of type uint8 or float32. It should be given in square brackets, i.e., "[img]".
  • channels : also given in square brackets; the index of the channel for which we calculate the histogram. For a grayscale image, its value is [0]; for a color image, you can pass [0], [1], or [2] to calculate the histogram of the blue, green, or red channel respectively.
  • mask : mask image. To find the histogram of the full image, it is given as "None"; to find the histogram of a particular region of the image, create a mask image for that region and pass it as the mask.
  • histSize : the BIN count, which also needs to be given in square brackets. For full scale, we pass [256].
  • ranges : the RANGE. Normally, it is [0,256].
The following example shows its usage by plotting a histogram for each channel (red, green, and blue):

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('building.jpg',1)
input=cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
histr_RED = cv2.calcHist([input],[0],None,[256],[0,256])
histr_GREEN = cv2.calcHist([input],[1],None,[256],[0,256])
histr_BLUE = cv2.calcHist([input],[2],None,[256],[0,256])
plt.subplot(221),plt.imshow(input),plt.title('Original Image'),plt.xticks([]),plt.yticks([])
plt.subplot(222),plt.plot(histr_RED,color='r'), plt.title('Red'), plt.xlim([0,256]), plt.yticks([])
plt.subplot(223),plt.plot(histr_GREEN,color='g'), plt.title('Green'), plt.xlim([0,256]), plt.yticks([])
plt.subplot(224),plt.plot(histr_BLUE,color='b'), plt.title('Blue'), plt.xlim([0,256]), plt.yticks([])
plt.show()

Image contours

A contour is a curve joining all the continuous points along a boundary that have the same color value. Contours are often obtained from edges, but they are aimed at being object contours. For better accuracy, use binary images: before finding contours, apply thresholding or Canny edge detection. In OpenCV, finding contours is like finding a white object on a black background, so the object to be found should be white and the background should be black.

import numpy as np
import cv2
import matplotlib.pyplot as plt

im = cv2.imread('licensePlate.jpg')
imgray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(imgray, 127, 255, 0)
im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

cv2.drawContours(im, contours, -1, (0,255,0), 3)
plt.imshow(im)
plt.title('Contours')
plt.xticks([])
plt.yticks([])
plt.show()

If you need to draw a specific contour, you can use cv2.drawContours and specify the contour index.

  cv2.drawContours(im, contours, 100, (0,255,0), 3)

Morphological transformations on image

Morphological transformations are simple operations based on the image shape, normally performed on binary images. They need two inputs: our original image, and a structuring element or kernel which decides the nature of the operation. The two basic morphological operators are erosion and dilation; variant forms such as opening, closing, and gradient also come into play.
  • Erosion(侵蝕)
  • The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object (always try to keep the foreground white). The kernel slides through the image (as in 2D convolution), and a pixel in the original image (either 1 or 0) is kept at 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero). As a result, all the pixels near the boundary are discarded, depending on the size of the kernel, so the thickness of the foreground object decreases, or simply the white region in the image shrinks. It is useful for removing small white noise, detaching two connected objects, and so on.
  • Dilation(擴張)
  • It is just the opposite of erosion: here, a pixel element is '1' if at least one pixel under the kernel is '1', so the white region in the image, or the size of the foreground object, increases. Normally, in cases like noise removal, erosion is followed by dilation: erosion removes white noise but also shrinks our object, so we dilate it afterwards. Since the noise is gone, it won't come back, and our object area is restored. Dilation is also useful for joining broken parts of an object.
  • morphological Opening
  • Opening is just another name for erosion followed by dilation. It is useful in removing noise, as explained above.
    
    opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
    
  • morphological Closing
  • Closing is the reverse of opening: dilation followed by erosion. It is useful in closing small holes inside foreground objects, or small black points on the object.
    
      closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
    
  • morphological Gradient
  • It is the difference between dilation and erosion of an image. The result will look like the outline of the object.

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('blackWhite.png',0)
kernel = np.ones((5,5),np.uint8)
erosion = cv2.erode(img,kernel,iterations = 2)
dilation = cv2.dilate(img,kernel,iterations = 2)
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
titles=['Original','Erosion','Dilation','Gradient']
output=[img,erosion,dilation,gradient]

for i in range(4):
    plt.subplot(2,2,i+1),plt.imshow(output[i],cmap='gray')
    plt.title(titles[i]),plt.xticks([]),plt.yticks([])

plt.show()
We manually created a rectangular structuring element in the previous example with the help of NumPy. In some cases, you may need an elliptical or circular kernel, so OpenCV provides a function, cv2.getStructuringElement(): just pass the shape and size of the kernel and you get the desired kernel.

# Rectangular Kernel
>>> cv2.getStructuringElement(cv2.MORPH_RECT,(5,5))
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=uint8)

# Elliptical Kernel
>>> cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
array([[0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0]], dtype=uint8)

# Cross-shaped Kernel
>>> cv2.getStructuringElement(cv2.MORPH_CROSS,(5,5))
array([[0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]], dtype=uint8)

OpenCV performance measurement and improvement

In Python, we can use the time library to obtain the current time. This allows us to measure how long a piece of code takes to run, as shown in the following code:

import time

t1 = time.time()
# Image Processing code goes here
t2 = time.time()
print (t2-t1)
OpenCV also provides cv2.getTickCount() and cv2.getTickFrequency(), which can be used for the same purpose. The cv2.getTickCount() function returns the number of clock ticks elapsed since a reference event, and cv2.getTickFrequency() returns the number of ticks per second.

c1=cv2.getTickCount()
# Image processing code goes here
c2=cv2.getTickCount()
print ((c2-c1)/cv2.getTickFrequency())
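
For example, we can time a stack of median blurs (a small sketch; 'messi.jpg' is a placeholder image):

import cv2

img = cv2.imread('messi.jpg')
e1 = cv2.getTickCount()
for ksize in range(5, 49, 2):                 # apply median blurs with growing odd apertures
    img = cv2.medianBlur(img, ksize)
e2 = cv2.getTickCount()
print ((e2 - e1)/cv2.getTickFrequency())      # elapsed time in seconds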

Chapter 9 Real-life Computer Vision Applications

Barcode detection

A barcode always has a very high horizontal gradient and a very low vertical gradient. So, in our image, we need to search for a region that fulfills this property.

import numpy as np
import cv2

image=cv2.imread('barcode.jpg',1)
input = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('input', input)
cv2.waitKey(0)
The best way to accomplish this is to compute the first-order Sobel derivatives in the horizontal and vertical directions, and then subtract the vertical derivative from the horizontal derivative:

hor_der = cv2.Sobel(input, ddepth = -1 , dx = 1, dy = 0, ksize=5)
ver_der = cv2.Sobel(input, ddepth = -1 , dx = 0, dy = 1, ksize=5)
diff = cv2.subtract(hor_der, ver_der)
diff = cv2.convertScaleAbs(diff)
cv2.imshow('diff', diff)
cv2.waitKey(0)
We convert the output to 8-bit unsigned integer format using:

 cv2.convertScaleAbs(src[, dst[, alpha[, beta]]])
On each element of the input array, the function convertScaleAbs performs three operations sequentially: scaling, taking an absolute value, and conversion to an unsigned 8-bit type:

 dst = saturate_cast<uchar>( | src * alpha + beta | )
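
For example:

>>> import cv2
>>> import numpy as np
>>> cv2.convertScaleAbs(np.array([[-300., 10., 300.]]))
array([[255,  10, 255]], dtype=uint8)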
Then we apply a Gaussian blur, followed by a binary threshold on the blurred image (threshold value 225, maximum value 255):

blur = cv2.GaussianBlur(diff, (3, 3),0)
ret, th = cv2.threshold(blur, 225, 255, cv2.THRESH_BINARY)
We can fill in the gaps between the bars of the barcode by dilating the thresholded image:

dilated = cv2.dilate(th, None, iterations = 7)
The output will contain a big rectangle-like box corresponding to the barcode region. We can eliminate the other regions that we are not interested in with the erosion operation:

eroded = cv2.erode(dilated, None, iterations = 7)
We can then find the list of contours in this binary image; the biggest contour will be the one corresponding to the barcode region.

_, contours, _ = cv2.findContours(eroded, cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
areas = [cv2.contourArea(temp) for temp in contours]
max_index = np.argmax(areas)
largest_contour=contours[max_index]
We can get the coordinates of the bounding rectangle for the contour with cv2.boundingRect(), an OpenCV function, and draw it as follows:

x,y,width,height = cv2.boundingRect(largest_contour)
cv2.rectangle(image,(x,y),(x+width,y+height),(0,255,0),2)
cv2.imshow('Detected Barcode',image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Note that the iteration counts used in dilate() and erode() need to be tuned per image to get the best effect; try the same test image with different values (for example, iterations=15 versus iterations=4).

Motion detection and tracking

The following example detects motion with simple frame differencing: it computes the absolute difference between consecutive grayscale frames, thresholds the difference to a binary image, and draws the contours of the moving regions along with the bounding box of the largest one.

import cv2
import numpy as np

camera = cv2.VideoCapture(1)

# create a kernel for the dilation operation,
k=np.ones((3,3),np.uint8)

# initialize the first frame
f1_gray = None

while(True):
    # grab the current frame
    (grabbed, f2) = camera.read()
    # if the frame could not be grabbed, end it.
    if not grabbed:
        break
    
    # convert it to grayscale, and blur it
    f2_gray = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)
    f2_gray = cv2.GaussianBlur(f2_gray, (21, 21), 0)

    # if the first frame has not been set, initialize it
    if f1_gray is None:
        f1_gray = f2_gray
        continue

    # compute the absolute difference between the current frame and the last frame
    frameDelta = cv2.absdiff(f1_gray, f2_gray) 
    # threshold the difference to get a binary image
    ret, th = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)
    # dilate the image so that it is easier for us to find the boundary clearly
    dilated=cv2.dilate(th, k, iterations=2)
    # find the contour
    im2, contours, hierarchy= cv2.findContours(dilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # draw the contour
    o_frame = f2
    # 1 draw all contours
    cv2.drawContours(o_frame, contours, -1, (0,255,0), 2 )
    # 2 draw the bounding box for the biggest objects
    max_area = 0
    x = 0
    y = 0
    w = 0
    h = 0
    for c in contours:
        # keep track of the largest contour seen so far
        area = cv2.contourArea(c)
        if area < max_area:
            continue
        max_area = area
        # compute the bounding box for the contour, draw it on the frame,
        (x, y, w, h) = cv2.boundingRect(c)
        
    cv2.rectangle(o_frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 
    cv2.imshow('Output', o_frame )
    # assign the latest frame to the older frame
    f1_gray = f2_gray
    # terminate the loop once we detect the Esc keypress
    if cv2.waitKey(5) == 27 :
        break

# release the camera and destroy the display window
camera.release()
cv2.destroyAllWindows()

The above example is limited to detecting the grayscale (brightness) difference between the foreground object and the background.
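
For more robust results, OpenCV also ships ready-made background subtractors. A minimal sketch using cv2.createBackgroundSubtractorMOG2(), an alternative to the frame differencing used above:

import cv2

camera = cv2.VideoCapture(0)
subtractor = cv2.createBackgroundSubtractorMOG2()  # adaptive background model

while True:
    grabbed, frame = camera.read()
    if not grabbed:
        break
    mask = subtractor.apply(frame)      # foreground mask: moving pixels become white
    cv2.imshow('Foreground mask', mask)
    if cv2.waitKey(5) == 27:            # terminate on the Esc keypress
        break

camera.release()
cv2.destroyAllWindows()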

Hand gesture detection

We are going to implement code to count the number of fingers in the hand held in front of the camera.
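
One common approach (a hedged sketch, not necessarily the implementation intended here) segments the hand by skin color, takes the largest contour, and counts the deep convexity defects between fingers. The file name 'hand.jpg', the HSV skin range, and the defect-depth threshold below are all assumptions that need tuning:

import cv2
import numpy as np

frame = cv2.imread('hand.jpg')                 # placeholder input image
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# assumed skin-color range; tune it for the actual camera and lighting
mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
if contours:
    hand = max(contours, key=cv2.contourArea)        # assume the hand is the largest blob
    hull = cv2.convexHull(hand, returnPoints=False)  # hull as point indices
    defects = cv2.convexityDefects(hand, hull)
    gaps = 0
    if defects is not None:
        for i in range(defects.shape[0]):
            start, end, farthest, depth = defects[i, 0]
            if depth > 10000:                        # deep valleys roughly mark gaps between fingers
                gaps += 1
    print(gaps + 1 if gaps else 0)                   # n gaps -> n+1 fingers (rough heuristic)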

Chroma key with green screen in the live video
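
No implementation is given here, but the idea can be sketched with cv2.inRange(): pixels inside an assumed green range are replaced by a background image, and everything else is kept. 'beach.jpg' and the HSV green range below are assumptions:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                  # live camera
background = cv2.imread('beach.jpg')       # placeholder replacement background

while True:
    grabbed, frame = cap.read()
    if not grabbed:
        break
    bg = cv2.resize(background, (frame.shape[1], frame.shape[0]))
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # assumed green-screen range; tune it for the actual screen and lighting
    mask = cv2.inRange(hsv, np.array([45, 100, 50]), np.array([75, 255, 255]))
    fg = cv2.bitwise_and(frame, frame, mask=cv2.bitwise_not(mask))  # keep non-green pixels
    new_bg = cv2.bitwise_and(bg, bg, mask=mask)                     # background where green was
    cv2.imshow('Chroma key', cv2.add(fg, new_bg))
    if cv2.waitKey(5) == 27:               # terminate on the Esc keypress
        break

cap.release()
cv2.destroyAllWindows()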
