Identifying and segregating blobs from an image
As data scientists, when we are presented with an image, we look for points of interest or objects in it. It is important to extract that information and translate it into a more useful format so that we can generate more insights from it.
Let's start with the definition. What is a blob? BLOB stands for Binary Large OBject. It is called "large" because only objects above a certain size are of interest; smaller binary objects are usually noise. These so-called "blobs" are what we are going to extract from the image.
Let's begin by loading the required Python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import skimage
from skimage.io import imread, imshow
from skimage.color import rgb2gray, rgb2hsv
from skimage.filters import threshold_otsu
from scipy.ndimage import median_filter
from skimage.measure import label, regionprops, regionprops_table
And now, let us load the image file and see what we will be working on.
Identifying the flowers in the image with our eyes is easy. But how would we ask the computer to identify and segregate them?
Let's try to binarize the image to see if we can segregate our object of interest. To binarize it, we first convert the image to grayscale, then threshold the grayscale values either at a fixed level or at the level chosen by Otsu's method. Let's try both.
Hmmm... This will not work. Even though I'm not colorblind, I can't distinguish the flowers in the binarized images. Let's try a different approach.
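The two binarization options can be sketched roughly as follows. This is a minimal illustration on a synthetic image (a green background with one pinkish square), not the flower photo, and the fixed level of 0.4 is an arbitrary assumption. On such a simple image both thresholds separate the square, which is not what happens on the real photo.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

# Synthetic stand-in for the photo (assumption): green background, pinkish square.
image = np.zeros((60, 60, 3), dtype=float)
image[..., 1] = 0.5
image[20:40, 20:40] = [0.9, 0.3, 0.6]

# Convert to grayscale; values are floats in [0, 1].
gray = rgb2gray(image)

# Option 1: hand-picked threshold (0.4 is only an illustrative value).
binary_fixed = gray > 0.4

# Option 2: let Otsu's method choose the threshold from the histogram.
binary_otsu = gray > threshold_otsu(gray)
```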
We can try the HSV color space to see what we can use for masking.
In the HSV color space, the flowers appear more pinkish than the other objects. We can use masking to try to segregate the flowers from the rest of the image.
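A hue-based mask might look something like the sketch below. The image is again a synthetic stand-in, and the hue cut-off of 0.8 is an assumption chosen for this example; the right range for a real photo has to be found by inspecting its hue channel.

```python
import numpy as np
from skimage.color import rgb2hsv

# Synthetic stand-in for the photo (assumption): green background, pinkish square.
image = np.zeros((60, 60, 3), dtype=float)
image[..., 1] = 0.5
image[20:40, 20:40] = [0.9, 0.3, 0.6]

# rgb2hsv returns hue, saturation and value, each scaled to [0, 1].
hsv = rgb2hsv(image)
hue = hsv[..., 0]

# Pink and magenta hues sit near the top of the hue range.
# The 0.8 cut-off is an illustrative guess, not the article's value.
hue_mask = hue > 0.8
```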
With the initial mask, we can see that the flowers are already separated. However, some small blobs from the background remain. Let's look at the other HSV channels to check whether there are characteristics we can use.
For now, we will take advantage of the background's low saturation and include it in the mask.
This leaves us with the flowers and some specks. Let us use median_filter from SciPy's ndimage module to clean up this noise.
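Here is a small sketch of combining hue and saturation in one mask. The synthetic background is washed out with a faint pink tint, so a hue test alone would accept it; both cut-offs (0.8 and 0.3) are assumptions for illustration.

```python
import numpy as np
from skimage.color import rgb2hsv

# Synthetic stand-in (assumption): a washed-out background with a faint pink
# tint, plus one strongly colored pinkish square playing the flower.
image = np.empty((60, 60, 3), dtype=float)
image[:] = [0.55, 0.50, 0.53]
image[20:40, 20:40] = [0.9, 0.3, 0.6]

hsv = rgb2hsv(image)
hue, saturation = hsv[..., 0], hsv[..., 1]

# The hue test alone lets the tinted background through...
hue_mask = hue > 0.8

# ...so we also require a minimum saturation to reject the pale background.
combined_mask = hue_mask & (saturation > 0.3)
```

In this toy case hue_mask covers the whole image, while combined_mask keeps only the square.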
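The cleaning step could be sketched like this. The mask and the window size of 5 are assumptions for illustration; the mask is cast to uint8 before filtering as a conservative choice.

```python
import numpy as np
from scipy.ndimage import median_filter

# Toy mask (assumption): one large blob plus three single-pixel specks.
mask = np.zeros((60, 60), dtype=bool)
mask[20:40, 20:40] = True
for r, c in [(5, 5), (10, 50), (50, 12)]:
    mask[r, c] = True

# Each pixel is replaced by the median of its 5x5 neighborhood, so isolated
# specks vanish while the interior of the large blob is kept.
cleaned = median_filter(mask.astype(np.uint8), size=5).astype(bool)
```

A larger window removes bigger specks but also rounds off the corners of the blobs we want to keep.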
Alright! Nice and clean. We will now use the label function of scikit-image to label each blob.
From the above, we can now distinguish each blob. Unfortunately, the flowers on the lower left were identified as a single blob.
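To see why touching flowers end up as one blob, here is a small sketch with a toy mask (an assumption, not the article's mask). label assigns one integer per connected region, so two overlapping shapes share a label.

```python
import numpy as np
from skimage.measure import label

# Toy mask (assumption): one isolated square and two overlapping rectangles.
mask = np.zeros((60, 60), dtype=bool)
mask[5:15, 5:15] = True      # isolated blob
mask[30:40, 30:45] = True    # overlaps the next rectangle
mask[38:50, 40:55] = True

# Background pixels get 0; each connected region gets 1, 2, ...
labels = label(mask)
n_blobs = labels.max()
```

Here n_blobs is 2 even though three shapes were drawn, because the two overlapping rectangles form a single connected region.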
scikit-image's regionprops functions (regionprops and regionprops_table) give us the properties of each identified blob. Placing them in a pandas DataFrame provides the flexibility we need to plot the bounding boxes on the segmented image. If blobs were captured that should not have been, we can filter them out using the DataFrame columns.
The image below shows a rectangular bounding box per flower. The flowers on the lower left were captured as one because we based our segmentation on the color of the petals. The blob identification and segmentation on this image can be improved further by selecting a better mask for the object of interest.
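A sketch of the DataFrame step, using a toy mask and an assumed minimum area of 50 pixels as the filter:

```python
import numpy as np
import pandas as pd
from skimage.measure import label, regionprops_table

# Toy mask (assumption): two real blobs and one tiny leftover.
mask = np.zeros((60, 60), dtype=bool)
mask[5:25, 5:25] = True      # area 400
mask[35:50, 30:55] = True    # area 375
mask[55:57, 2:4] = True      # area 4

labels = label(mask)

# regionprops_table returns a dict of columns, ready for a DataFrame.
# The bbox is split into bbox-0..bbox-3: min_row, min_col, max_row, max_col.
props = regionprops_table(labels, properties=("label", "area", "bbox"))
df = pd.DataFrame(props)

# Drop blobs that are too small to be of interest (threshold is illustrative).
df_big = df[df["area"] >= 50]
```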
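One possible way to turn the bounding-box columns into matplotlib rectangles is sketched below. The helper bbox_patches is hypothetical (it is not from the article), and the small DataFrame stands in for the output of regionprops_table.

```python
import pandas as pd
from matplotlib.patches import Rectangle

def bbox_patches(df, color="red"):
    """Build one unfilled Rectangle per row of a regionprops_table-style frame."""
    patches = []
    cols = ["bbox-0", "bbox-1", "bbox-2", "bbox-3"]
    for min_row, min_col, max_row, max_col in df[cols].to_numpy():
        # Rectangle takes (x, y) = (column, row) of the top-left corner,
        # followed by width and height.
        patches.append(
            Rectangle((min_col, min_row), max_col - min_col, max_row - min_row,
                      fill=False, edgecolor=color, linewidth=2)
        )
    return patches

# Stand-in for the regionprops_table output (values are made up).
df = pd.DataFrame({"bbox-0": [5, 35], "bbox-1": [5, 30],
                   "bbox-2": [25, 50], "bbox-3": [25, 55]})
patches = bbox_patches(df)
```

To draw them, show the image on an axis with ax.imshow(image) and call ax.add_patch(p) for each patch.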
Here's the segmented image of the blobs as captured by the filtering method used on the image.
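A segmented view like this can be produced by blanking everything outside the mask and cropping each blob by its bounding box. The sketch below assumes a synthetic image and mask rather than the article's data.

```python
import numpy as np
from skimage.measure import label, regionprops

# Synthetic image and mask (assumption): one pinkish square on green.
image = np.zeros((60, 60, 3), dtype=float)
image[..., 1] = 0.5
image[20:40, 20:40] = [0.9, 0.3, 0.6]
mask = np.zeros((60, 60), dtype=bool)
mask[20:40, 20:40] = True

# Broadcast the 2-D mask over the color channels; background becomes black.
segmented = image * mask[..., np.newaxis]

# Cut out each blob using the bounding box reported by regionprops.
crops = []
for region in regionprops(label(mask)):
    min_row, min_col, max_row, max_col = region.bbox
    crops.append(segmented[min_row:max_row, min_col:max_col])
```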
Let's review. A blob is a lump in the image bounded by its perimeter, shape, or color. A blob is a large binary object, since smaller bounded objects are classified as noise most of the time.
To segment blobs, we can try to binarize the image. In our example, binarizing the grayscale image was not successful, which led us to mask in the HSV color space to identify and segment the blobs from the main image. SciPy ndimage's median filter can help clean up small specks from the mask. Scikit-image's regionprops_table provides properties of the blobs in a form that is compatible with a pandas DataFrame.
And lastly, it takes a lot of trial and error, especially when finding the right filter. Be patient and enjoy the journey.