I have recently been working on several projects which allowed me to dive into image analysis. What I found was a lot of people still using nested loops to process pixel data. This is fine for small data sets. Once you have amassed several thousand images to process though, you need a better way. For many new data scientists, one of the most confusing aspects of manipulating image data programmatically seems to be understanding the axes layout many libraries (like NumPy and Pandas) rely on. Understanding how the data is laid out, and how you can efficiently process it, can save you hours of waiting for your code to execute. In this post I will explain this layout and how you can leverage it to your programmatic advantage.
The most useful data in images exists in a format not easily understood by machines. Over the years I have collected several snippets of code which I use to turn this data into useful feature sets for analysis. They rely on manipulating these axes to reduce the complexity of the code I needed to write.
As you're probably aware, color images are composed of 3-4 separate information layers corresponding to the color channels in the image (and the alpha channel, if one is available). You may have seen these colors represented in RGBA notation like so: (0,255,0,1), which defines a pure green color with 100% alpha. You may also have seen the color written in hex notation, which omits the alpha, like so: #00ff00. Each of these representations denotes a single pixel value in an image. This creates a data set that exists in three dimensions, rather than the two-dimensional data frames most people are used to. Figure 1 shows the 3D layout of a 4x8 pixel image, as well as the NumPy axes associated with it.
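To make the layout concrete, here is a minimal sketch that builds a hypothetical 4x8 RGBA image filled with the pure green pixel from the example above. It assumes 8-bit values (0-255) for every channel, so 100% alpha is stored as 255 rather than 1. The names `image` and `hex_color` are only for illustration.

```python
import numpy as np

# A hypothetical 4x8 RGBA image: 4 rows (height), 8 columns (width), 4 channels.
# Alpha is stored on the same 0-255 scale as the colors, so fully opaque is 255.
height, width, channels = 4, 8, 4
image = np.zeros((height, width, channels), dtype=np.uint8)

# Paint every pixel pure green with full alpha (the tuple broadcasts over axis 2).
image[:, :] = (0, 255, 0, 255)

print(image.shape)  # (4, 8, 4)
print(image.size)   # 128 individual values

# The hex form only encodes the three color channels of a single pixel.
hex_color = "#" + "".join(f"{int(v):02x}" for v in image[0, 0, :3])
print(hex_color)    # #00ff00
```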
For the image in Figure 1, that means a 4x8 pixel image represented with 4 channels (red, green, blue, and alpha, respectively). If you examine this data with image.shape you will find that the shape matches this expectation: (4, 8, 4), giving a total of 128 data points. Axis 0 represents the number of samples in a given data set. This is also sometimes referred to as the row count or y-axis. For an image, each row of data corresponds to a horizontal line of pixel values for a given channel. When you perform a function along axis 0, you are telling the program to consider the information in each channel from top to bottom, one column at a time. For example, you can determine the vertical sum of every column of pixels using the code
import numpy as np
vert_sums = np.sum(image, axis=0)  # one sum per column, per channel: shape (8, 4)
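The nested-loop approach mentioned at the start of this post and the axis-based call compute the same thing. The sketch below uses a small random array as a stand-in for a real image so it runs without an image file. It checks that both versions produce identical column sums, with one value per column per channel.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a real image: 4 rows, 8 columns, 4 channels of 8-bit values.
image = rng.integers(0, 256, size=(4, 8, 4), dtype=np.uint8)

# Nested-loop version: for every column and channel, add the rows top to bottom.
rows, cols, chans = image.shape
loop_sums = np.zeros((cols, chans), dtype=np.int64)
for c in range(cols):
    for ch in range(chans):
        for r in range(rows):
            loop_sums[c, ch] += int(image[r, c, ch])

# Vectorized version: collapse axis 0 (the rows) in a single call.
# dtype=np.int64 makes the accumulator type explicit.
vert_sums = np.sum(image, axis=0, dtype=np.int64)

print(vert_sums.shape)                       # (8, 4)
print(np.array_equal(loop_sums, vert_sums))  # True
```

On a few thousand full-size images the loop version becomes very slow, while the axis call pushes the same work into NumPy's compiled code.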
The example image used in the rest of this post (a bird) is a PNG file which is 668 pixels high, 960 pixels wide, and has 4 channels.
Separating channels
Sometimes it is useful to be able to separate an image into a 2D array for each channel. Intuitively, this is just like splitting the data up along axis 2:
r, g, b, a = img[:, :, 0], img[:, :, 1], img[:, :, 2], img[:, :, 3]
The result is four separate 2D arrays, each containing one full channel from the image. Figure 3 shows the result of displaying each channel side by side.
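Because the channels live on axis 2, you can also unpack them in one step by moving that axis to the front. After that, iterating over the array yields one channel at a time. This is a sketch using `np.moveaxis` on a small random stand-in for the bird image (the 4x8 size is an assumption to keep it fast). The unpacked arrays are views, so no pixel data is copied.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in RGBA image: height x width x channels.
img = rng.integers(0, 256, size=(4, 8, 4), dtype=np.uint8)

# Move the channel axis (2) to the front: shape becomes (4 channels, 4 rows, 8 cols).
# Unpacking then iterates over the new axis 0, yielding one 2D array per channel.
r, g, b, a = np.moveaxis(img, 2, 0)

# Same result as slicing each channel explicitly.
print(np.array_equal(r, img[:, :, 0]))  # True
print(np.array_equal(a, img[:, :, 3]))  # True
print(r.shape)                          # (4, 8)
```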
You can see the alpha channel on the far right of Figure 3. The solid black pixels denote areas which will be completely visible, and the white pixels denote areas that will be completely hidden. You can tell that the silhouette of the bird is the only visible area of the data. Looking at the three color channels sheds some light on why: you can clearly see artifacts in the image that the editor wanted to hide with something called a layer mask (a technique to non-destructively hide parts of an image, rather than remove them entirely). By removing the alpha channel we can see the artifacts left behind.
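Dropping the alpha channel, which is what reveals the masked artifacts, is just another slice along axis 2. A minimal sketch on a hypothetical stand-in array follows. It keeps the first three channels, then uses the alpha channel as a boolean mask to find which pixels would actually be visible. It assumes 8-bit alpha, where 0 is fully transparent and 255 fully opaque, regardless of how the channel is colored when plotted.

```python
import numpy as np

# Hypothetical 4x8 RGBA image: the left half is opaque, the right half is masked out.
img = np.full((4, 8, 4), 200, dtype=np.uint8)
img[:, 4:, 3] = 0  # alpha = 0 hides the right half

# Slice away the alpha channel to expose every color value, masked or not.
rgb = img[:, :, :3]
print(rgb.shape)  # (4, 8, 3)

# Treat any non-zero alpha as "visible" to build a 2D boolean mask.
visible = img[:, :, 3] > 0
print(visible.sum())  # 16 visible pixels out of 32

# Boolean indexing on the first two axes keeps only visible pixels: shape (n_visible, 3).
visible_rgb = rgb[visible]
print(visible_rgb.shape)  # (16, 3)
```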
Averaging channels
Oftentimes, you want to reduce the number of features a color image will produce. It is common to flatten it into a single two-dimensional array which consists of the mean value of all the channels at each pixel. In cases where the alpha is consistent across the image, you can consider just the color channels. Processing a 668x960 pixel, 4-channel image yields 2,565,120 individual values. Averaging, however, leaves you with only 1/4 of the values to deal with (641,280 to be precise). For example, suppose the first pixel of the image has the RGBA tuple (60,120,0,1). The new value for this pixel in the output would be (60+120+0+1)/4=45.25. You can think of this intuitively as converting the image to a single-channel grayscale image. To achieve this you need to sum each element along axis 2, then divide by the number of channels in the image to get the average. Luckily this pattern is so common that NumPy arrays (the type skimage uses to represent images) provide a mean method that performs the math in the background:
mean_img = img.mean(axis=2)
The resulting image can be seen in Figure 3.
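The worked example above can be checked directly, along with the sum-then-divide recipe and the color-only variant for images whose alpha is consistent. This sketch places the example pixel (60, 120, 0, 1) at the top-left of a small hypothetical image. The other pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical 2x2 RGBA image stored as floats; the top-left pixel is the example pixel.
img = np.array(
    [[[60, 120, 0, 1], [10, 20, 30, 1]],
     [[0, 0, 0, 1], [255, 255, 255, 1]]],
    dtype=float,
)

# Mean over axis 2 (the channels) gives one value per pixel.
mean_img = img.mean(axis=2)
print(mean_img.shape)  # (2, 2)
print(mean_img[0, 0])  # 45.25

# Equivalent manual recipe: sum along axis 2, then divide by the channel count.
manual = img.sum(axis=2) / img.shape[2]
print(np.allclose(mean_img, manual))  # True

# When alpha is the same everywhere, averaging only the color channels is common.
rgb_mean = img[:, :, :3].mean(axis=2)
print(rgb_mean[0, 0])  # 60.0
```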
In the averaged image you can see the general softening and the loss of detail. As displayed here, this is a negative image (the darker pixels indicate higher light values). Notice the input parameter axis: this is where you can change the behavior based on your newfound understanding of each axis. To really gain an understanding, though, think about what might happen if you chose a different axis. For example, taking the mean along axis 0 is equivalent to taking the mean value of each vertical column of pixels within each channel. What would the shape of this data be? Try it out and see.
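To check your answer, this sketch averages a zero-filled array with the bird image's dimensions (668 high, 960 wide, 4 channels) along each axis in turn. The values are meaningless; only the resulting shapes matter.

```python
import numpy as np

# Placeholder with the same dimensions as the bird PNG: height x width x channels.
img = np.zeros((668, 960, 4), dtype=np.uint8)

# The chosen axis is collapsed; the other two survive in their original order.
shapes = {axis: img.mean(axis=axis).shape for axis in range(img.ndim)}
for axis, shape in shapes.items():
    print(axis, shape)
# 0 (960, 4)   -> mean of each pixel column, per channel
# 1 (668, 4)   -> mean of each pixel row, per channel
# 2 (668, 960) -> mean of each pixel across its channels
```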
Flattening the image
Whether or not you choose to flatten the channels (or use some other manipulation), you still have a matrix of data remaining. However, most learning algorithms expect each sample to be a vector of features. Once again, there are several approaches to manipulating the data. The one I will show here is probably the simplest. You can think of it intuitively as unraveling the image into one long string of pixels. In the case of the bird example that leaves us with 641,280 values.
import numpy as np
import pandas as pd
num_pix = mean_img.shape[0] * mean_img.shape[1]  # total number of pixels
features = pd.Series(list(np.reshape(mean_img, num_pix)))  # one long 1D run of pixel values
df = pd.DataFrame(features)  # a single column, one row per pixel
df = df.transpose()  # flip to one row, one column per pixel
First, I calculate the number of pixels in the image and store it in the num_pix variable. I then pass the image data to np.reshape along with this new shape (a single dimension of 641,280 values). I explicitly cast the result to a list (because explicit is better than implicit). The resulting list is converted into a Pandas Series representing the features. Wrapping that Series in a DataFrame produces one row per pixel, but what I actually want is one column per pixel, so I use the transpose function to produce a DataFrame with 1 row and 641,280 columns (not including the automatically added index).
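For comparison, NumPy can produce the same single-row layout without the intermediate list and transpose. `reshape(1, -1)` asks for one row and lets NumPy infer the column count. The sketch below checks that this matches the Series-and-transpose approach on a small hypothetical mean image. It then stacks several flattened images into one feature matrix with one row per image, the shape most learning libraries expect once you have thousands of images. The image size and count are assumptions for illustration. Every image must have the same dimensions for the columns to line up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical single-channel (already averaged) image, 6 rows x 10 columns.
mean_img = rng.random((6, 10))

# Approach from the text: flatten, wrap in a Series, then transpose to one row.
num_pix = mean_img.shape[0] * mean_img.shape[1]
df_text = pd.DataFrame(pd.Series(list(np.reshape(mean_img, num_pix)))).transpose()

# Vectorized alternative: one row, with -1 letting NumPy infer the 60 columns.
df_fast = pd.DataFrame(mean_img.reshape(1, -1))
print(df_fast.shape)  # (1, 60)
print(np.allclose(df_text.to_numpy(dtype=float), df_fast.to_numpy()))  # True

# Several same-sized images become a feature matrix: one row per image.
images = [rng.random((6, 10)) for _ in range(5)]
features = pd.DataFrame(np.stack(images).reshape(len(images), -1))
print(features.shape)  # (5, 60)
```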
Conclusion
I hope this has shown you some of the simplifications you can achieve in your code by taking advantage of the axes the way the libraries intend. You can find the Jupyter Notebook I wrote as the basis for this post on my GitHub (https://github.com/dreilly369/PAMS_pub/blob/master/image_manipulation.ipynb).