Authors: Duncan Wang, Arnaud Guzman-Annès, Sophie Courtemanche-Martel & James Hogan
The recycling problem
Reduce, Reuse, and Recycle. Known as the three Rs of waste management, this aphorism has been widely popularized as the answer to the rising waste crisis. In North America, growing environmental awareness among the general public and the spread of movements such as conscious consumerism have put waste reduction at the forefront of socio-environmental concerns, and most cities have mature recycling programs today. Yet for consumers, while the principles behind “Reduce” and “Reuse” are generally quite straightforward, the processes behind the third R remain complex and poorly understood.
While recycling may seem as simple as placing specific waste items in a special bin to be shipped off to a magical plant and turned into new materials, the actual outcomes of recycling programs are often quite dismal. In fact, the United States Environmental Protection Agency (EPA) estimates that while 75% of American waste is recyclable, a mere 30% is actually recycled. In Canada, only 9% of the 3.3 million tons of plastic waste produced is successfully recycled, while over 75% ends up in landfills. Poor recycling outcomes can be attributed to a combination of a lack of coordination between producers, consumers, and municipalities, differing regulations and capabilities governing recycling, and low public understanding. In Canada, municipalities govern and set guidelines for what is recyclable and what is not, depending on where waste is sold, what buyers are willing to process, and what waste is deemed financially justifiable to recycle. As a result, fragmented recycling systems can confuse consumers, which ultimately leads to large amounts of potentially recyclable items ending up in our landfills.
How artificial intelligence can help us recycle
Recent advances in artificial intelligence (AI) have led to a rise in AI-driven solutions to help tackle socio-environmental issues, ranging from the use of predictive forecasting to balance the supply and demand of grid-powered energy, to the use of optimization to help reduce waste from manufacturing facilities. In terms of recycling, a recent report by McKinsey & Co. identified that the market opportunity for reducing waste from consumer electronics is up to $90 billion a year, derived from solutions such as the use of image recognition and robotics to automate recycling infrastructure.
One particular difficulty with recycling, and an area of opportunity for AI-driven solutions, is the issue of improper sorting. Due to the variety of waste-material types and differing regulations, consumers can find it difficult to identify the composition of waste items, and therefore improperly classify an item as recyclable or non-recyclable. Such mixing of recyclable and non-recyclable goods decreases the value of to-be-recycled items, makes them harder to sell, and increases the volume of recyclables that end up in landfills. As such, one potential AI application is the use of image classification to help consumers recognize the material composition, and consequent recyclability, of their waste items.
Objective
The objective of this guide is to walk through how we can build a Convolutional Neural Network (CNN) with the Keras API in Python to identify and properly classify common waste items into their associated material groups. CNNs are a popular class of neural network architecture in deep learning and are commonly used for image classification. After training the CNN on images of waste items labeled by material type, our goal is for it to take an unclassified image of a waste item submitted by a user and predict the item’s material composition. Though simple in nature, such a tool can help inform user decision-making and reduce the quantity of improperly sorted waste items, so that everyone can play a role in improving the recycling process.
The full code needed to replicate the project can be found here.
Table of contents
- A gentle introduction to CNN architecture
- Preparing the data
- Building the CNN model
- Predicting waste categories
- Final thoughts
A gentle introduction to CNN architecture
Convolutional Neural Networks (CNNs), or ConvNets, are a type of neural network widely used in image recognition and classification tasks. The basic idea is to mimic how the human brain makes decisions by simulating a network of interconnected layers, each composed of “neurons”, i.e. mathematical functions that synthesize the input features. Neural networks are thus able to identify relationships within large amounts of data that may be invisible to the human eye.
In order to detect and classify images, a CNN model takes as input a colour image that has been converted into a 3D array of numerical pixel values. It then passes this input through a series of convolutional layers, pooling layers, and fully connected layers, each of which performs a different task. The final layer of the CNN applies a function known as softmax, which outputs a probability between 0 and 1 for each class; in our case, the classes are the possible material compositions of the waste item.
Without going into too much detail, the hidden layers in a CNN are generally convolution and pooling layers. In the convolution layers, a filter of a predefined size is moved across the image to perform convolution operations: element-wise multiplication between the filter values and the underlying image pixels. The sum of the resulting values at each position forms one element of a feature map, and each feature map extracts a distinct characteristic or quality of the original image.
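To make the convolution operation concrete, here is a minimal NumPy sketch (the image and filter values are made up purely for illustration) that slides a 3x3 filter over a 5x5 single-channel image and sums the element-wise products at each position:

import numpy as np

#illustrative only: a random 5x5 single-channel "image" and a 3x3 vertical-edge-style filter
img = np.random.rand(5, 5)
fltr = np.array([[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]])

#slide the filter across the image; each output cell is the sum of an element-wise product
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(img[i:i+3, j:j+3] * fltr)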
Then, we have the pooling layers. These downsample the feature maps, which reduces the number of parameters required to train the network and thus the number of computations required. Pooling also helps “generalize” the inputs, which helps prevent the network from overfitting to the training images.
After a series of convolutional and pooling layers, we reach the third type of layer present in the CNN architecture: the fully-connected (FC) layer, a type of layer also typically seen in regular neural networks. The FC layers form the last few layers of the network and take the flattened output of the final pooling or convolutional layer. Using a softmax activation function, the final layer of the network outputs the class probabilities.
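Putting these pieces together, here is a hypothetical minimal Keras model that follows the convolution, pooling, and fully-connected pattern described above, ending in a softmax layer. It is a sketch of the general architecture only, not the transfer-learning model we build later:

from tensorflow import keras

toy_cnn = keras.models.Sequential([
    #convolution layer: 32 filters of size 3x3 scanned across the input image
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    #pooling layer: downsample each feature map by taking the max of every 2x2 block
    keras.layers.MaxPooling2D((2, 2)),
    #flatten the feature maps into a single vector for the fully-connected layers
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    #softmax output: one probability per class
    keras.layers.Dense(6, activation='softmax')
])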
Now that we’ve had a brief introduction to the CNN architecture, let’s jump into how we built our model, starting with the data.
Preparing the data
The data we will use contains 2,532 images of recyclable items made of cardboard, glass, metal, paper, or plastic, as well as items that are non-recyclable (trash). The data source is available here.
To feed our model a set of images it can digest, the first step is to transform every picture in our dataset into a 3D array of pixels:
from tensorflow import keras
from tensorflow.keras.preprocessing import image

#load image and transform to 3D array - image array shape is 384 x 512 pixels, colour channels = 3 (RGB)
img_array = image.img_to_array(test_image)

#scale from -1 to 1
img_array = keras.applications.xception.preprocess_input(img_array)
For our images, the shape of the resulting array is (384, 512, 3), where the first and second elements in the array represent the pixel dimensions and the third element represents three colour channels for red, green, and blue (RGB).
Data augmentation
Next, we randomly crop and flip some images to augment the data. Adding this variability helps reduce the risk of overfitting and therefore improves the model’s ability to identify future unseen images.
Here, we artificially introduce variability by applying either a random crop or a center crop to each image. Note that each of the cropping functions below incorporates the previous code snippet for converting an image into a 3D array before cropping. We then bring the cropping functions together into a new function which randomly applies either the random crop or the center crop with 50% probability each, and also flips approximately 50% of the images.
import numpy as np

#center crop of incoming image, sizing it down to (224, 224, 3)
def center_crop(img):
    image_array = image.img_to_array(img)
    shape = image_array.shape  # H, W, D
    left = int((shape[1]-224)/2)
    right = left + 224
    bottom = int((shape[0]-224)/2)
    top = bottom + 224
    return image_array[bottom:top, left:right]

#random crop of the image within the defined range
def random_crop(img):
    image_array = image.img_to_array(img)
    shape = image_array.shape
    left = np.random.randint(0, shape[1]-224)
    right = left + 224
    bottom = np.random.randint(0, shape[0]-224)
    top = bottom + 224
    return image_array[bottom:top, left:right]

#center crop or random crop, then flip or no flip, each with 50/50 probability
def process_img(img):
    if np.random.rand() > 0.5:
        cropped = center_crop(img)
    else:
        cropped = random_crop(img)
    #flip about half the images along a random axis (vertical or horizontal)
    if np.random.rand() > 0.5:
        return np.flip(cropped, axis=np.random.randint(0, 2))
    return cropped
Let’s look at how these data augmentation techniques would modify a sample image:
To process the images, we pass each image through the augmentation function, and scale the resulting arrays so that the pixel values lie between -1 and 1, which is required for input to the CNN model:
#pass each image through the image-to-array and augmentation function
img_array = process_img(test_image)

#scale from -1 to 1
img_array = keras.applications.xception.preprocess_input(img_array)
Data exploration
To get a better understanding of the data we’re dealing with, let’s look at the distribution of waste items by category. It seems that the classes are relatively balanced with the exception of trash. For now, we’ll leave the data as is, but if we wanted to create more balanced classes, we could source additional samples or further augment the existing trash photos to produce new representations.
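As a quick sketch, assuming the string category of each image is held in a Python list (here called labels, a hypothetical name), the distribution can be checked in a few lines:

from collections import Counter

#count how many images belong to each waste category
for category, count in Counter(labels).items():
    print(category, count)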
Let’s also have a look at a sample of the images that we currently have in each category:
Preparing the model inputs
To prepare the model inputs, we first factorize the six waste item material types into numerical format, since the CNN model cannot interpret word-label categories directly (a minimal mapping is sketched after the list):
- 0: Cardboard
- 1: Glass
- 2: Metal
- 3: Paper
- 4: Plastic
- 5: Trash
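As a minimal sketch, again assuming the string categories live in a list called labels, the factorization can be done with a simple dictionary mapping:

import numpy as np

#map each material type to its numeric label
label_map = {'cardboard': 0, 'glass': 1, 'metal': 2, 'paper': 3, 'plastic': 4, 'trash': 5}
y = np.array([label_map[label] for label in labels])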
Since the images are loaded sequentially by category, we also shuffle the data so that each dataset does not contain a vastly unequal proportion of images from a specific waste type once split.
Lastly, we split the data into three sets: train, validation, and test. The resulting split creates 1750 training observations used to fit the model, 518 validation observations used to provide an unbiased evaluation of the model’s fit while tuning the parameters of our model during the training phase, and 259 test observations which will allow us to evaluate how our model performs once it is fully built and trained.
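Here is a hedged sketch of the shuffle and split, assuming the processed image arrays are stacked in a single array X with the corresponding numeric labels in y:

import numpy as np

#shuffle images and labels together so the categories are mixed before splitting
idx = np.random.permutation(len(X))
X, y = X[idx], y[idx]

#split into 1750 training, 518 validation, and the remaining 259 test observations
X_train, y_train = X[:1750], y[:1750]
X_valid, y_valid = X[1750:2268], y[1750:2268]
X_test, y_test = X[2268:], y[2268:]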
After preparing the model inputs, now comes time to build our model.
Building the CNN model
To build a CNN classifier, we first leverage an external pre-trained model as the base of the network. This idea is known as transfer learning: it lets us reuse layers that have already been trained for another task, so that our waste-item classifier does not have to start learning from scratch. Since we do not have a pre-trained garbage classifier, we instead use the Xception model pre-trained on ImageNet, a large collection of images of various items, which should also be useful for identifying waste items.
In short, transfer learning allows us to do three key things:
- Instantiate a base model and load pre-trained weights onto it
- Freeze all layers in the base model and create a new model on top
- Train the new model on our dataset.
Here, we define the Xception base model and load its ImageNet-trained weights:
base_model = keras.applications.xception.Xception(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
We will initially freeze the entire convolutional base so that we only use its outputs to feed into our classifier, without re-training the ImageNet model.
#freeze the base model
base_model.trainable = False

#define the type of NN architecture - a Sequential model specifies a linear stack of layers
model = keras.models.Sequential()

#add the pre-trained model
model.add(base_model)
We can now build the layers for our customized classifier:
#pooling layer to prepare the data as input into the dense layer
model.add(keras.layers.GlobalAveragePooling2D())
model.add(keras.layers.Dense(256, activation='relu'))

#batch normalization layer re-centers and re-scales layer inputs - helps accelerate training
model.add(keras.layers.BatchNormalization())

#dropout layer - temporarily deactivates 20% of the nodes during training to redistribute weights, helping the network concentrate on "weak" features and preventing overfitting
model.add(keras.layers.Dropout(0.2))

#flatten layer converts the data to a single array for input into the dense layer
model.add(keras.layers.Flatten())

#prediction layer - 6 neurons = 6 category outputs; softmax normalizes the output of the network to a probability distribution over the predicted output classes
model.add(keras.layers.Dense(6, activation='softmax'))
Here is a quick look at our layers:
- Pooling layer: The pooling layer is used to prepare the data as an input that will be pushed into a dense, or fully connected layer.
- Batch normalization layer: Re-centers and re-scales each layer’s inputs, which stabilizes the learning process and accelerates training. Because it fixes the means and variances of a layer’s inputs, it can be inserted at many points in the network to improve performance.
- Dropout layer: Dropout temporarily deactivates 20% of the nodes in the network during training to redistribute weights and help the network focus on weak features. This can help prevent overfitting on the training dataset.
- Flatten Layer: The flatten layer will convert our data into a 1-dimensional array that will be the input of our fully-connected/dense layer.
- Prediction layer: In our final layer, the softmax activation normalizes the output of the network to a probability distribution over the 6 possible output classes.
Here’s a look at the summary of our model:
Now, let’s compile our model. We specify sparse categorical cross-entropy as the loss function, which is used for multi-class classification tasks with integer labels, use the Adam optimizer, and evaluate performance using classification accuracy.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Callback functions
Next, we implement a few callback functions to use when training the model. A callback is an object that can perform specific actions at a given stage of the training process, which allows us to customize the behaviour of the model.
The first callback creates a custom function which stops training if accuracy on the training data is over 0.999, to prevent overfitting. The second callback logs model statistics using the TensorBoard visualization toolkit, which we can use to track the model’s metrics during training.
import os
import tensorflow as tf
from tensorflow import keras

#implement a custom callback to stop training if accuracy exceeds 0.999, which indicates overfitting on this relatively small dataset
class OverfittingCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        if logs.get('accuracy') > 0.999:
            self.model.stop_training = True
            print('Trying to prevent overfitting - stopping training')

#TensorBoard - log model statistics
root_logdir = os.path.join(os.curdir, 'my_logs')

def get_run_logdir():
    import time
    run_id = time.strftime('run_%Y_%m_%d-%H_%M_%S')
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()

#implement a custom TensorBoard callback
tensorboard_callback = keras.callbacks.TensorBoard(run_logdir, update_freq="epoch")
Training the model
We first train the model for 40 epochs (or until either our custom callback or the Keras EarlyStopping callback halts training) with the base model layers frozen.
#train model for 40 epochs or until early stopping is initiated
history = model.fit(X_train, y_train,
                    epochs=40,
                    validation_data=(X_valid, y_valid),
                    callbacks=[keras.callbacks.EarlyStopping(patience=3, monitor='val_accuracy'),
                               OverfittingCallback(),
                               tensorboard_callback])
After the model has converged on the new data, we can unfreeze the base ImageNet model and retrain with the base layers trainable by specifying base_model.trainable = True. The order here matters: if randomly-initialized trainable layers are mixed with trainable layers that hold pre-trained features, the randomly-initialized layers will cause very large gradient updates during training and destroy the pre-trained features in the base model. Unfreezing only after the new layers have converged avoids this.
With the initial training done, we can now re-train the model in the same manner, but with the unfrozen base layers.
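Here is a minimal sketch of that fine-tuning step. Note that the model must be recompiled for the trainable change to take effect; the low learning rate is our own assumption, as one is commonly used when fine-tuning to avoid destroying the pre-trained weights:

#unfreeze the base model so its weights are updated during fine-tuning
base_model.trainable = True

#recompile so the trainable change takes effect; a low learning rate is an assumption, commonly used for fine-tuning
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              metrics=['accuracy'])

#re-train in the same manner as before
history = model.fit(X_train, y_train,
                    epochs=40,
                    validation_data=(X_valid, y_valid),
                    callbacks=[keras.callbacks.EarlyStopping(patience=3, monitor='val_accuracy'),
                               OverfittingCallback(),
                               tensorboard_callback])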
Here’s a look at our model’s training performance with the active base layers. We can see that the model completed five training epochs before the callback function was activated, indicating that the training accuracy had exceeded 99.9%. At the time of training completion, the classification accuracy on the validation set is 79.54%.
The TensorBoard callback function also allows us to view and interact with various metrics associated with the training process. Here, we can track how accuracy increases and loss decreases at each training epoch:
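To open the dashboard, TensorBoard can be launched from a Jupyter notebook and pointed at the log directory used above (from a terminal, the equivalent command is tensorboard --logdir my_logs):

%load_ext tensorboard
%tensorboard --logdir my_logs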
Predicting waste categories
After training the model, we can evaluate its performance with “unseen” data: our test dataset of 259 images. Here, we can generate an array containing the predicted classes (0–5) of each waste image in the test dataset.
#generate predictions using the test dataset
predicted_class = np.argmax(model.predict(X_test), axis=-1)
To interpret these results, we can assign each label back to its material type. We can also plot the comparisons between the predicted waste item types and the actual waste item types in the test dataset:
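A minimal sketch of such a label-conversion helper, here called class_convert (the same function is used again in the final prediction step below), simply reverses the numeric mapping defined earlier:

#map numeric labels back to their material names
classes = ['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']

def class_convert(predictions):
    return [classes[p] for p in predictions]

#e.g. convert the test-set predictions for plotting against the actual types
predicted_labels = class_convert(predicted_class)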
Evaluating performance
We can analyze the predictions of the image classification model using a confusion matrix. This allows us to compare the predicted vs. actual material types to evaluate the number of correct vs. incorrect classifications, for each waste type.
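A sketch of producing the matrix with scikit-learn, assuming the y_test and predicted_class arrays from above:

from sklearn.metrics import confusion_matrix

#rows correspond to actual material types, columns to predicted material types
cm = confusion_matrix(y_test, predicted_class)
print(cm)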
We see that overall, 202 of the 259 images (78%) were predicted correctly. While this prediction accuracy is decent, the model’s ability to identify trash matters most to us, since we want to be sure that the classifier can separate trash from recyclable items.
If we examine trash, we have the following results (a sketch for deriving these figures from the confusion matrix follows the list):
- True positives (rate): 4 (33.3%)
- True negatives (rate): 240 (96.8%)
- False positives (rate): 8 (3.2%)
- False negatives (rate): 7 (63.6%)
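These figures can be derived by collapsing the predictions into a binary trash vs. non-trash view; a minimal sketch:

import numpy as np

#collapse to a binary problem: class 5 = trash, everything else = non-trash
actual_trash = (y_test == 5)
predicted_trash = (predicted_class == 5)

tp = np.sum(actual_trash & predicted_trash)    #trash correctly identified
tn = np.sum(~actual_trash & ~predicted_trash)  #non-trash correctly identified
fp = np.sum(~actual_trash & predicted_trash)   #non-trash mistaken for trash
fn = np.sum(actual_trash & ~predicted_trash)   #trash mistaken for non-trash
print(tp, tn, fp, fn)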
For trash, the overall prediction accuracy is 94.2%, representing the proportion of items correctly classified as either trash or non-trash. Looking at the false-positive rate, we also see that a very low percentage of non-trash items are mistakenly identified as trash. While accuracy is high and false positives are low, this is mostly because the majority of items are not trash in the first place.
However, the false-negative error rate is quite high, meaning trash items are often mistakenly classified as non-trash. For recycling purposes this is not ideal, since the cost of a false negative is high: to avoid contaminating to-be-recycled items, it matters less that non-trash items are mistakenly thrown out as trash, and much more that trash items are not mixed in with recyclables. Since there are fewer trash items in our dataset overall, we would therefore recommend further augmenting the dataset with additional trash images to improve the false-negative rate.
Generating new predictions
We’ve now seen how the model performs on the existing dataset. Let’s examine how it responds to new data. In this final step, we create a new function which takes in a new image, processes it, and feeds it to the trained CNN model.
def classify_image(my_image):
    #load custom image and re-size
    custom_image = image.load_img(my_image, target_size=(224, 224))

    #convert to array and rescale from -1 to 1
    img_array = image.img_to_array(custom_image)
    processed_img = keras.applications.xception.preprocess_input(img_array).astype(np.float32)

    #ensure consistent formatting
    swapped = np.moveaxis(processed_img, 0, 1)
    arr4d = np.expand_dims(swapped, 0)

    #generate a prediction and print output
    new_prediction = class_convert(np.argmax(model.predict(arr4d), axis=-1))
    print('Your item is: ', new_prediction[0])

#call function and generate prediction for new image
classify_image(test_image)
Here, we’ve uploaded our own photo of a somewhat crumpled Post-it note with some coloured printed text. We can see below that our model correctly classifies its material as paper. Success!
Final thoughts
The overall goal of this project was to examine how we can build an AI-driven model and apply it to the current waste crisis. By exploring the intuition behind Convolutional Neural Networks (CNNs) and the steps taken to build one, we were able to train a CNN to classify images of waste items from five recyclable material groups and distinguish them from non-recyclable trash.
Though classifying common waste items can seem like a trivial task, applications of CNNs such as this one can be powerful at scale. Since North America produces 14% of the world’s waste, CNN-driven algorithms could be scaled into industrial solutions for automated waste sorting and improve the effectiveness of modern recycling systems. If designed properly, such solutions have the potential to reduce human judgment-based errors, lower overall sorting costs, and redefine what it means to be green in the 21st century.
This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.