Computer Vision for Garbage Detection

Published in

Ramudroid

Oct 4, 2020

In recent years, the use of machine learning has surged as more computing power (GPUs) has become available for research and training. Deep learning is now helping us venture into problem areas with major environmental and ecological impacts. One such area of concern is garbage identification and classification: when garbage can be identified, it can be recycled efficiently, which helps the environment and the fight against climate change in the long run.

Robot Ramudroid aims to meet this challenge of identifying and picking up recyclable litter from roadsides, alleys, lanes, sidewalks and other urban outdoor places using clean solar energy.

Some experiments in automating garbage classification for the Ramudroid project are summarised as follows:

Haar cascades and HOG + SVM

The first and simplest approach was a sliding-window approach: a fixed-size window is moved across the image, the pixels outside the window are cropped away, and each smaller crop is then passed to a classifier. The main downside of this approach was that it was only useful for detecting a single object class with a fixed aspect ratio.
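A minimal sketch of the sliding-window idea in plain Python, assuming a grayscale image stored as a list of rows of pixel values. The `classify` callback is a hypothetical stand-in for the real classifier (for example a Haar cascade, or an SVM over HOG features), and `sliding_windows` and `detect` are illustrative names, not functions from any library. The window width and height are fixed, which is exactly the single-aspect-ratio limitation described above.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumption: a grayscale image stored as a list of rows of pixel values.
Image = Sequence[Sequence[float]]
Crop = List[List[float]]


def sliding_windows(
    image: Image, win_w: int, win_h: int, step: int
) -> Iterator[Tuple[int, int, Crop]]:
    """Yield (x, y, crop) for every fixed-size window that fits inside the image."""
    height, width = len(image), len(image[0])
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            # Keep only the pixels inside the current window.
            crop = [list(row[x:x + win_w]) for row in image[y:y + win_h]]
            yield x, y, crop


def detect(
    image: Image,
    classify: Callable[[Crop], float],  # hypothetical stand-in for a Haar or HOG + SVM classifier
    win_w: int,
    win_h: int,
    step: int,
    threshold: float = 0.5,
) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for every window the classifier scores above threshold."""
    boxes = []
    for x, y, crop in sliding_windows(image, win_w, win_h, step):
        if classify(crop) > threshold:
            # Every reported box has the same fixed size and aspect ratio.
            boxes.append((x, y, win_w, win_h))
    return boxes
```

For example, with a toy classifier that returns the mean brightness of a crop, a 2 x 2 window moved in steps of 2 over a 6 x 6 black image containing one bright 2 x 2 patch at (2, 2) reports a single box, `(2, 2, 2, 2)`. An object of a different shape would need a second pass with a differently sized window.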

Regions with CNN features (R-CNN): GoogLeNet / Inception / VGG Network

Here the sliding-window idea was dropped in favour of a model that proposes the locations of bounding boxes likely to contain an object.

AlexNet achieved 84.7% top-5 accuracy on ImageNet, and subsequent deeper models such as GoogLeNet, Inception and VGG improved on that further. Even so, these models were unsuitable for garbage classification, mainly because of their size and processing time, presumably a consequence of being built for a large number of categories. For example, GoogLeNet was trained on 1.2 million images for 1000-class object recognition.
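AlexNet's 84.7% figure comes from the ImageNet (ILSVRC-2012) challenge and is a top-5 accuracy: a prediction counts as correct if the true class appears among the model's five highest-scoring classes out of 1000. A short sketch of how top-k accuracy is computed from per-class scores; the function name and the toy scores used to exercise it are illustrative only.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)
```

With three samples over three classes, a model can score 1/3 at top-1 yet 2/3 at top-2, which is why a top-5 number over 1000 classes reads much higher than the corresponding top-1 number.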

These models detect generic objects, not litter, garbage or trash specifically.

ResNet

This achieved 95% accuracy on single-class detection.

However, this model, too, did not perform well on groups of objects.

Pre-trained ResNet50 model: https://www.kaggle.com/keras/resnet50

You Only Look Once (YOLO)

YOLO (v2 and v3) uses a single neural network to predict bounding boxes and class probabilities in one evaluation (hence "you only look once"), which makes it fast.

This approach divides the image into a grid of regions and predicts bounding boxes and class probabilities for each region. The bounding boxes are then weighted by the predicted probabilities.
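A rough sketch of two ideas from this paragraph, assuming a YOLOv2-style 416 x 416 input divided into a 13 x 13 grid: the grid cell that contains an object's centre is the one responsible for predicting it, and each box's class probabilities are weighted by its objectness (box confidence) score. The helper names and the class labels `"can"` and `"bottle"` are hypothetical, and a real YOLO head also predicts box offsets and anchor sizes, which are omitted here.

```python
from typing import Dict, Tuple


def responsible_cell(
    cx: float, cy: float, img_w: int, img_h: int, grid: int = 13
) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the object's centre (cx, cy)."""
    col = min(int(cx / img_w * grid), grid - 1)  # clamp centres on the right edge
    row = min(int(cy / img_h * grid), grid - 1)  # clamp centres on the bottom edge
    return row, col


def class_confidences(objectness: float, class_probs: Dict[str, float]) -> Dict[str, float]:
    """Weight each conditional class probability by the box's objectness score."""
    return {name: objectness * p for name, p in class_probs.items()}
```

For instance, an object centred at (200, 50) in a 416 x 416 image falls in cell (1, 6), and a box with objectness 0.8 and class probabilities `{"can": 0.5, "bottle": 0.25}` ends up with weighted scores of 0.4 and 0.2.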

It requires only one forward pass through the network to make predictions. After non-max suppression, which removes overlapping duplicate detections, it outputs the recognized objects together with their bounding boxes.
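Non-max suppression keeps the most confident box and discards other boxes that overlap it heavily, with overlap measured as intersection-over-union (IoU). Below is a minimal greedy version in plain Python, assuming boxes in `(x1, y1, x2, y2)` corner format and an IoU threshold of 0.5; both are illustrative choices, and real YOLO pipelines typically run NMS separately for each class.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

With boxes `(0, 0, 10, 10)`, `(1, 1, 11, 11)` and `(20, 20, 30, 30)` scored 0.9, 0.8 and 0.7, the second box overlaps the first with an IoU of about 0.68 and is suppressed, so the function returns `[0, 2]`.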

We achieved decent real-time performance; however, the model, which is better suited to autonomous self-driving cars, was not able to detect objects very well from the camera's low, ground-level viewpoint.

IBM Watson Vision

Cloud-based computer vision services such as IBM Watson Vision perform remarkably well, generating tags like "litter" even without any prior custom tagging or training on my part, as shown in the screenshot below.

Google Cloud Vision

However, on the flip side, the robot cannot depend entirely on paid cloud services because of the high volume of data it generates.
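A back-of-the-envelope sketch of why data volume matters when every frame goes to a paid API. Every number here is an assumption for illustration: uploading just one frame per second over an 8-hour day for 30 days already adds up to 864,000 requests a month. `price_per_1000` is a placeholder parameter, not a quoted price for IBM Watson or Google Cloud Vision.

```python
import math


def monthly_requests(frames_per_second: float, hours_per_day: float, days: int = 30) -> int:
    """Images sent to a cloud vision API if every captured frame is uploaded."""
    return math.ceil(frames_per_second * 3600 * hours_per_day * days)


def monthly_cost(requests: int, price_per_1000: float) -> float:
    """Bill for `requests` calls at a hypothetical per-1000-request price (not real pricing)."""
    return requests / 1000 * price_per_1000
```

Because the cost grows linearly with frame rate and operating hours, a robot that scans continuously pays for every frame, including the many that contain no litter at all.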

Given the unsatisfactory results of the approaches above, I moved on to self-trained models using a custom dataset collected from the neighbourhood, which includes various types of litter: plastic, metal cans and caps, wrappers, cardboard, paper and glass.
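A minimal sketch of preparing such a dataset for training, assuming (hypothetically) that images are stored as JPEG files in one folder per class, such as `dataset/plastic/` or `dataset/glass/`. It performs a stratified split so that each litter type is represented in both the training and validation sets; the folder names, the `.jpg` extension and the 20% validation fraction are all assumptions, not the project's actual layout.

```python
import random
from pathlib import Path
from typing import List, Tuple

# Hypothetical folder names, one per litter type mentioned in the text.
CLASSES = ["plastic", "metal", "wrapper", "cardboard", "paper", "glass"]

Sample = Tuple[Path, int]  # (image path, class index)


def stratified_split(
    root: Path, val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split root/<class>/*.jpg into train and validation lists, class by class."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    for label, name in enumerate(CLASSES):
        class_dir = root / name
        if not class_dir.is_dir():
            continue  # skip litter types with no images yet
        # Sort first so the shuffle does not depend on filesystem listing order.
        files = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        val.extend((f, label) for f in files[:n_val])
        train.extend((f, label) for f in files[n_val:])
    return train, val
```

Splitting per class rather than over the whole pool matters for a small, hand-collected dataset: a rare type such as glass could otherwise end up entirely in the training set, leaving no way to measure how well the model recognises it.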