Understanding PASCAL VOC Dataset
Object detection refers to the ability of computer systems to locate desired types of objects in an image or scene.
For object detection, the training data is usually stored in either XML or JSON files. Each representation has its pros and cons.
In this article, we will look at how one such dataset representation helps us with object detection.
We will discuss what the PASCAL VOC format is, the history behind it, and how we use it for object detection.
We will also build a simple dataset format validator using Python to verify if the dataset adheres to the rules of the PASCAL VOC format.
Prerequisites
To follow along, the reader must have the following:
- A good understanding of how to work with machine learning datasets.
- A decent understanding of object detection.
- Good knowledge of Python.
- A code editor of your choice.
Introduction
For a machine learning model to detect objects in an image, it must be trained with a dataset that holds information about every object present in each image.
Such a dataset is built using a process called annotation.
In the context of object detection, annotation helps us map an object to its respective label by drawing a rectangular box (called a bounding box) over the object.
Source: An example of annotation by becominghuman.ai
As you can see in the above image, we map the objects to their respective labels, such as `car`, `person`, `bicycle`, or `traffic light`.
Each object-label mapping is represented by a bounding box. A bounding box is stored as a series of coordinates or values that describe the position of an object in an image.
The representation of bounding boxes might vary according to the dataset.
Let's discuss more about bounding boxes in the upcoming sections.
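To make "a series of coordinates" concrete, here is a minimal sketch that converts a box between two common conventions: the corner form `[xmin, ymin, xmax, ymax]` and the `[x, y, width, height]` form used by some other datasets (COCO-style annotations, for example). The function names are hypothetical and the sample values are purely illustrative.

```python
def corners_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def xywh_to_corners(box):
    """Convert [x, y, width, height] back to [xmin, ymin, xmax, ymax]."""
    x, y, width, height = box
    return [x, y, x + width, y + height]


corner_box = [458, 710, 517, 785]       # illustrative values
xywh_box = corners_to_xywh(corner_box)
print(xywh_box)                         # [458, 710, 59, 75]
print(xywh_to_corners(xywh_box))        # [458, 710, 517, 785]
```

Both forms carry the same information, so a round trip returns the original box.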
PASCAL VOC
This dataset provides standardized images for object detection and segmentation problems.
These datasets are built using tools that follow standardized procedures for the evaluation and comparison of different methods.
In 2008, PASCAL VOC datasets were declared as the benchmark for object detection.
History behind PASCAL VOC
Pattern Analysis, Statistical Modelling, and Computational Learning (PASCAL) ran a series of challenges for object detection from 2005 to 2012 following a standardized file structure for holding these image annotations.
The PASCAL Visual Object Classes (VOC) challenge had two main components:
- A publicly available dataset with standardized evaluation software.
- An annual competition and a workshop.
The main objectives of this challenge were to find out the ability of models to perform:
- Classification: Check if an object is part of the image.
- Detection: Locate the position of the objects present in the image.
This series of challenges came to an end in 2012 with major enhancements and improvements to the dataset.
Now, PASCAL VOC provides standardized image datasets covering 20 object classes that are commonly used for tasks like object detection, semantic segmentation, and classification.
To understand more about PASCAL VOC, it is highly recommended to read this research paper.
PASCAL VOC taxonomy
Here is a sample of what the structure of the PASCAL VOC dataset looks like:
Source: Marmot dataset for table recognition
You can find the above sample dataset here.
As you can see in the above image, these object annotations are represented using the following fields:
folder
The name of the parent folder that the dataset is present in. This field helps us locate the annotated images within a directory.
Here, as you can see, the image file is present within a folder named `MARMOT_ANNOTATION`.
filename
The name of the image file that the annotations belong to. This field specifies a relative path of the annotated image file.
Here, the file we are working on is `10.1.1.1.2006_3.bmp`.
path
The absolute path where the image file is present.
Here, we have all the image files present under the absolute path `MARMOT_ANNOTATION/10.1.1.1.2006_3.bmp`.
source
Specifies the original location of the file in a database.
Since we do not use a database, it is set to `Unknown` by default.
size
Specifies the `width`, `height`, and `depth` of an image.
As you can see, the image is `793` pixels wide and `1123` pixels tall, with a depth of `3`.
The `depth` field holds the number of color channels, which is 3 for RGB images.
segmented
This field signifies if the images contain annotations that are non-linear (irregular) in shape - commonly referred to as polygons.
By default, the `segmented` value is set to `0` (linear shape).
object: name
This field specifies the name of the annotated label. Here, the label is `column`.
object: pose
Specifies the orientation of the object, for example `Frontal`, `Rear`, `Left`, or `Right`. By default, it is set to `Unspecified`, which means that no orientation was recorded.
object: truncated
Tells whether an object is fully visible (`0`) or only partially visible (`1`), for example because it extends beyond the edge of the image.
object: difficult
Tells if an object is difficult to recognize from an image (can be either 0 - easy or 1 - difficult).
object: bndbox
These are coordinates that determine the location of the object.
These coordinates are represented as `[xmin, ymin, xmax, ymax]`, where they correspond to the `(x, y)` coordinates of the top-left and bottom-right corners of an object.
Here, the values of the bounding box are `[458, 710, 517, 785]`.
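Putting the fields together, the following sketch builds an equivalent annotation with Python's standard `xml.etree.ElementTree` module, using the values quoted above. The values for `truncated` and `difficult` are not stated in this article, so `0` is an assumption, and the helper names `add_children` and `build_annotation` are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_children(parent, fields):
    """Append one child element per (tag, text) pair and return the parent."""
    for tag, text in fields.items():
        ET.SubElement(parent, tag).text = str(text)
    return parent


def build_annotation():
    """Assemble a single-object annotation from the fields described above."""
    root = ET.Element('annotation')
    add_children(root, {
        'folder': 'MARMOT_ANNOTATION',
        'filename': '10.1.1.1.2006_3.bmp',
        'path': 'MARMOT_ANNOTATION/10.1.1.1.2006_3.bmp',
    })
    add_children(ET.SubElement(root, 'source'), {'database': 'Unknown'})
    add_children(ET.SubElement(root, 'size'),
                 {'width': 793, 'height': 1123, 'depth': 3})
    ET.SubElement(root, 'segmented').text = '0'

    obj = ET.SubElement(root, 'object')
    add_children(obj, {
        'name': 'column',
        'pose': 'Unspecified',
        'truncated': 0,  # assumed value, not given in the article
        'difficult': 0,  # assumed value, not given in the article
    })
    add_children(ET.SubElement(obj, 'bndbox'),
                 {'xmin': 458, 'ymin': 710, 'xmax': 517, 'ymax': 785})
    return root


annotation = build_annotation()
print(ET.tostring(annotation, encoding='unicode'))
```

Printing the result shows the nesting at a glance: image-level fields sit directly under `annotation`, while each `object` carries its own label, flags, and `bndbox`.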
PASCAL VOC validator
Having understood the overall structure of the PASCAL VOC dataset, let's now dive into implementing a simple dataset validator using Python.
Import libraries
We will use 2 libraries for handling XML files:
- `xmltodict` - to work with XML files as we work with JSON files or dictionaries.
- `xml.etree` - used for parsing and creating XML data.
Import them as shown below:
```python
import xmltodict
import xml.etree.ElementTree as ET
```
Create object
Here, we will be reading the dataset file by parsing it with an XML parser as shown:
```python
dataset_file = r'/sample.xml'  # The path to the XML file

xml_tree = ET.parse(dataset_file)  # Parse the XML file
root = xml_tree.getroot()  # Find the root element
```
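The validation code later in this article relies on three `ElementTree` lookups: `find`, `findtext`, and `findall`. The sketch below shows how each behaves on a tiny, made-up annotation parsed from a string with `ET.fromstring`. Note in particular that `find` and `findtext` return `None` when the element is absent. The variable name `demo_root` is used only to avoid clashing with `root` above.

```python
import xml.etree.ElementTree as ET

# A tiny made-up annotation, used only to illustrate the lookups
sample = """
<annotation>
    <filename>example.bmp</filename>
    <size><width>100</width><height>50</height><depth>3</depth></size>
    <object><name>column</name></object>
    <object><name>table</name></object>
</annotation>
"""

demo_root = ET.fromstring(sample)

print(demo_root.tag)                     # annotation
print(demo_root.findtext('filename'))    # example.bmp
print(demo_root.findtext('size/width'))  # 100 (a path can descend into children)
print(len(demo_root.find('size')))       # 3 (number of child elements)
print(demo_root.find('folder'))          # None, because <folder> is absent

names = [obj.findtext('name') for obj in demo_root.findall('object')]
print(names)                             # ['column', 'table']
```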
Assertions
To verify the validity of a PASCAL VOC dataset, we will be using `assert` statements in Python.
In simple words, `assert` is used to debug code by testing for certain criteria. If the criteria are not met, it raises an `AssertionError`. We can also customize the error message that is raised.
To learn more about assertions in Python, it is recommended to read this article.
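As a quick illustration, an `assert` statement takes a condition and an optional message. When the condition is false, Python raises an `AssertionError` carrying that message. The function name `check_depth` below is hypothetical, and the set of accepted depths is an assumption made for the example.

```python
def check_depth(depth):
    """Raise AssertionError unless depth is 1 (grayscale) or 3 (RGB)."""
    assert depth in (1, 3), f"Unexpected image depth: {depth}"
    return depth


check_depth(3)  # Passes silently

try:
    check_depth(7)
except AssertionError as error:
    message = str(error)
    print(message)  # Unexpected image depth: 7
```

Keep in mind that assertions are skipped when Python runs with the `-O` flag, so they suit quick checks like the ones in this article better than production validation.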
Validation
It is highly recommended to learn by keeping the sample of the PASCAL VOC dataset open in a new tab or window.
You can find the sample dataset here.
```python
# The root element must be <annotation>; its 'verified' attribute is optional
assert root.tag == 'annotation', "XML file does not have 'annotation' as its root element"
# findtext() returns None for a missing element, so a truthiness check covers both missing and empty
assert root.findtext('folder'), "XML file does not contain a 'folder' element"
assert root.findtext('filename'), "XML file does not contain a 'filename' element"
assert root.findtext('path'), "XML file does not contain a 'path' element"
assert root.findtext('source/database'), "XML file does not contain a 'source' element with a 'database'"

size = root.find('size')
assert size is not None and len(size) == 3, "XML file does not contain a 'size' element with three children"
assert size.findtext('width') and size.findtext('height') and size.findtext('depth'), "XML file does not contain either 'width', 'height', or 'depth' element"

segmented = root.find('segmented')
assert segmented is not None and (segmented.text == '0' or len(segmented) > 0), "'segmented' element is neither 0 nor a list"
assert len(root.findall('object')) > 0, "XML file contains no 'object' element"  # At least one object is required
```
The code above does the following:
- Checks if the `root` is `annotation`. Having the `verified` attribute set to `yes` is optional.
- Checks if the dataset contains a `folder`, `filename`, `path`, and `source` by verifying that each one is non-empty.
- Checks that the `size` object contains `width`, `height`, and `depth`.
- Checks the `segmented` parameter. It must either contain a value of `0` or a list of child elements.
- Finally, it checks that at least one `object` is present.
A `segmented` list denotes that the object is not linear in shape. Therefore, the mask values for the polygon (non-linear) shape must be present to identify such objects. You can read more about this here.
Having covered all the meta-data about the image, let's move into validating each object.
Under the `annotation` key, there may be more than one object. Therefore, we loop through all the `object` keys.
```python
required_objects = ['name', 'pose', 'truncated', 'difficult', 'bndbox']  # Child elements expected in every object

for obj in root.findall('object'):
    # findtext() returns None when an element is missing, so test truthiness instead of len()
    assert obj.findtext(required_objects[0]), "Object does not contain a parameter 'name'"
    assert obj.findtext(required_objects[1]), "Object does not contain a parameter 'pose'"
    assert obj.findtext(required_objects[2]) in ('0', '1'), "Object's 'truncated' parameter is missing or is not 0 or 1"
    assert obj.findtext(required_objects[3]) in ('0', '1'), "Object's 'difficult' parameter is missing or is not 0 or 1"
    assert len(obj.findall(required_objects[4])) > 0, "Object does not contain a parameter 'bndbox'"
    for bbox in obj.findall(required_objects[4]):
        for coord in ('xmin', 'ymin', 'xmax', 'ymax'):
            # A missing or empty coordinate is treated as 0, which fails the check
            assert int(bbox.findtext(coord) or 0) > 0, f"'{coord}' value for the bounding box is missing or not positive"

print('The dataset format is PASCAL VOC!')
```
The above code does the following:
- Declares a list `required_objects` containing all possible meta-data keys that are present within the `object`.
- Loops through each `object` to check for the presence of keys in `required_objects`.
- The possible values for `truncated` and `difficult` are binary. Therefore, we check if the extracted value is either `0` or `1`.
If all the assertions pass, we can consider the dataset to be in PASCAL VOC format.
The above code snippets help us validate and point out errors if we have missed out on any required key.
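One limitation of chained `assert` statements is that the script stops at the first failure. As a sketch of an alternative, the hypothetical function `collect_voc_errors` below gathers every problem it finds into a list instead. It covers only a subset of the fields discussed above and assumes integer coordinates. It also verifies that `xmin < xmax` and `ymin < ymax`, which is an addition of this sketch rather than part of the validator above.

```python
import xml.etree.ElementTree as ET


def collect_voc_errors(root):
    """Return a list of human-readable problems found in a VOC annotation."""
    errors = []
    if root.tag != 'annotation':
        errors.append("root element is not 'annotation'")

    # Image-level fields that must be present and non-empty
    for path in ('folder', 'filename', 'path', 'source/database',
                 'size/width', 'size/height', 'size/depth'):
        if not root.findtext(path):
            errors.append(f"missing or empty '{path}'")

    objects = root.findall('object')
    if not objects:
        errors.append("no 'object' element found")

    for index, obj in enumerate(objects):
        for key in ('name', 'pose'):
            if not obj.findtext(key):
                errors.append(f"object {index}: missing or empty '{key}'")
        for key in ('truncated', 'difficult'):
            if obj.findtext(key) not in ('0', '1'):
                errors.append(f"object {index}: '{key}' must be 0 or 1")

        coords = {}
        for key in ('xmin', 'ymin', 'xmax', 'ymax'):
            text = obj.findtext(f'bndbox/{key}')
            if text is None or not text.strip().isdigit():
                errors.append(f"object {index}: missing or non-numeric '{key}'")
            else:
                coords[key] = int(text)
        # Only compare corners when all four coordinates were read
        if len(coords) == 4:
            if coords['xmin'] >= coords['xmax'] or coords['ymin'] >= coords['ymax']:
                errors.append(f"object {index}: bounding box has no area")
    return errors


# A deliberately incomplete, made-up annotation to exercise the function
broken = ET.fromstring(
    "<annotation><folder>data</folder><filename>a.bmp</filename>"
    "<object><name>column</name><pose>Unspecified</pose>"
    "<truncated>0</truncated><difficult>2</difficult>"
    "<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>5</xmax><ymax>40</ymax></bndbox>"
    "</object></annotation>"
)
errors = collect_voc_errors(broken)
for problem in errors:
    print(problem)
```

Returning a list makes it easy to report every issue in a file at once, or to run the check over a whole directory and summarize the results.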
Conclusion
The PASCAL VOC dataset is used for object detection and segmentation. Its representation as XML files helps us customize datasets easily while using a standardized format.
To summarize, the reader learned:
- How models learn to detect objects by training on annotated images.
- What PASCAL VOC is and how it originated.
- The different meta-data parameters required for PASCAL VOC dataset representation.
- Finally, the reader implemented a simple Python validation script to verify that a dataset follows the PASCAL VOC format.
You can find the source code here.
Peer Review Contributions by: Wanja Mike