Srishilesh P S




Understanding PASCAL VOC Dataset

Technical, Machine Learning | 5 min read

Object detection refers to the ability of computer systems to locate objects of the desired types in an image or scene.

For object detection, the training data is represented using either XML files or JSON files. Each representation has its pros and cons.

In this article, we will look at how one such dataset representation helps us with object detection.

We will discuss what the PASCAL VOC format is, the history behind it, and how we use it for object detection.

We will also build a simple dataset format validator using Python to verify if the dataset adheres to the rules of the PASCAL VOC format.



To follow along, the reader must have the following:

  • A good understanding of how to work with machine learning datasets.
  • A decent understanding of object detection.
  • Good knowledge of Python.
  • A code editor of your choice.


For a machine learning model to detect objects in an image, it must be trained with a dataset that holds information about all the objects present in each image.

The dataset that contains information about all objects present in an image is built using a process called Annotation.

In the context of object detection, annotation helps us map an object to its respective label by drawing a rectangular box (called bounding box) over the object.

Figure: An example of annotation

As you can see in the above image, we map the objects with their respective labels like a car, person, bicycle, or traffic light.

Each object-label mapping is represented with a rectangular box called a "bounding box". A bounding box is stored as a set of coordinates that describe the position of an object in an image.

The representation of bounding boxes might vary according to the dataset.

Let's discuss more about bounding boxes in the upcoming sections.


The PASCAL VOC dataset provides standardized images for object detection and segmentation problems.

These datasets are built using tools that follow standardized procedures for the evaluation and comparison of different methods.

In 2008, PASCAL VOC datasets were declared as the benchmark for object detection.

History behind PASCAL VOC

Pattern Analysis, Statistical Modelling, and Computational Learning (PASCAL) ran a series of challenges for object detection from 2005 to 2012 following a standardized file structure for holding these image annotations.

The PASCAL Visual Object Classes (VOC) challenge had two main components:

  1. A publicly available dataset with standardized evaluation software.
  2. An annual competition and a workshop.

The main objectives of this challenge were to find out the ability of models to perform:

  • Classification - Check if an object is part of the image.
  • Detection - Locate the position of the objects present in the image.

This series of challenges came to an end in 2012 with major enhancements and improvements to the dataset.

Now, PASCAL VOC provides standardized image datasets covering 20 object classes that are commonly used for tasks like object detection, semantic segmentation, and classification.

To understand more about PASCAL VOC, it is highly recommended to read this research paper.

PASCAL VOC taxonomy

Here is a sample of what the structure of the PASCAL VOC dataset looks like:


Source: Marmot dataset for table recognition

You can find the above sample dataset here.

As you can see in the above image, these object annotations are represented using the following fields:


folder

The name of the parent folder that the dataset is present in. This field helps us locate the annotated images within a directory.

Here, as you can see, the image file is present within a folder named MARMOT_ANNOTATION.


filename

The name of the image file that the annotations describe. This field specifies a relative path of the annotated image file.

Here, the file we are working on is


path

The absolute path where the image file is present.

Here, we have all the image files present under the absolute path MARMOT_ANNOTATION/


source

Specifies the original location of the file in a database.

Since we do not use a database, it is set to Unknown by default.


size

Specifies the width, height, and depth of an image.

As you can see, the image is 793 pixels wide and 1123 pixels tall, with a depth of 3.

The depth field is the number of color channels in the image, which is 3 for an RGB image.


segmented

This field signifies whether the image has segmentation annotations, that is, objects outlined by irregular (non-rectangular) shapes, commonly referred to as polygons.

By default, the segmented value is set to 0 (no segmentation; objects are described by rectangular boxes only).

object: name

This field specifies the name of the annotated label. Here, the label is column.

object: pose

Specifies the orientation of the object, for example Frontal, Rear, Left, or Right in the original VOC data. By default, it is set to Unspecified, which means that no particular orientation was recorded.

object: truncated

Tells whether an object is fully visible (0) or only partially visible (1).

object: difficult

Tells if an object is difficult to recognize from an image (can be either 0 - easy or 1 - difficult).

object: bndbox

These are coordinates that determine the location of the object.

These coordinates are represented as [xmin, ymin, xmax, ymax], where (xmin, ymin) is the top-left corner and (xmax, ymax) is the bottom-right corner of the box around the object.

Here, the values of bounding boxes are [458, 710, 517, 785].
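To make the fields above concrete, here is a minimal sketch that assembles an annotation from the values quoted in this section and reads it back with Python's standard library. The file name sample.jpg, the path value, and the 0 flags for truncated and difficult are placeholders assumed for illustration, since the text does not give them.

```python
import xml.etree.ElementTree as ET

# Sample annotation assembled from the values quoted above.
# 'sample.jpg', the <path> value, and the 0 flags for <truncated> and
# <difficult> are hypothetical placeholders, not taken from the Marmot dataset.
SAMPLE_XML = """
<annotation>
    <folder>MARMOT_ANNOTATION</folder>
    <filename>sample.jpg</filename>
    <path>MARMOT_ANNOTATION/sample.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>793</width>
        <height>1123</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>column</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>458</xmin>
            <ymin>710</ymin>
            <xmax>517</xmax>
            <ymax>785</ymax>
        </bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Read the bounding box of the first object as [xmin, ymin, xmax, ymax].
bndbox = root.find('object/bndbox')
box = [int(bndbox.findtext(tag)) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]

# The width and height of the box follow from its two corners.
box_width = box[2] - box[0]
box_height = box[3] - box[1]

print(box, box_width, box_height)  # [458, 710, 517, 785] 59 75
```

Storing the two corners rather than a width and height means the size of the box has to be derived, as the last few lines do.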

PASCAL VOC validator

Now that we understand the overall structure of the PASCAL VOC dataset, let's implement a simple dataset validator using Python.

Import libraries

We will use 2 libraries for handling XML files:

  1. xmltodict - to work with XML files as we work with JSON files or dictionaries.
  2. xml.etree - used for parsing and creating XML data.

Import them as shown below:

import xmltodict  # Third-party package (pip install xmltodict)
import xml.etree.ElementTree as ET  # XML parser from the standard library

Create object

Here, we will be reading the dataset file by parsing it with an XML parser as shown:

dataset_file = r'/sample.xml'  # The path to the XML file

xml_tree = ET.parse(dataset_file)  # Parse the XML file
root = xml_tree.getroot()  # Get the root element
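If the file is not well-formed XML, ET.parse raises ET.ParseError before any validation can run. The sketch below wraps the parsing step in a small helper that reports which file failed. The helper name load_annotation and the throwaway demo.xml file are assumptions made for this illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_annotation(path):
    """Parse an XML file and return its root element (hypothetical helper)."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError as error:
        # Malformed XML: re-raise with the file name for easier debugging.
        raise ValueError(f"{path} is not well-formed XML: {error}") from error


# Use a throwaway file so that the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.xml')
    with open(demo_path, 'w') as handle:
        handle.write('<annotation><folder>demo</folder></annotation>')
    demo_root = load_annotation(demo_path)

print(demo_root.tag)  # annotation
```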

To verify the validity of a PASCAL VOC dataset, we will use Python's assert statement.

Put simply, assert tests a condition while debugging. If the condition is false, it raises an AssertionError, and we can supply a custom message to be shown with the error.
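As a minimal illustration, the sketch below shows an assertion that passes and one that fails with a custom message. The function name check_positive is made up for this example.

```python
def check_positive(value):
    # Syntax: assert <condition>, <message>
    # The message is attached to the AssertionError raised on failure.
    assert value > 0, f"Expected a positive value, got {value}"
    return value


check_positive(5)  # The condition holds, so nothing happens

try:
    check_positive(-1)
except AssertionError as error:
    message = str(error)
    print(message)  # Expected a positive value, got -1
```

Keep in mind that Python skips assert statements when it is run with the -O flag, so they suit quick checks like the ones in this article better than production input validation.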

To learn more about assertions in Python, it is recommended to read this article.


It helps to keep the sample of the PASCAL VOC dataset open in a new tab or window while following along.

You can find the sample dataset here.

assert root.tag == 'annotation', "Root element of the XML file is not 'annotation'"  # Check if the root element is "annotation"
# findtext() returns None for a missing element and '' for an empty one; both fail the checks below
assert root.findtext('folder'), "XML file does not contain a 'folder' element"
assert root.findtext('filename'), "XML file does not contain a 'filename' element"
assert root.findtext('path'), "XML file does not contain a 'path' element"
source = root.find('source')
assert source is not None and len(source) == 1 and source.findtext('database'), "XML file does not contain a 'source' element with a 'database'"
size = root.find('size')
assert size is not None and len(size) == 3, "XML file does not contain a 'size' element with three children"
assert size.findtext('width') and size.findtext('height') and size.findtext('depth'), "XML file does not contain either 'width', 'height', or 'depth' element"
segmented = root.find('segmented')
assert segmented is not None and (segmented.text == '0' or len(segmented) > 0), "'segmented' element is neither 0 nor a list"
assert len(root.findall('object')) > 0, "XML file contains no 'object' element"  # Check that the root contains at least one 'object'

The code above does the following:

  • Checks if the root is annotation. Having the verified attribute set to yes is optional.
  • Checks that the dataset contains a non-empty folder, filename, path, and source.
  • Checks that the size object contains width, height, and depth.
  • Finally, it checks the segmented parameter. It must either have the value 0 or contain a list of child elements.

A segmented list denotes that the object has an irregular (non-rectangular) shape. Therefore, the mask values for the polygon shape must be present to identify such objects. You can read more about this here.

Having covered all the meta-data about the image, let's move on to validating each object.

Under the annotation key, there may be more than one object. Therefore, we loop through all the object keys.

required_objects = ['name', 'pose', 'truncated', 'difficult', 'bndbox']  # All possible meta-data about an object

for obj in root.findall('object'):
    # 'name' and 'pose' must be present and non-empty
    assert obj.findtext(required_objects[0]), "Object does not contain a parameter 'name'"
    assert obj.findtext(required_objects[1]), "Object does not contain a parameter 'pose'"
    # 'truncated' and 'difficult' are binary flags; a missing element falls back to -1 and fails
    assert int(obj.findtext(required_objects[2]) or -1) in [0, 1], "Object's 'truncated' parameter is missing or is not 0 or 1"
    assert int(obj.findtext(required_objects[3]) or -1) in [0, 1], "Object's 'difficult' parameter is missing or is not 0 or 1"
    assert len(obj.findall(required_objects[4])) > 0, "Object does not contain a parameter 'bndbox'"
    for bbox in obj.findall(required_objects[4]):
        # A missing or empty coordinate falls back to 0 and fails the check
        assert int(bbox.findtext('xmin') or 0) > 0, "'xmin' value for the bounding box is missing"
        assert int(bbox.findtext('ymin') or 0) > 0, "'ymin' value for the bounding box is missing"
        assert int(bbox.findtext('xmax') or 0) > 0, "'xmax' value for the bounding box is missing"
        assert int(bbox.findtext('ymax') or 0) > 0, "'ymax' value for the bounding box is missing"

print('The dataset format is PASCAL VOC!')

The above code does the following:

  • Declares a list required_objects containing all possible meta-data keys that are present within the object.
  • Loops through each object to check for the presence of keys in required_objects.
  • The possible values for truncated and difficult are binary. Therefore, we check if the extracted value is either 0 or 1.

If all the assertions pass, we can consider the dataset to be in PASCAL VOC format.

The above code snippets help us validate and point out errors if we have missed out on any required key.
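Because an assertion stops at the first failure, it reports only one missing key per run. As an alternative sketch, the same checks can be collected into a list so that every problem is reported at once. The function name find_voc_errors and the wording of the messages are assumptions for this illustration, and the segmented check is left out to keep it short.

```python
import xml.etree.ElementTree as ET


def find_voc_errors(root):
    """Return a list of problems found in a parsed annotation (hypothetical helper)."""
    errors = []
    if root.tag != 'annotation':
        errors.append("root element is not 'annotation'")
    # Image-level elements that must exist and have non-empty text.
    for path in ('folder', 'filename', 'path', 'source/database',
                 'size/width', 'size/height', 'size/depth'):
        if not root.findtext(path):
            errors.append(f"missing or empty '{path}'")
    objects = root.findall('object')
    if not objects:
        errors.append("no 'object' element")
    for index, obj in enumerate(objects):
        for tag in ('name', 'pose'):
            if not obj.findtext(tag):
                errors.append(f"object {index}: missing or empty '{tag}'")
        for tag in ('truncated', 'difficult'):
            if obj.findtext(tag) not in ('0', '1'):
                errors.append(f"object {index}: '{tag}' must be 0 or 1")
        for tag in ('xmin', 'ymin', 'xmax', 'ymax'):
            value = (obj.findtext(f'bndbox/{tag}') or '').strip()
            # Coordinates must be positive integers, as in the assertions above.
            if not value.isdigit() or int(value) <= 0:
                errors.append(f"object {index}: invalid 'bndbox/{tag}'")
    return errors


# A deliberately incomplete annotation to show the reporting.
broken = ET.fromstring(
    '<annotation><folder>demo</folder>'
    '<object><name>column</name></object></annotation>'
)
for problem in find_voc_errors(broken):
    print(problem)
```

An empty list means that none of these checks failed, which mirrors the case where every assertion passes.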


The PASCAL VOC dataset is used for object detection and segmentation. Its representation as XML files helps us customize datasets easily while using a standardized format for representation.

To summarize, the reader learned:

  • How annotations are used to train models that detect objects.
  • What PASCAL VOC is and how it originated.
  • The different meta-data parameters required for PASCAL VOC dataset representation.
  • Finally, the reader implemented a simple Python validation script to verify that a dataset follows the PASCAL VOC format.

You can find the source code here.


Peer Review Contributions by: Wanja Mike