Advanced Deep Learning with PyTorch

In this advanced exercise, the instructions are intentionally vaguer than usual: you will need to research much of it on your own, which is part of the challenge and the learning.

Why do this task:  Beginner tutorials on machine learning and neural networks usually start with classifying hand-written digits from the MNIST dataset or images from CIFAR-10. We are going to start with something more challenging, where much of the work is in handling data, labels, and file formats. This simulates how projects tend to go in the real world, and the hope is that you will learn to build machine learning models more effectively and quickly as a result. Working through the following is worthwhile because:

  • It forces you to read and learn from scratch. You will encounter different label file formats and deserializers, and see how the pieces fit together.
  • In industries such as energy and manufacturing, for example, you will get raw .png, .jpg, or .tiff files (or even video), not data already in a perfect format.
  • It introduces the concept of “data packing”.
  • It is not the simplest path, but it forces deeper learning.

Working with PyTorch on More Complex Data

See the Setup section for instructions on preparing your environment.

In the real world, data is rarely so uniform, and raw pixels are often unsuitable as features; this has motivated a large literature on feature extraction methods for image data.

Image Classification

Start with raw video of fish swimming past a video trap in the Northern Territory of Australia.

Sample image:

Lutjanus johnii at a video trap

  1. Download the video sample from here: https://github.com/Azadehkhojandi/FindingFishwithTensorflowObjectdetectionAPI/blob/master/fish_detection/fishdata/Videos/video1.mp4
  2. Split the video into individual frames, then sort the frames into ‘fish’ and ‘no fish’ folders, either manually (time-consuming) or with an automated first pass (using pre-built APIs).

An automated initial pass could be done with the Microsoft Computer Vision API or even Custom Vision (both have free tiers). An example of using the Computer Vision API for this task may be found in this script on GitHub (with good instructions on getting access to the API in the Readme).

This API may not perform well on the raw frames, however; can you see why? How could the frames be transformed? Now let’s create our own model.

  1. Create your own classifier in PyTorch to classify frames as fish/no fish, using the parsed data now sorted into folders.
  2. Use the transforms module from torchvision and other libraries to:
    • Try some data augmentation (e.g. random vertical flips and blurring the images) as well as the more standard “good idea” of normalization.
  3. Make sure you also create an example of running inference.
  4. Use scikit-learn’s confusion_matrix and classification_report to generate metrics.
    1. Scikit-learn’s confusion matrix
    2. Scikit-learn’s classification report

Object Detection

Use an out-of-the-box Faster R-CNN or YOLOv3 solution to identify fish in frames. For this you will need to label the frames with bounding boxes; good tools are VoTT and the VGG Image Annotator.

Additional Help

  • PyTorch forums - Ref
  • Stack Overflow with the pytorch tag

Credit

Thank you to David Crook for providing the initial wording for the exercise intro.