First Custom ML: Classical and First Neural Net

It is recommended that you have completed the Level 1 Preparation.

In this first set of practice problems you’ll learn about basic ML and neural networks, hands-on, with Jupyter notebooks and Python for image classification. You’ll be introduced to scikit-learn and PyTorch as Python packages commonly used in data manipulation and data science.

This section will end on simple object detection with classical methods.

Here and throughout these practice exercises you’ll work with the following image datasets: COCO, Fashion MNIST, the Hymenoptera insect dataset, and a few custom ones you create.

Be sure to observe your metrics in the process (accuracy, precision, recall, etc.).

First Custom ML (Open Source Tools)

For these two problems, it is recommended to go through the code from the original source line by line in whatever fashion you see fit so that you really understand what is going on.

TIPS: Place all imports at the top of the notebook. Name the training data consistently throughout all of your work (X_train -> training data, y_train -> labels, X_test -> test data…).

Image Classification with Classical ML

[Figure: Fashion MNIST dataset sample]

Create a Python program to classify images from the Fashion MNIST dataset, leveraging code samples from the Python Data Science Handbook.

Refer to Chapters 2 and 3 of the Python Data Science Handbook for information on data manipulation in Python if you are not already familiar with it.

Do this in a Jupyter notebook (any service or locally) - recall you learned about this tool in the Setup section.

Steps:

Visualize a sample of 50-100 images with labels
Try fitting a Gaussian naive Bayes model. How do the results compare to those found in the Handbook for the MNIST Digits dataset (a grayscale 8x8 pixel dataset of handwritten digits)?
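The model-fitting step above can be sketched as follows. This is a minimal sketch, not the Handbook's exact code: to stay runnable without a download, it uses the 8x8 digits bundled with scikit-learn as stand-in data. For the exercise you would swap in flattened Fashion MNIST images (one row of 784 pixel values per image) and their integer labels.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data (assumption): the 8x8 digits bundled with scikit-learn.
# Replace X and y with flattened Fashion MNIST images and labels.
digits = load_digits()
X, y = digits.data, digits.target

# Hold out part of the data so accuracy is measured on unseen samples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Gaussian naive Bayes accuracy: {accuracy:.3f}")
```

The same fit/predict pattern applies to any scikit-learn classifier, which makes it easy to compare models later.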

Additionally:

Which fashion item has the best accuracy, which the worst? Use a confusion matrix. Why do you think that is? Is there a way you could imagine improving this model?
Normalize the images (in sklearn) and check the accuracy of the model(s) again. Did it improve or worsen?
Try a different model - SVM or Random Forest
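One way to approach the best/worst item question is to read per-class accuracy off the confusion matrix. The sketch below builds the matrix with NumPy only; the labels are hypothetical values for a 3-class problem, and the helper names are illustrative, not from the Handbook. With Fashion MNIST you would pass your own y_test and predictions and use n_classes=10.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly (per-class recall)."""
    row_totals = cm.sum(axis=1)
    return np.diag(cm) / np.maximum(row_totals, 1)

# Hypothetical labels, for illustration only
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2, 2])

cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)
acc = per_class_accuracy(cm)
best, worst = int(np.argmax(acc)), int(np.argmin(acc))
print(cm)
print("per-class accuracy:", acc, "best:", best, "worst:", worst)
```

The off-diagonal entries in a row show which classes an item is most often mistaken for, which is usually the starting point for explaining a low score.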

Image Classification with Basic Neural Nets

The purpose of the Basic Neural Nets exercises is to familiarize you with how a simple artificial neuron works, from the ground up. This knowledge will serve you well. See Level 1 Preparation for more information.

Adapt a from-scratch Perceptron as in this Jupyter notebook to train and test on the Fashion MNIST dataset.
- Does the model converge or not (plot the training and validation error)?
Adapt a from-scratch Multilayer Perceptron (MLP) as in this Jupyter notebook
- Try it again with the scikit-learn MLP class.
- Does the model converge now? What accuracy does the model achieve?
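For orientation, a from-scratch perceptron can look roughly like the sketch below. It is not the code from the linked notebook: the class, its parameters, and the toy two-blob data are assumptions made for illustration. It is also binary only, so for Fashion MNIST you would either restrict to two classes or train one perceptron per class (one-vs-rest).

```python
import numpy as np

class Perceptron:
    """Binary perceptron with labels in {0, 1} and the classic update rule."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, X):
        # Step activation on the weighted sum
        return (X @ self.w + self.b > 0).astype(int)

    def fit(self, X, y, epochs=20):
        """Train sample by sample; return the number of mistakes per epoch."""
        errors_per_epoch = []
        for _ in range(epochs):
            errors = 0
            for xi, yi in zip(X, y):
                update = self.lr * (yi - self.predict(xi))
                if update != 0:
                    self.w += update * xi
                    self.b += update
                    errors += 1
            errors_per_epoch.append(errors)
        return errors_per_epoch

# Toy, linearly separable data (assumption): two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = Perceptron(n_features=2)
history = model.fit(X, y)
train_accuracy = (model.predict(X) == y).mean()
print("mistakes per epoch:", history)
print("training accuracy:", train_accuracy)
```

Plotting the returned mistakes per epoch is a simple way to check convergence: on separable data the count drops to zero, while on data that is not linearly separable it keeps oscillating.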

Object Detection with Histogram of Oriented Gradients

Create a Python program to detect bear faces (perhaps you’re building a bear watch app for safety in the woods) by leveraging code samples from this Python Data Science Handbook notebook.

[Figure: bear face with its HOG representation]

Collect 50-100 images of bear faces from the web and square-pad them. Then resize them all to the same shape (228x228, for example). Note that, as in the code sample, the final training image array will have shape (100, 228, 228) if 100 samples are collected. These constitute the “positive” training samples.

An example of the image pre-processing (padding is up to you):

import glob

import numpy as np
from PIL import Image

data_array = []

# Get image files
img_files = glob.glob('../../data/bears_pad/*.*')

for img in img_files:
    im = Image.open(img)
    # Resize to uniform size
    im = im.resize((228, 228))
    # Convert to single-channel grayscale (this also drops any alpha channel)
    im = im.convert('L')
    im = np.asarray(im)
    data_array.append(im)

# Convert collection to numpy array of shape (n_images, 228, 228)
positive_patches = np.asarray(data_array)
positive_patches.shape
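The padding step is left open above. One possible approach, sketched here with NumPy only, is to center each grayscale image on a square canvas with a constant border before resizing. The function name and the choice of a black (0) fill are assumptions for illustration.

```python
import numpy as np

def square_pad(image, fill_value=0):
    """Center a 2-D grayscale array on a square canvas filled with fill_value."""
    height, width = image.shape
    size = max(height, width)
    pad_top = (size - height) // 2
    pad_left = (size - width) // 2
    padded = np.full((size, size), fill_value, dtype=image.dtype)
    padded[pad_top:pad_top + height, pad_left:pad_left + width] = image
    return padded

# A 3x5 image becomes 5x5 with one empty row above and below
example = np.ones((3, 5), dtype=np.uint8)
padded_example = square_pad(example)
print(padded_example.shape)
```

Padding before resizing keeps the aspect ratio of the face intact, whereas resizing a non-square image directly to 228x228 would stretch it.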

The rest of the steps are outlined as follows (as described in the Handbook):

Obtain a set of image thumbnails of non-faces to constitute “negative” training samples.
Extract HOG features from these training samples.
Train a linear SVM classifier on these samples.
For an “unknown” image, pass a sliding window across the image, using the model to evaluate whether that window contains a face or not.
If detections overlap, combine them into a single window.
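The sliding-window step can be sketched as a small generator. This is a simplified variant written for illustration and is not the Handbook's exact function: it omits rescaling, and the parameter names are assumptions. In the full pipeline each yielded patch would be turned into HOG features and passed to the trained classifier.

```python
import numpy as np

def sliding_window(image, patch_size, step=2):
    """Yield ((row, col), patch) for every patch_size window that fits."""
    patch_h, patch_w = patch_size
    for row in range(0, image.shape[0] - patch_h + 1, step):
        for col in range(0, image.shape[1] - patch_w + 1, step):
            yield (row, col), image[row:row + patch_h, col:col + patch_w]

# Toy 10x12 "image" standing in for a grayscale test image
image = np.arange(10 * 12).reshape(10, 12)
windows = list(sliding_window(image, patch_size=(4, 4), step=2))
print(len(windows), "windows; last top-left corner:", windows[-1][0])
```

A smaller step gives finer localization at the cost of more windows to classify, so it is worth starting with a coarse step while experimenting.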

Additionally:

What other confounding factors, besides illumination, do you think there are for images?
Plot the original image along with the skimage.color.rgb2gray version and the HOG representation. See how this works in matplotlib. What does skimage.color.rgb2gray actually do?
Try out the model on the entire test image. What do you find out?
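As a hint for the grayscale question, the conversion is a weighted sum of the color channels. The sketch below reproduces that idea with NumPy, using the luminance weights given in the scikit-image documentation for rgb2gray. It assumes a float RGB image with values in [0, 1]; the function name is illustrative.

```python
import numpy as np

# Luminance weights for R, G, B as listed in the scikit-image rgb2gray docs
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def to_gray(rgb):
    """Weighted sum over the last axis of an (..., 3) float RGB array."""
    return rgb[..., :3] @ WEIGHTS

# Tiny 1x3 test image: pure red, pure green, white
pixels = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]])
gray = to_gray(pixels)
print(gray)
```

Green contributes most to the result because the weights approximate how bright each color appears to the human eye.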

A cursory result, after varying window sizes, might look like this:

[Figure: model prediction]

Try using sliding windows with a variety of sizes (and aspect ratios). What do you find out?
Augment the data to expand the training and test datasets (e.g. use a library like imgaug to left-right flip, blur, contrast normalize, etc.) and retrain and test. How does the performance change and why is that?
Extra credit: Implement Non-Maximum Suppression in Python to find the single best bounding box of a group of bounding boxes as are found above. Apply this to the test image.
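As a starting point for the extra credit, here is a minimal sketch of greedy Non-Maximum Suppression in NumPy. It assumes boxes are given as [x1, y1, x2, y2] corners and that each box has a confidence score (for a linear SVM, the decision function value is one option). The example boxes, scores, and threshold are made up for illustration.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes is (n, 4) as [x1, y1, x2, y2]; returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep

# Hypothetical detections: two overlapping boxes and one separate box
boxes = [[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print("kept box indices:", kept)
```

Lowering iou_threshold merges detections more aggressively, which helps when many near-duplicate windows fire on the same face.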

Additional Help

StackOverflow, with the sklearn and jupyter tags
For Custom Vision you can email customvisionteam@microsoft.com.
