Level 1 Practice
First Custom ML: Classical and First Neural Net
It is recommended that you have completed the Level 1 Preparation.
In this first set of practice problems you’ll learn about basic ML and neural networks, hands-on, with Jupyter notebooks and Python for image classification. You’ll be introduced to scikit-learn and PyTorch, two Python packages commonly used in data manipulation and data science. The section ends with simple object detection using classical methods.
Here and throughout these practice exercises you’ll work with the following image datasets: COCO, Fashion MNIST, the Hymenoptera insect dataset, and a few custom ones you create.
Be sure to observe your metrics in the process (accuracy, precision, recall, etc.).
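For example, scikit-learn can summarize several of these metrics in one call. A minimal sketch (the labels here are hypothetical, only to show the report format):
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels, just to illustrate the report
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Per-class precision, recall, and F1 score, plus overall accuracy
print(classification_report(y_true, y_pred))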
First Custom ML (Open Source Tools)
For these two problems, it is recommended that you go through the code from the original source line by line so that you really understand what is going on.
TIPS: Place all imports at the top of the notebook. Name the training data consistently throughout all of your work (X_train -> training data, y_train -> labels, X_test -> test data, …).
Image Classification with Classical ML
Create a Python program to classify images from the Fashion MNIST dataset (get here), leveraging code samples from the Python Data Science Handbook - Ref.
Refer to Chapters 2 and 3 of the Python Data Science Handbook for information on data manipulation in Python if you are not already familiar with it.
Do this in a Jupyter notebook (any service or locally) - recall you learned about this tool in the Setup section.
Steps:
- Visualize a sample of 50-100 images with labels
- Try fitting a Gaussian naive Bayes model (see the sketch after this list). How does it compare to the results found in the Handbook for the MNIST Digits dataset (a black-and-white, 8x8-pixel dataset of handwritten digits)?
Additionally:
- Which fashion item has the best accuracy, which the worst? Use a confusion matrix. Why do you think that is? Is there a way you could imagine improving this model?
- Normalize the images (in sklearn) and check the accuracy of the model(s) again. Did it improve or worsen?
- Try a different model - SVM or Random Forest
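A minimal sketch of these steps, assuming you fetch Fashion MNIST from OpenML (any other copy of the dataset works just as well) and following the X_train/y_train naming convention from the tips above:
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

# Fashion MNIST: 70,000 28x28 grayscale images, flattened to 784 features
X, y = fetch_openml('Fashion-MNIST', version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a Gaussian naive Bayes model on the raw pixel values
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))

# Rows are true classes, columns are predictions - the off-diagonal
# entries show which fashion items get confused with which
print(confusion_matrix(y_test, y_pred))

# Normalize the images with sklearn and check the accuracy again
scaler = MinMaxScaler().fit(X_train)
model_norm = GaussianNB().fit(scaler.transform(X_train), y_train)
print('Accuracy (normalized):',
      accuracy_score(y_test, model_norm.predict(scaler.transform(X_test))))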
Image Classification with Basic Neural Nets
The purpose of the Basic Neural Nets exercises is to familiarize you with how a simple artificial neuron works, from the ground up - this knowledge will serve you well. See Level 1 Preparation for more information.
- Adapt a from-scratch Perceptron as in this Jupyter notebook to train and test on the Fashion MNIST dataset.
- Does the model converge or not (plot the training and validation error)?
- Adapt a from-scratch Multilayer Perceptron (MLP) as in this Jupyter notebook.
- Try it again with the scikit-learn MLP class (see the sketch after this list).
- Does the model converge now? What accuracy does the model achieve?
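For the scikit-learn MLP step, a minimal sketch using MLPClassifier, assuming the X_train/X_test Fashion MNIST split from the classical-ML sketch above:
from sklearn.neural_network import MLPClassifier

# One hidden layer of 100 units; verbose=True prints the loss each iteration
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=50,
                    random_state=0, verbose=True)
mlp.fit(X_train / 255.0, y_train)  # scale raw pixels to [0, 1]
print('Test accuracy:', mlp.score(X_test / 255.0, y_test))

# mlp.loss_curve_ holds the training loss per iteration - plot it with
# matplotlib (plt.plot(mlp.loss_curve_)) to judge whether training converged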
Object Detection with Histogram of Oriented Gradients
Create a Python program to detect bear faces (perhaps you’re building a bear-watch app for safety in the woods) by leveraging code samples from this Python Data Science Handbook notebook.
- Collect 50-100 images of bear faces from the web and square-pad them. In addition, resize them all to the same shape (228x228, for example). Observe that, in the code sample, the shape of the final image data for training will be (100, 228, 228) if 100 samples are collected. These constitute the “positive” training samples.
An example of the image pre-processing (padding is up to you):
import glob

import numpy as np
from PIL import Image

data_array = []
# Get image files
img_files = glob.glob('../../data/bears_pad/*.*')
for img in img_files:
    im = Image.open(img)
    # Resize to a uniform size
    im = im.resize((228, 228))
    # Convert to grayscale (this also drops any alpha channel)
    im = im.convert('L')
    im = np.asarray(im)
    data_array.append(im)

# Convert the collection to a numpy array of shape (num_images, 228, 228)
positive_patches = np.asarray(data_array)
positive_patches.shape
The rest of the steps are outlined as follows (as described in the Handbook); a code sketch of steps 2-4 follows the list:
- Obtain a set of image thumbnails of non-faces to constitute “negative” training samples.
- Extract HOG features from these training samples.
- Train a linear SVM classifier on these samples.
- For an “unknown” image, pass a sliding window across the image, using the model to evaluate whether that window contains a face or not.
- If detections overlap, combine them into a single window.
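A minimal sketch of steps 2-4, assuming positive_patches from the preprocessing code above plus a hypothetical negative_patches array of equally sized grayscale non-face thumbnails:
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Extract HOG features from the positive and negative training samples
samples = np.concatenate([positive_patches, negative_patches])
X = np.array([hog(im) for im in samples])
y = np.concatenate([np.ones(len(positive_patches)),
                    np.zeros(len(negative_patches))])

# Train a linear SVM classifier on the HOG features
svm = LinearSVC().fit(X, y)

# Slide a fixed 228x228 window across a grayscale test image;
# a prediction of 1 means the model thinks that window holds a bear face
def sliding_window(img, size=(228, 228), step=32):
    for i in range(0, img.shape[0] - size[0], step):
        for j in range(0, img.shape[1] - size[1], step):
            yield (i, j), img[i:i + size[0], j:j + size[1]]

# positions, patches = zip(*sliding_window(test_image))
# labels = svm.predict([hog(p) for p in patches])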
Additionally:
- What other confounding factors are there for images, other than illumination, do you think?
- Plot the original image along with the skimage.rgb2gray version and the HOG representation. See how this works in matplotlib. What does skimage.rgb2gray actually do?
- Try out the model on the entire test image. What do you find out?
A cursory result might be (after varying window sizes): [example detection figure omitted]
- Try using sliding windows with a variety of sizes (and aspect ratios). What do you find out?
- Augment the data to expand the training and test datasets (e.g. use a library like imgaug to left-right flip, blur, contrast normalize, etc.) and retrain and test. How does the performance change and why is that? (See the augmentation sketch after this list.)
- Extra credit: Implement Non-Maximum Suppression in Python to find the single best bounding box of a group of bounding boxes as are found above. Apply this to the test image. (See the NMS sketch after this list.)
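For the augmentation bullet, a minimal sketch with imgaug, assuming positive_patches is the uint8 numpy array built earlier:
import imgaug.augmenters as iaa

# A small pipeline of the suggested augmentations
seq = iaa.Sequential([
    iaa.Fliplr(0.5),                   # left-right flip half of the images
    iaa.GaussianBlur(sigma=(0, 1.0)),  # mild random blur
    iaa.LinearContrast((0.75, 1.5)),   # contrast normalization / jitter
])

# Produce an augmented copy of the dataset to append to the original
augmented_patches = seq(images=positive_patches)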
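And for the extra credit, a from-scratch sketch of greedy Non-Maximum Suppression; boxes and scores are hypothetical arrays of (x1, y1, x2, y2) corners and classifier confidences:
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    boxes = np.asarray(boxes, dtype=float)
    # Visit boxes in order of descending confidence
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        # Intersection over union with the kept box
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Drop everything that overlaps the kept box too heavily
        order = order[1:][iou < iou_threshold]
    return keep  # indices of the surviving boxes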
Additional Help
- StackOverflow with sklearn, jupyter
- For Custom Vision you can email customvisionteam@microsoft.com.