# Image Classification with Machine Learning

The following is based on the tutorials by Daniel van Strien et al., “Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification,” Programming Historian, August 17, 2022, [Part 1](https://programminghistorian.org/en/lessons/computer-vision-deep-learning-pt1) and [Part 2](https://programminghistorian.org/en/lessons/computer-vision-deep-learning-pt2), as well as the [Google Colab notebook created for these tutorials](https://colab.research.google.com/github/programminghistorian/jekyll/blob/gh-pages/assets/computer-vision-deep-learning-pt1/computer-vision-deep-learning-pt1-2.ipynb). These tutorials describe the process of training the machine learning model in much more detail.

The goal of this machine-learning model is to be able to categorize scans from historical newspapers as having text only or also illustrations.

The data comes from the dataset "[Newspaper Navigator](https://perma.cc/8U7H-9NUS)," which adds labels to the Library of Congress’ [Chronicling America collection](https://perma.cc/P98H-P3WS). The subset used here consists of 752 images of advertisements from these newspapers, dated 1880–1885. Each scan is labeled either "text-only" or "illustrations" to indicate whether it contains illustrations.

## 1. Download data

The first step is to download the data you will need for training the model. The following cell uses the command line to create folders (the `mkdir` command), download materials (with `wget`), and uncompress the zip files (`unzip`).

If you open the folder icon to the left in Google Colab, you will see these files (but the download will take a while).

In [None]:
# @title Download data
%%capture
!mkdir ads_data/ -p
!wget https://zenodo.org/record/5838410/files/ads_upsampled.csv?download=1 -O ads_data/ads_upsampled.csv
!mkdir ads_data/images/ -p
!wget -O images.zip https://zenodo.org/record/5838410/files/images.zip?download=1
!unzip images.zip -d ads_data/images/
!mkdir trial-images/ -p
!wget -O trial-images/biscuits.jpg https://c8.alamy.com/comp/D42A8B/original-1920s-vintage-print-advertisement-from-english-country-gentlemans-D42A8B.jpg
!wget -O trial-images/coca-cola.jpg https://i.pinimg.com/736x/17/9c/33/179c33794b084ff441431f2ad936f19e--coca-cola-vintage-vintage-signs.jpg
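
If you would rather stay in Python than use shell commands, the same three steps (make a folder, download a file, unzip it) can be sketched with the standard library. This is only an illustration and not part of the original notebook; the helper names `download` and `extract_zip` are made up here.

```python
import urllib.request
import zipfile
from pathlib import Path


def download(url, dest):
    """Download `url` to `dest`, creating parent folders first (like `mkdir -p` plus `wget -O`)."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_zip(zip_path, dest_dir):
    """Unpack a zip archive into `dest_dir` (like `unzip archive.zip -d dest_dir`)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Return the extracted file names so you can check that something arrived.
    return sorted(p.name for p in dest_dir.rglob("*") if p.is_file())
```

In this notebook, calling `download` with the `images.zip` URL from the cell above and then `extract_zip("images.zip", "ads_data/images/")` would mirror the `wget` and `unzip` lines.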

## 2. Install fastai

Next, you'll install fastai, one of the Python libraries for working with machine learning models.

In [None]:
!pip install fastai --upgrade

## 3. Create an image classifier in fastai

First, bring in the fastai modules having to do with computer vision.

In [None]:
from fastai.vision.all import *

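Next, to be able to use some graphs and visualizations in Python, we need `matplotlib`. The style named `seaborn` was deprecated in matplotlib 3.6 and later removed in favor of `seaborn-v0_8`, so the cell below picks whichever name your version provides.

In [None]:
# @title Add `matplotlib`
%matplotlib inline
import matplotlib.pyplot as plt
# Use the renamed style on newer matplotlib, falling back to the old name on older versions
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')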

### Load the Data


The data we'll use to train the model consists of images (in the `ads_data/images` folder) and a spreadsheet (`ads_data/ads_upsampled.csv`).

The spreadsheet has 3 columns, for example:

| id | file | label |
| - | - | - |
| 0 | pst_fenske_ver02_data_sn84026497_00280776129_1880042101_0834_002_6_96.jpg | text-only |

Here, the JPG image has been labeled as "text-only".
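
Before training, it is worth checking how many rows carry each label, because a model trained on very lopsided data can look accurate while mostly guessing the majority label. The following is a minimal sketch using only the standard library. It assumes the label column is called `label`; the helper name `count_labels` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_col="label"):
    """Count how many rows carry each label in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_col] for row in reader)


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "file,label\n"
    "example_ad_1.jpg,text-only\n"
    "example_ad_2.jpg,illustrations\n"
    "example_ad_3.jpg,text-only\n"
)
print(count_labels(sample))

# In the notebook, you would open the real spreadsheet instead:
# with open("ads_data/ads_upsampled.csv", newline="") as f:
#     print(count_labels(f))
```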

In the code below, we load the data into a variable (`ad_data`). The command to load the data from the spreadsheet (`ImageDataLoaders.from_csv`) has several parameters (see the comments on each). Note that in the `item_tfms` parameter, the images are resized so that they are all the same size. This will help make sure that the data is uniform and the training doesn't get thrown off by irrelevant factors.

In [None]:
ad_data = ImageDataLoaders.from_csv(
    path="ads_data/",  # root path to csv file and image directory
    csv_fname="ads_upsampled.csv",  # the name of our csv file
    folder="images/",  # the folder where our images are stored
    fn_col="file",  # the file column in our csv
    label_col="label",  # the label column in our csv
    item_tfms=Resize(224, ResizeMethod.Squish),  # resize images by squishing so they are 224x224 pixels
    seed=42,  # set a fixed seed to make results more reproducible
)
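
"Squishing" means each image is stretched or compressed along each axis until it is 224x224 pixels, rather than being cropped. Nothing is cut off, but the proportions change. The arithmetic can be sketched as follows; this is plain Python for illustration, not fastai's implementation, and the function names and the example dimensions are made up.

```python
def squish_scale(width, height, size=224):
    """Return the (horizontal, vertical) scale factors needed to squish an image to size x size."""
    return size / width, size / height


def aspect_distortion(width, height, size=224):
    """Ratio of horizontal to vertical scaling; 1.0 means the shape is unchanged."""
    sx, sy = squish_scale(width, height, size)
    return sx / sy


# A hypothetical tall, narrow newspaper ad: 300 pixels wide, 900 pixels high.
print(squish_scale(300, 900))
print(aspect_distortion(300, 900))
```

A distortion far from 1.0 means the ad will look noticeably stretched after resizing. For a task like telling text from illustrations this is usually acceptable, since the whole ad stays visible.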

Let's test to make sure our data has loaded correctly by using the fastai `show_batch` command on our new variable `ad_data`.

In [None]:
ad_data.show_batch()

### Create the Model

Now we make a variable (`learn`) that holds a pre-trained model. "Pre-trained" means someone has already trained this model on _other_ data (which might be similar), so we have a starting point (instead of starting from zero).

The crucial command here is `vision_learner`, which takes three things: the data we just loaded (`ad_data`), the pre-trained model to start from (we've chosen `resnet18`), and the metric we want to track as the model trains (here, accuracy).

In [None]:
learn = vision_learner(
    ad_data,  # the data the model will be trained on
    resnet18,  # the type of model we want to use
    metrics=accuracy,  # the metrics to track
)

### Train the Model

We could now do some detailed training to adapt the model to our needs. (If you're interested in that, check out [Part 2 of the Programming Historian tutorial](https://programminghistorian.org/en/lessons/computer-vision-deep-learning-pt2).) But, to keep this simpler, we'll use only the `fine_tune` method. By default, it first trains just the newly added final layers for one epoch while the pre-trained layers stay frozen, and then trains the whole model for the number of epochs you give it.

The `5` here is the number of "epochs." An epoch is one complete pass through the training data. During each pass, the model makes predictions, measures how wrong they were, and adjusts its internal "neuron" connections to do better next time. The number of epochs is often something you want to play with: more epochs make the training take longer and often improve accuracy, but not always, since a model trained for too long can start memorizing its training images instead of learning patterns that carry over to new ones.
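
To make the idea of an epoch concrete, here is a toy training loop in plain Python. It is a deliberately tiny stand-in, not what fastai does internally: the "model" is a single number `w`, the data is made up, and the function name `train` is hypothetical. Each epoch visits every example once, and the loss is measured after each pass.

```python
def train(data, epochs, lr=0.05):
    """Fit y = w * x by gradient descent; return the final w and the loss after each epoch."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        # One epoch: visit every training example once, nudging w after each.
        for x, y in data:
            error = w * x - y
            w -= lr * 2 * error * x  # gradient of the squared error with respect to w
        # After the pass, measure how wrong the model still is.
        losses.append(sum((w * x - y) ** 2 for x, y in data) / len(data))
    return w, losses


# Toy data following y = 3x; the model has to discover the 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w, losses = train(data, epochs=5)
print(w)
print(losses)
```

The printed losses shrink from one epoch to the next, which is the same pattern you hope to see in the table that `fine_tune` prints.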

This can take a long time, especially if your Colab runtime is not using a GPU.

In [None]:
learn.fine_tune(5)

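What you see here is a table with one row per epoch. The `accuracy` column is the share of validation images the model labeled correctly (e.g., `0.94333` means about 94.3% accuracy). The `train_loss` and `valid_loss` columns show the "loss," a number that measures how far the model's predictions were from the correct labels; lower is better.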
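
The difference between accuracy and loss is easier to see with numbers. The sketch below computes both for four made-up predictions. It uses cross-entropy as the loss, a common choice for classification that is assumed here for illustration; the function names, the probabilities, and the class numbering are all hypothetical.

```python
import math


def accuracy(probs, targets):
    """Fraction of examples whose highest-probability class is the correct one."""
    correct = sum(
        1 for p, t in zip(probs, targets) if max(range(len(p)), key=p.__getitem__) == t
    )
    return correct / len(targets)


def cross_entropy(probs, targets):
    """Average negative log of the probability given to the correct class."""
    return sum(-math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


# Made-up predictions for four images; class 0 = "illustrations", class 1 = "text-only".
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.45, 0.55]]
targets = [0, 1, 1, 1]
print(accuracy(probs, targets))
print(cross_entropy(probs, targets))
```

Accuracy only counts right and wrong answers, whereas the loss also reflects how confident the model was: a barely correct guess (0.55) adds more loss than a confident correct one (0.9).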

## 4. Try out your model

Now that you've trained the model, it's time to see whether it works for your needs. Earlier, we already downloaded a couple of old newspaper advertisements to the `trial-images` folder. The first one (biscuits) has only words, whereas the second (coca-cola) has an illustration.

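We'll use the `PILImage.create` method to load the file into the variable `img`. Then we'll take the model we trained (`learn`) and use the `predict` method with our `img` variable. The `[0]` at the end keeps only the predicted label from the several values that `predict` returns.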

In [None]:
img = PILImage.create('trial-images/biscuits.jpg')
learn.predict(img)[0]

Did you get the correct result "text-only"?

Now let's try it with the Coca-Cola ad.

In [None]:
img = PILImage.create('trial-images/coca-cola.jpg')
learn.predict(img)[0]
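
Besides the label, `learn.predict` also returns the index of the predicted class and the probability the model assigns to each class, which tells you how sure the model is. The helper below is a small sketch for displaying all three; its name `describe_prediction` is made up, and the example uses stand-in values rather than real model output so that it runs on its own.

```python
def describe_prediction(prediction, vocab):
    """Turn a (label, class_index, probabilities) triple into a readable sentence."""
    label, index, probs = prediction
    confidence = float(probs[int(index)])
    others = ", ".join(
        f"{name}: {float(p):.1%}" for name, p in zip(vocab, probs) if name != str(label)
    )
    return f"{label} ({confidence:.1%} confident; {others})"


# Stand-in values shaped like the output of learn.predict(img); the numbers are made up.
fake_prediction = ("text-only", 1, [0.07, 0.93])
print(describe_prediction(fake_prediction, ["illustrations", "text-only"]))
```

In the notebook, `describe_prediction(learn.predict(img), learn.dls.vocab)` should give the same kind of summary for a real image, assuming `learn.dls.vocab` lists the class names in the order the model uses.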

Now you can try it with your own image. Find an old newspaper ad (for example, crop one from an image at <https://chroniclingamerica.loc.gov/>), and upload it to the `trial-images` folder. You can do this by clicking on the folder icon on the left, then clicking on the `...` beside `trial-images`, then `Upload`. Change `your-image.jpg` to the name of your image in the code below.

In [None]:
img = PILImage.create('trial-images/your-image.jpg')
learn.predict(img)[0]

### You did it!

Of course, there are ways to run the model on lots of images at once, but we won't go into that. For now, you can see that the model was trained and can (successfully?) identify newspaper ads as containing only text or also illustrations. You could further train it to categorize images as having animals, humans, landscapes, etc. For this, see [Part 2 of the Programming Historian tutorial](https://programminghistorian.org/en/lessons/computer-vision-deep-learning-pt2).
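
A full batch workflow is beyond this notebook, but for a handful of images a simple loop over a folder is enough. The sketch below is one hedged way to do that. The helper `classify_folder` and the list of file extensions are made up for illustration, and the prediction step is passed in as a function so the helper itself does not depend on fastai.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def classify_folder(folder, predict_fn):
    """Apply `predict_fn` to every image file in `folder` and return {file name: label}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip anything that is not an image, judging by its file extension.
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = str(predict_fn(path))
    return results
```

In this notebook you could call it as `classify_folder("trial-images", lambda p: learn.predict(PILImage.create(p))[0])`, which reuses the `learn` and `PILImage` names from the cells above. Predicting one image at a time is slow, so this approach only suits small folders.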