Working with Digital Data in Religious Studies

12. Advanced Processing & AI: Process Audio and Video

Summer Semester 2024
Prof. Dr. Nathan Gibson

Outline

  1. Review: Images & Text Recognition
  2. Tutorial: Image Classification

    Break

  3. Processing Audio & Video

Project & Presentation

Sign up in OLAT

5 minutes:

  1. What is the format and source of your data? (Include critical reflection.)
  2. How did you edit, process, filter, or add to it? Why? (Show how you have used at least one of the approaches we discussed in class.)
  3. What is the most interesting question you might answer with your dataset?
  4. What is the most valuable thing you learned in the process?

A few slides or some screen sharing are fine, but make sure you can keep it to 5 minutes!

Term Papers (Hausarbeit)

Last chance! Please set up a meeting with me if you haven’t already.

  • Mondays 15:00-16:00 (in person, IG 6.552) sign-up
  • Fridays 12:00-13:00 (via Zoom) sign-up

Review: Image processing & text recognition

Last objective: Be able to find an appropriate workflow for processing your images.

Review: Image processing & text recognition

  • Large language models (in-depth)
  • Questions to ask: Purpose, Source, Access, Format, Manual or automatic processing

Review: Image processing tasks

  • classification
  • object recognition
  • text recognition
  • color recognition
  • spatial dimensions

Review: Image processing example

https://recogito.pelagios.org/document/sapfxiswsuxh3b

Tutorial: Image Classification

https://24data.pages.gwdg.de/machine-learning-images-tutorial

Break

3. Processing Audio & Video

Objective: Prepare audio and video for analysis with an appropriate workflow and tools.

3. Processing Audio & Video: Sources & Formats

Where might you get audio and video from? What formats might it be in?

  • digitized/not digitized?
  • length?
  • speakers?
  • quality?
  • encoding?
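
One way to answer the length and encoding questions for a file you already have is to ask ffprobe (shipped with FFmpeg) for a JSON report and pull out a few fields. The following is a minimal sketch under that assumption: the helper name `summarize_streams` and the sample report are made up for illustration, and a real report contains many more fields than shown here.

```python
import json

# A report like this can be produced with:
#   ffprobe -v error -show_streams -of json yourfile.mp4
# SAMPLE_REPORT is a hand-written, shortened stand-in for illustration only.
SAMPLE_REPORT = """
{"streams": [
  {"codec_type": "video", "codec_name": "h264", "duration": "61.5"},
  {"codec_type": "audio", "codec_name": "aac", "sample_rate": "44100", "duration": "61.5"}
]}
"""


def summarize_streams(ffprobe_json):
    """Return type, codec, sample rate, and duration for each stream in the report."""
    report = json.loads(ffprobe_json)
    summary = []
    for stream in report.get("streams", []):
        summary.append({
            "type": stream.get("codec_type"),          # "audio" or "video"
            "codec": stream.get("codec_name"),         # the encoding, e.g. "aac"
            "sample_rate": stream.get("sample_rate"),  # audio only; None for video
            "duration": stream.get("duration"),        # seconds, as a string
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_streams(SAMPLE_REPORT):
        print(entry)
```

Missing fields come back as `None` instead of raising an error, which is useful when a collection mixes audio-only and video files.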

3. Processing Audio & Video: Analysis Goals

What do you want to do with your audio/video files?

Does it relate to …

  • words/text?
  • music/sound?
  • visual?
  • connection between these?

3. Processing Audio & Video: Target Data & Metadata Formats

What information do you need to generate or tag to do this analysis? What format do the media files ultimately need to be in?

  • logs of files?
  • timestamps of scenes?
  • frame-grabs?
  • encoding?
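
As an illustration of how scene timestamps and frame-grabs fit together, the sketch below turns a list of scene start times into FFmpeg commands that each save one still image. The scene labels, file names, and helper names are hypothetical. The code only builds the commands and does not run them, so it works without FFmpeg installed.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm, a form FFmpeg accepts for -ss."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def frame_grab_command(video, seconds, image):
    """Build an FFmpeg command that seeks to a timestamp and saves a single frame."""
    return [
        "ffmpeg",
        "-ss", format_timestamp(seconds),  # seek to the scene start
        "-i", video,                       # input video
        "-frames:v", "1",                  # write exactly one video frame
        image,                             # output image file
    ]


# Hypothetical scene log: (label, start time in seconds)
SCENES = [("opening", 0), ("procession", 90.5), ("sermon", 3725)]

if __name__ == "__main__":
    for label, start in SCENES:
        command = frame_grab_command("recording.mp4", start, f"{label}.jpg")
        print(" ".join(command))
```

To run a command for real, it could be passed to `subprocess.run(command, check=True)`. Keeping the scene list as plain data means the same list can also be saved as a log or spreadsheet.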

3. Processing Audio & Video: Tools

Two especially important tools:

  • Whisper (OpenAI): speech recognition model for transcribing audio
  • FFmpeg: command-line utility for converting and sampling video and audio
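
The two tools above can be combined in a few lines. This is a rough sketch, not a recommended pipeline: it assumes FFmpeg is installed and on the PATH, that the `openai-whisper` Python package is installed, and that a mono 16 kHz WAV file is a reasonable intermediate format. The file names, the function names, and the choice of the "base" model are placeholders.

```python
import subprocess


def extract_audio_command(video, audio):
    """Build an FFmpeg command that saves the audio track as mono 16 kHz audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file without asking
        "-i", video,     # input video
        "-vn",           # drop the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        audio,           # output file; the extension decides the container
    ]


def transcribe_video(video, audio="audio.wav", model_name="base"):
    """Extract the audio, then transcribe it with Whisper (needs both tools installed)."""
    import whisper  # imported here so the command builder works without the package

    subprocess.run(extract_audio_command(video, audio), check=True)
    model = whisper.load_model(model_name)
    result = model.transcribe(audio)
    # Each segment carries start and end times in seconds plus the recognized text.
    return [(seg["start"], seg["end"], seg["text"].strip()) for seg in result["segments"]]


if __name__ == "__main__":
    print(" ".join(extract_audio_command("interview.mp4", "interview.wav")))
```

Calling `transcribe_video("interview.mp4")` would return a list of timestamped segments, which can then be checked and corrected by hand. Automatic transcripts are a starting point, and their quality depends heavily on language, speakers, and recording quality.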

3. Processing Audio & Video: Generating Audio & Video

What kind of data and training would you imagine went into generators like these?

Preview

Advanced Processing & AI: Use AI to Label Your Data