Working with Digital Data in Religious Studies

10. Advanced Processing & AI: Get a Grip on Big Data

Summer Semester 2024
Prof. Dr. Nathan Gibson

Outline

  1. Review: Metadata & FAIR Data
  2. Principles of AI/Machine Learning

    Break

  3. Playtime!
  4. Critical reflection

Project & Presentation

Sign up in OLAT

5-minute presentation covering:

  1. What is the format and source of your data? (Include critical reflection.)
  2. How did you edit, process, filter, or add to it? Why? (Show how you have used at least one of the approaches we discussed in class.)
  3. What is the most interesting question you might answer with your dataset?
  4. What is the most valuable thing you learned in the process?

You may use a few slides or share your screen, but make sure you can keep it to 5 minutes!

Exams & Term Papers

Sign up by next Friday!

Term paper: Set up a meeting with me if you haven’t already.

  • Mondays 15:00-16:00 (in person, IG 6.552) sign-up
  • Fridays 12:00-13:00 (via Zoom) sign-up

1. 📈 Metadata & FAIR Data Review: Learning Objective

Assess whether and how to make your data more open.

1. 📈 Metadata & FAIR Data Review: Metadata

Metadata: data about your data

  • research metadata: creators, process, guidelines, version information
  • decisions about metadata should reflect data format and research purpose
  • metadata lets people find, connect to, and keep track of your data
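A metadata record can be as simple as a set of named fields saved next to the data. The sketch below is a hypothetical Python example: the dataset, the creator name, and the field names (`title`, `creators`, `version`, `license`, etc.) are illustrative assumptions, not a formal metadata standard.

```python
import json

# Hypothetical metadata record for a hypothetical dataset.
# Field names are illustrative, not taken from a formal standard.
metadata = {
    "title": "Sermon transcriptions (sample)",
    "creators": ["Jane Doe"],
    "process": "Transcribed from audio recordings, then manually corrected",
    "guidelines": "Transcription guidelines v1.2",
    "version": "1.0.0",
    "format": "text/csv",
    "license": "CC BY 4.0",
}

# Fields this (hypothetical) project decided every record must have.
REQUIRED_FIELDS = ["title", "creators", "version", "license"]

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in required if not record.get(field)]

# Serialize the record as JSON so it can be stored alongside the data files.
print(json.dumps(metadata, indent=2))
print("Missing:", missing_fields(metadata))
```

Saving the JSON output as a file next to the data (e.g., a hypothetical `metadata.json`) keeps the description and the data together.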

1. 📈 Metadata & FAIR Data Review: FAIR Principles

F: Findable

A: Accessible

I: Interoperable

R: Re-usable

1. 📈 Metadata & FAIR Data Review: Findable

data have unique, persistent identifiers and rich metadata, and are indexed in a searchable resource

FAIR Principles

1. 📈 Metadata & FAIR Data Review: Accessible

data can be retrieved by their identifiers using a standard, open protocol (e.g., HTTP)


1. 📈 Metadata & FAIR Data Review: Interoperable

data is in a format that can be used by common systems and is linked to other datasets
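One way to make data interoperable is to export it to an open, widely supported format such as CSV, which spreadsheets, databases, and most programming languages can read. A minimal Python sketch using only the standard library; the place records and the `id` scheme are hypothetical.

```python
import csv
import io

# Hypothetical records, e.g. places mentioned in a corpus.
records = [
    {"id": "place-001", "name": "Antioch", "type": "city"},
    {"id": "place-002", "name": "Edessa", "type": "city"},
]

# Write the records as CSV (a header row plus one row per record).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "type"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(csv_text)
```

In a real dataset, the `id` column could instead hold identifiers from a shared authority file, which is what would link your records to other datasets.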


1. 📈 Metadata & FAIR Data Review: Re-usable

data has a clear license for re-use and documented provenance (source), and meets community standards
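To decide how FAIR a dataset already is, it can help to go through the four principles one by one. The sketch below turns that into a rough Python checklist over a metadata record. The mapping from principles to fields, the field names, and the placeholder identifier are all hypothetical teaching assumptions; this is not an official FAIR assessment tool.

```python
# Hypothetical mapping from each FAIR principle to metadata fields that
# offer rough evidence for it (a teaching heuristic, not an official test).
FAIR_CHECKS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["format"],
    "Re-usable": ["license", "provenance"],
}

def fair_report(record):
    """For each principle, list the supporting fields that are missing or empty."""
    return {
        principle: [field for field in fields if not record.get(field)]
        for principle, fields in FAIR_CHECKS.items()
    }

# Hypothetical metadata record; the identifier is a placeholder, not a real DOI.
record = {
    "identifier": "10.0000/placeholder",
    "title": "Social media posts about pagans (sample)",
    "keywords": ["pagan", "social media"],
    "format": "text/csv",
    "license": "CC BY 4.0",
}

for principle, missing in fair_report(record).items():
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{principle}: {status}")
```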


1. 📈 Metadata & FAIR Data Review: Should you make your data FAIR?

🧭 Today’s Learning Objective

Critically explore the relationships between the inputs and outputs of machine-learning models (artificial intelligence).

2. Principles of AI/Machine Learning: Big Data

Big Data: Data that defies “traditional methods” of processing or analysis because of its large scale.
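“Traditional methods” often assume the whole dataset fits into memory at once. A common workaround is streaming: reading the data piece by piece and keeping only a running summary. A minimal Python sketch that counts words one line at a time; the sample text and the file name in the comment are hypothetical, and truly large collections usually need dedicated (often distributed) tools.

```python
import io
from collections import Counter

def count_words_streaming(lines):
    """Count word frequencies one line at a time.

    Only the running Counter is kept in memory, not the whole text,
    so the same loop works for a file too large to load at once.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

# Stand-in for a large file. With a real file you would pass
# open("corpus.txt", encoding="utf-8") instead (hypothetical file name).
sample = io.StringIO("the gods of the pagans\nthe pagans and the gods\n")
print(count_words_streaming(sample).most_common(3))
```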

2. Principles of AI/Machine Learning: Big Data

Examples:

  • The Facebook Graph: ca. 3 billion users (and the relationships between them!)
  • GPT-3 training data: ca. 500 billion tokens (words and word fragments)

2. Principles of AI/Machine Learning: Big Data

Humanities examples:

  • Thousands of handwritten manuscripts (too many to transcribe and collate manually)
  • The Louvre art collection (too many works for traditional art criticism)
  • Social media posts about “pagans” (too many languages and search terms)

2. Principles of AI/Machine Learning: Artificial Intelligence

Artificial Intelligence (AI): a vague term used for

  • science-fiction computers that take over the world (HAL 9000, etc.)
  • chatbots: programs that “communicate” in a human-like way
  • generative AI: systems that create text, images, music, software, etc.
  • machine-learning algorithms and models

2. Principles of AI/Machine Learning: Machine Learning

Machine Learning: A process of using data to train software to recognize or predict patterns in new data

Machine Learning Process
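The basic process (train on labeled examples, then predict labels for new data) can be sketched with a very simple word-counting classifier in Python. This is a toy stand-in, not how production machine-learning models work: the labels, training sentences, and scoring rule are hypothetical.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn word frequencies for each label from (text, label) pairs."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Pick the label whose training texts share the most words with `text`."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

# Hypothetical labeled training data.
training_data = [
    ("we pray every morning in silence", "prayer"),
    ("the prayer of thanks before meals", "prayer"),
    ("the festival begins with a procession", "festival"),
    ("music and food at the spring festival", "festival"),
]

model = train(training_data)

# Apply the trained model to new, unlabeled sentences.
for sentence in ["a procession with music", "silent prayer in the morning"]:
    print(sentence, "->", predict(model, sentence))
```

Note that common words such as “the” count as much as meaningful ones here, which is one reason real models weight their inputs more carefully.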

2. Principles of AI/Machine Learning: Machine Learning

What would happen if … ?

2. Principles of AI/Machine Learning: Machine Learning

Ground truth: Correctly labeled data used for training and testing
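Ground truth is usually split: most of it is used for training, and a held-back part is used for testing, so that the model is scored on examples it has never seen. A minimal Python sketch; the labeled documents are hypothetical, and the “model” simply predicts the most common training label, a simple baseline that real models can be compared against.

```python
import random
from collections import Counter

def train_test_split(labeled, test_share=0.25, seed=42):
    """Shuffle labeled examples and hold some back for testing."""
    items = list(labeled)
    random.Random(seed).shuffle(items)  # fixed seed: same split every run
    n_test = max(1, int(len(items) * test_share))
    return items[n_test:], items[:n_test]

def accuracy(predictions, gold_labels):
    """Share of predictions that match the ground-truth labels."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Hypothetical ground truth: (document id, correct label) pairs.
ground_truth = [(f"doc{i}", "prayer" if i % 3 else "festival") for i in range(12)]
train_set, test_set = train_test_split(ground_truth)

# Baseline "model": always predict the most common label in the training set.
majority = Counter(label for _, label in train_set).most_common(1)[0][0]
predictions = [majority for _ in test_set]
score = accuracy(predictions, [label for _, label in test_set])

print(f"{len(train_set)} training items, {len(test_set)} test items")
print("accuracy:", score)
```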

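Neural networks pass inputs through layers of nodes. Each node is switched on (or activated more or less strongly) depending on a weighted combination of its many inputs. During training, the network compares its output with the ground truth and goes back to refine the “weights” of these inputs (backpropagation).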
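A single artificial node can illustrate the idea of turning on or off based on weighted inputs and then going back to adjust the weights. The Python sketch below uses the classic perceptron learning rule, a simpler relative of the backpropagation used in multi-layer networks, to learn the logical AND function. Function names and parameter values are illustrative choices.

```python
def step(x):
    """Turn the node 'on' (1) if its input is positive, otherwise 'off' (0)."""
    return 1 if x > 0 else 0

def node_output(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Perceptron learning rule: after each example, nudge the weights
    in the direction that reduces the error."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - node_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Logical AND: the node should turn on only when both inputs are on.
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_examples)

print("weights:", weights, "bias:", bias)
print([node_output(weights, bias, x) for x, _ in and_examples])
```

Because the node starts with all weights at zero, the learning rate only rescales the learned weights here; a learning rate of 1 keeps all the arithmetic exact.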

Large language models are trained on very large text datasets and then predict the most likely next word(s) (more precisely, tokens) in a sequence.
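The idea of predicting the next word can be illustrated with a much older and simpler technique, a bigram model, which only counts which word follows which in the training text. Large language models use neural networks, far more training data, and much longer context, so this Python sketch shows the principle only. The training sentence is John 1:1 (King James Version, lowercased, punctuation removed).

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "in the beginning was the word and the word was with god and the word was god"
model = train_bigrams(corpus)

print(predict_next(model, "the"))
print(predict_next(model, "pagan"))
```

Asked for the word after “the”, this model picks “word”, because that pair occurs most often in its tiny training text; for a word it has never seen, it has nothing to predict.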

Break

3. Playtime!

4. Critical reflection

Preview