Working with Digital Data in Religious Studies

14. Conclusion: Present Your Dataset

Summer Semester 2024
Prof. Dr. Nathan Gibson

Outline

  1. Review: AI Pipeline
  2. 5 Presentations, then Break
  3. Course Overview
  4. Final Presentation & Conclusions

Announcements

Course certificates (Scheine: Leistungsnachweis for graded credit, Teilnahmenachweis for attendance) can be dropped off with Eva Kramberger (IG 1.551, Kramberger [at] em…)

📈 Review: AI Pipeline

Flawed Data

"Flawed Data", xkcd, CC BY-NC 2.5

📈 Review: AI Pipeline

🧭 Objective: Plan the process of integrating machine-learning tools into your data preparation.

📈 Review: AI Pipeline

AI Data prep

📈 Review: AI Pipeline

(see whiteboard)

Presentations

5 minutes:

  1. Data format & source
  2. Editing, processing, filtering, adding
  3. Question you could answer
  4. Most valuable learning

Break

Course Overview: Goal

By the end of the course, students should be able to create and curate datasets relevant to religious studies in formats that can be used for further analysis while reflecting critically on this process.

Course Overview: Goal

  • Pick (cacao tree): Collecting sources (Manuscripts, Photos, Interviews)
  • Prepare (cacao pod): Structuring data (Transcribing, Collating)
  • Process (utensils): Outputs (Textual comparison, criticism, content analysis, coding)
  • Package (chocolate bar): Presentation (Edition, Narrative, Thematic discussion, Interactive website)

Course Overview: Knowledge

You’ve learned what the following are:

  • text, image, audio, and video formats
  • git versioning system
  • databases
  • Python
  • URLs, domains, queries, and APIs
  • metadata
  • FAIR principles for data
  • AI/machine learning, large language models (LLMs), ground truth, neural networks

Course Overview: Skills

You’ve learned how to …

  • do complex find/replace operations with regular expressions (RegEx)
  • preserve your data
  • track changes with git
  • use spreadsheet formulas
  • structure your data into entities, attributes, and relationships
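As a reminder of what a find/replace with capture groups can look like, here is a minimal sketch in Python. The sample sentence and the date format are assumptions made up for illustration, not data from the course.

```python
import re

# Hypothetical sample text: dates written as DD.MM.YYYY in a transcription.
text = "Copied on 03.07.1482 and rebound on 15.11.1620."

# Capture day, month, and year as three groups.
pattern = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")

# Reorder the groups into ISO 8601 order (YYYY-MM-DD).
normalized = pattern.sub(r"\3-\2-\1", text)

print(normalized)
```

The same pattern and replacement (with `$3-$2-$1` in place of `\3-\2-\1` in many editors) should also work in a text editor's RegEx find/replace dialog.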

Course Overview: Skills

You’ve learned how to …

  • clean, filter, and match your data
  • download/scrape websites using API queries, wget, and Python
  • add metadata to your data
  • use machine learning for specific data tasks
  • transcribe text from audio/video
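Cleaning, filtering, and matching can be sketched in a few lines of Python. The name list below is hypothetical, and the normalization rules (lowercasing, removing diacritics, collapsing spaces) are only one possible choice; whether they are appropriate depends on your data.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.lower().split())


# Hypothetical records with inconsistent spelling and an empty entry.
records = ["  Ibn Sīnā ", "Maimonides", "", "ibn sina", "Thomas  Aquinas"]

# Clean: apply the same normalization to every record.
cleaned = [normalize(r) for r in records]

# Filter: drop entries that are empty after cleaning.
filtered = [r for r in cleaned if r]

# Match: records with the same normalized form count as one entity.
unique = sorted(set(filtered))

print(unique)
```

Note that stripping diacritics discards information, so it is safer to keep the original spelling in a separate column and use the normalized form only as a matching key.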

Course Overview: Reflection

You’ve thought critically about …

  • the “computational turn” as one of the “turns” in humanities research
  • the difference between data and the way it is formatted or presented
  • which data might become inaccessible
  • systematic errors in data that might distort analysis

Course Overview: Reflection

You’ve thought critically about …

  • guidelines for consistent data entry
  • open standards for data formats
  • legal and ethical aspects of downloading and publishing data
  • whether and how your data should be open
  • how input data affects the results of machine learning
  • what aspects of multimedia data might be relevant for analysis

Course Overview: Relevance

What does all this have to do with studying religion?

Evaluation discussion

Further recommendations

Final Presentation & Conclusions