Datasets
Text Corpora
- A guide to text corpora for linguistics work: https://guides.lib.umich.edu/c.php?g=282869&p=1884909. Some of the paid content can be accessed via https://dbis-ur-de.proxy.ub.uni-frankfurt.de.
- Arabic: https://github.com/OpenITI
- You can create your own text corpus from scans of books, manuscripts, etc.
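
Once scans have been run through OCR (with whatever tool you prefer), the resulting text files can be gathered into a simple corpus. The following is a minimal sketch, assuming one UTF-8 `.txt` file per document in a single folder. The folder layout, the function names `load_corpus` and `word_frequencies`, and the very simple tokenizer are illustrative assumptions, not a fixed workflow.

```python
import re
from collections import Counter
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a dict of {file stem: text}."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.stem] = path.read_text(encoding="utf-8")
    return corpus


def word_frequencies(corpus):
    """Count lowercase word tokens across all documents in the corpus."""
    counts = Counter()
    for text in corpus.values():
        # \w+ matches runs of letters, digits and underscores (Unicode-aware),
        # which is a rough stand-in for a real tokenizer.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts
```

Calling `word_frequencies(load_corpus("my_scans"))` (where `my_scans` is a placeholder folder name) returns a `Counter`, so `.most_common(20)` gives a quick first look at the vocabulary. OCR output usually needs cleaning before such counts are reliable.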
Survey Data
- Pew Research: https://www.pewresearch.org/datasets/
- Or collect your own survey data in a spreadsheet
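
If you collect responses in a spreadsheet, exporting it as CSV makes it easy to analyze. The sketch below tallies the answers to one question. It assumes the first row of the export holds column headers; the function name `tally_responses` and the column name used in the usage comment are hypothetical.

```python
import csv
from collections import Counter


def tally_responses(csv_file, question):
    """Count how often each answer appears in the `question` column.

    `csv_file` is any open text file (or file-like object) containing CSV
    data whose first row is a header row.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        answer = (row.get(question) or "").strip()
        if answer:  # skip blank cells
            counts[answer] += 1
    return counts


# Usage (file and column names are placeholders for your own export):
#
#     with open("survey.csv", newline="", encoding="utf-8") as f:
#         print(tally_responses(f, "favorite_genre"))
```

Only the standard library is used here; for larger surveys a dataframe library may be more convenient.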
Social Media
- Check if the platform has an API you can access
- You can write your own scraper or use one at https://apify.com/
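
If the platform offers no API and you write your own scraper, the core step is usually extracting the items you need from a page's HTML. The following is a minimal sketch using only the Python standard library that collects image URLs from a page. The names `ImageLinkParser` and `extract_image_urls` are illustrative, and real platforms often load content with JavaScript, which this approach does not handle.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/a.jpg" against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in `html`, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

To apply it to a live page, the HTML can be fetched first, for example with `urllib.request.urlopen(url).read().decode("utf-8")`, and then passed to `extract_image_urls` together with the same `url`.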
Art
- Google Arts & Culture: https://artsandculture.google.com/. See here for downloading metadata and images.
- National Gallery: https://www.nga.gov/open-access-images/open-data.html
- Met Museum: https://www.metmuseum.org/about-the-met/policies-and-documents/open-access#get-started-header
- Smithsonian: https://www.si.edu/OpenAccess
- See the list of data sources at National Endowment for the Arts: https://www.arts.gov/grants/research-awards/publicly-available-data-sources
Geographical
- World Historical Gazetteer: https://whgazetteer.org/
Images
- Social media platforms (see Social Media above)
- Art databases (see Art above)
- Digitized library archives