Datasets

Text Corpora

A guide to text corpora for linguistics work: https://guides.lib.umich.edu/c.php?g=282869&p=1884909. Some of the paid content can be accessed via https://dbis-ur-de.proxy.ub.uni-frankfurt.de.
Arabic: https://github.com/OpenITI
You can create your own text corpus from scans of books, manuscripts, etc.

Survey Data

Pew Research: https://www.pewresearch.org/datasets/
Or collect your own survey data in a spreadsheet
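
Survey data collected in a spreadsheet can be exported as CSV and tallied with a few lines of code. The following is a minimal sketch, assuming a hypothetical export with the columns `respondent_id` and `favorite_genre` (both names are invented for illustration):

```python
import csv
import io
from collections import Counter


def tally_answers(csv_file, column):
    """Count how often each answer appears in one column of a CSV export."""
    reader = csv.DictReader(csv_file)
    # Skip blank cells so unanswered questions are not counted as an answer.
    return Counter(
        row[column].strip() for row in reader if (row.get(column) or "").strip()
    )


# Hypothetical data standing in for a spreadsheet exported as CSV.
sample = io.StringIO(
    "respondent_id,favorite_genre\n"
    "1,poetry\n"
    "2,novel\n"
    "3,poetry\n"
    "4,\n"
)
counts = tally_answers(sample, "favorite_genre")
print(counts.most_common())
```

For a real export, open the file with `open("survey.csv", newline="", encoding="utf-8")` (the file name is a placeholder) and pass the file object to `tally_answers` in place of `sample`.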

Social Media

Check whether the platform has an API you can access
You can write your own scraper or use one at https://apify.com/
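
If you write your own scraper, a common first step is to download a page and pull out the links it contains. The sketch below uses only the Python standard library. The URLs, the sample HTML, and the User-Agent string are placeholders, and a real platform may need a different approach (for example, pages rendered by JavaScript will not expose their content this way):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url):
    """Download a page as text; the User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "research-scraper-sketch"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Offline demonstration on a hypothetical page, so no network access is needed.
sample_html = '<p><a href="/posts/1">First</a> <a href="https://example.org/x">X</a></p>'
links = extract_links(sample_html, "https://example.com/feed")
print(links)
```

To run it against a live page, pass the result of `fetch_html(url)` to `extract_links` together with the same `url`.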

Art

Google Arts & Culture: https://artsandculture.google.com/. See here for downloading metadata and images.
National Gallery: https://www.nga.gov/open-access-images/open-data.html
Met Museum: https://www.metmuseum.org/about-the-met/policies-and-documents/open-access#get-started-header
Smithsonian: https://www.si.edu/OpenAccess
See the list of data sources at National Endowment for the Arts: https://www.arts.gov/grants/research-awards/publicly-available-data-sources
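
Some of these collections can also be queried programmatically. As an illustration, the sketch below targets the Met Museum's public Collection API. The base URL and the field names (`objectID`, `title`, `artistDisplayName`, `isPublicDomain`, `primaryImage`) reflect that API as I understand it and should be checked against the Met's own documentation. The sample record is made up.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the Met's public Collection API.
API_BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def object_url(object_id):
    """Build the URL for one object record."""
    return f"{API_BASE}/objects/{object_id}"


def summarize(record):
    """Keep a few metadata fields; the image URL is kept only for public-domain works."""
    return {
        "id": record.get("objectID"),
        "title": record.get("title", ""),
        "artist": record.get("artistDisplayName", ""),
        "image": record.get("primaryImage", "") if record.get("isPublicDomain") else "",
    }


def fetch_object(object_id):
    """Download one record and reduce it to the fields above (needs network access)."""
    with urlopen(object_url(object_id), timeout=30) as response:
        return summarize(json.load(response))


# Offline demonstration with a made-up record.
sample_record = json.loads(
    '{"objectID": 1, "title": "Untitled", "artistDisplayName": "",'
    ' "isPublicDomain": false, "primaryImage": "https://example.org/1.jpg"}'
)
summary = summarize(sample_record)
print(summary)
```

Filtering on `isPublicDomain` is a design choice for this sketch: it keeps image URLs only for works that the record itself marks as public domain.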

Geographical

World Historical Gazetteer: https://whgazetteer.org/

Images

from social media (see above)
from art databases (see above)
digitized library archives
- e.g., https://cudl.lib.cam.ac.uk/view/PH-WESTMINSTER-WGL-00004/1