Data sources
- AWS Public Dataset Program.
- Quandl. Open and free time-series financial datasets.
- DrivenData. Data science competitions for social good.
- Kaggle Datasets. A clean portal for open datasets, with community vetting and good search functionality.
- General Social Survey from NORC. American Community Survey from the US Census Bureau.
- Pothole images, with an accompanying paper.
- Million Song Dataset. Wasn’t someone interested in music analysis?
- Nature Scientific Data. An open-access, peer-reviewed journal publishing descriptions of datasets.
- Detroit Open Data. Many medium-to-large US cities have similar portals. For example, see this Forbes article.
- StatLine. Netherlands Central Bureau of Statistics.
- UCI Machine Learning Repository. Datasets to test machine learning algorithms.
- The Harvard Dataverse. Scientific data for reproducible research.
- R datasets package. A bunch of built-in datasets for building examples in R.
- The Collection of Really Great, Interesting, Situated (CORGIS) Datasets. Just what it sounds like.
- data.gov: “The home of the U.S. Government’s open data.” Includes many, many datasets from both federal and non-federal sources.