This paper applies principal component analysis and cluster analysis to historical individual-level census data to explore social and economic variation and patterns of household structure across mid-Victorian England and Wales. Principal component analysis is used to identify and eliminate unimportant attributes in the data and to aggregate the remaining attributes. By combining Kaiser’s rule and the broken-stick model, four principal components are selected for subsequent data modelling. Cluster analysis is then used to identify associations and structure within the data. A hierarchy of cluster structures with two, three, four and five clusters is constructed in 21-dimensional data space, and the main differences between the clusters are described.
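As an illustration of the two component-selection criteria named above, the following minimal sketch computes both for an arbitrary data matrix. It assumes the analysis is run on the correlation matrix (standardised attributes), so that Kaiser’s rule retains components with an eigenvalue above 1, and it applies the broken-stick model by retaining leading components whose share of variance exceeds the expected share b_k = (1/p) Σ_{i=k..p} 1/i. The function names `broken_stick` and `select_components` are hypothetical, and how the two counts are combined into a final choice is left open, since the abstract does not specify it.

```python
import numpy as np

def broken_stick(p):
    """Expected variance share of each of p components under the broken-stick model."""
    inv = 1.0 / np.arange(1, p + 1)
    # b_k = (1/p) * sum_{i=k}^{p} 1/i, computed as a reversed cumulative sum
    return inv[::-1].cumsum()[::-1] / p

def select_components(X):
    """Return (kaiser_count, broken_stick_count) for a samples-by-attributes matrix X."""
    # Assumption: PCA on the correlation matrix, i.e. on standardised attributes
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending eigenvalues
    p = eigvals.size
    share = eigvals / eigvals.sum()

    # Kaiser's rule: keep components with eigenvalue greater than 1
    kaiser = int(np.sum(eigvals > 1.0))

    # Broken-stick: keep leading components until one falls below its expected share
    exceeds = share > broken_stick(p)
    stick = p if exceeds.all() else int(np.argmin(exceeds))
    return kaiser, stick
```

Calling `select_components(X)` on a matrix with one row per observation and one column per attribute returns the number of components each criterion would retain; the two counts can then be compared before deciding how many components to keep.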
Schürer, K. & Penkova, T. (2015). Creating a typology of parishes in England and Wales: Mining 1881 census data. Historical Life Course Studies, 2, 38-57. http://hdl.handle.net/10622/23526343-2015-0004?locatt=view:master