Partition the input data into clusters, each with at least a given minimum number of members. This is harder than it sounds, because the structure produced by the clustering algorithm makes it difficult to find the cluster that a too-small cluster was split from. Instead, we find a dendrogram cut in which every cluster meets the minimum size. Then we recursively partition each of the resulting clusters that is large enough to be split in two.

recursive.partition(input.data, cluster.vars, min.members = 5)

Arguments

input.data

Data frame of data to be clustered.

cluster.vars

Vector of names of variables to use in the clustering.

min.members

Desired minimum number of members per cluster.

Value

List of data frames, one data frame for each cluster.

Details

The purpose of this function is to allow us to estimate the observational error in historical food demand observations. By clustering observations that have similar input values (prices, GDP), we get something approximating repeated measurements of similar situations. Setting a minimum number of members per cluster ensures that each grouping has enough measurements to give a reasonable estimate of the variance.
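As a rough illustration of the variance estimate described above, the sketch below computes a per-cluster sample variance from a list of data frames shaped like this function's return value. The list is built by hand rather than by calling recursive.partition, and the column names price and demand are hypothetical stand-ins, not names taken from the package.

```r
# Hypothetical stand-in for the list returned by recursive.partition():
# one data frame per cluster. Column names are assumptions for illustration.
clusters <- list(
    data.frame(price = c(1.0, 1.1, 0.9), demand = c(10, 12, 11)),
    data.frame(price = c(5.0, 5.2, 4.9), demand = c(4, 8, 6))
)

# Treat the members of each cluster as approximate repeated measurements
# and take the sample variance of the observed quantity within each one.
cluster.var <- sapply(clusters, function(cl) var(cl$demand))
```

Each element of cluster.var corresponds to one cluster, so a larger min.members gives each variance more observations to draw on.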

This function returns a list of data frames, with each list item being a single cluster. The original row order is not preserved in the result, so if recovering that order is important, include an ID column in the input data.
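The approach described at the top of this page, together with the ID column advice, can be sketched roughly as follows. This is not the package's source: sketch.partition is a hypothetical re-implementation, and the use of hclust with Euclidean distance and its default linkage is an assumption, as are the demo data and the id, price, and gdp column names.

```r
# Hypothetical sketch of the cut-then-recurse idea; not the package's code.
sketch.partition <- function(input.data, cluster.vars, min.members = 5) {
    n <- nrow(input.data)
    # Too small to yield two clusters of the minimum size: keep it whole.
    if (n < 2 * min.members) {
        return(list(input.data))
    }
    # Assumption: Euclidean distance with hclust's default linkage.
    tree <- hclust(dist(input.data[, cluster.vars, drop = FALSE]))
    # Search for the finest cut in which every cluster meets the minimum.
    # Cuts are nested, so once a cut fails, all finer cuts fail as well.
    best <- rep(1L, n)
    for (k in 2:(n %/% min.members)) {
        membership <- cutree(tree, k = k)
        if (min(table(membership)) < min.members) break
        best <- membership
    }
    # No valid cut into two or more clusters: keep the data whole.
    if (max(best) == 1L) {
        return(list(input.data))
    }
    # Recurse into each piece and flatten the results into one list.
    pieces <- split(input.data, best)
    unlist(lapply(pieces, sketch.partition,
                  cluster.vars = cluster.vars, min.members = min.members),
           recursive = FALSE, use.names = FALSE)
}

# Demo data with an explicit ID column so the row order can be recovered.
set.seed(1)
demo <- data.frame(id = 1:40,
                   price = c(rnorm(20, 0), rnorm(20, 10)),
                   gdp = c(rnorm(20, 0), rnorm(20, 10)))
clusters <- sketch.partition(demo, c("price", "gdp"), min.members = 5)

# Recombine the clusters and sort on the ID column to restore the order.
combined <- do.call(rbind, clusters)
restored <- combined[order(combined$id), ]
```

Sorting on id after recombining is what makes the original order recoverable; without such a column there is nothing in the returned list to sort on.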