recursive.partition.Rd
Partition the input data into clusters with a given minimum number of members per cluster. This is harder than it sounds, because the structure produced by the clustering algorithm does not make it easy to find the cluster that a too-small cluster was split from. Instead, we find a dendrogram cut in which every cluster meets the minimum size. Then we recursively partition each of the resulting clusters that is large enough to be split in two.
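The cut-then-recurse idea can be sketched in base R as follows. This is a minimal illustration, not the package's actual implementation: the name `sketch.partition` is hypothetical, and the use of `hclust()` with Euclidean `dist()`, the choice of the finest admissible cut, and the assumption that `min.members` is at least 1 are all choices made for this sketch.

```r
# Hypothetical sketch of the cut-then-recurse idea (not the package source).
# Assumes numeric clustering variables and min.members >= 1.
sketch.partition <- function(input.data, cluster.vars, min.members = 5) {
    n <- nrow(input.data)
    # A cluster with fewer than 2 * min.members rows cannot be split into
    # two clusters that both meet the minimum, so return it unchanged.
    if (n < 2 * min.members) {
        return(list(input.data))
    }
    # Hierarchical clustering on the selected variables (stats package).
    tree <- hclust(dist(input.data[, cluster.vars, drop = FALSE]))
    # Look for the finest cut (largest k) in which every cluster still has
    # at least min.members rows. This selection rule is an assumption.
    best <- NULL
    for (k in seq(2, n %/% min.members)) {
        membership <- cutree(tree, k = k)
        if (min(table(membership)) >= min.members) {
            best <- membership
        }
    }
    # No admissible cut exists: keep the data as a single cluster.
    if (is.null(best)) {
        return(list(input.data))
    }
    # Recurse into each piece and flatten the results into one list.
    pieces <- split(input.data, best)
    unname(do.call(c, lapply(pieces, sketch.partition,
                             cluster.vars = cluster.vars,
                             min.members = min.members)))
}

set.seed(1)
d <- data.frame(id = 1:40, x = rnorm(40), y = rnorm(40))
res <- sketch.partition(d, c("x", "y"), min.members = 5)
sapply(res, nrow)  # cluster sizes, each at least 5
```

The recursion matters because a single cut of the full dendrogram may be blocked by one small branch while other clusters are still large enough to subdivide.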
recursive.partition(input.data, cluster.vars, min.members = 5)
Argument | Description |
---|---|
input.data | Data frame of data to be clustered. |
cluster.vars | Vector of names of variables to use in the clustering. |
min.members | Desired minimum number of members per cluster. |
List of data frames, one data frame for each cluster.
The purpose of this function is to allow us to estimate the observational error in historical food demand observations. By clustering observations that have similar input values (prices, GDP), we can get something approximating repeated measurements of similar situations. Setting a minimum number of members per cluster ensures that each grouping has enough measurements to give a reasonable estimate of the variance.
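As a rough illustration of how the clusters might be used for this purpose, the sketch below computes a sample variance within each cluster and a pooled variance across clusters. The `clusters` list is a hand-built stand-in for the function's return value, and the column names `price` and `demand` are hypothetical.

```r
# Stand-in for the list returned by recursive.partition(): two tiny clusters.
# Column names are hypothetical and chosen only for this example.
clusters <- list(
    data.frame(price = c(1.0, 1.1, 0.9), demand = c(10, 12, 11)),
    data.frame(price = c(2.0, 2.1, 1.9), demand = c(20, 19, 24))
)

# Sample variance of the observed quantity within each cluster.
cluster.var <- sapply(clusters, function(cl) var(cl$demand))

# Pooled variance, weighting each cluster by its degrees of freedom.
cluster.df <- sapply(clusters, nrow) - 1
pooled.var <- sum(cluster.df * cluster.var) / sum(cluster.df)
```

Clusters with very few members give noisy variance estimates, which is why a minimum cluster size is enforced.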
This function returns a list of data frames, with each list item being a single cluster. Splitting the data this way irretrievably scrambles the order of the rows, so if recovering the original order is important, include an ID column in the input data.
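One way to follow this advice is sketched below: add an ID column before clustering, then recombine the clusters and sort by that column. The column names `obs.id` and `cluster` are hypothetical, and `split()` is used here only as a stand-in for the call to `recursive.partition()`.

```r
input.data <- data.frame(price = c(1.2, 3.4, 1.1, 3.6, 3.5, 1.3),
                         gdp   = c(10, 30, 11, 31, 29, 12))

# Record the original row order before clustering.
input.data$obs.id <- seq_len(nrow(input.data))

# Stand-in for recursive.partition(input.data, c("price", "gdp"), ...):
# any list of data frames that together contain all of the rows.
clusters <- split(input.data, c(2, 1, 2, 1, 1, 2))

# Recombine, label each row with its cluster index, and restore the order.
combined <- do.call(rbind, clusters)
combined$cluster <- rep(seq_along(clusters), sapply(clusters, nrow))
restored <- combined[order(combined$obs.id), ]
```

After sorting, `restored` has the rows in their original order with the cluster assignment attached.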