In many cases, however, we cannot observe groups directly. Examples include individuals with
good and poor health, risk-averse and risk-seeking investors, and apples from different
orchards.
Here I show one way of addressing unobserved heterogeneity with finite mixture models.
We have data on individuals' standardized cholesterol levels (chol) and their mean-centered
monthly red wine consumption (wine).
A linear fit line superimposed on the scatterplot of the data suggests there is a positive
relationship between the two.
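Here is a minimal Stata sketch of this first step. It assumes a dataset containing chol and wine is already in memory. The original dataset is not reproduced here, and these may not be the exact commands the author used to draw the graph.

```stata
* Scatterplot of the data with a linear fit line superimposed
twoway (scatter chol wine) (lfit chol wine)

* Pooled linear regression of cholesterol on wine consumption
regress chol wine
```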
Let's see what happens when we add to our model a variable recording whether either parent
had high cholesterol (pchol).
Now we find that wine consumption has a slight negative effect on cholesterol and that
parents' cholesterol has a positive effect.
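The extended model can be sketched as follows. This assumes pchol is a 0/1 indicator stored in the same dataset.

```stata
* Add the indicator for whether either parent had high cholesterol
regress chol wine pchol
```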
We hypothesize that an individual may inherit a "high cholesterol" gene, so that the population
contains two types of individuals: those with the cholesterol gene and those without it.
However, we do not know who belongs to each group.
We use fmm to probabilistically classify the sample into groups. By adding the fmm prefix
followed by 2, we indicate that we want to fit a model for two underlying subpopulations. We
use the lcprob(pchol) option to include parents' cholesterol history as the predictor of the
unobserved group, also called the latent class.
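Put together, the command described in this paragraph looks like the following. This is a sketch that assumes the same variables are in memory.

```stata
* Two-class mixture of linear regressions of chol on wine;
* lcprob(pchol) lets the probability of class membership depend on pchol
fmm 2, lcprob(pchol): regress chol wine
```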
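The output below shows the regression of chol on wine estimated within each class. In both
classes, wine consumption has a negative effect on cholesterol level, even though the pooled
regression of chol on wine alone suggested a positive relationship.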
Class          : 1
Response       : chol
Model          : regress

               Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
chol
  wine     -.6850974   .0783981    -8.74   0.000    -.8387549   -.5314399
  _cons    -.7401758   .0443478   -16.69   0.000    -.8270959   -.6532557

Class          : 2
Response       : chol
Model          : regress

               Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
chol
  wine     -.4798618   .1319125    -3.64   0.000    -.7384056    -.221318
  _cons     .8343004   .0323813    25.76   0.000     .7708342    .8977667
The table below gives the coefficients for the latent class membership on the logit scale.
               Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
1.Class      (base outcome)
2.Class
  pchol     7.473592   .8977705     8.32   0.000     5.713994     9.23319
  _cons    -3.228661   .3939579    -8.20   0.000    -4.000804   -2.456518
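Writing the model out can make these logit coefficients easier to read. The sketch below assumes the normal class-specific regressions that fmm fits with regress, with class 1 as the base class and pchol coded 0/1. Here φ is the normal density, and α_k, β_k, σ_k are the class-specific intercept, slope, and error standard deviation.

```latex
% Mixture density of chol given wine and pchol
f(\mathrm{chol} \mid \mathrm{wine}, \mathrm{pchol})
  = \sum_{k=1}^{2} \pi_k(\mathrm{pchol})\,
    \phi\!\left(\mathrm{chol};\ \alpha_k + \beta_k\,\mathrm{wine},\ \sigma_k^2\right)

% Class-membership (logit) model, class 1 as base outcome
\pi_2(\mathrm{pchol})
  = \frac{\exp(\gamma_0 + \gamma_1\,\mathrm{pchol})}{1 + \exp(\gamma_0 + \gamma_1\,\mathrm{pchol})},
\qquad \pi_1 = 1 - \pi_2

% Plugging in \hat\gamma_0 = -3.228661 and \hat\gamma_1 = 7.473592
\hat\pi_2(0) = \frac{e^{-3.228661}}{1 + e^{-3.228661}} \approx 0.038,
\qquad
\hat\pi_2(1) = \frac{e^{4.244931}}{1 + e^{4.244931}} \approx 0.986
```

Under these estimates, an individual with a parent who had high cholesterol has roughly a 99% probability of belonging to class 2. The corresponding probability is about 4% for an individual without such a parent.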
We use estat lcprob to display marginal class probabilities on the probability scale.
. estat lcprob
                  Delta-method
           Margin   Std. Err.      [95% Conf. Interval]
Class
    1    .6743291    .0055936       .6632719    .6851956
    2    .3256709    .0055936       .3148044    .3367281
About 67% of individuals are expected to be in group 1 and 33% in group 2. But which group
is which? To find out, we use estat lcmean to calculate the marginal means for each class.
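The estimated mean cholesterol level for class 1 is lower than that for class 2, so we interpret
class 1 as individuals without the cholesterol gene and class 2 as those with it. This agrees
with the positive pchol coefficient in the class-membership model: having a parent with high
cholesterol raises the probability of belonging to class 2.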
. estat lcmean
                  Delta-method
           Margin   Std. Err.       z    P>|z|     [95% Conf. Interval]
1
  chol  -.5123399    .024033   -21.32   0.000    -.5594438    -.465236
2
  chol   .9938833   .0601744    16.52   0.000     .8759435    1.111823
We can even classify the individuals in our dataset into the two groups by predicting
posterior latent-class probabilities and using a cutpoint (we will use 0.5) to assign each
observation to a group.
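The prediction and classification step might look like this. The sketch assumes the fmm model above is the active estimation result. The names c1, c2, and lgroup are chosen to match the commands that follow, but the original commands may differ.

```stata
* Posterior probability of membership in class 1 and class 2
predict c1 c2, classposteriorpr

* Assign an observation to class 2 when its posterior probability exceeds 0.5;
* the if qualifier leaves lgroup missing wherever c2 is missing
generate lgroup = cond(c2 > 0.5, 2, 1) if !missing(c2)
label variable lgroup "Predicted latent class"
```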
. summarize c2
. tabulate lgroup
From the tabulation of predicted group membership, we see that the share of observations
assigned to each group is close to the marginal class probabilities reported by estat lcprob above.
Finally, we draw a scatterplot with the linear fit line for each group.
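One way to draw this graph is sketched below. It assumes lgroup holds the predicted class (1 or 2). Because of the /// line continuations, it should be run from a do-file rather than typed interactively.

```stata
* Scatterplot and a separate linear fit line for each predicted class
twoway (scatter chol wine if lgroup == 1, mcolor(navy))     ///
       (lfit    chol wine if lgroup == 1, lcolor(navy))     ///
       (scatter chol wine if lgroup == 2, mcolor(maroon))   ///
       (lfit    chol wine if lgroup == 2, lcolor(maroon)),  ///
       legend(order(1 "Class 1" 3 "Class 2"))
```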
Houssein Assaad
Senior Statistician and Software Developer