Seeing the forest for the trees: The potential of random forest modelling in exploring predictors.
A talk by Michael Barthelmäs
Information about the event
Time
Location
Building 1350, 6th floor
Abstract:
Random forest modeling is a popular approach in supervised machine learning because it makes large datasets convenient to explore. In contrast to classical regression approaches, the assumptions about the variables involved are less strict, correlated variables are less problematic, and higher-order interactions can be identified efficiently. In this talk I will give a short introduction to random forest modeling and illustrate how we used the algorithm to predict crying frequency in a large representative Dutch dataset (N > 5000; more than 130 predictors). Although emotional crying is in principle available to everyone as a form of emotional expression, the frequency with which people cry varies greatly from person to person. We tested a wide range of variables that might be associated with crying frequency. Using random forest modeling, we identified the most relevant predictors of crying frequency among personal (e.g., personality traits), situational (e.g., life stressors), and demographic (e.g., age) variables. Approximately 30% of the variance in crying frequency could be explained, with gender, trait empathy, and mental health being the most important predictors. In particular, we found robust higher-order interaction effects, which supports the general proposition that, for emotional crying, complex interaction effects represent reality much better than simple main effects. In summary, this project gives new impetus to the effort to unravel the mystery of human emotional crying and illustrates the benefits of data mining with random forest modeling.
Keywords: random forest, emotional crying, data mining
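The workflow outlined in the abstract (fit a forest, check how much variance it explains, then rank the predictors) can be sketched as follows. This is only an illustrative sketch, not the analysis presented in the talk: it assumes scikit-learn, and the data, the five predictors, the effect sizes, and all variable names are synthetic and hypothetical. Permutation importance is used here as one common way to rank predictors; the abstract does not say which importance measure the study used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, unrelated to the Dutch dataset):
# predictor 0 has a main effect, predictors 1 and 2 interact, 3 and 4 are pure noise.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0.0, 1.0, size=(n, 5))
y = 2.0 * X[:, 0] + 4.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.3, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# oob_score=True gives an R^2 estimate from the samples each tree did not see.
forest = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X_train, y_train)

oob_r2 = forest.oob_score_
test_r2 = forest.score(X_test, y_test)

# Permutation importance: how much the held-out score drops when one
# predictor's values are shuffled, averaged over several repeats.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]

print(f"Out-of-bag R^2: {oob_r2:.2f}, held-out R^2: {test_r2:.2f}")
for idx in ranking:
    print(f"predictor {idx}: importance {result.importances_mean[idx]:.3f}")
```

Note that no product term is specified anywhere in the model: the trees pick up the interaction between predictors 1 and 2 through successive splits, which is the property the abstract highlights over classical regression.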
Michael Barthelmäs is from Ulm University in Germany.