“Selective inference for trees”
Professor of Statistics and Biostatistics
Dorothy Gilford Endowed Chair
in Mathematical Statistics
University of Washington
Abstract: As datasets grow in size, the focus of data collection has increasingly shifted away from testing pre-specified hypotheses, and towards hypothesis generation. Researchers are often interested in performing an exploratory data analysis to generate hypotheses, and then testing those hypotheses on the same data. Unfortunately, this type of ‘double dipping’ can lead to highly inflated Type 1 error rates. In this talk, I will consider double dipping on trees.
First, I will focus on trees generated via hierarchical clustering, and will consider testing the null hypothesis of equality of cluster means. I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process, using a selective inference framework. Second, I’ll consider trees generated using the CART procedure, and will again use selective inference to conduct inference on the means of the terminal nodes. Applications include single-cell RNA-sequencing data and the Box Lunch Study.
This work is the result of collaborations with Lucy Gao, Anna Neufeld, and Jacob Bien.
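Illustration: the inflation caused by double dipping can be seen in a small simulation. The sketch below is not taken from the talk and does not implement the selective inference test it describes. It draws data with no true clusters, splits the observations into two groups using the best one-dimensional 2-means split (a simple stand-in for hierarchical clustering), and then applies an ordinary two-sample z-test to the estimated clusters on the same data. The function names, sample size, and number of simulations are all assumptions chosen for illustration.

```python
import math
import random


def naive_p_value(x):
    """Cluster 1-D data into two groups, then z-test the cluster means on the same data.

    Assumes unit variance, which matches the simulation below.
    """
    xs = sorted(x)
    n = len(xs)
    total = sum(xs)
    best_k, best_score = 1, -1.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing the
        # within-cluster sum of squares of the two-group split.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if score > best_score:
            best_k, best_score = k, score
    left, right = xs[:best_k], xs[best_k:]
    diff = sum(right) / len(right) - sum(left) / len(left)
    z = diff / math.sqrt(1 / len(left) + 1 / len(right))
    # Two-sided p-value that ignores the fact that the clusters were estimated.
    return math.erfc(abs(z) / math.sqrt(2))


def type1_error_rate(n_sims=500, n=30, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Global null: every observation comes from the same N(0, 1).
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if naive_p_value(x) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("Naive rejection rate under the null:", type1_error_rate())
```

Because the clusters are chosen to make the two group means as different as possible, the naive test rejects far more often than the nominal 5% level even though no true difference exists. A test that accounts for the cluster estimation step is meant to correct exactly this.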
Bio: Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning.
Daniela is the recipient of an NIH Director’s Early Independence Award, a Sloan Research Fellowship, an NSF CAREER Award, a Simons Investigator Award in Mathematical Modeling of Living Systems, a David Byar Award, a Gertrude Cox Scholarship, and an NDSEG Research Fellowship. She is also the recipient of the Spiegelman Award from the American Public Health Association for a statistician under age 40 who has made outstanding contributions to statistics for public health, as well as the Leo Breiman Award for contributions to the field of statistical machine learning. She is a Fellow of the American Statistical Association, and an Elected Member of the International Statistical Institute.
Daniela is a co-author (with Gareth James, Trevor Hastie, and Rob Tibshirani) of the very popular textbook “Introduction to Statistical Learning”. She was a member of the National Academy of Medicine (formerly the Institute of Medicine) committee that released the report “Evolution of Translational Omics”.
Daniela completed a BS in Math and Biology with Honors and Distinction at Stanford University in 2005, and a PhD in Statistics at Stanford University in 2010.