September 15, 2020

12:00 pm – 1:15 pm

Venue

ZOOM

Seminar Recording:

https://wse.zoom.us/rec/share/6GQvhM-bTXJcLk4thyLNdPo0LT9abRmSkXuEUyubcXs5LdGMxDt4ai6XQatLRjA.W3WipnhIIxIyUujp?startTime=1600184734000    

Topic: MINDS & CIS Seminar – Giles Hooker
Time: Sep 15, 2020 12:00 PM Eastern Time (US and Canada)

Join Zoom Meeting
https://wse.zoom.us/j/99121845815?pwd=bWViaXZSNzUvK2lBaGJ5TUljcllsZz09

Meeting ID: 991 2184 5815
Passcode: clark_hall
One tap mobile
+13017158592,,99121845815# US (Germantown)
+16465588656,,99121845815# US (New York)

Dial by your location
        +1 301 715 8592 US (Germantown)
        +1 646 558 8656 US (New York)
        +1 312 626 6799 US (Chicago)
        +1 346 248 7799 US (Houston)
        +1 669 900 6833 US (San Jose)
        +1 253 215 8782 US (Tacoma)
Meeting ID: 991 2184 5815
Find your local number: https://wse.zoom.us/u/aeHH82dh6i

Giles Hooker, PhD
Associate Professor
Cornell University
Department of Statistics and Data Science
Department of Computational Biology

“Ensembles of Trees and CLTs: Inference and Machine Learning”

Abstract: This talk develops methods of statistical inference based on ensembles of decision trees: bagging, random forests, and boosting. Recent results have shown that when the bootstrap procedure in bagging methods is replaced by sub-sampling, predictions from these methods can be analyzed using the theory of U-statistics, which have a limiting normal distribution. Moreover, the limiting variance can be estimated within the sub-sampling structure.
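The idea above can be sketched in a few lines of code. This is an illustrative simplification, not the talk's exact estimator: it builds an ensemble of one-split "stump" trees on subsamples drawn without replacement, grouped so that each group of trees shares one common training point, and plugs the resulting between-group and tree-level variances into a U-statistic-style variance formula. All constants, the stump learner, and the grouping scheme are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated training data: y depends only on the first feature.
n, s = 500, 50                        # training size and subsample size, s << n
X = rng.uniform(-1, 1, size=(n, 2))
y = X[:, 0] + 0.1 * rng.normal(size=n)
x_star = np.array([0.5, 0.0])         # point at which to predict

def fit_stump(Xs, ys):
    # One-split regression tree: choose the (feature, threshold) pair
    # minimizing squared error; predict the mean of the chosen leaf.
    best = (np.inf, 0, 0.0, ys.mean(), ys.mean())
    for j in range(Xs.shape[1]):
        for t in np.quantile(Xs[:, j], [0.25, 0.5, 0.75]):
            left = Xs[:, j] <= t
            if left.all() or not left.any():
                continue
            ml, mr = ys[left].mean(), ys[~left].mean()
            sse = ((ys[left] - ml) ** 2).sum() + ((ys[~left] - mr) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr

# Grouped subsampling: each group of trees shares one common point i,
# so between-group variation isolates the contribution of a single point.
n_groups, trees_per_group = 25, 20
group_means, all_preds = [], []
for _ in range(n_groups):
    i = rng.integers(n)
    preds = []
    for _ in range(trees_per_group):
        idx = rng.choice(np.delete(np.arange(n), i), size=s - 1, replace=False)
        idx = np.append(idx, i)       # subsample without replacement, incl. i
        tree = fit_stump(X[idx], y[idx])
        preds.append(tree(x_star))
    all_preds.extend(preds)
    group_means.append(np.mean(preds))

pred = np.mean(all_preds)             # ensemble prediction at x_star
zeta1 = np.var(group_means, ddof=1)   # between-group variance component
zetas = np.var(all_preds, ddof=1)     # raw tree-level variance
B = n_groups * trees_per_group
var_hat = (s ** 2 / n) * zeta1 + zetas / B   # plug-in variance estimate
print(pred, var_hat)
```

With the normal limit, `pred ± 1.96 * sqrt(var_hat)` then gives an approximate confidence interval for the ensemble's prediction, entirely from the subsampling structure.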
Using this result, we can compare the predictions made by a model learned with a feature of interest to those made by a model learned without it, and ask whether the differences between them could have arisen by chance. By evaluating the model at a structured set of points, we can also ask whether it differs significantly from an additive model. We demonstrate these results in an application to citizen-science data collected by Cornell’s Laboratory of Ornithology.
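The feature-comparison test can be caricatured as follows. This sketch, under assumed constants and a stand-in stump learner, pairs each subsample's prediction with one from the same subsample in which the feature of interest has been permuted (a proxy for "learned without it"), then asks whether the mean paired difference is large relative to its subsampling spread. The naive z-statistic here ignores the U-statistic covariance structure the talk handles properly; it is only meant to show the shape of the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: only feature 0 carries signal; feature 1 is noise.
n, s, B = 400, 40, 300
X = rng.uniform(-1, 1, size=(n, 2))
y = X[:, 0] + 0.1 * rng.normal(size=n)
x_star = np.array([0.5, 0.0])

def stump_predict(Xs, ys, x):
    # Fit a one-split (median-threshold) regression stump, predict at x.
    best = (np.inf, ys.mean())
    for j in range(Xs.shape[1]):
        t = np.median(Xs[:, j])
        left = Xs[:, j] <= t
        if left.all() or not left.any():
            continue
        ml, mr = ys[left].mean(), ys[~left].mean()
        sse = ((ys[left] - ml) ** 2).sum() + ((ys[~left] - mr) ** 2).sum()
        if sse < best[0]:
            best = (sse, ml if x[j] <= t else mr)
    return best[1]

# Paired comparison on shared subsamples: intact data vs. data with
# feature 0 permuted, so the difference isolates that feature's effect.
diffs = []
for _ in range(B):
    idx = rng.choice(n, size=s, replace=False)
    Xp = X[idx].copy()
    Xp[:, 0] = rng.permutation(Xp[:, 0])   # break feature 0's link to y
    d = stump_predict(X[idx], y[idx], x_star) - stump_predict(Xp, y[idx], x_star)
    diffs.append(d)
diffs = np.array(diffs)

z = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(B))
print(diffs.mean(), z)
```

Because feature 0 genuinely drives `y` in this simulation, the paired differences are systematically positive and the z-statistic is large; for a pure-noise feature it would hover near zero.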
We will examine recent developments that extend these distributional results to boosting-type estimators. Boosting allows trees to be incorporated into more structured regression models, such as additive or varying-coefficient models, and often outperforms bagging by reducing bias.

Bio: Giles Hooker is Associate Professor of Statistics and Data Science at Cornell University. His work has focused on statistical methods using dynamical systems models, inference with machine learning models, functional data analysis, and robust statistics. He is the author of “Dynamic Data Analysis: Modeling Data with Differential Equations” and “Functional Data Analysis in R and Matlab”. Much of his work has been inspired by collaborations, particularly in ecology and citizen science.