NY R Conference
Get ready to celebrate the 10th anniversary of the New York R Conference!
We're taking a trip down memory lane and looking back over the past nine years. Come listen to some of the all-time greats who will be gracing our stage once again, and we're also adding some fresh and exciting new voices to the mix!
Workshops: May 15, 2024 | Location: TBA
Conference: May 16-17, 2024 | Location: FIAF Manhattan
Speakers
Andrew Gelman
Professor
Department of Statistics and Department of Political Science, Columbia University
Talk: It’s About Time
Abigail Haddad
Lead Data Scientist
Capital Technology Group
Talk: Automating Tests for your RAG Chatbot or Other Generative Tool
Wes McKinney
Principal Architect
Posit
Talk: The Future Roadmap for the Composable Data Stack
Jared P. Lander
Chief Data Scientist
Lander Analytics
Talk: 15 Years of Data Science in NYC
Emily Zabor
Associate Staff Biostatistician
Cleveland Clinic, Department of Quantitative Health Sciences
Talk: Reporting Survival Analysis Results with the gtsummary and ggsurvfit Packages
Chang She
CEO & Cofounder
LanceDB
Talk: Lance: Towards a New Columnar Standard for Multimodal AI
Walker Harrison
Analyst
New York Yankees
Talk: Kick or Receive? Determining Optimal NFL Playoff OT Strategy via Simulation
David Robinson
Director of Data Science
Contentsquare
Talk: The Science of Product Development: Bringing Causal Inference to Conversion and Retention Metrics
Jon Harmon
Executive Director
R4DS Online Learning Community
Talk: I Built a Robot to Write This Talk
Drew Conway
Head of Data Science
Two Sigma
Talk: Retrospective - Special Guest
Soumya Kalra
Senior Director Data Product Management
Early Warning Services
Talk: Retrospective - Special Guest
More speakers coming soon…
Workshops
Machine Learning in R
Hosted by Max Kuhn
Wednesday, May 15 | 9:00am - 5:00pm
Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling. You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data. Pre-requisites: some experience with modeling in R and the tidyverse (don't need to be experts); prior experience with lm is enough to get started and learn advanced modeling techniques. In case participants can’t install the packages on their machines, RStudio Server Pro instances will be available that are pre-loaded with the appropriate packages and GitHub repository. (In-Person & Virtual)
Causal Inference in R
Hosted by Malcolm Barrett & Lucy D'Agostino McGowan
Wednesday, May 15 | 9:00am - 5:00pm
In both data science and academic research, prediction modeling is often not enough; to answer many questions, we need to approach them causally. In this workshop, we’ll teach the essential elements of answering causal questions in R through causal diagrams and causal modeling techniques such as propensity scores and inverse probability weighting. We’ll also show that by distinguishing predictive models from causal models, we can better take advantage of both tools. You’ll be able to use the tools you already know--the tidyverse, regression models, and more--to answer the questions that are important to your work. This course is for you if you:
- Know how to fit a linear regression model in R
- Have a basic understanding of data manipulation and visualization using tidyverse tools
- Are interested in understanding the fundamentals behind how to move from estimating correlations to causal relationships
(In-Person & Virtual)
Exploratory Data Analysis with the Tidyverse
Hosted by David Robinson
Wednesday, May 15 | 9:00am - 5:00pm
The tidyverse is a powerful collection of packages following a standard set of principles for usability. During this workshop David will demonstrate an exploratory data analysis in R using tidy tools. He will demonstrate the use of tools such as dplyr and ggplot2 for data transformation and visualization, as well as other packages from the tidyverse as they're needed. He'll narrate his thought process as attendees follow along and offer their own solutions. The workshop expects some familiarity with dplyr and ggplot2, enough to work with data using functions like mutate, group_by, and summarize and to create graphs like scatterplots or bar plots in ggplot2. These concepts will be re-introduced to ensure a smooth workshop, but it isn't designed for brand new R programmers. The workshop is designed to be interactive and participants are expected to type along on their own keyboards. (In-Person & Virtual)
More workshops coming soon…
Agenda
Wednesday, May 15
-
08:00 AM - 09:00 AM
Registration & Breakfast
-
09:00 AM - 05:00 PM
Workshop: Max Kuhn Scientist @ Posit
Machine Learning in R
-
09:00 AM - 05:00 PM
Workshop: Malcolm Barrett & Lucy D'Agostino McGowan
Causal Inference in R
-
09:00 AM - 05:00 PM
Workshop: David Robinson Director of Data Science @ Contentsquare
Exploratory Data Analysis with the Tidyverse
Thursday, May 16
-
08:00 AM - 08:50 AM
Registration & Breakfast
-
08:50 AM - 09:00 AM
Opening Remarks
-
09:00 AM - 09:20 AM
TBD
-
09:25 AM - 09:45 AM
TBD
-
09:50 AM - 10:10 AM
Chang She CEO & Cofounder @ LanceDB
Lance: Towards a New Columnar Standard for Multimodal AI
-
10:10 AM - 10:40 AM
Break
-
10:40 AM - 11:00 AM
Emily Zabor Associate Staff Biostatistician @ Cleveland Clinic, Department of Quantitative Health Sciences
Reporting Survival Analysis Results with the gtsummary and ggsurvfit Packages
Survival analysis is an essential tool to handle censored time-dependent endpoints such as overall survival, which are common across a variety of biomedical and other applications. The survival package in R provides the most essential tools to conduct a survival analysis, including estimating survival probabilities, fitting Cox proportional hazards models, and plotting Kaplan-Meier curves. While the functions are powerful, user-friendly, and well documented, getting publication-ready tables and figures can still be a challenge. In this talk, I will review the basics of survival analysis, and will demonstrate how to take results from the console to the manuscript using the gtsummary and ggsurvfit packages.
-
11:05 AM - 11:25 AM
Jared P. Lander Chief Data Scientist @ Lander Analytics
15 Years of Data Science in NYC
-
11:30 AM - 11:50 AM
TBD
-
11:50 AM - 01:00 PM
Lunch
-
01:00 PM - 01:20 PM
TBD
-
01:25 PM - 02:05 PM
Andrew Gelman Professor @ Department of Statistics and Department of Political Science, Columbia University
It’s About Time
Statistical processes occur in time, but this is often not accounted for in the methods we use and the models we fit. Examples include imbalance in causal inference, generalization from A/B tests even when there is balance, sequential analysis, adjustment for pre-treatment measurements, poll aggregation, spatial and network models, chess ratings, sports analytics, and the replication crisis in science. The point of this talk is to motivate you to include time as a factor in your statistical analyses. This may change how you think about many applied problems!
-
02:05 PM - 02:35 PM
Break
-
02:35 PM - 02:55 PM
Jon Harmon Executive Director @ R4DS Online Learning Community
I Built a Robot to Write This Talk
-
03:00 PM - 03:20 PM
David Robinson Director of Data Science @ Contentsquare
The Science of Product Development: Bringing Causal Inference to Conversion and Retention Metrics
-
03:25 PM - 03:45 PM
TBD
-
03:45 PM - 04:15 PM
Break
-
04:15 PM - 04:35 PM
Abigail Haddad Lead Data Scientist @ Capital Technology Group
Automating Tests for your RAG Chatbot or Other Generative Tool
Building a Retrieval Augmented Generation (RAG) chatbot that answers questions about a specific set of documents is straightforward. But how do you tell if it's working? Automated evaluation of generative tools for specific use cases is tricky, but it's also important if you want to easily compare performance using different underlying LLMs, system prompts, temperatures, or other parameters -- or just make sure you're not breaking something when you push your code. In this talk, I'll discuss why this kind of evaluation is challenging and review a few options for the kinds of assessments you can create, including using an LLM to evaluate your LLM-based tool. We'll then look at several ways to write automated LLM-led evaluations, including with a library that lets you create complex grading rubrics for your tests easily and with very little coding.
-
04:40 PM - 05:00 PM
TBD
-
05:00 PM - 05:10 PM
Closing Remarks
-
05:10 PM - 06:30 PM
Happy Hour
Friday, May 17
-
09:00 AM - 09:50 AM
Registration & Breakfast
-
09:50 AM - 10:00 AM
Opening Remarks
-
10:00 AM - 10:20 AM
TBD
-
10:25 AM - 10:45 AM
Zhangjun Zhou Lead Data Scientist @ Macy's
Personalized Customer Journey at Macy's
-
10:45 AM - 11:15 AM
Break
-
11:15 AM - 11:35 AM
TBD
-
11:40 AM - 12:20 PM
Hadley Wickham Chief Scientist @ Posit
-
12:20 PM - 01:30 PM
Lunch
-
01:30 PM - 01:50 PM
Wes McKinney Principal Architect @ Posit
The Future Roadmap for the Composable Data Stack
-
01:55 PM - 02:15 PM
Max Kuhn Scientist @ Posit
SHINYLIVE IS SO EASY
-
02:20 PM - 02:40 PM
TBD
-
02:40 PM - 03:10 PM
Break
-
03:10 PM - 04:10 PM
Retrospective Panel
Join us for a captivating retrospective panel as we celebrate a decade of the New York R Conference, 15 years of the New York Open Statistical Programming Meetup, and the vibrant journey of the Data Science community. Dive into the highlights, memories, and collective achievements that have shaped our community's remarkable evolution. Don't miss this nostalgic journey reflecting on the past and embracing the exciting future of data science!
Hosted by Jon Krohn, this retrospective panel includes special guests Drew Conway, Soumya Kalra and Jared Lander.
-
04:10 PM - 04:20 PM
Closing Remarks
Sponsors