Presented April 22, 2019.
Scientists who use data to gain insight for their operational or research interests often need to extract data from web pages or APIs. While this process can be completed manually, it can take orders of magnitude longer without automation or scripting, especially when a task becomes routine and must be executed on a recurring basis. This webinar will demonstrate working with an API from R to extract information from healthdata.gov. We will also demonstrate scraping static web content using the rvest package, as well as scraping static content by driving a web browser with RSelenium. Real-time demos navigating the websites we scrape will be given, and resources will be provided for learning how to navigate a website’s structure (the Document Object Model, or DOM) using CSS selectors and XPath.
Presenter
Spencer George Lourens, Indiana University
April 25, 2019