This task view contains information about how to use R and the world wide web together. The base version of R does not ship with many tools for interacting with the web. Thankfully, a growing number of packages fill this gap. This task view focuses on packages for obtaining web-based data and information, frameworks for building web-based R applications, and online services that can be accessed from R. A list of available packages and functions is presented below, grouped by the type of activity. The rOpenSci task view: Open Data provides further discussion of online data sources that can be accessed from R.

Thanks to all contributors to this task view, especially to Scott Chamberlain, Thomas Leeper, Patrick Mair, Karthik Ram, and Christopher Gandrud, who maintained this task view up to 2021.

There are three main packages that should cover most use cases of interacting with the web from R. crul is an R6-based HTTP client that provides asynchronous HTTP requests, a pagination helper, HTTP mocking via webmockr, and request caching for unit tests via vcr. crul targets R developers more than end users. httr is a more user-facing client for HTTP requests and differs from crul in that it provides support for OAuth. Note that you can pass additional curl options when you instantiate R6 classes in crul, and through the config parameter in httr. curl is a lower-level package that provides a closer interface between R and the libcurl C library, but is less user-friendly. curl underlies both crul and httr. curl may be useful for operations on web-based XML or to perform FTP operations (as crul and httr are focused primarily on HTTP). curl::curl() is an SSL-compatible replacement for base R’s url() and has support for HTTP/2, SSL (https, ftps), gzip, deflate and more. For websites serving insecure HTTP (i.e. using the “http” rather than the “https” prefix), most R functions can extract data directly, including read.table() and read.csv(); this also applies to functions in add-on packages such as jsonlite::fromJSON() and XML::xmlParse(). For more specific situations, the following resources may be useful:
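
Before turning to those, the point above about passing curl options can be sketched as follows. This is a minimal illustration, not a recommended setup: the endpoint `https://httpbin.org` and the helper names `get_with_crul()`, `get_with_httr()`, and `get_with_curl()` are assumptions made for the example, and the 10-second timeout is an arbitrary choice.

```r
library(crul)
library(httr)
library(curl)

# Hypothetical endpoint, used only for illustration
base_url <- "https://httpbin.org"

# crul: curl options are supplied via `opts` when the R6 client is created
client <- crul::HttpClient$new(url = base_url, opts = list(timeout = 10))

# httr: curl options are wrapped in httr::config() and passed as `config`
cfg <- httr::config(timeout = 10)

# curl: options are set directly on a handle
h <- curl::new_handle(timeout = 10)

# The requests themselves need network access, so they are wrapped in
# (hypothetical) helper functions and nothing is sent when this is sourced.
get_with_crul <- function() client$get(path = "get")
get_with_httr <- function() httr::GET(paste0(base_url, "/get"), config = cfg)
get_with_curl <- function() {
  curl::curl_fetch_memory(paste0(base_url, "/get"), handle = h)
}
```

Calling any of the three helpers performs a GET request with the same timeout; which client is preferable depends on the trade-offs described above.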

The vast majority of web-based data is structured as plain text, HTML, XML, or JSON (JavaScript Object Notation). Web service APIs increasingly rely on JSON, but XML is still prevalent in many applications. There are several packages specifically for working with these formats. Their functions can be used to interact directly with insecure web pages or to parse locally stored or in-memory web files.
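
As a small illustration of parsing in-memory content, the sketch below uses jsonlite and XML on literal strings. The JSON and XML snippets are made up for the example and do not come from any real service; the variable names are likewise only illustrative.

```r
library(jsonlite)
library(XML)

# Made-up JSON payload held in memory as a character string
json_txt <- '{"name": "example", "tags": ["a", "b"]}'
parsed_json <- jsonlite::fromJSON(json_txt)
parsed_json$name   # the "name" field as a character string
parsed_json$tags   # the "tags" array, simplified to a character vector

# Made-up XML document held in memory as a character string
xml_txt <- "<items><item id='1'/><item id='2'/></items>"
doc <- XML::xmlParse(xml_txt, asText = TRUE)

# Extract the `id` attribute of every <item> node via XPath
item_ids <- XML::xpathSApply(doc, "//item", XML::xmlGetAttr, "id")
```

The same functions also accept a file path or an http URL in place of the literal string (for `XML::xmlParse()`, drop `asText = TRUE` in that case).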

