Connecting to PAD using R/RStudio

Created by Rebecca Sokol-Snyder, Modified on Wed, 6 Mar at 1:21 AM by Alsabana Sahubarali

Many analysts and data practitioners like to explore and manipulate data in R, and you can easily do this with PAD! The package bigrquery lets you easily connect to your PAD project from R.

Installation

You can do this using CRAN or GitHub, depending on your preference. Just run of the following two command snippets using the installation tool of your choice.

Using CRAN:

 install.packages("bigrquery")

Or from GitHub:

 devtools::install_github("r-dbi/bigrquery")

Then, load the installed package to each R session by running

 library("bigrquery")

Authorization

To authorize R to access the data in your PAD Project, use the function:

bq_auth()

We recommend against caching OAuth credentials between R sessions.

NOTE: Make sure your cta-tech.app Google Chrome opened before running this function. If you try to authorize access from another browser or Chrome profile, you will get an error message saying you have made a ‘malformed’ request. If this happens, switch to your cta-tech.app Chrome and try authorization again.

image1.png

A new tab will open asking you to authorize Tidyverse API Packages access to your BigQuery; the answer is yes. 

image2.png

If you have the package httr installed, click ‘Continue’ to authorize access! If you do not have the package httr installed, you’ll be given an authorization code – you will need to copy, paste and submit this into your R Console prompt. 

You are now connected!

Querying your PAD Data

bigrquery offers 3 different ways to interact with your PAD data:

1. Low-level API: these provide thin wrappers over the underlying REST API. All the low-level functions start with bq_, and mostly have the form bq_noun_verb(). This level of abstraction is most appropriate if you’re familiar with the REST API and you want to do something not supported in the higher-level APIs.

billing <- bq_test_project() # replace this with your project ID as a string
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"

tb <- bq_project_query(billing, sql)
bq_table_download(tb, n_max = 10)
#> First chunk includes all requested rows.
#> # A tibble: 10 x 4
#>     year month   day weight_pounds
#>    <int> <int> <int>         <dbl>
#>  1  1969     1    20          7.00
#>  2  1969     1    27          7.69
#>  3  1969     6    19          6.75
#>  4  1969     5    30          6.19
#>  5  1969    11     9          7.87
#>  6  1969     5    25          7.06
#>  7  1969     7    25          7.94
#>  8  1969     9    11          7.06
#>  9  1969     7    13          6.00
#> 10  1969     9    27          8.13

2. Using DBI: The DBI package wraps the low-level API and makes working with BigQuery like working with any other database system. This is the most convenient layer if you want to execute SQL queries in BigQuery or upload smaller amounts (i.e. <100 MB) of data.

library(DBI)
con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = bq_test_project() # replace this with your project ID as a string
)
con
dbListTables(con)
#> [1] "github_nested"   "github_timeline" "gsod"            "natality"      
#> [5] "shakespeare"     "trigrams"        "wikipedia"

dbGetQuery(con, "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`", n = 10)
#> First chunk includes all requested rows.
#> # A tibble: 10 x 4
#>     year month   day weight_pounds
#>    <int> <int> <int>         <dbl>
#>  1  1969     1    20          7.00
#>  2  1969     1    27          7.69
#>  3  1969     6    19          6.75
#>  4  1969     5    30          6.19
#>  5  1969    11     9          7.87
#>  6  1969     5    25          7.06
#>  7  1969     7    25          7.94
#>  8  1969     9    11          7.06
#>  9  1969     7    13          6.00
#> 10  1969     9    27          8.13

3. Using dplyr: The dplyr package lets you treat BigQuery tables as if they are in-memory data frames. This is the most convenient layer if you don’t want to write SQL, but instead want dbplyr to write it for you.

library(dplyr)
con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = bq_test_project() # replace this with your project ID as a string
)
natality <- tbl(con, "natality")

natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  collect()
#> # A tibble: 10 x 4
#>     year month   day weight_pounds
#>    <int> <int> <int>         <dbl>
#>  1  1969    10     6          3.25
#>  2  1969     5    11          5.75
#>  3  1969     6    29          7.94
#>  4  1969     3     7          8.38
#>  5  1970     4    26          6.38
#>  6  1971    10     6          6.69
#>  7  1971     2    23          6.69
#>  8  1971     8    12          7.37
#>  9  1969     9     3          5.25
#> 10  1969     4    25          6.62

Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select at least one of the reasons
CAPTCHA verification is required.

Feedback sent

We appreciate your effort and will try to fix the article