How to: Load Google Sheets or CSVs into BigQuery

Created by Cheska Perez, Modified on Mon, 24 Feb at 12:28 PM by Ben Deverman

This support article outlines several methods for loading CSVs and Google Sheets into BigQuery. Once imported, you can view and query your uploaded data directly from the BigQuery console. Note that only users with Admin, Editor, or Contributor permissions can create datasets and tables in PAD. If you aren't sure of your permission level, ask your organization's PAD administrator or go to groups.google.com to see which groups you are a member of.

 

If you need to move a number of flat files on a regular or automated basis, check out our Medium article showing how to use Google Workflows to move files from Drive to BigQuery here!

 

Loading Your Data as an External Data Source

When you load your data into PAD as an external data source, BigQuery ensures the resulting table always reflects the data in the spreadsheet. So, for example, when a row is updated in the sheet, it also updates in the BigQuery table. This can be useful for getting data from organizers’ spreadsheets and forms into tables for analysts to use.


Since your data is not stored in BigQuery, there is no storage cost associated with this data. However, querying it can be slower than querying a native table, since BigQuery first reads the entire external source before running the SQL query. On rare occasions, a query pulling from a Google Sheets source may fail with a resource error due to service overload. In these cases, the overload is caused by resource utilization in the source Sheet itself; simplifying the Sheet by minimizing its use of formulas is the best way to resolve the issue.

 

Follow these steps to load your data correctly:

  1. Share the Google Sheet with your "@cta-tech.app" Drive account.
  2. From within your BigQuery project, click the three-dot actions menu and select “Create dataset”.


3. Enter the Dataset ID (dataset name) and set the data location. Once complete, click “Create Dataset”.



4. Next to the new dataset, click the three-dot actions menu and select “Create table”.


5. To create the table:

  1. Select “Drive” in the “Create Table From” drop-down
  2. Paste in the source URL for the Google Sheet or CSV
  3. Select the file format (CSV/Google Sheet/etc.)
  4. Enter the table name
  5. Select “Auto Detect” for the schema



6. In the Advanced options (optional):

  1. Identify the delimiter type, if applicable
  2. If the sheet includes a header row, click the drop-down next to Advanced options and set “Header rows to skip” to 1. BigQuery will use the header row to label the columns in the table


7. Select “Create Table” to complete

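If you prefer to script this setup rather than click through the console, the google-cloud-bigquery Python client can create the same kind of Sheets-backed external table. This is a minimal sketch, not PAD's official tooling; the project, dataset, table, and sheet URL are placeholders, and the credentials you use must be able to read the sheet (hence sharing it in step 1):

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project ID

    # Describe the Google Sheet as an external data source.
    external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
    external_config.source_uris = [
        "https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID",  # placeholder URL
    ]
    external_config.autodetect = True              # let BigQuery infer the schema
    external_config.options.skip_leading_rows = 1  # treat the first row as headers

    # Create the table; queries against it read the sheet live.
    table = bigquery.Table("your-project.your_dataset.sheet_backed_table")
    table.external_data_configuration = external_config
    client.create_table(table)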


Loading Your Data Directly to PAD as a Native Table

If you have a static dataset that you want to store and query in PAD, you’ll want to load your data directly into BigQuery. Follow these steps to upload the data correctly:


  1. Download your CSV or Google Sheet directly to your computer.
  2. From within your BigQuery project, click the three-dot actions menu and select “Create dataset”.



3. Enter the Dataset ID (dataset name) and set the data location. Once complete, click “Create Dataset”.



4. Next to the new dataset, click the three-dot actions menu and select “Create table”.


5. To create the table:

  1. Select “Upload” in the “Create Table From” drop-down
  2. Browse your files and select the file you want to upload
  3. Select the file format (CSV)
  4. Enter the table name
  5. Either select “Auto Detect” for the schema or manually add the schema fields for your table

 



6. In the Advanced options (optional):

  1. Identify the delimiter type, if applicable
  2. If the sheet includes a header row, click the drop-down next to Advanced options and set “Header rows to skip” to 1. BigQuery will use the header row to label the columns in the table


7. Select “Create Table” to complete

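If you'd rather script the upload, here is a minimal sketch using the google-cloud-bigquery Python client; the file path and table name are placeholders, and it assumes a CSV with a header row:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project ID

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,      # or pass schema=[...] to set fields manually
        skip_leading_rows=1,  # skip the header row (BigQuery uses it for column names)
    )

    # Stream the local file into a load job and wait for it to finish.
    with open("my_data.csv", "rb") as f:  # placeholder file name
        job = client.load_table_from_file(
            f, "your-project.your_dataset.my_table", job_config=job_config
        )
    job.result()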

 


Loading Large CSVs as BigQuery Tables


For large CSVs that exceed the maximum size for direct uploads in BigQuery, you can leverage Google Cloud Storage (GCS) buckets for your work!


  1. Create a Google Cloud Storage bucket. Google Cloud Storage buckets are containers for holding and organizing data. Google has step-by-step instructions on how to create a bucket here.
  2. Upload your file to your newly-created GCS bucket
  3. From within your BigQuery project, click the three-dot actions menu and select “Create dataset”.

4. Enter the Dataset ID (dataset name) and set the data location. Once complete, click “Create Dataset”.


5. Next to the new dataset, click the three-dot actions menu and select “Create table”.


6. To create the table:


  1. Select “Google Cloud Storage” in the “Create Table From” drop-down
  2. Select “Browse” and select your file from the bucket directory
  3. Select the file format (in this example, CSV)
  4. Enter the table name
  5. Select “Auto Detect” for the schema, or specify field names and data types manually, either by entering each field in the provided boxes or by editing as text


7. In the Advanced options (optional):

  1. Identify the delimiter type, if applicable
  2. If the sheet includes a header row, click the drop-down next to Advanced options and set “Header rows to skip” to 1. BigQuery will use the header row to label the columns in the table


8. Select “Create Table” to complete


9. Confirm that your new table appears in the appropriate project when the load job completes
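This load can also be scripted; a minimal sketch using the google-cloud-bigquery Python client, with placeholder bucket, file, and table names:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project ID

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,  # skip the header row
    )

    # Load straight from the GCS bucket; no local download needed.
    job = client.load_table_from_uri(
        "gs://your-bucket/large_file.csv",        # placeholder GCS path
        "your-project.your_dataset.large_table",  # placeholder table
        job_config=job_config,
    )
    job.result()
    print(f"Loaded {job.output_rows} rows.")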

 


Handling Schema Errors When Uploading


BigQuery’s schema auto-detect feature works well for smaller, uncomplicated files. However, it can struggle outside of those use cases, resulting in error messages and failed uploads. Here are some troubleshooting tips for common schema upload errors:


If you are getting an error saying your data has more columns than expected:


This is usually due to how your file is formatted. To troubleshoot:

  1. Open your file in a text editor (not Excel or Google Sheets; you want to see the raw text) and check for any stray delimiters at the ends of rows. For example, in a CSV, this might look like an extra comma at the end of a row, which can create unexpected extra columns. Remove these stray delimiters if they are present.

  2. If this doesn’t help, check whether the delimiter your file uses also appears inside your data: e.g., if you have a CSV, do any values contain commas? This can cause unexpected extra columns as well. To fix this, either adjust those values or convert the file to use a different delimiter that isn’t present in your data. You can also wrap values containing the delimiter in double quotes (read more here); a quick way to find the offending rows is sketched below. See the next section if your data uses quoted values.
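To automate this check, Python's built-in csv module (which understands quoted values) can flag rows whose column count doesn't match the header. A minimal sketch; the file name is a placeholder:

    import csv

    # Compare each row's column count against the header row.
    with open("my_data.csv", newline="") as f:  # placeholder file name
        reader = csv.reader(f)  # handles values quoted to contain commas
        expected = len(next(reader))
        for line_num, row in enumerate(reader, start=2):
            if len(row) != expected:
                print(f"Line {line_num}: expected {expected} columns, got {len(row)}")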


If you are getting an error saying your data is missing a close double quote (") character:


This is due to quoted values in your data (values with quotation marks around them), possibly spanning multiple lines. To fix this error:

  1. Try checking “Allow quoted newlines” in the Advanced options when uploading.

  2. If this doesn’t fix it, inspect your data for values with only one quotation mark, or with a second quotation mark that comes before the end of the value (e.g., “May” is correct; “Ma”y or “May would not be).
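One rough way to find such values is to flag lines with an odd number of double-quote characters. This is only a heuristic (a value that legitimately spans multiple lines will also be flagged), and the file name is a placeholder:

    # Flag lines whose double quotes don't pair up.
    with open("my_data.csv") as f:  # placeholder file name
        for line_num, line in enumerate(f, start=1):
            if line.count('"') % 2 != 0:
                print(f"Line {line_num} has an unbalanced quote: {line.rstrip()}")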


If you are getting an error saying a specific row is the wrong type:


This typically happens with larger files, because BigQuery’s schema auto-detect function only scans a section of the file to determine column types, not the whole file. This means that if the first set of rows in a column are all numbers, but later rows contain strings, BigQuery may erroneously assign the column an INTEGER type, and the upload will fail once it reaches the string values.


To fix this, you will need to determine your file’s schema and enter it into the BigQuery file upload UI instead of using auto-detect. There are tools you can use locally to streamline this process, such as the csvkit command-line suite. The BigQuery error will tell you which row failed, so you can also manually inspect your data for each failure and determine the correct type for the column.
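If you're scripting the upload, the same fix applies there: pass an explicit schema instead of using auto-detect. A minimal sketch with the google-cloud-bigquery Python client; the field names and types are hypothetical examples:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project ID

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        schema=[
            # Declaring the type up front avoids the bad guess auto-detect
            # can make from a numeric-looking sample of early rows.
            bigquery.SchemaField("member_id", "STRING"),
            bigquery.SchemaField("county", "STRING"),
            bigquery.SchemaField("turnout_pct", "FLOAT"),
        ],
    )

    with open("my_data.csv", "rb") as f:  # placeholder file name
        client.load_table_from_file(
            f, "your-project.your_dataset.my_table", job_config=job_config
        ).result()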


If you want to use more advanced schema features, like specifying a string in your file that should be loaded as NULL:

 

There are some more advanced features that are not currently available in the BigQuery file upload UI but are available through the bq command-line tool. You can read more about the available features here.
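For example, the bq CLI's --null_marker flag (which names a string in your file to load as NULL) has a counterpart in the Python client too. A minimal sketch, with placeholder names:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project")  # placeholder project ID

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        null_marker="NA",            # cells containing "NA" load as NULL
        allow_quoted_newlines=True,  # the same option discussed above
    )

    with open("my_data.csv", "rb") as f:  # placeholder file name
        client.load_table_from_file(
            f, "your-project.your_dataset.my_table", job_config=job_config
        ).result()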




Have questions? Contact us at [email protected]!

