How to Perform Common Redshift tasks in PAD

Modified on Tue, 6 Aug, 2024 at 3:00 PM

Many of our users - including some of our staff! - make the switch from Redshift to BigQuery when they start using PAD (see here for a broader overview of why we use BigQuery at CTA).

For the most part, users will find Redshift and BigQuery to be very similar and Redshift users should expect to achieve fluency in PAD quickly as the SQL syntax itself doesn’t change much between the two. However, there are a few key areas where structure or syntax is slightly different, and there are also several tasks you can do in BigQuery natively that you cannot do as easily in Redshift. These differences are outlined below.

Structure:

I want to see the columns of a table. Should I just run a "select *" statement as I would do in Redshift?

We strongly recommend against running “select *” statements in BigQuery. Unlike Redshift, which charges by rows, BigQuery charges by the number of columns returned. This means that if you select * a table with a lot of columns, it can end up pretty expensive – even when using a limit statement. Instead, you can click on a table in the sidebar drop-down on the left-hand side in BigQuery and then select ‘Schema’ in the window on the right-hand side to see table columns.

Screenshot_2023-03-15_at_11.55.47_AM.png

What is the difference between projects, datasets, and tables? Where are schemas?

Projects in BigQuery are the equivalent of a Redshift database. Datasets are the equivalent of Redshift schemas, and tables in BigQuery are the equivalent of Redshift tables. You can easily switch between projects if you need to by using the selector in the top left corner of PAD.

Do I need to download a SQL GUI to utilize BigQuery, as I would for Redshift?

No, PAD has a built-in SQL editor - just navigate to PAD and select ‘query’ in the menu to access it.

Why can’t I run a grant query? How do I give people permission to see tables?

In BigQuery, all permissions are handled via Google Groups at the dataset level. PAD has four access levels available: administrator, editor, viewer, and contributor. Folks with the administration access level can create, edit, and delete datasets in your partner project; folks with the editor access level can create and edit datasets but cannot delete; folks with the viewer access level can only query datasets; and folks with the contributor access level can query and create datasets, but won’t necessarily be able to see everything within a dataset unless explicitly granted. In practice, this means if you create a table in a dataset, you don’t need to do anything about the permissions; anyone with access to that dataset will automatically have access to the table. If you need to change your account level or get access to another dataset, reach out to CTA at help@techallies.org or to your organization's admin..

How can I know when a table was refreshed?

The table details in BigQuery contain this information; click on a table and then select ‘Details’ to see the last-modified date. Note that this doesn’t necessarily reflect new data, just that a table refresh job ran.

Screenshot_2023-03-15_at_11.57.22_AM.png

Syntax:

How do I reference a table in BigQuery?

When writing a query in BigQuery, you'll referenc full project/dataset/table name combo, and put it between 'backtick' quotation marks - ` `. You can copy this ID from the Details tab when you navigate to a table or have it show up in the Query editor automatically when you click select query next to the table name.

Extras!

Starring datasets/projects

If you frequently use multiple datasets or projects, you can pin them so they always appear in the sidebar when you load the BigQuery UI. To pin, navigate to the project or dataset in the side console and click the star icon next to it. You can read more about finding and starring datasets and projects here.

How do I get data out of BigQuery?

BigQuery has a built-in feature to export data - after you run a query, just select ‘Save Results’ and choose how you want to export the data.

Screenshot_2023-03-15_at_12.01.18_PM.png

You can also export data using more advanced features available with Google Sheets, such as Connected Sheets - see here for more information.

What if I want to query across multiple projects?

You can! Just make sure the project name is included in your FROM statement, and that you have access to both tables, you’re attempting to query.

How can I save and/or schedule queries?

BigQuery allows you to both save and schedule queries. Saving queries means you can keep track of queries you use a lot and save them for both yourself and other folks in the dataset, and Scheduling queries allows you to set a query to run at a certain time repeatedly, which is helpful if you need to refresh tables. For a detailed walkthrough of scheduling and saving queries, check out our help article here.

What if I want to build visualizations with my data?

PAD also includes Looker Studio, which is Google’s data visualization platform (formerly known as Google Data Studio). It is free to use and share visuals from, with the exception of the BigQuery costs associated with pulling your data into Looker Studio. See here for some additional tips and tricks on using Looker Studio.

You can also connect to other data visualization programs with BigQuery data; processes will vary.