Views in BigQuery

Modified on Tue, 5 Aug at 12:37 PM

In BigQuery, several options exist for storing and retrieving your data through data object types. Data can be stored in a table, a view, or a materialized view. See below for some frequently asked questions about these data object types:

What is the difference between a table, a view, and a materialized view?

Table: Data is static; it is loaded into BigQuery once and stays the same unless you manually make changes. Tables are usually faster to query than views because the underlying data does not change. Table data can still be refreshed through options like scheduled queries or manual updates, but those have to be actively set up. You can create a table in BigQuery with the below SQL:

For an empty table:

CREATE OR REPLACE TABLE <yourprojectid.datasetname.tablenamehere> (<columnname1here> columndatatype, <columnname2here> columndatatype)

For a table based on the results of a SQL query:

CREATE OR REPLACE TABLE <yourprojectid.datasetname.tablenamehere> AS <SQLqueryhere>

Views: Data is not static. A view is defined by a SQL query, called a ‘view definition’, and the view always represents the freshest version of that query. E.g. - if you create a view that pulls from several tables in your project, and one of those tables changes, your view will automatically update to reflect that change. Views are usually the slowest option between tables and materialized views, but they are also often the right choice if data freshness is important for your use case! You can create a view in BigQuery with the below SQL:

CREATE OR REPLACE VIEW <yourprojectid.datasetname.viewnamehere> AS <SQLqueryhere>

Materialized Views: Data is not static. A materialized view is defined by a SQL query, called a ‘view definition’ just like a view. However, unlike a view, a materialized view refreshes on a set cadence. E.g - if you create a materialized view that pulls from several tables in your project and one of those tables changes, your materialized view will update based on your settings for the view. Querying the materialized view will always return the freshest data regardless of update cadence, but if a user queries a materialized view that hasn’t been refreshed and there have been changes to the underlying data, it will slow down the query. Materialized views are usually faster than views, but they are also a newer type of data object and have some limitations compared to tables and views - you can read more about that here. You can create a materialized view in BigQuery with the below SQL:

CREATE OR REPLACE MATERIALIZED VIEW <yourprojectid.datasetname.viewnamehere> AS <SQLqueryhere>

How can I tell if something is a table, a view, or a materialized view?

You have a few options here. First, you can look at the icon next to the name when you click into a dataset.

A Table will have this icon next to it:
A View will have this icon next to it:
A Materialized View will have this icon next to it:

You can also click on the object and then click on the ‘Details’ tab. At the top of the tab, it will identify what type of

object it is.

FAQs

What should I use if I want my data to always be fresh?

Views are your best option here, because they’ll automatically update when the underlying data changes. Materialized views could also be a good choice if you are okay with data only updating when somebody actively queries it.

If I have a table that I want to update, what can I do?

You have a few options here. If you need to make a one-off change that impacts table structure, such as adding a new column, you can recreate the table by manually running the CREATE OR REPLACE query above. If other objects rely on your table or you’re using it for connected tools like Looker Studio, make sure you name your recreated table the same thing as your original table.

If you want your table to regularly update, you can set up a scheduled query that will run the CREATE OR REPLACE query above on a schedule you set. We recommend scheduling your query for a time that maximizes data freshness (e.g., if the sync you’re pulling from updates at 4 AM every morning, then you’ll want to schedule your query for after 4 AM) and minimizes user downtime (e.g., if you know your users need to use this table every afternoon, you don’t want to schedule your query for then).

Querying my views takes a long time. What can I do to fix that?

You have a few options to explore here. First, you’ll want to make sure your query follows Google’s best practices for query optimization. If it does and your query is still too slow, consider whether a materialized view or an incremental table build is an option for your use case. Instead of rebuilding a whole table every time a refresh is needed, incremental table builds allow you to set a certain identifier - e.g., a date -, pull in only any new data that doesn’t exist in the table based on that identifier and merge that into your existing table.

What if I drop and recreate a table that my view uses? What happens to the view?

Your view will be completely fine as long as your recreated table, and the columns you’re using in the view have the same names! Some other databases require you to include certain language when you create a view to avoid issues with table drops, but BigQuery handles this natively with no extra effort required from you. If you drop an underlying table or view that your view uses, your view will show an error message until that underlying table or view is recreated with the same name. Once it is recreated, your view will function as usual. Keep in mind that if you need to drop an underlying table and view and recreate it with a different name, change or add column names, or exclude an underlying table or view from the view definition altogether, that’s fine - but you will need to edit your view’s definition to either remove the excluded table or include the updated name(s).

The schema (list of columns) for my view is incorrect when I preview my view in BigQuery. How can I fix it?

When you change the underlying views or tables that a view relies on, your view schema may or may not update to reflect that, depending on how you have defined it. If your view definition relies on a SELECT *, the schema will not update and you will need to drop and recreate the view to have the correct schema (but even when the schema is not updated, the data in the view will still be updated and correct, as noted above). If your view definition names specific columns, you will need to edit the view definition to include the changed columns in your underlying tables or views. Once you edit the view definition, the schema should immediately update. We recommend training your users to check view data by querying it vs. relying solely on the preview details since we know the queried data will always be correct!

How do I update my views in PAD?

Sometimes, you may need to update a view you’ve built in BigQuery by changing the underlying objects that feed into the view. Unlike other databases, you do not need to wrap in any special statements in the view creation in order to allow underlying objects to be changed. However, your options for changing underlying objects vary based on the view type.

Regular Views: For regular (non-materialized) views, you don’t need to recreate them if the underlying table(s) change. BigQuery can handle that automatically without you adding any attributes or anything to the views. See additional documentation here from Google.

With a view, the schema of the underlying table(s) is stored with the view when the view is created. If columns are added, deleted, or modified after the view is created, then the reported schema in BQ is inaccurate until the view is updated. However, even though the reported schema may be inaccurate, all submitted queries still produce accurate results. There is a brief period where the view won’t work while the underlying table doesn’t exist, but otherwise, views will function as normal even when the underlying tables are changed. (E.g., if you drop an underlying table for a view, there will be an error message for the view until you recreate the underlying table. Typically, this means a few minutes of downtime).

Materialized Views: If you are working with a materialized view, you will need to recreate it if underlying tables are dropped and recreated due to how materialized views are built. If you have a use case that requires both materialized views and frequent dropping/recreating of underlying tables, reach out to help@techallies.org.

Have questions? Contact us at help@techallies.org!