Using BigQuery's CREATE MODEL Statement

Modified on Fri, 17 Jan at 3:11 PM

If you're a PAD user who works with machine learning, BigQuery provides a range of options for creating, training, and predicting models. You can find a comprehensive list of supported types and functionalities on their website here.

To use any model, you'll need to rely on the CREATE MODEL statement in BigQuery. However, the syntax for this statement can vary significantly depending on the type of model you want to create or import. It's important to note that data processing and billing may also differ depending on the type of model you're working with.

It's worth mentioning that only PAD admins and editors have the necessary permissions to use the CREATE MODEL statement. Other users won't be able to access this functionality. For more information on PAD user levels and permissions, take a look at our help article here.

Creating Models in BigQuery:

BigQuery has built-in support for creating and training various model types. It's important to note that some of these models are trained internally on BigQuery ML, while others are trained externally with Vertex AI. You can find more information about each available model type here.

To create and train a model in BigQuery, you must provide all the required parameters in the model options list. This list is included in the CREATE MODEL statement and is specific to the type of model you're creating. You can find different syntaxes for each model type here.

For instance, the following is an example of a full CREATE MODEL query with the model options list for creating a k-means model:

CREATE MODEL
projectid.datasetid.model_name
OPTIONS(model_option_list)
AS query_statement

model_option_list:
MODEL_TYPE = { 'KMEANS' },
    [, NUM_CLUSTERS = { int64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) } ]
    [, KMEANS_INIT_METHOD = { 'RANDOM' | 'KMEANS++' | 'CUSTOM' } ]
    [, KMEANS_INIT_COL = string_value ]
    [, DISTANCE_TYPE = { 'EUCLIDEAN' | 'COSINE' } ]
    [, STANDARDIZE_FEATURES = { TRUE | FALSE } ]
    [, MAX_ITERATIONS = int64_value ]
    [, EARLY_STOP = { TRUE | FALSE } ]
    [, MIN_REL_PROGRESS = float64_value ]
    [, WARM_START = { TRUE | FALSE } ]
    [, NUM_TRIALS = int64_value ]
    [, MAX_PARALLEL_TRIALS = int64_value ]
    [, HPARAM_TUNING_ALGORITHM = { 'VIZIER_DEFAULT' | 'RANDOM_SEARCH' | 'GRID_SEARCH' } ]
    [, HPARAM_TUNING_OBJECTIVES = 'DAVIES_BOULDIN_INDEX' ]

Importing Models in BigQuery:

BigQuery allows you to import pre-existing models for generating predictions. File types like ONNX, TensorFlow, TensorFlow Lite, and XGBoost are supported. You can import the models using the CREATE MODEL statement. To import a model, you first need to upload the file to a Google Cloud bucket in your project and find its Cloud Storage URI before running the statement. To find the Google Cloud URI, locate your file in the Google Cloud console and select 'Configuration', then copy the 'gsutil URI' value.

The syntax for importing files varies depending on the file type you want to load, which can be found here. For instance, here's the syntax for importing an XGBoost model file:

CREATE MODEL
projectid.datasetid.model_name
[INPUT(field_name field_type, …)
 OUTPUT(field_name field_type, …)]
OPTIONS(MODEL_TYPE = 'XGBOOST', MODEL_PATH = string_value);

Keep in mind that options for imported models are a bit more limited than using the native models noted above - see Google’s documentation here for more information.

Have questions? Contact us at help@techallies.org!