Class: Google::Cloud::Bigquery::Project

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/project.rb,
lib/google/cloud/bigquery/project/list.rb

Overview

Project

Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.

Google::Cloud::Bigquery::Project is the main object for interacting with Google BigQuery. Dataset objects are created, accessed, and deleted by Google::Cloud::Bigquery::Project.

See Google::Cloud#bigquery.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

Defined Under Namespace

Classes: List

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(service) ⇒ Project

Creates a new Service instance.

See Google::Cloud.bigquery



66
67
68
# File 'lib/google/cloud/bigquery/project.rb', line 66

def initialize service
  @service = service
end

Instance Attribute Details

#nameString? (readonly)

The descriptive name of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (String, nil)

    the current value of name



55
56
57
# File 'lib/google/cloud/bigquery/project.rb', line 55

def name
  @name
end

#numeric_idInteger? (readonly)

The numeric ID of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (Integer, nil)

    the current value of numeric_id



55
56
57
# File 'lib/google/cloud/bigquery/project.rb', line 55

def numeric_id
  @numeric_id
end

Instance Method Details

#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, location: nil) {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset

Creates a new dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Or, configure access with a block: (See Dataset::Access)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset" do |dataset|
  dataset.access.add_writer_user "writers@example.com"
end

Parameters:

  • dataset_id (String)

    A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the dataset.

  • description (String)

    A user-friendly description of the dataset.

  • expiration (Integer)

    The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour).

  • location (String)

    The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:



511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
# File 'lib/google/cloud/bigquery/project.rb', line 511

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, location: nil
  ensure_service!

  new_ds = Google::Apis::BigqueryV2::Dataset.new(
    dataset_reference: Google::Apis::BigqueryV2::DatasetReference.new(
      project_id: project, dataset_id: dataset_id))

  # Can set location only on creation, no Dataset#location method
  new_ds.update! location: location unless location.nil?

  updater = Dataset::Updater.new(new_ds).tap do |b|
    b.name = name unless name.nil?
    b.description = description unless description.nil?
    b.default_expiration = expiration unless expiration.nil?
  end

  if block_given?
    yield updater
    updater.check_for_mutated_access!
  end

  gapi = service.insert_dataset new_ds
  Dataset.from_gapi gapi, service
end

#dataset(dataset_id) ⇒ Google::Cloud::Bigquery::Dataset?

Retrieves an existing dataset by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
puts dataset.name

Parameters:

  • dataset_id (String)

    The ID of a dataset.

Returns:



457
458
459
460
461
462
463
# File 'lib/google/cloud/bigquery/project.rb', line 457

def dataset dataset_id
  ensure_service!
  gapi = service.get_dataset dataset_id
  Dataset.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#datasets(all: nil, token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Dataset>

Retrieves the list of datasets belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.each do |dataset|
  puts dataset.name
end

Retrieve hidden datasets with the all optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

all_datasets = bigquery.datasets all: true

Retrieve all datasets: (See Dataset::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.all do |dataset|
  puts dataset.name
end

Parameters:

  • all (Boolean)

    Whether to list all datasets, including hidden ones. The default is false.

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of datasets to return.

Returns:



576
577
578
579
580
581
# File 'lib/google/cloud/bigquery/project.rb', line 576

def datasets all: nil, token: nil, max: nil
  ensure_service!
  options = { all: all, token: token, max: max }
  gapi = service.list_datasets options
  Dataset::List.from_gapi gapi, service, all, max
end

#job(job_id) ⇒ Google::Cloud::Bigquery::Job?

Retrieves an existing job by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.job "my_job"

Parameters:

  • job_id (String)

    The ID of a job.

Returns:



598
599
600
601
602
603
604
# File 'lib/google/cloud/bigquery/project.rb', line 598

def job job_id
  ensure_service!
  gapi = service.get_job job_id
  Job.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#jobs(all: nil, token: nil, max: nil, filter: nil) ⇒ Array<Google::Cloud::Bigquery::Job>

Retrieves the list of jobs belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.each do |job|
  # process job
end

Retrieve only running jobs using the filter optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

running_jobs = bigquery.jobs filter: "running"
running_jobs.each do |job|
  # process job
end

Retrieve all jobs: (See Job::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.all do |job|
  # process job
end

Parameters:

  • all (Boolean)

    Whether to display jobs owned by all users in the project. The default is false.

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of jobs to return.

  • filter (String)

    A filter for job state.

    Acceptable values are:

    • done - Finished jobs
    • pending - Pending jobs
    • running - Running jobs

Returns:



655
656
657
658
659
660
# File 'lib/google/cloud/bigquery/project.rb', line 655

def jobs all: nil, token: nil, max: nil, filter: nil
  ensure_service!
  options = { all: all, token: token, max: max, filter: filter }
  gapi = service.list_jobs options
  Job::List.from_gapi gapi, service, all, max, filter
end

#projectObject

The BigQuery project connected to.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project: "my-project-id",
  keyfile: "/path/to/keyfile.json"
)

bigquery.project #=> "my-project-id"


83
84
85
# File 'lib/google/cloud/bigquery/project.rb', line 83

def project
  service.project
end

#projects(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Project>

Retrieves the list of all projects for which the currently authorized account has been granted any project role. The returned project instances share the same credentials as the project used to retrieve them, but lazily create a new API connection for interactions with the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects
projects.each do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Retrieve all projects: (See Google::Cloud::Bigquery::Project::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects

projects.all do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of projects to return.

Returns:



703
704
705
706
707
708
# File 'lib/google/cloud/bigquery/project.rb', line 703

def projects token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_projects options
  Project::List.from_gapi gapi, service, max
end

#query(query, params: nil, max: nil, timeout: 10000, dryrun: nil, cache: true, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the synchronous method.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM `my_project.my_dataset.my_table`"
data = bigquery.query sql

data.each do |row|
  puts row[:name]
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM [my_project:my_dataset.my_table]"
data = bigquery.query sql, legacy_sql: true

data.each do |row|
  puts row[:name]
end

Retrieve all rows: (See QueryData#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table`"

data.all do |row|
  puts row[:name]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE id = ?",
                      params: [1]

data.each do |row|
  puts row[:name]
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE id = @id",
                      params: { id: 1 }

data.each do |row|
  puts row[:name]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • timeout (Integer)

    How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).

  • dryrun (Boolean)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • dataset (String)

    Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'.

  • project (String)

    Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. When set to true, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL When set to false, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is false.

Returns:



429
430
431
432
433
434
435
436
437
438
439
# File 'lib/google/cloud/bigquery/project.rb', line 429

def query query, params: nil, max: nil, timeout: 10000, dryrun: nil,
          cache: true, dataset: nil, project: nil, standard_sql: nil,
          legacy_sql: nil
  ensure_service!
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache,
              dataset: dataset, project: project,
              legacy_sql: legacy_sql, standard_sql: standard_sql,
              params: params }
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, params: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dataset: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the asynchronous method.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_project.my_dataset.my_table`"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "[my_project:my_dataset.my_table]",
                         legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_dataset.my_table`" \
                         " WHERE id = ?",
                         params: [1]

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM " \
                         "`my_dataset.my_table`" \
                         " WHERE id = @id",
                         params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row[:name]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • dataset (Dataset, String)

    The default dataset to use for unqualified table names in the query. Optional.

  • large_results (Boolean)

    If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.

  • large_results (Boolean)

    This option is specific to Legacy SQL. If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    This option is specific to Legacy SQL. Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

  • maximum_billing_tier (Integer)

    Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default. For more information, see High-Compute queries.

  • maximum_bytes_billed (Integer)

    Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.

Returns:



263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
# File 'lib/google/cloud/bigquery/project.rb', line 263

def query_job query, params: nil, priority: "INTERACTIVE", cache: true,
              table: nil, create: nil, write: nil, dataset: nil,
              standard_sql: nil, legacy_sql: nil, large_results: nil,
              flatten: nil, maximum_billing_tier: nil,
              maximum_bytes_billed: nil
  ensure_service!
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten,
              dataset: dataset, legacy_sql: legacy_sql,
              standard_sql: standard_sql,
              maximum_billing_tier: maximum_billing_tier,
              maximum_bytes_billed: maximum_bytes_billed,
              params: params }
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#schema {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Creates a new schema instance. An optional block may be given to configure the schema, otherwise the schema is returned empty and may be configured directly.

The returned schema can be passed to Dataset#load using the schema option. However, for most use cases, the block yielded by Dataset#load is a more convenient way to configure the schema for the destination table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

schema = bigquery.schema do |s|
  s.string "first_name", mode: :required
  s.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load "my_new_table", gs_url, schema: schema

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema

Returns:



788
789
790
791
792
# File 'lib/google/cloud/bigquery/project.rb', line 788

def schema
  s = Schema.from_gapi
  yield s if block_given?
  s
end

#time(hour, minute, second) ⇒ Bigquery::Time

Creates a Bigquery::Time object to represent a time, independent of a specific date.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

fourpm = bigquery.time 16, 0, 0
data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE time_of_date = @time",
                      params: { time: fourpm }

data.each do |row|
  puts row[:name]
end

Create Time with fractional seconds:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

precise_time = bigquery.time 16, 35, 15.376541
data = bigquery.query "SELECT name " \
                      "FROM `my_dataset.my_table`" \
                      "WHERE time_of_date >= @time",
                      params: { time: precise_time }

data.each do |row|
  puts row[:name]
end

Parameters:

  • hour (Integer)

    Hour, valid values from 0 to 23.

  • minute (Integer)

    Minute, valid values from 0 to 59.

  • second (Integer, Float)

    Second, valid values from 0 to 59. Can contain microsecond precision.

Returns:



751
752
753
# File 'lib/google/cloud/bigquery/project.rb', line 751

def time hour, minute, second
  Bigquery::Time.new "#{hour}:#{minute}:#{second}"
end