Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Attributes collapse

Lifecycle collapse

Table collapse

Data collapse

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access # [{"role"=>"OWNER",
               #   "specialGroup"=>"projectOwners"},
               #  {"role"=>"WRITER",
               #   "specialGroup"=>"projectWriters"},
               #  {"role"=>"READER",
               #   "specialGroup"=>"projectReaders"},
               #  {"role"=>"OWNER",
               #   "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:

See Also:



255
256
257
258
259
260
261
262
263
264
265
266
# File 'lib/google/cloud/bigquery/dataset.rb', line 255

def access
  ensure_full_data!
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_urlObject

A URL that can be used to access the dataset using the REST API.



124
125
126
127
# File 'lib/google/cloud/bigquery/dataset.rb', line 124

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the Rest API , you can pass the table's schema as a hash (see example.)

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

Or the table's schema can be configured with the block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table",
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Parameters:

  • table_id (String)

    The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Table)

    the table object to be updated

Returns:



362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
# File 'lib/google/cloud/bigquery/dataset.rb', line 362

def create_table table_id, name: nil, description: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::View

Creates a new view table from the given query.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
          "SELECT name, age FROM proj.dataset.users"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
          "SELECT name, age FROM proj.dataset.users",
          name: "My View", description: "This is my view"

Parameters:

  • table_id (String)

    The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • query (String)

    The query that BigQuery executes when the view is referenced.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.

Returns:



421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
# File 'lib/google/cloud/bigquery/dataset.rb', line 421

def create_view table_id, query, name: nil, description: nil,
                standard_sql: nil, legacy_sql: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query,
      use_legacy_sql: Convert.resolve_legacy_sql(standard_sql,
                                                 legacy_sql)
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_atObject

The time when this dataset was created.



179
180
181
182
183
184
185
186
# File 'lib/google/cloud/bigquery/dataset.rb', line 179

def created_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#dataset_idObject

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



67
68
69
# File 'lib/google/cloud/bigquery/dataset.rb', line 67

def dataset_id
  @gapi.dataset_reference.dataset_id
end

#default_expirationObject

The default lifetime of all tables in the dataset, in milliseconds.



154
155
156
157
158
159
160
161
# File 'lib/google/cloud/bigquery/dataset.rb', line 154

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.



169
170
171
172
# File 'lib/google/cloud/bigquery/dataset.rb', line 169

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete

Parameters:

  • force (Boolean)

    If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.

Returns:

  • (Boolean)

    Returns true if the dataset was deleted.



288
289
290
291
292
# File 'lib/google/cloud/bigquery/dataset.rb', line 288

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#descriptionObject

A user-friendly description of the dataset.



134
135
136
137
# File 'lib/google/cloud/bigquery/dataset.rb', line 134

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.



144
145
146
147
# File 'lib/google/cloud/bigquery/dataset.rb', line 144

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etagObject

A string hash of the dataset.



114
115
116
117
# File 'lib/google/cloud/bigquery/dataset.rb', line 114

def etag
  ensure_full_data!
  @gapi.etag
end

#locationObject

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.



208
209
210
211
# File 'lib/google/cloud/bigquery/dataset.rb', line 208

def location
  ensure_full_data!
  @gapi.location
end

#modified_atObject

The date when this dataset or any of its tables was last modified.



193
194
195
196
197
198
199
200
# File 'lib/google/cloud/bigquery/dataset.rb', line 193

def modified_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#nameObject

A descriptive name for the dataset.



95
96
97
# File 'lib/google/cloud/bigquery/dataset.rb', line 95

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.



104
105
106
107
# File 'lib/google/cloud/bigquery/dataset.rb', line 104

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_idObject

The ID of the project containing this dataset.



76
77
78
# File 'lib/google/cloud/bigquery/dataset.rb', line 76

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, params: nil, max: nil, timeout: 10000, dryrun: nil, cache: true, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"

data.each do |row|
  puts row["name"]
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table",
                     legacy_sql: true

data.each do |row|
  puts row["name"]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = ?",
                     params: [1]

data.each do |row|
  puts row["name"]
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = @id",
                     params: { id: 1 }

data.each do |row|
  puts row["name"]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • timeout (Integer)

    How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).

  • dryrun (Boolean)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. When set to true, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL When set to false, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is false.

Returns:



815
816
817
818
819
820
821
822
823
824
825
# File 'lib/google/cloud/bigquery/dataset.rb', line 815

def query query, params: nil, max: nil, timeout: 10000, dryrun: nil,
          cache: true, standard_sql: nil, legacy_sql: nil
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache,
              legacy_sql: legacy_sql, standard_sql: standard_sql,
              params: params }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, params: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table",
                        legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = ?",
                        params: [1]

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = @id",
                        params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.

  • large_results (Boolean)

    This option is specific to Legacy SQL. If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    This option is specific to Legacy SQL. Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

Returns:



667
668
669
670
671
672
673
674
675
676
677
678
679
# File 'lib/google/cloud/bigquery/dataset.rb', line 667

def query_job query, params: nil, priority: "INTERACTIVE", cache: true,
              table: nil, create: nil, write: nil, standard_sql: nil,
              legacy_sql: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten,
              legacy_sql: legacy_sql, standard_sql: standard_sql,
              params: params }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table"
puts table.name

Parameters:

  • table_id (String)

    The ID of a table.

Returns:



461
462
463
464
465
466
467
# File 'lib/google/cloud/bigquery/dataset.rb', line 461

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.all do |table|
  puts table.name
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of tables to return.

Returns:



504
505
506
507
508
509
# File 'lib/google/cloud/bigquery/dataset.rb', line 504

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end