Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Attributes collapse

Lifecycle collapse

Table collapse

Data collapse

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:

See Also:



258
259
260
261
262
263
264
265
266
267
268
269
# File 'lib/google/cloud/bigquery/dataset.rb', line 258

def access
  ensure_full_data!
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_urlObject

A URL that can be used to access the dataset using the REST API.



125
126
127
128
# File 'lib/google/cloud/bigquery/dataset.rb', line 125

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the Rest API , you can pass the table's schema as a hash (see example.)

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"
                             name: "My Table",
                             description: "A description of table."

The table's schema fields can be passed as an argument.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema_fields = [
  Google::Cloud::Bigquery::Schema::Field.new(
    "first_name", :string, mode: :required),
  Google::Cloud::Bigquery::Schema::Field.new(
    "cities_lived", :record, mode: :repeated
    fields: [
      Google::Cloud::Bigquery::Schema::Field.new(
        "place", :string, mode: :required),
      Google::Cloud::Bigquery::Schema::Field.new(
        "number_of_years", :integer, mode: :required),
      ])
]
table = dataset.create_table "my_table", fields: schema_fields

Or the table's schema can be configured with the block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |t|
  t.name = "My Table",
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Parameters:

  • table_id (String)

    The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

  • fields (Array<Schema::Field>)

    An array of Schema::Field objects specifying the schema's data types for the table. The schema may also be configured when passing a block.

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Table)

    the table object to be updated

Returns:



391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
# File 'lib/google/cloud/bigquery/dataset.rb', line 391

def create_table table_id, name: nil, description: nil, fields: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
    tb.schema.fields = fields unless fields.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil) ⇒ Google::Cloud::Bigquery::View

Creates a new view table from the given query.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

Parameters:

  • table_id (String)

    The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • query (String)

    The query that BigQuery executes when the view is referenced.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

Returns:



443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
# File 'lib/google/cloud/bigquery/dataset.rb', line 443

def create_view table_id, query, name: nil, description: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_atObject

The time when this dataset was created.



180
181
182
183
184
185
186
187
# File 'lib/google/cloud/bigquery/dataset.rb', line 180

def created_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#dataset_idObject

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



68
69
70
# File 'lib/google/cloud/bigquery/dataset.rb', line 68

def dataset_id
  @gapi.dataset_reference.dataset_id
end

#default_expirationObject

The default lifetime of all tables in the dataset, in milliseconds.



155
156
157
158
159
160
161
162
# File 'lib/google/cloud/bigquery/dataset.rb', line 155

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.



170
171
172
173
# File 'lib/google/cloud/bigquery/dataset.rb', line 170

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete

Parameters:

  • force (Boolean)

    If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.

Returns:

  • (Boolean)

    Returns true if the dataset was deleted.



292
293
294
295
296
# File 'lib/google/cloud/bigquery/dataset.rb', line 292

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#descriptionObject

A user-friendly description of the dataset.



135
136
137
138
# File 'lib/google/cloud/bigquery/dataset.rb', line 135

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.



145
146
147
148
# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etagObject

A string hash of the dataset.



115
116
117
118
# File 'lib/google/cloud/bigquery/dataset.rb', line 115

def etag
  ensure_full_data!
  @gapi.etag
end

#locationObject

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.



209
210
211
212
# File 'lib/google/cloud/bigquery/dataset.rb', line 209

def location
  ensure_full_data!
  @gapi.location
end

#modified_atObject

The date when this dataset or any of its tables was last modified.



194
195
196
197
198
199
200
201
# File 'lib/google/cloud/bigquery/dataset.rb', line 194

def modified_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#nameObject

A descriptive name for the dataset.



96
97
98
# File 'lib/google/cloud/bigquery/dataset.rb', line 96

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.



105
106
107
108
# File 'lib/google/cloud/bigquery/dataset.rb', line 105

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_idObject

The ID of the project containing this dataset.



77
78
79
# File 'lib/google/cloud/bigquery/dataset.rb', line 77

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

data = bigquery.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • timeout (Integer)

    How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).

  • dryrun (Boolean)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

Returns:



654
655
656
657
658
659
660
661
# File 'lib/google/cloud/bigquery/dataset.rb', line 654

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

job = bigquery.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • large_results (Boolean)

    If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

Returns:



595
596
597
598
599
600
601
602
603
604
# File 'lib/google/cloud/bigquery/dataset.rb', line 595

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

Parameters:

  • table_id (String)

    The ID of a table.

Returns:



480
481
482
483
484
485
486
# File 'lib/google/cloud/bigquery/dataset.rb', line 480

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.all do |table|
  puts table.name
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of tables to return.

Returns:



523
524
525
526
527
528
# File 'lib/google/cloud/bigquery/dataset.rb', line 523

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end