Class: Google::Cloud::Bigquery::Dataset

Inherits:

Object

Defined in: lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Attributes

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access
Retrieves the access rules for a Dataset.
#api_url ⇒ Object
A URL that can be used to access the dataset using the REST API.
#created_at ⇒ Object
The time when this dataset was created.
#dataset_id ⇒ Object
A unique ID for this dataset, without the project name.
#default_expiration ⇒ Object
The default lifetime of all tables in the dataset, in milliseconds.
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
#description ⇒ Object
A user-friendly description of the dataset.
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
#etag ⇒ Object
A string hash of the dataset.
#location ⇒ Object
The geographic location where the dataset should reside.
#modified_at ⇒ Object
The time when this dataset or any of its tables was last modified.
#name ⇒ Object
A descriptive name for the dataset.
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
#project_id ⇒ Object
The ID of the project containing this dataset.

Lifecycle

#delete(force: nil) ⇒ Boolean
Permanently deletes the dataset.

Table

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table
Creates a new table.
#create_view(table_id, query, name: nil, description: nil) ⇒ Google::Cloud::Bigquery::View
Creates a new view table from the given query.
#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...
Retrieves an existing table by ID.
#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>
Retrieves the list of tables belonging to the dataset.

Data

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Google::Cloud::Bigquery::QueryData
Queries data using the synchronous method.
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob
Queries data using the asynchronous method.

Instance Method Details

#access {|access| ... } ⇒ `Google::Cloud::Bigquery::Dataset::Access`

Retrieves the access rules for a Dataset. The rules can be updated by passing a block; see Access for all the available methods.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
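dataset = bigquery.dataset "my_dataset"
# A view in another dataset that should be granted read access.
other_dataset = bigquery.dataset "other_dataset"
other_view = other_dataset.table "other_view"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_view
end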

Yields:

(access) —
a block for setting rules

Yield Parameters:

access (Dataset::Access) —
the object accepting rules

Returns:

(Google::Cloud::Bigquery::Dataset::Access)
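In the first example above, each access rule is shown as a hash with a "role" key plus one scope key such as "specialGroup" or "userByEmail". A minimal plain-Ruby sketch, assuming you already hold rules in that shape (the literal below is illustrative, not real API output), of grouping grantees by role:

```ruby
# Illustrative access rules in the same shape as the example output above.
rules = [
  { "role" => "OWNER",  "specialGroup" => "projectOwners" },
  { "role" => "WRITER", "specialGroup" => "projectWriters" },
  { "role" => "READER", "specialGroup" => "projectReaders" },
  { "role" => "OWNER",  "userByEmail"  => "owner@example.com" }
]

# Map each role to its grantees, whatever scope key each rule uses.
grantees_by_role = rules.group_by { |rule| rule["role"] }.transform_values do |role_rules|
  # Drop the "role" key; the single remaining value is the grantee.
  role_rules.map { |rule| rule.reject { |key, _| key == "role" }.values.first }
end

grantees_by_role.each do |role, grantees|
  puts "#{role}: #{grantees.join(', ')}"
end
```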

#api_url ⇒ `Object`

A URL that can be used to access the dataset using the REST API.

# File 'lib/google/cloud/bigquery/dataset.rb', line 125

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ `Google::Cloud::Bigquery::Table`

Creates a new table. If you are adapting existing code that was written for the REST API, you can pass the table's schema as a hash.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

The table's schema fields can be passed as an argument.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema_fields = [
  Google::Cloud::Bigquery::Schema::Field.new(
    "first_name", :string, mode: :required),
  Google::Cloud::Bigquery::Schema::Field.new(
    "cities_lived", :record, mode: :repeated,
    fields: [
      Google::Cloud::Bigquery::Schema::Field.new(
        "place", :string, mode: :required),
      Google::Cloud::Bigquery::Schema::Field.new(
        "number_of_years", :integer, mode: :required),
      ])
]
table = dataset.create_table "my_table", fields: schema_fields

Or the table's schema can be configured in a block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Parameters:

table_id (String) —
The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
name (String) —
A descriptive name for the table.
description (String) —
A user-friendly description of the table.
fields (Array<Schema::Field>) —
An array of Schema::Field objects specifying the schema's data types for the table. The schema may also be configured when passing a block.
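The table_id rule above can be checked on the client before calling create_table. A minimal sketch using a hypothetical helper, valid_table_id?, which is not part of the library; BigQuery itself remains the authority on which IDs it accepts:

```ruby
# Hypothetical check: letters, digits, or underscores only, 1 to 1,024 characters.
TABLE_ID_PATTERN = /\A[A-Za-z0-9_]{1,1024}\z/

def valid_table_id?(table_id)
  TABLE_ID_PATTERN.match?(table_id)
end

puts valid_table_id?("my_table")  # true
puts valid_table_id?("my-table")  # false: hyphens are not allowed
puts valid_table_id?("a" * 1025)  # false: longer than 1,024 characters
```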

Yields:

(table) —
a block for setting the table

Yield Parameters:

table (Table) —
the table object to be updated

Returns:

(Google::Cloud::Bigquery::Table)

# File 'lib/google/cloud/bigquery/dataset.rb', line 391

def create_table table_id, name: nil, description: nil, fields: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
    tb.schema.fields = fields unless fields.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil) ⇒ `Google::Cloud::Bigquery::View`

Creates a new view table from the given query.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

Parameters:

table_id (String) —
The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
query (String) —
The query that BigQuery executes when the view is referenced.
name (String) —
A descriptive name for the table.
description (String) —
A user-friendly description of the table.

Returns:

(Google::Cloud::Bigquery::View)

# File 'lib/google/cloud/bigquery/dataset.rb', line 443

def create_view table_id, query, name: nil, description: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end
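create_view builds its request options as a hash and then drops every entry whose value is nil, so omitted name: or description: arguments are left out of the request instead of being sent as null. The same pattern in isolation, as a sketch in which plain strings stand in for the Google::Apis::BigqueryV2 objects (an illustrative simplification; view_options is a hypothetical name):

```ruby
# Build options the way create_view does, then remove keys whose value is nil.
def view_options(table_id, query, name: nil, description: nil)
  {
    table_id: table_id,
    friendly_name: name,
    description: description,
    query: query
  }.delete_if { |_, value| value.nil? }
end

opts = view_options("my_view", "SELECT name, age FROM [proj:dataset.users]")
p opts.keys  # [:table_id, :query]
```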

#created_at ⇒ `Object`

The time when this dataset was created.

# File 'lib/google/cloud/bigquery/dataset.rb', line 180

def created_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end
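BigQuery reports a dataset's creation time (and its last-modified time, used by modified_at) as a count of milliseconds since the Unix epoch; created_at divides by 1000.0 to get seconds and returns nil when the value cannot be converted. A standalone sketch of that conversion, where ms_to_time and the sample value are illustrative. Unlike the bare rescue in created_at, it rescues only the two errors Integer() raises for bad input:

```ruby
# Convert an epoch-milliseconds value to a Time, or nil if it can't be parsed.
def ms_to_time(ms)
  Time.at(Integer(ms) / 1000.0)
rescue ArgumentError, TypeError
  # Integer("abc") raises ArgumentError; Integer(nil) raises TypeError.
  nil
end

puts ms_to_time("1466000000000").utc  # 2016-06-15 14:13:20 UTC
p ms_to_time(nil)                     # nil
```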

#dataset_id ⇒ `Object`

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

# File 'lib/google/cloud/bigquery/dataset.rb', line 68

def dataset_id
  @gapi.dataset_reference.dataset_id
end

#default_expiration ⇒ `Object`

The default lifetime of all tables in the dataset, in milliseconds.

# File 'lib/google/cloud/bigquery/dataset.rb', line 155

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ `Object`

Updates the default lifetime of all tables in the dataset, in milliseconds.

# File 'lib/google/cloud/bigquery/dataset.rb', line 170

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end
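Because default_expiration is expressed in milliseconds, a lifetime given in days must be converted before it is assigned. A minimal sketch of the arithmetic; the helper name, the 30-day value, and the commented-out assignment (which assumes a dataset retrieved as in the earlier examples) are illustrative:

```ruby
MS_PER_DAY = 24 * 60 * 60 * 1000 # 86,400,000 milliseconds in one day

def days_to_ms(days)
  days * MS_PER_DAY
end

thirty_days_ms = days_to_ms(30)
puts thirty_days_ms # 2592000000

# With a real dataset object (requires credentials and network access):
# dataset.default_expiration = thirty_days_ms
```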

#delete(force: nil) ⇒ `Boolean`

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete

Parameters:

force (Boolean) —
If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.

Returns:

(Boolean) —
Returns true if the dataset was deleted.

# File 'lib/google/cloud/bigquery/dataset.rb', line 292

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#description ⇒ `Object`

A user-friendly description of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 135

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ `Object`

Updates the user-friendly description of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ `Object`

A string hash of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 115

def etag
  ensure_full_data!
  @gapi.etag
end

#location ⇒ `Object`

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

# File 'lib/google/cloud/bigquery/dataset.rb', line 209

def location
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ `Object`

The time when this dataset or any of its tables was last modified.

# File 'lib/google/cloud/bigquery/dataset.rb', line 194

def modified_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#name ⇒ `Object`

A descriptive name for the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 96

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ `Object`

Updates the descriptive name for the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 105

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ `Object`

The ID of the project containing this dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 77

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ `Google::Cloud::Bigquery::QueryData`

Queries data using the synchronous method.

Sets the current dataset as the default dataset in the query, so tables in this dataset can be referenced by unqualified name.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

Parameters:

query (String) —
The query to execute, written in BigQuery query syntax. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
max (Integer) —
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
timeout (Integer) —
How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds).
dryrun (Boolean) —
If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error is returned. The default value is false.
cache (Boolean) —
Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

Returns:

(Google::Cloud::Bigquery::QueryData)

# File 'lib/google/cloud/bigquery/dataset.rb', line 654

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ `Google::Cloud::Bigquery::QueryJob`

Queries data using the asynchronous method.

Sets the current dataset as the default dataset in the query, so tables in this dataset can be referenced by unqualified name.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
unless job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

Parameters:

query (String) —
The query to execute, written in BigQuery query syntax. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
priority (String) —
Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.
cache (Boolean) —
Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
table (Table) —
The destination table where the query results should be stored. If not present, a new table will be created to store the results.
create (String) —
Specifies whether the job is allowed to create new tables.

The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write (String) —
Specifies the action that occurs if the destination table already exists.

The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
large_results (Boolean) —
If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires the table parameter to be set.
flatten (Boolean) —
Flattens all nested and repeated fields in the query results. The default value is true. If this is set to false, large_results must be true.
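The create and write values above correspond to the createDisposition and writeDisposition fields of a BigQuery query job configuration (CREATE_IF_NEEDED, CREATE_NEVER, WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY). A sketch of that correspondence; the lookup tables and the disposition helper are hypothetical and are not necessarily how this library performs the conversion internally:

```ruby
# Hypothetical lookup tables from the documented option values to REST API enum values.
CREATE_DISPOSITIONS = {
  "needed" => "CREATE_IF_NEEDED",
  "never"  => "CREATE_NEVER"
}.freeze

WRITE_DISPOSITIONS = {
  "truncate" => "WRITE_TRUNCATE",
  "append"   => "WRITE_APPEND",
  "empty"    => "WRITE_EMPTY"
}.freeze

# Accepts a String or Symbol; returns nil when the option was not given,
# and raises ArgumentError for an unsupported value.
def disposition(mapping, value)
  return nil if value.nil?

  mapping.fetch(value.to_s.downcase) do
    raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

puts disposition(CREATE_DISPOSITIONS, :needed) # CREATE_IF_NEEDED
puts disposition(WRITE_DISPOSITIONS, "append") # WRITE_APPEND
```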

Returns:

(Google::Cloud::Bigquery::QueryJob)

# File 'lib/google/cloud/bigquery/dataset.rb', line 595

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ `Google::Cloud::Bigquery::Table`, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

Parameters:

table_id (String) —
The ID of a table.

Returns:

(Google::Cloud::Bigquery::Table, Google::Cloud::Bigquery::View, nil) —
Returns nil if the table does not exist

# File 'lib/google/cloud/bigquery/dataset.rb', line 480

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ `Array<Google::Cloud::Bigquery::Table>`, `Array<Google::Cloud::Bigquery::View>`

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.all do |table|
  puts table.name
end

Parameters:

token (String) —
A previously-returned page token representing part of the larger set of results to view.
max (Integer) —
Maximum number of tables to return.

Returns:

(Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>) —
(See Table::List)

# File 'lib/google/cloud/bigquery/dataset.rb', line 523

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end
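The token and max parameters implement page-token pagination: each response carries a token for the next page, and passing that token back retrieves the following page until no token is returned. Table::List#all follows the tokens for you. A self-contained sketch of the same loop, in which fetch_page and the in-memory table IDs are hypothetical stand-ins for the API call:

```ruby
# Hypothetical stand-in for a list call: returns up to max items plus a next-page token.
ALL_TABLE_IDS = %w[t1 t2 t3 t4 t5].freeze

def fetch_page(token: nil, max: 2)
  start = token ? Integer(token) : 0
  items = ALL_TABLE_IDS[start, max] || []
  next_start = start + items.size
  next_token = next_start < ALL_TABLE_IDS.size ? next_start.to_s : nil
  [items, next_token]
end

# Follow page tokens until the (fake) service stops returning one.
def each_table_id
  token = nil
  loop do
    items, token = fetch_page(token: token, max: 2)
    items.each { |id| yield id }
    break if token.nil?
  end
end

each_table_id { |id| puts id }
```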
