Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
  • Object
show all
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Attributes collapse

Lifecycle collapse

Table collapse

Data collapse

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

access = dataset.access
access.writer_user? "reader@example.com" #=> false

Manage the access rules by passing a block:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:

See Also:



384
385
386
387
388
389
390
391
392
393
394
395
396
# File 'lib/google/cloud/bigquery/dataset.rb', line 384

def access
  ensure_full_data!
  reload! unless resource_full?
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end

#api_urlString?

A URL that can be used to access the dataset using the REST API.

Returns:

  • (String, nil)

    A REST URL for the resource, or nil if the object is a reference (see #reference?).



154
155
156
157
158
# File 'lib/google/cloud/bigquery/dataset.rb', line 154

def api_url
  return nil if reference?
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the Rest API , you can pass the table's schema as a hash (see example.)

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

Or the table's schema can be configured with the block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table",
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Parameters:

  • table_id (String)

    The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Table)

    the table object to be updated

Returns:



492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
# File 'lib/google/cloud/bigquery/dataset.rb', line 492

def create_table table_id, name: nil, description: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::Table

Creates a new view table, which is a virtual table defined by the given SQL query.

BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See Table#view? and Table#query.)

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
          "SELECT name, age FROM proj.dataset.users"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

view = dataset.create_view "my_view",
          "SELECT name, age FROM proj.dataset.users",
          name: "My View", description: "This is my view"

Parameters:

  • table_id (String)

    The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • query (String)

    The query that BigQuery executes when the view is referenced.

  • name (String)

    A descriptive name for the table.

  • description (String)

    A user-friendly description of the table.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.

  • udfs (Array<String>, String)

    User-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

Returns:



565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
# File 'lib/google/cloud/bigquery/dataset.rb', line 565

def create_view table_id, query, name: nil, description: nil,
                standard_sql: nil, legacy_sql: nil, udfs: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query,
      use_legacy_sql: Convert.resolve_legacy_sql(standard_sql,
                                                 legacy_sql),
      user_defined_function_resources: udfs_gapi(udfs)
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end

#created_atTime?

The time when this dataset was created.

Returns:

  • (Time, nil)

    The creation time, or nil if not present or the object is a reference (see #reference?).



237
238
239
240
241
242
243
244
245
# File 'lib/google/cloud/bigquery/dataset.rb', line 237

def created_at
  return nil if reference?
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#dataset_idString

A unique ID for this dataset, without the project name.

Returns:

  • (String)

    The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



74
75
76
77
# File 'lib/google/cloud/bigquery/dataset.rb', line 74

def dataset_id
  return reference.dataset_id if reference?
  @gapi.dataset_reference.dataset_id
end

#default_expirationInteger?

The default lifetime of all tables in the dataset, in milliseconds.

Returns:

  • (Integer, nil)

    The default table expiration in milliseconds, or nil if not present or the object is a reference (see #reference?).



200
201
202
203
204
205
206
207
208
# File 'lib/google/cloud/bigquery/dataset.rb', line 200

def default_expiration
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_default_expiration (Integer)

    The new default table expiration in milliseconds.



223
224
225
226
227
# File 'lib/google/cloud/bigquery/dataset.rb', line 223

def default_expiration= new_default_expiration
  reload! unless resource_full?
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.delete

Parameters:

  • force (Boolean)

    If true, delete all the tables in the dataset. If false and the dataset contains tables, the request will fail. Default is false.

Returns:

  • (Boolean)

    Returns true if the dataset was deleted.



418
419
420
421
422
# File 'lib/google/cloud/bigquery/dataset.rb', line 418

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#descriptionString?

A user-friendly description of the dataset.

Returns:

  • (String, nil)

    The description, or nil if the object is a reference (see #reference?).



168
169
170
171
172
# File 'lib/google/cloud/bigquery/dataset.rb', line 168

def description
  return nil if reference?
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_description (String)

    The new description for the dataset.



185
186
187
188
189
# File 'lib/google/cloud/bigquery/dataset.rb', line 185

def description= new_description
  reload! unless resource_full?
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etagString?

The ETag hash of the dataset.

Returns:

  • (String, nil)

    The ETag hash, or nil if the object is a reference (see #reference?).



140
141
142
143
144
# File 'lib/google/cloud/bigquery/dataset.rb', line 140

def etag
  return nil if reference?
  ensure_full_data!
  @gapi.etag
end

#exists?Boolean

Determines whether the dataset exists in the BigQuery service. The result is cached locally.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.exists? # true

Returns:

  • (Boolean)

    true when the dataset exists in the BigQuery service, false otherwise.



1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
# File 'lib/google/cloud/bigquery/dataset.rb', line 1621

def exists?
  # Always true if we have a gapi object
  return true unless reference?
  # If we have a value, return it
  return @exists unless @exists.nil?
  ensure_gapi!
  @exists = true
rescue Google::Cloud::NotFoundError
  @exists = false
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

data.each do |row|
  puts row[:name]
end

Parameters:

  • url (String, Array<String>)

    The fully-qualified URL(s) that point to your data in Google Cloud. An attempt will be made to derive the format from the URLs provided.

  • format (String|Symbol)

    The data format. This value will be used even if the provided URLs are recognized as a different format. Optional.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • sheets - Google Sheets
    • datastore_backup - Cloud Datastore backup
    • bigtable - Bigtable

Yields:

  • (ext)

Returns:

See Also:



1130
1131
1132
1133
1134
# File 'lib/google/cloud/bigquery/dataset.rb', line 1130

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#insert(table_id, rows, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Avoid retrieving the dataset with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Using autocreate to create a new table if none exists.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows, autocreate: true do |t|
  t.schema.string "first_name", mode: :required
  t.schema.integer "age", mode: :required
end

Parameters:

  • table_id (String)

    The ID of the destination table.

  • rows (Hash, Array<Hash>)

    A hash object or array of hash objects containing the data. Required.

  • skip_invalid (Boolean)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

  • autocreate (Boolean)

    Specifies whether the method should create a new table with the given table_id, if no table is found for table_id. The default value is false.

Returns:

See Also:



1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
# File 'lib/google/cloud/bigquery/dataset.rb', line 1812

def insert table_id, rows, skip_invalid: nil, ignore_unknown: nil,
           autocreate: nil
  if autocreate
    begin
      insert_data table_id, rows, skip_invalid: skip_invalid,
                                  ignore_unknown: ignore_unknown
    rescue Google::Cloud::NotFoundError
      sleep rand(1..60)
      begin
        create_table table_id do |tbl_updater|
          yield tbl_updater if block_given?
        end
      # rubocop:disable Lint/HandleExceptions
      rescue Google::Cloud::AlreadyExistsError
      end
      # rubocop:enable Lint/HandleExceptions

      sleep 60
      insert table_id, rows, skip_invalid: skip_invalid,
                             ignore_unknown: ignore_unknown,
                             autocreate: true
    end
  else
    insert_data table_id, rows, skip_invalid: skip_invalid,
                                ignore_unknown: ignore_unknown
  end
end

#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ Table::AsyncInserter

Create an asynchronous inserter object used to insert rows in batches.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
inserter = table.insert_async do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Parameters:

  • table_id (String)

    The ID of the table to insert rows into.

  • skip_invalid (Boolean)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

  • max_rows (Integer)

    The maximum number of rows to be collected before the batch is published. Default is 500.

Yields:

  • (response)

    the callback for when a batch of rows is inserted

Yield Parameters:

Returns:



1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
# File 'lib/google/cloud/bigquery/dataset.rb', line 1888

def insert_async table_id, skip_invalid: nil, ignore_unknown: nil,
                 max_bytes: 10000000, max_rows: 500, interval: 10,
                 threads: 4, &block
  ensure_service!

  # Get table, don't use Dataset#table which handles NotFoundError
  gapi = service.get_table dataset_id, table_id
  table = Table.from_gapi gapi, service
  # Get the AsyncInserter from the table
  table.insert_async skip_invalid: skip_invalid,
                     ignore_unknown: ignore_unknown,
                     max_bytes: max_bytes, max_rows: max_rows,
                     interval: interval, threads: threads, &block
end

#labelsHash<String, String>?

A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

labels = dataset.labels
labels["department"] #=> "shipping"

Returns:

  • (Hash<String, String>, nil)

    A hash containing key/value pairs, or nil if the object is a reference (see #reference?).



302
303
304
305
306
307
# File 'lib/google/cloud/bigquery/dataset.rb', line 302

def labels
  return nil if reference?
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end

#labels=(labels) ⇒ Object

Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.labels = { "department" => "shipping" }

Parameters:

  • labels (Hash<String, String>)

    A hash containing key/value pairs.

    • Label keys and values can be no longer than 63 characters.
    • Label keys and values can contain only lowercase letters, numbers, underscores, hyphens, and international characters.
    • Label keys and values cannot exceed 128 bytes in size.
    • Label keys must begin with a letter.
    • Label keys must be unique within a dataset.


338
339
340
341
342
# File 'lib/google/cloud/bigquery/dataset.rb', line 338

def labels= labels
  reload! unless resource_full?
  @gapi.labels = labels
  patch_gapi! :labels
end

#load(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Boolean

Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
dataset.load "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
dataset.load "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

dataset.load "my_new_table",
             "gs://my-bucket/xxxx.kind_name.backup_info",
             format: "datastore_backup"

Parameters:

  • table_id (String)

    The destination table to load the data into.

  • file (File, Google::Cloud::Storage::File, String, URI)

    A file or the URI of a Google Cloud Storage file containing data to load into the table.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String)

    Specifices the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

  • schema (Google::Cloud::Bigquery::Schema)

    The schema for the destination table. Optional. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

    See Project#schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.

Yields:

  • (schema)

    A block for setting the schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the load job was successful.



1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
# File 'lib/google/cloud/bigquery/dataset.rb', line 1550

def load table_id, file, format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
         encoding: nil, delimiter: nil, ignore_unknown: nil,
         max_bad_records: nil, quote: nil, skip_leading: nil,
         schema: nil, autodetect: nil, null_marker: nil

  yield (schema ||= Schema.from_gapi) if block_given?

  options = { format: format, create: create, write: write,
              projection_fields: projection_fields,
              jagged_rows: jagged_rows,
              quoted_newlines: quoted_newlines, encoding: encoding,
              delimiter: delimiter, ignore_unknown: ignore_unknown,
              max_bad_records: max_bad_records, quote: quote,
              skip_leading: skip_leading, schema: schema,
              autodetect: autodetect, null_marker: null_marker }
  job = load_job table_id, file, options

  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end

#load_job(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling #Job#wait_until_done!. See also #load.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

file = File.open "my_data.csv"
load_job = dataset.load_job "my_new_table", file do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Schema is not required with a Cloud Datastore backup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

load_job = dataset.load_job "my_new_table",
                        "gs://my-bucket/xxxx.kind_name.backup_info",
                        format: "datastore_backup"

Parameters:

  • table_id (String)

    The destination table to load the data into.

  • file (File, Google::Cloud::Storage::File, String, URI)

    A file or the URI of a Google Cloud Storage file containing data to load into the table.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String)

    Specifices the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

  • schema (Google::Cloud::Bigquery::Schema)

    The schema for the destination table. Optional. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

    See Project#schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.

  • job_id (String)

    A user-defined ID for the load job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Yields:

  • (schema)

    A block for setting the schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

Returns:



1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
# File 'lib/google/cloud/bigquery/dataset.rb', line 1340

def load_job table_id, file, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil,
             quoted_newlines: nil, encoding: nil, delimiter: nil,
             ignore_unknown: nil, max_bad_records: nil, quote: nil,
             skip_leading: nil, dryrun: nil, schema: nil, job_id: nil,
             prefix: nil, labels: nil, autodetect: nil, null_marker: nil
  ensure_service!

  if block_given?
    schema ||= Schema.from_gapi
    yield schema
  end
  schema_gapi = schema.to_gapi if schema

  options = { format: format, create: create, write: write,
              projection_fields: projection_fields,
              jagged_rows: jagged_rows,
              quoted_newlines: quoted_newlines, encoding: encoding,
              delimiter: delimiter, ignore_unknown: ignore_unknown,
              max_bad_records: max_bad_records, quote: quote,
              skip_leading: skip_leading, dryrun: dryrun,
              schema: schema_gapi, job_id: job_id, prefix: prefix,
              labels: labels, autodetect: autodetect,
              null_marker: null_marker }
  return load_storage(table_id, file, options) if storage_url? file
  return load_local(table_id, file, options) if local_file? file
  fail Google::Cloud::Error, "Don't know how to load #{file}"
end

#locationString?

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

Returns:

  • (String, nil)

    The location code, or nil if the object is a reference (see #reference?).



274
275
276
277
278
# File 'lib/google/cloud/bigquery/dataset.rb', line 274

def location
  return nil if reference?
  ensure_full_data!
  @gapi.location
end

#modified_atTime?

The date when this dataset or any of its tables was last modified.

Returns:

  • (Time, nil)

    The last modified time, or nil if not present or the object is a reference (see #reference?).



255
256
257
258
259
260
261
262
263
# File 'lib/google/cloud/bigquery/dataset.rb', line 255

def modified_at
  return nil if reference?
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#nameString?

A descriptive name for the dataset.

Returns:

  • (String, nil)

    The friendly name, or nil if the object is a reference (see #reference?).



109
110
111
112
# File 'lib/google/cloud/bigquery/dataset.rb', line 109

def name
  return nil if reference?
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.

If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

  • new_name (String)

    The new friendly name, or nil if the object is a reference (see #reference?).



126
127
128
129
130
# File 'lib/google/cloud/bigquery/dataset.rb', line 126

def name= new_name
  reload! unless resource_full?
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_idString

The ID of the project containing this dataset.

Returns:

  • (String)

    The project ID.



86
87
88
89
# File 'lib/google/cloud/bigquery/dataset.rb', line 86

def project_id
  return reference.project_id if reference?
  @gapi.dataset_reference.project_id
end

#query(query, params: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"

data.each do |row|
  puts row[:name]
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table",
                     legacy_sql: true

data.each do |row|
  puts row[:name]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = ?",
                     params: [1]

data.each do |row|
  puts row[:name]
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table WHERE id = @id",
                     params: { id: 1 }

data.each do |row|
  puts row[:name]
end

Query using external data source:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = dataset.query "SELECT * FROM my_ext_table",
                     external: { my_ext_table: csv_table }

data.each do |row|
  puts row[:name]
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • external (Hash<String|Symbol, External::DataSource>)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • max (Integer)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. When set to true, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL When set to false, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is false.

Returns:

See Also:



1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
# File 'lib/google/cloud/bigquery/dataset.rb', line 1061

def query query, params: nil, external: nil, max: nil, cache: true,
          standard_sql: nil, legacy_sql: nil
  ensure_service!
  options = { params: params, external: external, cache: cache,
              legacy_sql: legacy_sql, standard_sql: standard_sql }

  job = query_job query, options
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  job.data max: max
end

#query_job(query, params: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

BigQuery Ruby Notes
BOOL true/false
INT64 Integer
FLOAT64 Float
STRING STRING
DATETIME DateTime DATETIME does not support time zone.
DATE Date
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

See Data Types for an overview of each BigQuery data type, including allowed values.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table",
                        legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = ?",
                        params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table WHERE id = @id",
                        params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using external data source:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_url = "gs://bucket/path/to/data.csv"
csv_table = dataset.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = dataset.query_job "SELECT * FROM my_ext_table",
                        external: { my_ext_table: csv_table }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If value passed is an array ["foo"], the query must use positional query parameters. If value passed is a hash { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

  • external (Hash<String|Symbol, External::DataSource>)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • priority (String)

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies the action that occurs if the destination table already exists. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.

  • large_results (Boolean)

    This option is specific to Legacy SQL. If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean)

    This option is specific to Legacy SQL. Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

  • maximum_billing_tier (Integer)

    Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default. For more information, see High-Compute queries.

  • maximum_bytes_billed (Integer)

    Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.

  • job_id (String)

    A user-defined ID for the query job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

  • udfs (Array<String>, String)

    User-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

Returns:



890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
# File 'lib/google/cloud/bigquery/dataset.rb', line 890

def query_job query, params: nil, external: nil,
              priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, standard_sql: nil,
              legacy_sql: nil, large_results: nil, flatten: nil,
              maximum_billing_tier: nil, maximum_bytes_billed: nil,
              job_id: nil, prefix: nil, labels: nil, udfs: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten,
              legacy_sql: legacy_sql, standard_sql: standard_sql,
              maximum_billing_tier: maximum_billing_tier,
              maximum_bytes_billed: maximum_bytes_billed,
              params: params, external: external, labels: labels,
              job_id: job_id, prefix: prefix, udfs: udfs }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#reference?Boolean

Whether the dataset was created without retrieving the resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.reference? # true
dataset.reload!
dataset.reference? # false

Returns:

  • (Boolean)

    true when the dataset is just a local reference object, false otherwise.



1650
1651
1652
# File 'lib/google/cloud/bigquery/dataset.rb', line 1650

def reference?
  @gapi.nil?
end

#reload!Google::Cloud::Bigquery::Dataset Also known as: refresh!

Reloads the dataset with current data from the BigQuery service.

Examples:

Skip retrieving the dataset from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true
dataset.reload!

Returns:



1597
1598
1599
1600
1601
1602
1603
# File 'lib/google/cloud/bigquery/dataset.rb', line 1597

def reload!
  ensure_service!
  reloaded_gapi = service.get_dataset dataset_id
  @reference = nil
  @gapi = reloaded_gapi
  self
end

#resource?Boolean

Whether the dataset was created with a resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

dataset.resource? # false
dataset.reload!
dataset.resource? # true

Returns:

  • (Boolean)

    true when the dataset was created with a resource representation, false otherwise.



1672
1673
1674
# File 'lib/google/cloud/bigquery/dataset.rb', line 1672

def resource?
  !@gapi.nil?
end

#resource_full?Boolean

Whether the dataset was created with a full resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

dataset.resource_full? # true

Returns:

  • (Boolean)

    true when the dataset was created with a full resource representation, false otherwise.



1719
1720
1721
# File 'lib/google/cloud/bigquery/dataset.rb', line 1719

def resource_full?
  @gapi.is_a? Google::Apis::BigqueryV2::Dataset
end

#resource_partial?Boolean

Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets. See Datasets: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.datasets.first

dataset.resource_partial? # true
dataset.description # Loads the full resource.
dataset.resource_partial? # false

Returns:

  • (Boolean)

    true when the dataset was created with a partial resource representation, false otherwise.



1699
1700
1701
# File 'lib/google/cloud/bigquery/dataset.rb', line 1699

def resource_partial?
  @gapi.is_a? Google::Apis::BigqueryV2::DatasetList::Dataset
end

#table(table_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Table?

Retrieves an existing table by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table"
puts table.name

Avoid retrieving the table resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"

table = dataset.table "my_table", skip_lookup: true

Parameters:

  • table_id (String)

    The ID of a table.

  • skip_lookup (Boolean)

    Optionally create just a local reference object without verifying that the resource exists on the BigQuery service. Calls made on this object will raise errors if the resource does not exist. Default is false. Optional.

Returns:



618
619
620
621
622
623
624
625
626
627
# File 'lib/google/cloud/bigquery/dataset.rb', line 618

def table table_id, skip_lookup: nil
  ensure_service!
  if skip_lookup
    return Table.new_reference project_id, dataset_id, table_id, service
  end
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

tables = dataset.tables
tables.all do |table|
  puts table.name
end

Parameters:

  • token (String)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer)

    Maximum number of tables to return.

Returns:



663
664
665
666
667
668
# File 'lib/google/cloud/bigquery/dataset.rb', line 663

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end