Class: Google::Cloud::Bigquery::Table

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/table.rb,
lib/google/cloud/bigquery/table/list.rb

Overview

Table

A named resource representing a BigQuery table that holds zero or more records. Every table is defined by a schema that may contain nested and repeated fields.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

row = {
  "first_name" => "Alice",
  "cities_lived" => [
    {
      "place" => "Seattle",
      "number_of_years" => 5
    },
    {
      "place" => "Stockholm",
      "number_of_years" => 6
    }
  ]
}
table.insert row


Direct Known Subclasses

Updater

Defined Under Namespace

Classes: List, Updater


Instance Method Details

#api_url ⇒ Object

A URL that can be used to access the table using the REST API.



# File 'lib/google/cloud/bigquery/table.rb', line 292

def api_url
  ensure_full_data!
  @gapi.self_link
end

#bytes_count ⇒ Object

The number of bytes in the table.



# File 'lib/google/cloud/bigquery/table.rb', line 322

def bytes_count
  ensure_full_data!
  begin
    Integer @gapi.num_bytes
  rescue
    nil
  end
end

#copy(destination_table, create: nil, write: nil, dryrun: nil) ⇒ Google::Cloud::Bigquery::CopyJob

Copies the data from the table to another table. The destination table argument can also be a string identifier as specified by the Query Reference: project_name:datasetId.tableId. This is useful for referencing tables in other projects and datasets.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
destination_table = dataset.table "my_destination_table"

copy_job = table.copy destination_table

Passing a string identifier for the destination table:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

copy_job = table.copy "other-project:other_dataset.other_table"

Parameters:

  • destination_table (Table, String)

    The destination for the copied data.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.

Returns:

  • (Google::Cloud::Bigquery::CopyJob)



# File 'lib/google/cloud/bigquery/table.rb', line 579

def copy destination_table, create: nil, write: nil, dryrun: nil
  ensure_service!
  options = { create: create, write: write, dryrun: dryrun }
  gapi = service.copy_table table_ref,
                            get_table_ref(destination_table),
                            options
  Job.from_gapi gapi, service
end

#created_at ⇒ Object

The time when this table was created.



# File 'lib/google/cloud/bigquery/table.rb', line 350

def created_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#data(token: nil, max: nil, start: nil) ⇒ Google::Cloud::Bigquery::Data

Retrieves data from the table.

Examples:

Paginate rows of data: (See Data#next)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.each do |row|
  puts row[:first_name]
end
if data.next?
  more_data = data.next
end

Retrieve all rows of data: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.all do |row|
  puts row[:first_name]
end
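
Reading a slice of rows with max and start (an illustrative sketch using the parameters documented below):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

# Fetch up to 100 rows, starting at row 200.
data = table.data max: 100, start: 200
data.each do |row|
  puts row[:first_name]
end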

Parameters:

  • token (String)

    Page token, returned by a previous call, identifying the result set.

  • max (Integer)

    Maximum number of results to return.

  • start (Integer)

    Zero-based index of the starting row to read.

Returns:

  • (Google::Cloud::Bigquery::Data)



# File 'lib/google/cloud/bigquery/table.rb', line 522

def data token: nil, max: nil, start: nil
  ensure_service!
  options = { token: token, max: max, start: start }
  gapi = service.list_tabledata dataset_id, table_id, options
  Data.from_gapi gapi, self
end

#dataset_id ⇒ Object

The ID of the Dataset containing this table.



# File 'lib/google/cloud/bigquery/table.rb', line 99

def dataset_id
  @gapi.table_reference.dataset_id
end

#delete ⇒ Boolean

Permanently deletes the table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.delete

Returns:

  • (Boolean)

    Returns true if the table was deleted.



# File 'lib/google/cloud/bigquery/table.rb', line 838

def delete
  ensure_service!
  service.delete_table dataset_id, table_id
  true
end

#description ⇒ Object

The description of the table.



# File 'lib/google/cloud/bigquery/table.rb', line 302

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the description of the table.
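
Examples:

A minimal usage sketch (dataset and table names are placeholders):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.description = "My table of data"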



# File 'lib/google/cloud/bigquery/table.rb', line 312

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ Object

A string hash of the table.



# File 'lib/google/cloud/bigquery/table.rb', line 282

def etag
  ensure_full_data!
  @gapi.etag
end

#expires_at ⇒ Object

The time when this table expires. If not present, the table will persist indefinitely. Expired tables will be deleted and their storage reclaimed.



# File 'lib/google/cloud/bigquery/table.rb', line 366

def expires_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.expiration_time) / 1000.0)
  rescue
    nil
  end
end

#extract(extract_url, format: nil, compression: nil, delimiter: nil, header: nil, dryrun: nil) ⇒ Google::Cloud::Bigquery::ExtractJob

Extracts the data from the table to a Google Cloud Storage file.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

extract_job = table.extract "gs://my-bucket/file-name.json",
                            format: "json"

Parameters:

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro

  • compression (String)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.

  • delimiter (String)

    Delimiter to use between fields in the exported data. Default is ,.

  • header (Boolean)

    Whether to print out a header row in the results. Default is true.

Returns:

  • (Google::Cloud::Bigquery::ExtractJob)




# File 'lib/google/cloud/bigquery/table.rb', line 628

def extract extract_url, format: nil, compression: nil, delimiter: nil,
            header: nil, dryrun: nil
  ensure_service!
  options = { format: format, compression: compression,
              delimiter: delimiter, header: header, dryrun: dryrun }
  gapi = service.extract_table table_ref, extract_url, options
  Job.from_gapi gapi, service
end

#fields ⇒ Object

The fields of the table.
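
Examples:

Listing the column names and types (a sketch assuming each field responds to name and type):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.fields.each do |field|
  puts "#{field.name} (#{field.type})"
end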



# File 'lib/google/cloud/bigquery/table.rb', line 469

def fields
  schema.fields
end

#headers ⇒ Object

The names of the columns in the table.



# File 'lib/google/cloud/bigquery/table.rb', line 478

def headers
  schema.headers
end

#id ⇒ Object

The combined Project ID, Dataset ID, and Table ID for this table, in the format specified by the Legacy SQL Query Reference: project_name:datasetId.tableId. To use this value in queries see #query_id.



# File 'lib/google/cloud/bigquery/table.rb', line 220

def id
  @gapi.id
end

#insert(rows, skip_invalid: nil, ignore_unknown: nil) ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the table for near-immediate querying, without the need to complete a #load operation before the data can appear in query results.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
table.insert rows
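
Inspecting the response for per-row errors (a sketch; success? and insert_errors are assumed InsertResponse accessors):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
response = table.insert rows, skip_invalid: true, ignore_unknown: true
if response.success?
  puts "All rows inserted"
else
  # insert_errors is assumed to expose the per-row error details.
  response.insert_errors.each do |insert_error|
    puts insert_error.errors.inspect
  end
end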

Parameters:

  • rows (Hash, Array<Hash>)

    A hash object or array of hash objects containing the data.

  • skip_invalid (Boolean)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

Returns:

  • (Google::Cloud::Bigquery::InsertResponse)




# File 'lib/google/cloud/bigquery/table.rb', line 812

def insert rows, skip_invalid: nil, ignore_unknown: nil
  rows = [rows] if rows.is_a? Hash
  rows = Convert.to_json_rows rows
  ensure_service!
  options = { skip_invalid: skip_invalid,
              ignore_unknown: ignore_unknown }
  gapi = service.insert_tabledata dataset_id, table_id, rows, options
  InsertResponse.from_gapi rows, gapi
end

#load(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil) ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the table. You can pass a Google Cloud Storage URI or a Google Cloud Storage File instance, or you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

load_job = table.load "gs://my-bucket/file-name.csv"

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = table.load file

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

file = File.open "my_data.csv"
load_job = table.load file
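
Loading a CSV file with parsing options (an illustrative sketch using the parameters documented below):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

# Skip the header row and tolerate up to 10 bad records.
load_job = table.load "gs://my-bucket/file-name.csv",
                      skip_leading: 1,
                      quoted_newlines: true,
                      max_bad_records: 10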

Parameters:

  • file (File, Google::Cloud::Storage::File, String)

    A file or the URI of a Google Cloud Storage file containing data to load into the table.

  • format (String)

    The format of the data to load. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • datastore_backup - Cloud Datastore backup

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • encoding (String)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String)

    Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • quote (String)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

Returns:

  • (Google::Cloud::Bigquery::LoadJob)



# File 'lib/google/cloud/bigquery/table.rb', line 760

def load file, format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
         encoding: nil, delimiter: nil, ignore_unknown: nil,
         max_bad_records: nil, quote: nil, skip_leading: nil,
         dryrun: nil
  ensure_service!
  options = { format: format, create: create, write: write,
              projection_fields: projection_fields,
              jagged_rows: jagged_rows,
              quoted_newlines: quoted_newlines, encoding: encoding,
              delimiter: delimiter, ignore_unknown: ignore_unknown,
              max_bad_records: max_bad_records, quote: quote,
              skip_leading: skip_leading, dryrun: dryrun }
  return load_storage(file, options) if storage_url? file
  return load_local(file, options) if local_file? file
  fail Google::Cloud::Error, "Don't know how to load #{file}"
end

#location ⇒ Object

The geographic location where the table should reside. Possible values include EU and US. The default value is US.



# File 'lib/google/cloud/bigquery/table.rb', line 413

def location
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ Object

The time when this table was last modified.



# File 'lib/google/cloud/bigquery/table.rb', line 380

def modified_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#name ⇒ Object

The name of the table.



# File 'lib/google/cloud/bigquery/table.rb', line 263

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the name of the table.



# File 'lib/google/cloud/bigquery/table.rb', line 272

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ Object

The ID of the Project containing this table.



# File 'lib/google/cloud/bigquery/table.rb', line 108

def project_id
  @gapi.table_reference.project_id
end

#query_id(standard_sql: nil, legacy_sql: nil) ⇒ Object

The value returned by #id, formatted for use in queries: with legacy SQL the identifier is wrapped in square brackets, and with standard SQL (the default) it is quoted with backticks in project.dataset.table form. Useful in queries.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = bigquery.query "SELECT first_name FROM #{table.query_id}"
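
Choosing the SQL dialect explicitly (a minimal sketch using the parameters documented below):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

standard_sql_id = table.query_id                   # `project.dataset.table`
legacy_sql_id   = table.query_id legacy_sql: true  # [project:dataset.table]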

Parameters:

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.



# File 'lib/google/cloud/bigquery/table.rb', line 250

def query_id standard_sql: nil, legacy_sql: nil
  if Convert.resolve_legacy_sql standard_sql, legacy_sql
    "[#{id}]"
  else
    "`#{project_id}.#{dataset_id}.#{table_id}`"
  end
end

#reload! ⇒ Object Also known as: refresh!

Reloads the table with current data from the BigQuery service.
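
Examples:

Refreshing cached metadata (a minimal sketch with placeholder names):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.reload!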



# File 'lib/google/cloud/bigquery/table.rb', line 849

def reload!
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  @gapi = gapi
end

#rows_count ⇒ Object

The number of rows in the table.



# File 'lib/google/cloud/bigquery/table.rb', line 336

def rows_count
  ensure_full_data!
  begin
    Integer @gapi.num_rows
  rescue
    nil
  end
end

#schema(replace: false) {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Returns the table's schema. This method can also be used to set, replace, or add to the schema by passing a block. See Schema for available methods.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

table.schema do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end
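
Replacing the schema with replace: true (an illustrative sketch; schema changes to a table that already contains data must remain additive):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

table.schema replace: true do |schema|
  schema.string "first_name", mode: :required
  schema.integer "age", mode: :required
end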

Parameters:

  • replace (Boolean)

    Whether to replace the existing schema with the new schema. If true, the fields will replace the existing schema. If false, the fields will be added to the existing schema. When a table already contains data, schema changes must be additive. Thus, the default value is false.

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema

Returns:

  • (Google::Cloud::Bigquery::Schema)



# File 'lib/google/cloud/bigquery/table.rb', line 450

def schema replace: false
  ensure_full_data!
  schema_builder = Schema.from_gapi @gapi.schema
  if block_given?
    schema_builder = Schema.from_gapi if replace
    yield schema_builder
    if schema_builder.changed?
      @gapi.schema = schema_builder.to_gapi
      patch_gapi! :schema
    end
  end
  schema_builder.freeze
end

#table? ⇒ Boolean

Checks if the table's type is "TABLE".

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/table.rb', line 394

def table?
  @gapi.type == "TABLE"
end

#table_id ⇒ Object

A unique ID for this table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



# File 'lib/google/cloud/bigquery/table.rb', line 90

def table_id
  @gapi.table_reference.table_id
end

#time_partitioning? ⇒ Boolean

Is the table partitioned?

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/table.rb', line 126

def time_partitioning?
  !@gapi.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ Object

The expiration for the table partitions, if any, in seconds.



# File 'lib/google/cloud/bigquery/table.rb', line 175

def time_partitioning_expiration
  ensure_full_data!
  @gapi.time_partitioning.expiration_ms / 1_000 if
      time_partitioning? &&
      !@gapi.time_partitioning.expiration_ms.nil?
end

#time_partitioning_expiration=(expiration) ⇒ Object

Sets the partition expiration for the table. See Partitioned Tables. The table must also be partitioned.

See #time_partitioning_type=.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
  table.time_partitioning_expiration = 86_400
end

Parameters:

  • expiration (Integer)

    An expiration time, in seconds, for data in partitions.



# File 'lib/google/cloud/bigquery/table.rb', line 204

def time_partitioning_expiration= expiration
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.expiration_ms = expiration * 1000
  patch_gapi! :time_partitioning
end

#time_partitioning_type ⇒ Object

The period for which the table is partitioned, if any.



# File 'lib/google/cloud/bigquery/table.rb', line 135

def time_partitioning_type
  ensure_full_data!
  @gapi.time_partitioning.type if time_partitioning?
end

#time_partitioning_type=(type) ⇒ Object

Sets the partitioning for the table. See Partitioned Tables.

You can only set partitioning when creating a table as in the example below. BigQuery does not allow you to change partitioning on an existing table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
end

Parameters:

  • type (String)

    The partition type. Currently the only supported value is "DAY".



# File 'lib/google/cloud/bigquery/table.rb', line 162

def time_partitioning_type= type
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.type = type
  patch_gapi! :time_partitioning
end

#view? ⇒ Boolean

Checks if the table's type is "VIEW".

Returns:

  • (Boolean)


# File 'lib/google/cloud/bigquery/table.rb', line 403

def view?
  @gapi.type == "VIEW"
end