Class: Google::Cloud::Bigquery::Table

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/table.rb,
lib/google/cloud/bigquery/table/list.rb,
lib/google/cloud/bigquery/table/async_inserter.rb

Overview

Table

A named resource representing a BigQuery table that holds zero or more records. Every table is defined by a schema that may contain nested and repeated fields.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

row = {
  "first_name" => "Alice",
  "cities_lived" => [
    {
      "place" => "Seattle",
      "number_of_years" => 5
    },
    {
      "place" => "Stockholm",
      "number_of_years" => 6
    }
  ]
}
table.insert row


Direct Known Subclasses

Updater

Defined Under Namespace

Classes: AsyncInserter, List, Updater


Instance Method Details

#api_url ⇒ String

A URL that can be used to access the table using the REST API.
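
Examples:

The URL can be printed or passed to other tooling (dataset and table names here are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

puts table.api_url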

Returns:

  • (String)

    A REST URL for the resource.



# File 'lib/google/cloud/bigquery/table.rb', line 325

def api_url
  ensure_full_data!
  @gapi.self_link
end

#buffer_bytes ⇒ Integer

A lower-bound estimate of the number of bytes currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.
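
Examples:

A minimal sketch reading the streaming-buffer statistics together, guarding against the case where no buffer is present (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

if table.buffer_rows
  puts "#{table.buffer_rows} rows (#{table.buffer_bytes} bytes) in buffer"
  puts "oldest entry: #{table.buffer_oldest_at}"
end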

Returns:

  • (Integer)

    The estimated number of bytes in the buffer.



# File 'lib/google/cloud/bigquery/table.rb', line 686

def buffer_bytes
  ensure_full_data!
  @gapi.streaming_buffer.estimated_bytes if @gapi.streaming_buffer
end

#buffer_oldest_at ⇒ Time?

The time of the oldest entry currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.

Returns:

  • (Time, nil)

    The oldest entry time.



# File 'lib/google/cloud/bigquery/table.rb', line 715

def buffer_oldest_at
  ensure_full_data!
  return nil unless @gapi.streaming_buffer
  oldest_entry_time = @gapi.streaming_buffer.oldest_entry_time
  begin
    ::Time.at(Integer(oldest_entry_time) / 1000.0)
  rescue
    nil
  end
end

#buffer_rows ⇒ Integer

A lower-bound estimate of the number of rows currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.

Returns:

  • (Integer)

    The estimated number of rows in the buffer.



# File 'lib/google/cloud/bigquery/table.rb', line 701

def buffer_rows
  ensure_full_data!
  @gapi.streaming_buffer.estimated_rows if @gapi.streaming_buffer
end

#bytes_count ⇒ Integer

The number of bytes in the table.
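
Examples:

Reporting the table's size alongside its row count (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

puts "#{table.rows_count} rows, #{table.bytes_count} bytes"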

Returns:

  • (Integer)

    The count of bytes in the table.



# File 'lib/google/cloud/bigquery/table.rb', line 361

def bytes_count
  ensure_full_data!
  begin
    Integer @gapi.num_bytes
  rescue
    nil
  end
end

#copy(destination_table, create: nil, write: nil) ⇒ Boolean

Copies the data from the table to another table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #copy_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
destination_table = dataset.table "my_destination_table"

table.copy destination_table

Passing a string identifier for the destination table:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.copy "other-project:other_dataset.other_table"

Parameters:

  • destination_table (Table, String)

    The destination for the copied data. This can also be a string identifier as specified by the Query Reference: project_name:datasetId.tableId. This is useful for referencing tables in other projects and datasets.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.

Returns:

  • (Boolean)

    Returns true if the copy operation succeeded.



# File 'lib/google/cloud/bigquery/table.rb', line 913

def copy destination_table, create: nil, write: nil
  job = copy_job destination_table, create: create, write: write
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end

#copy_job(destination_table, create: nil, write: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ Google::Cloud::Bigquery::CopyJob

Copies the data from the table to another table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #copy.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
destination_table = dataset.table "my_destination_table"

copy_job = table.copy_job destination_table

Passing a string identifier for the destination table:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

copy_job = table.copy_job "other-project:other_dataset.other_table"

Parameters:

  • destination_table (Table, String)

    The destination for the copied data. This can also be a string identifier as specified by the Query Reference: project_name:datasetId.tableId. This is useful for referencing tables in other projects and datasets.

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.
  • job_id (String)

    A user-defined ID for the copy job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

  • (Google::Cloud::Bigquery::CopyJob)

    The job object for the copy operation.


# File 'lib/google/cloud/bigquery/table.rb', line 849

def copy_job destination_table, create: nil, write: nil, dryrun: nil,
             job_id: nil, prefix: nil, labels: nil
  ensure_service!
  options = { create: create, write: write, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels }
  gapi = service.copy_table table_ref,
                            get_table_ref(destination_table),
                            options
  Job.from_gapi gapi, service
end

#created_at ⇒ Time?

The time when this table was created.
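
Examples:

Inspecting the lifecycle timestamps together; #expires_at is nil unless an expiration has been set (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

puts "created:  #{table.created_at}"
puts "modified: #{table.modified_at}"
puts "expires:  #{table.expires_at || 'never'}"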

Returns:

  • (Time, nil)

    The creation time.



# File 'lib/google/cloud/bigquery/table.rb', line 393

def created_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#data(token: nil, max: nil, start: nil) ⇒ Google::Cloud::Bigquery::Data

Retrieves data from the table.

Examples:

Paginate rows of data: (See Data#next)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.each do |row|
  puts row[:first_name]
end
if data.next?
  more_data = data.next
end

Retrieve all rows of data: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.all do |row|
  puts row[:first_name]
end

Parameters:

  • token (String)

    Page token, returned by a previous call, identifying the result set.

  • max (Integer)

    Maximum number of results to return.

  • start (Integer)

    Zero-based index of the starting row to read.

Returns:

  • (Google::Cloud::Bigquery::Data)

    The data object for the retrieved rows.


# File 'lib/google/cloud/bigquery/table.rb', line 766

def data token: nil, max: nil, start: nil
  ensure_service!
  options = { token: token, max: max, start: start }
  data_gapi = service.list_tabledata dataset_id, table_id, options
  Data.from_gapi data_gapi, gapi, service
end

#dataset_id ⇒ String

The ID of the Dataset containing this table.

Returns:

  • (String)

    The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



# File 'lib/google/cloud/bigquery/table.rb', line 105

def dataset_id
  @gapi.table_reference.dataset_id
end

#delete ⇒ Boolean

Permanently deletes the table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.delete

Returns:

  • (Boolean)

    Returns true if the table was deleted.



# File 'lib/google/cloud/bigquery/table.rb', line 1523

def delete
  ensure_service!
  service.delete_table dataset_id, table_id
  true
end

#description ⇒ String

A user-friendly description of the table.

Returns:

  • (String)

    The description.



# File 'lib/google/cloud/bigquery/table.rb', line 337

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the table.
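
Examples:

A minimal sketch (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.description = "Customer orders, loaded nightly"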

Parameters:

  • new_description (String)

    The new user-friendly description.



# File 'lib/google/cloud/bigquery/table.rb', line 349

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ String

The ETag hash of the table.

Returns:

  • (String)

    The ETag hash.



# File 'lib/google/cloud/bigquery/table.rb', line 313

def etag
  ensure_full_data!
  @gapi.etag
end

#expires_at ⇒ Time?

The time when this table expires. If not present, the table will persist indefinitely. Expired tables will be deleted and their storage reclaimed.

Returns:

  • (Time, nil)

    The expiration time.



# File 'lib/google/cloud/bigquery/table.rb', line 411

def expires_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.expiration_time) / 1000.0)
  rescue
    nil
  end
end

#external ⇒ External::DataSource

The External::DataSource (or subclass) object that represents the external data source backing the table. Data can be queried from the table, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Present only if the table represents an External Data Source. See #external? and External::DataSource.

Returns:

  • (External::DataSource)

    The external data source configuration, or nil if the table is not an external data source.



# File 'lib/google/cloud/bigquery/table.rb', line 649

def external
  return nil if @gapi.external_data_configuration.nil?
  External.from_gapi(@gapi.external_data_configuration).freeze
end

#external=(external) ⇒ Object

Set the External::DataSource (or subclass) object that represents the external data source backing the table. Data can be queried from the table, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Use only if the table represents an External Data Source. See #external? and External::DataSource.
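
Examples:

A minimal sketch attaching a CSV source to a table; it assumes the Dataset#external helper for building the configuration, and the bucket, file, and table names are illustrative:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

csv_source = dataset.external "gs://my-bucket/path/to/data.csv" do |csv|
  csv.autodetect = true
end

table = dataset.table "my_external_table"
table.external = csv_source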

Parameters:

  • external (External::DataSource)

    An External::DataSource (or subclass) instance describing the external data source.



# File 'lib/google/cloud/bigquery/table.rb', line 671

def external= external
  @gapi.external_data_configuration = external.to_gapi
  patch_gapi! :external_data_configuration
end

#external? ⇒ Boolean

Checks if the table's type is "EXTERNAL".

Returns:

  • (Boolean)

    true when the type is EXTERNAL, false otherwise.



# File 'lib/google/cloud/bigquery/table.rb', line 466

def external?
  @gapi.type == "EXTERNAL"
end

#extract(extract_url, format: nil, compression: nil, delimiter: nil, header: nil) ⇒ Boolean

Extracts the data from the table to a Google Cloud Storage file using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.extract "gs://my-bucket/file-name.json", format: "json"

Parameters:

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro

  • compression (String)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.

  • delimiter (String)

    Delimiter to use between fields in the exported data. Default is ,.

  • header (Boolean)

    Whether to print out a header row in the results. Default is true.

Returns:

  • (Boolean)

    Returns true if the extract operation succeeded.




# File 'lib/google/cloud/bigquery/table.rb', line 1050

def extract extract_url, format: nil, compression: nil, delimiter: nil,
            header: nil
  job = extract_job extract_url, format: format,
                                 compression: compression,
                                 delimiter: delimiter, header: header
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end

#extract_job(extract_url, format: nil, compression: nil, delimiter: nil, header: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ Google::Cloud::Bigquery::ExtractJob

Extracts the data from the table to a Google Cloud Storage file using an asynchronous method. In this method, an ExtractJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

extract_job = table.extract_job "gs://my-bucket/file-name.json",
                            format: "json"

Parameters:

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.

  • format (String)

    The exported file format. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro

  • compression (String)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.

  • delimiter (String)

    Delimiter to use between fields in the exported data. Default is ,.

  • header (Boolean)

    Whether to print out a header row in the results. Default is true.

  • job_id (String)

    A user-defined ID for the extract job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

  • (Google::Cloud::Bigquery::ExtractJob)

    The job object for the extract operation.



# File 'lib/google/cloud/bigquery/table.rb', line 997

def extract_job extract_url, format: nil, compression: nil,
                delimiter: nil, header: nil, dryrun: nil, job_id: nil,
                prefix: nil, labels: nil
  ensure_service!
  options = { format: format, compression: compression,
              delimiter: delimiter, header: header, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels }
  gapi = service.extract_table table_ref, extract_url, options
  Job.from_gapi gapi, service
end

#fields ⇒ Array<Schema::Field>

The fields of the table, obtained from its schema.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.fields.each do |field|
  puts field.name
end

Returns:

  • (Array<Schema::Field>)

    An array of field objects.


# File 'lib/google/cloud/bigquery/table.rb', line 606

def fields
  schema.fields
end

#headers ⇒ Array<Symbol>

The names of the columns in the table, obtained from its schema.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.headers.each do |header|
  puts header
end

Returns:

  • (Array<Symbol>)

    An array of column names.



# File 'lib/google/cloud/bigquery/table.rb', line 628

def headers
  schema.headers
end

#id ⇒ String

The combined Project ID, Dataset ID, and Table ID for this table, in the format specified by the Legacy SQL Query Reference: project_name:datasetId.tableId. To use this value in queries see #query_id.

Returns:

  • (String)

    The combined ID.



# File 'lib/google/cloud/bigquery/table.rb', line 242

def id
  @gapi.id
end

#insert(rows, skip_invalid: nil, ignore_unknown: nil) ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
table.insert rows

Parameters:

  • rows (Hash, Array<Hash>)

    A hash object or array of hash objects containing the data. Required.

  • skip_invalid (Boolean)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

Returns:

  • (Google::Cloud::Bigquery::InsertResponse)

    The response object for the streaming insert.



# File 'lib/google/cloud/bigquery/table.rb', line 1443

def insert rows, skip_invalid: nil, ignore_unknown: nil
  rows = [rows] if rows.is_a? Hash
  fail ArgumentError, "No rows provided" if rows.empty?
  ensure_service!
  options = { skip_invalid: skip_invalid,
              ignore_unknown: ignore_unknown }
  gapi = service.insert_tabledata dataset_id, table_id, rows, options
  InsertResponse.from_gapi rows, gapi
end

#insert_async(skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ Table::AsyncInserter

Create an asynchronous inserter object used to insert rows in batches.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
inserter = table.insert_async do |response|
  log_insert "inserted #{response.insert_count} rows " \
    "with #{response.error_count} errors"
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Parameters:

  • skip_invalid (Boolean)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

  • max_bytes (Integer)

    The maximum size of rows to be collected before the batch is published. Default is 10,000,000 (10MB).

  • max_rows (Integer)

    The maximum number of rows to be collected before the batch is published. Default is 500.

  • interval (Numeric)

    The number of seconds to collect rows before the batch is published. Default is 10.

  • threads (Integer)

    The number of threads used to insert batches of rows. Default is 4.

Yields:

  • (response)

    the callback for when a batch of rows is inserted

Yield Parameters:

  • response

    the result of the insert batch

Returns:

  • (Table::AsyncInserter)

    The async inserter object.



# File 'lib/google/cloud/bigquery/table.rb', line 1496

def insert_async skip_invalid: nil, ignore_unknown: nil,
                 max_bytes: 10000000, max_rows: 500, interval: 10,
                 threads: 4, &block
  ensure_service!

  AsyncInserter.new self, skip_invalid: skip_invalid,
                          ignore_unknown: ignore_unknown,
                          max_bytes: max_bytes, max_rows: max_rows,
                          interval: interval, threads: threads, &block
end

#labels ⇒ Hash<String, String>

A hash of user-provided labels associated with this table. Labels are used to organize and group tables. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

labels = table.labels
labels["department"] #=> "shipping"

Returns:

  • (Hash<String, String>)

    A hash containing key/value pairs.



# File 'lib/google/cloud/bigquery/table.rb', line 505

def labels
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end

#labels=(labels) ⇒ Object

Updates the hash of user-provided labels associated with this table. Labels are used to organize and group tables. See Using Labels.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.labels = { "department" => "shipping" }

Parameters:

  • labels (Hash<String, String>)

    A hash containing key/value pairs.

    • Label keys and values can be no longer than 63 characters.
    • Label keys and values can contain only lowercase letters, numbers, underscores, hyphens, and international characters.
    • Label keys and values cannot exceed 128 bytes in size.
    • Label keys must begin with a letter.
    • Label keys must be unique within a table.


# File 'lib/google/cloud/bigquery/table.rb', line 537

def labels= labels
  @gapi.labels = labels
  patch_gapi! :labels
end

#load(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, autodetect: nil, null_marker: nil) ⇒ Boolean

Loads data into the table using a synchronous method that blocks until the load job is done. Timeouts and transient errors are generally handled as needed to complete the job. You can pass a google-cloud storage file path or a google-cloud storage file instance. Or, you can upload a file directly. See Loading Data with a POST Request. See also #load_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.load "gs://my-bucket/file-name.csv"

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
table.load file

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

file = File.open "my_data.csv"
table.load file

Parameters:

  • file (File, Google::Cloud::Storage::File, String)

    A file or the URI of a Google Cloud Storage file containing data to load into the table.

  • format (String)

    The format of the data files. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • datastore_backup - Cloud Datastore backup

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String)

    Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

Returns:

  • (Boolean)

    Returns true if the load job completed successfully.


# File 'lib/google/cloud/bigquery/table.rb', line 1379

def load file, format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
         encoding: nil, delimiter: nil, ignore_unknown: nil,
         max_bad_records: nil, quote: nil, skip_leading: nil,
         autodetect: nil, null_marker: nil
  job = load_job file, format: format, create: create, write: write,
                       projection_fields: projection_fields,
                       jagged_rows: jagged_rows,
                       quoted_newlines: quoted_newlines,
                       encoding: encoding, delimiter: delimiter,
                       ignore_unknown: ignore_unknown,
                       max_bad_records: max_bad_records, quote: quote,
                       skip_leading: skip_leading,
                       autodetect: autodetect, null_marker: null_marker

  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end

#load_job(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the table. You can pass a google-cloud storage file path or a google-cloud storage file instance. Or, you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

load_job = table.load_job "gs://my-bucket/file-name.csv"

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = table.load_job file

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

file = File.open "my_data.csv"
load_job = table.load_job file

Parameters:

  • file (File, Google::Cloud::Storage::File, String)

    A file or the URI of a Google Cloud Storage file containing data to load into the table.

  • format (String)

    The format of the data files. The default value is csv.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • datastore_backup - Cloud Datastore backup

  • create (String)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String)

    Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

  • job_id (String)

    A user-defined ID for the load job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

  • (Google::Cloud::Bigquery::LoadJob)

    The job object for the load operation.


# File 'lib/google/cloud/bigquery/table.rb', line 1225

def load_job file, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil,
             quoted_newlines: nil, encoding: nil, delimiter: nil,
             ignore_unknown: nil, max_bad_records: nil, quote: nil,
             skip_leading: nil, dryrun: nil, job_id: nil, prefix: nil,
             labels: nil, autodetect: nil, null_marker: nil
  ensure_service!
  options = { format: format, create: create, write: write,
              projection_fields: projection_fields,
              jagged_rows: jagged_rows,
              quoted_newlines: quoted_newlines, encoding: encoding,
              delimiter: delimiter, ignore_unknown: ignore_unknown,
              max_bad_records: max_bad_records, quote: quote,
              skip_leading: skip_leading, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels,
              autodetect: autodetect, null_marker: null_marker }
  return load_storage(file, options) if storage_url? file
  return load_local(file, options) if local_file? file
  fail Google::Cloud::Error, "Don't know how to load #{file}"
end

#location ⇒ String

The geographic location where the table should reside. Possible values include EU and US. The default value is US.

Returns:

  • (String)

    The location code.



# File 'lib/google/cloud/bigquery/table.rb', line 478

def location
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ Time?

The time when this table was last modified.

Returns:

  • (Time, nil)

    The last modified time.



# File 'lib/google/cloud/bigquery/table.rb', line 427

def modified_at
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#name ⇒ String

The name of the table.

Returns:

  • (String)

    The friendly name.



# File 'lib/google/cloud/bigquery/table.rb', line 290

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the name of the table.
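
Examples:

A minimal sketch (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.name = "My Table"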

Parameters:

  • new_name (String)

    The new friendly name.



# File 'lib/google/cloud/bigquery/table.rb', line 301

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ String

The ID of the Project containing this table.

Returns:

  • (String)

    The project ID.



# File 'lib/google/cloud/bigquery/table.rb', line 116

def project_id
  @gapi.table_reference.project_id
end

#query_id(standard_sql: nil, legacy_sql: nil) ⇒ String

An identifier for the table suitable for use in queries. When the legacy SQL dialect is used, this is the value returned by #id wrapped in square brackets; for standard SQL it is the fully qualified table ID quoted with backticks, as specified by the Query Reference.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = bigquery.query "SELECT first_name FROM #{table.query_id}"

Parameters:

  • standard_sql (Boolean)

    Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean)

    Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.

Returns:

  • (String)

    The appropriate table ID for use in queries, depending on SQL type.



# File 'lib/google/cloud/bigquery/table.rb', line 275

def query_id standard_sql: nil, legacy_sql: nil
  if Convert.resolve_legacy_sql standard_sql, legacy_sql
    "[#{id}]"
  else
    "`#{project_id}.#{dataset_id}.#{table_id}`"
  end
end

#reload! ⇒ Object Also known as: refresh!

Reloads the table with current data from the BigQuery service.
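
Examples:

Refreshing cached metadata after out-of-band changes (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.reload!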



# File 'lib/google/cloud/bigquery/table.rb', line 1534

def reload!
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  @gapi = gapi
end

#rows_count ⇒ Integer

The number of rows in the table.

Returns:

  • (Integer)

    The count of rows in the table.



# File 'lib/google/cloud/bigquery/table.rb', line 377

def rows_count
  ensure_full_data!
  begin
    Integer @gapi.num_rows
  rescue
    nil
  end
end

#schema(replace: false) {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Returns the table's schema. This method can also be used to set, replace, or add to the schema by passing a block. See Schema for available methods.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

table.schema do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Parameters:

  • replace (Boolean)

    Whether to replace the existing schema with the new schema. If true, the fields will replace the existing schema. If false, the fields will be added to the existing schema. When a table already contains data, schema changes must be additive. Thus, the default value is false.

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema

Returns:

  • (Google::Cloud::Bigquery::Schema)

    A frozen schema object.


# File 'lib/google/cloud/bigquery/table.rb', line 574

def schema replace: false
  ensure_full_data!
  schema_builder = Schema.from_gapi @gapi.schema
  if block_given?
    schema_builder = Schema.from_gapi if replace
    yield schema_builder
    if schema_builder.changed?
      @gapi.schema = schema_builder.to_gapi
      patch_gapi! :schema
    end
  end
  schema_builder.freeze
end

#table? ⇒ Boolean

Checks if the table's type is "TABLE".

Returns:

  • (Boolean)

    true when the type is TABLE, false otherwise.



# File 'lib/google/cloud/bigquery/table.rb', line 443

def table?
  @gapi.type == "TABLE"
end

#table_id ⇒ String

A unique ID for this table.

Returns:

  • (String)

    The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.



# File 'lib/google/cloud/bigquery/table.rb', line 93

def table_id
  @gapi.table_reference.table_id
end

#time_partitioning? ⇒ Boolean

Checks if the table is time-partitioned. See Partitioned Tables.
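
Examples:

Reading the partitioning metadata together (names are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

if table.time_partitioning?
  puts table.time_partitioning_type        # e.g. "DAY"
  puts table.time_partitioning_expiration  # seconds, or nil
end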

Returns:

  • (Boolean)

    true when the table is time-partitioned, false otherwise.



# File 'lib/google/cloud/bigquery/table.rb', line 138

def time_partitioning?
  !@gapi.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ Integer?

The expiration for the table partitions, if any, in seconds. See Partitioned Tables.

Returns:

  • (Integer, nil)

    The expiration time, in seconds, for data in partitions.



# File 'lib/google/cloud/bigquery/table.rb', line 195

def time_partitioning_expiration
  ensure_full_data!
  @gapi.time_partitioning.expiration_ms / 1_000 if
      time_partitioning? &&
      !@gapi.time_partitioning.expiration_ms.nil?
end

#time_partitioning_expiration=(expiration) ⇒ Object

Sets the partition expiration for the table. See Partitioned Tables. The table must also be partitioned.

See #time_partitioning_type=.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
  table.time_partitioning_expiration = 86_400
end

Parameters:

  • expiration (Integer)

    An expiration time, in seconds, for data in partitions.



# File 'lib/google/cloud/bigquery/table.rb', line 224

def time_partitioning_expiration= expiration
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.expiration_ms = expiration * 1000
  patch_gapi! :time_partitioning
end

#time_partitioning_type ⇒ String?

The period for which the table is partitioned, if any. See Partitioned Tables.

Returns:

  • (String, nil)

    The partition type. Currently the only supported value is "DAY".



# File 'lib/google/cloud/bigquery/table.rb', line 151

def time_partitioning_type
  ensure_full_data!
  @gapi.time_partitioning.type if time_partitioning?
end

#time_partitioning_type=(type) ⇒ Object

Sets the partitioning for the table. See Partitioned Tables.

You can only set partitioning when creating a table as in the example below. BigQuery does not allow you to change partitioning on an existing table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
end

Parameters:

  • type (String)

    The partition type. Currently the only supported value is "DAY".



# File 'lib/google/cloud/bigquery/table.rb', line 178

def time_partitioning_type= type
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.type = type
  patch_gapi! :time_partitioning
end

#view? ⇒ Boolean

Checks if the table's type is "VIEW".

Returns:

  • (Boolean)

    true when the type is VIEW, false otherwise.



# File 'lib/google/cloud/bigquery/table.rb', line 454

def view?
  @gapi.type == "VIEW"
end