Class: Google::Cloud::Bigquery::Table

Inherits:

Object

Defined in: lib/google/cloud/bigquery/table.rb,
lib/google/cloud/bigquery/table/list.rb,
lib/google/cloud/bigquery/table/async_inserter.rb

Overview

Table

A named resource representing a BigQuery table that holds zero or more records. Every table is defined by a schema that may contain nested and repeated fields.

The Table class can also represent a view, which is a virtual table defined by a SQL query. BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See #view?, #query, #query=, and Dataset#create_view.)

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

row = {
  "first_name" => "Alice",
  "cities_lived" => [
    {
      "place" => "Seattle",
      "number_of_years" => 5
    },
    {
      "place" => "Stockholm",
      "number_of_years" => 6
    }
  ]
}
table.insert row

Creating a BigQuery view:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
         "SELECT name, age FROM `my_project.my_dataset.my_table`"
view.view? # true

See Also:

Preparing Data for BigQuery

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: AsyncInserter, List, Updater

Attributes

#api_url ⇒ String^?
A URL that can be used to access the table using the REST API.
#buffer_bytes ⇒ Integer^?
A lower-bound estimate of the number of bytes currently in this table's streaming buffer, if one is present.
#buffer_oldest_at ⇒ Time^?
The time of the oldest entry currently in this table's streaming buffer, if one is present.
#buffer_rows ⇒ Integer^?
A lower-bound estimate of the number of rows currently in this table's streaming buffer, if one is present.
#created_at ⇒ Time^?
The time when this table was created.
#dataset_id ⇒ String
The ID of the Dataset containing this table.
#description ⇒ String^?
A user-friendly description of the table.
#description=(new_description) ⇒ Object
Updates the user-friendly description of the table.
#etag ⇒ String^?
The ETag hash of the table.
#expires_at ⇒ Time^?
The time when this table expires.
#external ⇒ External::DataSource^?
The External::DataSource (or subclass) object that represents the external data source that the table represents.
#external=(external) ⇒ Object
Sets the External::DataSource (or subclass) object that represents the external data source that the table represents.
#external? ⇒ Boolean^?
Checks if the table's type is "EXTERNAL", indicating that the table represents an External Data Source.
#fields ⇒ Array<Schema::Field>^?
The fields of the table, obtained from its schema.
#headers ⇒ Array<Symbol>^?
The names of the columns in the table, obtained from its schema.
#id ⇒ String^?
The combined Project ID, Dataset ID, and Table ID for this table, in the format specified by the Legacy SQL Query Reference: project_name:datasetId.tableId.
#labels ⇒ Hash<String, String>^?
A hash of user-provided labels associated with this table.
#labels=(labels) ⇒ Object
Updates the hash of user-provided labels associated with this table.
#location ⇒ String^?
The geographic location where the table should reside.
#modified_at ⇒ Time^?
The date when this table was last modified.
#name ⇒ String^?
The name of the table.
#name=(new_name) ⇒ Object
Updates the name of the table.
#project_id ⇒ String
The ID of the Project containing this table.
#query ⇒ String
The query that executes each time the view is loaded.
#query_id(standard_sql: nil, legacy_sql: nil) ⇒ String
The value returned by #id, wrapped in square brackets if the Project ID contains dashes, as specified by the Query Reference.
#query_legacy_sql? ⇒ Boolean
Checks if the view's query is using legacy SQL.
#query_standard_sql? ⇒ Boolean
Checks if the view's query is using standard SQL.
#query_udfs ⇒ Array<String>
The user-defined function resources used in the view's query.
#schema(replace: false) {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema^?
Returns the table's schema.
#table? ⇒ Boolean^?
Checks if the table's type is "TABLE".
#table_id ⇒ String
A unique ID for this table.
#time_partitioning? ⇒ Boolean^?
Checks if the table is time-partitioned.
#time_partitioning_expiration ⇒ Integer^?
The expiration for the table partitions, if any, in seconds.
#time_partitioning_expiration=(expiration) ⇒ Object
Sets the partition expiration for the table.
#time_partitioning_type ⇒ String^?
The period for which the table is partitioned, if any.
#time_partitioning_type=(type) ⇒ Object
Sets the partitioning for the table.
#view? ⇒ Boolean^?
Checks if the table's type is "VIEW", indicating that the table represents a BigQuery view.

Data

#bytes_count ⇒ Integer^?
The number of bytes in the table.
#copy(destination_table, create: nil, write: nil) ⇒ Boolean
Copies the data from the table to another table using a synchronous method that blocks for a response.
#copy_job(destination_table, create: nil, write: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ Google::Cloud::Bigquery::CopyJob
Copies the data from the table to another table using an asynchronous method.
#data(token: nil, max: nil, start: nil) ⇒ Google::Cloud::Bigquery::Data
Retrieves data from the table.
#extract(extract_url, format: nil, compression: nil, delimiter: nil, header: nil) ⇒ Boolean
Extracts the data from the table to a Google Cloud Storage file using a synchronous method that blocks for a response.
#extract_job(extract_url, format: nil, compression: nil, delimiter: nil, header: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ Google::Cloud::Bigquery::ExtractJob
Extracts the data from the table to a Google Cloud Storage file using an asynchronous method.
#insert(rows, skip_invalid: nil, ignore_unknown: nil) ⇒ Google::Cloud::Bigquery::InsertResponse
Inserts data into the table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
#insert_async(skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ Table::AsyncInserter
Creates an asynchronous inserter object used to insert rows in batches.
#load(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, autodetect: nil, null_marker: nil) ⇒ Google::Cloud::Bigquery::LoadJob
Loads data into the table.
#load_job(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) ⇒ Google::Cloud::Bigquery::LoadJob
Loads data into the table.
#rows_count ⇒ Integer^?
The number of rows in the table.

Lifecycle

#delete ⇒ Boolean
Permanently deletes the table.
#exists? ⇒ Boolean
Determines whether the table exists in the BigQuery service.
#query=(new_query) ⇒ Object
Updates the query that executes each time the view is loaded.
#reference? ⇒ Boolean
Whether the table was created without retrieving the resource representation from the BigQuery service.
#reload! ⇒ Google::Cloud::Bigquery::Table (also: #refresh!)
Reloads the table with current data from the BigQuery service.
#resource? ⇒ Boolean
Whether the table was created with a resource representation from the BigQuery service.
#resource_full? ⇒ Boolean
Whether the table was created with a full resource representation from the BigQuery service.
#resource_partial? ⇒ Boolean
Whether the table was created with a partial resource representation from the BigQuery service by retrieval through Dataset#tables.
#set_query(query, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Object
Updates the query that executes each time the view is loaded.

Instance Method Details

#api_url ⇒ `String`^?

A URL that can be used to access the table using the REST API.

Returns:

(String, nil) —
A REST URL for the resource, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 379

def api_url
  return nil if reference?
  ensure_full_data!
  @gapi.self_link
end

#buffer_bytes ⇒ `Integer`^?

A lower-bound estimate of the number of bytes currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.

Returns:

(Integer, nil) —
The estimated number of bytes in the buffer, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 791

def buffer_bytes
  return nil if reference?
  ensure_full_data!
  @gapi.streaming_buffer.estimated_bytes if @gapi.streaming_buffer
end

#buffer_oldest_at ⇒ `Time`^?

The time of the oldest entry currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.

Returns:

(Time, nil) —
The oldest entry time, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 825

def buffer_oldest_at
  return nil if reference?
  ensure_full_data!
  return nil unless @gapi.streaming_buffer
  oldest_entry_time = @gapi.streaming_buffer.oldest_entry_time
  begin
    ::Time.at(Integer(oldest_entry_time) / 1000.0)
  rescue
    nil
  end
end
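
The conversion above treats the API value as milliseconds since the Unix epoch and falls back to nil when the value cannot be parsed. A minimal standalone sketch of the same idea follows; the helper name `ms_to_time` is hypothetical and is not part of the library.

```ruby
# Hypothetical helper (not part of google-cloud-bigquery): converts a
# millisecond epoch value, given as an Integer or numeric String, to a Time.
def ms_to_time(value)
  ::Time.at(Integer(value) / 1000.0)
rescue ArgumentError, TypeError
  # Integer() raises TypeError for nil and ArgumentError for non-numeric text.
  nil
end

ms_to_time("1500000000000") # a Time 1_500_000_000 seconds after the epoch
ms_to_time(nil)             # nil
```

Rescuing only ArgumentError and TypeError, rather than using a bare rescue, keeps unrelated failures visible; that is a design choice of this sketch, not a description of the library.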

#buffer_rows ⇒ `Integer`^?

A lower-bound estimate of the number of rows currently in this table's streaming buffer, if one is present. This field will be absent if the table is not being streamed to or if there is no data in the streaming buffer.

Returns:

(Integer, nil) —
The estimated number of rows in the buffer, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 809

def buffer_rows
  return nil if reference?
  ensure_full_data!
  @gapi.streaming_buffer.estimated_rows if @gapi.streaming_buffer
end

#bytes_count ⇒ `Integer`^?

The number of bytes in the table.

Returns:

(Integer, nil) —
The count of bytes in the table, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 424

def bytes_count
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.num_bytes
  rescue
    nil
  end
end

#copy(destination_table, create: nil, write: nil) ⇒ `Boolean`

Copies the data from the table to another table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #copy_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
destination_table = dataset.table "my_destination_table"

table.copy destination_table

Passing a string identifier for the destination table:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.copy "other-project:other_dataset.other_table"

Parameters:

destination_table (Table, String) —
The destination for the copied data. This can also be a string identifier as specified by the Query Reference: project_name:datasetId.tableId. This is useful for referencing tables in other projects and datasets.
create (String) —
Specifies whether the job is allowed to create new tables. The default value is `needed`.

The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write (String) —
Specifies how to handle data already present in the destination table. The default value is `empty`.

The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - An error will be returned if the destination table already contains data.

Returns:

(Boolean) —
Returns true if the copy operation succeeded.
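
In the BigQuery REST API, the create and write options correspond to the job configuration fields createDisposition and writeDisposition. The lookup below is an illustrative sketch of that correspondence with the documented defaults applied; the constant and method names are hypothetical, and the library's own conversion code may differ.

```ruby
# REST API values for the short option names documented above.
CREATE_DISPOSITIONS = {
  "needed" => "CREATE_IF_NEEDED",
  "never"  => "CREATE_NEVER"
}.freeze

WRITE_DISPOSITIONS = {
  "truncate" => "WRITE_TRUNCATE",
  "append"   => "WRITE_APPEND",
  "empty"    => "WRITE_EMPTY"
}.freeze

# Hypothetical helper: resolves a short name (or nil) to its API value,
# falling back to the documented default when nothing is given.
def disposition(lookup, value, default)
  lookup.fetch((value || default).to_s.downcase) do
    raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

disposition CREATE_DISPOSITIONS, nil, "needed"      # "CREATE_IF_NEEDED"
disposition WRITE_DISPOSITIONS, :truncate, "empty"  # "WRITE_TRUNCATE"
```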

# File 'lib/google/cloud/bigquery/table.rb', line 1159

def copy destination_table, create: nil, write: nil
  job = copy_job destination_table, create: create, write: write
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end
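
The raise-then-rescue step in the method above relies on a Ruby feature: an exception raised while another is being handled records the earlier one as its cause. The sketch below shows that mechanism in isolation. `LowLevelError` and `WrappedError` are stand-ins invented for this example, since Google::Apis::Error and Google::Cloud::Error are not loaded here.

```ruby
# Stand-in error classes for illustration only.
class LowLevelError < StandardError; end
class WrappedError < StandardError; end

def raise_wrapped(original)
  begin
    # Raising the original makes it the exception currently being handled ...
    raise original
  rescue => e
    # ... so the error raised here records it as #cause automatically.
    raise WrappedError, "wrapped: #{e.message}"
  end
end

begin
  raise_wrapped LowLevelError.new("job failed")
rescue WrappedError => wrapped
  puts wrapped.message      # wrapped: job failed
  puts wrapped.cause.class  # LowLevelError
end
```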

#copy_job(destination_table, create: nil, write: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ `Google::Cloud::Bigquery::CopyJob`

Copies the data from the table to another table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #copy.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
destination_table = dataset.table "my_destination_table"

copy_job = table.copy_job destination_table

Passing a string identifier for the destination table:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

copy_job = table.copy_job "other-project:other_dataset.other_table"

Parameters:

destination_table (Table, String) —
The destination for the copied data. This can also be a string identifier as specified by the Query Reference: project_name:datasetId.tableId. This is useful for referencing tables in other projects and datasets.
create (String) —
Specifies whether the job is allowed to create new tables. The default value is `needed`.

The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write (String) —
Specifies how to handle data already present in the destination table. The default value is `empty`.

The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - An error will be returned if the destination table already contains data.
job_id (String) —
A user-defined ID for the copy job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

See Generating a job ID.
prefix (String) —
A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
labels (Hash) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

(Google::Cloud::Bigquery::CopyJob)
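
The job_id and prefix rules above can be checked before a job is submitted. The following is a minimal sketch under stated assumptions: `valid_job_id?` and `build_job_id` are hypothetical helpers, and the random suffix produced with SecureRandom is only one possible way to make a prefixed ID unique, not necessarily how the library generates it.

```ruby
require "securerandom"

# Letters, numbers, underscores, or dashes; at most 1,024 characters.
JOB_ID_PATTERN = /\A[A-Za-z0-9_-]{1,1024}\z/

def valid_job_id?(id)
  JOB_ID_PATTERN.match?(id.to_s)
end

# Mirrors the documented precedence: an explicit job_id wins over prefix.
def build_job_id(job_id: nil, prefix: nil)
  id = job_id || "#{prefix}#{SecureRandom.urlsafe_base64(21)}"
  raise ArgumentError, "invalid job ID: #{id.inspect}" unless valid_job_id?(id)
  id
end

build_job_id prefix: "daily_import_job_"  # "daily_import_job_" plus a random suffix
build_job_id job_id: "my_job_1", prefix: "ignored_"  # "my_job_1"
```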

# File 'lib/google/cloud/bigquery/table.rb', line 1095

def copy_job destination_table, create: nil, write: nil, dryrun: nil,
             job_id: nil, prefix: nil, labels: nil
  ensure_service!
  options = { create: create, write: write, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels }
  gapi = service.copy_table table_ref,
                            get_table_ref(destination_table),
                            options
  Job.from_gapi gapi, service
end

#created_at ⇒ `Time`^?

The time when this table was created.

Returns:

(Time, nil) —
The creation time, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 460

def created_at
  return nil if reference?
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end

#data(token: nil, max: nil, start: nil) ⇒ `Google::Cloud::Bigquery::Data`

Retrieves data from the table.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the data retrieval.

Examples:

Paginate rows of data: (See Data#next)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.each do |row|
  puts row[:first_name]
end
more_data = data.next if data.next?

Retrieve all rows of data: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = table.data
data.all do |row|
  puts row[:first_name]
end

Parameters:

token (String) —
Page token, returned by a previous call, identifying the result set.
max (Integer) —
Maximum number of results to return.
start (Integer) —
Zero-based index of the starting row to read.

Returns:

(Google::Cloud::Bigquery::Data)

# File 'lib/google/cloud/bigquery/table.rb', line 1010

def data token: nil, max: nil, start: nil
  ensure_service!
  reload! unless resource_full?
  options = { token: token, max: max, start: start }
  data_json = service.list_tabledata \
    dataset_id, table_id, options
  Data.from_gapi_json data_json, gapi, service
end
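
The next?/next pagination pattern from the examples can be wrapped in a loop that walks every page. The sketch below runs without the BigQuery service by using `FakePage`, a stand-in class written for this example that only imitates each, next?, and next; it is not the library's Data class.

```ruby
# Stand-in for a paged result; imitates only each, next?, and next.
class FakePage
  def initialize(pages, index = 0)
    @pages = pages
    @index = index
  end

  def each(&block)
    @pages[@index].each(&block)
  end

  def next?
    @index + 1 < @pages.size
  end

  def next
    FakePage.new(@pages, @index + 1) if next?
  end
end

# Collects the rows of the given page and of every page after it.
def all_rows(page)
  rows = []
  while page
    page.each { |row| rows << row }
    page = page.next
  end
  rows
end

first_page = FakePage.new([[{ first_name: "Alice" }], [{ first_name: "Bob" }]])
all_rows(first_page).each { |row| puts row[:first_name] }
```

Collecting every row into one array is convenient for small tables only; for large tables, processing each page as it arrives avoids holding all rows in memory.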

#dataset_id ⇒ `String`

The ID of the Dataset containing this table.

Returns:

(String) —
The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

# File 'lib/google/cloud/bigquery/table.rb', line 128

def dataset_id
  return reference.dataset_id if reference?
  @gapi.table_reference.dataset_id
end

#delete ⇒ `Boolean`

Permanently deletes the table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.delete

Returns:

(Boolean) —
Returns true if the table was deleted.

# File 'lib/google/cloud/bigquery/table.rb', line 1786

def delete
  ensure_service!
  service.delete_table dataset_id, table_id
  true
end

#description ⇒ `String`^?

A user-friendly description of the table.

Returns:

(String, nil) —
The description, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 393

def description
  return nil if reference?
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ `Object`

Updates the user-friendly description of the table.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

new_description (String) —
The new user-friendly description.

# File 'lib/google/cloud/bigquery/table.rb', line 410

def description= new_description
  reload! unless resource_full?
  @gapi.update! description: new_description
  patch_gapi! :description
end
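
ETag-based optimistic concurrency control means an update is accepted only if the ETag the caller last saw still matches the stored one, which is why the setter reloads a partial resource first. The in-memory sketch below illustrates the idea only. `FakeStore` and `StaleError` are hypothetical and do not reproduce how the BigQuery service computes or checks ETags.

```ruby
require "digest"

class StaleError < StandardError; end

# In-memory stand-in for a resource guarded by an ETag.
class FakeStore
  attr_reader :description, :etag

  def initialize(description)
    write description
  end

  # Applies the change only when the caller's ETag matches the current one.
  def patch(description, if_match:)
    raise StaleError, "resource changed since it was read" unless if_match == @etag
    write description
  end

  private

  # Stores the value and derives a fresh ETag from a version counter.
  def write(description)
    @description = description
    @version = @version.to_i + 1
    @etag = Digest::SHA256.hexdigest("#{@version}:#{description}")[0, 16]
  end
end

store = FakeStore.new "first draft"
seen = store.etag
store.patch "second draft", if_match: seen   # accepted; the ETag changes
begin
  store.patch "third draft", if_match: seen  # rejected; the ETag is stale
rescue StaleError => e
  puts e.message
end
```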

#etag ⇒ `String`^?

The ETag hash of the table.

Returns:

(String, nil) —
The ETag hash, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 365

def etag
  return nil if reference?
  ensure_full_data!
  @gapi.etag
end

#exists? ⇒ `Boolean`

Determines whether the table exists in the BigQuery service. The result is cached locally.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table", skip_lookup: true
table.exists? # true

Returns:

(Boolean) —
true when the table exists in the BigQuery service, false otherwise.

# File 'lib/google/cloud/bigquery/table.rb', line 1833

def exists?
  # Always true if we have a gapi object
  return true unless reference?
  # If we have a value, return it
  return @exists unless @exists.nil?
  ensure_gapi!
  @exists = true
rescue Google::Cloud::NotFoundError
  @exists = false
end
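
The explicit nil check in the method above matters because the cached answer may be false. A minimal sketch of that caching pattern follows, with a hypothetical `CachedCheck` class and a block standing in for the service lookup.

```ruby
class CachedCheck
  attr_reader :lookups

  def initialize(&lookup)
    @lookup = lookup
    @lookups = 0
    @exists = nil
  end

  def exists?
    # `@exists ||= ...` would repeat the lookup whenever the answer is false,
    # so test for nil explicitly.
    return @exists unless @exists.nil?
    @lookups += 1
    @exists = @lookup.call ? true : false
  end
end

check = CachedCheck.new { false }
check.exists?  # false (performs the lookup)
check.exists?  # false (served from the cache)
check.lookups  # 1
```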

#expires_at ⇒ `Time`^?

The time when this table expires. If not present, the table will persist indefinitely. Expired tables will be deleted and their storage reclaimed.

Returns:

(Time, nil) —
The expiration time, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 480

def expires_at
  return nil if reference?
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.expiration_time) / 1000.0)
  rescue
    nil
  end
end

#external ⇒ `External::DataSource`^?

The External::DataSource (or subclass) object that represents the external data source that the table represents. Data can be queried from the table, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Present only if the table represents an External Data Source. See #external? and External::DataSource.

Returns:

(External::DataSource, nil) —
The external data source.

See Also:

Querying External Data Sources

# File 'lib/google/cloud/bigquery/table.rb', line 745

def external
  return nil if reference?
  ensure_full_data!
  return nil if @gapi.external_data_configuration.nil?
  External.from_gapi(@gapi.external_data_configuration).freeze
end

#external=(external) ⇒ `Object`

Sets the External::DataSource (or subclass) object that represents the external data source that the table represents. Data can be queried from the table, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Use only if the table represents an External Data Source. See #external? and External::DataSource.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

external (External::DataSource) —
An external data source.

See Also:

Querying External Data Sources

# File 'lib/google/cloud/bigquery/table.rb', line 773

def external= external
  reload! unless resource_full?
  @gapi.external_data_configuration = external.to_gapi
  patch_gapi! :external_data_configuration
end

#external? ⇒ `Boolean`^?

Checks if the table's type is "EXTERNAL", indicating that the table represents an External Data Source. See #external and External::DataSource.

Returns:

(Boolean, nil) —
true when the type is EXTERNAL, false otherwise, if the object is a resource (see #resource?); nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 548

def external?
  return nil if reference?
  @gapi.type == "EXTERNAL"
end

#extract(extract_url, format: nil, compression: nil, delimiter: nil, header: nil) ⇒ `Boolean`

Extracts the data from the table to a Google Cloud Storage file using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.extract "gs://my-bucket/file-name.json", format: "json"

Parameters:

extract_url (Google::Cloud::Storage::File, String, Array<String>) —
The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.
format (String) —
The exported file format. The default value is `csv`.

The following values are supported:
- csv - CSV
- json - Newline-delimited JSON
- avro - Avro
compression (String) —
The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.
delimiter (String) —
Delimiter to use between fields in the exported data. Default is `,`.
header (Boolean) —
Whether to print out a header row in the results. Default is true.

Returns:

(Boolean) —
Returns true if the extract operation succeeded.

See Also:

Exporting Data From BigQuery

# File 'lib/google/cloud/bigquery/table.rb', line 1296

def extract extract_url, format: nil, compression: nil, delimiter: nil,
            header: nil
  job = extract_job extract_url, format: format,
                                 compression: compression,
                                 delimiter: delimiter, header: header
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end

#extract_job(extract_url, format: nil, compression: nil, delimiter: nil, header: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil) ⇒ `Google::Cloud::Bigquery::ExtractJob`

Extracts the data from the table to a Google Cloud Storage file using an asynchronous method. In this method, an ExtractJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

extract_job = table.extract_job "gs://my-bucket/file-name.json",
                            format: "json"

Parameters:

extract_url (Google::Cloud::Storage::File, String, Array<String>) —
The Google Storage file or file URI pattern(s) to which BigQuery should extract the table data.
format (String) —
The exported file format. The default value is `csv`.

The following values are supported:
- csv - CSV
- json - Newline-delimited JSON
- avro - Avro
compression (String) —
The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE.
delimiter (String) —
Delimiter to use between fields in the exported data. Default is `,`.
header (Boolean) —
Whether to print out a header row in the results. Default is true.
job_id (String) —
A user-defined ID for the extract job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

See Generating a job ID.
prefix (String) —
A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
labels (Hash) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

(Google::Cloud::Bigquery::ExtractJob)
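
The label rules described for the labels parameter can be checked locally before a job is submitted. The sketch below is one possible reading of those rules, not the service's validator. It assumes "lowercase letters" and "international characters" correspond to the Unicode categories Ll and Lo, and `invalid_labels` is a hypothetical helper.

```ruby
# Keys: start with a letter, at most 63 characters in total.
LABEL_KEY   = /\A[\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{Nd}_-]{0,62}\z/
# Values: optional (may be empty), at most 63 characters.
LABEL_VALUE = /\A[\p{Ll}\p{Lo}\p{Nd}_-]{0,63}\z/

# Returns the subset of labels that break the rules (empty when all pass).
# Using a Hash already guarantees that every key is distinct.
def invalid_labels(labels)
  labels.reject do |key, value|
    LABEL_KEY.match?(key.to_s) && LABEL_VALUE.match?(value.to_s)
  end
end

invalid_labels("env" => "prod", "team" => "")  # {}
invalid_labels("Env" => "prod").keys           # ["Env"]
```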

See Also:

Exporting Data From BigQuery

# File 'lib/google/cloud/bigquery/table.rb', line 1243

def extract_job extract_url, format: nil, compression: nil,
                delimiter: nil, header: nil, dryrun: nil, job_id: nil,
                prefix: nil, labels: nil
  ensure_service!
  options = { format: format, compression: compression,
              delimiter: delimiter, header: header, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels }
  gapi = service.extract_table table_ref, extract_url, options
  Job.from_gapi gapi, service
end

#fields ⇒ `Array<Schema::Field>`^?

The fields of the table, obtained from its schema.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.fields.each do |field|
  puts field.name
end

Returns:

(Array<Schema::Field>, nil) —
An array of field objects.

# File 'lib/google/cloud/bigquery/table.rb', line 700

def fields
  return nil if reference?
  schema.fields
end

#headers ⇒ `Array<Symbol>`^?

The names of the columns in the table, obtained from its schema.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.headers.each do |header|
  puts header
end

Returns:

(Array<Symbol>, nil) —
An array of column names.

# File 'lib/google/cloud/bigquery/table.rb', line 723

def headers
  return nil if reference?
  schema.headers
end

#id ⇒ `String`^?

The combined Project ID, Dataset ID, and Table ID for this table, in the format specified by the Legacy SQL Query Reference: project_name:datasetId.tableId. To use this value in queries see #query_id.

Returns:

(String, nil) —
The combined ID, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 285

def id
  return nil if reference?
  @gapi.id
end
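
The combined ID format project_name:datasetId.tableId can be split back into its parts with plain string operations. The helper below is a hypothetical sketch. It assumes all three parts are present and splits at the last colon, so a project name that itself contains a colon stays intact.

```ruby
# Hypothetical helper: splits "project_name:datasetId.tableId" into its parts.
def split_table_id(id)
  project, sep, rest = id.rpartition(":")
  dataset, dot, table = rest.partition(".")
  if sep.empty? || dot.empty? || [project, dataset, table].any?(&:empty?)
    raise ArgumentError, "expected project_name:datasetId.tableId, got #{id.inspect}"
  end
  { project_id: project, dataset_id: dataset, table_id: table }
end

split_table_id "other-project:other_dataset.other_table"
# => { project_id: "other-project", dataset_id: "other_dataset", table_id: "other_table" }
```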

#insert(rows, skip_invalid: nil, ignore_unknown: nil) ⇒ `Google::Cloud::Bigquery::InsertResponse`

Inserts data into the table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
table.insert rows

Avoid retrieving the dataset and table with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset", skip_lookup: true
table = dataset.table "my_table", skip_lookup: true

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
table.insert rows

Parameters:

rows (Hash, Array<Hash>) —
A hash object or array of hash objects containing the data. Required.
skip_invalid (Boolean) —
Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.
ignore_unknown (Boolean) —
Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

Returns:

(Google::Cloud::Bigquery::InsertResponse)

See Also:

Streaming Data Into BigQuery

# File 'lib/google/cloud/bigquery/table.rb', line 1702

def insert rows, skip_invalid: nil, ignore_unknown: nil
  rows = [rows] if rows.is_a? Hash
  fail ArgumentError, "No rows provided" if rows.empty?
  ensure_service!
  options = { skip_invalid: skip_invalid,
              ignore_unknown: ignore_unknown }
  gapi = service.insert_tabledata dataset_id, table_id, rows, options
  InsertResponse.from_gapi rows, gapi
end
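
The argument handling at the top of the listing above can be sketched in isolation. This is a minimal illustration, not the library's code path: `normalize_rows` is a hypothetical helper name, and no request is sent to BigQuery.

```ruby
# Hypothetical helper mirroring the argument handling in Table#insert.
def normalize_rows(rows)
  rows = [rows] if rows.is_a? Hash   # a single row becomes a one-element batch
  raise ArgumentError, "No rows provided" if rows.empty?
  rows
end

single = normalize_rows({ "first_name" => "Alice", "age" => 21 })
batch  = normalize_rows([{ "first_name" => "Bob", "age" => 22 }])
```

Note that only an empty Array triggers the ArgumentError; a Hash is wrapped in an Array before the emptiness check runs.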

#insert_async(skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ `Table::AsyncInserter`

Create an asynchronous inserter object used to insert rows in batches.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
inserter = table.insert_async do |result|
  if result.error?
    log_error result.error
  else
    log_insert "inserted #{result.insert_count} rows " \
      "with #{result.error_count} errors"
  end
end

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
inserter.insert rows

inserter.stop.wait!

Parameters:

skip_invalid (Boolean) —
Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.
ignore_unknown (Boolean) —
Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.
max_rows (Integer) —
The maximum number of rows to be collected before the batch is published. Default is 500.
max_bytes (Integer) —
The maximum size, in bytes, of rows to be collected before the batch is published. Default is 10,000,000.
interval (Numeric) —
The number of seconds to collect rows before the batch is published. Default is 10.
threads (Integer) —
The number of threads used to insert batches of rows. Default is 4.

Yields:

(response) —
the callback for when a batch of rows is inserted

Yield Parameters:

result (Table::AsyncInserter::Result) —
the result of the asynchronous insert

Returns:

(Table::AsyncInserter) —
Returns inserter object.

# File 'lib/google/cloud/bigquery/table.rb', line 1759

def insert_async skip_invalid: nil, ignore_unknown: nil,
                 max_bytes: 10000000, max_rows: 500, interval: 10,
                 threads: 4, &block
  ensure_service!

  AsyncInserter.new self, skip_invalid: skip_invalid,
                          ignore_unknown: ignore_unknown,
                          max_bytes: max_bytes, max_rows: max_rows,
                          interval: interval, threads: threads, &block
end
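
The idea of collecting rows until a threshold is reached and then publishing them as one batch can be sketched as follows. This is a simplified, synchronous stand-in under stated assumptions: `RowBatcher` is a hypothetical class, it honors only a row-count limit, and it does not reproduce the threading, byte limit, or interval timer of Table::AsyncInserter.

```ruby
# Simplified, synchronous stand-in for a batching inserter (hypothetical class).
class RowBatcher
  def initialize(max_rows: 500, &on_flush)
    @max_rows = max_rows
    @on_flush = on_flush
    @pending = []
  end

  # Accepts a Hash or an Array of Hashes and buffers each row.
  def insert(rows)
    rows = [rows] if rows.is_a? Hash
    rows.each do |row|
      @pending << row
      flush if @pending.size >= @max_rows
    end
  end

  # Publishes whatever is buffered, if anything, through the callback.
  def flush
    return if @pending.empty?
    batch = @pending
    @pending = []
    @on_flush.call batch
  end
end

flushed = []
batcher = RowBatcher.new(max_rows: 2) { |batch| flushed << batch.size }
batcher.insert([{ "n" => 1 }, { "n" => 2 }, { "n" => 3 }])
batcher.flush   # publish the remaining partial batch
```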

#labels ⇒ `Hash<String, String>`^?

A hash of user-provided labels associated with this table. Labels are used to organize and group tables. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

labels = table.labels
labels["department"] #=> "shipping"

Returns:

(Hash<String, String>, nil) —
A hash containing key/value pairs.

# File 'lib/google/cloud/bigquery/table.rb', line 589

def labels
  return nil if reference?
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end
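
Because the returned hash is a frozen copy, in-place changes raise an error. The sketch below uses a plain Hash as a stand-in for the value returned by #labels and shows the usual alternative of building a new hash, which could then be assigned with #labels=. FrozenError assumes Ruby 2.5 or later.

```ruby
# Stand-in for the value returned by Table#labels (a frozen copy).
labels = { "department" => "shipping" }.dup.freeze

begin
  labels["team"] = "logistics"
rescue FrozenError => e
  frozen_message = e.message   # mutation of a frozen Hash is rejected
end

# Build a new, unfrozen hash instead of mutating the frozen one.
updated = labels.merge({ "team" => "logistics" })
```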

#labels=(labels) ⇒ `Object`

Updates the hash of user-provided labels associated with this table. Labels are used to organize and group tables. See Using Labels.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.labels = { "department" => "shipping" }

Parameters:

labels (Hash<String, String>) —
A hash containing key/value pairs.
- Label keys and values can be no longer than 63 characters.
- Label keys and values can contain only lowercase letters, numbers, underscores, hyphens, and international characters.
- Label keys and values cannot exceed 128 bytes in size.
- Label keys must begin with a letter.
- Label keys must be unique within a table.

# File 'lib/google/cloud/bigquery/table.rb', line 626

def labels= labels
  reload! unless resource_full?
  @gapi.labels = labels
  patch_gapi! :labels
end
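
The label rules listed under Parameters could be checked on the client before calling #labels=. The sketch below is one possible interpretation, not part of the library: the helper names are hypothetical, "international characters" is read as Unicode letters without case (`\p{Lo}`), and empty values are assumed to be acceptable. Key uniqueness needs no check because a Ruby Hash cannot hold duplicate keys.

```ruby
# Hypothetical client-side check of the documented label rules.
LABEL_CHARS = /\A[\p{Ll}\p{Lo}\p{N}_-]*\z/
LABEL_KEY_START = /\A[\p{Ll}\p{Lo}]/

# At most 63 characters, at most 128 bytes, and only permitted characters.
def label_part_ok?(text)
  text.length <= 63 && text.bytesize <= 128 && text.match?(LABEL_CHARS)
end

# Keys must additionally begin with a letter.
def labels_valid?(labels)
  labels.all? do |key, value|
    label_part_ok?(key) && key.match?(LABEL_KEY_START) && label_part_ok?(value)
  end
end

ok = labels_valid?({ "department" => "shipping" })
```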

#load(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, autodetect: nil, null_marker: nil) ⇒ `Boolean`

Loads data into the table and waits for the load job to complete. You can pass a Google Cloud Storage file path or a google-cloud-storage File instance, or you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

success = table.load "gs://my-bucket/file-name.csv"

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
success = table.load file

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

file = File.open "my_data.csv"
success = table.load file

Parameters:

file (File, Google::Cloud::Storage::File, String) —
A file or the URI of a Google Cloud Storage file containing data to load into the table.
format (String) —
The format of the source data. The default value is csv.

The following values are supported:
- csv - CSV
- json - Newline-delimited JSON
- avro - Avro
- datastore_backup - Cloud Datastore backup
create (String) —
Specifies whether the job is allowed to create new tables. The default value is needed.

The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write (String) —
Specifies how to handle data already present in the table. The default value is append.

The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - An error will be returned if the table already contains data.
projection_fields (Array<String>) —
If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.
jagged_rows (Boolean) —
Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.
quoted_newlines (Boolean) —
Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.
autodetect (Boolean) —
Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.
encoding (String) —
The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.
delimiter (String) —
Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.
ignore_unknown (Boolean) —
Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

The format property determines what BigQuery treats as an extra value:
- CSV: Trailing columns
- JSON: Named values that don't match any column names
max_bad_records (Integer) —
The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.
null_marker (String) —
Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
quote (String) —
The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the quoted_newlines option to true.
skip_leading (Integer) —
The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

Returns:

(Boolean) —
true if the load job completed successfully. If the job failed, a Google::Cloud::Error is raised instead.

# File 'lib/google/cloud/bigquery/table.rb', line 1625

def load file, format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
         encoding: nil, delimiter: nil, ignore_unknown: nil,
         max_bad_records: nil, quote: nil, skip_leading: nil,
         autodetect: nil, null_marker: nil
  job = load_job file, format: format, create: create, write: write,
                       projection_fields: projection_fields,
                       jagged_rows: jagged_rows,
                       quoted_newlines: quoted_newlines,
                       encoding: encoding, delimiter: delimiter,
                       ignore_unknown: ignore_unknown,
                       max_bad_records: max_bad_records, quote: quote,
                       skip_leading: skip_leading,
                       autodetect: autodetect, null_marker: null_marker

  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      fail job.gapi_error
    rescue => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  true
end
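
The comment "raise to activate ruby exception cause handling" in the listing above refers to a standard Ruby behavior: an exception raised inside a rescue block records the rescued exception as its #cause. A minimal sketch, using hypothetical error classes in place of the real API and library errors:

```ruby
# Hypothetical error classes standing in for the low-level and wrapped errors.
class LowLevelError < StandardError; end
class WrappedError < StandardError; end

def fail_wrapped
  begin
    # Raising first makes the low-level error the current exception ...
    raise LowLevelError, "backend rejected the load"
  rescue => e
    # ... so the error raised here records it as its #cause.
    raise WrappedError, "load failed: #{e.message}"
  end
end

begin
  fail_wrapped
rescue WrappedError => wrapped
  original = wrapped.cause
end
```

Callers that rescue the wrapping error can still reach the underlying failure through `cause`.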

#load_job(file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) ⇒ `Google::Cloud::Bigquery::LoadJob`

Starts a job that loads data into the table and returns the job without waiting for it to finish. You can pass a Google Cloud Storage file path or a google-cloud-storage File instance, or you can upload a file directly. See Loading Data with a POST Request.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

load_job = table.load_job "gs://my-bucket/file-name.csv"

Pass a google-cloud-storage File instance:

require "google/cloud/bigquery"
require "google/cloud/storage"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

storage = Google::Cloud::Storage.new
bucket = storage.bucket "my-bucket"
file = bucket.file "file-name.csv"
load_job = table.load_job file

Upload a file directly:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

file = File.open "my_data.csv"
load_job = table.load_job file

Parameters:

file (File, Google::Cloud::Storage::File, String) —
A file or the URI of a Google Cloud Storage file containing data to load into the table.
format (String) —
The format of the source data. The default value is csv.

The following values are supported:
- csv - CSV
- json - Newline-delimited JSON
- avro - Avro
- datastore_backup - Cloud Datastore backup
create (String) —
Specifies whether the job is allowed to create new tables. The default value is needed.

The following values are supported:
- needed - Create the table if it does not exist.
- never - The table must already exist. A 'notFound' error is raised if the table does not exist.
write (String) —
Specifies how to handle data already present in the table. The default value is append.

The following values are supported:
- truncate - BigQuery overwrites the table data.
- append - BigQuery appends the data to the table.
- empty - An error will be returned if the table already contains data.
projection_fields (Array<String>) —
If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.
jagged_rows (Boolean) —
Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.
quoted_newlines (Boolean) —
Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.
autodetect (Boolean) —
Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.
encoding (String) —
The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.
delimiter (String) —
Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.
ignore_unknown (Boolean) —
Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

The format property determines what BigQuery treats as an extra value:
- CSV: Trailing columns
- JSON: Named values that don't match any column names
max_bad_records (Integer) —
The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.
null_marker (String) —
Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
quote (String) —
The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the quoted_newlines option to true.
skip_leading (Integer) —
The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.
job_id (String) —
A user-defined ID for the load job. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

See Generating a job ID.
prefix (String) —
A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
labels (Hash) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.

Returns:

(Google::Cloud::Bigquery::LoadJob)

# File 'lib/google/cloud/bigquery/table.rb', line 1471

def load_job file, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil,
             quoted_newlines: nil, encoding: nil, delimiter: nil,
             ignore_unknown: nil, max_bad_records: nil, quote: nil,
             skip_leading: nil, dryrun: nil, job_id: nil, prefix: nil,
             labels: nil, autodetect: nil, null_marker: nil
  ensure_service!
  options = { format: format, create: create, write: write,
              projection_fields: projection_fields,
              jagged_rows: jagged_rows,
              quoted_newlines: quoted_newlines, encoding: encoding,
              delimiter: delimiter, ignore_unknown: ignore_unknown,
              max_bad_records: max_bad_records, quote: quote,
              skip_leading: skip_leading, dryrun: dryrun,
              job_id: job_id, prefix: prefix, labels: labels,
              autodetect: autodetect, null_marker: null_marker }
  return load_storage(file, options) if storage_url? file
  return load_local(file, options) if local_file? file
  fail Google::Cloud::Error, "Don't know how to load #{file}"
end
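
The job_id and prefix constraints described under Parameters can be illustrated with a small sketch. Everything here is an assumption for illustration: `build_job_id` is a hypothetical helper, and the random suffix produced by `SecureRandom.urlsafe_base64` is not necessarily how the library generates its own IDs.

```ruby
require "securerandom"

# Letters, digits, underscores, and dashes only; at most 1,024 characters.
JOB_ID_PATTERN = /\A[a-zA-Z0-9_-]{1,1024}\z/

# Hypothetical helper: prepends the prefix to a random URL-safe suffix.
def build_job_id(prefix)
  job_id = "#{prefix}#{SecureRandom.urlsafe_base64(21)}"
  raise ArgumentError, "invalid job ID: #{job_id}" unless job_id.match? JOB_ID_PATTERN
  job_id
end

job_id = build_job_id "daily_import_job_"
```

A value built this way could be passed as the job_id option, in which case prefix would not be used.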

#location ⇒ `String`^?

The geographic location where the table should reside. Possible values include EU and US. The default value is US.

Returns:

(String, nil) —
The location code.

# File 'lib/google/cloud/bigquery/table.rb', line 561

def location
  return nil if reference?
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ `Time`^?

The date when this table was last modified.

Returns:

(Time, nil) —
The last modified time, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 498

def modified_at
  return nil if reference?
  ensure_full_data!
  begin
    ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end
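
The conversion in the listing above treats the stored value as milliseconds since the Unix epoch. A stand-alone sketch of the same arithmetic, with a hypothetical helper name and a made-up sample value:

```ruby
# Hypothetical helper: converts epoch milliseconds (Integer or numeric String)
# to a Time, returning nil when the value is missing or malformed.
def parse_millis(raw)
  ::Time.at(Integer(raw) / 1000.0)
rescue ArgumentError, TypeError
  nil
end

modified_at = parse_millis "1500000000000"   # sample value, not real table data
missing     = parse_millis nil
```

Dividing by 1000.0 rather than 1000 keeps the sub-second part of the timestamp.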

#name ⇒ `String`^?

The name of the table.

Returns:

(String, nil) —
The friendly name, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 335

def name
  return nil if reference?
  @gapi.friendly_name
end

#name=(new_name) ⇒ `Object`

Updates the name of the table.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameters:

new_name (String) —
The new friendly name.

# File 'lib/google/cloud/bigquery/table.rb', line 351

def name= new_name
  reload! unless resource_full?
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ `String`

The ID of the Project containing this table.

Returns:

(String) —
The project ID.

# File 'lib/google/cloud/bigquery/table.rb', line 140

def project_id
  return reference.project_id if reference?
  @gapi.table_reference.project_id
end

#query ⇒ `String`

The query that executes each time the view is loaded.

Returns:

(String) —
The query that defines the view.




# File 'lib/google/cloud/bigquery/table.rb', line 844

def query
  @gapi.view.query if @gapi.view
end

#query=(new_query) ⇒ `Object`

Updates the query that executes each time the view is loaded.

This sets the query using standard SQL. To specify legacy SQL or to use user-defined function resources, use #set_query instead.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
view = dataset.table "my_view"

view.query = "SELECT first_name FROM " \
               "`my_project.my_dataset.my_table`"

Parameters:

new_query (String) —
The query that defines the view.

See Also:

BigQuery Query Reference




# File 'lib/google/cloud/bigquery/table.rb', line 871

def query= new_query
  set_query new_query
end

#query_id(standard_sql: nil, legacy_sql: nil) ⇒ `String`

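The table ID in the form required by the chosen SQL dialect, for use in queries. For standard SQL (the default), the project, dataset, and table IDs are joined with periods and wrapped in backticks. For legacy SQL, the value takes the project:dataset.table form and is wrapped in square brackets, as specified by the Query Reference.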

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

data = bigquery.query "SELECT first_name FROM #{table.query_id}"

Parameters:

standard_sql (Boolean) —
Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.
legacy_sql (Boolean) —
Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.

Returns:

(String) —
The appropriate table ID for use in queries, depending on SQL type.

# File 'lib/google/cloud/bigquery/table.rb', line 319

def query_id standard_sql: nil, legacy_sql: nil
  if Convert.resolve_legacy_sql standard_sql, legacy_sql
    "[#{project_id}:#{dataset_id}.#{table_id}]"
  else
    "`#{project_id}.#{dataset_id}.#{table_id}`"
  end
end
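
The two output forms can be reproduced without a BigQuery connection. In this sketch, `format_query_id` is a hypothetical stand-alone function that takes the dialect as a plain flag rather than resolving it through the library's Convert helper.

```ruby
# Hypothetical stand-alone version of the formatting done by Table#query_id.
def format_query_id(project_id, dataset_id, table_id, legacy_sql: false)
  if legacy_sql
    "[#{project_id}:#{dataset_id}.#{table_id}]"   # legacy SQL form
  else
    "`#{project_id}.#{dataset_id}.#{table_id}`"   # standard SQL form
  end
end

standard = format_query_id "my-project", "my_dataset", "my_table"
legacy   = format_query_id "my-project", "my_dataset", "my_table", legacy_sql: true
sql      = "SELECT first_name FROM #{standard}"
```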

#query_legacy_sql? ⇒ `Boolean`

Checks if the view's query is using legacy SQL.

Returns:

(Boolean) —
true when legacy SQL is used (including when the dialect is not set on the view), false otherwise.

# File 'lib/google/cloud/bigquery/table.rb', line 929

def query_legacy_sql?
  val = @gapi.view.use_legacy_sql
  return true if val.nil?
  val
end

#query_standard_sql? ⇒ `Boolean`

Checks if the view's query is using standard SQL.

Returns:

(Boolean) —
true when standard SQL is used, false otherwise.




# File 'lib/google/cloud/bigquery/table.rb', line 942

def query_standard_sql?
  !query_legacy_sql?
end
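
The two predicates above are complements, and an unset flag counts as legacy SQL. A small sketch of that logic, with a hypothetical `ViewStub` struct standing in for the view definition returned by the API:

```ruby
# Hypothetical stand-in for the view definition returned by the API.
ViewStub = Struct.new :use_legacy_sql

def legacy_sql?(view)
  val = view.use_legacy_sql
  val.nil? ? true : val   # an unset flag is treated as legacy SQL
end

def standard_sql?(view)
  !legacy_sql?(view)
end

unset_view    = ViewStub.new nil
standard_view = ViewStub.new false
```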

#query_udfs ⇒ `Array<String>`

The user-defined function resources used in the view's query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

Returns:

(Array<String>) —
An array containing Google Cloud Storage URIs and/or inline source code.

# File 'lib/google/cloud/bigquery/table.rb', line 960

def query_udfs
  udfs_gapi = @gapi.view.user_defined_function_resources
  return [] if udfs_gapi.nil?
  Array(udfs_gapi).map { |udf| udf.inline_code || udf.resource_uri }
end
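
The mapping in the listing above returns, for each resource, the inline code when present and the Cloud Storage URI otherwise. A sketch with a hypothetical `UdfStub` struct and made-up sample values:

```ruby
# Hypothetical stand-in for a user-defined function resource.
UdfStub = Struct.new :inline_code, :resource_uri

udfs = [
  UdfStub.new(nil, "gs://my-bucket/my-lib.js"),   # code stored in Cloud Storage
  UdfStub.new("return x*2;", nil)                 # inline code
]

# Prefer inline code; fall back to the URI.
sources = Array(udfs).map { |udf| udf.inline_code || udf.resource_uri }
```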

#reference? ⇒ `Boolean`

Whether the table was created without retrieving the resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table", skip_lookup: true

table.reference? # true
table.reload!
table.reference? # false

Returns:

(Boolean) —
true when the table is just a local reference object, false otherwise.




# File 'lib/google/cloud/bigquery/table.rb', line 1863

def reference?
  @gapi.nil?
end

#reload! ⇒ `Google::Cloud::Bigquery::Table` Also known as: refresh!

Reloads the table with current data from the BigQuery service.

Examples:

Skip retrieving the table from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table", skip_lookup: true

table.reload!

Returns:

(Google::Cloud::Bigquery::Table) —
Returns the reloaded table.

# File 'lib/google/cloud/bigquery/table.rb', line 1810

def reload!
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  @gapi = gapi
end

#resource? ⇒ `Boolean`

Whether the table was created with a resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table", skip_lookup: true

table.resource? # false
table.reload!
table.resource? # true

Returns:

(Boolean) —
true when the table was created with a resource representation, false otherwise.




# File 'lib/google/cloud/bigquery/table.rb', line 1886

def resource?
  !@gapi.nil?
end

#resource_full? ⇒ `Boolean`

Whether the table was created with a full resource representation from the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

table.resource_full? # true

Returns:

(Boolean) —
true when the table was created with a full resource representation, false otherwise.




# File 'lib/google/cloud/bigquery/table.rb', line 1935

def resource_full?
  @gapi.is_a? Google::Apis::BigqueryV2::Table
end

#resource_partial? ⇒ `Boolean`

Whether the table was created with a partial resource representation from the BigQuery service by retrieval through Dataset#tables. See Tables: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
table = dataset.tables.first

table.resource_partial? # true
table.description # Loads the full resource.
table.resource_partial? # false

Returns:

(Boolean) —
true when the table was created with a partial resource representation, false otherwise.




# File 'lib/google/cloud/bigquery/table.rb', line 1914

def resource_partial?
  @gapi.is_a? Google::Apis::BigqueryV2::TableList::Table
end
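
Taken together, #reference?, #resource_full?, and #resource_partial? distinguish three states based on what the object holds internally. The sketch below models that with hypothetical placeholder classes (`FullTable`, `ListedTable`) rather than the real Google::Apis types:

```ruby
# Hypothetical stand-ins for the full and list-response representations.
FullTable = Class.new
ListedTable = Class.new

def representation(gapi)
  return :reference if gapi.nil?              # created with skip_lookup: true
  return :full if gapi.is_a? FullTable        # retrieved directly
  return :partial if gapi.is_a? ListedTable   # retrieved through a list call
  :unknown
end

states = [nil, FullTable.new, ListedTable.new].map { |gapi| representation gapi }
```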

#rows_count ⇒ `Integer`^?

The number of rows in the table.

Returns:

(Integer, nil) —
The count of rows in the table, or nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 442

def rows_count
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.num_rows
  rescue
    nil
  end
end

#schema(replace: false) {|schema| ... } ⇒ `Google::Cloud::Bigquery::Schema`^?

Returns the table's schema. If the table is not a view (see #view?), this method can also be used to set, replace, or add to the schema by passing a block. See Schema for available methods.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

table.schema do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

Parameters:

replace (Boolean) —
Whether to replace the existing schema with the new schema. If true, the fields will replace the existing schema. If false, the fields will be added to the existing schema. When a table already contains data, schema changes must be additive. Thus, the default value is false.

Yields:

(schema) —
a block for setting the schema

Yield Parameters:

schema (Schema) —
the object accepting the schema

Returns:

(Google::Cloud::Bigquery::Schema, nil) —
A frozen schema object.

# File 'lib/google/cloud/bigquery/table.rb', line 667

def schema replace: false
  return nil if reference? && !block_given?
  reload! unless resource_full?
  schema_builder = Schema.from_gapi @gapi.schema
  if block_given?
    schema_builder = Schema.from_gapi if replace
    yield schema_builder
    if schema_builder.changed?
      @gapi.schema = schema_builder.to_gapi
      patch_gapi! :schema
    end
  end
  schema_builder.freeze
end
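
The effect of the replace option can be modeled with plain arrays. This is a deliberately reduced sketch: a schema is represented as an ordered list of field names, `updated_fields` is a hypothetical function, and nothing is sent to the service.

```ruby
# Hypothetical model: a schema is just an ordered list of field names here.
def updated_fields(existing, replace: false)
  fields = replace ? [] : existing.dup   # start empty, or from a copy
  yield fields
  fields.freeze                          # the result is returned frozen
end

existing = ["first_name"]
added    = updated_fields(existing) { |f| f << "age" }
replaced = updated_fields(existing, replace: true) { |f| f << "full_name" }
```

With replace left at false, new fields are appended to the existing ones, which matches the additive changes required once a table contains data.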

#set_query(query, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ `Object`

Updates the query that executes each time the view is loaded. Allows setting of standard vs. legacy SQL and user-defined function resources.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
view = dataset.table "my_view"

view.set_query "SELECT first_name FROM " \
                 "`my_project.my_dataset.my_table`",
               standard_sql: true

Parameters:

query (String) —
The query that defines the view.
standard_sql (Boolean) —
Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.
legacy_sql (Boolean) —
Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.
udfs (Array<String>, String) —
User-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.

See Also:

BigQuery Query Reference

# File 'lib/google/cloud/bigquery/table.rb', line 913

def set_query query, standard_sql: nil, legacy_sql: nil, udfs: nil
  @gapi.view = Google::Apis::BigqueryV2::ViewDefinition.new \
    query: query,
    use_legacy_sql: Convert.resolve_legacy_sql(standard_sql,
                                               legacy_sql),
    user_defined_function_resources: udfs_gapi(udfs)
  patch_gapi! :view
end

#table? ⇒ `Boolean`^?

Checks if the table's type is "TABLE".

Returns:

(Boolean, nil) —
true when the type is TABLE, false otherwise, if the object is a resource (see #resource?); nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 517

def table?
  return nil if reference?
  @gapi.type == "TABLE"
end

#table_id ⇒ `String`

A unique ID for this table.

Returns:

(String) —
The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

# File 'lib/google/cloud/bigquery/table.rb', line 115

def table_id
  return reference.table_id if reference?
  @gapi.table_reference.table_id
end

#time_partitioning? ⇒ `Boolean`^?

Checks if the table is time-partitioned. See Partitioned Tables.

Returns:

(Boolean, nil) —
true when the table is time-partitioned, or false otherwise, if the object is a resource (see #resource?); nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 165

def time_partitioning?
  return nil if reference?
  !@gapi.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ `Integer`^?

The expiration for the table partitions, if any, in seconds. See Partitioned Tables.

Returns:

(Integer, nil) —
The expiration time, in seconds, for data in partitions, or nil if not present or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 231

def time_partitioning_expiration
  return nil if reference?
  ensure_full_data!
  @gapi.time_partitioning.expiration_ms / 1_000 if
      time_partitioning? &&
      !@gapi.time_partitioning.expiration_ms.nil?
end
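
The getter above converts a millisecond value into seconds using integer division. The sketch below isolates that arithmetic. The method name expiration_seconds is hypothetical, and the input is a bare integer rather than an API object.

```ruby
# Hypothetical helper isolating the milliseconds-to-seconds conversion.
def expiration_seconds expiration_ms
  return nil if expiration_ms.nil?
  # Integer division: any sub-second remainder is dropped.
  expiration_ms / 1_000
end

puts expiration_seconds(86_400_000) # 86400
puts expiration_seconds(1_500)      # 1
puts expiration_seconds(nil).inspect # nil
```

Because the division truncates, a stored value that is not a whole number of seconds reads back rounded down.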

#time_partitioning_expiration=(expiration) ⇒ `Object`

Sets the partition expiration for the table. The table must also be partitioned (see #time_partitioning_type=). See Partitioned Tables.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
  table.time_partitioning_expiration = 86_400
end

Parameters:

expiration (Integer) —
An expiration time, in seconds, for data in partitions.

# File 'lib/google/cloud/bigquery/table.rb', line 265

def time_partitioning_expiration= expiration
  reload! unless resource_full?
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.expiration_ms = expiration * 1000
  patch_gapi! :time_partitioning
end
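
The setter takes seconds and stores milliseconds. When an expiration is more naturally expressed in days, the value to pass can be worked out as in the sketch below. The helper names are hypothetical and are not part of the library.

```ruby
SECONDS_PER_DAY = 86_400

# Hypothetical helper: seconds to pass to the setter for a number of days.
def expiration_for_days days
  days * SECONDS_PER_DAY
end

# Mirrors the multiplication the setter performs before storing the value.
def to_expiration_ms seconds
  seconds * 1000
end

seconds = expiration_for_days 7
puts seconds                  # 604800
puts to_expiration_ms(seconds) # 604800000
```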

#time_partitioning_type ⇒ `String`^?

The period for which the table is partitioned, if any. See Partitioned Tables.

Returns:

(String, nil) —
The partition type (currently the only supported value is "DAY"), or nil if the table is not time-partitioned or the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 180

def time_partitioning_type
  return nil if reference?
  ensure_full_data!
  @gapi.time_partitioning.type if time_partitioning?
end

#time_partitioning_type=(type) ⇒ `Object`

Sets the partitioning for the table. See Partitioned Tables.

You can set partitioning only when creating a table, as in the example below. BigQuery does not allow you to change partitioning on an existing table.

If the table is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |table|
  table.time_partitioning_type = "DAY"
end

Parameters:

type (String) —
The partition type. Currently the only supported value is "DAY".

# File 'lib/google/cloud/bigquery/table.rb', line 212

def time_partitioning_type= type
  reload! unless resource_full?
  @gapi.time_partitioning ||=
      Google::Apis::BigqueryV2::TimePartitioning.new
  @gapi.time_partitioning.type = type
  patch_gapi! :time_partitioning
end

#view? ⇒ `Boolean`^?

Checks if the table's type is "VIEW", indicating that the table represents a BigQuery view. See Dataset#create_view.

Returns:

(Boolean, nil) —
true when the type is VIEW, false otherwise, if the object is a resource (see #resource?); nil if the object is a reference (see #reference?).

# File 'lib/google/cloud/bigquery/table.rb', line 532

def view?
  return nil if reference?
  @gapi.type == "VIEW"
end
