Class: Google::Cloud::Bigquery::LoadJob

Inherits:

Job

Object
Job
Google::Cloud::Bigquery::LoadJob

Defined in: lib/google/cloud/bigquery/load_job.rb

Overview

LoadJob

A Job subclass representing a load operation that may be performed on a Table. A LoadJob instance is created when you call Table#load_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

load_job.wait_until_done!
load_job.done? #=> true

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Updater

Attributes

#encryption ⇒ Google::Cloud::Bigquery::EncryptionConfiguration
The encryption configuration of the destination table.
#output_bytes ⇒ Integer
The number of bytes that have been loaded into the table.
#time_partitioning? ⇒ Boolean?
Checks if the destination table will be time-partitioned.
#time_partitioning_expiration ⇒ Integer?
The expiration for the destination table partitions, if any, in seconds.
#time_partitioning_field ⇒ String?
The field on which the destination table will be partitioned, if any.
#time_partitioning_require_filter? ⇒ Boolean
If set to true, queries over the destination table must specify a partition filter that can be used for partition elimination.
#time_partitioning_type ⇒ String?
The period for which the destination table will be partitioned, if any.

Instance Method Summary

#allow_jagged_rows? ⇒ Boolean
Checks if the load operation accepts rows that are missing trailing optional columns.
#autodetect? ⇒ Boolean
Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources.
#backup? ⇒ Boolean
Checks if the source data is a Google Cloud Datastore backup.
#csv? ⇒ Boolean
Checks if the format of the source data is CSV.
#delimiter ⇒ String
The delimiter used between fields in the source data.
#destination ⇒ Table
The table into which the operation loads data.
#ignore_unknown_values? ⇒ Boolean
Checks if the load operation allows extra values that are not represented in the table schema.
#input_file_bytes ⇒ Integer
The number of bytes of source data in the load job.
#input_files ⇒ Integer
The number of source data files in the load job.
#iso8859_1? ⇒ Boolean
Checks if the character encoding of the data is ISO-8859-1.
#json? ⇒ Boolean
Checks if the format of the source data is newline-delimited JSON.
#max_bad_records ⇒ Integer
The maximum number of bad records that the load operation can ignore.
#null_marker ⇒ String
Specifies a string that represents a null value in a CSV file.
#output_rows ⇒ Integer
The number of rows that have been loaded into the table.
#quote ⇒ String
The value that is used to quote data sections in a CSV file.
#quoted_newlines? ⇒ Boolean
Checks if quoted data sections may contain newline characters in a CSV file.
#schema ⇒ Schema?
The schema for the destination table.
#schema_update_options ⇒ Array<String>
Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration.
#skip_leading_rows ⇒ Integer
The number of rows at the top of a CSV file that BigQuery will skip when loading the data.
#sources ⇒ Object
The URI or URIs representing the Google Cloud Storage files from which the operation loads data.
#utf8? ⇒ Boolean
Checks if the character encoding of the data is UTF-8.

Methods inherited from Job

#cancel, #configuration, #created_at, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #pending?, #project_id, #reload!, #rerun!, #running?, #started_at, #state, #statistics, #status, #user_email, #wait_until_done!

Instance Method Details

#allow_jagged_rows? ⇒ `Boolean`

Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is false. Only applicable to CSV, ignored for other formats.

Returns:

(Boolean) —
true when jagged rows are allowed, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 237

def allow_jagged_rows?
  val = @gapi.configuration.load.allow_jagged_rows
  val = false if val.nil?
  val
end
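
The jagged-row behavior described above can be illustrated with a small, self-contained sketch. The helper `split_jagged_rows` is hypothetical and only mirrors the documented semantics (short rows padded with nulls when allowed, treated as bad records otherwise); it is not how BigQuery itself parses CSV.

```ruby
# Hypothetical helper illustrating the documented semantics only.
def split_jagged_rows(lines, column_count, allow_jagged_rows: false)
  loaded = []
  bad = []
  lines.each do |line|
    row = line.split(",")
    if row.size == column_count
      loaded << row
    elsif row.size < column_count && allow_jagged_rows
      # Missing trailing values are treated as nulls.
      loaded << row + [nil] * (column_count - row.size)
    else
      # Otherwise the short row counts as a bad record.
      bad << row
    end
  end
  { loaded: loaded, bad: bad }
end

lines = ["alice,30,paris", "bob,25"]
strict = split_jagged_rows lines, 3
lenient = split_jagged_rows lines, 3, allow_jagged_rows: true
```

In the strict case the second row lands in `bad`; in the lenient case it is loaded with a trailing `nil`.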

#autodetect? ⇒ `Boolean`

Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is false.

Returns:

(Boolean) —
true when autodetect is enabled, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 186

def autodetect?
  val = @gapi.configuration.load.autodetect
  val = false if val.nil?
  val
end

#backup? ⇒ `Boolean`

Checks if the source data is a Google Cloud Datastore backup.

Returns:

(Boolean) —
true when the source format is DATASTORE_BACKUP, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 222

def backup?
  val = @gapi.configuration.load.source_format
  val == "DATASTORE_BACKUP"
end

#csv? ⇒ `Boolean`

Checks if the format of the source data is CSV. The default is true.

Returns:

(Boolean) —
true when the source format is CSV, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 210

def csv?
  val = @gapi.configuration.load.source_format
  return true if val.nil?
  val == "CSV"
end
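
As a rough sketch of how the source-format predicates (#csv?, #json?, #backup?) relate to one another, the snippet below evaluates all three against a stand-in configuration object. The `config_class` Struct and the `source_format_flags` helper are assumptions made for illustration; they are not part of the library.

```ruby
# Stand-in for the load configuration; only source_format is modeled.
config_class = Struct.new(:source_format)

# Hypothetical helper combining the three predicates shown in this class.
def source_format_flags(config)
  val = config.source_format
  {
    csv: val.nil? || val == "CSV", # an unset format is reported as CSV
    json: val == "NEWLINE_DELIMITED_JSON",
    backup: val == "DATASTORE_BACKUP"
  }
end

unset_flags = source_format_flags config_class.new(nil)
json_flags = source_format_flags config_class.new("NEWLINE_DELIMITED_JSON")
```

Only #csv? has a nil fallback, so an unset source format yields `csv: true` and false for the other two.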

#delimiter ⇒ `String`

The delimiter used between fields in the source data. The default is a comma (,).

Returns:

(String) —
A string containing the character, such as ",".



# File 'lib/google/cloud/bigquery/load_job.rb', line 81

def delimiter
  @gapi.configuration.load.field_delimiter || ","
end

#destination ⇒ `Table`

The table into which the operation loads data. This is the table on which Table#load_job was invoked.

Returns:

(Table) —
A table instance.

# File 'lib/google/cloud/bigquery/load_job.rb', line 67

def destination
  table = @gapi.configuration.load.destination_table
  return nil unless table
  retrieve_table table.project_id,
                 table.dataset_id,
                 table.table_id
end

#encryption ⇒ `Google::Cloud::Bigquery::EncryptionConfiguration`

The encryption configuration of the destination table.

Returns:

(Google::Cloud::Bigquery::EncryptionConfiguration) —
Custom encryption configuration (e.g., Cloud KMS keys).

# File 'lib/google/cloud/bigquery/load_job.rb', line 334

def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.load.destination_encryption_configuration
  )
end

#ignore_unknown_values? ⇒ `Boolean`

Checks if the load operation allows extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an "invalid" error is returned. The default is false.

Returns:

(Boolean) —
true when unknown values are ignored, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 253

def ignore_unknown_values?
  val = @gapi.configuration.load.ignore_unknown_values
  val = false if val.nil?
  val
end

#input_file_bytes ⇒ `Integer`

The number of bytes of source data in the load job.

Returns:

(Integer) —
The number of bytes.

# File 'lib/google/cloud/bigquery/load_job.rb', line 309

def input_file_bytes
  Integer @gapi.statistics.load.input_file_bytes
rescue StandardError
  nil
end
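
The statistics readers (#input_file_bytes, #input_files, #output_bytes, #output_rows) share one pattern: convert with `Integer` and fall back to nil on any StandardError. A minimal sketch of that pattern follows, using stand-in Structs; the names `stats_class`, `load_stats_class`, and `input_file_bytes_of` are hypothetical, and the value is assumed here to arrive as a numeric string.

```ruby
# Stand-ins for the job statistics objects (assumed shapes, for illustration).
load_stats_class = Struct.new(:input_file_bytes)
stats_class = Struct.new(:load)

# Same conversion-with-rescue pattern as the method above.
def input_file_bytes_of(statistics)
  Integer statistics.load.input_file_bytes
rescue StandardError
  nil
end

with_stats = input_file_bytes_of stats_class.new(load_stats_class.new("2048"))
# With no load statistics, nil.input_file_bytes raises NoMethodError,
# which is rescued and reported as nil.
without_stats = input_file_bytes_of stats_class.new(nil)
```

Callers should therefore be prepared for nil even though the documented return type is Integer.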

#input_files ⇒ `Integer`

The number of source data files in the load job.

Returns:

(Integer) —
The number of source files.

# File 'lib/google/cloud/bigquery/load_job.rb', line 298

def input_files
  Integer @gapi.statistics.load.input_files
rescue StandardError
  nil
end

#iso8859_1? ⇒ `Boolean`

Checks if the character encoding of the data is ISO-8859-1.

Returns:

(Boolean) —
true when the character encoding is ISO-8859-1, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 116

def iso8859_1?
  val = @gapi.configuration.load.encoding
  val == "ISO-8859-1"
end

#json? ⇒ `Boolean`

Checks if the format of the source data is newline-delimited JSON. The default is false.

Returns:

(Boolean) —
true when the source format is NEWLINE_DELIMITED_JSON, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 199

def json?
  val = @gapi.configuration.load.source_format
  val == "NEWLINE_DELIMITED_JSON"
end

#max_bad_records ⇒ `Integer`

The maximum number of bad records that the load operation can ignore. If the number of bad records exceeds this value, an error is returned. The default value is 0, which requires that all records be valid.

Returns:

(Integer) —
The maximum number of bad records.

# File 'lib/google/cloud/bigquery/load_job.rb', line 143

def max_bad_records
  val = @gapi.configuration.load.max_bad_records
  val = 0 if val.nil?
  val
end

#null_marker ⇒ `String`

Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for any data type other than STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

Returns:

(String) —
A string representing null value in a CSV file.

# File 'lib/google/cloud/bigquery/load_job.rb', line 160

def null_marker
  val = @gapi.configuration.load.null_marker
  val = "" if val.nil?
  val
end
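
The null-marker rules above can be sketched as a small decision function. `interpret_field` is a hypothetical helper that only restates the documented rules; it is not BigQuery's parser, and the error class is chosen arbitrarily for the illustration.

```ruby
# Hypothetical helper restating the documented null-marker rules.
def interpret_field(raw, type, null_marker: "")
  # A field equal to the marker is read as null.
  return nil if raw == null_marker
  # With a custom marker, an empty string is only valid for STRING and BYTE.
  if raw.empty? && !%w[STRING BYTE].include?(type)
    raise ArgumentError, "empty value for #{type} column"
  end
  raw
end

default_null = interpret_field "", "INTEGER"                     # default marker
custom_null = interpret_field "\\N", "INTEGER", null_marker: "\\N"
empty_string = interpret_field "", "STRING", null_marker: "\\N"
```

With a custom marker, calling `interpret_field "", "INTEGER", null_marker: "\\N"` raises in this sketch, matching the documented error for non-string columns.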

#output_bytes ⇒ `Integer`

The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.

Returns:

(Integer) —
The number of bytes that have been loaded.

# File 'lib/google/cloud/bigquery/load_job.rb', line 346

def output_bytes
  Integer @gapi.statistics.load.output_bytes
rescue StandardError
  nil
end

#output_rows ⇒ `Integer`

The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.

Returns:

(Integer) —
The number of rows that have been loaded.

# File 'lib/google/cloud/bigquery/load_job.rb', line 321

def output_rows
  Integer @gapi.statistics.load.output_rows
rescue StandardError
  nil
end

#quote ⇒ `String`

The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, #quoted_newlines? should return true.

Returns:

(String) —
A string containing the character, such as "\"".

# File 'lib/google/cloud/bigquery/load_job.rb', line 130

def quote
  val = @gapi.configuration.load.quote
  val = "\"" if val.nil?
  val
end

#quoted_newlines? ⇒ `Boolean`

Checks if quoted data sections may contain newline characters in a CSV file. The default is false.

Returns:

(Boolean) —
true when quoted newlines are allowed, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 173

def quoted_newlines?
  val = @gapi.configuration.load.allow_quoted_newlines
  val = false if val.nil?
  val
end

#schema ⇒ `Schema?`

The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

The returned object is frozen and changes are not allowed. Use Table#schema to update the schema.

Returns:

(Schema, nil) —
A schema object, or nil.



# File 'lib/google/cloud/bigquery/load_job.rb', line 269

def schema
  Schema.from_gapi(@gapi.configuration.load.schema).freeze
end

#schema_update_options ⇒ `Array<String>`

Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when write disposition is WRITE_APPEND; when write disposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema. One or more of the following values are specified:

ALLOW_FIELD_ADDITION: allow adding a nullable field to the schema.
ALLOW_FIELD_RELAXATION: allow relaxing a required field in the original schema to nullable.

Returns:

(Array<String>) —
An array of strings.



# File 'lib/google/cloud/bigquery/load_job.rb', line 289

def schema_update_options
  Array @gapi.configuration.load.schema_update_options
end
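
The two cases in which schema update options are supported can be written down as a simple check. `schema_update_supported?` is a hypothetical helper that encodes only the rule stated above; the `partition_decorator` flag is an assumed way of saying that the destination is a table partition.

```ruby
# Hypothetical check mirroring the two supported cases described above.
def schema_update_supported?(write_disposition, partition_decorator: false)
  case write_disposition
  when "WRITE_APPEND" then true
  when "WRITE_TRUNCATE" then partition_decorator
  else false
  end
end

append = schema_update_supported? "WRITE_APPEND"
truncate_table = schema_update_supported? "WRITE_TRUNCATE"
truncate_partition = schema_update_supported? "WRITE_TRUNCATE", partition_decorator: true
```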

#skip_leading_rows ⇒ `Integer`

The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

Returns:

(Integer) —
The number of header rows at the top of a CSV file to skip.



# File 'lib/google/cloud/bigquery/load_job.rb', line 93

def skip_leading_rows
  @gapi.configuration.load.skip_leading_rows || 0
end

#sources ⇒ `Object`

The URI or URIs representing the Google Cloud Storage files from which the operation loads data.



# File 'lib/google/cloud/bigquery/load_job.rb', line 57

def sources
  Array @gapi.configuration.load.source_uris
end

#time_partitioning? ⇒ `Boolean?`

Checks if the destination table will be time-partitioned. See Partitioned Tables.

Returns:

(Boolean, nil) —
true when the table will be time-partitioned, or false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 361

def time_partitioning?
  !@gapi.configuration.load.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ `Integer?`

The expiration for the destination table partitions, if any, in seconds. See Partitioned Tables.

Returns:

(Integer, nil) —
The expiration time, in seconds, for data in partitions, or nil if not present.

# File 'lib/google/cloud/bigquery/load_job.rb', line 404

def time_partitioning_expiration
  @gapi.configuration.load.time_partitioning.expiration_ms / 1_000 if
      time_partitioning? &&
      !@gapi.configuration.load.time_partitioning.expiration_ms.nil?
end
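
The conversion above is an integer division of the stored millisecond value by 1,000. A minimal sketch with a stand-in object follows; `partitioning_class` and `expiration_seconds` are hypothetical names used only for this illustration.

```ruby
# Stand-in for the time partitioning configuration (assumed shape).
partitioning_class = Struct.new(:expiration_ms)

# Hypothetical helper: same arithmetic as the method above.
def expiration_seconds(time_partitioning)
  return nil if time_partitioning.nil? || time_partitioning.expiration_ms.nil?
  time_partitioning.expiration_ms / 1_000
end

one_day = expiration_seconds partitioning_class.new(86_400_000)
truncated = expiration_seconds partitioning_class.new(1_500)
not_partitioned = expiration_seconds nil
```

Because the division is on integers, sub-second remainders are dropped: 1,500 ms is reported as 1 second.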

#time_partitioning_field ⇒ `String?`

The field on which the destination table will be partitioned, if any. If not set, the destination table will be partitioned by pseudo column _PARTITIONTIME; if set, the table will be partitioned by this field. See Partitioned Tables.

Returns:

(String, nil) —
The partition field, if a field was configured. nil if not partitioned or not set (partitioned by pseudo column '_PARTITIONTIME').



# File 'lib/google/cloud/bigquery/load_job.rb', line 390

def time_partitioning_field
  @gapi.configuration.load.time_partitioning.field if time_partitioning?
end
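
Since a nil result can mean either "not partitioned" or "partitioned by _PARTITIONTIME", callers may want to resolve the effective partition column themselves. The sketch below shows one way to do that with a stand-in object; `partitioning_class` and `effective_partition_column` are hypothetical and not part of the library.

```ruby
# Stand-in for the time partitioning configuration (assumed shape).
partitioning_class = Struct.new(:field)

# Hypothetical helper: resolves the column a partitioned table is split on.
def effective_partition_column(time_partitioning)
  return nil if time_partitioning.nil? # table is not time-partitioned
  time_partitioning.field || "_PARTITIONTIME"
end

by_field = effective_partition_column partitioning_class.new("created_at")
by_pseudo_column = effective_partition_column partitioning_class.new(nil)
unpartitioned = effective_partition_column nil
```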

#time_partitioning_require_filter? ⇒ `Boolean`

If set to true, queries over the destination table must specify a partition filter that can be used for partition elimination. See Partitioned Tables.

Returns:

(Boolean) —
true when a partition filter will be required, or false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 421

def time_partitioning_require_filter?
  tp = @gapi.configuration.load.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end

#time_partitioning_type ⇒ `String?`

The period for which the destination table will be partitioned, if any. See Partitioned Tables.

Returns:

(String, nil) —
The partition type, or nil if not present. Currently the only supported value is "DAY".



# File 'lib/google/cloud/bigquery/load_job.rb', line 374

def time_partitioning_type
  @gapi.configuration.load.time_partitioning.type if time_partitioning?
end

#utf8? ⇒ `Boolean`

Checks if the character encoding of the data is UTF-8. This is the default.

Returns:

(Boolean) —
true when the character encoding is UTF-8, false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 104

def utf8?
  val = @gapi.configuration.load.encoding
  return true if val.nil?
  val == "UTF-8"
end
