Class: Google::Cloud::Bigquery::Dataset

Inherits: Object

Defined in:
  lib/google/cloud/bigquery/dataset.rb
  lib/google/cloud/bigquery/dataset/list.rb
  lib/google/cloud/bigquery/dataset/access.rb
Overview

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
Direct Known Subclasses: Updater
Defined Under Namespace
Classes: Access, List, Updater
Attributes

- #access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access
  Retrieves the access rules for a Dataset.
- #api_url ⇒ String
  A URL that can be used to access the dataset using the REST API.
- #created_at ⇒ Time?
  The time when this dataset was created.
- #dataset_id ⇒ String
  A unique ID for this dataset, without the project name.
- #default_expiration ⇒ Integer
  The default lifetime of all tables in the dataset, in milliseconds.
- #default_expiration=(new_default_expiration) ⇒ Object
  Updates the default lifetime of all tables in the dataset, in milliseconds.
- #description ⇒ String
  A user-friendly description of the dataset.
- #description=(new_description) ⇒ Object
  Updates the user-friendly description of the dataset.
- #etag ⇒ String
  The ETag hash of the dataset.
- #labels ⇒ Hash<String, String>
  A hash of user-provided labels associated with this dataset.
- #labels=(labels) ⇒ Object
  Updates the hash of user-provided labels associated with this dataset.
- #location ⇒ String
  The geographic location where the dataset should reside.
- #modified_at ⇒ Time?
  The date when this dataset or any of its tables was last modified.
- #name ⇒ String
  A descriptive name for the dataset.
- #name=(new_name) ⇒ Object
  Updates the descriptive name for the dataset.
- #project_id ⇒ String
  The ID of the project containing this dataset.
Lifecycle

- #delete(force: nil) ⇒ Boolean
  Permanently deletes the dataset.
Table

- #create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table
  Creates a new table.
- #create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::View
  Creates a new view table from the given query.
- #table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...
  Retrieves an existing table by ID.
- #tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>
  Retrieves the list of tables belonging to the dataset.
Data

- #external(url, format: nil) {|ext| ... } ⇒ External::DataSource
  Creates a new External::DataSource (or subclass) object that represents an external data source that can be queried directly, even though the data is not stored in BigQuery.
- #insert(table_id, rows, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) ⇒ Google::Cloud::Bigquery::InsertResponse
  Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
- #insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ Table::AsyncInserter
  Creates an asynchronous inserter object used to insert rows in batches.
- #load(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Boolean
  Loads data into the provided destination table using a synchronous method that blocks for a response.
- #load_job(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Google::Cloud::Bigquery::LoadJob
  Loads data into the provided destination table using an asynchronous method.
- #query(query, params: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::Data
  Queries data using a synchronous method that blocks for a response.
- #query_job(query, params: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::QueryJob
  Queries data using the asynchronous method.
Instance Method Details
#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access
Retrieves the access rules for a Dataset. The rules can be updated when passing a block; see Access for all the methods available.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 333

  def access
    ensure_full_data!
    access_builder = Access.from_gapi @gapi
    if block_given?
      yield access_builder
      if access_builder.changed?
        @gapi.update! access: access_builder.to_gapi
        patch_gapi! :access
      end
    end
    access_builder.freeze
  end
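A minimal usage sketch; the dataset name and group email are placeholders:

  require "google/cloud/bigquery"

  bigquery = Google::Cloud::Bigquery.new
  dataset = bigquery.dataset "my_dataset"

  # Without a block, the returned Access object is frozen (read-only).
  dataset.access.frozen? #=> true

  # With a block, any changes made to the rules are saved on return.
  dataset.access do |acl|
    acl.add_owner_group "owners@example.com"
  end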
#api_url ⇒ String
A URL that can be used to access the dataset using the REST API.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 136

  def api_url
    ensure_full_data!
    @gapi.self_link
  end
#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table
Creates a new table. If you are adapting existing code that was written for the REST API, you can pass the table's schema as a hash (see example).
  # File 'lib/google/cloud/bigquery/dataset.rb', line 440

  def create_table table_id, name: nil, description: nil
    ensure_service!
    new_tb = Google::Apis::BigqueryV2::Table.new(
      table_reference: Google::Apis::BigqueryV2::TableReference.new(
        project_id: project_id, dataset_id: dataset_id,
        table_id: table_id))
    updater = Table::Updater.new(new_tb).tap do |tb|
      tb.name = name unless name.nil?
      tb.description = description unless description.nil?
    end
    yield updater if block_given?
    gapi = service.insert_table dataset_id, updater.to_gapi
    Table.from_gapi gapi, service
  end
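A usage sketch that defines the schema in the block; the table and field names are placeholders:

  table = dataset.create_table "my_table" do |t|
    t.name = "My Table"
    t.description = "A description of my table."
    t.schema do |schema|
      schema.string "first_name", mode: :required
      schema.record "cities_lived", mode: :repeated do |nested|
        nested.string "place", mode: :required
        nested.integer "number_of_years", mode: :required
      end
    end
  end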
#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::View
Creates a new view table from the given query.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 506

  def create_view table_id, query, name: nil, description: nil,
                  standard_sql: nil, legacy_sql: nil, udfs: nil
    new_view_opts = {
      table_reference: Google::Apis::BigqueryV2::TableReference.new(
        project_id: project_id, dataset_id: dataset_id,
        table_id: table_id
      ),
      friendly_name: name,
      description: description,
      view: Google::Apis::BigqueryV2::ViewDefinition.new(
        query: query,
        use_legacy_sql: Convert.resolve_legacy_sql(standard_sql,
                                                   legacy_sql),
        user_defined_function_resources: udfs_gapi(udfs)
      )
    }.delete_if { |_, v| v.nil? }
    new_view = Google::Apis::BigqueryV2::Table.new new_view_opts
    gapi = service.insert_table dataset_id, new_view
    Table.from_gapi gapi, service
  end
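A usage sketch; the view name, project, dataset, and table names are placeholders:

  view = dataset.create_view "my_view",
                             "SELECT name, age FROM `my_project.my_dataset.my_table`",
                             name: "My View"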
#created_at ⇒ Time?
The time when this dataset was created.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 202

  def created_at
    ensure_full_data!
    begin
      ::Time.at(Integer(@gapi.creation_time) / 1000.0)
    rescue
      nil
    end
  end
#dataset_id ⇒ String
A unique ID for this dataset, without the project name.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 69

  def dataset_id
    @gapi.dataset_reference.dataset_id
  end
#default_expiration ⇒ Integer
The default lifetime of all tables in the dataset, in milliseconds.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 172

  def default_expiration
    ensure_full_data!
    begin
      Integer @gapi.default_table_expiration_ms
    rescue
      nil
    end
  end
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 190

  def default_expiration= new_default_expiration
    @gapi.update! default_table_expiration_ms: new_default_expiration
    patch_gapi! :default_table_expiration_ms
  end
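For example, since the value is in milliseconds, a one-hour default expiration looks like this:

  dataset.default_expiration = 60 * 60 * 1000 # one hour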
#delete(force: nil) ⇒ Boolean
Permanently deletes the dataset. The dataset must be empty before it
can be deleted unless the force
option is set to true
.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 366

  def delete force: nil
    ensure_service!
    service.delete_dataset dataset_id, force
    true
  end
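A usage sketch; passing force: true also deletes any tables the dataset contains:

  dataset.delete force: true #=> true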
#description ⇒ String
A user-friendly description of the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 148

  def description
    ensure_full_data!
    @gapi.description
  end
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 160

  def description= new_description
    @gapi.update! description: new_description
    patch_gapi! :description
  end
#etag ⇒ String
The ETag hash of the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 124

  def etag
    ensure_full_data!
    @gapi.etag
  end
#external(url, format: nil) {|ext| ... } ⇒ External::DataSource
Creates a new External::DataSource (or subclass) object that represents an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 1057

  def external url, format: nil
    ext = External.from_urls url, format
    yield ext if block_given?
    ext
  end
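A usage sketch querying a CSV file in Cloud Storage; the bucket, object path, and table alias are placeholders:

  csv_url = "gs://bucket/path/to/data.csv"
  csv_table = dataset.external csv_url do |csv|
    csv.autodetect = true    # infer the schema from the data
    csv.skip_leading_rows = 1
  end

  data = dataset.query "SELECT * FROM my_ext_table",
                       external: { my_ext_table: csv_table }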
#insert(table_id, rows, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) ⇒ Google::Cloud::Bigquery::InsertResponse
Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 1572

  def insert table_id, rows, skip_invalid: nil, ignore_unknown: nil,
             autocreate: nil
    if autocreate
      begin
        insert_data table_id, rows, skip_invalid: skip_invalid,
                                    ignore_unknown: ignore_unknown
      rescue Google::Cloud::NotFoundError
        sleep rand(1..60)
        begin
          create_table table_id do |tbl_updater|
            yield tbl_updater if block_given?
          end
        # rubocop:disable Lint/HandleExceptions
        rescue Google::Cloud::AlreadyExistsError
        end
        # rubocop:enable Lint/HandleExceptions
        sleep 60
        insert table_id, rows, skip_invalid: skip_invalid,
                               ignore_unknown: ignore_unknown,
                               autocreate: true
      end
    else
      insert_data table_id, rows, skip_invalid: skip_invalid,
                                  ignore_unknown: ignore_unknown
    end
  end
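A usage sketch; the row hashes must match the table's schema, and the table and field names are placeholders:

  rows = [
    { "first_name" => "Alice", "age" => 21 },
    { "first_name" => "Bob",   "age" => 22 }
  ]
  dataset.insert "my_table", rows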
#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10000000, max_rows: 500, interval: 10, threads: 4) {|response| ... } ⇒ Table::AsyncInserter
Creates an asynchronous inserter object used to insert rows in batches.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 1644

  def insert_async table_id, skip_invalid: nil, ignore_unknown: nil,
                   max_bytes: 10000000, max_rows: 500, interval: 10,
                   threads: 4, &block
    ensure_service!
    # Get table, don't use Dataset#table which handles NotFoundError
    gapi = service.get_table dataset_id, table_id
    table = Table.from_gapi gapi, service
    # Get the AsyncInserter from the table
    table.insert_async skip_invalid: skip_invalid,
                       ignore_unknown: ignore_unknown,
                       max_bytes: max_bytes, max_rows: max_rows,
                       interval: interval, threads: threads, &block
  end
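A usage sketch; the block runs once per batched insert, log_insert_errors is a hypothetical logging helper, and stop.wait! flushes pending rows before exit:

  inserter = dataset.insert_async "my_table" do |response|
    # Inspect each batch's InsertResponse for per-row errors.
    log_insert_errors response if response.error_count > 0
  end

  inserter.insert({ "first_name" => "Alice", "age" => 21 })
  inserter.stop.wait!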
#labels ⇒ Hash<String, String>
A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 261

  def labels
    m = @gapi.labels
    m = m.to_h if m.respond_to? :to_h
    m.dup.freeze
  end
#labels=(labels) ⇒ Object
Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 292

  def labels= labels
    @gapi.labels = labels
    patch_gapi! :labels
  end
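For example, replacing the entire label hash (keys and values here are placeholders):

  dataset.labels = { "env" => "production", "team" => "data" }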
#load(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Boolean
Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.
For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 1477

  def load table_id, file, format: nil, create: nil, write: nil,
           projection_fields: nil, jagged_rows: nil,
           quoted_newlines: nil, encoding: nil, delimiter: nil,
           ignore_unknown: nil, max_bad_records: nil, quote: nil,
           skip_leading: nil, schema: nil, autodetect: nil,
           null_marker: nil
    yield (schema ||= Schema.from_gapi) if block_given?
    options = { format: format, create: create, write: write,
                projection_fields: projection_fields,
                jagged_rows: jagged_rows,
                quoted_newlines: quoted_newlines, encoding: encoding,
                delimiter: delimiter, ignore_unknown: ignore_unknown,
                max_bad_records: max_bad_records, quote: quote,
                skip_leading: skip_leading, schema: schema,
                autodetect: autodetect, null_marker: null_marker }
    job = load_job table_id, file, options

    job.wait_until_done!

    if job.failed?
      begin
        # raise to activate ruby exception cause handling
        fail job.gapi_error
      rescue => e
        # wrap Google::Apis::Error with Google::Cloud::Error
        raise Google::Cloud::Error.from_error(e)
      end
    end

    true
  end
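A usage sketch loading from a Cloud Storage URL and defining the destination schema in the block; the bucket, table, and field names are placeholders:

  gs_url = "gs://my-bucket/file-name.csv"
  dataset.load "my_new_table", gs_url do |schema|
    schema.string "first_name", mode: :required
    schema.record "cities_lived", mode: :repeated do |nested|
      nested.string "place", mode: :required
      nested.integer "number_of_years", mode: :required
    end
  end #=> true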
#load_job(table_id, file, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, dryrun: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil) {|schema| ... } ⇒ Google::Cloud::Bigquery::LoadJob
Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.
For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 1267

  def load_job table_id, file, format: nil, create: nil, write: nil,
               projection_fields: nil, jagged_rows: nil,
               quoted_newlines: nil, encoding: nil, delimiter: nil,
               ignore_unknown: nil, max_bad_records: nil, quote: nil,
               skip_leading: nil, dryrun: nil, schema: nil,
               job_id: nil, prefix: nil, labels: nil, autodetect: nil,
               null_marker: nil
    ensure_service!
    if block_given?
      schema ||= Schema.from_gapi
      yield schema
    end
    schema_gapi = schema.to_gapi if schema
    options = { format: format, create: create, write: write,
                projection_fields: projection_fields,
                jagged_rows: jagged_rows,
                quoted_newlines: quoted_newlines, encoding: encoding,
                delimiter: delimiter, ignore_unknown: ignore_unknown,
                max_bad_records: max_bad_records, quote: quote,
                skip_leading: skip_leading, dryrun: dryrun,
                schema: schema_gapi, job_id: job_id, prefix: prefix,
                labels: labels, autodetect: autodetect,
                null_marker: null_marker }
    return load_storage(table_id, file, options) if storage_url? file
    return load_local(table_id, file, options) if local_file? file
    fail Google::Cloud::Error, "Don't know how to load #{file}"
  end
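A usage sketch; the returned LoadJob can be polled, or blocked on as shown here (names are placeholders):

  gs_url = "gs://my-bucket/file-name.csv"
  load_job = dataset.load_job "my_new_table", gs_url, autodetect: true
  load_job.wait_until_done!
  load_job.failed? #=> false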
#location ⇒ String
The geographic location where the dataset should reside. Possible
values include EU
and US
. The default value is US
.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 235

  def location
    ensure_full_data!
    @gapi.location
  end
#modified_at ⇒ Time?
The date when this dataset or any of its tables was last modified.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 218

  def modified_at
    ensure_full_data!
    begin
      ::Time.at(Integer(@gapi.last_modified_time) / 1000.0)
    rescue
      nil
    end
  end
#name ⇒ String
A descriptive name for the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 101

  def name
    @gapi.friendly_name
  end
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 112

  def name= new_name
    @gapi.update! friendly_name: new_name
    patch_gapi! :friendly_name
  end
#project_id ⇒ String
The ID of the project containing this dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 80

  def project_id
    @gapi.dataset_reference.project_id
  end
#query(query, params: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil) ⇒ Google::Cloud::Bigquery::Data
Queries data using a synchronous method that blocks for a response. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

| BigQuery  | Ruby                           | Notes                                        |
|-----------|--------------------------------|----------------------------------------------|
| BOOL      | true/false                     |                                              |
| INT64     | Integer                        |                                              |
| FLOAT64   | Float                          |                                              |
| STRING    | String                         |                                              |
| DATETIME  | DateTime                       | DATETIME does not support time zone.         |
| DATE      | Date                           |                                              |
| TIMESTAMP | Time                           |                                              |
| TIME      | Google::Cloud::BigQuery::Time  |                                              |
| BYTES     | File, IO, StringIO, or similar |                                              |
| ARRAY     | Array                          | Nested arrays, nil values are not supported. |
| STRUCT    | Hash                           | Hash keys may be strings or symbols.         |
See Data Types for an overview of each BigQuery data type, including allowed values.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 988

  def query query, params: nil, external: nil, max: nil, cache: true,
            standard_sql: nil, legacy_sql: nil
    ensure_service!
    options = { params: params, external: external, cache: cache,
                legacy_sql: legacy_sql, standard_sql: standard_sql }

    job = query_job query, options
    job.wait_until_done!

    if job.failed?
      begin
        # raise to activate ruby exception cause handling
        fail job.gapi_error
      rescue => e
        # wrap Google::Apis::Error with Google::Cloud::Error
        raise Google::Cloud::Error.from_error(e)
      end
    end

    job.data max: max
  end
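A usage sketch with a positional parameter; because this dataset is the query's default dataset, the table name can be unqualified (names are placeholders):

  data = dataset.query "SELECT name FROM my_table WHERE age > ?",
                       params: [21]
  data.each do |row|
    puts row[:name]
  end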
#query_job(query, params: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::QueryJob
Queries data using the asynchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
When using standard SQL and passing arguments using params, Ruby types are mapped to BigQuery types as follows:

| BigQuery  | Ruby                           | Notes                                        |
|-----------|--------------------------------|----------------------------------------------|
| BOOL      | true/false                     |                                              |
| INT64     | Integer                        |                                              |
| FLOAT64   | Float                          |                                              |
| STRING    | String                         |                                              |
| DATETIME  | DateTime                       | DATETIME does not support time zone.         |
| DATE      | Date                           |                                              |
| TIMESTAMP | Time                           |                                              |
| TIME      | Google::Cloud::BigQuery::Time  |                                              |
| BYTES     | File, IO, StringIO, or similar |                                              |
| ARRAY     | Array                          | Nested arrays, nil values are not supported. |
| STRUCT    | Hash                           | Hash keys may be strings or symbols.         |
See Data Types for an overview of each BigQuery data type, including allowed values.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 817

  def query_job query, params: nil, external: nil,
                priority: "INTERACTIVE", cache: true, table: nil,
                create: nil, write: nil, standard_sql: nil,
                legacy_sql: nil, large_results: nil, flatten: nil,
                maximum_billing_tier: nil, maximum_bytes_billed: nil,
                job_id: nil, prefix: nil, labels: nil, udfs: nil
    options = { priority: priority, cache: cache, table: table,
                create: create, write: write,
                large_results: large_results, flatten: flatten,
                legacy_sql: legacy_sql, standard_sql: standard_sql,
                maximum_billing_tier: maximum_billing_tier,
                maximum_bytes_billed: maximum_bytes_billed,
                params: params, external: external, labels: labels,
                job_id: job_id, prefix: prefix, udfs: udfs }
    options[:dataset] ||= self
    ensure_service!
    gapi = service.query_job query, options
    Job.from_gapi gapi, service
  end
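A usage sketch that starts the job and then blocks until it completes; the table name is a placeholder:

  job = dataset.query_job "SELECT name FROM my_table"

  job.wait_until_done!
  job.data.each { |row| puts row[:name] } unless job.failed?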
#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...
Retrieves an existing table by ID.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 547

  def table table_id
    ensure_service!
    gapi = service.get_table dataset_id, table_id
    Table.from_gapi gapi, service
  rescue Google::Cloud::NotFoundError
    nil
  end
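A usage sketch; nil is returned when the table does not exist (the table name is a placeholder):

  table = dataset.table "my_table"
  puts table.table_id unless table.nil?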
#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>
Retrieves the list of tables belonging to the dataset.
  # File 'lib/google/cloud/bigquery/dataset.rb', line 590

  def tables token: nil, max: nil
    ensure_service!
    options = { token: token, max: max }
    gapi = service.list_tables dataset_id, options
    Table::List.from_gapi gapi, service, dataset_id, max
  end
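A usage sketch; the returned Table::List handles pagination itself when iterated with all:

  dataset.tables.all do |table|
    puts table.table_id
  end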