Class: Google::Cloud::Language::Document

Inherits:
Object
Defined in:
lib/google/cloud/language/document.rb

Overview

Document

Represents a document for the Language service.

The Cloud Natural Language API supports UTF-8, UTF-16, and UTF-32 encodings. (Ruby uses UTF-8 natively, which is the default sent to the API, so unless you're working with text processed on a different platform, you should not need to set the encoding type.)
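The encoding matters because the same position in a string has a different numeric offset depending on the unit being counted. The following is a minimal, library-free sketch of that effect; `offsets_for` is a hypothetical helper written for illustration and is not part of google-cloud-language.

```ruby
# Hypothetical helper (not part of google-cloud-language): reports where
# `needle` starts in `text`, counted in the unit each encoding uses.
def offsets_for(text, needle)
  char_index = text.index(needle)
  return nil if char_index.nil?

  # Everything before the match, re-encoded and measured per encoding.
  prefix = text[0, char_index]
  {
    utf8:  prefix.encode("UTF-8").bytesize,        # bytes
    utf16: prefix.encode("UTF-16LE").bytesize / 2, # 16-bit code units
    utf32: prefix.encode("UTF-32LE").bytesize / 4  # code points
  }
end

# An emoji and an accented letter precede the word being located.
offsets = offsets_for("\u{1F600} caf\u00E9 Vader", "Vader")
offsets[:utf8]  #=> 11
offsets[:utf16] #=> 8
offsets[:utf32] #=> 7
```

For plain ASCII text all three values are identical, which is why the default is usually sufficient.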

Be aware that only English, Spanish, and Japanese content is supported, and that sentiment analysis supports only English text.

See Project#document.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.entities.count #=> 2
annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Instance Method Summary

Instance Method Details

#annotate(sentiment: false, entities: false, syntax: false, encoding: nil) ⇒ Annotation

Also known as: mark, detect

Analyzes the document and returns sentiment, entity, and syntactic feature results, depending on the option flags. Calling annotate with no arguments will perform all analysis features. Each feature is priced separately. See Pricing for details.
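The "no flags means all features" rule can be sketched in isolation. This is an illustration of the documented behavior only, not the library's actual implementation; `resolve_features` is a hypothetical name.

```ruby
# Hypothetical helper: mirrors the documented rule that leaving every
# feature flag false requests all features.
def resolve_features(sentiment: false, entities: false, syntax: false)
  flags = { sentiment: sentiment, entities: entities, syntax: syntax }
  # No flag set: turn every feature on.
  return flags.transform_values { true } if flags.values.none?
  flags
end

resolve_features
#=> { sentiment: true, entities: true, syntax: true }
resolve_features entities: true
#=> { sentiment: false, entities: true, syntax: false }
```

Because each feature is priced separately, passing only the flags you need avoids paying for unused analysis.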

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.sentiment.polarity #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

With feature flags:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate entities: true, syntax: true

annotation.sentiment #=> nil
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Parameters:

  • sentiment (Boolean)

    Whether to perform sentiment analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • entities (Boolean)

    Whether to perform the entity analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • syntax (Boolean)

    Whether to perform syntactic analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 219

def annotate sentiment: false, entities: false, syntax: false,
             encoding: nil
  ensure_service!
  grpc = service.annotate to_grpc, sentiment: sentiment,
                                   entities: entities,
                                   syntax: syntax,
                                   encoding: encoding
  Annotation.from_grpc grpc
end

#entities(encoding: nil) ⇒ Annotation::Entities

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities.
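Once entities are returned, a common next step is to organize them by type. The sketch below uses a stand-in `SketchEntity` struct rather than the real result objects, and assumes only that each entity responds to `name` and `type` and that the collection is enumerable; both `SketchEntity` and `names_by_type` are hypothetical.

```ruby
# Stand-in for an entity result; only `name` and `type` are assumed.
SketchEntity = Struct.new(:name, :type)

# Hypothetical helper: maps each entity type to the names found for it.
def names_by_type(entities)
  entities.group_by(&:type).transform_values { |group| group.map(&:name) }
end

entities = [
  SketchEntity.new("Darth Vader", :PERSON),
  SketchEntity.new("Star Wars", :WORK_OF_ART)
]

names_by_type(entities)
#=> { PERSON: ["Darth Vader"], WORK_OF_ART: ["Star Wars"] }
```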

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
entities = document.entities # API call

entities.count #=> 2
entities.first.name #=> "Darth Vader"
entities.first.type #=> :PERSON
entities.last.name #=> "Star Wars"
entities.last.type #=> :WORK_OF_ART

Parameters:

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Entities)



# File 'lib/google/cloud/language/document.rb', line 282

def entities encoding: nil
  ensure_service!
  grpc = service.entities to_grpc, encoding: encoding
  Annotation::Entities.from_grpc grpc
end

#format ⇒ Symbol

The document's format.

Returns:

  • (Symbol)

    :text or :html



# File 'lib/google/cloud/language/document.rb', line 91

def format
  return :text if text?
  return :html if html?
end

#format=(new_format) ⇒ Object

Sets the document's format.

Examples:

document = language.document "<p>The Old Man and the Sea</p>"
document.format = :html

Parameters:

  • new_format (Symbol, String)

    Accepted values are :text or :html.



# File 'lib/google/cloud/language/document.rb', line 106

def format= new_format
  @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
  @grpc.type = :HTML       if new_format.to_s == "html"
  @grpc.type
end
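One consequence of the setter above is that a value other than "text" or "html" is silently ignored and the previous format is kept. The sketch below reproduces the reader and setter logic against a stand-in object so the behavior can be tried without an API connection; `SketchGrpc` and `FormatSketch` are hypothetical names, not library classes.

```ruby
# Stand-in for the underlying gRPC document; only `type` is modeled.
SketchGrpc = Struct.new(:type)

# Hypothetical class reproducing the format logic shown in this page.
class FormatSketch
  def initialize(type = :PLAIN_TEXT)
    @grpc = SketchGrpc.new(type)
  end

  def text?
    @grpc.type == :PLAIN_TEXT
  end

  def html?
    @grpc.type == :HTML
  end

  # Returns nil when the type is neither plain text nor HTML.
  def format
    return :text if text?
    return :html if html?
  end

  # Unrecognized values leave the current type untouched.
  def format=(new_format)
    @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
    @grpc.type = :HTML       if new_format.to_s == "html"
    @grpc.type
  end
end

doc = FormatSketch.new
doc.format          #=> :text
doc.format = "html" # Strings and Symbols are both accepted
doc.format          #=> :html
doc.format = :pdf   # not recognized, so nothing changes
doc.format          #=> :html
```

Note that a Ruby assignment expression such as `doc.format = :pdf` always evaluates to the right-hand side, so the setter's own return value is not visible to callers using that syntax.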

#html! ⇒ Object

Sets the document to the HTML format.



# File 'lib/google/cloud/language/document.rb', line 140

def html!
  @grpc.type = :HTML
end

#html? ⇒ Boolean

Whether the document is in the HTML format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 133

def html?
  @grpc.type == :HTML
end

#language ⇒ String

The document's language. ISO and BCP-47 language codes are supported.

Returns:

  • (String)


# File 'lib/google/cloud/language/document.rb', line 149

def language
  @grpc.language
end

#language=(new_language) ⇒ Object

Sets the document's language.

Examples:

document = language.document "<p>El viejo y el mar</p>"
document.language = "es"

Parameters:

  • new_language (String, Symbol)

    ISO and BCP-47 language codes are accepted.



# File 'lib/google/cloud/language/document.rb', line 163

def language= new_language
  @grpc.language = new_language.to_s
end

#sentiment ⇒ Annotation::Sentiment

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Currently, only English is supported for sentiment analysis.
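To turn a numeric polarity into the positive, negative, or neutral reading described above, a small classifier can be layered on top of the result. This is a sketch under stated assumptions: it assumes polarity falls between -1.0 and 1.0, the 0.25 threshold is an arbitrary illustrative choice, and `sentiment_label` is a hypothetical helper rather than a library method.

```ruby
# Hypothetical helper; the default threshold is an illustrative assumption,
# not a value defined by the API.
def sentiment_label(polarity, threshold: 0.25)
  return :positive if polarity >= threshold
  return :negative if polarity <= -threshold
  :neutral
end

sentiment_label(1.0)  #=> :positive
sentiment_label(-0.6) #=> :negative
sentiment_label(0.1)  #=> :neutral
```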

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
sentiment = document.sentiment # API call

sentiment.polarity #=> 1.0
sentiment.magnitude #=> 0.8999999761581421

Returns:

  • (Annotation::Sentiment)



# File 'lib/google/cloud/language/document.rb', line 310

def sentiment
  ensure_service!
  grpc = service.sentiment to_grpc
  Annotation::Sentiment.from_grpc grpc
end

#syntax(encoding: nil) ⇒ Annotation

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
language = gcloud.language

document = language.document "Hello world!"

annotation = document.syntax
annotation.sentences.count
annotation.tokens.count

Parameters:

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 252

def syntax encoding: nil
  annotate syntax: true, encoding: encoding
end
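As the source above shows, `syntax` is a thin wrapper that calls `annotate` with only the syntax flag set. The sketch below makes that delegation observable by replacing the API call with a stand-in that returns the options it received; `SyntaxSketch` is a hypothetical class, and the `:utf16` value is a placeholder since the accepted encoding values are not listed on this page.

```ruby
# Hypothetical class showing the delegation pattern only.
class SyntaxSketch
  # Stand-in for Document#annotate: instead of calling the API, it
  # returns the options it would have sent.
  def annotate(sentiment: false, entities: false, syntax: false,
               encoding: nil)
    { sentiment: sentiment, entities: entities, syntax: syntax,
      encoding: encoding }
  end

  # Same body as Document#syntax in the listing above.
  def syntax(encoding: nil)
    annotate syntax: true, encoding: encoding
  end
end

request = SyntaxSketch.new.syntax encoding: :utf16 # placeholder value
request[:syntax]    #=> true
request[:sentiment] #=> false
request[:entities]  #=> false
request[:encoding]  #=> :utf16
```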

#text! ⇒ Object

Sets the document to the TEXT format.



# File 'lib/google/cloud/language/document.rb', line 124

def text!
  @grpc.type = :PLAIN_TEXT
end

#text? ⇒ Boolean

Whether the document is in the TEXT format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 117

def text?
  @grpc.type == :PLAIN_TEXT
end