Class: Google::Cloud::Language::Document

Inherits:
Object
Defined in:
lib/google/cloud/language/document.rb

Overview

Document

Represents a document for the Language service.

The Cloud Natural Language API supports UTF-8, UTF-16, and UTF-32 encodings. (Ruby uses UTF-8 natively, and UTF-8 is the default sent to the API, so unless you are working with text processed on a different platform, you should not need to set the encoding type.)
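The encoding type matters because it determines the unit in which offsets are counted. The sketch below is plain Ruby and makes no API call; `offset_in_units` is a hypothetical helper, not part of this library, and it assumes offsets are counted in code units of the chosen encoding.

```ruby
# Hypothetical helper (not part of google-cloud-language): returns the offset
# of `needle` within `text`, measured in code units of the given encoding.
def offset_in_units(text, needle, encoding)
  index = text.index(needle) # character index in the Ruby string
  return nil if index.nil?

  # Bytes per code unit for each supported encoding.
  unit_size = { "UTF-8" => 1, "UTF-16LE" => 2, "UTF-32LE" => 4 }.fetch(encoding)

  # Re-encode only the text before the match and count its code units.
  text[0, index].encode(encoding).bytesize / unit_size
end

# U+1F600 needs 4 UTF-8 bytes, 2 UTF-16 units, and 1 UTF-32 unit.
text = "\u{1F600} Vader"

offset_in_units(text, "Vader", "UTF-8")    #=> 5
offset_in_units(text, "Vader", "UTF-16LE") #=> 3
offset_in_units(text, "Vader", "UTF-32LE") #=> 2
```

For pure ASCII content all three encodings yield the same offsets, which is why the default is usually sufficient.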

Be aware that only English, Spanish, and Japanese content is supported.

See Project#document.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.entities.count #=> 2
annotation.sentiment.score #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Instance Method Details

#annotate(sentiment: false, entities: false, syntax: false, encoding: nil) ⇒ Annotation

Also known as: mark, detect

Analyzes the document and returns sentiment, entity, and syntactic feature results, depending on the option flags. Calling annotate with no arguments will perform all analysis features. Each feature is priced separately. See Pricing for details.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate

annotation.sentiment.score #=> 1.0
annotation.sentiment.magnitude #=> 0.8999999761581421
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

With feature flags:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
annotation = document.annotate entities: true, syntax: true

annotation.sentiment #=> nil
annotation.entities.count #=> 2
annotation.sentences.count #=> 1
annotation.tokens.count #=> 10

Parameters:

  • sentiment (Boolean)

    Whether to perform sentiment analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • entities (Boolean)

    Whether to perform the entity analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • syntax (Boolean)

    Whether to perform syntactic analysis. Optional. The default is false. If every feature option is false, all features will be performed.

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation)

    The results of the content analysis.



# File 'lib/google/cloud/language/document.rb', line 216

def annotate sentiment: false, entities: false, syntax: false,
             encoding: nil
  ensure_service!
  grpc = service.annotate to_grpc, sentiment: sentiment,
                                   entities: entities,
                                   syntax: syntax,
                                   encoding: encoding
  Annotation.from_grpc grpc
end
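The method above passes the flags straight to the service, so where the "all false means all features" fallback is applied is not visible here. A minimal sketch of the documented rule follows; `resolve_features` is a hypothetical helper for illustration, not part of the library.

```ruby
# Hypothetical helper (not part of google-cloud-language): models the
# documented rule that leaving every feature flag false enables all features.
def resolve_features(sentiment: false, entities: false, syntax: false)
  flags = { sentiment: sentiment, entities: entities, syntax: syntax }

  # No flag set: fall back to running every analysis feature.
  return flags.transform_values { true } if flags.values.none?

  flags
end

resolve_features
#=> { sentiment: true, entities: true, syntax: true }

resolve_features entities: true, syntax: true
#=> { sentiment: false, entities: true, syntax: true }
```

Because each feature is priced separately, passing explicit flags requests only the analyses you need.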

#entities(encoding: nil) ⇒ Annotation::Entities

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content
entities = document.entities # API call

entities.count #=> 2
entities.first.name #=> "Darth Vader"
entities.first.type #=> :PERSON
entities.last.name #=> "Star Wars"
entities.last.type #=> :WORK_OF_ART

Parameters:

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Entities)

# File 'lib/google/cloud/language/document.rb', line 293

def entities encoding: nil
  ensure_service!
  grpc = service.entities to_grpc, encoding: encoding
  Annotation::Entities.from_grpc grpc
end

#format ⇒ Symbol

The document's format.

Returns:

  • (Symbol)

    :text or :html



# File 'lib/google/cloud/language/document.rb', line 90

def format
  return :text if text?
  return :html if html?
end

#format=(new_format) ⇒ Object

Sets the document's format.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

document = language.document "<p>The Old Man and the Sea</p>"
document.format = :html

Parameters:

  • new_format (Symbol, String)

    Accepted values are :text or :html.



# File 'lib/google/cloud/language/document.rb', line 105

def format= new_format
  @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
  @grpc.type = :HTML       if new_format.to_s == "html"
  @grpc.type
end
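One consequence of the setter above is that an unrecognized value matches neither condition and leaves the format unchanged. The sketch below reproduces the getter and setter logic on a stand-in class so it runs without the gem; `FakeGrpcDocument` and `FormatSketch` are hypothetical names, and the initial `:PLAIN_TEXT` type is an assumption.

```ruby
# Stand-in for the gRPC document message; only the `type` field is modeled.
FakeGrpcDocument = Struct.new(:type)

# Hypothetical class mirroring the format-related methods shown above.
class FormatSketch
  def initialize
    # Assumption: a new document starts out as plain text.
    @grpc = FakeGrpcDocument.new(:PLAIN_TEXT)
  end

  def text?
    @grpc.type == :PLAIN_TEXT
  end

  def html?
    @grpc.type == :HTML
  end

  def format
    return :text if text?
    return :html if html?
  end

  def format=(new_format)
    @grpc.type = :PLAIN_TEXT if new_format.to_s == "text"
    @grpc.type = :HTML       if new_format.to_s == "html"
    @grpc.type
  end
end

doc = FormatSketch.new
doc.format          #=> :text
doc.format = "html" # Strings and Symbols are both accepted
doc.format          #=> :html
doc.format = :pdf   # Unrecognized value: silently ignored
doc.format          #=> :html
```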

#html! ⇒ Object

Sets the document to the HTML format.



# File 'lib/google/cloud/language/document.rb', line 139

def html!
  @grpc.type = :HTML
end

#html? ⇒ Boolean

Whether the document is in the HTML format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 132

def html?
  @grpc.type == :HTML
end

#language ⇒ String

The document's language. ISO and BCP-47 language codes are supported.

Returns:

  • (String)


# File 'lib/google/cloud/language/document.rb', line 148

def language
  @grpc.language
end

#language=(new_language) ⇒ Object

Sets the document's language.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

document = language.document "<p>El viejo y el mar</p>"
document.language = "es"

Parameters:

  • new_language (String, Symbol)

    ISO and BCP-47 language codes are accepted.



# File 'lib/google/cloud/language/document.rb', line 162

def language= new_language
  @grpc.language = new_language.to_s
end

#sentiment(encoding: nil) ⇒ Annotation::Sentiment

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Currently, only English is supported for sentiment analysis.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content

sentiment = document.sentiment

sentiment.score #=> 1.0
sentiment.magnitude #=> 0.8999999761581421
sentiment.language #=> "en"

sentence = sentiment.sentences.first
sentence.sentiment.score #=> 1.0
sentence.sentiment.magnitude #=> 0.8999999761581421

Parameters:

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Sentiment)

# File 'lib/google/cloud/language/document.rb', line 329

def sentiment encoding: nil
  ensure_service!
  grpc = service.sentiment to_grpc, encoding: encoding
  Annotation::Sentiment.from_grpc grpc
end
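The description above speaks of a positive, negative, or neutral attitude, while the result exposes a numeric score. One way to bridge the two is sketched below. The `attitude` helper and its `neutral_band` threshold of 0.25 are hypothetical and not defined by the library or the API; the sketch assumes scores range from -1.0 to 1.0.

```ruby
# Hypothetical helper (not part of google-cloud-language): maps a sentiment
# score to a coarse label. The neutral band width is an arbitrary choice.
def attitude(score, neutral_band: 0.25)
  return :positive if score > neutral_band
  return :negative if score < -neutral_band

  :neutral
end

attitude(1.0)  #=> :positive
attitude(-0.6) #=> :negative
attitude(0.1)  #=> :neutral
```

In practice the score would come from `document.sentiment.score`, and a suitable threshold depends on your content.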

#syntax(encoding: nil) ⇒ Annotation::Syntax

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Darth Vader is the best villain in Star Wars."
document = language.document content

syntax = document.syntax

sentence = syntax.sentences.last
sentence.text #=> "Darth Vader is the best villain in Star Wars."
sentence.offset #=> 0

syntax.tokens.count #=> 10
token = syntax.tokens.first

token.text #=> "Darth"
token.offset #=> 0
token.part_of_speech.tag #=> :NOUN
token.head_token_index #=> 1
token.label #=> :NN
token.lemma #=> "Darth"

Parameters:

  • encoding (String)

    The encoding type used by the API to calculate offsets. Optional.

Returns:

  • (Annotation::Syntax)

# File 'lib/google/cloud/language/document.rb', line 262

def syntax encoding: nil
  ensure_service!
  grpc = service.syntax to_grpc, encoding: encoding
  Annotation::Syntax.from_grpc grpc
end
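To see why the example sentence yields 10 tokens, the sketch below approximates tokenization locally by splitting on word boundaries and treating punctuation as its own token. `naive_tokens` is a hypothetical helper and only a rough stand-in: the API's tokenizer is not a regular expression and may split other text differently.

```ruby
# Hypothetical helper (not part of google-cloud-language): splits text into
# word and punctuation tokens, recording each token's character offset.
def naive_tokens(content)
  tokens = []
  content.scan(/\w+|[^\w\s]/) do |match|
    # Regexp.last_match holds the MatchData for the current scan iteration.
    tokens << { text: match, offset: Regexp.last_match.begin(0) }
  end
  tokens
end

tokens = naive_tokens "Darth Vader is the best villain in Star Wars."

tokens.count #=> 10
tokens.first #=> { text: "Darth", offset: 0 }
tokens.last  #=> { text: ".", offset: 44 }
```

The nine words plus the final period account for the 10 tokens reported in the example.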

#text! ⇒ Object

Sets the document to the plain text format.



# File 'lib/google/cloud/language/document.rb', line 123

def text!
  @grpc.type = :PLAIN_TEXT
end

#text? ⇒ Boolean

Whether the document is in the plain text format.

Returns:

  • (Boolean)


# File 'lib/google/cloud/language/document.rb', line 116

def text?
  @grpc.type == :PLAIN_TEXT
end