Class: Google::Cloud::Language::Annotation::Token

Inherits:

Object

Defined in: lib/google/cloud/language/annotation.rb

Overview

Represents the smallest syntactic building block of the text. Returned by syntactic analysis.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Star Wars is a great movie. The Death Star is fearsome."
document = language.document content
annotation = document.annotate

annotation.tokens.count #=> 13
token = annotation.tokens.first

token.text_span.text #=> "Star"
token.text_span.offset #=> 0
token.part_of_speech.tag #=> :NOUN
token.part_of_speech.number #=> :SINGULAR
token.head_token_index #=> 1
token.label #=> :TITLE
token.lemma #=> "Star"

Instance Attribute Summary

#head_token_index ⇒ Integer readonly
Represents the head of this token in the dependency tree.
#label ⇒ Symbol readonly
The parse label for the token.
#lemma ⇒ String readonly
Lemma of the token.
#part_of_speech ⇒ PartOfSpeech readonly
Represents part of speech information for a token.
#text_span ⇒ TextSpan readonly
The token text.

Instance Method Summary

#offset ⇒ Integer (also: #begin_offset)
The beginning offset of the content in the original document.
#text ⇒ String (also: #content)
The content of the output text.

Instance Attribute Details

#head_token_index ⇒ `Integer` (readonly)

Represents the head of this token in the dependency tree. This is the index of the token that has an arc going to this token. The index is the position of the token in the array of tokens returned by the API method. If this token is a root token, then its head_token_index is its own index.

Returns:

(Integer) —
the current value of head_token_index




# File 'lib/google/cloud/language/annotation.rb', line 539

def head_token_index
  @head_token_index
end
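
The head index rule described above can be sketched without calling the API. This is a minimal illustration, not library code: `FakeToken`, `root_index`, and `path_to_root` are hypothetical names, and the head indices in the sample array are invented for the example rather than taken from real analysis output.

```ruby
# Hypothetical stand-in for the token objects returned by the API; only
# the two fields needed for this sketch are modelled.
FakeToken = Struct.new(:text, :head_token_index)

# Returns the index of the root token, i.e. the token whose head is itself.
def root_index(tokens)
  tokens.each_index.find { |i| tokens[i].head_token_index == i }
end

# Returns the indices visited when walking from token i up to the root,
# inclusive. Raises if the head indices form a cycle that has no root.
def path_to_root(tokens, i)
  path = [i]
  until tokens[i].head_token_index == i
    i = tokens[i].head_token_index
    raise ArgumentError, "cycle in head indices" if path.include?(i)
    path << i
  end
  path
end

# Head indices below are invented for illustration, not API output.
tokens = [
  FakeToken.new("Star", 1),
  FakeToken.new("Wars", 2),
  FakeToken.new("is", 2),
  FakeToken.new("great", 2)
]

root = root_index(tokens)       # index of "is" in this sample
path = path_to_root(tokens, 0)  # "Star" -> "Wars" -> "is"
```

With real results, the same walk should work on `annotation.tokens`, since each token exposes `head_token_index` as documented here.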

#label ⇒ `Symbol` (readonly)

The parse label for the token.

Returns:

(Symbol) —
the current value of label




# File 'lib/google/cloud/language/annotation.rb', line 539

def label
  @label
end

#lemma ⇒ `String` (readonly)

Lemma of the token.

Returns:

(String) —
the current value of lemma




# File 'lib/google/cloud/language/annotation.rb', line 539

def lemma
  @lemma
end

#part_of_speech ⇒ `PartOfSpeech` (readonly)

Represents part of speech information for a token.

Returns:

(PartOfSpeech) —
the current value of part_of_speech




# File 'lib/google/cloud/language/annotation.rb', line 539

def part_of_speech
  @part_of_speech
end

#text_span ⇒ `TextSpan` (readonly)

The token text.

Returns:

(TextSpan) —
the current value of text_span




# File 'lib/google/cloud/language/annotation.rb', line 539

def text_span
  @text_span
end

Instance Method Details

#offset ⇒ `Integer` Also known as: begin_offset

The beginning offset of the content in the original document. See Google::Cloud::Language::Annotation::TextSpan#offset.

The API calculates the beginning offset according to the client system's default encoding. In Ruby this defaults to UTF-8. To change the offset calculation, change Ruby's default encoding, commonly by setting Encoding.default_internal to Encoding::UTF_16 or Encoding::UTF_32. If the system is configured to use any encoding other than UTF-16 or UTF-32, the offset is calculated using UTF-8.

Returns:

(Integer)




# File 'lib/google/cloud/language/annotation.rb', line 579

def offset
  @text_span.offset
end
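
To see why the encoding matters, the sketch below counts how many code units precede a word under each encoding, using only Ruby's String methods. It assumes the offset is measured in code units of the chosen encoding (bytes for UTF-8, 16-bit units for UTF-16, code points for UTF-32); `unit_offset` is a hypothetical helper and does not call the API.

```ruby
# Bytes per code unit for each encoding. The little-endian variants are
# used so that no byte order mark is added when transcoding.
UNIT_SIZES = {
  Encoding::UTF_8    => 1,
  Encoding::UTF_16LE => 2,
  Encoding::UTF_32LE => 4
}.freeze

# Counts the code units that precede `needle` in `text` when the text is
# encoded as `encoding`. Assumes `needle` occurs in `text`.
def unit_offset(text, needle, encoding)
  prefix = text[0, text.index(needle)]
  prefix.encode(encoding).bytesize / UNIT_SIZES.fetch(encoding)
end

# Sample text with one two-byte character and one character outside the
# Basic Multilingual Plane, so the three counts differ.
text = "Caf\u00E9 \u{1F600} Star"

utf8_offset  = unit_offset(text, "Star", Encoding::UTF_8)    #=> 11
utf16_offset = unit_offset(text, "Star", Encoding::UTF_16LE) #=> 8
utf32_offset = unit_offset(text, "Star", Encoding::UTF_32LE) #=> 7
```

For plain ASCII content such as the example at the top of this page, all three counts are identical, so the choice of encoding only shows up once the text contains multi-byte characters.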

#text ⇒ `String` Also known as: content

The content of the output text. See Google::Cloud::Language::Annotation::TextSpan#text.

Returns:

(String)




# File 'lib/google/cloud/language/annotation.rb', line 559

def text
  @text_span.text
end
