Class: Google::Cloud::Language::Annotation::Token

Inherits:

Object


Defined in: lib/google/cloud/language/annotation.rb

Overview

Represents the smallest syntactic building block of the text. Returned by syntactic analysis.

Examples:

require "google/cloud/language"

language = Google::Cloud::Language.new

content = "Star Wars is a great movie. The Death Star is fearsome."
document = language.document content
annotation = document.annotate

annotation.tokens.count #=> 13
token = annotation.tokens.first

token.text_span.text #=> "Star"
token.text_span.offset #=> 0
token.part_of_speech.tag #=> :NOUN
token.part_of_speech.number #=> :SINGULAR
token.head_token_index #=> 1
token.label #=> :TITLE
token.lemma #=> "Star"

Instance Attribute Summary

#head_token_index ⇒ Integer readonly
Represents the head of this token in the dependency tree.
#label ⇒ Symbol readonly
The parse label for the token.
#lemma ⇒ String readonly
Lemma of the token.
#part_of_speech ⇒ PartOfSpeech readonly
Represents part of speech information for a token.
#text_span ⇒ TextSpan readonly
The token text.

Instance Method Summary

#offset ⇒ Integer (also: #begin_offset)
The beginning offset of the content in the original document.
#text ⇒ String (also: #content)
The content of the output text.

Instance Attribute Details

#head_token_index ⇒ `Integer` (readonly)

Represents the head of this token in the dependency tree. This is the index of the token that has an arc going to this token. The index is the position of the token in the array of tokens returned by the API method. If this token is a root token, then head_token_index is its own index.

Returns:

(Integer) —
the current value of head_token_index



# File 'lib/google/cloud/language/annotation.rb', line 540

def head_token_index
  @head_token_index
end
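
The head-index convention described above can be illustrated with a short sketch. It does not call the API: `FakeToken`, `root_index`, and `path_to_root` are hypothetical names introduced here, and the head indices in the sample data are made up for illustration rather than taken from a real response. The only assumption carried over from the documentation is that a root token's head index equals its own position in the token array.

```ruby
# Hypothetical stand-in for Annotation::Token; only the two fields used here.
FakeToken = Struct.new(:text, :head_token_index)

# Returns the index of the root token: the one whose head is itself.
def root_index(tokens)
  tokens.each_index.find { |i| tokens[i].head_token_index == i }
end

# Follows head_token_index from the token at `index` up to the root and
# returns the token texts visited along the way.
def path_to_root(tokens, index)
  path = []
  seen = {}
  loop do
    break if seen[index]        # guard against malformed (cyclic) input
    seen[index] = true
    path << tokens[index].text
    head = tokens[index].head_token_index
    break if head == index      # root reached
    index = head
  end
  path
end

# Illustrative indices only, not real API output.
tokens = [
  FakeToken.new("Star", 1),
  FakeToken.new("Wars", 2),
  FakeToken.new("is", 2),       # root: head is its own index
  FakeToken.new("great", 2)
]

root = root_index(tokens)
path = path_to_root(tokens, 0)
```

With real results, the same traversal should work on `annotation.tokens` directly, since each token exposes `head_token_index` and `text`.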

#label ⇒ `Symbol` (readonly)

The parse label for the token.

Returns:

(Symbol) —
the current value of label



# File 'lib/google/cloud/language/annotation.rb', line 540

def label
  @label
end

#lemma ⇒ `String` (readonly)

Lemma of the token.

Returns:

(String) —
the current value of lemma



# File 'lib/google/cloud/language/annotation.rb', line 540

def lemma
  @lemma
end

#part_of_speech ⇒ `PartOfSpeech` (readonly)

Represents part of speech information for a token.

Returns:

(PartOfSpeech) —
the current value of part_of_speech



# File 'lib/google/cloud/language/annotation.rb', line 540

def part_of_speech
  @part_of_speech
end

#text_span ⇒ `TextSpan` (readonly)

The token text.

Returns:

(TextSpan) —
the current value of text_span



# File 'lib/google/cloud/language/annotation.rb', line 540

def text_span
  @text_span
end

Instance Method Details

#offset ⇒ `Integer` Also known as: begin_offset

The beginning offset of the content in the original document. See Google::Cloud::Language::Annotation::TextSpan#offset.

The API calculates the beginning offset according to the client system's default encoding, which in Ruby defaults to UTF-8. To change the offset calculation, change Ruby's default encoding. This is commonly done by setting Encoding.default_internal to Encoding::UTF_16 or Encoding::UTF_32. If the system is configured to use any encoding other than UTF-16 or UTF-32, the offset is calculated using UTF-8.

Returns:

(Integer)



# File 'lib/google/cloud/language/annotation.rb', line 580

def offset
  @text_span.offset
end
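
To see why the encoding matters, the following sketch computes the offset of the same character position under three encodings. It is an illustration only, not the library's implementation: `offset_in_code_units` is a hypothetical helper, and it assumes that an offset counts code units of the chosen encoding (bytes for UTF-8, 16-bit units for UTF-16, 32-bit units for UTF-32). The little-endian variants are used so that Ruby does not prepend a byte order mark during conversion.

```ruby
# Hypothetical helper: offset of the character at `char_index`, measured in
# code units of `encoding` (assumed convention, see the note above).
def offset_in_code_units(text, char_index, encoding)
  prefix = text[0, char_index].encode(encoding)
  unit_size =
    case encoding
    when Encoding::UTF_16LE, Encoding::UTF_16BE then 2
    when Encoding::UTF_32LE, Encoding::UTF_32BE then 4
    else 1 # UTF-8 and other byte-oriented encodings
    end
  prefix.bytesize / unit_size
end

# "café", an emoji outside the Basic Multilingual Plane, then "Star".
text = "caf\u00e9 \u{1F600} Star"
char_index = text.index("Star")

utf8_offset  = offset_in_code_units(text, char_index, Encoding::UTF_8)
utf16_offset = offset_in_code_units(text, char_index, Encoding::UTF_16LE)
utf32_offset = offset_in_code_units(text, char_index, Encoding::UTF_32LE)
```

The three values differ because "é" takes two bytes in UTF-8 and the emoji takes four bytes in UTF-8 but two code units in UTF-16. Code that slices the original string by `offset` therefore needs to use the same encoding that was in effect when the document was analyzed.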

#text ⇒ `String` Also known as: content

The content of the output text. See Google::Cloud::Language::Annotation::TextSpan#text.

Returns:

(String)



# File 'lib/google/cloud/language/annotation.rb', line 560

def text
  @text_span.text
end
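
The two methods above delegate to the token's text span, and each has an alias. A minimal sketch of that pattern follows, using hypothetical stand-in classes (`SketchSpan` and `SketchToken` are not part of the library) and assuming the aliases are plain `alias` declarations, which the listing above does not show.

```ruby
# Hypothetical stand-ins, not the library's classes.
SketchSpan = Struct.new(:text, :offset)

class SketchToken
  attr_reader :text_span

  def initialize(text_span)
    @text_span = text_span
  end

  # Delegates to the span, as in the source listing above.
  def text
    @text_span.text
  end
  alias content text

  # Delegates to the span, as in the source listing above.
  def offset
    @text_span.offset
  end
  alias begin_offset offset
end

token = SketchToken.new(SketchSpan.new("Star", 0))
```

Under this pattern `token.text` and `token.content` return the same string, and `token.offset` and `token.begin_offset` return the same integer, so callers can use whichever name reads better.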
