Module: Google::Cloud::Speech

Defined in: lib/google/cloud/speech.rb,
lib/google/cloud/speech/job.rb,
lib/google/cloud/speech/audio.rb,
lib/google/cloud/speech/result.rb,
lib/google/cloud/speech/stream.rb,
lib/google/cloud/speech/project.rb,
lib/google/cloud/speech/service.rb,
lib/google/cloud/speech/version.rb,
lib/google/cloud/speech/credentials.rb

Overview

Google Cloud Speech

The Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes over 80 languages and variants to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. You can recognize audio uploaded in the request or integrate with your audio storage on Google Cloud Storage, using the same technology Google uses to power its own products.

For more information about Google Cloud Speech API, read the Google Cloud Speech API Documentation.

The goal of google-cloud is to provide an API that is comfortable to Rubyists. Authentication is handled by #speech. You can provide the project and credential information to connect to the Cloud Speech service or, if you are running on Google Compute Engine, this configuration is handled for you. You can read more about the options for connecting in the Authentication Guide.

Creating audio sources

You can create an audio object that holds a reference to any one of several types of audio data source, along with metadata such as the audio encoding type.

Use Project#audio to create audio sources for the Cloud Speech API. You can provide a file path:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000

Or, you can initialize the audio instance with a Google Cloud Storage URI:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "gs://bucket-name/path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000

Or, with a Google Cloud Storage File object:

require "google/cloud/storage"

storage = Google::Cloud::Storage.new

bucket = storage.bucket "bucket-name"
file = bucket.file "path/to/audio.raw"

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio file, encoding: :raw, sample_rate: 16000

Recognizing speech

The instance methods on Audio can be used to invoke both synchronous and asynchronous versions of the Cloud Speech API speech recognition operation.

Use Audio#recognize for synchronous speech recognition that returns Result objects only after all audio has been processed. This method is limited to audio data of 1 minute or less in duration, and will take roughly the same amount of time to process as the duration of the supplied audio data.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
results = audio.recognize

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
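
When several results come back, it can be useful to pick the most confident one instead of always taking the first. The following is a minimal sketch, not part of the library: `best_transcript` is a hypothetical helper, and `SketchResult` is a stand-in Struct that only mimics the `transcript` and `confidence` readers shown above so the example runs without the gem.

```ruby
# Stand-in for a Result, used only so this sketch runs without the gem.
# Real results come from Audio#recognize.
SketchResult = Struct.new(:transcript, :confidence)

# Hypothetical helper: returns the transcript of the most confident result,
# or nil when the list is empty or nothing reaches min_confidence.
def best_transcript(results, min_confidence: 0.0)
  best = results.max_by(&:confidence)
  return nil if best.nil? || best.confidence < min_confidence
  best.transcript
end

results = [
  SketchResult.new("how old is the brook lin bridge", 0.41),
  SketchResult.new("how old is the Brooklyn Bridge", 0.98)
]

chosen   = best_transcript(results)
rejected = best_transcript(results, min_confidence: 0.99)
```

Here `chosen` holds the second transcript, and `rejected` is nil because no result reaches the 0.99 threshold.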

Use Audio#recognize_job for asynchronous speech recognition, in which a Job is returned immediately after the audio data has been sent. The job can be refreshed to retrieve Result objects once the audio data has been processed.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
job = audio.recognize_job

job.done? #=> false
job.reload!
job.done? #=> true
results = job.results

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
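
In practice a job is rarely done after a single reload, so callers usually poll. The sketch below shows one way to do that using only the `done?` and `reload!` methods shown above. `wait_for_job` and `SketchJob` are hypothetical names introduced for illustration; `SketchJob` is a stand-in that lets the example run without the gem or credentials.

```ruby
# Hypothetical helper: polls a job-like object until it reports done.
# Returns the job when it finishes, or nil if max_attempts is exhausted.
def wait_for_job(job, interval: 1.0, max_attempts: 30)
  max_attempts.times do
    return job if job.done?
    sleep interval
    job.reload!
  end
  job.done? ? job : nil
end

# Stand-in for a Job: becomes done after a fixed number of reloads.
class SketchJob
  def initialize(reloads_needed)
    @reloads_left = reloads_needed
  end

  def done?
    @reloads_left <= 0
  end

  def reload!
    @reloads_left -= 1
    self
  end
end

finished  = wait_for_job(SketchJob.new(3), interval: 0)
timed_out = wait_for_job(SketchJob.new(3), interval: 0, max_attempts: 2)
```

With a real job, a non-zero `interval` avoids hammering the service, and a nil return value signals that the caller should give up or retry later.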

Use Project#stream to stream audio data for speech recognition. It returns a Stream, which performs bidirectional streaming speech recognition and can therefore receive results while audio is still being sent.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

stream = speech.stream encoding: :raw, sample_rate: 16000

# Register a callback for when a result is returned
stream.on_result do |results|
  result = results.first
  result.transcript #=> "how old is the Brooklyn Bridge"
  result.confidence #=> 0.9826789498329163
end

# Stream 5 seconds of audio from the microphone
# Actual implementation of microphone input varies by platform
5.times do
  stream.send MicrophoneInput.read(32000)
end

stream.stop

Obtaining audio data from input sources such as a microphone is outside the scope of this document.
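
The 32000-byte reads in the example above correspond to roughly one second of audio each, assuming that `:raw` audio at a 16000 Hz sample rate is 16-bit mono linear PCM (2 bytes per sample). The sketch below works through that arithmetic and splits an IO-like source into one-second chunks. `each_audio_chunk` is a hypothetical helper, and a `StringIO` filled with silence stands in for real audio.

```ruby
require "stringio"

SAMPLE_RATE      = 16_000 # samples per second, as in the example above
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono linear PCM
CHUNK_BYTES      = SAMPLE_RATE * BYTES_PER_SAMPLE # about one second of audio

# Hypothetical helper: yields successive chunks read from an IO-like object.
# IO#read returns nil at end of input, which ends the loop.
def each_audio_chunk(io, chunk_bytes: CHUNK_BYTES)
  while (chunk = io.read(chunk_bytes))
    yield chunk
  end
end

# 2.5 seconds of silence stands in for real audio data.
fake_audio = StringIO.new("\x00".b * (CHUNK_BYTES * 5 / 2))

chunk_sizes = []
each_audio_chunk(fake_audio) { |chunk| chunk_sizes << chunk.bytesize }

# With a real stream, each chunk would be sent instead of measured:
#   each_audio_chunk(file) { |chunk| stream.send chunk }
```

The final chunk is shorter than the others whenever the source length is not a whole number of seconds.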

Defined Under Namespace

Classes: Audio, InterimResult, Job, Project, Result, Stream

Constant Summary

VERSION = "0.23.0"

Class Method Summary

.new(project: nil, keyfile: nil, scope: nil, timeout: nil, client_config: nil) ⇒ Google::Cloud::Speech::Project
Creates a new object for connecting to the Speech service.

Class Method Details

.new(project: nil, keyfile: nil, scope: nil, timeout: nil, client_config: nil) ⇒ Google::Cloud::Speech::Project

Creates a new object for connecting to the Speech service. Each call creates a new connection.

For more information on connecting to Google Cloud see the Authentication Guide.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000

Parameters:

project (String) —
Project identifier for the Speech service you are connecting to.
keyfile (String, Hash) —
Keyfile downloaded from Google Cloud. If a file path is given, the file must be readable.
scope (String, Array<String>) —
The OAuth 2.0 scopes controlling the set of resources and operations that the connection can access. See Using OAuth 2.0 to Access Google APIs.

The default scope is:
- https://www.googleapis.com/auth/speech
timeout (Integer) —
Default timeout to use in requests. Optional.
client_config (Hash) —
A hash of values to override the default behavior of the API client. Optional.

Returns:

(Google::Cloud::Speech::Project)

# File 'lib/google/cloud/speech.rb', line 208

def self.new project: nil, keyfile: nil, scope: nil, timeout: nil,
             client_config: nil
  project ||= Google::Cloud::Speech::Project.default_project
  if keyfile.nil?
    credentials = Google::Cloud::Speech::Credentials.default scope: scope
  else
    credentials = Google::Cloud::Speech::Credentials.new(
      keyfile, scope: scope)
  end
  Google::Cloud::Speech::Project.new(
    Google::Cloud::Speech::Service.new(
      project, credentials, timeout: timeout,
                            client_config: client_config))
end
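
Because the method falls back to `default_project` and default credentials when `project` and `keyfile` are nil, an application only needs to pass the values it actually wants to override. The sketch below shows one possible way to assemble those keyword arguments from configuration. `speech_options` and the `MY_APP_*` keys are hypothetical names chosen for illustration and are not read by the library.

```ruby
# Hypothetical helper: builds keyword arguments for Google::Cloud::Speech.new
# from an environment-like hash. Unset values are dropped so that the
# library's own defaults still apply.
def speech_options(env)
  {
    project: env["MY_APP_SPEECH_PROJECT"],
    keyfile: env["MY_APP_SPEECH_KEYFILE"],
    timeout: env["MY_APP_SPEECH_TIMEOUT"]&.to_i
  }.compact
end

options = speech_options({
  "MY_APP_SPEECH_PROJECT" => "my-project-id",
  "MY_APP_SPEECH_TIMEOUT" => "30"
})

# With the gem installed and credentials available:
#   require "google/cloud/speech"
#   speech = Google::Cloud::Speech.new(**options)
```

Here no keyfile is configured, so `options` contains only `project` and `timeout`, and the call would use the default credentials branch of the method above.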