Class: Google::Cloud::Speech::Project
- Inherits: Object
- Defined in: lib/google/cloud/speech/project.rb
Overview
Project
The Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage, by using the same technology Google uses to power its own products.
Instance Method Summary
-
#audio(source, encoding: nil, language: nil, sample_rate: nil) ⇒ Audio
Returns a new Audio instance from the given source.
-
#operation(id) ⇒ Operation
Performs asynchronous speech recognition.
-
#process(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil) ⇒ Operation
(also: #long_running_recognize, #recognize_job)
Performs asynchronous speech recognition.
-
#project_id ⇒ Object
(also: #project)
The ID of the connected Speech project.
-
#recognize(source, encoding: nil, language: nil, sample_rate: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil) ⇒ Array<Result>
Performs synchronous speech recognition.
-
#stream(encoding: nil, language: nil, sample_rate: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil, utterance: nil, interim: nil) ⇒ Stream
(also: #stream_recognize)
Creates a Stream object to perform bidirectional streaming speech recognition: receive results while sending audio.
Instance Method Details
#audio(source, encoding: nil, language: nil, sample_rate: nil) ⇒ Audio
Returns a new Audio instance from the given source. No API call is made.
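A usage sketch (not part of the original docs); it assumes the google-cloud-speech gem is installed, default credentials are configured, and the file path is a placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# Build an Audio object from a local file. No API call is made;
# the object can be passed later to #recognize or #process.
audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000
```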
# File 'lib/google/cloud/speech/project.rb', line 177

def audio source, encoding: nil, language: nil, sample_rate: nil
  audio = if source.is_a? Audio
            source.dup
          else
            Audio.from_source source, self
          end
  audio.encoding = encoding unless encoding.nil?
  audio.language = language unless language.nil?
  audio.sample_rate = sample_rate unless sample_rate.nil?
  audio
end
#operation(id) ⇒ Operation
Performs asynchronous speech recognition. Requests are processed asynchronously, meaning an Operation is returned once the audio data has been sent; it can be refreshed to retrieve recognition results once the audio data has been processed.
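A usage sketch (not part of the original docs); it assumes configured credentials, and the operation ID is a hypothetical placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# Retrieve a previously started long-running recognition job by its ID.
op = speech.operation "1234567890"

# Refresh until the audio has been processed, then read the results.
op.reload! unless op.done?
op.results if op.done?
```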
# File 'lib/google/cloud/speech/project.rb', line 609

def operation id
  ensure_service!
  grpc = service.get_op id
  Operation.from_grpc grpc
end
#process(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil) ⇒ Operation Also known as: long_running_recognize, recognize_job
Performs asynchronous speech recognition. Requests are processed asynchronously, meaning an Operation is returned once the audio data has been sent; it can be refreshed to retrieve recognition results once the audio data has been processed.
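A usage sketch (not part of the original docs); it assumes configured credentials, and the Cloud Storage URI is a placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# Start an asynchronous recognition job on audio stored in
# Google Cloud Storage; an Operation is returned immediately.
op = speech.process "gs://bucket-name/path/to/audio.flac",
                    encoding: :flac,
                    language: "en-US",
                    sample_rate: 44100

# Block until processing finishes, then read the results.
op.wait_until_done!
results = op.results
```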
# File 'lib/google/cloud/speech/project.rb', line 444

def process source, encoding: nil, sample_rate: nil, language: nil,
            max_alternatives: nil, profanity_filter: nil,
            phrases: nil, words: nil
  ensure_service!
  audio_obj = audio source, encoding: encoding, language: language,
                            sample_rate: sample_rate
  config = audio_config(
    encoding: audio_obj.encoding, sample_rate: audio_obj.sample_rate,
    language: audio_obj.language, max_alternatives: max_alternatives,
    profanity_filter: profanity_filter, phrases: phrases, words: words
  )
  grpc = service.recognize_async audio_obj.to_grpc, config
  Operation.from_grpc grpc
end
#project_id ⇒ Object Also known as: project
The ID of the connected Speech project.
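A usage sketch (not part of the original docs); the project ID shown is a hypothetical placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

speech.project_id # e.g. "my-project-id"
```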
# File 'lib/google/cloud/speech/project.rb', line 78

def project_id
  service.project
end
#recognize(source, encoding: nil, language: nil, sample_rate: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil) ⇒ Array<Result>
Performs synchronous speech recognition. Sends audio data to the Speech API, which performs recognition on that data, and returns results only after all audio has been processed. Limited to audio data of 1 minute or less in duration.
The Speech API will take roughly the same amount of time to process audio data sent synchronously as the duration of the supplied audio data. That is, if you send audio data of 30 seconds in length, expect the synchronous request to take approximately 30 seconds to return results.
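A usage sketch (not part of the original docs); it assumes configured credentials, and the file path is a placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# Synchronous recognition: blocks until all audio (max 1 minute)
# has been processed, then returns an array of results.
results = speech.recognize "path/to/audio.raw",
                           encoding: :linear16,
                           language: "en-US",
                           sample_rate: 16000,
                           max_alternatives: 3

results.each do |result|
  puts result.transcript
  puts result.confidence
end
```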
# File 'lib/google/cloud/speech/project.rb', line 305

def recognize source, encoding: nil, language: nil, sample_rate: nil,
              max_alternatives: nil, profanity_filter: nil,
              phrases: nil, words: nil
  ensure_service!
  audio_obj = audio source, encoding: encoding, language: language,
                            sample_rate: sample_rate
  config = audio_config(
    encoding: audio_obj.encoding, sample_rate: audio_obj.sample_rate,
    language: audio_obj.language, max_alternatives: max_alternatives,
    profanity_filter: profanity_filter, phrases: phrases, words: words
  )
  grpc = service.recognize_sync audio_obj.to_grpc, config
  grpc.results.map do |result_grpc|
    Result.from_grpc result_grpc
  end
end
#stream(encoding: nil, language: nil, sample_rate: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil, words: nil, utterance: nil, interim: nil) ⇒ Stream Also known as: stream_recognize
Creates a Stream object to perform bidirectional streaming speech recognition: receive results while sending audio.
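A usage sketch (not part of the original docs); it assumes configured credentials, and the file path is a placeholder:

```ruby
require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# Open a bidirectional stream; interim: true requests
# partial (non-final) results while audio is still arriving.
stream = speech.stream encoding: :linear16,
                       language: "en-US",
                       sample_rate: 16000,
                       interim: true

# Register callbacks before sending audio.
stream.on_interim do |final_results, interim_results|
  puts interim_results.inspect
end

stream.on_result do |results|
  puts results.first.transcript
end

# Send audio data, then signal that no more audio is coming.
stream.send File.read("path/to/audio.raw", mode: "rb")
stream.stop

stream.wait_until_complete!
```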
# File 'lib/google/cloud/speech/project.rb', line 560

def stream encoding: nil, language: nil, sample_rate: nil,
           max_alternatives: nil, profanity_filter: nil,
           phrases: nil, words: nil, utterance: nil, interim: nil
  ensure_service!
  grpc_req = V1::StreamingRecognizeRequest.new(
    streaming_config: V1::StreamingRecognitionConfig.new(
      {
        config: audio_config(encoding: convert_encoding(encoding),
                             language: language,
                             sample_rate: sample_rate,
                             max_alternatives: max_alternatives,
                             profanity_filter: profanity_filter,
                             phrases: phrases, words: words),
        single_utterance: utterance,
        interim_results: interim
      }.delete_if { |_, v| v.nil? }
    )
  )
  Stream.new service, grpc_req
end