Class: Google::Cloud::Speech::Audio

Inherits: Object
Defined in:
lib/google/cloud/speech/audio.rb

Overview

Audio

Represents a source of audio data, with related metadata such as the audio encoding, sample rate, and language.

See Project#audio.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000

results = audio.recognize
result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163



Instance Attribute Details

#encoding ⇒ String, Symbol

Encoding of audio data to be recognized.

Acceptable values are:

  • linear16 - Uncompressed 16-bit signed little-endian samples. (LINEAR16)
  • flac - The Free Lossless Audio Codec encoding. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported. (FLAC)
  • mulaw - 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. (MULAW)
  • amr - Adaptive Multi-Rate Narrowband codec. (sample_rate must be 8000 Hz.) (AMR)
  • amr_wb - Adaptive Multi-Rate Wideband codec. (sample_rate must be 16000 Hz.) (AMR_WB)
  • ogg_opus - Ogg Mapping for Opus. (OGG_OPUS)

    Lossy codecs are not recommended, as they result in a lower-quality speech transcription.

  • speex - Speex with header byte. (SPEEX_WITH_HEADER_BYTE)

    Lossy codecs are not recommended, as they result in a lower-quality speech transcription. If you must use a low-bitrate encoder, OGG_OPUS is preferred.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     language: "en-US",
                     sample_rate: 16000

audio.encoding = :linear16
audio.encoding #=> :linear16
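
A further sketch under the same assumptions (the file paths are placeholders): FLAC is lossless and supports only 16-bit samples, while the lossy :amr encoding requires a sample_rate of 8000 Hz, as noted in the list above.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

# FLAC source; only 16-bit samples are supported.
flac_audio = speech.audio "path/to/audio.flac",
                          encoding: :flac,
                          language: "en-US",
                          sample_rate: 16000

# AMR source; this lossy codec requires a sample rate of 8000 Hz.
amr_audio = speech.audio "path/to/audio.amr",
                         encoding: :amr,
                         language: "en-US",
                         sample_rate: 8000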

Returns:

  • (String, Symbol)


# File 'lib/google/cloud/speech/audio.rb', line 98

def encoding
  @encoding
end

#language ⇒ String, Symbol

The language of the supplied audio as a BCP-47 language code, e.g. "en-US" for English (United States), "en-GB" for English (United Kingdom), or "fr-FR" for French (France). See Language Support for a list of the currently supported language codes.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     sample_rate: 16000

audio.language = "en-US"
audio.language #=> "en-US"

Returns:

  • (String, Symbol)


# File 'lib/google/cloud/speech/audio.rb', line 122

def language
  @language
end

#sample_rate ⇒ Integer

Sample rate in Hertz of the audio data to be recognized. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US"

audio.sample_rate = 16000
audio.sample_rate #=> 16000

Returns:

  • (Integer)


# File 'lib/google/cloud/speech/audio.rb', line 144

def sample_rate
  @sample_rate
end

Instance Method Details

#process(max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Operation Also known as: long_running_recognize, recognize_job

Performs asynchronous speech recognition. Requests are processed asynchronously, meaning an Operation is returned once the audio data has been sent; it can be refreshed to retrieve recognition results once the audio data has been processed.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000

op = audio.process
op.done? #=> false
op.reload!
op.done? #=> true
results = op.results
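
A minimal polling sketch, using only the done?, reload!, and results calls shown above; the five-second sleep interval is an arbitrary choice, not a library recommendation.

op = audio.process

# Refresh the operation until the service finishes processing the audio.
until op.done?
  sleep 5
  op.reload!
end

op.results.each do |result|
  puts result.transcript
end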

Parameters:

  • max_alternatives (Integer)

    The maximum number of recognition hypotheses to be returned. The service may return fewer. Valid values are 0-30. Defaults to 1. Optional.

  • profanity_filter (Boolean)

    When true, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". Default is false.

  • phrases (Array<String>)

    A list of strings containing word and phrase "hints", so that the speech recognizer is more likely to recognize them. See usage limits. Optional.

Returns:

  • (Operation)

    A resource that represents the long-running, asynchronous processing of a speech-recognition operation.

# File 'lib/google/cloud/speech/audio.rb', line 262

def process max_alternatives: nil, profanity_filter: nil,
            phrases: nil
  ensure_speech!

  speech.process self, encoding: encoding,
                       sample_rate: sample_rate,
                       language: language,
                       max_alternatives: max_alternatives,
                       profanity_filter: profanity_filter,
                       phrases: phrases
end

#recognize(max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Array<Result>

Performs synchronous speech recognition. Sends audio data to the Speech API, which performs recognition on that data, and returns results only after all audio has been processed. Limited to audio data of 1 minute or less in duration.

The Speech API will take roughly the same amount of time to process audio data sent synchronously as the duration of the supplied audio data. That is, if you send audio data of 30 seconds in length, expect the synchronous request to take approximately 30 seconds to return results.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000

results = audio.recognize
result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
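
A sketch passing the optional parameters described below; the alternative count, filter setting, and phrase hints are illustrative values, not recommendations.

results = audio.recognize max_alternatives: 3,
                          profanity_filter: true,
                          phrases: ["Brooklyn Bridge"]

results.each do |result|
  puts result.transcript
  puts result.confidence
end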

Parameters:

  • max_alternatives (Integer)

    The maximum number of recognition hypotheses to be returned. The service may return fewer. Valid values are 0-30. Defaults to 1. Optional.

  • profanity_filter (Boolean)

    When true, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". Default is false.

  • phrases (Array<String>)

    A list of strings containing word and phrase "hints", so that the speech recognizer is more likely to recognize them. See usage limits. Optional.

Returns:

  • (Array<Result>)

    The transcribed text of the recognized audio.

# File 'lib/google/cloud/speech/audio.rb', line 212

def recognize max_alternatives: nil, profanity_filter: nil, phrases: nil
  ensure_speech!

  speech.recognize self, encoding: encoding, sample_rate: sample_rate,
                         language: language,
                         max_alternatives: max_alternatives,
                         profanity_filter: profanity_filter,
                         phrases: phrases
end