Class: Google::Cloud::Speech::Project

Inherits:
Object
Defined in:
lib/google/cloud/speech/project.rb

Overview

Project

The Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage, by using the same technology Google uses to power its own products.

See Google::Cloud#speech

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
results = audio.recognize

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 88.15

Instance Method Summary

  • #audio(source, encoding: nil, sample_rate: nil, language: nil) ⇒ Audio
  • #project ⇒ Object
  • #recognize(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Array<Result>
  • #recognize_job(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Job

Instance Method Details

#audio(source, encoding: nil, sample_rate: nil, language: nil) ⇒ Audio

Returns a new Audio instance from the given source. No API call is made.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000

With a Google Cloud Storage URI:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

audio = speech.audio "gs://bucket-name/path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000

With a Google Cloud Storage File object:

require "google/cloud"

gcloud = Google::Cloud.new
storage = gcloud.storage

bucket = storage.bucket "bucket-name"
file = bucket.file "path/to/audio.raw"

speech = gcloud.speech

audio = speech.audio file, encoding: :raw, sample_rate: 16000

Parameters:

  • source (String, IO, Google::Cloud::Storage::File)

    The path to a local audio file to recognize, a File or other IO object containing the audio content, a Cloud Storage URI of the form "gs://bucket-name/path/to/audio.raw", or a Google::Cloud::Storage::File instance of the audio to be recognized.

  • encoding (String, Symbol)

    Encoding of audio data to be recognized. Optional.

    Acceptable values are:

    • raw - Uncompressed 16-bit signed little-endian samples. (LINEAR16)
    • flac - The Free Lossless Audio Codec encoding. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported. (FLAC)
    • mulaw - 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. (MULAW)
    • amr - Adaptive Multi-Rate Narrowband codec. (sample_rate must be 8000 Hz.) (AMR)
    • amr_wb - Adaptive Multi-Rate Wideband codec. (sample_rate must be 16000 Hz.) (AMR_WB)
  • sample_rate (Integer)

    Sample rate in Hertz of the audio data to be recognized. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). Optional.

  • language (String)

    The language of the supplied audio as a BCP-47 language code (https://www.rfc-editor.org/rfc/bcp/bcp47.txt). If not specified, the language defaults to "en-US". See Language Support for a list of the currently supported language codes. Optional.

Returns:

  • (Audio)

    The audio file to be recognized.

# File 'lib/google/cloud/speech/project.rb', line 168

def audio source, encoding: nil, sample_rate: nil, language: nil
  if source.is_a? Audio
    audio = source.dup
  else
    audio = Audio.from_source source, self
  end
  audio.encoding = encoding unless encoding.nil?
  audio.sample_rate = sample_rate unless sample_rate.nil?
  audio.language = language unless language.nil?
  audio
end
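
Because the implementation above duplicates an existing Audio when one is passed as the source, an Audio object can be reused with different settings. A minimal sketch, assuming the file path and the "es-ES" language code are only illustrative:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

# No API call is made by #audio; the settings are simply stored.
audio_en = speech.audio "path/to/audio.raw",
                        encoding: :raw, sample_rate: 16000

# Passing an Audio as the source duplicates it, so the original keeps
# its defaults while the copy overrides the language.
audio_es = speech.audio audio_en, language: "es-ES"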

#project ⇒ Object

The ID of the Google Cloud project connected to.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new "my-project-id",
                           "/path/to/keyfile.json"
speech = gcloud.speech

speech.project #=> "my-project-id"


# File 'lib/google/cloud/speech/project.rb', line 76

def project
  service.project
end

#recognize(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Array<Result>

Performs synchronous speech recognition. Sends audio data to the Speech API, which performs recognition on that data, and returns results only after all audio has been processed. Limited to audio data of 1 minute or less in duration.

The Speech API will take roughly the same amount of time to process audio data sent synchronously as the duration of the supplied audio data. That is, if you send audio data of 30 seconds in length, expect the synchronous request to take approximately 30 seconds to return results.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

results = speech.recognize "path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000

With a Google Cloud Storage URI:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

results = speech.recognize "gs://bucket-name/path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000

With a Google Cloud Storage File object:

require "google/cloud"

gcloud = Google::Cloud.new
storage = gcloud.storage

bucket = storage.bucket "bucket-name"
file = bucket.file "path/to/audio.raw"

speech = gcloud.speech

results = speech.recognize file, encoding: :raw,
                           sample_rate: 16000,
                           max_alternatives: 10

Parameters:

  • source (String, IO, Google::Cloud::Storage::File)

    The path to a local audio file to recognize, a File or other IO object containing the audio content, a Cloud Storage URI of the form "gs://bucket-name/path/to/audio.raw", or a Google::Cloud::Storage::File instance of the audio to be recognized.

  • encoding (String, Symbol)

    Encoding of audio data to be recognized. Optional.

    Acceptable values are:

    • raw - Uncompressed 16-bit signed little-endian samples. (LINEAR16)
    • flac - The Free Lossless Audio Codec encoding. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported. (FLAC)
    • mulaw - 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. (MULAW)
    • amr - Adaptive Multi-Rate Narrowband codec. (sample_rate must be 8000 Hz.) (AMR)
    • amr_wb - Adaptive Multi-Rate Wideband codec. (sample_rate must be 16000 Hz.) (AMR_WB)
  • sample_rate (Integer)

    Sample rate in Hertz of the audio data to be recognized. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). Optional.

  • language (String)

    The language of the supplied audio as a BCP-47 language code (https://www.rfc-editor.org/rfc/bcp/bcp47.txt). If not specified, the language defaults to "en-US". See Language Support for a list of the currently supported language codes. Optional.

  • max_alternatives (Integer)

    The maximum number of recognition hypotheses to be returned. Valid values are 0-30; the service may return fewer. Defaults to 1. Optional.

  • profanity_filter (Boolean)

    When true, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". Default is false.

  • phrases (Array<String>)

    A list of strings containing words and phrases "hints" so that the speech recognition is more likely to recognize them. See usage limits. Optional.

Returns:

  • (Array<Result>)

    The transcribed text of the recognized audio.

# File 'lib/google/cloud/speech/project.rb', line 278

def recognize source, encoding: nil, sample_rate: nil, language: nil,
              max_alternatives: nil, profanity_filter: nil, phrases: nil
  ensure_service!

  audio_obj = audio source, encoding: encoding,
                            sample_rate: sample_rate, language: language

  config = audio_config(
    encoding: audio_obj.encoding, sample_rate: audio_obj.sample_rate,
    language: audio_obj.language, max_alternatives: max_alternatives,
    profanity_filter: profanity_filter, phrases: phrases)

  grpc = service.recognize_sync audio_obj.to_grpc, config
  grpc.results.map do |result_grpc|
    Result.from_grpc result_grpc
  end
end
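
As a usage sketch (the file path, phrase hint, and settings below are illustrative), the returned array of Result objects can be iterated directly; #transcript and #confidence are the accessors shown in the overview example:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

# Phrase "hints" and a higher max_alternatives are optional tuning knobs.
results = speech.recognize "path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000,
                           max_alternatives: 3,
                           phrases: ["Brooklyn Bridge"]

results.each do |result|
  puts "#{result.transcript} (confidence: #{result.confidence})"
end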

#recognize_job(source, encoding: nil, sample_rate: nil, language: nil, max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Job

Performs asynchronous speech recognition. Requests are processed asynchronously, meaning a Job is returned once the audio data has been sent, and can be refreshed to retrieve recognition results once the audio data has been processed.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

job = speech.recognize_job "path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000

job.done? #=> false
job.reload!

With a Google Cloud Storage URI:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

job = speech.recognize_job "gs://bucket-name/path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000

job.done? #=> false
job.reload!

With a Google Cloud Storage File object:

require "google/cloud"

gcloud = Google::Cloud.new
storage = gcloud.storage

bucket = storage.bucket "bucket-name"
file = bucket.file "path/to/audio.raw"

speech = gcloud.speech

job = speech.recognize_job file, encoding: :raw,
                           sample_rate: 16000,
                           max_alternatives: 10

job.done? #=> false
job.reload!

Parameters:

  • source (String, IO, Google::Cloud::Storage::File)

    The path to a local audio file to recognize, a File or other IO object containing the audio content, a Cloud Storage URI of the form "gs://bucket-name/path/to/audio.raw", or a Google::Cloud::Storage::File instance of the audio to be recognized.

  • encoding (String, Symbol)

    Encoding of audio data to be recognized. Optional.

    Acceptable values are:

    • raw - Uncompressed 16-bit signed little-endian samples. (LINEAR16)
    • flac - The Free Lossless Audio Codec encoding. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported. (FLAC)
    • mulaw - 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. (MULAW)
    • amr - Adaptive Multi-Rate Narrowband codec. (sample_rate must be 8000 Hz.) (AMR)
    • amr_wb - Adaptive Multi-Rate Wideband codec. (sample_rate must be 16000 Hz.) (AMR_WB)
  • sample_rate (Integer)

    Sample rate in Hertz of the audio data to be recognized. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). Optional.

  • language (String)

    The language of the supplied audio as a BCP-47 language code (https://www.rfc-editor.org/rfc/bcp/bcp47.txt). If not specified, the language defaults to "en-US". See Language Support for a list of the currently supported language codes. Optional.

  • max_alternatives (Integer)

    The maximum number of recognition hypotheses to be returned. Valid values are 0-30; the service may return fewer. Defaults to 1. Optional.

  • profanity_filter (Boolean)

    When true, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". Default is false.

  • phrases (Array<String>)

    A list of strings containing words and phrases "hints" so that the speech recognition is more likely to recognize them. See usage limits. Optional.

Returns:

  • (Job)

    A resource representing the long-running, asynchronous processing of a speech-recognition operation.

# File 'lib/google/cloud/speech/project.rb', line 386

def recognize_job source, encoding: nil, sample_rate: nil,
                  language: nil, max_alternatives: nil,
                  profanity_filter: nil, phrases: nil
  ensure_service!

  audio_obj = audio source, encoding: encoding,
                            sample_rate: sample_rate, language: language

  config = audio_config(
    encoding: audio_obj.encoding, sample_rate: audio_obj.sample_rate,
    language: audio_obj.language, max_alternatives: max_alternatives,
    profanity_filter: profanity_filter, phrases: phrases)

  grpc = service.recognize_async audio_obj.to_grpc, config
  Job.from_grpc grpc, service
end
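
A polling sketch for the returned Job. Only #done? and #reload! appear in the examples above; the #results accessor used at the end is an assumption about the Job class and may not match every version of the gem:

require "google/cloud"

gcloud = Google::Cloud.new
speech = gcloud.speech

job = speech.recognize_job "gs://bucket-name/path/to/audio.raw",
                           encoding: :raw, sample_rate: 16000

# Refresh the job until the service reports the operation as finished.
until job.done?
  sleep 2
  job.reload!
end

# Assumed accessor: Job#results returns the recognized results once done.
job.results.each { |result| puts result.transcript }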