Rev.ai Provider

The Rev.ai provider contains language model support for the Rev.ai transcription API.

Setup

The Rev.ai provider is available in the @ai-sdk/revai module. You can install it with

pnpm add @ai-sdk/revai

Provider Instance

You can import the default provider instance revai from @ai-sdk/revai:

import { revai } from '@ai-sdk/revai';

If you need a customized setup, you can import createRevai from @ai-sdk/revai and create a provider instance with your settings:

import { createRevai } from '@ai-sdk/revai';
const revai = createRevai({
// custom settings, e.g.
fetch: customFetch,
});

You can use the following optional settings to customize the Rev.ai provider instance:

  • apiKey string

    API key that is being sent using the Authorization header. It defaults to the REVAI_API_KEY environment variable.

  • headers Record<string,string>

    Custom headers to include in the requests.

  • fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>

    Custom fetch implementation. Defaults to the global fetch function. You can use it as a middleware to intercept requests, or to provide a custom fetch implementation for e.g. testing.

Transcription Models

You can create models that call the Rev.ai transcription API using the .transcription() factory method.

The first argument is the model id e.g. machine.

const model = revai.transcription('machine');

You can also pass additional provider-specific options using the providerOptions argument. For example, supplying the input language in ISO-639-1 (e.g. en) format can sometimes improve transcription performance if known beforehand.

import { experimental_transcribe as transcribe } from 'ai';
import { revai } from '@ai-sdk/revai';
import { readFile } from 'fs/promises';
const result = await transcribe({
model: revai.transcription('machine'),
audio: await readFile('audio.mp3'),
providerOptions: { revai: { language: 'en' } },
});

The following provider options are available:

  • metadata string

    Optional metadata string to associate with the transcription job.

  • notification_config object

    Configuration for webhook notifications when job is complete.

    • url string - URL to send the notification to.
    • auth_headers object - Optional authorization headers for the notification request.
      • Authorization string - Authorization header value.
  • delete_after_seconds integer

    Number of seconds after which the job will be automatically deleted.

  • verbatim boolean

    Whether to include filler words and false starts in the transcription.

  • rush boolean

    [HIPAA Unsupported] Whether to prioritize the job for faster processing. Only available for human transcriber option.

  • test_mode boolean

    Whether to run the job in test mode. Default is false.

  • segments_to_transcribe Array

    Specific segments of the audio to transcribe.

    • start number - Start time of the segment in seconds.
    • end number - End time of the segment in seconds.
  • speaker_names Array

    Names to assign to speakers in the transcription.

    • display_name string - Display name for the speaker.
  • skip_diarization boolean

    Whether to skip speaker diarization. Default is false.

  • skip_postprocessing boolean

    Whether to skip post-processing steps. Only available for English and Spanish languages. Default is false.

  • skip_punctuation boolean

    Whether to skip adding punctuation to the transcription. Default is false.

  • remove_disfluencies boolean

    Whether to remove disfluencies (um, uh, etc.) from the transcription. Default is false.

  • remove_atmospherics boolean

    Whether to remove atmospheric sounds (like <laugh>, <affirmative>) from the transcription. Default is false.

  • filter_profanity boolean

    Whether to filter profanity from the transcription by replacing characters with asterisks except for the first and last. Default is false.

  • speaker_channels_count integer

    Number of speaker channels in the audio. Only available for English, Spanish and French languages.

  • speakers_count integer

    Expected number of speakers in the audio. Only available for English, Spanish and French languages.

  • diarization_type string

    Type of diarization to use. Possible values: "standard" (default), "premium".

  • custom_vocabulary_id string

    ID of a custom vocabulary to use for the transcription, submitted through the Custom Vocabularies API.

  • custom_vocabularies Array

    Custom vocabularies to use for the transcription.

  • strict_custom_vocabulary boolean

    Whether to strictly enforce custom vocabulary.

  • summarization_config object

    Configuration for generating a summary of the transcription.

    • model string - Model to use for summarization. Possible values: "standard" (default), "premium".
    • type string - Format of the summary. Possible values: "paragraph" (default), "bullets".
    • prompt string - Custom prompt for the summarization (mutually exclusive with type).
  • translation_config object

    Configuration for translating the transcription.

    • target_languages Array - Target languages for translation. Each item is an object with:
      • language string - Language code. Possible values: "en", "en-us", "en-gb", "ar", "pt", "pt-br", "pt-pt", "fr", "fr-ca", "es", "es-es", "es-la", "it", "ja", "ko", "de", "ru".
    • model string - Model to use for translation. Possible values: "standard" (default), "premium".
  • language string

    Language of the audio content, provided as an ISO 639-1 language code. Default is "en".

  • forced_alignment boolean

    Whether to perform forced alignment, which provides improved accuracy for per-word timestamps. Default is false.

    Currently supported languages:

    • English (en, en-us, en-gb)
    • French (fr)
    • Italian (it)
    • German (de)
    • Spanish (es)

    Note: This option is not available in low-cost environment.

Model Capabilities

ModelTranscriptionDurationSegmentsLanguage
machine
low_cost
fusion