Deepgram Provider

The Deepgram provider contains model support for the Deepgram transcription and speech (text-to-speech) APIs.

Setup

The Deepgram provider is available in the @ai-sdk/deepgram module. You can install it with:

pnpm add @ai-sdk/deepgram
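
Alternatively, with npm or yarn:

npm install @ai-sdk/deepgram

yarn add @ai-sdk/deepgram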

Provider Instance

You can import the default provider instance deepgram from @ai-sdk/deepgram:

import { deepgram } from '@ai-sdk/deepgram';

If you need a customized setup, you can import createDeepgram from @ai-sdk/deepgram and create a provider instance with your settings:

import { createDeepgram } from '@ai-sdk/deepgram';
const deepgram = createDeepgram({
  // custom settings, e.g.
  fetch: customFetch,
});

You can use the following optional settings to customize the Deepgram provider instance:

  • apiKey string

    API key that is sent using the Authorization header. It defaults to the DEEPGRAM_API_KEY environment variable.

  • headers Record<string,string>

    Custom headers to include in the requests.

  • fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>

    Custom fetch implementation. Defaults to the global fetch function. You can use it as middleware to intercept requests, or to provide a custom fetch implementation for testing, as in the sketch below.
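
For instance, a wrapped fetch can log outgoing requests before delegating to the global implementation. This is a minimal sketch; the logging shown is purely illustrative:

import { createDeepgram } from '@ai-sdk/deepgram';

const deepgram = createDeepgram({
  // log each outgoing request, then delegate to the global fetch
  fetch: async (input, init) => {
    console.log('Deepgram request:', input);
    return fetch(input, init);
  },
});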

Speech Models

You can create models that call the Deepgram text-to-speech API using the .speech() factory method.

The first argument is the model id. Deepgram embeds the voice directly in the model id (e.g., aura-2-helena-en), so no separate voice setting is needed.

const model = deepgram.speech('aura-2-helena-en');

You can use the model with the generateSpeech function:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { deepgram } from '@ai-sdk/deepgram';
const result = await generateSpeech({
  model: deepgram.speech('aura-2-helena-en'),
  text: 'Hello, world!',
});

You can also pass additional provider-specific options using the providerOptions argument:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { deepgram } from '@ai-sdk/deepgram';
const result = await generateSpeech({
  model: deepgram.speech('aura-2-helena-en'),
  text: 'Hello, world!',
  providerOptions: {
    deepgram: {
      encoding: 'linear16',
      sampleRate: 24000,
    },
  },
});

The following provider options are available:

  • encoding string

    Encoding type for the audio output. Supported values: 'linear16', 'mulaw', 'alaw', 'mp3', 'opus', 'flac', 'aac'. Optional.

  • container string

    Container format for the output audio. Supported values: 'wav', 'ogg', 'none'. Optional.

  • sampleRate number

    Sample rate for the output audio in Hz. Supported values depend on the encoding: 8000, 16000, 24000, 32000, 48000. Optional.

  • bitRate number | string

    Bitrate of the audio in bits per second. For mp3: 32000 or 48000. For opus: 4000 to 650000. For aac: 4000 to 192000. Optional.

  • callback string

    URL to which Deepgram will make a callback request with the audio. Optional.

  • callbackMethod enum

    HTTP method for the callback request. Allowed values: 'POST', 'PUT'. Optional.

  • mipOptOut boolean

    Opts out requests from the Deepgram Model Improvement Program. Optional.

  • tag string | array of strings

    Label your requests for identification during usage reporting. Optional.
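
As a further illustration, several of these options can be combined in a single call. The sketch below requests mp3 output at one of the supported bitrates and writes the result to a file; the file name is illustrative, and it assumes the generated audio bytes are available on result.audio.uint8Array:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { deepgram } from '@ai-sdk/deepgram';
import { writeFile } from 'fs/promises';

const result = await generateSpeech({
  model: deepgram.speech('aura-2-helena-en'),
  text: 'Hello, world!',
  providerOptions: {
    deepgram: {
      encoding: 'mp3', // mp3-encoded audio output
      bitRate: 48000,  // one of the supported mp3 bitrates
    },
  },
});

// write the generated audio to disk (illustrative file name)
await writeFile('hello.mp3', result.audio.uint8Array);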

Model Capabilities

The following speech models (voices) are available:

  • aura-2-asteria-en
  • aura-2-thalia-en
  • aura-2-helena-en
  • aura-2-orpheus-en
  • aura-2-zeus-en
  • aura-asteria-en
  • aura-luna-en
  • aura-stella-en

Additional voices are also available.

Transcription Models

You can create models that call the Deepgram transcription API using the .transcription() factory method.

The first argument is the model id, e.g. nova-3.

const model = deepgram.transcription('nova-3');

You can also pass additional provider-specific options using the providerOptions argument. For example, supplying the summarize option will enable summaries for sections of content.

import { experimental_transcribe as transcribe } from 'ai';
import { deepgram } from '@ai-sdk/deepgram';
import { readFile } from 'fs/promises';
const result = await transcribe({
  model: deepgram.transcription('nova-3'),
  audio: await readFile('audio.mp3'),
  providerOptions: { deepgram: { summarize: true } },
});

The following provider options are available:

  • language string

    Language code for the audio. Supports numerous ISO-639-1 and ISO-639-3 language codes. Optional.

  • detectLanguage boolean

    Whether to enable automatic language detection. When true, Deepgram will detect the language of the audio. Optional.

  • smartFormat boolean

    Whether to apply smart formatting to the transcription. Optional.

  • punctuate boolean

    Whether to add punctuation to the transcription. Optional.

  • summarize enum | boolean

    Whether to generate a summary of the transcription. Allowed values: 'v2', false. Optional.

  • topics boolean

    Whether to detect topics in the transcription. Optional.

  • detectEntities boolean

    Whether to detect entities in the transcription. Optional.

  • redact string | array of strings

    Specifies what content to redact from the transcription. Optional.

  • search string

    Search term to find in the transcription. Optional.

  • diarize boolean

    Whether to identify different speakers in the transcription. Defaults to true. Optional.

  • utterances boolean

    Whether to segment the transcription into utterances. Optional.

  • uttSplit number

    Length of silence, in seconds, used as the threshold for splitting utterances. Optional.

  • fillerWords boolean

    Whether to include filler words (um, uh, etc.) in the transcription. Optional.
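
Multiple transcription options can likewise be combined in one call. The sketch below enables smart formatting and speaker diarization for an English recording and reads the transcript from the standard transcribe result; the file name is illustrative:

import { experimental_transcribe as transcribe } from 'ai';
import { deepgram } from '@ai-sdk/deepgram';
import { readFile } from 'fs/promises';

const result = await transcribe({
  model: deepgram.transcription('nova-3'),
  audio: await readFile('meeting.mp3'), // illustrative file name
  providerOptions: {
    deepgram: {
      smartFormat: true, // apply smart formatting
      diarize: true,     // label different speakers
      language: 'en',    // specify the audio language explicitly
    },
  },
});

console.log(result.text);     // full transcript text
console.log(result.segments); // timed segments, when the model provides them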

Model Capabilities

Model | Transcription | Duration | Segments | Language
nova-3 (+ variants)
nova-2 (+ variants)
nova (+ variants)
enhanced (+ variants)
base (+ variants)