Captions and Transcripts

People with hearing loss depend on media with highly accurate closed captions and transcripts (approximately 99% accurate). Many video platforms, including Zoom, Kaltura, and YouTube, provide automatic speech recognition (ASR) captions for uploaded videos.

ASR captions and transcripts are only about 80% accurate, which falls below the threshold for accessibility. Platforms that offer ASR captions also provide integrated editors for correcting them.
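The accuracy figures above (approximately 99% versus roughly 80%) can be made concrete with a small calculation. This page does not say how accuracy is measured; the sketch below assumes one common convention, word accuracy, defined as 1 minus the word error rate (word-level edit distance divided by the number of words in a human-verified reference transcript). The function name `word_accuracy` and the sample sentences are illustrative only.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word accuracy (1 - word error rate) of a caption text.

    Assumption: accuracy is measured at the word level against a
    human-verified reference transcript. Punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # No reference words: perfect only if the caption is also empty.
        return 1.0 if not hyp else 0.0

    # Word-level Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from caption
                            curr[j - 1] + 1,      # extra word in caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr

    errors = prev[-1]
    return max(0.0, 1.0 - errors / len(ref))


reference = "the quick brown fox jumps over the lazy dog today"
asr_output = "the quick brown box jumps over the hazy dog today"
print(f"{word_accuracy(reference, asr_output):.0%}")  # two wrong words out of ten
```

At 80% accuracy, about one word in five is wrong, which is why ASR output generally needs to be corrected in the platform's caption editor before it is relied on.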

Accurate captions and transcripts have many affordances that benefit everyone, regardless of ability. Some of those benefits include:

  • Improved note-taking.
  • Improved attention.
  • Improved comprehension for non-native speakers.

Learn more about transcription accuracy.

Captions vs. Subtitles


Captions display a synchronized text version of the speech (and contextual sound descriptions) in a video using the same language.

Closed vs. Open

  • Closed captions can be hidden by the video watcher.
  • Open captions are always displayed and cannot be turned off.


Subtitles display a synchronized text version of the speech translated into a different language.


Success Criteria for Transcripts


Basic transcripts are a text version of speech and contextually relevant sounds. They are used by individuals who are deaf, hard of hearing, or have difficulty processing auditory information. Descriptive transcripts also include visual information for individuals who are blind.


Some media players, such as Zoom and Kaltura, can use caption files to create interactive transcripts, which display the written speech alongside the video. Each phrase is highlighted in the transcript as it is spoken in the video. Interactive transcripts also allow individuals to move within the video by selecting text of interest.
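The two behaviors described above, highlighting the phrase being spoken and jumping to a selected phrase, both rely on the start and end times stored in a caption file. The following is a minimal sketch of that idea, not a description of how Zoom or Kaltura implement it. The `Cue` class, the function names, and the sample cues are hypothetical, and the cues are assumed to be sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cue:
    """One timed caption phrase (times in seconds)."""
    start: float
    end: float
    text: str


def active_cue_index(cues: Sequence[Cue], playback_time: float) -> Optional[int]:
    """Return the index of the cue to highlight, or None during silence."""
    starts = [cue.start for cue in cues]
    # Last cue whose start time is at or before the playback position.
    i = bisect_right(starts, playback_time) - 1
    if i >= 0 and playback_time < cues[i].end:
        return i
    return None


def seek_time_for(cues: Sequence[Cue], selected_index: int) -> float:
    """Return the time the player should jump to when a phrase is selected."""
    return cues[selected_index].start


cues = [
    Cue(0.0, 2.5, "Welcome to the workshop."),
    Cue(2.5, 5.0, "Today we will cover captions."),
    Cue(6.0, 8.0, "[applause]"),
]
print(active_cue_index(cues, 3.0))  # index of the phrase being spoken at 3.0 s
print(seek_time_for(cues, 2))       # where to jump when the third phrase is selected
```

Because both lookups depend entirely on cue timing, correcting the text and timing in the caption file also improves the interactive transcript.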

Techniques for Transcripts

Recorded Video Captions

Success Criteria for Recorded Video Captions

When preparing a video, accessibility should be considered in the initial planning.

  • Take steps to ensure the audio is high quality with minimal background noise, and that the presenter speaks clearly and at a moderate pace. This helps with comprehension as well as with captioning later.

  • The speaker’s face should be easily visible since some people watch mouth movement to assist in their comprehension.

  • Adding verbal descriptions of simple visual information can help blind and low-vision audience members, among others. For more complex information, you may need to add descriptions after the recording.

Techniques for Recorded Video Captions

Live Video Captions

Success Criteria for Live Video Captions

CART Captioning

CART stands for Communication Access Realtime Translation.

When hosting an event with a broad audience, such as a webinar, it is best to hire a professional CART vendor to provide accurate, real-time captions.

Likewise, if a student has a documented disability, they are entitled to receive reasonable accommodations, including human captioning. For more information about hiring a captioning vendor, contact the Disability Resource Center.

Live ASR Captions

For some smaller-scale presentations, hiring a human captioner may not always be possible, partly due to budget constraints.

ASR captions alone are not accurate enough to meet accessibility standards in recorded media, but they can still be valuable during live presentations. Zoom, however, does not currently include live ASR captions.

We recommend using one of two presentation applications available to members of the Rice community. Both applications include live ASR:

Techniques for Live Video Captions