Captions and Transcripts

Many video platforms in use at Rice, including Zoom, Kaltura, and YouTube, provide automatic speech recognition (ASR) captions for uploaded videos. ASR captions and transcripts are only roughly 80% accurate, which falls below the accessibility threshold. Platforms with ASR captions have integrated editors where content creators can correct spelling, punctuation, and other errors to ensure accuracy.

Accurate captions and transcripts benefit everyone, but remember that they are essential for some disabled people. Some of the broader benefits include:

Improved note-taking.
Improved attention.
Improved comprehension for non-native speakers.

Learn more about transcription accuracy.
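
To make the accuracy figure above more concrete, here is a minimal Python sketch that compares an ASR caption against a human-corrected reference using word error rate (WER), a common way to score speech recognition output. This resource does not state how the roughly 80% figure was measured, so the metric, the function name `word_error_rate`, and the sample sentences are all illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Assumption: words are compared case-insensitively and split on
    whitespace only, so punctuation attached to a word counts as part of it.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the caption
                            curr[j - 1] + 1,      # extra word in the caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the ASR output gets one of five words wrong.
corrected = "captions are essential for access"
asr_output = "captions are a central for access"
accuracy = 1 - word_error_rate(corrected, asr_output)
```

A score like this only counts word-level differences. It says nothing about punctuation, speaker labels, or timing, so it is a rough check and not a substitute for reviewing captions in the platform's editor.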

Captions vs. Subtitles

Captions

Captions display a synchronized text version of the speech (and contextual sound descriptions) in a video using the same language.

Closed vs. Open

The video watcher can hide closed captions.
Open captions are always displayed and cannot be turned off.

Subtitles

Subtitles display a synchronized text version of the speech translated into a different language.

Transcripts

Success Criteria for Transcripts

1.2.1. Alternatives for Prerecorded Audio-Only and Video-Only Content (A)

Basic

Basic transcripts are a text version of speech and contextually relevant sounds. They are used by individuals who are deaf, hard of hearing, or have difficulty processing auditory information. Descriptive transcripts also include visual information for individuals who are blind.

Interactive

Some media players, such as Zoom and Kaltura, can create interactive transcripts from caption files, displaying the written speech alongside the video. Each phrase is highlighted in the displayed transcript as it is spoken in the video. Interactive transcripts allow individuals to move within the video by selecting text of interest.
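
The behavior described above depends on the timing information stored in a caption file. The following Python sketch illustrates the general idea using the WebVTT caption format: it reads timed cues, finds the cue to highlight at a given playback time, and looks up the time to jump to when a phrase is selected. It is not how Zoom or Kaltura implement the feature; the sample captions and the names `Cue`, `parse_cues`, and `active_cue` are hypothetical.

```python
import re
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


# Matches a WebVTT timing line such as "00:00:04.000 --> 00:00:09.500".
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt: str) -> List[Cue]:
    """Collect cues from blank-line-separated blocks; blocks without timing are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(to_seconds(*parts[:4]), to_seconds(*parts[4:]), text))
                break
    return cues


def active_cue(cues: List[Cue], playback_time: float) -> Optional[int]:
    """Return the index of the cue to highlight, assuming cues are sorted by start time."""
    starts = [cue.start for cue in cues]
    index = bisect_right(starts, playback_time) - 1
    if index >= 0 and playback_time < cues[index].end:
        return index
    return None


# Hypothetical caption file contents.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:04.000
Welcome to the course.

00:00:04.000 --> 00:00:09.500
Today we will cover captions.
"""

cues = parse_cues(SAMPLE)
highlighted = active_cue(cues, 5.0)  # index of the phrase being spoken at 5 seconds
seek_to = cues[1].start              # where the player would jump if the second phrase is selected
```

Because highlighting and navigation both rely on cue timing, captions that have been edited for accuracy and kept in sync with the audio also make the interactive transcript more useful.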

Techniques for Transcripts

Recorded Video Captions

Success Criteria for Recorded Video Captions

When preparing a video, accessibility should be considered in the initial planning.

Take steps to ensure the audio will be high-quality with few background noises and that the presenter speaks clearly and not too fast. This will help with comprehension as well as captioning later.
The speaker’s face should be easily visible since some people watch mouth movement to assist in their comprehension.
Adding verbal descriptions of simple visual information can help blind and low-vision audience members, among others. For more complex information, you may need to add descriptions after the recording.

Techniques for Recorded Video Captions

Live Video Captions

Success Criteria for Live Video Captions

1.2.4. Captions (Live) (AA)

CART Captioning

CART stands for Communication Access Realtime Translation.

When hosting an event with a broad audience, such as a webinar, it is best to hire a professional CART vendor to provide accurate, real-time captions.

Likewise, if a student has a documented disability, they are entitled to reasonable accommodations, including human captioning. For more information about hiring a captioning vendor, contact the Disability Resource Center.

Live ASR Captions

For some smaller-scale presentations, hiring a human captioner may not always be possible, partly due to budget constraints.

ASR captions alone are not accurate enough to meet accessibility standards in recorded media. However, ASR captions can still be valuable during live presentations.

Zoom currently offers live ASR captions, and participants can turn on this feature in any Zoom meeting by using the Show Captions option in the actions bar at the bottom of the meeting room screen. Encourage colleagues and students to use this feature, as they can follow along with a live transcript or change the language in which captions are displayed.

We recommend using one of two presentation applications available to members of the Rice community. Both applications include live ASR:

PowerPoint 365
Google Slides

Techniques for Live Video Captions

Creating captions for live synchronized media

Description of Visual Information

Description of visual information is called audio description. It provides content to people who are blind and to others who cannot see or distinguish what is occurring in a video (W3C). Audio description explains visual information that is needed to understand the content of the video.

Often, description can be integrated into the script of the video as the speaker explains the content. This approach, called integrated description, lets viewers with visual impairments fully understand the content being shown. View an example of a training video with the description integrated in what the trainer is saying (YouTube).

A different approach to ensuring equal access to the visual information in a video is to create a separate version that includes description. This is usually done after the initial video is created and uses a different voice. View an example of an alternative story video with audio description in a different voice (YouTube).

Does My Media Need Description?

This section from the W3C tells you:

What is required in the WCAG standard at Level A, AA, and AAA. (WCAG is introduced in the Planning page of this resource.)
What is needed to meet user needs, beyond WCAG. If there are no “A”s, then it is not required in WCAG.