An e-learning chemistry lesson: how to mix text and audio

I came away from last week’s eLearning Network event on rich media with a long list of takeaway lessons, things to try and topics to explore further. (I wasn’t the only one, as the Twitter backchannel shows.) One of these is the enduring question of how to use text and audio within e-learning, which prompted some debate and some interesting experiment results.

During Clive Shepherd’s session on media chemistry, we were separated into groups, each considering one element of online communication (text, images, audio, animation or video) and establishing its advantages and disadvantages.

We also had to identify the other element that it is most compatible with (simultaneously, as opposed to sequentially throughout an e-learning course). This opened up the discussion about the relationship between text and audio.

From what I could gather, there were three views:

  • Text and audio (speech, as opposed to music or sound effects) should not be used together.

This is the key message from Clive’s handbook on media chemistry, in which he says: ‘As a verbal element, text clashes badly with a second verbal element such as speech. Text plus speech causes all sorts of confusion and overload for the user. The brain cannot process two verbal inputs simultaneously, so the user has to block out one element (usually speech because this is conveyed much more slowly than text) in order to concentrate on the other.’

  • Text and audio can be used together, but only if they present different information – so the text shouldn’t be a verbatim transcript of the audio.

I posted the question on Twitter during the discussion and got a couple of responses from people who are against using audio as a verbatim accompaniment to text – presumably because it doesn’t add anything – but are in favour of using audio alongside text, as long as they don’t say the same thing. For example, you might have three sentences of text, highlighting the three key points from that screen, with an audio narrative that elaborates on those points.

  • Text and audio can be used together, as long as they present the same information verbatim.

This is the approach I’ve used most often myself, but always with the option for the learner to switch the audio off. (This is probably part of the reason why I take this approach in e-learning, but not in live presentations.) People learn in different ways so I like to offer this degree of choice. I’m much happier reading text at my own pace, and find it frustrating if my pace is dictated by the pace of pre-recorded audio. I also don’t respond so well to audio alone. I’ve never been much of a radio-listener, for example, as I find myself tuning out very quickly, even if I switched it on for something particular like a weather forecast or travel news. So while I can see Clive’s point when he says that ‘if the words [from the audio] are replicated on the screen as text, the user stands to be confused and frustrated’, I don’t entirely agree. Yes, maybe they are likely to switch one or the other off, but I would find myself far more frustrated by being forced to rely on audio alone than by being offered the choice.

Some of these views were put to the test in the next session, during which Tony Frascina conducted a little experiment. Tony had prepared three passages, each demonstrating a different text/audio relationship. We were asked to read and listen to each passage in turn, answering a series of questions after each one:

  • A passage on Komodo dragons, with very little text but lots of supporting audio, accompanied by graphics with key words as labels.
  • A passage on the quickstep, where the text and audio made the same point, but with slightly different words and in a slightly different sequence.
  • A passage on violin bows, where the audio script had been written first and the text was a slightly abbreviated version of that script.

Which test do you think we (on average) scored best on? Out of a possible 11 points, the average scores were:

  • 4.5 for the Komodo dragon piece.
  • 5.9 for the quickstep piece.
  • 6.3 for the violin bow piece.

I would have liked to take it a little further, adding two more passages so we could see average scores for a text-only passage and an audio-only passage. But even these three results provide interesting food for thought. The experiment has taught me two lessons:

  • The approach I’ve used in the past is not as bad as some people would have you believe. Providing text with verbatim audio is not detrimental to learning; as long as there is an option to switch the audio off (and perhaps also an option to hide the text), it simply gives learners a little more control over their learning experience.
  • I will continue to veer away from providing basic text with more detail or alternative wording in the audio narrative – unless I find evidence to the contrary! Tony’s experiment supports Clive’s assertion that combining two verbal channels (where one can’t simply be switched off without the experience losing something) can be detrimental to learning.

I’d be very interested to hear any alternative views, or further arguments in support of or against any of the ideas mentioned above; I have a feeling that this is a question which will continue to be debated in the e-learning world for some time!

Image: Renjith Krishnan

7 thoughts on “An e-learning chemistry lesson: how to mix text and audio”

  1. Pingback: A word of warning, be careful… | Tayloring it…

  2. Karen Mardahl

    Was captioning considered in this mix? I’m thinking of students who are hard of hearing or deaf. Learning disabilities can also play a role in deciding the mix. Too much input (text and sound) could confuse. Maybe I am going beyond the constraints of the workshop, but I am guessing you do need to analyze audience needs first?

    1. Stephanie Dedhar Post author

      Hi Karen – thanks for your comment. I think accessibility was discussed briefly but have to confess I don’t have many notes on that part of the conversation. What I would say though is that I think again this comes back to the question of choice. If all the content is shown on screen, and they have the option to switch the audio off, learners with hearing impairments should have just as good an experience as anyone else. Visually impaired learners will usually have access to screenreader software, which does require some thought and care on the designer’s part, but also means that this is slightly separate from the audio/text question for learners who don’t have an impairment. Accessibility isn’t my area of expertise though, so maybe someone will come along and provide a more enlightening response to your question!

  3. Karen Mardahl

    Hi Stephanie – You did answer under “question of choice”. 🙂 The planning of the setup should consider this issue as part of the overall strategy, or content strategy, if you will. People just need to remember to include accessibility as one of the many requirements on Day 1 of planning. I am mostly concerned that presenters on this sort of topic remember to teach that – otherwise their students (the attendees at this event) go off without learning the basic principles. One never knows when the person using the e-learning tool will have some sort of disability (vision, motor skills, hearing, cognition). It’s best to be prepared!

  4. Lawal Muhammad

    Interesting post. I agree with both points with regard to providing the learner with choice. I would worry that deciding not to match the text and audio (verbatim) and trying to provide a summarised interpretation (of either text or audio) could potentially disadvantage some of my learners. Surely their preferred learning style (only text/only audio/text and audio) shouldn’t stop them from receiving the complete message?

    1. Stephanie Dedhar Post author

      I completely agree Lawal. I would want all learners to have an equally positive experience, so wouldn’t design e-learning which is dependent on audio. (This also means people who can’t play audio for whatever reason aren’t disadvantaged either.) That’s how I see it for now anyway, but happy to change my views if I come across evidence to the contrary!

  5. Nick Shackleton-Jones

    If this is an area that interests you, it might be worth having a look at the Working Memory Model (Baddeley & Hitch) which explains your findings as well as some other things you will encounter in learning design (there’s a diagram here: ). It turns out that people process text using a phonological store (effectively it is converted to sounds & processed by the corresponding system) – this is why it is near impossible to follow two conversations at once – we have one attentional channel. But there is a bit of wiggle-room: we have an acoustic loop which works like 30 seconds of looped audio tape allowing us to recall exactly what we just heard: this is why someone can ask you a question when you are reading, and you can look up a few seconds later and recall the question. So SKY sports works fine: you can listen to the presenter and swap to the aston text, keeping what they are saying in the loop until you can return to it. This pretty much explains what you can and can’t do with text and audio: you can have a vo and small chunks of text on screen – you can’t have a vo and a large chunk of text on the screen: people can handle swapping in the former case, but not in the latter.

