Should the transcription of voice recordings to text oblige regulated trading firms to retain those recordings longer?

Over the years, while speaking to compliance officers and other market professionals, I have often been asked the question above. In this post I share my thoughts and experiences and try to answer a question which, if anything, is more relevant than ever.

We can start by noting that in some jurisdictions the required retention period differs between textual electronic communications, such as email and Bloomberg messages, and voice recordings.

MiFID II and Voice Recordings

In Europe this difference in the retention period is removed by the rules proposed by the European Securities and Markets Authority (ESMA) under the MiFID II regulation. Under MiFID II, all records, text and voice, will have to be retained for five years. By contrast, in the United States the Commodity Futures Trading Commission’s (CFTC) rules require voice calls to be retained for one year, while text communications are retained for five years.[1]

MiFID II also requires firms to conduct surveillance of their employees’ communications to ensure the firm is compliant with market rules. Effective surveillance of voice calls will require technologies that can extract the ‘substance’ of these highly specialized conversations and deliver risk- and scenario-based insights to a knowledgeable reviewer, who can then determine whether any wrongdoing has occurred.

Speech-to-text is a technology that transcribes what is said on a voice recording into a textual representation of the call. The question being asked is:

“Does this text, produced from my call recording, make my call an electronic communication?”

My experience and discussions with global regulators lead me to a very simple answer to this question:

“No, the derived transcribed text record is not a communication per se and therefore cannot be included in the communications retention rules”.

One should be careful to note that:

  • Converting speech to text does not allow a firm to discard the original recording – as that would be a clear breach of the requirements. The original record must always be preserved.
  • Once the call has been analyzed and a text version created, there is no requirement to retain the transcription.
  • The associated metadata of a call, such as the time, date and numbers dialed, along with the servers and systems that a call is tagged with, are not derivative data. All call metadata must be retained for as long as the call audio itself is retained.
  • If the transcribed data is re-sent by electronic means (email etc.) then that new communication shall be retained for the appropriate amount of time as a communication in its own right.
  • If a call transcript is available and has been used as part of the surveillance process to flag a call for further review, then in the event of a regulator request for call logs this should be noted in your submission to the regulator, in the same way that you would note that an email you deliver had already been reviewed.
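
As a rough illustration, the points above could be encoded in a records-management system along the following lines. This is only a sketch in Python under stated assumptions: the names RecordKind, RETENTION_YEARS and required_retention_years are hypothetical, the periods are simply the figures quoted in this post, and the footnoted extension is not modeled.

```python
from enum import Enum
from typing import Optional


class RecordKind(Enum):
    """Hypothetical record categories, following the bullet points above."""
    VOICE_RECORDING = "voice_recording"
    CALL_METADATA = "call_metadata"            # time, date, numbers dialed, ...
    TEXT_COMMUNICATION = "text_communication"  # email, chat, or a transcript that was re-sent
    DERIVED_TRANSCRIPT = "derived_transcript"  # speech-to-text output that was never sent on


# Assumption: retention periods in years, using only the figures quoted in this post.
RETENTION_YEARS = {
    "MIFID_II": {RecordKind.VOICE_RECORDING: 5, RecordKind.TEXT_COMMUNICATION: 5},
    "CFTC": {RecordKind.VOICE_RECORDING: 1, RecordKind.TEXT_COMMUNICATION: 5},
}


def required_retention_years(kind: RecordKind, regime: str) -> Optional[int]:
    """Return the retention period in years, or None if nothing is required."""
    rules = RETENTION_YEARS[regime]
    if kind is RecordKind.DERIVED_TRANSCRIPT:
        # A derived transcript is not a communication in its own right.
        return None
    if kind is RecordKind.CALL_METADATA:
        # Metadata is kept for as long as the call audio itself.
        return rules[RecordKind.VOICE_RECORDING]
    return rules[kind]


if __name__ == "__main__":
    for regime in RETENTION_YEARS:
        for kind in RecordKind:
            print(regime, kind.value, required_retention_years(kind, regime))
```

The design choice worth noting is that the transcript has no period of its own: creating it leaves the period for the voice recording unchanged, and only re-sending it turns it into a text communication with its own retention period.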

There are more arguments that suggest these transcripts should not change the retention period.

New Technologies for Compliant Call Capture, Transcription

Transcription technologies have improved substantially in accuracy and language coverage in recent years. However, no technology will provide 100% accuracy. As a result, while incredibly useful for surveillance and productivity, a transcription cannot be said to be a facsimile of the original recording. It may contain errors, and it is much harder to convey inflection, pauses and, critically, intent in transcribed speech than in a call recording.

Most financial markets records retention regulations require electronic communications and voice recordings to be preserved, as far as possible, in their original form. This has always been problematic for voice recordings, as any voice engineer will attest. The quality of audio recordings is always a balance between the space required for storage and the audio’s fidelity and usability. Fortunately, newer communication devices are offering pristine audio for compliant call capture, retrieval and analytics to meet regulator requests.

As we move forward we are seeing many clients investigating transcription and speech analytics capabilities that will be used to increase productivity and detect bad behavior. But in my view the overwhelming weight of evidence, along with some sound logical arguments about how the technology works and what it provides, suggests that transcription should not increase the retention period for voice recordings.


1. Regulators may require the retention period to be extended to seven years in certain circumstances.