SoundAuth: Audio authentication for mobile
I thought a video would be much more illustrative than some long-winded explanation. So here it is:
Introduction
Traditional radio frequency identification (RFID) card and fingerprint authentication systems are exposed to security risks, including card duplication and data capture. With most RFID cards, once a card is lost, anyone who picks it up can use it. My project, an audio-based authentication system in which a mobile phone produces sound, aims to be more secure than authentication with RFID cards or fingerprints. We developed applications for various phones that produce a sound signal encoding certain personal information, several analogue filtering and amplifying circuits for sound pre-processing, and a Fourier transform algorithm that lets the microprocessor extract the frequency components of the sound as information. Together, these let the phone communicate with the microprocessor and act as a card for tapping. We ran a series of tests indicating that this filtering system makes the authentication procedure extremely noise-resistant.
Translating information into sound
The information stored in an RFID card to encode a person’s identity can be abstracted into a number of arbitrary length. In this project we chose to assign a 64-bit number to each person, which is definitely more than enough (2^64 ≈ 1.8 × 10^19). The mobile phone splits this number into a series of 4-bit binary numbers between 0000 and 1111, and produces a sequence of sinusoidal notes, each with a frequency corresponding to one of the binary numbers.
[table th=“0”]
Frequency/Hz, 600, 800, 1000, 1200, 1400, 1600, 1800, …
Binary, 0000, 0001, 0010, 0011, 0100, 0101, 0110, …
[/table]
For example, if a person has the identity number 1701, which is 0110 1010 0101 in binary, the corresponding frequency series will be 1800Hz, 2600Hz and 1600Hz. In my project each note lasts for 0.05s, and there are a total of 128 bits of information to transmit. Therefore there will be a total of 128/log_2(16) = 32 notes, lasting 1.60s altogether.
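The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code that ran on the phones: the 200Hz step is read off the table, while the 44100Hz sampling rate and the names `id_to_frequencies` and `synthesize` are assumptions made for this sketch.

```python
import math

BASE_HZ = 600         # frequency for 0000, from the table
STEP_HZ = 200         # spacing between neighbouring notes, from the table
NOTE_SECONDS = 0.05   # length of one note, as stated in the text
SAMPLE_RATE = 44100   # assumed sampling rate, not given in the post


def id_to_frequencies(identity, bits):
    """Split an integer into 4-bit groups (most significant first)
    and map each group to its note frequency in Hz."""
    if bits % 4 != 0:
        raise ValueError("bits must be a multiple of 4")
    if not 0 <= identity < (1 << bits):
        raise ValueError("identity does not fit in the given number of bits")
    freqs = []
    for shift in range(bits - 4, -1, -4):
        nibble = (identity >> shift) & 0xF
        freqs.append(BASE_HZ + STEP_HZ * nibble)
    return freqs


def synthesize(freqs):
    """Return raw samples: one pure sine note per frequency, back to back."""
    samples_per_note = round(NOTE_SECONDS * SAMPLE_RATE)
    samples = []
    for f in freqs:
        for i in range(samples_per_note):
            samples.append(math.sin(2 * math.pi * f * i / SAMPLE_RATE))
    return samples


print(id_to_frequencies(1701, 12))  # -> [1800, 2600, 1600]
```

With 128 bits the same function returns 32 frequencies, which matches the 1.60s total at 0.05s per note.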
Designing software to decode the sound
I made many failed attempts at decoding the frequency series back into binary data before finally settling on Fourier transforms. The other methods failed because of acoustic noise. Two waveforms are shown below: the upper one was generated under ideal conditions, and the lower one was captured in a real environment.
[caption id=“” align=“aligncenter” width=“620”]

At first glance, the dominant frequency of the first waveform can be deduced by counting the number of times the waveform crosses zero. However, this method fails for the second waveform. To determine whether the second waveform contains mainly a 2000Hz, 2200Hz or 2400Hz note, a Fourier transform has to be used, which approximates a given function with a sum of sine functions. The transform assumes that the given function is made up of sinusoidal waves of infinitely many frequencies, as illustrated below:
Where the function is given by
Hence, if the waveform consists mainly of a 5/4Hz wave, a_2 will be fairly large. If the waveform consists mainly of a 1500Hz wave, a_((1500∗2-1)/4) will be large. The problem reduces to finding a_n. By rearranging the above equation, it can be deduced that:
The receiver can then perform numeric integration to evaluate the loudness of each frequency component of the waveform. Some analysis results are presented below:
[caption id=“” align=“aligncenter” width=“926”]

As we can see, a recorded sound wave generally contains frequency components over a large range. However, since we are only interested in the loudness of the 600Hz, 800Hz, … components (those produced by the mobile phone), we can set the coefficients of all other frequency components to zero, and the end result is very similar to the ideal case.
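To make the idea concrete, here is a minimal Python sketch that measures the loudness of only the candidate frequencies in one 0.05s note and picks the loudest. It is not the microprocessor code: the 44100Hz sampling rate, the 16-note range up to 3600Hz (extrapolated from the table) and the names `component_amplitude`, `dominant_frequency` and `noisy_note` are assumptions for illustration.

```python
import math
import random

SAMPLE_RATE = 44100                               # assumed sampling rate in Hz
CANDIDATES = [600 + 200 * k for k in range(16)]   # 600Hz ... 3600Hz


def component_amplitude(samples, freq):
    """Estimate how loud one frequency is by numerically integrating
    the signal against a sine and a cosine of that frequency."""
    sin_sum = 0.0
    cos_sum = 0.0
    for i, x in enumerate(samples):
        angle = 2 * math.pi * freq * i / SAMPLE_RATE
        sin_sum += x * math.sin(angle)
        cos_sum += x * math.cos(angle)
    return 2 * math.hypot(sin_sum, cos_sum) / len(samples)


def dominant_frequency(samples):
    """Ignore everything except the candidate notes; return the loudest."""
    return max(CANDIDATES, key=lambda f: component_amplitude(samples, f))


def noisy_note(freq, noise_std, seed=0, count=2205):
    """Test helper: a pure note buried in Gaussian noise (0.05s by default)."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) + rng.gauss(0, noise_std)
            for i in range(count)]


print(dominant_frequency(noisy_note(2200, noise_std=1.0)))
```

Using both a sine and a cosine makes the estimate independent of the phase at which the note happens to start, which a real recording does not control.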
[caption id=“” align=“aligncenter” width=“921”]

Overview
The figure below shows a graph of amplitude against frequency and time (a spectrogram), obtained by applying the Fourier transform to the entire sound signal. The higher the amplitude, the brighter the pixel. For example, the left-most bright region indicates that before 0.40s, the 500Hz sound is very loud.
[caption id=“” align=“aligncenter” width=“836”]
The receiver then takes that sequence of frequencies and transforms it back into data about the user who sent the signal. It then searches for the identity information in its database. If the user is present, the authentication succeeds.
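This last step is the encoding run in reverse. A minimal Python sketch, assuming the 600Hz base and 200Hz step from the table; the `KNOWN_USERS` dictionary and the function names are hypothetical stand-ins for the receiver's real database and code.

```python
BASE_HZ = 600   # frequency for 0000, from the table
STEP_HZ = 200   # spacing between neighbouring notes, from the table

# Hypothetical stand-in for the receiver's database of identity numbers.
KNOWN_USERS = {1701: "example user"}


def frequencies_to_id(freqs):
    """Turn detected note frequencies back into the identity number."""
    value = 0
    for f in freqs:
        nibble = round((f - BASE_HZ) / STEP_HZ)
        if not 0 <= nibble <= 15:
            raise ValueError(f"{f}Hz is outside the note range")
        value = (value << 4) | nibble   # append four bits
    return value


def authenticate(freqs, users=KNOWN_USERS):
    """Succeed only if the decoded identity is present in the database."""
    return frequencies_to_id(freqs) in users


print(frequencies_to_id([1800, 2600, 1600]))  # -> 1701
print(authenticate([1800, 2600, 1600]))       # -> True
```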
SoundAuth in action
The user enters a password, and the phone produces a sound signal:
[caption id=“” align=“aligncenter” width=“970”]
When the receiver finds the information about the user in its database, it flashes a green light indicating success:
What if others copy the sound?
To counter this problem, we developed an advanced feature, a mutating key scheme: the sequence of notes accepted by the authentication system changes after each authentication, following a complex and irreversible algorithm. This means that a malicious user who records the sound produced by a legitimate user's phone and tries to replay it will find the recording useless, because the accepted sound has already changed by the time it is replayed. In short, the key scheme can be summarized in the flowchart below:
[caption id=“” align=“aligncenter” width=“602”] [/caption]
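The exact algorithm is not spelled out here, so the following Python sketch only shows one possible way to get the same behaviour. It assumes that the phone and the receiver share a secret seed and that both advance a SHA-256 hash chain after every successful authentication. The class names `RatchetingKey` and `Receiver` are made up for this example.

```python
import hashlib
import hmac


class RatchetingKey:
    """Shared secret state that moves forward one step at a time.
    SHA-256 is an assumption; the post does not name the algorithm."""

    def __init__(self, seed: bytes):
        self._state = hashlib.sha256(seed).digest()

    def current_code(self) -> int:
        # Derive a 128-bit code from the state without revealing the state.
        digest = hmac.new(self._state, b"code", hashlib.sha256).digest()
        return int.from_bytes(digest[:16], "big")

    def advance(self) -> None:
        # One-way step: the old state cannot be recovered from the new one.
        self._state = hashlib.sha256(self._state).digest()


class Receiver:
    def __init__(self, seed: bytes):
        self._key = RatchetingKey(seed)

    def verify(self, code: int) -> bool:
        expected = self._key.current_code().to_bytes(16, "big")
        ok = hmac.compare_digest(code.to_bytes(16, "big"), expected)
        if ok:
            self._key.advance()   # the accepted code changes after each success
        return ok


phone = RatchetingKey(b"shared secret")
receiver = Receiver(b"shared secret")
code = phone.current_code()
print(receiver.verify(code))   # first use of the code
phone.advance()
print(receiver.verify(code))   # replay of the recorded code
```

In this sketch the first call succeeds and the replay is rejected, because the receiver has already moved on to the next code. A real system would also need a way to recover if the phone and the receiver fall out of step, which is left out here.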