Paralinguistic indicators of insincerity in speech (on the example of Russian language)

Лыкова О.В.

doi:10.25136/1339-3057.2018.4.26875

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

SENTENTIA. European Journal of Humanities and Social Sciences

Reference:

Lykova, O. (2018). Paralinguistic indicators of insincerity in speech (on the example of Russian language). SENTENTIA. European Journal of Humanities and Social Sciences, 4, 38–48. https://doi.org/10.25136/1339-3057.2018.4.26875

Lykova Ol'ga

senior lecturer of the Department of Foreign Languages at National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)

115409, Russia, g. Moscow, ul. Kashirskoe Shosse, 31

ovlykova@mephi.ru

DOI: 10.25136/1339-3057.2018.4.26875

Received: 15-07-2018

Published: 21-12-2018

Abstract: The article provides an overview of prosodic and acoustic indicators of deception. The research is based on frequency spectrum analysis of 108 speech fragments obtained by segmenting a 12-hour audio recording. Although modelled deception situations are proven to give accurate results in deception detection, the experiment described here is not a modelled deception situation, which allows greater accuracy by taking into account involuntary changes in the subject's voice caused by genuine emotions, such as fear of being exposed. Frequency spectrum analysis of the speech fragments showed an increase in the pitch frequency of the subject's speech during deception compared with their speech in the absence of psychological stress. Data on such markers of deception as response latency changes, rising tone, laughter and filled pauses were also obtained. A cross-cultural study of prosodic and acoustic indicators of deception based on the Russian language is currently of interest, as it allows the obtained results to be compared both with those of foreign researchers and with those gained in modelled deception detection experiments.

Keywords: lies, deception, deception detection, lie detection, insincerity, pitch frequency, frequency spectrum analysis, paralinguistics, speech analysis, voice analysis

Introduction

Many attempts have been made to obtain reliable indicators of lies. Linguistic and paralinguistic indicators of deception are of scientific interest to researchers in a number of disciplines, including psychology, criminology and applied linguistics. Paralinguistic indicators are features that accompany speech, such as facial expressions, gestures, body movements, voice, articulation, etc. ^{[18, p. 91]}.

One of the challenges in deception detection is that deceptive behavior differs across countries and cultures. Indicators of deception established for English cannot be applied directly to other languages because the languages differ. Culture-specific deception indicators were described by a number of researchers who compared the speech of Korean ^[8], Chinese ^[2],[11] and Italian ^[19] native speakers with that of English speakers. However, the corpus of cross-cultural data on deceptive speech still lacks significant data on many languages. This study presents an overview of linguistic, prosodic and acoustic indicators of lies on the basis of the Russian language.

It is important to note that no single indicator of lies allows us to conclude with certainty that a statement is insincere. Therefore, all linguistic and paralinguistic features of speech should be considered in the aggregate in order to obtain an adequate assessment of the speech fragment under analysis.

When analyzing a person's speech, it is also necessary to consider their behavior and manner of speech when they are calm and unworried. Some questionable speech parameters (e.g. a large number of pauses, evasive answers, pitch change in voice) can be marked by an expert as indicators of lies although they may be typical of this person's usual manner of speaking. It is necessary to know the pitch frequency parameters for a particular speaker in an emotionally neutral psychological state ^{[18, p. 24]}.

When analyzing indicators of lies in speech, one should also consider the main verification errors. Such errors cannot be avoided completely, but all possible precautions should be taken to reduce their probability. An expert should be objective, calm and neutral toward the subject. "The same emotional-modal state of a communicant or the emotional background of communication can be perceived differently depending on the situation and the current emotional-modal state of a recipient" ^[14]. There is evidence that the accuracy of an expert's judgment about the honesty of a potential liar depends on the expert's own character: experts' accuracy in judging deception can be predicted from their sociological test results ^[9]. The expert's confidence in the guilt of a subject may also affect the test result. Usually the subject is not completely unfamiliar to the expert, who knows important details of the subject's biography (including information from their criminal case). During polygraph testing, for instance, the examiner forms a subjective impression of the subject (negative or positive) during a preliminary interview in which control questions are asked. If the examiner believes that the suspect is innocent, the probability that the test results will indicate innocence increases. On the other hand, if the examiner considers the suspect guilty in advance, the examiner may be too strict and the suspect consequently too nervous. In this case, the test result may become "guilty".

Another frequent error in deception detection occurs when the verifier does not believe a subject because the subject is stressed and nervous. The expert may take these features for indicators of lies, although they may reflect emotions caused by the test situation itself or by some other problem unrelated to the topic under discussion ^{[6, p. 144, 148]}.

Hundreds of attempts have been made over the centuries to detect deception through voice and speech analysis. In ancient India, for instance, a suspect had to strike a gong simultaneously with their answer during interrogation. It was believed that if a person was stressed or nervous while answering the question, this affected the tempo at which they struck the gong, thus revealing an inner feeling of anxiety which could have been caused by the fear of their lies being exposed. The practice rests on the assumption that the physical state of a person is closely related to their psychological state.

The process of lying is characterized by confusion and internal difficulties experienced by the liar. There are two parallel events, or two variants of the same event, in the mind of a liar. The liar wants to hide the one that is real and bright, while intending to tell about the other one, which is fictional and pale. They must suppress the colourful truthful images in their mind and replace them with fictional ones, which can also be quite ill-conceived. They must constantly maneuver between a truth that cannot be spoken and a lie that must replace it. Furthermore, a lying person must not become confused and forget what they said earlier; they have to remember and repeat all the details of their story. Moreover, a liar always runs the risk of being exposed by giving out some piece of truthful information that contradicts what was said before. Therefore, considering emotions, especially the fear of being exposed, is of great importance in deception detection.

Speech parameters of a speaker can be divided into controlled (external) and uncontrolled (internal) ones. The degree of control depends on the ability of a speaker to get and use auditory feedback from their voice while speaking and articulating. There are a few factors that cannot be controlled due to the autonomic nervous system of a speaker ^[17].

Speech seems to be both the easiest and the hardest activity to control while lying. On the one hand, almost anyone can formulate their wording in advance, write it down or even learn it by heart. Furthermore, most people learn to speak from an early age and get used to controlling their words rather than their behavior while speaking, because they may believe that speech is more important in the exchange of information than behavior. This makes one pay more attention to what is being said than to how a person behaves while speaking. On the other hand, it is difficult to prepare and learn all facial expressions, gestures and intonation carefully in advance, and usually only professional actors do this well. Besides, there are reflex connections in the human body between emotions and non-verbal behavior which are extremely difficult or even impossible to control without specialized training, whereas there are no similar connections between emotions and the words being uttered ^[6].

Deception is an extremely difficult research area because real study material, such as full trial or interrogation transcripts, is usually confidential and inaccessible. Modelled deception situations are proven to give accurate results in deception detection, but they still do not reflect the emotional state of a speaker, which is an important parameter in deception detection. The need to deceive can make a lying subject nervous and uneasy in their words, and thus less concentrated on their speech, gestures and other controlled parameters, which can be treated as deception indicators. The liar can also experience guilt, or arrogance at fooling others, and that influences their manner of speech as well. The impact of these feelings on speech is of great importance in deception detection, but emotions can hardly be analysed if a role-play, a modelled deception situation or a film is under analysis. Although actors pretend to experience deep feelings while lying, which affect their speech and gestures, they show this deception leakage deliberately. Thus, a researcher cannot observe a genuine fear of being exposed, micro expressions on the face or autonomic reactions such as sweating. The voice is connected to the areas of the human brain responsible for emotions. During lying, the emerging emotions cause hard-to-hide changes in the voice. However, these would be impossible to detect in a modelled deception experiment.

The accurate results gained in modelled deception situations may be due to the actors following "The Stanislavsky Method" ^[6], an acting technique which influences the actors' minds in such a way that they truly believe in the modelled situation and thus experience real emotions. Consequently, when analysing modelled deception situations, one should make sure that the actors under analysis have proper education and qualifications and that they use this method, although this is extremely difficult to monitor. That is why real deception situations are of great interest, but due to the confidentiality of such data, the amount available is still very small.

Analysis of human speech can reveal many indicators of a speaker's insincerity. Linguistic features accompanying deception include slips of the tongue, evasive answers, accusations, counter-questions, tirades, etc. It is quite difficult for an untrained person to lie quickly, as they need to gain time to reflect on their further course of action. That is why their speech is full of the linguistic features mentioned above.

The study of verbal indicators of lies in speech seems quite promising. Many studies are under way that enlarge the database of linguistic indicators of deception and justify the possibility of detecting deception by means of linguistic techniques ^{[1, 13]}.

Deception is a complicated psychological activity which requires control of words, voice, face and body. Moreover, a lying person always has two parallel events in their mind, one of which is real and bright but suppressed by the other, which may seem pale and ill-conceived. Such suppression confuses the subject, which may affect their manner and rate of speech.

Speech rate is a significant parameter in deception detection. A number of studies describe speech rate variation during deception and truth-telling. In several studies, an increase in speech rate was observed during deception ^[12],[10], while other works describe a decrease in speech rate during lying versus truthful utterances ^[23],[24]. Such different outcomes may be due to methodological differences.

Pauses are the most common feature accompanying deceit. Frequent and long pauses and pauses before answering a question (response latency) always seem suspicious. Pauses generally occur for two reasons: because the lying communicant has not thought over their further behavior and is gaining time, or because of fear of exposure. A person can hear how insincere their speech sounds and become afraid of being caught, which is why the number of pauses in their speech increases. Some studies show that pauses are one of the most frequent indicators of deception ^[15].

Response latency is the time span between a stimulus (a question or statement by an expert) and a response or reaction by a subject. Some experiments show an increase in response latency in deceptive speech compared with truthful speech. A Kolmogorov–Smirnov test of the entire data set ^[19] showed that response latency (in ms) was longer in the deceptive speech condition (Mdn = 1200.77) than in the truthful speech condition (Mdn = 775.26).
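The definition above can be illustrated with a minimal sketch. It assumes that the end of each question and the onset of each answer are available as timestamps in milliseconds; the timestamps, the function names `response_latency_ms` and `ks_statistic`, and the sample sizes are hypothetical and are not taken from any of the cited studies. The sketch computes the latencies, their medians, and the two-sample Kolmogorov–Smirnov statistic D (the largest gap between the two empirical distribution functions); it does not compute a p-value.

```python
from bisect import bisect_right
from statistics import median


def response_latency_ms(question_end_ms, answer_start_ms):
    """Latency = onset of the answer minus end of the question (ms)."""
    return answer_start_ms - question_end_ms


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    difference between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical (question end, answer start) timestamps in ms.
# Illustrative values only, not data from the study.
truthful_events = [(1000, 1700), (5000, 5820), (9000, 9760), (13000, 13840)]
deceptive_events = [(2000, 3100), (6000, 7250), (10000, 10800), (14000, 15400)]

truthful = [response_latency_ms(q, a) for q, a in truthful_events]
deceptive = [response_latency_ms(q, a) for q, a in deceptive_events]

print("median truthful latency (ms):", median(truthful))
print("median deceptive latency (ms):", median(deceptive))
print("KS statistic D:", ks_statistic(truthful, deceptive))
```

Medians are reported here because latency distributions are typically skewed; a full test would also need the significance level for the given sample sizes.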

Speech tone and intonation can also serve as indicative parameters in deception detection. Raised tone, interrogative or ascending intonation and a high-pitched voice are often markers of insincerity. About 70% of experiments suggest that upset or worried people have a higher-pitched voice ^{[6, p. 79]}. A lying person may feel uncertainty or guilt, which they express unintentionally and implicitly in speech. Psychological stress can affect the tension of the vocal cords, thus increasing the tone and the fluctuations of the fundamental frequency (pitch). Frequency corresponds to the number of vocal-fold vibrations per second, while fundamental frequency (F0) is the lowest frequency at which an individual's vocal folds vibrate ^{[20, p. 66]}.

Frequency spectrum analysis of speech fragments can be used to measure such significant parameters for deception detection as pitch and its standard deviation, F0 dispersion, F0 variability (coefficient of variation), F0 maximum and minimum values, F0 range, F0 curve trend and frequencies F1, F2, F3, ..., Fn, etc. ^{[17, p. 337]}. According to the study by R. Potapova ^{[16, p. 146]} most information about the speaker's emotional state can be obtained from the spectral region of 150-1200 Hz. F1 analysis showed its tendency to grow with an increase in the subject's psychological stress. Frequencies F1, F2 ... Fn can be used to track the tone change (ascending, descending, descending-ascending tone, etc.) and intonation of the speaker, thus drawing a conclusion about their psychological state.
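Several of the F0 parameters listed above are simple summary statistics of a pitch contour. The following is a minimal sketch, assuming the contour is already available as a list of F0 values in Hz with 0 marking unvoiced frames; the contour, the function name `f0_summary`, and the crude trend measure (last voiced value minus first) are assumptions made for illustration, and the exact definitions used in the cited works may differ.

```python
from statistics import mean, pstdev


def f0_summary(f0_hz):
    """Summarise an F0 contour (Hz). Frames equal to 0 are treated
    as unvoiced and skipped (an assumption of this sketch)."""
    voiced = [f for f in f0_hz if f > 0]
    avg = mean(voiced)
    sd = pstdev(voiced)  # population standard deviation
    return {
        "mean_hz": avg,
        "sd_hz": sd,
        "cv_percent": 100 * sd / avg,          # coefficient of variation
        "min_hz": min(voiced),
        "max_hz": max(voiced),
        "range_hz": max(voiced) - min(voiced),
        "trend_hz": voiced[-1] - voiced[0],    # > 0 suggests a rising contour
    }


# Hypothetical contour of a short rising utterance; not measured data.
contour = [0, 210.0, 214.0, 220.0, 228.0, 0, 236.0]
summary = f0_summary(contour)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In practice, such a summary would be computed separately for a speaker's neutral baseline and for the fragment under analysis, so that the two can be compared.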

Pitch change during deception is described by numerous researchers. The first data on pitch fluctuations during deception were obtained in 1976 by P. Ekman, W.V. Friesen and K.R. Scherer ^[7]. Their work suggested an increase in pitch frequency of the subject's speech during lying compared to their speech in the absence of psychological stress. Their later studies demonstrated pitch difference: 227.99 Hz during deception and 220.85 Hz while truth-telling ^[5]. The difference was greater when subjects were highly motivated to lie. The pitch analysis offered by R. Potapova ^{[17, p. 337]} established that pitch frequency increases from 100-132 Hz to 156-185 Hz during psychological stress. Consequently, the F0 dispersion rose from 9-17 Hz to 23-34 Hz and F0 variability from 8-16 to 15-20 rel. units. Studies conducted by L. Streeter and B. DePaulo also indicate significant pitch and vocal folds tension increase during deception versus truth-telling ^{[22; 3]}.

Vocal pitch is one of the most studied characteristics of speech. Besides, it is a relatively reliable parameter because it is difficult to manipulate ^[26]. Results of some studies indicate that pitch and response latency are the most reliable indicators of deception, while speech rate, message duration, number of words, filled and unfilled pauses, repetitions and speech errors showed no significant change during lying compared to truth-telling ^[21]. Cross-cultural study of pitch frequency and response latency is of great importance, as speech parameters can vary between languages. Most researchers describe an increase in pitch frequency and changes in response latency in a subject's speech while lying. However, the current corpus still lacks data on many languages, so the detection of prosodic and acoustic indicators of deception on the basis of the Russian language and its comparison with the findings of American researchers is of interest. Given that the experiment is not a modelled deception one, it is also interesting to compare its results with those obtained in the modelled deception detection experiment by R. Potapova on the basis of the Russian language.

Materials and Methods

The current research is based on frequency spectrum analysis of 108 speech fragments obtained by segmenting a 12-hour audio recording. The recording consists of a series of interviews between an expert and subjects in the presence of a third participant.

All the participants are female, were born and live in Russia, and are between 25 and 30 years old. The subjects have no special deception detection training. The requirements for the subjects were the following:

1. They should be native Russian speakers; otherwise a person first translates a question into their native language, then formulates an answer in their mind, translates it into Russian and only then answers. The length and complexity of this process might affect the result.

2. The subject must be mentally healthy and sober, as subjects with schizophrenia or those experiencing hallucinations while intoxicated sometimes cannot distinguish the real world from a fictitious one, which would affect the experiment.

3. The subject should not be a pathological liar or a military officer who has received specialized anti-polygraph training.

The experiment was conducted in a well-controlled environment which enabled accurate measurements. The room walls had appropriate sound insulation, which contributed both to a good acoustic recording of the subject's speech and to the subject's high level of concentration, since any distracting factor can either cause an irrelevant physiological reaction in the subject or be used by them as a trick to distract the expert's attention from their lies, thus negatively influencing the result. The experiment was conducted under conditions that were comfortable in terms of illumination, temperature and humidity, with a minimum of disturbing influences such as noise, vibration (from trains or trams), insects, etc.

Three people took part in each of the 6 experiments: an expert, a subject and a third person, whose presence was the necessary trigger for the subject's inclination to lie. The expert knew some private information about the subject's life prior to the interview, and the subject did not want this information to be revealed to anyone else. The range of acceptable questions was discussed at the first stage of the interview in private between the expert and the subject, and it included some questions that would make the subject lie because of their disinclination to share this information with the third person. This preliminary discussion also made it possible to ensure that the subject understood the questions, so that neither their content nor their ethics had to be discussed during or after the testing.

At the second stage of the interview all three participants sat within a meter of each other, the third person sitting in front of the subject to make the subject feel uncomfortable and insecure while lying. The subject consented to the experiment, although they were warned that their secrets might be discovered by the third person because of the subject's lack of experience in lying. The subject was also notified of their right to terminate the test at any time. As the study was not intended to analyse facial expressions, gestures or postures, the interviewer sat just behind the subject in order to deprive them of the opportunity to shift their gaze from one participant to another and thus gain time to prepare for lying and/or calm down. In addition, it deprived the third person of the opportunity to confirm their guesses about the truthfulness of the subject's statements by observing the facial expressions of the subject and the expert, who both knew when a false statement was uttered.

The microphone lay on the table in front of the subject so that they would not be distracted by touching it or rolling it in their hands. The subject was instructed not to move their hands, to try to look at the face of the third participant and to give yes/no responses. The third person was instructed to stay silent and to look at the face of the subject. Such strict experimental conditions made the subject concentrate entirely on their voice while lying.

During the second stage of the interview the subject was asked a few basic neutral questions about their name, age, sex, nationality and job, and some more personal questions about their favourite film, book and song, in order to obtain data on the subject's speech in a calm psychological state and thus reduce experimental errors. The subject was instructed to give yes/no responses. Thus, a sample of the subject's binary responses in a calm psychological state was collected.

The final stage of the interview included many questions covering different spheres of the subject's private life, some of them being "trigger" questions, that is, questions that would make the subject lie because of their disinclination to share private information with the third person. The trigger questions were placed at random among the other questions, as all the questions were shuffled automatically and read out by the expert. All the questions were formulated in such a way that there was little difference in their length, so that the subject's response latency depended mainly on their emotions rather than on the time needed to understand the question.
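The automatic shuffling of questions described above could look roughly like the following sketch. The source does not say which tool performed the shuffling, so the function `build_question_order`, the placeholder question texts and the use of a fixed seed (so that an interview order can be reproduced later) are all assumptions.

```python
import random


def build_question_order(neutral, trigger, seed=None):
    """Return all questions in random order as (text, is_trigger) pairs.
    A seed makes the order reproducible (a choice of this sketch)."""
    rng = random.Random(seed)
    questions = [(q, False) for q in neutral] + [(q, True) for q in trigger]
    rng.shuffle(questions)
    return questions


# Placeholder texts; the actual interview questions are not published.
neutral_questions = [f"Neutral question {i}" for i in range(1, 11)]
trigger_questions = [f"Trigger question {i}" for i in range(1, 9)]

order = build_question_order(neutral_questions, trigger_questions, seed=2018)
for number, (text, is_trigger) in enumerate(order, start=1):
    print(number, text, "(trigger)" if is_trigger else "")
```

Keeping the `is_trigger` flag next to each question lets the expert label the recorded answers afterwards without revealing the flag while reading the questions aloud.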

The final sample comprised 108 speech fragments containing the subjects' "no" responses, 60 of which were truthful and 48 deceptive. The subjects' "yes" responses were not considered because few of them occurred in lying. Speech segments containing irrelevant information provided by the subjects (answers preceded or followed by jokes, remarks, justifications or explanations), as well as the interviewer's explanations, were not considered either.

The study equipment included the speech analysis software "Masterskaya zvukov" ("Signal Workshop") ^[25] created by V. Zhenilo, a Russian scientist and university professor who has been studying phonology, acoustics and forensic linguistics and developing the product for many years. This program makes it possible to obtain accurate pitch frequency measurements, as each measurement is made manually on the basis of a given spectrogram. The corresponding spectrograms for one of the interviews are presented in Fig. 1 ("no" is true) and Fig. 2 ("no" is false).

Figure 1: Spectrograms of the subject's "no" response that is truthful

Figure 2: Spectrograms of the subject's "no" response that is deceptive

Results and Discussion

Spectrogram analysis showed that the truthful "no" response has a neutral or downward tone in all speech fragments under analysis. This can be explained by the fact that the speakers are calm and confident, while subjects telling lies are inclined to speak with an ascending or descending-ascending intonation, being unsure of their words. In addition, the subjects use various ways to delay the response in order to prepare an answer to an unexpected question that has driven them into a dead end. For example, the second spectrogram in Fig. 2 (S: mmm ... net = false 2) depicts a long pause filled with mumbling, which can be considered an attempt to gain time, while the fifth spectrogram (S: net = false 5) shows laughter accompanying the response, which is also a way of avoiding a direct answer to the question and trying to hide real emotions.

There is also some difference in the pauses made by the subjects before their responses (i.e. in response latency). All speakers tend to make a short pause before their response, as they need time to think over the question before answering. Although the questions differ little in length, response latency differs between lying and truth-telling. The pauses made by the subjects before telling the truth are almost identical in duration, as they think about the question and give their answer immediately, with no additional actions needed. The response latency before lying is more varied, although the speakers try to maintain their speech pace throughout the interview and to control their pausing. Longer pauses can be explained by the subjects' need to prepare themselves, and especially their voice, for lying. Where a subject gave a deceptive answer immediately after hearing the question, this may be explained by their anticipating the question and having a well-prepared and rehearsed lie.

The average pitch frequency for each of the fragments containing truthful and deceptive statements is presented in Tables 1 and 2 respectively.

Speaker	1	2	3	4	5	6	7	8	9	10
APF(Sp.1)	231.8	189.3	180.5	186.3	187.6	185.5	190.5	185.6	168.0	165.1
APF(Sp.2)	247.6	223.0	223.1	243.2	246.1	248.7	239.4	237.8	239.0	221.5
APF(Sp.3)	232.1	264.9	259.2	254.4	255.1	254.1	233.3	245.7	256.1	233.2
APF(Sp.4)	205.1	220.6	229.0	225.7	222.8	213.0	226.1	228.0	224.3	216.0
APF(Sp.5)	223.4	212.5	193.1	217.0	234.8	208.8	206.7	200.1	214.0	205.4
APF(Sp.6)	191.5	208.1	206.4	207.6	184.3	188.5	204.8	195.2	195.8	202.1

Table 1: The average pitch frequency (APF, Hz) for fragments 1–10 of each speaker containing truthful statements

Speaker	1	2	3	4	5	6	7	8
APF(Sp.1)	206.2	198.3	211.9	272.1	237.0	254.4	208.8	212.6
APF(Sp.2)	251.8	238.3	245.7	266.9	246.4	243.7	237.3	241.8
APF(Sp.3)	248.9	264.6	275.1	253.4	265.2	255.3	246.7	280.1
APF(Sp.4)	202.2	221.3	213.6	222.1	230.8	240.1	239.0	220.9
APF(Sp.5)	278.3	254.2	231.7	227.8	226.7	231.7	196.2	265.3
APF(Sp.6)	211.7	192.2	219.4	197.0	204.7	246.4	191.2	198.4

Table 2: The average pitch frequency (APF, Hz) for fragments 1–8 of each speaker containing deceptive statements

The average pitch frequency for each speaker while lying versus truth-telling is presented in Figure 3.

Figure 3: The average pitch frequency (APF) for each speaker

The net average pitch frequency comparison is presented in Chart 1.

Chart 1: Net average pitch frequency (Hz) comparison

The net average pitch frequency for fragments containing deceptive statements is 233.86 Hz, as the speakers are excited, stressed and nervous due to their fear of being exposed. The net average pitch frequency for fragments containing truthful statements is lower, 217.31 Hz, since the speakers are confident and calm. Psychological stress can affect the tension of the vocal cords and thus increase the pitch frequency, an effect that has been reported for a number of languages.
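The net averages quoted above can be reproduced directly from the values in Tables 1 and 2. The sketch below assumes that the net average is the plain arithmetic mean over all fragments of all speakers (60 truthful and 48 deceptive); the helper names `net_average` and `per_speaker_shift` are introduced here for illustration only.

```python
from statistics import mean

# Average pitch frequency (Hz) per fragment, copied from Table 1.
TRUTHFUL = {
    "Sp.1": [231.8, 189.3, 180.5, 186.3, 187.6, 185.5, 190.5, 185.6, 168.0, 165.1],
    "Sp.2": [247.6, 223.0, 223.1, 243.2, 246.1, 248.7, 239.4, 237.8, 239.0, 221.5],
    "Sp.3": [232.1, 264.9, 259.2, 254.4, 255.1, 254.1, 233.3, 245.7, 256.1, 233.2],
    "Sp.4": [205.1, 220.6, 229.0, 225.7, 222.8, 213.0, 226.1, 228.0, 224.3, 216.0],
    "Sp.5": [223.4, 212.5, 193.1, 217.0, 234.8, 208.8, 206.7, 200.1, 214.0, 205.4],
    "Sp.6": [191.5, 208.1, 206.4, 207.6, 184.3, 188.5, 204.8, 195.2, 195.8, 202.1],
}

# Average pitch frequency (Hz) per fragment, copied from Table 2.
DECEPTIVE = {
    "Sp.1": [206.2, 198.3, 211.9, 272.1, 237.0, 254.4, 208.8, 212.6],
    "Sp.2": [251.8, 238.3, 245.7, 266.9, 246.4, 243.7, 237.3, 241.8],
    "Sp.3": [248.9, 264.6, 275.1, 253.4, 265.2, 255.3, 246.7, 280.1],
    "Sp.4": [202.2, 221.3, 213.6, 222.1, 230.8, 240.1, 239.0, 220.9],
    "Sp.5": [278.3, 254.2, 231.7, 227.8, 226.7, 231.7, 196.2, 265.3],
    "Sp.6": [211.7, 192.2, 219.4, 197.0, 204.7, 246.4, 191.2, 198.4],
}


def net_average(table):
    """Arithmetic mean over all fragments of all speakers (Hz)."""
    return mean(v for values in table.values() for v in values)


def per_speaker_shift(truthful, deceptive):
    """Deceptive minus truthful mean APF for each speaker (Hz)."""
    return {sp: mean(deceptive[sp]) - mean(truthful[sp]) for sp in truthful}


print(f"net APF, truthful:  {net_average(TRUTHFUL):.2f} Hz")
print(f"net APF, deceptive: {net_average(DECEPTIVE):.2f} Hz")
for speaker, shift in per_speaker_shift(TRUTHFUL, DECEPTIVE).items():
    print(f"{speaker}: {shift:+.2f} Hz")
```

With the tabulated values, the per-speaker difference is positive for all six speakers, although its size varies considerably from speaker to speaker.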

The obtained results show that pitch frequency increase is typical not only of American English, but also of Russian. Although the experiment is not a modelled deception detection one, its results coincide with those obtained in the modelled deception experiment by P. Ekman ^[7].

The results of this research can contribute to future studies of cross-cultural deceptive speech differences and deception detection techniques development.

Conclusion

Deception detection by linguistic methods alone is impossible, since many people are able to control what they say. In this regard, the digital analysis of acoustic parameters in deception detection is very promising. When analyzing a speaker's voice and speech, one should not neglect studying their manner of speech in everyday life. When analyzing prosodic and acoustic indicators of lies in voice and speech, one should also consider the errors of verification. It is impossible to avoid such mistakes completely, but all possible precautions must be taken to reduce their number. The expert should be objective, calm and neutral towards the subject. Furthermore, all possible linguistic and paralinguistic indicators of insincerity should be considered in the aggregate in order to reduce errors in the interpretation of results.

This study yielded data on prosodic and acoustic indicators of lies in the voice and speech of Russian speakers through a non-modelled deception detection experiment. The obtained data pose new challenges for future research, since deception detection is a rather new and understudied research field in Russian science owing to the policy of Soviet Union leaders ^[4].

References

1. BACHENKO, J. et al. 2008. Verification and implementation of language-based deception indicators in civil and criminal narratives. In: Proceedings of the 22nd International Conference on Computational Linguistics, vol. 1. Association for Computational Linguistics, Manchester: pp. 41–48.
