A demonstration of the application of the new paradigm for the evaluation of forensic evidence under conditions reflecting those of a real forensic-voice-comparison case
Introduction
In Daubert v Merrell Dow Pharmaceuticals [1993, 509 US 579] the United States Supreme Court instructed judges to consider several factors in determining the admissibility of forensic evidence, including whether the methodology applied is scientifically valid and whether it has been empirically tested and found to have an acceptable error rate. Saks and Koehler [1] described a paradigm shift in forensic science which they proposed was in part driven by the Daubert ruling and in part by the shift already having occurred for DNA evidence. They “envision[ed] a paradigm shift in the traditional forensic identification sciences in which untested assumptions and semi-informed guesswork are replaced by a sound scientific foundation and justifiable protocols.” (p. 895). They also proposed that “the time is ripe for the traditional forensic sciences to replace antiquated assumptions of uniqueness and perfection with a more defensible empirical and probabilistic foundation.” (p. 895). The 2009 National Research Council (NRC) report to the U.S. Congress [2] was highly critical of contemporary practice across a broad range of forensic science disciplines. Their recommendations included that procedures be adopted which include “quantifiable measures of the reliability and accuracy of forensic analyses” (p. 23), “the reporting of a measurement with an interval that has a high probability of containing the true value” (p. 121), and “the conducting of validation studies of the performance of a forensic procedure” (p. 121). In response to the R v T ruling by the Court of Appeal of England & Wales (R v T [2010] EWCA Crim 2439, [2011] 1 Cr App R 9), a large number of individuals and organisations have affirmed or reaffirmed that the likelihood-ratio framework is the logically correct framework for the evaluation of forensic evidence [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13] (see also [14], [15], [16], [17]). 
The need for transparency was also a major theme in the R v T ruling itself and in several of the responses.
Drawing on the ongoing changes and calls for change in forensic science, Morrison and colleagues have formulated a description of a new paradigm for the evaluation of forensic evidence which includes the following key elements:
- use of the likelihood-ratio framework for the evaluation of the strength of forensic evidence
- use of approaches based on relevant data, quantitative measurements, and statistical models (relevant data is representative of the relevant population)
- empirical testing of the validity and reliability of the forensic analysis system under conditions reflecting those of the case under investigation.

For the first time here we propose the promotion of a fourth concern to be an explicit member of this list of key elements:

- transparent reporting of choices made and procedures employed.
An early formulation of Morrison and colleagues' conception of the new paradigm, and a description of the history of the paradigm shift in forensic voice comparison appeared in Morrison [18]. Another early formulation appeared in Morrison [19], and later formulations in Morrison [9], Morrison [20], and Morrison and Stoel [21]. Morrison et al. [22] focussed particularly on the selection of the relevant population for the defence hypothesis, and Morrison [23] on procedures for empirically testing validity and reliability within the likelihood-ratio framework.
The following is a description of general procedures for performing a source-level forensic comparison within the new paradigm (it is based on the description in Morrison and Stoel [21]):
First, the forensic scientist must define and communicate the prosecution and defence hypotheses as they understand them. A forensic likelihood ratio is the answer to a specific question,1 and to make sense of the likelihood ratio both the forensic scientist and the trier of fact need to understand that question. The question is specified by two hypotheses: the prosecution hypothesis, which pertains to the numerator of the likelihood ratio, and the defence hypothesis, which pertains to the denominator. A typical prosecution hypothesis is that the sample of questioned origin comes from the same source as the sample of known origin. A typical defence hypothesis is that the sample of questioned origin does not come from the same source as the sample of known origin, but from some other source in the relevant population. The relevant population is specific to the particular case under investigation (see, for example, Curran et al. [24], on glass and Kerkhoff et al. [25], on firearms). In most jurisdictions, it is not common for the court to provide the forensic scientist with explicit hypotheses to test prior to the forensic scientist beginning their analysis. In such circumstances, the forensic scientist must therefore use their own judgement and adopt hypotheses which they believe will be of interest to the trier of fact. Analysis cannot proceed unless both a prosecution and defence hypothesis are either provided to or adopted by the forensic scientist. A legitimate question to debate before the trier of fact would be whether the alternative hypothesis adopted by the forensic scientist is appropriate. That is, does it lead to a likelihood ratio which answers the question that the trier of fact wants to have answered.2 By making their adopted hypotheses explicit, the forensic scientist facilitates consideration of this important question.
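The numerator-versus-denominator structure described above can be illustrated with a purely hypothetical numerical sketch. All values below are invented for illustration and have no connection to the case; real systems model many measurements jointly, not a single number:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

# Hypothetical single measurement from the sample of questioned origin.
x = 1.2

# Numerator: likelihood of x under a model reflecting the prosecution
# hypothesis (same source as the known-origin sample); parameters invented.
p_same = gaussian_pdf(x, mean=1.0, sd=0.5)

# Denominator: likelihood of x under a model trained on a sample from
# the relevant population (defence hypothesis); parameters invented.
p_diff = gaussian_pdf(x, mean=0.0, sd=1.0)

likelihood_ratio = p_same / p_diff
# likelihood_ratio is about 3.8 here: the measurement is modestly more
# probable under the prosecution hypothesis than the defence hypothesis.
print(likelihood_ratio)
```

The same two-hypothesis structure carries through however complex the underlying models become: the output is always the ratio of the two likelihoods.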
Next, the forensic scientist must obtain a sample from the relevant population. This sample is to be used to train the model which will calculate the denominator of the likelihood ratio. A legitimate issue to debate before the trier of fact would be whether the sample is sufficiently representative of the relevant population (see Hancock et al. [27], Morrison [9]).
The forensic scientist must make measurements which quantify the properties of the sample of known origin (suspect sample), the sample of questioned origin (offender sample), and each item in the sample representative of the relevant population. These measurements constitute relevant data.
Next, the forensic scientist must choose the statistical models that they will use to calculate the likelihood ratio. Part of the expertise of the forensic scientist is to select a model which they expect will give a reasonable approximation of the distribution of the population without overfitting the model to the particular training data. They can conduct tests using development data to help them select a model which gives what they themselves consider to be sufficiently acceptable performance under the conditions of the case under investigation.3 The models should be trained and optimised using data which reflect the conditions of the case under investigation. In a forensic-voice-comparison case this would include recording and transmission channel (e.g., landline or mobile telephone, compression algorithms), background noise, reverberation, speaking style (conversation, formal speech), etc. To avoid condition-dependent bias in the calculation of the denominator versus the numerator of the likelihood ratio, the data used to train the model for the denominator should be in the same condition as the known-origin data which is used to train the model for the numerator. Ideally the statistical models would also incorporate techniques which attempt to compensate for mismatches between the conditions of samples of known and questioned origin. The description of the conditions also forms part of the specific question which is to be answered by the likelihood ratio. For example: What is the probability of getting the properties of the distorted partial latent mark if it were produced by the same finger as made the high-quality suspect fingerprint versus if it were made by a finger of someone else from the relevant population? The forensic scientist should communicate to the trier of fact the conditions of the case as they understand them, and how they form part of the specific question to be answered by the likelihood ratio.
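One modelling step widely used in likelihood-ratio-based comparison systems is logistic-regression calibration, which maps raw comparison scores to interpretable likelihood ratios. The sketch below is illustrative only (not the method of the case); the scores are invented, and with equal numbers of same-source and different-source training pairs the fitted log-odds can be read as a log likelihood ratio:

```python
import math

# Hypothetical raw comparison scores from same-speaker and
# different-speaker training pairs (invented values for illustration).
same_scores = [2.1, 1.8, 2.5, 1.2, 2.9]
diff_scores = [-1.5, -0.8, -2.2, 1.4, -1.1]

# Fit the calibration mapping score -> log-odds = a*score + b by plain
# gradient descent on the logistic-regression cross-entropy loss.
a, b = 1.0, 0.0
xs = same_scores + diff_scores
ys = [1.0] * len(same_scores) + [0.0] * len(diff_scores)
for _ in range(5000):
    grad_a = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a * x + b)))
        grad_a += (p - y) * x
        grad_b += (p - y)
    a -= 0.01 * grad_a
    b -= 0.01 * grad_b

def calibrated_log10_lr(score):
    """Map a raw score to a calibrated base-10 log likelihood ratio
    (valid as an LR because the training classes are balanced)."""
    return (a * score + b) / math.log(10.0)
```

After fitting, a strongly positive raw score maps to a log likelihood ratio above zero (support for the same-source hypothesis) and a strongly negative score maps below zero.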
Once relevant training data have been selected and a model has been chosen, trained, and optimised to the conditions of the case under investigation, the system should be frozen, i.e., no other changes are allowed. Then the system should be tested using new pairs of sample items drawn from the relevant population and reflecting the conditions of the actual samples of known and questioned origin from the case under investigation. In this way the forensic scientist obtains an indication of how well the system is expected to perform on previously unseen data from the relevant population under these conditions. Testing using samples from some other population or under different conditions will not be informative as to how well the system is expected to perform on the actual samples of known and questioned origin from the case under investigation. Testing using some other population and/or under some other conditions could potentially be highly misleading with respect to the performance of the system in the particular case under investigation. An issue for debate would be whether the conditions of the training and test data adequately reflect the conditions of the samples of known and questioned origin.
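Empirical testing of this kind is commonly summarised with the log-likelihood-ratio cost, Cllr, the validity metric reported later in this paper. A minimal sketch of its standard definition (Brümmer and du Preez), with invented test-pair likelihood ratios:

```python
import math

def cllr(same_lrs, diff_lrs):
    """Log-likelihood-ratio cost: penalises same-source likelihood
    ratios below 1 and different-source likelihood ratios above 1.
    0 is perfect; 1 corresponds to an uninformative system."""
    pen_same = sum(math.log2(1.0 + 1.0 / lr) for lr in same_lrs) / len(same_lrs)
    pen_diff = sum(math.log2(1.0 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (pen_same + pen_diff)

# Hypothetical likelihood ratios from test pairs (invented for illustration).
same_speaker_lrs = [12.0, 3.5, 0.8, 40.0]
diff_speaker_lrs = [0.05, 0.2, 1.5, 0.01]
print(cllr(same_speaker_lrs, diff_speaker_lrs))
```

Note that Cllr rewards well-calibrated magnitudes, not just correct ordering: a same-speaker pair with an LR of 0.8 and a different-speaker pair with an LR of 1.5 both contribute penalty even though they are only mildly misleading.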
If the judge at an admissibility hearing or the trier of fact at trial is satisfied that the samples adequately reflect the relevant population and conditions specific to the case, and is satisfied that the model is answering a question which is relevant to the trier of fact, then they should consider whether the empirically demonstrated degree of validity and reliability of the system is sufficient for the output to be of use to the trier of fact. If they are not satisfied on any of these points, then the output of the system will be of little or no value to them. It is therefore essential that the forensic scientist be transparent as to what they have done, and that they present the results of validity and reliability testing.
After the performance of the system has been empirically tested, the system and the test results are frozen, i.e., no other changes are allowed to the system, and the test data cannot be changed and new tests cannot be run. The last thing the forensic scientist does as part of the analysis is to calculate a likelihood ratio for the actual samples of known and questioned origin from the case.
In a review of forensic-speech-science research literature published between 2010 and 2013, Morrison and Enzinger [28] found that, in contrast to earlier years, the majority of studies used data, quantitative measurements, and statistical models to calculate likelihood ratios, and empirically tested the performance of the system. It therefore appears that the majority of research studies in the field now attempt to operate within the new paradigm. Many studies in the review, however, suffered from problems including poorly defined hypotheses, small databases, the use of data not representative of casework conditions, and training and testing on the same data. Few, or none, of the published studies were conducted in a way which attempted to satisfy all elements of the new paradigm under conditions reflecting those of real forensic cases. Also, in a survey of practitioners by Gold and French [29], only 4 of 36 respondents said they reported strength of evidence as a numeric likelihood ratio. Thus, although we may have reached an inflection point in the paradigm shift in the context of research, there is clearly still a long way to go, and even further to go with respect to the implementation of the new paradigm in casework. The aim of the present paper is to demonstrate that forensic-voice-comparison casework can be, and has been, performed in a manner consistent with the new paradigm.
The present paper describes the implementation of all key elements of the new paradigm under conditions reflecting those of a real case, a case on which we actually worked. One previously published study described the implementation of the new paradigm under conditions reflecting those of a different real case [30]. The circumstances of the latter case were not very typical, whereas the circumstances of the case described in the present paper are much more typical: the recording of the speaker of questioned identity (offender recording) is a recording of a telephone conversation recorded by a device attached to a telephone system, and the recording of the speaker of known identity (suspect recording) is a recording of a police interview with a suspect made in a police station interview room. In the research study we have replicated the analyses we conducted for the actual case. Details of the recording conditions and other factors such as the durations of the recordings are taken from the recordings in the actual case, but the acoustic and statistical analyses in the research study are of recordings of speakers in a research database rather than of the recordings of the speakers in the actual case. Nothing we say with respect to the particular values of the strength of the evidence of the recordings analysed in this research report should be interpreted as relating to the specific values of the strength of the evidence of the recordings analysed in the original case. The research report has been streamlined, omitting some details of the actual case which are peripheral to the research issues. The research also expands on the casework analyses by addressing additional research questions which were not appropriate to address within the constraints of performing the actual casework. The primary expansion is the investigation of different techniques for dealing with mismatches in recording conditions between the suspect and offender recording.
Below we describe:
1. how we chose the relevant hypotheses, and hence the relevant population;
2. how we sampled from the relevant population;
3. how we simulated the conditions of the suspect and offender recordings;
4. how we measured acoustic properties of the recordings;
5. how we built statistical models to calculate likelihood ratios which addressed the relevant hypotheses on the basis of these measurements;
6. how we empirically tested the degree of validity and reliability of our system under conditions reflecting those of the case;
7. and finally how we reported the strength of the evidence for the comparison of the suspect and offender recordings.
The conditions of the present case are that a telephone call was made from a landline telephone to a call centre. A recording was made at the call centre. This is the recording of the voice of questioned identity, the offender recording. It includes background office noise (multi-speaker babble and typing noises). It was saved in a compressed format. Some time later a suspect was interviewed at a police station. A recording was made of this interview. This is the recording of the voice of known identity, the suspect recording. There was substantial room reverberation, the recording included background noise from a ventilation system, and it was saved in a different compressed format. Mismatches in recording conditions can severely degrade the performance of forensic-voice-comparison systems (see, for example, Zhang et al. [31]). A major component of the present paper is an investigation of three different techniques to compensate for differences in the conditions between the suspect and offender recordings. To simplify exposition, and to illustrate the importance of applying compensation techniques under the conditions of the present case, we first present a forensic analysis system which does not include any compensation techniques. We then describe three techniques, add them to the forensic analysis system, choose the technique (or combination of techniques) which gives best performance under the conditions of this case, and retest the suspect and offender recordings using a system which includes this technique.4
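Condition simulation of the sort just described (background babble at a given signal-to-noise ratio, room reverberation) can be sketched as below. The signals, the SNR value, and the synthetic impulse response are invented stand-ins for illustration, not the case materials or the procedures actually used:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is snr_db,
    then mix it into the speech signal."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def add_reverb(speech, impulse_response):
    """Simulate room reverberation by convolving the signal with a
    room impulse response, truncated to the original length."""
    return np.convolve(speech, impulse_response)[: len(speech)]

# Toy signals standing in for a clean recording, office babble, and a
# room impulse response (exponentially decaying noise burst).
clean = rng.standard_normal(16000)
babble = rng.standard_normal(16000)
rir = np.exp(-np.linspace(0.0, 8.0, 2000)) * rng.standard_normal(2000)

offender_like = add_noise_at_snr(clean, babble, snr_db=10.0)  # noisy channel
suspect_like = add_reverb(clean, rir)                         # reverberant room
```

In practice the degraded versions would also be passed through the relevant transmission channels and compression codecs before any acoustic measurement is made.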
In general we would not expect to be able to control the recording conditions for offender recordings, but in theory we should be able to obtain reasonably good quality recordings of the suspect. The quality of the suspect recording in this case was quite poor. To give an idea of what system performance could be like if better quality recordings were made of police interviews, the penultimate section of the paper retests the system using higher quality audio recordings for the suspect condition.
Definition of hypotheses
Based on the circumstances of the case, as described above, we adopted the following two competing hypotheses:
- Prosecution hypothesis: The voice on the offender recording was produced by the suspect.
- Defence hypothesis: The voice on the offender recording was not produced by the suspect, but by some other speaker from the relevant population.
In our analysis we instantiated the prosecution and defence hypotheses as the numerator and denominator of the likelihood ratio being answers to the
Testing of validity and reliability and evaluation of the likelihood ratio
After finalising development of the forensic-voice-comparison system and before it was applied to the offender and suspect samples, its validity and reliability were tested on data from a separate set of speakers (the test data set). Every speaker's Session 1 offender-condition recording was compared with their own Session 2 suspect-condition recording, and with their Session 3 suspect-condition recording if one was available. These were same-speaker comparisons. Every speaker's Session 1
Recording-condition mismatch compensation
Mismatches in recording conditions in the present case included differences in background noise, room reverberation, and transmission and recording systems. Filtering and additive noise corrupt MFCC features. The following exposition is based on Pelecanos and Sridharan [54]. Assuming that the speech signal xs[i] and the background noise xn[i] are uncorrelated and the linear filtering effect Hk is consistent over the frequency range of the filterbank, the log filterbank energies log(Ek) can be
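The feature-warping component of the compensation techniques considered here (after Pelecanos and Sridharan [54]) can be sketched as a rank-based mapping: within a sliding window, each coefficient is replaced by the standard-normal quantile of its rank, so the short-term feature distribution becomes N(0, 1) regardless of linear filtering or monotonic distortion. The window length below is our own illustrative choice:

```python
import numpy as np
from statistics import NormalDist

def feature_warp(features, win=301):
    """Warp each coefficient of a (frames x coeffs) feature matrix to a
    standard-normal distribution over a sliding window, by mapping the
    rank of the current frame's value to a normal quantile."""
    n_frames, n_coeffs = features.shape
    half = win // 2
    warped = np.empty_like(features, dtype=float)
    nd = NormalDist()
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        window = features[lo:hi]
        for c in range(n_coeffs):
            # Rank of the current frame's value within the window;
            # the +0.5 keeps the quantile strictly inside (0, 1).
            rank = np.sum(window[:, c] < features[t, c]) + 0.5
            warped[t, c] = nd.inv_cdf(rank / window.shape[0])
    return warped
```

Because the mapping depends only on ranks, any order-preserving channel distortion (e.g., a positive affine transformation of the features) leaves the warped output unchanged, which is the property exploited for mismatch compensation.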
Testing of validity and reliability and evaluation of the likelihood ratio: System incorporating mismatch compensation
Here we repeat the testing of validity and reliability, but now on a system incorporating mismatch compensation in the form of combined feature warping and probabilistic feature mapping. The resulting system had a Cllr-mean of 0.344, a 95% CI of ± 0.95 orders of magnitude, and a Cllr-pooled of 0.423 (a 98% credible interval estimate was ± 1.13 orders of magnitude). A Tippett plot of results is provided in Fig. 7b. The likelihood-ratio value calculated for the suspect and offender recordings was
Effect of the recording condition of the suspect on forensic-voice-comparison performance
In forensic casework, there is typically a mismatch in conditions between suspect and offender recordings. We would not normally expect to be able to control the recording conditions for the offender recording, but in theory we should be able to control the recording conditions for the suspect recording when it is a recording of a police interview. The conditions of the suspect recording in the present case were quite poor. In 2012, the U.S. National Institute of Justice released draft
Conclusion
We have demonstrated the evaluation of forensic evidence under conditions reflecting those of an actual forensic-voice-comparison case. This includes consideration of the relevant prosecution and defence hypotheses to address in this case, selection of data reflecting the adopted defence hypothesis, simulation of recording conditions reflecting those of the suspect and offender recordings in the case, quantitative measurement and statistical modelling to calculate a likelihood ratio given the
Acknowledgements
This research was supported by the Australian Research Council, Australian Federal Police, New South Wales Police, Queensland Police, National Institute of Forensic Science, Australasian Speech Science and Technology Association, and the Guardia Civil through Linkage Project LP100200142. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. Opinions expressed are those of the
References (67)
- et al., Evidence evaluation: a response to the Court of Appeal judgment in R v T, Sci. Justice (2011)
- Is forensic science the last bastion of resistance against statistics?, Sci. Justice (2013)
- Standards for the formulation of evaluative forensic science expert opinion, Sci. Justice (2009)
- Forensic voice comparison and the paradigm shift, Sci. Justice (2009)
- Distinguishing between forensic science and forensic pseudoscience: testing of validity and reliability, and approaches to forensic voice comparison, Sci. Justice (2014)
- Measuring the validity and reliability of forensic likelihood-ratio systems, Sci. Justice (2011)
- et al., The interpretation of shoeprint comparison class correspondences, Sci. Justice (2012)
- et al., Mismatched distances from speakers to telephone in a forensic-voice-comparison case, Speech Comm. (2015)
- et al., Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison — female voices, Speech Comm. (2013)
- et al., Speaker verification using adapted Gaussian mixture models, Digital Signal Process. (2000)
- Probabilistic feature-based transformation for speaker verification over telephone networks, Neurocomputing
- The coming paradigm shift in forensic identification science, Science
- Strengthening Forensic Science in the United States: A Path Forward
- Expressing evaluative opinions: a position statement, Sci. Justice
- Improve statistics in court, Nature
- Forensic science evidence in question, Crim. Law Rev.
- Extending the confusion about Bayes, Mod. Law Rev.
- How to assign a likelihood ratio in a footwear mark case: an analysis and discussion in the light of R v T, Law Prob. Risk
- The likelihood-ratio framework and forensic evidence in court: a response to R v T, Int. J. Evid. Proof
- The likelihood ratio as value of evidence: more than a question of numbers, Law Prob. Risk
- How clear is transparent? Reporting expert reasoning in legal cases, Law Prob. Risk
- Bad cases make bad law: reactions to R v T, Law Prob. Risk
- Expert working group on human factors in latent print analysis, Latent Print Examination and Human Factors: Improving the Practice through a Systems Approach, Technical Report, NIST
- Response to Draft Australian Standard: DR AS 5388.3 Forensic Analysis — Part 3 — Interpretation
- ENFSI Guideline for Evaluative Reporting in Forensic Science, Technical Report
- Forensic voice comparison
- Forensic strength of evidence statements should preferably be likelihood ratios calculated using relevant data, quantitative measurements, and statistical models — a response to Lennard (2013) Fingerprint identification: how far have we come?, Aust. J. Forensic Sci.
- Database selection for forensic voice comparison
- Forensic Interpretation of Glass Evidence
- The likelihood ratio approach in cartridge case and bullet comparison, J. Assoc. Firearm Toolmark Examiners
- Forensic Speaker Identification
- Forensic speech science — review: 2010–2013
- International practices in forensic speaker comparison, Int. J. Speech Lang. Law