Science & Justice
Volume 51, Issue 1 , Pages 10-15, March 2011

The evidentiary values of “cold hits” in a DNA database search on two-person mixture

Received 5 May 2010; received in revised form 14 July 2010; accepted 18 July 2010; published online 18 August 2010.


Abstract 

We provide a unified overview on the evaluation of the evidentiary values of cold-hit DNA matches between profiles in a DNA database and a mixed crime sample. Also discussed are methods of handling missing data in evaluating the DNA evidence. Through the analysis of a constructed murder case using Swedish data, we illustrate the applicability of the methods on various situations including the presence of multiple matches and consideration of allele drop-out. We also demonstrate the calculation of the probability of erroneous attribution as a measure of the effectiveness of the database search on DNA mixtures.

Keywords: Database search, DNA, Likelihood ratio, Mixture

 


1. Introduction 

Deoxyribonucleic acid (DNA) profiling has become a powerful technique for human identification since its introduction by Jeffreys et al. [1]. In crime investigation, an arrested suspect is often linked to a biological trace found at the crime scene through DNA profiling. If a perfect match is found between the DNA profiles of the suspect and the crime sample, the evidentiary value of the match can be assessed by the likelihood ratio (LR) of the prosecution and defense hypotheses about who left the crime trace. For a probable cause case in which the suspect is identified based on non-DNA evidence, the LR is simply the reciprocal of the probability of a random match between the DNA profiles of the suspect and the crime sample. In cases where only the crime sample but no suspect is available, the police force may conduct a blind search on a database containing DNA profiles from previously convicted offenders, unsolved criminal cases, etc. Matches between the DNA profiles in the database and the crime sample are often referred to as “cold hits”. The police force may open an investigation on the suspects identified based on cold hits. Many countries have built up their offender DNA databases on the basis of 10 to 15 short tandem repeat (STR) loci. For example, there were about 5 million profiles in the U.K. national DNA database (http://www.npia.police.uk/en/13338.htm) as of December 2009 and about 8 million profiles in the U.S. Federal DNA database CODIS as of April 2010 (http://www.fbi.gov/hq/lab/html/codis1.htm). Along with the amassing of large offender DNA databases, DNA database search has played an important role in suspect identification for solving crimes in recent years. To evaluate the evidentiary value of cold hits from a database search, two different approaches have been reported by Balding and Donnelly [2] and Stockmarr [3], using different formulations of the hypotheses. See Meester and Sjerps [4] and Chung et al. [5], among many others, for comprehensive discussions on the DNA database search controversy arising from their seemingly contradictory results. It has been concluded that the two approaches are in fact equivalent, in the sense that they give the same posterior odds despite different LRs, when only a single match is found. Chung et al. [5] also commented that if more than one suspect is identified after the database search, the two approaches are no longer equivalent, and the formulation of the hypotheses used by Balding and Donnelly [2] is recommended because it provides a more specific description of who left the crime trace.

In practical crime cases such as murder cases or group rape cases, the biological trace found at the crime scene often contains DNA from more than one contributor. For many years, the problem of how to evaluate the evidentiary value of a DNA mixture in probable cause case has attracted a great deal of attention from statisticians and forensic scientists and has been studied extensively. Weir et al. [6] and Fukshansky and Bär [7] have developed general formulae for calculating the LR in the mixture case under the Hardy–Weinberg equilibrium. Curran et al. [8] and Fung and Hu [9] have extended the formulae for evaluating the LR when population substructure is considered. Fukshansky and Bär [10] and Hu and Fung [11], [12] have developed various formulae to handle the situations when related persons are involved in the interpretation of the DNA mixtures. Cowell et al. [13], [14] used the object-oriented Bayesian network in developing probabilistic models to take into consideration the quantitative peak area information in the DNA mixture, as an alternative to the numerical methods such as the linear mixture analysis [15] and least square deconvolution [16], [17]. See, among many others, Balding [18] and Fung and Hu [19]. Recently, we have discussed in a previous paper [5] how the evidentiary value of DNA mixtures can be evaluated when the suspect is identified through database search and derived general formulae for calculating the LR. One of the key features of the formulae is their applicability to general situations including multiple match cases and non-uniform priors. On the basis of our previous work, in this paper we present further technical details of the approach on some special aspects such as missing data and effectiveness of the database search. The implementation of our approach is illustrated with a numerical example including perfectly and partially matched profiles with the consideration of allele drop-out. 
The calculation of the average probability of erroneous attribution, which can be regarded as a false positive rate of the search, with DNA mixture as part of the evidence is also discussed. This measure is believed to be useful in predicting the performance of the database search.

The paper is organized as follows. In Section 2, we outline the basic approach on the interpretation and evaluation of the DNA mixture for cold hit cases, followed by the details on how the formulae can be applied to situations when the evidence contains a missing genotype or when allele drop-out is considered. The calculations of the LRs in various scenarios are demonstrated through the analysis of a constructed murder case using Swedish data. Section 3 presents the evaluation of the effectiveness of the database search by means of average probability of erroneous attribution. Finally, Section 4 concludes our findings and their implications and discusses future direction of work.


2. Evidentiary value of cold hits 

2.1. Likelihood ratio 

Consider a two-person mixture case in which the DNA profiles of the mixed crime stain and the victim are denoted by $M=\{M_l, l=1,\ldots,L\}$ and $V=\{V_l, l=1,\ldots,L\}$ respectively, where $M_l$ is the set of alleles at locus $l$ present in the mixed stain and $V_l$ is the genotype of the victim at locus $l$. Any individual whose DNA profile consists only of alleles present in $M$ cannot be excluded as a contributor to the mixture. If there is only one unknown contributor in this two-person mixture case, the DNA profile of the unknown contributor should contain all the alleles in $M$ that are not explained by $V$, namely $M \setminus V=\{M_l \setminus V_l, l=1,\ldots,L\}$. Suppose no suspect can be identified based on non-DNA evidence. In order to identify the possible perpetrator, we may search an existing offender database $D$ and compare each of the DNA profiles $X_j=\{X_{jl}, l=1,\ldots,L\}$, $j \in D$, to $M \setminus V$. An individual will be identified as a possible perpetrator if his/her DNA profile $X_j$ satisfies $M_l \setminus V_l \subseteq X_{jl} \subseteq M_l$ for $l=1,\ldots,L$. The evidentiary value of such a cold-hit DNA match can be expressed as a likelihood ratio (LR) of the following hypotheses:

$H_j$: individual $j$ is the unknown contributor to the mixture;
$H_d$: someone other than individual $j$ is the unknown contributor to the mixture.

Denote $P(H_j)$ as the prior probability that individual $j$ is the unknown contributor. The LR can be evaluated by using the following general formula derived by Chung et al. [5]:

$$LR = \frac{LR_j\,[1-P(H_j)]}{\sum_{i \in D,\, i \neq j} LR_i\,P(H_i) + 1 - \sum_{i \in D} P(H_i)} \qquad (1)$$

where $H_R$ is the hypothesis that the victim and a random person contribute to the mixture and

$$LR_j = \frac{P(M \mid V, X_j, H_j)}{P(M \mid V, H_R)}$$

is the ordinary LR of $H_j$ versus $H_d$ for a probable cause case when only the DNA profile of individual $j$ is available. If none of the DNA profiles contains missing data, the probability $P(M \mid V, X_j, H_j)$ is just an indicator function that equals 1 or 0 according to whether individual $j$ is identified as a possible contributor, i.e. whether $M_l \setminus V_l \subseteq X_{jl} \subseteq M_l$ for $l=1,\ldots,L$. Under the linkage equilibrium assumption, the overall random match probability $P(M \mid V, H_R)$ at all $L$ loci is the product of the random match probabilities at the individual loci, and the overall likelihood ratio $LR_j$ can be computed as $LR_j=\prod_{l=1}^{L} LR_{jl}$, where

$$LR_{jl} = \frac{P(M_l \mid V_l, X_{jl}, H_j)}{P(M_l \mid V_l, H_R)}$$

is the likelihood ratio at locus $l$. Table 1 shows the computational formulae of $LR_{jl}$ for a particular autosomal locus $l$ with $K$ alleles $A_1, A_2, \ldots, A_K$ and corresponding allele frequencies $p_1, p_2, \ldots, p_K$ ($\sum_{i=1}^{K} p_i = 1$), under the assumption of Hardy–Weinberg equilibrium.

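Table 1. Formulae for calculating the LR at locus $l$ for different combinations of $(M_l, V_l, X_l)$, under the assumption of Hardy–Weinberg equilibrium.

| $V_l$ | $M_l$ | $X_l$ | LR |
|---|---|---|---|
| $A_i/A_i$ | $\{A_i\}$ | $A_i/A_i$ | $1/p_i^2$ |
| $A_i/A_i$ | $\{A_i\}$ | Other genotypes | 0 |
| $A_i/A_i$ | $\{A_i,A_j\}$ | $A_j/A_j$, $A_i/A_j$ | $1/(p_j^2+2p_ip_j)$ |
| $A_i/A_i$ | $\{A_i,A_j\}$ | Other genotypes | 0 |
| $A_i/A_i$ | $\{A_i,A_j,A_k\}$ | $A_j/A_k$ | $1/(2p_jp_k)$ |
| $A_i/A_i$ | $\{A_i,A_j,A_k\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j\}$ | $A_i/A_i$, $A_j/A_j$, $A_i/A_j$ | $1/(p_i+p_j)^2$ |
| $A_i/A_j$ | $\{A_i,A_j\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | $A_j/A_k$, $A_i/A_k$, $A_k/A_k$ | $1/(p_k^2+2p_ip_k+2p_jp_k)$ |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j,A_k,A_t\}$ | $A_k/A_t$ | $1/(2p_kp_t)$ |
| $A_i/A_j$ | $\{A_i,A_j,A_k,A_t\}$ | Other genotypes | 0 |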
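The per-locus rule behind Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `genotype_prob` and `locus_lr` and the allele frequencies are hypothetical, and the code assumes Hardy–Weinberg equilibrium with no allele drop-out. The random match probability is obtained by summing the probabilities of all genotypes $G$ with $M_l \setminus V_l \subseteq G \subseteq M_l$.

```python
from itertools import combinations_with_replacement


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def locus_lr(mixture, victim, suspect, freqs):
    """Per-locus LR without drop-out: indicator(match) / random match probability."""
    mixture = set(mixture)
    unexplained = mixture - set(victim)  # alleles of M_l not explained by V_l

    def matches(genotype):
        # A genotype matches if it covers the unexplained alleles
        # and carries no allele absent from the mixture.
        return unexplained <= set(genotype) <= mixture

    if not matches(suspect):
        return 0.0
    # Only genotypes built from mixture alleles can match.
    rmp = sum(genotype_prob(g, freqs)
              for g in combinations_with_replacement(sorted(mixture), 2)
              if matches(g))
    return 1.0 / rmp


# Hypothetical allele frequencies, for illustration only.
freqs = {15: 0.30, 16: 0.25, 18: 0.15}
print(locus_lr({15, 16, 18}, (15, 16), (15, 18), freqs))  # matching suspect
print(locus_lr({15, 16, 18}, (15, 16), (15, 16), freqs))  # excluded suspect
```

Enumerating genotypes rather than hard-coding each row keeps the sketch short and makes it easy to compare against the closed-form entries of the table.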

2.2. Missing data 

Missing data often arise in DNA profiling, especially when limited DNA samples are typed. Due to genotyping failure, the DNA profile of an individual may contain one or more missing genotypes at some particular loci. If there are missing genotypes in the profile $X_j$, the likelihood ratio can be obtained by excluding the corresponding loci from the calculation. If the profile $V$ contains a missing genotype at locus $l$, the likelihood ratio at this locus is calculated as

$$LR_{jl} = \frac{\sum_{V_l} P(M_l \mid V_l, X_{jl}, H_j)\,P(V_l)}{\sum_{V_l} P(M_l \mid V_l, H_R)\,P(V_l)}$$

where the summations are taken over all possible genotypes $V_l$ of the victim at locus $l$ and $P(V_l)$ is the corresponding genotype probability. In this situation, the formulae for calculating $LR_{jl}$ are shown in Table 2.

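Table 2. Formulae for calculating the LR for different combinations of $(M_l, X_l)$ when $V_l$ is missing, under the assumption of Hardy–Weinberg equilibrium.

| $X_l$ | $M_l$ | LR |
|---|---|---|
| $A_i/A_i$ | $\{A_i\}$ | |
| $A_i/A_i$ | $\{A_i,A_j\}$ | |
| $A_i/A_i$ | $\{A_i,A_j,A_k\}$ | |
| $A_i/A_j$ | $\{A_i,A_j\}$ | |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | |
| $A_i/A_j$ | $\{A_i,A_j,A_k,A_t\}$ | |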
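When the victim's genotype is missing, the locus LR is a ratio of two sums over the possible victim genotypes, which can be evaluated by direct enumeration. The following Python sketch illustrates this under stated assumptions: Hardy–Weinberg equilibrium, no drop-out, and a mixture that is exactly the union of the two contributors' alleles. The function names and frequencies are hypothetical and not taken from the paper.

```python
from itertools import combinations_with_replacement


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def locus_lr_missing_victim(mixture, suspect, freqs):
    """Locus LR when the victim genotype is unknown (summed over all V_l)."""
    mixture = set(mixture)
    # Any contributor genotype must consist of mixture alleles only.
    genotypes = list(combinations_with_replacement(sorted(mixture), 2))

    def explains(first, second):
        # Without drop-out, two contributors explain the mixture
        # exactly when the union of their alleles equals M_l.
        return set(first) | set(second) == mixture

    # Numerator: sum over V_l of P(M_l | V_l, X_jl, H_j) P(V_l).
    numerator = sum(genotype_prob(v, freqs)
                    for v in genotypes if explains(v, suspect))
    # Denominator: sum over V_l of P(M_l | V_l, H_R) P(V_l),
    # where the second contributor is a random person.
    denominator = sum(genotype_prob(v, freqs) * genotype_prob(g, freqs)
                      for v in genotypes for g in genotypes if explains(v, g))
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical allele frequencies, for illustration only.
freqs = {1: 0.3, 2: 0.2}
print(locus_lr_missing_victim({1, 2}, (1, 1), freqs))
```

For a single-allele mixture $\{A_i\}$ and a suspect $A_i/A_i$, the sketch returns $1/p_i^2$, the same value as in the case where the victim's genotype $A_i/A_i$ is known.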

In some cases when a sparse DNA sample is obtained from the crime scene, one or more alleles may not be successfully amplified during genotyping. As a result, an allele present in the mixed stain may go undetected and not be included in the observed profile $M$, a well-known phenomenon called allele drop-out [20], [21]. When the possibility of allele drop-out is considered, an individual will not be excluded as the unknown contributor of the mixture merely because the profiles fail to match at one or two loci. Nevertheless, Eq. (1) is still capable of assessing the weight of evidence when allele drop-out is taken into account. In such a situation, the probability $P(M \mid V, X_j, H_j)$ is no longer an indicator function. Suppose that at a particular locus $l$ the mixture contains a single allele $A_i$ and the victim is homozygous with genotype $A_i/A_i$. An individual with heterozygous genotype $A_i/A_j$ at this locus could be the contributor if and only if drop-out had occurred, and the probability $P(M_l \mid V_l, X_{jl}, H_j)$ would be equal to the drop-out rate of allele $A_j$ at this locus. Assuming a constant drop-out rate $r_d$ for all alleles, the random match probability is calculated as

$$P(M_l \mid V_l, H_R) = p_i^2 + \sum_{j \neq i} 2 p_i p_j r_d = p_i^2 + 2 r_d\, p_i (1-p_i)$$

and hence

$$LR_{jl} = \frac{r_d}{p_i^2 + 2 r_d\, p_i (1-p_i)}.$$

Since the drop-out rate is usually very small, an individual with a genotype not containing allele $A_i$ will still be excluded, as it is relatively improbable for both alleles to have dropped out. Table 3 shows the computational formulae of $LR_{jl}$ for other genotype combinations of $M_l$, $V_l$ and $X_{jl}$. Note that Table 3 reduces to Table 1 if $r_d=0$.

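Table 3. Formulae for calculating the LR at locus $l$ for different combinations of $(M_l, V_l, X_l)$, assuming Hardy–Weinberg equilibrium and a constant drop-out rate $r_d$ for all alleles.

| $V_l$ | $M_l$ | $X_l$ | LR |
|---|---|---|---|
| $A_i/A_i$ | $\{A_i\}$ | $A_i/A_i$ | |
| $A_i/A_i$ | $\{A_i\}$ | $A_i/A_j$ | |
| $A_i/A_i$ | $\{A_i\}$ | Other genotypes | 0 |
| $A_i/A_i$ | $\{A_i,A_j\}$ | $A_j/A_j$, $A_i/A_j$ | |
| $A_i/A_i$ | $\{A_i,A_j\}$ | $A_j/A_k$ | |
| $A_i/A_i$ | $\{A_i,A_j\}$ | Other genotypes | 0 |
| $A_i/A_i$ | $\{A_i,A_j,A_k\}$ | $A_j/A_k$ | |
| $A_i/A_i$ | $\{A_i,A_j,A_k\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j\}$ | $A_i/A_i$, $A_j/A_j$, $A_i/A_j$ | |
| $A_i/A_j$ | $\{A_i,A_j\}$ | $A_i/A_k$, $A_j/A_k$, $A_k/A_k$ | |
| $A_i/A_j$ | $\{A_i,A_j\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | $A_j/A_k$, $A_i/A_k$, $A_k/A_k$ | |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | $A_k/A_t$ | |
| $A_i/A_j$ | $\{A_i,A_j,A_k\}$ | Other genotypes | 0 |
| $A_i/A_j$ | $\{A_i,A_j,A_k,A_t\}$ | $A_k/A_t$ | |
| $A_i/A_j$ | $\{A_i,A_j,A_k,A_t\}$ | Other genotypes | 0 |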

2.3. Example 

To illustrate how Eq. (1) can be applied to evaluate the evidentiary value of cold-hit DNA matches, we consider a constructed murder case in Sweden where a blood stain had been found at the crime scene but no suspect had been identified. The Swedish national DNA database (NDNAD), consisting of about 50,000 DNA profiles from previously convicted offenders, was searched and two persons were identified as possible perpetrators on the basis of cold-hit DNA matches. Table 4 lists the DNA profiles of the victim, the mixed stain and the suspects, namely s1 and s2, at the 10 SGM Plus STR loci which are commonly used in European NDNADs. As can be seen, the DNA profile of suspect s1 matches $M \setminus V$ perfectly at all ten loci, while the profile of suspect s2 matches at only nine loci. If allele drop-out is not considered, suspect s2 will be excluded from being the contributor to the DNA mixture, leaving only the DNA evidence of s1 to be evaluated. Using the formulae listed in Table 1 and the allele frequencies given in Montelius et al. [22], the ordinary LR of $H_{s1}$: suspect s1 is the contributor, versus $H_d$: someone else is the contributor, is computed as $LR_{s1}=2.380\times10^{10}$. Taking account of the fact that this suspect was identified by cold-hit DNA matches, the weight of evidence should be adjusted using Eq. (1), which gives the LR as

$$LR = \frac{LR_{s1}\,[1-P(H_{s1})]}{1-\sum_{i \in D} P(H_i)}.$$

Table 4. DNA profiles detected in a murder case in Sweden. The likelihood ratios are calculated using the formulae listed in Table 1 under the assumption that drop-out has not occurred.

| Locus | Victim | Mixture | Suspect s1 | $LR_{s1}$ | Suspect s2 | $LR_{s2}$ |
|---|---|---|---|---|---|---|
| D3S1358 | 15/16 | {15,16,18} | 15/18 | 5.851 | 16/18 | 5.851 |
| vWA | 18/19 | {18,19} | 18/19 | 10.08 | 19/19 | 10.08 |
| FGA | 21/23 | {21,23} | 23/23 | 10.01 | 21/22 | 0 |
| D8S1179 | 13/14 | {13,14,15} | 14/15 | 7.750 | 13/15 | 7.750 |
| D21S11 | 30/31 | {29,30,31} | 29/30 | 5.115 | 29/31 | 5.115 |
| D18S51 | 13/14 | {13,14,15,18} | 15/18 | 43.97 | 15/18 | 43.97 |
| D16S539 | 9/13 | {9,11,13} | 9/11 | 3.782 | 11/11 | 3.782 |
| TH01 | 7/9 | {7,8,9} | 8/8 | 16.33 | 8/9 | 16.33 |
| D2S1338 | 17/17 | {17,19,20} | 19/20 | 25.53 | 19/20 | 25.53 |
| D19S433 | 14/14 | {12,14} | 12/14 | 14.67 | 12/12 | 14.67 |
| Overall | | | | $2.380\times10^{10}$ | | 0 |

Suppose we take $N=1{,}000{,}000$ as the number of possible perpetrators, which is about 1/9 of the Swedish population. Using uniform priors, i.e. $P(H_i)=1/N=0.000001$ for all $i$, we obtain $LR=2.506\times10^{10}$, yielding posterior odds of 25,056. Since the DNA database consists of profiles from previously convicted offenders, it may be more reasonable to adopt non-uniform priors such that each individual in the database is more likely, say $k$ times as likely, to be the contributor as those who are not in the database. Denoting $P(H_{s1})=\delta$ as the prior probability that suspect s1 contributes to the mixture and $n$ as the database size, we have

$$n\delta + (N-n)\frac{\delta}{k} = 1, \quad \text{i.e.} \quad \delta = \frac{k}{nk+N-n},$$

and the posterior odds become

$$\frac{LR_{s1}\,\delta}{1-n\delta} = \frac{k\,LR_{s1}}{N-n},$$

which is approximately $2.5\times10^{5}$ for $N=1{,}000{,}000$, $n=50{,}000$, $k=10$ and $LR_{s1}=2.380\times10^{10}$. In particular, even if the jurors use small prior odds of $10^{-6}$ for suspect s1 to be the contributor, the resulting posterior odds of 36,331, and hence the posterior probability of 0.99997, would still provide very strong evidence to convict the suspect.
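The single-match figures quoted in this example can be reproduced with a short Python sketch. It assumes that the database-adjusted LR takes the form $LR = LR_j[1-P(H_j)] / [\sum_{i \in D, i \neq j} LR_i P(H_i) + 1 - \sum_{i \in D} P(H_i)]$ and that the offender-weighted prior is $\delta = k/(nk+N-n)$; both forms are reconstructions that agree with the quoted numbers, and the function names are hypothetical.

```python
def database_lr(lr_j, prior_j, prior_mass_in_db, other_matches=()):
    """Database-adjusted LR for individual j.

    lr_j             -- ordinary LR of individual j
    prior_j          -- prior probability P(H_j)
    prior_mass_in_db -- sum of P(H_i) over all database members (j included)
    other_matches    -- (LR_i, P(H_i)) pairs for other members with LR_i > 0
    """
    denominator = (sum(lr_i * p_i for lr_i, p_i in other_matches)
                   + 1.0 - prior_mass_in_db)
    return lr_j * (1.0 - prior_j) / denominator


def offender_weighted_prior(N, n, k):
    """Prior for a database member who is k times as likely a contributor."""
    return k / (n * k + N - n)


LR_S1 = 2.380e10  # ordinary LR of suspect s1 (Table 4)
N, n = 1_000_000, 50_000

# Uniform priors: every possible perpetrator has prior 1/N.
lr_uniform = database_lr(LR_S1, 1 / N, n / N)
print(f"uniform priors: LR = {lr_uniform:.3e}")

# Non-uniform priors with k = 10.
delta = offender_weighted_prior(N, n, k=10)
lr_weighted = database_lr(LR_S1, delta, n * delta)
print(f"k = 10: LR = {lr_weighted:.3e}, "
      f"odds at prior odds 1e-6 = {lr_weighted * 1e-6:,.0f}")
```

Because the inputs are rounded to four significant figures, the printed values agree with the quoted ones only to about three digits.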

To illustrate how the database size affects the evidentiary value, we may consider the situation in a large country such as the United Kingdom. The UK database contains about $n=5{,}000{,}000$ profiles from offenders as well as non-offenders. If we take $N=50{,}000{,}000$, which is the approximate size of the adult population in the UK, the same case with ordinary LR $LR_{s1}=2.380\times10^{10}$ and $k=10$ would result in $LR=5.025\times10^{10}$, which is about 38% higher than in the Swedish case. This further demonstrates that a cold-hit match from a larger database has a stronger evidentiary value than a cold-hit match from a smaller database.

Now suppose allele drop-out is considered. The DNA profile of suspect s2 fails to match only at locus FGA, with allele 22 present in the profile of s2 but absent in the mixture. Under the consideration of allele drop-out, s2 should not be excluded from being the contributor because allele 22 may have been present in the mixed stain but failed to be detected. For simplicity, we assume the same drop-out probability for all alleles at each locus and make use of the estimated drop-out probabilities from Tvedebrink et al. [23]. Using the formulae listed in Table 3, the ordinary LRs are recalculated and tabulated in Table 5. As can be seen from the table, the value of $LR_{s1}$ is reduced by more than half when the possibility of allele drop-out is taken into account. The value of $LR_{s2}$ is substantially smaller than $LR_{s1}$, indicating a much lower weight of evidence for the partially matched profile of s2 compared to the perfectly matched profile of s1. Note that the ratio of $LR_{s2}$ to $LR_{s1}$ is 6%, which is exactly the drop-out rate at locus FGA. By applying Eq. (1), the LR of $H_{s1}$: suspect s1 is the contributor, versus $H_d$: someone else is the contributor, is obtained as

$$LR = \frac{LR_{s1}\,[1-P(H_{s1})]}{LR_{s2}\,P(H_{s2}) + 1 - \sum_{i \in D} P(H_i)}$$

and the LR of $H_{s2}$: suspect s2 is the contributor, versus $H_d$: someone else is the contributor, is

$$LR = \frac{LR_{s2}\,[1-P(H_{s2})]}{LR_{s1}\,P(H_{s1}) + 1 - \sum_{i \in D} P(H_i)}.$$

Table 5. DNA profiles detected in a murder case in Sweden. The likelihood ratios are calculated using the formulae listed in Table 3 with the consideration of allele drop-out. The drop-out probabilities $r_d$ from Tvedebrink et al. [23] are used.

| Locus | $r_d$ | Victim | Mixture | Suspect s1 | $LR_{s1}$ | Suspect s2 | $LR_{s2}$ |
|---|---|---|---|---|---|---|---|
| D3S1358 | 0.03 | 15/16 | {15,16,18} | 15/18 | 5.597 | 16/18 | 5.597 |
| vWA | 0.03 | 18/19 | {18,19} | 18/19 | 8.619 | 19/19 | 8.619 |
| FGA | 0.06 | 21/23 | {21,23} | 23/23 | 7.626 | 21/22 | 0.458 |
| D8S1179 | 0.03 | 13/14 | {13,14,15} | 14/15 | 7.408 | 13/15 | 7.408 |
| D21S11 | 0.02 | 30/31 | {29,30,31} | 29/30 | 4.951 | 29/31 | 4.951 |
| D18S51 | 0.03 | 13/14 | {13,14,15,18} | 15/18 | 43.97 | 15/18 | 43.97 |
| D16S539 | 0.03 | 9/13 | {9,11,13} | 9/11 | 3.613 | 11/11 | 3.613 |
| TH01 | 0.07 | 7/9 | {7,8,9} | 8/8 | 13.73 | 8/9 | 13.73 |
| D2S1338 | 0.04 | 17/17 | {17,19,20} | 19/20 | 25.53 | 19/20 | 25.53 |
| D19S433 | 0.05 | 14/14 | {12,14} | 12/14 | 13.84 | 12/12 | 13.84 |
| Overall | | | | | $1.040\times10^{10}$ | | $6.240\times10^{8}$ |

Since the number of possible perpetrators $N=1{,}000{,}000$ is moderate compared to the values of $LR_{s1}$ and $LR_{s2}$, the LRs are dominated by the values of $LR_{s1}$, $LR_{s2}$, $P(H_{s1})$ and $P(H_{s2})$. In particular, the LR of $H_{s1}$ versus $H_d$ can be approximated as

$$LR \approx \frac{LR_{s1}\,[1-P(H_{s1})]}{LR_{s2}\,P(H_{s2})},$$

with corresponding posterior odds equal to $16.67\,P(H_{s1})/P(H_{s2})$. The weight of evidence therefore depends heavily on the juror's choice of prior probabilities. The posterior odds are essentially equal to 16.67 when the same prior probability is assigned to both suspects, i.e. given the observed DNA evidence there is a 94.3% chance that s1 is the contributor and the remaining 5.7% goes to s2. Table 6 reports the posterior probabilities of s1 being the contributor for different combinations of $P(H_{s1})$ and $P(H_{s2})$. It is clear from Table 6 that for most combinations of $P(H_{s1})$ and $P(H_{s2})$, the posterior probabilities that s1 contributed to the mixture are substantially smaller than the corresponding posterior probability 0.99997 evaluated without the consideration of allele drop-out. The posterior probabilities become larger only when $P(H_{s1})/P(H_{s2})$ is at least 1000. Therefore, under the consideration of allele drop-out, if the juror uses a combination of prior probabilities with moderate $P(H_{s1})/P(H_{s2})$, the evidentiary value for the hypothesis $H_{s1}$ becomes much weaker when more than one suspect is identified, even though the additional suspect has only a partially matched DNA profile. In such a case, the evidentiary value of the cold hits would not be strong enough to uphold the conviction of the suspect.

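Table 6. The posterior probabilities of $H_{s1}$: suspect s1 is the contributor in the Swedish murder case, with two suspects identified, $n=50{,}000$, $N=1{,}000{,}000$, $LR_{s1}=1.040\times10^{10}$ and $LR_{s2}=6.240\times10^{8}$. Rows give the prior probability for s1 ($\delta$); columns give the prior probability for s2 ($\gamma$).

| $\delta$ \ $\gamma$ | 0.000001 | 0.00001 | 0.0001 | 0.001 | 0.01 | 0.1 |
|---|---|---|---|---|---|---|
| 0.000001 | 0.9434 | 0.6250 | 0.1429 | 0.0164 | 0.0017 | 0.0002 |
| 0.00001 | 0.9940 | 0.9434 | 0.6250 | 0.1429 | 0.0164 | 0.0017 |
| 0.0001 | 0.9994 | 0.9940 | 0.9434 | 0.6250 | 0.1429 | 0.0164 |
| 0.001 | 0.9999 | 0.9994 | 0.9940 | 0.9434 | 0.6250 | 0.1429 |
| 0.01 | 1.0000 | 0.9999 | 0.9994 | 0.9940 | 0.9434 | 0.6250 |
| 0.1 | 1.0000 | 1.0000 | 0.9999 | 0.9994 | 0.9940 | 0.9434 |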
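The entries of Table 6 follow from the approximation that the posterior odds of $H_{s1}$ equal $LR_{s1}P(H_{s1})/[LR_{s2}P(H_{s2})]$, i.e. the prior mass outside the two matching suspects is neglected. A minimal Python sketch that regenerates the grid under this assumption is given below; the function name `posterior_prob_s1` is hypothetical.

```python
LR_S1 = 1.040e10  # ordinary LR of s1 with drop-out (Table 5)
LR_S2 = 6.240e8   # ordinary LR of s2 with drop-out (Table 5)


def posterior_prob_s1(prior_s1, prior_s2, lr_s1=LR_S1, lr_s2=LR_S2):
    """Approximate posterior probability that s1 is the contributor.

    Neglects the prior mass of everyone other than s1 and s2, which is
    tiny relative to LR_s2 * P(H_s2) for the priors considered here.
    """
    weight_s1 = lr_s1 * prior_s1
    weight_s2 = lr_s2 * prior_s2
    return weight_s1 / (weight_s1 + weight_s2)


priors = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
for delta in priors:  # prior for s1 (rows)
    row = [posterior_prob_s1(delta, gamma) for gamma in priors]
    print(f"{delta:<8g}", "  ".join(f"{p:.4f}" for p in row))
```

The diagonal is constant because only the ratio of the two priors enters the approximation.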


3. Average probability of erroneous attribution 

The effectiveness of the database search as a tool for criminal identification can be measured by the average probability of erroneous attribution (APEA) suggested by Song et al. [24]. It is defined as the conditional probability that the actual perpetrator is someone not in the database, given that exactly one individual in the database was found to have a DNA profile matching the crime stain. A smaller chance of erroneous attribution indicates a more reliable basis for conviction on cold-hit evidence. It was shown by Song et al. [24] that, under the assumption of independence among the DNA profiles in the database, the APEA is approximately equal to $(N-n)\,p_A$, where $N$ is the number of possible perpetrators in the population, $n$ is the number of profiles in the database and $p_A$ is the average random match probability for all DNA profiles in the population. It was concluded that the chance of an erroneous attribution is usually very small even under the conservative assumption of independent profiles.

Their work, however, is based on single-source crime samples rather than mixtures, and the scenario in which more than one individual in the database is found to have a matching profile is not considered. Here we extend the idea of the APEA to mixture cases, taking account of the possibility of multiple matches. We define $D$ and $R$ respectively as the index sets of individuals in the offender database and not in the offender database, $n$ as the size of $D$, $r=N-n$ as the size of $R$, and $E_G^k$ as the event that exactly $k$ individuals in the set $G$ have DNA profiles that match $M \setminus V$. For simplicity, we assume that there is only one perpetrator in the population and that all the DNA profiles involved are independent. Using uniform priors, the APEA, given that $m$ individuals in the database have DNA profiles matching $M \setminus V$, can be approximated as

$$P(\text{Erroneous Attribution} \mid E_D^m) \approx \frac{(N-n)\,p_{A,m}}{m\,p_{A,m-1} + [N-n-m(n-m)]\,p_{A,m}} \qquad (2)$$

where $p_{A,m}$ is the $m$th moment of the random match probability for all DNA profiles in the population. The proof of Eq. (2) can be found in the Appendix. Note that for $m=1$ (with $p_{A,0}=1$), Eq. (2) reduces to $P(\text{Erroneous Attribution} \mid E_D^1) \approx (N-n)\,p_{A,1}$, which is the formula derived by Song et al. [24]. For a particular locus $l$, denote $C_l$ as the genotype of the perpetrator at this locus. Then $M_l = V_l \cup C_l$, and the random match probability at locus $l$ given the profiles $(V_l, C_l)$ can be calculated by using the formulae listed in Table 7. Taking the expectation of $RMP_l^m$ with respect to the distribution of $V_l$ and $C_l$ gives the $m$th moment of the random match probability at locus $l$:

$$E(RMP_l^m) = \sum_{V_l}\sum_{C_l} [RMP_l(V_l, C_l)]^m\,P(V_l)\,P(C_l).$$

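Table 7. Formulae for calculating the random match probability ($RMP_l$) for different combinations of $(V_l, C_l)$, under the assumption of Hardy–Weinberg equilibrium.

| $V_l$ | $C_l$ | $M_l$ | $RMP_l$ |
|---|---|---|---|
| $A_i/A_i$ | $A_i/A_i$ | $\{A_i\}$ | $p_i^2$ |
| $A_i/A_i$ | $A_j/A_j$, $A_i/A_j$ | $\{A_i,A_j\}$ | $p_j^2+2p_ip_j$ |
| $A_i/A_i$ | $A_j/A_k$ | $\{A_i,A_j,A_k\}$ | $2p_jp_k$ |
| $A_i/A_j$ | $A_i/A_i$, $A_j/A_j$, $A_i/A_j$ | $\{A_i,A_j\}$ | $p_i^2+p_j^2+2p_ip_j$ |
| $A_i/A_j$ | $A_k/A_k$, $A_i/A_k$, $A_j/A_k$ | $\{A_i,A_j,A_k\}$ | $p_k^2+2p_ip_k+2p_jp_k$ |
| $A_i/A_j$ | $A_k/A_t$ | $\{A_i,A_j,A_k,A_t\}$ | $2p_kp_t$ |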
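The $m$th moment of the random match probability at a locus can be computed by enumerating all genotype pairs $(V_l, C_l)$. The Python sketch below does this under Hardy–Weinberg equilibrium and, for several loci, multiplies the per-locus moments on the assumption of linkage equilibrium. The function names and the allele frequencies are hypothetical and serve only as an illustration.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def rmp(victim, culprit, freqs):
    """Random match probability at one locus given (V_l, C_l), as in Table 7."""
    mixture = set(victim) | set(culprit)   # M_l is the union of V_l and C_l
    unexplained = mixture - set(victim)    # alleles a match must carry
    return sum(genotype_prob(g, freqs)
               for g in combinations_with_replacement(sorted(mixture), 2)
               if unexplained <= set(g))


def rmp_moment(freqs, m):
    """E(RMP_l^m), averaging over independent victim and culprit genotypes."""
    genotypes = list(combinations_with_replacement(sorted(freqs), 2))
    return sum(genotype_prob(v, freqs) * genotype_prob(c, freqs)
               * rmp(v, c, freqs) ** m
               for v in genotypes for c in genotypes)


def overall_moment(freqs_by_locus, m):
    """p_{A,m} as the product of per-locus moments (linkage equilibrium)."""
    return prod(rmp_moment(freqs, m) for freqs in freqs_by_locus)


# Hypothetical three-allele locus, for illustration only.
freqs = {1: 0.5, 2: 0.3, 3: 0.2}
print(rmp((1, 1), (2, 3), freqs))   # equals 2 * p_2 * p_3
print(rmp_moment(freqs, 1), rmp_moment(freqs, 2))
```

Since every random match probability is at most 1, the moments decrease as $m$ grows.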

Assuming linkage equilibrium, $p_{A,m}$ can be obtained as the product of $E(RMP_l^m)$ over all loci. Table 8 shows the values of $p_{A,m}$ and the corresponding APEA for $m \le 5$, using Swedish allele frequencies at 10 loci, $N=1{,}000{,}000$ and $n=50{,}000$. As can be seen, the APEA increases with $m$. Therefore, more matched profiles found in the database suggest a larger chance that the unknown contributor to the crime stain is someone not in the database. While this seemingly contradicts the intuition that more matches should result in a higher evidentiary value, both the number of matches found in the database and the value of the APEA actually depend heavily on the unexplained profile $M \setminus V$. If rare alleles are present in $M \setminus V$, it is very unlikely that many individuals in the population have matching profiles, and hence both the number of matches found in the database and the value of the APEA will be small. However, even when there is just a single match, i.e. $m=1$, the chance of an erroneous attribution is approximately $1.861\times10^{-4}$, or about 1 in 5000 cases, which is not very small compared to the results (1 in 3.4 million for 13-locus CODIS profiles of the U.S. population) reported in Song et al. [24]. Therefore more loci in addition to the 10 SGM Plus loci would be required in order to reduce the chance of erroneous attribution and improve the effectiveness of the database search.

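Table 8. The average probability of erroneous attribution given $m$ matches, using Swedish allele frequencies on the 10 SGM Plus loci, $N=1{,}000{,}000$ and $n=50{,}000$.

| $m$ | $E(RMP^m)$ | $E(RMP^m)/E(RMP^{m-1})$ | $P(\text{Erroneous Attribution} \mid E_D^m)$ |
|---|---|---|---|
| 1 | $1.959\times10^{-10}$ | $1.959\times10^{-10}$ | $1.861\times10^{-4}$ |
| 2 | $1.957\times10^{-18}$ | $9.989\times10^{-9}$ | $4.725\times10^{-3}$ |
| 3 | $1.089\times10^{-25}$ | $5.567\times10^{-8}$ | $1.737\times10^{-2}$ |
| 4 | $1.566\times10^{-32}$ | $1.438\times10^{-7}$ | $3.325\times10^{-2}$ |
| 5 | $4.064\times10^{-39}$ | $2.595\times10^{-7}$ | $4.758\times10^{-2}$ |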
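The last column of Table 8 can be recomputed from the tabulated moments. The Python sketch below assumes the approximation $(N-n)\,p_{A,m} / \{m\,p_{A,m-1} + [N-n-m(n-m)]\,p_{A,m}\}$ with $p_{A,0}=1$; this closed form is a reconstruction that reproduces the tabulated values to about three significant figures, and the function name `apea` is hypothetical.

```python
def apea(N, n, m, moment_m, moment_m_minus_1):
    """Approximate average probability of erroneous attribution given m matches.

    moment_m and moment_m_minus_1 are the m-th and (m-1)-th moments of the
    random match probability; the zeroth moment is 1.
    """
    r = N - n  # number of possible perpetrators outside the database
    return r * moment_m / (m * moment_m_minus_1 + (r - m * (n - m)) * moment_m)


# Moments E(RMP^m) for m = 0..5; entries for m >= 1 are taken from Table 8.
moments = [1.0, 1.959e-10, 1.957e-18, 1.089e-25, 1.566e-32, 4.064e-39]
N, n = 1_000_000, 50_000

for m in range(1, 6):
    value = apea(N, n, m, moments[m], moments[m - 1])
    print(f"m = {m}: APEA = {value:.3e}")
```

Small discrepancies in the last digit are expected because the tabulated moments are themselves rounded.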


4. Discussion 

In this paper we have reported our progress on an approach to evaluating the forensic evidence of DNA mixtures when the suspect is identified through a database search. We have extended the approach to situations of imperfect evidence such as missing genotypes and allele drop-out. It is demonstrated through a numerical example that our approach is capable of assessing the evidentiary value of multiple cold hits from a database search on a two-person mixture. Under the consideration of allele drop-out, the evidentiary value is shared among the suspects whose DNA profiles match the victim and the mixture at a certain number of loci, according to the drop-out rates at the unmatched loci and the juror's choice of prior probabilities. The results can be presented as a table of posterior probabilities, providing comprehensive information for the court if it is accepted.

We have also described a simple method to evaluate the effectiveness of database search with DNA mixtures, by calculating the average probability of erroneous attribution, which can be regarded as a measure of the false positive rate for suspect identification. The derived approximating formula of the APEA allows us to calculate the APEA for mixture cases when different numbers of matches are found from the database search. As illustrated by the example using Swedish allele frequencies at the 10 SGM Plus loci, the chance of an erroneous attribution is small but not negligible, suggesting room for further improvement in the effectiveness of the search.

The results presented in this paper are based on the most common scenario, in which a two-person mixture is included as part of the DNA evidence. We focus on the situation in which the mixed stain originates partly from the victim, whose DNA profile is available, so that the set of possible profiles for the unknown contributor is more restrictive and the database search will not result in too many cold-hit matches. For mixture cases involving two unknown contributors, there will be relatively more individuals in the database that cannot be excluded as contributors. As a result, the database search may not be as effective as in cases involving one victim and one unknown. In fact, for cases with two unknown contributors, we can treat the victim profile as missing data, so that the evidentiary value of cold hits can be evaluated by using the formulae in Table 2.

In practical crime cases, it is also possible that the crime stain is composed of a mixture of more than two people's DNA. When multiple perpetrators are involved, the formulae presented here are no longer applicable. Thus, further work could include the generalization of Eq. (1) to handle multiple perpetrator cases.


Acknowledgements 

Special thanks to Anna Beckman of the Swedish National Laboratory of Forensic Science for providing useful information on the Swedish national DNA database. The authors also thank the anonymous referees for their valuable comments and suggestions. This work was partially supported by the Croucher Foundation.


Appendix A. 

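Proof of Eq. (2). Define $H_D$ as the event that the perpetrator is in the database $D$ and $\theta$ as the random match probability for a particular combination of the profiles $(V,C)$. Under the assumption of uniform priors, the probability of $H_D$, given that there are $m$ matches found in the database, can be evaluated by using Bayes' rule:

$$P(H_D \mid E_D^m) = \frac{\frac{n}{N}\binom{n-1}{m-1}E[\theta^{m-1}(1-\theta)^{n-m}]}{\frac{n}{N}\binom{n-1}{m-1}E[\theta^{m-1}(1-\theta)^{n-m}] + \frac{N-n}{N}\binom{n}{m}E[\theta^{m}(1-\theta)^{n-m}]} = \frac{m\,E[\theta^{m-1}(1-\theta)^{n-m}]}{m\,E[\theta^{m-1}(1-\theta)^{n-m}] + (N-n)\,E[\theta^{m}(1-\theta)^{n-m}]}$$

where the expectation is taken over the distribution of $(V,C)$. The average probability of erroneous attribution is therefore obtained as

$$P(\text{Erroneous Attribution} \mid E_D^m) = 1 - P(H_D \mid E_D^m) = \frac{(N-n)\,E[\theta^{m}(1-\theta)^{n-m}]}{m\,E[\theta^{m-1}(1-\theta)^{n-m}] + (N-n)\,E[\theta^{m}(1-\theta)^{n-m}]} \approx \frac{(N-n)\,E(\theta^m)}{m\,E(\theta^{m-1}) + [N-n-m(n-m)]\,E(\theta^m)}$$

where the approximation results from truncating the higher-order terms $E(\theta^k)$ with $k>m$.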


References 

  1. Jeffreys AJ, Wilson V, Thein SL. Individual-specific ‘fingerprints’ of human DNA. Nature. 1985;316:76–79
  2. Balding DJ, Donnelly P. Inference in forensic identification (with discussion). Journal of the Royal Statistical Society, Series A. 1995;158:21–53
  3. Stockmarr A. Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search. Biometrics. 1999;55:671–677
  4. Meester R, Sjerps M. The evidential value in the DNA database search controversy and the two-stain problem. Biometrics. 2003;59:727–732
  5. Chung YK, Hu YQ, Fung WK. Evaluation of DNA mixtures from database search. Biometrics. 2010;66:233–238
  6. Weir BS, Triggs CM, Starling L, Stowell LI, Walsh KAJ, Buckleton JS. Interpreting DNA mixtures. Journal of Forensic Sciences. 1997;42:113–122
  7. Fukshansky N, Bär W. Interpreting forensic DNA evidence on the basis of hypotheses testing. International Journal of Legal Medicine. 1998;111:62–66
  8. Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA mixtures in structured populations. Journal of Forensic Sciences. 1999;44:987–995
  9. Fung WK, Hu YQ. Interpreting forensic DNA mixtures: allowing for uncertainty in population substructure and dependence. Journal of the Royal Statistical Society, Series A. 2000;163:241–254
  10. Fukshansky N, Bär W. Biostatistics for mixed stain: the case of tested relatives of a non-tested suspect. International Journal of Legal Medicine. 2000;114:78–82
  11. Hu YQ, Fung WK. Interpreting DNA mixtures with the presence of relatives. International Journal of Legal Medicine. 2003;117:39–45
  12. Hu YQ, Fung WK. Evaluation of DNA mixtures involving two pairs of relatives. International Journal of Legal Medicine. 2005;119:251–259
  13. Cowell RG, Lauritzen SL, Mortera J. Identification and separation of DNA mixtures using peak area information. Forensic Science International. 2007;116:28–34
  14. Cowell RG, Lauritzen SL, Mortera J. A gamma model for DNA mixture analyses. Bayesian Analysis. 2007;2:333–348
  15. Perlin MW, Szabady B. Linear mixture analysis: a mathematical approach to resolving mixed DNA samples. Journal of Forensic Sciences. 2001;46:1372–1378
  16. Gill P, Sparkes R, Pinchin R, Clayton T, Whitaker J, Buckleton J. Interpreting simple STR mixtures using allele peak areas. Forensic Science International. 1998;91:41–53
  17. Wang T, Xue N, Birdwell JD. Least-square deconvolution: a framework for interpreting short tandem repeat mixtures. Journal of Forensic Sciences. 2006;51:1284–1297
  18. Balding DJ. Weight-of-Evidence for Forensic DNA Profiles. Chichester: Wiley; 2005
  19. Fung WK, Hu YQ. Statistical DNA Forensics: Theory, Methods and Computation. Chichester: Wiley; 2008
  20. Gill P, Whitaker J, Flaxman C, Brown N, Buckleton J. An investigation of the rigor of interpretation rules for STRs derived from less than 100pg of DNA. Forensic Science International. 2000;112:17–40
  21. Gill P, Brenner CH, Buckleton JS, Carracedo A, Krawczak M, Mayr WR, et al. DNA commission of the International Society of Forensic Genetics: recommendations on the interpretation of mixtures. Forensic Science International. 2006;160:90–101
  22. Montelius K, Karlsson AO, Holmlund G. STR data for the AmpFlSTR Identifiler loci from Swedish population in comparison to European, as well as with non-European population. Forensic Science International. Genetics. 2008;2:e49–e52
  23. Tvedebrink T, Eriksen PS, Mogensen HS, Morling N. Estimating the probability of allelic drop-out of STR alleles in forensic genetics. Forensic Science International. Genetics. 2009;3:222–226
  24. Song YS, Patil A, Murphy EE, Slatkin M. Average probability that a “Cold Hit” in a DNA database search results in an erroneous attribution. Journal of Forensic Sciences. 2009;54:22–27

PII: S1355-0306(10)00099-7

doi:10.1016/j.scijus.2010.07.002
