Abstract

The reliability and probative value of forensic science evidence are inextricably linked to the rates at which examiners make errors. Jurors and others cannot rationally assess the significance of a reported forensic science match without having some information about the rate at which false positive errors occur. This article calls for the implementation of proficiency tests that are designed and administered for the express purpose of providing factfinders with reasonable first pass estimates of error rates across forensic disciplines and techniques. The composition of the test designers and administrators, the features of test and reference samples, the composition and selection of test participants, the use of blind test protocols and the coding of test responses are critical elements in this endeavour. A proficiency testing plan that addresses each of these issues is identified.

Forensic science evidence is widely believed to be the most powerful form of evidence in existence. Like eyewitness evidence, some types of forensic science evidence can tie a particular person to a crime scene or a criminal act. Unlike eyewitness evidence, the testimonial infirmities associated with forensic match testimony seem slight. Whereas legal decision makers readily understand that eyewitnesses may misperceive, misremember, misdescribe, or lie, they are less likely to worry about these infirmities when evaluating forensic science testimony. But how accurate are forensic match reports? How can we provide legal decision makers with an empirically-based sense of the frequency with which various types of forensic testimony are wrong or misleading?

The answer is to conduct methodologically rigorous, blind, external proficiency tests using realistic samples across the forensic sciences, including DNA, fingerprints, bitemarks, toolmarks, footwear, tire tracks, handwriting, glass, fibre, hair, paint, etc. A proficiency test is an assessment of the performance of laboratory personnel. In a blind proficiency test, the source of tested samples is not revealed to examinees until after the test has concluded, and examinees are not aware that they are participating in a test (Peterson et al., 2003a). An external test is one that is conducted by an organization other than the one in which the examiner works.

Calls for blind, external proficiency testing in the forensic sciences are nothing new. Such calls were particularly frequent in the early 1990s when DNA evidence was in its infancy (Balding and Donnelly, 1995; Kaye, 1993; Koehler, 1993; National Academy of Sciences, 1992; Thompson, 1993). These calls drew support from some earlier results of blind versus open proficiency tests of laboratories conducting drug analyses. Those tests confirmed the commonsensical ideas that analysts do not approach test samples in the same manner that they approach case samples (Cembrowski and Vanderlinde, 1988), and that error rates will tend to be higher in blind tests than in open tests (Boone et al., 1982; Hansen et al., 1985). But over the past 20 years, a broad, systematic blind proficiency testing programme has not materialized. Perceived complexity and expense are surely part of the story. But the more important part of the story is misunderstanding about (a) the specific purpose of such tests, (b) why the results of such tests would provide crucial information for legal decision makers, (c) why current testing procedures and data fail to address the need, and (d) how the new tests could be conducted in practice.

Adding to the confusion, a 1996 National Academy of Sciences (NAS) report on DNA evidence flatly contradicted the recommendation made in a 1992 NAS report on DNA evidence about the need to identify error rates. Whereas the 1992 report indicated that blind proficiency tests that estimate error rates should be required for all new DNA methods (National Academy of Sciences, 1992, p. 55), the 1996 report countered that error rate estimates from proficiency tests are ‘almost certain to yield wrong values’ because ‘[w]hen errors are discovered, they are investigated thoroughly so that corrections can be made’ (National Academy of Sciences, 1996, p. 86). According to this logic, error rate estimates based on proficiency test data provide no insight into the risk of an error in a given case. These and other arguments that the 1996 NAS report offered up against the value of error rate data were rebutted in a Symposium issue of Jurimetrics (see Balding, 1997; Koehler, 1997; Lempert, 1997; Thompson, 1997). Those rebuttal arguments will not be repeated here. But the reality is that, following the 1996 NAS report, DNA error rate data from proficiency testing became scarce and rarely found its way into the courtroom. Likewise, the non-DNA forensic sciences, which had little tradition of using proficiency tests to estimate error rates in the first place, continued to be promoted as reliable and even error-free.

Here, some clarification about the purpose of proposed proficiency tests is in order. Proficiency tests serve many purposes. They may be used to train personnel, promote baseline competency levels, improve laboratory practices and procedures, and identify future needs for a laboratory or technique. These are valid and important purposes. However, they are not the only purposes of proficiency tests and they are not of concern in the present article. Instead, this article focuses on using proficiency tests to identify reasonable first pass estimates for the rates at which various types of forensic errors occur. Researchers have long suggested that it is crucial to measure error rates for the various forensic sciences because the probative value of forensic science evidence is inextricably linked to the rates at which examiners make errors (Balding, 2005; Gutowski, 2005; Koehler, 1997, 2008, 2011; Koehler et al., 1995; Lempert, 1991, 1997; Thompson et al., 2003). Without such information, jurors and other legal decision makers have no scientifically meaningful way of assigning weight to forensic match reports across the various forensic subfields.
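To make the connection between error rates and probative value concrete, consider the following sketch. The numbers are purely illustrative and the simplified two-error model (a different-source pair can be reported as a match either by coincidence or by examiner/laboratory error) is offered in the spirit of the cited work rather than as anyone's official method.

```python
# Illustrative only: a simplified model in which a different-source pair can be
# reported as a match either by coincidence (the random match probability, RMP)
# or by examiner/laboratory error (the false positive rate, FPR).

def match_report_lr(rmp, fpr, fnr=0.0):
    """Likelihood ratio for a reported match under the simplified two-error model."""
    p_report_given_same_source = 1.0 - fnr                 # a true match is reported
    p_report_given_diff_source = rmp + (1.0 - rmp) * fpr   # coincidence or error
    return p_report_given_same_source / p_report_given_diff_source

# A vanishingly small RMP adds little once errors occur at a higher rate:
for fpr in (0.0, 0.0001, 0.001, 0.01):
    print(f"RMP = 1e-9, FPR = {fpr:g}: LR ≈ {match_report_lr(1e-9, fpr):,.0f}")
```

Under these assumptions, a reported match backed by a one-in-a-billion random match probability is worth a likelihood ratio of only about 100 if false positives occur in 1% of different-source comparisons. This is precisely why factfinders need an empirically grounded estimate of the false positive rate in order to weigh a match report.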

Of course, many forensic scientists do participate in proficiency tests, including tests administered by such professional organizations as Collaborative Testing Services (CTS). These external proficiency tests serve many purposes. But they are not well-designed to assess error rates in realistic settings. Indeed, CTS expressly cautions against using their test results to draw inferences about accuracy rates in the forensic disciplines or among participating forensic examiners. Although some imperfect tests are arguably better than no tests at all, CTS has a point: CTS tests are not blind (i.e. analysts know they are being tested), not well controlled (e.g. participation is voluntary and analysts may or may not receive assistance from others), and not particularly realistic (e.g. samples are often pristine and recycled from previous tests).

When designing proficiency tests for the express purpose of providing a starting point for estimating error rates in the various forensic disciplines, careful thought must be given to at least five issues: (1) the composition of the test designers and administrators who oversee the testing process, (2) the features of test and reference samples, (3) the composition and selection of test participants, (4) the use of blind test protocols, and (5) the coding of test responses. A national proficiency-testing plan that addresses each of these issues is considered below.

1. Test designers and administrators

The designers and administrators of proficiency tests should be qualified, disinterested parties. By ‘qualified’, I mean people who have expertise in such areas as experimental design, testing, statistics, behavioural sciences, police investigation and forensic science. It would be hard to overstate the importance of including statisticians, behavioural scientists and others who have expertise in research methodology. If the proficiency tests are not properly designed, then scientific inferences cannot be made. By ‘disinterested’, I mean that test designers and administrators should not be affiliated with the examinees or the examinees’ laboratories, nor should they stand to benefit from or be harmed by any particular outcome or set of outcomes on the proficiency tests.

2. Features of test samples

The samples used in proficiency tests should be representative of the types of prints, markings and traces that arise in actual cases. This may be accomplished in different ways. One way is for test administrators to access a database of all cases in a county, state, country or other population over some time period (e.g. 5 years), and to note which cases included forensic science evidence. A random sample of those cases might then be identified as prototypes for the construction of proficiency test samples. Samples identified in this manner are likely to vary widely. For fingerprint cases, one case might include two detailed latent prints plus known prints from one suspect and two innocents. Another case might include one badly smudged latent print and known prints from each of 10 suspects, including a pair of identical twins.

Once a random sample of cases has been identified, test administrators should write comparable cases and then manufacture forensic evidence that resembles the samples and cases chosen on key dimensions. Even if the actual forensic evidence items from selected cases could be used, it would not be appropriate to do so because ground truth (e.g. which hairs came from which person) would be unknown. This problem is remedied by the manufacturing process.

The newly created evidence should be rated for difficulty using an agreed-upon rating scheme. Doing so will allow researchers to track the impact of difficulty on examiner accuracy. The process of manufacturing samples that vary in difficulty in agreed-upon ways will not be a simple chore. Ongoing research by Jennifer Mnookin and her colleagues that examines difficulty metrics in latent fingerprints may provide some guidance in this area (Mnookin, 2009).

Test administrators should also track task features such as whether multiple items are from a single common source, and whether the source of the trace evidence or marking is available for comparison. Because this proficiency test plan calls for representative samples rather than deliberately diverse samples, sample difficulty levels and other task features are tracked but not experimentally manipulated. If, for example, fingerprints recovered from crime scenes are commonly partial and badly distorted, then these types of latent prints should make up the bulk of the samples used in proficiency tests. This approach increases our confidence that the general error rates detected in the tests are applicable to those in case work.

3. Test participants

Participants should be representative of the forensic scientists who testify in court. Pertinent background features of test participants should be tracked, including training, experience and number of cases in which the participants have testified. By tracking examiners’ characteristics, we will gain insight into the conditions under which performance varies.

All forensic scientists who testify in court should be part of the broad participant pool. Examiners cannot opt in or out. However, it is not important that all or even most forensic scientists be selected to participate in the proficiency tests. Participants in each of the tested subfields should be sampled using statistically sound methods. A proper sampling will allow for the identification of industry-wide error rates across various forensic subfields, and at considerably less cost than plans that require testing all examiners.
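As a rough illustration of the statistical point (the sample sizes, error counts and the assumption of independent decisions are mine, not data from any actual programme), a modest random sample of examiner decisions can already bound an industry-wide false positive rate fairly tightly:

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for an error proportion (errors out of n)."""
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical: 300 randomly sampled examiners each complete 10 blind
# different-source comparisons, yielding 45 false positives in 3,000 decisions.
low, high = wilson_interval(45, 3000)
print(f"Estimated false positive rate: 1.5% (95% CI {low:.1%} to {high:.1%})")
```

In practice, decisions are clustered within examiners and laboratories, so the interval would be somewhat wider and the sampling design would need to account for that clustering. The point is simply that useful industry-wide precision does not require testing every examiner.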

Some readers may question the value of identifying industry-wide error rates for the various forensic sciences. After all, shouldn’t factfinders be concerned with the chance that the particular forensic scientist who ran the analyses in the focal case makes errors? Or, better yet, shouldn’t factfinders want to know the chance that the forensic scientist made a crucial error in the focal case? Certainly, factfinders would like to know the risk of error in the instant case. But the question that needs to be addressed is what information can help the factfinder reach that goal. According to an elementary principle of statistical prediction, the industry-wide error rate informs the risk of error in the instant case by providing a starting point for the estimate. Just as knowing that the average Major League baseball shortstop commits an error on about 3–4% of his chances informs us about the chance that he’ll commit an error on his next chance, the industry-wide false positive error rate for ballistics examinations informs us about the chance that a particular reported ballistics match is erroneous. Failure to appreciate this general principle has been identified as ‘one of the most significant departures of intuition from the normative theory of prediction’ (Kahneman and Tversky, 1973, p. 243; for an empirical review, see Koehler, 1996). To put the point another way, unless factfinders have access to a reasonable estimate of the general prevalence (or base rate) at which errors occur, they will not have the information they need to rationally assess the probability of an error in the instant case. Case-specific information alone—including the training, knowledge and experience of the person who performed the analysis, or the apparent care that went into the analysis—will not provide an adequate basis for estimating the chance that a significant error occurred.1

The importance of the base rate has long been understood and accepted in the scientific and medical communities (Gastwirth, 1987). For example, researchers and doctors understand that the chance that a patient suffers from a particular disease or condition depends both on the specific symptoms the patient exhibits (case-specific information) as well as the relative frequency of the condition (the base rate) (Dunn and Greenhouse, 1950; Meehl and Rosen, 1955). Moreover, as a practical matter, the reference classes from which medical base rates are derived are usually broad (e.g. ‘men over 50’) rather than highly specific to the patient of interest. Nevertheless, such base rates have repeatedly been shown to improve predictions about specific patients relative to judgments made without the benefit of such base rates (Dawes et al., 1989).

These points about the value of base rate probabilities are sometimes misunderstood in the forensic science community. Indeed, some have suggested that base rates for errors should not be used or even computed (Bono, 2011; Budowle et al., 2009; Bunch et al., 2009). Fortunately, a recent and influential report from a statistically sophisticated NAS panel rejects this position and expressly calls for rigorous studies to estimate error rates throughout the forensic sciences (National Academy of Sciences, 2009, p. 25, 122, 191).

To be clear, an appreciation of the role that background probabilities play in individual case prediction tasks does not require the decision maker to discard more individuating forms of evidence such as evidence that a particular examiner has not erred on any of 20 proficiency tests. This is not a zero-sum argument in which the decision maker chooses between one or more general background rates and examiner-specific information. From a logical standpoint, both components should inform a factfinder’s estimate that a particular decision is in error. But in situations where there is a relative dearth of reliable, examiner-specific information, the general background rate should play a relatively large role, perhaps larger than intuition might suggest.

This point is well illustrated by ‘Stein’s paradox’ (Efron and Morris, 1977). Suppose that well-designed proficiency tests provide a first-pass error rate x̄ (0 ≤ x̄ ≤ 1) for a particular forensic procedure. Suppose further that an individual examiner who has participated in these well-designed proficiency tests has an observed error rate of y (0 ≤ y ≤ 1). According to Stein’s paradox, a computation that combines x̄ and y in a manner that ‘shrinks’ or regresses y in the direction of x̄ will yield a more accurate estimate of the examiner’s true error rate than either y or x̄ alone. When the sample size used to compute y is very small (as it usually will be in the forensic sciences because individual examiners do not participate in large numbers of well-designed proficiency tests), the end result of Stein’s computation will show a strong influence of the x̄ (i.e. general background rate) value relative to y (i.e. the individual examiner’s rate of error on a small number of tests). In other words, the true error rate for an examiner who erred 0 times in 20 proficiency tests (or 0 times in, say, 100 comparisons) will be a lot closer to the first pass error rate estimate based on lots of examiners than it will be to the examiner’s error rate on the tests he or she took (0%).
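A minimal numerical sketch of this shrinkage idea follows. It uses a simple empirical-Bayes (beta-binomial) estimator as a stand-in for Stein’s computation, and the industry-wide rate and prior strength are illustrative assumptions rather than values drawn from any actual testing programme.

```python
def shrunk_error_rate(errors, n, industry_rate, prior_strength=50):
    """
    Shrinkage estimate of an examiner's true error rate: the observed rate
    (errors / n) is regressed toward the industry-wide rate, with the pull
    strongest when n is small. prior_strength acts as a pseudo-sample size
    for the industry-wide data and is an illustrative assumption here.
    """
    alpha = industry_rate * prior_strength          # prior "errors"
    beta = (1 - industry_rate) * prior_strength     # prior "correct decisions"
    return (errors + alpha) / (n + alpha + beta)

industry_rate = 0.02  # illustrative first-pass rate from broad proficiency testing

print(shrunk_error_rate(0, 20, industry_rate))     # ≈ 0.014: close to the industry rate
print(shrunk_error_rate(0, 2000, industry_rate))   # ≈ 0.0005: the individual record dominates
```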

This fact, which was recently identified as ‘the single most striking result of post-World War II statistical theory’ (Efron, 2010, p. 149), will not surprise Bayesian thinkers or those who appreciate regression effects in everyday life. But the psychological resistance it encounters in the context of forensic science proficiency testing is powerful. Why, ask some sceptics, should the proficiency test results of other analysts, taken at other times, in other laboratories, be used to estimate the reliability of a single analyst for whom proficiency test data are available? Why not simply use the empirical results from the tests taken by the analyst in question as a best estimate for his/her true error rate, and then use this value as a reasonable first pass estimate for the chance that the analyst erred in this particular case? The reason this suggestion should be rejected is that the broader background probability—in this application, an industry-wide error rate—provides powerful indirect evidence about the parameter of interest.

4. Blind tests

Ideally, proficiency tests should be blind in the sense that any party that has a direct interest in how the examiners perform should not be aware that the proficiency test materials are part of a test rather than part of actual casework. Behaviour may change under observation and it is important to make test conditions as similar to casework conditions as possible. Part of that similarity means not telling examiners that they are being tested. This is a key feature in a scientifically valid proficiency test of human performance, and one that is expressly recommended for use in forensic science proficiency testing in a statement offered by the American Statistical Association (American Statistical Association, 2010).

Some forensic supporters take offense at the suggestion that forensic scientists’ behaviour may vary when they know they are being tested. But the notion that behaviour changes under observation is well-documented across many domains for experts and novices alike (Risinger et al., 2002). It is simply part of the human condition.

Pleas for blind proficiency testing are sometimes dismissed on grounds that they would be too difficult to implement. If ‘too difficult to implement’ means that the population of laboratories and examiners is unlikely to embrace such testing voluntarily, then I agree. A testing programme that relies on voluntary participation will not produce trustworthy data because the sampled population may no longer be representative of testifying examiners. Therefore, participation in this testing programme should be required by law. Under such legislation, laboratories that provide courtroom testimony on forensic science matters must agree to participate in a blind test when asked to do so.

If ‘too difficult to implement’ means that a mandatory blind testing programme is too unwieldy to implement (see e.g. Expert Working Group on Human Factors in Latent Print Analysis, 2012, p. 33–34), then I disagree. Blind proficiency testing has been used in some forensic science areas, including the Department of Defense’s forensic urine drug testing programme and the HIV testing programme (Peterson and Gaensslen, 2001). Blind tests have also been used for DNA analyses. For example, Rand et al. (2002) report the results of DNA blind trials across 129 laboratories in 28 European countries2 (see also Rand et al., 2004). Joe Peterson and colleagues conducted a detailed pilot investigation in the USA which showed that ‘blind tests can be constructed and successfully submitted to forensic DNA laboratories’ (Peterson and Gaensslen, 2001; see also Peterson et al., 2003a,b). Smaller scale blind proficiency tests for DNA analyses were conducted in the early DNA evidence days as well (Honma et al., 1989; Kuo, 1988, 1990; Walsh et al., 1991).

This is not to say that the practical problems associated with a blind testing programme are trivial. They are not. Indeed, even after completing their successful blind proficiency tests in the DNA area, Peterson and colleagues believed that the costs and logistics of a full-scale blind proficiency-testing programme were too great. But the purpose, costs and logistics of the testing programme described here are different from the one contemplated and pilot-tested by the Peterson group. The current plan focuses on estimating error rates at the industry level rather than the examiner level. Consequently, it would not require testing every laboratory and examiner on a regular basis. Moreover, the scientific and legal community’s resolve to address the error rate issue in the forensic sciences is much stronger than it was a decade ago. As noted previously, the NAS expressly called for mandatory proficiency tests and quantitative estimates of error rates (National Academy of Sciences, 2009, p. 25, 122, 191). Likewise, trial judges are beginning to look more critically at the error rate issue in cases where the forensic evidence is challenged, and some courts have rejected claims of zero error rate and subjective certainty testimony (Giannelli, 2009).

5. Coding of test responses

Where possible, test responses should be coded as correct, incorrect, or inapplicable. False positive rates are computed as the proportion of samples that are from different sources that are reported to match. Sometimes these are referred to as identifications. False negative rates are computed as the proportion of samples that are from the same source that are reported as non-matches. Sometimes these are referred to as exclusions or eliminations. Samples that are judged to be ‘inconclusive’ are inapplicable when estimating false positive and false negative error rates. They do not count for purposes of these computations. However, the frequency with which samples are judged to be inconclusive should be tracked for other purposes (see Expert Working Group on Human Factors in Latent Print Analysis, 2012). Some forensic sciences permit examiners a broader range of responses than the three categories described above. For example, professional guidelines encourage shoeprint examiners to offer one of seven conclusions: unsuitable, elimination, identification, probably made, could have made, inconclusive or probably did not make (Scientific Working Group, 2009).
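A minimal coding-and-computation sketch follows. The data are invented, and the treatment of inconclusive reports (excluded from both the numerator and the denominator of the error rates, but tracked separately) is one way of implementing the scheme described above:

```python
# Each test item records the ground truth ("same" or "different" source) and the
# examiner's reported conclusion ("match", "non-match", or "inconclusive").
trials = [
    ("different", "match"),         # false positive
    ("different", "non-match"),     # correct exclusion
    ("different", "inconclusive"),  # tracked, but excluded from error rates
    ("same", "match"),              # correct identification
    ("same", "non-match"),          # false negative
    ("same", "inconclusive"),       # tracked, but excluded from error rates
]

diff_conclusive = [r for truth, r in trials if truth == "different" and r != "inconclusive"]
same_conclusive = [r for truth, r in trials if truth == "same" and r != "inconclusive"]

false_positive_rate = diff_conclusive.count("match") / len(diff_conclusive)
false_negative_rate = same_conclusive.count("non-match") / len(same_conclusive)
inconclusive_rate = sum(r == "inconclusive" for _, r in trials) / len(trials)

print(false_positive_rate, false_negative_rate, inconclusive_rate)
```

Disciplines that permit broader response scales, such as the seven shoeprint conclusions noted above, would require agreed rules for mapping each conclusion into these coding categories before error rates are computed.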

6. Conclusion

Jurors and others cannot rationally assess the significance of a reported forensic science match without having some information about the rate at which false positive errors occur for the technique in question. Such information will give factfinders a meaningful starting point for thinking about accuracy and error rates. At present, factfinders are guided by little more than presumption, rumour, media hyperbole and unscientific claims made by testifying experts at trial. Properly designed proficiency tests provide a necessary first step toward correcting this problem.3

The design and administration of these tests—i.e. tests that will provide reasonable first pass estimates for the rates at which various types of errors occur for various forensic analyses—is a major undertaking. Until now, no one has offered the outlines of a plan for achieving this goal. This article addresses five questions pertaining to the recommended proficiency tests that deserve much more attention than they have received to date: (1) Who is designing the test and overseeing the testing process? (2) What is the nature of the test samples? (3) Who are the test participants? (4) How can we ensure that the test will be treated by examiners as actual case work? and (5) How will test responses be coded?

Of course, it would be easy to dismiss the entire enterprise by repeating the practical questions, as if to suggest that they pose impossible obstacles. Who exactly will administer the tests? How much will it cost? How can participation be ensured? How will examiners create enough time to participate? A similar set of sceptical questions concerns the risks associated with misinterpretation or misuse of the outcome data. How exactly will those outcome data be described to factfinders? How will we know whether a particular examiner or a particular examination has a higher or lower error risk than the estimates obtained from these tests? What will be done to prevent unscrupulous advocates from abusing the outcome data to confuse factfinders or even discredit the entire forensic enterprise? These are all important questions and each deserves careful analysis. But they are the wrong questions to ask at this stage. The better questions to ask are (1) Do legal decision makers need to know forensic error rates to evaluate the significance of forensic evidence? (2) Is it best to estimate those rates via a broad and rigorously scientific testing programme? and (3) Should trial judges require the forensic science community to cooperate with a rigorous testing programme as a condition for admitting forensic testimony at trial? Elsewhere, I expressed scepticism about whether the forensic sciences will voluntarily embrace significant reform of the sort envisioned by the 2009 NAS report on the forensic sciences (Koehler, 2010). And I have no illusions about the level of excitement the current proposal will generate among the masses in the forensic community. But if and when influential figures in the legal and scientific communities begin to think that the answers to the three questions above just might be ‘yes’, then we are surely more than half way there.

1 Obviously, an error rate based on a large amount of rigorous testing on a particular examiner is preferable to an error rate that is less individualized. But such data are beyond reach given the amount of testing that would need to be done at the individual level to obtain data that have sufficiently narrow confidence intervals. The same is not true of industry-wide error rate data.
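As a rough illustration of this footnote’s point (using the standard ‘rule of three’ approximation for zero observed events and illustrative sample sizes), the uncertainty around an individual examiner’s spotless record remains wide at any realistic level of individual testing, whereas pooled industry-wide data can be far more precise:

```python
# "Rule of three": with zero observed errors in n trials, an approximate 95%
# upper confidence bound on the true error rate is about 3 / n.
for n in (20, 100, 5000):
    print(f"0 errors in {n:>4} tests -> 95% upper bound ≈ {3 / n:.2%}")
```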

2Rand et al. (2002) concede that their blind trials did not include a simulated case work situation due to ‘the vastly differing nature and internal organisation of the [participating] laboratories’ (p. 202). Importantly, the 129 participating laboratories in this study were from 28 European countries. Whether the differences that exist across DNA laboratories throughout the USA are as vast and significant for testing purposes as those that exist across DNA laboratories from 28 different countries is doubtful, but ultimately an empirical question.

3 Arguably, the true first step is the elimination of false and unscientific statements about the strength of the forensic science evidence (e.g. risk of error is 0%, match is 100% certain). I don’t focus on these claims here for several reasons. Firstly, prominent scholars have clearly and repeatedly called for such reform already (e.g. Expert Working Group on Human Factors in Latent Print Analysis, 2012; National Academy of Sciences, 2009). Secondly, we can do much more to enhance factfinders’ understanding of the probative value of forensic science evidence than simply block exaggerated or misleading testimony. We can provide factfinders with scientific data.

References

American Statistical Association (2010). ASA Statement on Strengthening Forensic Science.
Balding, D. J. (1997). Errors and misunderstandings in the second NRC report. Jurimetrics, 37, 469–476.
Balding, D. J. (2005). Weight-of-Evidence for Forensic DNA Profiles. West Sussex, UK: John Wiley and Sons Ltd.
Balding, D. J. and Donnelly, P. (1995). Inferring identity from DNA profile evidence. PNAS, 92, 11741–11745.
Bono, J. P. (2011). Commentary on “The need for a research culture in the forensic sciences”. UCLA Law Review, 58, 781–787.
Boone, D. J., Hansen, H. J., Hearn, T. L., Lewis, D. S. and Dudley, D. (1982). Laboratory evaluation and assistance efforts: Mailed, on-site and blind PT surveys conducted by the Centers for Disease Control. American Journal of Public Health, 72, 1364–1368.
Budowle, B., Bottrell, M. C., Bunch, S. G., Fram, R., Harrison, D., Meagher, S., Oien, C. T., Peterson, P. E., Seiger, D. P., Smith, M. B., Smrz, M. A., Soltis, G. L. and Stacey, R. B. (2009). A perspective on errors, bias, and interpretation in the forensic sciences and direction for continuing advancement. Journal of Forensic Sciences, 54, 798–809.
Bunch, S. G., Smith, E. D., Giroux, B. N. and Murphy, P. D. (2009). Is a match really a match? A primer on the procedures and validity of firearm and toolmark identification. Forensic Science Communications, 11, 1–10.
Cembrowski, G. S. and Vanderlinde, R. E. (1988). Survey of special practices associated with College of American Pathologists Proficiency Testing in the Commonwealth of Pennsylvania. Archives of Pathology and Laboratory Medicine, 112, 374–376.
Dawes, R. M., Faust, D. and Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Dunn, J. E., Jr and Greenhouse, S. W. (1950). Cancer Diagnostic Tests: Principles and Criteria for Development and Evaluation. Federal Security Agency, Public Health Service, #9. Government Printing Office.
Efron, B. (2010). The future of indirect evidence. Statistical Science, 25, 145–157.
Efron, B. and Morris, C. (1977). Stein’s paradox in statistics. Scientific American, 236, 119–127.
Expert Working Group on Human Factors in Latent Print Analysis (2012). Latent Print Examination and Human Factors: Improving the Practice through a Systems Approach. U.S. Department of Commerce, National Institute of Standards and Technology.
Gastwirth, J. L. (1987). The statistical precision of medical screening procedures: Application to polygraph and AIDS antibodies test data. Statistical Science, 2, 213–238.
Giannelli, P. C. (2009). The NRC report and its implications for criminal litigation. Jurimetrics Journal, 50, 53–66.
Gutowski, S. (2005). Error rates in the identification sciences. Forensic Bulletin, 23, 23–29.
Hansen, H. J., Caudill, S. P. and Boone, J. D. (1985). Crisis in drug testing: Results of a CDC blind study. Journal of the American Medical Association, 253, 2382–2387.
Honma, M., Yoshii, T., Ishiyama, I., Mitani, K., Kominami, R. and Muramatsu, M. (1989). Individual identification from semen by the deoxyribonucleic acid (DNA) fingerprint technique. Journal of Forensic Sciences, 34, 222–227.
Kahneman, D. and Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251.
Kaye, D. H. (1993). DNA evidence: Probability, population genetics, and the courts. Harvard Journal of Law & Technology, 7, 101–172.
Koehler, J. J. (1993). DNA matches and statistics: Important questions, surprising answers. Judicature, 76, 222–229.
Koehler, J. J. (1996). The base rate fallacy reconsidered: Normative, descriptive and methodological challenges. Behavioral & Brain Sciences, 19, 1–53.
Koehler, J. J. (1997). Why DNA likelihood ratios should account for error (even when a National Research Council report says they should not). Jurimetrics Journal, 37, 425–437.
Koehler, J. J. (2008). Fingerprint error rates and proficiency tests: What they are and why they matter. Hastings Law Journal, 59, 1077–1098.
Koehler, J. J. (2010). Forensic science reform in the 21st century: A major conference, a blockbuster report, and reasons to be pessimistic. Law, Probability & Risk, 9, 1–6.
Koehler, J. J. (2011). If the shoe fits, they might acquit: The value of shoeprint testimony. Journal of Empirical Legal Studies, 8, 21–48.
Koehler, J. J., Chia, A. and Lindsey, J. S. (1995). The random match probability (RMP) in DNA evidence: Irrelevant and prejudicial? Jurimetrics Journal, 35, 201–219.
Kuo, M. (1988). California Association of Crime Laboratory Directors: DNA Committee Report #6. Orange County Sheriff Coroners Crime Laboratory.
Kuo, M. (1990). California Association of Crime Laboratory Directors: DNA Committee – Results of Blind Trial #2. Orange County Sheriff Coroners Crime Laboratory.
Lempert, R. (1991). Some caveats concerning DNA as criminal identification evidence: With thanks to the Reverend Bayes. Cardozo Law Review, 13, 303–341.
Lempert, R. (1997). After the DNA wars: Skirmishing with NRC II. Jurimetrics Journal, 37, 439–468.
Meehl, P. E. and Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Mnookin, J. (2009). Error rates for latent fingerprinting as a function of visual complexity and cognitive difficulty. National Institute of Justice Award 2009-DN-BX-K225.
National Academy of Sciences, National Research Council, Committee on DNA Technology in Forensic Science (1992). DNA Technology in Forensic Science. Washington, D.C.: National Academies Press.
National Academy of Sciences, National Research Council, Committee on DNA Forensic Science: An Update (1996). The Evaluation of Forensic DNA Evidence. Washington, D.C.: National Academies Press.
National Academy of Sciences, National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009). Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: National Academies Press.
Peterson, J. L. and Gaensslen, R. E. (2001). Developing Criteria for Model External DNA Proficiency Testing – Final Report. National Institute of Justice, Grant No. 96-DN-VX-0001.
Peterson, J. L., Lin, G., Ho, M., Chen, Y. and Gaensslen, R. E. (2003a). The feasibility of external blind DNA proficiency testing. I. Background and findings. Journal of Forensic Sciences, 48, 21–31.
Peterson, J. L., Lin, G., Ho, M., Chen, Y. and Gaensslen, R. E. (2003b). The feasibility of external blind DNA proficiency testing. II. Experience with actual blind tests. Journal of Forensic Sciences, 48, 32–40.
Rand, S., Schurenkamp, W. and Brinkmann, B. (2002). The GEDNAP (German DNA Profiling Group) blind trial concept. International Journal of Legal Medicine, 116, 199–206.
Rand, S., Schurenkamp, W., Hohoff, C. and Brinkmann, B. (2004). The GEDNAP (German DNA Profiling Group) blind trial concept, Part II: Trends and developments. International Journal of Legal Medicine, 118, 83–89.
Risinger, D. M., Saks, M. J., Thompson, W. C. and Rosenthal, R. (2002). The Daubert/Kumho implications of observer effects in forensic science: Hidden problems of expectation and suggestion. California Law Review, 90, 1–56.
Scientific Working Group for Shoeprint and Tire Tread Evidence (2009). Standard for expressing conclusions of forensic footwear and tire impression examinations.
Thompson, W. C. (1993). Evaluating the admissibility of new genetic identification tests: Lessons from the “DNA War.” Journal of Criminal Law & Criminology, 84, 22–104.
Thompson, W. C. (1997). Accepting lower standards: The National Research Council’s second report on forensic DNA evidence. Jurimetrics, 37, 405–424.
Thompson, W. C., Taroni, F. and Aitken, C. G. G. (2003). How the probability of a false positive affects the value of DNA evidence. Journal of Forensic Sciences, 48, 47–54.
Walsh, P. S., Fildes, N., Louie, A. S. and Higuchi, R. (1991). Report of the blind trial of the Cetus AmpliType HLA DQ-alpha forensic deoxyribonucleic acid (DNA) amplification and typing kit. Journal of Forensic Sciences, 36, 1551–1556.