Algorithmic Approaches to Match Degraded Land Impressions

Eric Hare, Heike Hofmann, Alicia Carriquiry
Law, Probability and Risk, mgx018


Bullet matching is a process used to determine whether two bullets may have been fired from the same gun barrel. Historically, this has been a manual process performed by trained forensic examiners. Recent work, however, has shown that it is possible to add statistical validity and objectivity to the procedure. In this paper, we build upon the algorithms explored in "Automatic Matching of Bullet Lands" by formalizing and defining a set of features, computed on pairs of bullet lands, that can be used in machine learning models to assess the probability of a match. We then use these features to analyze the two Hamby bullet sets (Set 252 and Set 44) and assess the presence of microscope operator effects in scanning. We also take some first steps toward addressing the issue of degraded bullet lands, and provide a range of degradation over which the matching algorithm still performs well. Finally, we discuss generalizing land-to-land comparisons to full bullet comparisons, as would be required for this procedure in a criminal justice setting.

Automatic Matching of Bullet Land Impressions

Eric Hare, Heike Hofmann, Alicia Carriquiry
Annals of Applied Statistics. doi: 10.1214/17-AOAS1080


In 2009, the National Academy of Sciences published a report questioning the scientific validity of many forensic methods including firearm examination. Firearm examination is a forensic tool used to help the court determine whether two bullets were fired from the same gun barrel. During the firing process, rifling, manufacturing defects, and impurities in the barrel create striation marks on the bullet. Identifying these striation markings in an attempt to match two bullets is one of the primary goals of firearm examination. We propose an automated framework for the analysis of the 3D surface measurements of bullet land impressions which transcribes the individual characteristics into a set of features that quantify their similarities. This makes identification of matches easier and allows for a quantification of both matches and matchability of barrels. The automatic matching routine we propose manages to (a) correctly identify land impressions (the surface between two bullet groove impressions) with too much damage to be suitable for comparison, and (b) correctly identify all 10,384 land-to-land matches of the James Hamby study.

Designing Modular Software: A Case Study in Introductory Statistics

Eric Hare, Andrea Kaplan
Journal of Computational and Graphical Statistics (2017): 1-8.


intRo is a modern web-based application for performing basic data analysis and statistical routines. Leveraging the power of R and Shiny, intRo implements common statistical functions in a powerful and extensible modular structure, while remaining simple enough for the novice statistician. This simplicity lends itself to a natural presentation in an introductory statistics course as a substitute for other commonly used statistical software packages, such as Excel and JMP. intRo is currently deployed as a publicly accessible web application. Within this paper, we introduce the application and explore the design decisions underlying intRo, as well as highlight some challenges and advantages of reactive programming.

Manipulation of Discrete Random Variables with discreteRV

Eric Hare, Andreas Buja, and Heike Hofmann
The R Journal, Volume 7, No. 1, June 2015


A prominent issue in statistics education is the sometimes large disparity between the theoretical and the computational coursework. discreteRV is an R package for manipulation of discrete random variables which uses clean and familiar syntax similar to the mathematical notation in introductory probability courses. The package offers functions that are simple enough for users with little experience in statistical programming, but also more advanced features suitable for a large number of more complex applications. In this paper, we introduce and motivate discreteRV, describe its functionality, and provide reproducible examples illustrating its use.

Biomathematical Description of Synthetic Peptide Libraries

Timo Sieber, Eric Hare, Heike Hofmann, and Martin Trepel
PLOS ONE, Volume 10, No. 6, 2015


Libraries of randomised peptides displayed on phages or viral particles are essential tools in a wide spectrum of applications. However, there is only limited understanding of a library's fundamental dynamics and the influences of encoding schemes and sizes on their quality. Numeric properties of libraries, such as the expected number of different peptides and the library's coverage, have long been in use as measures of a library's quality. Here, we present a graphical framework of these measures, together with a library's relative efficiency, to help describe libraries in enough detail for researchers to plan new experiments in a more informed manner. In particular, these values allow us to answer, in a probabilistic fashion, the question of whether a specific library does indeed contain one of the "best" possible peptides. The framework is implemented in two packages, discreteRV and peptider, for the statistical software environment R. We further provide a user-friendly web interface, PeLiCa (Peptide Library Calculator), allowing scientists to plan and analyse their peptide libraries.

Can you buy a president? Politics after the Tillman Act

Andrea Kaplan, Eric Hare, and Heike Hofmann
Chance, Volume 27, No. 1, February 2014


Motivated by the 2010 Citizens United ruling and the subsequent birth of "Super PACs", we retrieve independent expenditure data from the Federal Election Commission and combine it with presidential polling data to analyze the 2012 presidential campaign. Using R and several of its packages, we scrape data from these sources and analyze them in order to highlight interesting trends in campaign spending. Furthermore, we correlate these trends in spending over time with changes in the polls. Ultimately, there is not a lot of evidence to support a clear and direct relationship between increases in spending and changes in public support. However, our analysis does reinforce some commonly held views of Super PAC spending habits and of the candidates' geographical areas of strength and weakness.

Putting Down Roots: A Graphical Exploration of Community Attachment

Andrea Kaplan, Eric Hare
Computational Statistics (accepted)


In this paper, we explore the relationships that individuals have with their communities. This work was prepared as part of the ASA Data Expo '13, sponsored by the Graphics Section and the Computing Section, using data provided by the Knight Foundation Soul of the Community survey. The Knight Foundation, in cooperation with Gallup, surveyed 43,000 people over three years in 26 communities across the United States with the intention of understanding the association between community attributes and the degree of attachment people feel towards their community. The attributes we explore include the different facets of both urban and rural communities, the impact of quality education, and the trend in the perceived economic conditions of a community over time. We begin by focusing on the choices made in producing the visualizations and the technical aspects of how they were created. We explain the development and use of web-based interactive graphics, including an overview of the R package shiny and the JavaScript library D3. We then describe the stories about community attachment that unfolded from our analysis.