Assumption 5: Current Methods for Modifying Data to Protect Identity

One method to reduce the chance of re-identifying patient data is called “data perturbation.” The data perturbation process makes subtle changes to the original data, such as changing a cholesterol result from 160 to 161, to reduce the chance of re-identification while not substantially altering the value of those data for secondary purposes, including clinical research. While these methods are clever and very useful in helping to protect the identity of individuals today, there is no apparent way to extend the perturbation model to genomic signatures. There are only four letters in the genomic alphabet (A, T, C, and G), and changing a letter can change its clinical significance, so at our current level of genetic understanding, perturbation is untenable with genetic information. A deliberate change to a single base pair in the genetic sequence may have profound implications for both clinical care and research. While we know that some base pair variations, which manifest as Single Nucleotide Polymorphisms (SNPs), are inconsequential, others are the proximate cause of serious diseases (e.g., thalassemias). Achieving a complete understanding of the interaction of all genetic components represented in the 3.2 billion base pairs is, at best, a daunting task and may be impossible for the foreseeable future. We cannot assume that absence of evidence of an effect of a base pair perturbation constitutes evidence of absence. Further, even if we could perturb a known set of “inert” base pairs, the residual, “active” set of base pairs in a genomic signature is more than sufficient to be self-identifying. Hence, data perturbation strategies fail in their primary intent with genomic signatures. Computational methods to improve privacy, such as homomorphic encryption, may eventually replace perturbation methods and allow data to be compared without being decrypted.20, 21
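The contrast between perturbing a numeric lab result and perturbing a base sequence can be illustrated with a minimal Python sketch. The function names (`perturb_numeric`, `perturb_base`), the `max_shift` parameter, and the three-letter sequence are hypothetical and chosen only for illustration; they do not represent any particular de-identification tool or any real gene.

```python
import random

BASES = "ATCG"  # the four-letter genomic alphabet


def perturb_numeric(value, max_shift=1, rng=None):
    """Shift a numeric result by a random integer in [-max_shift, max_shift]."""
    rng = rng or random.Random()
    return value + rng.randint(-max_shift, max_shift)


def perturb_base(sequence, position, rng=None):
    """Replace the base at `position` with a different, randomly chosen base."""
    rng = rng or random.Random()
    alternatives = [b for b in BASES if b != sequence[position]]
    return sequence[:position] + rng.choice(alternatives) + sequence[position + 1:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Numeric case: the perturbed value stays within 1 unit of the original,
# a relative change of at most 1/160 for a cholesterol result of 160.
cholesterol = 160
noisy = perturb_numeric(cholesterol, max_shift=1, rng=rng)
relative_change = abs(noisy - cholesterol) / cholesterol

# Genomic case: there is no "small" change available. Any perturbation
# swaps one letter for one of the three others, which is a substitution.
sequence = "GAG"  # hypothetical three-base sequence, not a real gene
mutated = perturb_base(sequence, 1, rng=rng)

print(cholesterol, "->", noisy, f"(relative change {relative_change:.4f})")
print(sequence, "->", mutated)
```

In the numeric case the size of the change can be tuned to stay below clinical relevance. In the genomic case the only available "perturbation" is a substitution, and the code has no way of knowing whether that substitution is inert or clinically significant.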
