- For de-identified data to lose its HIPAA safe harbor status, the organization providing it to a recipient must have “actual knowledge” that the information could be re-identified; a mere possibility of re-identification is not enough.
- A class action lawsuit against Google and the University of Chicago claims that the failure to adhere to HIPAA’s de-identification standards is prima facie negligence.
- Agreements should include language to protect against re-identification of individuals, and tools, policies and training should be used to help manage risks.
A recent opinion article published in STAT News explored whether litigation is looming over the de-identified data exception in HIPAA. The authors of the article point out that “large volumes of data underpin the development of any AI effort,” which is why companies like Google, Samsung, AT&T and other holders of large volumes of consumer information (i.e., a “Big Data Holder”) are increasingly eager to partner with health care organizations that can provide them with an additional layer of data that is valuable for innovation and other “big data” projects. However, what happens when a health care organization or its HIPAA business associate (e.g., an EHR vendor, an HIE, or the like) discloses to a Big Data Holder what it believes is patients’ de-identified information, and that information later becomes “re-identified” when combined with other consumer information that the Big Data Holder has already accumulated?
The article’s authors suggest that HIPAA does not require de-identified data to actually be re-identified before it is no longer considered de-identified, and that the “mere ability to use the information, in combination with other data, to identify individuals” would prevent patients’ PHI from qualifying under the HIPAA safe harbor de-identification standard. However, the requirements for de-identification of PHI laid out in 45 CFR 164.514(b) and OCR’s own on-point Q&As do not support such a “mere ability” argument.
The HIPAA Privacy Rule specifically provides that “A covered entity may determine that health information is not individually identifiable health information” if the eighteen (18) identifiers listed in 45 CFR 164.514(b)(2)(i)(A)-(R) are completely removed and the covered entity does not have “actual knowledge” that the information could be used alone or in combination with other information to identify an individual who is a subject of the information. In its Guidance Regarding Methods for De-identification of PHI in Accordance with HIPAA, OCR elaborates on how it views the “actual knowledge” standard in its responses to the following FAQs:
Q: What is “actual knowledge” that the remaining information could be used either alone or in combination with other information to identify an individual who is a subject of the information?
A: In the context of the Safe Harbor method, actual knowledge means clear and direct knowledge that the remaining information could be used, either alone or in combination with other information, to identify an individual who is a subject of the information. This means that a covered entity has actual knowledge if it concludes that the remaining information could be used to identify the individual. The covered entity, in other words, is aware that the information is not actually de-identified information.
Q: If a covered entity knows of specific studies about methods to re-identify health information or use de-identified health information alone or in combination with other information to identify an individual, does this necessarily mean a covered entity has actual knowledge under the Safe Harbor method?
A: No. Much has been written about the capabilities of researchers with certain analytic and quantitative capacities to combine information in particular ways to identify health information. A covered entity may be aware of studies about methods to identify remaining information or using de-identified information alone or in combination with other information to identify an individual. However, a covered entity’s mere knowledge of these studies and methods, by itself, does not mean it has “actual knowledge” that these methods would be used with the data it is disclosing. OCR does not expect a covered entity to presume such capacities of all potential recipients of de-identified data. This would not be consistent with the intent of the Safe Harbor method, which was to provide covered entities with a simple method to determine if the information is adequately de-identified.
Therefore, a discloser of de-identified data would have to have actual knowledge that any remaining information contained in a de-identified data set could be used in combination with information that a recipient already has to re-identify an individual. The existence of a “mere ability” to re-identify, standing alone, is not enough.
That said, the authors’ general warning that health care organizations should take more care in controlling how their de-identified data sets are created and shared should not fall on deaf ears. It is important to understand that although the safe harbor de-identification standard permits certain information to be retained (e.g., the patient’s State, the initial 3 digits of a zip code where the geographic area formed by all zip codes sharing those 3 digits contains more than 20,000 people, and other data that is not expressly listed among the 18 identifiers that must be removed), any information that is retained in a de-identified data set could increase the chance that the data set can then be used in combination with other information to re-identify a patient. Disclosers of de-identified data sets must also be especially careful with the “catch all identifier,” which prohibits a de-identified data set from including “any other unique identifying number, characteristic, or code” (except as would allow only the covered entity to re-identify the individual). This would prohibit the inclusion of identifiers such as a person’s initials, medical record or other numbers that have been “scrambled,” rare medical conditions or medical events (e.g., a patient who gave birth to sextuplets), and any other unique characteristics (e.g., a person over the age of 100).
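To make the zip code and age points concrete, the following is a minimal Python sketch of how a data team might generalize those two fields before release. It is an illustration under stated assumptions, not a compliance tool: the function names, the `zip3_population` lookup, and the population figures are hypothetical, and the sketch covers only two of the Safe Harbor identifiers. It assumes the Safe Harbor conventions of replacing the 3-digit zip code with 000 when the area is too small and of grouping ages over 89 into a single "90 or older" category.

```python
from typing import Mapping

# Safe Harbor thresholds as described in the text (assumed constants).
ZIP3_POPULATION_THRESHOLD = 20_000  # area must contain MORE than this many people
AGE_CAP = 89                        # ages above this are grouped together


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Keep the first 3 zip digits only if the 3-digit area is large enough.

    `zip3_population` is a hypothetical lookup from 3-digit prefix to
    population; unknown or too-small areas are replaced with "000".
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 3:
        return "000"
    zip3 = digits[:3]
    if zip3_population.get(zip3, 0) > ZIP3_POPULATION_THRESHOLD:
        return zip3
    return "000"


def generalize_age(age: int) -> str:
    """Group every age over 89 into one category so very old patients
    (e.g., someone over 100) do not stand out as unique."""
    return "90+" if age > AGE_CAP else str(age)


# Illustrative numbers only, not census data.
example_population = {"606": 2_500_000, "036": 15_000}

print(generalize_zip("60614", example_population))  # large area: prefix kept
print(generalize_zip("03601", example_population))  # small area: suppressed
print(generalize_age(101))
```

Field-level rules like these do not address the catch-all identifier. Rare conditions, unusual events, and scrambled record numbers still require human review of what remains in the data set.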
When drafting data-sharing agreements, including HIPAA business associate agreements, health care organizations should insist that specific protective language and restrictions be included. For example, a potential recipient should be required to notify the original data source if the de-identified data it is being provided can be re-identified based on how the recipient intends to use and manipulate it. Agreements should also expressly prohibit the recipient of such de-identified data sets from re-identifying any individual. Where a HIPAA business associate vendor might be creating de-identified data sets for its own internal “management and administration,” as would be permitted under its HIPAA BAA, the further disclosure or even sale of such de-identified data sets to third parties should be controlled, curtailed or outright prohibited under the terms of the controlling agreement. In addition to contractual language, an organization can take other steps to try to mitigate some of the risks that the creation and sharing of de-identified data sets can pose by implementing effective tools, policies and training, some of which are offered through Legal HIE’s Membership Subscription option.
Although HIPAA does not contain a private right of action, as the STAT News article points out, creative litigators are using HIPAA as a “standard of care” when bringing invasion of privacy-type cases against wrongdoers, as was the case in the class action lawsuit brought against Google and the University of Chicago for “negligent” sharing of de-identified data. Therefore, affected organizations would be well-advised to put additional guardrails around how their de-identified information is created and shared to better manage these emerging risks.