Meta-study finds only one patient safety indicator out of 21 meets the scientific criteria for being considered a true indicator of hospital safety.
Adverse events recorded in billing data that are used to gauge and rank the safety of hospitals are woefully inaccurate, according to Johns Hopkins researchers.
In a meta-study published in the journal Medical Care, only one patient safety indicator (PSI) out of 21 met the scientific criteria for being considered a true indicator of hospital safety, says the study's lead author, Bradford Winters, MD, associate professor of anesthesiology and critical care medicine at Johns Hopkins.
The potentially inaccurate measures evaluated in the meta-study are also used by several high-profile public rating systems, including U.S. News & World Report's Best Hospitals, Leapfrog's Hospital Safety Score, and the Centers for Medicare & Medicaid Services Star Ratings.
"These measures have the ability to misinform patients, misclassify hospitals, misapply financial data, and cause unwarranted reputational harm to hospitals," Winters said in remarks accompanying the study. "If the measures don't hold up to the latest science, then we need to re-evaluate whether we should be using them to compare hospitals."
Of the 21 PSI measures developed by the Agency for Healthcare Research and Quality and CMS, 16 had insufficient data and could not be evaluated for their validity. Five measures contained enough information to be considered for the analysis.
Only one measure—PSI 15, which captures accidental punctures or lacerations sustained during surgery—met the researchers' criteria to be considered valid.
Winters recently spoke with HealthLeaders Media about the meta-study's findings. The following transcript has been lightly edited.
HLM: Why is billing data used if it is so inaccurate?
Winters: A lot of it has to do with the fact that it is easy to obtain.
Coders already do the medical billing for hospitals, so these administrative databases exist because they are generated out of the billing process.
All the ICD-9 and ICD-10 codes wind up in these administrative databases and they're easy to query using software packages. Doing direct chart reviews as a secondary process is laborious. It takes time for coders who are often already working hard to get the billing work done.
Folks might say the coders are already reading through the charts to do the billing coding, so why not ask them to handle the adverse event reporting as well?
You could, but you'd have to have a guideline and a process for them and it would still add extra time.
Imagine if you were completing one database for billing and another database for adverse events. That would be very valuable, but it would take more time.
Consequently, medical chart reviews are only done as a small sample.
HLM: Why is billing data so inaccurate?
Winters: It starts with unclear or incomplete documentation by the clinicians. That is part of it.
The coders may misinterpret things. The doctors or nurses may believe they have clearly documented what happened, but the coder may misinterpret it.
There is also the potential for the coder to accidentally enter the wrong code. A lot of codes have overlapping numbers and can sound fairly similar, so there is the possibility of a transcription error where the wrong code is picked.
You may apply the wrong code, or you may not apply all the codes that should be applied. A patient's stay in the hospital, particularly for a patient who is very sick and stays a long time, can involve a lot of codes.
Sometimes they get skipped over by accident and don't get in there. So an adverse event that was found in the chart didn't get coded at all in the administrative database or it got miscoded because it was misinterpreted.
HLM: Could you not say the same things about the accuracy of clinical data?
Winters: You could have the same problem. When you are pulling 100 or 200 charts to do this medical record review, the folks doing the review are looking at them in a lot of detail.
They're picking up the things that may have been missed or misinterpreted. But it is true that a medical chart review could miss things. If it is not complete, or the language in the chart is ambiguous, you could still make the same mistakes.
That being said, the medical chart is considered the gold standard. It's not perfect, but it is the standard by which we make the comparison.
HLM: Will billing data accuracy improve with the use of ICD-10?
Winters: I don't think we know. A lot of the increased granularity of ICD-10 has to do with improving the ability to epidemiologically track diseases.
We know that one study we included in our meta-analysis looked at the effect of documenting "present on admission," and it did seem to improve the validity of some of these measures.
Whether it improves them enough to reach the threshold that we argue should exist before these measures are used as tools to determine hospital reimbursements remains to be determined.
ICD-9 didn't seem to work very well. We can't assume that ICD-10 is going to be a cure for this. We have to prove it, because there is lots of money on the table for hospitals.
If they are denied reimbursements on pay-for-performance schemes, they have to have a valid measure.
HLM: Will the move toward value-based reimbursements improve the accuracy of billing data to track safety?
Winters: It is unclear, but we need to invest in research on healthcare delivery in these areas to make these determinations.
We can't assume these things are going to be true or untrue depending upon your optimistic or pessimistic point of view. We have to put them to the test.
HLM: Why was PSI 15 the only safety metric to meet your criteria for accuracy?
Winters: That is a good question.
The type of data that was available in the papers we used for this meta-analysis doesn't allow us to answer this. My speculation is that maybe there is less ambiguity about what constitutes this particular PSI.
When it is documented in the chart as something that happened to the patient in the hospital, the documentation that tends to go with it by clinicians is fairly unambiguous. But that is supposition. We have no idea from the records we used.
HLM: Based on your findings, are these various hospital safety ratings useless?
Winters: They are potentially flawed. Where these PSI and HAC (hospital-acquired condition) measures are a component of these rating scales, they are potentially introducing error into them. They are potentially inaccurate, but we didn't directly measure those rating scales' accuracy.
I would be overstepping our results to say they're useless.
That being said, the validity of the PSI and HAC measures is not particularly good. If they are going to be a component of a larger measure used for pay for performance, that is concerning and needs to be addressed.
HLM: How would you fix this?
Winters: As we suggested, first of all, we think there needs to be transparency about how these measures, such as CMS's Star Ratings, are developed.
Some measures may be good. Some may not be. It has to be openly and transparently debated which measures are sufficiently valid to be used for this kind of scheme.
Providers need to be at the table, insurers need to be at the table, as well as the government. It needs to be open and transparent. Simply saying the measure is good enough is not adequate.
We proposed the 80% framework based on threshold theory, which is a process by which one determines whether a medical value, such as a laboratory value, is valid enough to be used in decision-making.
We propose a framework where that gets applied to the pay-for-performance process. That's arguable, but whatever these cutoffs should be needs to be transparently discussed.
Maybe it should be 75% or 70%, but there needs to be a transparent discussion of how this is going to be put together for pay for performance and reimbursement.
HLM: Is doing nothing an option?
Winters: If it turns out that ICD-10, or eventually ICD-11, which has specific components that are supposed to improve the ability to identify adverse events, improves the predictive value of these metrics and they all climb into the 80% range, it will be a non-issue.
But if there are no changes, people like me will continue to say that they aren't accurate enough to use for reimbursement.
John Commins is the news editor for HealthLeaders.