Achieving Clarity in Real-World Data Purchases
"Wait, What?"
Real-world data (RWD) purchases, often costing hundreds of thousands or even millions of dollars, are critical investment decisions in the healthcare industry. However, the decision-making process is often opaque and far from perfect.
Have you ever looked at terabytes of RWD and wondered, “What is all this telling me?”, “How am I supposed to use this data?”, or “How does this inform our strategy?” If so, you’re not alone. Many healthcare professionals have worked through countless market research studies, only to feel overwhelmed by the richness and complexity of RWD databases.
Too often, teams select a data partner as if checking a box: identify a need for data, check the preferred vendor list, and flip the switch. The unfortunate result is often the realization that the data doesn’t meet their business needs, or the discovery of better options only after the purchase.
If you’ve encountered any of these issues, we’d love to share some tips to help you make smart and informed data purchasing decisions. Getting the right dataset early on sets the stage for greater success. As the old saying goes, “Well begun is half done.”
Key Steps to Effective RWD Purchases
Understand the Types of RWD
Real-world data (RWD), the raw material from which real-world evidence (RWE) is generated, comes in various forms, each offering unique insights. Here are some of the most common types:
Electronic Health Records (EHRs) are the systematized collection of patient and population health information stored electronically in a digital format [1]. EHRs are designed to eliminate the need to track down a patient’s previous paper medical records. Although EHRs include structured data, their distinctive value lies in unstructured data: raw clinical notes and imaging records attract considerable interest from researchers seeking to learn patterns and predict selected outcomes.
However, EHR data comes with challenges. The data can be heterogeneous and incomplete. For instance, one study [2] reported that about half of the EHR patients it examined had no corresponding diagnosis in their pathology reports, and substantial incompleteness was observed in many other study variables. This highlights the need for careful consideration when using EHR data for research.
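When evaluating an EHR dataset before purchase, a practical first step is to profile how often key fields are actually populated in a sample extract. The Python sketch below assumes a hypothetical extract in which each patient is a dictionary and missing values are `None` or empty strings. The field names (`diagnosis`, `pathology_dx`, `smoking_status`), the helper `completeness`, and the records are all invented for illustration and do not reflect any vendor's schema.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical EHR sample extract: one dictionary per patient, None = missing value.
# Field names and values are illustrative only, not a real vendor schema.
records: List[Dict[str, Optional[str]]] = [
    {"patient_id": "P1", "diagnosis": "C25.9", "pathology_dx": None, "smoking_status": "never"},
    {"patient_id": "P2", "diagnosis": "C25.0", "pathology_dx": "C25.0", "smoking_status": None},
    {"patient_id": "P3", "diagnosis": None, "pathology_dx": None, "smoking_status": "former"},
    {"patient_id": "P4", "diagnosis": "C25.9", "pathology_dx": "C25.9", "smoking_status": ""},
]


def completeness(rows: List[Dict[str, Optional[str]]], fields: List[str]) -> Dict[str, float]:
    """Return the share of rows holding a non-missing, non-empty value for each field."""
    filled: Counter = Counter()
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is not None and value != "":
                filled[field] += 1
    n = len(rows)
    # Guard against an empty extract to avoid dividing by zero.
    return {field: (filled[field] / n if n else 0.0) for field in fields}


report = completeness(records, ["diagnosis", "pathology_dx", "smoking_status"])

# Print the sparsest fields first.
for field, share in sorted(report.items(), key=lambda kv: kv[1]):
    print(f"{field:15s} {share:.0%} populated")
```

Sorting by completeness puts the sparsest fields first. That makes it easier to ask a vendor targeted questions about coverage for the variables your use case depends on.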
Claims Data (Mx/Rx), covering medical (Mx) and pharmacy (Rx) claims, refer to data generated when healthcare claims are processed by health insurance plans or practice management systems [3]. Claims data tend to be well populated and less prone to incompleteness for two main reasons. First, they follow highly standardized formats (e.g., ANSI X12 837 claim and 835 remittance files). Second, they are collected and shared primarily for payment and reimbursement.
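To illustrate what "standardized" means here, the sketch below splits a toy fragment of an X12 837 claim into segments and elements. It is a simplification that relies on several assumptions:

* Real files open with ISA/GS envelope segments that declare their own delimiters. This example skips the envelope and assumes the common defaults: `~` ends each segment, `*` separates elements, and `:` separates components.
* The patient control number, charge amount, and diagnosis code are invented.
* The function `parse_segments` is a hypothetical helper, not part of any library.

```python
from typing import List

# Toy, heavily simplified 837 fragment (no ISA/GS envelope).
# Assumed delimiters: "~" = segment terminator, "*" = element separator, ":" = component separator.
SAMPLE_837 = "CLM*PATIENT123*150***11:B:1*Y*A*Y*Y~HI*ABK:J449~"


def parse_segments(raw: str, seg_term: str = "~", elem_sep: str = "*") -> List[List[str]]:
    """Split raw X12 text into segments, each represented as a list of elements."""
    # The trailing terminator yields an empty string, which is filtered out.
    return [seg.split(elem_sep) for seg in raw.strip().split(seg_term) if seg]


segments = parse_segments(SAMPLE_837)

for seg in segments:
    if seg[0] == "CLM":
        # CLM02 carries the total claim charge amount.
        print("total charge:", float(seg[2]))
    elif seg[0] == "HI":
        # HI01 is a composite "qualifier:code"; ABK marks an ICD-10-CM principal diagnosis.
        qualifier, code = seg[1].split(":")
        print("principal dx:", code, "qualifier:", qualifier)
```

Because every value sits at a defined position, the same fields can be extracted reliably across millions of claims. This structural consistency is a large part of why claims data tend to be well populated.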
While all healthcare claims are prepared for billing purposes, they can be captured at different stages on their way to insurers. For instance, when a provider submits a medical claim through a clearinghouse, the claim can be collected there before it reaches the insurer. Claims captured this way are packaged into Open Claims Data. Although this data is highly recent, it is not confined to any specific insurer and may not provide full visibility into a patient’s medical history. In contrast, claims supplied directly by health insurance companies are known as Closed Claims Data. This data covers nearly all interactions a patient has with the healthcare system that are billed to that insurer. The result is a comprehensive view of the patient’s diagnostic and treatment activities [4].
Since 1982, Diagnosis-Related Groups (DRGs) have been used in the U.S. to determine how much Medicare pays hospitals for each patient admission [5]. Payments are made at a fixed rate based on a patient’s DRG rather than on the actual cost the hospital incurred in treating that patient. While this system simplifies reimbursement and standardizes payments, it may also reduce hospitals’ incentive to document every service provided to the patient. As a result, inpatient claims may be submitted with less granularity. This limits visibility into inpatient medical activities and reduces the depth and detail of data available for inpatient analysis.
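The fixed-rate logic can be shown with simple arithmetic. Under Medicare's inpatient prospective payment system, the payment for a stay is, at its core, a base rate multiplied by the relative weight of the assigned DRG. Real payments add further adjustments, such as the wage index and outlier payments, which this Python sketch deliberately omits. The base rate, relative weight, costs, and the helper `drg_payment` are all hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical figures for illustration only. Real base rates and DRG relative
# weights are published by CMS and carry adjustments this sketch omits.
BASE_RATE = 6000.00  # hypothetical base payment rate, in dollars


def drg_payment(relative_weight: float, base_rate: float = BASE_RATE) -> float:
    """Fixed payment for one admission: base rate scaled by the DRG's relative weight."""
    return round(base_rate * relative_weight, 2)


# Two stays assigned to the same hypothetical DRG (relative weight 1.5)
# but with very different actual costs to the hospital.
stays: List[Dict[str, Union[str, float]]] = [
    {"stay": "A", "relative_weight": 1.5, "hospital_cost": 7000.00},
    {"stay": "B", "relative_weight": 1.5, "hospital_cost": 12500.00},
]

for s in stays:
    paid = drg_payment(float(s["relative_weight"]))
    margin = paid - float(s["hospital_cost"])
    print(f"stay {s['stay']}: paid {paid:,.2f}, cost {float(s['hospital_cost']):,.2f}, margin {margin:+,.2f}")
```

Both stays receive the same payment even though their costs differ. Because the payment does not depend on itemized detail, listing every inpatient service on the claim adds little to what the hospital is paid.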
Chargemaster (CDM) Data is a hospital’s comprehensive listing of items billable to a patient or the patient’s health insurance provider [6]. Inpatient activities are well documented because the chargemaster typically serves as the starting point for negotiations with patients and health insurers over the amount to be paid to the hospital.
However, because the chargemaster is used primarily by hospitals, its coverage of outpatient activities is largely confined to hospital emergency rooms and outpatient departments that also bill through the chargemaster. As a result, while inpatient data is thoroughly captured, outpatient activities may be less extensively documented in chargemaster data.
Registry Data is information systematically gathered and maintained about patients who share a common condition, treatment, or exposure. Patient registries include data on diagnosis, demographics, treatment, and outcomes, and are now fundamental to successful health systems worldwide [7].
While registry data is incredibly valuable for tracking specific conditions, treatments, and outcomes, it’s important to understand its limitations. Registries are focused on collecting detailed information relevant to the specific condition or treatment being studied, which means they may exclude unrelated medical history and activities. This targeted approach ensures that the data collected is highly relevant to the registry’s objectives but may not provide a comprehensive view of a patient’s overall medical history.
Laboratory Data are the results of laboratory tests, which analyze samples of a patient’s blood, urine, or body tissue to help diagnose diseases or other conditions. Lab data is valuable for a variety of healthcare analytics use cases. Examples range from market sizing, to monitoring disease progression, to identifying biomarker signals in patients eligible for certain therapeutics [8].
Lab data can provide a deep, point-in-time clinical and biochemical profile of a patient, offering insight into their current health status and underlying conditions. However, it does not contain information on treatments, procedures, or the overall management of the patient’s condition.
Have questions or need further assistance? Our team is here to help! Contact us for personalized guidance on your RWE needs.
Stay tuned for the next part of our series, where we will delve deeper into how to choose the RWD that fits your needs.
References:
1. Gunter TD, Terry NP. The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions. J Med Internet Res. 2005 Mar;7(1):e3. doi:10.2196/jmir.7.1.e3. PMID: 15829475; PMCID: PMC1550638.
2. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. Summit Transl Bioinform. 2010;2010:1.
3. Liu F, Panagiotakos D. Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol. 2022;22:287.
4. Baser O, Samayoa G, Yapar N, Baser E, Mete F. Use of Open Claims vs Closed Claims in Health Outcomes Research. J Health Econ Outcomes Res. 2023 Sep 5;10(2):44-52. doi:10.36469/001c.87538. PMID: 37692913; PMCID: PMC10484335.
5. Centers for Medicare & Medicaid Services. Design and development of the Diagnosis Related Group (DRG).
6. Layton W, Lemmon K, Coustasse A. Charge masters and the effects on hospitals. International Journal of Healthcare Management. 2020;14(4):933-939.
7. Parums DV. Editorial: Registries and Population Databases in Clinical Research and Practice. Med Sci Monit. 2021 Jun 14;27:e933554. doi:10.12659/MSM.933554. PMID: 34149048; PMCID: PMC8212698.
8. Datavant. “Real-World Data: What Is It and Why Does It Matter?”