The promise of “Big Data” from health care has been a focus for many in the health IT community, particularly in academia, as seen at the recent Health IT Connect conference. As we move from a paper legacy for healthcare data to a new electronic one, huge amounts of clinical data are being amassed by Electronic Health Record (EHR) systems deployed in doctors’ offices and hospitals.
And beyond that, mobile technology, new “smart” devices, and social and other web services are enabling what has been described as a “tsunami of personal data that will dwarf all the data collected by clinicians.”
Is EHR data “big data”?
It is certainly tempting to think of EHR data as “big data.” But is this really the case?
Generally, “big data” refers to very large stores of data, often needing cloud hosting simply due to their size. Though new kinds of tools and visualizations are generally needed to extract meaningful insights from this data, the data is logically accessible as though it were all in one place. Consider Google search engine usage tracking. Or maps – map data is now ubiquitous and available from almost any modern device (computers, iPads, smartphones, etc.). These are just two of the numerous examples of “big data.”
But what about health data?
To date, we are fairly early in the metamorphosis of health data from its paper legacy to a new electronic platform. Previously, health data was local to a clinical practice setting – every doctor kept their own paper charts (their local paper “database”), recognizing that these were incomplete records of a patient’s overall story. Every hospital kept its own charting system too.
The first generation of EHRs were by and large locally installed enterprise systems, meaning that the paper charts were transformed into electronic ones, but the overall distribution of health data across the landscape was not fundamentally changed. Each doctor (with an EHR), each hospital, each health care setting that kept records had its own separate, local database. Just like before.
The next generation of EHRs might be considered two-pronged. First, health information exchanges (HIEs) have been encouraged as a way to connect these legacy data installations. Though business models for sustaining HIEs are still in the proof-of-market stage, the need to connect these data silos is certainly there. That need is driven by federal policy and investment as well as by the market – many hospital systems, not waiting for regional or statewide HIEs to emerge, have already constructed their own “private HIE” hubs for community EHR-using physicians to connect through.
The second prong of this second generation is web-based EHRs. A web-based approach fundamentally changes the data-organization landscape – instead of local data stores housed in each practice setting, the web database amasses data from all users everywhere. Access and permissions are built so that the web-based EHR experience resembles working with locally housed data (you only interact with your own clinic’s data, even though the data is hosted), yet the hosted data as a whole behaves more like “big data.”
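The contrast between per-clinic access and pooled data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's actual design: the `HostedEHRStore` class, its method names, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    clinic_id: str   # which practice entered the record (hypothetical field)
    patient_id: str  # identifier, never returned by the aggregate query
    diagnosis: str


class HostedEHRStore:
    """Hypothetical hosted store: one shared database, per-clinic views."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def clinic_view(self, clinic_id):
        # A clinic sees only its own records, as with a locally installed EHR.
        return [r for r in self._records if r.clinic_id == clinic_id]

    def diagnosis_counts(self):
        # Aggregate across all clinics, returning counts only (no identifiers).
        return Counter(r.diagnosis for r in self._records)


store = HostedEHRStore()
store.add(Record("clinic_a", "p1", "diabetes"))
store.add(Record("clinic_a", "p2", "asthma"))
store.add(Record("clinic_b", "p3", "diabetes"))

clinic_a_records = store.clinic_view("clinic_a")
counts = store.diagnosis_counts()
print(len(clinic_a_records))   # records visible to clinic_a
print(counts["diabetes"])      # diabetes cases across all clinics
```

Each clinic's view looks like a local database, while the aggregate query is only possible because everything lives in one place. Returning counts alone is not de-identification in the HIPAA sense; real safeguards go well beyond this sketch.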
Can EHR data ever become “big data” like other data sources?
The emergence of HIEs and web-based EHR data certainly moves us closer to being able to access clinical data across multiple clinical settings, geographies and other factors (of course, with HIPAA de-identification safeguards in place, as they are with locally housed data).
Health care is a vast ecosystem, and different elements of it will be able to aggregate and analyze its data at different rates. Hospitals and integrated delivery systems (like Kaiser) will have access to sufficiently large data sets that very useful healthcare insights can be discovered. Much research in this vein has already taken place at Kaiser; however, it is hard to know how much the findings depend on the particulars of the Kaiser delivery system, and so whether they apply outside it.
Data from clinician EHR systems will likely lag behind that gathered by hospitals, because those settings are smaller. Web-based EHRs will likely play a very large role in liberating data from these small, granular silos and allowing very useful healthcare research to take place from small-practice clinical observations.
Consumer-created or device-generated data is another area where vast amounts of data will be collected by various proprietary sources (web sites, Facebook apps, devices like glucometers, etc.). Since most of these companies are web-based, their ability to share and use data is much like that of web-based EHRs – in aggregate, their data can become part of the enormous body of information that can be valuable to clinical research at unprecedented levels.
Big Data in healthcare will likely come from aggregating data across areas within the vast healthcare space. Each area of data – hospitals, doctors’ offices, consumers – will likely move to a “big data” environment at a different rate. The kinds of insights available to clinical research – about diseases, health care delivery and inequalities, and personal health – will all move forward at a pace we have never before experienced.