Real-World Data & Evidence

Electronic medical records as a source of real-world clinical data

https://doi.org/10.37489/2782-3784-myrwd-13

EDN: HMDMZY

Abstract

Information technologies are currently being actively introduced in the healthcare system of the Russian Federation. The share of state and municipal medical organizations that have implemented medical information systems increased from 3.9% in 2007 to 91% in 2021. One of the key tasks of informatization is the introduction of electronic medical records (EMRs), which accumulate large amounts of real-world data (RWD). Despite the importance of EMRs as a source of RWD, they have a number of shortcomings, such as the decentralized nature of database management systems and unstructured information storage. The article describes the sequential processes for collecting high-quality RWD from EMRs, including the use of artificial intelligence technologies, for the purposes of scientific research, the creation of decision support systems, statistical analysis, etc. The proposed methodology is based on the centralized collection of information from EMRs in so-called data lakes, where as much raw patient data as possible is accumulated, followed by the extraction of data from unstructured records using natural language processing (NLP) models. Subject to continuous improvement, the proposed technology will provide a correct and comprehensive solution for understanding any text from any medical record.

For citations:


Gusev A.V., Zingerman B.V., Tyufilin D.S., Zinchenko V.V. Electronic medical records as a source of real-world clinical data. Real-World Data & Evidence. 2022;2(2):8-20. (In Russ.) https://doi.org/10.37489/2782-3784-myrwd-13. EDN: HMDMZY

Introduction

Digital health is one of the largest and fastest-growing technology markets [1]. The widespread introduction of electronic medical records (EMRs), laboratory and radiological information systems, personal medical devices and telemedicine technologies ensures the constant creation and accumulation of big data in healthcare, the volume of which is doubling every year [2]. In 2013, 153 exabytes of data were produced, and in 2020 about 2,314 exabytes, which corresponds to an annual growth rate of roughly 48%. The global healthcare big data market is expected to reach $9.5 billion by 2023 [3].
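The annual growth rate quoted above can be checked with a short calculation. The sketch below assumes that the rate is a compound annual growth rate between the two quoted volumes; the variable names are ours and are used only for illustration.

```python
# Compound annual growth rate implied by the two data volumes quoted in the text.
start_year, start_volume_eb = 2013, 153    # exabytes, figure from the text
end_year, end_volume_eb = 2020, 2314       # exabytes, figure from the text

years = end_year - start_year
annual_growth = (end_volume_eb / start_volume_eb) ** (1 / years) - 1

print(f"Implied annual growth over {years} years: {annual_growth:.1%}")
```

Under this assumption the result is a little over 47% per year, which is close to the figure cited in the literature.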

The development of artificial intelligence (AI) and big data analysis technologies makes it possible to create new software products and services that are the basis for the digital transformation of both diagnostic and treatment processes in health organizations and the healthcare system as a whole. Moreover, the application of big data and AI is changing the existing directions in the work of the pharmaceutical industry, including research in Real World Data (RWD) [4].

EMR systems are one of the main sources of real world data [5]. Proper use of this source, including centralized collection of information from often decentralized healthcare information systems (HIS), data cleaning and preparation, AI-enabled information extraction and other technologies, makes it possible to assess the prevalence of diseases and risk factors [6].

In this article, we will consider the main processes for extracting RWD from EMR and ensuring the quality of these processes.

General information about healthcare informatization in the Russian Federation

Research and development in the application of various information technologies in the healthcare of Russia began in the mid-60s of the last century. The creation of the first software tools was mainly conducted in the leading scientific schools, research institutes and medical universities of the former USSR. Initially, the created software was intended for the automated generation of statistical reports and accounting. Further, informatization gradually began to be introduced into the treatment and diagnostic process, starting with accounting for incoming patients, maintaining health records and information support [7].

The commercial HIS market in the Russian Federation emerged in the late 1990s and early 2000s. Separate diagnostic software and the first HISs appeared, which made it possible to keep EMR. By the mid-2000s, it was generally understood in practical healthcare that information technologies could indeed become an effective tool for the development of healthcare. However, the level of their application in healthcare organizations (HO) was low [8].

Due to the lack of funding and regulation from the state, informatization projects were, as a rule, launched at the initiative of leaders who were interested in new technologies. Most often, the first computers appeared in the departments of statistics and accounting to automate management activities, and hardware and software were purchased at the healthcare organization's own expense. The first domestically developed HISs were, in most cases, built by small (20–30 people) private companies working to order for a limited number of HOs. To a large extent, this work produced poorly replicable custom systems focused on the specifics of the customers' work [9]. Nevertheless, more and more developer companies appeared on the market, and their number, according to the "Medical Information Technologies" online catalog of the Association for the Development of Medical Information Technologies [10], peaked in 2012 (Fig. 1).

Fig 1. Dynamics of the number of companies developing HIS of a healthcare organization (MIS MO) in the Russian Federation in 2007-2021 according to the Association for the Development of Medical Information Technologies

According to the developers and users of HISs in 2003-2008, the most important incentive for the development of the industry should have been state regulation [11]. Responding to this industry challenge, in 2008 the Ministry of Health and Social Development of the Russian Federation began preparations for the launch of a federal project on large-scale informatization - the creation of a Unified State Health Information System (EGISZ), the actual start of which took place in 2011.

In 2011-2021, several stages and state programs were implemented in the healthcare of the Russian Federation, starting with Basic Informatization as part of the EGISZ in 2011-2012 and ending with the launch in 2019 of the “Creating a unified digital circuit in the healthcare sector on the basis of the EGISZ” federal project. The constant development of the necessary infrastructure and the purchase and implementation of various software ensured an increase in the number of health organizations that have implemented HIS, including the maintenance of EMR. The share of such HOs was 3.9% in 2007, 10.6% in 2009 and 15% in 2011 [12]; in 2012 it increased to 36.4%, and by 2021 it had reached 91% (Fig. 2).

Fig 2. Dynamics of the share of state and municipal healthcare organizations in the Russian Federation that have implemented MIS MO

In 2017-2018, with the active participation of the Ministry of Health of the Russian Federation, the current system of legal and technical regulation as a whole was defined and approved. It is based on Article 91 "Information support in the field of healthcare" of Federal Law No. 323-FZ, introduced by Federal Law No. 242-FZ of July 29, 2017 "On Amendments to Certain Legislative Acts of the Russian Federation on the Application of Information Technologies in the Field of Health" [13].

Currently, all information support in the field of healthcare is divided into 2 large blocks: "Information systems in the field of healthcare" and "Other information systems". The first block includes software products created by order of state organizations. In accordance with the current legislation, they are classified according to 3 main levels (Fig. 3):

  • federal state information systems (GIS) in the field of healthcare, including the Unified State Health Information System (EGISZ) and the State Information System of Compulsory Medical Insurance (GIS OMS);
  • state information systems in the field of healthcare (GISZ) of the constituent entities of the Russian Federation;
  • institutional information systems represented by healthcare information systems of a health organization (MIS MO) and information systems of pharmaceutical organizations (IS FO).

Fig 3. Functional diagram of information systems in the field of healthcare in the Russian Federation

All other software products intended for use in the healthcare sector, but developed and marketed by private companies, are referred to as the so-called "Other Information Systems".

Requirements for the structure, functions, procedure and timing of the exchange of information between information systems in the field of healthcare, including the EGISZ, the GIS of the constituent entities of the Russian Federation and the MIS MO, are determined by government decree No. 140 dated 09.02. which replaced Government Decree No. 555 issued in 2018. Requirements for other information systems, including the requirements for information protection and the procedure for connecting "Other IS" to the EGISZ and other information systems in the field of healthcare, are determined by government decree No. 447 dated April 12, 2018 "On the procedure for interaction between state and non-state information systems in the field of healthcare".

Today, the regulation of healthcare informatization includes more than 30 government decrees and orders and more than 40 orders of the Ministry of Health, and its improvement is ongoing.

Electronic medical records: definition, prevalence, applicable law

The problem of defining and approving terminology in the field of EMR has existed in Russia since at least the early 2000s, when the first attempts were made to propose unified definitions, including by translating and adapting published international standards in the field of digital health. In 2006, the Russian National Standard GOST R 52636-2006 introduced the term "Electronic case history" (Elektronnaya istoriya bolezni) [14], which meant any electronic medical record. The term has now fallen out of common usage, as "case history" has often been associated with the hospital stage. In 2008, GOST R ISO/TS 18308-2008 “Health Informatization. Requirements for the architecture of electronic medical records” was issued. It proposed the term “elektronnaya medicinskaya karta” (here: EMR) as a translation of “electronic health record” (EHR); this translation is inaccurate, although the international term EHR is the closest in meaning [15]. In 2009, a paper [16] presented an overview of the variants of the terms and formulated proposals for their definition.

In 2013, leading industry experts developed a set of terms and definitions for an electronic medical record, presented in the paper [17]. These developments formed the basis of the draft national standards, which included GOSTs “Electronic medical record. Basic principles, terms and definitions”, “Electronic medical record used in a healthcare organization” and “Integrated electronic medical record”. These documents were approved by the Expert Council of the Russian Ministry of Health on the use of ICT in healthcare on 10.10.2015. However, a dispute then arose among experts about how electronic document flow (EDF) and, in particular, EMR should be regulated: by standards (voluntary application) or by orders of the Ministry of Health (mandatory application). As a result, the draft GOSTs were never approved.

Thus, today there is no regulatory approval of the term “Elektronnaya medicinskaya karta”. GOST R 52636–2006 is still the only valid document describing the processes of organizing electronic document management related to it.

In this regard, we use the paper [17], which includes the following concepts:

  • Personal'naya medicinskaya zapis' (PMZ, Personal Medical Record) – any recording relating to the health of a particular person and made by a specific person. PMZ is the primary structural unit of information about the health of the subject, characterized by a specific author responsible for the content of this record, a specific context and the moment the record was made.

Note 1. This definition is somewhat extended compared to GOST R 52636-2006 by health records that can be made by the patient himself or his proxies (for example, parents).

Note 2: Health-related information may be transmitted electronically directly from a medical device, but such a record must be confirmed by the person responsible for arranging the measurement made using this device.

  • Elektronnaya personal'naya medicinskaya zapis' (EPMZ, Electronic Personal Medical Record) – any personal medical record placed on an electronic medium. EPMZ is tied to a specific electronic storage and is characterized by a certain life cycle in this storage.
  • Elektronnaya medicinskaya karta (Electronic medical record, EMR) – a set of electronic personal medical records (EPMZ) relating to one person, collected, stored and used in a medical organization.

Note 1. Here the term EMR is an analogue of the international term Electronic Medical Record (EMR).

Note 2. The term EMR implies the integration of all information (all EPMZ) about a patient available in a given medical organization in electronic form. In this case, EPMZ within EMR can be additionally combined into groups related, for example, to a specific completed case of the disease (in outpatient practice) or to a specific hospitalization (in inpatient treatment). Some EPMZs may not fit into any of the groups and may not refer to any specific hospitalization or completed case.

  • Integrirovannaya elektronnaya medicinskaya karta (Integrated electronic medical record, EHR) – a set of electronic personal health records (EPMZ) relating to one person, collected, transferred and used by several medical organizations. The EPMZ included in the EHR can be stored both centrally and distributed (in various HOs). With distributed storage, access to individual EPMZ included in the EHR is carried out through a centralized index containing information about the storage location and access method to each EPMZ. EHR can be created by a HO group or by a health authority.

Note: Here the term EHR is an analogue of the international term Electronic Health Record (EHR).

EHR is a tool for integrating health data collected from various sources that can be used at various levels. Today, this term is most often used for regional (GIS of a constituent entity of the Russian Federation) and federal (EGISZ) systems, but it can also be used for networks of clinics or departmental networks using various HIS [18].

  • Personal'naya elektronnaya medicinskaya karta (Personal Electronic Health Record, PHR) – a set of electronic personal health records (EPMZ) received from various sources and related to one person, who collects and manages them, and also determines access rights to them. PHR refers to documents of personal storage and can be stored by its subject on their own electronic media (personal computer, flash memory devices, etc.) or in specialized storages accessible via the information and telecommunication network Internet.

Note 1. Here the term PHR is an analogue of the international term Personal Health Record (PHR).

PHR provides the patient and his authorized representatives with the opportunity to enter information about their own health status, physiological parameters and other health-related information. Maintaining a PHR increases the patient's involvement in caring for their own health and in the treatment process, improves adherence to treatment, and is an effective means of maintaining a healthy lifestyle. To varying degrees, PHR today includes various classes of "personal accounts of patients" created at various levels, from specific medical organizations to the federal service "My Health" on the single portal of public services.

The above terminology has become generally accepted and is widely used in various documents, including regulatory ones [19][20]. In accordance with [15], the primary and main goals of maintaining EMR are:

  • collection and storage of the maximum available amount of information about the health of a particular patient in electronic form;
  • prompt provision of access to this information to authorized medical workers, the patient himself and his authorized representatives in the most convenient and accessible form;
  • construction of the specialized electronic services on the basis of this information focused both on medical personnel and on the patient himself, and providing an increase in the safety and quality of medical care, as well as improving the quality of life and health of patients.

Thus, the concept of EMR is closely related to a set of tasks covering the documentation of the diagnosis and treatment of a particular patient using IT. It also includes the processes of medical examination, maintaining a healthy lifestyle and any other information related to the health of a particular individual. The information collected in the EMR serves primarily to ensure the continuity and quality of care.

According to [19], maintenance of EMR in HIS of a healthcare organization (MIS MO) includes:

  • collection, systematization and processing of information about persons who are provided with medical care, as well as about persons in respect of whom medical examinations and evaluations are carried out;
  • appointment of diagnostic and laboratory tests, referrals to diagnostic and laboratory tests;
  • receiving and issuing the results of diagnostic and laboratory tests, medical reports and (or) links to images from archives of medical images;
  • registration of temporary incapacity;
  • conduct of individual programs of habilitation and rehabilitation;
  • formation of prescriptions for medicines and medical devices;
  • issuance of medical documents reflecting patient’s health (and their copies), certificates and extracts.

At the same time, the maintenance of EMR, according to [15], involves a number of secondary goals that can be provided in accordance with the requirements and capabilities of specific HOs, health authorities or providers of various digital health services. These include:

  • accounting of activities and automated construction of analytical and financial reporting of healthcare organizations on the basis of primary medical information received from EMR;
  • management of healthcare organizations or health care in the region, as well as planning and policy development in relation to healthcare organizations and health care in general;
  • control of the quality and relevancy of the treatment, legal confirmation of the treatment;
  • conducting scientific and clinical research based on the analysis of depersonalized data extracted from EMR, incl. using AI technologies;
  • use of depersonalized data from EMR for teaching students of medical specialties, doctors and patients, as well as for machine learning in order to create new AI products;
  • other functions defined by legislation related to ensuring public health and safety.

Thus, although the standard [15] provides for the use of EMR as a source of data from real clinical practice, it is important to emphasize that this is a secondary goal of maintaining EMR. This gives rise to the specific features and shortcomings of EMR as a source of RWD, which we consider below.

It is important to emphasize that over the 15 years that have passed since the beginning of the discussion of the topic of EMR, not only has the volume of health-related information collected in electronic form increased significantly, but the structure of the sources of this information has become much more complicated. If earlier the main source of data in the EMR were medical records generated by health workers within a single HIS, now the following has been added:

  • data entered by patients themselves using telemedicine technologies and remote monitoring (information about well-being, condition, measurements of physiological parameters, medication, etc.);
  • data from various medical devices used by the patient at home;
  • data obtained from various commercial healthcare organizations (primarily clinical laboratories) and provided by the patient himself;
  • lifestyle data that can be obtained from various non-medical sources (social networks, mobile operators, retail chains, fitness centers, etc.).

These non-medical data are also a highly valuable and promising resource for scientific and medical analysis, which requires their integration into a single EHR of the patient.

Electronic medical records as a source of real world data

The international literature presents many examples of the use of EMR as a source of RWD. For example, Hernandez-Boussard et al. (2019) examined whether EHR data are sufficient to form reliable clinical statements and support appropriate decisions in the care of patients with cardiovascular diseases. Based on an analysis of 10,840 records, the authors showed that the results were 98.3% consistent with the data of previous randomized clinical trials (RCTs) [21].

A similar study was carried out by Kibbelaar et al. (2017) within the framework of the Dutch project HemoBase, the purpose of which was to enrich the results of RCTs with data from EHR and form clinical recommendations for patients with oncohematological pathology based on the analysis [22].

The research team of Moja et al. (2016) developed a decision support system for oncologists within the framework of the ONCO-CODES project based on the analysis of EHR data generated at the stage of primary health care. The authors managed to prove the effectiveness and safety of the developed system [23].

Griffith et al. (2019) studied the possibility of using EHR data from patients with small cell lung cancer to predict patient recovery using the criteria of existing clinical guidelines. The authors found that such an approach can be justified when both clinical data and the results of objective investigations are used [24].

The use of EMR as a source of RWD, including for machine learning and academic research, is criticized for the following shortcomings:

  1. Poor quality and convenience of the EMR interface.
  2. Functions provided by developers for reusing (copying) previously entered data in new records [25].
  3. The decentralized nature of EMR systems, many of which are based on outdated technologies and use servers locally installed in the HO without the possibility of maintaining a single common database.
  4. Lack of unified reference information for coding records in electronic medical documents [26].
  5. Omissions of data and poor-quality filling of on-screen forms by users [27].
  6. About 80% of EMR information is unstructured, including those records stored in the form of ordinary text documents [28][29][30][31].

The reasons for the noted shortcomings are:

  1. Low level of interest of EMR developers in improving the quality of data and the convenience of the interface, because their revenue mainly depends on the size of the client base and the actual level of use of the EMR in the healthcare organization, and not on the quality of the information collected in the EMR.
  2. Lack of legislative and other motivation programs to improve the quality and completeness of the information generated in the EMR.
  3. The large amount of time required to fill out detailed on-screen forms of electronic medical documents, which makes this method less popular among developers than templates and free-text fields.

However, despite the drawbacks noted, EMRs are one of the most important sources of RWD. With the correct use of EMR, you can get a huge amount of information aimed at addressing various challenges in the health care system. In order to reduce the risks caused by the current problems in the use of EMR, it is important to ensure correct data extraction.

Extracting data from electronic medical records: a general scheme

Developing a well-thought-out strategy that includes a number of interrelated, sequential stages of data processing, and then implementing it to a high standard, is a key success factor in obtaining RWD from EMR (Fig. 4). Without such dedicated preparation, information from the EMR will not be suitable for machine processing tasks, including the building of data sets for research, the creation of artificial intelligence (AI) systems, statistical analysis, etc.

Fig 4. Scheme of generating data sets of RWD from electronic medical record systems

Let us explore in more detail each of the processes.

Accumulation of medical records in EMR systems

The most important challenge of data accumulation in EMR is the significant reluctance of physicians to work with HIS. This problem exists all over the world. Moreover, the inconvenience of EMR interfaces and the increased burden on doctors due to the need to use EMR are among the main reasons for their emotional "burnout" [32].

It is no secret that even in healthcare organizations with a high level of HIS implementation, all patient records are kept in paper format, albeit as computer printouts. Physicians have good reason to be dissatisfied: they must make entries both in the computer and in the paper record, and look for information in both places. This creates inconvenience and increases the workload. Until recently, the main reason for such duplication was the lack of legal status for electronic document flow (EDF). Since February 2021, this problem has been resolved by Order of the Ministry of Health No. 947n [20], which allows the use of EDF without duplication on paper and clarifies the main issues of such a paperless workflow. However, the year that has passed since Order No. 947n entered into force has shown that the legal possibility of EDF is not in itself sufficient: this kind of document flow needs to be actively encouraged.

Today, the only incentive for EDF is the regulatory requirements that oblige certain medical documents to be submitted to the EGISZ; this is even included in the licensing requirements for healthcare organizations. As a result, the EMR consists mainly of formally required documents: statistical coupons, discharge reports, registers of accounts and other documents that contain little health data about the patient. Such content significantly reduces the value of the EMR for analytics and research.

It is necessary to develop positive incentives and motivation programs to fill EMRs with precisely medical documents containing clinically valuable information. To do this, EMR should be useful to physicians in their daily activities. In particular, in [32] the following benefits are noted:

  • the ability to transfer data to colleagues electronically (required by 70% of physicians);
  • access to the EMR from home (76%);
  • the ability to share examination results with patients (48%).

Taking into account the requirements of the legislation on the protection of personal data, the EMR management system must depersonalize the accumulated personal data of the patient and transfer it to the centralized data lake system. The depersonalization process must be carried out strictly within the operator's secure infrastructure (the healthcare organization or, in the case of a centralized HIS, the departmental data center). Depersonalization must be implemented according to uniform technical principles in all EMR systems. This approach will subsequently make it possible to link the various episodes of a patient seeking medical care, obtained from different healthcare organizations and EMR management systems, into a single integrated patient EHR.
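One possible way to apply a uniform depersonalization principle across EMR systems is a keyed hash of normalized identifiers. The following is a minimal sketch under our own assumptions, not a description of any deployed system: the field names, the secret key and the functions `normalize`, `pseudonymize` and `depersonalize` are hypothetical, and real deployments would need key management and a legal assessment that are outside the scope of this illustration.

```python
import hashlib
import hmac

# Hypothetical secret that never leaves the operator's secure infrastructure.
SECRET_KEY = b"replace-with-a-key-managed-by-the-operator"

# Hypothetical names of fields that directly identify the patient.
DIRECT_IDENTIFIERS = {"full_name", "birth_date", "policy_number", "phone", "email"}


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not change the key."""
    return " ".join(value.lower().split())


def pseudonymize(full_name: str, birth_date: str, policy_number: str,
                 key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonymous ID from direct identifiers with HMAC-SHA256."""
    message = "|".join(normalize(v) for v in (full_name, birth_date, policy_number))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def depersonalize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and attach the pseudonymous ID instead."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_pid"] = pseudonymize(
        record["full_name"], record["birth_date"], record["policy_number"], key
    )
    # Assumes ISO dates (YYYY-MM-DD); only the year of birth is kept.
    cleaned["birth_year"] = record["birth_date"][:4]
    return cleaned


# Two records of the same (fictional) person coming from different EMR systems.
record_a = {"full_name": "Ivanov Ivan Ivanovich", "birth_date": "1970-05-17",
            "policy_number": "1234567890123456", "phone": "+7 000 000-00-00",
            "diagnosis": "I10"}
record_b = {"full_name": "  ivanov  ivan ivanovich", "birth_date": "1970-05-17",
            "policy_number": "1234567890123456", "diagnosis": "E11"}

clean_a = depersonalize(record_a)
clean_b = depersonalize(record_b)
print(clean_a["patient_pid"] == clean_b["patient_pid"])
```

Because every system applies the same normalization and the same key, records of one person receive the same identifier and can later be linked, while the data lake itself never receives the name or policy number.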

Centralized collection of raw data from EHR to data lakes

The main task of the data lake is the centralized accumulation of any raw information on the patient (so-called raw data). The more raw data is accumulated, the better. It is very important that all episodes of a patient seeking medical care are loaded into the data lake, including cases of outpatient and inpatient treatment for all reasons, data on ambulance calls, medical examination data, rehabilitation data, etc.

The value of the data lake will be much greater if, in addition to data from anonymized EMRs, it can be loaded with data from patients themselves, including information from social media accounts, data from wearable devices, and background information about the patient's living conditions: characteristics of the place of residence and environment, data on harmful and dangerous factors at the workplace, the characteristics of the health care system in the area of the patient's permanent residence, etc. (Fig. 5).

Fig 5. The composition of the data that needs to be collected in the data lake

From a technical standpoint, there are several key requirements for a data lake:

  1. Exclusion of any possibility of recreating patient identifiers from the received data, including last name, first name and patronymic, numbers of identification documents, personal insurance policy number, phone numbers, email addresses, etc.
  2. Reception of information from EMR management systems in the form of structured electronic medical records based on the HL7 CDA 3.0 standard.
  3. Automatic association of the various episodes of the patient's diseases obtained from heterogeneous, mutually incompatible sources of information.
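The third requirement can be illustrated with a simple grouping step. The sketch below assumes that every incoming episode already carries a pseudonymous patient identifier and an ISO-formatted date; the field names and the function `link_episodes` are hypothetical.

```python
from collections import defaultdict


def link_episodes(episodes):
    """Group depersonalized episodes by pseudonymous patient ID, ordered by date."""
    by_patient = defaultdict(list)
    for episode in episodes:
        by_patient[episode["patient_pid"]].append(episode)
    for patient_episodes in by_patient.values():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        patient_episodes.sort(key=lambda e: e["date"])
    return dict(by_patient)


# Fictional episodes received from three different source systems.
episodes = [
    {"patient_pid": "a1", "source": "outpatient_his", "date": "2021-03-02",
     "type": "outpatient visit"},
    {"patient_pid": "b7", "source": "hospital_his", "date": "2021-01-15",
     "type": "hospitalization"},
    {"patient_pid": "a1", "source": "ambulance_is", "date": "2020-11-20",
     "type": "ambulance call"},
]

linked = link_episodes(episodes)
for pid, history in linked.items():
    print(pid, [(e["date"], e["type"]) for e in history])
```

In practice, association across incompatible sources usually also involves mapping source-specific codes to common reference data, which this sketch does not attempt.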

Extracting data from unstructured records

Various technologies can be used to extract data from unstructured medical records, including natural language processing (NLP). From a technical point of view, the task of this stage is to analyze the accumulated records, including unstructured ones, in order to extract individual structured features from them. The types of features are presented in Table 1.

Table 1. Types of features extracted from raw data

Feature           Example
Binary            smoking, taking antihypertensive drugs, etc.
Numerical         temperature, heart rate, blood pressure, height, weight, laboratory test values, etc.
Date              date of birth, date of event, date of death, etc.
Text              symptom, place of work, etc.
Reference code    ICD code, gender, etc.

To do this, machine learning models are being developed that can find predefined features in the text blocks received as input and return structured information, which is then written to the database and is suitable for further processing (Fig. 6).

Fig 6. An example of extracting structured features from a text entry using NLP models

Of course, it is nearly impossible to get a 100% correct and comprehensive solution that would understand any text and extract every feature. However, by constantly working on improving NLP models, developers can get a fairly powerful tool for processing and extracting data from almost any medical record.
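To make the input and output of this stage concrete, the sketch below uses a few regular expressions as a deliberately simple, rule-based stand-in for trained NLP models. It is not the approach described in the article; the patterns, the feature names and the function `extract_features` are our own assumptions, and the example text is fictional.

```python
import re


def extract_features(text: str) -> dict:
    """Return structured features found in a free-text medical record."""
    features = {}

    # Binary feature: smoking status (negated phrases are checked first).
    if re.search(r"\b(?:non-?smoker|does not smoke|denies smoking)\b", text, re.IGNORECASE):
        features["smoking"] = False
    elif re.search(r"\b(?:smokes|smoker|smoking)\b", text, re.IGNORECASE):
        features["smoking"] = True

    # Numerical features: blood pressure, heart rate, temperature.
    bp = re.search(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})", text, re.IGNORECASE)
    if bp:
        features["systolic_bp"] = int(bp.group(1))
        features["diastolic_bp"] = int(bp.group(2))
    hr = re.search(r"\bHR\s*(\d{2,3})", text, re.IGNORECASE)
    if hr:
        features["heart_rate"] = int(hr.group(1))
    temp = re.search(r"\btemperature\s*(\d{2}(?:\.\d)?)", text, re.IGNORECASE)
    if temp:
        features["temperature"] = float(temp.group(1))

    # Reference code: first token shaped like an ICD-10 code (letter, two digits, optional subcode).
    icd = re.search(r"\b([A-Z]\d{2}(?:\.\d{1,2})?)\b", text)
    if icd:
        features["icd10"] = icd.group(1)

    return features


note = "Complaints of headache. Smokes. BP 150/95, HR 88, temperature 36.8. Diagnosis: I10."
extracted = extract_features(note)
print(extracted)
```

Hand-written rules of this kind break down quickly on real clinical text with abbreviations, typos and negations, which is the reason for using trained models instead.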

Formation of a digital profile

By extracting features from all the records accumulated in the lake, it is possible to form a digital patient profile. In the literature, comprehensively collected data on a patient is sometimes called a digital twin of the patient [33], which, in our opinion, is not entirely correct: a digital twin should allow full-fledged modeling of how an object changes under various conditions, which is impossible without a complex mathematical model of the patient's health.

The more varied the data in a patient's digital profile, the greater its value for RWD research and for AI research and development [34]. The composition of such data is shown in Fig. 7.

A mandatory task of this process is the format and logical control of each extracted feature. To do this, the limits of acceptable values for the corresponding unit of measure must be stored in the directory of the corresponding information system. If NLP models have extracted a feature value that does not fit within the allowable limits, the information system should mark this value as incorrect in order to exclude it from further processing. It is not recommended to remove erroneous entries from the system database. They are necessary for the subsequent analysis of the reasons for the appearance of low-quality data and for determining their prevalence across various healthcare organizations (HOs) or EMR management systems. Such an analysis can be of significant value for subsequent measures to improve the quality of EMRs.
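
The control step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the directory ALLOWED_LIMITS, its numeric limits and the names ExtractedValue and check_limits are hypothetical, and the limits are placeholders rather than clinically validated ranges. Treating a feature with no stored limits as invalid is also a choice made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical directory of allowable limits, keyed by (feature, unit).
# The numbers are placeholders for illustration only.
ALLOWED_LIMITS = {
    ("temperature", "celsius"): (25.0, 45.0),
    ("heart_rate", "bpm"): (20.0, 300.0),
    ("height", "cm"): (30.0, 250.0),
}


@dataclass
class ExtractedValue:
    feature: str
    unit: str
    value: float
    is_valid: bool = True
    reason: str = ""


def check_limits(item: ExtractedValue) -> ExtractedValue:
    """Flag a value that falls outside the stored limits; never delete it."""
    limits = ALLOWED_LIMITS.get((item.feature, item.unit))
    if limits is None:
        item.is_valid = False
        item.reason = "no limits defined for this feature and unit"
        return item
    low, high = limits
    if not (low <= item.value <= high):
        item.is_valid = False
        item.reason = f"value {item.value} outside [{low}, {high}]"
    return item


records = [
    ExtractedValue("temperature", "celsius", 36.6),
    ExtractedValue("temperature", "celsius", 366.0),  # likely a lost decimal point
]
checked = [check_limits(r) for r in records]
usable = [r for r in checked if r.is_valid]
```

The flagged record stays in the checked list with its reason, so it remains available for later analysis of data quality, while only the usable list is passed on for further processing.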

Fig. 7. The concept of a digital patient profile according to [34].

In addition, at this stage all the prepared data are interpreted comprehensively: the data are checked for duplicates and inconsistencies, secondary features are calculated (for example, BMI from the extracted weight and height), and the final information about risk factors, registered diseases, forecasts, etc. is identified and recorded.
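
Two of the operations mentioned above, calculating a secondary feature and checking for duplicates, can be sketched as follows. BMI is computed with the standard formula (weight in kilograms divided by the square of height in metres); the record layout and the function names body_mass_index and drop_exact_duplicates are assumptions made for this example.

```python
from typing import Dict, List, Optional


def body_mass_index(weight_kg: Optional[float], height_cm: Optional[float]) -> Optional[float]:
    """BMI = weight (kg) / height (m)^2; None if an input is missing or unusable."""
    if weight_kg is None or height_cm is None or height_cm <= 0:
        return None
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 1)


def drop_exact_duplicates(records: List[Dict]) -> List[Dict]:
    """Keep the first occurrence of each (patient, feature, date, value) combination."""
    seen = set()
    unique = []
    for record in records:
        key = (record["patient_id"], record["feature"], record["date"], record["value"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"patient_id": "p1", "feature": "weight_kg", "date": "2022-03-01", "value": 70.0},
    {"patient_id": "p1", "feature": "weight_kg", "date": "2022-03-01", "value": 70.0},
    {"patient_id": "p1", "feature": "height_cm", "date": "2022-03-01", "value": 175.0},
]
unique_records = drop_exact_duplicates(records)
bmi = body_mass_index(70.0, 175.0)
```

Only exact duplicates are removed here. Conflicting values for the same patient, feature and date are an inconsistency rather than a duplicate and would need a separate rule.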

Building datasets on demand

The generated digital patient profiles are fully stored in the database in a structured form suitable for queries, analytical processing and the formation of datasets based on certain criteria. The generated sets can be uploaded as machine-readable files (for example, in CSV format) and used for further analysis and processing, including in research, machine learning, etc.
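
Building a dataset on demand can be illustrated with a short sketch that filters profiles by a criterion and writes the result as CSV using the Python standard library. The profile fields, the selection criterion and the function names build_dataset and write_csv are hypothetical; a production system would more likely query the database directly.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def build_dataset(profiles: Iterable[Dict], criteria: Callable[[Dict], bool],
                  columns: List[str]) -> List[Dict]:
    """Select the profiles that meet the criteria and keep only the requested columns."""
    return [{c: p.get(c) for c in columns} for p in profiles if criteria(p)]


def write_csv(rows: List[Dict], columns: List[str], stream) -> None:
    """Write the selected rows to a text stream in CSV format."""
    writer = csv.DictWriter(stream, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical depersonalized profiles.
profiles = [
    {"patient_id": "p1", "age": 54, "smoking": True, "bmi": 27.3},
    {"patient_id": "p2", "age": 35, "smoking": True, "bmi": 22.1},
    {"patient_id": "p3", "age": 61, "smoking": False, "bmi": 30.4},
]
columns = ["patient_id", "age", "bmi"]
rows = build_dataset(profiles, lambda p: p["smoking"] and p["age"] >= 40, columns)

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_csv(rows, columns, buffer)
```

Passing the criterion as a function keeps the selection logic separate from the export, so the same profiles can serve different research questions.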

Conclusion

For more than 10 years now, the Russian healthcare industry has been implementing a number of large state projects on digital transformation, which has ensured the accumulation of EMR archives in healthcare organizations. The development of big data processing technologies, such as AI, makes it possible to extract valuable clinical information from the accumulated EMRs and use it both to create innovative products, such as clinical decision support systems, and to conduct real-world data research.

Currently, EMRs are one of the most important sources of RWD. A key success factor in obtaining RWD from EMRs is a well-thought-out strategy that includes a number of interrelated sequential stages of data processing, followed by its high-quality implementation. We identified 5 key stages that allow obtaining high-quality datasets to achieve specific goals: 1) accumulation of medical records in EMR management systems; 2) centralized collection of depersonalized medical records from EMRs in the data lake; 3) extraction of features from unstructured medical documents; 4) formation of a digital patient profile; 5) building of datasets on demand. Each of the presented stages contains a number of requirements and sequential processes for their implementation.

To ensure confidence in the developments and conclusions based on the analysis of RWD obtained from EMRs, all stages and processes of forming RWD sets must be implemented to a high standard.

ADDITIONAL INFORMATION

Acknowledgments. The authors are grateful to T.A. Goldina, Head of Routine Practice Data and Scientific Communication, JSC Sanofi Russia, for assistance in writing this article.

Conflict of interests: The authors declare no conflict of interest.

References

1. Digital Health Market Size By Technology, Telehealth, mHealth, Apps, Health Analytics, Digital Health System (EHR), By Component, Industry Analysis Report, Regional Outlook, Application Potential, Price Trends, Competitive Market Share & Forecast, 2020-2026. https://www.gminsights.com/industry-analysis/digital-health-market.

2. Harnessing the Power of Data in Health: Stanford Medicine 2017 Health Trends Report. https://med.stanford.edu/content/dam/sm/sm-news/documents/StanfordMedicineHealthTrendsWhitePaper2017.pdf.

3. 2020 Global Health Care Outlook. https://www2.deloitte.com/global/en/pages/life-sciences-and-healthcare/articles/global-health-caresector-outlook.html.

4. Goldina T.A., Kolbin A.S., Belousov D.Yu., Borovskaya V.G. Review of real-world clinical practice studies. Kachestvennaya Klinicheskaya Praktika. 2021;(1):56-63. (In Russ.) https://doi.org/10.37489/2588-0519-2021-1-56-63.

5. Kim HS, Lee S, Kim JH. Real-world Evidence versus Randomized Controlled Trial: Clinical Research Based on Electronic Medical Records. Journal of Korean medical science. 2018;33(34):e213. https://doi.org/10.3346/jkms.2018.33.e213.

6. Cheung KS, Leung WK, Seto WK. Application of Big Data analysis in gastrointestinal research. 2019. https://pubmed.ncbi.nlm.nih.gov/31293336/.

7. Gasparyan S.A., Pashkina E.S. Pages from the history of healthcare informatization in Russia. Moscow; 2002. 304 p. (In Russ.)

8. Frolov S.V., Makoveev S.N., Semenova S.V., Farea S.G. Current trends in the development of the medical information systems market. Vestnik Tambovskogo gosudarstvennogo tekhnicheskogo universiteta. 2010;16(2):266-72. (In Russ.)

9. Gusev A.V., Romanov F.A., Dudanov I.P. Review of medical information systems on the domestic market in 2005. Meditsinskii akademicheskii zhurnal. 2005;5(3):Suppl. 7. 72-84. (In Russ.)

10. Association for the Development of Medical Information Technologies. (In Russ.) https://www.armit.ru/.

11. Gusev A.V. Review of the market for integrated medical information systems. Vrach i informatsionnye tekhnologii. 2009;(6):4-17. (In Russ.)

12. Gusev A.V. Medical information systems: current state, level of use and trends. Vrach i informatsionnye tekhnologii. 2011;(3):6-14. (In Russ.)

13. Boiko E.L. Digital healthcare. Vestnik Roszdravnadzora. 2018;(3):5-8. (In Russ.)

14. GOST R 52636-2006 «Electronic medical history. General provisions». (In Russ.) https://docs.cntd.ru/document/1200048924.

15. GOST R ISO/TS 18308-2008 «Health informatics. Requirements for an electronic health record architecture». (In Russ.) https://docs.cntd.ru/document/1200067414.

16. Emelin I.V., Zingerman B.V., Lebedev G.S. Problems of defining key terms in medical informatics. Informatsionno-izmeritelnye i upravlyayushchie sistemy. 2009;(12):15-23. (In Russ.)

17. Zingerman B.V., Shklovsky-Kordi N.E. The electronic medical record and the principles of its organization. Vrach i informatsionnye tekhnologii. 2013;(2):37-58. (In Russ.)

18. Zingerman B.V., Shklovsky-Kordi N.E. The integrated electronic medical record: tasks and problems. Vrach i informatsionnye tekhnologii. 2015;(1):24-34. (In Russ.)

19. Order of the Ministry of Health of the Russian Federation No. 911n of 24.12.2018 «On approval of the requirements for state healthcare information systems of the constituent entities of the Russian Federation, medical information systems of medical organizations and information systems of pharmaceutical organizations». (In Russ.)

20. Order of the Ministry of Health of the Russian Federation No. 947n of 07.09.2020 «On approval of the Procedure for organizing the document management system in the field of health protection with regard to maintaining medical documentation in the form of electronic documents». (In Russ.)

21. Hernandez-Boussard T, Monda KL, Crespo BC, Riskin D. Real world evidence in cardiovascular medicine: ensuring data validity in electronic health record-based studies. Journal of the American Medical Informatics Association: JAMIA. 2019;26(11):1189-94. https://doi.org/10.1093/jamia/ocz119.

22. Kibbelaar RE, Oortgiesen BE, van der Wal-Oost AM, Boslooper K, Coebergh JW, Veeger NJGM, Joosten P, Storm H, van Roon EN, Hoogendoorn M. Bridging the gap between the randomised clinical trial world and the real world by combination of population-based registry and electronic health record data: A case study in haemato-oncology. Eur J Cancer. 2017 Nov;86:178-85. Epub 2017 Oct 6. PMID: 28992561. https://doi.org/10.1016/j.ejca.2017.09.007.

23. Moja L, Passardi A, Capobussi M, Banzi R, Ruggiero F, Kwag K, Liberati EG, Mangia M, Kunnamo I, Cinquini M, Vespignani R, Colamartini A, Di Iorio V, Massa I, González-Lorenzo M, Bertizzolo L, Nyberg P, Grimshaw J, Bonovas S, Nanni O. Implementing an evidence-based computerized decision support system linked to electronic health records to improve care for cancer patients: the ONCO-CODES study protocol for a randomized controlled trial. Implement Sci. 2016 Nov 25;11(1):153. PMID: 27884165; PMCID: PMC5123241. https://doi.org/10.1186/s13012-016-0514-3.

24. Griffith SD, Tucker M, Bowser B, Calkins G, Chang CJ, Guardino E, Khozin S, Kraut J, You P, Schrag D, Miksad RA. Generating Real-World Tumor Burden Endpoints from Electronic Health Record Data: Comparison of RECIST, Radiology-Anchored, and Clinician-Anchored Approaches for Abstracting Real-World Progression in Non-Small Cell Lung Cancer. Adv Ther. 2019 Aug;36(8):2122-36. Epub 2019 May 28. PMID: 31140124; PMCID: PMC6822856. https://doi.org/10.1007/s12325-019-00970-1.

25. Wang MD, Khanna R, Najafi N. Characterizing the Source of Text in Electronic Health Record Progress Notes. JAMA Internal Medicine. 2017;177(8):1212-3.

26. Topol EJ. Editor. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. New York: Basic Books, 2019.

27. Hulsen T, Jamuar SS, Moody AR, Karnes JH, Varga O, Hedensted S, Spreafico R, Hafler DA, McKinney EF. From Big Data to Precision Medicine. Front. Med. 2019;6:34. doi: 10.3389/fmed.2019.00034.

28. Gilmore-Bykovskyi AL, Block LM, Walljasper L, Hill N, Gleason C, Shah MN. Unstructured clinical documentation reflecting cognitive and behavioral dysfunction: toward an EHR-based phenotype for cognitive impairment. J Am Med Inform Assoc. 2018 Sep 1;25(9):1206-12. doi: 10.1093/jamia/ocy070. PMID: 29947805; PMCID: PMC6118865.

29. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J Biomed Health Inform. 2018 Sep;22(5):1589-604. doi: 10.1109/JBHI.2017.2767063. Epub 2017 Oct 27. PMID: 29989977; PMCID: PMC6043423.

30. Kong HJ. Managing Unstructured Big Data in Healthcare System. Healthcare informatics research. 2019;25(1):1-2. https://doi.org/10.4258/hir.2019.25.1.1.

31. Study of the completeness and structuredness of data in the medical information systems of St. Petersburg. (In Russ.) https://actcognitive.org/proekty/city-healthcare?.

32. Kroth PJ, Morioka-Douglas N, Veres S, et al. Association of Electronic Health Record Design and Use Factors With Clinician Stress and Burnout. JAMA Netw Open. Published online August 16, 2019;2(8):e199609. doi: 10.1001/jamanetworkopen.2019.9609.

33. Björnsson B, Borrebaeck C, Elander N, et al. Digital twins to personalize medicine. Genome Med. 2020;12:4. https://doi.org/10.1186/s13073-019-0701-3.

34. Voigt I, Inojosa H, Dillenseger A, Haase R, Akgün K, Ziemssen T. Digital Twins for Multiple Sclerosis. Front Immunol. 2021 May 3;12:669811. PMID: 34012452; PMCID: PMC8128142. https://doi.org/10.3389/fimmu.2021.669811.


About the Authors

A. V. Gusev
LLC «K-Sky»; Federal State Budgetary Institution «Central Research Institute for the Organization and Informatization of Healthcare» of the Ministry of Health of Russia
Russian Federation

Petrozavodsk

Moscow


Competing Interests:

The authors declare no conflict of interest.



B. V. Zingerman
LLC «TelePat»
Russian Federation

Moscow


Competing Interests:

The authors declare no conflict of interest.



D. S. Tyufilin
Federal State Budgetary Institution «Central Research Institute for the Organization and Informatization of Healthcare» of the Ministry of Health of Russia
Russian Federation

Moscow


Competing Interests:

The authors declare no conflict of interest.



V. V. Zinchenko
State Budgetary Institution of Healthcare of the City of Moscow «Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Department of Health of the City of Moscow»
Russian Federation

Moscow


Competing Interests:

The authors declare no conflict of interest.



Review




Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3784 (Online)