This is work performed by Daniel Alcaide, unless otherwise mentioned. It is currently being written up.

Patient profiling and selection are receiving growing attention because of their large economic and societal value. These processes are currently extremely labor-intensive, however. Analytical methods that can handle the increasing amount of healthcare data could make them more agile and facilitate, for example, patient recruitment in clinical trials. Here we present the application of STAD to intensive care unit patients.

A proof-of-principle interface can be found at https://dalcaide.shinyapps.io/diagnosis_explorer/. The code underlying this interface is available on github at https://github.com/vda-lab/ICD_diagnosis_explorer.

## What’s the distance between diagnoses?

The MIMIC-III critical care database (described in this paper) contains deidentified health data for almost 60,000 intensive care unit patients. A lot of information is available for each patient, including a list of diagnoses (encoded using ICD-9). To see whether we can find substructures in this patient population, we need to calculate distances between patients, and we’ll focus on their diagnoses to do this.

Unfortunately, there is an issue: no simple distance metric exists for lists of diagnoses for patients. This is because they are categorical data (i.e. each diagnosis is a category) that are put in a specific order (i.e. the first diagnosis in the list is the most important, and importance drops as you go down the list).

| Order | Patient X ICD | Patient X description | Patient Y ICD | Patient Y description |
|---|---|---|---|---|
| 1 | 99662 | Infection and inflammatory reaction due to other vascular device, implant, and graft | 4329 | Unspecified intracranial hemorrhage |
| 2 | 99591 | Sepsis | 4019 | Unspecified essential hypertension |
| 3 | 5990 | Urinary tract infection, site not specified | 99702 | Iatrogenic cerebrovascular infarction or hemorrhage |
| 4 | 4019 | Unspecified essential hypertension | 99591 | Sepsis |
| 5 | | | 5990 | Urinary tract infection, site not specified |
| 6 | | | 43491 | Cerebral artery occlusion, unspecified with cerebral infarction |

Codes 2, 3 and 4 of patient X correspond to codes 4, 5 and 2 of patient Y (in that order). To take into account not only whether a code is present but also its position in the list, we can use the following distance metric:

where $c_{X}$ and $c_{Y}$ are the same code in patients X and Y, respectively.
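The per-code metric only applies to codes that both patients share, so the first practical step is to find those codes and their positions in each list. The sketch below does this for the two example patients; the function name `shared_code_positions` is hypothetical, and it assumes each code appears at most once per patient.

```python
# ICD-9 codes in diagnosis order (position 1 = most important), from the example patients
patient_x = ["99662", "99591", "5990", "4019"]
patient_y = ["4329", "4019", "99702", "99591", "5990", "43491"]


def shared_code_positions(x, y):
    """Map each code present in both lists to its 1-based (position in x, position in y).

    Assumes a code occurs at most once per patient.
    """
    pos_y = {code: i for i, code in enumerate(y, start=1)}
    return {code: (i, pos_y[code]) for i, code in enumerate(x, start=1) if code in pos_y}


print(shared_code_positions(patient_x, patient_y))
# {'99591': (2, 4), '5990': (3, 5), '4019': (4, 2)}
```

This reproduces the correspondence described above: codes 2, 3 and 4 of patient X sit at positions 4, 5 and 2 of patient Y.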

To obtain a distance between patients, rather than between a single code shared by two patients, we sum these values:
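Structurally, this aggregation is a sum of per-code values over the codes two patients have in common. The minimal sketch below keeps the per-code term as a parameter, because the exact formula is not reproduced here; the lambda in the usage example (absolute difference in position) is a hypothetical placeholder for illustration only, not the metric used for the network. In this sketch, codes that occur in only one patient contribute nothing, which is also an assumption.

```python
from typing import Callable, Sequence


def patient_score(x: Sequence[str], y: Sequence[str],
                  per_code: Callable[[int, int], float]) -> float:
    """Sum per_code(position in x, position in y) over all codes shared by x and y.

    Positions are 1-based. per_code is a placeholder for the position-aware
    per-code metric; codes present in only one patient are ignored here.
    """
    pos_y = {code: i for i, code in enumerate(y, start=1)}
    return sum(per_code(i, pos_y[code])
               for i, code in enumerate(x, start=1) if code in pos_y)


patient_x = ["99662", "99591", "5990", "4019"]
patient_y = ["4329", "4019", "99702", "99591", "5990", "43491"]

# Hypothetical placeholder term: |2-4| + |3-5| + |4-2| = 6
print(patient_score(patient_x, patient_y, lambda px, py: abs(px - py)))
```

Passing the per-code term as a function makes it easy to swap in the actual metric without changing the matching logic.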

## What does such a network look like?

Using this metric, the STAD network for patients in the MIMIC-III database that suffer from a “pathological fracture of vertebrae” looks like this:

As usual, colours are assigned automatically using community detection.
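STAD, as we understand its published description, starts from a minimum spanning tree (MST) over the full patient-by-patient distance matrix, which guarantees a single connected network, and then adds further edges; only that first step is sketched below. This is a minimal Prim's-algorithm sketch assuming a dense, symmetric distance matrix; the function name `minimum_spanning_tree` and the toy matrix are hypothetical and not taken from the diagnosis explorer code.

```python
import math


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns a list of (i, j, distance) edges with i < j.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each node
    parent = [-1] * n      # tree node providing that cheapest link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the cheapest node not yet in the tree (lowest index on ties)
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            i, j = sorted((parent[u], u))
            edges.append((i, j, dist[i][j]))
        # relax links from the newly added node to the remaining nodes
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Hypothetical distances between four patients
toy = [[0, 1, 4, 3],
       [1, 0, 2, 5],
       [4, 2, 0, 1.5],
       [3, 5, 1.5, 0]]
print(minimum_spanning_tree(toy))
# [(0, 1, 1), (1, 2, 2), (2, 3, 1.5)]
```

The O(n²) dense version fits here because a patient distance matrix is complete anyway; community detection for the colours would then run on the final network.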

A complete user interface to explore these networks can be found at https://dalcaide.shinyapps.io/diagnosis_explorer/.