Complex Queries


This session shows how database manipulations can provide you with measures of episodes of illness.  Measures of episodes of illness are important in evaluating performance of clinicians and healthcare  programs.  This page is based on a recent publication.  For details see:  Alemi F, Walters SR.  A Mathematical Theory for Identifying and Measuring Severity of Episodes of Care. Qual Manag Health Care. 2006 April/June;15(2):72-82.

This page is based on United States patent application 10/054,706 filed on 1/24/2002 by George Mason University on  behalf of Farrokh Alemi, Ph.D. and Valentin Prudius.  We grant  permission to individual scientists within university, Federal and State  governments settings to use these procedures free of licensing fees for  the purpose of evaluating its effectiveness.  Permission is also  granted to all students using this procedure as part of an educational class.

Episodes of Illness

This paper provides a mathematical model for identifying episodes of  care and measuring the severity of an episode. Measures of episodes of  care, in general, and the approach proposed here, in particular, can be  used to set capitation rates or to profile clinicians’ performance.  Numerous approaches to measuring episodes of care exist.  Examples  include Prospective Risk Adjustment, Ambulatory Visit Groups, Disease  Staging, Products of Ambulatory Care, Ambulatory Diagnosis Groups and  Ambulatory Care Groups. In addition to broad approaches to measurement  of episodes of illness, many have developed disease specific episodes of  care. There have been a number of review articles about episodes of  care. Given the wide range of approaches to measurement of episodes of  care, the natural question to ask is why we need yet another approach?  Three reasons have motivated us to seek a new approach to measuring  episodes of care. First, we provide a mathematical model for measuring  episodes of care. No other approach does so. Most existing approaches to  measuring episodes of care do not describe the internal procedures used  for measuring severity or identifying episodes of care. Some commercial  approaches consider such information as business secrets that should not  be disclosed. Even when they do describe the internal mechanism of the  approach, all rely on heuristics that make clinical sense but do not  provide a mathematical theory for the relation between the variables  used in constructing episodes of care. Thus, researchers face a black  box -- the content of which they know little about or is based on  heuristics that they cannot easily modify and reapply. In the absence of  a theory, it is difficult to learn from one study how better measures  can be constructed. Each study and each approach exists on its own  merits and fails to contribute to the other. Then researchers tend to  compete on claims of accuracy rather than to build on each other's work.  As a result, while many approaches exist, cumulative progress in the  field – where in one investigator builds on another person’s approach,  has been limited. The mathematical theory proposed in this paper allows  us to accumulate information and improve our understanding of how  severity of episodes of care should be measured. Future researchers can  change the theory to arrive at more accurate predictions. As new  insights are found, the theory is modified and knowledge is accumulated.  Second, our proposed approach does not classify diagnoses into clusters  of diseases before identifying episodes of care. All existing approaches  are built on the concept of classifying possible diagnoses into a few  clusters and then findings rules for creating episodes for these  clusters. Schneeweiss and colleagues reported that 92 diagnosis clusters  make up 86 percent of all ambulatory visits. Others have expanded this  set to 125, with varying levels of severity and different time periods,  during which the diagnoses in the cluster will belong to the same  episode.  We propose an approach that does not attempt to reduce  the large set of possible diagnoses into a smaller set of clusters.   Reduction into smaller set of diseases, by definition, gives up  important nuances in order to have a manageable set of diagnoses. For  example, infections often follow wounds and therefore may be considered  part of the same episode. But an otitis media, even though an infection  of the ear, could not possibly be part of an episode of trauma to the  leg. Defining all infections as one cluster of diagnoses forces  investigators to ignore important differences that might exist between  types of infections. In our approach, all operations are defined on  individual diagnoses without need to pre-set these diagnoses into broad  clusters. Sometimes classification of diseases into clusters is based on  the etiology of the disease, leading in our view to counter intuitive  classifications. An episode of trauma may include fracture to the knee  as well as fracture of the leg – even though the knee and leg are  different problems. Congestive heart failure may be part of episode of  myocardial infarction even though one involves the heart the other the  lung. Two very dissimilar diagnoses may be part of the same episode,  even though these diagnoses do not describe the same illness. Third, the  objective of our proposed approach, contrary to some existing approaches  is not to create homogenous resource use episodes. Thus, not all  follow-up visits are part of the same episode even though they may all  be short visits and therefore have similar resource use.  In our  approach, the nature of the diagnosis, not the intensity of visits, is  the basis of classifying visits into episodes. For example, follow-up  visit for myocardial infarction is part of an MI episode and a follow-up  visit for trauma is part of trauma episode. Intensity-based measures  cannot be used for evaluating whether the numbers of visits are  appropriate. In essence, they are fee schedules, except that these fee  schedules are based on groups of visits or diagnoses and not single  visit diagnosis. We propose a relation-based episode classification  system that remedies this important shortcoming. It can be used to judge  appropriateness of number of visits.

Proposed Method of Measuring Severity of Episodes

We start our description of the methods with a few definitions. An  episode of care is a group of diagnoses on the same patient that  describes the course of a given illness. Note that this definition does  not depend on the nature of services delivered, the doctor delivering  services, or the site of services. Nor, contrary to others, does this  definition assume that services are temporally contiguous. Thus, it  allows for episodes to be overlapping; for example, a patient may have  an acute exacerbation of their chronic diabetes and experience an  episode of upper respiratory infection. Episodes are characterized by  what we call "an anchor diagnosis." This is the diagnosis that gives its  name to the episode. Episodes have starting (sometimes referred to as  trigger diagnosis) and stopping points that may be different from the  anchor diagnosis. Episodes are characterized by a rate of progression, a  peak severity during the course of episode, and morbidity and mortality  outcomes. One episode, for example, may have a rapid onset, progress to  a very serious condition, and then lead to death. Another episode may  have a slow onset and never become serious.

Selecting diagnoses that are part of same episode

Defining an episode begins with selecting diagnoses that are part of  the same episode. Imagine that a patient has had a series of diagnosis D1,   D2, ... Dm at times T1, T2,   through Tm. Whether two diagnoses are part of the same episode depends   on the nature of the two diagnoses and the time between them. Two   diagnoses that are similar or related in nature should be part of the   same episode unless they occur at significantly different times. If we   define Pia as the probability that the diagnosis “i” and   diagnosis “a” belong to the same episode, then the theory suggests that:

Pia= function {Tia,   Sia}

Where the similarity between the diagnosis “i” and diagnosis “a” is Sia and number of days between diagnosis “i” and diagnosis “a” is Tia  and calculated as:

Tia= Ta – Ti  and T ia > 0

Note that the probability of being part of the same episode, Pia,   should be directly related to similarity of two diagnoses, Sia,   and inversely related to, Tia, the time between the two   diagnoses. A specific mathematical function that preserves these two   relationships is:

Pia= S ia / (1+βT ia)

In the above equation,  α and β are constants. When a patient presents with several diagnoses, then the probability that any two of the diagnoses   may belong to an episode is calculated using the above formula. Later, these pair-wise probabilities of belonging to the same episode are used   to classify diagnosis into groups -- using one of many widely available  classification methods. For a specific example of a classification   algorithm see Appendix at end of this paper.

Severity of an episode

Diagnoses differ in terms of their severity. We show severity of  diagnosis “i” as Sevi and calculate the overall severity of   an episode by the following mathematical formula:

Overall severity of episode = 1-   п i  (1 - Sevi )

There are many different mathematical formulas for aggregating  severity of individual diagnosis to severity of an episode. The most  common approach is to add or average the severity scores for each  diagnosis. Adding scores is not appropriate, as episodes with few severe  diagnoses would be scored lower than episodes with many non-severe  diagnoses. Averaging is also not appropriate, as patients who have two  diagnoses, one severe and the other not, will be rated lower than  patients with just the severe diagnosis. Instead of adding or averaging  the scores, we prefer using the above Multiplicative model. For example,  if a patient has two diagnoses, one with severity score 0.9 and another  with severity score 0.5, then the overall severity of the episode is  calculated as: Overall severity for the patient = 1 -(1-0.9)*(1-0.5) =  0.95 Compared to the adding or the averaging formula, the multiplicative  formula has several advantages: The influence of severe diagnoses on the  overall score are not diluted by non-severe diagnoses and merely  increasing the number of diagnoses will not necessarily result in high  overall severity scores.   ¤

Evaluation of the Algorithm

We created a measure for severity of episodes of illness for  developmentally delayed children who were enrolled in the Medicaid  program of one Southeastern State. Developmentally delayed children use  health services extensively. To reduce computational difficulties and  without loss of generality, we randomly sampled 565 patients among the  3250 patients in the database. The data included both in-patient and  outpatient Medicaid payments for the patient. The in-patient portion  included both the health professionals billing and the institution's  bills. On average, the State paid $9,296 per patient per year. The  standard error of the cost was $2,238, reflecting large variation in  cost of care across patients. Cost ranged from low of $29 (reflecting  patients enrolled for portion of the year) to $884,967 per year.  Estimating the time between two diagnoses, Tia, is easy and   can be read directly from the database by taking the absolute value of   the difference in dates of the two diagnoses. Estimating the similarity   of the two diagnoses, Sia, was more difficult. A surrogate   measure of similarity of two diagnoses is the number of times the two   diagnoses co-occur within a specific time frame. The implicit assumption   is that complications and related problems tend to occur in clusters. We   calculated a score proportional to the likelihood that two diagnoses   belong to the same episode by the formula provided earlier. We used this   score to classify diagnoses into episodes in such a manner that   diagnoses within one episode were more similar than diagnoses in   different episodes. Appendix A gives a detailed example of how diagnoses   were classified. The mean number of episodes was 147 (standard error =   320). Patients differed considerably in the number of episodes they had.   We calculated the severity of each diagnosis as the average amount paid   for the diagnosis. Severity and costs are not always related, especially   when patients die before expensive services can be delivered. But in our   database no patient passed away. Therefore, cost may have been a   reasonable surrogate measure of severity.   ¤

Results

To test the accuracy of our measures of episodes of care, we  regressed cost of care on severity of the episode, number of episodes  and interaction between number of episodes and severity of episodes. We  measured cost of care by the amount the State paid for each encounter.  Since patients' eligibility for Medicaid changes frequently, the amount  paid by the State is only an approximate measure of total cost of care  of the patient. To have one estimate of severity for a patient, we  averaged the severity scores for each patient across all their episodes  during the year. The averaged severity score ranged from 0.01 to 0.27.  The mean was 0.03 (standard error = 0.001). Table 1 summarizes  regression results.

        Coefficients P-value
 Intercept   -7297  0.003
 Average severity of episodes   -33.58  0.000
 Number  of episodes   444971 0
Interaction between number of episodes & severity of  episodes      756 0
 Table 1: Regression of "Amount paid by the State" on  severity and number of episodes
Number  of observations   = 565,     Adjusted R Squared   = 53.11% 

The dependent variable was "the amount paid by the State". All three  independent variables -- "the average severity of the episodes", "the  number of episodes of the patient" and “the interaction between the  severity and the number of episodes” -- were statistically significant  predictors of the dependent variable at alpha levels lower than 0.001.  The R-Squared adjusted by degrees of freedom was 53%.

Discussion

Data showed that episodes of care can be constructed from encounter  databases. Furthermore, the proposed measure of episode of care  explained a large percentage of variance in cost of care. The magnitude  of the percent of variance explained by the measures reported here is of  special interest. Many measures of severity and case mix report R2   values less than 10%.   Because our approach explains a large   percent of the variance, our confidence in the validity of our measure   of severity of episodes is increased. One may expect the performance of   the approach developed in this paper to deteriorate when parameters of   the model are estimated from one database and applied to another   unrelated database. Nevertheless the magnitude of percent of variations   in objective data explained by our approach is so high that we are   hopeful that even with drops in performance, our approach will remain   relatively more accurate than many existing approaches. The approach   proposed here can be used to construct episodes of care for specific   diseases. Thus, if one investigator is interested in episodes for   diabetes and another is interested in episodes of cancer, both can use   the algorithm proposed here by pre-selecting patients with a particular   disease. The most appealing part of the proposed approach is the ease   with which the approach can be integrated with existing databases. The   proposed mathematical model works on any administrative database, which   has information on date of visit and diagnosis. Any person familiar with   database operations can implement it. In addition, electronic medical   record companies can use the algorithm proposed here to embed methods of   analyzing performance of clinicians within their electronic record   systems. ¤

Appendix A: An Algorithm for Classifying Severity of Episodes of Illness

1) Start with a Table of patient unique identification number,  diagnosis and time of diagnosis. Here is a small example database:
 
Time (dd/mm/yy) Patient ID Diagnosis
    01/01/01 1001 A
    12/01/01 1001 B
    22/01/01 1002 A
    12/01/01 1002 B
    22/01/01 1003 C
    02/02/01 1001 D
    02/02/01 1002 B
    12/02/01 1003 D
    13/02/01 1003 B
    01/05/01 1002 C
Table 2:  Starting Data

2) Create a query identifying for any pair of diagnoses the number of  unique patients for whom the two diagnoses co-occur within 30 days. Note  that the co-occurrence of diagnosis "a" and "b" does not depend on the  order of which one comes first. Here is how the query will look like for  the above example data: 

First diagnosis Second diagnosis Co-occurrences     First diagnosis Second diagnosis Co-occurrences
    A     A     2           C     A     0
    A     B     2           C     B     1
    A     C     0           C     C     2
    A     D     1           C     D     1
    B     A     2           D     A     1
    B     B     2           D     B     2
    B     C     1           D     C     1
    B     D     2           D     D     2

3) For each patient conduct the following analysis: a) For the  patient, when the same diagnosis occurs at two different time periods,  rename the diagnoses into unique names -- usually a combination of the  name and date of diagnosis. For example patient 1002 has the following  data when renamed:

   Time (dd/mm/yy)     Patient ID     Diagnoses
    12/01/01     1002     B1201
    22/01/01     1002     A
    13/02/01     1002     B1302
    01/05/01     1002     C
Table 4:  Renaming same diagnosis at  two times
 

4) For the patient, measure the absolute value of the length of time  between any pair of diagnoses for the patient, refer to this as time  between any two diagnoses. For example for patient 1002 the time between  two different diagnoses will be:

First diagnosis Second diagnosis Time     First diagnosis Second diagnosis Time
     A      B1201      10         B1302     A       21
     A      B1302      21             B1302      B1201       31
     A      C      38            B1302      C      17
    B1201      A     10        C      A       38
     B1201      B1302     31            C      B1201       48
     B1201      C     48            C     B1302      17
 Table 5:  Time between diagnoses

5) For the patient, look up the similarity of any pair of different  diagnoses they have from step "2" and divide this by absolute value of  the time between the two diagnoses, from step "b". Refer to this as the  score. For example for the patient 1002 the results will be:

   First diagnosis     Second diagnosis       Time     First diagnosis  Second diagnosis  Time
  A     B1201 .20            B1302      A .10
     A      B1302  .10             B1302      B1201 .06
     A      C .0            B1302     C   .06
    B1201      A  .20         C      A 0
     B1201      B1302    .06        C     B1201  .02
     B1201      C     .02              C      B1302 .06
 Table 6:  Calculation of the Score

6) For the patient, standardized the score so that it ranges between  1 and zero by subtracting the minimum value from each score and dividing  the results by the difference of maximum and minimum score. Refer to  this as standardized score. For the patient 1002 the standardized score  is as follows:

 First diagnosis Second diagnosis  Time       First diagnosis  Second diagnosis Time
 A  B1201 1      B1302  A  .50
  A  B1302  .48    B1302  B1201  .30
  A  C   .00        B1302  C   .30
     B1201  A  1.0    C  A  .00
 B1201  B1302  .32    C  B1201  .10
 B1201  C 0.1    C  B1302  .30
 Table 7:  Standardized scores

7) Classify different diagnoses into episodes by using the  standardized score. The following is one classification procedure that  could be used:

  1. Combine the two diagnoses with maximum standardized score into   one episode if the value of the standardized score is higher than a   pre-set cutoff -- usually 0.5.
  2. Create a new diagnosis to represent the two diagnoses that were  combined into an episode. Calculate the standardized score for this  new diagnosis by averaging the standardized score of its two  components.
  3. Exclude the diagnoses that have already been combined into new  diagnoses from further analysis and repeat steps starting from step  1. For example, the data for case 1002 will follow these steps:
  4.  
    4       A      B1201      B1302      C
     A            1.0      .48      .00
     B1201      1.0        .32      .10
     B1302      .50      .32             .30
     C      .00      .10      .30       
 Table 8:  Classify diagnoses into  episodes
  1. A new diagnosis is created named AB1201 and standardized scores   for the new diagnosis are calculated as the average of its   component:
    4       A      B1201 AB1201
     B1302      .50      .32 (.5+.32)/2
     C      .00      .10 (.00+.10)/2
 Table 9:  Calculation for a new  diagnosis
  1. The diagnosis already combined into an episode are excluded from   further analysis and the steps are repeated and a new maximum of   0.41 is found.
  2.  
        B1302      C      AB1201
     B1302             .30      .41
     C      .30             .05
 Table 10:  Finding a new maximum
  1. The new maximum is not higher than the cutoff of 0.5. Therefore,   no other diagnoses are combined into new episodes.

The result of the calculation for patient 1002 was three episodes:

  • The combination of diagnosis A and diagnosis B on 12/01/01.
  • Diagnosis B on 13/02/01 by itself.
  • Diagnosis C by itself. Note that diagnosis B on 13/02/01 was not  combined with diagnosis B on 12/01/01 even though both are the same  diagnosis.

Analyze Data 

Do Until Correct Policy.  If you do not do well in these assignments, you can resubmit corrected version of the assignment for a grade of B.    You can do so as often as you like. 

Ask Us Policy.  You can complete these assignments in class, where the teacher is available to answer your question. 

No Time Pressure.  You can also request more time for completing these assignments.

Video accuracy:  To help you solve these problems we provide you with video tapes of us doing the same.  Please note that the problems solved in the video differ slightly from the data we have asked you to solve, therefore some differences will exists between your answers and the answers in the videos.

Import data from the following files into four tables.  From ptid.xls to Patients, from claims.xls to Claims, from icd.xls to ICD9, from cpt.xls to CPT table.  See how to import data

  1. For each person, find the description of the diagnoses with maximum cost.  See it done
  2. Calculate the number of times a pair of diagnoses co-occur in the same person.  See it done.
  3. Calculate for each person, the average time between two consecutive diagnoses.  See it done. 
  4. Calculate the episodes of illness implied in the claims table.

Please bring your work to class or email screen captures to the instructor.

Presentations

There are several presentations for this lecture:

  1. Listen to lecture on episodes of illnesssee SWF version.

  2. See Power Point slides for the lecture

  3. See video on how to calculate similarity of two diagnoses

  4. Walk through the algorithm for measuring episodes of illness (no sound).  See SWF version.

  5. To help you better understand how to create some complex queries, see video showing how to create query for the following question: For each patient find diagnoses that are the most common (occur the maximum number of times in that patient), and display their ICD9 descriptions.

Narrated slides and video require use of Flash.  SWF version is best suited for non-explorer browsers. 

Recently Asked Questions

Ask  a  question  and  we  will  answer  it  within  the  next  48  hours.  If   you have no questions, please review the answer to the questions asked by others:
 

Question:Question 2, the way you did the query makes no sense at all to me. Can you explain without the code what is actually going on. If I follow correctly, I think the question is asking how many times does a diagnosis co-occur with another diagnosis for a patient. Why would you make a copy of the claims table and just compare a duplicate of that. The question makes it seem like you want to look at each patient and see how many times any two diagnoses show up in one patient. The result you get makes no sense to me because there is no diagnosis code 42 there is only a patient ID 42 and a Claim ID 42. And take the first result line for example. PatientID 42 never had 128 claims to begin and 913 isn't a diagnosis code either. Maybe I just don't get the question or something, but I need more explanation on this.   Answer:  In this question you need two copies of the claims table to find all possible pairs of diagnoses. You group by these pairs (and patient) and get one row per pair per patient, you can add attribute with count. The additional condition is used to avoid pairing a claim with itself.  This question was asked on 10/22/2008 9:10:04 AM and answered on 10/23/2008 12:46:08 AM.

Question:For question one, on the see it done video your query does not answer question one. It only finds the maximum amount of a bill for each patient. I thought the query was suppose to show the max amount of bill for a each patient as well as tell what diagnosis went along with that max amount bill. How would you do this?   Answer:  This is true. The video shows only part of the answer. There are several ways to create this query. One possibility of to first calculate maximum values, as shown in the video, and then match them against claims in the second step.  This question was asked on 10/22/2008 8:44:23 AM and answered on 10/23/2008 12:41:01 AM.

Question:When I run question 3, I can't get it to run at all. The progress bar at the bottom says it is finished running, but my results don't pop up.  Answer:  Access 2007 is slow. The same happens on my computer - you need to wait and it will show results.  This question was asked on 10/21/2008 3:39:55 PM and answered on 10/23/2008 12:38:25 AM.

Question:When I did the first query on the last question called "RepeatedCases" I got more cases than yours showed when you ran the query. I saw you click on Description from each table but I don't know if that had anything to do with it. I don't know why I am getting a list of 1055 repeated cases.   Answer:  Please note that the video is using a different data set than the assignment. You have probably done the analysis correctly. This clarification was left by Farrokh Alemi, Ph.D.  This question was asked on 10/8/2008 10:02:32 PM and answered on 10/10/2008 9:54:12 AM.

Question:The third question down on the page that I am looking at for Complex "Querries" asks about answering Questions 4 & 5 and the answere is "No these are optional"   Answer:  Ok I see your point. This was optional in previous version of the course and this question remains on the web from previous semesters. I guess to be consistent I have to accept that answering question 4 is optional and of course there is no more a question 5. I will revise the page so that the date of the questions and answers are put in and so that other students do not make the same mistake. But for the time being you can assume that it is optional. Please alert Vikas to this note too.   This question was asked on 11/20/2006 1:05:06 PM and answered on 11/20/2006 2:51:28 PM.

Question:Is this information correct that we do not have to do Questions 4 & 5? Question 4 is very complex and difficult. I do not see a 5th question if there is one. Has something been changed since the question was asked?   Answer:  Unfortunately I cannot find where it says that you do not need to answer questions 4 and 5. It should be removed and at a minimum you should make an attempt to answer them.  This question was asked on 11/13/2006 4:17:26 PM and answered on 11/15/2006 11:33:59 AM.

Question:Why did we use the Min function in the video provided to calculate the time between two diagnoses? Also shouldn't we use time between two diagnoses within 30 days? If yes, how would you do that?  Answer:  We use the minimum function because we want the time between two consecutive events, clearly that would be the smaller value for time between any two events.  This question was asked on 10/30/2006 9:24:34 AM and answered on 11/1/2006 8:13:43 AM.

Question:Do we have to answer questions 4 and 5?  Answer:  No these are optional.  This question was asked on 10/27/2005 10:01:54 PM and answered on 10/27/2005 10:07:36 PM.

Question:Are there any audio presentations associated with this lecture?  Answer:  I expect to put something together for this lecture within a few days  This question was asked on 10/23/2005 8:33:40 PM and answered on 10/24/2005 8:03:31 PM.

Question:in Access i have two tables, without any common field, Both table contains same fields i.e. 1)fund name(contain duplicates), 2)a/c name(contain duplicates) 3)new price(may contain duplicates). But second table (updated table)contains more & new records. Now i want to find only the new records after comparing both the tables as i dont know which record is knew. i used unmatched query but it use only one field to link. How can i find new records.?  Answer:  You can do this in several ways. You can use the criteria field and the condition NOT to state that the values in the two tables should not be equal, you can also select through one to many joins, where the content of the larger table is entered always and values of the smaller tables are entered only if they match the larger table. Then select for all null values to see records in one table but not the other.  This question was asked on 5/26/2005 2:27:53 PM and answered on 5/28/2005 2:12:20 PM.

Question:My Expression Builder seems to function differently. Question #4 on dates--The example does not include an expression. I can't get past that point.   Answer:  Search within the functions of your Access database for insturction on how to calcuate the difference between two dates  This question was asked on 3/31/2005 7:26:08 AM and answered on 3/31/2005 1:16:34 PM.

Suggested Improvements

Comment:This lecture is a bit complicated. We need more guidance on how to create a query to calculate episodes of illness. Please provide a video to do so.  This comment was left on 10/30/2006 9:22:13 AM.

Comment:The lecture was interesting but complicated. The answers to questions asked may need to be updated to apply to the current semester to eliminate conflicting answers and confusion, such as whether Questions 4 & 5 need to be answered.  This comment was left on 11/20/2006 1:07:18 PM.

Comment:Very complicated lecture. The guide for question 3 was unclear. It would help if the speaker were louder and the screen wasn't so blurry, maybe zoom in some.  This comment was left on 10/21/2008 3:41:00 PM.

Comment:the step by step instructions helped   This comment was left on 11/26/2008 5:49:10 PM.

Comment:The video on the episodes of illness needed to be reviewed a few times to catch the process.  This comment was left on 12/10/2008 1:06:12 PM.

 

More

  1. Brailer DJ, Kroch EA. Member risk adjustment for ambulatory episodes of care. Health Care Manag Sci 1999 Jul; 2 (3):125-36.
  2. Miller LG. Provider profiling: advancing to episodes of care.  Physician Exec 1995 Oct; 21 (10): 40-1.
  3. Hornbrook MC, Hurtado AV, Johnson RE. Health care episodes:  Definitions, measurement and use. Medical Care Review, 1985, 42(2):  163-218.
  4. Goldfield N, Averill R, Eisenhandler J, Hughes JS, Muldoon J,  Steinbeck B, Bagadia F. The prospective risk adjustment system. J  Ambulatory Care Manage 1999 Apr; 22 (2): 41-52.
  5. Schneider KC, Lichtenstein JL, Freeman JL, Newbold RC, Fetter RB,  Gottlieb L, Leaf P, Portlock CS. "Ambulatory visit groups: An  outpatient classification system." Journal of Ambulatory Care  Management, 1988, 11(3): 1-12.
  6. Rosen AK, Houchens RL, Gibson TB, Mayer-Oakes A. Developing  episodes of care for adult asthma patients: a cautionary tale. Am J  Med Qual 1998 Spring;13(1):25-35.
  7. Tenan PM, Fillmore HH, Caress B, et al. PACs: Classifying  ambulatory care patients and services for clinical and financial  management. Journal of Ambulatory Care Management 1988, 11 (3),  36-53.
  8. Weiner JP, Starfield BH, Steinwachs DM, Mumford LM. A case mix  system for application to ambulatory care quality assurance and  utilization review. Presented at the American Medical Review  Research Center's Annual Research Symposium, Washington 1988,
  9. Starfield B, Weiner J, Mumford L, Steinwachs D. Ambulatory care  groups: a categorization of diagnoses for research and management.  Health Services Research 1991 Apr;26(1):53-74.
  10. Schulman KA, Yabroff KR, Kong J, Gold KF, Rubenstein LE, Epstein  AJ, Glick H. A claims data approach to defining an episode of care.  Health Serv Res 1999 Jun; 34 (2): 603-21.
  11. Mehta SS, Suzuki S, Glick HA, Schulman KA. Determining an  episode of care using claims data. Diabetic foot ulcer. Diabetes  Care 1999 Jul;22(7):1110-5.
  12. Rosen AK, Mayer-Oakes A . Developing a tool for analyzing  medical care utilization of adult asthma patients in indemnity and  managed care plans: can an episodes of care framework be used?Am J  Med Qual 1998 Winter;13 (4): 203-12.
  13. Moscovice I. A method for analyzing resource use in ambulatory  care settings. Medical Care 1977, XV (12): 1024-1044.
  14. Gold M. Effects of hospital based primary care setting on  internists treatment of primary care episodes. Health Services  Research 1981, 16 (4): 383-405.
  15. Rosen AK, Mayer-Oakes A. Episodes of care: theoretical  frameworks versus current operational realities. Jt Comm J Qual  Improv 1999 Mar; 25 (3): 111-28.
  16. Schneeweiss R, Rosenblatt RA, Cherkin DC, Kirkwood CR, Hart G.  Diagnostic clusters: A new tool for analyzing the content of  ambulatory medical records. Medical Care 1983, XXI (1): 105-122.
  17. Cave DG. Profiling physician practice patterns using diagnostic  episode clusters. Medical Care 1995, 33 (5): 463-486.
  18. Sala J, Riba MD, Manresa F. A way of obtaining isoresource  consumption care episodes. J Ambulatory Care Manage 1999 Apr;22 (2):  53-62.
  19. Stimson DH, Charles G, Rogerson CL. Ambulatory care  classification systems. Health Serv Res 1986 Feb;20(6 Pt 1):683-703
  20. Rogerson CL, Stimson DH, Simborg DW, Charles G. Classification  of ambulatory care using patient-based, time-oriented indexes. Med  Care 1985 Jun; 23(6): 780-8.
  21. Pineault R. The effect of prepaid practice on physicians  utilization behavior. Medical Care 1976; 14: 121-36.
  22. Young KM, Fisher CR. Medicare episodes of illness: a study of  hospital, skilled nursing facilities, and home health agency care.  Health Care Financing Review 1980; 2: 1-23.
  23. Morris SS, Cousens SN, Lanata CF, and Kirkwood BR. Diarrhea --  defining the episode. International Journal of Epidemiology, 1994,  23 (3): 617-623.
  24. Ettner SL, Frank RG, McGuire TG, Newhouse JP, Notman EH . Risk  adjustment of mental health and substance abuse payments. Inquiry  1998 Summer;35(2):223-39.
  25. Read the translation of this lecture into Arabic. 

An algorithm for classifying episodes

-- Episodes of Care Classification

-- Table and View Creation Script

-- Author: Farrokh Alemi, Samuel Walters

-- Last Update: 2/14/06

-- Format: ANSI SQL-92

-- For a database implementation please click here

-- ------------------------------------------- 

-- This script creates the four tables necessary to execute the Episodes of care algorithm.

-- This script only needs to be executed once to establish the tables. 

CREATE TABLE tblSimilarity(

FirstDiagnosis             varchar(50)          NOT NULL,

SecondDiagnosis      varchar(50)          NOT NULL,

Similarity           int                  NOT NULL); 

CREATE TABLE tblPatientDiagnosesGrouped(

RecordID             int           NOT NULL,

              PatientID            varchar(50)   NOT NULL,

              DiagnosisDateTime    datetime      NOT NULL,

              DiagnosisCode        varchar(250)  NOT NULL); 

CREATE TABLE tblPatientDiagnoses(

              RecordID             int IDENTITY(1,1)    NOT NULL,

              PatientID            varchar(50)          NOT NULL,

              DiagnosisDateTime    datetime             NOT NULL,

              DiagnosisCode        varchar(250)         NOT NULL); 

CREATE TABLE tblEpisodeIndex(

              EpisodeRecordID      int IDENTITY(1,1)    NOT NULL,

              EpisodeID            varchar(2500),

              PatientID            varchar(50),

              DiagnosisCode        varchar(50),

              DiagnosisDateTime    datetime);  

-- ----------------------------------------------------

-- Classify Episodes of Care                         --

-- Author: Farrokh Alemi, Samuel Walters             --

-- Last Modified: 2/14/06                            --

-- ----------------------------------------------------

-- ----------------------------------------------------

-- SECTION I - Diagnosis Record Grouping

-- Groups duplicate patient diagnoses by patient into single records to avoid processing

-- duplicates and improve transaction performance.

-- ---------------------------------------------------- 

-- Delete any grouped records from previous analyses 

DELETE FROM tblPatientDiagnosesGrouped; 

-- Insert new patient diagnoses records into grouped table,

-- grouping by patient, diagnosis code,

-- diagnosis date/time, and taking the last matching

-- RecordID for all diagnosis records in the grouped row. 

INSERT INTO tblPatientDiagnosesGrouped(

              PatientID,

              DiagnosisCode,

              DiagnosisDateTime,

              RecordID) 

SELECT        tblPatientDiagnoses.PatientID,

              tblPatientDiagnoses.DiagnosisCode,

              tblPatientDiagnoses.DiagnosisDateTime,

              Max(tblPatientDiagnoses.RecordID) 

FROM          tblPatientDiagnoses 

GROUP BY      tblPatientDiagnoses.PatientID,

              tblPatientDiagnoses.DiagnosisCode,

              tblPatientDiagnoses.DiagnosisDateTime 

ORDER BY      tblPatientDiagnoses.DiagnosisDateTime;

-- ----------------------------------------------------

-- SECTION II - Construct Similarity Index

-- Constructs a table containing the number of times

-- any diagnosis pair co-occurs for each patient. This

-- table is used to calculate the similarity score

-- for each diagnosis pair for each patient.

-- ----------------------------------------------------

-- Delete all Similarity table entries from previous

-- analyses.  

DELETE FROM tblSimilarity; 

-- Determine the number of times a pair of diagnoses

-- co-occur for any given patient, and insert those

-- counts into the similarity table with the

-- diagnosis pairs. 

INSERT INTO   tblSimilarity( FirstDiagnosis, SecondDiagnosis, Similarity) 

SELECT        FirstDiagnosis, SecondDiagnosis, Count(PatientID) 

FROM          (SELECT              tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

                           tblPatientDiagnosesGrouped2.DiagnosisCode AS SecondDiagnosis,

                           tblPatientDiagnosesGrouped1.PatientID 

              FROM          tblPatientDiagnosesGrouped as tblPatientDiagnosesGrouped1,

                           tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2

              WHERE         tblPatientDiagnosesGrouped2.PatientID = tblPatientDiagnosesGrouped1.PatientID AND  

-- To adjust the sensitivity of this algorithm, modify the value "30" on the next line or the maximum number of days between diagnosis pairs)

-- to allow more or less maximum days between diagnosis pairs.  If this number is edited, the same number must be modified in

-- Section III to account for the change.

CONVERT(int, tblPatientDiagnosesGrouped2.DiagnosisDateTime –

              tblPatientDiagnosesGrouped1.DiagnosisDateTime) <= 30 

              GROUP BY      tblPatientDiagnosesGrouped1.DiagnosisCode,

                           tblPatientDiagnosesGrouped2.DiagnosisCode,

                           tblPatientDiagnosesGrouped1.PatientID 

              HAVING        tblPatientDiagnosesGrouped2.DiagnosisCode >

tblPatientDiagnosesGrouped1.DiagnosisCode 

              ) As GroupedPerPatient 

GROUP BY      FirstDiagnosis,

SecondDiagnosis;

-- ----------------------------------------------------

-- SECTION III - Identify Episodes of Care

-- This query uses the previously constructed similarity table to

-- identify episodes of illness for all patients in grouped table. Paired diagnoses are

-- identified as those for whom the similarity of the diagnosis pair divided by the

-- elapsed time between the diagnoses is greater than or equal to the cutoff

-- value, defined as the maximum similarity value in tblSimilarity divided by the maximum timeframe within

-- which two diagnoses co-occur in the Similarity table (30 days unless altered above) plus one to avoid

-- division by zero in cases where two diagnoses  occur on the same date/time value. The resulting episode

-- records are inserted into the table tblEpisodeIndex.

-- ----------------------------------------------------- 

-- Delete any records remaining from previous analyses from the episode index table.

DELETE FROM tblEpisodeIndex; 

-- Insert an episode record for each diagnosis belonging to an episode for a patient. Episodes are uniquely

-- identified using EpisodeID. 

INSERT INTO tblEpisodeIndex (PatientID, DiagnosisCode, DiagnosisDateTime, EpisodeID) 

SELECT        PatientID, DiagnosisCode, DiagnosisDateTime, Min(Episode) AS EpisodeID 

FROM   (SELECT              TOP 100 PERCENT

                     tblPatientDiagnosesGrouped.PatientID,

                     tblPatientDiagnosesGrouped.DiagnosisCode,

                     tblPatientDiagnosesGrouped.DiagnosisDateTime,

                     FirstDiagnosis,

                     FirstDiagnosisDate,

                     SecondDiagnosis,

                     SecondDiagnosisDate,

                     (CASE

                            WHEN   SecondDiagnosis Is Null

                            THEN   tblPatientDiagnosesGrouped.PatientID +

                                  tblPatientDiagnosesGrouped.DiagnosisCode +

                                   CONVERT(varchar(50),
                                   CONVERT(int,tblPatientDiagnosesGrouped.DiagnosisDateTime))

                           ELSE   tblPatientDiagnosesGrouped.PatientID +FirstDiagnosis +

                                      CONVERT(varchar(50),CONVERT(int,FirstDiagnosisDate))

                     END) AS Episode 

       FROM          tblPatientDiagnosesGrouped                          

                     LEFT JOIN     (SELECT       TOP 100 PERCENT

                                   tblPatientDiagnosesGrouped1.PatientID,

                                  tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

                                  tblPatientDiagnosesGrouped1.DiagnosisDateTime AS FirstDiagnosisDate,                                  

                                  tblPatientDiagnosesGrouped2.DiagnosisCode as SecondDiagnosis,

                                  tblPatientDiagnosesGrouped2.DiagnosisDateTime as SecondDiagnosisDate,                            

                                  tblPatientDiagnosesGrouped1.RecordID,                                               

                                                                CONVERT       (Float,tblSimilarity.Similarity)/(1 +

CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime -              tblPatientDiagnosesGrouped1.DiagnosisDateTime)) AS Score       

FROM   (SELECT       CONVERT(float,Max(Similarity))/31 as CutoffValue

                                          FROM  tblSimilarity) as Cutoff,                                         

(tblSimilarity

INNER JOIN

tblPatientDiagnosesGrouped As tblPatientDiagnosesGrouped1

ON tblSimilarity.FirstDiagnosis = tblPatientDiagnosesGrouped1.DiagnosisCode)           

INNER JOIN

tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2 ON tblSimilarity.SecondDiagnosis = tblPatientDiagnosesGrouped2.DiagnosisCode AND tblPatientDiagnosesGrouped1.PatientID = tblPatientDiagnosesGrouped2.PatientID                                        

WHERE  tblPatientDiagnosesGrouped1.DiagnosisDateTime < tblPatientDiagnosesGrouped2.DiagnosisDateTime AND

tblPatientDiagnosesGrouped1.RecordID < tblPatientDiagnosesGrouped2.RecordID AND

CONVERT(float, tblSimilarity.Similarity)/(1 + CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime -   tblPatientDiagnosesGrouped1.DiagnosisDateTime)) > CutoffValue 

                                         ORDER BY      tblPatientDiagnosesGrouped1.DiagnosisDateTime DESC,                                                                    

                                                                    tblPatientDiagnosesGrouped2.DiagnosisCode DESC 

) As LinkedDiagnoses

 

ON tblPatientDiagnosesGrouped.DiagnosisDateTime = LinkedDiagnoses.SecondDiagnosisDate AND

       tblPatientDiagnosesGrouped.DiagnosisCode = LinkedDiagnoses.SecondDiagnosis AND

       tblPatientDiagnosesGrouped.PatientID = LinkedDiagnoses.PatientID                    

ORDER BY      tblPatientDiagnosesGrouped.PatientID,

tblPatientDiagnosesGrouped.DiagnosisDateTime

) as EpisodesUngrouped

GROUP BY      PatientID, DiagnosisCode, DiagnosisDateTime

ORDER BY      PatientID, Min(Episode);


This page is part of the course on Healthcare Databases, lecture on Complex Queries  This page was first prepared on January 2005 and last revised  on 11/27/2007.   Copyright protected.