Introduction to Evolution of Databases

This material is based in part on "Data structures: Data Bases and Data Base Management Systems" in Austin C. Information Systems for Health Services Management.

Today computers deal with a great deal of data. They can do so because they are faster and better than before. But there is another reason why we can search and find what we want on our computers despite the large amount of data stored in them. Imagine if during a search we had to examine each data item to see if it is what we want. Even if each item took only a fraction of a second, going through billions of items could take hours or days. How is it that computers can search the entire World Wide Web and come back in a fraction of a second with the result?

The answer is simple. Our ability to use data improves when we introduce more structure in how data are stored. Then it is no longer necessary to go through all of the data to reach a specific item. The more structure there is, the easier it is to go through a large amount of data. Think of structure as an address to a datum. The more precise the address we have, the easier it is to find a datum without knocking on each neighbor's door. Likewise, the structure of the data tells the computer where to look and lets it avoid unnecessary effort.

Ever wondered how an Internet search can so quickly return the addresses of home pages that contain a word? The answer is that the addresses are stored under a list of words arranged alphabetically. When you enter a word, the computer jumps to the appropriate row of the Table without processing the information in other rows. Thus, it is possible to peruse very large databases in a fraction of a second.

In this section we introduce several approaches to putting structure into data. The three main types of databases are flat, relational and hierarchical models.
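The word-to-address lookup described above can be sketched in Python. This is a minimal illustration, not a description of how any particular search engine is built: the page addresses and texts are hypothetical, the function names are made up for the example, and a Python dictionary (a hash table) stands in for the alphabetically arranged list of words.

```python
from collections import defaultdict

# Hypothetical pages, invented for illustration: address -> text.
PAGES = {
    "http://example.org/a": "flat databases store one large table",
    "http://example.org/b": "relational databases link tables by a key",
    "http://example.org/c": "hierarchical databases resemble a tree",
}


def linear_search(pages, word):
    """Examine every page in turn; the work grows with the number of pages."""
    return sorted(addr for addr, text in pages.items() if word in text.split())


def build_index(pages):
    """Build the word -> addresses structure once, ahead of any search."""
    index = defaultdict(set)
    for addr, text in pages.items():
        for word in text.split():
            index[word].add(addr)
    return index


def indexed_search(index, word):
    """Jump straight to the entry for the word; no other entry is read."""
    return sorted(index.get(word, ()))


INDEX = build_index(PAGES)
print(indexed_search(INDEX, "databases"))
print(indexed_search(INDEX, "key"))
```

Both functions return the same addresses. The difference is that `linear_search` reads every page on each request, while `indexed_search` reads one entry of a structure that was built once in advance.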
We end the section with a discussion of data-less databases, where only the structure of the data is kept and no data.

Flat Databases

Flat data models have the least amount of structure. They typically take the form of one large Table, where the first row is the list of the variables and subsequent rows are data. Each case has a row of data. Many statisticians use flat models. Here is an example of a flat file for students in a class:
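A flat file of this kind can be sketched in Python as one table read row by row. The student names, zip codes, grades, and column names below are hypothetical, made up for illustration. The last student has an empty zip code, the kind of missing entry a flat file has to carry for a case where the item does not apply.

```python
import csv
import io

# Hypothetical flat file: the first row names the variables,
# and each later row is one student (one case per row).
FLAT_FILE = """student_id,name,zip_code,grade
1,Student A,22101,C
2,Student B,22101,A
3,Student C,20001,C
4,Student D,,B
"""


def load_flat_file(text):
    """Read the whole table into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def find_students(rows, zip_code, grade):
    """Scan every row; a flat file offers no shortcut to the matching rows."""
    return [
        row["name"]
        for row in rows
        if row["zip_code"] == zip_code and row["grade"] == grade
    ]


ROWS = load_flat_file(FLAT_FILE)
print(find_students(ROWS, "22101", "C"))
```

Whatever the query, `find_students` has to touch every row, which is acceptable for a class roster but slow for a very large file.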
Advantages

Most software includes free access to flat data files. For a small number of cases, flat databases do a reasonably fast job.

Disadvantages

Flat databases waste computer storage by requiring it to keep information on items that logically cannot be available. For example, if we are keeping information on zip codes, flat files require us to enter a missing value for zip codes in foreign countries. Hierarchical and relational databases avoid this problem by defining different tables for classes of countries in which zip codes are not available. Thus flat files hold large, sparse data full of missing information.

When the database is large, searching through the data takes a long time. Flat databases are not conducive to complicated search queries that divide the database further. For example, in a flat file about students it would be difficult to find all students who live in a certain zip code and have received a grade of C. Such a query may require repeated passes through the data. The first pass may identify all students who live in the zip code, the second pass may identify all students who have a grade of C, and the third pass may find students in both groups. Such multiple passes through the data are inefficient and take a long time. Since every simple if statement (if zip code is equal to 22101 then ...) in a computer program takes a fraction of a second, efficient search methods are important for large databases.

Relational Databases

In a relational database, one stores a record with related fields as data. All items on the same record are said to be related to each other. Thus, one may store the information that patients have medical records in the following fashion:
This information in essence says that the patient ID number, name and medical record number belong to the same person. To effectively store both information items and the relationships among these items, relational data are kept in table formats. The first row of the table shows the names of the variables and subsequent rows are data. All items in the same row usually belong to the same case and are related. One column of the table is treated as the key to the table. Numbers or characters in this column correspond to items in other Tables. Thus it is possible to move from one table to another. In a relational database, tables do not need to be of the same size, but they all need to have a key column that connects them to each other and that uniquely identifies the rows inside the table.

Example

Here is an example of a Table of grades for the example introduced under flat files:
This is an example of an additional table for contact information:
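Two tables linked by a shared key can be sketched with Python's built-in `sqlite3` module. The table names, column names, students, scores, and zip codes below are all hypothetical, chosen only to show how a query moves from one table to the other through the key. They are not the course's own tables.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the key student_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grades  (student_id INTEGER PRIMARY KEY, name TEXT, final_score REAL);
    CREATE TABLE contact (student_id INTEGER PRIMARY KEY, zip_code TEXT);
    """
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [(1, "Student A", 90.0), (2, "Student B", 80.0), (3, "Student C", 70.0)],
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [(1, "22101"), (2, "22101"), (3, "20001")],
)


def average_score_in_zip(connection, zip_code):
    """Join the two tables on the shared key, then average the matching scores."""
    row = connection.execute(
        """
        SELECT AVG(g.final_score)
        FROM grades AS g
        JOIN contact AS c ON c.student_id = g.student_id
        WHERE c.zip_code = ?
        """,
        (zip_code,),
    ).fetchone()
    return row[0]  # None when no student lives in the zip code


print(average_score_in_zip(conn, "22101"))
```

The zip code lives only in the contact table and the score only in the grades table. The `JOIN ... ON` clause is where the shared key connects them.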
When a query is made, the relational database searches through its tables to find the answer. Often the answer involves information pooled together from different tables. For example, a query for the names of people with a grade of A can be answered from the Table of Grades. A query for the grades of people in zip code area 22101 must be answered from both Tables. Can you try to answer this question: What is the average final grade of the persons in zip code area 22101? Did you notice how you moved from one table to another to get your answer? You used the key shared between the two tables to find the relevant information in the other table.

Advantages

A relational database makes life difficult for the designer of the database but easy for the user of the database. It is more difficult for the designer because many possible relationships must be anticipated. It is easier for the user because data can be examined from many different perspectives. In addition, if Tables are appropriately defined, it is possible to avoid having to enter missing information for variables that are not logically possible. This helps data entry and data processing speed. A relational database is also easy to modify because adding new concepts involves adding new Tables, not altering old ones.

Hierarchical Databases

Hierarchical database models are data models in which the relationship between any two adjacent items is similar to a father-child relationship. Hierarchical database models resemble acyclic tree structures (directional tree structures that do not include circular paths).

Example

The file directory on your desktop is an example of a hierarchical database system. A folder may contain other folders, which may contain files, which contain data.

Advantages

In hierarchical models, children inherit the relationships and characteristics of their fathers.
An operation on the father affects the children; if you tell the computer to do an operation on a folder, the operation affects all the folders and files that it contains. For example, if you delete the top folder, you delete all of the folders and files it contains. This feature saves time for the person maintaining the files. Sometimes, when the real-world relationship being modeled is not hierarchical, it is difficult to fit the relationships into a hierarchical database model. In these circumstances, an operation on the father should not be executed on the children.

Distributed Databases

Most databases physically reside in one place. They may be backed up to another place, but the elements of the database are not maintained in different places. In a distributed database, data are kept in different settings and on different computers. One or several central computers maintain indexes to where the data are. Using the address of the data, computers can then communicate and find the information needed.

Distributed databases need not only addresses for where the data are but also an audit trail of who has updated or retrieved the data. Audit data are needed in order to pinpoint errors in the system and to understand where the confidentiality of the system breaks down. When a computer requests data from another, an audit trail is created by storing who sent data where and when. When this computer passes the data to another, the information needs to be updated in the original computer. As the number of computers receiving the data increases, the task of auditing becomes more difficult. At some point (at least theoretically), it is necessary to cut off the original computer from being updated about where the information has traveled.

Example

A good example of a distributed database is the pages of the World Wide Web. These pages of data are kept on different computers, often referred to as Web servers.
The address of each file is the location address you enter when you want to see the page. This location address is an index to the Web pages. Centralized computers keep the beginning of these addresses, called domains. Subsequent detailed addresses are kept at the Web servers. When searching a distributed database, two steps must occur. First a program, sometimes called a crawler, must index the content of the databases; then another program, often called a search engine, searches the index for your request. When a match is found, the index is used to find the address of the information. This address is provided to the engine, which assembles a list for you. When you click on the items identified by the search engine, you use the address to retrieve the data items it has found.

Advantages and disadvantages

The information system manager must decide whether to use centralized or decentralized databases based on a number of issues, including the following:
Data-less Databases

Historically, the design of Information Systems, especially national information systems, has focused on the creation of super-databases that contain all of the transactions about the patient, from which clinicians can retrieve the portion of the database to which they have legitimate access. These super-databases are expensive to construct and maintain, as they require 24-hour, seven-days-a-week operation, tight security, extensive and ongoing coordination of data elements across existing modern and legacy systems, as well as a cadre of trained personnel to maintain the database.

Furthermore, the design of national registries and databases creates perplexing problems with patients’ privacy and the confidentiality of medical records. Around the world, local and State laws require that patients’ consent be sought before disclosing the content of a patient’s records to others. In the United States, Congress is debating Federal endorsement of already existing State laws that limit the nature of what is an appropriate consent. To be meaningful, patients’ consent should be for release to a specific organization, for a specific purpose and governed by a specific time frame. National registries make a mockery of meaningful consent. The patient is asked to consent to the release of information to an intermediary who may disclose it to others, who may in turn release it to others. The patient never knows how and when the information is actually used; therefore the consent provided is not an informed release of medical information. For example, data from immunization registries may show up in court custody cases without the current custodian’s consent. Few parents who agree to release information to a national or regional database have in mind that the data can be used at some time in the future to argue that they are not fit parents.

There are also other problems with the current process of obtaining consent. In many States, open-ended consent is illegal.
Local and State laws require that consent be time dependent. Thus, national and regional registries face a practical operational problem. They can warehouse information about the patient for a specific time period, but when that time period has expired, they have no legal authority to continue to disclose the information to others.

Finally, managing the data after release creates large and perhaps unrealistic problems. For example, to understand who is responsible for an illegal release of medical information, it is important to trace the information back to the source. Given that a series of agencies may have collected and released information to each other before the illegal release occurred, one has to track many pieces of information. Thus the person who creates a large medical database must not only create the database but also maintain an equally large database about patterns of release of information.

The cost of operation and the difficulty of maintaining the privacy and confidentiality of patients have encouraged us to suggest an alternative approach.

Components of a Data-less Information System

A data-less information system contains no centralized data and relies on distributed databases. The Data-less Information System relies on rapid communication to construct the data at the point of need. A data-less information system also does not require any additional hardware; instead it relies on the hardware of existing clinics and health institutions. There are three components to a data-less information system:
Using these three components, any clinic or government agency that has patients’ consent can rapidly pool and analyze information about the patient from other machines. When a patient shows up at a clinic, consent is obtained, the network is polled, and the necessary data are put together. Once the data have been reviewed, the information is erased, reducing any possibility of accidental disclosure or any need to manage data after release.

When data are needed about a group of patients, as when a policy analysis needs to be done, it is difficult to obtain consent from the patients. Furthermore, group reports may be used in ways that could harm the organizations that collect and store these data. For example, organizations may not wish to report where their customers and patients come from, for fear that their competitors may use this information to gain an advantage over them. Data-less Information Systems should be organized in a fashion that prohibits any report that singles out an organization without the consent of that organization. When group reports are obtained across organizations and across a large number of patients, the database can remove patient and organizational information before disclosing the data. For example, if a policy maker wants to know how many people in their region are immunized on time, they can ask the system to sample new births from the birth registry, request data on the sample of cases identified, pool data on the cases, remove the identifiers, calculate the summary statistics needed and present the results for one-time use by the policy maker. The data are provided without consent if and only if the sample includes multiple organizations and patients. The key advantages of the Data-less Information Systems are:
An example

When using the World Wide Web, the information is cached, meaning that the information is downloaded for temporary use and discarded afterwards. A browser has many characteristics of a data-less information system. It relies on large distributed data, it relies on communication devices to collect the data and it discards the data after use. If a browser is further developed to add some of the features described above, then it will become a true data-less information system. It should, for example, have decoders for accessing legacy systems. It should not allow copying of information. It should allow group information without consent but remove organization-specific or patient-specific identifiers. And it should allow more analysis of the data. Under these circumstances, existing browsers become what we have in mind when we talk of data-less information systems.

A data-less information system can radically improve the operation of United States immunization registries. Both the Centers for Disease Control and Prevention and the Robert Wood Johnson Foundation have funded the operation of a number of immunization registries around the United States. These registries face a number of problems and limitations that are typical of traditional registries. A key problem is how to obtain patients’ consent. Another problem is how to maintain these operations after grant funding for them expires. The proposed data-less registries address both the cost and the privacy concerns. One national or multiple regional networks can be created to pool data on demand.

What Do You Know?

Advanced learners like you often need different ways of understanding a topic. Reading is just one way of understanding. Another way is through writing. When you write, you not only recall what you have read but also may need to make inferences about what you have read. Please complete the following assessment:
If you are taking the course online, submit your responses as a Word document attached to an email to your instructor; otherwise bring your work to class. Keep a copy of all of your work until the end of the semester.

Presentations

The following resources are available to assist you:
Narrated lectures and videos require use of Flash.

Recently Asked Questions

Ask a question and we will answer it within the next 48 hours. If you have no questions, please review the answers to the questions asked by others:
Question: How can I merge cells in a row without altering the other rows in a table?
Answer: If your goal is to display a specific row with merged cells, this can be done by specific queries as described in section 8. It is, however, not possible to permanently modify a table to have different cells in different rows. In databases, each row in a table always has the same cells (fields). This question was asked on 9/7/2009 7:47:00 PM and answered on 9/8/2009 10:23:03 AM.

Question: For question #2 am I using the data tables from the flat databases example, relational database example, or both? Also, am I telling you how I find out the answer to your question in my head or how I would type the code into Access to get my answer?
Answer: You should use the data tables from the relational database. In this lecture you do not need to know how to write queries in Access yet, so it is sufficient to describe the steps. This question was asked on 8/25/2008 8:53:06 AM and answered on 8/26/2008 10:42:34 PM.

Question: Why can relational databases allow for missing data that is logically not possible?
Answer: Because they do not force the collection of such data. Since data are collected in separate tables, only relevant data are collected, and irrelevant and inappropriate data are not collected at all. In contrast, in flat files inappropriate data need to be collected or marked as not available. This question was asked on 2/12/2006 5:29:14 PM and answered on 2/13/2006 4:30:31 PM.

Question: What is an index and a cluster?
Answer: I do not see where we have used the word cluster, so I am not sure of the context in which you are asking this question. Index is used often in the first lecture and it is not used to designate any specific technical meaning. It is intended to mean a pointer to the content of another web page. This question was asked on 9/26/2005 10:31:59 AM and answered on 9/26/2005 8:27:52 PM.

Question: Are there currently any data-less information systems being utilized in any industry sector? If so, how long was the implementation phase?
Answer: I am not aware of such a system in practice, though with the growth of HIPAA regulations more of these systems should be expected. Of course, Google is a good example of a data-less system. This question was asked on 9/6/2005 5:24:52 PM and answered on 9/6/2005 9:02:34 PM.

Question: Regarding question #3 from the "What do you know" section: I feel that the lecture didn't elaborate on this topic very much, other than to say that the Tables need to be appropriately defined.
Answer: There is a video that shows how tables are organized, and we will get into this topic at much length later, when we discuss entity relationship diagrams. This question was asked on 9/6/2005 4:10:40 PM and answered on 9/6/2005 9:01:33 PM.

Question: Is it possible to have a database that is a combination of two (i.e., say a relational AND hierarchical Database)?
Answer: No. The definition of one excludes the other. For example, a hierarchical database does not allow other types of relationships and therefore cannot be a relational database or a flat database. This question was asked on 9/5/2005 2:41:12 PM and answered on 9/5/2005 11:58:06 PM.

Question: Would the concept of a patient's electronic medical record within a system such as the Kaiser Permanente of California be an example of a data-less system?
Answer: No. These medical records actually have data in them. The best example of a data-less system is your browser. When you do a Google search, the data are assembled for you. You do not have access to the data before then. This question was asked on 2/10/2005 9:28:12 AM and answered on 2/10/2005 2:54:46 PM.

Suggested Changes
Add your own suggestions or read below the suggestions made by others regarding how to improve this session:

Comment: The lecture was very clear and understandable. Left on 2/12/2006 5:23:40 PM
Comment: The lecture material and explanation of Table creation was clear. Practice in creating the tables was very helpful in learning how to create the tables. Left on 9/1/2006 10:15:27 PM
Comment: Lesson was well organized and easy to understand. It would be easier to understand the issue of data redundancy for flat files vs relational databases if the examples used included say two sets of grades for one student. Left on 9/7/2006 11:13:14 AM
Comment: The lecture was easy and informative. Left on 8/29/2006 5:33:36 PM
Comment: The overview was understandable. Left on 2/13/2005 11:18:09 AM
Comment: Presentation of topics is well organized, easy to understand, and without fluff. Left on 9/5/2007 2:53:16 PM
Comment: So far, I like the class and the lecture was clear and easy to follow. Left on 8/26/2008 9:07:59 AM
Comment: Enjoyed the lecture and the background material. Looking forward to improving my knowledge about databases through the course. Left on 9/2/2008 8:44:37 AM
Comment: This chapter is well organized as it fulfills its objective of introducing the concept of databases in general. Left on 9/3/2008 8:58:37 PM
This page is part of the course on Healthcare Databases, the lecture on Evolution of Databases. It was last edited on Saturday August 30, 2008 by Farrokh Alemi, Ph.D. © Copyright protected.