Text cleaning: EMT-P is a natural language processing (NLP) system that processes raw, natural language chief complaint (CC) entries to correct non-standard words. EMT-P was developed from an extensive analysis of emergency department (ED) CC text (Travers & Haas, 2003). Patterns of natural language were identified, and individual modules were developed to address the various patterns.
UMLS: The EMT-P system must be used in conjunction with the medical terminology resources of the Unified Medical Language System® (UMLS®). EMT-P maps cleaned CC text to standardized terms from the UMLS Metathesaurus®. The UMLS is a compilation of the major controlled vocabularies in healthcare. It is managed by the National Library of Medicine, which has encouraged its use in addressing vocabulary issues in healthcare information systems.
Modular approach: EMT-P is a modular natural language processing system, which preserves the raw data as much as possible by only cleaning each text entry until it matches a standard term. Strategies are applied from least (Round 1) to most (Round 3) aggressive. Although it is possible for users to choose which modules to run, EMT-P version 2.2 runs all modules by default. Examples of the processing addressed in each round are shown here.
EMT-P is based on the common repository architecture. At the core is a database management system (DBMS) which contains all the UMLS information. The database is used as a library of CC data and serves as a reference for ideal CCs. The controller (a Java program) calls each appropriate analysis/modification process (Perl scripts). Connections to the DBMS are made with a JDBC driver.
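The lookup against the central database can be illustrated with a minimal sketch. It is written in Python with an in-memory SQLite database purely for brevity; the actual system uses a Java controller and a JDBC connection. The table name, column names, helper functions, and the DEMO concept IDs are all hypothetical placeholders, not the real UMLS schema or identifiers.

```python
import sqlite3

# Hypothetical stand-in for the UMLS-backed DBMS: one table of standard terms.
# Table name, columns, and concept IDs are illustrative placeholders only.
def build_reference_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE standard_terms (term TEXT PRIMARY KEY, concept_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO standard_terms VALUES (?, ?)",
        [("chest pain", "DEMO-0001"), ("abdominal pain", "DEMO-0002")],
    )
    return conn

def lookup(conn, cleaned_cc):
    """Return the concept ID for an exact, case-insensitive match, else None."""
    row = conn.execute(
        "SELECT concept_id FROM standard_terms WHERE term = ?",
        (cleaned_cc.strip().lower(),),
    ).fetchone()
    return row[0] if row else None

conn = build_reference_db()
print(lookup(conn, "Chest Pain"))          # a standard term: returns its ID
print(lookup(conn, "something is wrong"))  # no standard term: returns None
```

Keeping the reference terms in one repository means every cleaning step can be checked against the same source of standard terms.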
The main control module (RunEMTP.class) calls the appropriate Perl scripts during each round of input cleaning. Each Perl script produces an output text file that the next script accepts as input. After each round, the database is queried with the cleaned data. If there is a match, the entry is kept in the final output file and no further processing is performed on that entry. If not, it is sent to the next round for more aggressive cleaning. After the final round of cleaning, all remaining entries are kept in the final output file, and every entry is flagged as a match or non-match.
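The round-by-round control flow described above can be sketched as follows. This is an illustration in Python rather than the system's Java and Perl, and it passes strings in memory instead of intermediate text files. The three cleaning rules and the small term set are invented for the example and are not EMT-P's actual modules or UMLS content.

```python
import re

# Hypothetical stand-in for the UMLS lookup; the real system queries a DBMS.
STANDARD_TERMS = {"chest pain", "fever"}

# Three illustrative rounds, least to most aggressive (assumed rules).
def round1(text):
    return re.sub(r"\s+", " ", text.strip().lower())  # normalize case and spacing

def round2(text):
    return re.sub(r"[^\w\s]", "", text)  # strip punctuation

def round3(text):
    return re.sub(r"\s+x\s*\d+\s*days?\b", "", text)  # drop "x 3 days" durations

ROUNDS = [round1, round2, round3]

def process(entry):
    """Clean one entry round by round, stopping at the first match."""
    text = entry
    for number, clean in enumerate(ROUNDS, start=1):
        text = clean(text)
        if text in STANDARD_TERMS:
            # Matched: keep the entry and skip the more aggressive rounds.
            return {"raw": entry, "cleaned": text, "round": number, "match": True}
    # Unmatched after the final round: still kept, flagged as a non-match.
    return {"raw": entry, "cleaned": text, "round": len(ROUNDS), "match": False}

for cc in ["  Fever ", "chest pain.", "fever x 3 days", "something is wrong"]:
    print(process(cc))
```

Stopping at the first match is what keeps each entry as close to the raw data as possible: an entry that already matches after Round 1 is never touched by the later, more aggressive rules.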
Favorite Chief Complaints
EMT-P was designed to address the most common patterns in natural language chief complaint (CC) entries from the emergency department (ED). For example, in the ED, the abbreviation CP most often means chest pain (as opposed to cerebral palsy or cyclophosphamide/prednisone), so the acronym expansion module replaces CP with chest pain. There will always be some natural language CC entries that EMT-P does not address, because they are infrequent or even obscure. Here are some of the favorites we have come across in our work with CC data:
assault jumped by 8 guys
something is wrong
hit in lip bungie cord
pt states she has clicking in her head
stuck gravel up nose
side pain in legs
massive in knee
light to moderate
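The acronym expansion mentioned above (CP replaced with chest pain) can be sketched as a whole-word table lookup. This is a hypothetical Python illustration, not the Perl module itself; only the CP entry comes from the text, and the function and table names are assumptions. It also shows why entries like those listed above pass through unchanged: they contain nothing the table recognizes.

```python
import re

# Illustrative ED-specific abbreviation table. Only the CP entry is taken from
# the text; a real table would be far larger.
ED_ABBREVIATIONS = {"cp": "chest pain"}

def expand_abbreviations(text):
    """Replace whole-word abbreviations with their most common ED meaning."""
    def replace(match):
        word = match.group(0)
        # Unknown words are returned unchanged, preserving the raw entry.
        return ED_ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, text)

print(expand_abbreviations("CP x 2 days"))      # chest pain x 2 days
print(expand_abbreviations("massive in knee"))  # massive in knee
```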
Travers, D., Wu, S., Scholer, M., Westlake, M., Waller, A., McCalla, A. (2007). Evaluation of a chief complaint processor for biosurveillance. Proceedings of the 2007 AMIA Symposium, 736-740.
Dara, J., Dowling, J.N., Travers, D., Cooper, G.F., Chapman, W.W. (2007). Chief complaint preprocessing evaluated on statistical and non-statistical classifiers. Advances in Disease Surveillance, 2:4.
Travers, D.A., Haas, S. (2006). The Unified Medical Language System® coverage of emergency department chief complaints. Academic Emergency Medicine, 13: 1319-1323.
Travers, D., Kipp, A., MacFarquhar, J., Waller, A. (2006). Evaluation of Emergency Medical Text Processor for Pre-Processing Chief Complaint Data for Syndromic Surveillance. Advances in Disease Surveillance, 1:71.
Travers, D.A., Haas, S.W. (2004). Evaluation of Emergency Medical Text Processor, a system for cleaning chief complaint textual data. Academic Emergency Medicine, 11(11): 1170-1176.
Travers, D.A., Haas, S.W. (2003). Using nurses’ natural language entries to build a concept-oriented terminology for patients’ chief complaints in the emergency department. Journal of Biomedical Informatics, 36: 260-270.