Text cleaning: EMT-P is a natural language processing (NLP) system that cleans raw chief complaint (CC) text entries by correcting non-standard words. EMT-P was developed from an extensive analysis of emergency department (ED) CC text (Travers & Haas, 2003). Recurring patterns in the natural language entries were identified, and individual modules were developed to address those patterns.
UMLS: The EMT-P system must be used in conjunction with the medical terminology resources of the Unified Medical Language System® (UMLS®). EMT-P maps cleaned CC text to standardized terms from the UMLS Metathesaurus®. The UMLS is a compilation of the major controlled vocabularies in healthcare. It is managed by the National Library of Medicine, which has encouraged its use in addressing vocabulary issues in healthcare information systems.
Modular approach: EMT-P is a modular natural language processing system that preserves the raw data as much as possible by cleaning each text entry only until it matches a standard term. Strategies are applied from least aggressive (Round 1) to most aggressive (Round 3). Although users can choose which modules to run, EMT-P version 2.2 runs all modules by default. Examples of the processing addressed in each round are shown here.
Architecture
EMT-P is based on the common repository architecture. At the core is a database
management system (DBMS) which contains all the UMLS information. The database
is used as a library of CC data and serves as a reference for ideal CCs.
The controller (a Java program) calls the appropriate analysis and
modification processes (Perl scripts). Connections to the DBMS are made
with a JDBC driver.
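
As a rough illustration of this controller-and-scripts arrangement, the sketch below shows how a Java controller might prepare one Perl step. It is not taken from the EMT-P source: the class name PerlStepSketch, the script name clean_round1.pl, the file names, and the choice to pass text through standard input and output are all assumptions made for the example. The JDBC lookup is left out.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: not the actual EMT-P controller code.
class PerlStepSketch {

    // Prepare (but do not start) a Perl step that reads one text file
    // and writes another.
    static ProcessBuilder buildStep(String scriptPath, File input, File output) {
        ProcessBuilder pb = new ProcessBuilder("perl", scriptPath);
        pb.redirectInput(input);    // assumed: the script reads entries from stdin
        pb.redirectOutput(output);  // assumed: the script writes cleaned entries to stdout
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show script errors on the console
        return pb;
    }

    // Start the step and wait for it to finish; returns the script's exit code.
    static int runStep(String scriptPath, File input, File output)
            throws IOException, InterruptedException {
        return buildStep(scriptPath, input, output).start().waitFor();
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildStep("clean_round1.pl",
                new File("round1_in.txt"), new File("round1_out.txt"));
        // Print the prepared command without executing it.
        System.out.println(String.join(" ", pb.command()));
    }
}
```

Separating buildStep from runStep keeps the command construction inspectable without requiring Perl or the scripts to be installed.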
Process
The main control module (RunEMTP.class) calls the appropriate Perl scripts
during each round of input-cleaning. Each Perl script produces an output
text file for the next script to accept as input. After each round, the
database is queried with the cleaned data. If there is a match, the entry
is kept in the final output file and no further processing is performed
on that entry. If not, it is sent to the next round for more aggressive
cleaning. After the final round of cleaning, all entries are kept in the
final output file and marked as a match or a non-match.
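
The round-by-round control flow described above can be sketched in a few lines of Java. This is a simplified, in-memory illustration only: the three cleaning rules, the abbreviation table, and the set of standard terms are invented stand-ins for the Perl modules and the UMLS database lookup, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "clean, then check for a match" loop.
class RoundedCleaningSketch {

    // Outcome for one entry: the raw text, the text after the last round
    // applied, the number of that round, and whether a standard term matched.
    static final class Result {
        final String raw;
        final String cleaned;
        final int lastRound;
        final boolean matched;

        Result(String raw, String cleaned, int lastRound, boolean matched) {
            this.raw = raw;
            this.cleaned = cleaned;
            this.lastRound = lastRound;
            this.matched = matched;
        }
    }

    // Toy cleaning rules, ordered from least to most aggressive (assumptions).
    static List<UnaryOperator<String>> demoRounds() {
        Map<String, String> abbreviations = Map.of("sob", "shortness of breath");
        return List.of(
            s -> s.trim().toLowerCase(Locale.ROOT),          // Round 1: normalize case
            s -> s.replaceAll("[^a-z0-9 ]", "").trim(),      // Round 2: strip punctuation
            s -> abbreviations.getOrDefault(s, s));          // Round 3: expand abbreviations
    }

    // Apply rounds in order; stop cleaning an entry as soon as it matches.
    static List<Result> run(List<String> rawEntries,
                            List<UnaryOperator<String>> rounds,
                            Set<String> standardTerms) {
        List<Result> results = new ArrayList<>();
        for (String raw : rawEntries) {
            String text = raw;
            int round = 0;
            boolean matched = false;
            for (UnaryOperator<String> cleaner : rounds) {
                round++;
                text = cleaner.apply(text);
                if (standardTerms.contains(text)) {  // stands in for the database query
                    matched = true;
                    break;                           // no further processing for this entry
                }
            }
            // Every entry reaches the output, flagged as match or non-match.
            results.add(new Result(raw, text, round, matched));
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> terms = Set.of("chest pain", "shortness of breath");
        List<String> raw = List.of("Chest Pain", "chest pain!!", "sob", "xyzzy");
        for (Result r : run(raw, demoRounds(), terms)) {
            System.out.println(r.raw + " -> " + r.cleaned
                + " (round " + r.lastRound + ", " + (r.matched ? "match" : "non-match") + ")");
        }
    }
}
```

Under these toy rules, "Chest Pain" matches after Round 1, "chest pain!!" after Round 2, and "sob" after Round 3, while "xyzzy" passes through all three rounds and is reported as a non-match. The real system processes entries in batches through text files rather than one entry at a time.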

Publications
Travers, D., Wu, S., Scholer, M., Westlake, M., Waller, A., McCalla, A. (2007). Evaluation of a chief complaint processor for biosurveillance. Proceedings of the 2007 AMIA Symposium, 736-740.
Dara, J., Dowling, J.N., Travers, D., Cooper, G.F., Chapman, W.W. (2008). Evaluation of preprocessing techniques for chief complaint classification. Journal of Biomedical Informatics, 41(4), 613-623.
Travers, D.A., Haas, S. (2006). The Unified Medical Language System® coverage of emergency department chief complaints. Academic Emergency Medicine, 13, 1319-1323.
Travers, D., Kipp, A., MacFarquhar, J., Waller, A. (2006). Evaluation of Emergency Medical Text Processor for pre-processing chief complaint data for syndromic surveillance. Advances in Disease Surveillance, 1, 71.
Travers, D.A., Haas, S.W. (2004). Evaluation of Emergency Medical Text Processor, a system for cleaning chief complaint textual data. Academic Emergency Medicine, 11(11), 1170-1176.
Travers, D.A., Haas, S.W. (2003). Using nurses’ natural language entries to build a concept-oriented terminology for patients’ chief complaints in the emergency department. Journal of Biomedical Informatics, 36, 260-270.
