Cross Disciplinary Consultancy: Negation Detection Use Case

Despite considerable effort since the turn of the century to develop Natural Language Processing (NLP) methods and tools for detecting negated terms in chief complaints, few standardised methods have emerged. Those methods that have emerged (e.g. the NegEx algorithm) are confined to local implementations with customised solutions.

June 18, 2019

Text mining using tidy data principles

Presented April 25, 2018.

April 26, 2018

Evaluating Twitter for Foodborne Illness Outbreak Detection in New York City

An estimated one in six Americans experiences illness from the consumption of contaminated food (foodborne illness) annually; most are neither diagnosed nor reported to health departments [1]. Eating food prepared outside of the home is an established risk factor for foodborne illness [2]. New York City (NYC) has approximately 24,000 restaurants and >8.5 million residents, of whom 78% report eating food prepared outside of the home at least once per week [3].

January 19, 2018

Detecting Previously Unseen Outbreaks with Novel Symptom Patterns

Commonly used syndromic surveillance methods based on the spatial scan statistic first classify disease cases into broad, pre-existing symptom categories ("prodromes") such as respiratory or fever, then detect spatial clusters where the recent case count of some prodrome is unexpectedly high. Novel emerging infections may have very specific and anomalous symptoms which should be easy to detect even if the number of cases is small. However, typical spatial scan approaches may fail to detect a novel outbreak if the resulting cases are not classified to any known prodrome.
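The two-step approach described above (classify cases into prodromes, then look for unexpectedly high counts) can be sketched roughly as follows. This is a simplified illustration, not the method from the abstract: it scores single regions rather than clusters of regions, and it assumes the expectation-based Poisson score as the measure of "unexpectedly high". The function names, region labels, and numbers are all hypothetical.

```python
import math

def ebp_score(count, baseline):
    """Expectation-based Poisson score: positive only when count exceeds baseline."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count

def most_anomalous(counts, baselines):
    """Return the (region, prodrome) key with the highest score, and that score."""
    best = None
    for key, count in counts.items():
        score = ebp_score(count, baselines[key])
        if best is None or score > best[1]:
            best = (key, score)
    return best

# Hypothetical recent counts and expected counts per (region, prodrome).
counts = {("A", "respiratory"): 10, ("B", "respiratory"): 30, ("A", "fever"): 5}
baselines = {("A", "respiratory"): 10.0, ("B", "respiratory"): 12.0, ("A", "fever"): 5.0}
best = most_anomalous(counts, baselines)
```

Note that a case whose symptoms match no prodrome never enters `counts` at all, which is the failure mode the abstract describes.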

May 02, 2019

Who Should We Be Listening to? Applying Models of User Authority to Detecting Emerging Topics on the EIN

Emerging event detection is the process of automatically identifying novel and emerging ideas from text with minimal human intervention. With the rise of social networks like Twitter, topic detection has begun leveraging measures of user influence to identify emerging events. Twitter's highly skewed follower/followee structure lends itself to an intuitive model of influence, yet in a context like the Emerging Infections Network (EIN), a sentinel surveillance listserv of over 1400 infectious disease experts, how to develop a useful model of authority is less clear.

May 02, 2019

A web-based platform to support text mining of clinical reports for public health surveillance

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata - for example, whether the entity is negated or hypothetical - using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis process.
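The idea of using lexical clues in a window of text around an entity can be sketched as below. This is a minimal illustration and not PyConTextKit's actual API: the trigger list, the window size, and the function names are assumptions made for the example, and it only looks at tokens preceding the entity.

```python
import re

# Hypothetical trigger list; real systems use much larger curated lexicons.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
WINDOW = 3  # assumed number of tokens to inspect before the entity

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def is_negated(text, entity, window=WINDOW):
    """Return True if a trigger occurs within `window` tokens before `entity`."""
    tokens = tokenize(text)
    target = tokenize(entity)
    if not target:
        return False
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = tokens[max(0, i - window):i]
            if any(tok in NEGATION_TRIGGERS for tok in preceding):
                return True
    return False
```

For example, `is_negated("Patient denies fever and cough", "fever")` is true, while `is_negated("Patient reports fever", "fever")` is false. A fixed look-back window is a deliberate simplification; it will over-negate in sentences where the trigger's scope ends early.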

May 02, 2019

A review of automated text classification in event-based biosurveillance

Event-based biosurveillance is the practice of monitoring diverse information sources to detect events pertaining to human, plant, and animal health. Online documents, such as news articles, newsletters, and (micro-) blog entries, are its primary information sources. Document classification is an important step in filtering this information, and machine learning methods have been successfully applied to the task.



May 02, 2019

Tuning a Chief Complaint Text Parser for Use in DoD ESSENCE

An expanded ambulatory health record, the Comprehensive Ambulatory Patient Encounter Record (CAPER), will provide multiple types of data for use in DoD ESSENCE. A new type of data not previously available is the Reason for Visit (ROV), a free-text field analogous to the Chief Complaint (CC). Intake personnel ask patients why they have come to the clinic and record their responses. Traditionally, the text should reflect the patient's actual statement. In reality, the staff often "translates" the statement and adds jargon. Text parsing maps key words or phrases to specific syndromes.
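Mapping key words or phrases to syndromes can be sketched as a simple lookup. This is only an illustration under stated assumptions: the keyword table, the syndrome names, and the `classify` function are hypothetical and are not taken from the ESSENCE parser.

```python
import re

# Hypothetical keyword-to-syndrome table, including a few jargon abbreviations.
SYNDROME_KEYWORDS = {
    "fever": "Fever",
    "cough": "Respiratory",
    "sob": "Respiratory",          # shortness of breath
    "vomiting": "Gastrointestinal",
    "diarrhea": "Gastrointestinal",
    "n/v": "Gastrointestinal",     # nausea/vomiting
}

def classify(text):
    """Return the sorted list of syndromes whose keywords appear in the text."""
    normalized = text.lower()
    found = set()
    for keyword, syndrome in SYNDROME_KEYWORDS.items():
        # Require non-letter boundaries so "sob" does not match "sobbing".
        pattern = r"(?<![a-z])" + re.escape(keyword) + r"(?![a-z])"
        if re.search(pattern, normalized):
            found.add(syndrome)
    return sorted(found)
```

With this table, `classify("Pt c/o N/V and fever x2 days")` yields `["Fever", "Gastrointestinal"]`. Tuning a parser of this kind largely means extending the table to cover the abbreviations and jargon that staff actually write.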

May 02, 2019

Developing an application ontology for mining clinical reports: the extended syndromic surveillance ontology

Ontologies representing knowledge from the public health and surveillance domains currently exist. However, they focus on infectious diseases (infectious disease ontology), reportable diseases (PHSkbFretired) and internet surveillance from news text (BioCaster ontology), or are commercial products (OntoReason public health ontology). From the perspective of biosurveillance text mining, these ontologies do not adequately represent the kind of knowledge found in clinical reports.

June 14, 2019

Assessing the Coverage of BioCaster Terms in Web News

We describe here a multilingual ontology to support disease surveillance by intelligent text mining systems from Web-based rumours. We informally assessed the coverage of its English terms on a large sample of news collected from the Web.

July 30, 2018





This website is supported by Cooperative Agreement # 6NU38OT000297-02-01 Strengthening Public Health Systems and Services through National Partnerships to Improve and Protect the Nation's Health between the Centers for Disease Control and Prevention (CDC) and the Council of State and Territorial Epidemiologists. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC. CDC is not responsible for Section 508 compliance (accessibility) on private websites.
