Accident investigators and decision-makers are often shocked to learn that many "fatal" incidents could have been prevented: all the relevant data and records were already in the system.
Thousands (sometimes millions) of records are entered, logged or transcribed by speech-to-text recognition, so that all measurable, observed data and information are recorded. In such a huge amount of data it is practically impossible to distinguish the valuable "signal" from the worthless "noise". How can this tremendous amount of data be processed to dramatically reduce or prevent critical incidents and accidents?
Computerized selection of the vital few valuable records, by discovering hidden patterns and relationships in data and texts, is a long-awaited solution worldwide.
Data Mining and Text Mining
Data Mining is the process of discovering hidden patterns and relationships in data.
Text Mining applies Data Mining tools to textual data, extracting patterns from natural language. Such data is mostly unstructured: identical things may be described in different words and, conversely, different things may be described in similar words. Text Mining differs from web search, in which the user looks for something already known or already written by someone else. An efficient Text Mining solution is an indispensable part of FavoWeb's intelligent incident data collection and management.
FavoWeb FRACAS (Failure Reporting, Analysis and Corrective Action System) supports a crucial task of the Safety and Security mission: pattern recognition, classification, categorization and labeling of data sets and free texts.
FavoWeb FRACAS provides a safety text-categorization system capable of assigning incoming failure/incident reports to one or more predefined categories on the basis of their textual content.
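To make "one or more predefined categories" concrete, here is a minimal sketch of multi-label categorization, assuming a centroid-based cosine-similarity classifier. This is a swapped-in textbook technique, not FavoWeb's actual model; the training reports, the category names and the 0.5 threshold are all hypothetical. A report receives every category whose centroid is similar enough, so it can get zero, one or several labels.

```python
import math
import re
from collections import Counter

# Hypothetical training reports and labels (illustrative, not FavoWeb's taxonomy).
TRAINING = [
    ("hydraulic pump leak detected during preflight", {"hydraulics"}),
    ("hydraulic pressure dropped and pump overheated", {"hydraulics"}),
    ("radio lost signal and antenna cable was loose", {"avionics"}),
    ("display flickered after avionics power surge", {"avionics"}),
    ("fuel leak near pump caused smoke and fire alarm", {"fuel", "fire"}),
    ("fire warning light on after engine smoke", {"fire"}),
]

def vectorize(text):
    """Word-frequency vector stored sparsely as a Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def train_centroids(examples):
    """Sum the vectors of all reports carrying each label (scale is irrelevant for cosine)."""
    centroids = {}
    for text, labels in examples:
        vec = vectorize(text)
        for label in labels:
            centroids.setdefault(label, Counter()).update(vec)
    return centroids

def categorize(text, centroids, threshold=0.5):
    """Return every category whose centroid similarity reaches the (hypothetical) threshold."""
    vec = vectorize(text)
    return {label for label, c in centroids.items() if cosine(vec, c) >= threshold}

centroids = train_centroids(TRAINING)
print(categorize("fuel leak caused fire and smoke", centroids))  # two labels: fuel and fire
```

The threshold trades precision against recall: lowering it attaches more categories to each report.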
FavoWeb FRACAS Text Mining
- A comprehensive approach to large-scale text mining tasks provides a uniquely complete solution
- High Dimension (a large number of input parameters: the single words of a vocabulary)
- Sparse Document Vectors (a small number of distinct words in each document)
- Heterogeneous Use of Terms (documents in the same category may have little overlap)
- High Level of Redundancy (many different features are relevant to the classification)
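The first three properties can be observed even on a tiny corpus. The sketch below uses three hypothetical incident reports: every distinct word becomes one dimension, each document fills only a few of them, and two reports describing the same kind of leak share just one word.

```python
import re
from collections import Counter

# Hypothetical mini-corpus of incident reports (illustrative only).
REPORTS = [
    "Hydraulic pump leak detected during preflight check",
    "Loss of hydraulic pressure, pump replaced",
    "Fluid seeping from line near the pump",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Sparse representation: each document stores only the words it actually contains.
docs = [Counter(tokenize(r)) for r in REPORTS]

# High dimension: every distinct word in the corpus is one input parameter.
vocabulary = sorted(set().union(*docs))
dimension = len(vocabulary)

def jaccard(a, b):
    """Share of distinct words that two documents have in common."""
    return len(a.keys() & b.keys()) / len(a.keys() | b.keys())

print("dimension:", dimension)
print("non-zero entries per document:", [len(d) for d in docs])
# Heterogeneous use of terms: "leak" vs "seeping" for the same kind of event.
print("overlap of reports 1 and 3:", round(jaccard(docs[0], docs[2]), 3))
```

On a real incident database the vocabulary runs to many thousands of words while each report still uses only a handful, which is why sparse storage matters.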
Text Mining for Prediction
Prediction is the ultimate goal of FavoWeb FRACAS text mining.
The FavoWeb Text Mining process is a comprehensive solution covering all three main stages of Text Mining:
1. Text Pre-processing - data cleaning and transformation, selection of subsets, preliminary feature selection, and reduction of the large number of parameters to a manageable number.
FavoWeb tool capabilities:
- Binary and word-frequency coding
- Reduction of vocabulary dimension by stemming, lemmatization, word-frequency filtering, etc.
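The pre-processing steps can be sketched as a small pipeline: clean the text, stem each word, keep only stems frequent enough to be useful, and encode each report with either binary or word-frequency coding. This is a minimal sketch under stated assumptions: the suffix-stripping stemmer is a deliberately crude stand-in for a real algorithm such as Porter's, lemmatization (which needs a dictionary) is not shown, and the reports and the `min_docs=2` cut-off are hypothetical.

```python
import re
from collections import Counter

# Toy suffix-stripping stemmer, a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lowercase, drop digits and punctuation, then stem each token."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def reduce_vocabulary(token_lists, min_docs=2):
    """Word-frequency filtering: keep stems that occur in at least min_docs documents."""
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)

def encode(tokens, vocabulary, binary=False):
    """Word-frequency coding (counts) or binary coding (presence as 0/1)."""
    counts = Counter(tokens)
    return [min(counts[w], 1) if binary else counts[w] for w in vocabulary]

# Hypothetical reports.
REPORTS = [
    "Pump leaking; pumps replaced",
    "Leaked fluid near pump",
    "Pump checked, no leaks found",
]
tokens = [clean(r) for r in REPORTS]
vocab = reduce_vocabulary(tokens)
print(vocab)
print(encode(tokens[0], vocab), encode(tokens[0], vocab, binary=True))
```

Stemming folds "leaking", "leaked" and "leaks" into one feature, and frequency filtering then removes words that appear in a single report, shrinking the input dimension before any model is built.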
2. Model Building and Validation - considering various models and choosing the one with the best predictive performance for assigning new reports to one or more predefined categories on the basis of their textual content.
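Choosing "the best model based on predictive performance" means scoring each candidate on reports it was not trained on. The sketch below compares two hypothetical candidates, a majority-class baseline and a cosine nearest-neighbor classifier, using leave-one-out cross-validation. This is one standard validation technique chosen for illustration, not necessarily the approach FavoWeb uses. For brevity each hypothetical report carries a single category.

```python
import math
import re
from collections import Counter

# Hypothetical labeled reports (one category each, for simplicity).
DATA = [
    ("hydraulic pump leak", "hydraulics"),
    ("hydraulic pressure low", "hydraulics"),
    ("pump seal leak hydraulic", "hydraulics"),
    ("radio antenna fault", "avionics"),
    ("display fault after power surge", "avionics"),
    ("radio display power loss", "avionics"),
]

def vectorize(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def majority_model(train):
    """Baseline: always predict the most common training category."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda text: label

def nearest_neighbor_model(train):
    """Predict the category of the most similar training report."""
    vectors = [(vectorize(t), lbl) for t, lbl in train]
    def predict(text):
        v = vectorize(text)
        return max(vectors, key=lambda pair: cosine(v, pair[0]))[1]
    return predict

def leave_one_out_accuracy(build, data):
    """Train on all reports but one, test on the held-out one, repeat for each report."""
    hits = 0
    for i, (text, label) in enumerate(data):
        model = build(data[:i] + data[i + 1:])
        hits += model(text) == label
    return hits / len(data)

CANDIDATES = {"majority": majority_model, "nearest_neighbor": nearest_neighbor_model}
scores = {name: leave_one_out_accuracy(build, DATA) for name, build in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Scoring only on held-out reports keeps a model from being rewarded for memorizing its training data; on larger collections k-fold cross-validation is usually cheaper than leave-one-out.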
The FavoWeb FRACAS model validation stage uses all the well-known modern approaches: