About ANLAP [1-3]
- ANLAP is an artificial intelligence tool for automatic natural language abstracting and processing, based on semantic parsing of a natural language input. The method rests on a revision of the original notion of conceptual dependency (CD) proposed by Schank et al. [5-8]. In developing ANLAP, we made extensive use of the Natural Language Toolkit (NLTK) of Loper et al. [4, 9-10].
- The process begins with a context-free parse of the input text, produced by a statistical parser trained on the Brown and Penn Treebank corpora [10]. The parsed text is then converted to a “chunked” text by a stochastic parts program and noun phrase parser (see Church [11]), which subdivides the parsed text into noun phrases and verb phrases.
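As a rough illustration of this chunking step (a minimal sketch only: NLTK's off-the-shelf tokenizer, POS tagger, and regular-expression chunker stand in here for ANLAP's trained statistical parser), one might write:

    import nltk
    # One-time downloads for the tokenizer and POS tagger:
    # nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

    # A simple regular-expression grammar standing in for the trained
    # statistical parser: group numbers/determiners/adjectives/nouns into
    # noun phrases (NP) and runs of verbs into verb phrases (VP).
    grammar = r"""
      NP: {<CD>*<DT>?<JJ>*<NN.*>+}
      VP: {<VB.*>+}
    """
    chunker = nltk.RegexpParser(grammar)

    sentence = "12 percent of the root causes as personnel errors"
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    print(chunker.parse(tagged))  # subtrees labeled NP correspond to the bracketed chunks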
- The parsed text is then converted into a semantic form based on conceptual dependency. This is accomplished with the aid of a specially constructed dictionary, whose structure is illustrated by the example discussed below.
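Purely as a hypothetical illustration (the actual dictionary format is described with the example below), an entry might map a surface word to a CD primitive and a role:

    # Hypothetical sketch of a CD dictionary (illustrative only):
    # each surface word carries a CD primitive and the role it can play.
    CD_DICT = {
        "percent":   {"cd_type": "QUANTITY", "role": "measure"},
        "causes":    {"cd_type": "EVENT",    "role": "object"},
        "personnel": {"cd_type": "AGENT",    "role": "actor"},
        "errors":    {"cd_type": "EVENT",    "role": "object"},
    }

    def lookup_cd(word):
        """Return the CD annotation for a word, defaulting to 'unknown'."""
        return CD_DICT.get(word.lower(), {"cd_type": "UNKNOWN", "role": None})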
- The Abstracting and Processing part of the program performs the following three functions:
1. Natural language prompting of the user for input.
2. Processing of rules by an expert system, based on the user's natural language input.
3. Abstracting of unstructured data, as demonstrated in the examples in this paper.
- Here we focus exclusively on the last of these functions. The abstracting of unstructured data begins by parsing the input text. The user then formulates a question that defines the headings of a table, which is filled in by searching the unstructured data.
- The defined headings, in both their keyword and semantic forms, are used to search the data in both forms. The semantic form is used to disambiguate occurrences of a heading word that are not associated with data, as sketched below.
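A minimal sketch of this dual keyword/semantic test (the helper name and annotation fields are assumptions for illustration, not ANLAP's actual interface):

    def matches_heading(chunk_words, heading, annotations):
        """Hypothetical dual test: the heading word must occur in the
        chunk (keyword form), and the chunk's CD annotations must show
        it is attached to data (semantic form), which filters out stray
        occurrences of the heading word."""
        keyword_hit = heading.lower() in (w.lower() for w in chunk_words)
        semantic_hit = any(a.get("role") is not None for a in annotations)
        return keyword_hit and semantic_hit

    # "personnel errors" passes both tests for the heading "personnel":
    anns = [{"cd_type": "AGENT", "role": "actor"},
            {"cd_type": "EVENT", "role": "object"}]
    print(matches_heading(["personnel", "errors"], "personnel", anns))  # True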
- It was also necessary to string the noun phrases ([NP]) together to define a piece of data, as illustrated in the following snippet of text:
12 percent of the root causes as personnel errors
Chunked:
[12 percent][NP] of the [root causes][NP] as [personnel errors][NP]
- The semantics-free parse is now converted into a conceptual dependency form. It is convenient to write this in a first-order format, as shown below in Table 1.
- The conversion to CD form is now evident from our chunking operations. We can now write a first-order relation for each chunk, as well as the relations between the sequence of chunks, as shown in Table 2. We call this the conceptual analysis; browsing through it yields the semantic meaning of the text. A hypothetical rendering of these relations is sketched below.
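In the spirit of Tables 1 and 2, the relations for the chunked snippet above can be written as simple predicates; the tuple encoding here is a hypothetical rendering, not ANLAP's internal format:

    # Hypothetical first-order rendering of the chunked snippet
    # "[12 percent][NP] of the [root causes][NP] as [personnel errors][NP]".
    chunks = [
        ("NP1", "12 percent"),
        ("NP2", "root causes"),
        ("NP3", "personnel errors"),
    ]
    # Relations between successive chunks, carried by the linking words.
    relations = [
        ("of", "NP1", "NP2"),   # 12 percent OF the root causes
        ("as", "NP2", "NP3"),   # root causes AS personnel errors
    ]

    # Browsing the analysis reads off the semantic content of the text:
    text = dict(chunks)
    for rel, a, b in relations:
        print(f"{text[a]} --{rel}--> {text[b]}")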
- A search for the heading 'personnel' consists of traversing the conceptual analysis to establish the above sequence of links.
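Continuing the same sketch (find_data_for_heading is a hypothetical helper, not part of ANLAP), the link-following search might look like this:

    # Walk the chain of links backward from the chunk containing the
    # heading word until the data chunk is reached.
    chunks = [("NP1", "12 percent"), ("NP2", "root causes"),
              ("NP3", "personnel errors")]
    relations = [("of", "NP1", "NP2"), ("as", "NP2", "NP3")]

    def find_data_for_heading(heading, chunks, relations):
        text = dict(chunks)
        target = next(cid for cid, words in chunks if heading in words)
        while True:
            link = next((r for r in relations if r[2] == target), None)
            if link is None:          # no more incoming links: this is the data
                return text[target]
            target = link[1]          # step backward along the link

    print(find_data_for_heading("personnel", chunks, relations))  # -> 12 percent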
- It is worth noting that the performance of ANLAP is greatly enhanced when we decide in advance that we are only interested in looking for very specific kinds of information in texts such as a failure event report, an NDE inspection report, or a material property testing report. With the user's help, ANLAP will first convert the unstructured data of natural language sentences into the structured data of a table with user-defined headings.
- This method of getting meaning from text is called “Information Extraction” [10], and the table output format opens the door for us to link up with powerful tools such as SQL and DATAPLOT [12].
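Once the table exists, handing it to SQL is straightforward; a minimal sketch using Python's built-in sqlite3 module (the table and column names are illustrative):

    import sqlite3

    # Illustrative: load the extracted (cause, fraction) rows into SQLite
    # so they can be queried with SQL or exported to tools like DATAPLOT.
    rows = [("personnel errors", "12 percent")]   # output of the abstracting step

    con = sqlite3.connect("anlap_output.db")
    con.execute("CREATE TABLE IF NOT EXISTS root_causes (cause TEXT, fraction TEXT)")
    con.executemany("INSERT INTO root_causes VALUES (?, ?)", rows)
    con.commit()
    for row in con.execute("SELECT * FROM root_causes"):
        print(row)
    con.close()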
REFERENCES
[1] Marcal, P. V., Fong, J. T., and Yamagata, N., 2009, "Artificial Intelligence (AI) Tools for Data Acquisition and Probability Risk Analysis of Nuclear Piping Failure Databases," Proc. ASME Pressure Vessels & Piping Conference, July 26-30, 2009, Prague, Czech Republic.
[2] Marcal, P. V., 2009, ANLAP User's Manual, MPACT Corp., Julian, CA, pedrovmarcal@gmail.com.
[3] Yamagata, N., Marcal, P. V., and Fong, J. T., 2009, "Artificial Intelligence (AI) Tools for Data Acquisition and Probability Risk Analysis of Structure Databases," Proc. JSME 22nd Computational Mechanics Conf., October 2009, Kanazawa, Japan (in Japanese).
[4] Loper, E., and Bird, S., 2002, "NLTK: The Natural Language Toolkit," Proc. ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL, Association for Computational Linguistics, Somerset, NJ, http://epydoc.sourceforge.net/.
[5] Schank, R., 1969, A Conceptual Dependency Representation for a Computer-Oriented Semantics, Ph.D. Thesis, University of Texas, Austin; also available as Stanford AI Memo 83, Stanford Artificial Intelligence Project, Computer Science Department, Stanford University, Stanford, CA.
[6] Schank, R. C., and Tesler, L., 1969, "A Conceptual Dependency Parser for Natural Language," Proc. 1969 International Conf. on Computational Linguistics, Sang-Saby, Sweden, pp. 1-3, Association for Computational Linguistics, Morristown, NJ.
[7] Schank, R., 1972, "Conceptual Dependency: A Theory of Natural Language Understanding," Cognitive Psychology, Vol. 3(4), pp. 552-631.
[8] Schank, R. C., 1973, "Identification of Conceptualizations Underlying Natural Language," Computer Models of Thought and Language, R. C. Schank and K. M. Colby, eds., Chapter 5, W. H. Freeman and Company, San Francisco, CA.
[9] Bird, S., 2006, "NLTK: The Natural Language Toolkit," Proc. COLING/ACL 2006 Interactive Presentation Sessions, Sydney, July 2006, pp. 69-72, Association for Computational Linguistics.
[10] Bird, S., Klein, E., and Loper, E., 2008, Natural Language Processing, published by the authors and distributed with the Natural Language Toolkit [http://www.nltk.org/], Version 0.9.7a, Dec. 18, 2008.
[11] Church, K. W., 1988, "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text," Proc. Second Conf. on Applied Natural Language Processing, Austin, TX, pp. 136-143.
[12] Filliben, J. J., and Heckert, N. A., 2002, DATAPLOT: A Statistical Data Analysis Software System, public domain software released by the National Institute of Standards and Technology, Gaithersburg, MD, http://www.itl.nist.gov/div898/software/dataplot.html.