This document discusses using ontology learning to semantically annotate a corpus of about 15,000 web service interfaces (WSDL documents). It proposes extracting terms from the interfaces at a fine-grained level and using pattern-based methods to discover taxonomic and non-taxonomic relations, from which an ontology is generated automatically. Compared with a golden ontology, the generated ontology had about 62% of concepts and 71% of instances in common.
EKAW ontology learning for cost-effective large-scale semantic annotation
1. Ontology Learning for Large-scale Semantic Annotation of Web Service Interfaces
Shahab Mokarizadeh (Royal Institute of Technology)
Peep Kungas (University of Tartu)
Mihhail Matskin (Royal Institute of Technology)
2. Motivation
• Motivation: analysis of public Web services to
identify missing but valuable Web services
(to be implemented)
• Materials:
- Corpus of circa 15000 WSDL documents
(http://www.soatrader.com/web-services)
• Challenges:
- Absence of any kind of semantic information (e.g.
documentation) in around 95% of WSDLs
- Frequent misspellings, abbreviations, technical
words, etc.
3. Initial Step : Knowledge Acquisition
•Knowledge about Web-services themselves:
- Functionality of service
- Attributes of service (e.g. QoS, Rating, etc)
- Structural relations with other services,
- …..
» Ontology of Services
•Knowledge about Web-service Domain
- Domain Concepts and Relations
» Domain Ontology √
•Knowledge Acquisition → Ontology Learning
4. Domain Ontology Learning Granularity
Granularity of Term Extraction from WSDL :
- Coarse Grained:
• Service Names
• Operation Names
• …..
- Fine Grained:√
• Part names of input/output parameters
• XML Schema leaf element names
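As a rough illustration of fine-grained term extraction, the sketch below collects message part names and XML Schema leaf element names from a WSDL 1.1 document using Python's standard library. The sample WSDL and the function name `fine_grained_terms` are hypothetical and chosen only for this example; the slides do not describe the actual extraction tooling.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical WSDL 1.1 fragment, used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             name="PictureService">
  <types>
    <xsd:schema>
      <xsd:element name="GetPictureRequest">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="PictureIdentifier" type="xsd:string"/>
            <xsd:element name="SupportedImage" type="xsd:boolean"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </types>
  <message name="GetPictureInput">
    <part name="parameters" element="GetPictureRequest"/>
  </message>
</definitions>"""


def fine_grained_terms(wsdl_text):
    """Return (part names, schema leaf element names) in document order."""
    root = ET.fromstring(wsdl_text)
    part_names = [p.get("name") for p in root.iter(f"{{{WSDL_NS}}}part")]

    leaf_names = []
    element_tag = f"{{{XSD_NS}}}element"
    for el in root.iter(element_tag):
        # iter() also yields the element itself, so exclude it when
        # checking for nested element declarations.
        has_nested = any(child is not el for child in el.iter(element_tag))
        if not has_nested and el.get("name"):
            leaf_names.append(el.get("name"))
    return part_names, leaf_names


parts, leaves = fine_grained_terms(SAMPLE_WSDL)
print(parts)
print(leaves)
```

Only the leaf declarations are kept here; the wrapper element `GetPictureRequest` is skipped because it contains nested element declarations.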
5. Ontology Learning Method
• Pattern-based method:
Input text is scanned for predefined “lexico-syntactic”
patterns, where each pattern indicates a relation of
interest, either “taxonomic” or “non-taxonomic”.
• The pattern-based method is applicable because
the extracted terms so often follow specific
patterns.
6. Information Elicitation
Information Extraction:
• Start with fine-grained granularity
• If a term is ambiguous, terms from coarse granularity are incorporated
Ontology Learning Steps (in the order shown in the slide diagram):
- Term Extraction
- Syntactic Refinement
- Ontology Discovery
- Pattern-based Semantic Analysis
- Term Disambiguation
- Class and Relation Determination
- Ontology Enrichment
- Adding Relations
» Ontology
7. Lexico-Syntactic Term Analysis -1
1- (Noun_1) + … + (Noun_n), e.g. PictureIdentifier
Term: (N|Word_n) [(nn)(N|Word_1) + … + (nn)(N|Word_n-1)]
Structure: (Header) [Modifier]
Example: Header = Identifier, Modifier = Picture
Concept & Relation Identification | Example
Modifier isA Concept              | Picture isA Concept
Header isA Concept                | Identifier isA Concept
Term subConceptOf Header          | PictureIdentifier subConceptOf Identifier
Modifier hasProperty Term         | Picture hasProperty PictureIdentifier
                                  | “PictureIdentifier” isInstanceOf PictureIdentifier
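The noun-compound pattern can be sketched as a small function that splits a camel-case term, treats the last word as the header and the preceding words as the modifier, and emits the relations from the slide as triples. The function names and the tuple representation are assumptions made for illustration, and joining several leading nouns into one modifier is one possible reading of the pattern, not something the slides specify.

```python
import re


def split_camel_case(term):
    """Split a camel-case identifier into words, keeping acronyms whole."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+", term)


def noun_compound_triples(term):
    """Apply the noun-compound pattern: last word is the header,
    everything before it is the modifier (an assumption for n > 2)."""
    words = split_camel_case(term)
    if len(words) < 2:
        return []
    header = words[-1]
    modifier = "".join(words[:-1])
    return [
        (modifier, "isA", "Concept"),
        (header, "isA", "Concept"),
        (term, "subConceptOf", header),
        (modifier, "hasProperty", term),
        (f'"{term}"', "isInstanceOf", term),
    ]


for triple in noun_compound_triples("PictureIdentifier"):
    print(triple)
```

This sketch assumes every word in the term is a noun; a real pipeline would need part-of-speech information to decide which pattern applies.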
8. Lexico-Syntactic Term Analysis -2
2- (Adj_1) + … + (Noun_n), e.g. SupportedImage
Term: (N|Word_n) [(mod)(A|Word_1) + … + (nn)(N|Word_n-1)]
Structure: (Header) [Modifier]
Example: Header = Image, Modifier = Supported
Concept & Relation Identification | Example
Header isA Concept                | Image isA Concept
Term subConceptOf Header          | SupportedImage subConceptOf Image
                                  | “SupportedImage” isInstanceOf SupportedImage
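The adjective-noun pattern differs from the noun-compound one in that the modifier does not become a concept of its own. A minimal sketch follows, assuming a tiny hand-written adjective list in place of a real part-of-speech tagger; the list `ADJECTIVES` and the function name are hypothetical.

```python
import re

# Hypothetical stand-in for a part-of-speech tagger.
ADJECTIVES = {"supported", "available", "valid"}


def adjective_noun_triples(term):
    """Apply the adjective-noun pattern: only the header (last word)
    is asserted as a concept; the adjective modifier is not."""
    words = re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", term)
    if len(words) < 2 or words[0].lower() not in ADJECTIVES:
        return []
    header = words[-1]
    return [
        (header, "isA", "Concept"),
        (term, "subConceptOf", header),
        (f'"{term}"', "isInstanceOf", term),
    ]


for triple in adjective_noun_triples("SupportedImage"):
    print(triple)
```

Terms whose first word is not in the adjective list produce no triples here, so they would fall through to another pattern.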
9. Adding other Non-Taxonomic Relations
Exploiting WordNet to find the following relations:
• SynonymOf: (having a common synset)
• SimilarTo: (based on taxonomy and corpus statistics
of words)
• More ….
Example:
• Image isSynonymOf Picture
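The "common synset" test for synonymy can be illustrated with a toy lookup table. The table below is a hand-made stand-in for WordNet, and its synset identifiers are invented for this example; a real implementation would query WordNet itself.

```python
from itertools import combinations

# Toy stand-in for a WordNet lookup: word -> set of synset identifiers.
# The identifiers are made up for illustration, not real WordNet keys.
TOY_SYNSETS = {
    "image": {"syn-visual-representation", "syn-mental-image"},
    "picture": {"syn-visual-representation", "syn-film"},
    "identifier": {"syn-identifier"},
}


def is_synonym_of(a, b, synsets=TOY_SYNSETS):
    """Two words count as synonyms if they share at least one synset."""
    return bool(synsets.get(a.lower(), set()) & synsets.get(b.lower(), set()))


def synonym_relations(concepts, synsets=TOY_SYNSETS):
    """Return isSynonymOf triples for every synonymous pair of concepts."""
    return [
        (a, "isSynonymOf", b)
        for a, b in combinations(concepts, 2)
        if is_synonym_of(a, b, synsets)
    ]


print(synonym_relations(["Image", "Picture", "Identifier"]))
```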
10. Evaluation
• Comparing the automatically generated ontology
with the golden ontology:
• Common Concepts: 862 (out of 1391) ≈ 62%
• Common Instances: 1313 (out of 1853) ≈ 71%
– Instance Level:
• Precision: 85%
• Recall: 78%
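The overlap percentages follow directly from the reported counts, as the short check below shows. The set-based precision and recall function uses the standard definitions and is demonstrated on made-up toy sets, because the slides give only the resulting percentages and not the underlying counts.

```python
def overlap_ratio(common, total):
    """Share of items in common, as a fraction of the stated total."""
    return common / total


def precision_recall(generated, golden):
    """Standard set-based precision and recall."""
    hits = len(generated & golden)
    return hits / len(generated), hits / len(golden)


# Counts taken from the slide.
print(round(100 * overlap_ratio(862, 1391)))   # common concepts, in percent
print(round(100 * overlap_ratio(1313, 1853)))  # common instances, in percent

# Hypothetical toy sets, not data from the evaluation.
toy_generated = {"a", "b", "c", "d"}
toy_golden = {"b", "c", "d", "e", "f"}
print(precision_recall(toy_generated, toy_golden))
```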
11. Conclusion
• Pattern-based ontology building is promising but
not enough!
• The result is not really an ontology (e.g. upper-
level concepts are missing).
•More non-taxonomic relations need to be
discovered.