Boshra F. Zopon AL Bayaty¹, Shashank Joshi²
¹Department of Computer Science, Yashwantrao Mohite College, Bharati Vidyapeeth University
AL-Mustansiriya University, Baghdad, Iraq
²Department of Computer Engineering, Engineering College, Bharati Vidyapeeth University
ABSTRACT:
Word sense disambiguation (WSD) is the process of identifying the correct meaning of a word using a given algorithm. Much research has been carried out in this domain, and a popular resource referred to is WordNet. This paper discusses word sense disambiguation using the AdaBoost algorithm. In this work, WordNet data and Senseval standards are used to resolve the meaning of a word with the help of the given context.
KEYWORDS:
WSD; Supervised learning approaches; Senseval-3; WordNet
Cite this article:
AL Bayaty B. F. Z, Joshi S. Empirically Implementation Adaboost to Solve Ambiguity. Orient. J. Comp. Sci. and Technol;8(2). Available from: http://www.computerscijournal.org/?p=2887
Introduction
Word sense disambiguation is one of the applications of natural language processing. There are two main ways to identify the meaning of a word correctly:
Supervised Approach
In the supervised approach, the context is used along with the algorithm to train the system to identify the meaning of a word correctly. AdaBoost grew out of the theoretical learning model called probably approximately correct (PAC) learning. Adaptive boosting constructs a strong classifier by taking a linear combination of a number of weak classifiers. The approach is called adaptive because each successive classifier concentrates on the words that were not classified correctly by the previous ones.
Unsupervised Approach
These approaches acquire information from unannotated raw text. The performance of unsupervised approaches has generally been lower than that of the other approaches used for word sense disambiguation.
Problem Definition
The problem is to identify the meaning of a word correctly using the adaptive boosting approach so that overall classification improves. In this approach, the individual classifiers report their classifications, and these are then combined to improve the overall accuracy of classification.
Experimental Setup
To address the problem statement discussed so far, an experiment was performed. The setup for it is as follows.
- Data set: 10 nouns, 5 verbs.
- Reference for meaning and POS: WordNet ver. 2.1.
- Algorithm: Adaboost.
- Dictionary file: To specify meaning.
- Training: To train system with given context.
- Senseval format: Representation in the form of XML.
- IDE: Eclipse Kepler 6.0.
- Programming language: Java (J2SE 6.0).
- Operating system: Windows 7, 32-bit.
Implementation and Algorithm Used
The adaptive boosting approach identifies weak learners (classifiers) and boosts the performance of these classifiers. The actual process carried out is described below.
Box (1): Adaboost Algorithm implemented
To make the learning process easier, all members of the training data are initially weighted equally. The AdaBoost algorithm takes this weighted training set as its input. It then iterates for a fixed number of rounds, with one round allotted to each weak classifier.
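The weighting and iteration described above can be sketched in Python as follows. This is a minimal illustration of discrete two-class AdaBoost with decision stumps, not the authors' Java implementation: the function names, the stump weak learner, and the toy one-dimensional data are all assumptions made for the example, and real sense disambiguation would need a multi-class variant because a word can have more than two senses.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                      # every training example starts with equal weight
    ensemble = []
    for _ in range(rounds):                # one round per weak classifier
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # guard against log of infinity when err is 0
        if err >= 0.5:                     # no better than chance: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Strong classifier: sign of the alpha-weighted sum of weak votes."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the two classes.
X = [[i] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
errors_single = sum(predict(single, r) != t for r, t in zip(X, y))
errors_boosted = sum(predict(boosted, r) != t for r, t in zip(X, y))
```

On this toy data a single stump misclassifies some examples, while the weighted combination of three stumps fits the training set, which is the sense in which weak classifiers are combined into a stronger one.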
The Training Phase
A data set of 10 nouns and 5 verbs is used. To learn the senses, the system is trained on data in the Senseval-3 structure, which maps each word to a sense by using its surrounding context. The entire structure uses XML, a semi-structured format, to represent and process the data.
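As an illustration of how such XML training data might be read, the sketch below parses a small sample with Python's standard `xml.etree.ElementTree` module. The sample is hypothetical: its element names (`lexelt`, `instance`, `answer`, `context`, `head`) are modelled on the Senseval-3 lexical sample layout, while the instance id, the sense id, and the sentence are invented for the example and do not come from the authors' data.

```python
import xml.etree.ElementTree as ET

# Hypothetical training sample in a Senseval-3-like layout.
SAMPLE = """<corpus lang="english">
  <lexelt item="guide.v">
    <instance id="guide.v.001">
      <answer instance="guide.v.001" senseid="guide_sense_a"/>
      <context>Please <head>guide</head> us along the straight path .</context>
    </instance>
  </lexelt>
</corpus>"""

def load_instances(xml_text):
    """Return one record per instance: target word, id, sense label, context words."""
    root = ET.fromstring(xml_text)
    rows = []
    for lexelt in root.iter("lexelt"):
        for inst in lexelt.iter("instance"):
            ctx = inst.find("context")
            target = ctx.find("head").text.lower()
            words = "".join(ctx.itertext()).lower().split()
            # Keep alphabetic context words other than the target itself.
            features = [w for w in words if w != target and w.isalpha()]
            answer = inst.find("answer")
            rows.append({
                "word": lexelt.get("item"),
                "id": inst.get("id"),
                "sense": answer.get("senseid") if answer is not None else None,
                "context": features,
            })
    return rows

instances = load_instances(SAMPLE)
```

Each record pairs a sense label with the surrounding words, which is the form a supervised learner needs for training.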
The System Answer File
This file provides the accuracy associated with each of the candidate senses. With reference to the context, the meaning with the highest accuracy is identified and taken as the final answer. The accompanying screenshot shows the System Answer .txt file for the implemented AdaBoost algorithm.
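The selection step described above, choosing the sense with the highest value, can be sketched as follows. The scores, sense labels, and the space-separated output line are assumptions for illustration only; the actual layout of the authors' answer file is not given in the text.

```python
def pick_sense(scores):
    """Return the sense label with the highest score (first one wins a tie)."""
    return max(scores, key=scores.get)

def answer_line(lexelt, instance_id, scores):
    """Format one answer as 'lexelt instance_id sense' (an assumed layout)."""
    return f"{lexelt} {instance_id} {pick_sense(scores)}"

# Hypothetical per-sense scores for one instance of the verb "guide".
scores = {"guide_sense_a": 0.12, "guide_sense_b": 0.61, "guide_sense_c": 0.27}
line = answer_line("guide.v", "guide.v.001", scores)
```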
The Result
The results for our data set are shown in Table 1 below:
Table 1: Data set of words and results of the AdaBoost classifier

| Word | POS | # Senses | Score | Accuracy |
|---|---|---|---|---|
| Praise | n | 2 | 812 | 1000 |
| Name | n | 6 | 1000 | 1000 |
| Worship | v | 3 | 450 | 485 |
| Worlds | n | 8 | 143 | 1000 |
| Lord | n | 3 | 500 | 1000 |
| Owner | n | 2 | 811 | 1000 |
| Recompense | n | 2 | 815 | 1000 |
| Trust | v | 6 | 167 | 167 |
| Guide | v | 5 | 371 | 431 |
| Straight | n | 3 | 500 | 500 |
| Path | n | 4 | 333 | 333 |
| Anger | n | 3 | 500 | 500 |
| Day | n | 10 | 111 | 1000 |
| Favored | v | 4 | 250 | 250 |
| Help | v | 8 | 125 | 125 |
The overall accuracy of AdaBoost on this data set is 65.27%, which we consider quite good.
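The text does not state how the overall figure is derived. One reading that reproduces it, offered here as an assumption, is that the Accuracy column of Table 1 is expressed out of 1000 and the overall accuracy is the unweighted mean over the 15 words. The short check below uses the table values as given:

```python
# Accuracy column of Table 1, assumed to be expressed out of 1000.
accuracy = {
    "Praise": 1000, "Name": 1000, "Worship": 485, "Worlds": 1000, "Lord": 1000,
    "Owner": 1000, "Recompense": 1000, "Trust": 167, "Guide": 431, "Straight": 500,
    "Path": 333, "Anger": 500, "Day": 1000, "Favored": 250, "Help": 125,
}

# Unweighted mean over words, converted from per-mille to percent.
overall_percent = sum(accuracy.values()) / len(accuracy) / 10
print(f"{overall_percent:.2f}%")
```

Under this reading, each word counts equally regardless of how many test instances it has.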
Conclusion
In this experiment, AdaBoost delivers more accurate results for some words, for example {Day, Recompense, Owner, Lord, Worlds, Name, and Praise}. For other words this accuracy is not maintained, and the approach needs to be modified to increase the probability of identifying a word with its correct meaning. In this part of our work, AdaBoost achieved 65.27% accuracy on the data set using WordNet and Senseval-3.
Acknowledgment
The first author thanks the Ministry of Higher Education, Iraq, and also thanks the research guide, Dr. Shashank Joshi (Professor at Bharati Vidyapeeth University, College of Engineering), for his advice in preparing this work.
This work is licensed under a Creative Commons Attribution 4.0 International License.