Poet Attribution for Urdu: Finding Optimal Configuration for Short Text
Keywords: Poet Attribution, Author Attribution, N-grams, Classification, Urdu
This study presents a machine learning system that identifies the poet of a given poetic piece consisting of two lines (i.e., a couplet) or more. The task is more difficult than general author attribution, because verses and poems usually contain far fewer words than the articles found in author attribution datasets. We ran several experiments applying classification algorithms to different feature configurations and found that the system performs best when a support vector machine is used with a combination of unigram and bigram features. The best system (for 5 Urdu poets) achieves an accuracy of 88.7%.