Automatic Text Summarization Based on the Power of Reconstructing Sentences from Each Other in a Sparse Reconstruction Framework

Authors
1 Tarbiat Modares University
2 Tarbiat Modares University
Abstract
The rapid and continuous growth of the World Wide Web has made extracting useful, compact information from a large collection of documents a serious challenge. Summarizing documents by hand is a time-consuming and difficult task for humans, which creates the need for a powerful summarization system that reduces the volume of text and speeds up access to useful information.

Recently, a summarization system based on sparse representation has been presented, which tries to reconstruct each sentence sparsely as a linear combination of the other sentences. This approach selects a subset of the sentences of the main text that contain its important information and outputs that subset as the summary.

The summary should also contain the smallest number of sentences that best reconstruct the other sentences of the text; the sparse representation approach achieves this goal.

This model consists of a penalty function based on the L2 norm, which controls sentence reconstruction, and a regularization term.
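The sparse reconstruction idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bag-of-words sentence vectors, a squared L2 reconstruction error, and an L1 regularizer solved by plain coordinate descent. The function names (`sparse_code`, `rank_sentences`), the weight `lam`, and the toy vectors are all hypothetical choices made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(value, threshold):
    # Shrinks a value toward zero; values inside [-threshold, threshold] become 0.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(target, atoms, lam=0.1, sweeps=100):
    """Coordinate descent for 0.5*||target - sum_j c_j*atoms[j]||^2 + lam*||c||_1.

    Assumption: the L1 regularizer stands in for the model's regularization term.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(target)
    norms = [dot(a, a) for a in atoms]
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            if norms[j] == 0.0:
                continue
            # Correlation of atom j with the residual, with its own contribution added back.
            rho = dot(atom, residual) + coeffs[j] * norms[j]
            new_c = soft_threshold(rho, lam) / norms[j]
            delta = new_c - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                coeffs[j] = new_c
    return coeffs


def rank_sentences(vectors, lam=0.1):
    """Score each sentence by how much weight the other sentences give it."""
    n = len(vectors)
    scores = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        coeffs = sparse_code(vectors[i], [vectors[j] for j in others], lam)
        for j, c in zip(others, coeffs):
            scores[j] += abs(c)
    return scores


# Toy term-count vectors over a three-word vocabulary (hypothetical data).
toy_vectors = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
toy_scores = rank_sentences(toy_vectors)
# Sentence 2 shares no words with the others, so it receives no weight.
print(toy_scores)
```

Under these assumptions, sentences with the highest scores are the ones most useful for reconstructing the rest of the text, so the top few would form the extractive summary.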

A reconstruction function based on the L2 norm gives all words an equal role in reconstructing sentences, so outlier words may change the summarization result. Therefore, to improve the quality of the resulting summary, in this article we rewrite the penalty function with the L2 norm. This allocates a different amount of error to each word in sentence reconstruction, which reduces the sensitivity of the method to outlier words. The implementation results show that, compared to previous methods, the proposed method quickly produces a high-quality summary according to the ROUGE and F-measure criteria.
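For readers unfamiliar with the evaluation criteria, the following is a simplified sketch of a unigram-overlap score in the spirit of ROUGE-1, reporting precision, recall, and F-measure. It is an illustration only, not the official ROUGE toolkit and not the evaluation setup used in the article; the function name `rouge1` and the whitespace tokenization are assumptions.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F-measure (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical candidate summary and reference summary.
print(rouge1("the cat sat", "the cat sat on the mat"))
```

In this toy case every candidate word appears in the reference, so precision is 1.0, while only half of the reference words are covered, so recall is 0.5.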

