Machine Learning Based Extractive Text Summarization Using Document Aware and Document Unaware Features

  • Muhammad Ammar Saleem
  • Junaid Shuja
  • Mohammad Ali Humayun
  • Saad Bin Ahmed
  • Raja Wasim Ahmad
  • National University of Computer and Emerging Science
  • Universiti Teknologi Petronas
  • University of the Punjab
  • Lakehead University

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Scopus citations

Abstract

Automatic text summarization is a natural language processing problem useful for education and journalism. Text summarization for low-resource languages is challenging, mainly because text processing tools and large datasets are lacking. This research studies the impact of textual features on extractive text summarization of low-resource languages, using Urdu extractive text summarization as the task. The proposed method extracts textual features that better represent the document’s context, helping to predict the summary with greater accuracy. These document-aware features include cosine-position, relative length, ratio of part of speech, ratio of numerical data, and TF-ISF. A support vector regression model is trained on the extracted document-aware features. The trained model is then used to predict the summary for the original Urdu text in the test document. ROUGE-1 and ROUGE-2 are the metrics used to evaluate summary quality.
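
The abstract names the features and metrics but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes whitespace tokenization, a cosine-position score that peaks at the first and last sentence, relative length as sentence length divided by the document's mean sentence length, and TF-ISF as the mean of term frequency times log inverse sentence frequency. All function names are hypothetical. The part-of-speech ratio is omitted because it needs an Urdu POS tagger, and ROUGE-N is shown as plain n-gram recall.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace tokenization; real Urdu text needs a proper tokenizer.
    return sentence.split()


def cosine_position(index, n_sentences):
    # Assumed definition: 1.0 at the first and last sentence, 0.0 in the middle.
    if n_sentences < 2:
        return 1.0
    return (math.cos(2 * math.pi * index / (n_sentences - 1)) + 1) / 2


def relative_length(tokens, avg_len):
    # Sentence length relative to the document's mean sentence length.
    return len(tokens) / avg_len if avg_len else 0.0


def numeric_ratio(tokens):
    # Share of tokens that contain at least one digit.
    if not tokens:
        return 0.0
    return sum(any(ch.isdigit() for ch in t) for t in tokens) / len(tokens)


def tf_isf(tokens, sentence_freq, n_sentences):
    # Mean over the sentence's tokens of tf * log(N / sentence frequency).
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    total = sum(tf[w] * math.log(n_sentences / sentence_freq[w]) for w in tf)
    return total / len(tokens)


def sentence_features(sentences):
    # One feature vector per sentence, computed with knowledge of the whole document.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    sentence_freq = Counter(w for t in tokenized for w in set(t))
    return [
        [
            cosine_position(i, n),
            relative_length(t, avg_len),
            numeric_ratio(t),
            tf_isf(t, sentence_freq, n),
        ]
        for i, t in enumerate(tokenized)
    ]


def rouge_n_recall(candidate, reference, n=1):
    # Clipped n-gram overlap divided by the number of reference n-grams.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

Feature vectors of this kind could be paired with per-sentence target scores and passed to a regressor such as a support vector regression model; the highest-scoring sentences would then form the extractive summary. How the targets are built is not stated in the abstract.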

Original language: English
Title of host publication: Studies in Systems, Decision and Control
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 143-158
Number of pages: 16
State: Published - 2024

Publication series

Name: Studies in Systems, Decision and Control
Volume: 553
ISSN (Print): 2198-4182
ISSN (Electronic): 2198-4190

Keywords

  • Feature extraction
  • Machine learning
  • Support vector machine
  • Text summarizations
  • low-resource language
