Automatic Summarization of Financial Reports
Abstract
The field of Natural Language Processing (NLP) has witnessed substantial advancements
due to both the development of new algorithms and the increase in
computational capacity. As a result, novel NLP techniques have been successfully
applied in various fields such as medicine, law, and finance.
Companies and corporations publish several financial reports and disclosures every
year to meet the requirements and needs of law enforcement entities, analysts, customers,
and stakeholders. Processing these materials requires a tremendous amount
of time and manual labor, especially given the unwieldy volume of data accumulated
over the years. This thesis attempts to exploit the potential of modern NLP
techniques to summarize financial reports automatically. We use RoBERTa, an
attention-based language model, to build a financially oriented
language model. This model is combined with various non-parametric methods to
generate sentence vectors that facilitate the efficient selection of important sentences
in the document as part of the summarization algorithm. We illustrate the
superiority of a financially tuned language model. We further show that simple
non-parametric sentence embeddings achieve ROUGE scores that are comparable to
or even better than those produced by Sentence Transformer and Universal Embeddings.
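
The extractive pipeline outlined above (embed each sentence, select the most important ones, score the result with ROUGE) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: normalized bag-of-words counts stand in for the token vectors that the financially tuned RoBERTa model would supply, sentences are ranked by cosine similarity to the document centroid, and only a simplified ROUGE-1 recall is computed. The function names (embed, summarize, rouge1_recall) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(sentence, vocab):
    """Non-parametric sentence vector: L2-normalized bag-of-words counts.

    Assumption: a stand-in for pooling token vectors from a language model.
    """
    counts = Counter(tokenize(sentence))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summarize(sentences, k=2):
    """Return the k sentences closest to the document centroid, in document order."""
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    vectors = [embed(s, vocab) for s in sentences]
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    scores = [cosine(v, centroid) for v in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: share of reference unigrams found in the candidate."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum(min(cand[w], c) for w, c in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    report = [
        "Revenue grew ten percent in 2020.",
        "Revenue growth came from new customers.",
        "Weather remained pleasant.",
    ]
    summary = " ".join(summarize(report, k=2))
    print(summary)
    print(rouge1_recall(summary, "Revenue grew thanks to new customers."))
```

Replacing embed with a function that pools contextual token vectors would leave the selection and scoring steps unchanged, which is what makes different sentence embeddings directly comparable under the same summarization procedure.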