Automatic Summarization of Financial Reports
The field of Natural Language Processing (NLP) has advanced substantially thanks to both the development of new algorithms and the growth in computational capacity. As a result, novel NLP techniques have been successfully applied in fields such as medicine, law, and finance. Companies and corporations publish several financial reports and disclosures every year to meet the requirements and needs of law enforcement entities, analysts, customers, and stakeholders. Processing these materials requires a tremendous amount of time and manual labor, especially given the unwieldy volume of data accumulated over the years. This thesis attempts to exploit the potential of modern NLP techniques to summarize financial reports automatically. We use an attention-based language model, RoBERTa, to build a financially oriented language model. This model is combined with various non-parametric methods to generate sentence vectors, which facilitate the efficient selection of important sentences in the document as part of the summarization algorithm. We illustrate the advantage of a financially tuned language model for this task. Further, we show that simple non-parametric sentence embeddings achieve ROUGE scores that are comparable to, or even better than, those produced by Sentence Transformer and Universal Embeddings.
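To make the idea of selecting sentences from non-parametric sentence vectors more concrete, the following is a minimal sketch and not the method used in the thesis. It assumes mean pooling of token vectors as the non-parametric embedding and cosine similarity to the document centroid as the importance score; the abstract specifies neither choice. The token vectors are toy two-dimensional values standing in for the outputs of a language model such as the financially tuned RoBERTa, and the function names `mean_pool`, `cosine`, and `select_sentences` are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors (no trainable parameters)."""
    dim = len(vectors[0])
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_sentences(sentence_token_vectors, k):
    """Return indices of the k sentences closest to the document centroid.

    Assumption: importance is approximated by similarity to the centroid.
    Indices are returned in document order so the summary reads naturally.
    """
    sentence_vectors = [mean_pool(tokens) for tokens in sentence_token_vectors]
    centroid = mean_pool(sentence_vectors)
    scores = [cosine(vec, centroid) for vec in sentence_vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy document: three sentences, each a list of 2-d token vectors
# (stand-ins for language-model outputs, not real embeddings).
toy_document = [
    [[1.0, 0.0], [0.9, 0.1]],  # sentence 0
    [[0.0, 1.0]],              # sentence 1, pointing away from the others
    [[0.8, 0.2], [1.0, 0.0]],  # sentence 2
]

summary_indices = select_sentences(toy_document, k=2)
print(summary_indices)
```

In this toy input, sentences 0 and 2 lie closest to the centroid and are selected. In a real pipeline the chosen sentences would then be compared against reference summaries with ROUGE, as the abstract describes.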