Optimizing Inference in Bayesian Networks: From Join Tree Propagation to Deep Learning

Date
2020-03
Authors
dos Santos, Andre Evaristo
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

Many different platforms, techniques, and concepts can be employed while modelling and reasoning with Bayesian networks (BNs). A problem domain is modelled initially as a directed acyclic graph (DAG) and the strengths of relationships are quantified by conditional probability tables (CPTs). We consider four perspectives on BN inference. First, a central task in discrete BN inference is to determine those variables relevant to answering a given query. Two linear algorithms for this task explore the possibly relevant and active parts of a BN, respectively. We empirically compare these two methods along with a variation of each. Second, we turn to BN inference using message passing in a join tree. Here, we propose Simple Propagation (SP) as a new join tree propagation algorithm for exact inference in discrete Bayesian networks. We establish the correctness of SP. The striking feature of SP is that its message construction exploits the factorization of potentials at a sending node, but without the overhead of building and examining graphs as done in Lazy Propagation (LP). Experimental results on optimal (or close to optimal) join trees built from numerous benchmark Bayesian networks show that SP is often faster than LP. Sum-Product Networks (SPNs) are a probabilistic graphical model with deep learning applications. A key feature of an SPN is that inference is tractable under certain structural constraints. This is a strong feature when compared to BNs, where inference is NP-hard. An SPN's internal nodes can be understood as a recording of marginal inference in a BN. This is the third perspective. SPNs have been shown to excel at many tasks, even achieving state-of-the-art results on important tasks such as image completion. Even though SPNs pose a clear advantage over BNs, the latter still provide a clearer illustration of influence among the variables they represent. 
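As an illustrative sketch (not code from the thesis), the tractability claim can be seen in a toy SPN over two binary variables: a single bottom-up pass over sum and product nodes evaluates any joint or marginal query in time linear in the number of edges. All weights and variable names below are hypothetical, and the network follows the standard SPN formulation with indicator leaves.

```python
# Toy SPN over binary variables X1, X2 (illustrative weights only).
# Leaves are indicators; internal nodes are weighted sums or products.
# One bottom-up pass answers a query -- this linear-time evaluation is
# what makes SPN inference tractable under the usual structural
# constraints (completeness and decomposability).

def weighted_sum(weights, children):
    return sum(w * c for w, c in zip(weights, children))

def spn(x1, x2):
    """Evaluate the toy SPN. Pass None for a variable to marginalize it out."""
    # Indicator leaves: 1.0 if the variable takes the value, else 0.0.
    # Setting both indicators of a variable to 1.0 sums that variable out.
    l1_t, l1_f = (1.0 if x1 in (1, None) else 0.0), (1.0 if x1 in (0, None) else 0.0)
    l2_t, l2_f = (1.0 if x2 in (1, None) else 0.0), (1.0 if x2 in (0, None) else 0.0)
    # Sum nodes mix the indicators of each variable (hypothetical weights).
    s1 = weighted_sum([0.6, 0.4], [l1_t, l1_f])
    s2 = weighted_sum([0.3, 0.7], [l2_t, l2_f])
    # Product node combines the two independent sub-SPNs.
    return s1 * s2

print(spn(1, 0))      # joint P(X1=1, X2=0) = 0.6 * 0.7 = 0.42
print(spn(1, None))   # marginal P(X1=1) = 0.6 (X2 summed out)
```

The same pass answers both queries; by contrast, the corresponding computation in a BN may require summing over exponentially many configurations.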
To take advantage of both, we use an SPN to perform inference, but utilize BNs to observe the bacterial relationships in soil datasets. We first learn a BN to read, in linear time, independencies between bacterial community characteristics. These independencies are useful in understanding the semantic relationships between bacteria within communities. Next, we learn an SPN to perform efficient inference. Here, inference can be conducted to answer traditional queries, involving marginals, or most probable explanation (MPE) queries, requesting the most likely values of the non-evidence variables given evidence. Our results extend the literature by showing that known relationships between soil bacteria holding in one or a few datasets, in fact, hold across at least 3500 diverse datasets. This study paves the way for future large-scale studies of agricultural, health, and environmental applications, for which data are publicly available. In an SPN, leaf nodes are indicator variables for each value that a random variable can assume, and the remaining nodes are either sum or product nodes. As a contribution to SPN inference, we derive partial propagation (PP), which performs exact SPN inference without requiring a full propagation over all nodes in the SPN, as is currently done. Experimental results on SPN datasets demonstrate that PP has several advantages over full propagation in SPNs, including relative time savings, absolute time savings in large SPNs, and scalability. Finally, as the fourth perspective on BN inference, we give conditions under which convolutional neural networks (CNNs) define valid SPNs. One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors but can also suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, called deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. 
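The MPE queries mentioned above can be sketched with the standard max-product evaluation: on the upward pass, each sum node takes a weighted maximum instead of a weighted sum, and the maximizing child is then followed downward to read off the most likely assignment. The sketch below is a hypothetical toy (not the thesis algorithm or its data) over a degenerate SPN whose variables are independent, so each sum node can be maximized locally; `WEIGHTS` and the variable names are invented for illustration.

```python
# Illustrative max-product MPE sketch over a toy, fully factorized SPN.
# Each variable has one sum node mixing its two indicator leaves, so the
# maximizing child of each sum node directly gives the MPE value.

WEIGHTS = {"X1": [0.6, 0.4], "X2": [0.3, 0.7]}  # hypothetical weights for values 1, 0

def mpe(evidence):
    """Return the most likely value (1 or 0) of each non-evidence variable,
    keeping observed variables fixed at their evidence values."""
    assignment = {}
    for var, (w_true, w_false) in WEIGHTS.items():
        if var in evidence:
            assignment[var] = evidence[var]  # evidence clamps the indicators
        else:
            # Max-product step: follow the sum node's maximizing child.
            assignment[var] = 1 if w_true >= w_false else 0
    return assignment

print(mpe({"X1": 1}))  # {'X1': 1, 'X2': 0} -- X2's heavier weight sits on value 0
```

In a general SPN the maximizing choices interact through shared product nodes, so the upward max pass and downward backtracking must traverse the whole network rather than decide each variable in isolation.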
One salient feature of DCSPNs is that they remain rigorous probabilistic models. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and MPE inference. This enables an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a role similar to that of the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. The results on image sampling were encouraging, and the sampled images exhibited variability, a salient attribute.

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science, University of Regina. x, 97 p.