Optimizing Rough Set Flow Graph Inference

Date
2018-09
Authors
Wang, Jun
Journal Title
Journal ISSN
Volume Title
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

Bayesian networks (BNs) are a tool for managing uncertainty. A BN consists of a directed acyclic graph (DAG) and a corresponding set of conditional probability tables (CPTs). The probabilistic conditional independencies encoded in the DAG indicate that the product of the CPTs is a joint probability distribution. A central step in probabilistic inference is to eliminate variables. More specifically, every variable in the BN, but not appearing in a given query, must necessarily be eliminated. Eliminating these variables in any order will yield a correct result. It is well-known, however, that the order in which these variables are eliminated can dramatically affect the amount of computation needed to answer the query. Hence, several heuristics have been developed for determining good elimination orderings in BN inference. Rough sets are another tool for managing uncertainty. The basic idea of rough set theory is to approximate an undefinable set by two definable sets, called upper and lower approximations. Rough set ow graphs (RSFGs), introduced by Pawlak, are a graphical framework for managing uncertainty. Pawlak also investigated an accompanying inference algorithm to answer a query in a RSFG. Pawlak's algorithm eliminates variables all at once, which is computationally expensive. A more efficient algorithm was proposed later that eliminates the variables one-by-one. It was shown that the new algorithm never performs more work than Pawlak's algorithm. However, one open problem is to determine good elimination orderings in RSFG inference. In this thesis, we optimize RSFG inference and make following contributions. First, we introduce the notion of barren variables and exploit them by safely removing them without any numeric computation in computer memory. Second, we propose prune independent-by-evidence variable and exploit them in the same way as for barren variables. Third, we sum out all remaining variables by proposing heuristics to determine good elimination orderings for RSFG inference. Broadly speaking, our heuristics can be classified into domain heuristics and edge heuristics. Domain heuristics determine elimination orderings based upon the domain cardinalities of the variables to be eliminated. On the other hand, the graphical structure of the RSFG with respect to edges are the key drivers for computing elimination orderings with edge heuristics. Analysis of our experimental results suggests the edge heuristic appears to yield good elimination orderings, while domain heuristics are somewhat ineffective.

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. xiii, 72 p.
Keywords
Citation
Collections