Tractable Probabilistic Models for Ethical AI

Among the many ethical dimensions that arise in the use of ML technology, three stand out as immediate and profound: enabling the interpretability of the underlying decision boundary, addressing the potential for learned algorithms to become biased against certain groups, and capturing blame and responsibility for a system's outcomes. In this talk, we advocate for a research program that seeks to bridge tractable (probabilistic) models for knowledge acquisition with rich models of autonomous agency that draw on philosophical notions of beliefs, intentions, causes and effects.


Motivation
Machine learning (ML) techniques have become pervasive across a range of applications, and are now widely used in areas as disparate as recidivism prediction, consumer credit-risk analysis, and insurance pricing [8,25]. Likewise, in the physical world, machine learning models are critical components in autonomous agents such as robotic surgeons and self-driving cars. Among the many societal and ethical dimensions that arise in the use of ML technology in such applications, three stand out as immediate and profound. First, to increase trust and accommodate human insight, interpretability of the underlying decision boundary is essential. Second, learned algorithms can become biased against certain groups, and this potential needs to be addressed. Third, insofar as the decisions of ML models impact society, both virtually (e.g., denying a loan) and physically (e.g., accidentally driving into a pedestrian), enabling the attribution of blame and responsibility is a significant challenge.
Many definitions have been proposed in the literature for such ethical considerations [2,19], but there is considerable debate about whether a formal notion is appropriate at all, given the rich social contexts that occur in human-machine interactions. Valid arguments are also made about the challenges of model building and deployment [10,11]: everything from data collection to ascribing responsibility when technology goes awry can demonstrate and amplify abuse of power and privilege. Such issues are deeply intertwined with legal and regulatory problems [15,32].
Be that as it may, what steps can be taken to make ethical decision-making a reality in AI systems? Human-in-the-loop systems are arguably required given the aforementioned debate [24,34], but such loops still need to interface with an automated system of considerable sophistication that at the very least reasons about the possible set of actions. In particular, simply delegating responsibility for critical decisions to humans in an ad hoc fashion can be problematic. Critical actions can often be hard to identify immediately, and it is only the ramifications of those actions that raise alarm, by which point it might be too late for the human to intervene. Moreover, understanding the model's rationale is a challenge in itself, as represented by the burgeoning field of explainable artificial intelligence [4,14,29]. So a careful delineation is needed as to which parts are automated, which parts are delegated to humans, and which parts can be obtained from humans a priori (i.e., so-called knowledge-enhanced machine learning [9]), but also as to how systems can be made to reason about their environment so that they are able to capture and deliberate on their choices, however limited their awareness of the world might be. At the very least, the latter capacity offers an additional layer of protection, control and explanation before delegating, as the systems can point out which beliefs and observations led to their actions.
[...] and finite-domain relational logic, and so can represent certain types of knowledge representation languages. They can also be learned from data. We report on a few preliminary results [7,12,17,23,26,27,28,33,35]. Firstly, we discuss results on studying causality-related properties in such models, and on extracting counterfactual explanations from them. On the topic of fairness, it is shown that the approach enables an effective technique for determining the statistical relationships between protected attributes and other training variables. This could then be applied as a pre-processing step for computing fair models. On the topic of moral responsibility, it is shown how models of moral scenarios and blameworthiness can be extracted and learnt automatically from data, as well as how judgements can be computed effectively. In both themes, the learning of the model can be conditioned on expert knowledge, allowing us to represent and reason about the domain of interest in a principled fashion.
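To make the fairness pre-processing idea above more concrete, the following is a minimal sketch and not the method of the cited results. It swaps in plain empirical mutual information (rather than a learned tractable model) to measure how strongly each training variable depends on a protected attribute. The function names, the threshold and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        mi += p_xy * log2(count * n / (px[x] * py[y]))
    return mi


def flag_dependent_features(rows, protected, threshold=0.05):
    """Return {feature: MI} for features whose MI with `protected` exceeds threshold.

    `threshold` is an arbitrary illustrative value, not a recommended setting.
    """
    protected_column = [row[protected] for row in rows]
    flagged = {}
    for name in rows[0]:
        if name == protected:
            continue
        mi = mutual_information(protected_column, [row[name] for row in rows])
        if mi > threshold:
            flagged[name] = mi
    return flagged


# Hypothetical toy records: "postcode" is a perfect proxy for "group",
# while "income" is statistically independent of it.
rows = [
    {"group": "a", "postcode": "X", "income": "high"},
    {"group": "a", "postcode": "X", "income": "low"},
    {"group": "b", "postcode": "Y", "income": "high"},
    {"group": "b", "postcode": "Y", "income": "low"},
]
flagged = flag_dependent_features(rows, "group")
```

In a pipeline of this kind, the flagged variables would be candidates for removal or adjustment before a downstream model is trained. A pairwise score of this sort only detects dependence between single variables, which is one reason a joint model of the data is attractive for the task.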

Closing Remarks
We conclude with key observations about the interplay between tractability, learning and knowledge representation in the context of ethical decision-making. Among other things, we observe that the tractable model paradigm is in its early years, at least as far as capturing a broad range of knowledge representation languages is concerned, and moreover, there is altogether less emphasis on mental modeling and agency. (First-order expressiveness is yet another dimension for allowing richness in specifications, as are proposals with an explicit causal theory such as [30].) In contrast, readers may want to consult discussions in [23,24] on knowledge representation approaches where a more comprehensive model of the environment and its actors is considered, but where knowledge acquisition and learning are used in careful, limited ways.
Analogously, we observe that although many expressive languages [6,18] are known to compile to tractable models, this is purely from the viewpoint of reasoning, or more precisely, probabilistic query computation. What is likely needed is a set of strategies for reversing this pipeline: from a learned tractable model, we need to be able to infer high-level representations. In the absence of general strategies of that sort, the more modest proposal is perhaps to specify high-level patterns declaratively while allowing low-level patterns to be learnt, with both then compiled together for tractable inference.
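As a toy illustration of interleaving declarative and learned components, the sketch below hand-builds a very small probabilistic circuit over two binary variables. It is an assumption-laden example rather than a compilation procedure from the cited work: the rule "A implies B" is fixed by hand in the structure, the remaining parameters are estimated from hypothetical data by simple counting, and all class and variable names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Evidence = Dict[str, int]


@dataclass
class Leaf:
    var: str
    p_true: float  # probability that `var` equals 1

    def value(self, ev: Evidence) -> float:
        if self.var not in ev:
            return 1.0  # unobserved variable is marginalised out
        return self.p_true if ev[self.var] == 1 else 1.0 - self.p_true


@dataclass
class Product:
    children: List[Leaf]

    def value(self, ev: Evidence) -> float:
        out = 1.0
        for child in self.children:
            out *= child.value(ev)
        return out


@dataclass
class Sum:
    weighted: List[Tuple[float, Product]]  # (mixture weight, child)

    def value(self, ev: Evidence) -> float:
        return sum(w * child.value(ev) for w, child in self.weighted)


# Hypothetical observations of (A, B).
data = [(1, 1), (1, 1), (0, 1), (0, 0), (0, 0)]

# Learned parameters (maximum-likelihood counts).
w = sum(a for a, _ in data) / len(data)            # P(A = 1)
b_given_not_a = [b for a, b in data if a == 0]
theta = sum(b_given_not_a) / len(b_given_not_a)    # P(B = 1 | A = 0)

circuit = Sum([
    # Declarative part: whenever A = 1, B = 1 with certainty ("A implies B").
    (w, Product([Leaf("A", 1.0), Leaf("B", 1.0)])),
    # Learned part: when A = 0, B follows the estimated Bernoulli(theta).
    (1.0 - w, Product([Leaf("A", 0.0), Leaf("B", theta)])),
])

# Each query is a single bottom-up pass over the circuit.
p_b = circuit.value({"B": 1})                 # marginal P(B = 1)
p_violation = circuit.value({"A": 1, "B": 0})  # probability of violating the rule
p_a_given_b = circuit.value({"A": 1, "B": 1}) / p_b
```

Marginal and conditional queries here cost one pass each, linear in the size of the circuit, and the hand-specified rule holds regardless of what the data would have suggested.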
Overall, the discussed results can be seen as occupying positions on a spectrum: the fairness result simply provides a way to accomplish de-biasing, but does not engage with a specification of the users or the environment in any concrete way. Thus, it is closer to the mainstream fairness literature. The moral reasoning result is richer in that sense, as it explicitly accounts for actors and their actions in the environment. However, it does not explicitly infer how these actions and effects might have come about (these might be acquired via learning, for example), nor does it reason about what role these actions play amongst multiple actors in the environment. Thus, in the long run, richer formal systems are clearly needed, which might account for sequential actions [3] and multiple agents [20].
However, this returns us to the problem that tractability and knowledge acquisition are not addressed in such proposals. So, the question is this: can we find ways to combine tractable probabilistic models (or other structures with analogous properties) with such rich formal systems? As mentioned, it is known that certain probabilistic logical theories can be reduced to such structures, so perhaps gentle extensions to those theories might suggest ways to integrate causal epistemic models and tractable learning.
Beyond that technical front, much work remains to be done, of course, in terms of delineating automated decision-making from delegation and notions of accountability [13]. It is also worth remarking that computational solutions of the sort discussed in the previous section do make strong assumptions about the environment in which the learning and acting happen. In a general setting, even data collection can amplify positions of privilege, and moreover, there are multiple opportunities for failure and misspecification [10,11]. Orchestrating a framework where this kind of information and knowledge can be communicated back to the automated system is not at all obvious, and is an open challenge. In that regard, the two-pronged approach is not advocated as a solution to such broader problems, and indeed, it is unclear whether abstract models can imbibe cultural and sociopolitical contexts in a straightforward manner. However, it at least allows us to specify norms for human-machine interaction, provide goals and situations to achieve, model the machine's beliefs, and allow the machine to entertain models of the user's knowledge. Ultimately, the hope is that the expressiveness argued for offers additional protection, control and explanation during the deployment of complex systems with machine learning components.