d_separation.py 25 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677
  1. """
  2. Algorithm for testing d-separation in DAGs.
  3. *d-separation* is a test for conditional independence in probability
  4. distributions that can be factorized using DAGs. It is a purely
  5. graphical test that uses the underlying graph and makes no reference
  6. to the actual distribution parameters. See [1]_ for a formal
  7. definition.
  8. The implementation is based on the conceptually simple linear time
  9. algorithm presented in [2]_. Refer to [3]_, [4]_ for a couple of
  10. alternative algorithms.
  11. The functional interface in NetworkX consists of three functions:
  12. - `find_minimal_d_separator` returns a minimal d-separator set ``z``.
  13. That is, removing any node or nodes from it makes it no longer a d-separator.
  14. - `is_d_separator` checks if a given set is a d-separator.
  15. - `is_minimal_d_separator` checks if a given set is a minimal d-separator.
  16. D-separators
  17. ------------
  18. Here, we provide a brief overview of d-separation and related concepts that
  19. are relevant for understanding it:
  20. The ideas of d-separation and d-connection relate to paths being open or blocked.
  21. - A "path" is a sequence of nodes connected in order by edges. Unlike for most
  22. graph theory analysis, the direction of the edges is ignored. Thus the path
  23. can be thought of as a traditional path on the undirected version of the graph.
  24. - A "candidate d-separator" ``z`` is a set of nodes being considered as
  25. possibly blocking all paths between two prescribed sets ``x`` and ``y`` of nodes.
  26. We refer to each node in the candidate d-separator as "known".
  27. - A "collider" node on a path is a node that is a successor of its two neighbor
  28. nodes on the path. That is, ``c`` is a collider if the edge directions
  29. along the path look like ``... u -> c <- v ...``.
  30. - If a collider node or any of its descendants are "known", the collider
  31. is called an "open collider". Otherwise it is a "blocking collider".
  32. - Any path can be "blocked" in two ways. If the path contains a "known" node
  33. that is not a collider, the path is blocked. Also, if the path contains a
  34. collider that is not a "known" node, the path is blocked.
  35. - A path is "open" if it is not blocked. That is, it is open if every node is
  36. either an open collider or not a "known". Said another way, every
  37. "known" in the path is a collider and every collider is open (has a
  38. "known" as a inclusive descendant). The concept of "open path" is meant to
  39. demonstrate a probabilistic conditional dependence between two nodes given
  40. prescribed knowledge ("known" nodes).
  41. - Two sets ``x`` and ``y`` of nodes are "d-separated" by a set of nodes ``z``
  42. if all paths between nodes in ``x`` and nodes in ``y`` are blocked. That is,
  43. if there are no open paths from any node in ``x`` to any node in ``y``.
  44. Such a set ``z`` is a "d-separator" of ``x`` and ``y``.
  45. - A "minimal d-separator" is a d-separator ``z`` for which no node or subset
  46. of nodes can be removed with it still being a d-separator.
  47. The d-separator blocks some paths between ``x`` and ``y`` but opens others.
  48. Nodes in the d-separator block paths if the nodes are not colliders.
  49. But if a collider or its descendant nodes are in the d-separation set, the
  50. colliders are open, allowing a path through that collider.
  51. Illustration of D-separation with examples
  52. ------------------------------------------
  53. A pair of two nodes, ``u`` and ``v``, are d-connected if there is a path
  54. from ``u`` to ``v`` that is not blocked. That means, there is an open
  55. path from ``u`` to ``v``.
  56. For example, if the d-separating set is the empty set, then the following paths are
  57. open between ``u`` and ``v``:
  58. - u <- n -> v
  59. - u -> w -> ... -> n -> v
  60. If on the other hand, ``n`` is in the d-separating set, then ``n`` blocks
  61. those paths between ``u`` and ``v``.
  62. Colliders block a path if they and their descendants are not included
  63. in the d-separating set. An example of a path that is blocked when the
  64. d-separating set is empty is:
  65. - u -> w -> ... -> n <- v
  66. The node ``n`` is a collider in this path and is not in the d-separating set.
  67. So ``n`` blocks this path. However, if ``n`` or a descendant of ``n`` is
  68. included in the d-separating set, then the path through the collider
  69. at ``n`` (... -> n <- ...) is "open".
  70. D-separation is concerned with blocking all paths between nodes from ``x`` to ``y``.
  71. A d-separating set between ``x`` and ``y`` is one where all paths are blocked.
  72. D-separation and its applications in probability
  73. ------------------------------------------------
  74. D-separation is commonly used in probabilistic causal-graph models. D-separation
  75. connects the idea of probabilistic "dependence" with separation in a graph. If
  76. one assumes the causal Markov condition [5]_, (every node is conditionally
  77. independent of its non-descendants, given its parents) then d-separation implies
  78. conditional independence in probability distributions.
  79. Symmetrically, d-connection implies dependence.
  80. The intuition is as follows. The edges on a causal graph indicate which nodes
  81. influence the outcome of other nodes directly. An edge from u to v
  82. implies that the outcome of event ``u`` influences the probabilities for
  83. the outcome of event ``v``. Certainly knowing ``u`` changes predictions for ``v``.
  84. But also knowing ``v`` changes predictions for ``u``. The outcomes are dependent.
  85. Furthermore, an edge from ``v`` to ``w`` would mean that ``w`` and ``v`` are dependent
  86. and thus that ``u`` could indirectly influence ``w``.
  87. Without any knowledge about the system (candidate d-separating set is empty)
  88. a causal graph ``u -> v -> w`` allows all three nodes to be dependent. But
  89. if we know the outcome of ``v``, the conditional probabilities of outcomes for
  90. ``u`` and ``w`` are independent of each other. That is, once we know the outcome
  91. for ``v``, the probabilities for ``w`` do not depend on the outcome for ``u``.
  92. This is the idea behind ``v`` blocking the path if it is "known" (in the candidate
  93. d-separating set).
  94. The same argument works whether the direction of the edges are both
  95. left-going and when both arrows head out from the middle. Having a "known"
  96. node on a path blocks the collider-free path because those relationships
  97. make the conditional probabilities independent.
  98. The direction of the causal edges does impact dependence precisely in the
  99. case of a collider e.g. ``u -> v <- w``. In that situation, both ``u`` and ``w``
  100. influence ``v``. But they do not directly influence each other. So without any
  101. knowledge of any outcomes, ``u`` and ``w`` are independent. That is the idea behind
  102. colliders blocking the path. But, if ``v`` is known, the conditional probabilities
  103. of ``u`` and ``w`` can be dependent. This is the heart of Berkson's Paradox [6]_.
  104. For example, suppose ``u`` and ``w`` are boolean events (they either happen or do not)
  105. and ``v`` represents the outcome "at least one of ``u`` and ``w`` occur". Then knowing
  106. ``v`` is true makes the conditional probabilities of ``u`` and ``w`` dependent.
  107. Essentially, knowing that at least one of them is true raises the probability of
  108. each. But further knowledge that ``w`` is true (or false) change the conditional
  109. probability of ``u`` to either the original value or 1. So the conditional
  110. probability of ``u`` depends on the outcome of ``w`` even though there is no
  111. causal relationship between them. When a collider is known, dependence can
  112. occur across paths through that collider. This is the reason open colliders
  113. do not block paths.
  114. Furthermore, even if ``v`` is not "known", if one of its descendants is "known"
  115. we can use that information to know more about ``v`` which again makes
  116. ``u`` and ``w`` potentially dependent. Suppose the chance of ``n`` occurring
  117. is much higher when ``v`` occurs ("at least one of ``u`` and ``w`` occur").
  118. Then if we know ``n`` occurred, it is more likely that ``v`` occurred and that
  119. makes the chance of ``u`` and ``w`` dependent. This is the idea behind why
  120. a collider does no block a path if any descendant of the collider is "known".
  121. When two sets of nodes ``x`` and ``y`` are d-separated by a set ``z``,
  122. it means that given the outcomes of the nodes in ``z``, the probabilities
  123. of outcomes of the nodes in ``x`` are independent of the outcomes of the
  124. nodes in ``y`` and vice versa.
  125. Examples
  126. --------
  127. A Hidden Markov Model with 5 observed states and 5 hidden states
  128. where the hidden states have causal relationships resulting in
  129. a path results in the following causal network. We check that
  130. early states along the path are separated from late state in
  131. the path by the d-separator of the middle hidden state.
  132. Thus if we condition on the middle hidden state, the early
  133. state probabilities are independent of the late state outcomes.
  134. >>> G = nx.DiGraph()
  135. >>> G.add_edges_from(
  136. ... [
  137. ... ("H1", "H2"),
  138. ... ("H2", "H3"),
  139. ... ("H3", "H4"),
  140. ... ("H4", "H5"),
  141. ... ("H1", "O1"),
  142. ... ("H2", "O2"),
  143. ... ("H3", "O3"),
  144. ... ("H4", "O4"),
  145. ... ("H5", "O5"),
  146. ... ]
  147. ... )
  148. >>> x, y, z = ({"H1", "O1"}, {"H5", "O5"}, {"H3"})
  149. >>> nx.is_d_separator(G, x, y, z)
  150. True
  151. >>> nx.is_minimal_d_separator(G, x, y, z)
  152. True
  153. >>> nx.is_minimal_d_separator(G, x, y, z | {"O3"})
  154. False
  155. >>> z = nx.find_minimal_d_separator(G, x | y, {"O2", "O3", "O4"})
  156. >>> z == {"H2", "H4"}
  157. True
  158. If no minimal_d_separator exists, `None` is returned
  159. >>> other_z = nx.find_minimal_d_separator(G, x | y, {"H2", "H3"})
  160. >>> other_z is None
  161. True
  162. References
  163. ----------
  164. .. [1] Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
  165. .. [2] Darwiche, A. (2009). Modeling and reasoning with Bayesian networks.
  166. Cambridge: Cambridge University Press.
  167. .. [3] Shachter, Ross D. "Bayes-ball: The rational pastime (for
  168. determining irrelevance and requisite information in belief networks
  169. and influence diagrams)." In Proceedings of the Fourteenth Conference
  170. on Uncertainty in Artificial Intelligence (UAI), (pp. 480–487). 1998.
  171. .. [4] Koller, D., & Friedman, N. (2009).
  172. Probabilistic graphical models: principles and techniques. The MIT Press.
  173. .. [5] https://en.wikipedia.org/wiki/Causal_Markov_condition
  174. .. [6] https://en.wikipedia.org/wiki/Berkson%27s_paradox
  175. """
  176. from collections import deque
  177. from itertools import chain
  178. import networkx as nx
  179. from networkx.utils import UnionFind, not_implemented_for
  180. __all__ = [
  181. "is_d_separator",
  182. "is_minimal_d_separator",
  183. "find_minimal_d_separator",
  184. ]
  185. @not_implemented_for("undirected")
  186. @nx._dispatchable
  187. def is_d_separator(G, x, y, z):
  188. """Return whether node sets `x` and `y` are d-separated by `z`.
  189. Parameters
  190. ----------
  191. G : nx.DiGraph
  192. A NetworkX DAG.
  193. x : node or set of nodes
  194. First node or set of nodes in `G`.
  195. y : node or set of nodes
  196. Second node or set of nodes in `G`.
  197. z : node or set of nodes
  198. Potential separator (set of conditioning nodes in `G`). Can be empty set.
  199. Returns
  200. -------
  201. b : bool
  202. A boolean that is true if `x` is d-separated from `y` given `z` in `G`.
  203. Raises
  204. ------
  205. NetworkXError
  206. The *d-separation* test is commonly used on disjoint sets of
  207. nodes in acyclic directed graphs. Accordingly, the algorithm
  208. raises a :exc:`NetworkXError` if the node sets are not
  209. disjoint or if the input graph is not a DAG.
  210. NodeNotFound
  211. If any of the input nodes are not found in the graph,
  212. a :exc:`NodeNotFound` exception is raised
  213. Notes
  214. -----
  215. A d-separating set in a DAG is a set of nodes that
  216. blocks all paths between the two sets. Nodes in `z`
  217. block a path if they are part of the path and are not a collider,
  218. or a descendant of a collider. Also colliders that are not in `z`
  219. block a path. A collider structure along a path
  220. is ``... -> c <- ...`` where ``c`` is the collider node.
  221. https://en.wikipedia.org/wiki/Bayesian_network#d-separation
  222. """
  223. try:
  224. x = {x} if x in G else x
  225. y = {y} if y in G else y
  226. z = {z} if z in G else z
  227. intersection = x & y or x & z or y & z
  228. if intersection:
  229. raise nx.NetworkXError(
  230. f"The sets are not disjoint, with intersection {intersection}"
  231. )
  232. set_v = x | y | z
  233. if set_v - G.nodes:
  234. raise nx.NodeNotFound(f"The node(s) {set_v - G.nodes} are not found in G")
  235. except TypeError:
  236. raise nx.NodeNotFound("One of x, y, or z is not a node or a set of nodes in G")
  237. if not nx.is_directed_acyclic_graph(G):
  238. raise nx.NetworkXError("graph should be directed acyclic")
  239. # contains -> and <-> edges from starting node T
  240. forward_deque = deque([])
  241. forward_visited = set()
  242. # contains <- and - edges from starting node T
  243. backward_deque = deque(x)
  244. backward_visited = set()
  245. ancestors_or_z = set().union(*[nx.ancestors(G, node) for node in x]) | z | x
  246. while forward_deque or backward_deque:
  247. if backward_deque:
  248. node = backward_deque.popleft()
  249. backward_visited.add(node)
  250. if node in y:
  251. return False
  252. if node in z:
  253. continue
  254. # add <- edges to backward deque
  255. backward_deque.extend(G.pred[node].keys() - backward_visited)
  256. # add -> edges to forward deque
  257. forward_deque.extend(G.succ[node].keys() - forward_visited)
  258. if forward_deque:
  259. node = forward_deque.popleft()
  260. forward_visited.add(node)
  261. if node in y:
  262. return False
  263. # Consider if -> node <- is opened due to ancestor of node in z
  264. if node in ancestors_or_z:
  265. # add <- edges to backward deque
  266. backward_deque.extend(G.pred[node].keys() - backward_visited)
  267. if node not in z:
  268. # add -> edges to forward deque
  269. forward_deque.extend(G.succ[node].keys() - forward_visited)
  270. return True
  271. @not_implemented_for("undirected")
  272. @nx._dispatchable
  273. def find_minimal_d_separator(G, x, y, *, included=None, restricted=None):
  274. """Returns a minimal d-separating set between `x` and `y` if possible
  275. A d-separating set in a DAG is a set of nodes that blocks all
  276. paths between the two sets of nodes, `x` and `y`. This function
  277. constructs a d-separating set that is "minimal", meaning no nodes can
  278. be removed without it losing the d-separating property for `x` and `y`.
  279. If no d-separating sets exist for `x` and `y`, this returns `None`.
  280. In a DAG there may be more than one minimal d-separator between two
  281. sets of nodes. Minimal d-separators are not always unique. This function
  282. returns one minimal d-separator, or `None` if no d-separator exists.
  283. Uses the algorithm presented in [1]_. The complexity of the algorithm
  284. is :math:`O(m)`, where :math:`m` stands for the number of edges in
  285. the subgraph of G consisting of only the ancestors of `x` and `y`.
  286. For full details, see [1]_.
  287. Parameters
  288. ----------
  289. G : graph
  290. A networkx DAG.
  291. x : set | node
  292. A node or set of nodes in the graph.
  293. y : set | node
  294. A node or set of nodes in the graph.
  295. included : set | node | None
  296. A node or set of nodes which must be included in the found separating set,
  297. default is None, which means the empty set.
  298. restricted : set | node | None
  299. Restricted node or set of nodes to consider. Only these nodes can be in
  300. the found separating set, default is None meaning all nodes in ``G``.
  301. Returns
  302. -------
  303. z : set | None
  304. The minimal d-separating set, if at least one d-separating set exists,
  305. otherwise None.
  306. Raises
  307. ------
  308. NetworkXError
  309. Raises a :exc:`NetworkXError` if the input graph is not a DAG
  310. or if node sets `x`, `y`, and `included` are not disjoint.
  311. NodeNotFound
  312. If any of the input nodes are not found in the graph,
  313. a :exc:`NodeNotFound` exception is raised.
  314. References
  315. ----------
  316. .. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
  317. minimal d-separators in linear time and applications." In
  318. Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
  319. """
  320. if not nx.is_directed_acyclic_graph(G):
  321. raise nx.NetworkXError("graph should be directed acyclic")
  322. try:
  323. x = {x} if x in G else x
  324. y = {y} if y in G else y
  325. if included is None:
  326. included = set()
  327. elif included in G:
  328. included = {included}
  329. if restricted is None:
  330. restricted = set(G)
  331. elif restricted in G:
  332. restricted = {restricted}
  333. set_y = x | y | included | restricted
  334. if set_y - G.nodes:
  335. raise nx.NodeNotFound(f"The node(s) {set_y - G.nodes} are not found in G")
  336. except TypeError:
  337. raise nx.NodeNotFound(
  338. "One of x, y, included or restricted is not a node or set of nodes in G"
  339. )
  340. if not included <= restricted:
  341. raise nx.NetworkXError(
  342. f"Included nodes {included} must be in restricted nodes {restricted}"
  343. )
  344. intersection = x & y or x & included or y & included
  345. if intersection:
  346. raise nx.NetworkXError(
  347. f"The sets x, y, included are not disjoint. Overlap: {intersection}"
  348. )
  349. nodeset = x | y | included
  350. ancestors_x_y_included = nodeset.union(*[nx.ancestors(G, node) for node in nodeset])
  351. z_init = restricted & (ancestors_x_y_included - (x | y))
  352. x_closure = _reachable(G, x, ancestors_x_y_included, z_init)
  353. if x_closure & y:
  354. return None
  355. z_updated = z_init & (x_closure | included)
  356. y_closure = _reachable(G, y, ancestors_x_y_included, z_updated)
  357. return z_updated & (y_closure | included)
  358. @not_implemented_for("undirected")
  359. @nx._dispatchable
  360. def is_minimal_d_separator(G, x, y, z, *, included=None, restricted=None):
  361. """Determine if `z` is a minimal d-separator for `x` and `y`.
  362. A d-separator, `z`, in a DAG is a set of nodes that blocks
  363. all paths from nodes in set `x` to nodes in set `y`.
  364. A minimal d-separator is a d-separator `z` such that removing
  365. any subset of nodes makes it no longer a d-separator.
  366. Note: This function checks whether `z` is a d-separator AND is
  367. minimal. One can use the function `is_d_separator` to only check if
  368. `z` is a d-separator. See examples below.
  369. Parameters
  370. ----------
  371. G : nx.DiGraph
  372. A NetworkX DAG.
  373. x : node | set
  374. A node or set of nodes in the graph.
  375. y : node | set
  376. A node or set of nodes in the graph.
  377. z : node | set
  378. The node or set of nodes to check if it is a minimal d-separating set.
  379. The function :func:`is_d_separator` is called inside this function
  380. to verify that `z` is in fact a d-separator.
  381. included : set | node | None
  382. A node or set of nodes which must be included in the found separating set,
  383. default is ``None``, which means the empty set.
  384. restricted : set | node | None
  385. Restricted node or set of nodes to consider. Only these nodes can be in
  386. the found separating set, default is ``None`` meaning all nodes in ``G``.
  387. Returns
  388. -------
  389. bool
  390. Whether or not the set `z` is a minimal d-separator subject to
  391. `restricted` nodes and `included` node constraints.
  392. Examples
  393. --------
  394. >>> G = nx.path_graph([0, 1, 2, 3], create_using=nx.DiGraph)
  395. >>> G.add_node(4)
  396. >>> nx.is_minimal_d_separator(G, 0, 2, {1})
  397. True
  398. >>> # since {1} is the minimal d-separator, {1, 3, 4} is not minimal
  399. >>> nx.is_minimal_d_separator(G, 0, 2, {1, 3, 4})
  400. False
  401. >>> # alternatively, if we only want to check that {1, 3, 4} is a d-separator
  402. >>> nx.is_d_separator(G, 0, 2, {1, 3, 4})
  403. True
  404. Raises
  405. ------
  406. NetworkXError
  407. Raises a :exc:`NetworkXError` if the input graph is not a DAG.
  408. NodeNotFound
  409. If any of the input nodes are not found in the graph,
  410. a :exc:`NodeNotFound` exception is raised.
  411. References
  412. ----------
  413. .. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
  414. minimal d-separators in linear time and applications." In
  415. Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
  416. Notes
  417. -----
  418. This function works on verifying that a set is minimal and
  419. d-separating between two nodes. Uses criterion (a), (b), (c) on
  420. page 4 of [1]_. a) closure(`x`) and `y` are disjoint. b) `z` contains
  421. all nodes from `included` and is contained in the `restricted`
  422. nodes and in the union of ancestors of `x`, `y`, and `included`.
  423. c) the nodes in `z` not in `included` are contained in both
  424. closure(x) and closure(y). The closure of a set is the set of nodes
  425. connected to the set by a directed path in G.
  426. The complexity is :math:`O(m)`, where :math:`m` stands for the
  427. number of edges in the subgraph of G consisting of only the
  428. ancestors of `x` and `y`.
  429. For full details, see [1]_.
  430. """
  431. if not nx.is_directed_acyclic_graph(G):
  432. raise nx.NetworkXError("graph should be directed acyclic")
  433. try:
  434. x = {x} if x in G else x
  435. y = {y} if y in G else y
  436. z = {z} if z in G else z
  437. if included is None:
  438. included = set()
  439. elif included in G:
  440. included = {included}
  441. if restricted is None:
  442. restricted = set(G)
  443. elif restricted in G:
  444. restricted = {restricted}
  445. set_y = x | y | included | restricted
  446. if set_y - G.nodes:
  447. raise nx.NodeNotFound(f"The node(s) {set_y - G.nodes} are not found in G")
  448. except TypeError:
  449. raise nx.NodeNotFound(
  450. "One of x, y, z, included or restricted is not a node or set of nodes in G"
  451. )
  452. if not included <= z:
  453. raise nx.NetworkXError(
  454. f"Included nodes {included} must be in proposed separating set z {x}"
  455. )
  456. if not z <= restricted:
  457. raise nx.NetworkXError(
  458. f"Separating set {z} must be contained in restricted set {restricted}"
  459. )
  460. intersection = x.intersection(y) or x.intersection(z) or y.intersection(z)
  461. if intersection:
  462. raise nx.NetworkXError(
  463. f"The sets are not disjoint, with intersection {intersection}"
  464. )
  465. nodeset = x | y | included
  466. ancestors_x_y_included = nodeset.union(*[nx.ancestors(G, n) for n in nodeset])
  467. # criterion (a) -- check that z is actually a separator
  468. x_closure = _reachable(G, x, ancestors_x_y_included, z)
  469. if x_closure & y:
  470. return False
  471. # criterion (b) -- basic constraint; included and restricted already checked above
  472. if not (z <= ancestors_x_y_included):
  473. return False
  474. # criterion (c) -- check that z is minimal
  475. y_closure = _reachable(G, y, ancestors_x_y_included, z)
  476. if not ((z - included) <= (x_closure & y_closure)):
  477. return False
  478. return True
  479. @not_implemented_for("undirected")
  480. def _reachable(G, x, a, z):
  481. """Modified Bayes-Ball algorithm for finding d-connected nodes.
  482. Find all nodes in `a` that are d-connected to those in `x` by
  483. those in `z`. This is an implementation of the function
  484. `REACHABLE` in [1]_ (which is itself a modification of the
  485. Bayes-Ball algorithm [2]_) when restricted to DAGs.
  486. Parameters
  487. ----------
  488. G : nx.DiGraph
  489. A NetworkX DAG.
  490. x : node | set
  491. A node in the DAG, or a set of nodes.
  492. a : node | set
  493. A (set of) node(s) in the DAG containing the ancestors of `x`.
  494. z : node | set
  495. The node or set of nodes conditioned on when checking d-connectedness.
  496. Returns
  497. -------
  498. w : set
  499. The closure of `x` in `a` with respect to d-connectedness
  500. given `z`.
  501. References
  502. ----------
  503. .. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
  504. minimal d-separators in linear time and applications." In
  505. Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
  506. .. [2] Shachter, Ross D. "Bayes-ball: The rational pastime
  507. (for determining irrelevance and requisite information in
  508. belief networks and influence diagrams)." In Proceedings of the
  509. Fourteenth Conference on Uncertainty in Artificial Intelligence
  510. (UAI), (pp. 480–487). 1998.
  511. """
  512. def _pass(e, v, f, n):
  513. """Whether a ball entering node `v` along edge `e` passes to `n` along `f`.
  514. Boolean function defined on page 6 of [1]_.
  515. Parameters
  516. ----------
  517. e : bool
  518. Directed edge by which the ball got to node `v`; `True` iff directed into `v`.
  519. v : node
  520. Node where the ball is.
  521. f : bool
  522. Directed edge connecting nodes `v` and `n`; `True` iff directed `n`.
  523. n : node
  524. Checking whether the ball passes to this node.
  525. Returns
  526. -------
  527. b : bool
  528. Whether the ball passes or not.
  529. References
  530. ----------
  531. .. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
  532. minimal d-separators in linear time and applications." In
  533. Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
  534. """
  535. is_element_of_A = n in a
  536. # almost_definite_status = True # always true for DAGs; not so for RCGs
  537. collider_if_in_Z = v not in z or (e and not f)
  538. return is_element_of_A and collider_if_in_Z # and almost_definite_status
  539. queue = deque([])
  540. for node in x:
  541. if bool(G.pred[node]):
  542. queue.append((True, node))
  543. if bool(G.succ[node]):
  544. queue.append((False, node))
  545. processed = queue.copy()
  546. while any(queue):
  547. e, v = queue.popleft()
  548. preds = ((False, n) for n in G.pred[v])
  549. succs = ((True, n) for n in G.succ[v])
  550. f_n_pairs = chain(preds, succs)
  551. for f, n in f_n_pairs:
  552. if (f, n) not in processed and _pass(e, v, f, n):
  553. queue.append((f, n))
  554. processed.append((f, n))
  555. return {w for (_, w) in processed}