{"id":229,"date":"2021-07-12T14:33:27","date_gmt":"2021-07-12T14:33:27","guid":{"rendered":"https:\/\/fujie.ece.ufl.edu\/?page_id=229"},"modified":"2025-12-11T14:16:48","modified_gmt":"2025-12-11T19:16:48","slug":"learning-based-planning-with-temporal-logic-constraints","status":"publish","type":"page","link":"https:\/\/faculty.eng.ufl.edu\/fujie\/research\/learning-based-planning-with-temporal-logic-constraints\/","title":{"rendered":"Learning-based Planning with Temporal Logic Constraints"},"content":{"rendered":"<p>This\u00a0project is to develop a model-free reinforcement learning method for stochastic planning under temporal logic constraints. Using temporal logic formulas instead of reward function, we can rigorously express the desired properties to be achieved in the learned policies for stochastic systems. It will also provide provable performance guarantees in the controller learned from RL algorithms. With Temporal Logic, we can specify tasks such as, \u201cvisiting a sequence of waypoints A, B,C,D while avoiding obstacles\u201d, \u201cif the region A is visited, then region B must be visited before region C\u201d, \u201cif the probability of reaching A is less than 0.2, then only B,C,D need to be visited with a probability greater than 0.7.\u201d, and logical compositions of these properties.<\/p>\n<ul>\n<li><strong>Probabilistic Planning with constraints in PCTL:<\/strong> In recent work [1], we\u00a0propose an approach to translate high-level system specifications expressed by a subclass of Probabilistic Computational Tree Logic (PCTL) into chance constraints. 
We devise a variant of an approximate dynamic programming method\u2014approximate value iteration\u2014to solve for the optimal policy while the satisfaction of the PCTL formula is guaranteed.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.55.10-PM-300x188-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-239 size-full\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.55.10-PM-300x188-3.png\" alt=\"\" width=\"300\" height=\"188\" \/><\/a> <a href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.55.25-PM-300x188-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-235 size-full alignright\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.55.25-PM-300x188-2.png\" alt=\"\" width=\"300\" height=\"188\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Exploiting the structure of automata to enable data- and sample-efficient RL for LTL formulas:\u00a0<\/strong> In [2], we\u00a0study model-free reinforcement learning to maximize the probability of satisfying high-level system specifications expressed in a subclass of temporal logic formulas\u2014syntactically co-safe linear temporal logic. To address the issue of the <em>sparse reward<\/em> given by the satisfaction of a temporal logic formula,\u00a0we propose a topological approximate dynamic programming method that consists of two steps: First, we decompose the planning problem into a sequence of sub-problems based on the topological property of the task automaton, which is translated from the temporal logic formula. 
Second, we extend a model-free approximate dynamic programming method to solve the value functions, one for each state in the task automaton, in the reverse order of their causal dependency. In particular, we show that the run-time of the proposed algorithm does not grow exponentially with the size of the specification. The correctness and efficiency of the algorithm are demonstrated on a robotic motion planning example.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.59.17-PM-300x188-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-193\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.59.17-PM-300x188-1.png\" alt=\"\" width=\"300\" height=\"188\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><a style=\"font-family: gentona, 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size: 18px;font-style: normal;color: #e3722c;text-decoration: none\" href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-7.00.09-PM-768x403-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-195 alignright\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-7.00.09-PM-768x403-1-300x157.png\" alt=\"\" width=\"300\" height=\"157\" srcset=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-7.00.09-PM-768x403-1-300x157.png 300w, https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-7.00.09-PM-768x403-1.png 768w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Hierarchical Probabilistic Planning as Logical Composition:<\/strong> In [3], we combine hierarchical planning with 
logical compositional reasoning to improve the scalability of RL with temporal logic task specifications. In practice, scalability poses the main barrier for many systems, since the size of the policy grows exponentially with the size of the specification formula. To tackle this issue, we propose a compositional planning approach that decomposes the original temporal logic-constrained planning problem into several problems with smaller and simpler specifications. By establishing the relation between compositional reasoning in quantitative temporal logic and the composability of a class of near-optimal policies called entropy-regulated optimal policies, we develop an algorithm that generates modular sub-policies under the set of simple specifications and then reuses and composes them to obtain a good initial policy under the original specification. Moreover, based on the option framework in hierarchical MDP planning, the optimal policy can be obtained efficiently via planning with primitive actions, sub-policies, and the \u2018compositions\u2019 (or, more precisely, arbitrations) of sub-policies.<\/li>\n<\/ul>\n<p><em>Example of composition:<\/em> Suppose the robot must visit regions labeled 1, 2, 3 in a given order, and the options include the optimal policies for reaching regions 1, 2, and 3. 
Based on generalized logical composition, the near-optimal policy for reaching <strong>1 OR 2<\/strong> is constructed by <strong>composing the two options of reaching 1 and reaching 2<\/strong>, using the composition rule specified in [3], instead of recomputing a new policy for the task.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/option_compose-300x231-2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-233\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/option_compose-300x231-2.jpg\" alt=\"\" width=\"300\" height=\"231\" \/><\/a><\/p>\n<p><a href=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.59.17-PM-300x188-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-193\" src=\"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-content\/uploads\/sites\/701\/2021\/07\/Screen-Shot-2019-09-26-at-6.59.17-PM-300x188-1.png\" alt=\"\" width=\"300\" height=\"188\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Project Opportunities:\u00a0Building on the existing work, we are interested in exploring the integration of preference logic and temporal logic for preference-based decision-making in stochastic environments. If you are interested in this direction, please reach out to Dr. 
Jie Fu for more information.<\/strong><\/p>\n<p><strong>Related Publications:<\/strong><\/p>\n<p>[1] Lening Li, Jie Fu, \u201cApproximate Dynamic Programming with Probabilistic Temporal Logic Constraints\u201d, <a href=\"https:\/\/arxiv.org\/abs\/1810.02199\" target=\"_blank\" rel=\"noopener\">arXiv:1810.02199<\/a>, Annual American Control Conference, 2019.<\/p>\n<p>[2] Lening Li, Jie Fu, \u201cTopological Approximate Dynamic Programming under Temporal Logic Constraints\u201d, <a href=\"https:\/\/arxiv.org\/abs\/1907.10510\" target=\"_blank\" rel=\"noopener\">arXiv:1907.10510<\/a>, IEEE Conference on Decision and Control, 2019.<\/p>\n<p>[3] Xuan Liu, Jie Fu, \u201cCompositional Planning in Markov Decision Process: Temporal abstraction meets generalized logic composition\u201d,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1810.02497\" target=\"_blank\" rel=\"noopener\">arXiv:1810.02497<\/a>\u00a0math.OC,\u00a0<i>American Control Conference,<\/i>\u00a02019.<\/p>\n<p>[4] Lening Li, Jie Fu, \u201cSampling-based approximate optimal temporal logic planning\u201d,\u00a0<i>IEEE International Conference on Robotics and Automation (ICRA),\u00a0<\/i>2017.<\/p>\n<p>[5] Ivan Papusha, Jie Fu, Ufuk Topcu, and Richard M. 
Murray, \u201cAutomata Theory Meets Approximate Dynamic Programming: Optimal Control with Temporal Logic Constraints\u201d,\u00a0<i>IEEE Conference on Decision and Control,<\/i>\u00a02016.\u00a0<a href=\"\/fujie\/wp-content\/uploads\/sites\/79\/2021\/07\/adpltl_cdc2016.pdf?_ga=2.221593560.157219153.1626099855-2088537602.1625171532\" target=\"_blank\" rel=\"noopener\">pdf<\/a><\/p>\n<p>[6] Jie Fu and Ufuk Topcu, \u201cComputational methods for stochastic control with metric interval temporal logic specifications,\u201d\u00a0<i>IEEE Conference on Decision and Control,<\/i>\u00a02015.\u00a0<a href=\"\/fujie\/wp-content\/uploads\/sites\/79\/2021\/07\/\/Fu-Topcu-2015-CDC.pdf?_ga=2.221593560.157219153.1626099855-2088537602.1625171532\" target=\"_blank\" rel=\"noopener\">pdf<\/a><\/p>\n<p>[7] Jie Fu, Shuo Han, and Ufuk Topcu, \u201cOptimal control in Markov decision processes via distributed optimization,\u201d\u00a0<i>IEEE Conference on Decision and Control<\/i>, 2015.\u00a0<a href=\"\/fujie\/wp-content\/uploads\/sites\/79\/2021\/07\/Fu-Han-Topcu-2015-CDC.pdf?_ga=2.221593560.157219153.1626099855-2088537602.1625171532\" target=\"_blank\" rel=\"noopener\">pdf<\/a><\/p>\n<p>[8] Jie Fu and Ufuk Topcu, \u201cPareto efficiency in synthesizing shared autonomy policies with temporal logic constraints.\u201d\u00a0<i>IEEE International Conference on Robotics and Automation,<\/i>\u00a02015.\u00a0<a href=\"\/fujie\/wp-content\/uploads\/sites\/79\/2021\/07\/Fu-Topcu-2015-ICRA.pdf?_ga=2.221593560.157219153.1626099855-2088537602.1625171532\" target=\"_blank\" rel=\"noopener\">pdf<\/a><\/p>\n<p>[9] Jie Fu and Ufuk Topcu, \u201cProbably approximately correct MDP learning and control with temporal logic constraints\u201d,\u00a0<i>Robotics: Science and Systems,<\/i>\u00a02014.\u00a0<a href=\"\/fujie\/wp-content\/uploads\/sites\/79\/2021\/07\/Fu-Topcu-2014-RSS.pdf?_ga=2.221593560.157219153.1626099855-2088537602.1625171532\" target=\"_blank\" 
rel=\"noopener\">pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This\u00a0project is to develop a model-free reinforcement learning method for stochastic planning under temporal logic constraints. <\/p>\n","protected":false},"author":1383,"featured_media":233,"parent":11,"menu_order":3,"comment_status":"closed","ping_status":"closed","template":"page-templates\/page-sidebar-none.php","meta":{"_acf_changed":false,"inline_featured_image":false,"featured_post":"","footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-229","page","type-page","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/pages\/229","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/users\/1383"}],"replies":[{"embeddable":true,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/comments?post=229"}],"version-history":[{"count":1,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/pages\/229\/revisions"}],"predecessor-version":[{"id":567,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/pages\/229\/revisions\/567"}],"up":[{"embeddable":true,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/pages\/11"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/media\/233"}],"wp:attachment":[{"href":"https:\/\/faculty.eng.ufl.edu\/fujie\/wp-json\/wp\/v2\/media?parent=229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}