
Reinforcement Learning in Healthcare: A Survey

Published: 23 November 2021

Abstract

As a subfield of machine learning, reinforcement learning (RL) aims to optimize decision making by using samples of an agent's interaction with its environment and the potentially delayed feedback it receives. In contrast to traditional supervised learning, which typically relies on one-shot, exhaustive, and supervised reward signals, RL tackles sequential decision-making problems with sampled, evaluative, and delayed feedback. This distinctive feature makes RL techniques a suitable candidate for developing powerful solutions in various healthcare domains, where diagnostic decisions and treatment regimes are usually characterized by prolonged periods with delayed feedback. After briefly examining the theoretical foundations and key methods of RL research, this survey provides an extensive overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, to automated medical diagnosis, to the many other control and scheduling problems that have infiltrated every aspect of the healthcare system. In addition, we discuss the challenges and open issues in current research and highlight potential solutions and directions for future work.
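To make the abstract's framing concrete, the following minimal sketch shows tabular Q-learning on an invented two-state "treatment" toy problem, where the reward for a sequence of decisions only arrives when the patient reaches the healthy state. The states, actions, transition probabilities, and rewards are purely illustrative and do not model any real clinical process.

```python
import random

# Toy MDP (illustrative only): a patient is "sick" or "healthy";
# the agent chooses to "treat" or "wait" at each step.
STATES = ["sick", "healthy"]
ACTIONS = ["treat", "wait"]

def step(state, action, rng):
    """Return (next_state, reward). Feedback is delayed and evaluative:
    only reaching 'healthy' yields +1; intermediate decisions get 0."""
    if state == "sick":
        p_recover = 0.7 if action == "treat" else 0.1  # assumed probabilities
        next_state = "healthy" if rng.random() < p_recover else "sick"
    else:
        next_state = "healthy"  # absorbing state in this toy example
    return next_state, (1.0 if next_state == "healthy" else 0.0)

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "sick"
        for _ in range(10):  # finite horizon per episode
            # Epsilon-greedy action selection (exploration vs. exploitation).
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, r = step(state, action, rng)
            # Temporal-difference update: propagates delayed reward backward
            # through the sequence of decisions.
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q

Q = q_learning()
# The learned values should favor "treat" when the patient is sick.
print(Q[("sick", "treat")] > Q[("sick", "wait")])
```

The key contrast with supervised learning is visible in the update rule: no labeled "correct action" is ever provided; the agent only receives an evaluative scalar reward, possibly many steps after the decision that earned it.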

Supplementary Material

yu (yu.zip)
Supplemental movie, appendix, image, and software files for Reinforcement Learning in Healthcare: A Survey

References

[1]
David Abel, John Salvatier, Andreas Stuhlmüller, and Owain Evans. 2017. Agent-agnostic human-in-the-loop reinforcement learning. arXiv:1701.04079. Retrieved from https://arxiv.org/abs/1701.04079.
[2]
Giovanni Acampora, Diane J. Cook, Parisa Rashidi, and Athanasios V. Vasilakos. 2013. A survey on ambient intelligence in healthcare. Proc. IEEE 101, 12 (2013), 2470–2494.
[3]
Brian M. Adams, Harvey T. Banks, Hee-Dae Kwon, and Hien T. Tran. 2004. Dynamic multidrug therapies for HIV: Optimal and STI control approaches. Math. Biosci. Eng. 1, 2 (2004), 223–241.
[4]
Inkyung Ahn and Jooyoung Park. 2011. Drug scheduling of cancer chemotherapy based on natural actor-critic approach. BioSystems 106, 2–3 (2011), 121–129.
[5]
Riad Akrour, Marc Schoenauer, and Michèle Sebag. 2012. April: Active preference learning-based reinforcement learning. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 116–131.
[6]
Walid Abdullah Al and Il Dong Yun. 2019. Partial policy-based reinforcement learning for anatomical landmark localization in 3D medical images. IEEE Trans. Med. Imag. 39, 4 (2019), 1245–1255.
[7]
Amir Alansary, Loic Le Folgoc, Ghislain Vaillant, Ozan Oktay, Yuanwei Li, Wenjia Bai, Jonathan Passerat-Palmbach, Ricardo Guerrero, Konstantinos Kamnitsas, Benjamin Hou, et al. 2018. Automatic view planning with multi-scale deep reinforcement learning agents. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 277–285.
[8]
Amir Alansary, Ozan Oktay, Yuanwei Li, Loic Le Folgoc, Benjamin Hou, Ghislain Vaillant, Ben Glocker, Bernhard Kainz, and Daniel Rueckert. 2018. Evaluating reinforcement learning agents for anatomical landmark detection. Med. Image Anal. 53 (2018), 156–164.
[9]
A. M. Albisser, B. S. Leibel, T. G. Ewart, Z. Davidovac, C. K. Botz, W. Zingg, H. Schipper, and R. Gander. 1974. Clinical control of diabetes by the artificial pancreas. Diabetes 23, 5 (1974), 397–404.
[10]
Ali Alinejad, Nada Y. Philip, and Robert S. H. Istepanian. 2012. Cross-layer ultrasound video streaming over mobile WiMAX and HSUPA networks. IEEE Trans. Inf. Technol. Biomed. 16, 1 (2012), 31–39.
[11]
Hideki Asoh, Masanori Shiro, Shotaro Akaho, Toshihiro Kamishima, Koiti Hasida, Eiji Aramaki, and Takahide Kohro. 2013. An application of inverse reinforcement learning to medical records of diabetes treatment. In Proceedings of the Workshop on Reinforcement Learning with Generalized Feedback (ECMLPKDD’13). 1–8.
[12]
Hideki Asoh, Masanori Shiro, Shotaro Akaho, Toshihiro Kamishima, K. Hashida, Eiji Aramaki, and Takahide Kohro. 2013. Modeling medical records of diabetes using Markov decision processes. In Proceedings of the ICML’13 Workshop on Role of Machine Learning in Transforming Healthcare.
[13]
Susan Athey and Guido W. Imbens. 2015. Machine learning methods for estimating heterogeneous causal effects. stat 1050, 5 (2015).
[14]
Donghoon Baek, Minho Hwang, Hansoul Kim, and Dong-Soo Kwon. 2018. Path planning for automation of surgery robot based on probabilistic roadmap and reinforcement learning. In Proceedings of the 15th International Conference on Ubiquitous Robots (UR’18). IEEE, 342–347.
[15]
Abiral Baniya, Stephen Herrmann, Qiquan Qiao, and Huitian Lu. 2017. Adaptive interventions treatment modelling and regimen optimization using sequential multiple assignment randomized trials (SMART) and Q-learning. In Proceedings of the IIE Annual Conference. 1187–1192.
[16]
Osbert Bastani, Yewen Pu, and Armando Solar-Lezama. 2018. Verifiable reinforcement learning via policy extraction. In Advances in Neural Information Processing Systems. MIT Press, 2499–2509.
[17]
Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. 2016. Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, Vol. 29, MIT Press, 1471–1479.
[18]
Richard Bellman. 2013. Dynamic Programming. Courier Corporation.
[19]
Richard Ernest Bellman. 1983. Mathematical Methods in Medicine. World Scientific.
[20]
A. V. Bernstein and E. V. Burnaev. 2018. Reinforcement learning in computer vision. In Proceedings of the International Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV’17), Vol. 10696. International Society for Optics and Photonics, 106961S.
[21]
Surya Bhupatiraju, Kumar Krishna Agrawal, and Rishabh Singh. 2018. Towards mixed optimization for reinforcement learning with program synthesis. arXiv:1807.00403. Retrieved from https://arxiv.org/abs/1807.00403.
[22]
Eddy C. Borera, Brett L. Moore, Anthony G. Doufas, and Larry D. Pyeatt. 2011. An adaptive neural network filter for improved patient state estimation in closed-loop anesthesia control. In Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI’11). IEEE, 41–46.
[23]
Melanie K. Bothe, Luke Dickens, Katrin Reichel, Arn Tellmann, Björn Ellger, Martin Westphal, and Ahmed A. Faisal. 2013. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Rev. Med. Devices 10, 5 (2013), 661–673.
[24]
Róbert Busa-Fekete, Balázs Szörényi, Paul Weng, Weiwei Cheng, and Eyke Hüllermeier. 2014. Preference-based reinforcement learning: Evolutionary direct policy search using a preference-based racing algorithm. Mach. Learn. 97, 3 (2014), 327–351.
[25]
Keith Bush and Joelle Pineau. 2009. Manifold embeddings for model-based reinforcement learning under partial observability. In Advances in Neural Information Processing Systems. MIT Press, 189–197.
[26]
Emily L. Butler, Eric B. Laber, Sonia M. Davis, and Michael R. Kosorok. 2017. Incorporating patient preferences into estimation of optimal individualized treatment rules. Biometrics (2017).
[27]
Stephen W. Carden and James Livsey. 2017. Small-sample reinforcement learning: Improving policies using synthetic data 1. Intell. Decis. Technol. 11, 2 (2017), 167–175.
[28]
Bibhas Chakraborty and Erica E. M. Moodie. 2013. Statistical Reinforcement Learning. Springer, New York. 31–52 pages.
[29]
Bibhas Chakraborty, Susan Murphy, and Victor Strecher. 2010. Inference for non-regular parameters in optimal dynamic treatment regimes. Stat. Methods Med. Res. 19, 3 (2010), 317–343.
[30]
Bibhas Chakraborty, Victor Strecher, and S. A. Murphy. 2008. Bias correction and confidence intervals for fitted Q-iteration. In Proceedings of the Workshop on Model Uncertainty and Risk in Reinforcement Learning (NIPS’08). Citeseer.
[31]
Chun-Hao Chang, Mingjie Mai, and Anna Goldenberg. 2018. Dynamic measurement scheduling for adverse event forecasting using deep RL. arXiv:1812.00268. Retrieved from https://arxiv.org/abs/1812.00268.
[32]
Edward Y. Chang, Meng-Hsi Wu, Kai-Fu Tang, Hao-Cheng Kao, and Chun-Nan Chou. 2017. Artificial intelligence in XPRIZE DeepQ tricorder. In Proceedings of the 2nd International Workshop on Multimedia for Personal Health and Health Care. ACM, 11–18.
[33]
Zhengping Che, Sanjay Purushotham, Robinder Khemani, and Yan Liu. 2015. Distilling knowledge from deep networks with applications to healthcare domain. arXiv:1512.03542. Retrieved from https://arxiv.org/abs/1512.03542.
[34]
Jie Chen, Henry Y. K. Lau, Wenjun Xu, and Hongliang Ren. 2016. Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning. In Proceedings of the International Conference on Advanced Computational Intelligence (ICACI’16). IEEE, 378–384.
[35]
Li-Fang Cheng, Niranjani Prasad, and Barbara E. Engelhardt. 2018. An optimal policy for patient laboratory tests in intensive care units. arXiv:1808.04679. Retrieved from https://arxiv.org/abs/1808.04679.
[36]
Weiwei Cheng, Johannes Fürnkranz, Eyke Hüllermeier, and Sang-Hyeun Park. 2011. Preference-based policy iteration: Leveraging preference learning for reinforcement learning. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 312–327.
[37]
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, et al. 2018. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 141 (2018), 20170387.
[38]
Tianshu Chu, Jie Wang, and Jiayu Chen. 2016. An adaptive online learning framework for practical breast cancer diagnosis. In Medical Imaging 2016: Computer-Aided Diagnosis, Vol. 9785. International Society for Optics and Photonics, 978524.
[39]
Elena Daskalaki, Peter Diem, and Stavroula G. Mougiakakou. 2013. Personalized tuning of a reinforcement learning control algorithm for glucose regulation. In Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’13). IEEE, 3487–3490.
[40]
Elena Daskalaki, Luca Scarnato, Peter Diem, and Stavroula G. Mougiakakou. 2010. Preliminary results of a novel approach for glucose regulation using an Actor-Critic learning based controller. IET, 1–5.
[41]
Audrey De Jong, Giuseppe Citerio, and Samir Jaber. 2017. Focus on ventilation and airway management in the ICU. Intens. Care Med. 43, 12 (2017), 1912–1915.
[42]
Marc Peter Deisenroth, Gerhard Neumann, and Jan Peters. 2013. A survey on policy search for robotics. Foundations and Trends in Robotics 2, 1–2 (2013), 388–403.
[43]
Kun Deng, Russ Greiner, and Susan Murphy. 2014. Budgeted learning for developing personalized treatment. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA’14). IEEE, 7–14.
[44]
Steven E. Dilsizian and Eliot L. Siegel. 2014. Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr. Cardiol. Rep. 16, 1 (2014), 441.
[45]
Francisco Elizalde, Enrique Sucar, Julieta Noguez, and Alberto Reyes. 2009. Generating explanations based on Markov decision processes. In Proceedings of the Mexican International Conference on Artificial Intelligence. Springer, 51–62.
[46]
Damien Ernst, Pierre Geurts, and Louis Wehenkel. 2005. Tree-based batch mode reinforcement learning. J. Mach. Learn. Res. 6 (Apr. 2005), 503–556.
[47]
Damien Ernst, Guy-Bart Stan, Jorge Goncalves, and Louis Wehenkel. 2006. Clinical data based optimal STI strategies for HIV: A reinforcement learning approach. In Proceedings of the 45th IEEE Conference on Decision and Control. IEEE, 667–672.
[48]
Ashkan Ertefaie, Susan Shortreed, and Bibhas Chakraborty. 2016. Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia. Stat. Med. 35, 13 (2016), 2221–2234.
[49]
Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Joan Vila-Francés, and Rafael Magdalena-Benedito. 2011. Adaptive treatment of anemia on hemodialysis patients: A reinforcement learning approach. In Proceedings of the Annual Conference of the Center for Information-Development Management (CIDM’11). IEEE, 44–49.
[50]
Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. 2019. A guide to deep learning in healthcare. Nat. Med. 25, 1 (2019), 24.
[51]
Mayalen Etcheverry, Bogdan Georgescu, Benjamin Odry, Thomas J. Re, Shivam Kaushik, Bernhard Geiger, Nadar Mariappan, Sasa Grbic, and Dorin Comaniciu. 2018. Nonlinear adaptively learned optimization for object localization in 3D medical images. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 254–262.
[52]
ARDS Definition Task Force, V. M. Ranieri, G. D. Rubenfeld, et al. 2012. Acute respiratory distress syndrome. J. Am. Med. Assoc. 307, 23 (2012), 2526–2533.
[53]
Evan M. Forman, Stephanie G. Kerrigan, Meghan L. Butryn, Adrienne S. Juarascio, Stephanie M. Manasse, Santiago Ontañón, Diane H. Dallal, Rebecca J. Crochiere, and Danielle Moskow. 2018. Can the artificial intelligence technique of reinforcement learning use continuously-monitored digital data to optimize treatment for weight loss? J. Behav. Med. (2018), 1–15.
[54]
Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, and Sang-Hyeun Park. 2012. Preference-based reinforcement learning: A formal framework and a policy iteration algorithm. Mach. Learn. 89, 1–2 (2012), 123–156.
[55]
Joseph Futoma, Anthony Lin, Mark Sendak, Armando Bedoya, Meredith Clement, Cara O’Brien, and Katherine Heller. 2018. Learning to treat sepsis with multi-output Gaussian process deep recurrent Q-networks. Retrieved from https://openreview.net/forum.
[56]
Javier Garcıa and Fernando Fernández. 2015. A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1 (2015), 1437–1480.
[57]
Adam E. Gaweda, Mehmet K. Muezzinoglu, George R. Aronoff, Alfred A. Jacobs, Jacek M. Zurada, and Michael E. Brier. 2005. Individualization of pharmacological anemia management using reinforcement learning. Neural Netw. 18, 5-6 (2005), 826–834.
[58]
Adam E. Gaweda, Mehmet K. Muezzinoglu, George R. Aronoff, Alfred A. Jacobs, Jacek M. Zurada, and Michael E. Brier. 2005. Incorporating prior knowledge into Q-learning for drug delivery individualization. In Proceedings of the 4th International Conference on Machine Learning and Applications. IEEE, 6 pp.
[59]
Adam E. Gaweda, Mehmet K. Muezzinoglu, George R. Aronoff, Alfred A. Jacobs, Jacek M. Zurada, and Michael E. Brier. 2005. Reinforcement learning approach to individualization of chronic pharmacotherapy. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’05), Vol. 5. IEEE, 3290–3295.
[60]
Adam E. Gaweda, Mehmet K. Muezzinoglu, Alfred A. Jacobs, George R. Aronoff, and Michael E. Brier. 2006. Model predictive control with reinforcement learning for drug delivery in renal anemia management. In Proceedings of the IEEE Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS’06). IEEE, 5177–5180.
[61]
Marzyeh Ghassemi, Leo Anthony Celi, and David J. Stone. 2015. State of the art review: The data revolution in critical care. Crit. Care 19, 1 (2015), 118.
[62]
Mohammad M. Ghassemi, Stefan E. Richter, Ifeoma M. Eche, Tszyi W. Chen, John Danziger, and Leo A. Celi. 2014. A data-driven approach to optimized medication dosing: A focus on heparin. Intens. Care Med. 40, 9 (2014), 1332–1339.
[63]
Florin C. Ghesu, Bogdan Georgescu, Sasa Grbic, Andreas Maier, Joachim Hornegger, and Dorin Comaniciu. 2018. Towards intelligent robust detection of anatomical structures in incomplete volumetric data. Med. Image Anal. 48 (2018), 203–213.
[64]
Florin C. Ghesu, Bogdan Georgescu, Tommaso Mansi, Dominik Neumann, Joachim Hornegger, and Dorin Comaniciu. 2016. An artificial agent for anatomical landmark detection in medical images. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 229–237.
[65]
Florin Cristian Ghesu, Bogdan Georgescu, Yefeng Zheng, Sasa Grbic, Andreas Maier, Joachim Hornegger, and Dorin Comaniciu. 2017. Multi-scale deep reinforcement learning for real-time 3D-landmark detection in CT scans. IEEE Trans. Pattern Anal. Mach. Intell. (2017).
[66]
Ruben Glatt, Felipe Leno Da Silva, Reinaldo Augusto da Costa Bianchi, and Anna Helena Reali Costa. 2020. Decaf: Deep case-based policy inference for knowledge transfer in reinforcement learning. Expert Syst. Appl. 156 (2020), 113420.
[67]
Yair Goldberg and Michael R. Kosorok. 2012. Q-learning with censored data. Ann. Stat. 40, 1 (2012), 529.
[68]
Tiago Salgado Magalhães Taveira Gomes. 2017. Reinforcement learning for primary care and appointment scheduling. Master’s thesis, Mestrado em Engenharia da Informação, Faculdade de Engenharia da Universidade do Porto.
[69]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. MIT Press, 2672–2680.
[70]
Travis R. Goodwin and Sanda M. Harabagiu. 2016. Medical question answering for clinical decision support. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 297–306.
[71]
Omer Gottesman, Fredrik Johansson, Matthieu Komorowski, Aldo Faisal, David Sontag, Finale Doshi-Velez, and Leo Anthony Celi. 2019. Guidelines for reinforcement learning in healthcare. Nat. Med. 25, 1 (2019), 16.
[72]
Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, et al. 2018. Evaluating reinforcement learning algorithms in observational health settings. arXiv:1805.12298. Retrieved from https://arxiv.org/abs/1805.12298.
[73]
Sander Greenland. 2000. Causal analysis in the health sciences. J. Am. Statist. Assoc. 95, 449 (2000), 286–289.
[74]
Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, and Robert Babuska. 2012. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans. Syst. Man Cybernet. C (Appl. Rev.) 42, 6 (2012), 1291–1307.
[75]
Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. 2003. Efficient solution algorithms for factored MDPs. J. Artif. Intell. Res. 19 (2003), 399–468.
[76]
Arthur Guez. 2010. Adaptive Control of Epileptic Seizures Using Reinforcement Learning. Ph.D. Dissertation. McGill University Library.
[77]
Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau. 2008. Adaptive treatment of epilepsy via batch-mode reinforcement learning. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’08). AAAI Press, 1671–1678.
[78]
Amin Hassani et al. 2010. Reinforcement learning based control of tumor growth with chemotherapy. In Proceedings of the International Conference on System Science and Engineering (ICSSE’10). IEEE, 185–189.
[79]
Jianxing He, Sally L. Baxter, Jie Xu, Jiming Xu, Xingtao Zhou, and Kang Zhang. 2019. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 1 (2019), 30.
[80]
Daniel Hein, Steffen Udluft, and Thomas A. Runkler. 2018. Interpretable policies for reinforcement learning by genetic programming. Eng. Appl. Artif. Intell. 76 (2018), 158–169.
[81]
Bernhard Hengst. 2012. Hierarchical approaches. In Reinforcement Learning. Springer, 293–323.
[82]
Todd Hester and Peter Stone. 2012. Learning and using models. In Reinforcement Learning. Springer, 111–141.
[83]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. stat 1050 (2015), 9.
[84]
Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, and Elad Yom-Tov. 2016. A reinforcement learning system to encourage physical activity in diabetes patients. arXiv:1605.04070. Retrieved from https://arxiv.org/abs/1605.04070.
[85]
Andreas Holzinger. 2016. Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Inf. 3, 2 (2016), 119–131.
[86]
Chuanpu Hu, William S. Lovejoy, and Steven L. Shafer. 1994. Comparison of some control strategies for three-compartment PK/PD models. J. Pharmacokinet. Biopharmaceut. 22, 6 (1994), 525–550.
[87]
Zhengxing Huang, Wil M. P. van der Aalst, Xudong Lu, and Huilong Duan. 2011. Reinforcement learning based resource allocation in business process management. Data Knowl. Eng. 70, 1 (2011), 127–145.
[88]
Pierre Humbert, Julien Audiffren, Clément Dubost, and Laurent Oudre. Learning from an expert. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS). 1–5.
[89]
Kyle Humphrey. 2017. Using reinforcement learning to personalize dosing strategies in a simulated cancer trial with high dimensional data. Master’s thesis, The University of Arizona.
[90]
Robert S. H. Istepanian, Nada Y. Philip, and Maria G. Martini. 2009. Medical QoS provision based on reinforcement learning in ultrasound streaming over 3.5G wireless systems. IEEE J. Select. Areas Commun. 27, 4 (2009).
[91]
Tommi Jaakkola, Satinder P. Singh, and Michael I. Jordan. 1995. Reinforcement learning algorithm for partially observable Markov decision problems. In Advances in Neural Information Processing Systems. MIT Press, 345–352.
[92]
Abhyuday Jagannatha, Philip Thomas, and Hong Yu. 2018. Towards high confidence off-policy reinforcement learning for clinical applications. In CausalML Workshop. ICML.
[93]
Kathleen M. Jagodnik, Philip S. Thomas, Antonie J. van den Bogert, Michael S. Branicky, and Robert F. Kirsch. 2017. Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 10 (2017), 1892–1905.
[94]
Ammar Jalalimanesh, Hamidreza Shahabi Haghighi, Abbas Ahmadi, Hossein Hejazian, and Madjid Soltani. 2017. Multi-objective optimization of radiotherapy: Distributed Q-learning and agent-based simulation. J. Exp. Theoret. Artif. Intell. (2017), 1–16.
[95]
Ammar Jalalimanesh, Hamidreza Shahabi Haghighi, Abbas Ahmadi, and Madjid Soltani. 2017. Simulation-based optimization of radiotherapy: Agent-based modeling and reinforcement learning. Math. Comput. Simul. 133 (2017), 235–248.
[96]
J. Larry Jameson and Dan L. Longo. 2015. Precision medicine: Personalized, problematic, and promising. Obstetr. Gynecol. Surv. 70, 10 (2015), 612–614.
[97]
Roger W. Jelliffe, June Buell, Robert Kalaba, R. Sridhar, and Richard Rockwell. 1970. A computer program for digitalis dosage regimens. Math. Biosci. 9 (1970), 179–193.
[98]
Russell Jeter, Christopher Josef, Supreeth Shashikumar, and Shamim Nemati. 2019. Does the “Artificial Intelligence Clinician” learn optimal treatment strategies for sepsis in intensive care? arXiv:1902.03271. Retrieved from https://arxiv.org/abs/1902.03271.
[99]
Fei Jiang, Yong Jiang, Hui Zhi, Yi Dong, Hao Li, Sufeng Ma, Yilong Wang, Qiang Dong, Haipeng Shen, and Yongjun Wang. 2017. Artificial intelligence in healthcare: Past, present and future. Stroke Vasc. Neurol. 2, 4 (2017), 230–243.
[100]
Alistair E. W. Johnson, Mohammad M. Ghassemi, Shamim Nemati, Katherine E. Niehaus, David A. Clifton, and Gari D. Clifford. 2016. Machine learning and decision support in critical care. Proc. IEEE 104, 2 (2016), 444–466.
[101]
Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, H. Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Sci. Data 3 (2016), 160035.
[102]
Tadashi Kamio, Tomoaki Van, and Ken Masamune. 2017. Use of machine-learning approaches to predict clinical deterioration in critically ill patients: A systematic review. Int. J. Med. Res. Health Sci. 6, 6 (2017), 1–7.
[103]
Hao-Cheng Kao, Kai-Fu Tang, and Edward Y. Chang. 2018. Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence 32, 1 (2018), 2305–2313.
[104]
Kenji Kawaguchi. 2016. Bounded optimal exploration in MDP. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 1758–1764.
[105]
Richard S. E. Keefe, Robert M. Bilder, Sonia M. Davis, Philip D. Harvey, Barton W. Palmer, James M. Gold, Herbert Y. Meltzer, Michael F. Green, George Capuano, T. Scott Stroup, et al. 2007. Neurocognitive effects of antipsychotic medications in patients with chronic schizophrenia in the CATIE Trial. Arch. Gen. Psychiatr. 64, 6 (2007), 633–647.
[106]
Taylor Killian, George Konidaris, and Finale Doshi-Velez. 2016. Transfer learning across patient variations with hidden parameter Markov decision processes. arXiv:1612.00475. Retrieved from https://arxiv.org/abs/1612.00475.
[107]
Taylor W. Killian, Samuel Daulton, George Konidaris, and Finale Doshi-Velez. 2017. Robust and efficient transfer learning with hidden parameter Markov decision processes. In Advances in Neural Information Processing Systems. MIT Press, 6250–6261.
[108]
Jens Kober and Jan R. Peters. 2009. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems. MIT Press, 849–856.
[109]
Ismail Kola and John Landis. 2004. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3, 8 (2004), 711.
[110]
Matthieu Komorowski, Leo A. Celi, Omar Badawi, Anthony C. Gordon, and A. Aldo Faisal. 2018. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 11 (2018), 1716.
[111]
M. Komorowski, A. Gordon, L. A. Celi, and A. Faisal. 2016. A Markov Decision Process to suggest optimal treatment of severe infections in intensive care. In Neural Information Processing Systems Workshop on Machine Learning for Health.
[112]
Elizabeth F. Krakow, Michael Hemmer, Tao Wang, Brent Logan, Mukta Arora, Stephen Spellman, Daniel Couriel, Amin Alousi, Joseph Pidala, Michael Last, et al. 2017. Tools for the precision medicine era: How to develop highly personalized treatment recommendations from cohort and registry data using Q-learning. Am. J. Epidemiol. 186, 2 (2017), 160–172.
[113]
Julian Krebs, Tommaso Mansi, Hervé Delingette, Li Zhang, Florin C Ghesu, Shun Miao, Andreas K Maier, Nicholas Ayache, Rui Liao, and Ali Kamen. 2017. Robust non-rigid registration through agent-based action learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 344–352.
[114]
Eric B. Laber, Kristin A. Linn, and Leonard A. Stefanski. 2014. Interactive model building for Q-learning. Biometrika 101, 4 (2014), 831–847.
[115]
Eric B. Laber, Daniel J. Lizotte, and Bradley Ferguson. 2014. Set-valued dynamic treatment regimes for competing outcomes. Biometrics 70, 1 (2014), 53–61.
[116]
Eric B. Laber, Daniel J. Lizotte, Min Qian, William E. Pelham, and Susan A. Murphy. 2014. Dynamic treatment regimes: Technical challenges and applications. Electr. J. Stat. 8, 1 (2014), 1225.
[117]
Michail G. Lagoudakis and Ronald Parr. 2003. Least-squares policy iteration. J. Mach. Learn. Res. 4 (Dec. 2003), 1107–1149.
[118]
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. 2017. Building machines that learn and think like people. Behav. Brain Sci. 40 (2017).
[119]
Huitian Lei, Ambuj Tewari, and Susan Murphy. 2014. An actor-critic contextual bandit algorithm for personalized interventions using mobile devices. In Advances in Neural Information Processing Systems, Vol. 27. MIT Press, 1–9.
[120]
Leonard Leibovici, Michal Fishman, Henrik C. Schonheyder, Christian Riekehr, Brian Kristensen, Ilana Shraga, and Steen Andreassen. 2000. A causal probabilistic network for optimal treatment of bacterial infections. IEEE Trans. Knowl. Data Eng. 12, 4 (2000), 517–528.
[121]
Kun Li and Joel W. Burdick. 2017. A function approximation method for model-based high-dimensional inverse reinforcement learning. arXiv:1708.07738. Retrieved from https://arxiv.org/abs/1708.07738.
[122]
Kun Li, Mrinal Rath, and Joel W. Burdick. 2018. Inverse reinforcement learning via function approximation for clinical motion analysis. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’18). IEEE, 610–617.
[123]
Lihong Li. 2012. Sample complexity bounds of exploration. In Reinforcement Learning. Springer, 175–204.
[124]
Luchen Li, Matthieu Komorowski, and Aldo A. Faisal. 2018. The actor search tree critic (ASTC) for off-policy pomdp learning in medical decision making. arXiv:1805.11548. Retrieved from https://arxiv.org/abs/1805.11548.
[125]
Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv:1701.07274. Retrieved from https://arxiv.org/abs/1701.07274.
[126]
Yuxi Li. 2018. Deep reinforcement learning. arXiv:1810.06339. Retrieved from https://arxiv.org/abs/1810.06339.
[127]
Rui Liao, Shun Miao, Pierre de Tournemire, Sasa Grbic, Ali Kamen, Tommaso Mansi, and Dorin Comaniciu. 2017. An artificial agent for robust image registration. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’17). 4168–4175.
[128]
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv:1509.02971. Retrieved from https://arxiv.org/abs/1509.02971.
[129]
Rongmei Lin, Matthew D. Stanley, Mohammad M. Ghassemi, and Shamim Nemati. 2018. A deep deterministic policy gradient approach to medication dosing and surveillance in the ICU. In Proceedings of the IEEE Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’18). IEEE, 4927–4931.
[130]
Yuan Ling, Sadid A. Hasan, Vivek Datla, Ashequl Qadir, Kathy Lee, Joey Liu, and Oladimeji Farri. 2017. Diagnostic inferencing via improving clinical concept extraction with deep reinforcement learning: A preliminary study. In Proceedings of the Machine Learning for Healthcare Conference. ML Research Press, 271–285.
[131]
Yuan Ling, Sadid A. Hasan, Vivek Datla, Ashequl Qadir, Kathy Lee, Joey Liu, and Oladimeji Farri. 2017. Learning to diagnose: Assimilating clinical narratives using deep reinforcement learning. In Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1. Asian Federation of Natural Language Processing, 895–905.
[132]
Kristin A. Linn, Eric B. Laber, and Leonard A. Stefanski. 2017. Interactive Q-learning for Quantiles. J. Am. Statist. Assoc. 112, 518 (2017), 638–649.
[133]
Zachary C. Lipton. 2017. The doctor just won’t accept that! arXiv:1711.08037. Retrieved from https://arxiv.org/abs/1711.08037.
[134]
Zachary C. Lipton. 2018. The mythos of model interpretability. Commun. ACM 61, 10 (2018), 36–43.
[135]
Chunming Liu, Xin Xu, and Dewen Hu. 2015. Multiobjective reinforcement learning: A comprehensive overview. IEEE Trans. Syst. Man. Cybernet.: Syst. 45, 3 (2015), 385–398.
[136]
Daochang Liu and Tingting Jiang. 2018. Deep reinforcement learning for surgical gesture segmentation and classification. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 247–255.
[137]
Ying Liu, Brent Logan, Ning Liu, Zhiyuan Xu, Jian Tang, and Yangzhi Wang. 2017. Deep reinforcement learning for dynamic treatment regimes on medical registry data. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI’17). IEEE, 380–385.
[138]
Ying Liu, Yuanjia Wang, Michael R. Kosorok, Yingqi Zhao, and Donglin Zeng. 2016. Robust hybrid learning for estimating personalized dynamic treatment regimens. arXiv:1611.02314. Retrieved from https://arxiv.org/abs/1611.02314.
[139]
Daniel J. Lizotte, Michael Bowling, and Susan A. Murphy. 2012. Linear fitted-Q iteration with multiple reward functions. J. Mach. Learn. Res. 13 (Nov. 2012), 3253–3295.
[140]
Daniel J. Lizotte, Michael H. Bowling, and Susan A. Murphy. 2010. Efficient reinforcement learning with multiple reward functions for randomized controlled trial analysis. In Proceedings of the International Conference on Machine Learning (ICML’10). Citeseer, 695–702.
[141]
Daniel J. Lizotte and Eric B. Laber. 2016. Multi-objective Markov decision processes for data-driven decision support. J. Mach. Learn. Res. 17, 1 (2016), 7378–7405.
[142]
Cristobal Lowery and A. Aldo Faisal. 2013. Towards efficient, personalized anesthesia using continuous reinforcement learning for propofol infusion control. In Proceedings of the IEEE/EMBS Special Topic Conference on Neural Engineering (NER’13). IEEE, 1414–1417.
[143]
Daniel J. Luckett, Eric B. Laber, Anna R. Kahkoska, David M. Maahs, Elizabeth Mayer-Davis, and Michael R. Kosorok. 2018. Estimating dynamic treatment regimes in mobile health using V-learning. J. Am. Statist. Assoc. 115, 530 (2018), 692–706.
[144]
Jake Luo, Min Wu, Deepika Gopukumar, and Yiqing Zhao. 2016. Big data application in biomedical research and health care: A literature review. Biomed. Informat. Insights 8 (2016), BII–S31559.
[145]
Kai Ma, Jiangping Wang, Vivek Singh, Birgi Tamersoy, Yao-Jen Chang, Andreas Wimmer, and Terrence Chen. 2017. Multimodal image registration with deep context reinforcement learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 240–248.
[146]
Francis Maes, Raphael Fonteneau, Louis Wehenkel, and Damien Ernst. 2012. Policy search in a space of simple closed-form formulas: Towards interpretability of reinforcement learning. In Proceedings of the International Conference on Discovery Science. Springer, 37–51.
[147]
Mufti Mahmud, Mohammed Shamim Kaiser, Amir Hussain, and Stefano Vassanelli. 2018. Applications of deep learning and reinforcement learning to biological data. IEEE Trans. Neural Netw. Learn. Syst. 29, 6 (2018), 2063–2079.
[148]
Gabriel Maicas, Gustavo Carneiro, Andrew P. Bradley, Jacinto C. Nascimento, and Ian Reid. 2017. Deep reinforcement learning for active breast lesion detection from DCE-MRI. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 665–673.
[149]
Jordan M. Malof and Adam E. Gaweda. 2011. Optimizing drug therapy with reinforcement learning: The case of anemia management. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’11). IEEE, 2088–2092.
[150]
Vukosi Ntsakisi Marivate, Jessica Chemali, Emma Brunskill, and Michael L. Littman. 2014. Quantifying uncertainty in batch personalized sequential decision making. In Proceedings of the AAAI Workshop: Modern Artificial Intelligence for Health Analytics.
[151]
José D. Martín-Guerrero, Faustino Gomez, Emilio Soria-Olivas, Jürgen Schmidhuber, Mónica Climente-Martí, and N. Víctor Jiménez-Torres. 2009. A reinforcement learning approach for individualizing erythropoietin dosages in hemodialysis patients. Expert Syst. Appl. 36, 6 (2009), 9737–9742.
[152]
José D. Martín-Guerrero, Emilio Soria-Olivas, Marcelino Martínez-Sober, Mónica Climente-Martí, Teresa De Diego-Santos, and N. Víctor Jiménez-Torres. 2007. Validation of a reinforcement learning policy for dosage optimization of erythropoietin. In Proceedings of the Australasian Joint Conference on Artificial Intelligence. Springer, 732–738.
[153]
Christopher A. Merck and Samantha Kleinberg. 2016. Causal explanation under indeterminism: A sampling approach. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 1037–1043.
[154]
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
[155]
Brett L. Moore, Anthony G. Doufas, and Larry D. Pyeatt. 2011. Reinforcement learning: A novel method for optimal control of propofol-induced hypnosis. Anesthes. Analges. 112, 2 (2011), 360–367.
[156]
Brett L. Moore, Periklis Panousis, Vivekanand Kulkarni, Larry D. Pyeatt, and Anthony G. Doufas. 2010. Reinforcement learning for closed-loop propofol anesthesia: A human volunteer study. In Proceedings of the Annual Conference on Innovative Applications of Artificial Intelligence (IAAI’10). AAAI Press, 1807–1813.
[157]
Brett L. Moore, Larry D. Pyeatt, Vivekanand Kulkarni, Periklis Panousis, Kevin Padrez, and Anthony G. Doufas. 2014. Reinforcement learning for closed-loop propofol anesthesia: A study in human volunteers. J. Mach. Learn. Res. 15, 1 (2014), 655–696.
[158]
Brett L. Moore, Todd M. Quasny, and Anthony G. Doufas. 2011. Reinforcement learning versus proportional–integral–derivative control of hypnosis in a simulated intraoperative patient. Anesthes. Analges. 112, 2 (2011), 350–359.
[159]
Brett L. Moore, Eric D. Sinzinger, Todd M. Quasny, and Larry D. Pyeatt. 2004. Intelligent control of closed-loop sedation in simulated ICU patients. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS’04). AAAI Press, 109–114.
[160]
Susan A. Murphy. 2003. Optimal dynamic treatment regimes. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 65, 2 (2003), 331–355.
[161]
Susan A. Murphy, Yanzhen Deng, Eric B. Laber, Hamid Reza Maei, Richard S. Sutton, and Katie Witkiewitz. 2016. A batch, off-policy, actor-critic algorithm for optimizing the average reward. arXiv:1607.05047. Retrieved from https://arxiv.org/abs/1607.05047.
[162]
Vivek Nagaraj, Andrew Lamperski, and Theoden I. Netoff. 2017. Seizure control in a computational model using a reinforcement learning stimulation paradigm. Int. J. Neur. Syst. 27, 07 (2017), 1750012.
[163]
Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, and Nathan Brown. 2018. Exploring deep recurrent models with reinforcement learning for molecule design. Retrieved from https://openreview.net/forum.
[164]
Shamim Nemati, Mohammad M. Ghassemi, and Gari D. Clifford. 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In Proceedings of the IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society. IEEE, 2978–2981.
[165]
Stelmo Magalhães Barros Netto, Vanessa Rodrigues Coelho Leite, Aristófanes Corrêa Silva, Anselmo Cardoso de Paiva, and Areolino de Almeida Neto. 2008. Application on reinforcement learning for diagnosis based on medical image. In Reinforcement Learning. InTech.
[166]
Phuong D. Ngo, Susan Wei, Anna Holubová, Jan Muzik, and Fred Godtliebsen. 2018. Control of blood glucose for type-1 diabetes by using reinforcement learning with feedforward algorithm. Comput. Math. Methods Med. (2018).
[167]
Phuong D. Ngo, Susan Wei, Anna Holubová, Jan Muzik, and Fred Godtliebsen. 2018. Reinforcement-learning optimal control for type-1 diabetes. In Proceedings of the IEEE EMBS International Conference on Biomedical & Health Informatics (BHI’18). IEEE, 333–336.
[168]
Thanh Thi Nguyen, Ngoc Duy Nguyen, Fernando Bello, and Saeid Nahavandi. 2019. A new tensioning method using deep reinforcement learning for surgical pattern cutting. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT’19). IEEE, 1339–1344.
[169]
Amin Noori, Mohammad Ali Sadrnia, et al. 2017. Glucose level control using temporal difference methods. In Proceedings of the Iranian Conference on Electrical Engineering (ICEE’17). IEEE, 895–900.
[170]
Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, and Hongming Chen. 2017. Molecular de-novo design through deep reinforcement learning. J. Cheminformat. 9, 1 (2017), 48.
[171]
Dirk Ormoneit and Śaunak Sen. 2002. Kernel-based reinforcement learning. Mach. Learn. 49, 2–3 (2002), 161–178.
[172]
Regina Padmanabhan, Nader Meskin, and Wassim M. Haddad. 2015. Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning. Biomed. Sign. Process. Contr. 22 (2015), 54–64.
[173]
Regina Padmanabhan, Nader Meskin, and Wassim M. Haddad. 2017. Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Math. Biosci. 293 (2017), 11–20.
[174]
Gabriella Panuccio, Arthur Guez, Robert Vincent, Massimo Avoli, and Joelle Pineau. 2013. Adaptive control of epileptiform excitability in an in vitro model of limbic seizures. Exp. Neurol. 241 (2013), 179–183.
[175]
Sonali Parbhoo. 2014. A Reinforcement Learning Design for HIV Clinical Trials. Ph.D. Dissertation.
[176]
Sonali Parbhoo, Jasmina Bogojeska, Maurizio Zazzi, Volker Roth, and Finale Doshi-Velez. 2017. Combining kernel and model based learning for HIV therapy selection. AMIA Summits Transl. Sci. Proc. 2017 (2017), 239–248.
[177]
Jason Pazis and Ronald Parr. 2013. PAC optimal exploration in continuous space Markov decision processes. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 774–781.
[178]
Xuefeng Peng, Yi Ding, David Wihl, Omer Gottesman, Matthieu Komorowski, Li-wei H. Lehman, Andrew Ross, Aldo Faisal, and Finale Doshi-Velez. 2019. Improving sepsis treatment strategies by combining deep and kernel-based reinforcement learning. In AMIA Annual Symposium Proceedings, Vol. 2018. American Medical Informatics Association, 887–896.
[179]
Jan Peters and Stefan Schaal. 2008. Natural actor-critic. Neurocomputing 71, 7–9 (2008), 1180–1190.
[180]
Brenden K. Petersen, Jiachen Yang, Will S. Grathwohl, Chase Cockrell, Claudio Santiago, Gary An, and Daniel M. Faissol. 2018. Precision medicine as a control problem: Using simulation and deep reinforcement learning to discover adaptive, personalized multi-cytokine therapy for sepsis. arXiv:1802.10440. Retrieved from https://arxiv.org/abs/1802.10440.
[181]
Joelle Pineau, Marc G. Bellemare, A. John Rush, Adrian Ghizaru, and Susan A. Murphy. 2007. Constructing evidence-based treatment strategies using methods from computer science. Drug Alcohol Depend. 88 (2007), S52–S60.
[182]
Joelle Pineau, Arthur Guez, Robert Vincent, Gabriella Panuccio, and Massimo Avoli. 2009. Treating epilepsy via adaptive neurostimulation: A reinforcement learning approach. Int. J. Neur. Syst. 19, 04 (2009), 227–240.
[183]
Mariya Popova, Olexandr Isayev, and Alexander Tropsha. 2018. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, 7 (2018), eaap7885.
[184]
Niranjani Prasad, Li-Fang Cheng, Corey Chivers, Michael Draugelis, and Barbara E. Engelhardt. 2017. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv:1704.06300. Retrieved from https://arxiv.org/abs/1704.06300.
[185]
Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, and Emma Brunskill. 2018. Behaviour policy estimation in off-policy policy evaluation: Calibration matters. arXiv:1807.01066. Retrieved from https://arxiv.org/abs/1807.01066.
[186]
Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, and Marzyeh Ghassemi. 2017. Deep reinforcement learning for sepsis treatment. arXiv:1711.09602. Retrieved from https://arxiv.org/abs/1711.09602.
[187]
Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. 2017. Continuous state-space models for optimal sepsis treatment: A deep reinforcement learning approach. In Proceedings of the Machine Learning for Healthcare Conference. ML Research Press, 147–163.
[188]
Aniruddh Raghu, Matthieu Komorowski, and Sumeetpal Singh. 2018. Model-based reinforcement learning for sepsis treatment. arXiv:1811.09602. Retrieved from https://arxiv.org/abs/1811.09602.
[189]
Daniele Ravì, Charence Wong, Fani Deligianni, Melissa Berthelot, Javier Andreu-Perez, Benny Lo, and Guang-Zhong Yang. 2017. Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21, 1 (2017), 4–21.
[190]
Andrew Rhodes, Laura E. Evans, Waleed Alhazzani, Mitchell M. Levy, Massimo Antonelli, Ricard Ferrer, Anand Kumar, Jonathan E. Sevransky, Charles L. Sprung, Mark E. Nunnally, et al. 2017. Surviving sepsis campaign: International guidelines for management of sepsis and septic shock: 2016. Intens. Care Med. 43, 3 (2017), 304–377.
[191]
Martin Riedmiller. 2005. Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In Proceedings of the European Conference on Machine Learning. Springer, 317–328.
[192]
Kirk Roberts, Matthew S. Simpson, Ellen M. Voorhees, and William R. Hersh. 2016. Overview of the TREC 2016 clinical decision support track. In Proceedings of the Annual Text Retrieval Conference (TREC’16). NIST Special Publication.
[193]
Gavin A. Rummery and Mahesan Niranjan. 1994. On-line Q-learning Using Connectionist Systems. Vol. 37. University of Cambridge, Department of Engineering, Cambridge, UK.
[194]
A. John Rush, Maurizio Fava, Stephen R. Wisniewski, Philip W. Lavori, Madhukar H. Trivedi, Harold A. Sackeim, Michael E. Thase, Andrew A. Nierenberg, Frederic M. Quitkin, T. Michael Kashner, et al. 2004. Sequenced treatment alternatives to relieve depression (STAR*D): Rationale and design. Control. Clin. Trials 25, 1 (2004), 119–142.
[195]
Nasser Sadati, Ali Aflaki, and Mehran Jahed. 2006. Multivariable anesthesia control using reinforcement learning. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC’06), Vol. 6. IEEE, 4563–4568.
[196]
Farhang Sahba. 2016. Object segmentation in image sequences using reinforcement learning. In Proceedings of the World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCI’16). IEEE, 1416–1417.
[197]
Farhang Sahba, Hamid R. Tizhoosh, and Magdy M. A. Salama. 2006. A reinforcement learning framework for medical image segmentation. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’06), Vol. 6. IEEE, 511–517.
[198]
Farhang Sahba, Hamid R. Tizhoosh, and Magdy M. A. Salama. 2007. Application of opposition-based reinforcement learning in image segmentation. In Proceedings of the IEEE Symposium on Computational Intelligence in Image and Signal Processing. IEEE, 246–251.
[199]
Farhang Sahba, Hamid R. Tizhoosh, and Magdy M. A. Salama. 2008. Application of reinforcement learning for segmentation of transrectal ultrasound images. BMC Med. Imag. 8, 1 (2008), 8.
[200]
Justin Salamon and Juan Pablo Bello. 2017. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Sign. Process. Lett. 24, 3 (2017), 279–283.
[201]
Suchi Saria. 2018. Individualized sepsis treatment using reinforcement learning. Nat. Med. 24, 11 (2018), 1641.
[202]
Andrew J. Schaefer, Matthew D. Bailey, Steven M. Shechter, and Mark S. Roberts. 2005. Modeling medical treatment using Markov decision processes. In Operations Research and Health Care. Springer, 593–612.
[203]
Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv:1511.05952. Retrieved from https://arxiv.org/abs/1511.05952.
[204]
Gisbert Schneider. 2013. De Novo Molecular Design. John Wiley & Sons.
[205]
Hans-Joerg Schuetz and Rainer Kolisch. 2012. Approximate dynamic programming for capacity allocation in the service industry. Eur. J. Operat. Res. 218, 1 (2012), 239–250.
[206]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https://arxiv.org/abs/1707.06347.
[207]
Antonio Serrano, Baldomero Imbernón, Horacio Pérez-Sánchez, José M. Cecilia, Andrés Bueno-Crespo, and José L. Abellán. 2018. Accelerating drugs discovery with deep reinforcement learning: An early approach. In Proceedings of the 47th International Conference on Parallel Processing Companion. ACM, 1–8.
[208]
Susan M. Shortreed, Eric Laber, Daniel J. Lizotte, T. Scott Stroup, Joelle Pineau, and Susan A. Murphy. 2011. Informing sequential clinical decision-making through reinforcement learning: An empirical study. Mach. Learn. 84, 1–2 (2011), 109–136.
[209]
Jun Shu, Zongben Xu, and Deyu Meng. 2018. Small sample learning in big data era. arXiv:1808.04572. Retrieved from https://arxiv.org/abs/1808.04572.
[210]
Eric D. Sinzinger and Brett Moore. 2005. Sedation of simulated ICU patients using reinforcement learning based control. Int. J. Artif. Intell. Tools 14, 1–2 (2005), 137–156.
[211]
Yousuf M. Soliman. 2014. Personalized medical treatments using novel reinforcement learning algorithms. arXiv:1406.3922. Retrieved from https://arxiv.org/abs/1406.3922.
[212]
Rui Song, Weiwei Wang, Donglin Zeng, and Michael R. Kosorok. 2015. Penalized Q-learning for dynamic treatment regimens. Stat. Sin. 25, 3 (2015), 901.
[213]
Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.
[214]
Fengyi Tang, Kaixiang Lin, Ikechukwu Uchendu, Hiroko H. Dodge, and Jiayu Zhou. 2018. Improving mild cognitive impairment prediction via reinforcement learning and dialogue simulation. arXiv:1802.06428. Retrieved from https://arxiv.org/abs/1802.06428.
[215]
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In Proceedings of the Conference on Neural Information Processing Systems (NIPS’17). MIT Press, 2750–2759.
[216]
Kai-Fu Tang, Hao-Cheng Kao, Chun-Nan Chou, and Edward Y. Chang. 2016. Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning. In Proceedings of the NIPS Workshop on Deep Reinforcement Learning.
[217]
Yebin Tao, Lu Wang, and Daniel Almirall. 2018. Tree-based reinforcement learning for estimating optimal dynamic treatment regimes. Ann. Appl. Stat. 12, 3 (2018), 1914–1938.
[218]
Graham W. Taylor. 2004. A reinforcement learning framework for parameter control in computer vision applications. In Proceedings of the IEEE 1st Canadian Conference on Computer and Robot Vision. IEEE, 496–503.
[219]
Matthew E. Taylor and Peter Stone. 2009. Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. 10 (Jul. 2009), 1633–1685.
[220]
Brijen Thananjeyan, Animesh Garg, Sanjay Krishnan, Carolyn Chen, Lauren Miller, and Ken Goldberg. 2017. Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’17). IEEE, 2371–2378.
[221]
Philip Thomas and Emma Brunskill. 2016. Data-efficient off-policy policy evaluation for reinforcement learning. In Proceedings of the International Conference on Machine Learning. Omnipress, 2139–2148.
[222]
Philip S. Thomas, Antonie J. van den Bogert, Kathleen M. Jagodnik, and Michael S. Branicky. 2009. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Annual Conference on Innovative Applications of Artificial Intelligence (IAAI’09). AAAI Press.
[223]
Eric J. Topol. 2019. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 25, 1 (2019), 44.
[224]
Huan Hsin Tseng, Yi Luo, Sunan Cui, Jen Tzung Chien, Randall K. Ten Haken, and Issam El Naqa. 2017. Deep reinforcement learning for automated radiation adaptation in lung cancer. Med. Phys. 44, 12 (2017), 6690–6705.
[225]
Chandra Prasetyo Utomo, Xue Li, and Weitong Chen. 2018. Treatment recommendation in critical care: A scalable and interpretable approach in partially observable health states.
[226]
Hado Van Hasselt. 2012. Reinforcement learning in continuous state and action spaces. In Reinforcement Learning. Springer, 207–251.
[227]
Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In Proceedings of the Annual AAAI Conference on Artificial Intelligence (AAAI’16), Vol. 2. AAAI Press, 5.
[228]
Martijn van Otterlo. 2012. Solving relational and first-order logical Markov decision processes: A survey. In Reinforcement Learning. Springer, 253–292.
[229]
Alfredo Vellido, Vicent Ribas, Carles Morales, Adolfo Ruiz Sanmartín, and Juan Carlos Ruiz Rodríguez. 2018. Machine learning in critical care: State-of-the-art and a sepsis case study. Biomed. Eng. Online 17, 1 (2018), 135.
[230]
Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. 2018. Programmatically interpretable reinforcement learning. In Proceedings of the International Conference on Machine Learning. Omnipress, 5052–5061.
[231]
Jean-Louis Vincent. 2013. Critical care: Where have we been and where are we going? Crit. Care 17, 1 (2013), S2.
[232]
Robert Vincent. 2014. Reinforcement Learning in Models of Adaptive Medical Treatment Strategies. Ph.D. Dissertation. McGill University Libraries.
[233]
Lu Wang, Wei Zhang, Xiaofeng He, and Hongyuan Zha. 2018. Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2447–2456.
[234]
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. 2016. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. Omnipress, 1995–2003.
[235]
Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Mach. Learn. 8, 3–4 (1992), 279–292.
[236]
Zhongyu Wei, Qianlong Liu, Baolin Peng, Huaixiao Tou, Ting Chen, Xuanjing Huang, Kam-Fai Wong, and Xiangying Dai. 2018. Task-oriented dialogue system for automatic diagnosis. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vol. 2. Association for Computational Linguistics, 201–207.
[237]
Wei-Hung Weng, Mingwu Gao, Ze He, Susu Yan, and Peter Szolovits. 2017. Representation and reinforcement learning for personalized glycemic control in septic patients. arXiv:1712.00654. Retrieved from https://arxiv.org/abs/1712.00654.
[238]
Marco Wiering and Martijn van Otterlo (Eds.). 2012. Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization, Vol. 12. Springer.
[239]
Wolfram Wiesemann, Daniel Kuhn, and Berç Rustem. 2013. Robust Markov decision processes. Math. Operat. Res. 38, 1 (2013), 153–183.
[240]
Christian Wirth, Riad Akrour, Gerhard Neumann, and Johannes Fürnkranz. 2017. A survey of preference-based reinforcement learning methods. J. Mach. Learn. Res. 18, 1 (2017), 4945–4990.
[241]
Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, and Finale Doshi-Velez. 2018. Beyond sparsity: Tree regularization of deep models for interpretability. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI Press.
[242]
Huan Xu and Shie Mannor. 2010. Distributionally robust Markov decision processes. In Advances in Neural Information Processing Systems. MIT Press, 2505–2513.
[243]
Jiayu Yao, Taylor Killian, George Konidaris, and Finale Doshi-Velez. 2018. Direct policy transfer via hidden parameter Markov decision processes. In LLARLA Workshop, FAIM, Vol. 2018.
[244]
Sholeh Yasini, Mohammad Bagher Naghibi Sistani, and Ali Karimpour. 2009. Agent-based simulation for blood glucose. Int. J. Appl. Sci. Eng. Technol. 5 (2009), 89–95.
[245]
Gregory Yauney and Pratik Shah. 2018. Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection. In Proceedings of the Machine Learning for Healthcare Conference. ML Research Press, 161–226.
[246]
Elad Yom-Tov, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, and Irit Hochberg. 2017. Encouraging physical activity in patients with diabetes: Intervention using a reinforcement learning system. J. Med. Internet Res. 19, 10 (2017).
[247]
Chao Yu, Yinzhao Dong, Jiming Liu, and Guoqi Ren. 2019. Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV. BMC Med. Inf. Decis. Making 19, 2 (2019), 60.
[248]
Chao Yu, Jiming Liu, and Hongyi Zhao. 2019. Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units. BMC Med. Inf. Decis. Making 19, 2 (2019), 57.
[249]
Chao Yu, Guoqi Ren, and Yinzhao Dong. 2020. Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units. BMC Med. Inf. Decis. Making (2020).
[250]
Chao Yu, Guoqi Ren, and Jiming Liu. 2019. Deep inverse reinforcement learning for sepsis treatment. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI’19). IEEE, 1–3.
[251]
Pengyue Zhang, Fusheng Wang, and Yefeng Zheng. 2018. Deep reinforcement learning for vessel centerline tracing in multi-modality 3D volumes. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 755–763.
[252]
Yufan Zhao, Michael R. Kosorok, and Donglin Zeng. 2009. Reinforcement learning design for cancer clinical trials. Stat. Med. 28, 26 (2009), 3294–3315.
[253]
Yufan Zhao, Donglin Zeng, Mark A. Socinski, and Michael R. Kosorok. 2011. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics 67, 4 (2011), 1422–1433.
[254]
Ya-Li Zheng, Xiao-Rong Ding, Carmen Chung Yan Poon, Benny Ping Lai Lo, Heye Zhang, Xiao-Lin Zhou, Guang-Zhong Yang, Ni Zhao, and Yuan-Ting Zhang. 2014. Unobtrusive sensing and wearable devices for health informatics. IEEE Trans. Biomed. Eng. 61, 5 (2014), 1538–1554.
[255]
Shao Zhifei and Er Meng Joo. 2012. A survey of inverse reinforcement learning techniques. Int. J. Intell. Comput. Cybernet. 5, 3 (2012), 293–311.
[256]
Feiyun Zhu, Jun Guo, Ruoyu Li, and Junzhou Huang. 2018. Robust actor-critic contextual bandit for mobile health (mhealth) interventions. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 492–501.
[257]
Feiyun Zhu, Jun Guo, Zheng Xu, Peng Liao, Liu Yang, and Junzhou Huang. 2018. Group-driven reinforcement learning for personalized mhealth intervention. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 590–598.


Published In

ACM Computing Surveys  Volume 55, Issue 1
January 2023
860 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3492451

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 November 2021
Accepted: 01 July 2021
Revised: 01 December 2020
Received: 01 April 2020
Published in CSUR Volume 55, Issue 1


Author Tags

  1. Reinforcement learning
  2. healthcare
  3. dynamic treatment regimes
  4. critical care
  5. chronic disease
  6. automated diagnosis

Qualifiers

  • Survey
  • Refereed

Funding Sources

  • Hongkong Scholar Program
