Current Bioinformatics, 2022, 17, 393-395
EDITORIAL
© 2022 Bentham Science Publishers
Deep Reinforcement Learning Framework for COVID Therapy: A Research Perspective
Shomona Gracia Jacob1,*, Majdi Mohammed Bait Ali Sulaiman2 and Bensujin Bennet1
1Department of Engineering, University of Technology and Applied Sciences, Nizwa, Oman; 2University of Technology and Applied Sciences, Salalah, Oman
The confluence of data science and medicine appears to be among the most rewarding spheres of research, given the volume of clinical data generated every nanosecond and the alarming rate at which novel viruses like SARS-CoV affect both ordinary citizens and governments. This editorial seeks to establish that research on big data management, involving data volume, nature, and prediction, must interface across multiple disciplines of study. When it comes to learning from voluminous data, available resources are heavily outstripped in terms of skilled manpower, cost, and time. In the realm of COVID, both speed and skill were required to design and produce effective vaccines that could save the planet. Data, and the knowledge that can be inferred from them, span multiple disciplines of science. Research on the learning and design of intelligent diagnostic and therapeutic systems is required to analyse inter-disciplinary data; hence, a confluence of knowledge is the need of the hour [1]. The impact of computational methods on the diagnostic and therapeutic advancement of existing health care systems has been witnessed in the recent past. This editorial brings together recent findings in COVID therapy, where a confluence of deep and reinforcement learning has proved successful in predicting COVID-19 outcomes in terms of mortality rate, new infections, recoveries, and drug design.
Reinforcement learning attempts to train an agent to achieve its target (with maximum reward) despite an unreliable environment. At a given time instance 't', the agent has access to the following inputs: action 'a' and state 's', along with the reward 'r' for each case. At each time instance, the agent interacts with the environment, receives state 's' and reward 'r' from it, and then chooses an action 'a', which is sent back to the environment [2]. The environment then proceeds to the next state, while the agent receives the reward for that particular action. In this way, a reinforcement learning agent tries to maximize its cumulative reward using the feedback (reward) received after each action [3]. A machine learning algorithm based on a deep Reinforcement Learning (RL) principle for continuous management
of oxygen flow rate for critically ill patients under intensive care was proposed by Zheng et al. [4]. This work attempted to
identify the optimal personalized oxygen flow rate that had the possibility of minimizing the mortality rate relative to the cur-
rent clinical practice. The authors modelled the oxygen flow trajectory of COVID-19 patients and their health outcomes as a
Markov decision process. Based on individual patient characteristics and health status, an oxygen control policy based on Rein-
forcement learning was learned and a real-time system was designed that recommended the oxygen flow rate to reduce the mor-
tality rate. The results of the proposed methods were evaluated through cross validation by using a retrospective cohort of 1,372
critically ill patients with COVID-19 from New York University Langone Health ambulatory care. The electronic health rec-
ords (EHR) from April 2020 to January 2021 were employed for the evaluation. The mean mortality rate under the RL algo-
rithm was lower than the standard of care by 2.57% (95% CI: 2.08- 3.06) reduction (P<0.001) from 7.94% under the standard
of care to 5.37 % under their proposed algorithm and the averaged recommended oxygen flow rate was 1.28 L/min (95% CI:
1.14-1.42) lower than the rate actually delivered to patients. Hence, the authors recorded that the proposed RL algorithm could
potentially lead to better intensive care treatment that could greatly reduce the mortality rate while also cutting down on the
unwarranted use of oxygen supplies. This could greatly impact the oxygen shortage issue and enhance overall public health
during the COVID-19 pandemic.
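The agent-environment loop and the Markov-decision-process framing described above can be illustrated with a minimal tabular Q-learning sketch. The MDP here is entirely hypothetical: three coarse patient states, three discrete flow-rate actions, and toy transition probabilities and rewards stand in for the far richer continuous state space and clinically grounded reward that a real system such as the one in [4] would use.

```python
import random

STATES = ["stable", "deteriorating", "critical"]   # hypothetical patient states
ACTIONS = [0, 2, 4]                                # hypothetical flow rates (L/min)

def step(state, action):
    """Toy environment: returns (next_state, reward). More oxygen improves the
    odds of recovery for a worsening patient, but every unit of flow carries a
    small cost (-action / 10) to discourage unwarranted oxygen use."""
    if state == "stable":
        nxt = "stable" if random.random() < 0.9 else "deteriorating"
    elif state == "deteriorating":
        p_recover = 0.2 + 0.15 * action
        nxt = "stable" if random.random() < p_recover else "critical"
    else:  # critical
        p_recover = 0.05 + 0.1 * action
        nxt = "deteriorating" if random.random() < p_recover else "critical"
    reward = {"stable": 1.0, "deteriorating": -1.0, "critical": -5.0}[nxt] - action / 10
    return nxt, reward

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Standard tabular Q-learning over the toy MDP above."""
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = random.choice(STATES)
        for _ in range(20):                        # finite horizon per episode
            # epsilon-greedy: mostly exploit the current Q-table, sometimes explore
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            # Q-learning update: move Q(s, a) toward the bootstrapped target
            target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q

random.seed(0)
Q = q_learning()
# Greedy policy: the recommended flow rate for each patient state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

The greedy policy extracted at the end plays the role of the real-time flow-rate recommendation described above; a deep RL system replaces the Q-table with a neural network so that continuous vitals, rather than three coarse labels, can index the state.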
Commonly cited pandemic models include the susceptible-infected-recovered (SIR) and susceptible-exposed-infected-recovered (SEIR) models [5], where “S,” “E,” “I,” and “R” denote the number of susceptible persons, the number of individuals in the incubation phase, the number of contagious persons, and the number of recovered individuals, respectively. These models have been in use since Ebola and SARS emerged, owing to their robust predictive abilities of
*Address correspondence to this author at the Department of Engineering, University of Technology and Applied Sciences, Nizwa, Oman; E-mail: shomona.gracia@nct.edu.om
A R T I C L E H I S T O R Y
Received: December 13, 2021
Revised: December 27, 2021
Accepted: January 31, 2022
DOI: 10.2174/1574893617666220329182633