
The meaning of time in reinforcement learning

Reinforcement learning (RL) is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. It is concerned with how software agents should take actions in an environment to maximize cumulative reward through trial and error.

In reinforcement learning an agent starts at an empty state, then analyzes the available data according to a policy of positive states and negative states. Rather than being explicitly taught the correct set of actions for a task, as in supervised learning, the agent uses rewards as signals for positive states and punishments as signals for negative states.

The agent finds the best path to a desirable reward as an accumulation of positive states and negative states. Reinforcement learning also differs from unsupervised learning in its goal. While the goal in unsupervised learning is to find similarities and differences between data points, the goal in reinforcement learning is to find an action model that maximizes the agent's total cumulative reward.
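As a minimal sketch of the idea above, assume each positive state contributes a reward of +1 and each negative state a punishment of -1. The function names `cumulative_reward` and `best_path`, and the example paths, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def cumulative_reward(rewards: List[float]) -> float:
    """Sum the rewards (+) and punishments (-) collected along one path."""
    return sum(rewards)


def best_path(paths: Dict[str, List[float]]) -> str:
    """Return the name of the path with the highest cumulative reward."""
    return max(paths, key=lambda name: cumulative_reward(paths[name]))


# Hypothetical paths: +1.0 marks a positive state, -1.0 a negative state.
paths = {
    "path_a": [1.0, -1.0, 1.0],
    "path_b": [1.0, 1.0, -1.0, 1.0],
}
print(best_path(paths))
```

A real agent would discover these paths by trial and error rather than receive them up front; the sketch only shows how the comparison between paths works.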

With the idea of reinforcement learning in mind, I can explain the meaning of time. Time is very important in reinforcement learning because it describes the order of a sequence of positive states and negative states, as well as the latency of positive states becoming negative states. More or less time between states can signal undiscovered relationships in the environment, which could point to better paths to a desirable reward.
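To make the latency idea concrete, here is a small sketch that measures how long each positive state lasted before a negative state followed it. It assumes the states arrive as `(timestamp, label)` pairs; that format and the function name `positive_to_negative_latencies` are assumptions for this example.

```python
from typing import List, Tuple


def positive_to_negative_latencies(events: List[Tuple[float, str]]) -> List[float]:
    """Return the time gap for every positive state directly followed by a negative state."""
    latencies = []
    for (t_prev, label_prev), (t_next, label_next) in zip(events, events[1:]):
        if label_prev == "positive" and label_next == "negative":
            latencies.append(t_next - t_prev)
    return latencies


# Hypothetical sequence of states with timestamps in seconds.
events = [
    (0.0, "positive"),
    (1.5, "negative"),
    (2.0, "positive"),
    (2.25, "negative"),
]
print(positive_to_negative_latencies(events))
```

An unusually short or long gap in a list like this is the kind of signal that could hint at a relationship in the environment that has not been discovered yet.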

But the problem with analyzing time between states is that the action of analyzing time is not the same as the action of recording time. Likewise, a timestamp that the environment attaches to a state is not the real time at which the state occurred in the environment. In computing there are only two methods for precisely measuring time: manual code instrumentation, known as data logging, and automated code instrumentation, known as data profiling.
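The two methods can be sketched with Python's standard library: `time.perf_counter` for manual instrumentation and `cProfile` for automated instrumentation. The function `step` is a hypothetical stand-in for whatever work is being measured, and `timed_step` and `profiled_step` are names made up for this example.

```python
import cProfile
import io
import pstats
import time
from typing import Tuple


def step(n: int) -> int:
    """Hypothetical unit of work to be measured."""
    return sum(i * i for i in range(n))


def timed_step(n: int) -> Tuple[int, float]:
    """Manual instrumentation: the code records its own start and end times."""
    start = time.perf_counter()
    result = step(n)
    elapsed = time.perf_counter() - start
    return result, elapsed


def profiled_step(n: int) -> Tuple[int, str]:
    """Automated instrumentation: the profiler records the timings for us."""
    profiler = cProfile.Profile()
    result = profiler.runcall(step, n)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, elapsed = timed_step(1000)
print(f"logged: step(1000) took {elapsed:.6f} seconds")

_, report = profiled_step(1000)
print(report)
```

In both cases the measurement is taken by code running around the event, which is why the recorded time is not identical to the moment the event itself occurred.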

So in reinforcement learning, estimated times, real times, and logged timestamps are ignored in favor of the order in which the agent ingests the data from the environment, since the action of reinforcement learning happens after ingestion. It did not happen in the past during data logging or data profiling. The validity of this optimization is proven when the agent knows all possible sequences of positive states and negative states from its environment once the environment ceases to provide new sequences.
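A minimal sketch of ordering by ingestion, assuming the environment delivers `(logged_timestamp, label)` pairs: the agent discards the logged timestamp and numbers each state by the order in which it arrived. The function name `ingest` and the example stream are hypothetical.

```python
from typing import Iterable, List, Tuple


def ingest(stream: Iterable[Tuple[float, str]]) -> List[Tuple[int, str]]:
    """Index states by arrival order, ignoring the logged timestamps."""
    return [(step, label) for step, (_logged_ts, label) in enumerate(stream)]


# Hypothetical stream whose logged timestamps are out of order.
stream = [(10.5, "positive"), (9.0, "negative"), (11.2, "positive")]
print(ingest(stream))
```

Note that the second state keeps its position even though its logged timestamp is earlier than the first one; only the ingestion order matters to the agent here.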

Do you have a suggestion about how to improve this blog? Would you like to learn more about this topic? Let's talk about it. Contact me at David.Brenner.Jr@Gmail.com or 720-584-5229.
