
The meaning of time in reinforcement learning

Reinforcement learning (RL) is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. It is concerned with how software agents ought to take actions in an environment in order to maximize some notion of cumulative reward through a process of trial and error.

In reinforcement learning, an agent starts from an empty state and then analyzes the available data according to a policy of positive states and negative states. Rather than being explicitly taught the correct set of actions for performing a task, as in supervised learning, the agent uses rewards as signals for positive states and punishments as signals for negative states.

The agent obtains the best path to a desirable reward as an accumulation of positive states and negative states. Reinforcement learning also differs from unsupervised learning in its goal. While the goal in unsupervised learning is to find similarities and differences between data points, the goal in reinforcement learning is to find a suitable action model that maximizes the agent's total cumulative reward.
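To make "total cumulative reward" concrete, here is a minimal sketch in Python. The function name `cumulative_reward`, the sample episode, and the optional `discount` factor are my own assumptions for illustration; the post itself only talks about summing rewards and punishments, which corresponds to `discount=1.0`.

```python
def cumulative_reward(rewards, discount=1.0):
    """Sum the rewards of one episode, in the order they were received.

    Positive numbers stand for rewards (positive states) and negative
    numbers for punishments (negative states). `discount` is an assumed
    extra: values below 1.0 make later rewards count for less.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


# Hypothetical episode: reward, punishment, reward, reward.
episode = [1.0, -1.0, 1.0, 1.0]
plain_total = cumulative_reward(episode)
discounted_total = cumulative_reward(episode, discount=0.5)
```

An agent comparing two candidate paths would prefer the one whose total is larger.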

With the idea of reinforcement learning in mind, I can explain to you the meaning of time. Time is very important in reinforcement learning because it describes the order of a sequence of positive states and negative states, as well as the latency of positive states becoming negative states. More or less time between states can signal undiscovered relationships in the environment, which could point to more correct paths to a desirable reward.

But the problem with analyzing the time between states is that the action of analyzing time is not the same as the action of recording time. Likewise, a timestamp that the environment's data provides for each state is not the real time at which the state occurred in the environment, because in computing there are only two methods for precisely measuring time: manual code instrumentation, known as data logging, and automated code instrumentation, known as data profiling.
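The two kinds of instrumentation can be sketched side by side in Python. This is only an illustration under my own assumptions: `step` is a made-up stand-in for whatever work produces a state, and `logged_step` and `profiled_step` are hypothetical helper names. The manual version wraps the call in `time.perf_counter()` and writes a log line; the automated version hands the call to the standard library's `cProfile`.

```python
import cProfile
import io
import logging
import pstats
import time


def step(n):
    """Stand-in for the work that produces one state (an assumption)."""
    return sum(i * i for i in range(n))


def logged_step(n, log):
    """Manual instrumentation: time the call ourselves and log the result."""
    start = time.perf_counter()
    result = step(n)
    elapsed = time.perf_counter() - start
    log.info("step(%d) took %.6f s", n, elapsed)
    return result, elapsed


def profiled_step(n):
    """Automated instrumentation: let cProfile record the timings."""
    profiler = cProfile.Profile()
    result = profiler.runcall(step, n)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logged_step(100_000, logging.getLogger("timing"))
    print(profiled_step(100_000)[1])
```

In both cases the measurement is taken by the code that runs the step, not read back from a timestamp that arrives with the data.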

So in reinforcement learning, estimated times, real times, and logged timestamps are ignored in favor of the order in which the agent ingests the data from the environment, since the action of reinforcement learning happens after ingestion; it did not happen in the past during data logging or data profiling. The validity of this optimization is proven when the agent knows all possible sequences of positive states and negative states from its environment whenever the environment ceases to provide new sequences.
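Here is a small sketch of what ordering by ingestion could look like. The `Observation` class, its field names, and the `ingest` function are hypothetical names I made up for this example; the only point is that the agent numbers states in the order they arrive and never sorts by the timestamp that came with the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    state: str        # label of the state reported by the environment
    reward: float     # positive for a reward, negative for a punishment
    logged_at: float  # timestamp supplied with the data (ignored below)


def ingest(observations):
    """Number each observation by arrival order, ignoring `logged_at`."""
    return [
        (index, obs.state, obs.reward)
        for index, obs in enumerate(observations)
    ]


# Hypothetical stream whose logged timestamps are out of order.
stream = [
    Observation("s0", 1.0, logged_at=30.0),
    Observation("s1", -1.0, logged_at=10.0),
    Observation("s2", 1.0, logged_at=20.0),
]
sequence = ingest(stream)
```

Sorting `stream` by `logged_at` would give a different sequence (s1, s2, s0), so the choice between the two orderings changes what the agent learns from.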

Do you have a suggestion about how to improve this blog? Would you like to learn more about this topic? Let's talk about it. Contact me at David.Brenner.Jr@Gmail.com or 720-584-5229.
