Wow, a pure-bred Policy Gradient. It’s basically a show algorithm, its math is so pure. While the other breeds might live fast and loose, exploring around and not doing what they’re told, Rocky comes from a long line of RL algorithms who stop to think and optimize a trajectory. It’s a real analytical algorithm; even in wild environments that mix continuous and discrete parameters, Rocky should be a steady performer. You’ll be able to count on Rocky to get the job done. What reward scheme will you use?
Published by B McGraw
B McGraw has lived a long and successful professional life as a software developer and researcher. After completing his BS in spaghetti coding at the department of the dark arts at Cranberry Lemon in 2005 he wasted no time in getting a masters in debugging by print statement in 2008 and obtaining his PhD with research in screwing up repos on Github in 2014. That's when he could finally get paid. In 2018 B McGraw finally made the big step of defaulting on his student loans and began advancing his career by adding his name on other people's research papers after finding one grammatical mistake in the Peer Review process.