Field report from EWRL, part 1
July 2, 2008
Well, “part 1” in the title implies that there will be more, which is quite an optimistic statement considering the hazy state of the wifi network. But apart from that minor point, the conference is absolutely great. I mean, if there are two or three talks at a conference that fit my interests, then it’s already OK for me – and EWRL has a much higher hit ratio so far. Here comes my highly subjective selection.
I heard two really interesting talks (1, 2) about Go and Monte-Carlo tree search, and others about learning history lists in deterministic POMDPs, switching off costly features in RL (e.g. the ones you calculate by multi-step lookahead), exploration by maintaining distributions over value functions, a nice trick to decrease the variance caused by rare events, and a PAC-like framework for RL that I do not completely get, but that looks exciting. Oh, that’s half of the programme :-) The first two talks of the conference (about regularized Q-learning and policy iteration in systems with function approximation) were also something I’d have been really interested in, but unfortunately I spent that time sitting on a train (no train leaves for Lille before 6am…), which was not nearly as entertaining. Anyway, that’s a lot of reading material for the next weeks, and more to come :-) I also gave a talk about being optimistic in an effective way (slides are here), and Guillaume presented our results in Go parameter optimization.
More importantly, Rich Sutton gave a really great keynote speech about his vision for RL, his take-home message being “slow learning makes you fast”. By this, he means that you need to learn something about the structure of your problems (which is necessarily slow) so that later you can learn quickly. As an example, he talked about feature selection (and in particular, their IDBD algorithm): you can easily generate huge amounts of features. Selecting the few interesting ones is necessarily slow, but after that you can do quick learning on those few features. Another example he gave was Dyna-type learning: you can make plans (which is slow) in the background while acting in the environment. In fact, you can make plans for any subgoals you like (even if you do not want to reach these subgoals at the moment). Of course, it is not new that you can do this, but I think the message was that you should do it. He also emphasized the necessity of constructivism (discovery of appropriate state descriptions, features, macros), mentioning several methods for constructing options and PSRs.
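To make the “slow learning makes you fast” idea a bit more concrete, here is a minimal sketch of an IDBD-style update for a linear predictor, in which every feature carries its own step size that is itself adapted by a slow meta-learning rule. The toy problem is purely my assumption and not something from the talk: the helper names `idbd_update` and `run_demo`, the 10 features of which only the first 3 matter, the target weights that flip sign every 20 steps, and the constants `theta=0.01` and initial step size 0.05 are all made up for illustration.

```python
import math
import random


def idbd_update(w, beta, h, x, target, theta=0.01):
    """One IDBD-style step for a linear predictor; w, beta, h are updated in place."""
    # Prediction error, computed with the weights from before this update.
    delta = target - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Slow meta-step: raise the log step size when the current gradient
        # agrees with the trace h of recent updates, lower it when they disagree.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Fast step: ordinary LMS update with a per-feature step size.
        w[i] += alpha * delta * xi
        # Decaying trace of recent weight changes for this feature.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_demo(n_features=10, n_relevant=3, steps=20000, seed=0):
    """Toy tracking task (an assumption): only the first n_relevant inputs matter."""
    rng = random.Random(seed)
    true_w = [1.0] * n_relevant + [0.0] * (n_features - n_relevant)
    w = [0.0] * n_features
    beta = [math.log(0.05)] * n_features  # every feature starts at step size 0.05
    h = [0.0] * n_features
    for t in range(steps):
        if t % 20 == 0:
            # Keep the target moving so that the learner has to keep tracking it.
            j = rng.randrange(n_relevant)
            true_w[j] = -true_w[j]
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        target = sum(a * b for a, b in zip(true_w, x))
        idbd_update(w, beta, h, x, target)
    return [math.exp(b) for b in beta]  # learned per-feature step sizes


if __name__ == "__main__":
    step_sizes = run_demo()
    print("relevant:  ", [round(a, 4) for a in step_sizes[:3]])
    print("irrelevant:", [round(a, 4) for a in step_sizes[3:]])
```

In a setup like this, the step sizes of the relevant features should end up noticeably larger than those of the irrelevant ones. That is the slow part: once the step sizes have sorted the features, tracking the moving target with the few useful ones can be quick.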
There are still two more days left with interesting-looking talks, and keynote speakers Dimitri Bertsekas and Jan Peters. And French food and Belgian beers. Sweet.

July 2, 2008 at 4:48 pm
This seems to be a great workshop. I wish I could be there myself!
Eagerly waiting for your subsequent reports.