Exploration vs Exploitation

Musings on exploration and commitment.

We are condemned to choose.

Life will repeatedly ask us to make difficult decisions about where to spend our time and energy in order to accomplish our goals. In a world with more information than our attention can meaningfully engage with, I'm learning that we have to choose to not care about a lot of problems in order to meaningfully care about any at all.

This has been a hard pill to swallow.

Having just finished school and entered “the real world”, I've been exploring a ton of different avenues for self-exploration, self-fulfillment, service to others, and, of course, fun (I'm only like, 20% hedonist). Even as a bio-hacking productivity nerd who will try anything to squeeze more value out of every day, there still never seems to be enough time to see it all or learn it all, much less do it all.

But how do we choose which options to commit to?

Life is, at the end of the day, a game of incomplete information: one where we will have to make our move without knowing exactly what’s behind each door. Given the dizzying array of options, and the disturbing lack of insight available to us, choosing the “right” place to spend our time feels like an impossible challenge, even if we know that it’s the right thing to do.

Fortunately for us mortals, though, some nerds much smarter than you or me have asked these questions, done the math, and come up with precise frameworks for making optimal decisions in the face of uncertainty.

In today’s issue of Generick Ideas, we’ll explore what those nerds have to say about these decisions and understand how we can use their solutions to better think about decisions in our own lives.

When it comes to making decisions, there are two main problems in Computer Science that have helped me think about balancing exploration with commitment.

– Beginning of Nerd Section (can skip to the end for the punchlines) –

The first is known as the Optimal Stopping Problem. In this problem, we must choose a time to take a specific action without having all the information we’d like to have about the future.

Finding the closest parking spot to your destination, deciding when to sell a stock, deciding who to marry, choosing a job offer, and playing a game of Howie Mandel’s Deal or No Deal are all examples of optimal stopping problems.

The optimal stopping problem is, at its core, a problem of deciding whether to stick with what you have or take your chances on something new and unknown. Although the real world is much hairier than the world of pure math, there’s a very simple heuristic (rule-of-thumb) that can serve us well in our everyday lives.

As it turns out, computer scientists have found that this decision really comes down to three steps:

  1. Estimating how many options you think you’ll have if you don’t commit (for example, you might).

  2. Exploring about the first 37% of the options (1/e, a little more than ⅓) without committing.

  3. Committing to the next option that is better than everything you’ve seen before.
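For the curious, the three steps above can be sketched as a small simulation. This is only a toy under textbook assumptions: options show up one at a time in random order, every option has a distinct score, and you can't go back to one you passed on. The names `look_then_leap` and `success_rate` are made up for this illustration.

```python
import math
import random


def look_then_leap(scores, explore_fraction=1 / math.e):
    """Return the index of the option chosen by the look-then-leap rule."""
    n = len(scores)
    cutoff = int(n * explore_fraction)  # options we only look at, never pick
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if scores[i] > best_seen:
            return i  # first option that beats everything seen so far
    return n - 1  # nothing beat the benchmark, so we are stuck with the last one


def success_rate(n_options=100, trials=10_000, seed=0):
    """Estimate how often the rule lands on the single best option."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = rng.sample(range(n_options), n_options)  # random order, no ties
        if scores[look_then_leap(scores)] == n_options - 1:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(f"Picked the best option in {success_rate():.0%} of simulated runs")
```

Running it should print a success rate somewhere in the neighborhood of 37%, and changing `explore_fraction` to something like 0.1 or 0.8 is a quick way to see the rate drop.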

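Under the textbook assumptions (you see options one at a time, in random order, and can’t go back), this approach lands the single best option about 37% of the time, and no other rule does better. The rest of the time you often still end up with a good result, though if the best option happened to show up during the exploration phase, you’re left with whatever comes last. So what does this mean for our lives?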

This means that, when we are young, we should explore boldly and fearlessly, taking our early years as an opportunity to learn about ourselves, the world, and our place in it. But as we get older, we should promise ourselves that, as soon as we find the next result that “breaks our frame”, it’s in our best interest to hang on to it, commit to it, and foster our relationship with it, in every aspect of our lives.

The Multi-Armed Bandit Problem

The second problem that has helped me think about this trade-off is the Multi-Armed Bandit Problem. This problem (named after the slot machine’s nickname, the one-armed bandit) takes on a different shape, but asks us to consider a similar trade-off: given a set of options that we can enjoy again and again, which option should we choose each time? Should I go to my favorite burrito place again or try a new one? Should I go to my favorite vacation spot again or try a new one? Should I pull this slot machine again or try a new one?

In practice, the best we can do is estimate how good a new option will make us feel compared to an option that we’re already familiar with. And depending on whether you value novelty or consider yourself more risk-averse, the right answer at any moment in time could change. But, just like the Optimal Stopping problem, the solution to the Multi-Armed Bandit problem has a lot to say about how we live our lives.

Really, there are many solutions to the Multi-Armed Bandit (MAB) problem. Some of them, like Upper Confidence Bound (UCB), are more optimistic, while others, like epsilon-greedy, play it a bit safer. Depending on the kind of person you are and your appetite for risk, one might be a better fit for you than the other.

The epsilon-greedy algorithm basically tells us to choose our best option so far (our favorite childhood restaurant) every time we get the chance, except for some small fraction of the time, called epsilon (often about 10%), when we should pick an option at random, just to make sure we’re not missing out on something better. This approach makes sure that we almost always have a positive experience, but still gives us the opportunity to find a new favorite option.
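Here is a minimal sketch of that idea, assuming each option (each "arm") pays out 1 with some fixed hidden probability and 0 otherwise. The function name `epsilon_greedy`, the `true_means` list, and the burrito-shop numbers are all hypothetical, picked just for illustration.

```python
import random


def epsilon_greedy(true_means, rounds=1000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy; return (pull counts, estimated values)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how many times each option was tried
    estimates = [0.0] * n   # running average reward per option
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick any option at random
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit the favorite
        # Hidden payoff: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # update average
    return counts, estimates


if __name__ == "__main__":
    # Three made-up burrito places with hidden "good meal" probabilities.
    counts, estimates = epsilon_greedy([0.3, 0.5, 0.8])
    print("visits per place:", counts)
    print("estimated quality:", [round(e, 2) for e in estimates])
```

Note that even after a clear favorite emerges, roughly an epsilon share of the visits keeps going to random places. That is the price this strategy pays for never fully closing the door on a new favorite.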

The second solution, UCB, however, is less careful. UCB takes the strategy of optimism in the face of uncertainty: it gives every option the benefit of the doubt, scoring each one by how good it has been so far plus a bonus for how little we know about it. That bonus makes untried and rarely tried options look attractive, so UCB tries new options early and often, and only settles on a favorite once it has gathered enough evidence (or used up a lot of its resources, usually time). For example, if you were to move to a new city for 12 weeks in the summer and used the UCB algorithm, you might spend much of June and July trying new places and only revisit your favorite places in the last few weeks of the summer. The UCB algorithm prioritizes novelty and information over the safe option.
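A rough sketch of one common variant, usually called UCB1, is below. It makes the same toy assumption that each option pays out 1 with a fixed hidden probability and 0 otherwise, and the name `ucb1` and the example numbers are invented for this illustration. The "optimism" is the square-root bonus, which is large for options we've barely tried and shrinks as we try them more.

```python
import math
import random


def ucb1(true_means, rounds=1000, seed=0):
    """Simulate UCB1; return (pull counts, estimated values)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how many times each option was tried
    estimates = [0.0] * n   # running average reward per option
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1  # try every option once before comparing them
        else:
            # Score = average so far + uncertainty bonus (bigger when rarely tried).
            arm = max(
                range(n),
                key=lambda a: estimates[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Hidden payoff: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # update average
    return counts, estimates


if __name__ == "__main__":
    counts, estimates = ucb1([0.3, 0.5, 0.8])
    print("visits per place:", counts)
    print("estimated quality:", [round(e, 2) for e in estimates])
```

Unlike epsilon-greedy, there is no fixed exploration rate here: the exploring happens mostly up front and tapers off on its own as the uncertainty bonuses shrink.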

– End of Nerd Section –

So which of the two approaches is better? Boldness and optimism, or caution and greed?

The truth is, both algorithms work well in different situations, and for different people depending on their values. But whether you’re more of a UCB person or an Epsilon-greedy kinda guy (gal), understanding the tradeoff between your resources and new information can be critical for making informed, careful, and wise choices across all domains of life. 

The real world is much messier and more difficult than the world of pure math. It’s difficult to quantify the value of convenience, nostalgia, and personal transformation in the math of optimal decision-making.

Even so, the themes are the same: we explore when we have more time and start exploiting as we run out of it. In our youth, we make many friends, try new experiences, and travel to new places in order to gain information and enrich our worldview. When we’re older, however, we find the things we truly care about and dedicate our lives to them: our marriages, our families, a tightly knit group of friends.

So whether you’re choosing where to settle down, who to settle down with, or which job offer to sign, understanding that exploration and exploitation are in direct conflict can inform how much risk you’re willing to take as well as how valuable it is to commit to an option before your time runs out.

In my own life, I'm learning that developing maturity means developing the willingness to trade optionality for substance. That can mean giving up dating to foster a relationship, moving less often to grow into new cities more deeply, turning down weeknight (and often weekend) plans to build a business, or demoting our own needs to nurture our children. In all these cases, we must sacrifice possibility and take on responsibility but, in doing so, we get to watch our commitments blossom into new life, joy, wonder, love, and variety.

Although it might seem counterintuitive, there is novelty to be found in consistency. By choosing something to commit to and staying there for a long, long time, we can find new challenges and experience new highs that just aren’t possible during the exploration phase when we’re switching from place to place, hobby to hobby, and person to person.

So, like the corny mf I am, I leave you all with a quote from author Lydia Davis:

“I had reached a juncture in my reading life that is familiar to those who have been there: in the allotted time left to me on earth, should I read more and more new books, or should I cease with that vain consumption—vain because it is endless—and begin to reread those books that had given me the intensest pleasure in my past.”