Should I always use tail recursion?

Thu, Jul 8, 2021 10 min read
The short answer if we should always use tail recursion is the dreaded … it depends.

While trying to implement my version of map for an Exercism exercise, I’ve come across a doubt that should trouble the Elixir newbie.

Whenever we have recursive functions, if calling the function it’s not the last operation being done, we may get into trouble by reaching the stack trace limit.

Tail call recursion

A special form of recursion where the last operation of a function is a recursive call. The recursion is optimized by executing the call in the current stack frame and returning its result rather than creating a new stack frame.

So if the last thing we call is the recursive function, that doesn’t result in a new entry in the stack. Instead, the calling function will simply return the value it gets.

So … should I always use tail call recursion?

I don’t know; let’s compare it.

First, let’s create an app to test our assumptions.

$ mix new benchmark_recursion

Mapping with Enum.map

Enum.map([1,2,3], &(&1 * 2))

The swiss army knife of mapping listings.

Mapping without tail call recursion

lib/benchmark_recursion.ex

defmodule BenchmarkRecursion do
  def map([head | tail], fun) do
    [fun.(head) | map(tail, fun)]
  end

  def map([], _fun), do: []
end

Pretty straight forward right?

We have a recursive function to go over a list that invokes the function we pass as an argument, just like Enum.map.

Let’s give it a try:

iex -S mix
iex(1)> list = Enum.to_list(1..100)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ...]
iex(2)> double_it = fn n -> n * 2 end
#Function<44.40011524/1 in :erl_eval.expr/5>
iex(3)> BenchmarkRecursion.map(list, double_it)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36,
 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, ...]

Mapping with tail call recursion

Now let’s do the same map function but with tail call optimization.

We need to write a function with an accumulator argument that stores the final mapped list for this to work.

defmodule BenchmarkRecursion do
  def map([], _fun), do: []

  def map_tail_call(list, fun), do: do_map_tail_call(list, fun, [])

  def do_map_tail_call([head | tail], fun, mapped) do
    do_map_tail_call(tail, fun, [fun.(head) | mapped])
  end

  def do_map_tail_call([], _fun, mapped), do: mapped
end

Let’s try it:

$ iex -S mix
iex(6)> BenchmarkRecursion.map_tail_call(list, double_it)
[200, 198, 196, 194, 192, 190, 188, 186, 184, 182, 180, 178, 176, 174,
 172, 170, 168, 166, 164, 162, 160, 158, 156, 154, 152, 150, 148, 146,
 144, 142, 140, 138, 136, 134, 132, 130, 128, 126, 124, 122, 120, 118,
 116, 114, 112, 110, 108, 106, 104, 102, ...]

Hmmm, the list items are reversed, so we need to reverse the list in the end to keep the correct order.

defmodule BenchmarkRecursion do
  def map([head | tail], fun) do
    [fun.(head) | map(tail, fun)]
  end

  def map([], _fun), do: []

  def map_tail_call(list, fun), do: do_map_tail_call(list, fun, [])

  def do_map_tail_call([head | tail], fun, mapped) do
    do_map_tail_call(tail, fun, [fun.(head) | mapped])
  end

  def do_map_tail_call([], _fun, mapped), do: reverse(mapped)

  def reverse(list), do: do_reverse(list, [])

  def do_reverse([head | tail], reversed) do
    do_reverse(tail, [head | reversed])
  end

  def do_reverse([], reversed), do: reversed
end

Let’s give it another go:

iex -S mix
Erlang/OTP 24 [erts-12.0.2] [source] [64-bit] [smp:24:24] [ds:24:24:10] [async-threads:1] [jit]

Compiling 1 file (.ex)
Interactive Elixir (1.12.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> list = Enum.to_list(1..100)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ...]
iex(2)> double_it = fn n -> n * 2 end
#Function<44.40011524/1 in :erl_eval.expr/5>
iex(3)> BenchmarkRecursion.map_tail_call(list, double_it)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36,
 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, ...]

It’s the same as the version without tail call recursion.

Now let’s try each with large lists:

iex -S mix
Erlang/OTP 24 [erts-12.0.2] [source] [64-bit] [smp:24:24] [ds:24:24:10] [async-threads:1] [jit]

Interactive Elixir (1.12.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> biiiiiiiig_list = Enum.to_list(1..1_000_000)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ...]
iex(2)> double_it = fn n -> n * 2 end
#Function<44.40011524/1 in :erl_eval.expr/5>
iex(3)> BenchmarkRecursion.map(list, double_it)
** (CompileError) iex:3: undefined function list/0
    (stdlib 3.15.1) lists.erl:1358: :lists.mapfoldl/3
iex(3)> BenchmarkRecursion.map(bi list, double_it)
biiiiiiiig_list    binary_part/3      binding/0          binding/1
bit_size/1
iex(3)> BenchmarkRecursion.map(bi list, double_it)
biiiiiiiig_list    binary_part/3      binding/0          binding/1
bit_size/1
iex(3)> BenchmarkRecursion.map(biiiiiiiig_list, double_it)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36,
 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, ...]
iex(4)> BenchmarkRecursion.map_tail_call(biiiiiiiig_list, double_it)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36,
 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, ...]

Well, they almost feel to be taking the same time.

Instead of gut feeling or going with the one that seems right after measuring a couple of times, let’s do it right. We should put on our race judge hat and clock and do a race, or in fact, a lot of them.

Benchmark FTW

Let’s add a benchmark library:

Add the following to your mix.exs dependencies

{:benchee, "~> 1.0", only: :dev}

defmodule BenchmarkRecursion.MixProject do
  use Mix.Project

  def project do
    [
      app: :benchmark_recursion,
      version: "0.1.0",
      elixir: "~> 1.12",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  # Run "mix help compile.app" to learn about applications.
  def application do
    [
      extra_applications: [:logger]
    ]
  end

  # Run "mix help deps" to learn about dependencies.
  defp deps do
    [
      {:benchee, "~> 1.0", only: :dev}
    ]
  end
end

Bencharmark small lists

So let’s benchmark when using small lists.

list = Enum.to_list(1..100)
double_it = fn n -> n * 2 end

iex(4)> Benchee.run(
...(4)>   %{
...(4)>     "Enum.map" => fn -> Enum.map(list, double_it) end,
...(4)>     "map" => fn -> BenchmarkRecursion.map(list, double_it) end,
...(4)>     "map_tail_call" => fn -> BenchmarkRecursion.map_tail_call(list, double_it) end
...(4)>
...(4)>   }
...(4)> )
Operating System: Linux
CPU Information: AMD Ryzen 9 3900X 12-Core Processor
Number of Available Cores: 24
Available memory: 62.78 GB
Elixir 1.12.1
Erlang 24.0.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking Enum.map...
Benchmarking map...
Benchmarking map_tail_call...

Name                    ips        average  deviation         median         99th %
map                 37.74 K       26.50 μs    ±16.38%       25.37 μs       41.90 μs
Enum.map            37.41 K       26.73 μs    ±17.86%       25.64 μs       42.18 μs
map_tail_call       35.76 K       27.96 μs    ±22.41%       26.75 μs       36.42 μs

Comparison:
map                 37.74 K
Enum.map            37.41 K - 1.01x slower +0.23 μs
map_tail_call       35.76 K - 1.06x slower +1.46 μs
%Benchee.Suite{
  configuration: %Benchee.Configuration{
    after_each: nil,
    after_scenario: nil,
    assigns: %{},
    before_each: nil,
    before_scenario: nil,
    formatters: [Benchee.Formatters.Console],
    inputs: nil,
    load: false,
    measure_function_call_overhead: true,
    memory_time: 0.0,
    parallel: 1,
    percentiles: '2c',
    pre_check: false,
    print: %{benchmarking: true, configuration: true, fast_warning: true},
    save: false,
    time: 5.0e9,
    title: nil,
    unit_scaling: :best,
    warmup: 2.0e9
  },
  scenarios: [
    %Benchee.Scenario{
      after_each: nil,
      after_scenario: nil,
      before_each: nil,
      before_scenario: nil,
      function: #Function<45.40011524/0 in :erl_eval.expr/5>,
      input: :__no_input,
      input_name: :__no_input,
      job_name: "map",
      memory_usage_data: %Benchee.CollectionData{
        samples: [],
        statistics: %Benchee.Statistics{
          absolute_difference: nil,
          average: nil,
          ips: nil,
          maximum: nil,
          median: nil,
          minimum: nil,
          mode: nil,
          percentiles: nil,
          relative_less: nil,
          relative_more: nil,
          sample_size: 0,
          std_dev: nil,
          std_dev_ips: nil,
          std_dev_ratio: nil
        }
      },
      name: "map",
      run_time_data: %Benchee.CollectionData{
        samples: [58041.0, 28480.0, 31241.0, 2.92e4, 27650.0, 29481.0, 39670.0,
         33780.0, 28721.0, 30370.0, 29670.0, 27591.0, 27880.0, 27840.0, 33651.0,
         31020.0, 29840.0, 27901.0, 28730.0, 27530.0, 27751.0, 27780.0, 28960.0,
         28001.0, 27350.0, 28090.0, 28681.0, 2.77e4, 2.81e4, 27770.0, 28701.0,
         27990.0, 27290.0, 28061.0, 28460.0, ...],
        statistics: %Benchee.Statistics{
          absolute_difference: nil,
          average: 26499.557949556358,
          ips: 37736.47854441819,
          maximum: 973691.0,
          median: 25370.0,
          minimum: 23650.0,
          mode: 25260.0,
          percentiles: %{50 => 25370.0, 99 => 41901.0},
          relative_less: nil,
          relative_more: nil,
          sample_size: 184721,
          std_dev: 4340.122892197869,
          std_dev_ips: 6180.5164717589205,
          std_dev_ratio: 0.16378095440156312
        }
      },
      tag: nil
    },
    %Benchee.Scenario{
      after_each: nil,
      after_scenario: nil,
      before_each: nil,
      before_scenario: nil,
      function: #Function<45.40011524/0 in :erl_eval.expr/5>,
      input: :__no_input,
      input_name: :__no_input,
      job_name: "Enum.map",
      memory_usage_data: %Benchee.CollectionData{
        samples: [],
        statistics: %Benchee.Statistics{
          absolute_difference: nil,
          average: nil,
          ips: nil,
          maximum: nil,
          median: nil,
          minimum: nil,
          mode: nil,
          percentiles: nil,
          relative_less: nil,
          relative_more: nil,
          sample_size: 0,
          std_dev: nil,
          std_dev_ips: nil,
          std_dev_ratio: nil
        }
      },
      name: "Enum.map",
      run_time_data: %Benchee.CollectionData{
        samples: [46661.0, 29090.0, 32020.0, 30051.0, 27940.0, 3.01e4, 40781.0,
         34170.0, 28510.0, 30571.0, 29750.0, 28090.0, 28221.0, 28020.0, 33690.0,
         34701.0, 30170.0, 28411.0, 29370.0, 28180.0, 28130.0, 27961.0, 2.88e4,
         28480.0, 28021.0, 28470.0, 28850.0, 27961.0, 28080.0, 2.82e4, 29131.0,
         28120.0, 27780.0, 28231.0, ...],
        statistics: %Benchee.Statistics{
          absolute_difference: 234.81912119593835,
          average: 26734.377070752296,
          ips: 37405.02340314527,
          maximum: 1076822.0,
          median: 25640.0,
          minimum: 23610.0,
          mode: 25520.0,
          percentiles: %{50 => 25640.0, 99 => 42180.0},
          relative_less: 0.99121658527616,
          relative_more: 1.0088612467288296,
          sample_size: 183146,
          std_dev: 4775.528289032688,
          std_dev_ips: 6681.612477482104,
          std_dev_ratio: 0.17862874741365
        }
      },
      tag: nil
    },
    %Benchee.Scenario{
      after_each: nil,
      after_scenario: nil,
      before_each: nil,
      before_scenario: nil,
      function: #Function<45.40011524/0 in :erl_eval.expr/5>,
      input: :__no_input,
      input_name: :__no_input,
      job_name: "map_tail_call",
      memory_usage_data: %Benchee.CollectionData{
        samples: [],
        statistics: %Benchee.Statistics{
          absolute_difference: nil,
          average: nil,
          ips: nil,
          maximum: nil,
          median: nil,
          minimum: nil,
          mode: nil,
          percentiles: nil,
          relative_less: nil,
          relative_more: nil,
          sample_size: 0,
          std_dev: nil,
          std_dev_ips: nil,
          std_dev_ratio: nil
        }
      },
      name: "map_tail_call",
      run_time_data: %Benchee.CollectionData{
        samples: [48401.0, 31590.0, 32261.0, 30040.0, 2.92e4, 31451.0, 2.82e4,
         30090.0, 28021.0, 30890.0, 34370.0, 32341.0, 3.28e4, 33020.0, 31901.0,
         32850.0, 37320.0, 30601.0, 32440.0, 32041.0, 32570.0, 32910.0, 32811.0,
         32220.0, 33040.0, 32701.0, 3.26e4, 37341.0, 32510.0, 29320.0, 28691.0,
         37550.0, 36940.0, ...],
        statistics: %Benchee.Statistics{
          absolute_difference: 1463.7712998073657,
          average: 27963.329249363724,
          ips: 35761.12096962682,
          maximum: 1745469.0,
          median: 26750.0,
          minimum: 24950.0,
          mode: 26240.0,
          percentiles: %{50 => 26750.0, 99 => 36420.609999999986},
          relative_less: 0.9476538974757209,
          relative_more: 1.0552375742491158,
          sample_size: 175238,
          std_dev: 6267.717277747323,
          std_dev_ips: 8015.518959640409,
          std_dev_ratio: 0.22414059577294212
        }
      },
      tag: nil
    }
  ],
  system: %{
    available_memory: "62.78 GB",
    cpu_speed: "AMD Ryzen 9 3900X 12-Core Processor",
    elixir: "1.12.1",
    erlang: "24.0.2",
    num_cores: 24,
    os: :Linux
  }
}

In the comparison section, we get the details we want:

Comparison:
map                 37.74 K
Enum.map            37.41 K - 1.01x slower +0.23 μs
map_tail_call       35.76 K - 1.06x slower +1.46 μs

Surprise, surprise. The non-optimized version is the fastest.

Well, it makes sense because it is the less complex of them all. It only goes over the list once.

Bencharmark bigger lists

...
iex(1)> list = Enum.to_list(1..1_000)
...
iex(4)> Benchee.run(
...(4)>   %{
...(4)>     "Enum.map" => fn -> Enum.map(list, double_it) end,
...(4)>     "map" => fn -> BenchmarkRecursion.map(list, double_it) end,
...(4)>     "map_tail_call" => fn -> BenchmarkRecursion.map_tail_call(list, double_it) end
...(4)>
...(4)>   }
...(4)> )
....
Comparison:
map                  3.67 K
Enum.map             3.62 K - 1.01x slower +4.02 μs
map_tail_call        3.48 K - 1.05x slower +14.92 μs

Our modest map continues to be the fastest.

Bencharmark huge lists

...
iex(1)> list = Enum.to_list(1..1_000_000)
...
iex(4)> Benchee.run(
...(4)>   %{
...(4)>     "Enum.map" => fn -> Enum.map(list, double_it) end,
...(4)>     "map" => fn -> BenchmarkRecursion.map(list, double_it) end,
...(4)>     "map_tail_call" => fn -> BenchmarkRecursion.map_tail_call(list, double_it) end
...(4)>
...(4)>   }
...(4)> )
...
Comparison:
map_tail_call          1.96
Enum.map               1.73 - 1.13x slower +68.22 ms
map                    1.58 - 1.24x slower +122.09 ms
...

Only with huge lists do we see advantages in using tail call optimization.

So should we always use tail call optimization?

Well, it depends…

For instance, in a long-running function, a callback in a GenServer where we handle many calls or when we use huge lists, YES.

In the other cases, well, no.

If you check the implementations, the first version is a lot more readable than the second. So first, strive for readability and, only if required, reduce readability for the sake of speed.

But only if it’s needed.

This is one of the miths of erlang.

At the end of the day, it’s a decision that should be based on the usage of the system you are building.

For reference: