Debugging & Tracing Elixir Applications
This post describes a few useful techniques to debug running Erlang / Elixir applications. It is meant to be a cheatsheet of things you can do to inspect and alter the state of an application without requiring the installation of packages.
For any of the code snippets below, autoverse
refers to my local hostname, which itself refers
to one of my favourite science fiction novels, “Permutation City” by Greg Egan.
Connecting to a Running Application
Start the mix application:
iex --sname example --cookie test -S mix
Get a Remote shell
For a regular Elixir application started with a name and a cookie, you can get a remote shell using:
# Where cookie has to be the exact same as the one for the application you wish to connect to
# The name can be any valid name
iex --sname tracer --cookie test --remsh example@autoverse
Distillery Release Debugging
This section is specific to Distillery releases. You may find my previous guide about distillery releases useful at this point.
For an application named example
which is released with:
mix release --env=prod
You can start the release with:
./_build/prod/rel/example foreground
You can connect to the node you just started using:
./_build/prod/rel/example remote_console
The boot script (in our case ./_build/prod/rel/example
) is created from boot.eex)
supports a variety of options supplied as env vars. A very useful one
for debugging is DEBUG_BOOT
. You can set it to true
to enable verbose
output of the commands being executed by the script.
DEBUG_BOOT=true ./_build/prod/rel/example remote_console
Application Debugging
There are plenty of tools to inspect the state of a running application in the Erlang / Elixir ecosystem. Observer is arguably the most popular ones and is part of OTP. Unfortunately in some of cases it’s not trivial to run it in production in your stack (imagine Docker (Linux) in production and macOS locally).
IEx.Helpers
If you arent’t already familiar with the IEx.Helpers
functions, take some time and do it, it’s worth it.
Read IEx.Helpers.break/2 on how to set up breakpoints (not for production 💣).
Remember that you can print help info for a module, function or macro using IEx.Helpers.h/1.
h Genserver
#=> GenServer
#=>
#=> A behaviour module for implementing the server of a client-server relation.
#=>
#=> A GenServer is a process like any other Elixir process and it can be used to
#=> keep state, execute code asynchronously and so on. The advantage of using a
#=> ..more
# Get type information using IEx.Helpers.i/1
i "Hello!"
#=> Term
#=> "Hello"
#=> Data type
#=> BitString
#=> ..more
OTP Behaviour Tracing
You can use functions of the :sys module to trace OTP behaviours or your own special processes.
I strongly suggest reading Chapter 5 of the amazing Designing for Scalability with Erlang/OTP book for a detailed exploration of the debugging facilities of OTP behaviours.
You can use :sys.get_state/1
to get the state of a process:
{:ok, pid} = Agent.start fn -> [:erlang.unique_integer] end
:sys.get_state pid
#=> [-576460752303423368]
You can replace the state of a running process using :sys.replace_state/2
:
{:ok, pid} = Agent.start fn -> [:erlang.unique_integer] end
:sys.replace_state pid, fn (state) -> [:erlang.unique_integer([:positive]) | state] end
#=> [154, -576460752303423368]
You can log events using :sys.log/2
:
{:ok, pid} = Agent.start fn -> [4242] end
# Enable logging and raise the limit of events to be logged to 100
:sys.log(pid, {true, 100})
Agent.get(pid, &(&1))
:sys.log pid, :get
#=> {:ok,
#=> [{{:in,
#=> {:"$gen_call", {#PID<0.73.0>, #Reference<0.0.8.303>},
#=> {:get, #Function<6.54118792/1 in :erl_eval.expr/5>}}}, #PID<0.102.0>,
#=> #Function<0.40920150/3 in :gen_server.decode_msg/8>},
#=> {{:out, [4242], #PID<0.73.0>, [4242]}, #PID<0.102.0>,
#=> #Function<6.40920150/3 in :gen_server.reply/5>}]}
:sys.log pid, :print
#=> *DBG* <0.102.0> got call {get,#Fun<erl_eval.6.54118792>} from [4242]
#=> *DBG* <0.102.0> sent foo to [4242], new state foo
:sys.get_status pid
#=> {:status, #PID<0.102.0>, {:module, :gen_server},
#=> [["$ancestors": [#PID<0.73.0>, #PID<0.48.0>],
#=> "$initial_call": {:erl_eval, :"-expr/5-fun-3-", 0}], :running, #PID<0.102.0>,
#=> [log: {10,
#=> [{{:out, [4242], #PID<0.73.0>, [4242]}, #PID<0.102.0>,
#=> #Function<6.40920150/3 in :gen_server.reply/5>},
#=> {{:in,
#=> {:"$gen_call", {#PID<0.73.0>, #Reference<0.0.8.303>},
#=> {:get, #Function<6.54118792/1 in :erl_eval.expr/5>}}}, #PID<0.102.0>,
#=> #Function<0.40920150/3 in :gen_server.decode_msg/8>}]}],
#=> [header: 'Status for generic server <0.102.0>',
#=> data: [{'Status', :running}, {'Parent', #PID<0.102.0>},
#=> {'Logged events',
#=> {10,
#=> [{{:out, [4242], #PID<0.73.0>, [4242]}, #PID<0.102.0>,
#=> #Function<6.40920150/3 in :gen_server.reply/5>},
#=> {{:in,
#=> {:"$gen_call", {#PID<0.73.0>, #Reference<0.0.8.303>},
#=> {:get, #Function<6.54118792/1 in :erl_eval.expr/5>}}}, #PID<0.102.0>,
#=> #Function<0.40920150/3 in :gen_server.decode_msg/8>}]}}],
#=> data: [{'State', [4242]}]]]}
# Start tracing a process, with trace info printed in the standard output
:sys.trace pid, true
send pid, :foo
*DBG* <0.73.0> got foo
*DBG* <0.73.0> new state foo
You can get statistics for a process using :sys.statistics/2
{:ok, pid} = Agent.start fn -> [] end
# Enable statistics gathering for the process
:sys.statistics pid, true
send pid, :something
# Fetch statistics for the process
:sys.statistics pid, :get
#=> {:ok,
#=> [start_time: {{2017, 9, 13}, {0, 46, 42}},
#=> current_time: {{2017, 9, 13}, {0, 49, 47}}, reductions: 285, messages_in: 1,
#=> messages_out: 0]}
⚠️ Keep in mind that the above sys functions are implemented as synchronous calls. See here for the implementation.
This means that the process in question may not respond right away.
For that reason, such functions, support passing a timeout, to specify the amount of time to wait for a response, as the last argument.
⛑ Remember to turn debugging off when you’re done, using :sys.no_debug(pid)
.
Process Tracing
Using :erlang.trace/2:
Be warned that this way of tracing can affect system performance and has to be exercised with extreme caution. The production-safe way is to use recon.
# Trace messages for an existing process
{:ok, pid} = Agent.start fn -> :some_state end
:erlang.trace pid, true, [:receive]
# Send a message after 3 seconds
Process.send_after pid, :something, 3_000
flush
# No messages yet
# After 3 seconds
flush
# {:trace, #PID<0.85.0>, :receive, :foo}
# Trace messages of processes to be created
:erlang.trace :new, true, [:receive]
⛑ Remember to turn tracing off when you’re done!
Turn it off using: :erlang.trace(tracee, false, trace_flag)
.
The dbg module
This module provides many convenient functions for debugging (with mostly cryptic names).
The first step is to start a tracer:
:dbg.tracer
# Specify the Module / Fuction / Arity you wish to trace
:dbg.tp(List, :first, 1, [])
# Start tracing calls from the current process
:dbg.p(self(), :c)
# Or you can start tracing calls from all processes
:dbg.p(:all, :call)
# You can also log the timestamp of the call
:dbg.p(:all, [:call, :timestamp])
# Or you can trace processes which haven't been spawned yet
:dbg.p(:new, [:call, :timestamp])
List.first([1,2,3])
#=> (<0.73.0>) call 'Elixir.List':first([1,2,3])
#=> 1
# Trace all the functions of the module
:dbg.tpl List, :_, []
List.last [1,2,3]
#=> (<0.73.0>) call 'Elixir.List':'__info__'(macros)
#=> (<0.73.0>) call 'Elixir.List':last([1,2,3])
#=> (<0.73.0>) call 'Elixir.List':last([2,3])
#=> (<0.73.0>) call 'Elixir.List':last([3])
#=> (<0.73.0>) call 'Elixir.List':wrap([yellow])
# Get the return values of a traced function
:dbg.tpl :random, :uniform, [{:_, [], [{:return_trace}]}]
:random.uniform
#=> (<0.73.0>) call random:uniform()
#=> (<0.73.0>) returned from random:uniform/0 -> 0.4435846174457203
# You can also trace messages
# Create a process to trace
parent = self()
pid = spawn_link fn ->
receive do
:stop -> "Byeee"
after
10_000 -> send parent, :hello_from_tracee
end
end
# Start tracing messages
:dbg.p pid, [:m, :timestamp]
send pid, :hello
#=> (<0.118.0>) << hello (Timestamp: {1505,560745,949919})
#=> (<0.118.0>) <0.140.0> ! hello_from_traceee (Timestamp: {1505,561026,448836})
# When you're done, stop tracing
:dbg.stop_clear
If you’re curious to understand how your IEx session works, you can try:
:dbg.tracer
:dbg.p self(), [:all]
# Evaluate a simple expression
2 = 1 + 1
#=> (<0.73.0>) << {eval,<0.48.0>,<<"2 = 1 + 1\n">>,
#=> #{'__struct__' => 'Elixir.IEx.State',
#=> cache => [],
#=> counter => 11,
#=> prefix => <<"iex">>}} (Timestamp: {1505,560410,139339})
#=> (<0.73.0>) in {'Elixir.IEx.Evaluator',loop,3} (Timestamp: {1505,560410,139348})
#=> (<0.73.0>) code_server ! {code_call,<0.73.0>,
#=> {ensure_loaded,'Elixir.List.Chars.List'}} (Timestamp: {1505,
#=> 560410,
#=> 139361})
#=> ..more
The 17th chapter of Erlang Programming, features lots of pragmatic
examples of erlang:trace
and dbg
functions and an overview of match specifications. I highly recommend it.
Determining the Health of an Application
# Get the uptime of the node
:c.uptime
# Get a list of running processes and show the first 400
Process.list |> IO.inspect(limit: 400)
If it looks like length(Process.list)
is growing indefinitely, there might be a process-leak.
A more efficient way to get just the number of running processes is :erlang.system_info(:process_count)
.
The time for calculating the length of a list is proportional to the length of the list
Memory
List processes by the heap memory they reserve in descending order:
Process.list
|> Enum.map(&{&1, Process.info(&1, [:total_heap_size])})
|> Enum.sort(fn {_, k1}, {_, k2} -> k1[:total_heap_size] > k2[:total_heap_size] end)
You can peek at memory stats using :erlang.memory/0
:erlang.memory
#=> [total: 75799504, processes: 34587896, processes_used: 34572944,
#=> system: 41211608, atom: 793529, atom_used: 780108, binary: 4931384,
#=> code: 19859074, ets: 2599216]
ETS
The main design objectives ETS had was to provide a way to store large amounts of data in Erlang with constant access time (functional data structures usually tend to flirt with logarithmic access time) and to have such storage look as if it were implemented as processes in order to keep their use simple and idiomatic. – http://learnyousomeerlang.com/ets
Get the number of ETS tables.
length(:ets.all)
Get stats for all tables
:ets.i
#=> id name type size mem owner
#=> ----------------------------------------------------------------------------
#=> 1 code set 344 25890 code_server
#=> 4098 code_names set 56 10041 code_server
#=> ac_tab ac_tab set 37 2022 application_controller
#=> disk_log_names disk_log_names set 1 313 disk_log_server
#=> disk_log_pids disk_log_pids set 1 312 disk_log_server
#=> elixir_config elixir_config set 10 431 <0.52.0>
:ets.info(:some_table)
#=> [read_concurrency: false, write_concurrency: false, compressed: false,
#=> memory: 121, owner: #PID<0.389.0>, heir: :none, name: :timer_tab, size: 1,
#=> node: :example@autoverse, named_table: true, type: :ordered_set, keypos: 1,
#=> protection: :protected]
💡 The size
info can be used to get the number of objects in the table
without querying the table.
Order tables by size:
:ets.all
|> Enum.map(&{&1, :ets.info(&1, :memory) * :erlang:system_info(wordsize)})
|> Enum.sort(fn {_, s1}, {_, s2} -> s1 > s2 end)
Application Configuration
You can change the configuration of an application without restarting it.
# Sample Phoenix application.
# From a remote console, we'll update the :secret_key_base setting on-the-fly.
# We fetch the current config
current_config = Application.get_env :example, Example.Endpoint
# Make some changes
new_config = put_in(current_config[:secret_key_base], "something-different")
# Apply them
Application.put_env :example, Example.Endpoint, new_config
Distributed Applications
You are advised to read this documentation page about distributed Erlang if you are not familiar with the terminology used for Erlang distributions.
Start an IEx session with a short name and a cookie:
iex --sname example --cookie test
Getting Debug Information for all Nodes
Displays system information, listing information about all processes across nodes.
:c.ni
It will have output like the following:
Pid Initial Call Heap Reds Msgs Registered Current Function Stack
<0.0.0> otp_ring0:start/2 1598 3418 0 init init:loop/1 2
<0.1.0> erts_code_purger:start/0 6772 49562 0 erts_code_purger erts_code_purger:loop/0 3
<0.4.0> erlang:apply/2 10958 8516146 0 erl_prim_loader erl_prim_loader:loop/3 5
To inspect remote processes you should use :rpc.pinfo/1 instead of Process.info/1 (or :erlang.process_info/1) since the latter works only for local processes.
init = :rpc.call Process, :whereis, [:init]
Process.info init
#=> ** (ArgumentError) argument error
#=> :erlang.process_info(#PID<26268.0.0>)
:rpc.pinfo init
#=> [registered_name: :init, current_function: {:init, :loop, 1},
#=> initial_call: {:otp_ring0, :start, 2}, status: :waiting, message_queue_len: 0,
#=> messages: [], links: [#PID<26268.6.0>, #PID<26268.7.0>, #PID<26268.4.0>],
#=> dictionary: [], trap_exit: true, error_handler: :error_handler,
#=> priority: :normal, group_leader: #PID<26268.0.0>, total_heap_size: 1220,
#=> heap_size: 610, stack_size: 2, reductions: 3660,
#=> garbage_collection: [max_heap_size: %{error_logger: true, kill: true, size: 0},
#=> min_bin_vheap_size: 46422, min_heap_size: 233, fullsweep_after: 65535,
#=> minor_gcs: 4], suspending: []]
Remember that you can get the node for a pid
using node/1
:
remote_pid = pid "26268.0.0"
node remote_pid
#=> :example_remote_node@somewhere
EPMD
The Erlang Port Mapper Deamon is the most common way to map nodes to ports to distribute your application (..but you can do without it).
You can query an epmd instance for nodes using:
epmd -names
# => epmd: up and running on port 4369 with data:
# => name example at port 53669
For a distillery release with the option include_erts: true
, you can query epmd like:
# Where "8.3.5" should be changed to your version of erts
./_build/prod/rel/erts-8.3.5/bin/epmd -names
Determining Node Connectivity
Having started 2 nodes like:
iex --sname node1 --cookie test
iex --sname node2 --cookie test
You should be able to ping each other using:
Node.ping :"node2@autoverse"
#=> pong # It works
#=> pang # Oh snap, it doesn't work
# You can get a list of the registered nodes using:
:net_adm.names
#=> {:ok, [{'node2', 57066}, {'node1', 57075}]}
Starting a slave node
From within the IEx session:
# :slave.start :autoverse, :example_slave, '-setcookie test'
:slave.start :autoverse, :example_slave, '-setcookie test -loader inet'
The slave node will be :example_slave@autoverse
, and will be included in the list of :visible
nodes.
Node.list
#=> [:example_slave@autoverse]
By default a node loads code from the file system, but can also load code from the network.
The -loader
option determines the loader to be used and is documented as follows:
-loader Loader
Specifies the name of the loader used by erl_prim_loader.
Loader can be efile (use the local file system) or inet (load using the boot_server on another Erlang node).
If flag -loader is omitted, it defaults to efile.
We can start a node using:
iex --sname example --cookie test -S mix
Start the erl_boot_loader
application from within that node:
:erl_boot_server.start {127,0,0,1}
Then start a slave node using the code server we just created:
{:ok, slave} = :slave.start 'autoverse', :slave, '-loader inet -hosts 127.0.0.1 -setcookie 1234 -id slave'
For further reading about code loading on remote nodes, I highly recommend reading this blogpost, and the documentation of the modules which handle remote code loading erl_boot_server and erl_prim_loader.
Distributing a Module
When you define a module in an IEx session the 3rd element of the
returned tuple of defmodule
is the object code for that module.
You can use code.load_binary/1 to load it in another node:
{_, _, object_code, _} = defmodule Something do
def say, do: "Hello from Something"
end
:rpc.call slave, :code, :load_binary, [Something, 'somefile', object_code]
:rpc.call slave, Something, :say, []
#=> "Hello from Something"
At any point you can use :c.m/0 to list all the loaded modules for a node and their source file.
:rpc.call slave, :c, :m, []
#=> Module File
#=> Elixir.Agent /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Agent.beam
#=> Elixir.Agent.Server /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Agent.Server.beam
#=> Elixir.Code /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Code.beam
#=> Elixir.GenServer /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.GenServer.beam
#=> Elixir.Inspect.Algebra /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Inspect.Algebra.beam
#=> Elixir.Inspect.Atom /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Inspect.Atom.beam
#=> Elixir.Inspect.Opts /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Inspect.Opts.beam
#=> Elixir.Kernel /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Kernel.beam
#=> Elixir.Keyword /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Keyword.beam
#=> Elixir.Process /home/zorbash/.asdf/installs/elixir/1.4.5/bin/../lib/elixir/ebin/Elixir.Process.beam
Libraries & Tools
The Erlang ecosystem has plenty of tools which can be used for debugging / tracing / profiling, some notable references are:
A future blog post might cover some of them, as well as LTTng / DTrace probes.
Finally, I advise you to watch this ElixirConf 2017 talk by Gabi Zuniga.
Happy tracing!