{% meta %}
title: reclustering ejabberd
tags: [ejabberd, jabber]
{% endmeta %}
{% mark body %}
{% filter markdown %}
The Jabber service has been somewhat unstable these days, for two reasons:

+ Our storage is causing us serious trouble. We use iSCSI, and the server
that exports the volumes reboots at regular intervals for no apparent reason.
This results in inconsistent or read-only filesystems and leaves almost all
services in a half-working state. We are working on a solution.
+ The clustering of the ejabberd nodes apparently never worked properly, so
it seems we had a mutually dependent setup rather than a redundant one. I
have collected below what you have to do to get a really proper fail-over.
These are sensible settings, but of course we do not guarantee that it works
this way :) For documentation purposes, the following part is in English.
Assume we have a two-node setup (vm-jabber{0,1}) with a broken replication
scheme, and that we start over by purging vm-jabber1 completely. Since
ejabberd 2.1.x there is a convenient way to remove a database node from a
cluster.
On our master server (vm-jabber0): make sure ejabberd.cfg loads
mod_admin_extra:

    {modules,
     [
      [...]
      {mod_admin_extra, []},
      [...]
     ]}.
After this, restart the ejabberd process and run:

    ejabberdctl remove_node 'ejabberd@vm-jabber1'
In a debug shell (or the web interface) confirm that the node has been purged:

    $ ejabberdctl debug
    Attaching Erlang shell to node ejabberd@vm-jabber0.
    To detach it, press: Ctrl+G, q, Return
    Erlang R14A (erts-5.8) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [kernel-poll:false]
    Eshell V5.8 (abort with ^G)
    (ejabberd@vm-jabber0)1> mnesia:info().
    // SNIP //
    running db nodes = ['ejabberd@vm-jabber0']
    stopped db nodes = []
    master node tables = []
    // SNIP //
    // Hit Ctrl+C twice to leave the debug shell
On the *purged* node, stop ejabberd, remove all database files, and get a
fresh copy of ejabberd.cfg from the master. We also need the master's Erlang
cookie so the nodes can authenticate with each other. (Erlang refuses to use
a cookie file that is readable by anyone but its owner, hence the 400.)

    /etc/init.d/ejabberd stop
    rm -rf /var/lib/ejabberd/*
    scp root@vm-jabber0:/etc/ejabberd/ejabberd.cfg /etc/ejabberd/
    chown root:ejabberd /etc/ejabberd/ejabberd.cfg
    chmod 640 /etc/ejabberd/ejabberd.cfg
    scp root@vm-jabber0:/var/lib/ejabberd/.erlang.cookie /var/lib/ejabberd/
    chown ejabberd:ejabberd /var/lib/ejabberd/.erlang.cookie
    chmod 400 /var/lib/ejabberd/.erlang.cookie
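Before touching the database, you can verify that the cookie works and the
nodes actually see each other. This is just a quick sanity check, assuming
the hostnames resolve; the throwaway node name `test` is arbitrary:

    %% as the ejabberd user: erl -sname test
    (test@vm-jabber1)1> net_adm:ping('ejabberd@vm-jabber0').
    pong
    %% 'pang' would mean the nodes cannot reach or authenticate each other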
Once that is done we have to rebuild the mnesia database, i.e. import the
schema (to disc) and fetch copies of all tables from the master. For this we
start a plain Erlang node rather than ejabberd itself, since ejabberd would
simply create a fresh local database:

    su - ejabberd -c bash
    erl -sname ejabberd@vm-jabber1 -mnesia dir '"/var/lib/ejabberd/"' \
        -mnesia extra_db_nodes "['ejabberd@vm-jabber0']" -s mnesia
    [...]
    (ejabberd@vm-jabber1)1> mnesia:change_table_copy_type(schema, node(), disc_copies).

Submit this, then either hit Ctrl+C twice to exit or inspect the newly
populated database with mnesia:info().
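As an additional check, still in the same shell, you can ask mnesia directly
which nodes hold a disc copy of the schema; after the step above the local
node should be listed (the order of the nodes may differ):

    (ejabberd@vm-jabber1)2> mnesia:table_info(schema, disc_copies).
    ['ejabberd@vm-jabber1','ejabberd@vm-jabber0']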
Now you can fire up the second ejabberd node on vm-jabber1. But there is
still work to do. Ejabberd makes some odd decisions about where to store its
data. Basically we want to keep as much shared data as possible in RAM *and*
on disc, so that the slave node can start ejabberd on its own because it has
a copy of everything on disc. Of course some tables are not required to start
the Jabber server; session or s2s data, for example, can live in RAM only.
The important thing is to eliminate, or at least reduce, the number of
"remote copy" entries, since these can block failover. Some memory-eating
tables like offline_msg can be ignored if there is not enough RAM to begin
with. I found it very handy to use the web_admin module to go through the
replication type of each table; here is a reminder on how to tunnel it
through to your client (we do not forward port 5280 here):

    ssh vm-jabber0 -L 8000:localhost:5280

Then point a browser to http://localhost:8000/admin.
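This assumes the web admin is actually enabled on the master; in a stock
ejabberd 2.x config that is the `web_admin` option of the ejabberd_http
listener on port 5280 (the other listener options shown here may differ in
your setup):

    {listen,
     [
      [...]
      {5280, ejabberd_http, [http_poll, web_admin]}
     ]}.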
First go through the tables on the master and make sure every table has a
sane storage type - you need a disc copy if the node has to start on its own!
Here are the basic rules we implemented (see the sketch below):

+ default to a RAM and disc copy (disc_copies) on both ends
+ if the table is machine dependent, use a RAM copy (ram_copies) on both ends
+ use a disc-only copy (disc_only_copies) for memory-eating tables on the master, and a remote copy on the slave
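Translated into mnesia calls, a sketch of these rules might look like the
following. Run it from an ejabberd debug shell on the master; the table names
are only examples, so check your own table list (mnesia:info()) first:

    %% slave node that should receive the copies
    Slave = 'ejabberd@vm-jabber1'.

    %% Rule 1: shared data -> RAM and disc copy on both nodes.
    %% add_table_copy/3 creates a copy on a node that has none; if a copy
    %% already exists with the wrong type, use change_table_copy_type/3.
    mnesia:add_table_copy(roster, Slave, disc_copies).
    mnesia:add_table_copy(passwd, Slave, disc_copies).

    %% Rule 2: machine-dependent data -> RAM copy on both nodes.
    mnesia:add_table_copy(session, Slave, ram_copies).
    mnesia:add_table_copy(s2s, Slave, ram_copies).

    %% Rule 3: memory eaters -> disc-only copy on the master, remote copy
    %% on the slave (i.e. no local copy there at all).
    mnesia:change_table_copy_type(offline_msg, node(), disc_only_copies).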
That's it. Good Luck :)
{% endfilter %}
{% endmark %}