Diffstat (limited to 'blog/2012/03-04-reclustering-ejabberd.html')
-rw-r--r--  blog/2012/03-04-reclustering-ejabberd.html | 114
1 file changed, 114 insertions, 0 deletions
diff --git a/blog/2012/03-04-reclustering-ejabberd.html b/blog/2012/03-04-reclustering-ejabberd.html
new file mode 100644
index 0000000..b6dfadc
--- /dev/null
+++ b/blog/2012/03-04-reclustering-ejabberd.html
@@ -0,0 +1,114 @@
{% meta %}
  title: reclustering ejabberd
  tags: [ejabberd, jabber]
{% endmeta %}

{% mark body %}
{% filter markdown %}

The Jabber service has been somewhat unstable these days, for two reasons:

+ Our storage is giving us serious trouble. We use iSCSI, and the server that
exports the volumes reboots at regular intervals for no apparent reason. This
leaves us with inconsistent or read-only filesystems and puts almost every
service into a half-working state. We are working on a solution.
+ The clustering of the ejabberd nodes apparently never worked properly, so
that we ended up with a setup that was, it seems, not redundant but mutually
dependent. I have collected here what needs to be done to get proper
fail-over. These settings make sense to us, but of course we cannot guarantee
that they will work :) For documentation purposes, the rest of this post is in
English.

Assume we have a two-node setup (vm-jabber{0,1}) with a broken replication
scheme, and that we start over by purging vm-jabber1 completely. Since
ejabberd 2.1.x there is a convenient way to remove a db node from a setup.<br>

On our master server (vm-jabber0), make sure ejabberd.cfg contains the
following entry:

    {modules,
     [
      [...]
      {mod_admin_extra, []},
      [...]
     ]}.

After this, restart the ejabberd process and run:

    ejabberdctl remove_node 'ejabberd@vm-jabber1'

In a debug shell (or the web interface), confirm that the node has been purged:

    $ ejabberdctl debug
    Attaching Erlang shell to node ejabberd@vm-jabber0.
    To detach it, press: Ctrl+G, q, Return

    Erlang R14A (erts-5.8) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [kernel-poll:false]

    Eshell V5.8 (abort with ^G)
    (ejabberd@vm-jabber0)1> mnesia:info().
    // SNIP //
    running db nodes   = ['ejabberd@vm-jabber0']
    stopped db nodes   = []
    master node tables = []
    // SNIP //
    // Hit Ctrl+C twice to leave the debug shell

On the *purged* node, stop ejabberd, remove all database files and fetch a
fresh copy of ejabberd.cfg from the master. We also need the master's cookie
so the nodes can authenticate with each other:

    /etc/init.d/ejabberd stop
    rm -rf /var/lib/ejabberd/*
    scp root@vm-jabber0:/etc/ejabberd/ejabberd.cfg /etc/ejabberd/
    chown root:ejabberd /etc/ejabberd/ejabberd.cfg
    chmod 640 /etc/ejabberd/ejabberd.cfg
    scp root@vm-jabber0:/var/lib/ejabberd/.erlang.cookie /var/lib/ejabberd/
    chown ejabberd:ejabberd /var/lib/ejabberd/.erlang.cookie
    chmod 440 /var/lib/ejabberd/.erlang.cookie

Now we have to rebuild the mnesia database, i.e. import the schema (to disc)
and pull copies of all tables from the master. For this we start a plain
Erlang process rather than ejabberd, since the latter would recreate the
database for a new local setup:

    su - ejabberd -c bash
    erl -sname ejabberd@vm-jabber1 -mnesia dir '"/var/lib/ejabberd/"' \
        -mnesia extra_db_nodes "['ejabberd@vm-jabber0']" -s mnesia
    [...]
    (ejabberd@vm-jabber1)1> mnesia:change_table_copy_type(schema, node(), disc_copies).
    // Submit, then hit Ctrl+C twice to exit, or inspect the newly populated db with mnesia:info().

Now you can fire up the second ejabberd node on vm-jabber1. But there is
still work to do. ejabberd makes some odd choices about where to store its
data. Basically we want to keep as much shared data as possible in RAM *and*
on disc, so that the slave node can start ejabberd on its own because it has
a copy of everything on disc. Of course, some tables are not required to
start the jabber server.
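Instead of clicking through the web admin, the table-by-table replica setup can also be done from the Erlang shell started above. This is a hypothetical sketch - the table names and storage types below are examples only and vary between ejabberd versions, so check the output of mnesia:info() first:

```erlang
%% Sketch only: pull replicas of selected tables to this node.
%% mnesia:add_table_copy(Table, Node, StorageType) creates a local copy
%% of Table with the given type (ram_copies, disc_copies or
%% disc_only_copies). Table names here are illustrative.
[mnesia:add_table_copy(Tab, node(), Type) ||
    {Tab, Type} <- [{passwd,      disc_copies},
                    {roster,      disc_copies},
                    {session,     ram_copies},
                    {offline_msg, disc_only_copies}]].
```

Each call returns {atomic, ok} on success, or {aborted, Reason} e.g. when a copy of that table already exists on the node.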
Session or s2s data, for example, can be stored in RAM only. The important
thing is to eliminate, or at least reduce, the number of "remote copy"
entries, since these can block failover. Some memory-eating tables like
offline_msg can be ignored if there is not enough RAM to begin with. I found
it very handy to use the web_admin module to go through the replication type
of each table; here is a reminder on how to tunnel it through to your client
(we do not forward port 5280 here):

    ssh vm-jabber0 -L 8000:localhost:5280 # then point a browser at <a href="http://localhost:8000/admin">http://localhost:8000/admin</a>

First go through the tables on the master and make sure every table has a
sane type - you need a disc copy if the node has to start on its own!

<br>

Here are the basic rules we implemented:

+ default to RAM and Disc copy on both ends
+ if a table is machine dependent, use RAM copy on both ends
+ use Disc-only copy for memory-eating tables on the master, and Remote copy on the slave

<br>
That's it. Good luck :)

{% endfilter %}
{% endmark %}