From 020701bcba397d590d284962f3ce5df3134aaa08 Mon Sep 17 00:00:00 2001
From: Evgeny Fadeev <evgeny.fadeev@gmail.com>
Date: Tue, 9 Mar 2010 22:05:39 -0500
Subject: SE loader seems to work, details are in stackexchange/README

---
 stackexchange/README | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/stackexchange/README b/stackexchange/README
index b2a39e1c..bad11c9f 100644
--- a/stackexchange/README
+++ b/stackexchange/README
@@ -1,11 +1,12 @@
 this app's function will be to:
 
-* install it's own tables    <--- done 
-* read SE xml dump into DjangoDB  <--- done
-* populate osqa database <-- user accounts and Q&A revisions loaded
-* remove SE tables
+* install its own tables (#todo: not yet automated)
+* read SE xml dump into DjangoDB (automated)
+* populate osqa database (automated)
+* remove SE tables (#todo: not done yet)
 
-Current process to load SE data into OSQA:
+Current process to load SE data into OSQA is:
+==============================================
 
 1) backup database
 
@@ -36,3 +37,26 @@ Current process to load SE data into OSQA:
 
     if anything doesn't go right - run 'python manage.py flush' and repeat
     steps 6 and 7
+
+NOTES:
+============
+
+Here is the load script that I used for testing.
+It assumes that the SE dump has been unzipped inside the tmp directory.
+
+    #!/bin/sh
+    # delete all data
+    python manage.py flush
+    mysql -u osqa -p osqa < sql_scripts/badges.sql
+    python manage.py load_stackexchange tmp
+
+Untested parts are tagged with comments starting with 
+#todo:
+
+The test set did not cover all StackExchange use cases, so
+the loader may break with other data sets.
+
+The job takes some time to run, especially loading
+content revisions and votes; this could be optimized.
+
+Some fringe cases are described in the file stackexchange/ANOMALIES
-- 
cgit v1.2.3-1-g7c22