Diffstat (limited to 'forum/importers/stackexchange/README')
-rw-r--r-- forum/importers/stackexchange/README 62
1 files changed, 62 insertions, 0 deletions
diff --git a/forum/importers/stackexchange/README b/forum/importers/stackexchange/README
new file mode 100644
index 00000000..64d8f5fb
--- /dev/null
+++ b/forum/importers/stackexchange/README
@@ -0,0 +1,62 @@
+This app's function will be to:
+
+* install its own tables (#todo: not yet automated)
+* read the SE XML dump into the Django database (automated)
+* populate the OSQA database (automated)
+* remove the SE tables (#todo: not done yet)
+
+Current process to load SE data into OSQA is:
+==============================================
+
+1) back up the database
+
+2) unzip the SE dump into dump_dir (any directory name)
+ make sure that your dump directory is listed in the .gitignore file
+ so that you don't publish it by mistake
+
+3) enable 'stackexchange' in the list of installed apps (probably already in settings.py)
+
+4) (optional) regenerate models.py for SE (a generated copy is included anyway); run:
+
+ #a) remove the XML namespace prefix in place to make parsing easier
+ perl -pi -w -e 's/xs://g' $SE_DUMP_PATH/xsd/*.xsd
+ cd stackexchange
+ python parse_models.py $SE_DUMP_PATH/xsd/*.xsd > models.py
+
+5) Install stackexchange models (as well as any other missing models)
+ python manage.py syncdb
+
+6) make sure that the OSQA badges are installed
+ if not, run (example for MySQL):
+
+ mysql -u user -p dbname < sql_scripts/badges.sql
+
+7) load SE data:
+
+ python manage.py load_stackexchange dump_dir
+
+ if anything goes wrong, run 'python manage.py flush' and repeat
+ steps 6 and 7
+
+NOTES:
+============
+
+Here is the load script that I used for testing.
+It assumes that the SE dump has been unzipped inside the tmp directory.
+
+ #!/bin/sh
+ #delete all data
+ python manage.py flush
+ mysql -u osqa -p osqa < sql_scripts/badges.sql
+ python manage.py load_stackexchange tmp
+
+Untested parts are tagged with comments starting with
+#todo:
+
+The test set did not cover all StackExchange use cases, so
+the import may break with other data sets.
+
+The job takes some time to run, especially when importing
+content revisions and votes; this could be optimized.
+
+Some of the fringe cases are described in file stackexchange/ANOMALIES
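
Step 2 of the README asks that the dump directory be listed in .gitignore so it is not published by mistake. A minimal sketch of how that check could be scripted is shown below. The helper name `ignore_dump_dir` is hypothetical and not part of the commit, and the sketch assumes it is run from the repository root and that any existing .gitignore ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): add a dump directory
# to .gitignore exactly once, so repeated runs do not duplicate the entry.
ignore_dump_dir() {
    dir="$1"
    # Create .gitignore if it does not exist yet.
    touch .gitignore
    # -q: no output, -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF "$dir/" .gitignore || printf '%s\n' "$dir/" >> .gitignore
}
```

It would be called as `ignore_dump_dir dump_dir` before unzipping the SE dump. Matching the whole line keeps an entry such as `dump_dir_old/` from being mistaken for `dump_dir/`.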