This app's function will be to:

* install its own tables (#todo: not yet automated)
* read SE xml dump into DjangoDB (automated)
* populate osqa database (automated)
* remove SE tables (#todo: not done yet)

Current process to load SE data into OSQA is:
==============================================

1) backup database

2) unzip the SE dump into dump_dir (any directory name)
   you may want to make sure that your dump directory is listed in the
   .gitignore file so that you don't publish it by mistake

3) add 'stackexchange' to the list of installed apps (probably already in settings.py)

4) (optional) create models.py for SE, which is included anyway; run:

    #a) run in-place removal of xml namespace prefix to make parsing easier
    perl -pi -w -e 's/xs://g' $SE_DUMP_PATH/xsd/*.xsd
    cd stackexchange
    python parse_models.py $SE_DUMP_PATH/xsd/*.xsd > models.py

5) Install stackexchange models (as well as any other missing models)
    python manage.py syncdb

6) make sure that osqa badges are installed
   if not, run (example for mysql):

   mysql -u user -p dbname < sql_scripts/badges.sql

7) load SE data:

    python manage.py load_stackexchange dump_dir

    if anything goes wrong, run 'python manage.py flush' and repeat
    steps 6 and 7
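
For step 4a, the same in-place prefix removal can be sketched in Python,
in case perl is not at hand. This is only an illustration: the function
names are made up for this example and are not part of the stackexchange
app. It performs the same blunt substitution as the perl one-liner, so it
removes every occurrence of 'xs:' in the files, not only tag prefixes.

```python
import glob
import os


def strip_xs_prefix(data):
    """Remove every b'xs:' occurrence, like the perl expression s/xs://g."""
    return data.replace(b"xs:", b"")


def strip_xs_prefix_in_place(xsd_dir):
    """Rewrite each *.xsd file in xsd_dir; return the paths that changed."""
    changed = []
    for path in sorted(glob.glob(os.path.join(xsd_dir, "*.xsd"))):
        # Work on raw bytes so the file encoding does not matter.
        with open(path, "rb") as f:
            original = f.read()
        stripped = strip_xs_prefix(original)
        if stripped != original:
            with open(path, "wb") as f:
                f.write(stripped)
            changed.append(path)
    return changed
```

Calling strip_xs_prefix_in_place on the xsd directory of the dump would
stand in for the perl command; parse_models.py is then run as in step 4.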

NOTES:
============

Here is the load script that I used for testing.
It assumes that the SE dump has been unzipped inside the tmp directory.

    #!/bin/sh
    #delete all data
    python manage.py flush
    mysql -u osqa -p osqa < sql_scripts/badges.sql
    python manage.py load_stackexchange tmp
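
The same three steps could also be driven from Python, which makes the
dump directory a parameter and stops at the first failing command. This
is a rough sketch, not a script shipped with the app: reload_commands and
run_reload are hypothetical names, and the defaults simply mirror the
values used in the script above.

```python
import subprocess


def reload_commands(dump_dir, db_user="osqa", db_name="osqa",
                    badges_sql="sql_scripts/badges.sql"):
    """Return the load script steps as (argv, stdin_path) pairs."""
    return [
        # delete all data
        (["python", "manage.py", "flush"], None),
        # reinstall the osqa badges; the SQL file is fed to mysql on stdin
        (["mysql", "-u", db_user, "-p", db_name], badges_sql),
        # load the SE dump
        (["python", "manage.py", "load_stackexchange", dump_dir], None),
    ]


def run_reload(dump_dir, **kwargs):
    """Run each step in order; check_call raises on the first failure."""
    for argv, stdin_path in reload_commands(dump_dir, **kwargs):
        if stdin_path is None:
            subprocess.check_call(argv)
        else:
            with open(stdin_path, "rb") as sql:
                subprocess.check_call(argv, stdin=sql)
```

Running run_reload("tmp") from the project root would correspond to the
shell script above.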

Untested parts are tagged with comments starting with '#todo:'

The test set did not cover all StackExchange use cases, so
the loader may break with other data sets.

The job takes some time to run, especially for
content revisions and votes; this could be optimized.

Some of the fringe cases are described in file stackexchange/ANOMALIES