Monday, April 18, 2011

Wikipedia SQL file dump to MySQL tutorial

The latest Wikipedia SQL dump files can be found here: http://en.wikipedia.org/wiki/Wikipedia:Database_download

I found that the SQL files contain duplicated lines and that the MySQL engine declaration is incorrect (the old TYPE keyword instead of ENGINE). Below are a few commands that might help.

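//replace string: rename the old TYPE= table option to ENGINE= in every .sql file, in place (matching only TYPE= so that words like PROTOTYPE in the data are left alone)

perl -p -i -e 's/\bTYPE=/ENGINE=/g' *.sql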

//remove the first line (the result is printed to stdout, file.txt itself is not changed)
sed '1d' file.txt
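
Because sed only prints the result, the output has to be redirected somewhere to be kept. Here is a small sketch of that, using a made-up four-line file. The same `d` command also accepts a line range, which can help when the unwanted lines are not at the top.

```shell
# Hypothetical sample file (contents are made up for illustration).
trim_dir=$(mktemp -d)
printf '%s\n' 'duplicated header' 'line 2' 'line 3' 'line 4' > "$trim_dir/file.txt"

# Delete line 1 and save the result to a new file.
sed '1d' "$trim_dir/file.txt" > "$trim_dir/first_removed.txt"

# Delete a range instead: lines 2 through 3.
sed '2,3d' "$trim_dir/file.txt" > "$trim_dir/range_removed.txt"

cat "$trim_dir/first_removed.txt"
cat "$trim_dir/range_removed.txt"
```

Writing to a new file rather than overwriting the input keeps the original around in case the wrong lines were removed.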


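//remove duplicate lines with sort, uniq and shell pipes (output is sorted; note that uniq -u would instead drop every line that has a duplicate, keeping no copy of it)
sort file.log | uniq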
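
The difference between the uniq variants is easy to miss, so here is a small comparison on a made-up file. It also shows one possible alternative, an awk one-liner that drops repeated lines without sorting, which may matter when the order of the lines has to be preserved. The file name and contents are just assumptions for the demo.

```shell
# Hypothetical sample file with one repeated line ("b").
dedup_dir=$(mktemp -d)
printf '%s\n' 'b' 'a' 'b' 'c' > "$dedup_dir/file.log"

# Keep one copy of every line (output is sorted).
keep_one=$(sort "$dedup_dir/file.log" | uniq)

# Keep only lines that were never repeated (both copies of "b" are gone).
only_unrepeated=$(sort "$dedup_dir/file.log" | uniq -u)

# Keep the first occurrence of every line, in the original order.
keep_order=$(awk '!seen[$0]++' "$dedup_dir/file.log")

printf 'sort | uniq:\n%s\n' "$keep_one"
printf 'sort | uniq -u:\n%s\n' "$only_unrepeated"
printf 'awk, order kept:\n%s\n' "$keep_order"
```

The awk version holds every distinct line in memory, so on a very large dump the sort-based pipeline may be the more practical choice.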