The following steps will rescue only the text (no formatting, no images) from a damaged docx file.

  • rename the file with a .zip extension
  • repair the archive with zip -FF
  • unzip the repaired archive
  • take word/document.xml from the extracted files
  • you can use this sed one-liner (GNU sed syntax) to get rid of all the markup:
 sed 's/<w:b\/>/\n/g;s/<[^>]\+>//g' document.xml
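
The steps above can be strung together roughly as follows. This is only a sketch, not a tested recovery tool: the function names strip_markup and rescue_docx are made up for illustration, it assumes Info-ZIP zip 3.x (where -FF needs --out to name the repaired archive) and GNU sed, and zip may still ask an interactive question on badly damaged files.

```shell
# Strip all XML markup from a document.xml read on stdin.
# Same sed expression as above (GNU sed: \n in the replacement, \+ in the regex).
strip_markup() {
    sed 's/<w:b\/>/\n/g;s/<[^>]\+>//g'
}

# Hypothetical helper: rescue_docx broken.docx > rescued.txt
# Works on a copy in a temporary directory, so the original file is left untouched.
rescue_docx() {
    src=$1
    work=$(mktemp -d) || return 1

    # "rename as .zip" (done here on a copy)
    cp "$src" "$work/broken.zip" || return 1

    # "rezip with -FF": write a repaired archive; messages go to stderr
    zip -FF "$work/broken.zip" --out "$work/fixed.zip" >&2 || return 1

    # "unzip" and "get word/document.xml" in one go
    unzip -o -q "$work/fixed.zip" word/document.xml -d "$work" >&2 || return 1

    # print the bare text on stdout
    strip_markup < "$work/word/document.xml"
}
```

Usage would be something like rescue_docx broken.docx > rescued.txt (broken.docx being a placeholder name). If zip cannot salvage word/document.xml at all, the function stops there and prints nothing.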

contact : @ychaouche yacinechaouche at yahoocom

