Convert text to HTML in Python (DOCX, DOC)


You can build the HTML yourself with plain string formatting, or use a library such as BeautifulSoup to construct the document tree and write it out with soup.prettify():


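from bs4 import BeautifulSoup

# Start from an empty document and name the parser explicitly
soup = BeautifulSoup("", "html.parser")

body = soup.new_tag("body")
soup.append(body)

table = soup.new_tag("table")
body.append(table)

# "with" closes the file even if write() raises
with open("path/to/output/file.html", "w", encoding="utf-8") as outfile:
    outfile.write(soup.prettify())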


But if you only have simple text that maps to <p> paragraphs, and you don't need much formatting, lists, or tables, building the tree by hand is not worth the effort.
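For that simple case the standard library is enough. The sketch below is a swapped-in plain-string approach (not BeautifulSoup), assuming paragraphs are separated by blank lines; the function name text_to_html is hypothetical. html.escape makes sure characters such as < and & in the text don't break the markup, and single line breaks inside a paragraph become <br>.

```python
import html
import re


def text_to_html(text: str, title: str = "Document") -> str:
    """Hypothetical helper: wrap blank-line-separated paragraphs in <p> tags."""
    # Split on blank lines (lines containing only whitespace count as blank)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Escape each line so &, <, > and quotes are safe, and join
    # single line breaks inside a paragraph with <br>
    body = "\n".join(
        "<p>" + "<br>\n".join(html.escape(line) for line in p.splitlines()) + "</p>"
        for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond one with <b> & stuff\nand a line break."
    with open("output.html", "w", encoding="utf-8") as outfile:
        outfile.write(text_to_html(sample))
```

The output file name output.html is only an example.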





For DOCX input, textile combined with docx2txt gives a simple DOCX-to-HTML conversion script:


import textile
import docx2txt

# Extract the plain text from the .docx file
text = docx2txt.process("job today.docx")
# Convert the text to HTML; any Textile markup in it is interpreted too
html = textile.textile(text)
print(html)
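If you would rather avoid third-party packages, a .docx file is a ZIP archive whose main text lives in word/document.xml, with paragraphs stored as w:p elements and their text in w:t elements. The sketch below reads those directly with zipfile and ElementTree. It is a swapped-in approach, not what docx2txt or textile do internally; the name docx_to_html is hypothetical, and it keeps only paragraph text (no bold/italic, lists, images, or table structure). It does not handle the legacy binary .doc format.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(path_or_file) -> str:
    """Hypothetical helper: one <p> per Word paragraph, text only."""
    # zipfile accepts either a filesystem path or a binary file object
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    parts = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join all <w:t> pieces
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text.strip():  # skip empty paragraphs
            parts.append(f"<p>{html.escape(text)}</p>")
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


if __name__ == "__main__":
    print(docx_to_html("job today.docx"))
```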
