Thursday, June 20, 2013

How to compile libxml2 for lxml (python) - A Guide

Introduction

In this short guide I will explain how to build libxml2 for use with Python's lxml under Linux (Debian in my case). I had to do this because I wanted to run the Springer Downloader. I'm kind of a beginner with Linux, and especially with compiling things there, so I will write down the problems I had - maybe they will be helpful to someone. Comments on how to improve what I did are welcome, of course.


Downloading what we need


lxml and libxml


When you go to the lxml website you will find that under Linux you can download the source of lxml and the two libraries it depends on, as stated here. Quote:
libxml2 2.6.21 or later. It can be found here: http://xmlsoft.org/downloads.html
libxslt 1.1.15 or later. It can be found here: http://xmlsoft.org/XSLT/downloads.html
-------------------------
  1. So first of all download lxml itself (in my case this was lxml 3.2.1.tgz) and unpack it.

    EDIT: You do not need steps 2 & 3. I found a nicer way - thanks to tovotu for pointing it out, which made me try again.
  2. Follow the link above to the FTP server and download libxml2-2.9.0.tar.gz. Be sure that you have the .tar.gz file. I had to use 2.9.0 even though 2.9.1 was already out; with the newer one I got errors because the older version of the library was requested - no idea why. Unpack it.
  3. Then, also on the FTP, download libxslt-1.1.28.tar.gz (or newer) and unpack it.

gcc, make, python-dev

Now you should make sure you have a few things installed via apt-get. Open your console and make sure you can use the sudo command. Then (for Debian-based systems like Ubuntu) type in:

  1. sudo apt-get install python2.7
  2. sudo apt-get install make
  3. sudo apt-get install gcc
  4. sudo apt-get install python-dev
  5. sudo apt-get install libxml2
  6. sudo apt-get install libxml2-dev
  7. sudo apt-get install libxslt1.1

    You might have some of those packages already installed.
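
If you want to double-check that the build tools are really there before compiling, a small Python script like the following could help. It is only a sketch: the function names on_path, check_build_tools and has_python_headers are made up for this example, and it assumes that python-dev puts Python.h into the include directory reported by sysconfig.

```python
import os
import sysconfig


def on_path(name):
    """Return True if an executable called `name` is found in a PATH directory."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False


def check_build_tools(tools=("gcc", "make")):
    """Map each tool name to True/False depending on whether it is on PATH."""
    return dict((tool, on_path(tool)) for tool in tools)


def has_python_headers():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    for tool, found in sorted(check_build_tools().items()):
        print("%s: %s" % (tool, "found" if found else "MISSING"))
    print("Python.h: %s" % ("found" if has_python_headers() else "MISSING"))
```

Run it with the same python you will later use for setup.py, because the header check only looks at the include directory of the interpreter that runs the script.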

Compiling

Now we have to compile the two libraries we downloaded before.

EDIT: As I found out, you do not need steps 1 & 2 if you installed libxml2-dev. Go to step 3 (lxml 3.2.1.tgz).
  • Go to the folder where you unpacked libxml2-2.9.0.tar.gz and open a console there - you can most likely do this via right click somewhere in the folder, otherwise change your directory via 'cd'.
    In the console enter the following (yes, I know this can be done in one line):
sudo ./configure
sudo make
sudo make install
 This might produce some errors, but most likely it will work.
 
  • Go to the folder where you unpacked libxslt-1.1.28.tar.gz and open a console there.
    Now type in the same commands as above. Do not do this step before the first one or it will not work, because libxslt builds against libxml2!

  • Go to the folder where you unpacked lxml 3.2.1.tgz and open a console there. Type in:
    sudo python setup.py install
    This will install the Python library into Python's dist-packages directory.
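
To check that the installation worked, you can ask lxml which library versions it uses. This is just a small sketch; the helper name lxml_versions is made up for this example, while the version constants are attributes of lxml.etree.

```python
def lxml_versions():
    """Return the versions lxml reports, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError as exc:
        # A missing or mismatched libxml2 also shows up here as an ImportError.
        print("lxml import failed: %s" % exc)
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (loaded at runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (loaded at runtime)": etree.LIBXSLT_VERSION,
        "libxslt (compiled against)": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is not None:
        for name in sorted(versions):
            print("%s: %s" % (name, ".".join(str(part) for part in versions[name])))
```

If the "loaded at runtime" and "compiled against" versions of libxml2 differ, lxml was built against a different library than the one it finds when it is imported.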

Fixing an error

Now you can try to run, for example, the Springer Downloader. At least for me it failed with this error:
ImportError: /usr/lib/i386-linux-gnu/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/dist-packages/lxml/etree.so)
This error occurs because libxml2.so.2.9.0 was installed to /usr/local/lib/libxml2.so.2.9.0, while the libxml2.so.2 that gets loaded is the one in /usr/lib/i386-linux-gnu.
You can see this by typing
sudo updatedb
sudo locate libxml2.so 
I don't know why this is the case. For me there was a /usr/lib/i386-linux-gnu/libxml2.so.2.8.0, probably because this was installed via the Debian package(?).
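
The same search can be sketched in Python, which is handy if locate is not installed. The function name find_libxml2 is made up for this example, and the two default directories are simply the ones from my system above; yours may differ.

```python
import glob
import os


def find_libxml2(dirs=("/usr/local/lib", "/usr/lib/i386-linux-gnu")):
    """List every libxml2.so* file found directly in the given directories."""
    found = []
    for directory in dirs:
        # Directories that do not exist simply yield no matches.
        found.extend(sorted(glob.glob(os.path.join(directory, "libxml2.so*"))))
    return found


if __name__ == "__main__":
    for path in find_libxml2():
        # Print the symlink target too, so you can see which version
        # a name like libxml2.so.2 really points to.
        if os.path.islink(path):
            print("%s -> %s" % (path, os.readlink(path)))
        else:
            print(path)
```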

So we have to copy the files to the correct location:
sudo cp /usr/local/lib/libxml2.so.2.9.0 /usr/lib/i386-linux-gnu/libxml2.so.2.9.0 
sudo cp /usr/local/lib/libxml2.so.2 /usr/lib/i386-linux-gnu/libxml2.so.2
 

End

Well, that is it. If you need additional packages - for example pyPdf and cssselect for the Springer Downloader I mentioned - download them and install them like above via 'sudo python setup.py install'.
 




2 comments:

  1. I hope you're aware that you can also simply install these packages via apt-get as python-lxml and so on. However, I don't quite understand why you bother compiling all this stuff from scratch, which is in most cases a bad idea. Especially "make install" as root is a bad idea on a system with a powerful package manager.

  2. You are correct and I edited my post to reflect the changes.
    I did not install libxml2-dev, so compiling lxml failed - that's why I compiled the dev package manually (which in retrospect is obviously unneeded).
    For lxml you still need python-dev and gcc to run the setup.py.

    greetings
