Here's how I managed to get tesseract working in a FloydHub notebook.
First, we open a terminal in a FloydHub instance and type:
root@floydhub:/floyd/home# sudo apt install tesseract-ocr
root@floydhub:/floyd/home# sudo apt install libtesseract-dev
If this doesn't work (as indeed it didn't in my case), we'll get:
E: Unable to locate package tesseract-ocr
So let's fix that. First, let's find out which Ubuntu release we're running. We'll go back to the home directory with cd ~ and open the apt sources list:
root@floydhub# sudo vi /etc/apt/sources.list
The first line of that file will read something like: "deb http://archive.ubuntu.com/ubuntu xenial main restricted".
Here, 'xenial' (or whatever appears in that position) is the codename of the Ubuntu release running on FloydHub.
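If you'd rather not read the file by eye, the same lookup can be sketched in Python. This is only a rough sketch: the helper names release_from_sources_line and first_release are my own, and it assumes the classic one-line "deb URI suite components" format shown above.

```python
def release_from_sources_line(line):
    """Return the release codename from a one-line 'deb' entry, or None."""
    fields = line.split()
    # Skip blank lines, comments, and anything that is not a deb/deb-src entry.
    if not fields or fields[0] not in ("deb", "deb-src"):
        return None
    rest = fields[1:]
    # Drop an optional "[arch=... ...]" options field, which may span several tokens.
    if rest and rest[0].startswith("["):
        while rest and not rest[0].endswith("]"):
            rest = rest[1:]
        rest = rest[1:]
    # What remains should be: URI, suite, components...
    if len(rest) < 2:
        return None
    # A suite such as "xenial-updates" still belongs to the "xenial" release.
    return rest[1].split("-")[0]


def first_release(path="/etc/apt/sources.list"):
    """Return the codename from the first usable entry in the file, or None."""
    with open(path) as handle:
        for line in handle:
            release = release_from_sources_line(line)
            if release:
                return release
    return None
```

Calling first_release() on the instance should give back the same codename you would read off the first line of the file.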
We'll go to https://github.com/tesseract-ocr/tesseract/wiki and look for the relevant link for that Ubuntu release, found under Ubuntu - PPA / packages from notesalexp.org.
Following that link, we'll find the command that adds the right tesseract repository. In my case (i.e., xenial), it was:
root@floydhub# sudo add-apt-repository ppa:alex-p/tesseract-ocr
Then we update with:
root@floydhub# sudo apt-get update
Now we can finally go back to the beginning and install tesseract with:
root@floydhub# sudo apt install tesseract-ocr
root@floydhub# sudo apt install libtesseract-dev
This will allow us to run tesseract commands in the terminal. To interact with tesseract from inside the notebook, we'll need to also install pytesseract:
root@floydhub# pip3 install pytesseract
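Before moving on, it can be worth confirming from Python that the tesseract binary is actually on the PATH, since pytesseract is only a wrapper around it. A minimal sketch using the standard library (find_tesseract is a name I made up for illustration):

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; the apt install step probably failed")
else:
    print("tesseract found at", path)
```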
Now we're ready for some code (see https://pypi.org/project/pytesseract/ for documentation):
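from PIL import Image
import pytesseract  # wrapper around the tesseract binary installed above
im = Image.open('my_image.png')
print(pytesseract.image_to_string(im, lang='eng')) # recognized text as a single string
print(pytesseract.image_to_boxes(im, lang='eng')) # one line per character with its bounding box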
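The bounding box output comes back as plain text, one character per line in Tesseract's box format: the character, then left, bottom, right, top and a page number, with coordinates measured from the bottom-left corner of the image. Here is a rough sketch of turning that into something easier to work with. The Box tuple and parse_boxes are my own names, and the sample string is made up for illustration, not real OCR output.

```python
from collections import namedtuple

# One recognized character and its bounding box (origin at the bottom-left of the image).
Box = namedtuple("Box", "char left bottom right top page")


def parse_boxes(text):
    """Parse box-format text into a list of Box tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        # Ignore anything that does not have exactly six fields.
        if len(parts) != 6:
            continue
        char, *numbers = parts
        boxes.append(Box(char, *map(int, numbers)))
    return boxes


# Made-up sample in the same layout as the image_to_boxes output.
sample = "H 10 20 30 40 0\ni 32 20 38 40 0\n"
for box in parse_boxes(sample):
    print(box.char, box.right - box.left, box.top - box.bottom)
```

With a real image, you would pass pytesseract.image_to_boxes(im, lang='eng') to parse_boxes instead of the sample string.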
I hope this helps others.