Compiling and installing tesseract-ocr 3.05 on CentOS


Download Leptonica

The official Tesseract documentation says:

You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.

Download page: http://www.leptonica.com/download.html. I downloaded leptonica-1.75.3.

Build and install:

tar zxvf leptonica-1.75.3.tar.gz
cd leptonica-1.75.3
./configure
make && make install
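
Since Tesseract needs the Leptonica development headers, it can help to confirm they were installed before moving on. The following is a minimal sketch, assuming the default /usr/local prefix (what ./configure uses when no --prefix is given); the function name leptonica_headers_installed is my own and purely illustrative.

```shell
# Sketch: report whether Leptonica's main header exists under a given prefix.
# The default prefix /usr/local is an assumption based on running ./configure
# without a --prefix option.
leptonica_headers_installed() {
    prefix="${1:-/usr/local}"
    [ -f "$prefix/include/leptonica/allheaders.h" ]
}

if leptonica_headers_installed /usr/local; then
    echo "Leptonica headers found"
else
    echo "Leptonica headers missing under /usr/local"
fi
```

If Leptonica was configured with a different prefix, pass that prefix as the first argument instead.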

 

Build and install Tesseract

wget https://github.com/tesseract-ocr/tesseract/archive/tesseract-3.05.01.tar.gz
tar zxvf tesseract-3.05.01.tar.gz
cd tesseract-3.05.01
./autogen.sh

Problem 1

Error:
[root@iZwz9bpg2u1r39ml9st8qzZ tesseract-master]# ./autogen.sh 
Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
./autogen.sh: line 59: bail_out: command not found
Running aclocal
./autogen.sh: line 82: aclocal: command not found
Something went wrong, bailing out!
Fix: yum install automake -y

 

Problem 2

Error:
Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
./autogen.sh: line 59: bail_out: command not found
Running aclocal
Running 
./autogen.sh: line 87: -f: command not found
Something went wrong, bailing out!
Fix: yum install libtool -y
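
Both errors above come from build tools that are not on PATH. A quick way to catch this before running autogen.sh is to check for the tools in one pass. This is only a sketch: the helper name missing_tools is made up, and the tool list is taken from the commands that appear in the autogen.sh logs in this post.

```shell
# Sketch: print the name of every listed command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools that autogen.sh invokes according to the logs in this post.
missing_tools aclocal autoheader automake autoconf libtoolize
```

Any name it prints is missing. On CentOS, the autoconf, automake and libtool packages provide these commands, so yum install -y autoconf automake libtool should cover all of them.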

Problem 3

Error:
Leptonica 1.74 or higher is required. Try to install libleptonica-dev package
Fix:
Set the environment variables so that the Leptonica installation under /usr/local can be found:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
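
To confirm that the variables took effect, you can ask pkg-config which Leptonica version it sees and compare it with the required 1.74. A rough sketch follows; version_ge is a hypothetical helper that relies on the -V (version sort) option of GNU sort, and lept is the pkg-config module name that Leptonica installs (lept.pc).

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Ask pkg-config for the Leptonica version; empty if it cannot be found.
found="$(pkg-config --modversion lept 2>/dev/null || true)"
if [ -n "$found" ] && version_ge "$found" 1.74; then
    echo "Leptonica $found satisfies the 1.74 requirement"
else
    echo "Leptonica 1.74+ not visible to pkg-config (found: ${found:-none})"
fi
```

If the second message appears, check that PKG_CONFIG_PATH points at the directory that contains lept.pc.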

When autogen.sh prints output like the following, the checks have passed:

Running aclocal
Running /usr/bin/libtoolize
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
Running autoheader
Running automake --add-missing --copy
unittest/Makefile.am:63: variable `EXTRA_apiexample_test_DEPENDENCIES' is defined but no program or
unittest/Makefile.am:63: library has `EXTRA_apiexample_test' as canonical name (possible typo)
Running autoconf

All done.
To build the software now, do something like:

$ ./configure [--enable-debug] [...other options]

Install:

./configure
make && make install
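
The libraries end up in /usr/local/lib, which the dynamic linker does not necessarily search by default. Exporting LD_LIBRARY_PATH only helps the current shell session. A more permanent option is to list the directory in a linker config file and run ldconfig. The sketch below only handles the first half, and does so idempotently; register_lib_dir and the config file name in the usage note are hypothetical choices of mine.

```shell
# Sketch: append a library directory to a linker config file, but only if
# that exact line is not already present (safe to run repeatedly).
register_lib_dir() {
    dir="$1"
    conf="$2"
    grep -qxF "$dir" "$conf" 2>/dev/null || printf '%s\n' "$dir" >> "$conf"
}
```

As root, it could be used like this: register_lib_dir /usr/local/lib /etc/ld.so.conf.d/usr-local.conf && ldconfig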

After installation, download the language data files from https://github.com/tesseract-ocr/tessdata

I only downloaded Chinese and English, as shown below:

[root@iZwz9bpg2u1r39ml9st8qzZ tessdata]# pwd
/usr/local/share/tessdata
[root@iZwz9bpg2u1r39ml9st8qzZ tessdata]# ls
chi_sim.traineddata  chi_tra.traineddata  configs eng.traineddata pdf.ttf tessconfigs
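
Fetching the files can be scripted. The sketch below only prints the wget commands (a dry run) so they can be reviewed first. It rests on assumptions that are not from the steps above: that the 3.04.00 tag of the tessdata repository is the one matching Tesseract 3.05, and that files are fetched through GitHub's raw URL pattern. tessdata_url is a made-up helper name.

```shell
# Sketch: print the raw-download URL for one language file.
# The default tag 3.04.00 is an assumption about which tessdata release
# matches Tesseract 3.05; pass a different tag as the second argument.
tessdata_url() {
    lang="$1"
    tag="${2:-3.04.00}"
    printf 'https://github.com/tesseract-ocr/tessdata/raw/%s/%s.traineddata\n' "$tag" "$lang"
}

# Dry run: print one wget command per language used in this post.
for code in eng chi_sim chi_tra; do
    echo "wget -P /usr/local/share/tessdata $(tessdata_url "$code")"
done
```

Remove the echo (and the surrounding quotes) to actually download into /usr/local/share/tessdata.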

 

Now test it:

[root@iZwz9bpg2u1r39ml9st8qzZ ~]# tesseract 
Usage:
 tesseract --help | --help-psm | --help-oem | --version
 tesseract --list-langs [--tessdata-dir PATH]
 tesseract --print-parameters [options...] [configfile...]
 tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
 --tessdata-dir PATH Specify the location of tessdata path.
 --user-words PATH Specify the location of user words file.
 --user-patterns PATH Specify the location of user patterns file.
 -l LANG[+LANG] Specify language(s) used for OCR.
 -c VAR=VALUE Set value for config variables.
 Multiple -c arguments are allowed.
 --psm NUM Specify page segmentation mode.
 --oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
 0 Orientation and script detection (OSD) only.
 1 Automatic page segmentation with OSD.
 2 Automatic page segmentation, but no OSD, or OCR.
 3 Fully automatic page segmentation, but no OSD. (Default)
 4 Assume a single column of text of variable sizes.
 5 Assume a single uniform block of vertically aligned text.
 6 Assume a single uniform block of text.
 7 Treat the image as a single text line.
 8 Treat the image as a single word.
 9 Treat the image as a single word in a circle.
 10 Treat the image as a single character.
 11 Sparse text. Find as much text as possible in no particular order.
 12 Sparse text with OSD.
 13 Raw line. Treat the image as a single text line,
 bypassing hacks that are Tesseract-specific.
OCR Engine modes:
 0 Original Tesseract only.
 1 Cube only.
 2 Tesseract + cube.
 3 Default, based on what is available.

Single options:
 -h, --help Show this help message.
 --help-psm Show page segmentation modes.
 --help-oem Show OCR Engine modes.
 -v, --version Show version information.
 --list-langs List available languages for tesseract engine.
 --print-parameters Print tesseract parameters to stdout.
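
Putting the options from the help text together, a typical recognition run combines an input image, an output base name, -l and --psm. Here is a small sketch that assembles such a command line and prints it instead of running it. ocr_command, sample.png and result are hypothetical names used only for illustration.

```shell
# Sketch: assemble a tesseract command line from the options listed in the
# help output (image, output base, -l LANG[+LANG], --psm NUM).
ocr_command() {
    image="$1"
    outbase="$2"
    langs="${3:-eng}"
    psm="${4:-3}"
    printf 'tesseract %s %s -l %s --psm %s\n' "$image" "$outbase" "$langs" "$psm"
}

# sample.png is a hypothetical input image; "result" is the output base name.
ocr_command sample.png result chi_sim+eng 6
# prints: tesseract sample.png result -l chi_sim+eng --psm 6
```

Running the printed command would OCR the image with both Simplified Chinese and English data, treating it as a single uniform block of text (mode 6 in the list above).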
