Copyright (C) 2010, 2011, 2014 EPITA Research and Development
Laboratory (LRDE).

This file is part of Olena.

Olena is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation, version 2 of the License.

Olena is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with Olena.  If not, see <http://www.gnu.org/licenses/>.

The complete GNU General Public License Notice can also be found in
the 'COPYING' file in the root directory.

----------------------------------------------------------------------


=======================
 Introduction to Scribo
=======================

Scribo aims to provide tools for Document Image Analysis (DIA).


========
 Content
========

demo/

  viewer/
        GUI for analyzing a document and saving it as PDF or HTML.

  xml2doc/
  	Command line tool to convert a document image and its
	content analysis to PDF or HTML.

scribo/
     The C++ headers of Scribo.

src/
     Various small tools related to DIA. See src/README.


============
Dependencies
============

To use all the features provided by Scribo, the following
dependencies should be installed:

- Qt 4.x
- xsltproc
- fop
- GraphicsMagick++ or ImageMagick++
- Tesseract 2.x, or 3.01 or later (strongly recommended)

Warning: Tesseract 2.x is still supported; however, you may encounter
crashes caused by Tesseract bugs triggered by specific locales. Moreover,
the quality of the results is far below what can be expected from version
3.x. We therefore strongly advise using Tesseract 3.x if possible.
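
The version constraints stated in this README (2.x is usable, 3.00 is
not, 3.01 and later are) can be summarized by a small check. The sketch
below is only an illustration: ``tesseract_version_supported`` is a
hypothetical helper, not a function provided by Scribo or Tesseract, and
it assumes the version is available as a "MAJOR.MINOR" string.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Scribo or Tesseract): tells whether a
// Tesseract version string such as "3.01" falls within the versions this
// README describes as usable.
bool tesseract_version_supported(const std::string& version)
{
  int major = 0;
  int minor = 0;
  // Expect "MAJOR.MINOR", optionally followed by more text ("3.02.02").
  if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2)
    return false;                // Not a recognizable version string.
  if (major == 2)
    return true;                 // Supported, but 3.x is advised.
  if (major == 3)
    return minor >= 1;           // 3.00 is excluded; 3.01 and later pass.
  return false;                  // Versions this README does not cover.
}
```

Versions outside the 2.x and 3.x series are rejected here only because
this README says nothing about them.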


============
Known Issues
============

- Tesseract 2.x may cause crashes with specific system locales. This is
  fixed in Tesseract 3.x.

- The Tesseract API changed between 3.00 and 3.01, introducing
  incompatibilities. We chose to be compatible with the latest version.
  Hence, Scribo will not compile with Tesseract 3.00.

- Apple's GCC compiler (llvm-{gcc,g++} 4.2) provided with Mac OS X
  Lion (10.7.0) has some trouble with strict aliasing in our code. If
  you encounter crashes or strange behavior, try compiling with the
  -fno-strict-aliasing flag.

- On Mac OS X Lion (10.7.0), prefer GraphicsMagick++ over
  ImageMagick++. The latter may cause crashes.
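
For readers unfamiliar with the strict aliasing rule mentioned above,
the sketch below shows the general kind of code it affects. It is a
generic illustration, not an excerpt from Scribo's sources, and
``float_bits`` is a hypothetical name chosen for the example.

```cpp
#include <cstring>

// Compile-time check (C++98 style): the array size is negative, and the
// build fails, if float and unsigned int differ in size.
typedef char float_and_uint_have_same_size
  [sizeof(float) == sizeof(unsigned int) ? 1 : -1];

// Returns the bit pattern of a float as an unsigned integer.
//
// The tempting shortcut
//
//   return *reinterpret_cast<unsigned int*>(&f);
//
// reads a float object through an unsigned int lvalue, which the strict
// aliasing rule forbids; an optimizing compiler may then generate
// unexpected code unless -fno-strict-aliasing is given.  Copying the
// bytes, as done below, is well defined with or without that flag.
unsigned int float_bits(float f)
{
  unsigned int u = 0;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
```

The flag disables optimizations that rely on the aliasing rule, which is
why it can work around crashes of this sort.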


============
Bibliography
============

Further information about SCRIBO and document image analysis can be
found in the following papers.

   * `Efficient Multiscale Sauvola's Binarization`.  Guillaume
     Lazzara, Thierry Géraud.  In the International Journal on
     Document Analysis and Recognition (IJDAR), volume 17, number 2,
     pages 105-123, Springer Berlin Heidelberg, June 2014.

   * `Planting, Growing and Pruning Trees: Connected Filters Applied
     to Document Image Analysis`.  Guillaume Lazzara, Thierry Géraud,
     Roland Levillain.  In the proceedings of the 11th IAPR
     International Workshop on Document Analysis Systems (DAS),
     Tours, France, April 7 - 10, 2014.
     http://das2014.sciencesconf.org/

   * `The SCRIBO Module of the Olena Platform: a Free Software
     Framework for Document Image Analysis`.  Guillaume Lazzara,
     Roland Levillain, Thierry Géraud, Yann Jacquelet, Julien
     Marquegnies, Arthur Crépin-Leblond.  In the proceedings of the
     11th International Conference on Document Analysis and
     Recognition (ICDAR), Beijing, China, September 18 - 21, 2011.
     http://www.icdar2011.org/


You can download these papers and related material from
<http://publications.lrde.epita.fr>.



.. Local Variables:
.. mode: rst
.. End:
