Java interface for the Tesseract OCR engine. It does not implement every available JNI method, but it implements enough to be useful.
Classes | |
class | PageIteratorLevel |
Elements of the page hierarchy, used in ResultIterator to provide functions that operate on each level without having to have 5x as many functions. | |
class | PageSegMode |
Run modes of the Tesseract engine. | |
Public Member Functions | |
TessBaseAPI () | |
Constructs an instance of TessBaseAPI. | |
void | clear () |
Frees up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImage or SetRectangle before doing any Recognize or Get* operation. | |
void | end () |
Closes down Tesseract and frees up all memory. End() is equivalent to destructing and reconstructing your TessBaseAPI. | |
Pixa | getRegions () |
ResultIterator | getResultIterator () |
String | getUTF8Text () |
Returns the recognized text as a UTF-8 encoded String. | |
Pixa | getWords () |
boolean | init (String datapath, String language) |
Initializes the Tesseract engine with a specified language model. Returns true on success. | |
int | meanConfidence () |
Returns the mean confidence of text recognition. | |
void | setDebug (boolean enabled) |
Sets debug mode. This controls how much information is displayed in the log during recognition. | |
void | setImage (File file) |
Provides an image for Tesseract to recognize. | |
void | setImage (Bitmap bmp) |
Provides an image for Tesseract to recognize. Does not copy the image buffer. The source image must persist until after Recognize or GetUTF8Text is called. | |
void | setImage (Pix image) |
Provides a Leptonica pix format image for Tesseract to recognize. Clones the pix object. The source image may be destroyed immediately after SetImage is called, but its contents may not be modified. | |
void | setImage (byte[] imagedata, int width, int height, int bpp, int bpl) |
Provides an image for Tesseract to recognize. Copies the image buffer. The source image may be destroyed immediately after SetImage is called. SetImage clears all recognition results, and sets the rectangle to the full image, so it may be followed immediately by a GetUTF8Text, and it will automatically perform recognition. | |
void | setPageSegMode (int mode) |
void | setRectangle (Rect rect) |
Restricts recognition to a sub-rectangle of the image. Call after SetImage. Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image. | |
void | setRectangle (int left, int top, int width, int height) |
Restricts recognition to a sub-rectangle of the image. Call after SetImage. Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image. | |
boolean | setVariable (String var, String value) |
Set the value of an internal "variable" (of either old or new types). Supply the name of the variable and the value as a string, just as you would in a config file. | |
int[] | wordConfidences () |
Returns all word confidences (between 0 and 100) in an array. The number of confidences should correspond to the number of space-delimited words in GetUTF8Text(). | |
Static Public Attributes | |
static final int | AVS_FASTEST = 0 |
Default accuracy versus speed mode. | |
static final int | AVS_MOST_ACCURATE = 100 |
Slowest and most accurate mode. | |
static final String | VAR_ACCURACYVSPEED = "tessedit_accuracyvspeed" |
Accuracy versus speed setting. | |
static final String | VAR_CHAR_BLACKLIST = "tessedit_char_blacklist" |
Blacklist of characters to not recognize. | |
static final String | VAR_CHAR_WHITELIST = "tessedit_char_whitelist" |
Whitelist of characters to recognize. | |
Protected Member Functions | |
void | finalize () throws Throwable |
Called by the GC to clean up the native data that we set up when we construct the object. | |
Static Package Functions | |
[static initializer] | |
Private Member Functions | |
native void | nativeClear () |
native void | nativeConstruct () |
Initializes native data. Must be called on object construction. | |
native void | nativeEnd () |
native void | nativeFinalize () |
Finalizes native data. Must be called on object destruction. | |
native int | nativeGetRegions () |
native int | nativeGetResultIterator () |
native String | nativeGetUTF8Text () |
native int | nativeGetWords () |
native boolean | nativeInit (String datapath, String language) |
native int | nativeMeanConfidence () |
native void | nativeSetDebug (boolean debug) |
native void | nativeSetImageBytes (byte[] imagedata, int width, int height, int bpp, int bpl) |
native void | nativeSetImagePix (int nativePix) |
native void | nativeSetPageSegMode (int mode) |
native void | nativeSetRectangle (int left, int top, int width, int height) |
native boolean | nativeSetVariable (String var, String value) |
native int[] | nativeWordConfidences () |
Static Private Member Functions | |
static native void | nativeClassInit () |
Initializes static native data. Must be called on object load. | |
Private Attributes | |
int | mNativeData |
Used by the native implementation of the class. |
Java interface for the Tesseract OCR engine. It does not implement every available JNI method, but it implements enough to be useful.
Definition at line 63 of file TessBaseAPI.java.
Constructs an instance of TessBaseAPI.
Definition at line 165 of file TessBaseAPI.java.
com.googlecode.tesseract.android.TessBaseAPI.[static initializer] | ( | ) | [static, package] |
Frees up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImage or SetRectangle before doing any Recognize or Get* operation.
Definition at line 239 of file TessBaseAPI.java.
Closes down Tesseract and frees up all memory. End() is equivalent to destructing and reconstructing your TessBaseAPI.
Once End() has been used, none of the other API functions may be used other than Init and anything declared above it in the class definition.
Definition at line 251 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.finalize | ( | ) | throws Throwable [protected] |
Called by the GC to clean up the native data that we set up when we construct the object.
Definition at line 174 of file TessBaseAPI.java.
Returns the recognized text as a UTF-8 encoded String.
Definition at line 405 of file TessBaseAPI.java.
boolean com.googlecode.tesseract.android.TessBaseAPI.init | ( | String | datapath, |
String | language | ||
) |
Initializes the Tesseract engine with a specified language model. Returns true on success.
Instances are now mostly thread-safe and totally independent, but some global parameters remain. Basically it is safe to use multiple TessBaseAPIs in different threads in parallel, UNLESS you use SetVariable on some of the Params in classify and textord. If you do, then the effect will be to change it for all your instances.
The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped. The language is (usually) an ISO 639-3 string; null defaults to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier.
WARNING: On changing languages, all Tesseract parameters are reset back to their default values. (Which may vary between languages.)
If you have a rare need to set a Variable that controls initialization for a second call to Init you should explicitly call End() and then use SetVariable before Init. This is only a very rare use case, since there are very few uses that require any parameters to be set before Init.
datapath | the parent directory of tessdata ending in a forward slash |
language | (optional) an ISO 639-3 string representing the language |
Returns: true on success
Definition at line 216 of file TessBaseAPI.java.
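The datapath rule for init() (parent directory of tessdata, ending in a forward slash) and the null-means-eng default can be captured in a few lines. This is a minimal sketch: the class TessDataPath and its methods are hypothetical helpers, not part of TessBaseAPI, and the file name pattern tessdata/<language>.traineddata is an assumption about the usual Tesseract data layout.

```java
import java.io.File;

/** Hypothetical helper, not part of TessBaseAPI. */
final class TessDataPath {
    private TessDataPath() {}

    /** Appends a forward slash if the path does not already end in one. */
    static String withTrailingSlash(String parentDir) {
        return parentDir.endsWith("/") ? parentDir : parentDir + "/";
    }

    /** Mirrors the documented default: a null language means "eng". */
    static String languageOrDefault(String language) {
        return language == null ? "eng" : language;
    }

    /**
     * Builds the expected location of the trained data file.
     * Assumption: the file is named tessdata/<language>.traineddata.
     */
    static File trainedDataFile(String parentDir, String language) {
        return new File(withTrailingSlash(parentDir) + "tessdata/"
                + languageOrDefault(language) + ".traineddata");
    }
}
```

With such a helper, a call might look like init(TessDataPath.withTrailingSlash(dir), "eng"), and checking trainedDataFile(dir, "eng").exists() beforehand can give a clearer error than a bare false from init().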
Returns the mean confidence of text recognition.
Definition at line 437 of file TessBaseAPI.java.
static native void com.googlecode.tesseract.android.TessBaseAPI.nativeClassInit | ( | ) | [static, private] |
Initializes static native data. Must be called on object load.
native void com.googlecode.tesseract.android.TessBaseAPI.nativeClear | ( | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeConstruct | ( | ) | [private] |
Initializes native data. Must be called on object construction.
native void com.googlecode.tesseract.android.TessBaseAPI.nativeEnd | ( | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeFinalize | ( | ) | [private] |
Finalizes native data. Must be called on object destruction.
native int com.googlecode.tesseract.android.TessBaseAPI.nativeGetRegions | ( | ) | [private] |
native int com.googlecode.tesseract.android.TessBaseAPI.nativeGetResultIterator | ( | ) | [private] |
native String com.googlecode.tesseract.android.TessBaseAPI.nativeGetUTF8Text | ( | ) | [private] |
native int com.googlecode.tesseract.android.TessBaseAPI.nativeGetWords | ( | ) | [private] |
native boolean com.googlecode.tesseract.android.TessBaseAPI.nativeInit | ( | String | datapath, |
String | language | ||
) | [private] |
native int com.googlecode.tesseract.android.TessBaseAPI.nativeMeanConfidence | ( | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeSetDebug | ( | boolean | debug | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeSetImageBytes | ( | byte[] | imagedata, |
int | width, | ||
int | height, | ||
int | bpp, | ||
int | bpl | ||
) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeSetImagePix | ( | int | nativePix | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeSetPageSegMode | ( | int | mode | ) | [private] |
native void com.googlecode.tesseract.android.TessBaseAPI.nativeSetRectangle | ( | int | left, |
int | top, | ||
int | width, | ||
int | height | ||
) | [private] |
native boolean com.googlecode.tesseract.android.TessBaseAPI.nativeSetVariable | ( | String | var, |
String | value | ||
) | [private] |
native int[] com.googlecode.tesseract.android.TessBaseAPI.nativeWordConfidences | ( | ) | [private] |
void com.googlecode.tesseract.android.TessBaseAPI.setDebug | ( | boolean | enabled | ) |
Sets debug mode. This controls how much information is displayed in the log during recognition.
enabled | true to enable debugging mode |
Definition at line 294 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setImage | ( | File | file | ) |
Provides an image for Tesseract to recognize.
file | absolute path to the image file |
Definition at line 334 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setImage | ( | Bitmap | bmp | ) |
Provides an image for Tesseract to recognize. Does not copy the image buffer. The source image must persist until after Recognize or GetUTF8Text is called.
bmp | bitmap representation of the image |
Definition at line 352 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setImage | ( | Pix | image | ) |
Provides a Leptonica pix format image for Tesseract to recognize. Clones the pix object. The source image may be destroyed immediately after SetImage is called, but its contents may not be modified.
image | Leptonica pix representation of the image |
Definition at line 371 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setImage | ( | byte[] | imagedata, |
int | width, | ||
int | height, | ||
int | bpp, | ||
int | bpl | ||
) |
Provides an image for Tesseract to recognize. Copies the image buffer. The source image may be destroyed immediately after SetImage is called. SetImage clears all recognition results, and sets the rectangle to the full image, so it may be followed immediately by a GetUTF8Text, and it will automatically perform recognition.
imagedata | byte representation of the image |
width | image width |
height | image height |
bpp | bytes per pixel |
bpl | bytes per line |
Definition at line 394 of file TessBaseAPI.java.
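The relationship between width, bpp (bytes per pixel) and bpl (bytes per line) in the byte[] overload is easiest to see with a concrete buffer. The sketch below assumes an 8-bit grayscale image whose rows are tightly packed with no padding, so bpp = 1 and bpl = width * bpp; RawImage is a hypothetical holder class, not part of TessBaseAPI.

```java
/** Hypothetical holder for the five setImage(byte[], ...) arguments. */
final class RawImage {
    final byte[] data;
    final int width;
    final int height;
    final int bpp; // bytes per pixel
    final int bpl; // bytes per line

    RawImage(byte[] data, int width, int height, int bpp, int bpl) {
        this.data = data;
        this.width = width;
        this.height = height;
        this.bpp = bpp;
        this.bpl = bpl;
    }

    /**
     * Packs gray values (0..255) row by row.
     * Assumptions: the input array is rectangular and rows carry no padding.
     */
    static RawImage fromGray(int[][] gray) {
        int height = gray.length;
        int width = height == 0 ? 0 : gray[0].length;
        int bpp = 1;
        int bpl = width * bpp;
        byte[] data = new byte[bpl * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Pixel (x, y) starts at byte offset y * bpl + x * bpp.
                data[y * bpl + x * bpp] = (byte) gray[y][x];
            }
        }
        return new RawImage(data, width, height, bpp, bpl);
    }
}
```

The fields of the result map one-to-one onto the parameters of setImage(imagedata, width, height, bpp, bpl). If the rows of a real buffer are padded, bpl would be larger than width * bpp.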
void com.googlecode.tesseract.android.TessBaseAPI.setPageSegMode | ( | int | mode | ) |
Sets the page segmentation mode. This controls how much processing the OCR engine will perform before recognizing text.
mode | the page segmentation mode to set |
Definition at line 283 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setRectangle | ( | Rect | rect | ) |
Restricts recognition to a sub-rectangle of the image. Call after SetImage. Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image.
rect | the bounding rectangle |
Definition at line 306 of file TessBaseAPI.java.
void com.googlecode.tesseract.android.TessBaseAPI.setRectangle | ( | int | left, |
int | top, | ||
int | width, | ||
int | height | ||
) |
Restricts recognition to a sub-rectangle of the image. Call after SetImage. Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image.
left | the left bound |
top | the top bound |
width | the width of the bounding box |
height | the height of the bounding box |
Definition at line 324 of file TessBaseAPI.java.
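The four-int overload takes a width and a height, whereas android.graphics.Rect stores left, top, right and bottom edges, so a conversion is needed when starting from edge coordinates. A minimal sketch of that conversion, with clamping to the image bounds added as a precaution; RegionBounds is a hypothetical helper, not part of TessBaseAPI.

```java
/** Hypothetical helper, not part of TessBaseAPI. */
final class RegionBounds {
    private RegionBounds() {}

    /**
     * Converts edge coordinates to {left, top, width, height},
     * clamped so the region lies inside an imageWidth x imageHeight image.
     */
    static int[] toLeftTopWidthHeight(int left, int top, int right, int bottom,
                                      int imageWidth, int imageHeight) {
        int l = Math.max(0, Math.min(left, imageWidth));
        int t = Math.max(0, Math.min(top, imageHeight));
        int r = Math.max(l, Math.min(right, imageWidth));
        int b = Math.max(t, Math.min(bottom, imageHeight));
        return new int[] {l, t, r - l, b - t};
    }
}
```

The four values of the result can then be passed in order as setRectangle(left, top, width, height), once per region, since each call clears the previous recognition results.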
boolean com.googlecode.tesseract.android.TessBaseAPI.setVariable | ( | String | var, |
String | value | ||
) |
Set the value of an internal "variable" (of either old or new types). Supply the name of the variable and the value as a string, just as you would in a config file.
Example: setVariable(VAR_CHAR_BLACKLIST, "xyz"); to ignore x, y and z. setVariable(VAR_BLN_NUMERICMODE, "1"); to set numeric-only mode.
setVariable() may be used before init(), but settings will revert to defaults on end().
var | name of the variable |
value | value to set |
Definition at line 272 of file TessBaseAPI.java.
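Because setVariable() takes both the name and the value as strings and returns a boolean, several settings can be applied in one pass while collecting the ones that were not accepted. The sketch below is illustrative only: VariableTarget is a stand-in interface that mirrors the setVariable signature so the code runs without Android, OcrVariables is a hypothetical helper, and treating a false return as "variable rejected" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stand-in for the single TessBaseAPI method used below, so the sketch runs without Android. */
interface VariableTarget {
    boolean setVariable(String var, String value);
}

/** Hypothetical helper, not part of TessBaseAPI. */
final class OcrVariables {
    // Values copied from the constants documented on this page.
    static final String VAR_CHAR_WHITELIST = "tessedit_char_whitelist";
    static final String VAR_CHAR_BLACKLIST = "tessedit_char_blacklist";

    private OcrVariables() {}

    /**
     * Applies every name/value pair in the map's iteration order and returns
     * the names for which setVariable reported false.
     */
    static List<String> applyAll(VariableTarget target, Map<String, String> vars) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> entry : vars.entrySet()) {
            if (!target.setVariable(entry.getKey(), entry.getValue())) {
                rejected.add(entry.getKey());
            }
        }
        return rejected;
    }
}
```

With a real instance, a method reference such as api::setVariable should fit the VariableTarget shape, for example to restrict recognition to digits with a VAR_CHAR_WHITELIST value of "0123456789".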
Returns all word confidences (between 0 and 100) in an array. The number of confidences should correspond to the number of space-delimited words in GetUTF8Text().
Definition at line 449 of file TessBaseAPI.java.
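Since the confidences are expected to line up with the space-delimited words of the recognized text, the two can be paired to flag doubtful words. A minimal sketch, assuming words are split on any run of whitespace and pairing only as many entries as both sides provide (the description says the counts "should" correspond, so the sketch does not rely on it); ConfidenceFilter is a hypothetical helper, not part of TessBaseAPI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of TessBaseAPI. */
final class ConfidenceFilter {
    private ConfidenceFilter() {}

    /** Returns the words whose confidence (0 to 100) is below the threshold. */
    static List<String> lowConfidenceWords(String text, int[] confidences, int threshold) {
        String trimmed = text.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        // Pair only the overlapping prefix in case the counts differ.
        int count = Math.min(words.length, confidences.length);
        List<String> doubtful = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (confidences[i] < threshold) {
                doubtful.add(words[i]);
            }
        }
        return doubtful;
    }
}
```

The inputs would come from getUTF8Text() and wordConfidences() on the same recognition result; the threshold is application-specific.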
final int com.googlecode.tesseract.android.TessBaseAPI.AVS_FASTEST = 0 [static] |
Default accuracy versus speed mode.
Definition at line 148 of file TessBaseAPI.java.
final int com.googlecode.tesseract.android.TessBaseAPI.AVS_MOST_ACCURATE = 100 [static] |
Slowest and most accurate mode.
Definition at line 151 of file TessBaseAPI.java.
int com.googlecode.tesseract.android.TessBaseAPI.mNativeData [private] |
Used by the native implementation of the class.
Definition at line 67 of file TessBaseAPI.java.
final String com.googlecode.tesseract.android.TessBaseAPI.VAR_ACCURACYVSPEED = "tessedit_accuracyvspeed" [static] |
Accuracy versus speed setting.
Definition at line 160 of file TessBaseAPI.java.
final String com.googlecode.tesseract.android.TessBaseAPI.VAR_CHAR_BLACKLIST = "tessedit_char_blacklist" [static] |
Blacklist of characters to not recognize.
Definition at line 157 of file TessBaseAPI.java.
final String com.googlecode.tesseract.android.TessBaseAPI.VAR_CHAR_WHITELIST = "tessedit_char_whitelist" [static] |
Whitelist of characters to recognize.
Definition at line 154 of file TessBaseAPI.java.