Apryse API

Class OCROptions

Inheritance: object → OptionsBase → OCROptions
Implements: IDisposable
Inherited Members
OptionsBase.mObjSet
OptionsBase.mDict
OptionsBase.ColorPtToNumber(ColorPt)
OptionsBase.ColorPtFromNumber(double)
OptionsBase.GetArray(string)
OptionsBase.PutNumber(string, double)
OptionsBase.PutBool(string, bool)
OptionsBase.PutText(string, string)
OptionsBase.PutRect(string, Rect)
OptionsBase.PushBackNumber(string, double)
OptionsBase.PushBackBool(string, bool)
OptionsBase.PushBackText(string, string)
OptionsBase.PushBackRect(string, Rect)
OptionsBase.RectFromArray(Obj)
OptionsBase.insertRectCollection(string, RectCollection, int)
OptionsBase.GetInternalObj()
OptionsBase.Dispose()
OptionsBase.Dispose(bool)
OptionsBase.Destroy()
object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()
Namespace: pdftron.PDF
Assembly: PDFTronDotNet.dll
Syntax
public class OCROptions : OptionsBase, IDisposable

Constructors

OCROptions()

Creates a new OCROptions object.

Declaration
public OCROptions()

Methods

AddDPI(int)

Sets the value for DPI in the options object.

Knowing the correct image resolution is important because it lets the OCR engine translate the pixel heights of characters into their corresponding font sizes.

The SDK tries to read the resolution from the input's metadata, but that metadata is occasionally corrupt or missing. This method therefore lets you override the source resolution manually. The value you set supersedes any resolution found in the metadata, whether explicit (as in image metadata) or implicit (as in a PDF).

If the input is a PDF file, the SDK renders each PDF page as a 300 DPI raster image for OCR. If the input is an image, the SDK passes the raster image to OCR at its original resolution. The exception is when you suggest a resolution with this method (for example, 300 DPI): the SDK then tries to use it, subject to the following restrictions:

  • The new image size in pixels (width × height) cannot exceed 75 MP (megapixels).
  • Neither side of the image (width or height) can exceed 65,535 pixels, the 16-bit maximum.

To satisfy these restrictions, the SDK iteratively scales down the DPI until both conditions are met.

Declaration
public OCROptions AddDPI(int dpi)
Parameters
Type Name Description
int dpi

The new value for DPI.

Returns
Type Description
OCROptions

This object, for call chaining.
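The restrictions above can be approximated with a small, self-contained C# sketch. `DpiClamp` is a hypothetical helper, not part of PDFNet. It assumes that "75 MP" means 75,000,000 pixels and that the DPI is lowered in steps of 1, because the SDK's actual step size is not documented here. It only estimates which DPI the SDK might settle on for a page of a given size in inches.

```csharp
using System;

// Hypothetical helper, not part of PDFNet: estimates the DPI the SDK may
// settle on after applying the documented AddDPI() limits.
public static class DpiClamp
{
    // ASSUMPTION: "75 MP" is read as 75,000,000 pixels.
    public const long MaxPixels = 75_000_000;
    // Each side must fit in 16 bits.
    public const long MaxSide = 65_535;

    // Pixel dimensions of a page of the given size (inches) rendered at dpi.
    public static (long Width, long Height) PixelSize(double widthIn, double heightIn, int dpi) =>
        ((long)Math.Round(widthIn * dpi), (long)Math.Round(heightIn * dpi));

    // Lowers the DPI until both limits hold (never below 1).
    // ASSUMPTION: a step of 1 DPI; the SDK's real step size is not documented.
    public static int Clamp(double widthIn, double heightIn, int requestedDpi)
    {
        int dpi = requestedDpi;
        while (dpi > 1)
        {
            var (w, h) = PixelSize(widthIn, heightIn, dpi);
            if (w * h <= MaxPixels && w <= MaxSide && h <= MaxSide)
                break; // both conditions met
            dpi--;
        }
        return dpi;
    }
}
```

Under these assumptions, a 36 × 48 inch page requested at 600 DPI is limited by the 75 MP cap to 208 DPI, while a 1 × 200 inch strip requested at 400 DPI is limited by the 65,535-pixel side cap to 327 DPI.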

AddIgnoreZonesForPage(RectCollection, int)

Adds zones to the IgnoreZones array.

IgnoreZones is an optional list of areas that are excluded from analysis.

How the region coordinates are interpreted depends on:

  • whether the input is a PDF (rather than an image), and
  • whether SetUsePDFPageCoords() has been called with true.

If either condition holds, the region coordinates must be given in PDF points, with the coordinate space origin at the bottom-left.

Otherwise, that is, when the input is a raster image and UsePDFPageCoords is false, the coordinates must be given in pixels, with the coordinate space origin at the top-left.

See also: ProcessPDF(), ImageToPDF(), SetUsePDFPageCoords(), AddTextZonesForPage().

Declaration
public OCROptions AddIgnoreZonesForPage(RectCollection regions, int pageNum)
Parameters
Type Name Description
RectCollection regions

The new zones to add to IgnoreZonesForPage.

int pageNum

The page number the added regions belong to.

Returns
Type Description
OCROptions

This object, for call chaining.
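The coordinate rule above can be restated as a tiny C# function. `ZoneSpace` and `ZoneRules.SpaceFor` are hypothetical names used only for illustration; they are not PDFNet types, and the SDK applies this rule internally.

```csharp
// Hypothetical types for illustration only; not part of PDFNet.
public enum ZoneSpace
{
    ImagePixelsTopLeft,   // pixel units, origin at the top-left of the image
    PdfPointsBottomLeft   // PDF points, origin at the bottom-left of the page
}

public static class ZoneRules
{
    // Zone rectangles use PDF points if the input is a PDF or if
    // SetUsePDFPageCoords(true) was called; otherwise image pixels.
    public static ZoneSpace SpaceFor(bool inputIsPdf, bool usePdfPageCoords) =>
        inputIsPdf || usePdfPageCoords
            ? ZoneSpace.PdfPointsBottomLeft
            : ZoneSpace.ImagePixelsTopLeft;
}
```

Only the combination "raster image input, UsePDFPageCoords false" yields pixel coordinates with a top-left origin.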

AddLang(string)

Adds a language to the Langs array.

The OCR engine recognizes a predefined set of language scripts. Because scripts differ in their glyphs, it is recommended to instruct the engine to use the correct glyph set so that recognized characters map to the appropriate Unicode values. Two engines are currently available; this page refers to them by their executable names, OCRModule and OCRModuleIRIS.

The supported languages and the codes to pass as the parameter are listed below. Pass one language per call to AddLang(); call AddLang() multiple times to add more languages.

Warning: Adding too many language codes at once may cause accented Latin characters to be recognized as the wrong alternatives.

Language                Code
English                 eng
French                  fra
German                  deu
Italian                 ita
Russian                 rus
Spanish                 spa
Chinese (Simplified)    chi_sim
Chinese (Traditional)   chi_tra
Japanese                jpn
Korean                  kor

The languages supported out of the box by each engine are:

Executable name   Supported codes
OCRModule         eng fra deu ita spa rus
OCRModuleIRIS     eng fra deu ita spa rus chi_sim chi_tra jpn kor

Note: OCRModuleIRIS allows a single Asian language to be combined only with English.

Declaration
public OCROptions AddLang(string lang)
Parameters
Type Name Description
string lang

The new language to add to Langs.

Returns
Type Description
OCROptions

This object, for call chaining.
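As a sketch, the language tables and the OCRModuleIRIS note above can be turned into a pre-flight check to run before calling AddLang(). `OcrLanguageCheck` is a hypothetical helper, not a PDFNet API. It reads the IRIS note as "at most one Asian language, combined with nothing other than eng"; the SDK itself may enforce these rules differently or report errors in its own way.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-flight check, not a PDFNet API. Built from the language
// tables on this page; engine names are the executable names used here.
public static class OcrLanguageCheck
{
    static readonly HashSet<string> BothEngines =
        new HashSet<string> { "eng", "fra", "deu", "ita", "spa", "rus" };
    static readonly HashSet<string> Asian =
        new HashSet<string> { "chi_sim", "chi_tra", "jpn", "kor" };

    // Returns null if the combination looks valid, otherwise a reason string.
    public static string Validate(string executable, IEnumerable<string> langs)
    {
        List<string> codes = langs.ToList();
        switch (executable)
        {
            case "OCRModule":
            {
                string bad = codes.FirstOrDefault(c => !BothEngines.Contains(c));
                return bad == null ? null : $"'{bad}' is not supported by OCRModule";
            }
            case "OCRModuleIRIS":
            {
                string bad = codes.FirstOrDefault(c => !BothEngines.Contains(c) && !Asian.Contains(c));
                if (bad != null) return $"'{bad}' is not supported by OCRModuleIRIS";
                int asian = codes.Count(c => Asian.Contains(c));
                if (asian > 1) return "only one Asian language is allowed";
                // ASSUMPTION: an Asian language may be combined with "eng" only.
                if (asian == 1 && codes.Any(c => !Asian.Contains(c) && c != "eng"))
                    return "an Asian language can only be combined with English";
                return null;
            }
            default:
                return $"unknown engine '{executable}'";
        }
    }
}
```

For example, `Validate("OCRModuleIRIS", new[] { "jpn", "eng" })` returns null, while `Validate("OCRModuleIRIS", new[] { "jpn", "fra" })` returns a reason string.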

AddTextZonesForPage(RectCollection, int)

Adds zones to the TextZones array.

TextZones is an optional list of known text areas that is used to improve OCR quality.

See also: AddIgnoreZonesForPage().

Declaration
public OCROptions AddTextZonesForPage(RectCollection regions, int pageNum)
Parameters
Type Name Description
RectCollection regions

The new zones to add to TextZonesForPage.

int pageNum

The page number the added regions belong to.

Returns
Type Description
OCROptions

This object, for call chaining.

GetAutoRotate()

Gets the value of AutoRotate from the options object.

The default value is false. When it is set to true, the image is deskewed before OCR is performed.

Note: This option does not apply to the IRIS OCR module.

Declaration
public bool GetAutoRotate()
Returns
Type Description
bool

The current value for AutoRotate.

GetDPI()

Gets the value of DPI from the options object.

Knowing the correct image resolution is important because it lets the OCR engine translate the pixel heights of characters into their corresponding font sizes.

The SDK tries to read the resolution from the input's metadata, but that metadata is occasionally corrupt or missing. The source resolution can therefore be overridden manually with AddDPI(). The value set there supersedes any resolution found in the metadata, whether explicit (as in image metadata) or implicit (as in a PDF).

If the input is a PDF file, the SDK renders each PDF page as a 300 DPI raster image for OCR. If the input is an image, the SDK passes the raster image to OCR at its original resolution. The exception is when a resolution is suggested with AddDPI() (for example, 300 DPI): the SDK then tries to use it, subject to the following restrictions:

  • The new image size in pixels (width × height) cannot exceed 75 MP (megapixels).
  • Neither side of the image (width or height) can exceed 65,535 pixels, the 16-bit maximum.

To satisfy these restrictions, the SDK iteratively scales down the DPI until both conditions are met.

Declaration
public int GetDPI()
Returns
Type Description
int

The current value for DPI.

GetIgnoreExistingText()

Gets the value of IgnoreExistingText from the options object.

The default value is false, so areas that already contain text are skipped automatically during OCR. Setting it to true causes pre-existing text to be duplicated by the OCR-ed text, both in the PDF document and in the results of GetOCRJsonFromPDF() and GetOCRXmlFromPDF().

Deprecated: use GetIncludeExistingText() instead.

Declaration
public bool GetIgnoreExistingText()
Returns
Type Description
bool

The current value for IgnoreExistingText.

GetIncludeExistingText()

Gets the value of IncludeExistingText from the options object.

The default value is false, so areas that already contain text are skipped automatically during OCR. Setting it to true causes pre-existing text to be duplicated by the OCR-ed text, both in the PDF document and in the results of GetOCRJsonFromPDF() and GetOCRXmlFromPDF().

Declaration
public bool GetIncludeExistingText()
Returns
Type Description
bool

The current value for IncludeExistingText.

GetOCREngine()

Gets the value of OCREngine from the options object.

Valid values include 'default' and 'iris'. The chosen module must be present and correctly licensed.

Declaration
public string GetOCREngine()
Returns
Type Description
string

The current value for OCREngine.

GetUsePDFPageCoords()

Gets the value of UsePDFPageCoords from the options object.

This setting defines the coordinate system, scaling, and units. The SDK and OCRModule use it both when interpreting input zone rectangles (ignore zones or text zones) and when reporting output results. The default value is false, which corresponds to raster image input.

The values have the following meanings:

Value   Origin        +Y axis direction   Unit    Unit size     Expected input
false   top-left      downwards           pixel   1/DPI inch    image
true    bottom-left   upwards             point   1/72 inch     PDF

Note that the OCRModule backend has no knowledge of the SDK's actual input. Unless it is explicitly instructed through SetUsePDFPageCoords(true), it uses the default and reports coordinates as for false. This is important when OCR is invoked through GetOCRJsonFromPDF() or GetOCRXmlFromPDF(), where the coordinates in the JSON or XML results should already be correct without the user having to make additional adjustments.

See also: AddIgnoreZonesForPage(), AddTextZonesForPage(), AddDPI(), GetOCRJsonFromImage(), GetOCRXmlFromImage().

Declaration
public bool GetUsePDFPageCoords()
Returns
Type Description
bool

The current value for UsePDFPageCoords.

SetAutoRotate(bool)

Sets the value of AutoRotate in the options object.

The default value is false. When it is set to true, the image is deskewed before OCR is performed.

Note: This option does not apply to the IRIS OCR module.

Declaration
public OCROptions SetAutoRotate(bool value)
Parameters
Type Name Description
bool value

The new value for AutoRotate.

Returns
Type Description
OCROptions

This object, for call chaining.

SetIgnoreExistingText(bool)

Sets the value of IgnoreExistingText in the options object.

The default value is false, so areas that already contain text are skipped automatically during OCR. Setting it to true causes pre-existing text to be duplicated by the OCR-ed text, both in the PDF document and in the results of GetOCRJsonFromPDF() and GetOCRXmlFromPDF().

Deprecated: use SetIncludeExistingText() instead.

Declaration
public OCROptions SetIgnoreExistingText(bool value)
Parameters
Type Name Description
bool value

The new value for IgnoreExistingText.

Returns
Type Description
OCROptions

This object, for call chaining.

SetIncludeExistingText(bool)

Sets the value of IncludeExistingText in the options object.

The default value is false, so areas that already contain text are skipped automatically during OCR. Setting it to true causes pre-existing text to be duplicated by the OCR-ed text, both in the PDF document and in the results of GetOCRJsonFromPDF() and GetOCRXmlFromPDF().

Declaration
public OCROptions SetIncludeExistingText(bool value)
Parameters
Type Name Description
bool value

The new value for IncludeExistingText.

Returns
Type Description
OCROptions

This object, for call chaining.

SetOCREngine(string)

Sets the value of OCREngine in the options object.

Valid values include 'default' and 'iris'. The chosen module must be present and correctly licensed.

Declaration
public OCROptions SetOCREngine(string value)
Parameters
Type Name Description
string value

The new value for OCREngine.

Returns
Type Description
OCROptions

This object, for call chaining.

SetUsePDFPageCoords(bool)

Sets the value of UsePDFPageCoords in the options object.

This setting defines the coordinate system, scaling, and units. The SDK and OCRModule use it both when interpreting input zone rectangles (ignore zones or text zones) and when reporting output results. The default value is false, which corresponds to raster image input.

The values have the following meanings:

Value   Origin        +Y axis direction   Unit    Unit size     Expected input
false   top-left      downwards           pixel   1/DPI inch    image
true    bottom-left   upwards             point   1/72 inch     PDF

Note that the OCRModule backend has no knowledge of the SDK's actual input. Unless it is explicitly instructed through this method with the value true, it uses the default and reports coordinates as for false. This is important when OCR is invoked through GetOCRJsonFromPDF() or GetOCRXmlFromPDF(), where the coordinates in the JSON or XML results should already be correct without the user having to make additional adjustments.

See also: AddIgnoreZonesForPage(), AddTextZonesForPage(), AddDPI(), GetOCRJsonFromImage(), GetOCRXmlFromImage().

Declaration
public OCROptions SetUsePDFPageCoords(bool value)
Parameters
Type Name Description
bool value

The new value for UsePDFPageCoords.

Returns
Type Description
OCROptions

This object, for call chaining.
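Converting between the two conventions in the table above amounts to flipping the Y axis and changing the unit (1/DPI inch versus 1/72 inch). The helper below is a hypothetical sketch, not a PDFNet API. It converts a rectangle measured in pixels on a page image into PDF points, for example to express a zone in the convention that applies when UsePDFPageCoords is true. It assumes the image covers the whole page with no cropping, rotation, or page-box offset.

```csharp
using System;

// Hypothetical helper, not a PDFNet API. Applies the UsePDFPageCoords table:
//   false: origin top-left, +Y down, unit = 1/DPI inch (pixel)
//   true:  origin bottom-left, +Y up, unit = 1/72 inch (point)
public static class CoordSpaces
{
    // Converts a pixel rectangle (left, top, right, bottom with top < bottom),
    // measured on a page image imageHeightPx pixels tall rendered at dpi,
    // into PDF points (x1, y1, x2, y2) with y1 < y2.
    // ASSUMPTION: the image covers the whole page, with no cropping,
    // rotation, or page-box offset.
    public static (double X1, double Y1, double X2, double Y2) PixelsToPoints(
        double left, double top, double right, double bottom,
        double imageHeightPx, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        double s = 72.0 / dpi; // points per pixel
        // Flip Y: pixel row y lies (imageHeightPx - y) pixels above the bottom edge.
        return (left * s, (imageHeightPx - bottom) * s,
                right * s, (imageHeightPx - top) * s);
    }
}
```

For a US Letter page rendered at 300 DPI (3300 pixels tall), the pixel rectangle (300, 300, 600, 600) maps to (72, 648, 144, 720) in points.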

Implements

IDisposable