Class DataExtractionOptions

Inheritance
object
OptionsBase
DataExtractionOptions
Implements
IDisposable
Inherited Members
OptionsBase.mObjSet
OptionsBase.mDict
OptionsBase.ColorPtToNumber(ColorPt)
OptionsBase.ColorPtFromNumber(double)
OptionsBase.GetArray(string)
OptionsBase.PutNumber(string, double)
OptionsBase.PutBool(string, bool)
OptionsBase.PutText(string, string)
OptionsBase.PutRect(string, Rect)
OptionsBase.PushBackNumber(string, double)
OptionsBase.PushBackBool(string, bool)
OptionsBase.PushBackText(string, string)
OptionsBase.PushBackRect(string, Rect)
OptionsBase.RectFromArray(Obj)
OptionsBase.insertRectCollection(string, RectCollection, int)
OptionsBase.GetInternalObj()
OptionsBase.Dispose()
OptionsBase.Dispose(bool)
OptionsBase.Destroy()
object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()
Namespace: pdftron.PDF
Assembly: PDFTronDotNet.dll
Syntax
public class DataExtractionOptions : OptionsBase, IDisposable

Constructors

DataExtractionOptions()

Creates a new options object. Options that are never set keep the defaults described below.

Declaration
public DataExtractionOptions()

Methods

AddExclusionZonesForPage(RectCollection, int)

Adds an exclusion zone to the ExclusionZonesForPage array, an optional list of page areas to exclude from analysis. Provide zones as a collection of Rects paired with a page number; the Rects are applied to that page and are specified in user space coordinates. When exclusion zones are set, the specified areas are not analyzed. If neither this nor InclusionZonesForPage is set, the entire page is analyzed. This option only affects the GenericKeyValue, FormKeyValue, and FormField engines.

Declaration
public DataExtractionOptions AddExclusionZonesForPage(RectCollection regions, int pageNum)
Parameters
Type Name Description
RectCollection regions

List of page areas to be excluded from analysis.

int pageNum

The page number (1-indexed) to which the regions are applied.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

AddInclusionZonesForPage(RectCollection, int)

Adds an inclusion zone to the InclusionZonesForPage array, an optional list of page areas to include in analysis (to the exclusion of all other areas). Provide zones as a collection of Rects paired with a page number; the Rects are applied to that page and are specified in user space coordinates. When inclusion zones are set, only the specified areas are analyzed. If neither this nor ExclusionZonesForPage is set, the entire page is analyzed. This option only affects the GenericKeyValue, FormKeyValue, and FormField engines.

Declaration
public DataExtractionOptions AddInclusionZonesForPage(RectCollection regions, int pageNum)
Parameters
Type Name Description
RectCollection regions

List of page areas to be included in analysis.

int pageNum

The page number (1-indexed) to which the regions are applied.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.
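
As a rough illustration of how zones might be set up, the sketch below builds one options object that restricts analysis to a region of page 1 and another that skips a footer strip on page 2. Only `AddInclusionZonesForPage` and `AddExclusionZonesForPage` come from this page. `RectCollection.AddRect` and the four-argument `Rect` constructor are documented on their own pages and are assumed here, and the `ZoneExample` wrapper and all coordinates (chosen for a 612 x 792 page) are hypothetical. The sketch also assumes the Apryse .NET SDK is referenced and `PDFNet.Initialize` has already been called.

```csharp
using pdftron.PDF;

// Hypothetical wrapper class, used only for illustration.
public static class ZoneExample
{
    // Limits analysis on page 1 to a single region.
    public static DataExtractionOptions BuildInclusionOptions()
    {
        // Region in user space coordinates (illustrative values).
        RectCollection include = new RectCollection();
        include.AddRect(new Rect(36, 400, 576, 756)); // AddRect: assumed, see lead-in

        // Page numbers are 1-indexed.
        return new DataExtractionOptions().AddInclusionZonesForPage(include, 1);
    }

    // Skips a footer strip on page 2 and analyzes the rest of that page.
    public static DataExtractionOptions BuildExclusionOptions()
    {
        RectCollection exclude = new RectCollection();
        exclude.AddRect(new Rect(0, 0, 612, 72)); // illustrative footer area

        return new DataExtractionOptions().AddExclusionZonesForPage(exclude, 2);
    }
}
```

The two cases are kept in separate options objects because this page does not describe how inclusion and exclusion zones interact when both are set.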

GetDeepLearningAssist()

Gets the value DeepLearningAssist from the options object. Specifies if Deep Learning is used with table recognition in the DocStructure engine. The default is false. When true, table recognition accuracy improves at the cost of increased processing time. This only affects the DocStructure engine.

Declaration
public bool GetDeepLearningAssist()
Returns
Type Description
bool

The current value for DeepLearningAssist.

GetDetectEmptyFields()

Gets the value DetectEmptyFields from the options object. Specifies whether empty fields are recognized in the GenericKeyValue engine. The default is true. If you don't need empty fields, setting this option to false can reduce processing time. This setting only affects the GenericKeyValue engine.

Declaration
public bool GetDetectEmptyFields()
Returns
Type Description
bool

The current value for DetectEmptyFields.

GetFormExtractionEngine()

Gets the value FormExtractionEngine from the options object. Specifies the form extraction engine used in DetectAndAddFormFieldsToPDF, either 'Form' or 'FormKeyValue'. The default is 'Form'.

Declaration
public string GetFormExtractionEngine()
Returns
Type Description
string

The current value for FormExtractionEngine.

GetLanguage()

Gets the value Language from the options object. Specifies the OCR language(s). Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.

Declaration
public string GetLanguage()
Returns
Type Description
string

The current value for Language.

GetMinimumConfidenceThreshold()

Gets the value MinimumConfidenceThreshold from the options object. Specifies the minimum confidence threshold for a class to be accepted in the DocClassification engine. The default is 0.25. Classes that don't meet the minimum threshold will not be listed in the output JSON. This setting only affects the DocClassification engine.

Declaration
public double GetMinimumConfidenceThreshold()
Returns
Type Description
double

The current value for MinimumConfidenceThreshold.

GetOverlappingFormFieldBehavior()

Gets the value OverlappingFormFieldBehavior from the options object. Determines which field is kept when a detected form field overlaps an existing one: 'KeepOld' keeps the existing field, and 'KeepNew' (the default) keeps the newly detected field.

Declaration
public string GetOverlappingFormFieldBehavior()
Returns
Type Description
string

The current value for OverlappingFormFieldBehavior.

GetPDFPassword()

Gets the value PDFPassword from the options object. Specifies the password if the PDF requires one. The default is no password.

Declaration
public string GetPDFPassword()
Returns
Type Description
string

The current value for PDFPassword.

GetPages()

Gets the value Pages from the options object. Specifies a range of pages to be converted, such as "1-5". By default all pages are converted. The first page has the page number of 1.

Declaration
public string GetPages()
Returns
Type Description
string

The current value for Pages.

GetTextRecoveryNSE()

Gets the value TextRecoveryNSE from the options object. Specifies whether to use OCR to automatically recover text with a non-standard encoding. The default is true. This only affects the DocStructure engine. Note: Change this option only if you already know the Solid Documents SDK from prior use as a customer, or if support has advised you to.

Declaration
public bool GetTextRecoveryNSE()
Returns
Type Description
bool

The current value for TextRecoveryNSE.

SetDeepLearningAssist(bool)

Sets the value for DeepLearningAssist in the options object. Specifies if Deep Learning is used with table recognition in the DocStructure engine. The default is false. When true, table recognition accuracy improves at the cost of increased processing time. This only affects the DocStructure engine.

Declaration
public DataExtractionOptions SetDeepLearningAssist(bool value)
Parameters
Type Name Description
bool value

The new value for DeepLearningAssist.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetDetectEmptyFields(bool)

Sets the value for DetectEmptyFields in the options object. Specifies whether empty fields are recognized in the GenericKeyValue engine. The default is true. If you don't need empty fields, setting this option to false can reduce processing time. This setting only affects the GenericKeyValue engine.

Declaration
public DataExtractionOptions SetDetectEmptyFields(bool value)
Parameters
Type Name Description
bool value

The new value for DetectEmptyFields.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetFormExtractionEngine(string)

Sets the value for FormExtractionEngine in the options object. Specifies the form extraction engine used in DetectAndAddFormFieldsToPDF, either 'Form' or 'FormKeyValue'. The default is 'Form'.

Declaration
public DataExtractionOptions SetFormExtractionEngine(string value)
Parameters
Type Name Description
string value

The new value for FormExtractionEngine.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetLanguage(string)

Sets the value for Language in the options object. Specifies the OCR language(s). Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.

Declaration
public DataExtractionOptions SetLanguage(string value)
Parameters
Type Name Description
string value

The new value for Language.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.
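
The Language value is a single space-separated string, so callers that hold codes in a list need to join them first. The sketch below is one possible way to do that. `LanguageValue` and its `Join` method are hypothetical names and not part of the SDK, and the check only confirms that each code has the three-lowercase-letter shape of the example above; it does not confirm that a code is a real ISO 639-2 entry or that the OCR engine supports it.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the Apryse SDK.
public static class LanguageValue
{
    // Joins 3-letter codes into the space-separated form expected by
    // SetLanguage, rejecting anything that is not three lowercase ASCII letters.
    public static string Join(params string[] codes)
    {
        foreach (string code in codes)
        {
            bool wellFormed = code != null
                && code.Length == 3
                && code.All(c => c >= 'a' && c <= 'z');
            if (!wellFormed)
            {
                throw new ArgumentException(
                    "Expected a 3-letter lowercase code, got: " + code);
            }
        }
        return string.Join(" ", codes);
    }
}
```

With the SDK referenced, the result could then be passed along as `options.SetLanguage(LanguageValue.Join("eng", "deu"))`.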

SetMinimumConfidenceThreshold(double)

Sets the value for MinimumConfidenceThreshold in the options object. Specifies the minimum confidence threshold for a class to be accepted in the DocClassification engine. The default is 0.25. Classes that don't meet the minimum threshold will not be listed in the output JSON. This setting only affects the DocClassification engine.

Declaration
public DataExtractionOptions SetMinimumConfidenceThreshold(double value)
Parameters
Type Name Description
double value

The new value for MinimumConfidenceThreshold.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetOverlappingFormFieldBehavior(string)

Sets the value for OverlappingFormFieldBehavior in the options object. Determines which field is kept when a detected form field overlaps an existing one: 'KeepOld' keeps the existing field, and 'KeepNew' (the default) keeps the newly detected field.

Declaration
public DataExtractionOptions SetOverlappingFormFieldBehavior(string value)
Parameters
Type Name Description
string value

The new value for OverlappingFormFieldBehavior.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetPDFPassword(string)

Sets the value for PDFPassword in the options object. Specifies the password if the PDF requires one. The default is no password.

Declaration
public DataExtractionOptions SetPDFPassword(string value)
Parameters
Type Name Description
string value

The new value for PDFPassword.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetPages(string)

Sets the value for Pages in the options object. Specifies a range of pages to be converted, such as "1-5". By default all pages are converted. The first page has the page number of 1.

Declaration
public DataExtractionOptions SetPages(string value)
Parameters
Type Name Description
string value

The new value for Pages.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.

SetTextRecoveryNSE(bool)

Sets the value for TextRecoveryNSE in the options object. Specifies whether to use OCR to automatically recover text with a non-standard encoding. The default is true. This only affects the DocStructure engine. Note: Change this option only if you already know the Solid Documents SDK from prior use as a customer, or if support has advised you to.

Declaration
public DataExtractionOptions SetTextRecoveryNSE(bool value)
Parameters
Type Name Description
bool value

The new value for TextRecoveryNSE.

Returns
Type Description
DataExtractionOptions

This object, for call chaining.
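
Because every setter returns the options object itself, several options can be configured in one expression. The following is a minimal sketch of that pattern using only methods listed on this page. The `OptionsExample` wrapper and the chosen values are illustrative assumptions, and the code assumes the Apryse .NET SDK is referenced and `PDFNet.Initialize` has already been called.

```csharp
using pdftron.PDF;

// Hypothetical wrapper class, used only for illustration.
public static class OptionsExample
{
    // Builds options aimed at the DocStructure engine.
    public static DataExtractionOptions BuildDocStructureOptions()
    {
        return new DataExtractionOptions()
            .SetPages("1-5")              // page range; the first page is 1
            .SetLanguage("eng deu")       // space-separated 3-letter codes
            .SetDeepLearningAssist(true); // slower, more accurate table recognition
    }

    // Reads a value back through the matching getter.
    public static string DescribePages(DataExtractionOptions options)
    {
        return "Pages: " + options.GetPages();
    }
}
```

Options that target a different engine, such as SetMinimumConfidenceThreshold for DocClassification, can be chained the same way.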

Implements

IDisposable