pdftron::PDF::Line Class Reference

#include <TextExtractor.h>

Public Member Functions

int GetNumWords ()
bool IsSimpleLine ()
const double * GetBBox ()
std::vector< double > GetQuad ()
void GetQuad (double out_quad[8])
Word GetFirstWord ()
Word GetWord (int word_idx)
Line GetNextLine ()
int GetCurrentNum ()
Style GetStyle ()
int GetParagraphID ()
int GetFlowID ()
bool EndsWithHyphen ()
bool IsValid ()
bool operator== (const Line &) const
bool operator!= (const Line &) const
 Line ()

Detailed Description

A TextExtractor::Line object represents a line of text on a PDF page. Each line consists of a sequence of words, and each word can appear in one or more styles.

Definition at line 530 of file TextExtractor.h.

Constructor & Destructor Documentation

◆ Line()

pdftron::PDF::Line::Line ( )

Member Function Documentation

◆ EndsWithHyphen()

bool pdftron::PDF::Line::EndsWithHyphen ( )
Returns
true if this line of text ends with a hyphen (i.e. '-'), false otherwise.
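A common use of this flag is to rejoin words that were split across lines. The sketch below works on plain data rather than on SDK objects: `LineText` and `JoinLines` are hypothetical names, and the flag is assumed to hold whatever EndsWithHyphen() reported for that line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder: the text already extracted for one line, plus the
// value that Line::EndsWithHyphen() reported for it.
struct LineText {
    std::string text;
    bool ends_with_hyphen;
};

// Joins lines into one string. A hyphenated line is merged with the next one
// and its trailing '-' is dropped; other lines are separated by a space.
// Assumption: every trailing hyphen is a line-break hyphen. That is not true
// for real compounds such as "well-" / "known", so treat this as a heuristic.
std::string JoinLines(const std::vector<LineText>& lines) {
    std::string out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::string text = lines[i].text;
        const bool last = (i + 1 == lines.size());
        const bool merge = lines[i].ends_with_hyphen && !last &&
                           !text.empty() && text.back() == '-';
        if (merge) {
            text.pop_back();  // drop the hyphen, no separator
            out += text;
        } else {
            out += text;
            if (!last) out += ' ';
        }
    }
    return out;
}
```

A hyphen on the very last line is kept as-is, since there is no following line to merge with.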

◆ GetBBox()

const double * pdftron::PDF::Line::GetBBox ( )
Returns
The bounding box for this line (in unrotated page coordinates).
Note
To account for the effect of page '/Rotate' attribute, transform all points using page.GetDefaultMatrix().
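The note above can be sketched as follows. This is an illustration only: `Affine` is a hypothetical stand-in for the matrix returned by page.GetDefaultMatrix(), written in the PDF matrix order [a b c d h v], and the bounding box is assumed to be laid out as {x1, y1, x2, y2}.

```cpp
#include <algorithm>
#include <array>

// Hypothetical stand-in for a 2D affine matrix in PDF order [a b c d h v].
struct Affine {
    double a, b, c, d, h, v;
    // Maps (x, y) in place: x' = a*x + c*y + h, y' = b*x + d*y + v.
    void Apply(double& x, double& y) const {
        const double nx = a * x + c * y + h;
        const double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// Transforms all four corners of a box (assumed layout {x1, y1, x2, y2})
// and returns the axis-aligned box that encloses the transformed corners.
std::array<double, 4> TransformBBox(const double* bbox, const Affine& m) {
    double xs[4] = {bbox[0], bbox[2], bbox[2], bbox[0]};
    double ys[4] = {bbox[1], bbox[1], bbox[3], bbox[3]};
    for (int i = 0; i < 4; ++i) m.Apply(xs[i], ys[i]);
    std::array<double, 4> out = {{
        *std::min_element(xs, xs + 4), *std::min_element(ys, ys + 4),
        *std::max_element(xs, xs + 4), *std::max_element(ys, ys + 4)}};
    return out;
}
```

Transforming all four corners, rather than only two opposite ones, keeps the result correct when the rotation swaps which corners are the minimum and maximum.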

◆ GetCurrentNum()

int pdftron::PDF::Line::GetCurrentNum ( )
Returns
The index of this line on the current page.

◆ GetFirstWord()

Word pdftron::PDF::Line::GetFirstWord ( )
Returns
the first word in the line.
Note
To traverse the list of all words on this line use word.GetNextWord().
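A minimal sketch of that traversal, written as a template so it compiles without the SDK. It assumes the word type also exposes IsValid() and that GetNextWord() yields an invalid word after the last one; neither is documented on this page. `ForEachWord`, `StubWord` and `StubLine` are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Visits every word on a line, in order, by following GetNextWord().
// LineT can be any type whose GetFirstWord() returns a word exposing
// IsValid() and GetNextWord(); both word methods are assumptions here.
template <class LineT, class Fn>
void ForEachWord(LineT line, Fn visit) {
    for (auto word = line.GetFirstWord(); word.IsValid(); word = word.GetNextWord()) {
        visit(word);
    }
}

// Hypothetical stand-ins (not SDK types) that mimic the member names above.
struct StubWord {
    const std::vector<std::string>* words;
    std::size_t idx;
    bool IsValid() const { return idx < words->size(); }
    StubWord GetNextWord() const { return StubWord{words, idx + 1}; }
    const std::string& Text() const { return (*words)[idx]; }
};

struct StubLine {
    std::vector<std::string> words;
    StubWord GetFirstWord() const { return StubWord{&words, 0}; }
};
```

With the stand-ins, `ForEachWord(line, [](const StubWord& w) { /* use w.Text() */ });` calls the visitor once per word.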

◆ GetFlowID()

int pdftron::PDF::Line::GetFlowID ( )
Returns
The unique identifier for the flow (for example, a column) that this line belongs to. This information can be used to identify which lines and paragraphs belong to which flows.

◆ GetNextLine()

Line pdftron::PDF::Line::GetNextLine ( )
Returns
the next line on the page.

◆ GetNumWords()

int pdftron::PDF::Line::GetNumWords ( )
Returns
The number of words in this line.

◆ GetParagraphID()

int pdftron::PDF::Line::GetParagraphID ( )
Returns
The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines belong to which paragraphs.
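One way to use this identifier is to bucket line indices by paragraph while walking the page with GetNextLine(). The sketch below is a template over any line-like type; it assumes GetNextLine() returns an invalid line after the last one. `GroupLinesByParagraph` and `StubLine` are hypothetical names, and the stub is not an SDK type.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Walks from `first` to the last line and groups GetCurrentNum() values by
// GetParagraphID(). Assumption: GetNextLine() yields an invalid line at the end.
template <class LineT>
std::map<int, std::vector<int>> GroupLinesByParagraph(LineT first) {
    std::map<int, std::vector<int>> groups;
    for (LineT line = first; line.IsValid(); line = line.GetNextLine()) {
        groups[line.GetParagraphID()].push_back(line.GetCurrentNum());
    }
    return groups;
}

// Hypothetical stand-in that mimics the four member functions used above.
struct StubLine {
    const std::vector<int>* para_ids;  // paragraph ID of each line on the page
    std::size_t idx;                   // index of this line
    bool IsValid() const { return idx < para_ids->size(); }
    StubLine GetNextLine() const { return StubLine{para_ids, idx + 1}; }
    int GetCurrentNum() const { return static_cast<int>(idx); }
    int GetParagraphID() const { return (*para_ids)[idx]; }
};
```

The same loop works with GetFlowID() in place of GetParagraphID() if lines should be grouped by flow instead.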

◆ GetQuad() [1/2]

std::vector< double > pdftron::PDF::Line::GetQuad ( )
Returns
The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).

◆ GetQuad() [2/2]

void pdftron::PDF::Line::GetQuad ( double out_quad[8])
Parameters
out_quad: The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).

◆ GetStyle()

Style pdftron::PDF::Line::GetStyle ( )
Returns
The predominant style for this line.

◆ GetWord()

Word pdftron::PDF::Line::GetWord ( int word_idx)
Returns
The word at index word_idx in this line.
Parameters
word_idx: An integer representing the index of the word to get.

◆ IsSimpleLine()

bool pdftron::PDF::Line::IsSimpleLine ( )
Returns
true if this line is not rotated (i.e. if the quadrilaterals returned by GetBBox() and GetQuad() coincide).
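To make the "coincide" condition concrete, the sketch below checks whether every corner of a quad sits on a corner of a bounding box. It is an illustration of the geometry, not the SDK's implementation: `QuadCoincidesWithBBox` is a hypothetical helper, the quad is assumed to be four (x, y) pairs, the box is assumed to be {x1, y1, x2, y2}, and the tolerance is an arbitrary choice.

```cpp
#include <algorithm>
#include <cmath>

// Returns true when each of the four quad corners lies on a corner of the box.
// Assumed layouts: quad = {x0, y0, x1, y1, x2, y2, x3, y3}, bbox = {x1, y1, x2, y2}.
bool QuadCoincidesWithBBox(const double* quad, const double* bbox, double eps = 1e-6) {
    const double xmin = std::min(bbox[0], bbox[2]);
    const double xmax = std::max(bbox[0], bbox[2]);
    const double ymin = std::min(bbox[1], bbox[3]);
    const double ymax = std::max(bbox[1], bbox[3]);
    for (int i = 0; i < 4; ++i) {
        const double x = quad[2 * i];
        const double y = quad[2 * i + 1];
        const bool on_x = std::fabs(x - xmin) <= eps || std::fabs(x - xmax) <= eps;
        const bool on_y = std::fabs(y - ymin) <= eps || std::fabs(y - ymax) <= eps;
        if (!on_x || !on_y) return false;  // corner is off the box corners
    }
    return true;
}
```

Comparing only the extents of the quad with the box would not be enough, because a tilted quad can have the same extents while its corners sit on the box edges rather than on its corners.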

◆ IsValid()

bool pdftron::PDF::Line::IsValid ( )
Returns
true if this is a valid line, false otherwise.

◆ operator!=()

bool pdftron::PDF::Line::operator!= ( const Line & ) const

◆ operator==()

bool pdftron::PDF::Line::operator== ( const Line & ) const

The documentation for this class was generated from the following file: