#include <TextExtractor.h>
A TextExtractor::Word object represents a word on a PDF page. Each word contains a sequence of characters in one or more styles (see TextExtractor::Style).
Definition at line 430 of file TextExtractor.h.
◆ Word()
pdftron::PDF::Word::Word()
◆ GetBBox() [1/2]
Rect pdftron::PDF::Word::GetBBox()
- Returns
- The bounding box for this word (in unrotated page coordinates). The [2/2] overload writes the same box to out_bbox.
- Note
- To account for the effect of the page's '/Rotate' attribute, transform all points using page.GetDefaultMatrix().
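The note above can be sketched as follows. This is a minimal, SDK-free illustration: `Affine` and `Point` are hypothetical stand-ins for the matrix that page.GetDefaultMatrix() returns, and the (x1, y1, x2, y2) layout of the box is an assumption. The point mapping follows the usual PDF matrix convention, x' = a*x + c*y + h and y' = b*x + d*y + v.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in for the page's default matrix (not the SDK type).
struct Affine { double a, b, c, d, h, v; };
struct Point { double x, y; };

// Map one point through the matrix using the PDF convention.
Point apply(const Affine& m, Point p) {
    return {m.a * p.x + m.c * p.y + m.h, m.b * p.x + m.d * p.y + m.v};
}

// Transform all four corners of the box, then take the axis-aligned
// bounding box of the results. Assumes bbox = (x1, y1, x2, y2).
std::array<double, 4> transformBBox(const std::array<double, 4>& bbox,
                                    const Affine& m) {
    const Point corners[4] = {
        apply(m, {bbox[0], bbox[1]}), apply(m, {bbox[2], bbox[1]}),
        apply(m, {bbox[2], bbox[3]}), apply(m, {bbox[0], bbox[3]})};
    std::array<double, 4> out = {corners[0].x, corners[0].y,
                                 corners[0].x, corners[0].y};
    for (const Point& p : corners) {
        out[0] = std::min(out[0], p.x);
        out[1] = std::min(out[1], p.y);
        out[2] = std::max(out[2], p.x);
        out[3] = std::max(out[3], p.y);
    }
    assert(out[0] <= out[2] && out[1] <= out[3]);  // result is normalized
    return out;
}
```

Transforming every corner, rather than only two opposite ones, keeps the result correct when the matrix rotates the page by 90 or 270 degrees.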
◆ GetBBox() [2/2]
void pdftron::PDF::Word::GetBBox(double out_bbox[4])
◆ GetCharStyle()
Style pdftron::PDF::Word::GetCharStyle(int char_idx)
- Parameters
- char_idx: The index of a character in this word.
- Returns
- The style associated with a given character.
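As a rough illustration of per-character styles, the sketch below counts how many runs of identical style a word contains. It is written as a template so it does not depend on the SDK headers; `StubStyle` and `StubStyledWord` are hypothetical stand-ins used only to make the example self-contained. It assumes that valid character indices run from 0 to GetStringLen()-1 and that style objects can be compared with `==`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Style and Word; not part of the SDK.
struct StubStyle {
    int id;
    bool operator==(const StubStyle& other) const { return id == other.id; }
};

struct StubStyledWord {
    std::vector<int> styleIds;  // one style id per character
    int GetStringLen() const { return static_cast<int>(styleIds.size()); }
    StubStyle GetCharStyle(int char_idx) const {
        assert(char_idx >= 0 && char_idx < GetStringLen());
        return StubStyle{styleIds[static_cast<std::size_t>(char_idx)]};
    }
};

// Count maximal runs of consecutive characters that share one style.
// WordT must provide GetStringLen() and GetCharStyle(int).
template <class WordT>
int countStyleRuns(WordT word) {
    const int len = word.GetStringLen();
    if (len == 0) return 0;
    int runs = 1;
    auto previous = word.GetCharStyle(0);
    for (int i = 1; i < len; ++i) {
        auto current = word.GetCharStyle(i);
        if (!(current == previous)) ++runs;  // style changed at character i
        previous = current;
    }
    return runs;
}
```

A result of 1 means every character shares a single style; larger values indicate mixed styling within the word.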
◆ GetCurrentNum()
int pdftron::PDF::Word::GetCurrentNum()
- Returns
- The index of this word in the current line. The first word in a line returns 0, and the last word returns (line.GetNumWords()-1).
◆ GetGlyphQuad() [1/2]
std::vector<double> pdftron::PDF::Word::GetGlyphQuad(int glyph_idx)
- Parameters
- glyph_idx: The index of a glyph in this word.
- Returns
- The quadrilateral representing a tight bounding box for the given glyph (in unrotated page coordinates). The [2/2] overload writes the same quadrilateral to out_quad.
◆ GetGlyphQuad() [2/2]
void pdftron::PDF::Word::GetGlyphQuad(int glyph_idx, double out_quad[8])
◆ GetNextWord()
Word pdftron::PDF::Word::GetNextWord()
- Returns
- The next word on the current line.
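GetNextWord() and IsValid() together suggest a simple traversal loop over the words of a line. The sketch below is a template, so it compiles without the SDK; `StubLineWord` and its `Text()` helper are hypothetical and exist only for illustration. It assumes that calling GetNextWord() on the last word of a line yields a word for which IsValid() returns false.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a word handle; not part of the SDK.
struct StubLineWord {
    const std::vector<std::string>* line;  // words of the line, or nullptr
    std::size_t index;                     // position within the line

    bool IsValid() const { return line != nullptr && index < line->size(); }
    int GetCurrentNum() const { return static_cast<int>(index); }
    StubLineWord GetNextWord() const { return StubLineWord{line, index + 1}; }
    const std::string& Text() const {      // illustration only
        assert(IsValid());
        return (*line)[index];
    }
};

// Visit every word starting at `first` and return how many were visited.
// WordT must provide IsValid() and GetNextWord().
template <class WordT, class Visitor>
int forEachWord(WordT first, Visitor visit) {
    int count = 0;
    for (WordT w = first; w.IsValid(); w = w.GetNextWord()) {
        visit(w);
        ++count;
    }
    return count;
}
```

With the real class, `first` would typically be the first word of a line; inside the visitor, GetCurrentNum() gives the word's position on that line.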
◆ GetNumGlyphs()
int pdftron::PDF::Word::GetNumGlyphs()
- Returns
- The number of glyphs in this word.
◆ GetQuad() [1/2]
std::vector<double> pdftron::PDF::Word::GetQuad()
- Returns
- The quadrilateral representing a tight bounding box for this word (in unrotated page coordinates). The [2/2] overload writes the same quadrilateral to out_quad.
◆ GetQuad() [2/2]
void pdftron::PDF::Word::GetQuad(double out_quad[8])
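A quadrilateral can be reduced to an axis-aligned box when only rough extents are needed, for example for hit testing. The helper below is a minimal sketch, not SDK code. It assumes the eight values are four corner points stored as (x1, y1, x2, y2, x3, y3, x4, y4), which this page does not state explicitly.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduce an 8-value quad to (min_x, min_y, max_x, max_y).
// Assumption: values are interleaved as x, y pairs for four corners.
std::array<double, 4> quadToBBox(const std::vector<double>& quad) {
    assert(quad.size() == 8);
    std::array<double, 4> box = {quad[0], quad[1], quad[0], quad[1]};
    for (std::size_t i = 2; i < quad.size(); i += 2) {
        box[0] = std::min(box[0], quad[i]);
        box[1] = std::min(box[1], quad[i + 1]);
        box[2] = std::max(box[2], quad[i]);
        box[3] = std::max(box[3], quad[i + 1]);
    }
    return box;
}
```

For rotated or skewed text the quad is tighter than the resulting box, so the quad itself is the better choice for highlighting.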
◆ GetString()
const Unicode * pdftron::PDF::Word::GetString()
- Returns
- The content of this word represented as a Unicode string.
◆ GetStringLen()
int pdftron::PDF::Word::GetStringLen()
- Returns
- The number of characters in this word.
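Because GetString() returns a raw pointer, it is safest to pair it with GetStringLen() rather than rely on a terminator. The sketch below converts such a buffer to UTF-8. It assumes that `Unicode` is a 16-bit UTF-16 code unit, which this page does not confirm; `utf16ToUtf8` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Convert `len` UTF-16 code units to a UTF-8 string.
// Assumption: the buffer holds UTF-16; it need not be null-terminated.
std::string utf16ToUtf8(const std::uint16_t* data, int len) {
    assert(data != nullptr || len == 0);
    std::string out;
    for (int i = 0; i < len; ++i) {
        std::uint32_t cp = data[i];
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            const std::uint32_t lo = data[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogates fall through and are encoded as-is.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Under that assumption, a call would look like `utf16ToUtf8(word.GetString(), word.GetStringLen())`, possibly with a pointer cast if the SDK's `Unicode` typedef is a different 16-bit type.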
◆ GetStyle()
Style pdftron::PDF::Word::GetStyle()
- Returns
- The predominant style for this word.
◆ IsValid()
bool pdftron::PDF::Word::IsValid()
- Returns
- True if this is a valid word, false otherwise.
◆ operator!=()
bool pdftron::PDF::Word::operator!=(const Word &) const
◆ operator==()
bool pdftron::PDF::Word::operator==(const Word &) const
The documentation for this class was generated from the following file: