StructuredText

StructuredText objects hold text from a page that has been analyzed and grouped into blocks, lines and spans. To obtain a StructuredText instance, use the Page method toStructuredText().

Instance methods

search(needle)

Search the text for all instances of needle and return an array of all matches found on the page.

Each match in the result is an array containing one or more QuadPoints that cover the matching text.

Arguments:
  • needle – The string to search for.

Returns:

[...]. An array of matches, each of which is an array of QuadPoints.

EXAMPLE

var result = sText.search("Hello World!");
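Because a match that wraps across lines is reported as several QuadPoints, it can be handy to reduce each match to one bounding box, for example to scroll it into view. The following is a minimal sketch that assumes each QuadPoint is an array of eight numbers giving the four corners [ulx, uly, urx, ury, llx, lly, lrx, lry]. That layout and the helper names quadBounds and matchBounds are assumptions for illustration, not part of the documented API.

```javascript
// Sketch: reduce each search match (an array of QuadPoints) to one bounding box.
// ASSUMPTION: each QuadPoint is an array of eight numbers
// [ulx, uly, urx, ury, llx, lly, lrx, lry] (the four corners of the quad).
// quadBounds and matchBounds are hypothetical helpers.
function quadBounds(quad) {
  const xs = [quad[0], quad[2], quad[4], quad[6]];
  const ys = [quad[1], quad[3], quad[5], quad[7]];
  return [Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)];
}

// Union of the bounding boxes of all quads in one match, as [x0, y0, x1, y1].
// A match always holds at least one QuadPoint, so reduce() needs no seed value.
function matchBounds(match) {
  return match.map(quadBounds).reduce((a, b) => [
    Math.min(a[0], b[0]),
    Math.min(a[1], b[1]),
    Math.max(a[2], b[2]),
    Math.max(a[3], b[3]),
  ]);
}

// Usage with a real page:
//   var boxes = sText.search("Hello World!").map(matchBounds);
```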

highlight(p, q)

Return an array of the rectangles needed to highlight the selection defined by the start point p and the end point q.

Arguments:
  • p – Start point in format [x,y].

  • q – End point in format [x,y].

Returns:

[...]. An array of rectangles.

EXAMPLE

var result = sText.highlight([100,100], [200,100]);
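The rectangles come back in page coordinates, so drawing them over a rendered page usually means scaling them by the zoom factor used for rendering. This is a minimal sketch that assumes each rectangle is [x0, y0, x1, y1] and that the page was rendered at a uniform scale; the helper toPixelRects and its zoom parameter are hypothetical.

```javascript
// Sketch: convert highlight() rectangles from page space to canvas pixel space.
// ASSUMPTIONS: each rectangle is [x0, y0, x1, y1] in page units, and the page
// was rendered at a uniform scale factor `zoom` (hypothetical parameter).
function toPixelRects(rects, zoom) {
  return rects.map(([x0, y0, x1, y1]) => ({
    x: x0 * zoom,
    y: y0 * zoom,
    width: (x1 - x0) * zoom,
    height: (y1 - y0) * zoom,
  }));
}

// In a browser, the result could be painted onto a 2D canvas context `ctx`:
//   for (const r of toPixelRects(sText.highlight([100,100], [200,100]), 2))
//     ctx.fillRect(r.x, r.y, r.width, r.height);
```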

copy(p, q)

Return the text from the selection defined by the start and end points.

Arguments:
  • p – Start point in format [x,y].

  • q – End point in format [x,y].

Returns:

String.

EXAMPLE

var result = sText.copy([100,100], [200,100]);
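In an interactive viewer, the start and end points usually come from a mouse drag over the rendered page, so they have to be mapped back to page coordinates before calling copy() or highlight(). A minimal sketch, assuming the page was rendered at a uniform scale with its origin at the top-left of the canvas; toPagePoint is a hypothetical helper.

```javascript
// Sketch: map mouse positions on a rendered page back to page coordinates,
// so they can be passed as the p and q arguments of copy() or highlight().
// ASSUMPTIONS: the page was rendered at a uniform scale `zoom` with its origin
// at the top-left of the canvas; `toPagePoint` is a hypothetical helper.
function toPagePoint(canvasX, canvasY, zoom) {
  return [canvasX / zoom, canvasY / zoom];
}

// A drag from (200, 200) to (400, 200) on a page rendered at zoom 2
// maps to the selection [100, 100] .. [200, 100]:
//   var text = sText.copy(toPagePoint(200, 200, 2), toPagePoint(400, 200, 2));
```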

walk(walker)

wasm only

Walk through the blocks (image blocks or text blocks) of the structured text. For each text block, walk over its lines, and for each line, over its characters. For every block, line and character visited, the corresponding method of the walker is called.

EXAMPLE

var stext = pdfPage.toStructuredText();
stext.walk({
    beginLine: function (bbox, wmode, direction) {
        console.log("beginLine", bbox, wmode, direction);
    },
    beginTextBlock: function (bbox) {
        console.log("beginTextBlock", bbox);
    },
    endLine: function () {
        console.log("endLine");
    },
    endTextBlock: function () {
        console.log("endTextBlock");
    },
    onChar: function (utf, origin, font, size, quad, color) {
        console.log("onChar", utf, origin, font, size, quad, color);
    },
    onImageBlock: function (bbox, transform, image) {
        console.log("onImageBlock", bbox, transform, image);
    },
});
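A common use of walk() is to rebuild the plain text of each line. The sketch below keeps the same callback names and parameter lists as the example above, defines all six of them (unused ones are no-ops), and joins the characters passed to onChar. It assumes that utf is a string holding one character; makeLineCollector is a hypothetical helper.

```javascript
// Sketch: a walker that rebuilds the plain text of each line from onChar calls.
// ASSUMPTION: `utf` in onChar is a string containing one character.
// makeLineCollector is a hypothetical helper, not part of the API.
function makeLineCollector() {
  const lines = [];   // finished lines of text, in walk order
  let current = null; // characters of the line currently being walked
  return {
    lines,
    beginTextBlock: function (bbox) {},
    endTextBlock: function () {},
    onImageBlock: function (bbox, transform, image) {},
    beginLine: function (bbox, wmode, direction) {
      current = [];
    },
    onChar: function (utf, origin, font, size, quad, color) {
      if (current) current.push(utf);
    },
    endLine: function () {
      if (current) lines.push(current.join(""));
      current = null;
    },
  };
}

// Usage with a real page (wasm only):
//   var collector = makeLineCollector();
//   pdfPage.toStructuredText().walk(collector);
//   console.log(collector.lines.join("\n"));
```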

Note

On beginLine, the direction parameter is a vector (e.g. [0, 1]); you can calculate the rotation angle of the line from it with some trigonometry.
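The trigonometry mentioned in the note amounts to a single atan2 call. A minimal sketch; directionToDegrees is a hypothetical helper. Assuming MuPDF's page space, where y grows downward, [1, 0] is ordinary left-to-right text (0 degrees) and [0, 1] is text running down the page (90 degrees).

```javascript
// Sketch: convert the beginLine direction vector [dx, dy] into an angle in degrees.
// Math.atan2 handles all four quadrants; the result lies in (-180, 180].
// directionToDegrees is a hypothetical helper.
function directionToDegrees(direction) {
  const [dx, dy] = direction;
  return Math.atan2(dy, dx) * 180 / Math.PI;
}

// Inside a walker:
//   beginLine: function (bbox, wmode, direction) {
//     console.log("line angle", directionToDegrees(direction));
//   },
```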

asJSON()

wasm only

Returns the instance in JSON format.

Returns:

String.

EXAMPLE

var json = sText.asJSON();
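Since asJSON() returns a string, it has to be parsed before use. The sketch below assumes a layout with a top-level blocks array whose text blocks contain lines carrying a text field. That shape is an assumption, so check it against real output; missing fields are skipped rather than treated as errors. linesFromJSON is a hypothetical helper.

```javascript
// Sketch: parse the asJSON() string and list the text of every line.
// ASSUMPTION: the JSON has the shape
//   { "blocks": [ { "type": "text", "lines": [ { "text": "..." } ] } ] }
// Verify against real output; blocks or lines without these fields are skipped.
// linesFromJSON is a hypothetical helper.
function linesFromJSON(json) {
  const data = JSON.parse(json);
  const result = [];
  for (const block of data.blocks ?? []) {
    for (const line of block.lines ?? []) {
      if (typeof line.text === "string") result.push(line.text);
    }
  }
  return result;
}

// Usage:
//   var lines = linesFromJSON(sText.asJSON());
```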