Simple Screen Scraping

Posted: (EET/GMT+2)

 

Simple Screen Scraping

August 8, 2001

Delphi 6

Screen scraping is a technique for "grabbing" text from the screen to another application, or simple converting character images on the screen back to text. Screen scraping is often used to convert old terminal applications to the web by running the terminal software in the background and letting a web front-end do all the manipulation of the older application.

In cases you only need simple screen scraping techniques, you can easily use the built-in Delphi VCL components to convert character images on a canvas to text. The method detailed in this article is probably too slow for real world applications, but it provides a base foundation for the simple problems you might have.

The following image shows the main screen of the application with an image of ol' DOS prompt loaded into view.

To initialize text recognition, you first need to point your mouse cursor over the first character in the upper left corner. Type in the first character to the text box left of the Init button, and select the appropriate colors to match the image. Click the Init button. Move the mouse over the image until the two enlarged images ("Original image" and "Matching char") equal.

Note that the font name and font size must also be set correctly before clicking the Init button. Once a match is found, the following message box is displayed:

Click OK to dismiss the message box.

The code behind the magic

Without doubt you are interested in how this all works. Here's what happens in response to the Init button:

procedure TScraperMainForm.InitializeClick(Sender: TObject);
begin
  With MatchImage.Canvas do Begin
    Pen.Color := BackgroundColor.Selected;
    Brush.Color := BackgroundColor.Selected;
    Rectangle(0,0,MatchImage.Width,MatchImage.Height);
    { change font }
    Font.Name := FontName.Text;
    Font.Size := StrToInt(FontSize.Text);
    Font.Color := TextColor.Selected;
    TextOut(0,0,InitCharToMatch.Text);
    FontWidth.Caption := IntToStr(TextWidth('W'));
    FontHeight.Caption := IntToStr(TextHeight('W'));
  End;
  With ImageA.Canvas do Begin
    Font.Color := TextColor.Selected;
    Pen.Color := BackgroundColor.Selected;
    Brush.Color := BackgroundColor.Selected;
    Rectangle(0,0,ImageA.Width,ImageA.Height);
  End;
  With ImageB.Canvas do Begin
    Font.Color := TextColor.Selected;
    Pen.Color := BackgroundColor.Selected;
    Brush.Color := BackgroundColor.Selected;
    Rectangle(0,0,ImageB.Width,ImageB.Height);
  End;
  Instructions.Visible := False;
  Instructions2.Visible := True;
  ScreenImage.Cursor := crCross;
  Initializing := True;
end;

Here, the helper TImage objects and their TCanvas objects are set to match the font and color settings in the UI. This allows easy pixel matching on the canvases. The actual pixel matching is done in the following function:

Function TScraperMainForm.MatchCharacter(CanvasA,CanvasB : TCanvas;
  AX,AY,BX,BY : Integer; TheChar : Char = #0) : Char;
Const MatchingSet : Set of Char = [' ','A'..'Z','a'..'z','0'..'9',',','.',
                                   '-','=',':',';','?','!','"','''',
                                   '(',')','[',']','<','>','/','\'];
Var
  CharToMatch  : Char;
  Width,Height : Integer;

  Function TryMatch : Boolean;
  Var
    X,Y : Integer;
    A,B : TColor;

  Begin
    Result := True; { default: match }
    For X := 0 to (Width-1) do Begin
      For Y := 0 to (Height-1) do Begin
        A := CanvasA.Pixels[AX+X,AY+Y];
        B := CanvasB.Pixels[BX+X,BY+Y];
        If (A <> B) Then Result := False; { no match }
        { zoom in }
        ImageA.Canvas.Pen.Color := A;
        ImageA.Canvas.Brush.Color := A;
        ImageA.Canvas.Rectangle(X*4,Y*4,X*4+4,Y*4+4);
        ImageB.Canvas.Pen.Color := B;
        ImageB.Canvas.Brush.Color := B;
        ImageB.Canvas.Rectangle(X*4,Y*4,X*4+4,Y*4+4);
      End;
    End;
  End;

begin
  Width := StrToInt(FontWidth.Caption);
  Height := StrToInt(FontHeight.Caption);
  If (TheChar <> #0) Then Begin
    If TryMatch Then Result := TheChar
    Else Result := #0;
  End
  Else Begin
    Result := #0; { no match }
    { try matching whole set }
    For CharToMatch := #32 to #127 do Begin
      If (CharToMatch In MatchingSet) Then Begin
        CanvasB.TextOut(0,0,CharToMatch);
        Application.ProcessMessages;
        If Application.Terminated Then Exit;
        If TryMatch Then Begin
          Result := CharToMatch;
          Exit;
        End;
      End;
    End;
  End;
end;

Depending on the situation, this function is able to test whether the pixels in the given canvas match the pixels in another canvas. Also, the function can iterate through all "valid" characters (the MatchingSet constant) to see if a match can be found. Note how the code relies on the fact that the canvas objects have already been initialized with correct fonts and colors.

Matching text

Once the program has been initialized, the main window expands to include a memo box and a Go button (see below).

When the user clicks the Go button, the following code is executed:

procedure TScraperMainForm.MatchGoClick(Sender: TObject);
Var
  X,Y,W,H,AX,AY     : Integer;
  CW,CH,CurrX,CurrY : Integer;
  AChar             : Char;

begin
  W := StrToInt(CharsWide.Text);
  H := StrToInt(CharsHigh.Text);
  AX := StrToInt(XPos.Caption);
  AY := StrToInt(YPos.Caption);
  CW := StrToInt(FontWidth.Caption);
  CH := StrToInt(FontHeight.Caption);
  { start }
  MatchText.Lines.Text := '';
  ProgressBar.Position := 0;
  ProgressBar.Max := H*W;
  For Y := 0 to H-1 do Begin
    For X := 0 to W-1 do Begin
      CurrX := AX+(X*CW);
      CurrY := AY+(Y*CH);
      CurrPosPix.Caption := '('+IntToStr(CurrX)+','+IntToStr(CurrY)+')';
      CurrPosChar.Caption := '('+IntToStr(X+1)+','+IntToStr(Y+1)+')';
      Application.ProcessMessages;
      AChar := MatchCharacter(ScreenImage.Canvas,MatchImage.Canvas,CurrX,CurrY,0,0);
      If (AChar <> #0) Then MatchText.Lines.Text := MatchText.Lines.Text+AChar
      Else MatchText.Lines.Text := MatchText.Lines.Text+'*';
      ProgressBar.StepIt;
    End;
    MatchText.Lines.Text := MatchText.Lines.Text+#13#10;
  End;
  ShowMessage('Done!');
  ProgressBar.Position := 0;
end;

Here, the code simply relies on the fact that all characters are equally sized (non-proportional or fixed-size) and loops through all those characters. For each character, the character is tested for a match. If successful, the character is added to the memo. In case the character is an unrecognized one, a star (*) is added.

As you can see, screen scraping needn't be difficult. Of course, the code has its problems (unflexible, not fast enough), but it will do for simple situations; just don't expect full OCX functionality. With the sample images, you can test Notepad and command prompt (CMD.EXE on Windows 2000) text recognition, but you could also try Delphi's code editor, HyperTerminal or Telnet!

Download the example code

Download simplescreenscraping.zip (248 kB) which contains the sample screen scraping application developed in this article. Please note that the sample application was created with Delphi 6 and might not work with older versions (it most probably will work anyhow).

The ZIP archive also contains two sample images to play with. Their fonts are detailed in Fonts.txt.

* * *

Need help converting your legacy applications to the web age? Let us help!