How To: Searching and scanning files
Posted: (EET/GMT+2)
Level: Beginner to Intermediate
The ability to search for files is a basic requirement for a computer with
a complex file system. For example, even a small 2 gigabyte hard disk can
contain tens of thousands of files, so it's no wonder if you forget where
you put one of them.
Also, it is often necessary to find a file that contains a particular string. For example, I often need to find files which contain simple Double Byte Character Strings (DBCS's), that is, strings in which every character takes two bytes (strictly speaking, these are 16-bit Unicode strings). 32-bit EXE files, for instance, contain such characters in their resource strings. Windows 95's Find command doesn't help here, because if I enter "ABC", the command scans for exactly those three bytes. But in the EXE file the characters of "ABC" are separated by null characters; this article writes the stored form as "#0A#0B#0C", where "#0" is a null character.
For this reason, I sat down for a few hours and typed in the following program. And it works... ;-)
Searching for files
In the Win32 API, there are functions to enumerate the files and directories in a single directory, but no function to search a whole drive. Thus, we need some programming, and I mean recursive programming.
Recursive functions are functions that call themselves. Recursion is never absolutely necessary, but some elegant algorithms are recursive. Searching for files is one example.
Our algorithm is:
1. Scan given directory for all files & dirs (ScanDir)
2. Call ourselves (ScanDir) recursively for every directory found
3. Scan given directory for files matching filespec
4. [Optional] Scan for any file matching specs for the given string.
ScanDir in practice
The following is the implementation of the ScanDir function:
Procedure ScanDir(Dir : String);
Var
  DirSearch : THandle;
  DirFD     : TWin32FindData;
  S         : String;
Begin
  { Dir must end with a backslash }
  DirSearch := FindFirstFile(PChar(Dir+'*.*'),DirFD);
  If (DirSearch <> Invalid_Handle_Value) Then Begin
    Repeat
      Inc(Scanned);
      { directories only; names starting with a dot ("." and "..") are skipped }
      If (((DirFD.dwFileAttributes And File_Attribute_Directory) <> 0) And
          (DirFD.cFileName[0] <> '.')) Then Begin
        S := Dir+DirFD.cFileName;
        If (S[Length(S)] <> '\') Then S := S+'\';
        ScanDir(S);   { recurse into the subdirectory first... }
        ScanFiles(S); { ...then list its matching files }
      End;
    Until (Not FindNextFile(DirSearch,DirFD));
    FindClose(DirSearch); { close only a handle that was actually opened }
  End;
End;
Here, the directory to be searched is initially the root directory (\),
but as the function gets called recursively, the directory changes.
To find files using Win32 API calls, we must first start a search. This is done using FindFirstFile, which takes the file specification we want to search for and returns a search handle. Subsequent search calls (FindNextFile) only need that handle, not the specification.
To find a directory, we must scan for all files with the "*.*" wildcard. Of course, this will find all files too, but we can separate files from directories by checking the file attributes. We also need to skip the "current" and "parent" directories ("." and ".."), or the recursion would never end.
After we've found a directory, a recursive call occurs. We pass the newly found directory as the new "base" directory, so the new recursion level doesn't check the same files as the previous level.
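The same recursion can also be written with the FindFirst/FindNext/FindClose wrappers from Delphi's SysUtils unit instead of the raw API calls. The following is only a sketch under that assumption, not the program from this article; the program name and the CountDirs function are made up for illustration. It counts the directories below a starting directory.

```pascal
program CountDirsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

{ Hypothetical helper: returns the number of directories below Dir.
  Dir must end with a backslash. }
function CountDirs(const Dir: String): Integer;
var
  SR: TSearchRec;
begin
  Result := 0;
  { FindFirst returns 0 when at least one entry was found }
  if FindFirst(Dir + '*.*', faAnyFile, SR) = 0 then
  begin
    repeat
      { keep directories only, and skip "." and ".." }
      if ((SR.Attr and faDirectory) <> 0) and
         (SR.Name <> '.') and (SR.Name <> '..') then
        { one for the directory itself, plus everything below it }
        Result := Result + 1 + CountDirs(Dir + SR.Name + '\');
    until FindNext(SR) <> 0;
    FindClose(SR);
  end;
end;

var
  Root: String;
begin
  if ParamCount < 1 then
    WriteLn('usage: CountDirsDemo <directory>')
  else
  begin
    Root := ParamStr(1);
    if Root[Length(Root)] <> '\' then Root := Root + '\';
    WriteLn(CountDirs(Root), ' directories');
  end;
end.
```

Each recursion level has its own TSearchRec on the stack, which is what lets the outer search continue where it stopped once the inner call returns.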
Searching for files
After directories, we obviously need to scan for the files. Remember, the directories were scanned using the "*.*" wildcard. But, the user can specify another wildcard for the files, for example "*.txt". For this reason, we need an additional bundle of FindFirstFile/FindNextFile/FindClose calls. The code looks like this:
Procedure ScanFiles(Dir : String);
Var
  FileSearch : THandle;
  FileFD     : TWin32FindData;
Begin
  FileSearch := FindFirstFile(PChar(Dir+FileSpec),FileFD);
  If (FileSearch <> Invalid_Handle_Value) Then Begin
    Repeat
      With FileFD do Begin
        If ((dwFileAttributes And File_Attribute_Directory) = 0) Then Begin
          { A file that does not contain the string is just skipped. Using }
          { Break here would end the loop and drop the rest of the directory. }
          If ((ScanLen = 0) Or ScanFileForText(Dir+cFileName)) Then Begin
            Inc(Matching);
            { print the short (8.3) name too, if the file has one }
            If (cAlternateFileName[0] = #0) Then WriteLn(Dir,cFileName)
            Else WriteLn(cAlternateFileName,' ',Dir,cFileName);
          End;
        End;
      End;
    Until (Not FindNextFile(FileSearch,FileFD));
    FindClose(FileSearch); { close only a handle that was actually opened }
  End;
End;
As you can see, the only difference compared to ScanDir is that we need to
check if the user entered a string to search for (if so, ScanLen does not
equal zero). Also, note the result output using WriteLn. Sometimes you need
to get the short (8.3) filename of a long filename, for example because a
DOS program needs one. The cAlternateFilename field of the TWin32FindData
record gives this name.
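As a small illustration of the short name field, the sketch below lists the current directory and prints the 8.3 name next to the long name. It assumes Delphi with the Windows unit; the DosName helper and the program name are hypothetical and not part of the article's program.

```pascal
program ShortNames;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Windows;

{ Hypothetical helper: returns the 8.3 name if the system reports one,
  otherwise the normal name (cAlternateFileName is then empty). }
function DosName(const FD: TWin32FindData): String;
begin
  if FD.cAlternateFileName[0] <> #0 then
    Result := FD.cAlternateFileName
  else
    Result := FD.cFileName;
end;

var
  H: THandle;
  FD: TWin32FindData;
  LongName: String;
begin
  H := FindFirstFile('*.*', FD);
  if H <> INVALID_HANDLE_VALUE then
  begin
    repeat
      LongName := FD.cFileName;
      WriteLn(DosName(FD), ' ', LongName);
    until not FindNextFile(H, FD);
    Windows.FindClose(H);
  end;
end.
```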
Searching for strings
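The only thing left is the searching of strings. If the user gave a string to search for on the command line, we convert the string to a DBCS string by adding the null char (#0) in front of every char the user gave. For example, if the user enters "XY", our program searches for "#0X#0Y". (Windows actually stores the character first and the null byte second, so this pattern only matches when a null byte also precedes the first character, which is often the case inside resource data.) The code to search for a string in a file is: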
Function ScanFileForText(Filename : String) : Boolean;
Var
  F   : File;
  D   : PChar;   { data }
  I,J : Integer;
  S   : Integer; { file size in bytes }
Begin
  Result := False;
  Assign(F,Filename);
  {$I-}
  Reset(F,1);
  {$I+}
  If (IOResult <> 0) Then Begin
    WriteLn('Can''t scan: ',Filename);
    Exit;
  End;
  D := nil;
  Try
    S := FileSize(F);
    GetMem(D,S);
    BlockRead(F,D^,S);
    { Brute force: try the pattern at every offset of the file. }
    { This assumes that ScanLen = Length(ScanString). }
    For I := 0 to S-ScanLen do Begin
      J := 0;
      While ((J < ScanLen) And (D[I+J] = ScanString[J+1])) do Inc(J);
      If (J = ScanLen) Then Begin
        Result := True; { found }
        Break;
      End;
    End;
  Finally
    If (D <> nil) Then FreeMem(D,S);
    Close(F);
  End;
End;
First, we check if we can open the given file. Note that we disable
automatic I/O checking with the "$I-" compiler directive, and handle the
errors (IOResult) ourselves. For example, opening a file can fail if
another program has the file open.
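The "$I-" and IOResult pattern can be tried out on its own. The sketch below is a hypothetical helper (CanOpen is not part of the article's program) that only reports whether each file named on the command line can be opened.

```pascal
program CanOpenDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Hypothetical helper: returns True if the file can be opened. }
function CanOpen(const Filename: String): Boolean;
var
  F: File;
begin
  Assign(F, Filename);
  {$I-}                      { no automatic run-time error on failure }
  Reset(F, 1);
  {$I+}
  { IOResult returns the status of the last I/O call and clears it }
  Result := (IOResult = 0);
  if Result then Close(F);
end;

var
  I: Integer;
begin
  for I := 1 to ParamCount do
    if CanOpen(ParamStr(I)) then
      WriteLn('ok:         ', ParamStr(I))
    else
      WriteLn('can''t open: ', ParamStr(I));
end.
```

IOResult has to be read right after the call it belongs to, because reading it resets the stored error code.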
Next, we just scan the file sequentially to see if the user-specified string
occurs in the file. Of course, the search algorithm used is the worst
(slowest) available, but it is the easiest to implement. For example, one
could use the Boyer-Moore algorithm instead of the "brute-force" used here.
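As a rough idea of what a faster search could look like, here is a sketch of the Horspool simplification of Boyer-Moore (not the full Boyer-Moore algorithm). It works on strings in memory rather than on a file, and the names ToWidePattern and HorspoolPos are made up for this example. ToWidePattern builds the pattern the way this article describes it, with a null char in front of every char.

```pascal
program HorspoolDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Puts a null char in front of every char of Text. }
function ToWidePattern(const Text: AnsiString): AnsiString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Text) do
    Result := Result + #0 + Text[I];
end;

{ Returns the 1-based position of Pat in Data, or 0 if it does not occur. }
function HorspoolPos(const Pat, Data: AnsiString): Integer;
var
  Skip: array[AnsiChar] of Integer;
  C: AnsiChar;
  I, J, M, N: Integer;
begin
  Result := 0;
  M := Length(Pat);
  N := Length(Data);
  if (M = 0) or (M > N) then Exit;
  { default shift: the whole pattern length }
  for C := Low(AnsiChar) to High(AnsiChar) do Skip[C] := M;
  { chars occurring in the pattern (except its last char) give smaller shifts }
  for J := 1 to M - 1 do Skip[Pat[J]] := M - J;
  I := 1;                              { current alignment of Pat in Data }
  while I <= N - M + 1 do
  begin
    { compare from the end of the pattern towards its start }
    J := M;
    while (J > 0) and (Data[I + J - 1] = Pat[J]) do Dec(J);
    if J = 0 then
    begin
      Result := I;
      Exit;
    end;
    { shift by the table entry of the data char under the pattern's last char }
    Inc(I, Skip[Data[I + M - 1]]);
  end;
end;

var
  Pat, Data: AnsiString;
begin
  Pat := ToWidePattern('XY');                  { #0 X #0 Y }
  Data := 'abc' + ToWidePattern('XY') + 'def';
  WriteLn(HorspoolPos(Pat, Data));             { prints 4 }
end.
```

The gain comes from the shift table: when the data char under the end of the pattern does not occur in the pattern at all, the search jumps ahead by the full pattern length instead of by one byte.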
Conclusion
Having read this article, you should be able to create your own file search engines, with or without the file scanning. The example here is very simple: there is no multiple drive support, for instance, and a nice GUI is missing. But download the code and examine it. Then extend it!