Imagine you’re developing a tool that needs to scan for file changes across thousands of project files. Retrieving file attributes efficiently becomes critical in such scenarios. In this article, I’ll demonstrate a technique for getting file attributes that can be more than 50 times faster than the standard Windows methods.

Let’s dive in and explore how we can achieve this.

Inspiration & Disclaimer  

The inspiration for this article came from a recent update for Visual Assist, a tool that greatly improves the Visual Studio experience and productivity for C# and C++ developers.

In one of their blog posts, they shared:

The initial parse is 10..15x faster!

What’s New in Visual Assist 2024—Featuring lightning fast parser performance [Webinar] - Tomato Soup

After watching the webinar, I noticed some details about efficiently getting file attributes, and I decided to give it a try on my machine. In other words, I tried to recreate their results.

Disclaimer: This post was written with the support and sponsorship of Idera, the company behind Visual Assist.

Understanding File Attribute Retrieval Methods on Windows  

On Windows, there are at least a few options to check for a file change:

  • FindFirstFile[EX]
  • GetFileAttributesEx
  • std::filesystem

Below, you can see the basic usage of each approach:

FindFirstFileEx  

FindFirstFileEx is a Windows API function that allows for efficient searching of directories. It retrieves information about files that match a specified file name pattern. The function can be used with different information levels, such as FindExInfoBasic and FindExInfoStandard, to control the amount of file information fetched.

WIN32_FIND_DATA findFileData;
HANDLE hFind = FindFirstFileEx((directory + "\\*").c_str(), FindExInfoBasic, &findFileData, FindExSearchNameMatch, NULL, 0);

if (hFind != INVALID_HANDLE_VALUE) {
    do {
        // Process file information
    } while (FindNextFile(hFind, &findFileData) != 0);
    FindClose(hFind);
}

GetFileAttributesEx  

GetFileAttributesEx is another Windows API function that retrieves file attributes for a specified file or directory. Unlike FindFirstFileEx, which is used for searching and listing files, GetFileAttributesEx is typically used for retrieving attributes of a single file or directory.

WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
if (GetFileAttributesEx((directory + "\\" + fileName).c_str(), GetFileExInfoStandard, &fileAttributeData)) {
    // Process file attributes
}

std::filesystem

Introduced in C++17, the std::filesystem library provides a modern and portable way to interact with the file system. It includes functions for file attribute retrieval, directory iteration, and other common file system operations.

for (const auto& entry : fs::directory_iterator(directory)) {
    if (entry.is_regular_file()) {
        // Process file attributes
        auto ftime = fs::last_write_time(entry);
        ...
    }
}

The Benchmark  

To evaluate the performance of the different file attribute retrieval methods, I developed a small benchmark. This application measures the time each method takes to retrieve file attributes for N files in a specified directory.

Here’s a rough overview of the code:

The FileInfo struct stores the file name and last write time.

struct FileInfo {
    std::string fileName;
    FILETIME lastWriteTime;
};

Each retrieval technique will have to go over a directory and build a vector of FileInfo objects.
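To connect this back to the change-scanning scenario from the introduction: once two such vectors exist for two points in time, finding modified files comes down to comparing write times. The sketch below is only an illustration and not part of the benchmark. `Snapshot` and `FindChanged` are hypothetical names, and the write time is held as a 64-bit tick count instead of FILETIME so that the code compiles outside Windows.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical portable stand-in for FileInfo: the write time is stored
// as a single 64-bit tick count instead of a FILETIME.
struct Snapshot {
    std::string fileName;
    std::uint64_t lastWriteTicks;
};

// Returns names that are new in `current` or whose write time differs
// from the one recorded in `previous`. Deleted files are not reported.
std::vector<std::string> FindChanged(const std::vector<Snapshot>& previous,
                                     const std::vector<Snapshot>& current)
{
    std::unordered_map<std::string, std::uint64_t> known;
    known.reserve(previous.size());
    for (const auto& s : previous)
        known.emplace(s.fileName, s.lastWriteTicks);

    std::vector<std::string> changed;
    for (const auto& s : current) {
        const auto it = known.find(s.fileName);
        if (it == known.end() || it->second != s.lastWriteTicks)
            changed.push_back(s.fileName);
    }
    return changed;
}
```

The faster the snapshot can be built, the cheaper each scan becomes, which is why the retrieval method matters.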

BenchmarkFindFirstFileEx

void BenchmarkFindFirstFileEx(const std::string& directory,     
                              std::vector<FileInfo>& files, 
                              FINDEX_INFO_LEVELS infoLevel) 
{
   WIN32_FIND_DATA findFileData;
   HANDLE hFind = FindFirstFileEx((directory + "\\*").c_str(),
                                   infoLevel, 
                                   &findFileData, 
                                   FindExSearchNameMatch, NULL, 0);

   if (hFind == INVALID_HANDLE_VALUE) {
       std::cerr << "FindFirstFileEx failed (" 
                 << GetLastError() << ")\n";
       return;
   }

   do {
       if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
           FileInfo fileInfo;
           fileInfo.fileName = findFileData.cFileName;
           fileInfo.lastWriteTime = findFileData.ftLastWriteTime;
           files.push_back(fileInfo);
       }
   } while (FindNextFile(hFind, &findFileData) != 0);

   FindClose(hFind);
}

BenchmarkGetFileAttributesEx

void BenchmarkGetFileAttributesEx(const std::string& directory,
                                  std::vector<FileInfo>& files) 
{
   WIN32_FIND_DATA findFileData;
   HANDLE hFind = FindFirstFile((directory + "\\*").c_str(),
                                &findFileData);

   if (hFind == INVALID_HANDLE_VALUE) {
       std::cerr << "FindFirstFile failed (" 
                 << GetLastError() << ")\n";
       return;
   }

   do {
       if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
           WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
           if (GetFileAttributesEx((directory + "\\" + findFileData.cFileName).c_str(), GetFileExInfoStandard, &fileAttributeData)) {
               FileInfo fileInfo;
               fileInfo.fileName = findFileData.cFileName;
               fileInfo.lastWriteTime = fileAttributeData.ftLastWriteTime;
               files.push_back(fileInfo);
           }
       }
   } while (FindNextFile(hFind, &findFileData) != 0);

   FindClose(hFind);
}

BenchmarkStdFilesystem

And the last one, the most portable technique:

void BenchmarkStdFilesystem(const std::string& directory, 
                            std::vector<FileInfo>& files) 
{
    for (const auto& entry : std::filesystem::directory_iterator(directory)) {
        if (entry.is_regular_file()) {
            FileInfo fileInfo;
            fileInfo.fileName = entry.path().filename().string();
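            auto ftime = std::filesystem::last_write_time(entry);
            // Relies on file_time_type matching FILETIME's size and representation
            static_assert(sizeof(ftime) == sizeof(FILETIME), "unexpected file_time_type size");
            memcpy(&fileInfo.lastWriteTime, &ftime, sizeof(FILETIME));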
            files.push_back(fileInfo);
        }
    }
}

In the code, we assume that file_time_type values map to FILETIME on Windows. Read more in this explanation: std::filesystem::file_time_type does not allow easy conversion to time_t - Developer Community
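If you prefer not to rely on memcpy and matching object layouts, the conversion can be spelled out. The following is a sketch under the same assumption as above, namely that the tick count of file_time_type is already in FILETIME units (100 ns intervals since 1601). As far as I know this holds for MSVC's implementation, but the standard does not guarantee it. `FileTimeParts`, `SplitTicks` and `JoinTicks` are hypothetical names. `FileTimeParts` mirrors the two 32-bit fields of FILETIME so that the snippet compiles on any platform.

```cpp
#include <cstdint>

// Hypothetical stand-in mirroring FILETIME's two 32-bit halves
// (dwLowDateTime, dwHighDateTime).
struct FileTimeParts {
    std::uint32_t low;
    std::uint32_t high;
};

// Splits a 64-bit tick count into its low and high 32-bit parts.
constexpr FileTimeParts SplitTicks(std::uint64_t ticks) {
    return { static_cast<std::uint32_t>(ticks & 0xFFFFFFFFull),
             static_cast<std::uint32_t>(ticks >> 32) };
}

// Rebuilds the tick count, which is handy for comparing two write times.
constexpr std::uint64_t JoinTicks(FileTimeParts parts) {
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}
```

With a real file_time_type value `ftime`, the input would be `static_cast<std::uint64_t>(ftime.time_since_epoch().count())`.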

The Main Function

The main function sets up the benchmarking environment, runs the benchmarks, and prints the results.

// Benchmark FindFirstFileEx (Basic)
auto start = std::chrono::high_resolution_clock::now();
BenchmarkFindFirstFileEx(directory, 
                         filesFindFirstFileExBasic, 
                         FindExInfoBasic);
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsedFindFirstFileExBasic = end - start;

// Benchmark FindFirstFileEx (Standard)
start = std::chrono::high_resolution_clock::now();
BenchmarkFindFirstFileEx(directory, 
                         filesFindFirstFileExStandard, 
                         FindExInfoStandard);
end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsedFindFirstFileExStandard = end - start;

// ...

This benchmark code measures the performance of FindFirstFileEx with both FindExInfoBasic and FindExInfoStandard, GetFileAttributesEx, and std::filesystem. The results are then formatted and displayed in a table.
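The start/end pattern repeats for every method, so it could be folded into a small helper. This is a sketch, not the code from the repository: `MeasureSeconds` is a hypothetical name, and it uses std::chrono::steady_clock (guaranteed monotonic) where the listing above uses high_resolution_clock.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed time in seconds.
template <typename Fn>
double MeasureSeconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

A call would then look like `MeasureSeconds([&] { BenchmarkFindFirstFileEx(directory, filesFindFirstFileExBasic, FindExInfoBasic); })`.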

Performance Results  

To measure the performance of each file attribute retrieval method, I executed benchmarks on a directory containing 1000, 2000 or 5000 random text files. The tests were performed on a laptop equipped with an Intel i7 4720HQ CPU and an SSD. I measured the time taken by each method and compared the results to determine the fastest approach.

Each test run consisted of two executions: the first with uncached file attributes and the second likely benefiting from system-level caching.

The speedup factor is the time of the slowest technique in a given run divided by the time of the current technique, so the slowest technique always gets a factor of 1.000.
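As a small illustration of that calculation, here is a sketch with a hypothetical helper named `SpeedupFactors`. It is not taken from the benchmark source and it assumes all times are positive.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: given the times of all methods in one run, returns
// slowest / time for each method, so the slowest method gets 1.0.
// Assumes every time is positive.
std::vector<double> SpeedupFactors(const std::vector<double>& seconds) {
    std::vector<double> factors;
    if (seconds.empty()) return factors;

    const double slowest = *std::max_element(seconds.begin(), seconds.end());
    factors.reserve(seconds.size());
    for (double s : seconds)
        factors.push_back(slowest / s);
    return factors;
}
```

For the first 1000-file run below, 0.2351992 / 0.0131572 gives roughly 17.876 for FindFirstFileEx (Basic), which matches the table.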

Directory with 1000 files:

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0131572000         17.876
FindFirstFileEx (Standard)     0.0018139000         129.665
GetFileAttributesEx            0.2351992000         1.000
std::filesystem                0.0607928000         3.869

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0009740000         61.956
FindFirstFileEx (Standard)     0.0009998000         60.358
GetFileAttributesEx            0.0602633000         1.001
std::filesystem                0.0603455000         1.000

Directory with 2000 files:

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0023182000         54.402
FindFirstFileEx (Standard)     0.0044334000         28.446
GetFileAttributesEx            0.1261137000         1.000
std::filesystem                0.1259038000         1.002

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0022301000         55.417
FindFirstFileEx (Standard)     0.0040665000         30.391
GetFileAttributesEx            0.1235858000         1.000
std::filesystem                0.1220140000         1.013

Directory with 5000 random, small text files:

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0059723000         113.144
FindFirstFileEx (Standard)     0.0125500000         53.843
GetFileAttributesEx            0.6757297000         1.000
std::filesystem                0.3098593000         2.181

Method                         Time (seconds)       Speedup Factor
FindFirstFileEx (Basic)        0.0060349000         52.300
FindFirstFileEx (Standard)     0.0136566000         23.112
GetFileAttributesEx            0.3156277000         1.000
std::filesystem                0.3075732000         1.026

FindFirstFileEx was the fastest method in every run. In the first, uncached run on 1000 files, the Standard level reached a speedup of up to 129x over GetFileAttributesEx, while in the larger directories the Basic level was roughly twice as fast as Standard. In the second, likely cached runs, FindFirstFileEx (Basic) stayed above 50x, and Standard ranged from about 23x to 60x.

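For the directory with 2000 files, FindFirstFileEx (Basic) demonstrated a speedup factor of over 54x in the first run and maintained similar performance in the second run. In the directory with 5000 files, the Basic version achieved an impressive 113x speedup initially and 52x in the subsequent run. The drop reflects caching on the slow side: GetFileAttributesEx went from about 0.68 s to 0.32 s, while FindFirstFileEx (Basic) stayed at about 6 ms in both runs. Notably, std::filesystem performed on par with GetFileAttributesEx in the second runs, and was about 2x to 4x faster only in the first runs with 1000 and 5000 files.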

Further Techniques  

Getting file attributes is only part of the story, and while important, it may account for only a small portion of the overall performance of a whole project. The Visual Assist team, who contributed to this article, improved their initial parse time by avoiding GetFileAttributes[Ex], using the same techniques as this article. But Visual Assist also improved performance through further techniques. My simple benchmark showed 50x speedups, but we cannot directly compare it with the final Visual Assist, as the tool does many more things with files.

The main item being optimised was the initial parse, where VA builds a symbol database when a project is opened for the first time. This involves parsing all code and all headers. They decided that it’s a reasonable assumption that headers won’t change while a project is being loaded, and so the file access is cached during the initial parse, avoiding the filesystem entirely. (Changes after a project has been parsed the first time are, of course, still caught.) The combination of switching to a much faster method for checking filetimes and then avoiding file IO completely contributed to the up-to-15-times-faster performance improvement they saw in version 2024.1 at the beginning of this year.

Read further details on their blog: Visual Assist 2024.1 release post - January 2024 and Catching up with VA: Our most recent performance updates - Tomato Soup.

Summary  

In this article, we went through a benchmark that compares several techniques for fetching file attributes. In short, it’s best to gather attributes at the same time as you iterate through the directory, using FindFirstFileEx. And if you need to perform this operation hundreds of times, it’s worth measuring and choosing the fastest technique for your case.

The benchmark also showed one more thing: while C++17 and its filesystem library offer a robust and standardized way to work with files and directories, they can be limited in terms of performance. In many cases, if you need maximum performance, you need to open the hood and work with the specific operating system API.

The code can be found in my GitHub repository: FileAttribsTest.cpp

Back to you

  • Do you use std::filesystem for tasks involving hundreds of files?
  • Do you know other techniques that offer greater performance when working with files?

Share your comments below.