code for eternity !!!

community website for .net freaks ;-)

Using the Initial Capacity Constructor of StringBuilder for Extreme Performance

You have probably come across plenty of articles on the internet that recommend the StringBuilder class for building large strings, because it avoids creating a new string on every concatenation. Nothing wrong with that. However, I have not seen many coders use the StringBuilder constructor that takes an initial capacity, which can improve performance further by cutting down on internal buffer reallocations.

Let's take a real-world example. Suppose we had to generate a CSV (comma-separated values) file for a table named Users with the following structure:

  • UserID int
  • FirstName nvarchar(20)
  • LastName nvarchar(20)

Our CSV file would look like this (for 2 records):

1,Paul,Graham
2,Scott,Guthrie

Most coders would write the following code to generate the CSV file:

    // Populate Users DataTable
    DataTable dtUsers = SomeFunctionWhichReturnsUsersDataTable();

    // Declare new StringBuilder
    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    // Loop through Users DataTable
    for (int j = 0; j < dtUsers.Rows.Count; j++)
    {
        sb.Append(dtUsers.Rows[j]["UserID"].ToString()); // Append UserID
        sb.Append(","); // Append comma
        sb.Append(dtUsers.Rows[j]["FirstName"].ToString()); // Append FirstName
        sb.Append(","); // Append comma
        sb.Append(dtUsers.Rows[j]["LastName"].ToString()); // Append LastName
        sb.AppendLine(); // Append new line
    }

    return sb.ToString(); // Return StringBuilder contents

Now let's write the above code more intelligently by using the initial capacity constructor of the StringBuilder class. We can estimate the approximate length of the CSV file beforehand by considering the points below:

  • Max length of UserID is 10 characters, since its datatype is int and the max value of an int is 2147483647 (this assumes IDs are never negative; a minus sign would add one more character)
  • Max length of FirstName is 20 characters, since its datatype is nvarchar(20)
  • Max length of LastName is 20 characters, since its datatype is nvarchar(20)
  • Each CSV record also has 2 comma characters which act as separators, and 1 line terminator, which AppendLine() writes as Environment.NewLine (2 characters, \r\n, on Windows)

Therefore the max length per record is 54 characters (10 [UserID] + 20 [FirstName] + 20 [LastName] + 2 [comma characters] + 2 [line terminator, \r\n]). So we can be sure that our CSV file will have a max length of (54 * Number of Records) characters.
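The arithmetic above can be sketched in code. This is a minimal sketch and not part of the original listing: the class name CsvSizing and the method MaxLengthPerRecord are hypothetical, and the line terminator length is passed in as a parameter because Environment.NewLine is 2 characters on Windows but 1 on Unix-like systems.

```csharp
using System;

public static class CsvSizing
{
    // Column widths taken from the Users table described above.
    public const int FirstNameMaxLength = 20; // nvarchar(20)
    public const int LastNameMaxLength = 20;  // nvarchar(20)

    // Worst-case number of characters in one CSV record,
    // for a line terminator of the given length.
    public static int MaxLengthPerRecord(int newLineLength)
    {
        // "2147483647" has 10 digits; assumes UserID is never negative.
        int userIdMaxLength = int.MaxValue.ToString().Length;
        int separators = 2; // two commas per record
        return userIdMaxLength + FirstNameMaxLength + LastNameMaxLength
               + separators + newLineLength;
    }

    public static void Main()
    {
        // Uses the line terminator of the current platform.
        Console.WriteLine(MaxLengthPerRecord(Environment.NewLine.Length));
    }
}
```

Deriving the number from the column definitions, rather than hard-coding 54, keeps the estimate in step with the schema if a column is widened later.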

If you are hell-bent on having one and only one memory allocation for the StringBuilder, go ahead and set its initial capacity to (54 * Number of Rows). However, doing this would more often than not waste a lot of memory, as not all records have a 10-digit UserID or a 20-character FirstName and LastName. Therefore I usually follow a divide-by-2 rule, where I divide the max length per record by 2. StringBuilder doubles its capacity when it runs out of room (at least in the .NET versions current at the time of writing; the growth policy is an implementation detail), so starting at half the maximum means at most one reallocation on top of the initial allocation. That is never more than 2 allocations in total, and more often than not the initial allocation alone does the job. Below is the intelligent version of the above code:

    // Populate Users DataTable
    DataTable dtUsers = SomeFunctionWhichReturnsUsersDataTable();

    // Set a value of 54 to maxLengthPerRecord
    int maxLengthPerRecord = 54;

    // Apply divide by 2 rule
    maxLengthPerRecord = maxLengthPerRecord / 2;

    // Compute initialCapacity value
    int initialCapacity = dtUsers.Rows.Count * maxLengthPerRecord;
   
    // Declare new StringBuilder using the Initial Capacity constructor
    System.Text.StringBuilder sb = new System.Text.StringBuilder(initialCapacity);

    // Loop through Users DataTable
    for (int j = 0; j < dtUsers.Rows.Count; j++)
    {
        sb.Append(dtUsers.Rows[j]["UserID"].ToString()); // Append UserID
        sb.Append(","); // Append comma
        sb.Append(dtUsers.Rows[j]["FirstName"].ToString()); // Append FirstName
        sb.Append(","); // Append comma
        sb.Append(dtUsers.Rows[j]["LastName"].ToString()); // Append LastName
        sb.AppendLine(); // Append new line
    }
   
    return sb.ToString(); // Return StringBuilder contents
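
One way to see the effect of the initial capacity is to watch the Capacity property while appending. The following is a rough sketch with synthetic data: CapacityProbe and CountCapacityChanges are hypothetical names, the record contents are made up, and the counts it prints depend on the runtime's internal growth policy, so no particular output is claimed.

```csharp
using System;
using System.Text;

public static class CapacityProbe
{
    // Appends `records` synthetic CSV lines and returns the number of records
    // after which sb.Capacity had changed. Capacity is checked once per record,
    // so this is a lower bound on the number of internal growth steps.
    public static int CountCapacityChanges(StringBuilder sb, int records)
    {
        int changes = 0;
        int lastCapacity = sb.Capacity;
        for (int i = 1; i <= records; i++)
        {
            sb.Append(i).Append(',').Append("Paul").Append(',').Append("Graham").Append("\r\n");
            if (sb.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = sb.Capacity;
            }
        }
        return changes;
    }

    public static void Main()
    {
        const int records = 10000;
        const int maxLengthPerRecord = 54;

        Console.WriteLine("default capacity: "
            + CountCapacityChanges(new StringBuilder(), records));
        Console.WriteLine("half of maximum:  "
            + CountCapacityChanges(new StringBuilder(records * maxLengthPerRecord / 2), records));
        Console.WriteLine("full maximum:     "
            + CountCapacityChanges(new StringBuilder(records * maxLengthPerRecord), records));
    }
}
```

With short sample records like these, the total output stays well under half of the worst case, so the half-sized builder never needs to grow.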
 

You might be thinking this is too much effort to save a few memory reallocations. But the geek in me tries to visualize the gains when the same task runs against a table with many columns and tens of thousands of records: every reallocation avoided is one less large buffer allocated and then left behind for the garbage collector, and garbage collection cycles are expensive :-)
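Rather than only visualizing the gains, they can be measured. Below is a rough timing sketch using System.Diagnostics.Stopwatch. The names BuilderTiming, BuildCsv and Measure are hypothetical, the data is synthetic, and the record and iteration counts are arbitrary assumptions. Results will vary by machine and runtime version, so treat it as a starting point, not as proof either way.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class BuilderTiming
{
    // Builds `records` synthetic CSV lines into the supplied builder.
    public static string BuildCsv(StringBuilder sb, int records)
    {
        for (int i = 1; i <= records; i++)
        {
            sb.Append(i).Append(',').Append("Paul").Append(',').Append("Graham").Append("\r\n");
        }
        return sb.ToString();
    }

    // Runs BuildCsv `iterations` times, each with a fresh builder,
    // and returns the total elapsed time.
    public static TimeSpan Measure(Func<StringBuilder> createBuilder, int records, int iterations)
    {
        BuildCsv(createBuilder(), records); // warm-up run so JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            BuildCsv(createBuilder(), records);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        const int records = 50000;
        const int iterations = 20;
        const int halfMaxPerRecord = 54 / 2; // divide by 2 rule

        TimeSpan plain = Measure(() => new StringBuilder(), records, iterations);
        TimeSpan presized = Measure(() => new StringBuilder(records * halfMaxPerRecord), records, iterations);

        Console.WriteLine("default capacity: " + plain.TotalMilliseconds + " ms");
        Console.WriteLine("initial capacity: " + presized.TotalMilliseconds + " ms");
    }
}
```

For numbers you intend to rely on, a dedicated benchmarking harness that handles warm-up and statistical noise is a better choice than a hand-rolled Stopwatch loop.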

Note: You can also consider dividing the max length per record by 3 or even 4; it all depends on your data structures and data patterns. Also, the above code example uses a DataTable, but you can apply the same logic to a generic list or a DataReader as well.
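As an illustration of the note above, here is a sketch of the same idea applied to a generic list, with the divisor exposed as a parameter. The User class, the UserCsv class and the ToCsv method are hypothetical names introduced only for this example; the constant 54 is the max length per record computed earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical in-memory counterpart of a row in the Users table.
public class User
{
    public int UserID;
    public string FirstName;
    public string LastName;
}

public static class UserCsv
{
    private const int MaxLengthPerRecord = 54;

    // Builds the CSV text, sizing the StringBuilder up front.
    // divisor = 1 reserves the worst case; 2, 3 or 4 reserve less.
    public static string ToCsv(IList<User> users, int divisor)
    {
        if (divisor < 1)
        {
            throw new ArgumentOutOfRangeException("divisor");
        }

        int initialCapacity = users.Count * (MaxLengthPerRecord / divisor);
        StringBuilder sb = new StringBuilder(initialCapacity);

        foreach (User user in users)
        {
            sb.Append(user.UserID);
            sb.Append(',');
            sb.Append(user.FirstName);
            sb.Append(',');
            sb.Append(user.LastName);
            sb.AppendLine();
        }

        return sb.ToString();
    }

    public static void Main()
    {
        List<User> users = new List<User>();
        users.Add(new User { UserID = 1, FirstName = "Paul", LastName = "Graham" });
        users.Add(new User { UserID = 2, FirstName = "Scott", LastName = "Guthrie" });

        Console.Write(ToCsv(users, 2));
    }
}
```

A list knows its Count before the loop starts, which is what makes the estimate possible. A forward-only DataReader does not expose a row count, so there the number of rows would have to come from somewhere else, for example a separate count query.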

Cheers,
Raj

~~~ CODING FOR ETERNITY !!! ~~~

Published May 08 2008, 01:13 PM by raj


Comments

 


atul said:

How many records can a DataTable (DataSet.Tables) hold?

May 9, 2008 5:20 AM
 

Egil Hansen said:

Interesting article. You should do some performance testing to compare the two methods, otherwise your point falls sort of short. "Guesstimations" are really not the way to go when you have a title that contains the words "Extreme Performance".

May 12, 2008 6:06 AM
