You have probably come across plenty of articles on the internet that recommend the StringBuilder class for building large strings, for performance reasons. Nothing wrong with that. However, I have not seen many coders use the Initial Capacity constructor of the StringBuilder class, which can improve performance further.
Let's take a real-world example. Suppose we had to generate a csv (comma-separated values) file for a table named Users which had the following structure:
- UserID int
- FirstName nvarchar(20)
- LastName nvarchar(20)
Our csv file would look like this (for 2 records):
1,Paul,Graham
2,Scott,Guthrie
Most coders would write the code below to generate the csv file:
// Populate Users DataTable
DataTable dtUsers = SomeFunctionWhichReturnsUsersDataTable();
// Declare new StringBuilder
System.Text.StringBuilder sb = new System.Text.StringBuilder();
// Loop through Users DataTable
for (int j = 0; j < dtUsers.Rows.Count; j++)
{
    sb.Append(dtUsers.Rows[j]["UserID"].ToString()); // Append UserID
    sb.Append(","); // Append comma
    sb.Append(dtUsers.Rows[j]["FirstName"].ToString()); // Append FirstName
    sb.Append(","); // Append comma
    sb.Append(dtUsers.Rows[j]["LastName"].ToString()); // Append LastName
    sb.AppendLine(); // Append new line
}
return sb.ToString(); // Return StringBuilder contents
Now let's write the above code more intelligently by using the Initial Capacity constructor of the StringBuilder class. We can actually estimate the maximum length of the csv file beforehand by considering the points below:
- Max length of UserID can be 10 characters, since its datatype is int and the max value of an int is 2147483647 (this assumes UserID is never negative; a minus sign would need 1 more character)
- Max length of FirstName can be 20 characters, since its datatype is nvarchar(20)
- Max length of LastName can be 20 characters, since its datatype is nvarchar(20)
- Also each csv file record has 2 comma characters which act as separators, and 1 line break, which is 2 characters (\r\n) when AppendLine writes it on Windows
Therefore the max length per record can be 54 characters (10[UserID] + 20[FirstName] + 20[LastName] + 2[2 comma characters] + 2[1 line break]). So under these assumptions our csv file would have a max length of (54 * Number of Records) characters.
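The arithmetic above can be written out as a small sketch. The class and constant names here are my own, chosen just for illustration, and the figure of 10,000 records is an arbitrary example, not a number from a real table.

```csharp
using System;

static class CsvSizeEstimate
{
    // Hypothetical names for illustration; the values come from the Users table schema.
    const int MaxUserIdLength = 10;    // int.MaxValue (2147483647) has 10 digits
    const int MaxFirstNameLength = 20; // nvarchar(20)
    const int MaxLastNameLength = 20;  // nvarchar(20)
    const int SeparatorCount = 2;      // two commas per record
    const int LineBreakLength = 2;     // assumes "\r\n", i.e. Environment.NewLine on Windows

    public const int MaxLengthPerRecord =
        MaxUserIdLength + MaxFirstNameLength + MaxLastNameLength
        + SeparatorCount + LineBreakLength;

    // Upper bound on the csv length for a given number of records.
    // 'checked' makes an int overflow throw instead of silently wrapping around.
    public static int MaxCsvLength(int recordCount)
    {
        return checked(MaxLengthPerRecord * recordCount);
    }

    static void Main()
    {
        Console.WriteLine(MaxLengthPerRecord);  // prints 54
        Console.WriteLine(MaxCsvLength(10000)); // prints 540000
    }
}
```

Keeping each term as a named constant makes it easy to redo the sum when a column is added or a length changes.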
If you are hell bent on having one and only one memory allocation for the StringBuilder, go ahead and set its Initial Capacity to (54 * Number of Rows). However, doing this would more often than not waste a lot of memory, as not all records would have a 10-digit UserID or a FirstName and LastName that are 20 characters long. Therefore I usually follow the divide by 2 rule, where I divide the max length per record value by 2. Assuming the StringBuilder doubles its capacity when it runs out of room, there won't ever be more than 2 memory (re)allocations this way, and more often than not a single memory allocation would do the job. Below is the intelligent version of the above code:
// Populate Users DataTable
DataTable dtUsers = SomeFunctionWhichReturnsUsersDataTable();
// Max length per record: 10 (UserID) + 20 (FirstName) + 20 (LastName) + 2 (commas) + 2 (line break)
int maxLengthPerRecord = 54;
// Apply divide by 2 rule
maxLengthPerRecord = maxLengthPerRecord / 2;
// Compute initialCapacity value
int initialCapacity = dtUsers.Rows.Count * maxLengthPerRecord;
// Declare new StringBuilder using the Initial Capacity constructor
System.Text.StringBuilder sb = new System.Text.StringBuilder(initialCapacity);
// Loop through Users DataTable
for (int j = 0; j < dtUsers.Rows.Count; j++)
{
    sb.Append(dtUsers.Rows[j]["UserID"].ToString()); // Append UserID
    sb.Append(","); // Append comma
    sb.Append(dtUsers.Rows[j]["FirstName"].ToString()); // Append FirstName
    sb.Append(","); // Append comma
    sb.Append(dtUsers.Rows[j]["LastName"].ToString()); // Append LastName
    sb.AppendLine(); // Append new line
}
return sb.ToString(); // Return StringBuilder contents
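If you want to check whether your estimate was good enough for a given set of data, one rough way is to compare the final length of the StringBuilder with the capacity you asked for. The sketch below does that against a tiny in-memory table. BuildSampleUsers and BuildCsv are made-up helper names (BuildSampleUsers simply stands in for SomeFunctionWhichReturnsUsersDataTable), and the check only tells you whether the builder had to grow, not how many times.

```csharp
using System;
using System.Data;
using System.Text;

static class CapacityCheck
{
    // Hypothetical stand-in for SomeFunctionWhichReturnsUsersDataTable().
    public static DataTable BuildSampleUsers()
    {
        DataTable dt = new DataTable("Users");
        dt.Columns.Add("UserID", typeof(int));
        dt.Columns.Add("FirstName", typeof(string));
        dt.Columns.Add("LastName", typeof(string));
        dt.Rows.Add(1, "Paul", "Graham");
        dt.Rows.Add(2, "Scott", "Guthrie");
        return dt;
    }

    // Builds the csv and reports whether it fit in the requested initial capacity.
    public static string BuildCsv(DataTable dtUsers, int estimatedLengthPerRecord,
                                  out bool fitInInitialCapacity)
    {
        int initialCapacity = dtUsers.Rows.Count * estimatedLengthPerRecord;
        StringBuilder sb = new StringBuilder(initialCapacity);

        foreach (DataRow row in dtUsers.Rows)
        {
            sb.Append(row["UserID"]).Append(',')
              .Append(row["FirstName"]).Append(',')
              .Append(row["LastName"]).AppendLine();
        }

        // The builder starts with at least the requested capacity, so if the
        // final length is within it, no growth was needed.
        fitInInitialCapacity = sb.Length <= initialCapacity;
        return sb.ToString();
    }

    static void Main()
    {
        bool fit;
        string csv = BuildCsv(BuildSampleUsers(), 54 / 2, out fit); // divide by 2 rule
        Console.Write(csv);
        Console.WriteLine("Fit in initial capacity: " + fit);
    }
}
```

Running a check like this over a sample of real data is a cheap way to decide whether dividing by 2 is too generous or too tight for your table.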
You might be thinking this is too much effort just to save a few memory reallocations. But the geek in me tries to visualize the performance gains and the number of memory reallocations which can be saved (every discarded buffer is more work for the garbage collector, and garbage collection cycles are so expensive) if we had to do the same task for a table with many columns and tens of thousands of records :-)
Note: You can also consider dividing the max length per record by 3 or even 4; it all depends on your data structures and data patterns. Also, the above code example uses a DataTable. However, you can apply the same logic to a generic list or a DataReader as well.
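As a rough sketch of the generic list case, with the divisor passed in so you can try 2, 3 or 4: the User class and the UserCsv.ToCsv method below are hypothetical names I made up for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical class mirroring the Users table.
class User
{
    public int UserID;
    public string FirstName;
    public string LastName;
}

static class UserCsv
{
    // 10 (UserID) + 20 (FirstName) + 20 (LastName) + 2 (commas) + 2 (line break)
    const int MaxLengthPerRecord = 54;

    // divisor = 1 reserves the full maximum; 2, 3 or 4 reserve progressively less.
    public static string ToCsv(List<User> users, int divisor)
    {
        if (divisor < 1)
            throw new ArgumentOutOfRangeException("divisor");

        int initialCapacity = users.Count * (MaxLengthPerRecord / divisor);
        StringBuilder sb = new StringBuilder(initialCapacity);

        foreach (User user in users)
        {
            sb.Append(user.UserID).Append(',')
              .Append(user.FirstName).Append(',')
              .Append(user.LastName).AppendLine();
        }
        return sb.ToString();
    }

    static void Main()
    {
        List<User> users = new List<User>
        {
            new User { UserID = 1, FirstName = "Paul", LastName = "Graham" },
            new User { UserID = 2, FirstName = "Scott", LastName = "Guthrie" }
        };
        Console.Write(ToCsv(users, 2)); // divide by 2 rule
    }
}
```

A list knows its Count up front, just like DataTable.Rows. A DataReader only moves forward and does not tell you how many rows are coming, so there the record count would have to come from somewhere else, for example a separate count query.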
Cheers,
Raj
~~~ CODING FOR ETERNITY !!! ~~~