Bad Data, Try/Catch, and Slow Performance

It can be tempting to use try/catch to check data integrity. This can degrade performance, especially if you’re throwing a lot of exceptions.

Say you need to parse XML that you don’t own. This XML has fields that are supposed to be int:

<property name="integerValue" value="0" />

You find out there are values like this:

<property name="integerValue" value="go ahead and try to parse me" />

If you know all of the possible values in this field, you could use the Tester-Doer pattern:

if (value == "go ahead and try to parse me") {
    entity.Cost = 0;
} else {
    entity.Cost = int.Parse(value);
}

What if you’re anticipating a variety of bad values? The Tester-Doer pattern isn’t feasible. You decide to wrap the int.Parse(value) call with try/catch:

try {
    entity.Cost = int.Parse(value);
} catch (Exception) {
    entity.Cost = 0;
}

If you’re parsing thousands of records and a high proportion of them contain bad data, the try/catch approach can increase your execution time significantly.

Costs of Throwing Exceptions

The overhead of throwing FormatExceptions in our try/catch block is causing performance issues. We incur a few costs for each thrown exception.

The first cost comes from simply having those extra constructs in your code: try and catch. There is also the cost of cleaning up after the exception is thrown. While these costs aren’t high because C# is managed, they aren’t free.

Another cost comes from actually throwing the exception. There are features available to us with managed exceptions. For example, we’re able to see the stack trace in an exception. This information has to be constructed, which also isn’t free (Mariani).
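To make the stack trace point concrete, here is a minimal sketch. ThrowCostDemo and CaptureStackTrace are hypothetical names used only for illustration. It shows that the runtime attaches a stack trace to the exception whenever int.Parse fails, whether or not the catch block ever reads it:

```csharp
using System;

public static class ThrowCostDemo {
    // Hypothetical helper: returns the stack trace recorded for a failed parse,
    // or null when the value parses cleanly and nothing is thrown.
    public static string CaptureStackTrace(string value) {
        try {
            int.Parse(value);
            return null; // no exception, so no stack trace was built
        } catch (FormatException ex) {
            // The runtime recorded this information while throwing,
            // even if the caller only wanted a default value.
            return ex.StackTrace;
        }
    }
}
```

Calling ThrowCostDemo.CaptureStackTrace("X123") returns a non-empty string, which is work the runtime did on every failed parse.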

TryParse

The fix is to use Int32.TryParse instead of try/catch.

Int32.TryParse has this signature:

public static Boolean TryParse(String s, out Int32 result);

Int32.TryParse will not throw a FormatException if String s cannot be parsed (MSDN).

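Int32.TryParse can still throw. The overloads that take a NumberStyles argument throw ArgumentException when the style is invalid, and any call can fail with OutOfMemoryException. Those exceptions point to a larger problem than bad user input (Richter).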
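Here is a small sketch of how the earlier try/catch snippet might look with TryParse. CostParser and ParseOrDefault are hypothetical names, not part of the framework:

```csharp
using System;

public static class CostParser {
    // Hypothetical helper: returns the parsed value, or fallback when
    // the input is not a valid 32-bit integer.
    public static int ParseOrDefault(string value, int fallback) {
        int result;
        // TryParse returns false for null, malformed, or out-of-range input
        // instead of throwing, so no exception is constructed on bad data.
        return int.TryParse(value, out result) ? result : fallback;
    }
}
```

With this in place, the assignment becomes entity.Cost = CostParser.ParseOrDefault(value, 0);. Checking the return value also lets you tell a bad value apart from a legitimate 0, which the try/catch version hides.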

Example

Let’s look at a real example. I’ve put all of my sample code here. This benchmark code takes a lot from user7116’s answer to this question on Stack Overflow.

We’ll start by constructing an XDocument with control over the amount of bad data in it:

private static XDocument BuildSampleDataBadString(double errorRate, int count) {
    Random random = new Random(1);
    string badPrefix = @"X";
    XDocument doc = new XDocument(
        new XDeclaration("1.0", "utf-8", "yes"),
        new XComment("Sample Data from Somewhere"),
        new XElement("SampleData"));
    for (int i = 0; i < count; i++) {
        string randomInput = random.Next().ToString();
        double errorSwitch = random.NextDouble();
        if (errorSwitch < errorRate) { // errorRate of .5 = ~50% bad data 
            randomInput = badPrefix + randomInput;
        }
        var el = new XElement("Item",
            new XElement("property",
                new XAttribute("name", "ItemId"),
                new XAttribute("value", i.ToString())),
            new XElement("property",
                new XAttribute("name", "ItemDescription"),
                new XAttribute("value", "ItemId: " + i + " Desc")),
            new XElement("property",
                new XAttribute("name", "ItemCode"),
                new XAttribute("value", "P123-456-" + i)),
            // Here's where the data gets corrupted
            new XElement("property",
                new XAttribute("name", "ItemCost"),
                new XAttribute("value", randomInput))
            );
        doc.Element("SampleData").Add(el);
    }
    return doc;
}

This code generates XML in which a percentage of the ItemCost values are corrupted:

<!--Sample Data from Somewhere-->
    <SampleData>
      <Item>
        <property name="ItemId" value="0" />
        <property name="ItemDescription" value="ItemId: 0 Desc" />
        <property name="ItemCode" value="P123-456-0" />
        <property name="ItemCost" value="X534011718" /> // BAD...
      </Item>
      <Item>
        <property name="ItemId" value="1" />
        <property name="ItemDescription" value="ItemId: 1 Desc" />
        <property name="ItemCode" value="P123-456-1" />
        <property name="ItemCost" value="1002897798" />
      </Item>
      <Item>
        <property name="ItemId" value="2" />
        <property name="ItemDescription" value="ItemId: 2 Desc" />
        <property name="ItemCode" value="P123-456-2" />
        <property name="ItemCost" value="X1412011072" /> // BAD...
      </Item>
      ...

Benchmarks

We’ll pretend that we have to parse this XML to a class called ItemEntity:

public class ItemEntity {
    public string ItemId { get; set; }
    public string ItemDescription { get; set; }
    public string ItemCode { get; set; }
    public int ItemCost { get; set; }
}

Here is a benchmark using try/catch to handle corrupt data:

...
stopwatch.Start();
foreach (var item in deserializedRawData) {
    var itemEntity = new ItemEntity();
    itemEntity.ItemId = item.GetPropertyValue("ItemId");
    itemEntity.ItemDescription = item.GetPropertyValue("ItemDescription");
    itemEntity.ItemCode = item.GetPropertyValue("ItemCode");
    try {
        itemEntity.ItemCost = int.Parse(item.GetPropertyValue("ItemCost"));
    } catch (Exception) {
        itemEntity.ItemCost = 0;
    }
    itemEntities.Add(itemEntity);
}
stopwatch.Stop();
...
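The benchmark calls item.GetPropertyValue, which isn't shown in this post. As an assumption, it could be an extension method on XElement along these lines (a sketch, not necessarily what the sample code does):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions {
    // Assumed helper: finds the <property name="..."> child of an <Item>
    // and returns its value attribute, or null if there is no such property.
    public static string GetPropertyValue(this XElement item, string name) {
        XElement property = item.Elements("property")
            .FirstOrDefault(p => (string)p.Attribute("name") == name);
        // The explicit XAttribute-to-string conversion yields null for a missing attribute.
        return property == null ? null : (string)property.Attribute("value");
    }
}
```

Under that assumption, deserializedRawData could simply be doc.Element("SampleData").Elements("Item").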

Here are the results at different failure rates:

FailureRate  ExecutionTime
0.00 %       00:00:00.1036323
10.00 %      00:00:00.3779075
20.00 %      00:00:00.5898772
30.00 %      00:00:00.8113156
40.00 %      00:00:01.0096837
50.00 %      00:00:01.2773712
60.00 %      00:00:01.3969000
70.00 %      00:00:01.5673816
80.00 %      00:00:01.7495929
90.00 %      00:00:01.9442540
100.00 %     00:00:02.1412577

As failure rate increases, execution time increases.

Let’s use TryParse instead of try/catch.

...
stopwatch.Start();
foreach (var item in deserializedRawData) {
    var itemEntity = new ItemEntity();
    itemEntity.ItemId = item.GetPropertyValue("ItemId");
    itemEntity.ItemDescription = item.GetPropertyValue("ItemDescription");
    itemEntity.ItemCode = item.GetPropertyValue("ItemCode");
    int itemCost;
    // TryParse leaves itemCost at 0 when the value cannot be parsed
    int.TryParse(item.GetPropertyValue("ItemCost"), out itemCost);
    itemEntity.ItemCost = itemCost;
    itemEntities.Add(itemEntity);
}
stopwatch.Stop();
...

Here are the results of our TryParse benchmark:

FailureRate  ExecutionTime
0.00 %       00:00:00.0847548
10.00 %      00:00:00.1277454
20.00 %      00:00:00.1281583
30.00 %      00:00:00.0852676
40.00 %      00:00:00.0850859
50.00 %      00:00:00.0820767
60.00 %      00:00:00.0865482
70.00 %      00:00:00.0847623
80.00 %      00:00:00.0824647
90.00 %      00:00:00.0879326
100.00 %     00:00:00.0815192

As failure rate increases, execution time stays roughly flat, between about 0.08 and 0.13 seconds, instead of climbing the way it does with try/catch.

Finally, a comparison:

FailureRate  Try-Catch         TryParse          PerformanceDifference
0.00 %       00:00:00.1039053  00:00:00.1606829  -00:00:00.0567776
10.00 %      00:00:00.3806843  00:00:00.0919596   00:00:00.2887247
20.00 %      00:00:00.6369754  00:00:00.1064423   00:00:00.5305331
30.00 %      00:00:00.8974123  00:00:00.1160846   00:00:00.7813277
40.00 %      00:00:01.1749305  00:00:00.0971849   00:00:01.0777456
50.00 %      00:00:01.4035011  00:00:00.1057363   00:00:01.2977648
60.00 %      00:00:01.6944124  00:00:00.0971691   00:00:01.5972433
70.00 %      00:00:01.8109885  00:00:00.1128850   00:00:01.6981035
80.00 %      00:00:02.2946023  00:00:00.0966985   00:00:02.1979038
90.00 %      00:00:02.3253307  00:00:00.0876723   00:00:02.2376584
100.00 %     00:00:02.1783198  00:00:00.0834561   00:00:02.0948637
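If you want to try a comparison like this yourself, here is a stripped-down sketch that skips the XML and times only the parsing. It is not the sample code behind the numbers above. ParseBenchmark and its method names are made up for illustration, and your timings will vary by machine and runtime:

```csharp
using System;
using System.Diagnostics;

public static class ParseBenchmark {
    // Builds count inputs; roughly errorRate of them get an "X" prefix
    // so that they fail to parse (same idea as BuildSampleDataBadString).
    public static string[] BuildInputs(double errorRate, int count) {
        var random = new Random(1);
        var inputs = new string[count];
        for (int i = 0; i < count; i++) {
            string input = random.Next().ToString();
            if (random.NextDouble() < errorRate) {
                input = "X" + input;
            }
            inputs[i] = input;
        }
        return inputs;
    }

    // Times int.Parse wrapped in try/catch and counts the failures.
    public static TimeSpan TimeTryCatch(string[] inputs, out int failures) {
        failures = 0;
        var stopwatch = Stopwatch.StartNew();
        foreach (string input in inputs) {
            try {
                int.Parse(input);
            } catch (FormatException) {
                failures++;
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times int.TryParse on the same inputs and counts the failures.
    public static TimeSpan TimeTryParse(string[] inputs, out int failures) {
        failures = 0;
        var stopwatch = Stopwatch.StartNew();
        foreach (string input in inputs) {
            int parsed;
            if (!int.TryParse(input, out parsed)) {
                failures++;
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Both methods should report the same failure count for the same inputs, which is a quick sanity check that the two loops handle the same bad data.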

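The above tests are microbenchmarks: they time one tight parsing loop in isolation, so the gap may look different inside a real application. For more info on why I’m pointing this out, check out the exchange between Jon Skeet and Rico Mariani about exceptions and performance.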

Summary

We started with an issue where we have data that we need to parse. The data contains some user input that is not parse-able for a variety of reasons.

To cleanse the data as we parse it, we thought using a try/catch would be ok. If we catch the exceptions, we’re good, right?

Turns out it kills our performance when we throw a lot of exceptions, even if we catch every one of them. Each exception has some costs. We needed to find a way to handle this data without involving exceptions.

TryParse turns out to be a method designed to solve our problem. We ran some benchmarks to prove it.

I hope this helps you maintain the performance of your app when you have to parse nasty data. As always, feel free to reach out if you need any help, sign up for my newsletter, and/or leave a comment below!


Sources
