Tips for High Performance Collection Looping in Objective-C

A common task in Cocoa programming is to loop through a collection of objects (e.g. an NSArray, NSSet or NSDictionary). This seemingly simple problem has a wide variety of solutions, many of them with subtle performance considerations.

The Need for Speed

First, a disclaimer: The raw speed of one Objective-C method versus another is one of the last things you should worry about when programming – the difference is unlikely to be dramatic enough to matter compared with other, more important considerations, such as code clarity and readability.

But not prioritising speed is no excuse for not understanding it. You should always learn the performance implications of the code you are writing so that on the rare occasions when it does matter, you’ll know what to do.

Also, in the case of looping, much of the time it doesn’t matter which technique you choose from a readability or clarity perspective, so you might as well choose the fastest. There’s no point writing code that is slower than it needs to be.

With that in mind, here are the options:

The Classic For Loop

for (NSUInteger i = 0; i < [array count]; i++)
{
  id object = array[i];
  …
}

This is a simple and familiar way to loop through an array; it’s also pretty poor from a performance perspective. The biggest problem with this code is that we’re calling the count method each time round the loop. The count doesn’t change, so this is redundant. Normally the C compiler would optimise away something like this, but the dynamic nature of Objective-C means that method calls cannot be optimised away automatically. So, to improve performance, it’s worth storing the count in a variable before beginning the loop, like this:

NSUInteger count = [array count];
for (NSUInteger i = 0; i < count; i++)
{
  id object = array[i];
  …
}

NSEnumerator

NSEnumerator is an alternative way of looping through collections. All collections have one or more enumerator methods that return a new NSEnumerator instance each time they are called. A given NSEnumerator contains an internal pointer to the first object in the collection, and has a nextObject method that returns the current object and increments the pointer. You call it repeatedly until it returns nil, indicating the end of the collection has been reached:

id obj = nil;
NSEnumerator *enumerator = [array objectEnumerator];
while ((obj = [enumerator nextObject]));
{}

The performance of NSEnumerator is comparable to an ordinary for loop, but it’s useful because it abstracts the concept of indexing, meaning it can be used with structures such as linked lists, or even infinite sequences or data streams where the item count is unknown or undefined.

Fast Enumeration

Fast enumeration was introduced in Objective-C 2.0 as a more convenient (and faster, obviously) replacement for the traditional NSEnumerator. It doesn’t make the enumerator class obsolete because it is still useful for situations such as reverse enumeration (more on this later).

Fast enumeration adds a new enumeration method that looks like this:

- (NSUInteger)countByEnumeratingWithState:(NSFastEnumerationState *)state 
   objects:(id *)stackbuf count:(NSUInteger)len;

If you’re thinking “that doesn’t look very convenient!”, I don’t blame you. But the new method came with a new loop syntax, the for…in loop. This uses the new enumeration method behind the scenes, and significantly reduces both the syntax and performance overheads of using the traditional for loop or NSEnumerator approaches:

for (id object in array)
{}

Enumeration blocks

With the advent of blocks, Apple added a fourth enumeration mechanism based on the block syntax. This is arguably more unwieldy than fast enumeration, but does have the benefit of returning both the object and index, whereas the other enumeration methods just return the object.

Another key feature of enumeration blocks is the option of concurrent enumeration (enumerating the objects on several threads in parallel). This isn’t always useful, depending on exactly what you are doing in your loop, but for cases where you are doing a lot of work per item, and where you don’t care about the enumeration order, this can yield a significant performance boost on multicore processors (which all modern Macs and iOS devices have).

Benchmarks

So how do these methods stack up, performance-wise? Here is a simple benchmark command-line app that compares performance of enumerating an array using the various different methods. We’ve run it with ARC disabled to prevent any behind-the-scenes retain or release from distorting the results. Because this is running on a fast Mac, all these methods are so fast that we’ve actually had to use an array of 10,000,000 (ten million) objects to get measurable results. If you decide to run this test on an iPhone, it would be wise to use a much smaller count!

To compile the code:

  • Save the code in a file with the name benchmark.m
  • From terminal, compile the application:
    clang -framework Foundation benchmark.m -o benchmark
  • Run the app: ./benchmark
#import <Foundation/Foundation.h>
 
int main(int argc, const char * argv[])
{
  @autoreleasepool
  {
    static const NSUInteger arrayItems = 10000000;
 
    NSMutableArray *array = [NSMutableArray arrayWithCapacity:arrayItems];
    for (int i = 0; i < arrayItems; i++) [array addObject:@(i)];
    array = [array copy];
 
    CFTimeInterval start = CFAbsoluteTimeGetCurrent();
 
    // Naive for loop
    for (NSUInteger i = 0; i < [array count]; i++)
    {
      id object = array[i];
    }
 
    CFTimeInterval forLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"For loop: %g", forLoop - start);
 
    // Optimized for loop
    NSUInteger count = [array count];
    for (NSUInteger i = 0; i <  count; i++)
    {
      id object = array[i];
    }
 
    CFTimeInterval forLoopWithCountVar = CFAbsoluteTimeGetCurrent();
    NSLog(@"Optimized for loop: %g", forLoopWithCountVar - forLoop);
 
    // NSEnumerator
    id obj = nil;
    NSEnumerator *enumerator = [array objectEnumerator];
    while ((obj = [enumerator nextObject]))
    {
 
    }
 
    CFTimeInterval enumeratorLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"Enumerator: %g", enumeratorLoop - forLoopWithCountVar);
 
    // Fast enumeration
    for (id object in array)
    {
 
    }
 
    CFTimeInterval forInLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"For…in loop: %g", forInLoop - enumeratorLoop);
 
    // Block enumeration
    [array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
 
    }];
 
    CFTimeInterval enumerationBlock = CFAbsoluteTimeGetCurrent();
    NSLog(@"Enumeration block: %g", enumerationBlock - forInLoop);
 
    // Concurrent enumeration
    [array enumerateObjectsWithOptions:NSEnumerationConcurrent 
      usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
 
    }];
 
    CFTimeInterval concurrentEnumerationBlock = CFAbsoluteTimeGetCurrent();
    NSLog(@"Concurrent enumeration block: %g", 
      concurrentEnumerationBlock - enumerationBlock);
  }
  return 0;
}

The results are shown below:

$ For loop: 0.115235
$ Optimized for loop: 0.083858
$ Enumerator: 0.123687
$ For…in loop: 0.00683898
$ Enumeration block: 0.282543
$ Concurrent enumeration block: 0.199684

Pay no attention to the magnitudes of these timings. What we are interested in is their relative size compared with one another. If we place them in order, fastest first, we get:

  1. For…in loop – the fastest.
  2. Optimized for loop – 12 times slower than for…in.
  3. Unoptimized for loop – 16 times slower than for…in.
  4. Enumerator – 18 times slower than for…in.
  5. Concurrent enumeration block – 29 times slower than for…in.
  6. Enumeration block – almost 41 times slower than for…in.

For…in is the standout winner. Clearly they call it fast enumeration for a reason! We don’t get any reference to the current index value when using fast enumeration, but if you need to know the index you should just declare your own index variable before the loop and increment it manually – it will be faster than using a regular for loop or enumeration block.

Concurrent enumeration seems a little faster than single-threaded, but you shouldn’t read too much into that: We’re enumerating a very large number of objects here, but for a smaller array the overhead of concurrent execution outweighs the benefits.

The primary advantage of concurrent execution comes into play when the code inside your loop takes a significant time to execute. The more work you do inside the loop, the less the speed of the looping method itself matters. If you do a lot of work inside your loop, and you don’t care about the order, consider trying parallel enumeration (but measure to see if it’s faster, don’t assume).

Other Collection Types

So what about other collection types, such as NSSet and NSDictionary? NSSet is unordered, so has no concept of fetching an object by index. We can benchmark enumeration though:

$ Enumerator: 0.415506
$ For…in loop: 0.072949
$ Enumeration block: 0.30818
$ Concurrent enumeration block: 0.382384

The results are similar to NSArray; again, for…in is the clear winner. What about NSDictionary? NSDictionary is a bit different because we have both keys and objects to iterate over. It’s possibly to just iterate over either keys or values in a dictionary, but we typically need both. Here is an adapted version of our array benchmark to handle NSDictionary:

#import <Foundation/Foundation.h>
 
int main(int argc, const char * argv[])
{
  @autoreleasepool
  {
    static const NSUInteger dictItems = 10000;
 
    NSMutableDictionary *dictionary = 
      [NSMutableDictionary dictionaryWithCapacity:dictItems];
    for (int i = 0; i < dictItems; i++) dictionary[@(i)] = @(i);
    dictionary = [dictionary copy];
 
    CFTimeInterval start = CFAbsoluteTimeGetCurrent();
 
    // Naive for loop
    for (NSUInteger i = 0; i < [dictionary count]; i++)
    {
      id key = [dictionary allKeys][i];
      id object = dictionary[key];
    }
 
    CFTimeInterval forLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"For loop: %g", forLoop - start);
 
    // Optimized for loop
    NSUInteger count = [dictionary count];
    NSArray *keys = [dictionary allKeys];
    for (NSUInteger i = 0; i <  count; i++)
    {
      id key = keys[i];
      id object = dictionary[key];
    }
 
    CFTimeInterval forLoopWithCountVar = CFAbsoluteTimeGetCurrent();
    NSLog(@"Optimized for loop: %g", forLoopWithCountVar - forLoop);
 
    // NSEnumerator
    id key = nil;
    NSEnumerator *enumerator = [dictionary keyEnumerator];
    while ((key = [enumerator nextObject]))
    {
      id object = dictionary[key];
    }
 
    CFTimeInterval enumeratorLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"Enumerator: %g", enumeratorLoop - forLoopWithCountVar);
 
    // Fast enumeration
    for (id key in dictionary)
    {
      id object = dictionary[key];
    }
 
    CFTimeInterval forInLoop = CFAbsoluteTimeGetCurrent();
    NSLog(@"For…in loop: %g", forInLoop - enumeratorLoop);
 
    // Block enumeration
    [dictionary enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
 
    }];
 
    CFTimeInterval enumerationBlock = CFAbsoluteTimeGetCurrent();
    NSLog(@"Enumeration block: %g", enumerationBlock - forInLoop);
 
    // Concurrent enumeration
    [dictionary enumerateKeysAndObjectsWithOptions:NSEnumerationConcurrent 
      usingBlock:^(id key, id obj, BOOL *stop) {
 
    }];
 
    CFTimeInterval concurrentEnumerationBlock = CFAbsoluteTimeGetCurrent();
    NSLog(@"Concurrent enumeration block: %g", 
      concurrentEnumerationBlock - enumerationBlock);
  }
  return 0;
}

NSDictionary is much slower to populate than NSArray or NSSet, so we’ve reduced the item count to 10,000 (ten thousand) to avoid locking up the machine. You should therefore ignore how the magnitudes of the results below compare with those of NSArray because we are using 1000 times fewer objects:

$ For loop: 2.23218
$ Optimized for loop: 0.00342798
$ Enumerator: 0.00524801
$ For…in loop: 0.000737965
$ Enumeration block: 0.00060904
$ Concurrent enumeration block: 0.000794947

The unoptimized for loop here is spectacularly slow because we are copying the key array each time. By storing both the keys array and count in variables we get much better speeds. The cost of object lookup now dominates the other factors, so there is less of a difference between using a for loop, NSEnumerator and for…in than for NSArray. But the enumeration block method, which returns both key and value in a single function, is now very slightly faster than for…in because it enumerates the objects more efficiently than looking each one up by key.

Reverse Gear

Base on what we’ve seen, if all other factors are equal, you should try to use for…in (aka, fast enumeration) when looping over arrays, and block enumeration when looping over dictionaries. There are some scenarios where this isn’t possible though, such as when we need to enumerate backwards, or when we want to mutate the collection as we’re going through it.

To enumerate an array backwards, we can call the reverseObjectEnumerator method to get an NSEnumerator that steps through the array from the last to the first item. NSEnumerator, like NSArray itself, supports the fast enumeration protocol. That means we can still use for…in with this approach, with no loss of speed or concision:

for (id object in [array reverseObjectEnumerator]) 
{}

(In case you are wondering, there’s no equivalent method for NSSet or NSDictionary, but enumerating an NSSet or NSDictionary backwards is meaningless anyway, since the keys are unordered.)

If you want to use block enumeration, there’s an NSEnumerationReverse option you can use, like this:

[array enumerateObjectsWithOptions:NSEnumerationReverse usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {}];

Mutation

It is possible to apply these exact same looping techniques to mutable collections; the performance is roughly the same. However, if you’ve ever tried to modify an array or dictionary whilst looping over it, you may have been confronted with this exception:

'*** Collection XYZ was mutated while being enumerated.'

Like our optimized for loop, the performance of all of these looping techniques depends on storing the item count in advance, which means it won’t work correctly if you start adding or removing items mid-loop. Looping through an array and adding, replacing or removing items is quite a common thing to want to do though. So what’s the best approach for this?

Our classic for loop should work fine since it doesn’t depend on the loop count staying constant; we just have to remember to increment or decrement the index variable if we add or remove items. But we already know that for loop is not a fast solution. Our optimised for loop is a reasonable choice, as long we remember to increment or decrement the count variable as well as the index, as needed.

We can still use for…in, but only if we make a copy of the array first. This will work, for example:

for (id object in [array copy]) 
{
  // Do something that modifies the array, e.g. [array removeObject:object];
}

If we benchmark the various techniques (including the overhead of copying the array when necessary so that we can mutate the original), we see that the copy costs for…in the advantage it previously had:

$ For loop: 0.10038
$ Optimized for loop: 0.073912
$ Enumerator: 0.302246
$ For…in loop: 0.168074
$ Enumeration block: 0.438943
$ Concurrent enumeration block: 0.388876

The fastest technique for modifying an array as we loop over it seems to be to use an optimised for loop.

For an NSDictionary, we don’t need to copy the whole dictionary in order to use NSEnumerator or fast enumeration; we can just get a copy of the keys using the allKeys method. This will both work perfectly well:

// NSEnumerator
id key = nil;
NSEnumerator *enumerator = [[items allKeys] objectEnumerator];
while ((key = [enumerator nextObject]))
{
  id object = items[key];
  // Do something that modifies the value, e.g. dictionary[key] = newObject;
}
 
// Fast enumeration
for (id key in [dictionary allkeys]) 
{
  id object = items[key];
  // Do something that modifies the value, e.g. dictionary[key] = newObject;
}

The same technique doesn’t work when using the enumerateKeysAndObjectsUsingBlock method though. If we benchmark looping through a dictionary, making copies of the either the keys or dictionary as a whole as required, we get this:

$ For loop: 2.42045
$ Optimized for loop: 0.002958
$ Enumerator: 0.00495702
$ For…in loop: 0.000959992
$ Enumeration block: 0.00147301
$ Concurrent enumeration block: 0.001531

Here we can see that the for…in loop is the fastest option. That’s because the overhead of fetching the object by key in the for…in loop is now overshadowed by the cost of copying the dictionary before calling the enumeration block method.

Conclusions

When enumerating an NSArray:

  • Use for (id object in array) if enumerating forwards.
  • Use for (id object in [array reverseObjectEnumerator]) if enumerating backwards.
  • If you need to know the index value, maintain your own index variable that you increment inside the loop.
  • Try [array enumerateObjectsWithOptions:usingBlock:] if your code might benefit from parallel execution.

When enumerating an NSSet:

  • Use for (id object in set) most of the time.
  • Use for (id object in [set copy]) if you need to modify the set (but it will be slow).
  • Try [set enumerateObjectsWithOptions:usingBlock:] if your code might benefit from parallel execution.

When enumerating an NSDictionary

  • Use [dictionary enumerateKeysAndObjectsUsingBlock:] most of the time.
  • Use for (id key in [dictionary allKeys]) if you need to modify the dictionary.
  • Try [dictionary enumerateKeysAndObjectWithOptions:usingBlock:] if your code might benefit from parallel execution.

Not only are these methods the fastest available, but they’re also all very clear and readable. So remember, sometimes it’s not a choice between writing clean code and fast code; you may find that you can get the best of both worlds.


Nick Lockwood is the author of iOS Core Animation: Advanced Techniques. Nick also wrote iCarousel, iRate and other Mac and iOS open source projects.
  1. Very helpful. Helped me reduce processing time from about a minute to under 10 seconds, by using fast enumeration. Thanks for sharing!

Comments are closed.