.NET Locking Performance

Just a quick overview over the different lock types and their performance in .NET.

For this test, the following method was called as fast as possible for 4 seconds:

private void TestMethod() {
  lock (this) { // this locking is replaced depending on the locking type
    counter++;
  }
}

Here are the results:

Locking Type Calls per second Factor
No locking (fastest possible) 470,972,276 19.61
Interlocked.CompareExchange 62,439,529 2.60
lock keyword 37,554,119 1.56
SpinLock (without owner tracking) 34,489,245 1.44
ReaderWriterLockSlim with LockRecursionPolicy.NoRecursion 25,214,451 1.05
ReaderWriterLockSlim with LockRecursionPolicy.SupportsRecursion 24,013,488 1.00

Full source code: Program.cs

SQLite Performance (RFC)

I’m currently working on a cross-platform SQLite .NET wrapper. At the moment it’s not really thread-safe. So, I was looking for ways of making it thread-safe.

Basically, there are two ways to do this:

  1. Share a single connection among all threads and use .NET locking mechanisms.
  2. Let each thread have its own connection (thus no .NET locking would be required).

To be able to make this decision, I did some performance tests and – assuming I did them right – got some interesting results you can read after the break.

The Setup

The tests ran on a Notebook featuring an Intel Core 2 Duo (2.53 GHz) and 8 GB memory. The OS was Windows 7 x64.

There are 40 tests in the suite testing various combinations of the available test parameters (see below).

Each test was executed for 1 to 20 threads to test concurrency.

Each test for a certain thread count (1, 2, 3, …) ran 30 seconds and was repeated 10 times.

Thus, the overall test duration was about 2 days.

In each test scenario, the SQLite database contained only one table with four columns. For SELECT tests this table was filled with 50,000 rows of random data.

There are two kinds of concurrent access:

  • Read access: Simulated by repeatedly selecting a random row from the table and reading all four values.
  • Write access: Simulated by inserting random values into the table.

The first batch of tests simulated read access, the second batch simulated write access, and the third batch simulated both concurrently.

Note: In all tests, the CPU was the limiting factor – not the hard drive.

Test Parameters

Each test comprises of a certain combination of the following test parameters:

  • Shared connection vs. multi connection: Whether all threads share the same database connection, or whether every thread has its own connection (to the same database though). Shared connections use SQLITE_OPEN_FULLMUTEX (serialized), multi connections use SQLITE_OPEN_NOMUTEX (multithread).
  • Read-only: Whether the connection is opened in read-only or read-write mode (SQLITE_OPEN_READONLY).
  • Shared cache: Whether all connections share the same cache (SQLITE_OPEN_SHAREDCACHE), or whether each connection has its own cache.
  • WAL: Whether the connection(s) use a database in WAL (write-ahead logging) journal mode.
  • Filled table: Whether the table to read from is empty or filled (not examined in this report due to missing data; I should mention though that trying to read from an empty table is significant slower than reading from a filled table).

Batch 1: read tests

Let’s start with the tests only reading data (i.e. no data is written during these tests). Each thread randomly reads a data row and then obtains all four values stored in it. This is repeated for 30 seconds.

select-statements.csv (file containing data for charts in this section)

Test: read-only

First test is about whether opening a database connection in read-only mode (SQLITE_OPEN_READONLY) does result in any performance benefit.

: read-write
: read-only
Shared connection - read-only: yes/no
Multi connection - read-only: yes/no

As you can see, there’s no benefit from choosing a read-only connection over a read-write connection (but it doesn’t hurt either).

Test: shared cache

Next, let’s check whether using a shared cache (SQLITE_OPEN_SHAREDCACHE) affects read performance.

, : no shared cache
, : use shared cache
select-sh-con-shared-cache.png
select-mul-con-shared-cache.png

For a shared connection (first chart) you can clearly see that using a shared cache is never better than using a private cache. The same is true for multiple connection (second chart).

Test: WAL

Next, we test the use of WAL (write-ahead logging). WAL is (suppose to be) bringing a performance benefit for concurrent write operations (which we don’t have here).

: no WAL
, : use WAL
select-sh-con-wal.png
select-mul-con-wal.png

As you can see, with few threads, using WAL for read operations results in a big performance boost (400% for shared connection, 700% for multi connections). However, when using a shared connection and more than 8 threads, WAL doesn’t provide any performance benefit anymore (but it also doesn’t hurt).

Summary: read tests

Let’s summarize what we’ve learned so far (for reading operations):

  • using a read-only connection doesn’t provide any performance benefit
  • using a shared cache is never faster (but sometimes slower) than using a private cache
  • using WAL is always faster than using the default journal mode (DELETE)

As for the question whether to use a shared connection or multiple connections, see this chart:

: one shared connection
: one connection per thread
select-result.png

This chart only contains the variations for shared and multi connections with the best performance, i.e. using WAL and no shared cache. As you can see, for very few threads (my guess: thread count <= cpu count), multiple connections perform much better. However, for more threads, a single shared connection is the better choice.

Batch 2: write tests

Next, let’s look at write-only tests. With these tests, multiple threads concurrently write to the same database table, inserting random data.

insert-statements.csv (file containing data for charts in this section)

Test: shared cache

Our first tests checks the performance for when a shared cache is used.

: private cache
: shared cache
insert-sh-cache.png

As you can see, there’s no real difference between whether a shared cache or a private cache is used.

Test: WAL

Next, let’s check WAL (which improved read performance significantly even though it’s designed for write operations).

: no WAL
: use WAL
mixed-insert-wal.png

As expected, using a database in WAL mode drastically improves write performance.

Test: shared connections

The last thing to tests is whether to use multiple connections or a single shared one.

: shared connection
: one connection per thread
insert-sh-con.png

The results are clear. Using a shared connection always yields better write performance when using multiple threads.

Summary: write tests

To summarize the previous sections:

  • Using a shared cache doesn’t affect the performance.
  • Using WAL improves write performance significantly.
  • Using a shared connection is always faster than using multiple connections.

insert-results.png

Batch 3: mixed reads and writes test

The last batch combines the previous two batches. This time the same number of read and write threads read and write concurrently from/to the same database table.

mixed-statements.csv (file containing data for charts in this section)

Assumption: WAL improves general performance

The previous tests clearly showed that enabling WAL improves read as well as write performance. Let’s check whether this is still true for concurrent reads and writes.

: no WAL
: use WAL
mixed-select-wal.png
mixed-insert-wal.png

Again, enabling WAL results in a significant performance boost.

Note: Reading without WAL is extremely slow (under 1000 rows per second for 10 threads or less).

Test: Shared or multiple connections

Next, check whether we should use a shared connection or multiple connections.

mixed-select-result.png
mixed-insert-result.png

As you can see, in both cases using one connection per threads and using WAL provides the best performance.

Conclusions

Assuming, my code doesn’t contain any errors that are affecting the results in a significant way, the following conclusions can be drawn:

  • Enabling WAL for a database gives a significant performance boost for all read and write operations.
  • If memory is not an issue, shared caches shouldn’t be used as they may decrease read performance.
  • Using read-only connections doesn’t affect the read performance.

Regarding shared connection vs. multiple connections:

  • If you only have one thread, it doesn’t matter (obviously).
  • If you do primarily reading…

    • … and the thread count is <= the CPU (core) count: use multiple connections
    • … and you have more threads than CPUs (cores): use shared connection
  • If you do primarily writing, use a shared connection.
  • If you do about the same amount of reading and writing, use multiple connections.

I hope this helps. If there’s something (terribly) wrong with this analysis, please leave a comment below.

Please note that these results are based on a Windows system. Other operating system may produce other results.

Rectangle Intersection Test (with C#)

For a software project I needed to check whether two rectangles intersect (or overlap). What made my problem complicated was that one of the rectangles could be rotated. While this problem seems to be trivial (to a human being), it’s not that simple to implement. It took me a while to find the right answer.

Intersecting Rectangles

Now, the solution to this problem is called a separating axis test. Basically this means: If I can find an axis (read: line) that separates both rectangles, then they don’t intersect/overlap. (Actually this works for any convex polygon; see below) Of course, if the two rectangles don’t intersect, there are undefinitely many possible separating axes. Fortunately we can use the edges of each rectangle as axes and testing all of the is sufficient.

An axis separating two rectangles

I won’t get into the details of this algorithm here – it’s sufficiently good described in the article mentioned above – but basically you check for each point on which side of the separating axis it is. If all points of the rectangle A are on one side and all of rectangle B are on the other side, then we’ve found a separating axis.

Here’s the implementation of the method testing where a single edge (represented by points x1 and x2) is a separating axis:

/// <summary>
/// Does axis separation test for a convex quadrilateral.
/// </summary>
/// <param name="x1">Defines together with x2 the edge of quad1 to be checked whether its a separating axis.</param>
/// <param name="x2">Defines together with x1 the edge of quad1 to be checked whether its a separating axis.</param>
/// <param name="x3">One of the remaining two points of quad1.</param>
/// <param name="otherQuadPoints">The four points of the other quad.</param>
/// <returns>Returns <c>true</c>, if the specified edge is a separating axis (and the quadrilaterals therefor don't 
/// intersect). Returns <c>false</c>, if it's not a separating axis.</returns>
bool DoAxisSeparationTest(Point x1, Point x2, Point x3, Point[] otherQuadPoints) {
  Vector vec = x2 - x1;
  Vector rotated = new Vector(-vec.Y, vec.X);

  bool refSide = (rotated.X * (x3.X - x1.X)
                + rotated.Y * (x3.Y - x1.Y)) >= 0;

  foreach (Point pt in otherQuadPoints) {
    bool side = (rotated.X * (pt.X - x1.X) 
               + rotated.Y * (pt.Y - x1.Y)) >= 0;
    if (side == refSide) {
      // At least one point of the other quad is one the same side as x3. Therefor the specified edge can't be a
      // separating axis anymore.
      return false;
    }
  }

  // All points of the other quad are on the other side of the edge. Therefor the edge is a separating axis and
  // the quads don't intersect.
  return true;
}

This method is then called for each edge of each rectangle. If the method returns true, the actual intersection test method can return “not intersecting”. If the method returns false for all edges, the rectangles intersect.

test-app.png

I’ve created an C#/WPF example implementation (under FreeBSD license) that you can download and experiment with.

Rectangle Intersection Test Project (for Visual Studio 2010)

Remarks: The algorithm above works for every convex polygon. Instead of four times two edges you then have n times m edges. For concave polygons, however, this algorithm doesn’t work because there may be no separating axis even though the polygons don’t intersect.

Android and MTP (programmer’s view)

In Android 3.x, Google switched from USB Mass Storage to MTP for files stored in the “external storage”. For any device created before Android 3.x, creating a file and copying it off the device was easy:

File dir = Environment.getExternalStoragePublicDirectory("My App");
File file = new File(dir, "test.txt");
FileUtils.writeStringToFile(file, "Hello File");

This would place a file called test.txt in the directory My App on the SD card/external storage. You then would connect your Android device via USB with your computer, enable USB mass storage, and simply copy the file off the device.

USB mass storage on Android

With newer devices (like the Galaxy Nexus I’m using) that’s no longer enough because they use MTP instead of USB Mass Storage. If you’d execute the code above, the file would be created but it wouldn’t show up in the Windows Explorer (or whatever tool you’re using the view the device’s contents).

MTP on Android

To make the file show up, after closing it use the following code (API documentation):

MediaScannerConnection.scanFile(ctx, new String[] { file.getAbsolutePath() }, null, null);

After doing this the file should appear immediately in the Windows Explorer.

Notes:

  • The external storage is scanned automatically when the Android device is booting (e.g. after a restart).
  • If you pass a directory (instead of a file) to scanFile(), the directory will show up as file in Window Explorer, so don’t do this.

.NET Performance: primitive type vs. struct

For a project I was wondering what’s the performance penalty of using a C# struct (containing only one field) over using a local variable directly; i.e.:

int myVar;

vs.

struct MyStruct {
  int MyVar;
}

The result: There’s no (real) difference!

Here are some stats (“Release” build):

Running each test 50,000,000,000 times

Running 'UseLocalVar'...
Done in 61329.359 ms
Running 'UseStructField'...
Done in 61414.885 ms
Running 'UseStructProperty'...
Done in 121383.416 ms

The first two results are what I was talking about. The third result uses a property instead of a field in the struct. It’s two times slower.

Here’s the code for the benchmark:

using System;
using System.Diagnostics;

class Benchmark {
  const long LOOPS = 50000000000;

  static void Main(string[] args) {
    Benchmark benchmark = new Benchmark();

    Console.WriteLine("Running each test {0:0,0} times", LOOPS);
    Console.WriteLine();

    Console.WriteLine("Running 'UseLocalVar'...");
    Stopwatch stopWatch = Stopwatch.StartNew();
    int test = 0;
    for (long x = 0; x < LOOPS; x++) {
      test += benchmark.UseLocalVar((int)x);
    }
    TimeSpan elapsed = stopWatch.Elapsed;
    Console.WriteLine("Done in {0:0.000} ms", elapsed.TotalMilliseconds);


    Console.WriteLine("Running 'UseStructField'...");
    stopWatch = Stopwatch.StartNew();
    test = 0;
    for (long x = 0; x < LOOPS; x++) {
      test += benchmark.UseStructField((int)x);
    }
    elapsed = stopWatch.Elapsed;
    Console.WriteLine("Done in {0:0.000} ms", elapsed.TotalMilliseconds);


    Console.WriteLine("Running 'UseStructProperty'...");
    stopWatch = Stopwatch.StartNew();
    test = 0;
    for (long x = 0; x < LOOPS; x++) {
      test += benchmark.UseStructProperty((int)x);
    }
    elapsed = stopWatch.Elapsed;
    Console.WriteLine("Done in {0:0.000} ms", elapsed.TotalMilliseconds);
  }

  int UseLocalVar(int val) {
    int test = val;
    test = test + val;
    return test;
  }

  int UseStructField(int val) {
    TestStructField test = new TestStructField(val);
    test.Value = test.Value + val;
    return test.Value;
  }

  int UseStructProperty(int val) {
    TestStructProperty test = new TestStructProperty(val);
    test.Value = test.Value + val;
    return test.Value;
  }

  private struct TestStructField {
    public int Value;

    public TestStructField(int value) {
      this.Value = value;
    }
  }

  private struct TestStructProperty {
    public int Value { get; set; }

    public TestStructProperty(int value) : this() {
      this.Value = value;
    }
  }
}