Stephen A. Fuqua (saf)

a Bahá'í, software engineer, and nature lover in Austin, Texas, USA

Soon the Ed-Fi Alliance will release version 2.0 of our Analytics Middle Tier, welcoming it to the “big kids’ table” as a fully supported add-on to the ODS.

When we started this project in 2018, it was with the aim of creating a production-ready proof-of-concept (a seeming oxymoron, we know) for simplifying the Ed-Fi Data Model, thus making the ODS more useful for reporting and ad hoc analytics exploration (see From Diagrams & Definitions: Solving the Analytics Reporting Gap). As such, it was released on the Ed-Fi Exchange instead of being bundled into the core ODS/API platform.

Continue reading on ed-fi.org

analytics middle tier
diagram

Spurred on by the goal of giving a brown bag tech talk on the subjects of test-driven development and SOLID object oriented design principles, I’ve finally collected all of the “wisdom” I’ve learned over the years into one place: Best Practices in Test-Driven, Object Oriented, Green- and Brownfield Development. I plan to keep this as a living document, adding or modifying as I continue to deepen my knowledge through feedback, study, and further experiments.

Entity Framework Core has a few changes that impact unit testing, particularly with respect to EntityEntry.State management. My previous unit testing techniques also did not take into account the use of async methods. In this article I’ll present a few techniques used in the context of a POC exploration of IdentityServer4. Although .NET Core 3 is now fully available, these examples are based on .NET Core 2.2.

Background

IdentityServer4 has a ConfigurationDbContext that provides access for managing Client entities, along with an interface IConfigurationDbContext. While IdentityServer4’s infrastructure handles all of the OAuth processing, CRUD operations for clients is left up to us. Therefore I created a ClientsController and a ClientRepository, and injected the interface into the repository.

public class ClientsController : ControllerBase
{
    public ClientsController(IClientRepository repo) { ... }
}

public class ClientRepository : IClientRepository
{
    public ClientRepository(IConfigurationDbContext context) { ... }
}

public class ConfigurationDbContext : DbContext, IConfigurationDbContext
{
    public DbSet<Client> Clients { get; set; }

}

namespace IdentityServer4.EntityFramework.Interfaces
{
  /// <summary>Abstraction for the configuration context.</summary>
  /// <seealso cref="T:System.IDisposable" />
  public interface IConfigurationDbContext : IDisposable
  {
    /// <summary>Gets or sets the clients.</summary>
    /// <value>The clients.</value>
    DbSet<Client> Clients { get; set; }
    /// <summary>Gets or sets the identity resources.</summary>
    /// <value>The identity resources.</value>
    DbSet<IdentityResource> IdentityResources { get; set; }
    /// <summary>Gets or sets the API resources.</summary>
    /// <value>The API resources.</value>
    DbSet<ApiResource> ApiResources { get; set; }
    /// <summary>Saves the changes.</summary>
    /// <returns></returns>
    int SaveChanges();
    /// <summary>Saves the changes.</summary>
    /// <returns></returns>
    Task<int> SaveChangesAsync();
  }
}

If I were hand-coding the DbContext class, I would have made sure to include an interface just as IdentityServer4 did. I would also decorate it with a [ExcludeFromCodeCoverage] attribute: data access logic, which needs unit testing, belongs in the repository. The DbContext class is pattern based and, although there may be mapping logic, it is impractical to unit test. We’ll save that for full-blown API integration tests.

Unit Testing Challenges

So now we have two classes to test: the controller and the repository. Let’s focus on the repository. At first glance, it would seem trivial to write tests, and make them pass, using the Clients property and SaveChangesAsync method. The challenge comes from DbSet: it is an abstract class, it contains no implemented methods, the query logic requires an IQueryable, and the modification logic now returns EntityEntry objects. The EntityEntry object in turn is difficult to construct and the classes involved have warnings in the source code that they should not be directly relied on in non EntityFramework code.

Also of note: EntityFrameworkCore now has an Update method to go along with Add and Remove, so that those of who do not like using EntityFramework change tracking (more on this below) no longer need to use Attach and manually set EntityState.Modified, subject of my February blog post.

In order to unit test this, we will need some kind of test double that gives us the equivalent functionality while minimizing the effort required to write the tests. After all, if the testing is hard, we’re all the more likely to skip it.

IAsyncEnumerable and IAsyncQueryProvider

For this purpose, it is easier to hand-create a set of test-only classes than to use a mocking framework. Because of the async calls on IQueryable, this turns out to be harder than first thought: Linq is being used, and it invokes behind-the-scenes logic on interfaces hidden deeply away from us. This finally reveals a deep truth about ORMs: true isolation in unit tests is impossible when you are relying on a tool to generate SQL statements for you.

As I tried to work my own way through the additional difficulty of Linq with async support, I kept running up against an exception like this:

“The source IQueryable doesn’t implement IAsyncEnumerable{0}. Only sources that implement IAsyncEnumerable can be used for Entity Framework asynchronous operations.”

Thankfully Microsoft provided a leg-up in Testing with a mocking framework, albeit with Entity Framework 6 instead of Core involved. The code required for async has not changed much - primarily just a few interface name changes. My versions of these classes is available in GitHub: FakeAsyncDbSet, FakeAsyncEnumerable, and FakeAsyncQueryProvider.

Writing Unit Tests

Now we have the pieces necessary to write good unit tests for a repository, following this formula:

  1. Create a mock on the DbContext interface.
  2. Create a FakeAsyncDbSet<SomeClass>.
  3. Configure the mock to use this database set.
  4. Instantiate the repository using the mock DbContext.
  5. For query-based tests, manually add appropriate objects to the fake via theFakeDbSet.List.Add(...). Write assertsions for the correctness of the query result.
  6. For modification tests, verify that the correct objects were modified, using the fake’s convenience properties (of type List<SomeClass>) Added, Updated, and Deleted.

For a fully worked example, using NUnit 3 and FakeItEasy, see ClientRepositoryTests.cs.

Appendix: Change Tracking

Entity Framework’s change tracking mechanism handles caching of data, helping prevent extra database calls. In some systems this might be useful. In a potentially load-balanced web server, caching needs to be in a shared system - not buried inside of Entity Framework. The code will have to be written with the assumption that the object is not yet cached by EF, so you might as well just turn off change tracking altogether. EF in itself performs much better this way, although theoretically at the expense of some extra data access work.

To completely disable change tracking, call the UseQueryTrackingBehavior method on the database options object:

services.AddDbContext<ConfigurationDbContext>(options =>
{
    options.UseSqlServer(connectionString);
    options.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
});

The most popular request at the 2018 Ed-Fi Summit’s tech town hall was for an option to run the Ed-Fi ODS / API on an open source database solution. Historically, the Operational Data Store (ODS) database has been developed on Microsoft SQL Server, matching the preference of educational agencies that rely on heavily discounted licensing terms for on-premises operation of SQL Server. The advent of cloud-based hosting has changed that dynamic, especially since Microsoft ended the “bring your own license” (BYOL) practice.

One implication of that change is that educational agencies wishing to use SQL Server may need to pay full price when using non-Azure managed services for SQL Server; however, even with BYOL, managed services with SQL Server does cost more than other database platforms. So, with the help of an Ed-Fi Special Interest Group, we narrowed the field to one alternative database platform (for now…).

continue reading on ed-fi.org…

A roadrunner

In Making a Mockery of Extension Methods - way back in 2014 - I wrote about a technique for a code workaround that would facilitate replacing extension methods (global static methods) with mock objects for unit testing. Over the years I’ve used this technique a few times and found two major problems:

  1. The technique of static delegate substitution is simply strange and requires too much thinking / analysis for good maintenance.
  2. The unit tests are brittle, often failing on the first try due to multiple tests interacting with each other as they replace the static delegate.

Interestingly, I’ve found the second to be true with both XUnit and NUnit, even when supposedly running tests serially. This problem did not occur as frequently when I first started using the technique five years ago; I was using VS Tests or NUnit 2 back then, so perhaps the more recent brittleness is from the change in frameworks.

At last I grew tired of the technique and decided it would be better to simply replace it with something more familiar: an injectable class. Thus the recipe:

  1. For a large set of extension methods over unmockable code - for example extension methods around database interaction - best to go ahead and create a thin adapter layer with an interface and constructor injection.
  2. For a small static method over unmockable code, consider a small class with optional interface for either constructor or property injection.
  3. If tempted to introduce a global static for any reason, consider instead using of these two techniques.

In my original article, I was wrapping extension methods from the micro ORM OrmLite:

public class Repository<T> where T: class
{
    private readonly IDbConnectionFactory dbFactory;

    public Repository(IDbConnectionFactory dbFactory)
    {
        if (dbFactory == null)
        {
            throw new ArgumentNullException("dbFactory");
        }

        this.dbFactory = dbFactory;
    }

    public int Save(T input)
    {
        int rowsAffected = 0;
        using (IDbConnection db = dbFactory.OpenDbConnection())
        {
            using (var tran = DelegateFactory.OpenTransaction(db))
            {
                rowsAffected = DelegateFactory<T>.Save(db, new[] { input });
                tran.Commit();
            }
        }
        return rowsAffected;
    }
}

public static class DelegateFactory
{
    public static Func<IDbConnection, IDbTransaction> OpenTransaction = (connection) => { return ReadConnectionExtensions.OpenTransaction(connection); };
}

public static class DelegateFactory<T>
{
    public static Func<IDbConnection, T[], int> Save = (connection, items) => { return OrmLiteWriteConnectionExtensions.Save(connection, items); };
}

The two delegate factories allowed Repository to be fully-testable, but at the expense of bringing in an indirection pattern that is unfamiliar to most developers and has the test-interaction problem mentioned earlier. Instead today I would simply follow the adapter pattern. Alternately I could accept that Repository is light enough to not be unit tested - that it is effectively, already, an adapter. The transaction support might be worth unit testing, and it is worth considering that a full-blown repository class would have more than just that single Save method. Improved version that is trival to unit test:

public interface IDbPersistence : IDisposable {
    IDbConnection OpenConnection();
    IDbTransaction StartTransaction();
    int Save<T>(params T[] records);
    void Commit();
}

public class Repository<T> where T: class
{
    private readonly IDbPersistence _persistenceLayer;

    public Repository(IDbPersistence persistenceLayer)
    {
        _persistenceLayer = persistenceLayer ?? throw new ArgumentNullException(nameof(persistenceLayer));
    }

    public int Save(T input)
    {
        int rowsAffected = 0;
        using (var connection = _persistenceLayer.OpenConnection())
        {
            using (_ = _persistenceLayer.OpenTransaction())
            {
                rowsAffected = _persistenceLayer.Save(input);
                _persistenceLayer.Commit();
            }
        }
        return rowsAffected;
    }
}

As another example, in the FlightNode project I used Entity Framework (EF) with tracking turned off for higher performance. When tracking is disabled, EF must be told that an object has been modified if you wish for EF to build an UPDATE SQL statement. In FlightNode, I had a business / domain class called DomainManager, with an injected IPersistenceBase. This class is arguably very similar to what most people would call a repository. I considered it business logic because it performed input validation on domain objects. In the original version, this class contained a static delegate:

public static Action<IPersistenceBase<TEntity>, TEntity> SetModifiedState = (IPersistenceBase<TEntity> persistenceLayer, TEntity input) => persistenceLayer.Entry(input).State = System.Data.Entity.EntityState.Modified;

The brittle unit tests were getting beyond annoying, so recently I finally changed this to wrap this single command in a utility class. To prevent breaking every unit test through introduction of a new constructor argument, I used property injection: thus only the unit tests that needed to mock this method would need to inject a replacement. I didn’t even bother adding an interface, knowing that I could use Mock to replace this virtual method.

public class EfStateModifier
{
    public virtual void SetModifiedState(IModifiable persistenceLayer, object input)
    {
        persistenceLayer.Entry(input).State = System.Data.Entity.EntityState.Modified;
    };

}

Usage:

public abstract class DomainManagerBase<TEntity>

    private readonly IPersistenceBase<TEntity> _persistence;{
    private EfStateModifier _efStateModifier;

    public EfStateModifier StateModifier
    {
        get => _efStateModifier ?? (_efStateModifier = new EfStateModifier());
        set => _efStateModifier = value;
    }

    protected DomainManagerBase(IPersistenceBase<TEntity> persistence)
    {
        _persistence = persistence ?? throw new ArgumentNullException(nameof(persistence));
    }

    public virtual int Update(TEntity input)
    {
        // trust the validator to handle null values
        input.Validate();
        StateModifier.SetModifiedState(Persistence, input);

        return _persistence.SaveChanges();
    }
}

This fixed the brittle test problem nicely. However, it leaves in place a glaringly-obvious problem: this base class is tainted by presence of Entity Framework! The FlightNode project was an after hours project with no one to review the code, one that an external organization was dependent on. In hindsight I see that I got sloppy here in my haste to deliver the code on a seasonally-relevant roadmap. The IPersistenceBase should have hidden the state modification.

If data lands in the ODS and no one uses it, does it empower educators?

Ed-Fi Community members are increasingly leveraging the Ed-Fi ODS as a source of data for business intelligence (BI) solutions, while many continue to develop it as a solution for compliance reporting. While the Ed-Fi vendor community provides many options for analytics based on the ODS data, some end-users wish to create their own custom reports and perform their own ad hoc analysis. The Analytics Middle Tier and the two Analytics Starter Kits recently published to the Ed-Fi Exchange aim to help this group by simplifying the ODS data model, provisioning support for role-based data access, and providing sample visualizations. These solutions aim to empower those IT staff who are empowering their educators and administrators.

continue reading on ed-fi.org…

Working with a legacy codebase using NUnit and .NET Framework, I’ve found that there is a mix of NUnit assertions and assertions using the Should library. This library is rather old and, frankly, limited compared to Shouldly and FluentAssertions. These newer two frameworks are significantly more expressive, with APIs that cover myriad situations elegantly. Questions in front of me:

  1. Are any of these libraries really worthwhile compared to simply using NUnit’s built-in assertions - either traditional or Assert.That style?
  2. If using any independent framework, which is the best choice for this code base?
  3. If selecting Shouldly or FluentAssertions, ought we to upgrade the old asserts?

My conclusion: favor Shouldly. Upgrade old asserts opportunistically for consistency, but no need to go out of the way.

Full source code for these experiments is at https://github.com/stephenfuqua/AssertionComparison.

Why Use a Separate Library

Some argue that the assertion library simply ought to be independent of the unit test framework, allowing greater flexibility in switching between frameworks. Switching unit test frameworks in a large legacy project sounds rather tedious, so that alone is an insufficient reason.

One problem I’ve run into frequently in legacy code is people switching up the expected and actual responses. In classic NUnit, expected comes first. This situation is better with the NUnit3 Assert.That style; however, the values are still hidden away inside of the assertion method call.

Assert.AreEqual(expected, actual); // old style
Assert.That(actual, Is.EqualTo(expected)); // new style

When a coder reverses the traditional style, and you need to fix a broken test, it can get a bit confusing to figure out what is going on (especially if the variables are not so clearly named). In the course of writing this project, while switching back and forth between styles, I realized I had made this mistake myself - wanting to put the actual result first and then compare it to the expected.

If continuing to use NUnit3, this alone is a good reason to switch to the new Constrain Model. The three fluent frameworks evaluated here address this by putting the actual result at the beginning of the assertion statement:

actual.Should().Be(expected); // FluentAssertions, and Should in Fluent mode
actual.ShouldBe(expected); // Shouldly
actual.ShouldEqual(expected); // Should

Another problem is coming up with a meaningful message, which is especially important if you have multiple asserts in the same unit test (many people frown at that, and I frown right back unless the coders are prone to large numbers of mistakes per function). Each of these frameworks reports failures differently. Compare these results:

  • NUnit Assert: Assert.AreEqual(-1, sum); Assert.That(sum, Is.EqualTo(-1));

    Expected: -1 But was: 0

  • FluentAssertions: sum.Should().Be(-1);

    Expected sum to be -1L, but found 0L.

  • Should: sum.ShouldEqual(-1);

    Should.Core.Exceptions.EqualException : Assert.Equal() Failure Expected: -1 Actual: 0

  • Shouldly: sum.ShouldBe(-1);

    Shouldly.ShouldAssertException : sum should be -1L but was 0L

The latter three all provide more information than the first. Of these, I find the FluentAssertion response to be the most elegant for its compactness and precision.

Documentation and Richness

Compared to the other two bolt-on libraries, FluentAssertions clearly has the best documentation. Detailed and rich with examples, it was easy for me to find the right syntax for the job. It also clear that the library has extensive support for value types, references types, collections, and exceptions.

Shouldly’s documentation seems to be a work-in-progress. I was unable to find documentation of their exception handling syntax - I had to look for the functions in the object browser.

Should’s documentation is brief but relatively complete given that it is a smaller package. Looking at the repo, it also clear that the project hasn’t been touched in many years. This could mean that it simply works - but it also means that others have passed it by in the meantime.

Exception Handling

To get a sense of the syntax richness, let’s look at exception handling. Aside: In NUnit, I never use the [ExpectedException] attribute as I prefer to have the assert clearly visible in the method body.

NUnit Outer Exception

Assert.Throws<ArgumentNullException>(RunNullSeries); // old style
Assert.That(RunNullSeries, Throws.ArgumentNullException); // new style

Fluent Assertions Outer Exception

((Action)RunNullSeries)
    .Should()
    .Throw<ArgumentNullException>();

Should Outer Exception

((Action)RunNullSeries)
    .ShouldThrow<ArgumentNullException>();

Shouldly Outer Exception

((Action)RunNullSeries)
    .ShouldThrow<ArgumentNullException>();

There is not much difference between these. Fluent Assertions requires one extra method call. This is a general philosophical difference: it wants you to call Should() first every time, and then exposes the full API. What I like about this is that it presents a more consistent looking interface, compared to combining elements together (e.g. ShouldThrow, ShouldBe, etc.) This might just be a matter of style.

Inner Exception Handling

Both Fluent Assertions and Shoudly make it easy to also check on an inner exception. So does the new NUnit3 Constraint Model. With the other two frameworks, you’re left with catching and inspecting the exception.

NUnit3 Constraint Model Inner Exception

Assert.That(RunSeriesWithNullValue,
        Throws.TypeOf<CalculatorException>()
            .With.InnerException.TypeOf<InvalidOperationException>());
```csharp

#### Fluent Assertions Inner Exception

```csharp
((Action)RunSeriesWithNullValue)
    .Should()
    .Throw<CalculatorException>()
    .WithInnerException<InvalidOperationException>();

Shouldly Inner Exception

((Action)RunSeriesWithNullValue)
    .ShouldThrow<CalculatorException>()
    .InnerException
    .ShouldBeOfType<InvalidOperationException>();

Execution Time

Across thousands of tests, execution time for these asserts could certainly add up. Here is one place where FluentAssertions is not as attractive. I wrote the same tests in each framework and ran them many times. The results below are representative of the typical results in repeated executions:

Overall execution times

Yikes! What’s going on here? Let’s drill into the results a bit…

Detailed execution times

There is one test that makes up nearly 80% of execution time. And it is a trivial test:

[Test]
public void TempVariable()
{
    var sum = AggregateCalculator.Sum(1, -1);

    sum.Should()
        .Be(-1); // purposefully wrong
}

Running that one test repeatedly, by itself, I see similar results. When I run multiple tests, that one test always seems to take the longest. Applying the [Order] attribute to the another test, to force another one to run first, the longest time shifts to that test. Thus it seems to be a bootstrapping thing (reflection?) with FluentAssertions - the first test seems to take much longer to execute. Subtract out the effect of that one test, and FluentAssertions performs better than classical NUnit asserts but a little bit worse than the others.

It is also interesting to see that the Constraint Model of NUnit3 performs very well. The most time-consuming assertion there is for the ArgumentNullException.

Conclusion

The execution time differences are troubling. This sample code base may too small to read too much into it, but this is an important finding and something to watch out for. Based on documentation and richness of syntax I would want to use Fluent Assertions, but if the project has a large number of tests, the small performance difference could add up to a meaningful increase in total execution time. If time to fully evaluate is lacking, then I feel that my best choices are to either (a) focus on Assert.That syntax or (b) upgrade Should to Shoudly and perhaps make time to pitch in on improving the documentation. Leaving existing Should tests in place would seem to be harmless since the performance is good.

Framework Documentation Richness Performance
NUnit3 (Classic) ++ + /
NUnit3 (Constraint) ++ ++ ++
FluentAssertions +++ +++ -
Should + + ++
Shouldly + ++ ++

According to Amazon Web Services (AWS) CEO Andy Jassy, at his keynote Wednesday morning, I am one of around 53,000 people from all over the world who have come out to the annual AWS re:Invent conference in Las Vegas. We come together from myriad industries and with interests that span the full range from product development to operations to account management. My personal learning objectives for the week are to deepen my understanding, and think about implications for the Ed-Fi tech stack, of four concepts:

  • Data lakes
  • Serverless
  • Business intelligence
  • .NET on AWS

continue reading on ed-fi.org…

The geeks filling in the Venetian Theater to learn about Best Practices in Big Data Analytics Architecture

We just wrapped up the 2018 Technical Bootcamp as part of the Ed-Fi Summit. The Bootcamp is a two-day event where we dig into the technical details, provide hands-on training and demonstrations, and get feedback from the technical community on challenges they’d like to address and improvements they’d like to see made. We had a number of sessions geared specifically toward local education agencies (LEA), state education agencies (SEA), and vendors.

continue reading on ed-fi.org…

Have you ever tried to write a query using the Ed-Fi ODS for reporting or analytics? To say that it is challenging is to use the mildest language. The Data Standard documentation in Tech Docs is top notch. Nevertheless, going from diagrams and definitions to actual query code for, let’s say, each student’s average math grade during a grading period, is not a trivial exercise.

continue reading on ed-fi.org…

Ed-Fi Analytics Middle Tier