About Me

My name is Mikael Eliasson. I'm a 24 year old orienteer, runner and programmer who lives in Gothenburg, Sweden and studies Software Engineering at Chalmers University. I will write about programming, training and maybe some other stuff too.

  • 5jun

    Cannot drop database with ef

    If you have done testing with Entity Framework you might have come across "Cannot drop database XX because it is currently in use." EFUtilities can solve that for you.

    Typically this situation occurs if you debug something by querying the database in SQL Server Management Studio. Management Studio then holds an open connection, and once it's time to clean up for the next test you get the error message:


    Cannot drop database "Nogginator.Test" because it is currently in use.


    I got slightly tired of this and built a helper function that kills all connections to the database so it can always be deleted.

    To use it you only need to make a small change to your Delete statement.

    db.Database.Delete(); //Old code
    db.Database.ForceDelete(); //New code


    Read more at https://github.com/MikaelEliasson/EntityFramework.Utilities#dbdatabaseforcedelete

    The requirement is that you have rights to the master database. The code that's run looks like this:


    public static void ForceDelete(this Database db, string name = null)
    {
        name = name ?? GetDatabaseName(db.Connection);
        using (SqlConnection sqlconnection = new SqlConnection(db.Connection.ConnectionString)) //Need to run this on a separate connection
        {
            sqlconnection.Open();
            sqlconnection.ChangeDatabase("master"); // if you used master db as Initial Catalog, there is no need to change database
            string rollbackCommand = @"ALTER DATABASE [" + name + "] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;";
            SqlCommand deletecommand = new SqlCommand(rollbackCommand, sqlconnection);
            deletecommand.ExecuteNonQuery();
            string deleteCommand = @"DROP DATABASE [" + name + "];";
            deletecommand = new SqlCommand(deleteCommand, sqlconnection);
            deletecommand.ExecuteNonQuery();
        }
    }
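
    As a sketch, this is how it could be wired into a test setup. The MSTest attributes and the Context class are just placeholders for whatever test framework and DbContext your tests use; the helper itself doesn't require them.

    [TestClass]
    public class IntegrationTestBase
    {
        [TestInitialize]
        public void ResetDatabase()
        {
            using (var db = new Context())
            {
                if (db.Database.Exists())
                {
                    db.Database.ForceDelete(); // kills open connections first, then drops the database
                }
                db.Database.Create();
            }
        }
    }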
  • 15maj

    Efutilities, bulk insert and update

    With EFUtilities 1.0.0 released it's time to tell the world about the features. This time we will look at getting data into SQL Server fast.

    Everything I'm about to show you is documented at https://github.com/MikaelEliasson/EntityFramework.Utilities.

    The install instructions can also be found there.

    Basically this is what you need to know:

    PM> Install-Package EFUtilities

    Keeping track of the weather in the world

    Wouldn't it be nice to have a list of the temperature of a lot of cities in the world in your own little project? OK, maybe not. But it will serve as an example of a domain where you might need to update plenty of entities often and where it can't be done by a query.

    There is an example file at http://openweathermap.org/current#bulk that contains 12176 cities. More exactly, it's the hourly_14.json sample that's used in the code below.

    A row looks something like this (plus plenty of other information):
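
    Roughly, judging by the properties the code below reads (the values here are made up):

    { "city": { "id": 12345, "name": "SomeCity", "country": "SE" }, "data": [ { "main": { "temp": 284.32, "pressure": 1013 } } ] }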



    Reading the data from file

    Note: You can download the full demo at https://github.com/MikaelEliasson/EFUtilitiesDemos

    First, let's create our model.

    public class Location
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Country { get; set; }
        public double Temp { get; set; }
        public double Pressure { get; set; }
    }

    And our DbContext

    public class Context : DbContext
    {
        public DbSet<Location> Locations { get; set; }
        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Ids come from the weather file, not generated by the database
            modelBuilder.Entity<Location>().HasKey(x => x.Id)
                .Property(x => x.Id).HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
        }
    }

    Then we need to read all that data. That can be done easily with Json.Net and the dynamic support it has.

    NOTE: The reason I set the temp to 0 is for demo purposes, so that we can see later that the update actually works.

    var filePath = "D:/Downloads/hourly_14.json/hourly_14.json";
    var sw = new Stopwatch();
    var lines = File.ReadLines(filePath);
    var locations = lines.Select(l =>
    {
        dynamic x = JsonConvert.DeserializeObject(l);
        return new Location
        {
            Id = x.city.id,
            Name = x.city.name,
            Country = x.city.country,
            Temp = 0.0,
            Pressure = x.data[0].main.pressure,
        };
    }).ToList();

    Bulk inserting the data with Entity framework

    using (var ctx = new Context())
    {
        EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations);
    }

    That's that. EFUtilities will read the metadata and handle the rest. It will use SqlBulkCopy under the covers to stream the data into Sql Server as fast as possible. 

    This takes about 160ms on my computer.

    Inserted 12176 locations in 164 ms
    Hottest: Protaras 0K
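
    For the curious, here is roughly what that SqlBulkCopy path looks like if you hand-roll it. This is only an illustration, not EFUtilities' actual code, and connectionString is assumed to point at the same database as the context.

    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("Country", typeof(string));
    table.Columns.Add("Temp", typeof(double));
    table.Columns.Add("Pressure", typeof(double));
    foreach (var l in locations)
    {
        table.Rows.Add(l.Id, l.Name, l.Country, l.Temp, l.Pressure);
    }
    using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.Locations" })
    {
        bulk.WriteToServer(table); // streams the rows to SQL Server in one bulk operation
    }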

    Bulk updating the data with Entity framework

    Now, the thing with the weather is that it's changing, so we need to update this often. From version 1.0.0, EFUtilities can help you do this efficiently (more than 20x faster than normal EF would be in this case).

    Here is the code for the update part. First we read the data again, then update it and finally just read some information to make sure it worked.

    var locations2 = lines.Select(l =>
    {
        dynamic x = JsonConvert.DeserializeObject(l);
        return new Location
        {
            Id = x.city.id,
            Temp = x.data[0].main.temp,
            Pressure = x.data[0].main.pressure,
        };
    }).ToList();

    using (var ctx = new Context())
    {
        EFBatchOperation.For(ctx, ctx.Locations).UpdateAll(locations2, x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp));
    }

    using (var ctx = new Context())
    {
        var count = ctx.Locations.Count();
        var top = ctx.Locations.OrderByDescending(l => l.Temp).First();
        Console.WriteLine("Updated " + count + " locations in " + sw.ElapsedMilliseconds + " ms");
        Console.WriteLine(string.Format("Hottest: {0} {1}K", top.Name, top.Temp));
    }

    The output for that part:

    Updated 12176 locations in 104 ms
    Hottest: Zhaoqing 310,93K

    So it updated all 12k rows in 104ms. Pretty nice.

    The update works by inserting all the data to update into a temp table. Then it runs a command to update the data in the Locations table with the data from the temp table, before it finally deletes the temp table.

    x => x.ColumnsToUpdate(l => l.Pressure, l => l.Temp)

    tells EFUtilities that it should only update Pressure and Temp. All other columns are ignored. That's why we don't overwrite the Name of the city even though we didn't include it in locations2.

    Deep dive: Under the covers

    If we open up SQL Profiler we will see the following commands:


    SQL:BatchCompleted	insert bulk [dbo].[Locations] ([Id] Int, [Name] NVarChar(max), [Country] NVarChar(max), [Temp] Float, [Pressure] Float)

    In case we had a larger sample there would have been batching. EFUtilities defaults to 15k entities / batch.
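
    If 15k per batch doesn't suit you it can be overridden. As far as I remember InsertAll takes optional connection and batchSize arguments, but treat the exact parameter name as an assumption and check the readme on GitHub.

    using (var ctx = new Context())
    {
        // batchSize parameter name assumed from the EFUtilities readme; leaving it out means the 15k default
        EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations, batchSize: 5000);
    }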


    SQL:BatchCompleted	CREATE TABLE dbo.[temp_Locations_635673635823747650](
                            [Id] int, 
                            [Temp] float, 
                            [Pressure] float, 
                            PRIMARY KEY ([Id]))
    insert bulk             [dbo].[temp_Locations_635673635823747650] ([Id] Int, [Temp] Float, [Pressure] Float)
    SQL:BatchCompleted	UPDATE [Locations]
                        SET [Temp] = TEMP.[Temp], [Pressure] = TEMP.[Pressure]
                        FROM [Locations] ORIG
                        INNER JOIN [temp_Locations_635673635823747650] TEMP
                        ON ORIG.[Id] = TEMP.[Id]
    SQL:BatchCompleted	DROP table dbo.[temp_Locations_635673635823747650]

    As you might guess InsertAll is doing the heavy lifting here too. Notice how the number of columns is kept to a minimum.


    With the help of EFUtilities we can do bulk inserts and bulk updates into SQL Server. The speed depends on a lot of factors, like the size of the entity and the computer, but reaching speeds of around 100k entities / second is fairly reasonable and it takes very little extra work.

    The full demo can be found, forked and downloaded at: https://github.com/MikaelEliasson/EFUtilitiesDemos

  • 1jan

    Entity framework pitfalls, include

    While EF is not as fast as plain SQL it's actually quite fast nowadays, but there are a few pitfalls to be aware of. This time: Include()

    Before I explain the problem with Include() let's look at some background. If you feel you have good knowledge about EF and SQL you can probably skim through the first parts.

    To make a long story short, Include() will cause a lot of data duplication in the returned resultset, which in many cases slows down your query a lot.

    What can Include() do for me?

    Include is a convenient way to load related collections eagerly in an EF query.

    var contacts = db.ContactInformations.Include(x => x.ContactInformationMails).Include(x => x.ContactInformationPhonenumbers).AsNoTracking().ToList();

    That query will load all contacts with their phone numbers and emails populated in a single query. Pretty nice for the developer!

    What's the catch?

    For those of us who have been around the game since before ORMs this should ring a warning bell. Remember how this would be done in plain SQL?

    There are a few alternatives, but basically all come back to splitting the loading into multiple queries. A horrible, but not uncommon, way to do it was to load all contacts and then loop over them, loading the related phone numbers and emails for each contact. That leaves us with 2N + 1 queries (1 query to load N contacts and then 2 queries for each contact).

    That's really bad for performance, so the performance-aware developer would take another approach and load the data in three separate queries, one for each table, and then merge the data in memory. This is faster than the 2N + 1 alternative by orders of magnitude.

    Those are the two main approaches I know about. There are some alternatives to the second one, like packing the 3 queries into a single query with multiple resultsets, which might be a bit faster, but it's the same approach just slightly optimized.

    What does that have to do with anything? 

    EF, or any other ORM, is not magic. It will translate your query to SQL, and if the query is hard to write in an efficient way manually it will probably not be any better when the ORM creates it.

    The reason we didn't load the data in a single query before ORMs is that relational databases are not built to return hierarchical data. The result from a query is always a table. But unlike HTML, SQL doesn't support nested tables, so we cannot get rows looking like this:


    Col1 Col2 Col3
    Id Number1
    Id Number2
    ID Mail1
    ID Mail2


    This is why I think every ORM developer should know the underlying datastore decently. If you do, you will at least recognize that there is a "hack" involved to get the query above to work. 

    If you don't understand how the ORM does something, it's time for you to look under the hood. SQL Profiler or anything similar is a great tool for understanding what's really going on.

    So how does EF load that data in one query?

    Here is where Sql Profiler comes in very handy. A quick look reveals the query to be:

    (I'll explain the concept below)

    SELECT 
        [UnionAll1].[ID] AS [C1], 
        [UnionAll1].[ID1] AS [C2], 
        [UnionAll1].[Lastname] AS [C3], 
        [UnionAll1].[Firstname] AS [C4], 
        [UnionAll1].[Birthdate] AS [C5], 
        [UnionAll1].[LastUpdate] AS [C6], 
        [UnionAll1].[Gender] AS [C7], 
        [UnionAll1].[OnlyOwnerCanEdit] AS [C8], 
        [UnionAll1].[MemberType] AS [C9], 
        [UnionAll1].[Deleted] AS [C10], 
        [UnionAll1].[UserName] AS [C11], 
        [UnionAll1].[SICard] AS [C12], 
        [UnionAll1].[Comment] AS [C13], 
        [UnionAll1].[Address] AS [C14], 
        [UnionAll1].[C1] AS [C15], 
        [UnionAll1].[ID2] AS [C16], 
        [UnionAll1].[ContactInformationID] AS [C17], 
        [UnionAll1].[LastUpdate1] AS [C18], 
        [UnionAll1].[ShowToPublic] AS [C19], 
        [UnionAll1].[Mail] AS [C20], 
        [UnionAll1].[ReciveMailOnThis] AS [C21], 
        [UnionAll1].[MailCategory] AS [C22], 
        [UnionAll1].[C2] AS [C23], 
        [UnionAll1].[C3] AS [C24], 
        [UnionAll1].[C4] AS [C25], 
        [UnionAll1].[C5] AS [C26], 
        [UnionAll1].[C6] AS [C27], 
        [UnionAll1].[C7] AS [C28]
        FROM  (SELECT 
            CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], 
            [Extent1].[ID] AS [ID], 
            [Extent1].[ID] AS [ID1], 
            [Extent1].[Lastname] AS [Lastname], 
            [Extent1].[Firstname] AS [Firstname], 
            [Extent1].[Birthdate] AS [Birthdate], 
            [Extent1].[LastUpdate] AS [LastUpdate], 
            [Extent1].[Gender] AS [Gender], 
            [Extent1].[OnlyOwnerCanEdit] AS [OnlyOwnerCanEdit], 
            [Extent1].[MemberType] AS [MemberType], 
            [Extent1].[Deleted] AS [Deleted], 
            [Extent1].[UserName] AS [UserName], 
            [Extent1].[SICard] AS [SICard], 
            [Extent1].[Comment] AS [Comment], 
            [Extent1].[Address] AS [Address], 
            [Extent2].[ID] AS [ID2], 
            [Extent2].[ContactInformationID] AS [ContactInformationID], 
            [Extent2].[LastUpdate] AS [LastUpdate1], 
            [Extent2].[ShowToPublic] AS [ShowToPublic], 
            [Extent2].[Mail] AS [Mail], 
            [Extent2].[ReciveMailOnThis] AS [ReciveMailOnThis], 
            [Extent2].[MailCategory] AS [MailCategory], 
            CAST(NULL AS int) AS [C2], 
            CAST(NULL AS int) AS [C3], 
            CAST(NULL AS datetime2) AS [C4], 
            CAST(NULL AS varchar(1)) AS [C5], 
            CAST(NULL AS bit) AS [C6], 
            CAST(NULL AS int) AS [C7]
            FROM  [dbo].[ContactInformations] AS [Extent1]
            LEFT OUTER JOIN [dbo].[ContactInformationMails] AS [Extent2] ON [Extent1].[ID] = [Extent2].[ContactInformationID]
        UNION ALL
            SELECT 
            2 AS [C1], 
            [Extent3].[ID] AS [ID], 
            [Extent3].[ID] AS [ID1], 
            [Extent3].[Lastname] AS [Lastname], 
            [Extent3].[Firstname] AS [Firstname], 
            [Extent3].[Birthdate] AS [Birthdate], 
            [Extent3].[LastUpdate] AS [LastUpdate], 
            [Extent3].[Gender] AS [Gender], 
            [Extent3].[OnlyOwnerCanEdit] AS [OnlyOwnerCanEdit], 
            [Extent3].[MemberType] AS [MemberType], 
            [Extent3].[Deleted] AS [Deleted], 
            [Extent3].[UserName] AS [UserName], 
            [Extent3].[SICard] AS [SICard], 
            [Extent3].[Comment] AS [Comment], 
            [Extent3].[Address] AS [Address], 
            CAST(NULL AS int) AS [C2], 
            CAST(NULL AS int) AS [C3], 
            CAST(NULL AS datetime2) AS [C4], 
            CAST(NULL AS bit) AS [C5], 
            CAST(NULL AS varchar(1)) AS [C6], 
            CAST(NULL AS bit) AS [C7], 
            CAST(NULL AS int) AS [C8], 
            [Extent4].[ID] AS [ID2], 
            [Extent4].[ContactInformationID] AS [ContactInformationID], 
            [Extent4].[LastUpdate] AS [LastUpdate1], 
            [Extent4].[PhoneNumber] AS [PhoneNumber], 
            [Extent4].[ShowToPublic] AS [ShowToPublic], 
            [Extent4].[PhoneType] AS [PhoneType]
            FROM  [dbo].[ContactInformations] AS [Extent3]
            INNER JOIN [dbo].[ContactInformationPhonenumbers] AS [Extent4] ON [Extent3].[ID] = [Extent4].[ContactInformationID]) AS [UnionAll1]
        ORDER BY [UnionAll1].[ID1] ASC, [UnionAll1].[C1] ASC

    Ughhhhh! That is ugly and if you don't understand it I will not blame you.

    What is going on here is that EF joins ContactInformations against ContactInformationPhonenumbers and ContactInformationMails and combines this into one large table.

    Now remember that we cannot have nested tables and that the rows must be uniform in the resultset, which means that for a row returning a phone number we still need to include the columns for the email and the contact.

    Here is an example of my contact data (with all its 28 columns).

    Notice that we are actually loading the ContactInformation 3 times. So for each number or email I load the basic information an extra time. 

    Here is the pitfall with Include(). It will duplicate data in the resultset.

    So Include() does data duplication. How bad is that?

    Let's look at the numbers. But first let's briefly describe what affects the performance of an EF query. This is a simplified list so remember that there are more things in play.

    1. Translating the LINQ to sql (In newer versions of EF this can be cached)
    2. Sending the sql command to Sql Server
    3. Executing the query (compiling the query plan and executing it)
    4. Returning the result set
    5. Binding the data in the DataReader to entities and returning them (Materializing)

    Numbers 2 and 4 are directly dependent on the number of bytes sent over the wire. Number 5 could depend on the size of the result; I have not looked at how this is done in EF. But imagine you were going to parse the 3 rows above into one entity. Not so clear how you would do that, huh? That most likely means a bunch of conditionals are needed to do it, which leads to reduced performance.

    The test data set and queries

    That's the model. This dataset is quite small, with 692 contact informations, 540 emails and 897 phone numbers.
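
    Pieced together from the column names in the generated SQL above, the model looks roughly like this (the property types are guesses):

    public class ContactInformation
    {
        public int ID { get; set; }
        public string Firstname { get; set; }
        public string Lastname { get; set; }
        public string Address { get; set; }
        // ...plus Birthdate, LastUpdate, Gender, MemberType, UserName, SICard, Comment and a few more
        public List<ContactInformationMail> ContactInformationMails { get; set; }
        public List<ContactInformationPhonenumber> ContactInformationPhonenumbers { get; set; }
    }

    public class ContactInformationMail
    {
        public int ID { get; set; }
        public int ContactInformationID { get; set; }
        public string Mail { get; set; }
        // ...plus LastUpdate, ShowToPublic, ReciveMailOnThis, MailCategory
    }

    public class ContactInformationPhonenumber
    {
        public int ID { get; set; }
        public int ContactInformationID { get; set; }
        public string PhoneNumber { get; set; }
        // ...plus LastUpdate, ShowToPublic, PhoneType
    }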

    The code for the test is below. There are three methods. 

    • First one is a simple Include().
    • Then I load all 3 collections and merge them in memory. 
    • The last one also loads all 3 collections separately, but does it in a single query with multiple resultsets.

    NOTE: This example contains no filtering. For the tests where I add filtering I simply add a where clause to each query.

    private static void TestContacts()
    {
        Benchmark("Load with include", () =>
        {
            using (var db = new gmokContext())
            {
                var contacts = db.ContactInformations.Include(x => x.ContactInformationMails).Include(x => x.ContactInformationPhonenumbers).AsNoTracking().ToList();
            }
        });
        Benchmark("Load separate and fix", () =>
        {
            using (var db = new gmokContext())
            {
                var contacts = db.ContactInformations.AsNoTracking().ToList();
                var phones = db.ContactInformationPhonenumbers.AsNoTracking().ToLookup(x => x.ContactInformationID);
                var mails = db.ContactInformationMails.AsNoTracking().ToLookup(x => x.ContactInformationID);
                foreach (var c in contacts)
                {
                    c.ContactInformationMails = mails.Contains(c.ID) ? mails[c.ID].ToList() : new List<ContactInformationMail>();
                    c.ContactInformationPhonenumbers = phones.Contains(c.ID) ? phones[c.ID].ToList() : new List<ContactInformationPhonenumber>();
                }
            }
        });
        Benchmark("Load separate as batch", () =>
        {
            using (var db = new gmokContext())
            {
                db.Database.Connection.Open();
                var cmd = db.Database.Connection.CreateCommand();
                Benchmark("generate sql", () =>
                {
                    var csql = db.ContactInformations.ToString();
                    var psql = db.ContactInformationPhonenumbers.ToString();
                    var msql = db.ContactInformationMails.ToString();
                    cmd.CommandText =
                        csql + ";" + Environment.NewLine + Environment.NewLine +
                        psql + ";" + Environment.NewLine + Environment.NewLine +
                        msql + ";";
                });
                // Run the batched query and read the three resultsets
                var reader = cmd.ExecuteReader();
                var contacts = ((IObjectContextAdapter)db).ObjectContext
                    .Translate<ContactInformation>(reader, "ContactInformations", MergeOption.NoTracking).ToList();
                reader.NextResult();
                var phones = ((IObjectContextAdapter)db).ObjectContext
                    .Translate<ContactInformationPhonenumber>(reader, "ContactInformationPhonenumbers", MergeOption.NoTracking).ToLookup(c => c.ContactInformationID);
                reader.NextResult();
                var mails = ((IObjectContextAdapter)db).ObjectContext
                    .Translate<ContactInformationMail>(reader, "ContactInformationMails", MergeOption.NoTracking).ToLookup(c => c.ContactInformationID);
                foreach (var c in contacts)
                {
                    c.ContactInformationMails = mails.Contains(c.ID) ? mails[c.ID].ToList() : new List<ContactInformationMail>();
                    c.ContactInformationPhonenumbers = phones.Contains(c.ID) ? phones[c.ID].ToList() : new List<ContactInformationPhonenumber>();
                }
            }
        });
    }



    The overview

    I did my small tests on two different machines. One is my desktop, where the SQL Server is located (i7-3820, 64gb ram, 240gb Intel SSD 520 series).

    The other one is my ultrabook, an ASUS Zenbook Prime UX31A, which is connected over wlan and has about 1ms ping to the desktop.

    The load all gives 692 contacts. Filtered by gender it's 388 and filtered by name it's 5 contacts.



    Results on wlan

    Method / iteration   Load all   Filtered by gender   Filtered by name
    Include / 1          287ms      68ms                 42ms
    Include / 2          65ms       45ms                 9ms
    Include / 3          59ms       36ms                 8ms
    3 queries / 1        81ms       66ms                 46ms
    3 queries / 2        27ms       24ms                 10ms
    3 queries / 3        26ms       20ms                 10ms
    MultiQuery / 1       75ms       23ms                 25ms
    MultiQuery / 2       22ms       19ms                 8ms
    MultiQuery / 3       27ms       15ms                 9ms


    Results on localhost

    Method / iteration   Load all   Filtered by gender   Filtered by name
    Include / 1          223ms      49ms                 20ms
    Include / 2          44ms       25ms                 5ms
    Include / 3          37ms       22ms                 4ms
    3 queries / 1        43ms       44ms                 30ms
    3 queries / 2        6ms        5ms                  6ms
    3 queries / 3        5ms        5ms                  5ms
    MultiQuery / 1       54ms       9ms                  12ms
    MultiQuery / 2       6ms        4ms                  5ms
    MultiQuery / 3       9ms        4ms                  5ms


    As you can see when the dataset is large and the query is cheap Include is much slower.

    Include will close in when the returned data decreases and the filter in the query gets more expensive. (The Include does the filtering once and then joins, while the other methods will filter on the same criteria for each query.)

    Going deeper

    Let's look at what the client statistics in Management Studio say about the queries.

    Trial 3 is the 3 separate queries approach and Trial 4 is with includes.

    As you can see we almost double the data returned when using include even though the actual number of rows is lower. 


    Hopefully this will make it clear what Include() actually does and how that can impact performance.

    One thing we should note is that the data set I used for the example is very far from a worst case scenario. There are 2 major factors that help Include on this dataset.

    1) On average each contact has about one email and one phone number, which means that on average we will get 2 rows per contact, which isn't that much duplication.

    2) Each ContactInformation only contains "small" data. Remember that the ContactInformation is duplicated for each phone and email.

    A situation where I expect it to be much worse is if we load a BlogPost with Include() on Tags. That means we will load the BlogPost once for each tag. Remember that the BlogPost probably has a long text which would be duplicated.
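
    As a sketch (these entities aren't part of the demo project, just an illustration):

    // Every returned row repeats the full BlogPost, including its long text column, once per tag
    var posts = db.BlogPosts
        .Include(p => p.Tags)
        .AsNoTracking()
        .ToList();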

    I did some tests on data looking like that, which led to this tweet:


    Also it's nice to see that when we have small datasets the Include is actually on par with the other methods. This means that if you only return a few rows and the root entity doesn't contain heavy data you can keep using Include, which is much faster to code.

    It's also interesting to see how little I gain by batching the calls into one call compared to 3 different queries. When we have connection pooling, making a simple query is very cheap, which is good to know because multiple simple queries might be much more performant than one complex query.

    So how to load the data efficiently when Include() is slow?

    I think the methods I used in my example are the best way. If you have any other ways I would be happy to hear about them. I'm thinking about adding some utility functions to https://github.com/MikaelEliasson/EntityFramework.Utilities to make it easier to do the fixup, but I'll have to see what I can come up with.

    A stored procedure with multiple result sets would be the fastest, I expect, but I haven't tried it. It's only another variation on the same technique I used anyway, so I don't expect much better performance, but a small difference is likely.
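
    The only real change from the batched version above would be pointing the command at a procedure instead of the concatenated SQL. The procedure name below is made up, and it would have to return the three result sets in the same order; this runs inside the same using block as before.

    db.Database.Connection.Open();
    var cmd = db.Database.Connection.CreateCommand();
    cmd.CommandText = "ContactsWithPhonesAndMails"; // hypothetical stored procedure
    cmd.CommandType = CommandType.StoredProcedure;
    var reader = cmd.ExecuteReader();
    // ...then Translate + NextResult exactly as in the "Load separate as batch" method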

    A few words about how to think about performance

    I'm a strong believer in constantly measuring the performance, even from the first day of development. Some people will call it premature optimization but I don't agree. I don't optimize everything, but always measuring allows me to make an informed decision. From the second I first run a query I know how fast it is (in development, though). 

    The example from above is actually from one of my projects done several years ago (I used include and haven't changed it yet). It's on an old version of EF and query time including parsing is about 250 ms in best case on the production server.

    Nowadays I know from experience that a simple query takes a couple of ms to run, most likely 5-15ms on the data sizes we talk about here and the server in question. It's very easy to see I would benefit from doing it as 3 queries and end up at about 50ms in the worst case on that version of EF. 

    For me "premature optimization" is not about when in the development cycle you do it, but rather about whether the choice is informed or not. If you expect it to be a problem later on, there is no better time to fix the code than when it's fresh in your memory.

  • 30okt

    Combining all files in folder with ps

    This is a simple way I found to combine all text files in a folder into a single file using powershell.

    In my case I had several exported sql files that I wanted to combine into a single file. First, navigate to the folder with powershell:


    PS > $query = Get-ChildItem

    PS > $content = Get-Content $query

    PS > Add-Content combined.sql $content


  • 8maj

    Formatting xml with powershell

    Ever had to work with an xml file where the xml was minified to one line? Fear not, only two powershell commands are needed to fix it.

    Sometimes XML files are minified by removing whitespace, which might be good for space but a real pain in the ass if you need to read it manually. Furthermore, most text editors are made to work efficiently with documents that have many rows, but long rows are another story.

    Luckily it takes only a few seconds to fix it if you know your powershell.


    [xml]$xml = Get-Content resultlist.xml
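    # presumably the second command the intro refers to: Save writes the XML back with indentation (pick any output path)
    $xml.Save("resultlist.formatted.xml")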

    That's actually all you need. Happy formatting!
