C#’s Parallel.Invoke is [Insert Hyperbole Here]

Reading Time: 2 minutes

It’s simple – we can’t make processors go any faster. But we can add more cores.
The OS can schedule different processes to different cores, but to take full advantage of the potential we need to be writing applications that run on multiple cores.
Unsurprisingly there’s a lot of investment in making this easier for programmers – to the point that it’s now easier than falling off a log…

Parallel.Invoke(
    () =>
    {
        if (!GetSomethingFromTheDb(out something))
            something = MakeUpSomethingFromSomewhere();
    },
    () =>
    {
        complexResult = SomeComplexOperation();
    },
    () =>
    {
        someObjectGraph = ReadAndParseSomeFile(fileName);
    });
SomeReallyComplexThing(something, complexResult, someObjectGraph);

The individual actions will be run in parallel – or, to be more precise, may be run in parallel. The task scheduler decides whether to actually run them in parallel, and with what degree of parallelism.
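
If you want a say in that, Parallel.Invoke has an overload that takes a ParallelOptions – a minimal sketch, reusing a couple of the actions from above;

Parallel.Invoke(
    new ParallelOptions { MaxDegreeOfParallelism = 2 },  // run at most 2 actions concurrently
    () => { complexResult = SomeComplexOperation(); },
    () => { someObjectGraph = ReadAndParseSomeFile(fileName); });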

Either way, by the time we arrive at SomeReallyComplexThing all of the preceding actions are guaranteed to have completed.

That’s ace. It’s a lot easier than messing about with threads.

Even before the Parallel library was around it wasn’t actually difficult, but you needed some lateral thinking…

Task[] tasks = new Task[] {
    Task.Factory.StartNew(()=>
    {
        if (!GetSomethingFromTheDb(out something))
            something = MakeUpSomethingFromSomewhere();
    }),
    Task.Factory.StartNew(()=>
    {
        complexResult = SomeComplexOperation();
    }),
    Task.Factory.StartNew(()=>
    {
        someObjectGraph = ReadAndParseSomeFile(fileName);
    })
};
Task.WaitAll(tasks);
SomeReallyComplexThing(something, complexResult, someObjectGraph);

Ok, I admit, I’m simplifying. You still need to understand the basic problems of concurrent programming but the mechanics of writing code that can take advantage of parallelism are now trivial.

Shell Attacks!

Reading Time: < 1 minute

I’ve been having some networking issues recently so I was watching the router logs on my main gateway. I was genuinely amazed by the number of attempts to hack my ssh server; every few seconds I saw another line telling me that the firewall had rejected another attempted hack.

So I started pondering what I could do to try to stop this. Perhaps I could write a little script that works out who the appropriate admin is and emails them? Yes, but I think that might cause me rather a lot of trouble. I can’t exactly send the logs to my ISP or to the Police – simply attempting to connect to an ssh server is hardly strong evidence of nefarious intent.

So instead I thought I’d just publish them – each day’s log is uploaded to the downloads section as a text file.
If I get bored and fancy writing some php then I’ll start stuffing them into a database so we can run some basic analysis on them – produce graphs and so on.

The “Within” Pattern in C#

Reading Time: 2 minutes

Here’s a little pattern that I use a lot. It happens quite often that I need to perform a set of tasks in C# that are different, but all need the same set up and tear down.

Now I could use two methods, e.g.

var someResource = PerformSetUp(someParameters);
try
{
    someTask(someResource);
    if(someCondition)
        reportSomeError();
    anotherTask();
    someResource.Conclude("It was aliens");
}
finally
{
    PerformTearDown(someResource);
}

That’s a totally reasonable way of doing it. There are a few disadvantages though: it splits up the allocation and release of resources, which can lead to tracking problems, and it’s a pain if you’ve got several things that need to be set up and torn down. You either end up with a lot of local variables flying around all the time or you make an object to hold the resources you need.

So I tend to use a single method to do both the set up and the tear down, and pass into it an Action.

So the above example would become;


WithinSetUpAndTearDown((someResource)=>
{
    someTask(someResource);
    if(someCondition)
        reportSomeError();
    anotherTask();
    someResource.Conclude("It was aliens");
});

WithinSetUpAndTearDown is defined thus;

void WithinSetUpAndTearDown(Action<object> action)
{
    Handle someHandle = Handle.Open();   // illustrative: acquire whatever needs cleaning up later
    try
    {
        using(object someResource = someDisposableAllocation(someHandle))
        {
            action(someResource);
        }
    }
    finally
    {
        someHandle.Close();
    }
}

It’s sort of a user-defined version of the using keyword. Of course it has other uses too: there’s one particular family of applications that all share a particularly complex set of error handling conditions, and I’ve used this pattern there too.

There is one application where I use this pattern rather a lot. C#’s “lock” keyword is really handy if you’re operating in a multithreaded environment, but its big drawback is that there’s no way of specifying a timeout. This is important for deadlock resolution – a deadlock always means there’s a bug somewhere, but if your concurrent operations just sit there deadlocked it may cause some pretty serious problems that you may not be aware of for some time. So you want to know. Rather than use the “lock” keyword, then, you can use this…

public static void WithObjectLocked(object lockMe, int waitMsecs, Action action)
{
    if(null == lockMe)
        throw new ArgumentNullException("lockMe");
    if(0 >= waitMsecs)
        throw new ArgumentException("waitMsecs must be greater than zero");

    bool gotLock = Monitor.TryEnter(lockMe, waitMsecs);
    if (!gotLock)
        throw new TimeoutException("Timeout of [" + waitMsecs.ToString() + "] milliseconds exceeded");
    try
    {
        action();
    }
    finally
    {
        Monitor.Exit(lockMe);
    }
}

Its functionality is identical to that of “lock”, but it will throw a “TimeoutException” if the lock cannot be obtained within the specified time. This at least gives you a fighting chance of finding operations that are taking far too long or are deadlocked.
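
Usage is then much like the lock keyword itself; a minimal sketch (the lock object and the body here are illustrative);

object syncRoot = new object();

// Throws TimeoutException if the lock can't be acquired within 5 seconds.
WithObjectLocked(syncRoot, 5000, () =>
{
    Console.WriteLine("Got the lock; do the shared-state work here");
});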

The Tube That Rocks the Remote Working World

Reading Time: 2 minutes

Moose and JRock of UK national (and international) radio station Team Rock Radio had a crazy idea. They were going to broadcast the breakfast show whilst on the Tube.
To be precise they were going to travel from their bunker in a secret location under an anonymous office building in Central London to Westfield Shopping Centre in Stratford using the London Underground system.
They planned to travel during the songs and do the links live from the stations using the WiFi.

Today they’re doing it. In fact they’ve just left Liverpool Street Station. This might seem like just another silly radio stunt but actually there’s a very serious point behind it for them, and moreover for a good proportion of our working population.

Until recently it’s been really difficult and expensive to arrange communications of the quality required to broadcast a live radio show.

So Moose and JRock wanted to prove – or perhaps really just test – the fact that this is no longer the case.

The programme isn’t over yet, but thus far it’s worked really rather well. There have been a few audio glitches, but nothing serious.

This proves to the Team Rock organisation that they can be much more responsive in covering events and festivals and indeed broadcast from anywhere in the world as long as they have a half reasonable Internet connection – which today isn’t a difficult thing to obtain.

The wider implications though are significant. I’ve been a remote worker for a little over 3 years so I know that technology has improved such that remote working (and indeed home working) is entirely possible for Software Developers like me. I also happen to work for the Computer Science Department at The University of Hull so they understand it too.

Not everyone happens to be a technologist or indeed work for a technical employer. What Moose and JRock have done is to demonstrate very publicly and very effectively to a much wider audience that the technology for remote working is now there.

I really hope this has the impact that it should.

PS: They’ve now added the show to their On-Demand section.

Using Rank In Case in SQL Server 2005, 2008, 2012, 2014 etc.

Reading Time: < 1 minute

Here’s a quick tip for a Friday.

I have a table that records the history of certain items. I wanted to get a list of the updates between two dates – that’s easy enough. The problem was that, for each item, I also wanted the last change before the start of the time period.

A standard way to get the latest update for each item in a history table is to use a nested pair of queries, the inner one using the RANK function to provide an ordering. The outer query then selects all the records with a RANK of 1, as these are the latest (or the earliest, or maximum, or minimum, depending on the order specified).

e.g.

select * from (
    select ml.*,rank() over(partition by itemID order by [timeColumn] desc) rk
    from someHistoricTable ml ) T1
    where rk=1

I started wondering if I could make the inner query return the ranking for only part of its result set.

What I wanted was a resultset that gave me all the records after a given date and a ranked list of those prior to it. I wondered if I could use the RANK function within a CASE expression.
Yes, apparently you can…

select * from (
        select ml.*,
        case
            when [timeColumn] < @startTime
                then rank() over(partition by [itemID] order by [timeColumn] desc)
            else 1
            end rk
        from someHistoricTable ml
        where [timeColumn] < @stopTime ) T1
    where rk=1 order by [timeColumn] desc

The case / rank bit splits the time: it ranks records that were earlier than @startTime, whereas if they’re later it just returns 1. The outer select then just takes all rows that have an ‘rk’ of 1, these being the records that are either within the time period or the highest-ranked row before it.

Coding Exercise Silliness

Reading Time: 4 minutes

It happens quite often that we assign a simple programming task for one reason or another. One thing we find is that our victims often look for the “right” answer when in reality there isn’t one. Sure, some answers are better than others, but even that can be subjective.

So whilst a group were busy scribbling away I thought I’d demonstrate the point by doing the same problem… in as many different ways as I could think of.

The Task

The task we set them was;

Given the string “We Wish You A Merry Xmas”, output a list of the unique letters.

So I put the string into a string variable called “leString” and got to work;
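
string leString = "We Wish You A Merry Xmas";

My favourite implementation is this;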

Console.WriteLine(String.Concat(leString.ToLower()
    .Where(x => char.IsLetter(x)).Distinct()));

If you understand a little about LINQ, particularly the extension method syntax, then this is about as clear and concise as you can get. It’s possibly not the fastest way of doing it, but speed of execution is not the only concern.

I then tried to guess how I thought a junior programmer would approach it. The result is this.

HashSet<char> alreadyThere = new HashSet<char>();
foreach (char c in leString)
{
    char lc = char.ToLower(c);
    if (char.IsLetter(c) && !alreadyThere.Contains(lc))
    {
        alreadyThere.Add(lc);
        Console.Write(lc);
    }
}
Console.WriteLine();

This is perfectly valid and indeed relatively quick. I don’t think it’s quite as obvious what it’s doing as using .Distinct(), but if you’re not a LINQ Ninja then it’s OK.

I was right in that it’s this kind of approach that most of them went for, although we saw arrays, lists and dictionaries used to store the list of chars. Nobody realised that you could output the chars as you went, so the real results looked more like this;

List<char> alreadyThere = new List<char>();
foreach (char c in leString)
{
    char lc = char.ToLower(c);
    if (char.IsLetter(c) && !alreadyThere.Contains(lc))
        alreadyThere.Add(lc);
}
foreach (char c in alreadyThere)
    Console.Write(c);
Console.WriteLine();

There’s a fundamentally different approach that we can take here. Essentially we want to remove duplicates. If we sort the string then all the duplicates will be next to each other. So all we have to do is iterate through the sorted string and avoid outputting the same char twice.
This is pretty easy to implement.

char lastchar = (char)0;
foreach (char c in leString.ToLower()
    .Where(x => char.IsLetter(x)).OrderBy(n => n))
{
    if (lastchar != c)
        Console.Write(c);
    lastchar = c;
}
Console.WriteLine();

It’s a bit of a nasty mix of LINQ and more traditional iteration but I still think it’s pretty clear. We take the string, eliminate anything that isn’t a letter and then sort the result.
We then iterate over the result. If this iteration’s char is different from the last iteration’s char then we output it. If it’s the same, we don’t.

You can also implement this using a for loop, but you need to make the sorted result into an array or list so that you can get a count – there’s a sketch below. This is all rather convoluted, so the foreach is preferable in my opinion.
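
A minimal sketch of the for loop version, assuming the same leString as before;

char[] sorted = leString.ToLower()
    .Where(x => char.IsLetter(x)).OrderBy(n => n).ToArray();
for (int i = 0; i < sorted.Length; i++)
{
    // only output a char if it differs from its predecessor in the sorted array
    if (i == 0 || sorted[i] != sorted[i - 1])
        Console.Write(sorted[i]);
}
Console.WriteLine();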

You can implement it in LINQ too. The problem is how to assign “lastchar” after it’s tested. I’ve used a nasty little hack here – in C# an assignment is itself an expression whose value is the value assigned. So lastchar = p actually returns the value of p as well as assigning it to lastchar – hence the final .Select.

char lastchar = (char)0;
Console.WriteLine(String.Concat(leString.ToLower()
    .OrderBy(n => n)
    .Where(x => char.IsLetter(x) && lastchar != x)
    .Select(p => lastchar = p)));

This isn’t good code – it’s not really clear what it’s doing.

Here’s another slightly leftfield LINQ based approach. In C# the ++ operator actually returns a value: if it’s used postfix (i.e. i++) the value returned is what i was before it was incremented; if it’s used prefix (i.e. ++i) then it’s the value after it’s incremented.
We can use this little gem to index into a counter array…

int[] hits = new int[26];
Console.WriteLine(String.Concat(leString.ToLower()
    .Where(x => char.IsLetter(x) && 0 == hits[(byte)x - (byte)'a']++)));

This assumes that the string is 7-bit ASCII. It goes horribly wrong if it’s Unicode. This should be a pretty quick implementation though.

Here’s another one-liner (if you ignore the declaration of a HashSet). This works in a similar way to the previous example but is Unicode safe. A HashSet is a collection of unique values; duplicates are not allowed. The HashSet.Add() method returns false if the entity you’re trying to add is already in the collection. So we can use that in the same way we used the array and the ++ operator above…

HashSet<char> alreadyThere = new HashSet<char>();
Console.WriteLine(String.Concat(leString.ToLower()
    .Where(x => char.IsLetter(x) && alreadyThere.Add(x))));

If you’re a fully paid up, card carrying lunatic you can of course do it parallel…

ConcurrentDictionary<char, int> alreadyThereDict = new ConcurrentDictionary<char, int>();
Console.WriteLine(String.Concat(leString.AsParallel()
    .Select(c=>char.ToLower(c))
    .Where(x => char.IsLetter(x) && alreadyThereDict.TryAdd(x,1))));

Rather irritatingly there’s no such thing as a ConcurrentHashSet so we have to use a ConcurrentDictionary, but the approach is identical.

Another different approach is this.

Console.WriteLine(String.Concat(leString.ToLower()
    .Intersect("abcdefghijklmnopqrstuvwxyz")));

The result of intersecting a string with a set that contains all possible characters must be a set containing the characters that are in the string. Clearly this isn’t Unicode safe either, although it’s better than the array indexing example because this won’t blow up – it just won’t work with Unicode.
The advantage of this approach however is that it very clearly defines the dictionary of permissible characters. So if you wanted to define a slightly different dictionary you could – say, consonants only.
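
For instance (the dictionary string here is purely illustrative);

Console.WriteLine(String.Concat(leString.ToLower()
    .Intersect("bcdfghjklmnpqrstvwxyz")));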

Finally we return to the original method, but we do it slightly differently. Most of the functionality we think of as LINQ is provided through extension methods defined in the “Enumerable” class.
You don’t need to call an extension method as an extension method at all – you can in fact call it as a static method of its defining class. Thus we can perform the entire operation in a functional manner.

Console.WriteLine(
  String.Concat(
    Enumerable.Distinct(
      Enumerable.Where(
        Enumerable.Select(leString, char.ToLower), 
      char.IsLetter))));

If you’re not familiar with this style, basically you read it from the middle out. So we take “leString” and call Enumerable.Select on it, passing char.ToLower as the selector. The result of this operation is then used as the first parameter to the call to Enumerable.Where. The other parameter to the Where call is the predicate char.IsLetter. The result of this operation is then used as the first parameter to a call to Enumerable.Distinct, and the result of that goes to String.Concat.
As with many things in programming it’s horses for courses, and this is not the right horse. It’s useful however to know that it can be done like this.

Over to you…

Naturally this isn’t an exhaustive list – it’s just the ones I happened to think of. If you can think of another way, let me know!

That Sinking Feeling…

Reading Time: 7 minutes

“WE MUST HAVE THIS FIXED ASAP!” yells the customer’s head of ICT. The application stopped working some time last week and your support department haven’t managed to fix it yet.

“OK, OK,” you say, knowing that you will come to regret these words, “I’ll VPN in first thing tomorrow and sort it out. Will you send me the connection details?”

“I’LL SEND THE CONNECTION DETAILS IMMEDIATELY I GET OFF THE PHONE AND YOU CAN CALL WHEN YOU WANT TO CONNECT”, replies their head of ICT, because heads of ICT ALWAYS SPEAK IN CAPITALS.

08:00 the next day, no email. Send an email to the customer’s head of ICT reminding them that you need the VPN details.
No response. OK, fair enough, a lot of people don’t get in ’til 9am.

09:30 still no response. Email a bunch of other guys at the customer’s site asking if anyone knows the VPN details and is actually authorised to give them to you. Get a response saying that only the head of ICT is and he’s in a meeting ’til 10:30.

Wait ’til 10:45, no response. Try to find a phone number for the customer’s head of ICT. Apparently he doesn’t have a phone. Email him again. 11:15 get a rather terse email containing the connection details and the username and password in plain text.

Try them, they don’t work. No phone number is provided in the email. Try to find a phone number of anyone in the organisation who might actually be able to help. Find an old mobile number that now belongs to a photocopier salesperson who can’t really talk because she’s on the motorway, but she could do you a really good deal on a pre-owned Canon IRC3380i if you call back later.

Google the customer to try to find a switchboard number. End up in a multilevel call menu system where the only human option is to speak to a “customer relationship manager” about your account. Try that on the offchance. He thinks you’re from their ICT department and has no idea what you’re talking about.

11:30 Remember that you may have stored some old details for the customer’s other site; log in to the password vault. Find some that are 4 years old. Try them anyway; be genuinely surprised when they work. Make a mental note to inform the customer that they seriously need to do a security review.

Connect to the server the head of ICT said was the right one. Credentials don’t work. Neither do the stored ones. Email the head of ICT and everyone else you can think of who might possibly know how to log on to the server. Get no response.

12:00 Remember somewhere in the back of your mind that you once configured SQL Server authentication for one of their systems and there’s just a chance that they re-used the username and password from a domain account.

Log into the secure store and search the archived configs. Find a likely candidate and try the connectionstring details.

12:15 Scratch the head of ICT’s name into the side of a giant security rocket and direct it at his arse when said credentials actually work.

Spend half an hour setting up the diagnostics. Scratch head as to why nothing’s happening.

13:00 deduce that it’s the wrong server. Email the head of ICT and go for lunch.

13:30 return to an email from the head of ICT who is now working from home and doesn’t have the details to hand but has emailed an unnamed member of his team asking them to send them on.

Log back in to the wrong server. Expand the network section and have a look at the machines listed there to see if any of them look likely. No, but it does spawn an idea. Look back at the connectionstring from earlier and note that the Data Source is a name that ends in -SQL. Try connecting to the same name with -GW on the end.
No dice. Try various combinations.

13:47 Eventually try -SVR4 and it works. Try the credentials from the connectionstring, they don’t work. Try the original details the head of ICT sent in the email, they work, but Remote Desktop is not enabled for that user. Try VNC with the same credentials, just in case. Nope.

Email the one person who responded earlier, carefully include a request for a phone number.

14:15 Get a response that contains no phone number but which politely explains that although the person can’t make the change, Jack will when he gets back from lunch at 14:30.

14:45 Try logging in again. Still not enabled. Take 6 guesses at what Jack’s email address might be and email them all.

14:53 Field a call from the Customer’s Operations Director who is extremely angry about the system still not working and about your bungling incompetence in not being able to sort out your own software. Explain that there are some technical problems with the VPN connection that you’re trying to resolve right now. Try to persuade him that half-hourly updates are not going to help anything and in fact are only going to slow things down.

15:07 Get an email from Jack confirming that he’s enabled the user for Remote Desktop. Log in to the server. Go to the application directory under Program Files and check the version numbers. Not only are they wrong they’re actually a mixture of the past 3 releases. Take a backup and install the latest versions.

Try to start the service. No dice. Scratch head. Just double-check the service executable location. Find it’s actually in “C:\Temp\Barry’s Memory Stick\From Old Server”. Start to get rather concerned that the customer’s assurances that “No we haven’t changed anything” omitted one small detail – the fact that they installed the app on a totally different server.

Look in the event log at the error that’s actually being reported – it transpires that none of the well documented pre-requisites for installing your application have actually been installed. Attempt to download them only to find that the server can’t connect to the Internet.

Start downloading them locally instead and pushing them one-by-one to the remote server.

15:35 Field another call from the Customer’s Operations Director who is extremely angry that he hasn’t got the half-hourly update that you talked him out of and that the system still isn’t working. Explain to him that escalating this won’t help because you are the most senior person that could possibly work on this and it’s your number 1 priority, the only thing it will do is create more admin load for everyone including him. Pretend to try transferring the call to your own CTO’s phone, pretend she’s not answering.

15:55 Ring your own CTO to let her know the situation and to expect an angry call imminently.

16:03 Final prerequisite installed; the service now starts, but it can’t connect to the database. Check the connection string. Note that it’s trying to connect to your own testing server – this is clearly the default config that someone has carelessly copied over the site config. Look at the other copy of the software and find a comment in the config that says it’s from Barry’s test system. There appear to be no backups of any version of the config.
Search the entire filesystem for anything that might be a past version of a valid config whilst taking a few random guesses at what the SQL Server might be called.
Email Jack asking if he knows what the configuration should be or at least where the old server is.

16:19 Answer a call from your own CTO who would just like to confirm that you really are doing everything you can. She tells you not to worry about “that email”.

16:21 Receive email from your CTO to their Operations Director into which you’re Bcc’d assuring him that his problem is “our top priority” and that disciplinary proceedings have been started against the employee he spoke to earlier for his “bad attitude” and that the situation is now being dealt with by “someone more senior”.

16:27 Get a call from a random junior techie at the customer’s site who has no idea what you’re talking about but has been told to sort it out because the Operations Director is very angry and wants something done. Note the phone number he’s called from. He promises to find the information you need, ensure you get his name.

16:47 No emails or phone calls, so call the number back. The person answering the phone is not the person you spoke to earlier and doesn’t know who he is, but thinks he may be “that new guy from ICT who was here earlier”. Try to get anyone’s actual phone number out of the person you’re speaking to, who says he’d gladly give them out if he actually knew any of them. He does, however, transfer you to the Helpdesk, where the guy you spoke to isn’t in the office, and it’s a Thursday so Jack has gone to pick his kids up, but Barry’s about if you want to talk to him.

16:52 Barry sheepishly explains that it is actually the same server but that it’s got an entirely new RAID array because nobody noticed when the first disk in the previous array died, but they noticed really rather a lot when the second one did and they were hoping that they could get it fixed before anyone – especially the head of ICT – noticed. They did find a backup, but after the last security scare the Operations Director hired an external security consultant to come in and do an audit and he said that this server should be firewalled from the main network and nobody realised that this would mean that the backups stopped working and nobody noticed that either. So the backup was from just after the OS was installed and they tried to work out how the app should be configured.

Barry is fairly sure which server the database is on, but he doesn’t know which actual database because they’re all looked after by Dave who’s the DBA and he’s on long term sick leave with a major case of stress.

16:57 Talk Barry through SQL Server Management Studio and find a database that looks like a good candidate and has recent entries.

16:59 Log on to the remote server and enter the connection string into the config. Start the Services snap-in and try to click the start button. No reaction. A few seconds later Remote Desktop says the connection is lost and it’s trying to reconnect. Realise that it’s exactly 17:00 and vaguely remember something about “office hours only” in the support agreement. Check the VPN; it’s down and won’t reconnect.

17:03 Email your CTO with a progress update, casually scan Jobserve.

17:10 attempt to shut down laptop to find 12 Windows Updates waiting.

17:13 ignore the incoming call from their Operations Director, leave the laptop where it is and go to the pub.

At The End of the Day

Fortunately this wasn’t a real day, rather it’s an amalgamation of things that have genuinely happened to me either trying to work on a remote site or in some cases when I’ve actually been there.

My favourite example of a customer having “not changed anything” is in here, but it’s somewhat watered down from the reality. They’d had a security audit and, much like in the story, had added a firewall between two parts of their network. They’d also deleted a bunch of users from the database where they didn’t know what those users were for.

What they’d done was to revoke the server’s access to its database and put in place a firewall that not only prevented the clients from accessing the server but stopped the server accessing the external web service that it needed.

Apparently they definitely hadn’t changed anything and the application suddenly stopped working so therefore it couldn’t have been their fault.

ISAM Insanity

Reading Time: 3 minutes

SQL Servers are great general purpose tools. They’re popular because they perform OK at just about everything you could want to do with data.

If you want to do something specialised however then there are almost always better techniques. Recently I needed to get medium-sized sets of records, identified by key, dispersed at random over an essentially read-only data set.

MS SQL Server is OK at this, but there are a bunch of products out there that are far better at this task. At a basic level though it’s not actually a difficult thing to do. It’s just a key-value lookup after all. So I wondered if in a few lines of code I could out-perform SQL Server for this one specific task.

Essentially all I did was to dig up the ISAM file. All that really means is that the main data is serialized to a file; all we store in memory is the ID (that we search for) and its location within the file. I then implemented a balanced B-tree to store the relationship between the ID and the file location. You could use a Dictionary<Key,Value> for that instead; it would be a bit slower but it would work. You really don’t need to write a lot of code to implement this.
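
As a minimal sketch of the shape of the thing (the names here are illustrative, not the real code);

using System;
using System.Collections.Generic;
using System.IO;

// Sketch: keep only ID -> file offset in memory; a read is then a seek plus a deserialize.
class IsamIndex
{
    private readonly Dictionary<long, long> index = new Dictionary<long, long>();
    private readonly FileStream data;
    private readonly Func<Stream, object> deserialize;  // plug in your preferred serializer

    public IsamIndex(FileStream data, Func<Stream, object> deserialize)
    {
        this.data = data;
        this.deserialize = deserialize;
    }

    public void Add(long id, long offset)
    {
        index[id] = offset;
    }

    public object Read(long id)
    {
        data.Seek(index[id], SeekOrigin.Begin);  // on an SSD this jump is effectively free
        return deserialize(data);
    }
}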

So in ISAM vs. SQL Server I have on my side;

  • A simple, really fast search algorithm
  • No need for IPC
  • No need to parse the query
  • No need to marshal writes, manage locks or indeed worry about concurrency at all

SQL Server has on its side;

  • Huge expertise and experience in optimising queries and indexes
  • Super-advanced caching meaning it doesn’t face the same I/O problems

Before the days of the SSD I’d say attempting this was pretty insane. The main problem with the performance of a mechanical disk is the head seek time. You can get data to and from the disk pretty fast provided that the head is in the right place. Moving the head however is massively slow in comparison. So if you have a large file and you need to move the head several times then you’re going to lose a lot of time against a system that has an effective cache strategy.
Half the battle with an ISAM file was to get the records that were going to be read together physically close to each other in the file so that the heads didn’t move much and the likelihood of a disk cache hit was much higher. It’s this kind of management where complex RDBMSs will always have the upper hand.

However SSDs have no head seek time. It doesn’t matter if you need to read from many locations. This rather levels the playing field.
Interestingly I did notice a small speed improvement when I implemented a parallel ID lookup and sorted the results so the FileStream read in sequence from first in the file to last. I’m guessing that this is down to the read-ahead cache (which is still faster than the real storage media).
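
That optimisation is only a couple of lines; a sketch, assuming ids is the set of requested IDs and index is the ID-to-offset dictionary from earlier;

// Resolve the offsets in parallel, then read the file front to back
// so the read-ahead cache works in our favour.
long[] offsets = ids.AsParallel()
    .Select(id => index[id])
    .OrderBy(offset => offset)
    .ToArray();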

So did I make it? Can a simple technique from yesteryear embarrass today’s latest and greatest? I wrote a little console app to test…

ISAM Performance Failure

Not bad: SQL Server is 15% faster, which is a pretty reasonable result. After all, I couldn’t really expect to beat SQL Server with just a few lines of .NET, right?
Yeah, I’m not beaten that easily. So I cracked open Visual Studio’s Performance Explorer and found an obvious bottleneck. It turns out that .NET’s BinaryFormatter is dog slow. I tossed it out and replaced it with ProtoBuf-Net.

ISAM Triumphs!

Oh yes, that’s much better. This is pretty consistent too. I was using SQL Server LocalDB, which actually seems to be a reasonable analogue. It might be interesting to see it against an enterprise-level installation. I suspect that the additional network latency will cancel out the additional performance and it will perform about the same.

So in other words, yes if you have a very specific task and you know what you’re doing then you can outperform the big boys.

So I suppose I should publish the source – but right now it’s horrible. Its only purpose was to prove that I could do it. I’m intending to tidy it up and post it in a different article.

Oracle: Stop Trying To Trick Me With Ask Toolbar

Reading Time: 2 minutes

Ask Toolbar: Nothing to do With Java

I’m calling Oracle out on this.

I spent the first 5 years of my career working with Oracle and I used to like them. Now though the only regular interaction I have with them is unticking the “would you like to install the Ask toolbar?” box every time there’s a Java update.

It does not give a good impression. Installing software isn’t something that’s particularly well understood by the man on the Clapham Omnibus. There is a certain fear that if you don’t do what you’re told then the software might not work properly. So most people just accept the defaults on the basis that this should work.
There is also the factor that software updates often ping onto your screen when you’re in the middle of something else. So you want to get it out of the way quickly. Again this tends to make people just click the defaults.

If you do that here you end up installing a piece of extra software that has nothing really to do with Java other than it being owned by Oracle. What’s more it’s something you almost certainly don’t want.

I have no problem with vendors advertising their other products in an install sequence and I don’t have any problem actually with them offering to install it.

There are two things that I do consider bad practice.

  1. Failing to make clear that you would be installing something other than (or as well as) the product you initially intended.
  2. Installing the software by default. The default should be not to.

Oracle isn’t particularly bad on the first one: one could say that the dialog looks like you’re accepting the licence agreement for Java, but actually it’s reasonably clear. Oracle are a little marginal here for me; there are far, far worse offenders out there.

The second however is a straight red card. No way should an installer by default install something that the user didn’t ask for.

Please tidy your act up, Oracle.

2008 Called: It Wants Its Time Back

Reading Time: 3 minutes

To Finish First, First You Have To Finish

I’ve just been bitten. It’s one of the standard strap-lines of software development teaching – do it properly first time, don’t hack it and think you can fix it later. When the deadline is looming, however, sometimes we have to make a business decision;

  • Do we drop the feature (or some other feature)?
  • Do we miss the deadline?
  • Do we hack it in the full knowledge that we may be bitten later?

Fortunately for us software developers, technical debt is beginning to be understood by business managers. I explain it to non-techies like this;

I could hack it and get the feature in, but if I do, that might be storing up problems for the future. Next time we want to add a feature in that area, or even fix a bug, we might have to unpick the hack and implement it properly before we even start on new work.

Hacking it now is borrowing time from the future – at some point we’ll have to pay it back and we’ll have to pay a wedge of interest too.

A few years ago I spotted a design problem; we’d assumed that something was always related to a location. In some circumstances however it could actually be related to a vehicle.

We should have changed the database to account for this, but that meant changing pretty much every component of an n-tier system and the deadline was busy whizzing past our ears.

There’s no way we could drop the feature so I had to find a solution. What I noticed was that I could cast a vehicle ID safely into a location ID without any possibility of coincidence. We could then detect in the DAL if it was actually a vehicle and generate a fake location record that actually contained all the information we needed.
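
I can’t reproduce the real code, but the shape of the hack was something like this – entirely illustrative, with reserving the negative range being one easy way to guarantee no coincidence;

// Illustrative only: embed vehicle IDs in the location ID space by
// reserving a range that real location IDs can never occupy.
static long VehicleIdToLocationId(long vehicleId)
{
    return -vehicleId;  // assumes real location IDs are always positive
}

static bool IsActuallyAVehicle(long locationId)
{
    return locationId < 0;
}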

A few years later and now we’re noticing that customers are putting a whole load more records in the location table than we’d initially thought and the indexing performance of the ID type is poor. So we want to change the type to something that indexes better. None of the options available allow us to cast a vehicle ID to a location ID any more.

Since those early days the problem has been exacerbated by the fact that more and more code has been piled in on the assumption that you can safely cast a vehicle ID to a location ID.

So we’ve now got a considerable amount of code to unpick, we’re going to have to do the redesign and reimplementation work that we (perhaps) should have done in the first place, then we can start looking at improving the index performance.

In situations like this it’s easy to look back and curse the past decision. Go into any software development office and you’ll almost certainly find a developer (or several) swearing at the screen, decrying previous decisions as “stupid”, “short-sighted”, etc.

I know I’ve made mistakes, I know there are situations where I’ve chosen to incur technical debt where there were better alternatives available. On this occasion though I made the right decision – we have time to repay this debt now. Back then we most definitely did not.

So “never hack it” isn’t a rule, it’s a guideline. But you have to be aware of the business consequences if you choose not to follow it.