Building a search page, Part 2: The SQL

In Part 1, we established how our search was going to perform and described a sample domain object with some properties we will use in our search criteria. Here, we'll talk about the SQL that we'll be using to generate our search results.

To that end, there is code attached with the first draft of our search engine. It is functional in that it will allow you to specify some search criteria and display the results. It is pretty in that it is not. I'm the Salieri of UI; I can recognize when it's done right but am average (at best) at building them. For this example, I chose not to try.

I haven't addressed any of the client-side improvements mentioned in part one. But the topic of this article will be at the other end, in the database. The search results will be expanded upon in a future article. For now, we're interesting only in getting results.

Final cop-out: I don't usually use SqlDataSource in my web apps. The exception is for samples and demos which, frankly, is the only reason I can think of to use them. Having said that, they rock for samples and demos.

Until relatively recently, I was under the impression that search queries had to be done using hard-coded SQL. There are two main reasons for this.

First is the multi-select listboxes and checkbox lists. With these, a user can select one or more items on which to filter. For example, let's say the user selects four different project types. That means, of course, he or she wants all Moos that have any of those project types attached.

In SQL, this translates to:

      SELECT      *
      FROM        Moo
      WHERE ProjectType IN ( 1, 2, 3, 4 )

It's the IN criteria that presents a problem. We can't pass in a comma-delimited list to replace ( 1, 2, 3, 4 ). SQL Server simply doesn't allow this. It won't throw an error if you do this but trust me on this, it just doesn't work.

The second issue has to do with the part of the search screen where you specify how to search for the name. I.E. The radio button list where you specify "Starts with", "Ends with", etc.

In this case, there's no getting around the fact that we're going to need some conditional logic. And in SQL, that means either an IF statement or a CASE statement.

But each of these would need to occur in the WHERE clause. For example:

          SELECT    *
          FROM      Moo
          WHERE
                   IF @searchMethod = 'startswith'
                       Moo.Name LIKE @name + '%'
                   ELSE IF @searchMethod = 'endswith'
                       Moo.Name LIKE '%' + @name
                   ELSE IF @searchMethod = 'isexactly'
                       Moo.Name = @name
                   ELSE IF @searchMethod = 'notspecified'
                       Moo.Name LIKE '%'

And so on and so forth. But this SQL is, of course, just plain wrong. I don't mean wrong in the philosophical sense. I mean in the "it won't compile" sense. The same is true for putting a CASE statement in the WHERE clause in a similar fashion.

Alternatively, you could use the IF statement as follows:

          IF @searchMethod = 'startswith'
                SELECT    *
                FROM      Moo
                WHERE     Moo.Name LIKE @name + '%'
          ELSE IF @searchMethod = 'endswith'
                SELECT    *
                FROM      Moo
                WHERE     Moo.Name LIKE '%' + @name

And so on and so forth. Granted this will work in this case but that's because of the lack of imagination I put into the example. There are other cases where a conditional in the WHERE clause is unavoidable. Besides, the actual SQL being executed in the various cases is a lot more complicated than in this example and having that copied and pasted throughout the stored procedure is nigh unmaintainable.

My solution to this problem was initially to build me a SQL statement in .NET code and execute it. This is usually a pretty reasonable approach. Easy to implement and easy to follow, even if the SQL that you generate isn't. A purist will tell you that it should be in a stored procedure because of the performance and the security concerns and the fact that the DBA just read in a book somewhere that hard-coded SQL was bad.

But in many situations, the performance hit won't be noticeable and depending on your requirements, the security aspect may be moot. The application that spawned this series of articles is an application for my brothers' and dad's company to search for land surveying jobs they've done in the past. Total number of users: six. Security concerns: none. They've been running the current version of the app on hard-coded SQL for over five years now and the only complaints I hear are on lack of features. But what can I say? It's an ASP app and my psychiatrist has forbidden me from looking at it ever again.

But I'm older and wiser now and always on the lookout for ways of making my code not look like it was done by someone just out of high school. So I needed a way of addressing these two issues.

The problem with the delimiited list, it turns out, is pretty easy to solve. I'll describe the thought process I'd like to think I would have gone through had I solved it myself rather than just searching Google Groups.

Like most things in SQL, there are many ways of approaching a problem. Do you use INNER JOIN or a filter in a WHERE clause? Temporary table or table variable? Return a rowset of one column and row or a scalar?

When we examine the problem of filtering on a delimited list of values, we should consider that maybe there's another way of filtering rather than using IN or even WHERE clauses. Namely, using INNER JOINS. What if we had a table of ClientIDs that contained only the ones we wanted to filter on? Then we could do an INNER JOIN on that table, automatically filtering out the ones we don't want.

We know we can declare table variables so our problem reduces to: How can we convert a delimited list of values into a table? Again, judicious use of Google keywords leads us to a SQL function that does just that.

In the database, you'll notice a function called UTILfn_split, which I found here. The function is easy enough to understand from a black box perspective: you pass in a delimited list and a delimiter and you get back a table.

In three places, you'll see code that looks like the following:

      -- Get a list of ClientIDs
      IF @clientIDList IS NOT NULL AND @clientIDList <> ''
      BEGIN
            INSERT INTO @clientIDTable
            SELECT            [Value]
            FROM       dbo.UTILfn_split( @clientIDList, ',' )
      END
      ELSE
      BEGIN
            INSERT INTO @clientIDTable
            SELECT            ClientID
            FROM       Clients
      END

In words: If a list of client IDs was passed in, populate my client ID table with the values in the list. Otherwise, populate it with every client ID in the system.

The reason for the IF statement, of course, is that an @clientIDList indicates that we don't actually want to filter on clientID. But we still need to populate the table with something because otherwise, we'd have some conditional logic to determine if we want to do an INNER JOIN, which as far as I know, isn't possible.

Now that we have a table variable with all the client IDs from our filter, the rest of the SQL is easy:

      SELECT      *
      FROM        MOO m
      INNER JOIN @clientIDTable ct
      ON          m.ClientID = ct.ClientID

That is: Get me all the values of Moo that have a matching Client ID in our @clientIDTable variable.

The second issue isn't quite as clean but still lends itself to stored procedures. Earlier, I mentioned that we would need some sort of conditional construct in the WHERE clause to determine how to filter on the Moo Name. While this is true, I kind of lied when I said we were limited to IF and CASE.

This is probably easiest to discuss after looking at the WHERE clause:

      WHERE
       (
                 ( @nameFilter = 'startswith' AND m.Name LIKE @name + '%' )
       OR        ( @nameFilter = 'endswith' AND m.Name LIKE '%' + @name )
       OR        ( @nameFilter = 'exactly' AND m.Name = @name )
       OR        ( @nameFilter = 'contains' AND m.Name LIKE '%' + @name + '%' )
       OR        ( @nameFilter NOT IN ( 'startswith', 'endswith', 'exactly', 'contains' ) )
       )

So what kind of magic is this? This is easier to understand if you look at the individual pieces of the clause (i.e. the ones separated by OR). The first segment says:

( @nameFilter = 'startswith' AND m.Name LIKE @name + '%' )

This is a little easier to follow. Return all results where Moo.Name starts with the specified value but ONLY if @nameFilter = 'startswith'. If @nameFilter doesn't equal 'startswith' this, of course, will fail and we move on to the next OR to see if that succeeds:

( @nameFilter = 'endswith' AND m.Name LIKE '%' + @name )

This is a very similar condition. Return all results where Moo.Name ends with the specified value but ONLY if @nameFilter = 'endswith'.

Looking at the two conditions together, if @nameFilter = 'startswith', the first condition will be met and we'll filter on @name + '%'. If @nameFilter = 'endswith', the first condition will not be met but the second one will and we'll filter on '%' + @name. If neither condition is met, we move on to the rest of the statement.

The trick here is to ensure that all possible cases are covered when you compare against @nameFilter. That's why the last condition is:

( @nameFilter NOT IN ( 'startswith', 'endswith', 'exactly', 'contains' ) )

This is a catchall. By including this, it means that if @nameFilter doesn't equal one of the previously handled values, don't filter on Moo.Name at all. Which is why there isn't a filter condition after it as in the other ones.

Using these two techniques, along with the usual SQL suspects, we are able to get a pretty robust search mechanism in our stored procedure. If we wanted to, we could also combine this procedure with Microsoft Index Services. This would allow us to not only search metadata about a group of documents, but also search the contents of the documents themselves. A cost-effective way of doing document searches for cheap-ass family members who don't want to buy Windows 2003 for Sharepoint Services.

I should mention that there is now another way of addressing this problem, thanks to SQL Server 2005. That is, you can use SQL Server's CLR integration to run managed code directly from the database.

I'm making it sound like I know more about this than I do but I haven't actually done this in any way, shape, or form. From what I've heard, though, it is probably a very viable alternative, provided you are targeting SQL Server 2005 in your application. At the very least, you could probably replace the UTILfn_split function with managed code as it probably runs much cleaner and faster in the CLR. And I'm guessing you could make use of an enum for the @nameFilter parameter which would be a little less unstable. I'm sure I'll blather on this when I start experimenting, though.

Finally, credit to http://jargon.watson-net.com/ for the web service I used to generate the data. At present, the database has only fifty entries in the Moo table but if you navigate to SearchPontification/GenerateData.aspx, you can generate more. Note that since this is a web service, it will take a bit to generate each word so be prepared to wait if you put in more than, say, 100.