www.codinghillbilly.com   kyle.baley.org  Subscribe / Contact
 
 
 
 
LATEST POSTS
Monday, October 01, 2007

One of my contracts involves what is essentially a document management system where users can search for legal documents based on metadata attributes as well as their contents. I've been working on it since before the days of SharePoint and even before Content Management Server was somewhat affordable. I mention this to cut off any comments that might say, "Why don't you just use a document management system?" It's in .NET and could benefit from migration to another platform. I'm not going to do it for reasons not worth mentioning. ACCEPT IT!

There are two things worth posting about. Neither is a new technique by any stretch. The first is how to search the content and tie that into the metadata search. The second is how to restrict access to the documents (all in Word format), which IIS does not protect by default.

Searching

In order to search the contents of the repository, I'm using a technique that's been around for many moons. It starts with the Microsoft Indexing Service (right-click My Computer, select Manage..., and it's under Services and Applications). I created a catalog and added the directory containing my documents. Easy enough. Now I have a catalog you can query from code using the Indexing Service query provider.

I'd give you sample code but I don't have any; I don't query the catalog from code. It's no good to me on its own. I need to merge the results in with the metadata search in SQL Server. And rather than get one set of results from SQL Server and another from the Indexing Service and merge them myself, I'm letting SQL Server do everything.

To do this, I added a linked server to SQL Server:

EXEC sp_addlinkedserver MyDocs, 'Index Server', 'MSIDXS', '<Indexing Service Catalog Name>'

The first parameter can be whatever name you want. The next two are fixed and the last is the name of the Indexing Service catalog.
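If positional arguments make your eyes glaze over, the same call can be written with named parameters, which at least documents what each value is for. The parameter names are the ones sp_addlinkedserver defines; MyDocs and the catalog placeholder are carried over from above. The second statement is only a sanity check that the new entry shows up in the list of linked servers.

```sql
-- Same linked server as above, with each argument labelled.
EXEC sp_addlinkedserver
    @server     = N'MyDocs',                                -- any name you like
    @srvproduct = N'Index Server',                          -- fixed
    @provider   = N'MSIDXS',                                -- fixed: the Indexing Service OLE DB provider
    @datasrc    = N'<Indexing Service Catalog Name>';       -- your catalog name

-- Lists the linked servers defined on this instance; MyDocs should be among them.
EXEC sp_linkedservers;
```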

From here, you can query the results directly from SQL Server and join them with any other query you want. For example:

SELECT      MetadataField1, MetadataField2, DocumentFileName
FROM        doc_metadata dl
INNER JOIN  OpenQuery( MyDocs, 'SELECT Filename FROM SCOPE( ) WHERE CONTAINS( Contents, ''<search term>'' ) ') q
ON          dl.DocumentFileName = q.Filename
WHERE       <filter conditions>

And just like that, that squirrel's done, as my pappy used to say. The Indexing Service includes a bunch of properties you can return if you like, but for my poor man's document management system, all I need it for is to filter the list of search results.
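One wrinkle worth knowing: OPENQUERY only accepts a string literal, not a variable, so a search term typed in by a user has to be spliced in with dynamic SQL. Here is a rough sketch of how that might look, assuming the MyDocs linked server and the same placeholder table and columns as above. The @term variable and its value are made up for illustration, the term is assumed to be a single word, and quotes are simply stripped rather than escaped. It also pulls back Rank, one of those extra properties, to sort the hits.

```sql
-- Hypothetical search term; in practice this would arrive as a procedure parameter.
DECLARE @term nvarchar(200);
DECLARE @sql  nvarchar(2000);

SET @term = N'easement';

-- Assumption: drop single and double quotes instead of escaping them through
-- several levels of quoting. Crude, but it keeps the term inside its literal.
SET @term = REPLACE(REPLACE(@term, '''', ''), '"', '');

-- OPENQUERY will not take a variable, so the whole statement is built as a string.
SET @sql = N'
SELECT      dl.MetadataField1, dl.MetadataField2, dl.DocumentFileName, q.[Rank]
FROM        doc_metadata dl
INNER JOIN  OPENQUERY( MyDocs,
                ''SELECT Filename, Rank FROM SCOPE() WHERE CONTAINS( Contents, ''''' + @term + N''''' )'' ) q
ON          dl.DocumentFileName = q.Filename
ORDER BY    q.[Rank] DESC';

EXEC sp_executesql @sql;
```

Stripping quotes changes what gets searched for (O'Brien becomes OBrien), so treat that step as a placeholder for whatever cleanup suits your own search box.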

NOTE: Out of the box, the Indexing Service will index all forms of Office documents. For anything else, you'll need to find an appropriate filter that plugs into the service. Adobe makes one for PDFs but that's the extent of my knowledge.

Restricting access to Word documents

The application uses Forms Authentication to ensure you have appropriate access to the system. There are three roles: admin, subscriber, and guest (and an implied fourth role: ya can't git in). With the default forms authentication setup, anytime you navigate to an .aspx page, it will automatically redirect to the login page. This doesn't hold true for "unmapped" file extensions, such as image files or documents.
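For reference, the kind of forms authentication setup described here usually boils down to a few lines in the application's main web.config. This is a minimal sketch rather than my actual file: the login.aspx page name is an assumption, and the role setup is left out entirely.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
     <system.web>
          <!-- Assumption: the login page is called login.aspx -->
          <authentication mode="Forms">
               <forms loginUrl="login.aspx" />
          </authentication>
          <authorization>
               <!-- "?" means anonymous users: anyone not logged in is sent to the login page -->
               <deny users="?" />
          </authorization>
     </system.web>
</configuration>
```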

By "mapped", I mean that IIS recognizes the extension and maps it to an ISAPI filter. Whenever IIS receives a request for a file with an extension it knows about, it filters the request through the corresponding ISAPI filter, which can then perform whatever funkiness it needs to before and/or after the request.

Without getting too technical about the ASP.NET ISAPI filter (mostly 'cause I don't know too much about its inner workings including whether it actually *is* an ISAPI filter vs. an ISAPI application), I wanted any requests to Word documents to redirect to the login page if the user hadn't logged in. The same as any other page. So in IIS, I went to the application properties and added a mapping for .doc files (from the Virtual Directory tab, click Configuration..., then Mappings) to the ASP.NET executable (usually C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll or something akin).

With that in place, users can no longer navigate directly to Word documents. Such requests will be redirected to the login page. But there is one last requirement: guests aren't allowed to view documents. Easy enough: add a separate web.config to the documents folder restricting access to subscribers and admins only:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
     <system.web>
          <authorization>
               <allow roles="subscriber, admin" />
               <deny users="*" />
          </authorization>
     </system.web>
</configuration>

Note: You can also accomplish this with a <location> element in your application's main web.config file. I picked this method because I have a feeling I'll be moving the documents out to a separate virtual directory altogether very soon.
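For the curious, the <location> version would look something like the sketch below, placed in the application's main web.config. The folder name "documents" is an assumption; use whatever path your documents live under, relative to the application root.

```xml
<configuration>
     <!-- Assumption: the Word documents live in a folder named "documents" -->
     <location path="documents">
          <system.web>
               <authorization>
                    <allow roles="subscriber, admin" />
                    <deny users="*" />
               </authorization>
          </system.web>
     </location>
</configuration>
```

The rules are the same as in the separate file; the only difference is where they live.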

And that's that. I shan't go into the details of setting up your roles but as you may have guessed from the web.config, that is a requirement in order for this to work. It's pretty straightforward Forms Authentication stuff so Google should be able to help you out easily enough. If not, my door is always open.

So with these two techniques, I now have a method for users to search the contents of documents without allowing them to actually open them. Maybe I can get a job at www.experts-exchange.com.

Kyle the Restrictive

Monday, February 19, 2007

The short version: Setting my clock to the current date fixes invalid security certificate errors in IE7 and log-in problems with MSN Messenger. The rest of this post is embellishment so that it can be picked up by a movie producer and turned into a film starring Neil Patrick Harris as the Coding Hillbilly and Scarlett Johansson as Mrs. Coding Hillbilly.

A frantic call from Mrs. Hillbilly led to my edification of yourselves today: "Coding Hillbilly! He'p me, he'p me! I'sa gotta chekin' this har e-mail for'un my boss and I kint git into the Hotmail! Ya gotsta he'p me, Coding Hillbilly!"

"Be right thar, Cabbage!" and I moseyed on over to her place as quick as the T-bird could fly.

The error message: There is a problem with this website's security certificate.

Now I've seen this often enough with the dawn of IE7 but this happened when she navigated to Hotmail of all places. Of course, she could continue on to the website with the nice calming red address bar at the top but it didn't inspire a lot of confidence that her computer was running at peak performance.

At this point, the missus pointed out that MSN Messenger was also puking all over her machine when she tried to log in. Error 80048820 which is, as you could probably guess, security-related. The surprisingly effective help system in Messenger led me to a much prettier version of this page.

Being a technically-minded hillbilly, I skimmed through the page looking for the most obtuse solution and went on my merry way re-registering DLLs and adjusting SSL options, none of which worked. Then option 3 stuck out at me: Verify the date and time settings on your computer.

Huh? says I. Mrs. Hillbilly, who is hovering in a non-intrusive manner, notices my confusion: "Oh yeah, I had to change the clock for something else."

A quick update to bring the date back a few months to February 19, 2007 and all is well again with our corner of the world.

I wish no explanation as to why changing the clock affects MSN Messenger or security certificates. (I can make a partially-educated guess as to the latter.) Security is not my strong suit as anyone who has come to my house and found the doors wide open can attest to. I don't particularly care why they're related. All that matters is that someone else in the world does and he or she finds it fit to document his or her knowledge on the Internets.

And on a concluding note, I hate security and all software and hardware related to it, including but not limited to: anti-virus, spam, phishing, SSL, permissions, LDAP, NTLM, forms authentication, SecurID tokens, VPNs, swipe cards, PIN numbers, security deposits, car alarms, bike locks, and cell phones for seven-year-olds. I don't like that the major upgrade to Windows XP was a firewall that broke a bunch of apps. And that among IE7's features is not to let me into websites because I'm not smart enough to figure out if they're dangerous. And that Vista's main differentiation from XP is that it's harder to play my music. I will concede, however, that retinal scanners are pretty cool.

Not that I don't understand the need for it, which I very much do. I just hate the fact that we have to deal with it. But then, I'm far too much of an optimist to really understand the Prisoner's Dilemma that has led to everything from DRM to the "guilty until proven innocent" mentality that permeates our airport authority system.

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Copyright © 2008 Kyle Baley. All rights reserved.
 