I have recently started a new job. My new company owns a couple dozen dating web sites.
One of the challenges we face is coming up with better ways to find your match. If you are a member of an online dating website, you know how much difference the right tools make: tools for finding the people you are looking for and, if at all possible, the people who may be looking for you.
Now I have used Lucene.Net before to create a search engine, but I am not sure that it is the best option for this kind of job.
Here’s what the Lucene.Net website says:
Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework.
And here’s what the Lucene website says:
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
The key phrase here is “text search engine”, which means that it is better at Google-style searches than it is at structured searches.
Before we go on, let me clarify what I mean by structured search.
Structured search is the type of search performed on structured data, where the nature of the data dictates the best algorithm for searching through it. Usually this type of search is best performed with tree search algorithms, which are a natural fit for this kind of problem.
Now that we have defined our problem, let’s discuss some possible solutions.
Back in college my math teacher had a very simple and interesting way of demonstrating bit arithmetic.
He used a set of perforated cards to show us how we could sort and search through that stack by using a set of holes as bits.
The reason I bring this up is that I believe we can represent each of the characteristics of our imaginary member as a sequence of bits. For example, we can assign each bit in a byte a certain meaning, making sure to always keep it down to a yes/no field.
Say member 100 is a smoker, has a beard, loves the outdoors and is a heterosexual male.
Reading the byte from left to right, the bits could be mapped like this:
- smoker yes/no
- has beard yes/no
- likes the outdoors yes/no
- female yes/no
- male yes/no
- looking for a man yes/no
- looking for a woman yes/no
- Reserved for future use
Armed with this map, let’s represent member 100 in a byte.
Member 100: 11101010
So now we can represent 8 characteristics of an imaginary member in a binary string.
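To make the mapping concrete, here is a minimal Python sketch that builds the byte for member 100. The constant names and the `fingerprint` helper are made up for illustration; the only thing taken from the map above is the bit order, with the smoker flag as the leftmost bit.

```python
# One bit per yes/no characteristic, in the same order as the map above.
# The leftmost (most significant) bit is "smoker"; the rightmost is reserved.
SMOKER            = 1 << 7
HAS_BEARD         = 1 << 6
LIKES_OUTDOORS    = 1 << 5
FEMALE            = 1 << 4
MALE              = 1 << 3
LOOKING_FOR_MAN   = 1 << 2
LOOKING_FOR_WOMAN = 1 << 1
RESERVED          = 1 << 0


def fingerprint(*traits):
    """Combine any number of trait flags into a single byte-sized integer."""
    value = 0
    for trait in traits:
        value |= trait  # setting a bit means "yes" for that characteristic
    return value


# Member 100: smoker, has a beard, likes the outdoors, male, looking for a woman.
member_100 = fingerprint(SMOKER, HAS_BEARD, LIKES_OUTDOORS, MALE, LOOKING_FOR_WOMAN)
print(format(member_100, "08b"))  # prints 11101010
```

The whole profile fits in a single small integer (234 in decimal), which is cheap to store and compare.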
What this gives us is a fingerprint for each one of our users, one that is easy to search for patterns in order to come up with a list of results.
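As a rough sketch of what searching those fingerprints could look like, the Python below checks each member against a mask of required bits and a mask of excluded bits. Everything except member 100 is invented for the example: the other member IDs, their fingerprints, and the `matches` and `search` helpers are assumptions, and a plain loop over a dictionary stands in for whatever storage a real system would use.

```python
# Same bit layout as the map above: leftmost bit is "smoker".
SMOKER            = 1 << 7
HAS_BEARD         = 1 << 6
LIKES_OUTDOORS    = 1 << 5
FEMALE            = 1 << 4
MALE              = 1 << 3
LOOKING_FOR_MAN   = 1 << 2
LOOKING_FOR_WOMAN = 1 << 1

# Hypothetical data: member id -> fingerprint byte.
members = {
    100: 0b11101010,  # smoker, beard, outdoors, male, looking for a woman
    101: 0b00110100,  # outdoors, female, looking for a man
    102: 0b01101010,  # beard, outdoors, male, looking for a woman
    103: 0b10010100,  # smoker, female, looking for a man
}


def matches(fingerprint, required, excluded=0):
    """True if every required bit is set and no excluded bit is set."""
    has_all_required = (fingerprint & required) == required
    has_no_excluded = (fingerprint & excluded) == 0
    return has_all_required and has_no_excluded


def search(required, excluded=0):
    """Return the ids of all members whose fingerprint fits both masks."""
    return sorted(
        member_id
        for member_id, fp in members.items()
        if matches(fp, required, excluded)
    )


# Non-smoking men who like the outdoors.
print(search(required=MALE | LIKES_OUTDOORS, excluded=SMOKER))  # prints [102]

# Women who like the outdoors and are looking for a man.
print(search(required=FEMALE | LOOKING_FOR_MAN | LIKES_OUTDOORS))  # prints [101]
```

Each comparison is just a couple of bitwise ANDs, so even a linear scan like this one stays cheap per member. Whether it stays fast enough across a large member base is something I have not measured.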
In an upcoming article I will write about a type of database that uses this kind of fingerprinting and searching and is developed on open source tools. Until then feel free to comment on this system and any ideas you may have that would make this work in a better way.
Tags: fingerprinting, search, structured