the blog for the eggnchips search engine research project

Eggnchips, components of search

By jasonslater • May 23rd, 2008 • Category: Features, Project Development, search

Currently the Eggnchips search engine is one large application. However, whilst putting together the basic search engine, it has become apparent that the engine can be divided into a number of distinct components that combine to form the overall platform. These are:

  • The component that accepts submitted web sites (I call it the Finder)
  • The component that gets the web sites and stores them in the database (I call it the Getter)
  • The component that handles the submitted search query and returns the results (I call it the Searcher)

 


A breakdown of the components follows:

Finder

The Finder receives work requests through a web-based form. Once submitted, the information (URL, title, description, keywords) is added to the work queue. The processor sub-component works through the queue, checks various parts of the submitted information, and then passes the results to the Getter. The checks include whether the URL is well formed and valid, and obtaining manual authorisation to include the web site.

The form submission may benefit from an image-based input mechanism.
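The Finder's queue-and-check flow could be sketched roughly as follows. This is only an illustration under assumptions: the names Submission, is_well_formed and process_queue are hypothetical, the queues are plain in-memory deques, and manual authorisation is reduced to an approved flag that a human reviewer would set.

```python
from collections import deque
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Submission:
    """One entry from the web form (hypothetical structure)."""
    url: str
    title: str
    description: str
    keywords: list
    approved: bool = False  # assumption: set by a human reviewer


def is_well_formed(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host name."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def process_queue(work_queue: deque, getter_queue: deque) -> list:
    """Pass checked submissions on to the Getter; return the rejected ones."""
    rejected = []
    while work_queue:
        item = work_queue.popleft()
        if is_well_formed(item.url) and item.approved:
            getter_queue.append(item)
        else:
            rejected.append(item)
    return rejected
```

A submission only reaches the Getter's queue when both checks pass; whether a URL is actually reachable is not tested here and would need a separate step.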

Getter

The Getter retrieves the information from its work queue, gets any other information that is required, and then places it into the various database structures. Its jobs include retrieving counts of keywords, storing the date of the submission, and categorising the web site.
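As a rough illustration of two of those jobs, the sketch below counts keyword occurrences in a page's text and records the submission date. The function names and the record layout are hypothetical, the page text is assumed to have been fetched already, keywords are assumed to be single words, and categorisation is left out.

```python
import re
from collections import Counter
from datetime import date


def count_keywords(page_text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    words = Counter(re.findall(r"[a-z0-9]+", page_text.lower()))
    return {kw: words[kw.lower()] for kw in keywords}


def build_record(url, page_text, keywords, submitted_on=None):
    """Assemble the row the Getter would store (hypothetical layout)."""
    return {
        "url": url,
        "submitted_on": submitted_on or date.today(),
        "keyword_counts": count_keywords(page_text, keywords),
    }
```

For example, count_keywords("Eggs and chips. Chips!", ["chips", "eggs"]) gives a count of 2 for "chips" and 1 for "eggs".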

The Getter needs to account for spammers who may try to trick the Finder, possibly by adding a spam list so that no listed site gets through. Also, manual authorisation of each submission is required before a site goes live. There are many different ways that spammers could try to add sites to the search engine, so this is likely to become a topic of its own.
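One simple form a spam list could take is a set of blocked host names that each submitted URL is checked against. The sketch below assumes exactly that; the host names are made up and is_blocked is a hypothetical helper, not part of the existing application.

```python
from urllib.parse import urlparse

# Hypothetical spam list; real entries would come from the database.
SPAM_HOSTS = {"spam.example", "junk.example"}


def is_blocked(url, spam_hosts=SPAM_HOSTS):
    """Return True if the URL's host is a listed host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == bad or host.endswith("." + bad) for bad in spam_hosts)
```

Matching on the host rather than the full URL means a spammer cannot get past the list just by changing the path, although a list like this would only ever catch sites that are already known.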

Searcher

The Searcher is driven by a web-based form. Once the form is submitted, the Searcher checks the query keywords against those stored previously by the Getter and returns the matching results to the end user.
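A minimal sketch of that lookup is shown below, assuming the Getter's stored keywords can be loaded into an in-memory index that maps each keyword to the URLs listed under it. The names build_index and search are hypothetical, and ranking by the number of matching query terms is an assumption made for the example rather than the engine's actual method.

```python
from collections import defaultdict


def build_index(records):
    """Map each stored keyword (lower-cased) to the set of URLs that list it."""
    index = defaultdict(set)
    for rec in records:
        for kw in rec["keywords"]:
            index[kw.lower()].add(rec["url"])
    return index


def search(index, query):
    """Return URLs matching any query term, best matches first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Sort by number of matching terms, then by URL for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A query with no matching keywords simply returns an empty list.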

