<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Eggnchips Blog</title>
	
	<link>http://www.eggnchips.com/blog</link>
	<description>the blog for the eggnchips search engine research project</description>
	<pubDate>Tue, 28 Oct 2008 11:12:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/EggnchipsBlog" type="application/rss+xml" /><feedburner:emailServiceId>1843953</feedburner:emailServiceId><feedburner:feedburnerHostname>http://www.feedburner.com</feedburner:feedburnerHostname><item>
		<title>Dealing with unexpected input</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/434598227/</link>
		<comments>http://www.eggnchips.com/blog/2008/10/28/dealing-with-unexpected-input/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 11:12:38 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Lead Story]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/?p=43</guid>
		<description><![CDATA[Since the introduction of the Suggest a Website link in the Eggnchips project, a few interesting observations have been made. The biggest is the amount of HTML that people have tried to insert into the description field in order to stuff it with extra links; another is a handful of attempted SQL [...]]]></description>
			<content:encoded><![CDATA[<p>Since the introduction of the <a href="http://www.eggnchips.com/index.php?p=s">Suggest a Website</a> link in the Eggnchips project, a few interesting observations have been made. The biggest is the amount of HTML that people have tried to insert into the description field in order to stuff it with extra links; another is a handful of attempted SQL injections. Had these fields gone straight into the database, no doubt trouble would have ensued, and with the SQL injections the system might have been compromised.</p>
<p><img src="http://www.eggnchips.com/images/eggnchips-link.png" alt="" width="400" height="717" /></p>
<p>At least there is a way around this using <a title="Permanent Link to PHP: Removing HTML Tags from Strings" rel="bookmark" href="http://www.sl8r.co.uk/2008/10/28/php-removing-html-tags-from-strings/"><span style="color: #000000;">PHP: Removing HTML Tags from Strings</span></a> and <a title="Permanent Link to PHP: Avoiding mySQL Injections" rel="bookmark" href="http://www.sl8r.co.uk/2008/04/30/php-avoiding-mysql-injections/"><span style="color: #555555;">PHP: Avoiding mySQL Injections</span></a>. There are also a few other tricks up our sleeves to combat this sort of activity but more investigation and research needs to be carried out on the effective mechanisms for securing input data. One possibility would be to allow only text based input but that could ultimately prove difficult in a multi-lingual environment.</p>
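<p>As a rough illustration of the two defences mentioned above, here is a minimal sketch in Python rather than PHP. The <code>suggestions</code> table, the <code>strip_tags</code> helper and the sample input are assumptions made up for this example, not the code used on Eggnchips. Tags are removed with the standard library HTML parser, and the insert uses query placeholders so that user input is never spliced into the SQL text.</p>

```python
import sqlite3
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(text):
    """Return text with all HTML tags removed (hypothetical helper)."""
    stripper = TagStripper()
    stripper.feed(text)
    stripper.close()
    return "".join(stripper.parts).strip()


def save_suggestion(conn, url, description):
    # The ? placeholders let the driver handle quoting, so the input is
    # treated as data and never as part of the SQL statement.
    conn.execute(
        "INSERT INTO suggestions (url, description) VALUES (?, ?)",
        (url, strip_tags(description)),
    )


# Hypothetical table and hostile input, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suggestions (url TEXT, description TEXT)")
save_suggestion(
    conn,
    "http://example.com",
    "Nice site <a href='http://spam.example'>cheap links</a>'); DROP TABLE suggestions;--",
)
rows = conn.execute("SELECT description FROM suggestions").fetchall()
```

<p>After this runs, the table still exists and holds one row: the link markup is gone and the injection attempt has been stored as harmless text.</p>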
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/10/28/dealing-with-unexpected-input/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/10/28/dealing-with-unexpected-input/</feedburner:origLink></item>
		<item>
		<title>Building a bot, phase I</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/341629436/</link>
		<comments>http://www.eggnchips.com/blog/2008/07/21/building-a-bot-phase-i/#comments</comments>
		<pubDate>Mon, 21 Jul 2008 15:27:34 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Lead Story]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/07/21/building-a-bot-phase-i/</guid>
		<description><![CDATA[Some search engines however simply require a URL and the essential information is automatically retrieved using an automated mechanism - bots. ]]></description>
			<content:encoded><![CDATA[<p>Bots are widely used by search engines. When users recommend sites they are sometimes asked to enter information such as Title, URL, Description and possibly some Keywords. Some search engines, however, simply require a URL, and the essential information is retrieved automatically by bots.</p>
<p>Bots are ideal for this repetitive activity: they take a work queue (a given list of URLs) and produce a set of information that can be passed to the next stage in the submission process. Even though these tools are called bots, they are essentially just software programs that run either at a scheduled time or in a loop, constantly checking the work queue.</p>
<p>In addition to the usual information, a bot can periodically extract additional words from the body text itself by stripping out HTML tags, removing stop words and building a word frequency table. A word frequency table is simply a list of words with the number of times each word appears in the given text.</p>
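<p>A word frequency table of this kind could be sketched as follows. The stop word set and the sample page are invented for illustration, and the regular expression is only a crude way of removing tags; a real bot would probably want a proper HTML parser.</p>

```python
import re
from collections import Counter

# Illustrative stop words only; not the project's actual list.
STOP_WORDS = {"a", "and", "is", "of", "the", "to"}


def word_frequencies(html):
    """Strip tags, drop stop words and count the remaining words."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag removal
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


page = "<h1>Egg and chips</h1><p>The egg is fried. Chips and egg!</p>"
table = word_frequencies(page)
```

<p>For the sample page this counts "egg" three times, "chips" twice and "fried" once, while "and", "the" and "is" are dropped.</p>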
<p>Recently I have been working on a way of automatically extracting information for a given URL by fetching the HTML from a submitted URL, parsing the relevant information and using it as input to the submission process. Information is often placed in meta tags in the HEAD section of the HTML, but I have found this process so far to be somewhat hit and miss, as some sites include the information and others do not. For those that do not, an additional way needs to be identified to provide a suitable link title and description. It may be the case that this will always require some user intervention, but further analysis should identify more options.</p>
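<p>To show what reading the HEAD section might look like, here is a small sketch using Python's standard HTML parser. The class name, the function name and the sample page are assumptions for this example. Missing meta tags come back as <code>None</code>, which is exactly the hit and miss case where a fallback or user input would be needed.</p>

```python
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Collects the page title and any named meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_info(html):
    """Return title, description and keywords; None where a tag is absent."""
    parser = HeadInfo()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description"),
        "keywords": parser.meta.get("keywords"),
    }


sample = """<html><head><title>Example Site</title>
<meta name="description" content="A sample page">
</head><body><p>Hello</p></body></html>"""
info = extract_head_info(sample)
```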
<p>The bot also needs to be aware of how long it has been since a web page last changed, to ensure that valuable information or updates are not missed and that processing power isn&#8217;t wasted on a link that does not change very much. We can address this by producing a hash for a page and comparing it with a previously stored hash.</p>
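<p>The compare-with-stored-hash idea can be sketched in a few lines. SHA-256 from Python's <code>hashlib</code> is an assumption here, as is the plain dictionary standing in for wherever the bot would keep its hashes; any consistent hash function would slot in the same way.</p>

```python
import hashlib


def page_fingerprint(html):
    """Hash the page content so that any change yields a different value."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(url, html, stored_hashes):
    """Compare against the stored hash for url, then remember the new one."""
    new_hash = page_fingerprint(html)
    changed = stored_hashes.get(url) != new_hash
    stored_hashes[url] = new_hash
    return changed


store = {}  # stands in for a database table of url -> hash
first = has_changed("http://example.com", "<p>hello</p>", store)
second = has_changed("http://example.com", "<p>hello</p>", store)
third = has_changed("http://example.com", "<p>hello!</p>", store)
```

<p>The first visit always counts as a change because nothing is stored yet; the second, identical fetch does not; the third, edited page does.</p>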
<p>A hash, simply put, is the result of a consistent algorithm applied to information. MD5 is a well-known hash mechanism that I may investigate in a later article. For now, a simple way to demonstrate a hash is to calculate a check digit: take the letter values of a phrase, add them together, then take the remainder after dividing by a number, say 27 (allowing for 26 letters and the space). In the example, the first two phrases give the same answer, whereas the third phrase, with one letter different, gives a very different check digit.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/07/checkdigit.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/07/checkdigit-thumb.png" border="0" alt="checkdigit" width="640" height="204" /></a></p>
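<p>The check digit calculation can be written out as a short function. The three phrases below are invented for this sketch and are not necessarily the ones in the image; letters are valued a=1 to z=26 and the space counts as 0.</p>

```python
import string


def check_digit(phrase, modulus=27):
    """Sum the letter values (a=1 .. z=26, anything else 0) modulo 27."""
    total = 0
    for ch in phrase.lower():
        # find() returns -1 for spaces and other characters, giving 0.
        total += string.ascii_lowercase.find(ch) + 1
    return total % modulus


same_a = check_digit("egg and chips")
same_b = check_digit("chips and egg")
different = check_digit("egg and ships")
```

<p>The first two phrases contain the same letters, so both give 12, while changing a single letter in the third gives 1. That collision between rearranged phrases is also why a simple sum like this only works as a demonstration, and why a stronger hash is needed in practice.</p>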
<p>I have a working program up and running. It is not in bot mode yet, but it can be initiated manually. I need to add the loop code to continually check the work queue for submitted URLs.</p>
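<p>A polling loop of the kind described might look something like the sketch below. The function name, the in-memory queue and the <code>max_idle_polls</code> stop condition are all assumptions added so that the example can finish; a real bot would read from wherever submissions are stored and would probably run indefinitely.</p>

```python
import time
from collections import deque


def run_bot(queue, process, poll_seconds=5.0, max_idle_polls=None):
    """Process queued URLs; stop after max_idle_polls empty checks in a row.

    With max_idle_polls=None the loop never stops, which is bot mode.
    """
    idle_polls = 0
    results = []
    while idle_polls != max_idle_polls:
        if queue:
            url = queue.popleft()
            results.append(process(url))
            idle_polls = 0
        else:
            idle_polls += 1
            time.sleep(poll_seconds)  # wait before checking the queue again
    return results


work_queue = deque(["http://example.com/a", "http://example.com/b"])
done = run_bot(work_queue, str.upper, poll_seconds=0, max_idle_polls=1)
```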
<div id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:d4e22367-02b2-402e-bbdb-cfa87bc7f526" class="wlWriterSmartContent" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a rel="tag" href="http://technorati.com/tags/search">search</a>,<a rel="tag" href="http://technorati.com/tags/engine">engine</a>,<a rel="tag" href="http://technorati.com/tags/hash">hash</a>,<a rel="tag" href="http://technorati.com/tags/bot">bot</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/07/21/building-a-bot-phase-i/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/07/21/building-a-bot-phase-i/</feedburner:origLink></item>
		<item>
		<title>Taking a new theme for a ride</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/326803909/</link>
		<comments>http://www.eggnchips.com/blog/2008/07/04/taking-a-new-theme-for-a-ride/#comments</comments>
		<pubDate>Fri, 04 Jul 2008 17:43:35 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/?p=37</guid>
		<description><![CDATA[Over here at Eggnchips we do like to fiddle about with technology so we are trying out a new Theme, the Mimbo Theme from Darren Hoyt, let us know what you think.]]></description>
			<content:encoded><![CDATA[<p>Over here at Eggnchips we do like to fiddle about with technology so we are trying out a new Theme, the Mimbo Theme from <a href="http://www.darrenhoyt.com/2007/08/05/wordpress-magazine-theme-released/">Darren Hoyt</a>, let us know what you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/07/04/taking-a-new-theme-for-a-ride/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/07/04/taking-a-new-theme-for-a-ride/</feedburner:origLink></item>
		<item>
		<title>Search and stop words</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/316201077/</link>
		<comments>http://www.eggnchips.com/blog/2008/06/20/search-and-stop-words/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 13:12:34 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Lead Story]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/06/20/search-and-stop-words/</guid>
		<description><![CDATA[In this article we discuss the impact of stop words on the Eggnchips search engine project.]]></description>
			<content:encoded><![CDATA[<p>Whilst planning how to build a word and phrase vocabulary for use in the EggnChips project, we need to make a decision regarding the use of stop words.</p>
<p>Stop words are a mix of short, common, everyday words such as &#8220;a&#8221;, &#8220;when&#8221;, &#8220;besides&#8221;, and &#8220;to&#8221; that are sometimes ignored by search engines when a user is searching for information. There are positive and negative sides to this. The positive is that it makes word lists much shorter, which makes calculations quicker; the negative is that it can change the context of the intended search. This can be compounded by stemming (we will cover stemming in a later post, but as an example &#8220;besides&#8221; may be stemmed to &#8220;beside&#8221;).</p>
<p>An example of the negative side: suppose I were searching for images of &#8220;plants beside water&#8221;. If &#8220;besides&#8221; is on the stop word list and is stemmed to &#8220;beside&#8221;, then &#8220;beside&#8221; is filtered too, and the search engine may think I am interested in &#8220;plant water&#8221; or &#8220;water plants&#8221; or something else, which would likely affect the results received.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image2.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image-thumb2.png" border="0" alt="www.eggnchips.com" width="400" height="188" /></a></p>
<p>An example portion of a stop word list may be:</p>
<p>b ba back be became because become been before began beget behind being beside best bet between big bin both but by</p>
<p>Stemming could effectively increase the size of the list by turning the following variations of the word &#8220;back&#8221; into stop words: backed, backing, backs, backend, backy.</p>
<p>There does not appear to be a universal list of stop words, which complicates matters. Language variation is also an important factor: a word &#8220;as typed&#8221; could be a stop word in one language but not in others.</p>
<p>Our initial view is that we may generate a stop word list but not apply it by default, or make it optional. This way we could assess the difference between results gained with and without stop words for a particular search phrase.</p>
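<p>Making the stop word list optional could be as simple as a flag on the function that prepares the query. The sketch below uses a tiny invented stop word set and a hypothetical <code>query_terms</code> function, so that the same phrase can be run both ways and the results compared.</p>

```python
# Illustrative subset only; not the project's actual stop word list.
STOP_WORDS = {"a", "beside", "besides", "to", "when"}


def query_terms(query, use_stop_words=True):
    """Split a query into terms, optionally dropping stop words."""
    words = query.lower().split()
    if not use_stop_words:
        return words
    return [w for w in words if w not in STOP_WORDS]


filtered = query_terms("plants beside water")
unfiltered = query_terms("plants beside water", use_stop_words=False)
```

<p>With filtering on, the query becomes "plants" and "water", losing the relationship between them; with filtering off, all three words are kept.</p>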
<div id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:2c921fb5-3c9a-4283-9e82-0996f8305323" class="wlWriterSmartContent" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a rel="tag" href="http://technorati.com/tags/eggnchips">eggnchips</a>,<a rel="tag" href="http://technorati.com/tags/search">search</a>,<a rel="tag" href="http://technorati.com/tags/stop%20words">stop words</a>,<a rel="tag" href="http://technorati.com/tags/stemming">stemming</a>,<a rel="tag" href="http://technorati.com/tags/word%20list">word list</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/06/20/search-and-stop-words/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/06/20/search-and-stop-words/</feedburner:origLink></item>
		<item>
		<title>Eggnchips: Improving the search results</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/314529353/</link>
		<comments>http://www.eggnchips.com/blog/2008/06/18/eggnchips-improving-the-search-results/#comments</comments>
		<pubDate>Wed, 18 Jun 2008 10:46:03 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Theory]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/06/18/eggnchips-improving-the-search-results/</guid>
		<description><![CDATA[Using a search engine, or even a link directory, from a user's point of view it almost seems that a simplistic systematic method is being employed.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image.png"><img style="border-right: 0px; border-top: 0px; margin: 5px 0px 0px 10px; border-left: 0px; border-bottom: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image-thumb.png" border="0" alt="image" width="142" height="256" align="right" /></a>When using a search engine, or even a link directory, from a user's point of view it almost seems that a simplistic systematic method is being employed.</p>
<p>This method appears to be: a web page is presented to the user with a search box, we type a keyword (or more) into the search box, then click the search button, and the relevant matching results are returned.</p>
<p>However, whilst it may seem that the process is pure information retrieval, there is far more going on behind the scenes, and what is going on behind the scenes is information collection.</p>
<p>Information collection is a valuable part of any search engine project. Simply returning results based on a keyword pattern match is only half (or maybe even less) of the story and is unlikely to contribute to the success of a search engine project which is why we are looking at it as an important factor of the <a href="http://www.eggnchips.com">Eggnchips</a> project.</p>
<p>With information collection, results can be more intelligently returned based on a number of factors including popularity, occurrence, relevance, perhaps even context and user profile information such as geographical region and browser/computer information.</p>
<p>With the application of information collection the search engine can generate its own information based on user habits which provides a more dynamic and usable platform.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image1.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/06/image-thumb1.png" border="0" alt="image" width="448" height="480" /></a></p>
<p>The primary elements of the Search Manager include:</p>
<p><strong>Match keyword</strong></p>
<p>Begin the process with a simple keyword match as before</p>
<p><strong>Results Profiling</strong></p>
<p>Modify the results based on previous results returned, and user actions based on those results, such as click-through rate, number of back links to results, and peer grading information.</p>
<p><strong>User Profiling</strong></p>
<p>Modify the results, as appropriate, based on information detected from the user such as previous search history, browser information, geographic information. This area can be made an opt-out depending on user preferences.</p>
<p><strong>Update Manager</strong></p>
<p>Update the User Profiling and Results Profiling stores based on this information. The Update Manager also collects popular search terms and results, and watches for newly added content that matches these terms.</p>
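<p>To make the first two elements a little more concrete, here is a very small sketch of a keyword match followed by one form of results profiling. Ordering by click-through rate alone, the function name and the sample data are all assumptions for illustration; the actual Search Manager may weigh several factors quite differently.</p>

```python
def search(keyword, pages, click_through):
    """pages maps url -> text; click_through maps url -> rate from 0 to 1."""
    keyword = keyword.lower()
    # Match keyword: the simple starting point.
    matches = [url for url, text in pages.items() if keyword in text.lower()]
    # Results profiling: pages that users clicked more often come first.
    return sorted(matches, key=lambda url: click_through.get(url, 0.0), reverse=True)


pages = {
    "http://example.com/a": "Egg recipes",
    "http://example.com/b": "Chips and egg",
    "http://example.com/c": "Water plants",
}
rates = {"http://example.com/a": 0.1, "http://example.com/b": 0.4}
ranked = search("egg", pages, rates)
```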
<div id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:8a6173b0-7bc8-4bcb-b072-91771c7c53f1" class="wlWriterSmartContent" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a rel="tag" href="http://technorati.com/tags/search">search</a>,<a rel="tag" href="http://technorati.com/tags/information%20retrieval">information retrieval</a>,<a rel="tag" href="http://technorati.com/tags/data%20collection">data collection</a>,<a rel="tag" href="http://technorati.com/tags/profiling">profiling</a>,<a rel="tag" href="http://technorati.com/tags/eggnchips">eggnchips</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/06/18/eggnchips-improving-the-search-results/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/06/18/eggnchips-improving-the-search-results/</feedburner:origLink></item>
		<item>
		<title>Search and URL Submission Mechanics</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/296703267/</link>
		<comments>http://www.eggnchips.com/blog/2008/05/23/search-and-url-submission-mechanics/#comments</comments>
		<pubDate>Fri, 23 May 2008 17:08:33 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Project Development]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/05/23/search-and-url-submission-mechanics/</guid>
		<description><![CDATA[Discussing the components of the Eggnchips Search Engine and their relationship with the resulting pages returned to the user.]]></description>
			<content:encoded><![CDATA[<p>The following diagram shows the components of the first phase of the Eggnchips search engine project. The two key functional components are URL submission and Keyword Searching.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image4.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image-thumb4.png" border="0" alt="image" width="400" height="312" /></a></p>
<p>The green areas indicate interactive web pages; the blue areas are components. The blue components identified are URL Submission Validation, Keyword Validation, Update Manager, Mail Manager, Search Handler and the Presentation/Formatting Component.</p>
<p>Initially I used an HTML table arrangement for the web pages; however, it would make sense to switch to CSS, which I aim to do before development progresses any further.</p>
<div id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:29db748e-5037-4daa-8a11-a88caabeaaeb" class="wlWriterSmartContent" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a rel="tag" href="http://technorati.com/tags/search">search</a>,<a rel="tag" href="http://technorati.com/tags/keyword">keyword</a>,<a rel="tag" href="http://technorati.com/tags/submission">submission</a>,<a rel="tag" href="http://technorati.com/tags/url">url</a>,<a rel="tag" href="http://technorati.com/tags/eggnchips">eggnchips</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/05/23/search-and-url-submission-mechanics/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/05/23/search-and-url-submission-mechanics/</feedburner:origLink></item>
		<item>
		<title>Eggnchips, components of search</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/296263049/</link>
		<comments>http://www.eggnchips.com/blog/2008/05/23/eggnchips-components-of-search/#comments</comments>
		<pubDate>Fri, 23 May 2008 02:38:06 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Features]]></category>

		<category><![CDATA[Project Development]]></category>

		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/05/23/eggnchips-components-of-search/</guid>
		<description><![CDATA[Currently the Eggnchips search engine is one large application. However, whilst putting together the basic search engine, it has become apparent that the engine can be divided into a number of distinct components that combine to form the overall platform, these are:

The component that submits web sites (I call it the Finder)
The component that Gets [...]]]></description>
			<content:encoded><![CDATA[<p>Currently the Eggnchips search engine is one large application. However, whilst putting together the basic search engine, it has become apparent that the engine can be divided into a number of distinct components that combine to form the overall platform, these are:</p>
<ul>
<li>The component that submits web sites (I call it the Finder)</li>
<li>The component that Gets the web sites and stores them in the Database (I call it the Getter)</li>
<li>The component that handles the submitted search query and returns the results (I call it the Searcher)</li>
</ul>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image2.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image-thumb2.png" border="0" alt="image" width="350" height="168" /></a></p>
<p>A breakdown of the components follows:</p>
<h3>Finder</h3>
<p>The Finder receives work requests using a web based form. Once submitted, this information (URL, title, description, keywords) is added to the work queue. The processor sub-component analyses the work queue, checks various parts of the information submitted, then submits the results to the Getter. Some of the checks include whether the URL is well formed and valid, and obtaining manual authorisation to include the web site.</p>
<p>The Form submission may benefit from an image based input mechanism.</p>
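<p>The well-formed URL check mentioned for the Finder could start out as something like this sketch. Accepting only http and https and requiring a host name are assumptions about what "well formed" should mean here, and the function name is hypothetical. Whether the URL is actually valid (the site exists and responds) would need a separate fetch.</p>

```python
from urllib.parse import urlsplit


def is_well_formed_url(url):
    """Accept only http(s) URLs that include a host name."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


checks = {
    "http://www.eggnchips.com/blog": is_well_formed_url("http://www.eggnchips.com/blog"),
    "javascript:alert(1)": is_well_formed_url("javascript:alert(1)"),
    "http://": is_well_formed_url("http://"),
    "not a url": is_well_formed_url("not a url"),
}
```

<p>Only the first of the four sample inputs passes.</p>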
<h3>Getter</h3>
<p>The Getter retrieves the information from its work queue, gets any other information that is required then places it into the various database structures. Some of the jobs of the Getter would be to retrieve counts of keywords, store the date of the submission, and categorise the web site.</p>
<p>The Getter needs to assess the impact of Spammers who may try to trick the Finder, possibly through the addition of a Spam list to ensure that no spam site gets through. Also, a manual submission authorisation is required before a site goes live. There are a whole bunch of different ways that Spammers could try to add sites to the search engine, so this is likely to become a topic of its own.</p>
<h3>Searcher</h3>
<p>The Searcher is driven by a web based form. Once submitted, the form checks the keywords against those stored previously by the Getter and returns the matching results to the end user.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image3.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image-thumb3.png" border="0" alt="image" width="350" height="241" /></a></p>
<div id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:0fb4e780-b771-4af8-ba54-b04c0999f654" class="wlWriterSmartContent" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a rel="tag" href="http://technorati.com/tags/search">search</a>,<a rel="tag" href="http://technorati.com/tags/eggnchips">eggnchips</a>,<a rel="tag" href="http://technorati.com/tags/components">components</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/05/23/eggnchips-components-of-search/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/05/23/eggnchips-components-of-search/</feedburner:origLink></item>
		<item>
		<title>Eggnchips, Stage One of Phase One</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/295884786/</link>
		<comments>http://www.eggnchips.com/blog/2008/05/22/eggnchips-stage-one-of-phase-one/#comments</comments>
		<pubDate>Thu, 22 May 2008 15:00:35 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/05/22/eggnchips-stage-one-of-phase-one/</guid>
		<description><![CDATA[The first stage of the search engine is up and running with the ability to perform some simple keyword searches. The engine shows the basic screen (the idea here is to keep the homepage short and sweet). An option is included to make EggnChips the home page. A search term can be entered and I [...]]]></description>
			<content:encoded><![CDATA[
<p>The first stage of the search engine is up and running with the ability to perform some simple keyword searches. The engine shows the basic screen (the idea here is to keep the homepage short and sweet). An option is included to make EggnChips the home page. A search term can be entered and I have included a few other buttons to make the engine more versatile. Also included is a running count of the number of approved web sites listed.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="264" alt="image" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image-thumb.png" width="350" border="0" /></a> </p>
<p>When someone enters a keyword (or multiple keywords) the term is validated (to ensure it is safe) then checked against the URL database and appropriate entries listed. I have added a keyword highlighting mechanism so the searched phrase appears in green.</p>
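<p>A keyword highlighting step along these lines could be sketched as below. The function name and the inline green style are assumptions for the example. The text is split on the keywords first and each piece is escaped separately, so any markup in the stored text is displayed rather than executed, and the highlighting never lands inside an escaped entity.</p>

```python
import html
import re

OPEN_TAG = '<span style="color: green">'
CLOSE_TAG = "</span>"


def highlight(text, keywords):
    """Return HTML-escaped text with each keyword wrapped in a green span."""
    words = [re.escape(k) for k in keywords if k.strip()]
    if not words:
        return html.escape(text)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    out = []
    # With one capturing group, split() puts the matches at odd indices.
    for i, piece in enumerate(pattern.split(text)):
        safe = html.escape(piece)
        out.append(OPEN_TAG + safe + CLOSE_TAG if i % 2 else safe)
    return "".join(out)


shown = highlight("Egg & chips", ["egg"])
```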
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image1.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="361" alt="image" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/05/image-thumb1.png" width="350" border="0" /></a> </p>
<p>The next stage I plan to add is a URL submission facility.</p>
<div class="wlWriterSmartContent" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:fbca038b-10f6-4fc9-a241-29dd5791645d" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a href="http://technorati.com/tags/search" rel="tag">search</a>,<a href="http://technorati.com/tags/engine" rel="tag">engine</a>,<a href="http://technorati.com/tags/keyword" rel="tag">keyword</a>,<a href="http://technorati.com/tags/eggnchips" rel="tag">eggnchips</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/05/22/eggnchips-stage-one-of-phase-one/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/05/22/eggnchips-stage-one-of-phase-one/</feedburner:origLink></item>
		<item>
		<title>Dynamic Information</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/290959555/</link>
		<comments>http://www.eggnchips.com/blog/2008/05/15/dynamic-information/#comments</comments>
		<pubDate>Thu, 15 May 2008 14:29:24 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Project Development]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/?p=13</guid>
		<description><![CDATA[Information needed for inclusion in the search engine:
Master Categories: #cats#
Sub-Categories: #subcats#
Total Links: #url-count#
Total Links Pending: #url-pending#
Link Submissions Today: #url-pending-today#
]]></description>
			<content:encoded><![CDATA[<p>Information needed for inclusion in the search engine:</p>
<p>Master Categories: #cats#</p>
<p>Sub-Categories: #subcats#</p>
<p>Total Links: #url-count#</p>
<p>Total Links Pending: #url-pending#</p>
<p>Link Submissions Today: #url-pending-today#</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/05/15/dynamic-information/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/05/15/dynamic-information/</feedburner:origLink></item>
		<item>
		<title>Building the basics of search</title>
		<link>http://feeds.feedburner.com/~r/EggnchipsBlog/~3/281537403/</link>
		<comments>http://www.eggnchips.com/blog/2008/04/23/building-the-basics-of-search/#comments</comments>
		<pubDate>Wed, 23 Apr 2008 18:29:32 +0000</pubDate>
		<dc:creator>jasonslater</dc:creator>
		
		<category><![CDATA[Project Development]]></category>

		<guid isPermaLink="false">http://www.eggnchips.com/blog/2008/04/23/building-the-basics-of-search/</guid>
		<description><![CDATA[The EggnChips development is getting underway. The first part of the project is to build a simple keyword search engine. 
In this project the user will be presented with a form containing a familiar text input box followed by a button for performing a search. To make the system usable and to provide a mechanism for [...]]]></description>
			<content:encoded><![CDATA[<p>The EggnChips development is getting underway. The first part of the project is to build a simple keyword search engine. </p>
<p>In this project the user will be presented with a form containing a familiar text input box followed by a button for performing a search. To make the system usable and to provide a mechanism for comparison, several other buttons will redirect searches to other reference sites.</p>
<p>Upon a user entering a keyword and clicking on the search button, a simple lookup will be made in a back-end database against the keyword and the matching results presented.</p>
<p><a href="http://www.eggnchips.com/blog/wp-content/uploads/2008/04/image.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="114" alt="image" src="http://www.eggnchips.com/blog/wp-content/uploads/2008/04/image-thumb.png" width="339" border="0" /></a></p>
<p>The next step will be to enable the use of multiple keywords in an AND/OR mechanism. Simply entering two keywords will assume an OR, whilst prefixing each keyword with a plus symbol will force an AND decision to be made. Enclosing a series of keywords in quotes will turn the keywords into a phrase entity which will itself become a keyword for search purposes.</p>
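<p>As a rough illustration of those rules, here is a minimal Python sketch. It is not the Eggnchips code: the names <code>parse_query</code>, <code>matches</code> and <code>ParsedQuery</code> are hypothetical, matching is a plain case-insensitive substring test, and it assumes that once any plus-prefixed keyword is present the unprefixed keywords no longer decide whether a page matches.</p>

```python
import re
from dataclasses import dataclass, field

# One token is an optional plus sign followed by either a quoted phrase
# or a run of non-space characters.
TOKEN = re.compile(r'(\+?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: list = field(default_factory=list)  # plus-prefixed terms (AND)
    optional: list = field(default_factory=list)  # plain terms (OR)


def parse_query(query):
    """Split a raw query into required and optional terms."""
    parsed = ParsedQuery()
    for plus, phrase, word in TOKEN.findall(query):
        # A quoted phrase is kept whole and treated as a single keyword.
        term = (phrase or word).strip().lower()
        if not term:
            continue
        if plus:
            parsed.required.append(term)
        else:
            parsed.optional.append(term)
    return parsed


def matches(text, parsed):
    """Return True if the text satisfies the parsed query."""
    text = text.lower()
    # Every plus-prefixed term must be present.
    if not all(term in text for term in parsed.required):
        return False
    # Assumption: with required terms present, optional terms do not filter.
    if parsed.required:
        return True
    # Otherwise any single optional term is enough (OR).
    return any(term in text for term in parsed.optional)
```

<p>For example, <code>parse_query('+python "search engine" eggs')</code> yields one required term, python, and two optional terms, the phrase search engine and the word eggs.</p>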
<p>In order to conduct a search, the back-end database will be populated with the URLs of a number of popular websites, together with each site's Title, Description and Keyword Tags. A facility to add a URL to the database will be added in the next phase.</p>
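<p>One possible shape for such a back-end is sketched below, using Python's built-in <code>sqlite3</code> module purely for illustration. The table names, column names and the helper functions <code>create_tables</code>, <code>add_site</code> and <code>search</code> are all assumptions, not the actual Eggnchips schema. Values are passed as bound parameters rather than being pasted into the SQL text.</p>

```python
import sqlite3


def create_tables(conn):
    """Create a hypothetical schema: one row per site, one row per tag."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sites (
            url TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            description TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS site_keywords (
            url TEXT NOT NULL REFERENCES sites(url),
            keyword TEXT NOT NULL,
            PRIMARY KEY (url, keyword)
        );
        """
    )


def add_site(conn, url, title, description, keywords):
    """Store a site and its keyword tags (tags are lower-cased)."""
    conn.execute(
        "INSERT OR REPLACE INTO sites (url, title, description) VALUES (?, ?, ?)",
        (url, title, description),
    )
    conn.executemany(
        "INSERT OR IGNORE INTO site_keywords (url, keyword) VALUES (?, ?)",
        [(url, keyword.lower()) for keyword in keywords],
    )
    conn.commit()


def search(conn, keyword):
    """Return (url, title, description) rows tagged with the keyword."""
    rows = conn.execute(
        "SELECT s.url, s.title, s.description "
        "FROM sites s JOIN site_keywords k ON k.url = s.url "
        "WHERE k.keyword = ? ORDER BY s.title",
        (keyword.lower(),),
    )
    return rows.fetchall()
```

<p>With an in-memory database opened via <code>sqlite3.connect(":memory:")</code>, calling <code>create_tables</code>, then <code>add_site</code>, then <code>search</code> is enough to try the lookup without touching a real server.</p>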
<p>At present the form is working, together with the alternate buttons; however, the back-end database has yet to be populated. This will be done over the next few weeks.</p>
<p>The project progress can be viewed at <a href="http://www.eggnchips.com">http://www.eggnchips.com</a> and is always to be considered a &#8216;work in progress&#8217;.</p>
<div class="wlWriterSmartContent" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:455870a0-8d31-4151-8233-c51a830787ee" style="padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px">Technorati Tags: <a href="http://technorati.com/tags/eggnchips" rel="tag">eggnchips</a>,<a href="http://technorati.com/tags/search" rel="tag">search</a>,<a href="http://technorati.com/tags/seo" rel="tag">seo</a>,<a href="http://technorati.com/tags/project" rel="tag">project</a>,<a href="http://technorati.com/tags/keyword" rel="tag">keyword</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.eggnchips.com/blog/2008/04/23/building-the-basics-of-search/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.eggnchips.com/blog/2008/04/23/building-the-basics-of-search/</feedburner:origLink></item>
	</channel>
</rss>
