...making Linux just a little more fun!

Web Search Option

Ramanathan Muthaiah [rus.cahimb at gmail.com]
Mon, 20 Nov 2006 05:50:52 +0530

I am looking for ways to enable a search option on one of the websites (open source, not meant for business :-) ) that I am responsible for maintaining.

However, it's not for the entire website, just for a specific section (the FAQ pages).

Any ideas ?

/Ram

P.S: Prefer to do it using scripts written in sed / awk.


Samuel Bisbee-vonKaufmann [sbisbee at bu.edu]
Sun, 19 Nov 2006 20:22:06 -0500 (EST)

On Mon, 20 Nov 2006, Ramanathan Muthaiah wrote:

> Am looking for ways to enable search options for one of the websites (open
> source, not meant for business :-) ) am responsible for maintenance.
>

While your problem is not that clear, I believe you are trying to provide search capabilities for information already posted on your web site. If so, then I would suggest using Google. This is easily done by creating a form that queries Google; an example of this implementation (HTML)...

<form method=GET action="http://www.google.com/search">
<input type=text name=q value="">
<input type=submit name=btnG value="Search">
</form>

> P.S: Prefer to do it using scripts written in sed / awk.

Why would you be searching web page content with sed/awk? Unless you are trying to search through some other material? Please explain yourself more clearly so that we can help you better.

Thanks,

-- 
Samuel Kotel Bisbee-vonKaufmann | "A computer once beat me at chess, but
   Boston University, Undergrad. | it was no match for me at kick boxing."
   OFTC.net, Network Operator    | -Emo Philips


Kapil Hari Paranjape [kapil at imsc.res.in]
Mon, 20 Nov 2006 09:35:49 +0530

Hello,

On Sun, 19 Nov 2006, Samuel Bisbee-vonKaufmann wrote:

> On Mon, 20 Nov 2006, Ramanathan Muthaiah wrote:
> > Am looking for ways to enable search options for one of the websites (open
> > source, not meant for business :-) ) am responsible for maintenance.
> 
> > P.S: Prefer to do it using scripts written in sed / awk.
> 
> Why would you be searching web page content with sed/awk? Unless you are 
> trying to search through some other material? Please explain yourself more 
> clearly so that we can help you better.

I suppose he wants to set up a local search engine which is not dependent on remote access for the search facility. Here are some known options:

1. htdig
2. swish-e
3. swish++
4. do-it-yourself

Since 1-3 do not use sed/awk and Ramanathan wants to, it looks like 4 is the only solution available. Be aware that speed of search is directly correlated with the programming effort involved in setting up the search engine :)
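
For what it's worth, option 4 with sed and awk could start out as something like the following. This is only a minimal sketch under stated assumptions: the function name faq_search, the ./faq directory, and the SEARCH_TERM and PAGE environment variables are all made up for illustration, and the FAQ pages are assumed to be plain .html files whose tags do not span lines. It strips tags with sed and has awk report every line containing the term, matched as a literal, case-insensitive substring.

```shell
#!/bin/sh
# Hypothetical do-it-yourself FAQ search (names are illustrative only).
# Usage: faq_search TERM [DIRECTORY]
faq_search() {
    term=$1
    dir=${2:-./faq}    # assumed location of the FAQ pages
    for page in "$dir"/*.html; do
        [ -f "$page" ] || continue
        # Crude tag removal: deletes anything between < and > on one line.
        sed -e 's/<[^>]*>//g' "$page" |
            # The term travels in the environment, so awk never parses it
            # as a pattern or as program text.
            SEARCH_TERM="$term" PAGE="$page" awk '
                # index() is a literal substring test; tolower() on both
                # sides makes the match case-insensitive.
                index(tolower($0), tolower(ENVIRON["SEARCH_TERM"])) {
                    print ENVIRON["PAGE"] ":" NR ": " $0
                }
            '
    done
}
```

Calling `faq_search merge ./faq` would print each matching line prefixed by its file name and line number. Every query rereads every page, which is the effort-versus-speed trade-off mentioned above.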

Regards,

Kapil.


Benjamin A. Okopnik [ben at linuxgazette.net]
Mon, 20 Nov 2006 07:57:00 -0500

On Mon, Nov 20, 2006 at 09:35:49AM +0530, Kapil Hari Paranjape wrote:

> 
> I suppose he wants to set up a local search engine which is not
> dependent on remote access for the search facility. Here are some
> known options:
> 
> 1. htdig 2. swish-e 3. swish++ 4. do-it-yourself
> 
> Since 1-3 do not use sed/awk and Ramanathan wants to, it looks like 4
> is the only solution available. Be aware that speed of search is
> directly correlated with the programming effort involved in setting
> up the search engine :)

Oh, and - *do not* make this classic rookie's mistake with your CGI:

read input
grep $input /my/web/tree/*

Here's the reason:

Please enter the search term:
a b; rm -rf *

Your suicide weapon^W^Wcommand line now looks like this:

grep a b; rm -rf * /my/web/tree/*

Unless you're absolutely sure that you understand and can do good-quality, secure web scripting, I strongly suggest sticking with Kapil's first three options or something similar.
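
A hedged sketch of a less dangerous variant, assuming the CGI is a POSIX shell script (safe_grep and its arguments are hypothetical names, not part of any tool): the search term is quoted so that it reaches grep as exactly one argument, -F makes grep treat it as a fixed string, and -- stops it from being read as an option. This addresses only the command-line problem shown above; it is not a complete or audited CGI.

```shell
#!/bin/sh
# Sketch only: safe_grep is a hypothetical helper.
# Usage: safe_grep TERM DIRECTORY
safe_grep() {
    input=$1
    tree=$2
    # "$input" is quoted: no word splitting, no globbing, and it is never
    #   re-parsed as shell syntax, so ";" stays an ordinary character.
    # -F : treat the term as a fixed string, not a regular expression.
    # -l : print only the names of the files that match.
    # -- : end of options, so a term starting with "-" is not a flag.
    grep -lF -- "$input" "$tree"/*
}
```

With this version, the input `a b; rm -rf *` is simply searched for as that literal string.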

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Fri, 24 Nov 2006 23:34:20 +0530

> While your problem is not that clear, I believe you are trying to provide
> search capabilities for information already posted on your web site. If
> so, then I would suggest using Google. This is easily done by creating a
> form that queries Google; an example of this implementation (HTML)...
>
> `
> <form method=GET action="http://www.google.com/search">
> <input type=text name=q value="">
> <input type=submit name=btnG value="Search">
> </form>
> `

You got it right. The information already available on the website (a huge amount of content to browse through) should be searchable. I think your query tips should help me.

One question (sorry for my ignorance :-(( ) :

Will your form query search the content in the current website, or the entire WWW via Google?

/Ram


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Fri, 24 Nov 2006 23:36:04 +0530

>
> I suppose he wants to set up a local search engine which is not
> dependent on remote access for the search facility. Here are some
> known options:
>
> 1. htdig 2. swish-e 3. swish++ 4. do-it-yourself
>
> Since 1-3 do not use sed/awk and Ramanathan wants to, it looks like 4
> is the only solution available. Be aware that speed of search is
> directly correlated with the programming effort involved in setting
> up the search engine :)

Thanks Kapil and Samuel.

But I may not be able to set up the local search engine, as I do not have the necessary access privileges on the system where the website (FAQ page) is hosted.

/Ram


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Sat, 25 Nov 2006 00:05:50 +0530

>  Thanks Kapil and Samuel.
>
> But I may not be able to setup the local search engine as I do not have
> the necessary access privileges on the system where the website (FAQ page)
> is hosted.
>

My initial expectation, when posting to [TAG], was to have something like what is shown in the attachment.

It has options to search the web and the local website content as well.

Is this possible with the "form query" sample code provided by Samuel ?

/Ram


Samuel Bisbee-vonKaufmann [sbisbee at bu.edu]
Fri, 24 Nov 2006 13:54:16 -0500 (EST)

On Fri, 24 Nov 2006, Ramanathan Muthaiah wrote:

>> 
>> While your problem is not that clear, I believe you are trying to provide
>> search capabilities for information already posted on your web site. If
>> so, then I would suggest using Google. This is easily done by creating a
>> form that queries Google; an example of this implementation (HTML)...
>> 
>> `
>> <form method=GET action="http://www.google.com/search">
>> <input type=text name=q value="">
>> <input type=submit name=btnG value="Search">
>> </form>
>> `
>
> Will your form query search the content in the current website OR the entire
> WWW via Google.
>

Whoops, sorry, that will search the whole Internet. Add the following line between the form tags...

<input type=hidden name=sitesearch value="DOMAIN">

...replacing DOMAIN with your web site's URL. You can also give your users the option between searching the whole Internet or your web site. To do this instead of the above line you would use the following radio buttons...

<input type=radio name=sitesearch value=""> WWW
<input type=radio name=sitesearch value="DOMAIN" checked> DOMAIN

...again replacing DOMAIN with your URL. However, for your application you will probably want the first method.

-- 
Samuel Kotel Bisbee-vonKaufmann | "A computer once beat me at chess, but
   Boston University, Undergrad. | it was no match for me at kick boxing."
   OFTC.net, Network Operator    | -Emo Philips


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Sat, 25 Nov 2006 21:46:11 +0530

> Whoops, sorry, that will search the whole Internet. Add the following line
> between the form tags...
>
> `
> <input type=hidden name=sitesearch value="DOMAIN">
> `
>
> ...replacing DOMAIN with your web site's URL. You can also give your users
> the option between searching the whole Internet or your web site. To do
> this instead of the above line you would use the following radio
> buttons...
>
> `
> <input type=radio name=sitesearch value=""> WWW <input type=radio
> name=sitesearch value="DOMAIN" checked> DOMAIN
> `
>
> ...again replacing DOMAIN with your URL. However, for your application you
> will probably want the first method.

Hmm... I started to test this on my local web server, but this simple thing is proving quite difficult to get working.

On my Windows system running Apache v2.x with ActiveState Perl installed, this code is returning the famous 500 Internal Server Error.

Yes, the Apache server configuration has been modified to recognize Perl scripts.

#!C:/Program Files/ActivePerl/bin/perl.exe

<html>
<head>
<title>Search Page</title>
</head>
<body>
<p>
<h1>Search this site</h1>
<form method=GET action=" http://www.google.com/search">
<input type=radio name=sitesearch value=""> WWW
</form>
</body>
</html>

Am I missing something simple ?

/Ram


Samuel Bisbee-vonKaufmann [sbisbee at bu.edu]
Sat, 25 Nov 2006 11:28:48 -0500 (EST)

On Sat, 25 Nov 2006, Ramanathan Muthaiah wrote:

> #!C:/Program Files/ActivePerl/bin/perl.exe
>
> <html>
> <head>
> <title>Search Page</title>
> </head>
> <body>
> <p>
> <h1>Search this site</h1>
> <form method=GET action=" http://www.google.com/search">
> <input type=radio name=sitesearch value=""> WWW
> </form>
> </body>
> </html>
>
> Am I missing something simple ?
>

Just having one radio button is not going to create an easily usable form. Use the text boxes and hidden field from my previous messages as well and then see what happens.

Also, I do not know anything about Windows Apache. Nor is this the place to talk about Windows software (a few select topics aside). Asking about the HTML is fine, but please do not expect us to debug Windows errors on a Linux mailing list.

-- 
Samuel Kotel Bisbee-vonKaufmann | "A computer once beat me at chess, but
   Boston University, Undergrad. | it was no match for me at kick boxing."
   OFTC.net, Network Operator    | -Emo Philips


Thomas Adam [thomas.adam22 at gmail.com]
Sat, 25 Nov 2006 16:34:01 +0000

On Sat, Nov 25, 2006 at 11:28:48AM -0500, Samuel Bisbee-vonKaufmann wrote:

> Also, I do not know anything about Windows Apache. Nor is this the place 
> to talk about Windows software (a few select topics aside). Asking about 
> the HTML is fine, but please do not expect us to debug Windows errors on a 
> Linux mailing list.

Actually, on this occasion, it's perfectly acceptable.

-- Thomas Adam

-- 
"Wanting to feel; to know what is real.  Living is a lie." -- Porpoise
Song, by The Monkees.


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Sat, 25 Nov 2006 22:13:59 +0530

>
> Also, I do not know anything about Windows Apache. Nor is this the place
> to talk about Windows software (a few select topics aside). Asking about
> the HTML is fine, but please do not expect us to debug Windows errors on a
> Linux mailing list.

Certainly my intention was not to touch on this topic or discuss anything related to Win.

Shall try with the sample code (and, of course, study some material regarding CGI scripting as well), so should be able to fix it soon.

thanks, /Ram


Francis Daly [francis at daoine.org]
Sat, 25 Nov 2006 18:45:37 +0000

On Sat, Nov 25, 2006 at 10:13:59PM +0530, Ramanathan Muthaiah wrote:

Hi there,

> Shall try with the sample code (of course, studying some material regd CGI
> scripting also), so should be able to fix it soon.

I suspect that one of the confusions is that so far, the sample offered is not code, and has nothing to do with CGI scripting.

It's just part of a simple html form which submits to Google. Save it in a file called something like search.htm, and view it in your browser (either as a file or as a http: url).

The four lines in the first reply, plus the name=sitesearch one from a later reply in the right place, should do the job. No coding or anything else on your side.

It will only work if Google has already indexed your site, of course. If that isn't the case, then you want to fall back to plan B.
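
Put together, the pieces from the thread might look like the sketch below, which writes the complete page from a shell function. The function name make_search_page, the default file name, and the surrounding html/head/body wrapper are assumptions for illustration; the form lines themselves are the ones quoted earlier in the thread, with DOMAIN supplied as the first argument.

```shell
#!/bin/sh
# Sketch: write the assembled Google site-search form to a static file.
# make_search_page is a hypothetical helper name.
# Usage: make_search_page DOMAIN [OUTPUT_FILE]
make_search_page() {
    domain=$1
    outfile=${2:-search.htm}
    # The here-document is plain HTML; only $domain is expanded by the shell.
    cat > "$outfile" <<EOF
<html>
<head><title>Search Page</title></head>
<body>
<h1>Search this site</h1>
<form method=GET action="http://www.google.com/search">
<input type=text name=q value="">
<input type=hidden name=sitesearch value="$domain">
<input type=submit name=btnG value="Search">
</form>
</body>
</html>
EOF
}
```

For example, `make_search_page example.org search.htm` (with example.org standing in for the real site) would produce a file that can be opened directly in a browser; no CGI or server-side scripting is involved.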

Good luck,

f

-- 
Francis Daly        francis@daoine.org


Ramanathan Muthaiah [rus.cahimb at gmail.com]
Sun, 26 Nov 2006 01:37:43 +0530

> I suspect that one of the confusions is that so far, the sample offered
> is not code, and has nothing to do with CGI scripting.

Yes, I realized this after my last posting to TAG. Part of the problem was my reference to code written months back for a simple form-based app using CGI.

> It's just part of a simple html form which submits to Google. Save it
> in a file called something like search.htm, and view it in your browser
> (either as a file or as a http: url).

Exactly, this is what I did. Copied the code into htdocs dir and renamed the file as .html. It works. Thanks ;-P

> The four lines in the first reply, plus the name=sitesearch one from a
> later reply in the right place, should do the job. No coding or anything
> else on your side.
>
> It will only work if Google has already indexed your site, of course. If
> that isn't the case, then you want to fall back to plan B.

I safely assume this "index" is already there, as I _will be doing_ this for the FAQ section in Subversion's (that open-source SCM tool) website.

/Ram


Samuel Bisbee-vonKaufmann [sbisbee at bu.edu]
Sat, 25 Nov 2006 16:06:36 -0500 (EST)

On Sun, 26 Nov 2006, Ramanathan Muthaiah wrote:

> I safely assume this "index" is already there, as I _will be doing_ this for
> the FAQ section in Subversion's (that open-source SCM tool) website.
>

Yes, the index is there but you may not be in it. Once your site is complete and online then you should add your site to Google's index manually at http://www.google.com/addurl/?continue=/addurl (unless the HTML form works). For more web master help with Google: http://www.google.com/support/webmasters/

Enjoy,

-- 
 Samuel Kotel Bisbee-vonKaufmann | "A computer once beat me at chess, but
   Boston University, Undergrad. | it was no match for me at kick boxing."
   OFTC.net, Network Operator    | -Emo Philips


Benjamin A. Okopnik [ben at linuxgazette.net]
Sat, 25 Nov 2006 21:09:34 -0500

On Sun, Nov 26, 2006 at 01:37:43AM +0530, Ramanathan Muthaiah wrote:

> 
>      I suspect that one of the confusions is that so far, the sample offered
>      is not code, and has nothing to do with CGI scripting.
> 
> 
>    Yes, I realized this after my last posting to TAG. Part of the problem was
>    my reference to code written months back for a simple form-based app using
>    CGI.
> 
>      It's just part of a simple html form which submits to Google. Save it
>      in a file called something like search.htm , and view it in your browser
>      (either as a file or as a http: url).
> 
> 
>    Exactly, this is what I did. Copied the code into htdocs dir and renamed the
>    file as .html.
>    It works. Thanks ;-P

Ram, I'd appreciate it if you'd turn off the HTML encoding and use standard quoting when posting to this list (i.e., precede the lines that you're replying to with '>' instead of tabs.) Your current methods lead to extra work for our Mailbag editor, and annoy those of us who have to process this stuff before reading it. In fact, I'd suggest that you review "Asking Questions of The Answer Gang" at http://linuxgazette.net/tag/ask-the-gang.html - lots of good tips on standard list behavior there, as well as helpful hints and instructions.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

