Old 01-18-2006, 04:11 PM  
Simplex3
Anybody ever designed a search engine?

I'm looking for some insight. I need to design a moderately complex search engine, and I'd like to learn from someone else's experience before I eat it myself. Basically, I'm trying to index text from a database; none of this is static HTML anywhere. Here's the scenario:

The app is hosted by region; cross-region searches won't happen.
A customer may belong to multiple regions.
A customer may have multiple offerings.

The search engine needs to search a few text fields for each offering that is available in a given region, rank the matches by keyword relevance, and return a list of the IDs in order of relevance.

Any pointers are greatly appreciated.
Old 01-18-2006, 04:13 PM   #2
teedubya
Would www.atomz.com be a solution that you could utilize?
Old 01-18-2006, 04:17 PM   #3
Simplex3
Quote:
Originally Posted by Ali Chi3fs
Would www.atomz.com be a solution that you could utilize?
Actually, that's way more than I need here, but it's very interesting for another project I have going...

Thanks for the link.
Old 01-18-2006, 04:19 PM   #4
dirk digler
Doesn't Google have something where you can do local searches? I have seen a lot of websites do local Google searches.
Old 01-18-2006, 04:21 PM   #5
Simplex3
Quote:
Originally Posted by dirk digler
Doesn't Google have something where you can do local searches? I have seen a lot of websites do local Google searches.
Yes. I did fail to mention one thing, though. My customer doesn't want their entire database exposed to Google, Yahoo, etc. This has to be internal to the application.
Old 01-18-2006, 04:26 PM   #6
SLAG
http://www.google.com/enterprise/
Old 01-18-2006, 04:32 PM   #7
dirk digler
Quote:
Originally Posted by SLAG02
Damn that shit is expensive
Old 01-18-2006, 04:37 PM   #8
kepp
I've worked on a couple different search engines and designed/implemented one from the ground up. They can pretty much get as complicated as you want.

* What volume of traffic are we talking about? If the volume is light, you can have a very simple implementation that will work fine. For instance, my current employer needed me to design an engine that would handle 100 million requests/day, so that kind of did away with the lightweight options.
* Is there an existing infrastructure that it has to conform to? Windows or Linux? IIS or Apache? etc...
* Will it be type-in queries or some sort of category or index-based search?
* I take it that this will be a web page/site on an intranet?
* What kind of data are you talking about? I know you said "text", but what industry?

Pointers? Hmmm...

* You can't spend too much time designing your database. You can have super-fast code, but if your relations and indexes are bad, it won't do squat.
* The volume also comes into play when choosing your database (if you're not tied to a pre-existing installation). For example, from my experience, MySQL becomes unstable when a table reaches around 60 million rows. Oracle doesn't seem to have that problem. However, MySQL is way faster than Oracle, and Oracle is super-pricey. A lot of tradeoffs here.
* If you can, it's easier to use the "free" route: Linux, MySQL, PHP/Perl. Although this won't be as fast as other implementations, it is easy to implement and maintain.

I'd kind of need more info for more specific help.
Old 01-18-2006, 04:43 PM   #9
htismaqe
I haven't ever designed a search engine, but I did work on a program that could search headers on Usenet and download just .GIF and .JPG attachments automatically...

Old 01-18-2006, 04:59 PM   #10
Simplex3
Quote:
Originally Posted by kepp
...
I'd kind of need more info for more specific help.
Wow.

Here's some more info.

It's *nix based, FreeBSD 6 to be exact. It's going to be MySQL for now; if it breaks the 60M-row barrier, buying Oracle won't be an issue. The web app will be running apache2/php5, but the search engine can update on a schedule rather than with every insert, so Perl is fine for that task, too (those are the only two *nix languages I'm really comfortable with).

I'm very familiar with database design, tuning indexes, etc. I should be fine there once I figure out how to design the engine itself.

* The traffic volume will be fewer than 1000 queries per minute at all times. Likely fewer than 100 per minute.

* These are all keyword based queries against free-form text. Nothing pre-set.

* The search will originate from a web page, but will go through a search object. The entire app is OOP.

* It won't be industry specific; it's pretty much whatever they want to put in there. On the plus side, I'm only required to index and search 4 character fields across two tables. I'm not comfortable just using a SQL query per search because I have to join three tables to get the results by region.

I need them indexed by region so I'm guessing I'll have a set of index tables for each region so that will cut down on total rows per table. There will never be a cross-region search.

The only other monkey-wrench I have is that the content can be scheduled. Each item has a start and end date that must be adhered to.
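
A minimal sketch of the scheduled indexing pass described above, written in Python rather than Perl so it can be run standalone. The sample rows, the `tokenize` rule, and the shape of the index are all assumptions for illustration; nothing here comes from the actual application. It splits each text field into keywords and counts hits per offering, keyed by region so that a lookup never crosses regions.

```python
import re
from collections import Counter

# Hypothetical sample rows: (offering_id, region_id, [text fields to index]).
OFFERINGS = [
    (1, 10, ["Lawn mowing", "Weekly lawn care and edging"]),
    (2, 10, ["Snow removal", "Driveway and sidewalk clearing"]),
    (3, 20, ["Lawn aeration", "Spring lawn treatment"]),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(offerings):
    """Return {(region_id, keyword): Counter({offering_id: hit_count})}."""
    index = {}
    for offering_id, region_id, fields in offerings:
        for field in fields:
            for word in tokenize(field):
                # Count every occurrence so the hit count can act as a crude
                # relevance score at query time.
                index.setdefault((region_id, word), Counter())[offering_id] += 1
    return index

index = build_index(OFFERINGS)
```

In a real job the rows would come from the database and the result would be written back to index tables on whatever schedule suits the data; the in-memory dictionary only stands in for that step.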
Old 01-18-2006, 05:00 PM   #11
Simplex3
Quote:
Originally Posted by htismaqe
I haven't ever designed a search engine, but I did work on a program that could search headers on Usenet and download just .GIF and .JPG attachments automatically...

You were looking for pictures of people's dogs, right?
Old 01-18-2006, 05:08 PM   #12
htismaqe
Quote:
Originally Posted by Simplex3
You were looking for pictures of people's dogs, right?
We weren't looking for anything in particular. We were providing a "service".

If you were looking for dogs, they could be found in alt.sex.bestiality.*.
Old 01-18-2006, 09:43 PM   #13
ferrarispider95
Just build a form in PHP and query the database; MySQL and PHP are super easy.
Old 01-19-2006, 08:56 AM   #14
kepp
Quote:
Originally Posted by Simplex3
It's *nix based, FreeBSD 6 to be exact. It's going to be MySQL for now; if it breaks the 60M-row barrier, buying Oracle won't be an issue. The web app will be running apache2/php5, but the search engine can update on a schedule rather than with every insert, so Perl is fine for that task, too (those are the only two *nix languages I'm really comfortable with).

I'm very familiar with database design, tuning indexes, etc. I should be fine there once I figure out how to design the engine itself.

* The traffic volume will be fewer than 1000 queries per minute at all times. Likely fewer than 100 per minute.

* These are all keyword based queries against free-form text. Nothing pre-set.

* The search will originate from a web page, but will go through a search object. The entire app is OOP.

* It won't be industry specific; it's pretty much whatever they want to put in there. On the plus side, I'm only required to index and search 4 character fields across two tables. I'm not comfortable just using a SQL query per search because I have to join three tables to get the results by region.

I need them indexed by region so I'm guessing I'll have a set of index tables for each region so that will cut down on total rows per table. There will never be a cross-region search.

The only other monkey-wrench I have is that the content can be scheduled. Each item has a start and end date that must be adhered to.
That setup will be more than enough to handle 100/min.

* I try to let the SQL handle most of the work, especially if you're using MySQL, because it's fast. I wouldn't be wary of joining multiple tables as long as the DB is tuned/indexed properly. My main query joins 6 tables and it does fine.
* You don't necessarily need a separate table for each region. Just have a region_code or region_id field in the table you use for the indexing. That will make it easier to maintain and will allow for cross-region searching JUST IN CASE someone changes their mind.
* Are you actually going to "index" the 4 character fields, or are you just going to use "WHERE field like 'abc%'" queries? If you're actually going to index them, you may want to have a separate table just for the indexes whose rows 'point' to corresponding rows in the other table(s).
* The date scheduling can actually be pretty easy. You'll have a "main" table where you keep the stuff that will be the objects of your searches. Just add 'start_date' & 'end_date' fields and include the appropriate constraints in your query.
* Since your indexes will possibly point to rows in more than one table, sub-selects and/or unions will come in handy. I know Oracle supports them, but the last time I used MySQL, it didn't. You may want to check into that.

This is getting a little long so I'll make another post here in a while...
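
The single-index-table idea in the list above can be sketched roughly as follows. This uses Python's built-in `sqlite3` module instead of MySQL so it runs without a server, and the `keyword_index` table, its columns, and the sample rows are hypothetical names chosen for the example. Relevance here is simply the number of query keywords an offering matches, which is one simple choice among many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword_index (
    region_id   INTEGER NOT NULL,  -- one shared table, filtered by region
    keyword     TEXT    NOT NULL,
    offering_id INTEGER NOT NULL   -- points back at the row being searched
);
CREATE INDEX idx_region_keyword ON keyword_index (region_id, keyword);
""")
conn.executemany(
    "INSERT INTO keyword_index VALUES (?, ?, ?)",
    [
        (10, "lawn", 1), (10, "mowing", 1),
        (10, "lawn", 2), (10, "care", 2), (10, "weekly", 2),
        (20, "lawn", 3), (20, "care", 3),
    ],
)

def search(region_id, keywords):
    """Return offering ids in one region, most matched keywords first."""
    placeholders = ", ".join("?" for _ in keywords)
    sql = f"""
        SELECT offering_id, COUNT(*) AS hits
        FROM keyword_index
        WHERE region_id = ? AND keyword IN ({placeholders})
        GROUP BY offering_id
        ORDER BY hits DESC, offering_id
    """
    return [row[0] for row in conn.execute(sql, [region_id, *keywords])]
```

Calling `search(10, ["lawn", "care"])` ranks offering 2 ahead of offering 1 because it matches both keywords. Dropping the `region_id = ?` condition would give a cross-region search without any schema change.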
Old 01-19-2006, 09:09 AM   #15
kepp
Actually, after thinking a little about it, if you use a 'region_id' field in your main table, you wouldn't need to use sub-selects. You'd have something like this for your tables:

CREATE TABLE object_table
(
object_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
region_id INT UNSIGNED NOT NULL,
text_field VARCHAR(4) NOT NULL,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL,
-- object_id is already the primary key, so this index leads with region_id
INDEX idx1 (region_id, start_time, end_time)
) ENGINE = InnoDB; -- ENGINE replaces the older TYPE keyword

CREATE TABLE index_table
(
indexed_text VARCHAR(4) NOT NULL,
object_id INT UNSIGNED NOT NULL,
INDEX idx2 (indexed_text)
) ENGINE = InnoDB;

...and a main query like this:

SELECT ot.object_id
FROM object_table ot, index_table it
WHERE it.indexed_text like ... or = ...
AND ot.object_id = it.object_id
AND ot.start_time <= NOW()
AND ot.end_time >= NOW()
...

Then, if you needed to add other constraints like the status of an account or something, you just add the appropriate table to the FROM clause and add another AND, and you're done.
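
As a rough illustration of adding one more constraint, the sketch below joins in an account table and filters on its status. It runs against SQLite through Python's `sqlite3` module rather than MySQL, and `account_table`, `account_id`, `status`, and the sample data are assumptions made up for the example. The current time is passed in as a parameter instead of calling NOW() so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_table (account_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE object_table (
    object_id  INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,   -- hypothetical link to the owning account
    region_id  INTEGER NOT NULL,
    start_time TEXT NOT NULL,      -- 'YYYY-MM-DD HH:MM:SS' strings sort correctly
    end_time   TEXT NOT NULL
);
CREATE TABLE index_table (indexed_text TEXT NOT NULL, object_id INTEGER NOT NULL);

INSERT INTO account_table VALUES (1, 'active'), (2, 'suspended');
INSERT INTO object_table VALUES
    (100, 1, 10, '2006-01-01 00:00:00', '2006-12-31 23:59:59'),
    (101, 2, 10, '2006-01-01 00:00:00', '2006-12-31 23:59:59'),
    (102, 1, 10, '2006-06-01 00:00:00', '2006-12-31 23:59:59');
INSERT INTO index_table VALUES ('lawn', 100), ('lawn', 101), ('lawn', 102);
""")

def search(keyword, region_id, now):
    """Return ids of matching objects that are scheduled and owned by an active account."""
    sql = """
        SELECT ot.object_id
        FROM index_table it
        JOIN object_table ot ON ot.object_id = it.object_id
        JOIN account_table acct ON acct.account_id = ot.account_id
        WHERE it.indexed_text = ?
          AND ot.region_id = ?
          AND ot.start_time <= ? AND ot.end_time >= ?
          AND acct.status = 'active'
        ORDER BY ot.object_id
    """
    return [row[0] for row in conn.execute(sql, (keyword, region_id, now, now))]
```

With these rows, a search for 'lawn' in region 10 on 2006-03-15 returns only object 100: object 101 belongs to a suspended account and object 102 has not started yet. The explicit JOIN ... ON form is equivalent to listing the tables in the FROM clause and joining them with AND conditions.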