|
01-18-2006, 04:13 PM | #2 |
Most Valuable Poster
Join Date: Oct 2003
Casino cash: $9480002
|
Would www.atomz.com be a solution that you could utilize?
|
Posts: 36,652
|
01-18-2006, 04:17 PM | #3 | |
MVP
Join Date: Sep 2003
Casino cash: $10004900
|
Quote:
Thanks for the link. |
|
Posts: 28,527
|
01-18-2006, 04:19 PM | #4 |
Please squeeze
Join Date: Jul 2003
Location: Clinton, MO
Casino cash: $3154644
|
Doesn't Google have a way for you to do local searches? I have seen a lot of websites do local Google searches.
|
Posts: 66,341
|
01-18-2006, 04:21 PM | #5 | |
MVP
Join Date: Sep 2003
Casino cash: $10004900
|
Quote:
|
|
Posts: 28,527
|
01-18-2006, 04:26 PM | #6 |
Superbowl MVP
Join Date: Oct 2005
Location: OOOOOOOOOOOOOLATHE
Casino cash: $9910252
|
|
Posts: 11,177
|
01-18-2006, 04:32 PM | #7 | |
Please squeeze
Join Date: Jul 2003
Location: Clinton, MO
Casino cash: $3154644
|
Quote:
|
|
Posts: 66,341
|
01-18-2006, 04:37 PM | #8 |
MVP
Join Date: Aug 2005
Casino cash: $5299212
|
I've worked on a couple different search engines and designed/implemented one from the ground up. They can pretty much get as complicated as you want.
* What volume of traffic are we talking about? If the volume is light, you can have a very simple implementation that will work fine. For instance, my current employer needed me to design an engine that would handle 100 million requests/day, so that kind of did away with the lightweight options.
* Is there an existing infrastructure that it has to conform to? Windows or Linux? IIS or Apache? etc...
* Will it be type-in queries or some sort of category or index-based search?
* I take it that this will be a web page/site on an intranet?
* What kind of data are you talking about? I know you said "text", but what industry?

Pointers? Hmmm...

* You can't spend too much time designing your database. You can have super-fast code, but if your relations and indexes are bad, it won't do squat.
* The volume also comes into play when choosing your database (if you're not tied to a pre-existing installation). For example, from my experience, MySQL becomes unstable when a table reaches around 60 million rows. Oracle doesn't seem to have that problem. However, MySQL is way faster than Oracle, and Oracle is super-pricey. A lot of tradeoffs here.
* If you can, it's easier to use the "free" route: Linux, MySQL, PHP/Perl. Although this won't be as fast as other implementations, it is easy to implement and maintain.

I'd kind of need more info for more specific help. |
Posts: 14,496
|
01-18-2006, 04:43 PM | #9 |
'Tis my eye!
Join Date: Aug 2000
Location: Chiefsplanet
Casino cash: $10269900
|
I haven't ever designed a search engine, but I did work on a program that could search headers on Usenet and download just .GIF and .JPG attachments automatically...
|
Posts: 100,022
|
01-18-2006, 04:59 PM | #10 | |
MVP
Join Date: Sep 2003
Casino cash: $10004900
|
Quote:
Here's some more info. It's *nix based, FreeBSD 6 to be exact. It's going to be MySQL for now; if it breaks the 60M-row barrier, buying Oracle won't be an issue. The web app will be running Apache 2/PHP 5, but the search engine can update on a schedule rather than with every insert, so Perl is fine for that task, too (the only two *nix languages I'm really comfortable with). I'm very familiar with database design, tuning indexes, etc. I should be fine there once I figure out how to design the engine itself.

* The traffic volume will be fewer than 1,000 queries per minute at all times. Likely fewer than 100 per minute.
* These are all keyword-based queries against free-form text. Nothing pre-set.
* The search will originate from a web page, but will go through a search object. The entire app is OOP.
* It won't be industry specific; it's pretty much whatever they want to put in there. On the plus side, I'm only required to index and search 4 character fields across two tables.

I'm not comfortable just using a SQL query per search because I have to join three tables to get the results by region. I need them indexed by region, so I'm guessing I'll have a set of index tables for each region; that will cut down on total rows per table. There will never be a cross-region search.

The only other monkey wrench I have is that the content can be scheduled. Each item has a start and end date that must be adhered to. |
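The two requirements above — region-scoped searches only, and content that is visible only between its start and end dates — can both be expressed as plain WHERE constraints. Here's a minimal sketch of that, using Python's built-in sqlite3 instead of the MySQL/PHP stack discussed in the thread so it runs standalone; all table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        item_id    INTEGER PRIMARY KEY,
        region_id  INTEGER NOT NULL,
        body       TEXT NOT NULL,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL
    )
""")
rows = [
    (1, 10, "spring sale on widgets", "2006-01-01", "2006-03-01"),
    (2, 10, "expired widget promo",   "2005-01-01", "2005-03-01"),
    (3, 20, "widgets in region 20",   "2006-01-01", "2006-03-01"),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", rows)

def search(region_id, keyword, today):
    # Region-scoped, date-windowed keyword search: no cross-region
    # results, and expired/not-yet-live items never show up.
    cur = conn.execute(
        """SELECT item_id FROM items
           WHERE region_id = ?
             AND body LIKE '%' || ? || '%'
             AND start_date <= ? AND end_date >= ?
           ORDER BY item_id""",
        (region_id, keyword, today, today))
    return [r[0] for r in cur]

print(search(10, "widget", "2006-01-18"))  # [1] — item 2 is expired, item 3 is another region
```

The date window filters at query time, so the Perl updater never has to delete expired rows on a schedule; they simply stop matching.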
|
Posts: 28,527
|
01-18-2006, 05:00 PM | #11 | |
MVP
Join Date: Sep 2003
Casino cash: $10004900
|
Quote:
|
|
Posts: 28,527
|
01-18-2006, 05:08 PM | #12 | |
'Tis my eye!
Join Date: Aug 2000
Location: Chiefsplanet
Casino cash: $10269900
|
Quote:
If you were looking for dogs, they could be found in alt.sex.bestiality.*. |
|
Posts: 100,022
|
01-18-2006, 09:43 PM | #13 |
She reads at a sophomore level
Join Date: Jul 2005
Location: KANSAS
Casino cash: $10004945
|
Just build a form out of PHP and query the database; MySQL & PHP are super easy.
__________________
www.steerplanet.com Show Steers Directory and Forum www.emporiaks.org Free Emporia, KS Apartment Rental Listings |
Posts: 1,493
|
01-19-2006, 08:56 AM | #14 | |
MVP
Join Date: Aug 2005
Casino cash: $5299212
|
Quote:
* I try to let the SQL handle most of the work - especially if you're using MySQL, because it's fast. I wouldn't be wary of joining multiple tables as long as the DB is tuned/indexed properly. My main query joins 6 tables and it does fine.
* You don't necessarily need a separate table for each region. Just have a region_code or region_id field in the table you use for the indexing. That will make it easier to maintain and will allow for cross-region searching JUST IN CASE someone changes their mind.
* Are you actually going to "index" the 4 character fields, or are you just going to use "WHERE field LIKE 'abc%'" queries? If you're actually going to index them, you may want to have a separate table just for the indexes whose rows 'point' to corresponding rows in the other table(s).
* The date scheduling can actually be pretty easy. You'll have a "main" table where you keep the stuff that will be the objects of your searches. Just add 'start_date' & 'end_date' fields and include the appropriate constraints in your query.
* Since your indexes will possibly point to rows in more than one table, sub-selects and/or unions will come in handy. I know Oracle supports them, but the last time I used MySQL, it didn't. You may want to check into that.

This is getting a little long, so I'll make another post here in a while... |
|
Posts: 14,496
|
01-19-2006, 09:09 AM | #15 |
MVP
Join Date: Aug 2005
Casino cash: $5299212
|
Actually, after thinking a little about it, if you use a 'region_id' field in your main table, you wouldn't need to use sub-selects. You'd have something like this for your tables:
CREATE TABLE object_table (
    object_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    region_id  INT UNSIGNED NOT NULL,
    text_field VARCHAR(4) NOT NULL,
    start_time DATETIME NOT NULL,
    end_time   DATETIME NOT NULL,
    INDEX idx1 (object_id, region_id, start_time, end_time)
) TYPE = InnoDB;

CREATE TABLE index_table (
    indexed_text VARCHAR(4) NOT NULL,
    object_id    INT UNSIGNED NOT NULL,
    INDEX idx2 (indexed_text)
) TYPE = InnoDB;

...and a main query like this:

SELECT ot.object_id
FROM object_table ot, index_table it
WHERE it.indexed_text like ... or = ...
  AND ot.object_id = it.object_id
  AND ot.start_time <= NOW()
  AND ot.end_time >= NOW()
  ...

Then, if you needed to add other constraints like the status of an account or something, you just add the appropriate table to the FROM clause, add another AND in, and you're done. |
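The schema and query above can be exercised end to end. Here's a runnable translation into Python's sqlite3 (the MySQL-specific AUTO_INCREMENT/UNSIGNED/TYPE = InnoDB syntax is dropped and the in-table INDEX clauses become separate CREATE INDEX statements; the join and date logic are unchanged, and the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_table (
        object_id  INTEGER PRIMARY KEY,
        region_id  INTEGER NOT NULL,
        text_field VARCHAR(4) NOT NULL,
        start_time DATETIME NOT NULL,
        end_time   DATETIME NOT NULL
    );
    CREATE INDEX idx1 ON object_table (region_id, start_time, end_time);
    CREATE TABLE index_table (
        indexed_text VARCHAR(4) NOT NULL,
        object_id    INTEGER NOT NULL
    );
    CREATE INDEX idx2 ON index_table (indexed_text);
""")
# object 1 is live, object 2's window ended in 2005
conn.execute("INSERT INTO object_table VALUES "
             "(1, 10, 'abcd', '2006-01-01 00:00:00', '2006-12-31 23:59:59')")
conn.execute("INSERT INTO object_table VALUES "
             "(2, 10, 'abce', '2005-01-01 00:00:00', '2005-06-30 23:59:59')")
conn.executemany("INSERT INTO index_table VALUES (?, ?)",
                 [("abcd", 1), ("abce", 2)])

now = "2006-01-18 16:00:00"  # stands in for NOW()
cur = conn.execute(
    """SELECT ot.object_id
       FROM object_table ot, index_table it
       WHERE it.indexed_text LIKE 'ab%'
         AND ot.object_id = it.object_id
         AND ot.start_time <= ? AND ot.end_time >= ?""", (now, now))
results = [r[0] for r in cur]
print(results)  # [1] — object 2 matches the prefix but its window has closed
```

ISO-formatted date strings compare correctly as text, which is what makes the start_time/end_time constraints work here without any date functions.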
Posts: 14,496
|