

  • downloaded 615 (385 .com) raw content (bz2 format)

Next steps:

  • build feature lists using new wikipedia lexicon


  • have amazon and shopping for lexicon.txt

Next steps:

  • need ebay; figure out soap/php interface to ebay and get
  • rebuild cat maps


  • have 120K/174K front pages; 1link.csv has "key features" now

Next steps:

  • build corpus of key features for each category in 1link.csv


  • prototyped code, seen it work for "thinkpad laptops"

Next steps:

  • test out search_by_product/brand on "600x ipod nano" etc.
  • write search_by_model code


  • use new city/state features to detect city/state combos quickly on "contact us" pages
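
One way the city/state check could work is sketched below. This is a rough illustration, not the code the notes refer to: the function name `detect_city_state` and the shape of the `$cityState` lookup (state abbreviation => set of lowercase city names) are assumptions made for the example. It looks for "City, ST" candidates with a regular expression and keeps only those the lookup confirms.

```php
<?php
// Hypothetical sketch: find "City, ST" combos in page text.
// $cityState is an ASSUMED lookup: state abbreviation => [lowercase city => true].
function detect_city_state(string $text, array $cityState): array
{
    $found = [];
    // Candidate: one to three capitalized words, a comma, then a two-letter code.
    $pattern = '/\b([A-Z][a-z]+(?: [A-Z][a-z]+){0,2}),\s*([A-Z]{2})\b/';
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $words = explode(' ', $m[1]);
            $state = $m[2];
            // The capture may include leading non-city words ("Contact Palo Alto"),
            // so try the longest suffix first and drop leading words until one matches.
            for ($i = 0; $i < count($words); $i++) {
                $city = implode(' ', array_slice($words, $i));
                if (isset($cityState[$state][strtolower($city)])) {
                    $found["$city, $state"] = true; // array key de-duplicates repeats
                    break;
                }
            }
        }
    }
    return array_keys($found);
}
```

Confirming each candidate against the lookup keeps the regular expression loose and cheap while filtering out capitalized phrases that merely resemble an address.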


  • have wikipedia and product lexicon merged

Next steps:

 foreach ($titlearr as $title) {
   // expand the associations on:
   //   productbrand:   any product-brand combo appearing
   //   brandmodel:     anything that looks like a model (alphanumeric or 00 or short)
   //   productfeature: any product-feature combo appearing
   //   productunit:    any product-unit mapping
 }
 foreach ($brandarr as $brand) {
   // determine product associations
 }
 foreach ($brandmodel as $brand => $modelarr) {
   foreach ($modelarr as $model => $n) {
     // determine product associations
   }
 }
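
The brandmodel expansion in the loop over titles might look roughly like the sketch below. Everything here is a guess at the intent: `looks_like_model` reads "alphanumeric or 00 or short" as "mixes letters and digits, or contains 00, or is at most three characters long", and `expand_brandmodel` only considers the token directly after a known brand. Both function names and the `$maxShort` parameter are hypothetical.

```php
<?php
// Hypothetical helper: does a token look like a model identifier?
// ASSUMED reading of "alphanumeric or 00 or short".
function looks_like_model(string $token, int $maxShort = 3): bool
{
    $hasLetter = preg_match('/[a-z]/i', $token) === 1;
    $hasDigit = preg_match('/\d/', $token) === 1;
    if ($hasLetter && $hasDigit) {
        return true;                      // e.g. "600x"
    }
    if (strpos($token, '00') !== false) {
        return true;                      // e.g. "5000"
    }
    return strlen($token) <= $maxShort;   // short tokens
}

// Builds brand => [model => n] by counting model-like tokens that
// directly follow a known brand in each title (an assumed rule).
function expand_brandmodel(array $titlearr, array $brandarr): array
{
    $brands = array_flip(array_map('strtolower', $brandarr));
    $brandmodel = [];
    foreach ($titlearr as $title) {
        $tokens = preg_split('/\s+/', strtolower(trim($title)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $i => $token) {
            if (!isset($brands[$token]) || !isset($tokens[$i + 1])) {
                continue;
            }
            $next = $tokens[$i + 1];
            if (looks_like_model($next)) {
                $brandmodel[$token][$next] = ($brandmodel[$token][$next] ?? 0) + 1;
            }
        }
    }
    return $brandmodel;
}
```

Note that the "short" rule also accepts ordinary short words such as "and", so a stop-word filter would probably be needed in practice.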
 How to determine product associations:
   read in the productbrand table
   read the Yahoo search response and the Google Suggest response
   detect "ma" features from the output
   for brand links, check the productbrand table
   for brand-model links, check the productbrand table
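
The two table checks at the end of this list could be sketched as follows. The table layout (product => [brand => count]) is an assumption, as are the function names, and the rule that a brand-model link simply inherits the products of its brand is a guess. The search-response steps are left out because the notes do not describe those formats.

```php
<?php
// ASSUMED shape of the productbrand table: product => [brand => count].
// Returns the products linked to a brand, most frequent first.
function products_for_brand(array $productbrand, string $brand): array
{
    $counts = [];
    foreach ($productbrand as $product => $brands) {
        if (isset($brands[$brand])) {
            $counts[$product] = $brands[$brand];
        }
    }
    arsort($counts); // sort by count, descending, keeping product keys
    return array_keys($counts);
}

// Brand-model links inherit the products of their brand (an assumption).
// $brandmodel has the shape brand => [model => n], as in the loops above.
function products_for_brandmodel(array $productbrand, array $brandmodel): array
{
    $links = [];
    foreach ($brandmodel as $brand => $modelarr) {
        $products = products_for_brand($productbrand, $brand);
        foreach ($modelarr as $model => $n) {
            $links["$brand $model"] = $products;
        }
    }
    return $links;
}
```

Ordering by count puts the most common product for a brand first, which is one simple way to pick a default association when a brand appears under several products.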