Data extraction with bots
Your bots will have to:
- (more or less) automatically recognize patterns inside web pages
- find the data you want them to scrape
- extract contents from pages
Techniques:
- regular expressions
- HTML::TreeBuilder
- HTML::TokeParser
- "intelligent" wrappers
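
To make the contrast between the first techniques concrete, here is a minimal sketch in Python rather than Perl. It assumes a made-up page layout (the `book` and `price` class names and the sample titles are hypothetical) and uses Python's standard `html.parser` module as a rough stand-in for a token-style parser such as HTML::TokeParser. It is not the Perl API itself.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; the markup and class names are invented for illustration.
PAGE = """
<html><body>
<ul>
  <li class="book"><a href="/b/1">Learning Perl</a> <span class="price">29.95</span></li>
  <li class="book"><a href="/b/2">Programming Perl</a> <span class="price">39.95</span></li>
</ul>
</body></html>
"""

# Technique 1: a regular expression. Short, but tied to the exact markup.
BOOK_RE = re.compile(
    r'<li class="book"><a href="([^"]+)">([^<]+)</a>\s*'
    r'<span class="price">([^<]+)</span>'
)


def extract_with_regex(html):
    """Return one dict per book matched by BOOK_RE."""
    return [
        {"url": url, "title": title, "price": float(price)}
        for url, title, price in BOOK_RE.findall(html)
    ]


# Technique 2: a token-based parser that reacts to start tags, text and end tags.
class BookParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.books = []       # finished records
        self._current = None  # record being filled while inside <li class="book">
        self._field = None    # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "book":
            self._current = {}
        elif self._current is not None:
            if tag == "a":
                self._current["url"] = attrs.get("href")
                self._field = "title"
            elif tag == "span" and attrs.get("class") == "price":
                self._field = "price"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current is not None:
            if "price" in self._current:
                self._current["price"] = float(self._current["price"])
            self.books.append(self._current)
            self._current = None


def extract_with_parser(html):
    """Return one dict per <li class="book"> element found by the parser."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    return parser.books


if __name__ == "__main__":
    for book in extract_with_parser(PAGE):
        print(book)
```

On this sample both functions return the same records. The regular expression stops matching as soon as an attribute is added or reordered, while the parser version only depends on tag names and the `class` values, which is usually the reason to prefer a parser once the pages get less regular.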
Once you have recovered the semantics of the data, you can use the resulting objects to link to other data sources.