Is this possible? Indexing the Internet starting from a single page? Yes, I have tested it, starting from iuliumaniu.ro, a site I built a long time ago for a Christian community (back when I wrote such a program for my college diploma). It comes down to a few small steps which, if followed, can in principle collect the entire content of the Internet, or just the part of it you need.
Steps for crawling the internet:
1. set u = starting url
2. load u
3. [?store data about page u]
4. process page u - extract links from u content
5. for each extracted link, set u = that link and go to step 2.
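The steps above can be sketched in Python roughly as follows. This is only a minimal sketch, not the program I wrote back then: the names `crawl`, `fetch`, `extract_links` and `max_pages` are my own assumptions, the `seen` set is an addition so that pages linking back to each other do not loop forever, and the three-page `FAKE_WEB` dictionary is a made-up stand-in for real HTTP requests so the example runs without a network connection.

```python
from collections import deque


def crawl(start_url, fetch, extract_links, max_pages=100):
    """Breadth-first crawl; returns {url: content} for every page loaded."""
    pages = {}                    # step 3 storage, kept in memory for this sketch
    seen = {start_url}            # URLs already queued, so link cycles terminate
    queue = deque([start_url])    # step 1: u = starting URL
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        content = fetch(url)      # step 2: load u
        if content is None:       # page could not be loaded, skip it
            continue
        pages[url] = content      # step 3: store data about page u
        for link in extract_links(url, content):   # step 4: extract links
            if link not in seen:  # step 5: each new link goes back to step 2
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page "web" used instead of real HTTP requests.
FAKE_WEB = {
    "http://a.example/": "http://b.example/ http://c.example/",
    "http://b.example/": "http://a.example/",
    "http://c.example/": "http://b.example/ http://missing.example/",
}


def fake_fetch(url):
    # Returns None for pages that do not exist in the fake web.
    return FAKE_WEB.get(url)


def fake_extract(url, content):
    # In the fake web a page's content is just a space-separated list of links.
    return content.split()


result = crawl("http://a.example/", fake_fetch, fake_extract)
print(sorted(result))
```

For a real crawl, `fetch` would wrap an HTTP request (for example `urllib.request.urlopen`) and `extract_links` would parse the HTML. The `max_pages` limit is there because the crawl has no natural end on the real Internet.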
Step 1 is simple: pick a page or site that has some external links, so the crawl can move on to other sites as well. Step 2 means getting the content of that page, usually with an HTTP web request. Step 3 can be placed before or after step 4, depending on what you need to collect (if you also need to collect the links, or you have to process the stored data, this step will probably come after step 4); it consists of implementing some kind of data storage (database, XML, ...). The page content in step 4 can be processed in several ways. I can give you two simple ones: parse it as XML/HTML, or process it as plain text, possibly using regular expressions. The XML approach is harder to implement, but it gives you some advantages. In the end you follow all the URLs on the page and jump back to step 2 for each one. As long as you skip URLs you have already visited, this ensures that your application eventually reaches every page that can be reached through links from the starting page.
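To make the two ways of processing a page in step 4 more concrete, here is a small Python sketch of both. It is an illustration under my own assumptions: the names `LinkParser`, `links_by_parser`, `links_by_regex` and `HREF_RE`, the sample HTML and the `site.example` URLs are all made up for the example, and Python's standard `html.parser` stands in for the "XML/HTML" approach.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Structured approach: collect href values of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def links_by_parser(base_url, html):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


# Text approach: a deliberately simple pattern for quoted href attributes.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def links_by_regex(base_url, html):
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


SAMPLE = '<p><a href="/about">About</a> <a href="http://other.example/">Other</a></p>'
BASE = "http://site.example/index.html"

print(links_by_parser(BASE, SAMPLE))
print(links_by_regex(BASE, SAMPLE))
```

On this sample both functions return the same two absolute URLs. The regular expression is shorter, but it also matches `href` attributes on tags other than `<a>` (such as `<link>`) and misses unquoted values, which is the kind of advantage the structured approach gives you.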
That is a little theory about crawlers; it is not very difficult to implement. Come back soon for a small implementation of this.