Deep Web, Dark Internet and Darknets

From IT

Revision as of 22:22, 4 January 2011

Introduction

Deep Web, Dark Internet and Darknet are three concepts that might seem similar and that are sometimes used as synonyms. In reality, they describe three very different worlds.

Terms like these sound fascinating, perhaps because they refer to subjects that are not often discussed on the Web. When users surf the Web, they tend to assume that all the information available in the world is also available on it. There are, however, areas of the World Wide Web that the public cannot easily access.

The following sections explain these three concepts and make the differences between them clear.

The Deep Web

The Deep Web is simply the part of the Web that is not indexed by search engines.
Before going further, we have to consider a preliminary distinction between the Surface Web and the Deep Web. [1]

Surface Web is the term used for the portion of the World Wide Web that is indexed by conventional search engines: in other words, it is what you can find by using general web search engines.

By contrast, the Deep Web is defined as the portion of the World Wide Web that cannot be reached through a search on general search engines, and it is much bigger than the Surface Web (http://www.internettutorials.net/deepweb.asp).

Beyond the trillion pages that a search engine such as Google knows about, there is a vast world of hidden data. This content could be:

  • the content of databases, which is accessible only by query;
  • files such as multimedia, images and software;
  • the content on web sites protected by passwords or other kinds of restrictions;
  • the content of "full text" articles and books;
  • the content of social networks;
  • financial information;
  • medical research.

Nowadays, other kinds of content also have to be considered, such as:

  • blog postings;
  • bookmarks and citations in bookmarking sites;
  • flight schedules.

Search engines and the Deep Web

Search engines rely on crawlers, or spiders, that wander the Web following the trails of hyperlinks that link it together: spiders index the addresses of the pages they discover.

In a search engine like Google, the drawbacks of this indiscriminate crawling approach have been addressed by taking the so-called "popularity of pages" into account: the most popular pages, that is, those that are searched for most often, have priority both for crawling and for display in the results.

On the Deep Web, spiders that find a page often don't know what to do with it: they can record the page, but they are not able to retrieve and display its content. The most frequent reasons are technical barriers (database-driven content, for instance) or decisions taken by the owners of web sites (requiring users to register and log in with a password, for instance), which make it impossible for spiders to do their work (http://websearch.about.com/od/invisibleweb/a/invisible_web.htm).

Another important reason concerns linkage: if no other page links to a web document, a spider that only follows hyperlinks will never discover it.
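
The crawling behaviour described in this section can be illustrated with a minimal Python sketch. It is not how any real search engine is implemented: the `TOY_WEB` dictionary, its addresses and the `crawl` function are hypothetical and stand in for real pages, so that no network access is needed. The spider follows hyperlinks only, so it never reaches a record that is available only by query, nor a page that nothing links to.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical toy web: address -> HTML source (assumption, for illustration).
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/catalog">Catalog</a>',
    "/about": '<a href="/home">Home</a>',
    # The catalog page only offers a search form; its records have no links.
    "/catalog": '<form action="/catalog/search"><input name="q"></form>',
    "/catalog/search?q=deep+web": "<p>Record 42</p>",  # reachable only by query
    "/orphan": "<p>No page links here.</p>",  # not linked from anywhere
}


def crawl(start, web):
    """Breadth-first crawl that follows hyperlinks only; returns visited addresses."""
    seen = {start}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        parser = LinkExtractor()
        parser.feed(web[address])
        for link in parser.links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/home", TOY_WEB)
hidden = set(TOY_WEB) - indexed
print("Indexed:", sorted(indexed))
print("Hidden: ", sorted(hidden))
```

In this toy example the query result and the orphan page end up in `hidden`: they exist on the server, but a link-following spider has no path to them.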

Commercial search engines have developed methods for navigating the Deep Web and finding resources on specific web servers. The Sitemap Protocol is one of them: it lets site owners list the URLs of their site, so that search engines can crawl it in a more intelligent way (http://en.wikipedia.org/wiki/Deep_Web).
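
As an illustration of the Sitemap Protocol, the following Python sketch builds a small sitemap document with the standard library. The `build_sitemap` helper and the `example.org` addresses are assumptions made for this example; only the `urlset`, `url` and `loc` elements and the namespace come from the protocol itself, and optional elements are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical database-driven pages that no hyperlink points to.
sitemap_xml = build_sitemap([
    "http://www.example.org/records/1",
    "http://www.example.org/records/2",
])
print(sitemap_xml)
```

A site owner would typically publish such a file on the web server so that a crawler can read the listed addresses directly instead of having to discover them through links.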

To search the Deep Web

Most users tend to use search engines in a very elementary way, and perhaps for this reason only a small share of Web users navigate the Deep Web. Despite this, the Deep Web is considered to be the "paradigm for the next generation Internet" (2005, Deep Web FAQ, par. 35). In fact, good use of the Deep Web can drastically reduce the time needed for research and return high-quality information: consider that, if the Surface Web contains 1% of the information available on the Web, the Deep Web contains the remaining 99%.

With the evolution of the Web and with portals such as those described below, the Deep Web will become easier to use and surf.

To search Deep Web content, it's necessary to use specific portals such as CompletePlanet (http://www.completeplanet.com/). Because thousands of databases contain Deep Web content, CompletePlanet offers a way to navigate these Deep Web databases.
The linked image [2] shows the home page of the portal, with the list of all the available dynamic searchable databases. It is possible to browse various topic areas (medicine, art & design, science, politics, and so on) and find content that is not displayed by conventional search engines [3].
Other Deep Web search engines are (http://websearch.about.com/od/invisibleweb/tp/deep-web-search-engines.htm):

  • Clusty, a meta search engine that combines results from different sources and returns the best possible result;
  • SurfWax, which makes it possible to obtain results from different search engines at the same time and to create personalized sets of sources;
  • InternetArchive, which gives access to specific searchable collections such as live music, audio and printed materials;
  • Scirus, which is dedicated exclusively to scientific material;
  • USA.gov, for access to information and databases from the US government.

Search engines may find deep data, but they sometimes return only less relevant content. In addition, searching for this kind of data requires some skill in navigating the web. One possible technique for finding Deep Web resources without the specific portals mentioned above is the following:

  • using a search engine that covers both Surface and Deep Web resources, it's possible to search for databases by adding specific search terms to the query:
    • on Google and InfoMine: what you are looking for (database OR repository OR archive)
    • on Teoma: what you are looking for (resources OR meta site OR portal OR pathfinder) (http://techdeepweb.com/4.html).
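
The query patterns above can be sketched in Python as follows. The `deep_web_query` helper and the sample topic are hypothetical; the hint words are the ones listed above, and the Google search address is used only as an example of how such a query could be turned into a link.

```python
from urllib.parse import urlencode

# Hint words taken from the list above.
GOOGLE_HINTS = ["database", "repository", "archive"]
TEOMA_HINTS = ["resources", "meta site", "portal", "pathfinder"]


def deep_web_query(topic, hints):
    """Append an OR-group of hint words to the topic being searched for."""
    return f"{topic} ({' OR '.join(hints)})"


# Hypothetical topic, for illustration only.
query = deep_web_query("plant genomics", GOOGLE_HINTS)
print(query)

# URL-encode the query so that it can be used as a search link.
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)
```

The same helper can be called with `TEOMA_HINTS` to build the second pattern.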

How big is it?

The Deep Web is estimated to be about 500 times bigger than the Surface Web, containing 7,500 terabytes of data and 550 billion documents (Bergman, Michael K. (August 2001). "The Deep Web: Surfacing Hidden Value". The Journal of Electronic Publishing 7).
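
As a rough check of what these figures imply, the short Python calculation below divides the Deep Web totals by the size ratio. The resulting Surface Web values are derived only from the three numbers quoted above; they are not figures taken from the cited paper. The variable names are chosen for this example.

```python
# Figures quoted above.
DEEP_WEB_TERABYTES = 7_500
DEEP_WEB_DOCUMENTS = 550_000_000_000
SIZE_RATIO = 500  # Deep Web size divided by Surface Web size

# Surface Web size implied by the ratio.
implied_surface_terabytes = DEEP_WEB_TERABYTES / SIZE_RATIO
implied_surface_documents = DEEP_WEB_DOCUMENTS // SIZE_RATIO

# Share of the whole Web (surface + deep) taken up by the Surface Web.
surface_share = 1 / (1 + SIZE_RATIO)

print(f"Implied Surface Web size: {implied_surface_terabytes:.0f} terabytes")
print(f"Implied Surface Web documents: {implied_surface_documents:,}")
print(f"Surface Web share of the whole Web: {surface_share:.2%}")
```

Under these assumptions the Surface Web would account for roughly 0.2% of the total, so the 1% and 99% split mentioned earlier is best read as a round figure.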
