Thursday, June 3, 2010

10 Search Engines That Access The Invisible Web

I'm not sure how this directly applies to music, but it sure is fascinating. We think of the Web as everything that Google can find, but did you know that there's a huge amount of data that's not indexed or searchable?

It's estimated that the searchable Web is 167 terabytes (a terabyte is 1024 gigabytes), while the so-called "Invisible Web" or "Deep Web" is 91,000 terabytes, roughly 545 times as large! Wow, that's a lot of data that can't be easily found.

Why isn't this data available via Google? Google sends out spiders to index websites regularly, but some sites require a password and simply won't allow that kind of access. These include private networks and library sites, which hold huge amounts of information.
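For the curious, here's a toy sketch of why a spider misses password-protected pages. It's a made-up site in plain Python, not how Google actually works: the site, its URLs, and the `fetch` and `crawl` helpers are all hypothetical. The crawler follows links outward from the home page, and any page that demands a login answers with an error, so neither that page nor anything linked only from behind it ever makes it into the index.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    text: str
    links: list = field(default_factory=list)
    requires_login: bool = False


# Hypothetical toy site; every URL here is made up for illustration.
SITE = {
    "/": Page("Home page", ["/about", "/library"]),
    "/about": Page("About us", ["/"]),
    "/library": Page("Library catalog (members only)",
                     ["/library/record-1"], requires_login=True),
    "/library/record-1": Page("Rare score manuscript"),
}


def fetch(url):
    """Simulate an anonymous request: return (status code, page or None)."""
    page = SITE.get(url)
    if page is None:
        return 404, None
    if page.requires_login:
        return 401, None  # the spider has no password, so it is turned away
    return 200, page


def crawl(seed):
    """Breadth-first crawl from seed; return {url: text} for pages it could read."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        status, page = fetch(url)
        if status != 200:
            continue  # blocked: its content and outgoing links stay invisible
        index[url] = page.text
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/")
hidden = sorted(set(SITE) - set(index))
print(sorted(index))  # ['/', '/about']
print(hidden)         # ['/library', '/library/record-1']
```

Notice that the rare manuscript page isn't password-protected itself; it's invisible simply because the only link to it sits behind the login.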

There are a number of ways to get at the data in the "Invisible Web," though. Thanks to a great article on MakeUseOf, here are 10 search engines that specialize in just that task. I'll give you a brief overview here, but see the entire article for more detail.

1) Infomine was built by a group of libraries in the United States, including the University of California, Wake Forest University, California State University, and the University of Detroit. Infomine ‘mines’ information from databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other resources.

2) The WWW Virtual Library is considered to be the oldest catalog on the web and was started by Tim Berners-Lee, the creator of the web. So isn’t it strange that it shows up in a list of Invisible Web resources? Maybe, but the WWW Virtual Library lists a large number of relevant resources on a wide range of subjects. You can drill down into the categories or use the search bar. The screenshot shows the alphabetical arrangement of subjects covered at the site.

3) Intute is UK-centric, but some of the region's most esteemed universities provide its resources for study and research. You can browse by subject or do a keyword search on academic topics ranging from agriculture to veterinary medicine. Subject specialists review and index other websites relevant to study and research in each topic.

4) Complete Planet calls itself the ‘front door to the Deep Web’. This free and well-designed directory makes it easy to reach the mass of dynamic databases that are hidden from general-purpose search engines. The databases indexed by Complete Planet number around 70,000 and range from Agriculture to Weather. Also thrown in are databases like Food & Drink and Military.

5) Infoplease is an information portal with a host of features. Using the site, you can tap into a good number of encyclopedias, almanacs, an atlas, and biographies. Infoplease also has a few nice offshoots, such as a site for kids and Biosearch, a search engine just for biographies.

6) DeepPeep aims to enter the Invisible Web through the forms that query databases and web services for information. Typed queries return dynamic but short-lived results that normal search engines cannot index. By indexing these databases, DeepPeep hopes to track 45,000 forms across 7 domains.

7) IncyWincy is an Invisible Web search engine that behaves as a meta-search engine, tapping into other search engines and filtering the results. It searches the web, directories, forms, and images.

8) DeepWebTech gives you five search engines (and browser plugins) for specific topics. The search engines cover science, medicine, and business. Using these topic-specific search engines, you can query the underlying databases in the Deep Web.

9) Scirus has a purely scientific focus. It is a far-reaching research engine that can scour journals, scientists’ homepages, courseware, preprint server material, patents, and institutional intranets.

10) TechXtra concentrates on engineering, mathematics, and computing. It gives you industry news, job announcements, technical reports, technical data, full-text eprints, and teaching and learning resources, along with articles and relevant website information.

