Robots on the Net

Introduction

The aim of this paper is to investigate and discuss a phenomenon that is becoming widespread on the internet. Robots run on the net accomplishing different kinds of tasks and replacing human activity for several purposes. Broadly speaking, robots operate when an activity is required that is too complex for a human being, or simply too repetitive and boring. Initially robots accomplished tasks of common interest, such as retrieving web pages across the net. Nowadays robots are instead personalized: users program them for their own uses. Thus, beside web crawlers, other bots conceived for personal needs are appearing. For instance, in some online videogames humans cheat by using robots that play or carry out repetitive tasks for them in order to reach a high score or accumulate points.

The use of bots is therefore wide and covers many tasks, but in general it can be divided into two opposite groups. The first group relates to the positive usage of bots: here a bot is used to filter spam mail, to keep a website safe and efficient or, more generally, to record and index new websites that appear on the net. In the case of collaborative websites, such as Wikipedia, bots are used to check the quality of publications and to prevent vandals from damaging content. These operations improve the quality of the internet experience and, in particular, help in managing the immense quantity of documents and information spread all over the web. In this paper I will explain, for instance, how a web crawler works and why it is important for the internet community.

But, as mentioned above, there is a second group that refers to negative usage: here malicious users program bots in order to steal sensitive data, such as credit card numbers, or to damage software systems and websites by leading DDoS attacks. Carrying out illicit operations through bots allows malicious users to keep their identity anonymous. If in the beginning negative usage was the work of experts who, following their own ethical code, attacked e-commerce sites or institutions they considered unethical, nowadays the situation is different. Criminal organisations are the clients of these bad actors, and the aim is mainly to steal data or to take down “annoying” websites.

As these paragraphs suggest, the subject is quite wide, and the phenomenon is still growing and evolving. I will try to give a general overview of the kinds of bots that operate daily on the net, and then I will focus on two cases in which robots are essential. The first case will be about Wikipedia and the second one about botnets and DoS attacks. Throughout this excursus the recurring questions will be: what exactly do bots do to improve the quality of the internet? Looking at what is happening on the internet, should bots be considered an advantage or a risk for the future internet community? Is the internet safe and controllable if we do not use bots? How can we limit bad bot usage? Why is spam so difficult to avoid? Let’s try to answer these questions.

What is a bot?

In computing terminology a “bot” (abbreviation of robot) is a piece of software that runs on the web using the same channels used by humans[1](4 Jan 2011). Bots interact with network services intended for people as if they were real persons. A bot is able to surf the net visiting different web pages, send e-mail messages, chat with humans or play videogames with them. Actually, most people using the internet, and especially the World Wide Web, are not fully aware of the existence of bots, even though they exploit the benefits of their activity every day. Assuming that everybody regularly uses a search engine to find information on the World Wide Web: how many users are aware that the onerous work of finding and indexing webpages in the search engine’s database is done by a spider? Probably few, and probably most would not care about these technical aspects. Nevertheless, bot usage is a growing phenomenon, and in the coming years everybody should be aware of its existence, especially because bots, as we will see, are not used only to improve services but also to accomplish illicit tasks and to damage people and services on the internet. Before facing that theme, it is better to have an overview of the general uses of bots, so in the next sections we will answer the following question: what do bots do to improve internet services? Let’s list and describe the most common ones.

Web crawlers

Web crawlers, or spiders, are the most common bots and perhaps the most useful. Their activity consists of crossing the web by following the hypertext links found inside pages, harvesting information about the content of each webpage, and storing and indexing it in the database of a search engine[2](4 Jan 2011). Thanks to web crawlers, a website can be retrieved every time a query is made on a search engine. If we consider that the World Wide Web contains roughly 255 million active websites[3](7 Jan 2011), we can understand that bots replace humans in activities too complex for them to carry out by themselves. Spiders may also be used to interact dynamically with a site in a particular way, for example to exploit or locate arbitrage opportunities for financial gain.
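
To make the mechanism concrete, here is a minimal sketch of a crawler in Python, assuming a placeholder seed URL: it keeps a frontier of URLs to visit, fetches each page, records it (where a real spider would index its content), and follows the hypertext links it finds. Real crawlers additionally respect robots.txt, throttle their requests and distribute the work over many machines.

```python
from urllib.request import urlopen
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag found in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    frontier = [seed]   # URLs still to visit
    visited = set()     # URLs already fetched
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue    # unreachable page: skip it
        visited.add(url)
        print("indexed", url)  # a real spider would index the content here
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))  # follow hypertext links

crawl("https://example.com/")  # placeholder seed URL
```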

Looking at the activity of spiders, it is clear that they represent an advantage for the internet community: spiders are essential for access to and navigation of the World Wide Web, so in this case it is fair to say that bots represent an advantage and not a risk.

IRC bots

An IRC bot is a set of scripts or an independent program used on Internet Relay Chat, a popular form of real-time internet text messaging. In this context bots are used to perform automated functions in chat. Basically, an IRC bot remains connected to an IRC server 24 hours a day, simulating a user in the chat. In this way the bot allows a chat room to remain open with the configuration previously set by its users, since a channel on IRC exists only as long as at least one user is present. By staying permanently connected to a chat room, IRC bots preserve its original configuration[4](7 Jan 2011).
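
The core of such a bot is small. The following Python sketch, in which the server, nickname and channel are placeholders, speaks the raw IRC protocol: it connects, joins a channel and answers the server’s periodic PING with a PONG, which is all that is needed to keep the connection, and therefore the channel, alive.

```python
import socket

SERVER, PORT = "irc.example.net", 6667    # placeholder server
NICK, CHANNEL = "keepalive-bot", "#demo"  # placeholder nick and channel

sock = socket.create_connection((SERVER, PORT))
sock.sendall(f"NICK {NICK}\r\n".encode())
sock.sendall(f"USER {NICK} 0 * :{NICK}\r\n".encode())
sock.sendall(f"JOIN {CHANNEL}\r\n".encode())

buffer = ""
while True:
    buffer += sock.recv(4096).decode(errors="replace")
    while "\r\n" in buffer:
        line, buffer = buffer.split("\r\n", 1)
        # The server pings periodically; failing to answer would drop
        # the connection and let an otherwise empty channel disappear.
        if line.startswith("PING"):
            sock.sendall(("PONG" + line[4:] + "\r\n").encode())
```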

IRC bots can also talk with other users. One of the first and most famous chatterbots (prior to the Web) was Eliza[5](10 Jan 2011), a program that pretended to be a Rogerian psychotherapist and answered questions with other questions. Other tasks that can be performed by IRC bots include reporting the weather, sports scores or zip code information, converting currency or other units, and much more. In some cases IRC bots are used to entertain us or to learn from us, like Jabberwacky[6](10 Jan 2011).

Another role of IRC bots is to lurk in the background of a conversation channel, commenting on certain phrases uttered by the participants. They can do this thanks to pattern matching. This is sometimes used as a help service for new users, or even for mild censorship. In the same way IRC bots can guard channels, managing kicks and bans and instantly kicking spammers and flooders[7](8 Jan 2011).
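
The pattern matching behind this guarding is usually done with plain regular expressions. The toy decision logic below uses invented patterns and is not taken from any real guard bot; it only illustrates the technique:

```python
import re

# Invented patterns; a real guard bot would use a maintained list.
BANNED_PATTERNS = [
    re.compile(r"buy cheap .* now", re.IGNORECASE),  # spam-like phrasing
    re.compile(r"(.)\1{9,}"),  # flooding: a character repeated 10+ times
]

def should_kick(message: str) -> bool:
    """Return True if a channel message matches a banned pattern."""
    return any(p.search(message) for p in BANNED_PATTERNS)

assert should_kick("BUY CHEAP watches NOW")
assert should_kick("aaaaaaaaaaaaaaaa")
assert not should_kick("hello everyone")
```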

As in the case of spiders, IRC bots contribute to improving the experience of using the internet, so in this case too they represent an advantage for the internet community.

Spambot

This kind of bot is more controversial, since it can be used for good purposes as well as bad ones[8](10 Jan 2010). Some spambots are able to harvest e-mail addresses by behaving like web crawlers: they surf the web retrieving publicly posted addresses and use them to send spam messages. The problem of spamming obviously represents a disadvantage for the internet community: users are annoyed to find their mailboxes full of useless and sometimes offensive messages, while Internet Service Providers oppose spam because of the cost of the traffic generated by the indiscriminate use of e-mail. To face this problem, another kind of spambot has been created that performs the opposite action: it filters mail, deleting useless and annoying spam messages. Nevertheless, the problem remains at the level of traffic, where it represents a cost for Internet Service Providers; and for users spam is never definitively eliminated, because bots easily retrieve their addresses and are sometimes able to elude filters.
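
Harvesting works because an e-mail address published on a page is easy to recognise mechanically. The simplified extraction below shows the technique with a deliberately rough regular expression; real harvesters are more aggressive and also decode obfuscations such as "name [at] domain [dot] com".

```python
import re

# Rough pattern for e-mail addresses, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

page = 'Contact us at <a href="mailto:info@example.org">info@example.org</a>'
print(set(EMAIL_RE.findall(page)))  # {'info@example.org'}
```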

Computer game bot

These bots are perhaps the most common and the most directly “experienced” by the videogamer generation. A computer game bot is artificial intelligence software that controls a player inside a videogame: it may play against other bots or against human players, or play in cooperation with them. These bots are able to run over the Internet, over a LAN or in a local session. The intelligence of a computer game bot can vary greatly: advanced bots are able to learn the way humans play and respond to them according to the game context.

Computer game bots have evolved in recent years mainly because of the advent of online videogames. As soon as programmers understood the popularity of online games, they started to write “useful” bots. These bots, which are usually sold on several websites, play the game in place of humans, accomplishing repetitive tasks to gain, for example, money or scores. The best-known example is World of Warcraft, an online fantasy MMORPG that counts 12 million subscribed players[9](11 Jan 2011). Several bots are available on the net[10](11 Jan 2011). Basically these bots play for humans and carry out several tasks: the aim is generally to gain an advantage and let the avatar become stronger without spending hours at the computer. Blizzard, the company that produces WoW, is working to detect bots on the online game platform and to ban the players who use them.

In the case of computer game bots, evolution has led to illicit use, demonstrating that the problem of illegality arises in any common activity where a tangible possibility of earning money exists.



Thus, there are a lot of bots on the net. Some of them are conceived strictly for user needs, and in many cases it is the users themselves who program their own bots for different purposes. For instance, a shopbot is a program that shops around the Web on your behalf and locates the best price for a product you are looking for, while a knowbot[11](5 Jan 2011) is a program that collects knowledge for a user by automatically visiting Internet sites and gathering information that meets certain specified criteria.
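
Conceptually a shopbot is just a loop over shop pages followed by a comparison. The sketch below uses entirely hypothetical shop URLs and a hypothetical price format; it only illustrates the idea.

```python
import re
from urllib.request import urlopen

# Hypothetical pages offering the same product.
SHOP_URLS = [
    "https://shop-a.example/product/42",
    "https://shop-b.example/product/42",
]
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")  # hypothetical page format

def best_price(urls):
    """Return the cheapest (price, url) pair among the reachable shops."""
    offers = []
    for url in urls:
        try:
            page = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue  # shop unreachable: ignore it
        match = PRICE_RE.search(page)
        if match:
            offers.append((float(match.group(1)), url))
    return min(offers) if offers else None

print(best_price(SHOP_URLS))
```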

Until now we have seen more or less useful, “good practice” bots that help humans in activities around the web. As anticipated in the introduction, there are other practices related to bot usage: malicious people program bots in order to steal passwords or identities, or to spy on the activities of unaware users. Before going deeper in this direction, I would like to present a case in which bots behave as sentries and fight against the malicious users who want to damage important services and targets in the internet world. It is the case of Wikipedia.

Bot as Sentry: Wikipedia Vandalism Detection

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia. Its power and success are due to the fact that everyone can contribute to editing it in real time. Wikipedia is thus continuously growing, and its content is constantly updated by contributors; thanks to this, Wikipedia has become a reference point for the internet community. Unfortunately, the strength of Wikipedia is also a weakness: the fact that everyone can subscribe and help edit pages so easily allows for the presence of malicious users, the so-called vandals. In recent years Wikipedia has been continuously damaged by vandals, who publish inappropriate content on the encyclopedia with the aim of reducing the quality of the service. Here is the definition given by Wikipedia:

"Vandalism is any addition, removal, or change of content in a deliberate attempt to compromise the integrity of Wikipedia. Examples of typical vandalism are adding irrelevant obscenities and crude humor to a page, illegitimately blanking pages, and inserting patent nonsense into a page. Vandalism is prohibited” .

Keeping Wikipedia safe from the attacks of vandals is an onerous task that requires a lot of work and immediate intervention. Wikipedia has at the moment about 3,522,000 articles for the English language alone; for an overview of the total number of pages see http://www.wikipedia.org/. With so many pages, keeping control of them becomes very difficult if the task is carried out by humans alone. Bots give consistent help in detecting vandals across the platform, and several techniques are employed to discover obscene words or inappropriate content. A list of bots used to keep Wikipedia safe from vandalism is published at this link[12]. Many of them use regular expressions to find racist terms and obscenities, or fake and spam links. Let’s present some examples:

- AntiSpamBot is a bot programmed in Perl. Its task is to revert spam links inserted into Wikipedia. Basically, it is able to detect spam links thanks to a continuously updated blacklist. This bot was used until December 2007.

- SaleBot is currently active on Wikipedia. It looks for modifications to articles and checks whether the IP corresponding to those changes belongs to a recently subscribed user. In that case the bot compares the terms inserted in the modification against a dictionary of expressions, which lists inappropriate terms with a score associated with each of them. Every modified text thus receives a score; if the score exceeds the set threshold, the text is removed and the user banned (a minimal sketch of this scoring scheme follows the list below).

- PseudoBot aims to remove a subset of inappropriate edits to date pages, the pages that list all the events or births related to a given date. Inappropriate edits here means links to nonexistent pages or links to inappropriate events: for instance, a malicious user could place the discovery of the Americas under the year 1495, when the Americas were discovered by Columbus in 1492. The years being similar, some readers may be misled and use the false information in their own work.
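
The scoring scheme described for SaleBot can be sketched very simply: each suspicious expression carries a weight, an edit accumulates the weights of the expressions it contains, and the edit is reverted once the sum passes a threshold. The dictionary and threshold below are invented for illustration and are not SaleBot’s actual data:

```python
# Invented weights; a real bot like SaleBot maintains a curated dictionary.
SUSPICIOUS_TERMS = {"idiot": 5, "stupid": 4, "lol": 2, "!!!": 2}
THRESHOLD = 6

def edit_score(text: str) -> int:
    """Sum the weights of every suspicious term found in an edit."""
    lowered = text.lower()
    return sum(w for term, w in SUSPICIOUS_TERMS.items() if term in lowered)

def should_revert(text: str) -> bool:
    return edit_score(text) >= THRESHOLD

print(should_revert("this article is stupid lol !!!"))  # True (4 + 2 + 2 = 8)
print(should_revert("minor copy-edit"))                 # False
```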

The role of bots is essential in preserving the authority and credibility of Wikipedia. Nevertheless, as I have said several times, the practices in which bots are involved are many, and their use is also aimed at bad practices. In this sense bots are “useful” for keeping one’s identity anonymous and for exploiting several computers to lead attacks against institutions or persons, or to steal identities, passwords, credit card numbers and confidential information. Fortunately, it seems that we are moving in a safer direction: software developers are delivering ever safer programs that prevent external intrusion. However, hackers too are building up their experience and developing new programs to access private networks.

Bots and illicit practices

The possibility of having software that performs automated actions is a very powerful feature. Nevertheless, bots are also used to accomplish illegal tasks, and this compromises the safety of the net and of the users who surf it. Networks of computers, the so-called botnets, are remotely controlled by someone and directed against a target, or spied on in order to retrieve sensitive data. Cybercrime is a wide term that obviously is not reduced to the usage of robots, but bots are certainly very useful and powerful tools for “hackers”. This term is actually not correct: hackers are people who like to exercise their intellectual skills by solving problems or circumventing limitations, and their aim is simply to demonstrate their ability. The problem nowadays concerns instead criminals who operate on the internet, often working for real criminal organisations, with the aim of damaging or stealing something. The correct term for these people is “cracker”, that is, an informatics criminal. Robots are thus tools that crackers use to accomplish their malicious tasks, for instance gaining access to several computers and creating a botnet, where the “captured” computers are called zombies or, precisely, robots.

Botnet

A botnet is a network of computers connected to the internet, all controlled as a whole by a single entity, the botmaster. How can that happen? Generally, the botmaster’s ability to create such a network is due to weaknesses in the security of the computers involved, or to a lack of attention by their users or administrators. In this way the botmaster is able to infect a network of computers and take control over it using viruses or trojans. The controller of the botnet can then exploit the compromised systems to hurl distributed attacks, such as denial of service, against any system on the web, or to accomplish other illicit operations. Sometimes botmasters operate for criminal organisations.

The computers “captured” in the botnet are called zombies or bots. In this sense those computers behave as bots, carrying out automated tasks as soon as the botmaster sends them instructions. Instructions are generally transmitted over Internet Relay Chat[13](8 Jan 2011), on a given channel situated on a given server. The server is protected by a password and only the author is able to access it. The chat channel is thus the medium through which a botmaster can simultaneously control the computers of a botnet and give them different orders: for instance, the order to perform a denial of service attack against a chosen system. Such an attack is called “distributed” because many computers perform it at the same time. In other cases botmasters use peer-to-peer networks to control the computers[14](9 Jan 2011). This is a relatively new architecture for botnets [4]; the fact that these botnets are distributed, and small, makes them difficult to locate and destroy.
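
From the zombie’s point of view the control channel is nothing exotic: it is an ordinary read loop that dispatches whatever the botmaster writes into the channel. The harmless simulation below invents its own commands and replaces the IRC connection with a plain list; nothing is executed, every command is only printed.

```python
# Harmless simulation of a bot's command loop: the "channel" is a list
# of messages and every order is only printed, never carried out.
incoming_commands = [
    "!status",
    "!ddos 203.0.113.7",  # documentation-range address; nothing is attacked
    "!stop",
]

def handle(command: str) -> None:
    if command == "!status":
        print("bot: alive and listening")
    elif command.startswith("!ddos "):
        target = command.split(" ", 1)[1]
        print(f"bot: would start flooding {target} (simulated only)")
    elif command == "!stop":
        print("bot: stopping all tasks")

for message in incoming_commands:  # stands in for the IRC read loop
    handle(message)
```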

Botnets are generally used for other purposes as well. For instance, a botmaster who controls a botnet is able to spy on it and steal passwords and other sensitive data. It is also possible to access the infected computers through backdoors, or to use them as proxy services that ensure anonymity on the web.

Distributed Denial of Service Attacks

Once a botnet has been created and brought under control, the botmaster is able to cast a distributed denial of service attack. This kind of attack is carried out by sending a huge number of requests to a web server, FTP server or e-mail server. First the botmaster installs on the zombies a program conceived to accomplish DoS attacks. The botmaster can then give the instruction to the unaware controlled computers (sometimes thousands of them) to activate their DoS programs. Once that is done, all the controlled computers start to send data towards the selected target. If the number of computers is high, the flow of transmitted data becomes so big that the target system is completely saturated and becomes unstable.
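
Seen from the target’s side, such a flood shows up as an abnormal request rate. A common first defence, sketched here in simplified form and not describing any specific product, is to count the requests of each source address inside a sliding time window and flag the sources that exceed a limit; the window size and threshold below are arbitrary.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10  # arbitrary sliding window
MAX_REQUESTS = 100   # arbitrary per-source limit

recent = defaultdict(deque)  # source IP -> timestamps of its recent requests

def is_flooding(source_ip: str, now: float = None) -> bool:
    """Record one request and report whether the source exceeds the limit."""
    now = time.time() if now is None else now
    q = recent[source_ip]
    q.append(now)
    while q and q[0] < now - WINDOW_SECONDS:  # forget requests outside the window
        q.popleft()
    return len(q) > MAX_REQUESTS

# Simulated burst: 150 requests in one instant from the same address.
print(any(is_flooding("198.51.100.9", now=0.0) for _ in range(150)))  # True
```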

A well-known and recent example of a distributed denial of service attack, perhaps the biggest in history, was hurled against Wikileaks, the famous website that in recent months has been at the centre of global attention[15](10 Jan 2011). After Wikileaks released a quarter-million U.S. diplomatic cables, an enormous DDoS attack was launched against it, reaching about 10 Gb/s. Wikileaks nevertheless survived the attack, even if, in order to respond to the continuous attempts to obscure the website, it had to launch a mirroring campaign on Twitter. At the moment there are at least 1,300[16](10 Jan 2011) mirrors that have copied and distributed the content of Wikileaks across different DNS providers.

Conclusions

In the end, bots are used for the most disparate things, but in general they are very useful for the management of the internet. Some tasks, such as the retrieval and updating of webpages across the World Wide Web, would be impossible without the support of robots. Robots are also essential for patrol operations: we have seen how robots contribute to keeping Wikipedia safe from vandals, or to keeping IRC channels open and banning uncouth users. Bots are the human response to the increasing complexity of the Internet. The quantity of information, practices and users that constitutes the complex virtual reality called the Internet is not controllable or manageable without robots and automated systems that accomplish very complex tasks. Nevertheless, bots are also able to destabilize the balance and cause damage, even if it seems that, at the moment, developers have programmed effective defenses against crackers, and the damage they create is generally isolated and focused on targets with particular positions, such as Wikileaks.
