Mass validation of web shells
It was night, I had a lot to do, and along came Harmoshka, who had picked up about 20k web shells on the cheap. He warned me right away that half of them were not valid. It took me half a day to check even 1k shells by hand, so I figured out that I needed a checker. The seller of the web shells offered one for $200; the price was clearly unreasonable and, of course, nobody bought it.
The topic seemed to quiet down, but then Harmon knocked and handed me my share of the web shells to check by hand. I was not happy with that arrangement, so I decided to think over how to check them automatically. There were many ideas for how to do this, but the simplest and most obvious one is the one that got implemented.
Since the number of shells was very high and half of them were dead, the first stage was to check them for the HTTP code 200 OK, which tells me that there is some page at the URL. For this task I chose plain bash. PHP is of course much more versatile, but it takes longer to write, and when you run it on a hosting service the script running time is limited almost everywhere. Here is the listing of the finished script:
wget -t 1 -T 5 means that we try to connect to the address once and set the timeout to 5 seconds. Otherwise the check might take several days. If you want to see a detailed report on the console instead of the bare links, remove the redirect to /dev/null.
After about 5 hours of checking, the script finished its work and produced a file with about half of the original web shells. Looking through the links by hand made it clear that there was a lot of trash. For example, instead of a shell there would be a page with the message “Aren’t you crazy?” It was absurd to go through the resulting links by hand, so I moved on to the second stage.
We have a list of links where some page exists; now we need to determine whether it is a shell or some nonsense. Since 99% of the web shells were from rst, I decided to download each page and grep it for the right marker, in this case r57shell. Here is the listing:
As you can see, this is the same script, only wget downloads the file and then grep searches it for r57shell. After another 4 hours of checking, the output contained only working, valid web shells. Voila.
Certainly, it is possible to write a whole program that checks the permissions, inserts an iframe and even shows the approximate popularity of each resource, but as they say, everything has its time, its place and, of course, its price. The main thing to remember is that most tasks can be delegated to the machine, especially routine ones.
1) For the script to work, bash must be installed on the system. On Linux it is usually there by default; on BSD you have to install it like this: whereis bash, cd /usr/ports/…, make, make install.
2) The path to bash may differ; you can run locate bash to find out where it is.
3) Before you start the script, you have to create the file for the results. You can do that with the touch file command and then change the path in the script.
4) Before running the script, you need to chmod 755 it and run it from the same directory like this: ./script
5) I do not recommend running it from a normal server; it is better to use anti-abuse hosting or a hosting account with ssh.
The author does not claim to cover the problem completely. He has just described his personal experience of checking web shells for validity in a simple and popular way. Additions to this article are welcome, and so is criticism with specific examples of a better implementation.
The rights to this article belong to the author. Reprinting, using parts of it, etc. for personal purposes on other resources is only permitted with the author’s verbal agreement.
Copyright (C) 2006 Emptiness specially for https://ver.sc