29/08/2007
simple shell script to download the frontpage of major greek newspapers
www.in.gr has a very useful feature: a section called kiosk, where the scanned front pages of all major Greek newspapers are posted.
Nice as this is, it doesn’t fit my viewing needs: I want all the newspapers on my local drive every morning so I can view them with my favorite image viewer. To do that I created a small shell script.
The script:
#!/bin/bash
#simple shell script to download the frontpage of major Greek newspapers from www.in.gr/kiosk/
#it uses arrays, so it needs bash rather than a plain POSIX sh
#feel free to modify it as you wish :)
YEAR=`date +%Y`
MONTH=`date +%m`
DAY=`date +%d`
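mkdir -p ~/in.gr/${YEAR}/${MONTH}/${DAY}/
#stop here if the directory is not available, so nothing gets downloaded into the wrong place
cd ~/in.gr/${YEAR}/${MONTH}/${DAY}/ || exit 1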
i="0"
j="0"
k="0"
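#IDs to skip in the 0-79 range; keep them in ascending order, the main loop only checks one entry at a time
exclude=(30 31 32 33 34 58 60 61 69 78 79)
#extra IDs outside the 0-79 range to download, separated by whitespace
include=()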
while [ "$i" -lt 80 ]; do
    if [ "$i" = "${exclude[$j]}" ]; then
        echo "excluding $i"
        j=$((j+1))
        i=$((i+1))
    else
        wget -q -nc -c http://assets.in.gr/dGenesis/assets/Content60/Issue/${YEAR}/${MONTH}/${DAY}/${i}_h.jpg
        i=$((i+1))
    fi
done
include_len=${#include[*]}
while [ "$k" -lt "$include_len" ]; do
    wget -q -nc http://assets.in.gr/dGenesis/assets/Content60/Issue/${YEAR}/${MONTH}/${DAY}/${include[$k]}_h.jpg
    k=$((k+1))
done
Just add the script to your user’s crontab and you are ready. Since not all newspapers come out at the same time in the morning, you can have cron run the script every hour from 7 o’clock until 12 o’clock. Because wget is called with -nc, front pages that are already on disk are not downloaded again on later runs.
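As a rough sketch of the crontab entry, assuming the script was saved as $HOME/bin/kiosk.sh and made executable (both the path and the file name are only an example, use your own):

```shell
# Hypothetical location of the script; adjust to wherever you saved it.
SCRIPT="$HOME/bin/kiosk.sh"

# crontab fields: minute hour day-of-month month day-of-week command
# "0 7-12 * * *" means: at minute 0 of every hour from 07:00 to 12:00, every day.
CRON_LINE="0 7-12 * * * $SCRIPT"

# Print the line so it can be pasted into "crontab -e".
echo "$CRON_LINE"
```

Run crontab -e as your own user and paste the printed line at the end of the file.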
Some details:
The kiosk has an interesting and weird “feature”. To find a newspaper’s ID and URL, go to www.in.gr/kiosk/ and click on the newspaper you want. A window with a thumbnail will appear; click on the thumbnail and a new pop-up window with a bigger image will come forward. Now right click on the image and select copy image location. It should be something like: http://assets.in.gr/dGenesis/assets/Content60/Issue/2007/08/29/3_h.jpg. The number just before _h.jpg is the newspaper’s ID.
Even though most newspapers are numbered sequentially up to 34, some come with a higher number like 53, 60, 61, 78, 79. So while one might think that it’s safe to iterate until 80 to catch them all, that’s not the case. Some sports and all local newspapers have ID numbers like 69389! To cope with these, for anyone who might want them, I added another loop to the script that uses an “include” array. Put any numbers of 80 and above inside the include array (separated by whitespace) and the script will download them.
Since I don’t like reading sports and gossip newspapers, I have added an exclude array to the main loop in order to avoid downloading them. If you want to download all newspapers, simply remove the numbers I have in my exclude list.
I don’t understand the purpose of having both small sequential numbers and bigger “random” ones as IDs. Do you?
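To make the URL pattern and the exclude idea a bit more concrete, here is a minimal sketch. The function names kiosk_url and is_excluded are made up for this illustration and are not part of the script above; the URL layout is simply copied from the example address.

```shell
# Build the front page image URL for a given date and newspaper ID.
# Arguments: $1 = year, $2 = month, $3 = day, $4 = newspaper ID
kiosk_url() {
    printf 'http://assets.in.gr/dGenesis/assets/Content60/Issue/%s/%s/%s/%s_h.jpg\n' "$1" "$2" "$3" "$4"
}

# Succeed (return 0) if the ID in $1 appears among the remaining arguments.
# Unlike the main loop of the script, this does not need a sorted list.
is_excluded() {
    id=$1
    shift
    for x in "$@"; do
        [ "$x" = "$id" ] && return 0
    done
    return 1
}

# Example: print the URL for ID 3 on 29/08/2007 unless it is excluded.
if ! is_excluded 3 30 31 32 33 34; then
    kiosk_url 2007 08 29 3
fi
```

A high ID such as 69389 goes through kiosk_url the same way, which is all the include array really does.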
Filed by kargig at 23:05 under Internet,Linux