Intro
Flicker is a shell script that downloads ("flicks") images from Flickr. Given a string, it searches for that string on Flickr and downloads the resulting images.
Firstly, why Flickr? The initial idea was to fetch Google image results,
but I decided that Google's results were a bit too generic... plus,
Google's result pages are full of JavaScript that a simple script can't parse :) . A photo
album like Flickr is more interesting because people actually shot most
of those images with their own cameras, and they generally shoot
something that catches the eye.
Secondly, it makes sense to script a monotonous task like searching for
images because it saves the user from the unnecessary page navigation, waiting,
and constant interaction that most web pages demand. I can just start
a Flick-Er download in the background and go about my work, then check
back later and find all the images saved in a folder for easy viewing.
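For instance, a run like this (the search term is just an example) keeps downloading while you do something else:

    #Start a search in the background; images collect in a "sunset" directory.
    ./flick.sh "sunset" &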
Finally, such a script can be modified to work with other sites like Picasa,
and it serves as a simple starting point for a selective image-downloading web crawler.
Usage:
./flick.sh "IIT Bombay"
Input is taken from the command line, as an argument to the shell script.
A search is executed on Flickr for this argument string.
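Internally, the script replaces spaces with + signs and fetches the corresponding Flickr search page. For the usage example above, the first request works out to:

    http://flickr.com/search/?q=IIT+Bombay&page=0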
The output contains the first 10 images from Flickr matching the search (the script keeps
only the first 10 result links). The images downloaded are of moderate size (around
500x300 pixels), and they are saved in a separate directory, named using the search string.
Each downloaded image's file name contains the description (alt text, if any) of that
image, plus the tail of the image's file name on Flickr's server, so multiple images with
the same description don't overwrite each other.
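To make the naming concrete, here is the transformation the script (listed below) applies, run on a made-up example; the URL and alt text are hypothetical, in the "url |alt" form that parse_flicker_page2 emits:

    #Hypothetical parsed output of parse_flicker_page2: "image URL |alt text"
    URL_ALT='http://farm1.static.flickr.com/30/1234567_89abcd_m.jpg?v=0 |Full moon over Powai'
    imageURI=`echo $URL_ALT | sed 's/?.*//'`               #URL, minus the query string and the alt text
    imageAlt=`echo $URL_ALT | sed 's/^.*|//' | tr ' ' '_'` #Full_moon_over_Powai
    imageRand=`echo $imageURI | sed 's/^.*_//'`            #m.jpg
    echo "$imageAlt$imageRand"                             #Full_moon_over_Powaim.jpg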
This script is based primarily on wget, grep and sed. The 'algorithm' is as follows:
#!/bin/bash
#Flick-Er : A Flickr bot Shell Script.
#Date : 10-Aug-2008
#Author : Sriram Kashyap M S (08305028)
#Description: The script takes one argument. It searches Flickr for the argument,
#             and downloads medium sized (around 500x300 pixels) images of the first
#             10 search results. It saves these files in a directory created in the path of
#             the script. The name of this directory is the string passed as the argument.
#NOTE: You need to modify the proxy.config file and put your username and password
#      for the proxy server.

###############START FUNCTION DEFINITIONS#####################

searchFlickr()
#This function downloads the Flickr search page for a given query.
#$1 is the query, $2 is the page number of the result.
{
    flickURL="http://flickr.com/search/?q=$1&page=$2"
    wgetnow "$flickURL" ".temp.html"
}

parse_flicker_page1()
#This function parses the search results and finds the links to the pictures.
{
    cat "$1" | grep "photo_container" | sed 's/.*href="//' | sed 's/".*//'
}

parse_flicker_page2()
#This function parses the page containing the image and finds the image URL and
#the alt text. It separates them with a pipe character.
{
    cat "$1" | grep "<img" | grep "static" | grep -v "view profile" | grep -v "\/groups\/" | head -n 1 | sed 's/^<img src="//' | sed 's/title.*//' | sed 's/alt=/|/' | tr -d '"'
}

wgetnow()
#This function retrieves a web page. It uses wget, with the proxy option.
#The first argument is the URL to get; the second is the target filename where the page is saved.
{
    wget --proxy -O "$2" -o ".wget.log" "$1"
    if [ $? -ne 0 ]
    then
        echo "Failed to download file: $1"
        echo "Please check if you are connected to the internet, and if proxy.config is correct"
        exit 1
    fi
}

##############END OF FUNCTION DEFINITIONS##################

#Base path of the URLs, because some URLs are relative.
basePath="http://flickr.com"

#Replace spaces in the search query with + signs.
value=`echo $1 | tr ' ' '+'`

#Exactly one argument must be supplied.
if [ $# -lt 1 ]
then
    echo "Please enter a search string as argument. Example: ./flick.sh moon"
    exit 1
elif [ $# -gt 1 ]
then
    echo "Specify multiple words within quotes. Example: ./flick.sh \"iit bombay\""
    exit 1
fi

#Delete temp files. Do this at the beginning, because a previous run may have ended prematurely.
rm -f .wget.log
rm -f .temp.txt
rm -f .temp.html

#Set the proxy server from file. This includes the user name and password.
export http_proxy=`cat proxy.config`

echo "Please Wait..."

#The target directory is the place where images are saved.
targetDir="$1"

#Check if the target directory exists, else create it.
if ! test -d "$targetDir"
then
    mkdir "$targetDir"
fi

#Search Flickr for $value, and get page 0 of the result.
searchFlickr "$value" 0

#Find the links to the individual result pages, and put them in .temp.txt.
#Save only the first 10 entries; discard the rest.
parse_flicker_page1 .temp.html | head -n 10 > .temp.txt

#For each result link, do the following (the links are read from .temp.txt):
while read pic
do
    #Append the base path to the picture name, to construct the URL.
    pic="$basePath$pic"

    #Get the result page and store it in .temp.html.
    wgetnow "$pic" ".temp.html"

    #Get the URL and alt text of the image from the result page.
    URL_ALT=`parse_flicker_page2 .temp.html`

    #imageURI is the actual link to the image on the Flickr server.
    #imageAlt is the alt text of the image. This is part of the final file name.
    #imageRand is the last part of the image's file name on the server. It is
    #appended to the saved filename, so we can save multiple images with the same alt text.
    imageURI=`echo $URL_ALT | sed 's/?.*//'`
    imageAlt=`echo $URL_ALT | sed 's/^.*|//' | tr ' ' '_'`
    imageRand=`echo $imageURI | sed 's/^.*_//'`

    echo "Getting $imageAlt"

    #Download the image to the destination folder.
    wgetnow "$imageURI" "./$targetDir/$imageAlt$imageRand"
done < ".temp.txt"

exit 0
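A note on proxy.config: the script exports the file's entire contents as the http_proxy environment variable, so the file should hold a single line with a proxy URL in the standard format wget understands. A hypothetical example (host, port, and credentials are placeholders):

    http://myuser:mypassword@proxy.example.com:80/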