Intro

Flicker is a shell script that downloads ("flicks") images from Flickr. Given a string, it searches Flickr for that string and downloads the resulting images.

[Image: Flicker collage]

Why flick-er?

Firstly, why Flickr? The initial idea was to get Google image results, but I decided that Google's results were a bit too generic... plus, Google's result pages are generated by JavaScript that a simple script can't parse :) . A photo album like Flickr is more interesting because people actually shot most of those images with their own cameras, and they generally shoot something that catches the eye.

Secondly, it makes sense to script a monotonous task like searching for images because it saves the user from the unnecessary page navigation, waiting, and constant interaction that most web pages require. I can just start a flick-er download in the background and go about my work. I can check back later and find all the images saved in a folder, for easy viewing.

Finally, such a script can be modified to work with other sites like Picasa, and it serves as a simple starting point for a selective image-downloading web crawler.

Using the script

Usage:
./flick.sh "IIT Bombay"

Input is taken from the command line, as an argument to the shell script. A search is executed on Flickr for this argument string.
The output consists of the first 10 images from Flickr matching the search. The downloaded images are of moderate size (around 500 pixels on the longer side), and they are saved in a separate directory named after the search string. Each downloaded image is also named using the description (if any) of that image and the Flickr ID of the contributor.
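
For example, a run might look like this (the image names here are made up; actual names depend on what the search returns):

./flick.sh "IIT Bombay"
Please Wait...
Getting Main_Building
Getting Convocation_Hall
...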

How stuff works

This script is based primarily on wget, grep and sed. The 'algorithm' is as follows (a condensed sketch of the core pipeline appears after the list):

  • Construct the search URL using the command line argument
  • Use wget to download the search results page
  • Search the page for specific tags/keywords. This gives us all the lines containing relevant links.
  • Strip out the URLs for individual results using sed
  • Follow each link to get the result page for each image
  • Finally, get a link to the correct image on the page, and download it
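
In miniature, steps 2 through 4 boil down to a single wget + grep + sed pipeline. The sketch below uses the same URL pattern and the same "photo_container" hook as the full script; the markup hook is whatever Flickr served at the time, so treat it as illustrative:

# download page 0 of the search results for "moon"
wget -O results.html "http://flickr.com/search/?q=moon&page=0"

# keep only the lines containing result links, then cut away everything
# around the href value, leaving one result URL per line
grep "photo_container" results.html | sed 's/.*href="//' | sed 's/".*//'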

Downloads

Click here to download the script file (tar.gz)

Code


#!/bin/bash

#Flick-Er	: A Flickr bot Shell Script.
#Date 	: 10-Aug-2008
#Author	: Sriram Kashyap M S (08305028)

#Description: 	The script takes one argument. It searches Flickr for the argument,
#		and downloads medium sized (around 500x300 pixels) images of the first
#		10 search results. It saves these files in a directory created in the path of
#		the script. The name of this directory is the string passed as the argument.

#NOTE: 	You need to modify the proxy.config file and put your username,password
#		for the proxy server.
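#		proxy.config holds a single line in the usual http_proxy format,
#		for example (hypothetical host): http://username:password@proxy.example.com:80/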


###############START FUNCTION DEFINITIONS#####################

searchFlickr()
# this function downloads the flickr page for a given query.
# $1 is the query, $2 is the page number of the result.
{
	flickURL="http://flickr.com/search/?q=$1&page=$2"
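	# e.g. for query "iit+bombay" and page 0 this becomes:
	# http://flickr.com/search/?q=iit+bombay&page=0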
	wgetnow $flickURL ".temp.html"
}

parse_flicker_page1()
#this function parses the search results and finds the links to the pictures
{
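	# a matching line looks roughly like this (illustrative, not Flickr's exact markup):
	#   <div class="photo_container"><a href="/photos/someuser/1234567890/">...
	# the grep keeps such lines; the two sed passes cut away everything around
	# the href value, leaving just /photos/someuser/1234567890/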
	grep "photo_container" "$1" | sed 's/.*href="//' | sed 's/".*//'
}

parse_flicker_page2()
#this function parses the page containing the image and finds the image URL and the alt text, separating them with a pipe
{
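	# the filters: photos are served from *.static.flickr.com, so grep for "static";
	# "view profile" and /groups/ images are buddy icons and group badges we skip.
	# The sed/tr chain then reduces the surviving <img> tag to "URL |alt text"
	# (based on Flickr's markup at the time; treat the details as illustrative).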
	grep "<img" "$1" | grep "static" | grep -v "view profile" | grep -v "\/groups\/" | head -n 1 | sed 's/^<img src="//' | sed 's/title.*//' | sed 's/alt=/|/' | tr -d '"'
}

wgetnow()
#this function retrieves a web page. It uses wget, with proxy option.
#the first argument is the url to get, and the 2nd argument is the target filename where page is saved.
{
	wget --proxy -O "$2" -o ".wget.log" "$1"
	if [ $? -ne 0 ]
	then
		echo "Failed to download file: $1"
		echo "Please check if you are connected to the internet, and if proxy.config is correct"
		exit 1
	fi
}

##############END OF FUNCTION DEFINITIONS##################

#base path of the urls, because some urls are relative.
basePath="http://flickr.com"

#Exactly one argument must be supplied
if [ $# -lt 1 ]
then
	echo "Please enter a search string as argument. Example: ./flickr.sh moon"
	exit 1
elif [ $# -gt 1 ]
then
	echo "Specify multiple words within quotes. Example: ./flickr.sh \"iit bombay\""
	exit 1
fi

#replace spaces in search query by + signs
value=`echo "$1" | tr ' ' '+'`

#delete temp files at the beginning, in case a previous run ended prematurely and left them behind.
rm -f .wget.log
rm -f .temp.txt
rm -f .temp.html

#set proxy server from file. This includes user name and password
export http_proxy=`cat proxy.config`

echo "Please Wait..."

#the target directory is the place where images are saved
targetDir="$1"

#check if target directory exists, else make it
if ! test -d "$targetDir"
then
	mkdir "$targetDir"
fi

#search on flickr, for $value, and get page 0 of the result
searchFlickr "$value" 0

#Find the links to the individual result pages, and put them in .temp.txt
#Also, save only the first 10 entries. Discard the rest.
parse_flicker_page1 .temp.html | head -n 10 > .temp.txt

#for each result link, do the following. (the links are read from .temp.txt)
while read pic
do
	#append the base path to the picture name, to construct the URL
	pic="$basePath$pic"
	
	#get the result page and store it in .temp.html
	wgetnow "$pic" ".temp.html"

	#get the url and alt text of the image from the result page
	URL_ALT=`parse_flicker_page2 .temp.html`

	#imageURI is the actual link to the image on the flickr server.
	#imageAlt is the alt text of the image. This is part of the final file name.
	#imageRand is the tail of the image's file name (everything after its last
	#underscore). This is appended to the filename, so we can save multiple
	#images with the same alt text.
	imageURI=`echo $URL_ALT | sed 's/ |.*//' | sed 's/?.*//'`
	imageAlt=`echo $URL_ALT | sed 's/^.*|//' | tr ' ' '_'`
	imageRand=`echo $imageURI | sed 's/^.*_//'`

	echo "Getting $imageAlt"
	#download the image to destination folder
	wgetnow "$imageURI" "./$targetDir/$imageAlt$imageRand"

done < ".temp.txt"

exit 0