So why would you want to customize the extraction? There are a number of things you can do. Some users wanted the title in uppercase. Another user didn't like the categories ReaderwareVW extracted and wanted to use her own. You can even substitute your own category for an extracted one. Someone else wanted all new videos to be set to Viewed. All this is possible with a simple script.
The way this is implemented is that ReaderwareVW will call a Python script called vwuserexit.py after it extracts data from a web site and before it adds a video to the database. Using this script you can customize the data. You will find a basic copy of this script in your readerware\scrapers directory. It is called vwuserexit_sample.py; you must rename it to vwuserexit.py.
Mac OS X users will find the scrapers folder inside the application package. Control-click on the ReaderwareVW program icon. Select Show package contents from the popup menu. Double click on Contents, Resources, Java, scrapers.
Here is what it looks like:
    # Scraper user exit.
    #
    # Copyright © 1999-2009 Readerware Corporation. All Rights Reserved.
    #
    # To activate this sample vwuserexit script it must be renamed to
    # vwuserexit.py. If vwuserexit.py exists it is called immediately
    # before the scraper process returns. You can change any of the
    # global variables to customize the extraction process.
    # More info: http://www.readerware.com/help/vwCustomExtraction.html
    #
    #
    # This is the basic vwuserexit.py script, it does nothing by itself.
    # You can add your statements to customize the extracted data.
    #
    import string

    def userextract():
        global title,actor1,actor2,actor3,actor4,actor5,actor6
        global actor7,actor8,actor9,actor10,director,writer
        global screenwriter,photographer,composer,editor,series
        global upc,isbn,lccn,dewey,userNumber,format,studio,place
        global date,copyDate,mpaa,wide,closedCap,sound,copies
        global rating,condition,category,viewed,pflag,eflag,value
        global valueDate,comments,dateEntered,dataSource,cart,ordered
        global copies,location,keywords,book,author,running,color
        global track1,track2,track3,track4,track5
        global track6,track7,track8,track9,track10
        global track11,track12,track13,track14,track15
        global track16,track17,track18,track19,track20
        global user1,user2,user3,user4,user5,user6,user7,user8,user9,user10
        global usedprice,usedcount,collectibleprice,collectiblecount
        global newprice,newcount,listprice,salesrank,available
        global buyerwaiting,editionNumber,image,fullDateFormat,source
        # Add your statements here

    userextract()

By itself this script does nothing, but it is the starting point for developing your own scripts. Note the global statements. These identify the global variable names that ReaderwareVW uses; in other words, the variable "title" contains the extracted title, and so on. This is really all you need to know about how the process works: you set or change the contents of the variables to the required data. So, for example, if you don't want ReaderwareVW to extract categories from a web site, you could add the following line at the end of the script:

    category = ""

For something a little trickier, suppose you wanted to map the categories extracted from a web site to your own categories:

    if (string.find(category, "Mystery") != -1):
        category = "Movies : Mystery"

You would need to add these kinds of statements for every category and every web site. You can probably see the basic idea: check for a string in the extracted category and, if it is found, replace the category with another.

To convert the title to uppercase:

    title = string.upper(title)

This may all look very strange if you have not seen it before. The script is written in the Python language. If you know Python, you're all set. If you know another scripting language like Perl, it shouldn't be much of a challenge.
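Since the category mapping described above needs one test per category, it can be easier to keep the pairs in a table and loop over them. The following is only a sketch: the names CATEGORY_MAP and map_category are made up for illustration, the entries are examples, and it assumes the Python bundled with ReaderwareVW accepts the "in" substring test (if it does not, string.find can be used as shown earlier).

```python
# Hypothetical table: (text to look for, category to store instead).
# The entries are examples only; replace them with your own.
CATEGORY_MAP = [
    ("Mystery", "Movies : Mystery"),
    ("Sci-Fi", "Movies : Science Fiction"),
    ("Western", "Movies : Western"),
]

def map_category(extracted, mapping=CATEGORY_MAP):
    """Return the first replacement whose search text occurs in `extracted`.

    If nothing matches, the extracted category is returned unchanged.
    """
    for needle, replacement in mapping:
        if needle in extracted:
            return replacement
    return extracted

# Inside vwuserexit.py you would apply it to the global variable:
#     category = map_category(category)
print(map_category("Mystery & Suspense"))
print(map_category("Documentary"))
```

Because unmatched categories pass through unchanged, you can add entries a few at a time as you notice categories you want to rename.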
To keep one of the optional fields listed below, store it in a user-defined column. The process is very simple. First define a user column for the data. Select the Preferences menu item, User Columns tab. Enter the column title and check the active box for each user-defined column you want to add.
Next you have to tell ReaderwareVW to move the data to this user column. You do this in a custom extraction script. For example, the following line will store the sales ranking in the first user-defined column:
    user1 = salesrank
The optional fields are:
available        | Item is available
buyerwaiting     | Buyer waiting for item
collectiblecount | Number of collectible items available
collectibleprice | Lowest collectible price
listprice        | List price
newcount         | Number of new items available
newprice         | Lowest new price
salesrank        | Sales rank
usedcount        | Number of used items available
usedprice        | Lowest used price
editionNumber    | Edition number
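As a rough illustration of moving optional fields into user columns, the sketch below fills two user columns from the fields in the table. It runs on its own because it defines stand-in values at the top; in a real vwuserexit.py those variables are set by ReaderwareVW before the script is called. The sample values, and the assumption that the fields arrive as text strings, are mine and not taken from the Readerware documentation.

```python
# Stand-ins for variables ReaderwareVW would set before calling the script.
# The values and their string type are assumptions made for this sketch.
salesrank = "1520"
usedprice = "4.99"
usedcount = "12"
user1 = ""
user2 = ""

def userextract():
    global user1, user2
    # Store the sales rank in the first user column.
    user1 = salesrank
    # Combine two optional fields in the second user column,
    # but only when a used price was actually extracted.
    if usedprice:
        user2 = usedprice + " (" + usedcount + " used)"

userextract()
print(user1)
print(user2)
```

Checking for an empty value first leaves the user column blank when a site supplies no used price.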
Note that you don't have to install Python; all necessary libraries are included with the Readerware distribution.
Python is a very powerful language and fairly easy to learn. If you're wondering about the name, yes, it was named after Monty Python. Unfortunately I cannot offer support on Python itself. You will need to discover the power of Python for yourself.
A book I really like is "Learning Python" by Mark Lutz. It has a very readable approach and covers the basics and advanced topics. The "Python Pocket Reference" by Mark Lutz is a handy thing to keep by your keyboard. Another good one is "Text Processing in Python" by David Mertz. A friend recommends "Python Programming on Win32" by Mark Hammond, which covers Python with particular emphasis on using it with Windows.
Use ReaderwareVW as normal. When extracting data ReaderwareVW will output debugging information and any error messages to a log file, rwuser.log. You can view this file in any text editor.
Also, with debug on, ReaderwareVW will write the HTML it retrieved from the web site to the Readerware directory as trace.html. This can be useful when debugging scripts.