As the design of our program is now stable, we can write the code which is an implementation of our solution.
Example 10.1. Backup Script - The First Version
#!/usr/bin/python # Filename: backup_ver1.py import os import time # 1. The files and directories to be backed up are specified in a list. source = ['/home/swaroop/byte', '/home/swaroop/bin'] # If you are using Windows, use source = [r'C:\Documents', r'D:\Work'] or something like that # 2. The backup must be stored in a main backup directory target_dir = '/mnt/e/backup/' # Remember to change this to what you will be using # 3. The files are backed up into a zip file. # 4. The name of the zip archive is the current date and time target = target_dir + time.strftime('%Y%m%d%H%M%S') + '.zip' # 5. We use the zip command (in Unix/Linux) to put the files in a zip archive zip_command = "zip -qr '%s' %s" % (target, ' '.join(source)) # Run the backup if os.system(zip_command) == 0: print 'Successful backup to', target else: print 'Backup FAILED'
$ python backup_ver1.py Successful backup to /mnt/e/backup/20041208073244.zip
Now, we are in the testing phase where we test that our program works properly. If it doesn't behave as expected, then we have to debug our program i.e. remove the bugs (errors) from the program.
You will notice how we have converted our design into code in a step-by-step manner.
We make use of the os
and time
modules and so we import them. Then, we specify the files and directories
to be backed up in the source
list. The target directory
is where store all the backup files and this is specified in the
target_dir
variable. The name of the zip archive that
we are going to create is the current date and time which we fetch using the
time.strftime()
function. It will also have the
.zip
extension and will be stored in the
target_dir
directory.
The time.strftime()
function takes a specification
such as the one we have used in the above program. The %Y
specification will be replaced by the year without the cetury. The
%m
specification will be replaced by the month as a
decimal number between 01
and 12
and
so on. The complete list of such specifications can be found in the
[Python Reference Manual] that comes with your Python
distribution. Notice that this is similar to (but not same as) the
specification used in print
statement (using the
%
followed by tuple).
We create the name of the target zip file using the addition operator
which concatenates the strings i.e. it joins the
two strings together and returns a new one. Then, we create a string
zip_command
which contains the command that we are
going to execute. You can check if this command works by running it on
the shell (Linux terminal or DOS prompt).
The zip command that we are using has some options
and parameters passed. The -q
option is used to indicate
that the zip command should work quietly.
The -r
option specifies that the zip command should work
recursively for directories i.e. it
should include subdirectories and files within the subdirectories as well.
The two options are combined and specified in a shorter way as
-qr
. The options are followed by the name of the zip archive
to create followed by the list of files and directories to backup. We
convert the source
list into a string using the
join
method of strings which we have already seen
how to use.
Then, we finally run the command using the
os.system
function which runs the command as if it
was run from the system i.e. in the shell - it returns
0
if the command was successfully, else it returns an
error number.
Depending on the outcome of the command, we print the appropriate message that the backup has failed or succeeded and that's it, we have created a script to take a backup of our important files!
You can set the source
list and
target
directory to any file and directory names
but you have to be a little careful in Windows. The problem is that
Windows uses the backslash (\
) as the directory
separator character but Python uses backslashes to represent
escape sequences!
So, you have to represent a backslash itself using an escape
sequence or you have to use raw strings. For example, use
'C:\\Documents'
or r'C:\Documents'
but do not use
'C:\Documents'
- you are using an unknown
escape sequence \D
!
Now that we have a working backup script, we can use it whenever we want to take a backup of the files. Linux/Unix users are advised to use the executable method as discussed earlier so that they can run the backup script anytime anywhere. This is called the operation phase or the deployment phase of the software.
The above program works properly, but (usually) first programs do not work exactly as you expect. For example, there might be problems if you have not designed the program properly or if you have made a mistake in typing the code, etc. Appropriately, you will have to go back to the design phase or you will have to debug your program.
The first version of our script works. However, we can make some refinements to it so that it can work better on a daily basis. This is called the maintenance phase of the software.
One of the refinements I felt was useful is a better file-naming mechanism - using the time as the name of the file within a directory with the current date as a directory within the main backup directory. One advantage is that your backups are stored in a hierarchical manner and therefore it is much easier to manage. Another advantage is that the length of the filenames are much shorter this way. Yet another advantage is that separate directories will help you to easily check if you have taken a backup for each day since the directory would be created only if you have taken a backup for that day.
Example 10.2. Backup Script - The Second Version
#!/usr/bin/python # Filename: backup_ver2.py import os import time # 1. The files and directories to be backed up are specified in a list. source = ['/home/swaroop/byte', '/home/swaroop/bin'] # If you are using Windows, use source = [r'C:\Documents', r'D:\Work'] or something like that # 2. The backup must be stored in a main backup directory target_dir = '/mnt/e/backup/' # Remember to change this to what you will be using # 3. The files are backed up into a zip file. # 4. The current day is the name of the subdirectory in the main directory today = target_dir + time.strftime('%Y%m%d') # The current time is the name of the zip archive now = time.strftime('%H%M%S') # Create the subdirectory if it isn't already there if not os.path.exists(today): os.mkdir(today) # make directory print 'Successfully created directory', today # The name of the zip file target = today + os.sep + now + '.zip' # 5. We use the zip command (in Unix/Linux) to put the files in a zip archive zip_command = "zip -qr '%s' %s" % (target, ' '.join(source)) # Run the backup if os.system(zip_command) == 0: print 'Successful backup to', target else: print 'Backup FAILED'
$ python backup_ver2.py Successfully created directory /mnt/e/backup/20041208 Successful backup to /mnt/e/backup/20041208/080020.zip $ python backup_ver2.py Successful backup to /mnt/e/backup/20041208/080428.zip
Most of the program remains the same. The changes is that we check if there
is a directory with the current day as name inside the main backup
directory using the os.exists
function. If it doesn't
exist, we create it using the os.mkdir
function.
Notice the use of os.sep
variable - this gives the
directory separator according to your operating system i.e. it will be
'/'
in Linux, Unix, it will be '\\'
in Windows and ':'
in Mac OS. Using
os.sep
instead of these characters directly will make
our program portable and work across these systems.
The second version works fine when I do many backups, but when there are lots of backups, I am finding it hard to differentiate what the backups were for! For example, I might have made some major changes to a program or presentation, then I want to associate what those changes are with the name of the zip archive. This can be easily achieved by attaching a user-supplied comment to the name of the zip archive.
Example 10.3. Backup Script - The Third Version (does not work!)
#!/usr/bin/python # Filename: backup_ver2.py import os import time # 1. The files and directories to be backed up are specified in a list. source = ['/home/swaroop/byte', '/home/swaroop/bin'] # If you are using Windows, use source = [r'C:\Documents', r'D:\Work'] or something like that # 2. The backup must be stored in a main backup directory target_dir = '/mnt/e/backup/' # Remember to change this to what you will be using # 3. The files are backed up into a zip file. # 4. The current day is the name of the subdirectory in the main directory today = target_dir + time.strftime('%Y%m%d') # The current time is the name of the zip archive now = time.strftime('%H%M%S') # Take a comment from the user to create the name of the zip file comment = raw_input('Enter a comment --> ') if len(comment) == 0: # check if a comment was entered target = today + os.sep + now + '.zip' else: target = today + os.sep + now + '_' + comment.replace(' ', '_') + '.zip' # Create the subdirectory if it isn't already there if not os.path.exists(today): os.mkdir(today) # make directory print 'Successfully created directory', today # 5. We use the zip command (in Unix/Linux) to put the files in a zip archive zip_command = "zip -qr '%s' %s" % (target, ' '.join(source)) # Run the backup if os.system(zip_command) == 0: print 'Successful backup to', target else: print 'Backup FAILED'
$ python backup_ver3.py File "backup_ver3.py", line 25 target = today + os.sep + now + '_' + ^ SyntaxError: invalid syntax
This program does not work!. Python says there is a syntax error which means that the script does not satisfy the structure that Python expects to see. When we observe the error given by Python, it also tells us the place where it detected the error as well. So we start debugging our program from that line.
On careful observation, we see that the single logical line has
been split into two physical lines but we have not specified that
these two physical lines belong together. Basically, Python has
found the addition operator (+
) without any
operand in that logical line and hence it doesn't know how to
continue. Remember that we can specify that the logical line continues
in the next physical line by the use of a backslash at the end of the
physical line. So, we make this correction to our program. This is
called bug fixing.
Example 10.4. Backup Script - The Fourth Version
#!/usr/bin/python # Filename: backup_ver2.py import os, time # 1. The files and directories to be backed up are specified in a list. source = ['/home/swaroop/byte', '/home/swaroop/bin'] # If you are using Windows, use source = [r'C:\Documents', r'D:\Work'] or something like that # 2. The backup must be stored in a main backup directory target_dir = '/mnt/e/backup/' # Remember to change this to what you will be using # 3. The files are backed up into a zip file. # 4. The current day is the name of the subdirectory in the main directory today = target_dir + time.strftime('%Y%m%d') # The current time is the name of the zip archive now = time.strftime('%H%M%S') # Take a comment from the user to create the name of the zip file comment = raw_input('Enter a comment --> ') if len(comment) == 0: # check if a comment was entered target = today + os.sep + now + '.zip' else: target = today + os.sep + now + '_' + \ comment.replace(' ', '_') + '.zip' # Notice the backslash! # Create the subdirectory if it isn't already there if not os.path.exists(today): os.mkdir(today) # make directory print 'Successfully created directory', today # 5. We use the zip command (in Unix/Linux) to put the files in a zip archive zip_command = "zip -qr '%s' %s" % (target, ' '.join(source)) # Run the backup if os.system(zip_command) == 0: print 'Successful backup to', target else: print 'Backup FAILED'
$ python backup_ver4.py Enter a comment --> added new examples Successful backup to /mnt/e/backup/20041208/082156_added_new_examples.zip $ python backup_ver4.py Enter a comment --> Successful backup to /mnt/e/backup/20041208/082316.zip
This program now works! Let us go through the actual enhancements that we
had made in version 3. We take in the user's comments using the
raw_input
function and then check if the user actually
entered something by finding out the length of the input using the
len
function. If the user has just pressed
enter for some reason (maybe it was just a routine backup
or no special changes were made), then we proceed as we have done before.
However, if a comment was supplied, then this is attached to the name of
the zip archive just before the .zip
extension.
Notice that we are replacing spaces in the comment with underscores - this
is because managing such filenames are much easier.
The fourth version is a satisfactorily working script for most users, but there is
always room for improvement. For example, you can include a
verbosity level for the program where you can specify a
-v
option to make your program become more talkative.
Another possible enhancement would be to allow extra files and directories to be
passed to the script at the command line. We will get these from the
sys.argv
list and we can add them to our
source
list using the extend
method
provided by the list
class.
One refinement I prefer is the use of the tar command instead
of the zip command. One advantage is that when you use the
tar command along with gzip, the backup is
much faster and the backup created is also much smaller. If I need to use this
archive in Windows, then WinZip handles such
.tar.gz
files easily as well. The tar
command is available by default on most Linux/Unix systems. Windows users can
download
and install it as well.
The command string will now be:
tar = 'tar -cvzf %s %s -X /home/swaroop/excludes.txt' % (target, ' '.join(srcdir))
The options are explained below.
-c
indicates creation
of an archive.
-v
indicates verbose
i.e. the command should be more talkative.
-z
indicates the gzip
filter should be used.
-f
indicates force
in creation of archive i.e. it should replace if there is a file
by the same name already.
-X
indicates a file which contains a list of filenames
which must be excluded from the backup.
For example, you can specify *~
in this file to not
include any filenames ending with ~
in the backup.
The most preferred way of creating such kind of archives would be using
the zipfile
or tarfile
module
respectively. They are part of the Python Standard Library and available
for you to use already. Using these libraries also avoids the use of the
os.system
which is generally not advisable to use
because it is very easy to make costly mistakes using it.
However, I have been using the os.system
way of
creating a backup purely for pedagogical purposes, so that the example
is simple enough to be understood by everybody but real enough to be
useful.