Preparing to Download The Steps in Downloading Files: identify the DLI title and its delivery medium (FTP or CD-ROM); know about the different file types; choose a tool to download the files; have enough storage space to hold the files; choose a file naming convention.
Delivery Medium The files of DLI titles are available on CD-ROM, by FTP, or both. To determine the delivery medium, use one of three sections of the DLI web site: Name and Acronym of Products, A List of Files, or Searching the Collection.
Name & Acronym of Product A column in this list specifies whether a DLI title is available on CD, FTP, or Web.
Name & Acronym of Product Exercise: Go to http://www.statcan.ca/english/Dli/contents.htm and select the “Name and Acronym of Product” link.
Name & Acronym of Product Exercise (continued): What delivery media are available for the following titles? Industrial Monitor Health PCCF+ General Social Survey, Cycle 14
A List of Files “Files” is a misnomer. This is a list of titles and the delivery media on which they are available.
Searching the Collection Searches by DLI title return either a description of the product and its delivery medium, or
Searching the Collection the files that can be downloaded using FTP.
Delivery Medium Summary If the medium is CD-ROM, place an order through the “Submit an Order” link on the web site. If the medium is FTP or Web, you must decide how you wish to download the file or files.
Background Knowledge Before discussing which method to use to download files, it is useful to understand two characteristics of files: the encoding of their content, and the relationship between file extensions and their corresponding computing applications.
Content and Extensions The encoding of file content: Binary: executable, compressed, or proprietary (e.g., Self-extracting, Zip, IVT or PDF). ASCII: plain text (e.g., raw data or read-me instructions).
Content and Extensions File extensions and applications: The extensions used with file names can help identify the general contents of files because specific extensions are associated with specific applications. For example, .pdf is associated with the Adobe Acrobat Reader, and a file with this extension is expected to contain a document.
Content and Extensions File extensions and applications: Knowing the application associated with a file extension can also help identify the nature of its encoded contents. The file formats of most applications are binary. For example, .pdf is a binary file format.
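The link between extension and encoding can be sketched as a small lookup. This is only an illustration: the function name guess_encoding is hypothetical, the extensions are the ones named on these slides (.exe for self-extracting archives), and falling back to binary for anything unrecognised is an assumption made here for safety.

```python
import os

# Extensions the slides describe as plain ASCII text (raw data, read-me files).
TEXT_EXTENSIONS = {".txt"}


def guess_encoding(file_name):
    """Return "ascii" or "binary" based only on the file extension."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension in TEXT_EXTENSIONS:
        return "ascii"
    # .exe, .zip, .ivt and .pdf are binary; unknown extensions are treated
    # the same way as a cautious default (an assumption of this sketch).
    return "binary"
```

An extension is only a hint, so the result should be checked against the documentation for the title.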
Basic Rules to Downloading Knowing whether a file is to be treated in binary or ASCII mode is fundamental to downloading files. Why? Because the file transfer protocol used to move files between computers operates in two modes: binary and ASCII (or text).
Downloading Modes Which mode to use? Binary mode preserves all of the content in a file upon transfer, including text and special characters.
Downloading Modes Which mode to use? ASCII mode preserves text but does not guarantee that other bytes arrive unchanged, so it can corrupt binary files. ASCII mode also converts end-of-line characters to the convention of the receiving operating system.
Downloading Modes Everything can be downloaded in binary mode and the contents will always be safe. The only disadvantage of downloading text files in binary is that end-of-line characters are not converted, and these differ across operating systems.
End of Line Characters Operating Systems and End of Lines
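The conventions usually cited are carriage return plus line feed on DOS and MS Windows, line feed alone on UNIX, and carriage return alone on the classic Mac OS. A minimal sketch of the conversion that an ASCII-mode transfer performs, and that must be done by hand after a binary transfer, follows. The names EOL and convert_eol are illustrative, not part of any DLI tool.

```python
# End-of-line markers by operating system family (well-known conventions).
EOL = {
    "DOS/Windows": b"\r\n",
    "UNIX": b"\n",
    "classic Mac OS": b"\r",
}


def convert_eol(data, target):
    """Rewrite every end-of-line marker in `data` as `target` (bytes)."""
    # Collapse CR+LF first so it is not counted as two separate line ends.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", target)
```

For example, convert_eol(text, EOL["DOS/Windows"]) prepares a UNIX text file for an editor that expects DOS line endings. Apply this only to text files; it would damage binary content.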
End of Line Characters Exercise: View the user guide in cycle 4 of the GSS using WS_FTP in both binary and ASCII mode. The file to view is c4microe.txt. (Click on the file name, the mode, and then View.)
Files and the FTP Modes The DLI FTP site contains for each title a ‘readme’ file that lists the names of all files, their FTP mode, a brief description, and the number of records and record length for data files.
Files and the FTP Modes The readme file for the General Social Survey is at the top of the gss directory and is called Readgss.txt.
Files and the FTP Modes The FTP mode for each file is identified as A for ASCII and B for binary.
Readme File Content The brief description of the contents of files in the readme file is also helpful in knowing what to expect in each file.
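As a rough sketch, the A or B flag from the readme file can drive the transfer mode in a script. The example uses Python's standard ftplib; the function name download, and the host, ID and password in the commented usage, are placeholders rather than actual DLI values.

```python
from ftplib import FTP  # standard-library FTP client


def download(ftp, remote_name, local_path, mode):
    """Fetch one file using the mode ("A" or "B") listed in the readme."""
    mode = mode.upper()
    if mode == "B":
        # Binary: bytes are written exactly as they are received.
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_name, out.write)
    elif mode == "A":
        # ASCII: retrlines passes each line without its line ending, and
        # text-mode writing adds the local end-of-line convention.
        with open(local_path, "w", encoding="utf-8") as out:
            ftp.retrlines("RETR " + remote_name,
                          lambda line: out.write(line + "\n"))
    else:
        raise ValueError("mode must be 'A' or 'B'")


# Hypothetical usage (placeholder host and credentials):
# ftp = FTP("ftp.example.org")
# ftp.login("your-id", "your-password")
# ftp.cwd("gss")
# download(ftp, "Readgss.txt", "Readgss.txt", "A")
# ftp.quit()
```

When in doubt about a file, pass "B": binary mode leaves the content untouched.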
Preparing to Download We’ve reviewed the delivery media of DLI titles and the different file types and their transfer mode. Next we need to discuss the tools to download files.
File Transfer Tools Two general types of file transfer tools are available for downloading DLI files: independent FTP clients and Web browser FTP clients.
Independent FTP Clients Different FTP clients have become popular on different operating systems. MS Windows: WS_FTP; Mac OS: Fetch; UNIX: ftp.
Independent FTP Clients One distinct advantage of all ftp clients is that they allow viewing and retrieving multiple files with a single command or click of a mouse button.
Independent FTP Clients These clients also allow setting the file transfer mode and generally provide a great deal of flexibility in controlling an ftp session.
Independent FTP Clients A disadvantage of these clients is that they rely strictly on the names of directories and files to display what is available for downloading. Therefore, you have to know what it is that you want to download by its file name.
Web Browser FTP Clients Each Web browser has some level of FTP capability built in. Most Web browsers offer two options for downloading files: connect to the DLI FTP site using the DLI FTP URL, ID and password; or use “Searching the Collection” on the DLI Web site.
Using the DLI FTP URL Using the FTP URL, the browser displays the directory and file structure of the DLI FTP site. Note that the ID and password are displayed in the URL when using this method.
Using the DLI FTP URL Single or multiple directories or files can be selected using the Shift and Control keys. Right-clicking allows a “copy to folder” in IE.
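For reference, an FTP URL that carries an ID and password follows the general form ftp://id:password@host/path, which is why both are visible in the browser's address bar. The sketch below builds such a URL; the function name ftp_url and the host in the usage note are placeholders, not the DLI address.

```python
from urllib.parse import quote


def ftp_url(host, user, password, path=""):
    """Build ftp://user:password@host/path, escaping reserved characters."""
    return "ftp://{}:{}@{}/{}".format(
        quote(user, safe=""),      # escape characters such as "@" or ":"
        quote(password, safe=""),
        host,
        quote(path.lstrip("/")),   # "/" is kept as the path separator
    )
```

With placeholder values, ftp_url("ftp.example.org", "myid", "p@ss", "gss/Readgss.txt") escapes the "@" in the password so that the browser can still tell where the host name begins.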
“Searching the Collection” Links to files within DLI titles have been organized on the DLI Web site under the “Searching the Collection” section of the site.
“Searching the Collection” The files in this survey can be downloaded by right-clicking and using “Save target as…”. The data file requires an ID and password to access.
Summary of Pros and Cons
FTP Client. Plus: has access to all of the files on the FTP site. Minus: must rely on abbreviated file and directory names.
Web Browser. Plus: has an interface similar to Windows Explorer for selecting files. Minus: ID and password must be entered in the URL.
DLI Web Site. Plus: full-text descriptions simplify locating files. Minus: can only retrieve one file at a time.
Compression Tools DLI uses two types of compression: PKZIP (.zip) and self-extracting Zip files (.exe).
Compression Tools PKZIP files can be uncompressed on multiple platforms. MS Windows: WinZip (http://www.winzip.com/); Mac OS: unstuffit (http://www.aladdinsys.com/); UNIX: unzip (http://www.info-zip.org/pub/infozip/UnZip.html).
Compression Tools Self-extracting Zip files (.exe) are only executable on MS Windows / DOS. Some unzip utilities will also open self-extracting zip files, including WinZip and Unix unzip.
Compression Tools Pay attention in WinZip to the directory in which files are being written. Also, you may wish to turn off the option to restore the folder names used in the compressed archive.
File Sizes Pay attention to the sizes of files as you download them. The DLI Web site and the Readme file both list the compressed and uncompressed sizes of files.
File Sizes You can also determine the uncompressed size of a file in WinZip before attempting to uncompress it.
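The same check can be scripted. The sketch below uses Python's standard zipfile and shutil modules rather than WinZip; the function names archive_sizes and has_room are illustrative only.

```python
import shutil
import zipfile


def archive_sizes(zip_path):
    """Return (compressed_bytes, uncompressed_bytes) totals for a Zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.infolist()
        compressed = sum(member.compress_size for member in members)
        uncompressed = sum(member.file_size for member in members)
    return compressed, uncompressed


def has_room(zip_path, target_dir):
    """Report whether target_dir has space for the uncompressed files."""
    needed = archive_sizes(zip_path)[1]
    return needed <= shutil.disk_usage(target_dir).free
```

Calling has_room before extracting avoids filling the disk part-way through a large archive.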
Maxline Utility The DLI FTP site has a useful utility to check the record length and number of records in files. This is particularly useful in confirming the contents of raw data files.
Maxline Utility The maxline utility is in the directory util and is named maxline.exe. Maxline uses DOS naming conventions (8.3). To find proper DOS names, you may need to use the DOS command: dir /x
Maxline Utility The line length of raw data files should match the maximum specified in the documentation. Maxline determines the number of records by counting line feeds.
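Where maxline.exe cannot be run, a comparable check can be sketched in a few lines. This is not the maxline utility itself: the function name line_stats is hypothetical, and excluding the end-of-line characters from the record length is an assumption of this sketch.

```python
def line_stats(path):
    """Return (number_of_records, longest_record_length) for a data file."""
    records = 0
    longest = 0
    with open(path, "rb") as data:
        for line in data:  # binary iteration splits on line feeds
            if line.endswith(b"\n"):
                records += 1  # one record per line feed
            # Measure the record without its CR/LF terminator.
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return records, longest
```

Compare both numbers with the record count and record length given in the readme file for the title.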
Naming Conventions You may choose to institute a naming convention to help store files locally. For example, you may choose to use the DLI directory names. Alternatively, you may use an accession number to categorize DLI titles.
Naming Conventions The only concern about changing names of files is that you may at some point need to return to the DLI FTP site to confirm something about a file. You’ll then need to know the original file name that is used on the DLI FTP site.