Accessing CRSP data on Linux

The CRSP data structure is rich and complex, and the information about accessing it is scattered across several online sources and manuals. Below I briefly review the steps needed to get a working installation on a Linux machine and to start using the CRSP database.

Installation

Using your username and password, log in to the MOVEit Cloud service. From the folder Product Download->Stock_1925_ANNSUB get the file faz201812_cadb.zip. This is a compressed archive: extract it in a suitable place. Then, from the folder Utility Download->CUPL, download the file setupLinux[64].bin, choosing the 64-bit or 32-bit version as appropriate for your system. Execute this file to install the CUPL utilities and libraries on the local system, and provide a suitable installation path when the installer asks for it.
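
The two steps above can be sketched as a pair of small Bash functions. This is only an illustration: the function names extract_crsp_db and run_crsp_installer are made up for this example, the destination paths are placeholders, and I assume the unzip program is available on your system.

```shell
#!/usr/bin/env bash
# Sketch of the installation steps. Function names and paths are
# placeholders chosen for illustration, not part of the CRSP tools.

# Extract the database archive into a destination directory.
extract_crsp_db() {
    local archive="$1" dest="$2"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && unzip -q "$archive" -d "$dest"
}

# Make the downloaded installer executable and start it.
# The installer itself asks for the installation path.
run_crsp_installer() {
    local installer="$1"
    if [ ! -f "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    # A bare file name would be searched in PATH, so prefix it with ./
    case "$installer" in
        */*) ;;
        *) installer="./$installer" ;;
    esac
    chmod +x "$installer" && "$installer"
}
```

For instance, with the 64-bit installer in the current directory and a data directory of your choice:

> extract_crsp_db faz201812_cadb.zip "$HOME/crsp/data"
> run_crsp_installer setupLinux64.bin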

Configuration

This step is poorly documented but essential. Move to the root folder in which the CUPL programs were installed. There should be a folder named accbin[64], with or without the 64 suffix depending on the installed version of the tools. Inside this folder, find the utility crsp_setup.[csh,sh] and run it. Use the .sh version if you are using the Bash shell, or the .csh version if you are using cshell or kshell. The command

> echo $SHELL

prints your login shell, which is normally the one in use. You will be asked to provide the location (path) of your CUPL programs and of your databases. The output of crsp_setup.[csh,sh] is a text file containing a set of environment variable definitions. These are the variables the CUPL utilities rely upon. The default name is something like mycrsp.kshrc, but you can change it. Now you have two choices. One possibility is to add the lines of this file to your login initialization files, using any editor. Check the manual of your shell to find the proper names of these files. The other possibility is to source the file in each session before using the command line tools:

> source mycrsp.kshrc

In this way the necessary environment variable definitions are added to your working session. For more information on the configuration step, check the release notes.
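
Before going further it may help to check that the configuration took effect. The following is a minimal Bash sketch, not part of the CRSP tools: the function name check_crsp_env is hypothetical, and it only tests CRSP_BIN and CRSP_DSTK because those are the two variables used later in this page. The file written by crsp_setup may well define others.

```shell
#!/usr/bin/env bash
# Sketch: verify that the variables used in this guide are defined
# and point to existing directories. check_crsp_env is a hypothetical
# helper, not a CRSP utility.
check_crsp_env() {
    local status=0 name value
    for name in CRSP_BIN CRSP_DSTK; do
        value="${!name:-}"    # indirect expansion (Bash-specific)
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the file, run

> check_crsp_env && echo "CRSP environment looks fine"

and any problem is reported on standard error.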

Example of use

The utility you are most likely to use is ts_print, which "prints time series". Two steps are necessary to use this program. First, you have to prepare a "request" text file that defines what you want to get. You can use the editor of your choice for this. Second, you have to pass the request file to the program. Assuming the name of the prepared file is query.txt, from anywhere on your system just run (this works ONLY if your environment variables were properly defined, see Configuration above)

> $CRSP_BIN/ts_print query.txt

The output produced by ts_print is a convenient, well-structured ASCII file. In general, query.txt contains all the information ts_print needs. If no output file is specified in query.txt, the data are printed to standard output. This is very useful when you want to post-process the data through a shell pipe.

An example of a request file is the following:

ENTITY
#download data for Microsoft
LIST|TICKER MSFT |ENTFORMAT 3
END

ITEM
#adjusted opening price
ITEMID AdjOpenPrc
#adjusted highest traded price during the day
ITEMID Adjaskhi
#adjusted lowest traded price during the day
ITEMID Adjbidlo
#adjusted official closing price
ITEMID Adjprc
#adjusted total volume
ITEMID Tvol
#adjusted dividend
ITEMID Adjdiv
#number of outstanding shares
ITEMID Shr
END

DATE
CALNAME daily|RANGE 19860313-20181231
CALFORMAT 1
END

OPTIONS
X ITEM,YES|Y DATE,YES|Z ENTITY,YES,1
END

Lines beginning with # are comments. With the previous file you can download several daily quantities (prices, volume, dividends and shares outstanding) for Microsoft stock. I use the command

> $CRSP_BIN/ts_print query_MSFT.txt | gzip > MSFT.txt.gz

to compress the output of ts_print on the fly and save the compressed data in MSFT.txt.gz. There are many things you can do by tweaking the request file, and I am not going to review them here. A good starting point might be the online resource on request files.
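
Since the request file is plain text, it can also be generated by a script, which is handy when the same request is needed for several tickers. The sketch below is only an illustration built on the request file shown above: the function names make_request and fetch_tickers are hypothetical, the item list is shortened to two entries, and the same date range is used for every ticker, which may not match the period in which each security actually traded.

```shell
#!/usr/bin/env bash
# Sketch: build ts_print request files from a template.
# make_request and fetch_tickers are hypothetical helper names.

# Write a request for one ticker to standard output.
# Arguments: ticker, first date, last date (YYYYMMDD).
make_request() {
    local ticker="$1" first="$2" last="$3"
    cat <<EOF
ENTITY
LIST|TICKER ${ticker} |ENTFORMAT 3
END

ITEM
ITEMID Adjprc
ITEMID Tvol
END

DATE
CALNAME daily|RANGE ${first}-${last}
CALFORMAT 1
END

OPTIONS
X ITEM,YES|Y DATE,YES|Z ENTITY,YES,1
END
EOF
}

# Write query_<ticker>.txt for each ticker and, if ts_print is
# available under CRSP_BIN, run it and compress the output on the fly.
# Arguments: first date, last date, then one or more tickers.
fetch_tickers() {
    local first="$1" last="$2" t
    shift 2
    for t in "$@"; do
        make_request "$t" "$first" "$last" > "query_${t}.txt"
        if [ -x "${CRSP_BIN:-}/ts_print" ]; then
            "$CRSP_BIN/ts_print" "query_${t}.txt" | gzip > "${t}.txt.gz"
        else
            echo "ts_print not found; wrote query_${t}.txt only" >&2
        fi
    done
}
```

For example (the tickers are arbitrary):

> fetch_tickers 20100104 20181231 MSFT IBM

leaves query_MSFT.txt and query_IBM.txt in the current directory, together with MSFT.txt.gz and IBM.txt.gz when ts_print is installed.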

Another useful utility program is dstksearch.sh. It can be run with

> $CRSP_BIN/dstksearch.sh

However, all this program does is look up security information in the file $CRSP_DSTK/headfile.dat. This is a simple ASCII file that you can search yourself, for instance with the grep command, to find information about one company or ticker. The command

> grep "GENERAL ELECTRIC" "$CRSP_DSTK/headfile.dat"

will produce the output

12060 20792          GENERAL ELECTRIC CO                    1 19251231-19620701
12060 20792          GENERAL ELECTRIC CO             GE     1 19620702-19680101
12060 20792 36960410 GENERAL ELECTRIC CO             GE     1 19680102-20181231
32168 23832          GENERAL ELECTRIC CO LTD         GLE    2 19620702-19640908
32168 23832          GENERAL ELECTRIC CO LTD         GLE   -2 19640909-19641123
32168 23832          GENERAL ELECTRIC CO LTD         GLE    2 19641124-19680101
32168 23832 36964060 GENERAL ELECTRIC CO LTD         GLE    2 19680102-19681128
32168 23832 36959520 GENERAL ELECTRIC & ENGH ELEC CO GLE    2 19681129-19691204
45858 21430 73650810 PORTLAND GENERAL ELECTRIC CO    PGN    1 19680306-19860211
91204 21430 73650884 PORTLAND GENERAL ELECTRIC CO    POR    1 20060410-20181231

The different columns contain in order:

  1. PERMNO, the CRSP unique issue identification code;
  2. PERMCO, the CRSP unique company identification code;
  3. CUSIP, the 8-character identifier assigned to the security under the CUSIP numbering system (blank when none is available);
  4. The name of the company;
  5. The ticker symbol associated with the company stock;
  6. The exchange where the stock is traded: 1=NYSE, 2=AMEX, 3=NASDAQ, 4=ARCA;
  7. The range of dates for which data are available.
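
The columns can also be picked apart with awk. The following is a rough sketch under an assumption inferred only from the sample output above: the PERMNO is the first whitespace-separated field, the date range is the last one, and the exchange code is the one just before it. The CUSIP, name and ticker columns are not parsed, because they may be blank or contain spaces. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract a few columns from headfile.dat-style lines read
# from standard input. The field positions are an assumption based
# on the sample output only, not on a documented file format.

# List the distinct PERMNOs whose exchange code equals the argument.
permnos_on_exchange() {
    awk -v ex="$1" '$(NF-1) == ex { print $1 }' | sort -u
}

# Print "PERMNO first-date last-date" for every input line.
header_ranges() {
    awk '{ split($NF, d, "-"); print $1, d[1], d[2] }'
}
```

For example,

> grep "GENERAL ELECTRIC" "$CRSP_DSTK/headfile.dat" | permnos_on_exchange 1

applied to the ten lines shown above would print 12060, 45858 and 91204, one per line.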

Created: 2023-07-06 Thu 18:13