So far, this tutorial has only dealt with one type of data: a spreadsheet,
which we use in .csv (comma-separated values) file format. This format is
suitable for plotting data in the form of points or locations, but if you want
to show data on a per-region basis (such as life expectancy per country), the
easiest way to do so is to use an ESRI shapefile.
A shapefile is actually a group of several files that must be kept in the same
folder to work. There are a few ways to get one. The first is to simply
download a shapefile from any of a number of websites (a list of suggestions
is provided below). The second is to create a new shapefile from a
spreadsheet or .csv data, which is primarily useful if you can’t find a
pre-existing shapefile with the right elements. The easiest way to create your
own shapefile is to use CartoDB.com, a mapmaking website. And finally, you can
add new data to a pre-existing shapefile by adding columns in the layer’s
attribute table.
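Because a shapefile is really several files that travel together, it can help to confirm that none of them went missing after a download or a move. The sketch below assumes the usual minimum set of companion files (.shp, .shx, and .dbf) and that the extensions are lowercase; the function name is made up for illustration.

```python
from pathlib import Path

# Assumed minimum set of companion files for a usable shapefile.
# A .prj (projection) file is common too, but is not checked here.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are missing next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

An empty list means all three files sit side by side in the same folder; anything else names the pieces you still need to track down.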
Start by searching for whatever dataset you like as an ESRI shapefile. Some
good places to start looking are:
Natural Earth Data. These
datasets are good as base maps, but probably won’t contain data that can be
used to color-code regions; these maps are usually just borders. However, if
you want a base map composed of vector data, this is a good place to start.
Geocommons. User-submitted maps containing all
sorts of data. Search for any kind of map, then click on the “shapefile”
button to the right of the map (if available) to download the shapefiles.
luck-of-the-draw eclectic collection of maps and shapefiles. Some shapefiles
contain data that can color-code countries, such as historical earthquake
occurrences per country.
For the purposes of this example, I’m going to use the Life
Expectancy shapefile from Atlas of the Biosphere.
Although this dataset is labeled “Life Expectancy,” it actually has numerous
other statistics included in it, such as infant mortality rates, access to safe
water per country, etc. Click on the link that says “Download a GIS
grid of this data (ESRI ArcGIS format).” This will open a new page with a
button that says “Download Now.” Click on it.
It will download a zipped file. At the bottom of your screen, a Downloads bar
should appear with an icon that says “lifeexpectancy.zip.” We need to unzip
it, so double click the icon to open the zipped folder. Once the folder opens,
you should see all the files inside it. Go up a folder in your file tree so you
can see the folder the files are stored in, lifeexpectancy.zip. Right-click it
and select “Extract all.” Make sure you extract the files into a location where
you’ll be able to find them again.
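If you would rather not click through the file browser, the same extraction can be done with a few lines of Python. This is only a sketch using the standard zipfile module; the function name and any paths you pass to it are placeholders.

```python
import zipfile
from pathlib import Path


def extract_shapefile_zip(zip_path, dest_dir):
    """Extract everything in zip_path into dest_dir; return the .shp paths found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Extract all members so the companion files stay together.
        archive.extractall(dest)
    return sorted(dest.rglob("*.shp"))
```

For example, calling it with the downloaded lifeexpectancy.zip and a destination folder of your choice would unpack the archive and return the path of each .shp file, which is the file you point QGIS at.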
To style this shapefile, you’ll want to choose a column of data such as Life
Expectancy and tell QGIS to color countries based on the values in that column.
In the interest of not reinventing the wheel, please see Ujaval Gandhi’s
tutorial on the basic vector
Some points to be aware of:
Assuming you’ve followed the steps above, you can skip to step 4 of Ujaval’s
tutorial. Steps 1–3 are instructions on how to get the data into QGIS.
When you color countries according to values in a column, countries with no
values in that column will get grouped with the countries that have the
smallest values. QGIS assigns those countries the value “-99” for some reason.
If you only get two colors in your map after assigning it a color scheme, try
changing the mode or adding more classes.
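The last two points are related, and a little arithmetic shows why. The sketch below is a simplified stand-in for an equal-interval classification, not QGIS’s actual code, and the life expectancy values are invented. It assumes that -99 stands in for “no data.”

```python
NO_DATA = -99  # assumed placeholder for "no value" in the attribute table


def equal_interval_breaks(values, n_classes, no_data=NO_DATA):
    """Upper bounds of n_classes equal-width classes, skipping no-data values."""
    valid = [v for v in values if v != no_data]
    lo, hi = min(valid), max(valid)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


# Invented life expectancies, with one country missing a value.
values = [-99, 40, 55, 70, 80]

# Ignoring the placeholder spreads the real values over all four classes.
print(equal_interval_breaks(values, 4))                # [50.0, 60.0, 70.0, 80.0]

# Treating -99 as a real number stretches the range: every real value
# lands in the top class and -99 sits alone in the bottom one.
print(equal_interval_breaks(values, 4, no_data=None))  # [-54.25, -9.5, 35.25, 80.0]
```

In the second case only two of the four classes contain any countries, which is one way to end up with a two-color map. Switching to a mode that splits the countries evenly, or adding more classes, works around it.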
Once you have styled your map to your satisfaction, please skip to the
Exporting Maps as Images section of this tutorial. You’ll
also add legends, colorbars, and other map features in that step.
CartoDB is a website that allows users to upload spreadsheets of geographic
information and plot them on a map of the world. Unfortunately, the map CartoDB
uses is in the wrong projection to work with SOS (see the write-up on CartoDB
for more information), but it’s still a useful tool for converting CSV files to
shapefiles. The process of converting text or image information to information
that is associated with location coordinates is called georeferencing. CartoDB
can do that for us.
In this case, getting the mapmaking software is very simple: go to
www.cartodb.com and create an account. CartoDB designs plans based on storage
space. If you’re importing a lot of data for individual maps, or creating a
lot of maps, you’ll need to get one of the paid versions. If not, scroll down
past the descriptions of the paid versions and click on “free version.”
For this tutorial I will be using the International Telecommunications Union’s
database of landlines and mobile phones registered throughout the world. These
files weren’t quite in the format CartoDB can use, so I had to mess with them
in Excel first. This is frequently the case with databases on the web, and
since this particular spreadsheet’s issues were very specific, I won’t go into
how I solved them. If you would like to recreate this dataset, please see the
collection of datasets provided along with this tutorial. The
file is called Fixed_tel_2000-2012.csv.
To start, go to the green Dashboard button in the
top-right corner of the page. This will get you to your dashboard page, where
your datasets are stored in table format. Click on the large
+ button labeled New Table.
Choose “Select a file” and find the spreadsheet you
need. CartoDB can take Excel, CSV, TSV, ESRI Shapefiles, KMLs and KMZs,
GeoJSON, GPS eXchange (GPX), OSM and BZ2, OpenDocument Spreadsheets (ODS), and
SQL. See the CartoDB
for more information. Click Open. Your table will
At the top of the screen, just above your table in the left hand corner, will
be the words Table view and Map
view. Click on Map view. This window will
appear. If it does not, click on the Options button in
the top right corner of the screen and select the
On the next window that appears, click “You have a column identifying
administrative regions,” since our data is referenced by country, not by
latitude and longitude coordinates. In the “In your table the polygons are
identified by…” field, select field_1. This means that the countries are
listed in the field_1 column in your dataset. If you look at the column
heading in CartoDB’s Table view over the column with country names, you’ll see
that CartoDB has labeled it field_1. You want your data in “World by
Countries,” just as the default settings say, so click Georeference.
A screen will come up with two boxes: one will be greyed out and say “No
point data available for your selection,” and the other will say
“Georeference your data with administrative regions.” It
should be highlighted. Click Continue. The data will
take a minute to render.
CartoDB will then give you a message “X out of Y rows were successfully turned
into polygons!” If the number of rows turned into polygons was lower than you
expected, go through your data in Table view to make
sure country names are spelled correctly. If you have rows with no value in
field_1 (that’s the column with the country names), CartoDB thinks it’s
misreading the rows and will count them along with the rows it couldn’t
transform, so check to see if that’s the cause of the discrepancy. If it isn’t,
go into Map view and find the countries that aren’t
overlaid with a color; these are the ones CartoDB couldn’t parse. For example,
Iran in this dataset was labeled as “Iran (I.R.).” CartoDB couldn’t recognize
that, so I changed it to “Iran” by double-clicking the name to edit the text.
You may have to delete and re-upload your table to make the changes take
effect, in which case you should make your changes in the original spreadsheet,
save that spreadsheet as a .csv file (or whatever format was originally used),
and upload it just as you did before.
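If there are many names to fix, editing the spreadsheet by hand gets tedious. Here is a minimal sketch of the same cleanup in Python, assuming the country names sit in the first column of the CSV file. The correction table holds only the one example from above, and the function names are made up.

```python
import csv

# Corrections to apply; extend with whatever names CartoDB fails to match.
NAME_FIXES = {"Iran (I.R.)": "Iran"}


def clean_country_rows(rows, name_column=0, fixes=NAME_FIXES):
    """Drop rows with no country name and apply the spelling fixes."""
    cleaned = []
    for row in rows:
        if not row or not row[name_column].strip():
            continue  # rows without a country name cannot become polygons
        fixed = list(row)
        name = fixed[name_column].strip()
        fixed[name_column] = fixes.get(name, name)
        cleaned.append(fixed)
    return cleaned


def clean_csv(src_path, dst_path):
    """Read src_path, clean its rows, and write the result to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(clean_country_rows(rows))
```

Writing to a new file rather than overwriting the original keeps the untouched data around in case a correction turns out to be wrong.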
Now click on the Options button in the top-right corner
and select Export.
Select shp as your file type and save the file somewhere
you’ll be able to find it again.
The shapefile will be downloaded onto your computer as a zipped file. Unzip it
by right-clicking the file and selecting Extract all.
Extract the files into a place where you’ll be able to find them again.
Now that you have a shapefile, you can follow the instructions for the previous
section, Getting a Shapefile from the Web, beginning with Loading the
To “join” a shapefile and a spreadsheet means to add the data in the spreadsheet to
the data in the shapefile. For example, if you have a shapefile of countries and a
spreadsheet of population data for the same countries, to join the file, you would tell
QGIS that the two columns with country names in them should be matched. Then, in
addition to the shapefile’s original data for each country, the shapefile will contain the
population data as well.
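Stripped of the map, a join is just a lookup by key. The sketch below illustrates the idea with plain Python dictionaries; it is not how QGIS implements joins, and the sample rows and the population figure are invented.

```python
def join_tables(shape_rows, sheet_rows, target_field, join_field):
    """Add each spreadsheet row's columns to the shapefile row with the same key."""
    lookup = {row[join_field]: row for row in sheet_rows}
    joined = []
    for shape in shape_rows:
        match = lookup.get(shape[target_field], {})
        # Copy every spreadsheet column except the key itself.
        extra = {k: v for k, v in match.items() if k != join_field}
        joined.append({**shape, **extra})
    return joined


# Invented sample rows for illustration.
shapes = [{"country": "France"}, {"country": "Iran"}]
sheet = [{"name": "France", "population": 65}]
print(join_tables(shapes, sheet, target_field="country", join_field="name"))
# [{'country': 'France', 'population': 65}, {'country': 'Iran'}]
```

Note that Iran keeps its original data but gains nothing, because the spreadsheet has no row with a matching name. The same thing happens on the map when the two columns don’t line up.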
To use it as described in this tutorial, simply save it on your computer, then
open it. Go to File > Save as. Save the spreadsheet as a
.csv file by choosing CSV as the file type; renaming the extension from .xls to .csv on its own does not convert the file.
First, you need to open the spreadsheet in QGIS. To do so, go to
Layer > Add Vector Layer and select the CSV file you just
created. Note that you’re not adding it as a delimited text file, which is what
we have done in the past. Click Open. It will appear in
your Layers bar on the left side of your screen, but no data will
appear on your map, since QGIS doesn’t know how to interpret it yet. Please
follow the steps in Loading the Shapefile for
importing a shapefile to import the Atlas of the Biosphere shapefile. Once you
have your shapefile, right-click on it in the Layers panel on the
left side of the screen and select Open attribute
You need to find an identifying column of shapefile data that will match up
with a column on your spreadsheet — in our case, we’re looking for the country
codes. We’re using those instead of names because QGIS has to see exactly the
same word in the spreadsheet as it sees in the shapefile’s data, or it won’t be
able to match the two. Differences in abbreviation or spelling errors are
easier to avoid if you’re using three letter codes instead of names, since the
codes are standardized.
Look through the attribute tables of your shapefile and your spreadsheet to find
the name of the column that contains the country codes. As it turns out, those
are under the column labeled “Code” on the spreadsheet and the column labeled
“WB_CNTRY” in the shapefile’s attribute table. Once you’ve found both, close
both attribute tables.
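Before joining, it can be worth checking which codes appear in one table but not the other, since those rows will not match. A rough sketch, assuming each table has been read into a list of dictionaries keyed by column name (for instance with csv.DictReader); the function name is made up.

```python
def unmatched_keys(shape_rows, sheet_rows,
                   target_field="WB_CNTRY", join_field="Code"):
    """Return (codes only in the shapefile, codes only in the spreadsheet)."""
    shape_keys = {row[target_field] for row in shape_rows}
    sheet_keys = {row[join_field] for row in sheet_rows}
    return sorted(shape_keys - sheet_keys), sorted(sheet_keys - shape_keys)
```

Two empty lists mean every code has a partner. The comparison is exact, so stray spaces or differences in capitalization show up here as mismatches, just as they would in the join.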
Now open the shapefile’s properties window. You can do this by right-clicking
the name of the layer in the layer bar to the left and selecting
Properties or by simply double clicking the layer name.
Go to the Joins tab and click on the “Add” button.
A window labeled Add vector join will appear. Select the layer you
want to join to the shapefile (our Language Spreadsheet layer), the join field
(that’s the column of the spreadsheet, “Code”) and the target field (that’s the
column of the shapefile’s attribute table, “WB_CNTRY”). Make sure that
“Cache join layer in virtual memory” is checked. Click
OK on the Add vector join window and the
Your shapefile should now have added the information from the spreadsheet to
its attribute table. You can check this by simply opening the attribute table
by right clicking on the layer name and selecting Open attribute
table, then finding the new columns in the table.