Cours Stats

In 2008, I took a job as an assistant professor at the University of Lille in the business administration faculty (IAE). There I met an amazing colleague, Mihai Calciu. He is the person who turned me into an R user. It began with me proudly showing him the (huge) SAS programs I had written for my PhD; he looked at me for a minute or so, turned to his computer, and wrote the equivalent in R in a couple of minutes, with 20 lines of code. For sure, I was far from the best SAS programmer in the place, and my programs were quite long, but hell, replacing them with 20 lines of code! Over the months, and now years, he taught me a lot about the beauty of R.

One of the biggest projects we worked on together was the design of a platform that would allow marketers and business analysts to do geomarketing visualisation and analysis online through a web browser. We worked with a company specialised in clothing retail, which gave us access to the sales data from its stores in the Brittany region.

The architecture of this platform revolved around R (of course), PostgreSQL, and MapServer for the map rendering. We also included two PostgreSQL add-ons: PostGIS for geostatistics, and PL/R for R computations. The data came from the retail company, INSEE (the French National Institute of Statistics and Economic Studies), the Google API for address geocoding, and OpenStreetMap for map layers. Some of these tools are a bit out of date now (notably with the development of Shiny and OpenCPU), and the whole platform has never really been updated, but this project is the main reason I got into maps.

Today I decided to give ggplot2 a try for plotting maps. Another project of mine involves the diffusion of retail stores (and the link with retail consumers…) in my region: Nord-Pas-de-Calais. This region is very important in the French retail sector, as it gave birth to one of the biggest retail groups: that of the Mulliez family, one of the wealthiest families in Europe, which owns at least a hundred retail chains (malls, supermarkets, clothing, restaurants…). For those reasons, I wanted to see how stores are spread across this region.

The data

The data I used are from two sources:

  1. Geographic information comes from the IGN (French National Institute of Geography), which offers shapefiles (.shp) of administrative boundaries in France for download.
  2. Store information comes from INSEE, which reports the presence of stores in each city. These stores are classified into:
  • Big stores (hypermarkets, supermarkets, specialized hypermarkets such as FNAC or Brico-dépôt)
  • Small stores - Food (grocery…)
  • Small stores - non-Food (clothes, furniture, jewelry…)

Since both of these datasets use city names as their main identifier, the merging process is quite straightforward.
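
To give an idea of what such a merge looks like, here is a minimal sketch in base R. It is not the code I actually ran: the data frames, column names (city, id, big_stores) and counts are all made up for illustration, and the normalise() helper is a hypothetical convenience to guard against differences in case or stray spaces between the two sources.

```r
# Toy stand-ins for the two sources; all names and values are hypothetical.
communes <- data.frame(
  city = c("Lille", "Roubaix", "Arras"),
  id   = c("0", "1", "2"),          # polygon id, as it might come from the shapefile
  stringsAsFactors = FALSE
)
stores <- data.frame(
  city       = c("LILLE", "Roubaix ", "Arras"),
  big_stores = c(12L, 5L, 3L),      # made-up counts
  stringsAsFactors = FALSE
)

# City names rarely match character for character, so normalise before joining.
normalise <- function(x) toupper(trimws(x))
communes$key <- normalise(communes$city)
stores$key   <- normalise(stores$city)

# Left join: keep every commune, even when no store record matches it.
merged <- merge(communes, stores[, c("key", "big_stores")],
                by = "key", all.x = TRUE)
```

Using all.x = TRUE keeps cities without any store record in the table (with NA), so they can still be drawn on the map.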

The maps

ggplot2 is an amazing tool for plotting maps. I was used to plotting maps with standard geographical libraries, and I must admit I had some trouble getting into ggplot2. But once I figured out what the fortify function was about, I was really amazed. This function lets you handle maps as plain data frames, which makes it very easy to add any kind of information relating to polygons or points. On this matter, the very well-written tutorial by Lovelace and Cheshire was of great help.
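
To make the "plain data frame" idea concrete, here is a small sketch. Rather than loading a real shapefile, it builds by hand a data frame in roughly the shape fortify returns for polygons (one row per vertex, with long, lat, order, id and group columns), then attaches a variable with an ordinary merge. The square() helper, the coordinates and the store counts are all invented for the example; the plotting step only runs if ggplot2 is installed.

```r
# Hand-built stand-in for a fortified map: one row per polygon vertex.
# square() is a hypothetical helper; coordinates are made up.
square <- function(x0, y0, id) {
  data.frame(
    long  = x0 + c(0, 1, 1, 0, 0),
    lat   = y0 + c(0, 0, 1, 1, 0),
    order = 1:5,
    id    = id,
    group = paste0(id, ".1"),
    stringsAsFactors = FALSE
  )
}
map_df <- rbind(square(0, 0, "0"), square(1, 0, "1"))

# Hypothetical attribute table keyed on the same polygon id.
counts <- data.frame(id = c("0", "1"), stores = c(12L, 5L),
                     stringsAsFactors = FALSE)

# Attaching a variable to polygons is now an ordinary data-frame join.
plot_df <- merge(map_df, counts, by = "id")
# merge() does not guarantee row order, so restore the vertex order.
plot_df <- plot_df[order(plot_df$id, plot_df$order), ]

# Draw the choropleth only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = long, y = lat, group = group, fill = stores)) +
    geom_polygon(colour = "white") +
    coord_equal()
}
```

Re-sorting after the merge matters: if the vertices end up out of order, geom_polygon draws tangled shapes.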

The case of Lille

As in many regions, the capital city (Lille in this case) is a special case. Lille has been a commercial capital for hundreds of years (it is even called the capital of Flanders). A large number of stores are located within Lille, and the city also concentrates a large share of the population.

This situation makes it difficult to produce a good-quality choropleth map, because the high value we get for Lille nearly wipes out all other cities on the colour scale. Hence, I included in the Shiny app below the ability to remove Lille from the map and display the figure for Lille just above the map. With a click on the button, the user can remove Lille in order to get a better view of the rest of the region. Keeping Lille, on the other hand, lets the user see the dominance of this city in the Nord-Pas-de-Calais region.
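
The logic behind that toggle can be sketched in a few lines of base R. This is only an illustration under assumptions, not the app's actual code: prepare_map_data() is a hypothetical helper, and the counts are made up. It shows how dropping Lille shrinks the range of the fill scale, while Lille's own value is kept aside to be displayed above the map.

```r
# Hypothetical per-city store counts; values are made up for illustration.
city_counts <- data.frame(
  city   = c("Lille", "Roubaix", "Arras", "Douai"),
  stores = c(900L, 120L, 80L, 60L),
  stringsAsFactors = FALSE
)

# Return the rows to draw, Lille's own value, and the range of the fill scale.
prepare_map_data <- function(df, drop_lille = FALSE) {
  lille <- df[df$city == "Lille", , drop = FALSE]
  shown <- if (drop_lille) df[df$city != "Lille", , drop = FALSE] else df
  list(
    shown       = shown,                 # cities drawn on the map
    lille_value = lille$stores,          # figure displayed above the map
    fill_limits = range(shown$stores)    # what the colour scale has to span
  )
}

with_lille    <- prepare_map_data(city_counts)
without_lille <- prepare_map_data(city_counts, drop_lille = TRUE)
```

In a Shiny app, drop_lille would typically be driven by a button or checkbox input, and fill_limits would feed the limits of the fill scale.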

That’s all

Have a good day