Data visualisation is a broad term that describes any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends, and correlations that might go unnoticed in text-based data are much easier to spot when shown visually. It is a fundamental part of the data scientist’s toolkit. Creating visualisations is fairly easy, but creating good ones is much harder: it takes an eye for detail and a good amount of expertise to produce visualisations that are simple yet effective. Powerful visualisation tools and libraries are available today, and they put sophisticated, interactive graphics within easy reach.
One of Python’s strengths is that it offers libraries for most data visualisation needs. One such library is Folium, which comes in handy for visualising geographic data (geodata). Geodata science is a subset of data science that deals with location-based data, i.e., descriptions of objects and their relationships in space.
This tutorial assumes basic knowledge of Python and Jupyter notebook, along with the Pandas library.
Introduction to Folium
Folium is a powerful data visualisation library in Python that was built primarily to help people visualize geospatial data. With Folium, one can create a map of any location in the world as long as its latitude and longitude values are known. Also, the maps created by Folium are interactive in nature, so one can zoom in and out after the map is rendered, which is a super useful feature.
Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. The data is manipulated in Python and then visualised in a Leaflet map via folium.
Before using Folium, one may need to install it on the system, for example with pip:
pip install folium
Exploring the data set
The World Development Indicators dataset used here is a slightly modified version of the dataset available from the World Bank. It contains over a thousand annual indicators of economic development for about 247 countries around the world, covering 1960 to 2015. A few of the indicators are:
1. Adolescent fertility rate (births per 1,000 women)
2. CO2 emissions (metric tons per capita)
3. Merchandise exports by the reporting economy
4. Time required to build a warehouse (days)
5. Total tax rate (% of commercial profits)
6. Life expectancy at birth, female (years)
Downloading the dataset
We’ll be working with the World Development Indicators dataset, which is an open dataset on Kaggle. We will be using the ‘Indicators.csv’ file in the dataset.
Since we are dealing with geospatial maps, we also need the country coordinates for plotting.
Here we investigate the indicator Life expectancy at birth, female (years), using the data from the year 2013.
import folium
import pandas as pd
country_geo = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json'
data = pd.read_csv('Dataset/Indicators.csv')
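The country coordinates come as GeoJSON, a JSON format that stores each shape as a "feature" with a geometry and some identifying fields. The sketch below uses a hand-written, simplified stand-in rather than the real file. It assumes that each country is stored as a feature whose id holds a three-letter country code, which is what allows the shapes to be matched to the CountryCode column later on. The id "XXX", the name, and the square outline are invented.

```python
import json

# A hand-written, simplified stand-in for one entry of a country GeoJSON file.
# The square below is not a real border; it only shows the structure.
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "XXX",
      "properties": {"name": "Example Land"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]
      }
    }
  ]
}
""")

# Collect the id of every feature; these are the keys that would be
# matched against the country codes in the data frame.
feature_ids = [feature["id"] for feature in sample_geojson["features"]]
print(feature_ids)
```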
Next, two filters are applied to the indicator dataset:
1. hist_indicator holds the text to search for in the IndicatorName field, using the ‘contains’ string operator.
2. hist_year restricts the data to the rows recorded for 2013.
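# Match the female series only; the shorter text would also match the male and total series
hist_indicator = 'Life expectancy at birth, female'
hist_year = 2013
mask1 = data['IndicatorName'].str.contains(hist_indicator)
mask2 = data['Year'].isin([hist_year])
stage = data[mask1 & mask2]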
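To see how the two masks combine, it can help to try them on a tiny data frame first. The rows below are made up and only mimic the column layout of Indicators.csv; the country codes and values are invented. The search text includes "female" so that the male series is excluded, and regex=False is passed so that the text is treated as a plain substring.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of Indicators.csv (values are invented)
toy = pd.DataFrame({
    'CountryCode': ['AAA', 'AAA', 'BBB', 'BBB'],
    'IndicatorName': [
        'Life expectancy at birth, female (years)',
        'Life expectancy at birth, male (years)',
        'Life expectancy at birth, female (years)',
        'Life expectancy at birth, female (years)',
    ],
    'Year': [2013, 2013, 2013, 2012],
    'Value': [80.0, 75.0, 70.0, 69.5],
})

# True where the indicator name contains the search text as a literal substring
mask_indicator = toy['IndicatorName'].str.contains(
    'Life expectancy at birth, female', regex=False)

# True where the row belongs to the year of interest
mask_year = toy['Year'] == 2013

# & keeps only the rows where both conditions hold
toy_stage = toy[mask_indicator & mask_year]
print(toy_stage[['CountryCode', 'Value']])
```

Only the two female rows from 2013 survive the filter, one per country, which is the shape a country-level map needs.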
Creating a data frame that allows plotting the values.
# Create a data frame with just the country codes and the values we want plotted
data_to_plot = stage[['CountryCode', 'Value']]
# Use the full indicator name from the first matching row as the legend label
hist_indicator = stage.iloc[0]['IndicatorName']
Applying the plotting using Folium
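# Set up a folium map at a high-level zoom, centred on latitude 0, longitude 0
map = folium.Map(location=[0, 0], zoom_start=1.5)
# Choropleth maps bind pandas data frames and JSON geometries, which lets us quickly visualize data combinations
# key_on='feature.id' assumes each GeoJSON feature stores the country code in its id
folium.Choropleth(geo_data=country_geo, data=data_to_plot,
                  columns=['CountryCode', 'Value'], key_on='feature.id',
                  fill_color='YlGnBu', fill_opacity=0.7, line_opacity=0.2,
                  legend_name=hist_indicator).add_to(map)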
Export to HTML for better viewability
# Save the interactive map to an HTML file
map.save('plot_data.html')
# Embed the saved Folium HTML file in the notebook
from IPython.display import HTML
HTML('<iframe src=plot_data.html width=700 height=450></iframe>')