Time Series, Geographical and Graph Data in Python

Last updated on Jan 23 2023
Prabhas Ramanathan

Time series is a series of data points in which each data point is associated with a timestamp. A simple example is the price of a stock in the stock market at different points of time on a given day. Another example is the amount of rainfall in a region at different months of the year.
In the below example we take the value of stock prices every day for a quarter for a particular stock symbol. We capture these values as a csv file and then organize them to a dataframe using pandas library. We then set the date field as index of the dataframe by recreating the additional Valuedate column as index and deleting the old valuedate column.

Table of Contents

Sample Data

Below is the sample data for the price of the stock on different days of a given quarter. The data is saved in a file named as stock.csv
ValueDate Price
01-01-2018, 1042.05
02-01-2018, 1033.55
03-01-2018, 1029.7
04-01-2018, 1021.3
05-01-2018, 1015.4




23-03-2018, 1161.3
26-03-2018, 1167.6
27-03-2018, 1155.25
28-03-2018, 1154
Creating Time Series
from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt

 

data = pd.read_csv('path_to_file/stock.csv')
df = pd.DataFrame(data, columns = ['ValueDate', 'Price'])

# Set the Date as Index
df['ValueDate'] = pd.to_datetime(df['ValueDate'])
df.index = df['ValueDate']
del df['ValueDate']


df.plot(figsize=(15, 6))
plt.show()

Its output is as follows −

p 29

Python – Geographical Data

Many open source python libraries now have been created to represent the geographical maps. They are highly customizable and offer a varierty of maps depicting areas in different shapes and colours. One such package is Cartopy. You can download and install this package in your local environment from Cartopy. You can find numerous examples in its gallery.
In the below example we show a portion of the world map showing parts of Asia and Australia. You can adjust the values of the parameters in the method set_extent to locate different areas of world map.

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

 

fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())

# make the map global rather than have it zoom in to
# the extents of any plotted data

ax.set_extent((60, 150, 55, -25))

ax.stock_img()
ax.coastlines()

ax.tissot(facecolor='purple', alpha=0.8)

plt.show()

Its output is as follows −

p 30

Python – Graph Data

CSGraph stands for Compressed Sparse Graph, which focuses on Fast graph algorithms based on sparse matrix representations.

Graph Representations

To begin with, let us understand what a sparse graph is and how it helps in graph representations.

What exactly is a sparse graph?

A graph is just a collection of nodes, which have links between them. Graphs can represent nearly anything − social network connections, where each node is a person and is connected to acquaintances; images, where each node is a pixel and is connected to neighbouring pixels; points in a high-dimensional distribution, where each node is connected to its nearest neighbours and practically anything else you can imagine.
One very efficient way to represent graph data is in a sparse matrix: let us call it G. The matrix G is of size N x N, and G[i, j] gives the value of the connection between node ‘i’ and node ‘j’. A sparse graph contains mostly zeros − that is, most nodes have only a few connections. This property turns out to be true in most cases of interest.
The creation of the sparse graph submodule was motivated by several algorithms used in scikit-learn that included the following −
• Isomap − A manifold learning algorithm, which requires finding the shortest paths in a graph.
• Hierarchical clustering − A clustering algorithm based on a minimum spanning tree.
• Spectral Decomposition − A projection algorithm based on sparse graph laplacians.
As a concrete example, imagine that we would like to represent the following undirected graph −

p 31

This graph has three nodes, where node 0 and 1 are connected by an edge of weight 2, and nodes 0 and 2 are connected by an edge of weight 1. We can construct the dense, masked and sparse representations as shown in the following example, keeping in mind that an undirected graph is represented by a symmetric matrix.
G_dense = np.array([ [0, 2, 1],
[2, 0, 0],
[1, 0, 0] ])

G_masked = np.ma.masked_values(G_dense, 0)
from scipy.sparse import csr_matrix

G_sparse = csr_matrix(G_dense)
print G_sparse.data
The above program will generate the following output.
array([2, 1, 2, 1])

p 32

This is identical to the previous graph, except nodes 0 and 2 are connected by an edge of zero weight. In this case, the dense representation above leads to ambiguities − how can non-edges be represented, if zero is a meaningful value. In this case, either a masked or a sparse representation must be used to eliminate the ambiguity.
Let us consider the following example.
from scipy.sparse.csgraph import csgraph_from_dense
G2_data = np.array

([
[np.inf, 2, 0 ],
[2, np.inf, np.inf],
[0, np.inf, np.inf]
])

G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)
print G2_sparse.data
The above program will generate the following output.
array([ 2., 0., 2., 0.])

So, this brings us to the end of blog. This Tecklearn ‘Time Series , Geographical and Graph Data in Python’ blog helps you with commonly asked questions if you are looking out for a job in Python Programming. If you wish to learn Python and build a career in Python Programming domain, then check out our interactive, Python with Data Science Training, that comes with 24*7 support to guide you throughout your learning period.

Python with Data Science Training

About the Course

Python with Data Science training lets you master the concepts of the widely used and powerful programming language, Python. This Python Course will also help you master important Python programming concepts such as data operations, file operations, object-oriented programming and various Python libraries such as Pandas, NumPy, Matplotlib which are essential for Data Science. You will work on real-world projects in the domain of Python and apply it for various domains of Big Data, Data Science and Machine Learning.

Why Should you take Python with Data Science Training?

• Python is the preferred language for new technologies such as Data Science and Machine Learning.
• Average salary of Python Certified Developer is $123,656 per annum – Indeed.com
• Python is by far the most popular language for data science. Python held 65.6% of the data science market.

What you will Learn in this Course?

Introduction to Python

• Define Python
• Understand the need for Programming
• Know why to choose Python over other languages
• Setup Python environment
• Understand Various Python concepts – Variables, Data Types Operators, Conditional Statements and Loops
• Illustrate String formatting
• Understand Command Line Parameters and Flow control

Python Environment Setup and Essentials

• Python installation
• Windows, Mac & Linux distribution for Anaconda Python
• Deploying Python IDE
• Basic Python commands, data types, variables, keywords and more

Python language Basic Constructs

• Looping in Python
• Data Structures: List, Tuple, Dictionary, Set
• First Python program
• Write a Python Function (with and without parameters)
• Create a member function and a variable
• Tuple
• Dictionary
• Set and Frozen Set
• Lambda function

OOP (Object Oriented Programming) in Python

• Object-Oriented Concepts

Working with Modules, Handling Exceptions and File Handling

• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions

Introduction to NumPy

• Introduction to arrays and matrices
• Indexing of array, datatypes, broadcasting of array math
• Standard deviation, Conditional probability
• Correlation and covariance
• NumPy Exercise Solution

Introduction to Pandas

• Pandas for data analysis and machine learning
• Pandas for data analysis and machine learning Continued
• Time series analysis
• Linear regression
• Logistic Regression
• ROC Curve
• Neural Network Implementation
• K Means Clustering Method

Data Visualisation

• Matplotlib library
• Grids, axes, plots
• Markers, colours, fonts and styling
• Types of plots – bar graphs, pie charts, histograms
• Contour plots

Data Manipulation

• Perform function manipulations on Data objects
• Perform Concatenation, Merging and Joining on DataFrames
• Iterate through DataFrames
• Explore Datasets and extract insights from it

Scikit-Learn for Natural Language Processing

• What is natural language processing, working with NLP on text data
• Scikit-Learn for Natural Language Processing
• The Scikit-Learn machine learning algorithms
• Sentimental Analysis – Twitter

Introduction to Python for Hadoop

• Deploying Python coding for MapReduce jobs on Hadoop framework.
• Python for Apache Spark coding
• Deploying Spark code with Python
• Machine learning library of Spark MLlib
• Deploying Spark MLlib for Classification, Clustering and Regression

Got a question for us? Please mention it in the comments section and we will get back to you.

0 responses on "Time Series, Geographical and Graph Data in Python"

Leave a Message

Your email address will not be published. Required fields are marked *