Monday, August 19, 2019

Here is my Data Science Capstone Project

I have just completed my project in the Data Science Professional Course by IBM through Coursera.
In this project, I tried to accomplish the following tasks.

  1. Use the KMeans algorithm to cluster locations in a geographic area.
  2. Put the clusters on a map using a mapping library.
  3. Make practical, extensive use of RESTful APIs.
  4. Provide general but very useful information depending on where you are and what you want.
Of course, everything is done in the Python programming language.
Why the KMeans algorithm? It is a classic algorithm for any kind of clustering problem; customer segmentation is a very common example for retailers. In my project, I clustered geographic locations using the latitudes and longitudes of Toronto, a city in Canada. There is a very good library in Python called scikit-learn (sklearn); I use its features regularly and you can accomplish a lot of tasks with it.
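
To give a feel for the first task, here is a minimal sketch of clustering coordinates with KMeans. The neighborhood names and coordinates below are just illustrative placeholders, not my project's full dataset:

```python
# A minimal sketch of clustering locations with KMeans.
# The 'Latitude'/'Longitude' columns are assumed here; the real notebook
# builds them from Toronto neighborhood data.
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample of Toronto neighborhood coordinates
df = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Regent Park', 'Lawrence Heights'],
    'Latitude':  [43.7533, 43.7259, 43.6543, 43.7185],
    'Longitude': [-79.3297, -79.3156, -79.3606, -79.4648],
})

# Fit KMeans on the coordinate pairs and attach the cluster labels
kmeans = KMeans(n_clusters=2, random_state=0).fit(df[['Latitude', 'Longitude']])
df['Cluster'] = kmeans.labels_

print(df)
```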



That is the magic of the first task. I will upload the Jupyter notebook link, so you are welcome to visit and look through it:
click here




Secondly, there are lots of mapping libraries: Shapely, Rasterio, Folium, etc.
I used the Folium library for my project. Folium has many cool features beyond basic mapping; for example, you can choose what kind of map tiles to draw on your screen or in your notebook. You can go here and learn more.
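
Here is a minimal sketch of how plotting the clusters might look with Folium. It reuses the hypothetical df from the KMeans sketch above:

```python
# A minimal sketch of drawing clustered points with Folium.
# Assumes a DataFrame like the one above, with a 'Cluster' column.
import folium

# Center the map on Toronto
toronto_map = folium.Map(location=[43.6532, -79.3832], zoom_start=11, tiles='OpenStreetMap')

colors = ['red', 'blue']  # one color per cluster
for _, row in df.iterrows():
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        radius=6,
        popup=row['Neighborhood'],
        color=colors[row['Cluster']],
        fill=True,
    ).add_to(toronto_map)

toronto_map.save('toronto_clusters.html')  # or display `toronto_map` in a notebook
```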


Third, the RESTful API part is my favorite, since I try to do time series analysis by getting the most stable live-stream data available. You can make tons of requests from your back end, but to do that you have to learn more about the topic, and most importantly you have to know the website you are about to query for data. As I have found out, if you need static data then you almost never need API access to a website, but for dynamic data an API is very important, because you want the live or most recent version of the data due to its changing nature.
There are several types of analysis you might want to do.

  1. Exploratory analysis - static data is usually enough.
  2. Predictive analysis - you need the most recent data; it is almost always required.
  3. Prescriptive analysis - you also need recent data; the more recent, the better, I believe.
Both the 2nd and 3rd types of analysis build on exploratory analysis, so it is very important to have good data to scrape. Just be aware that you can always study, compare, and find a good source whose API you can use; a minimal example of making such a request is sketched below.
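
Here is a minimal sketch of a REST request with Python's requests library. The endpoint and parameters are hypothetical placeholders for whatever service you choose:

```python
# A minimal sketch of calling a REST API with the requests library.
import requests

url = 'https://api.example.com/v1/data'   # hypothetical endpoint
params = {'q': 'Toronto', 'limit': 10}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()               # fail loudly on HTTP errors
data = response.json()                    # most REST APIs return JSON
print(data)
```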

Finally, I pulled data from the Eventbrite, Foursquare, and Google Maps APIs.
I used all of them to extract the information needed to accomplish the tasks I wanted to do.

  1. I want to list events within a radius of 4 km of the location I searched for, so I get each event's title, whether it is free to attend, and so on. It is indeed very important for anyone, especially programmers, to have this kind of feature on their platform so they can provide useful information to customers.
  2. From Foursquare I can extract any kind of venue data I want; for example, if I am in Toronto and I search for coffee, it will display the best coffee shops from Foursquare.
  3. Google Maps API: I used it to extract the longitude and latitude of the string query that the user inputs to my program. Most importantly, I can't do anything without the Google Maps API or a similar API, since in order to build a map of a certain place I have to provide at least that location information. A sketch of this geocode-then-search flow follows this list.
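
Here is a minimal sketch of that flow using 2019-era endpoints. My keys are not included, the query values are placeholders, and both APIs may have changed since, so check the current docs:

```python
# A minimal sketch of geocoding a query, then searching nearby venues.
import requests

GOOGLE_KEY = 'YOUR_GOOGLE_MAPS_KEY'            # placeholder
FSQ_ID, FSQ_SECRET = 'YOUR_ID', 'YOUR_SECRET'  # placeholders

# 1. Turn the user's query string into latitude/longitude with the Geocoding API
geo = requests.get(
    'https://maps.googleapis.com/maps/api/geocode/json',
    params={'address': 'Toronto, Canada', 'key': GOOGLE_KEY},
).json()
loc = geo['results'][0]['geometry']['location']
lat, lng = loc['lat'], loc['lng']

# 2. Search Foursquare venues near that point
venues = requests.get(
    'https://api.foursquare.com/v2/venues/explore',
    params={
        'll': f'{lat},{lng}',
        'query': 'coffee',
        'radius': 4000,          # meters, matching the 4 km radius above
        'client_id': FSQ_ID,
        'client_secret': FSQ_SECRET,
        'v': '20190819',         # Foursquare API version date
    },
).json()
print(venues)
```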

The current version of my notebook is best used with Jupyter Notebook or JupyterLab; I highly recommend that. You will also need to install the required libraries and get the necessary API keys yourself, since I can't provide mine to you.
Written by Nyam Ochir Bold, known as Nick
Thank you for your attention