Samy Palaniappan My Journey into the World of Data Science:

Introduction


My name is Samy Palaniappan and I am a curious engineer. In this blog, I describe my exploration of the world of data science. Its purpose is to share my learning experiences as I wander into this vast jungle of data and untapped knowledge. Along the way, I hope to build a repository of information that will help other brave newcomers who, like me, aspire to enter this field after spending a long time in another one.

Before I describe my experiences, I would like to tell you a bit about myself. I am an engineer and I hold a PhD in engineering. I also have 6 years of post-graduate work experience, including 4 years in industry as a process engineer and 2 years as a postdoctoral fellow. Most of my work has been in electrochemistry, chemicals, and materials engineering. During my time as a researcher, I did a lot of experimental design, experiments, data analysis, writing, and presenting.

Why data science, and why now?

After spending so long in electrochemistry and materials research, I got good at it, but I also grew tired of doing the same thing over and over. So it was time to bid adieu (temporary or permanent remains to be seen) to my old friends Butler, Volmer, Nernst, Navier, and Stokes. Also, 10 years in research was not enough for me to take a liking to the dry nature of academic writing. That said, I can vividly recall my inner joy during the data analysis part of all my projects. Trying to find the underlying meaning in the data gave me a kick. Needless to say, I enjoyed every bit of it. My curiosity about the story behind the data and my proclivity for modeling and simulation probably explain my inclination toward this field. I also considered semiconductors and nanoscience; they were number two and three on my list. Unlike the last time, I followed a thorough decision-making process, in which I spoke with a number of people with similar backgrounds, recruiters, friends, and family. I hope to detail this in one of my upcoming posts in the coming weeks. As for the timing, why NOT now? I only hope it is not too late to get on this gravy train.

My approach

I took courses on basic Python on LinkedIn Learning, IBM's data analysis and data visualization with Python on Coursera, and working with Pandas in Python on Udemy. One month later, I realized that I was learning more slowly than I would have liked and that I needed a structured program to accelerate the process. So I looked into data science bootcamps. They seemed ideal for me, as they are set up to offer intense training in coding and all the tools a data scientist needs within a span of 3 months.

I did some background research on the existing bootcamps offering data science courses (there are about a dozen of them). It finally came down to (1) Metis, (2) Galvanize, (3) Insight, and (4) Springboard. After speaking to a couple of alumni, I decided upon Metis. I chose Metis for three main reasons: (1) they focus specifically on data science and seem to do it exceptionally well, (2) they had live bootcamps, and (3) they had a cohort starting within a month. I took the HackerRank admissions test, which included about 7 coding challenges and 29 questions on subjects ranging from probability and statistics to linear algebra and calculus. They also held a remote interview via Zoom, which was more of a discussion for them to find out more about me and my motivations for taking the course. I received my acceptance email shortly afterwards. They immediately provided me with all the necessary material for the 60-hour pre-bootcamp work.

The Program

At this point, I am already one week into the program. I have submitted myself to the capable minds of John Tate, Lara Kattan, Alice Zhao, and Google. It has been an intense week, in which we dived into the following concepts:

  1. Basic Python and its data types: tuples, lists, sets, and arrays. Manipulating series and dataframes with Pandas and NumPy. Plotting with Matplotlib and Seaborn.

  2. Advanced data types such as defaultdict, namedtuple, and deque.

  3. Advanced Python concepts: complexity evaluation and Big O notation, generators, pickle, and shallow / deep copy.

  4. How to navigate and use Git and GitHub to ensure version control is done appropriately.

  5. Dos and don'ts of coding.
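
To give a flavor of items 2 and 3, here is a minimal sketch of my own. It is not course material, and every function name and value in it (such as `group_by_length`, `Reading`, and the station name) is made up purely for illustration. It only uses the Python standard library.

```python
import copy
import pickle
from collections import defaultdict, deque, namedtuple


def group_by_length(words):
    """defaultdict: a missing key starts out as an empty list."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)


# namedtuple: a lightweight, immutable record with named fields.
# "Reading" and its fields are hypothetical names for this example.
Reading = namedtuple("Reading", ["station", "entries"])


def last_n(items, n):
    """deque with maxlen keeps only the n most recent items."""
    return list(deque(items, maxlen=n))


def running_total(values):
    """Generator: yields one cumulative sum at a time, lazily."""
    total = 0
    for value in values:
        total += value
        yield total


def copy_demo():
    """Shallow copies share nested objects; deep copies do not."""
    original = [[1, 2], [3, 4]]
    shallow = copy.copy(original)
    deep = copy.deepcopy(original)
    original[0].append(99)  # mutate a nested list after copying
    return shallow, deep


def pickle_round_trip(obj):
    """Serialize an object to bytes with pickle and load it back."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    print(group_by_length(["git", "numpy", "pandas", "set"]))
    print(Reading(station="Example St", entries=120).entries)
    print(last_n([1, 2, 3, 4, 5], 2))
    print(list(running_total([1, 2, 3])))
    print(copy_demo())
    print(pickle_round_trip({"scores": [1, 2, 3]}))
```

In `copy_demo`, the shallow copy sees the appended 99 because it still points at the same inner lists, while the deep copy does not. That difference is easy to forget when copying nested lists.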

So far, I have completed an Exploratory Data Analysis project on NYC MTA turnstile data with my teammates Stephen Schneider and Rita Biagioli. I also solved ten data analysis challenges, which I pushed to our cohort repository. In the coming weeks, I look forward to passing on to you at least a tenth of my excitement and curiosity, along with interesting bits of machine learning, distributed computing, and more. As I go through this course, I expect to document the fascinations, joys, trials, and tribulations that I experience. More importantly, you will be reading about my experiment, its outcomes, and what I will be doing with this newfound knowledge.