Why Big Data and Deep Learning Are Transforming Data Science
Big data and deep learning are some of the hottest topics in data science, and NYC Data Science Academy is the only bootcamp to offer a comprehensive curriculum of both topics via its "dual track" course offerings. The dual track offers two separate courses in big data and deep learning, with one class scheduled in the morning and the other class scheduled in the afternoon. Students can elect to take either course, and especially ambitious students can opt for both.
We sat down with NYCDSA senior instructors Aiko Liu and Zeyu Zhang (Sam) to learn more about the big data and deep learning courses and to get their take on the ideal dual track student. In our interview, Aiko and Sam also unpack terms like big data and cloud computing, talk about the dual track's benefits, and discuss how deep learning is used across a wide variety of industries.
Can you tell me about your background and your role in the program?
Aiko: My name is Aiko Liu. I have a background in mathematics and theoretical physics. I received my Ph.D. in math from Harvard, specializing in geometry/topology and their interplay with string theory. I then taught at MIT for 3 years and at U.C. Berkeley for over 6 years before moving into the hedge fund industry for another 10 years. I joined the academy two years ago and now teach machine learning theory and applications with Python, as well as various coursework in the deep learning track.
Sam: My name is Zeyu Zhang, but some students find it easier to call me Sam. I'm an electrical engineer turned data scientist, and I received my master's in Electrical Engineering from NYU. I have been managing the curriculum and logistics of our bootcamp for two years, trying to make NYCDSA the top gun school of data science. When I have free time, I'm a casual late-night Kaggle expert. I teach web scraping tools such as BeautifulSoup4, Scrapy, and Selenium, as well as Natural Language Processing using Python, in the deep learning track.
What are big data and cloud computing? How are they used in the industry?
Aiko: Big data is the nickname for the newer generation of 'distributed computing'. As hardware (CPUs and GPUs) becomes cheaper, it makes sense to chain commodity desktop machines together into highly scalable computation networks, where each individual machine is unreliable, and to parallelize computation tasks across them. Given the vast amounts of data being collected, big data offers a manageable, efficient solution for warehousing that data. In NYCDSA's immersive bootcamp, you will learn how to use Hadoop, Spark, and Hive.
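To make the idea of splitting work across many machines concrete, here is a minimal Python sketch of the map/reduce pattern that Hadoop popularized, using only the standard library. It is an illustration under stated assumptions, not how Hadoop, Spark, or Hive are actually used: threads stand in for separate cluster machines, and the helper names (`partition`, `map_count`, `distributed_word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, n_parts):
    """Split the records into n_parts chunks, one per (simulated) machine."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Map step: each worker counts words in its own chunk, independently."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge(a, b):
    """Reduce step: combine the partial counts from two workers."""
    return a + b


def distributed_word_count(records, n_workers=4):
    chunks = partition(records, n_workers)
    # Assumption: threads stand in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(map_count, chunks))
    return reduce(merge, partial_counts, Counter())


if __name__ == "__main__":
    logs = ["spark and hadoop", "hadoop and hive", "spark spark"]
    print(distributed_word_count(logs, n_workers=2)["spark"])  # → 3
```

Because each map step only needs its own chunk, the chunks can live on different machines. Real frameworks such as Hadoop and Spark add what this sketch leaves out: moving data between nodes, scheduling, and re-running tasks when an individual machine fails.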
What is deep learning? How is it used in the industry?
Aiko: While deep learning officially covers more than just neural networks, the term usually refers to neural network architectures. These architectures stack multiple layers of basic computation units called neurons. By going deep, they can achieve functionality that shallow networks cannot, which opens up potential applications across many business and industrial domains. Image recognition, time series analysis, and natural language processing are just some of the areas where deep learning can help. In the NYCDSA bootcamp, you will learn how to use TensorFlow and Keras and how to harness deep learning models to tackle the industry tasks mentioned above.
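The "layers of neurons" idea can be illustrated with a tiny forward pass in plain Python. This is a minimal sketch, not TensorFlow or Keras code: each neuron takes a weighted sum of its inputs, adds a bias, and applies an activation, and layers are chained together. The weights are hand-picked (not trained) so that the two-layer network computes XOR, a classic function that a single layer of linear-threshold neurons cannot represent. All names here are hypothetical.

```python
def relu(z):
    """Rectified linear activation: negative inputs become 0."""
    return max(0.0, z)


def identity(z):
    """Linear activation used for the output neuron."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """One layer of neurons: each neuron computes a weighted sum of all
    inputs, adds its bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights for a 2-input, 2-hidden, 1-output network.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x):
    """Pass an input through the hidden layer, then the output layer."""
    hidden = dense_layer(x, HIDDEN_W, HIDDEN_B, relu)
    output = dense_layer(hidden, OUTPUT_W, OUTPUT_B, identity)
    return output[0]


if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, forward(x))
```

In practice, libraries such as TensorFlow and Keras learn the weights from data via backpropagation instead of having them set by hand, and real networks use many more neurons and layers.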
What made you decide to offer lessons in both big data and deep learning as part of the dual track?
Sam: Our bootcamp has been very intense since the day it launched. We look to improve our curriculum based on feedback from our hiring partners and graduates. The problem is that we don't have time to squeeze everything we want to teach into 12 weeks. I came up with the parallel dual track idea so that we can teach big data in the morning and deep learning in the afternoon during the last three weeks of the bootcamp. This gives students in-depth coverage of the more advanced topics rather than just surface-level overviews. Students also receive video recordings and lecture notes from both tracks no matter which one they take, so they can review them after completing the program.
What type of student would benefit from taking the big data course? Who would benefit from deep learning?
Aiko: Both big data and deep learning are hot topics nowadays. Big data refers to the newer distributed architectures and languages that emerged in response to the large volumes of data made possible by cheaper hard drive storage. While big data can certainly join forces with machine learning, it does not itself require a machine learning background. I expect people with a programming background and hardware knowledge to be able to step up to the big data track.
On the other hand, deep learning is a special branch of machine learning. Some specific deep learning models are even derived from physics. To understand the models thoroughly, it helps to have a certain level of quantitative training and some machine learning experience beforehand, as the depth (no pun intended!) will be daunting otherwise. I expect the bar for success in deep learning to be slightly higher than in big data.
Sam: Big data and deep learning are intended to target different job markets, data engineer vs. data scientist (machine learning heavy), so students should consider picking only one of them during the bootcamp. More specifically, if you have a software engineering background, you may want to consider the data engineering track, which covers all the big data tools needed in industry. In addition, data engineering problems are just as complex as data science ones, and demand for data engineers keeps growing. Behind every data scientist, there's a data engineer building the pipelines. On the other hand, the deep learning track would be ideal for you if you have a more quantitative background, such as a Ph.D. in a STEM field, or if you are interested in the more "sexy" areas such as computer vision or natural language processing.
What outcomes can students expect after completing the dual track?
Sam: There are a couple of lab sessions in both tracks, and students gain hands-on experience by working on a mini-project through the labs. Upon completion, students will be able to apply skills such as out-of-memory computation and setup and deployment on the Amazon cloud. They'll also be able to use algorithms within TensorFlow as part of their capstone projects. When they begin their job search, students will have the skills needed to apply for junior roles in big data and deep learning. They'll know how to fetch data from Hadoop and Spark and how to use TensorFlow for a wide range of tasks, including image recognition. Dual track graduates leave the program with working knowledge of a wide range of skills, which makes them highly in demand with potential employers.
This post was sponsored by NYC Data Science Academy.
To learn more about NYC Data Science Academy, visit https://nycdatascience.com