This Course Will Give You:
You will master the essential machine learning skills for working with streaming data in a distributed environment. The program covers the necessary knowledge from Data Science and Data Engineering, enabling you to process big data and write distributed algorithms in Spark.
Each module includes homework to practice what you have learned. At the end of the training, a final project awaits you, which will let you consolidate the knowledge gained and add to your portfolio. It can be done as part of your work tasks on your own dataset, or as a training project based on data provided by OTUS.
Who is This Course For?
For machine learning professionals and software engineers who want to learn how to work with big data. Such tasks typically arise in large IT companies with large-scale digital products.
For Data Scientists who want to add engineering skills to their toolkit. The course will teach you to process big data and deploy the results of ML solutions to production on your own.
You Will Learn to:
Use standard ML pipeline tools in a distributed environment;
Develop your own blocks for ML pipelines;
Adapt ML algorithms to a distributed environment and big data tools;
Use Spark, SparkML, Spark Streaming;
Develop algorithms for streaming data preparation for machine learning;
Ensure quality control at every stage of moving ML solutions into production.
Features of the Course:
A lot of practice with data
Up-to-date tools and technologies: Scala, Spark, Python, Docker
A wide range of skills, from distributed ML and data streaming to production deployment
Live communication with experts during webinars and in a Slack chat
Required Knowledge:
Basic programming skills:
control structures, loops, recursion;
basic data structures: arrays, lists, dictionaries, trees;
basic principles of OOP;
familiarity with one of the following languages: Python, Java, Scala, C++.
Basic math skills:
linear algebra: vectors, matrices, and their products;
calculus: derivatives of simple and composite functions;
optimization methods: gradient descent, Newton's method;
probability theory: random events and random variables, expectation, variance.
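As a rough gauge of the optimization background assumed, here is a minimal Python sketch of the two methods named above; the target function f(x) = (x - 3)^2, the learning rate, and the iteration counts are invented purely for illustration and are not from the course:

```python
# Illustrative sketch: gradient descent and Newton's method on f(x) = (x - 3)^2.
# All constants (lr, steps, x0) are arbitrary choices for this toy example.

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize by repeatedly stepping against the gradient: x <- x - lr * f'(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def newton(grad, hess, x0, steps=10):
    """Minimize via Newton iterations: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)
    return x

# f(x) = (x - 3)^2  =>  f'(x) = 2(x - 3),  f''(x) = 2
grad = lambda x: 2 * (x - 3)
hess = lambda x: 2.0

print(gradient_descent(grad, x0=0.0))  # converges toward 3.0
print(newton(grad, hess, x0=0.0))      # exactly 3.0: Newton solves a quadratic in one step
```

Note the contrast the course material will rely on: gradient descent approaches the minimum gradually, while Newton's method uses second-order (curvature) information and lands on the minimum of a quadratic in a single iteration.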
understanding of the basics of computing within the von Neumann architecture (processor, memory, cache, external storage);
understanding of the general principles of relational DBMS, knowledge of SQL.
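As a rough illustration of the SQL level assumed (tables, inserts, aggregation), here is a minimal self-contained example using Python's built-in sqlite3 module; the schema and data are invented for this sketch:

```python
# Illustrative sketch of basic SQL: CREATE TABLE, INSERT, and GROUP BY,
# run against an in-memory SQLite database (Python standard library).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "view"), (2, "click"), (2, "click")],
)

# Count actions per user -- the kind of aggregation query the course assumes you can write.
rows = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 2), (2, 2)]
conn.close()
```

If queries like this read naturally to you, the DBMS prerequisite is covered.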