how-to-become-a-data-engineer
- Metadata:
- #article #data-science #career
- Source: How to become a data engineer
- Responsible for building and maintaining the process of delivering, storing and processing data
- Forms the foundation of the hierarchy of data science needs: collect, move/store, explore/transform, aggregate/label, learn/optimize
- Modern skillsets should include: intermediate knowledge of SQL and Python, experience with cloud providers (AWS, Azure, GCP), knowledge of Java and Scala, an understanding of SQL/NoSQL databases (modelling, warehousing, performance optimization)
- Expanding this skillset for FAANG companies: experience with big data tools (Hadoop, Kafka, Spark), knowledge of algorithms and data structures, an understanding of distributed systems, familiarity with BI tools (Tableau, QlikView, Looker, Superset)
- Algo and Data Structure Data Science Learning
- SQL
- Many databases still use SQL, as do many SQL-based query engines (Presto, Apache Hive, Impala)
- https://mode.com/sql-tutorial/introduction-to-sql/
- https://modern-sql.com/
- https://use-the-index-luke.com/
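- A minimal sketch of the kind of SQL worth being fluent in (filtering, grouping, aggregating, indexing), run through Python's built-in `sqlite3` module. The `events` table, its columns, and the sample rows are hypothetical and exist only for illustration; engines such as Presto or Hive accept similar queries but differ in dialect details.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "purchase", 20.0),
        (1, "purchase", 5.0),
        (2, "purchase", 12.5),
        (2, "refund", -12.5),
    ],
)
# Index on the column used in the WHERE clause (the topic of use-the-index-luke.com).
conn.execute("CREATE INDEX idx_events_type ON events (event_type)")


def purchases_per_user(connection):
    """Return (user_id, purchase_count, total_amount) rows ordered by user_id."""
    query = """
        SELECT user_id, COUNT(*) AS purchase_count, SUM(amount) AS total_amount
        FROM events
        WHERE event_type = 'purchase'
        GROUP BY user_id
        ORDER BY user_id
    """
    return connection.execute(query).fetchall()


rows = purchases_per_user(conn)
for user_id, purchase_count, total_amount in rows:
    print(user_id, purchase_count, total_amount)
```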
- Programming
- Since many big data systems are written in Java or Scala, it is vital to learn your way around these two languages
- Scala: Apache Kafka, Apache Spark
- Java: Hadoop HDFS, Cassandra, HBase, Apache Hive, Presto
- https://twitter.github.io/scala_school/
- Big Data Tools
- A rapidly changing landscape but the most popular tools are
- Apache Kafka for message queue/event bus/event streaming
- Apache Spark for large-scale data processing
- Apache Hadoop is a big data framework whose ecosystem includes Hadoop HDFS, Apache Hive, and HBase for moving and storing data
- Apache Druid is a real-time analytics database
- https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
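- A toy sketch of the idea behind event streaming: an append-only log that several consumers read independently, each tracking its own offset. This is not Kafka's API; the `Log` and `Consumer` classes and the sample events are hypothetical, and real systems add partitioning, replication, and durable storage.

```python
class Log:
    """Append-only sequence of records, addressed by integer offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Reads from a shared log and remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
fast_batch = fast.poll()               # reads everything available
slow_batch = slow.poll(max_records=1)  # reads one record and stays behind
```

- Because records are never modified in place, a slow consumer does not block a fast one; each simply resumes from its own offset.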
- Data Pipelines
- Apache Airflow, Spotify Luigi, Prefect, Dagster
- Introduction to Apache Airflow
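- A minimal sketch of what these orchestrators do at their core: declare tasks and their dependencies as a DAG, then run the tasks in dependency order. It uses only the standard library's `graphlib` (Python 3.9+), not the Airflow, Luigi, Prefect, or Dagster APIs; the task names, functions, and sample data are hypothetical. Real orchestrators add scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading from a source system.
    return [{"user": "a", "amount": "3"}, {"user": "b", "amount": "4"}]


def transform(rows):
    # Cast string amounts to integers.
    return [{"user": r["user"], "amount": int(r["amount"])} for r in rows]


def load(rows):
    # Stand-in for writing to a warehouse; returns the loaded total.
    return sum(r["amount"] for r in rows)


# task name -> (function, names of upstream tasks whose results it consumes)
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_pipeline(tasks):
    """Run every task after its upstream tasks; return (order, results)."""
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func, upstream = tasks[name]
        results[name] = func(*(results[u] for u in upstream))
    return order, results


order, results = run_pipeline(TASKS)
```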