
The process of preparing, managing, and optimizing data for analytical or operational use.

Core tasks include:

  • Data Collection: Gathering structured and unstructured data from various sources (databases, logs, sensors, APIs).
  • Data Ingestion: Efficiently importing data into storage systems or data lakes, usually as scheduled, orchestrated workflows (see the Airflow sketch after this list).
  • Data Storage: Managing data warehouses, lakes, or databases (e.g., Snowflake, Azure Data Lake, Databricks).
  • Data Processing & Transformation: Converting raw data into meaningful formats through ETL/ELT pipelines, typically built with engines such as Apache Spark and orchestrated with Apache Airflow (see the PySpark sketch after this list).
  • Data Modeling: Structuring data to facilitate efficient querying and analysis.
  • Data Quality Management: Ensuring data accuracy, consistency, and reliability through validation and monitoring (see the validation sketch after this list).
  • Data Infrastructure Management: Maintaining scalable and reliable data platforms and infrastructure.
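
To make ingestion and orchestration concrete, below is a minimal sketch of a daily ingestion pipeline in Apache Airflow (using the 2.4+ `schedule` keyword). The DAG id, task ids, and the fetch/load callables are hypothetical placeholders, not part of this glossary entry.

```python
# Minimal Airflow ingestion sketch; the fetch/load callables are
# hypothetical placeholders to be wired to a real source and sink.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_from_source():
    """Pull raw records from an upstream API or database (placeholder)."""


def load_to_lake():
    """Write the fetched records to object storage / a data lake (placeholder)."""


with DAG(
    dag_id="daily_ingestion",      # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",             # Airflow 2.4+; older versions use schedule_interval
    catchup=False,                 # don't backfill missed runs
) as dag:
    fetch = PythonOperator(task_id="fetch", python_callable=fetch_from_source)
    load = PythonOperator(task_id="load", python_callable=load_to_lake)
    fetch >> load                  # load runs only after fetch succeeds
```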
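
For processing and transformation, a typical ETL/ELT step reads raw data, cleans and enriches it, and writes an analysis-ready format. Below is a minimal PySpark sketch; the file paths and column names (`user_id`, `event_ts`) are illustrative assumptions.

```python
# Minimal ETL sketch with PySpark; paths and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw, semi-structured events.
raw = spark.read.json("data/raw/events.json")

# Transform: drop rows missing a key, normalize the timestamp,
# and derive a partition column.
clean = (
    raw.filter(F.col("user_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for efficient downstream querying.
clean.write.mode("overwrite").partitionBy("event_date").parquet("data/curated/events")

spark.stop()
```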
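
Data quality management often starts with simple, automated validation rules run against each batch. The sketch below uses pandas; the column names and the 1% null-rate threshold are illustrative assumptions.

```python
# Minimal data-quality validation sketch; columns and thresholds are illustrative.
import pandas as pd


def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable descriptions of any data-quality violations."""
    issues = []
    if df["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    if df["amount"].lt(0).any():
        issues.append("negative amounts")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:
        issues.append(f"customer_id null rate {null_rate:.1%} exceeds 1%")
    return issues


if __name__ == "__main__":
    # Tiny example batch that violates all three rules.
    batch = pd.DataFrame({
        "order_id": [1, 1, 2],
        "amount": [10.0, -5.0, 20.0],
        "customer_id": [100, None, 101],
    })
    for issue in validate(batch):
        print("FAIL:", issue)
```

In production, checks like these typically run inside the pipeline itself (e.g., as an Airflow task or a dbt test) so that bad batches are caught before they reach downstream consumers.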

Common Tools:

  • Apache Spark, Apache Hadoop, Apache Kafka, Apache Airflow, dbt, Databricks, Azure Data Factory, AWS Glue, Google BigQuery, Snowflake.