The process of preparing, managing, and optimizing data for analytical or operational use.

Core tasks include:
- Data Collection: Gathering structured and unstructured data from various sources (databases, logs, sensors, APIs).
- Data Ingestion: Efficiently importing data into storage systems or data lakes (a minimal ingestion sketch follows this list).
- Data Storage: Managing data warehouses, lakes, or databases (e.g., Snowflake, Azure Data Lake, Databricks).
- Data Processing & Transformation: Converting raw data into meaningful formats (ETL/ELT pipelines, Apache Spark, Apache Airflow); a Spark sketch appears after this list.
- Data Modeling: Structuring data, often as facts and dimensions, to facilitate efficient querying and analysis (see the star-schema sketch below).
- Data Quality Management: Ensuring data accuracy, consistency, and reliability through validation and monitoring (see the validation sketch below).
- Data Infrastructure Management: Maintaining scalable and reliable data platforms and infrastructure.
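
A minimal collection-and-ingestion sketch, assuming a hypothetical REST endpoint (`https://example.com/api/events`) that returns a JSON array, and a local directory standing in for a data lake; none of these names refer to a specific product.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

import requests  # pip install requests

# Hypothetical source endpoint; replace with a real API.
SOURCE_URL = "https://example.com/api/events"
LAKE_DIR = Path("lake/raw/events")  # assumed local stand-in for a data lake


def ingest() -> Path:
    """Fetch a batch of records and land them unmodified in the raw zone."""
    response = requests.get(SOURCE_URL, timeout=30)
    response.raise_for_status()
    records = response.json()  # assumes the endpoint returns a JSON array

    # Partition raw files by ingestion date, a common lake layout.
    stamp = datetime.now(timezone.utc)
    out_dir = LAKE_DIR / f"ingest_date={stamp:%Y-%m-%d}"
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / f"batch_{stamp:%H%M%S}.jsonl"
    with out_file.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return out_file


if __name__ == "__main__":
    print(f"Landed batch at {ingest()}")
```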
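
The transformation step is commonly written as a Spark job. The sketch below reads raw CSV, cleans it, and writes partitioned Parquet; the paths and column names (`order_id`, `amount`, `order_ts`) are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV landed by the ingestion step (path is assumed).
raw = spark.read.csv("lake/raw/orders", header=True, inferSchema=True)

# Transform: drop rows missing a key, normalize types, derive a date column.
clean = (
    raw
    .dropna(subset=["order_id"])                      # reject incomplete rows
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("order_date", F.to_date("order_ts"))  # derive partition column
    .dropDuplicates(["order_id"])                     # idempotent re-runs
)

# Load: write columnar Parquet, partitioned for efficient querying.
clean.write.mode("overwrite").partitionBy("order_date").parquet("lake/curated/orders")

spark.stop()
```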
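
A widespread modeling pattern is the star schema: a narrow fact table joined to descriptive dimension tables. The sketch uses SQLite only to stay self-contained; a warehouse such as Snowflake or BigQuery would take equivalent DDL, and the table and column names are illustrative.

```python
import sqlite3

# In-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Dimension: who placed the order.
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT
);

-- Dimension: calendar attributes for time-based slicing.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20240131
    full_date TEXT NOT NULL,
    month     INTEGER,
    year      INTEGER
);

-- Fact: one row per order, foreign-keyed to the dimensions.
CREATE TABLE fact_orders (
    order_id     TEXT,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);
""")

# Analytical queries then join the narrow fact table to its dimensions.
conn.execute("""
SELECT d.year, d.month, SUM(f.amount)
FROM fact_orders f
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY d.year, d.month
""")
conn.close()
```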
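
Quality management usually starts with per-batch assertions. The validation sketch below checks nulls, uniqueness, and a value range with pandas; the rules and column names are assumptions, and production pipelines typically run such checks through a framework (e.g., dbt tests or Great Expectations).

```python
import pandas as pd


def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    return failures


# Tiny batch with deliberate defects to show the checks firing.
batch = pd.DataFrame({
    "order_id": ["a1", "a1", None],
    "amount": [10.0, -5.0, 3.5],
})
for failure in validate(batch):
    print("FAILED:", failure)
```
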
Common Tools:
- Apache Spark, Hadoop, Kafka, Airflow, dbt, Databricks, Azure Data Factory, AWS Glue, Google BigQuery, Snowflake.