The storage layer is usually loaded with data using a batch process.
The integration component of the ingestion layer invokes various mechanisms—like Sqoop, MapReduce jobs, ETL jobs, and others—to upload data to the distributed Hadoop storage layer (DHSL).
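Such a batch upload can be performed programmatically through Hadoop's FileSystem API. The following is a minimal sketch, assuming a reachable NameNode at hdfs://namenode:8020 and illustrative source and destination paths; none of these values come from this text.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; replace with your cluster's URI.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy a local batch extract into the distributed storage layer.
            // Both paths are hypothetical examples.
            fs.copyFromLocalFile(new Path("/data/batch/extract.csv"),
                                 new Path("/warehouse/raw/extract.csv"));
        }
    }
}
```

In practice the same effect is often achieved from the command line (for example with hdfs dfs -put) or by a Sqoop or ETL job scheduled by the integration component.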
- Open source framework.
- Allows huge volumes of data to be stored in a distributed fashion.
- Decouples the distributed-computing machinery from the application logic you want to execute.
- Enables applications to interact with a logical cluster of processing and storage nodes.
1. HDFS (Hadoop Distributed File System)
- File system designed to store very large volumes of information (terabytes to petabytes) across a large number of machines in a cluster.
- Stores data reliably and runs on commodity hardware.
- Uses blocks to store a file or parts of a file (see the block-location sketch after this list).
- Supports a write-once-read-many model of data access.
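Because files are stored as blocks, a client can ask the NameNode where each block of a file resides. Below is a minimal sketch using FileSystem#getFileBlockLocations; the file path is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path file = new Path("/warehouse/raw/extract.csv"); // assumed path
            FileStatus status = fs.getFileStatus(file);
            // One BlockLocation per block of the file, covering its full length.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }
}
```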
2. MapReduce
- Computes results in batch (a minimal job sketch follows this list).
- Serves as a communication mechanism from the ingestion layer to the storage layer.
- Can be implemented to match performance, scalability, and availability requirements.
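To make the batch-computation point concrete, here is a minimal word-count job written against the standard org.apache.hadoop.mapreduce API. It is a sketch rather than anything prescribed by this text; the input and output paths come from the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be packaged as a JAR and submitted with hadoop jar wordcount.jar WordCount <input> <output>: the mapper emits (word, 1) pairs, the combiner and reducer sum them, and the results land back in HDFS, which is the batch pattern described above.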