Some companies keep their raw data in files and use ETL (Extract, Transform, Load) tools to aggregate it and load it into an RDBMS. This puts a huge amount of stress on the network layer, because all of the data in the file stores has to move over the network into the ETL tool for processing. Once the data has been aggregated, it is moved to an archive location, and retrieving archived data later is a very slow process if someone wants to read the raw data again.
So there are three problems:
- Raw, high-fidelity data cannot be accessed.
- Computation cannot scale, since all of the data has to be pulled into the computation engine over the network (see the sketch after this list).
- To reduce the stress on the computation engine, old data has to be archived, and once it is archived, accessing it again is an expensive process.
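To make the network bottleneck concrete, here is a minimal, hypothetical sketch of that centralized pattern: one process pulls every raw file over the network (represented here by an assumed network-mounted path, /mnt/filestore/raw) just to compute a single aggregate before loading it into the RDBMS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical sketch of the centralized ETL pattern: every raw byte is
// pulled over the network to one machine before any aggregation happens.
public class CentralizedAggregation {
    public static void main(String[] args) throws IOException {
        // Assumed example location: a network-mounted file store.
        Path rawStore = Paths.get("/mnt/filestore/raw");

        long totalRecords = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(rawStore)) {
            for (Path file : files) {
                // Every line of every file crosses the network to this
                // single process, even though only one number is kept.
                try (Stream<String> lines = Files.lines(file)) {
                    totalRecords += lines.count();
                }
            }
        }

        // In the real pipeline this aggregate would be loaded into the RDBMS
        // and the raw files would then be shipped off to an archive location.
        System.out.println("Total records pulled over the network: " + totalRecords);
    }
}
```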
That is how an RDBMS-based setup processes data.
Hadoop does not pull the data to the computation engine over the network; instead, it keeps the computation right next to the data. Each server independently processes its own local data.
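For a concrete picture of computation running next to the data, here is the classic word-count job written against Hadoop's MapReduce Java API (a minimal sketch, not part of the original post; the input and output paths are passed as arguments and the job name is a placeholder). Hadoop schedules each map task on a node that holds a local HDFS block of the input, so the raw data is read locally and only the much smaller map outputs cross the network to the reducers.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Runs on the node that stores the HDFS block being processed,
  // so the raw input is read from the local disk, not the network.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // small (word, 1) pairs, not raw files
      }
    }
  }

  // Only these compact intermediate pairs are shuffled across the network.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Submitting the job only ships a small jar and the job configuration to the cluster; the raw input itself never has to leave the nodes that store it, which is the opposite of the ETL pattern described at the top of this post.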