Traditional methods for storing and processing data in large companies eventually became insufficient, causing problems and driving up the cost of meeting their needs.
As a result, the concept of Big Data emerged: a field that studies ways to process, analyze, and extract knowledge from data sets too large to be handled by traditional systems.
To better understand this concept, consider how traditional data storage and processing is carried out. The present tense is deliberate: in most cases, working with Big Data does not replace the traditional way of working. Many companies do not need Big Data tools to manipulate their data, and even large companies use a hybrid system. The two ways of working with data therefore coexist.
The traditional system uses the well-known DBMS (database management systems), which store information in a structured way, as tables with rows and columns. They run on machines with large storage and processing capacity. When that capacity needs to grow, new hardware components must be added so that the machine has more memory and processing power.
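As a small illustration of the structured, tabular storage described above, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table, its columns, and the sample rows are hypothetical and exist only for this example; a production DBMS would typically be a separate server, but the row-and-column model is the same.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: every row must fit the same fixed set of columns.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Each tuple becomes one row in the table.
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Recife"), (2, "Bruno", "Curitiba"), (3, "Carla", "Recife")],
)

# Because the structure is known in advance, queries can filter by column.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Recife",)
).fetchall()
print(rows)

conn.close()
```

Everything here lives on one machine, which is why growing the system means upgrading that machine.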
The problems that appear once a traditional system reaches a large volume of data relate to scalability, availability, and flexibility. For example, it is very costly to improve these machines vertically every time an upgrade is necessary, and the system is usually unavailable during the upgrade because the machine is under maintenance. Vertical scaling, in which a machine is improved by adding resources such as memory and processing power, does not guarantee effectiveness when it comes to Big Data. Solving these problems required new tools.
To work around these difficulties, large companies researched a new system that was scalable, and Hadoop emerged, a form of distributed storage and processing. The idea is to use a cluster, that is, a group of computers. On its own, a single computer in the cluster does not have great processing power, but together the machines provide enough processing power and storage to meet the need.
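The cluster idea can be sketched in a few lines of Python. This is not Hadoop's API, only a toy illustration under stated assumptions: worker threads stand in for the machines of a cluster, and the function names (`split_into_blocks`, `count_words`, `cluster_word_count`) and the sample records are hypothetical. Each "node" processes only its own block of data, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_blocks(records, num_nodes):
    """Distribute records round-robin, one block per simulated node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(block):
    """Work done by a single node: count words in its own block only."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def cluster_word_count(records, num_nodes=3):
    """Count words across all blocks and merge the partial results."""
    blocks = split_into_blocks(records, num_nodes)
    # Threads are a stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, blocks))
    # Combine the per-node counts into one total.
    return reduce(lambda a, b: a + b, partials, Counter())


records = [
    "big data tools",
    "data in a cluster",
    "a cluster of machines",
    "big cluster",
]
totals = cluster_word_count(records)
print(totals["cluster"], totals["data"])
```

In this sketch, handling more data means raising `num_nodes` rather than making any single worker more powerful, which is the contrast with vertical scaling.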