What Is Big Data?
Big Data is the concept of storing and processing the large amounts of data collected over time in an economical and scalable manner, in cases where relational database techniques are no longer sufficient. The importance and value of data increase day by day, driven by the growing amount and diversity of the data being generated. The concept itself dates back to 2004; in other words, it is a concept that the world’s largest technology giants have been investing in and using since 2004.
The Big Data ecosystem is built on open source software, at the core of which sits a set of tools centered on Hadoop. Some of these can be listed as follows: Hadoop Core, HDFS (Hadoop Distributed File System), Cloudera Manager, Sqoop, Mahout, and Hive (a data warehouse). The concept of Big Data is not limited to the software components above; in practice, it is limited only by what you can imagine.
The Four V’s of Big Data
To fix the Big Data concept clearly in mind, let’s walk through the four V’s and compare the non-relational environment they describe with the relational environment (RDBMS) that we already know and are used to.
Volume: Perhaps the most important driver behind the Big Data concept is that the volume of data we hold grows exponentially with each passing day. As data volume grows, companies’ IT costs naturally rise with it. The challenge is to contain those rising costs while setting up an environment in which all this data can be stored and managed.
Velocity: Imagine that, in addition to its growing volume, this data flows into the system very quickly, and the system must keep up with it. Loading data into a relational database at this rate is troublesome and costly. The speed at which data arrives, on top of its volume, is therefore another V that explains the concept and usage of Big Data.
Variety: Social media, sensor data, CRM files, documents, images, videos; imagine every data source and type you can think of. Storing all of this in a relational database, or even on an ordinary file system, is impractical and costly. When the diversity of data grows and we want to process, analyze, and store all of it, the Big Data concept is the right fit.
Value: The fourth V emerges from the combination of the other three. Data that arrives in high volume, in great variety, and at high speed must also have value; otherwise, the cost incurred exceeds the value obtained. To prevent this, we need to make sense of the data we have, add value to it, and analyze it. Value thus becomes the fourth V of the Big Data concept.
Oracle has developed an Engineered System for storing, analyzing, adding value to, and reporting on the data and data types covered by the four V’s above: the Oracle Big Data Appliance. Before moving on to the Big Data Appliance, let’s talk about Hadoop, its cluster structure, and current approaches.
Hadoop Cluster and Traditional Architecture
As you know, Oracle develops what it calls Engineered Systems, of which Oracle Exadata Database Machine is the pioneer. The aim is to deliver software and hardware together and eliminate some of the familiar problems of traditional architecture. For example, when you cannot get support from a single vendor and a problem occurs, the hardware team assigns it to the software team, the software team blames the network, and the network team points back at the hardware. When hardware comes from manufacturer A, software from B, the cluster from C, network components from D, and so on, getting solutions and support for problems in a conventional setup becomes difficult.
The same traditional installation and management story applies to a Hadoop cluster and the software components mentioned above. First you procure the hardware. Then you procure the network components and spend effort assembling and testing the machines. Next you spend time and effort installing and running an operating system on that hardware. Once everything at this stage works properly, you start deploying the cluster software. By the time the Hadoop cluster is installed and working, weeks, perhaps months, will have passed. Then, when the cluster needs to grow, you repeat the whole process, perhaps with the same type of product from different manufacturers.
Oracle Big Data Appliance
Oracle Big Data Appliance nodes communicate over an InfiniBand (40 Gb/sec) network, at least four times faster than any conventional architecture. The appliance can also be fully integrated with other Oracle engineered systems.
Step 1 – Data Flow: This is the stage where different types of data flow into the system from multiple sources. At this stage, the data is streamed and prepared to be loaded onto the Oracle Big Data Appliance.
Step 2 – Collect and Organize: Data arrives with high volume, variety, and velocity, and is stored on the Oracle Big Data Appliance, and therefore on HDFS. By default, HDFS applies “triple mirroring”: it stores each incoming block in three copies and distributes them across the installed Hadoop cluster of 18 nodes in total.
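The triple-mirroring idea above can be sketched as follows. This is a simplified illustration, not HDFS’s actual placement policy (real HDFS is rack-aware, and the block names and node labels here are invented):

```python
import random

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes,
    mimicking HDFS-style triple mirroring (simplified)."""
    placement = {}
    for block in blocks:
        # Real HDFS chooses replica locations with rack awareness;
        # here we simply sample 3 distinct nodes at random.
        placement[block] = random.sample(nodes, replication)
    return placement

nodes = [f"node{i:02d}" for i in range(1, 19)]   # an 18-node cluster
blocks = ["blk_001", "blk_002", "blk_003"]
for block, replicas in place_replicas(blocks, nodes).items():
    print(block, "->", replicas)
```

Losing a single node therefore costs at most one copy of any block; the cluster still holds two more.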
Step 3 – Analysis and Reporting: We run the data on the Oracle Big Data Appliance through Java code known as “Map/Reduce”, convert it into a format that can be loaded into a relational database, and then transfer it over InfiniBand to Oracle Exadata, bringing it into the relational world.
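The Map/Reduce model mentioned above can be shown in miniature with a toy word count. This Python sketch stands in for the Java code that would actually run on Hadoop; on a real cluster, the map and reduce phases execute in parallel across the nodes, with HDFS blocks as input:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts."""
    counts = {}
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        counts[word] = sum(count for _, count in group)
    return counts

lines = ["big data big value", "data flows fast"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'fast': 1, 'flows': 1, 'value': 1}
```

The sort-then-group step plays the role of Hadoop’s shuffle, which routes all pairs with the same key to the same reducer.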
At the beginning of the article, I stated that Big Data is limited only by your imagination. Considering all these possibilities, there are many areas where you can apply Big Data and its scope. You can also connect up to eight Big Data Appliances via InfiniBand without the need for additional switches.
Oracle Big Data Connectors
The purpose of these software components is to extract data easily from the Big Data Appliance into the Oracle database. Extracting data from HDFS, and therefore from the Hadoop cluster, into the relational environment otherwise requires considerable time and effort, as well as sufficient staff or consultancy services within your organization. Let’s look at these connectors, which serve as a bridge between Big Data and the Oracle database and reduce installation effort and cost:
Oracle Data Integrator Application Adapter for Hadoop
This connector combines ODI (Oracle Data Integrator) with Big Data; its purpose is to transfer data from HDFS to the Oracle database. Hadoop implementations normally require serious knowledge of Java and Map/Reduce code. With the ODI connector, Map/Reduce functions can be written through a graphical interface; the generated code is then run on Hadoop, where the Map/Reduce operations are performed.
Oracle Loader for Hadoop
This software is a kind of Map/Reduce tool whose purpose is to optimize the transfer of data from Hadoop to the Oracle database. Oracle Loader for Hadoop reshapes the data into a format that can be loaded into the database, and in doing so helps reduce the CPU and I/O consumed.
Oracle Direct Connector for Hadoop Distributed File System (HDFS)
The purpose of this software is to provide fast access from the Oracle database to the HDFS environment. Thanks to the Direct Connector, we can query the non-relational Big Data environment from the relational side at any time. I am talking about direct SQL access here, via a kind of “external table”: data on HDFS can be queried in place or loaded into the relational database.
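Conceptually, an external table lets you query a file where it sits instead of bulk-loading it first. The Python analogy below illustrates only that idea of streaming and filtering in place; the file layout and field names are invented for illustration and are not an Oracle or HDFS API:

```python
import csv
import io

# Pretend this CSV sits on HDFS; an external table would expose
# it to SQL queries without first loading it into the database.
hdfs_file = io.StringIO(
    "id,status\n"
    "1,ok\n"
    "2,error\n"
    "3,ok\n"
)

def query_in_place(fileobj, predicate):
    """Stream rows and filter on the fly -- no bulk load step,
    analogous to SELECT ... WHERE against an external table."""
    reader = csv.DictReader(fileobj)
    return [row for row in reader if predicate(row)]

errors = query_in_place(hdfs_file, lambda r: r["status"] == "error")
print(errors)  # [{'id': '2', 'status': 'error'}]
```

The design choice being illustrated: the data is read and filtered as it streams past, so you pay the load cost only for the rows you actually decide to bring into the relational world.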