What is Data Generalization?

Data generalization is the creation of successive layers of summary information in an evaluational database. It is the process of zooming out to see the larger picture of a problem, trend, or situation. It is also called rolling up data.

As companies grow, millions upon millions of records accumulate in their databases. To manage the data in the data warehouse, a set of processes called extract, transform, and load (ETL) is performed periodically.

Data warehouses are rich repositories of data. Most of the data is historical company data. Modern data warehouses can also contain data from other sources. The overall business intelligence system of any company greatly benefits from having data from multiple sources. The company can gain a wider perspective, not only about the patterns and trends within the organization, but also the global industrial trends.

With this much data, it can be difficult to see trends and patterns in the business intelligence system’s analytical outputs. It can also be hard to create reports from all the available data, much of which is inconsistent (although the ETL process largely fixes that).

Large volumes of data also strain the network management tools that companies rely on to ensure consistent delivery of business-critical applications. Many businesses have discovered that their existing network management tools cannot handle the volume of data needed to monitor and manage applications and networks.

It was difficult for existing tools to capture, store, and report traffic at the speed and granularity required for network improvements. Some network tools discard detail in order to reduce the volume of traffic data and speed up delivery, converting the detailed data into hourly, daily, or weekly summaries. This conversion of detailed data into summary reports is data generalization, or rolling up data, as database professionals call it. Data generalization therefore benefits network management as well.
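The roll-up described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `samples` list, its byte counts, and the `roll_up` helper are hypothetical and do not come from any particular network management tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical traffic samples: (timestamp, bytes transferred).
samples = [
    (datetime(2024, 1, 1, 9, 5), 1200),
    (datetime(2024, 1, 1, 9, 40), 800),
    (datetime(2024, 1, 1, 10, 15), 500),
    (datetime(2024, 1, 2, 9, 30), 700),
]

def roll_up(records, bucket):
    """Sum the byte counts of all records that fall into the same bucket."""
    totals = defaultdict(int)
    for timestamp, byte_count in records:
        totals[bucket(timestamp)] += byte_count
    return dict(totals)

# First layer of summary: one total per hour.
hourly = roll_up(
    samples, lambda ts: ts.replace(minute=0, second=0, microsecond=0)
)

# Second layer: roll the hourly totals up again into one total per day.
daily = roll_up(hourly.items(), lambda ts: ts.date())
```

Each layer holds fewer rows than the one below it, which is the trade the paragraph describes: less detail in exchange for less data to store and report.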

Online Analytical Processing (OLAP) technology can help a great deal with data generalization. OLAP can be used to quickly answer multidimensional analytical questions. It belongs to the broader category of business intelligence and is primarily used for business reporting, including reports for sales, marketing, and management.
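As a rough illustration of what a multidimensional question looks like, the sketch below totals a measure over any chosen combination of dimensions. The `sales` rows, the dimension names, and the `summarize` helper are all hypothetical. Real OLAP engines are far more sophisticated, so this shows only the idea, not how any product implements it.

```python
from collections import defaultdict

# Hypothetical sales facts: each row has three dimensions and one measure.
sales = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "revenue": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "revenue": 150},
    {"region": "West", "product": "Widget", "quarter": "Q1", "revenue": 200},
    {"region": "West", "product": "Widget", "quarter": "Q2", "revenue": 50},
]

def summarize(facts, dimensions, measure="revenue"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# "How much revenue did each region earn in each quarter?"
by_region_quarter = summarize(sales, ["region", "quarter"])

# Dropping a dimension rolls the data up one more level.
by_region = summarize(sales, ["region"])
```

Asking the same question with fewer dimensions gives a more general summary, which is how roll-up appears in a multidimensional setting.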

Data generalization is also a key benefit in implementations of online transaction processing (OLTP). OLTP is a class of systems that manage and facilitate transaction-oriented applications, especially those that involve data entry and retrieval. OLTP predates OLAP and differs from it: OLTP handles day-to-day transactions, whereas OLAP supports analysis.

Companies that have used OLTP for a long time cannot simply abandon it and re-engineer their systems for OLAP. Instead, the information systems department must create, manage, and support a dual-database system in order to upgrade OLTP. The two databases are the operational database and the evaluational database. The operational database supplies the data that supports OLTP.