Big Data in ECM

Big Data: Definition and Features

What is Big Data from the perspective of modern corporate systems and information technologies? Big Data is a set of approaches, tools and methods for processing structured and unstructured data characterized by huge volumes, great variety and constant growth. The purpose of Big Data processing is to find relevant information and present it in a human-readable form, so that it can be used to identify important trends in various spheres of life and to make decisions based on those trends.

Big Data is commonly defined by the three V’s: volume, velocity and variety. The huge amounts of data collected, the unprecedented speed at which data streams in and the wide variety of data formats are both the main characteristics and the greatest challenges of Big Data.

Big Data in Day-to-Day Life

There are many examples of Big Data methods being applied in various spheres of life.

Mass media, for example, use Big Data techniques to estimate users’ interests and preferences, monitor social media activity and automatically prepare news-sheets or news bulletins.

In healthcare, Big Data is accumulated from electronic medical records, body sensors and hospital medical devices. By analyzing this data, scientists can predict epidemics.

In science Big Data approaches are widely used in metrology, geology, meteorology, astronomy, etc.

Sports management relies heavily on Big Data, which helps event organizers forecast ticket sales and enables bookmakers to calculate betting odds.

Big Data in Corporate Systems and ECM

We have mentioned examples from day-to-day life. Let’s have a closer look at the level of corporate systems, particularly enterprise content management systems.

We analyzed 20 years of a real company’s operation in an ECM system. The data from this analysis is used below for illustration.

Number of Users

Number of Documents, Processes and Records

You can see that the volume of data is growing faster than the number of system users.

Employees have started using the ECM system more actively. More business processes are initiated, including both those dealing with traditional records management (incoming, outgoing, organizational and executive documents) and those going beyond it, such as contract and invoice management, fiscal management and business-to-business collaboration. It was the non-traditional tasks that caused the seismic shift.

The volume of data per employee, as well as its variety, is constantly growing. If the trend continues, there will come a moment when people can no longer manage the information and drown in it. That’s a given. However, we should first answer the question: is Big Data already here, or is it still on the horizon?

To sort things out, let us divide all business tasks into traditional ones that can be easily handled (those dealing with organizational and executive documentation, incoming and outgoing documents, trivial business processes) and non-traditional ones (accounts payable processing and fiscal management, credit applications and others). Such non-traditional tasks often lead to explosive data growth.

Traditional Docflow Tasks in a Big Company

Here is an example of a big company with 1,000 users working simultaneously in an ECM system, where the number of occasional users grew to 3,000 by the end of the year. Even if we take into account only traditional tasks, millions of business processes are initiated and millions of documents and record registration cards (RRCs) are created in the system during a year. This amounts to gigabytes of information, and the volume will keep growing.

Millions are a long way from tens of thousands! With such volumes of traditional docflow tasks, we are already dealing with Big Data. At this scale, it becomes essential to provide higher scalability, collect statistics and analyze processes in real time. Big companies cannot wait.

Non-Traditional Tasks and Explosive Data Growth

As an example, let us examine various industry-specific tasks of banks and retailers.


Every day a bank receives up to 10,000 consumer requests. 200 clerks deal solely with accepting these requests. Once accepted, requests must be processed, passed through the credit committee and either approved or rejected. Ideally, a request should take an hour, not days.

This is a real conveyor belt! While banks use specialized information systems to automate decision making, IT solutions for data input and initial processing have only just started to be used.
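To get a feel for the intake load implied by the bank figures above, here is a small back-of-the-envelope sketch. The request and clerk counts come from the text; the 8-hour working day is an assumption made purely for illustration.

```python
REQUESTS_PER_DAY = 10_000  # from the text: up to 10,000 requests per day
CLERKS = 200               # from the text: 200 clerks accept requests
WORKDAY_HOURS = 8          # assumption: not stated in the text


def intake_load(requests_per_day, clerks, workday_hours):
    """Return (requests per clerk per day, minutes available per request)."""
    per_clerk = requests_per_day / clerks
    minutes_per_request = workday_hours * 60 / per_clerk
    return per_clerk, minutes_per_request


per_clerk, minutes = intake_load(REQUESTS_PER_DAY, CLERKS, WORKDAY_HOURS)
print(f"{per_clerk:.0f} requests per clerk per day, "
      f"about {minutes:.1f} minutes each")
```

Under these assumptions each clerk has less than ten minutes to accept a request, before any processing or committee review begins.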


A large retail chain processes more than 100,000 invoices per month. One of the most important tasks retailers deal with is establishing electronic communication with their counterparties, as the current volumes of data overwhelm accountants when they have to prepare documents for cross-audits and in-house audits. Another serious challenge is finding space for document archiving.

After a switch to electronic communication with EDI integration, the number of documents will multiply even more due to the growth of electronic contract documentation that is associated with invoices.

Data Storage or Search for New Efficiency?

As we have shown, ECM systems already accumulate and process big data sets (or will accumulate them in the near future). These are documents, business processes, reference records, history, access rights, etc.

Apart from storage, big data sets require management and analysis. Analysis can help discover new opportunities for increasing working efficiency. While reporting is well understood and easily handled, analyzing user behavior and business process efficiency poses new challenges for ECM systems and their users.

It can be argued that ECM systems are evolving from Records Systems into Business Intelligence systems and even Engagement Systems.

Data Analysis and Employee Engagement

Modern corporate systems are capable of providing various tools for data analysis. Developing standard analytical features has become routine: using various reports and monitoring and analyzing information in real time (as a business dashboard) is something that top managers are already used to.

Managing Documents Correctly

We can evaluate how often documents are accessed in the system in order to automate decisions about archiving them and changing their access rights. Such an evaluation can also help compile sets of documents related to the current tasks of specific groups of employees.

How Often Users Access Documents that Describe Technologies and Processes

We see that a great number of employees looked through the document just after it was created. After some time, the number of accesses decreased and settled at the level of newcomers (as experienced employees had already adopted the basic technologies). Based on such statistics, the system itself can decide to move the document to the archival repository and add it to the list of documents that newcomers should study.
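The archiving decision described above could be approximated by a simple rule: archive a document once its monthly access count has stayed close to the number of newcomers for several months in a row. The sketch below is only an illustration. The function name, the `stable_months` and `tolerance` parameters and the sample figures are all hypothetical and do not come from any real ECM system.

```python
def should_archive(monthly_accesses, monthly_newcomers,
                   stable_months=3, tolerance=1.5):
    """Return True when recent accesses have settled at newcomer level.

    Both lists hold one value per month, oldest first. A month counts as
    "newcomer level" when accesses <= tolerance * newcomers in that month.
    """
    if len(monthly_accesses) != len(monthly_newcomers):
        return False
    if len(monthly_accesses) < stable_months:
        return False
    recent = zip(monthly_accesses[-stable_months:],
                 monthly_newcomers[-stable_months:])
    return all(accesses <= tolerance * max(newcomers, 1)
               for accesses, newcomers in recent)


# Hypothetical data: a burst of interest after publication, then a plateau.
accesses = [420, 180, 60, 12, 9, 11]
newcomers = [8, 10, 9, 10, 8, 9]

print(should_archive(accesses[:3], newcomers[:3]))  # still widely read
print(should_archive(accesses, newcomers))          # settled at newcomer level
```

A real system would likely weigh other signals as well, such as document type or retention rules; the thresholds here are placeholders to be tuned.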

Employee Workload Profile

The system can evaluate an employee’s workload profile, including statistics on documents handled and tasks performed. It can then signal the need to balance the workload and distribute processes among other employees.
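One simple way to produce such a signal is to compare each employee’s number of open tasks with the team median. The following is a minimal sketch under that assumption; the `overloaded` function, the `factor` threshold and the sample names and counts are hypothetical.

```python
from statistics import median


def overloaded(open_tasks, factor=2.0):
    """Return names whose open-task count exceeds factor * team median."""
    typical = median(open_tasks.values())
    return sorted(name for name, count in open_tasks.items()
                  if count > factor * typical)


# Hypothetical snapshot of open tasks per employee.
team = {"anna": 12, "boris": 10, "chen": 31, "dana": 9}
print(overloaded(team))
```

The median is used rather than the mean so that one heavily loaded employee does not shift the baseline against which everyone is compared.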

Employee Task and Workload Statistics in ECM System

The system can infer that the employee moved into a managerial position in 2009: the working profile changed, and the employee started to deal more with “fast” tasks and probably to delegate more. The percentage of overdue tasks remained unchanged but, given the increasing number of tasks, the absolute number of overdue ones became critical. Having detected this with the help of the system, management should take measures to reduce the growing number of overdue tasks.
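The point that a stable overdue percentage can hide a growing problem is easy to show with numbers. The figures below are invented for illustration only (they are not the data of the analyzed company), and the `max_overdue` threshold is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class YearStats:
    year: int
    tasks: int
    overdue: int

    @property
    def overdue_rate(self):
        """Share of tasks that were overdue (0.0 when there were no tasks)."""
        return self.overdue / self.tasks if self.tasks else 0.0


def critical_years(history, max_overdue):
    """Return the years in which the absolute overdue count exceeds the limit."""
    return [s.year for s in history if s.overdue > max_overdue]


# Hypothetical profile: the rate stays at 10% while the workload grows.
history = [
    YearStats(2008, tasks=400, overdue=40),
    YearStats(2009, tasks=1200, overdue=120),
    YearStats(2010, tasks=2000, overdue=200),
]

for s in history:
    print(s.year, f"{s.overdue_rate:.0%}", s.overdue)
print(critical_years(history, max_overdue=100))
```

Monitoring the absolute count alongside the percentage is what lets the system raise the alarm in the later years even though the rate never moves.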

Employee’s “Digital Footprint”

From the perspective of Big Data, it may be fascinating to discover the “digital footprints” that employees leave while working with various data sets, cooperating with colleagues and carrying out other work activities. Based on this “digital footprint”, different behavioral patterns may be revealed. Such patterns could make it more convenient for users to interact with the system, other systems, coworkers, etc.

As a result, we will be able to find ways to engage employees in working within the system and increase overall efficiency.

Gamification Tools

Introducing game elements into a corporate environment may help analyze users’ behavior, encourage employees to use the system functions correctly and study them in detail, follow corporate technologies and increase efficiency.

Gamification elements implemented in the company confirmed that the new approach is becoming more and more inspiring for employees. We also discovered, surprisingly, that the main participants of the experiment were not young employees but experienced 30-year-olds and top managers. For the top managers, gamification became a new instrument for finding active employees and evaluating their efficiency.

Percent of Employees Who Noted Specific Gamification Advantages


Another engagement technique is to revise the principles by which employees work in corporate systems, shifting from content creation to collaboration within workgroups. Establishing interest groups (departments, large-scale projects, working groups, professional clubs and guilds), getting information from such groups, and creating professional newsletters and internal chats may prove to be helpful tools for engaging employees and increasing their working efficiency.

Finding standard patterns of work within closely collaborating groups, delegating authority and information to them, recording results, practicing informal subordination and applying new principles of data storage and use: all these options become possible when social techniques are incorporated into the working environment.

If we take a big company with 1,000 employees, gathering and analyzing such corporate statistics is directly related to Big Data.

Where Big Data in ECM Starts

Detailed analysis of unstructured data and the discovery of new patterns require bigger sets of data. Data accumulates while employees work, and modern storage devices make it cheap to keep. Thus, constant accumulation of corporate information becomes common practice. These factors are also among the reasons ECM systems are taking their place in organizations’ infrastructure.

However, where is the edge beyond which the volume, variety and growth velocity of data exceed the capacity of corporate systems to process it immediately by means of BI, OLAP, etc.?

Our analysis leads us to three basic components of a possible evaluation: the number of users, the variety and volume of processes, and the history of working in the ECM system.

Big Data Edges in ECM

While your company stays within the triangle of up to 1,000 users, up to 100,000 launched process instances and up to 10 years of working history, you deal with data that can be quite easily analyzed by means of your current corporate systems. When you cross the triangle edges, you may face Big Data that will force you to change your approaches and information management methods.
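The triangle above can be expressed as a simple threshold check. The three limits are taken from the text; the function name, the dictionary keys and the sample company are hypothetical and only serve to illustrate the rule.

```python
# Limits from the text: 1,000 users, 100,000 launched process instances,
# 10 years of working history.
LIMITS = {
    "users": 1_000,
    "process_instances": 100_000,
    "history_years": 10,
}


def crossed_edges(users, process_instances, history_years):
    """Return the names of the triangle edges that the company has crossed."""
    actual = {
        "users": users,
        "process_instances": process_instances,
        "history_years": history_years,
    }
    return [name for name, limit in LIMITS.items() if actual[name] > limit]


# Hypothetical company: moderate user base, but heavy process automation.
edges = crossed_edges(users=800, process_instances=250_000, history_years=6)
if edges:
    print("Possible Big Data territory; crossed:", ", ".join(edges))
else:
    print("Within the triangle: current corporate tools should cope")
```

Crossing a single edge is treated here as enough to warrant a closer look, which matches the cautious wording of the text ("you may face Big Data").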


As you get closer to the triangle edges, we recommend that you consider tools for storing and processing big data sets, such as in-memory databases, complex event processing (CEP), BI systems, data mining tools and others. At the same time, your ECM system should provide high scalability and fault tolerance.

Evaluate the volume, variety and growth velocity of your data, as well as how promptly you need to analyze it. If you expect explosive growth or are getting close to Big Data, this is a good reason to discuss big data tool support with your ECM system provider. It is also likely to be a good opportunity to raise your business efficiency by discovering new, unexpected patterns in the way your employees work with corporate information.