Data Warehouse Concepts and Tools: Do’s and Don’ts While Building Your Solution

23/04/20
By Alexey Utkin
Principal Solution Consultant, Finance Practice
By Oleg Komissarov
Principal Consultant, Finance Practice

Are you looking for data warehouse concepts and tools? Do you need to understand how to successfully migrate your data architecture and adopt a more flexible, scalable, and cost-efficient modern data platform? Unfortunately, the path to success is a lot more complex than simply choosing a package or solution.

DataArt consultants have extensive experience building modern data platforms, so we can share recommendations for each step of your migration journey. Our teams have worked with many clients and varying business requirements, using custom patterns optimized for each client’s needs, because no single data warehousing (DWH) model meets all business needs.

So, let’s discuss the entire flow — from the idea stage to DWH building — step by step, with all its do’s and don’ts along the way.

Step 1: Decide Whether You Need Outside Help

Companies that want to implement cloud-based data solutions often lack the expertise to do so, simply because these platforms are not standard IT or tech projects. Internal IT departments shoulder the responsibility of building a solution and, in the end, frequently fall short of expectations. A knowledge gap leads to high expenses and results in a cloud solution that is merely a copy of the previous on-premises solution, with all its limitations included.

The best approach is to combine the efforts of in-house IT specialists, who know the internal business processes, with those of external consultants, who can facilitate the migration. This collaboration may considerably reduce both development and infrastructure costs, and it lets the company make informed choices in every design area.

Don’t: Try to build a solution with insufficient expertise by relying solely on internal resources. This often leads to budget overruns, and the result is unlikely to meet the expectations of the company’s CTO or COO.

Do: Look for a consultant who specializes in building mature data solutions and who knows which architecture pattern will best suit your individual case. Such a consultant can also tell you whether your business needs a DWH at all.

Step 2: Outline Your Strategy and Tactics

Before building a solution, the team responsible has to determine the strategy and tactics required, based on business objectives. It is critical to capture and communicate the results your business wants to see.

Across our clients’ projects, we see one or several of the following high-level strategic drivers for implementing a modern data architecture:

  • Enable an insight-driven organization by giving business users a combination of traditional BI and reporting workloads with self-service and agile BI and ad-hoc querying, while addressing traditional challenges of data integration, governance, and quality.
  • Enable next-generation data products, data-driven apps, embedded BI, and data delivery APIs. In a way this is similar to the first driver, yet focused on external clients.
  • Enable advanced analytics: address the needs of data scientists, data engineers, and implement use cases powered by real-time analytics and machine learning.
  • Re-platform, often with cloud technologies, to improve scale and reduce the cost of infrastructure, implementation, and maintenance of your analytics solution.

Generate a structured plan, including the objective metrics the board of directors wants to achieve. This may be the speed of solution deployment, cost performance index, time to market, or combating legacy challenges in your data platforms.

If you fail to do this, your development process is likely to fail for one of these reasons:

  • The business reality changes much quicker than you can develop your solution.
  • Your business is unable to accept, process and adjust to multiple changes at once.
  • Your new solution is not what’s really needed, because of a lack of frequent feedback.

Don’t: Rely on Big Bangs. Moving directly from the idea of a DWH solution to its development carries many drawbacks, such as a long time to market, low solution capacity, and a lot of money spent along the way.

Do: Start with business value, iterate, and evolve. Your team should define a specific, successful business scenario based on dialog with decision-makers, the company CTO, and/or COO, and only then move to the next step in the journey.

Step 3: Find Stakeholders Committed to the Project

Managing the entire process of integrating a DWH solution with your own resources is exhausting and time-consuming. A knowledge gap in your IT team and an unclear vision of the future project are key blockers to a successful DWH implementation.

When you have outlined your strategy and tactics, create a team of stakeholders who express the same level of interest in your project, and commit to its success.

Don’t: Initiate the project if you see that stakeholders are not committed to results and do not contribute.

Do: Find a committed group of stakeholders who have a clear benefit from and interest in the project’s success, and allow this group to facilitate the DWH development process. Preferably, this team should include business decision-makers, tech leaders, and analytics champions (e.g. CDO).

Step 4: Perform a High-Level Assessment of Your Current and Target States

At this stage, your task is to decide how you will evaluate the effectiveness of the data warehouse implementation for your business and to create a detailed vision of a specific successful business scenario. This means that, once the DWH is integrated, you must be able to tell whether it meets your expectations.

Don’t: Launch the project without knowing how to assess its success in the future. Simply building and integrating a DWH does not suffice.

Do: Identify metrics to measure DWH implementation success, performance, and adoption by the business. These metrics may include, but are not limited to, the speed and scale of data processing, the data volume the solution supports, and how fast new data and analytics use cases can be introduced.

What’s even more important is to envision how end users will engage with your data and what will change in their lives. Then quantify these indicators so you can rely on them while planning your data model and assess the efficiency of your result later.
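One way to make such indicators concrete is a small scorecard that compares measured values against targets agreed on up front. The sketch below is only an illustration: the metric names, targets, and measured values are hypothetical and not taken from any real project.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One success indicator with a target agreed on before the project starts."""
    name: str
    target: float
    measured: float
    higher_is_better: bool = True

    def met(self) -> bool:
        # A metric is met when the measured value reaches the target
        # in the direction that counts as an improvement.
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target


def scorecard(metrics: list[Metric]) -> dict[str, bool]:
    """Map each metric name to whether its target was met."""
    return {m.name: m.met() for m in metrics}


# Hypothetical figures, for illustration only.
metrics = [
    Metric("nightly_load_minutes", target=60, measured=45, higher_is_better=False),
    Metric("supported_data_volume_tb", target=10, measured=12),
    Metric("days_to_add_new_use_case", target=14, measured=21, higher_is_better=False),
    Metric("weekly_active_business_users", target=50, measured=35),
]
result = scorecard(metrics)
print(result)
```

Keeping technical metrics (load time, volume) next to adoption metrics (active users) in one scorecard makes it easier to see whether the DWH works well but is not yet used, or the other way around.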

Step 5: Decide on Data Warehouse Concepts and Tools

With the explosion of data technologies available for building data solutions, it has become difficult to identify which tools to use for your project. So before choosing a technology to build your modern data solution, you need to understand your choices.

By relying on three of the four big data Vs (Volume, Variety, and Velocity), you can distinguish the following platforms:

  • DWHs are optimized for structured, cleansed and integrated data and target a wide range of business users.
  • Data lakes are used for unstructured raw data, where volume and variety matter. These solutions let you store and process data in a low-cost and scalable way. Data lakes are used mostly by advanced business data analysts, data scientists, and data and software engineers.
  • To support data velocity and provide real-time analysis, implement streaming analytics solutions, which may use technology similar to data lakes, but are specially configured to hit required latencies.

Depending on your type of data and its usage, you have to choose the appropriate technology solution, or more often adopt a hybrid solution.
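The choice described above can be sketched as a simple rule of thumb. This is a deliberately simplified illustration, not a selection method: the `DataProfile` fields and the `suggest_platforms` function are hypothetical names introduced here, and a real decision would weigh many more factors such as cost, skills, and governance.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    structured: bool        # cleansed, integrated data for business users
    large_raw_volume: bool  # high volume or variety of raw, unstructured data
    needs_real_time: bool   # results required within tight latency limits


def suggest_platforms(profile: DataProfile) -> list[str]:
    """Return the platform types that the profile points toward."""
    platforms = []
    if profile.structured:
        platforms.append("data warehouse")
    if profile.large_raw_volume:
        platforms.append("data lake")
    if profile.needs_real_time:
        platforms.append("streaming analytics")
    return platforms


# A profile with both structured and raw data points to a hybrid solution.
hybrid = suggest_platforms(
    DataProfile(structured=True, large_raw_volume=True, needs_real_time=False)
)
print(hybrid)
```

When more than one platform is returned, that corresponds to the hybrid case, which the text notes is the more common outcome.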

If you are still unsure which data architecture to use, our recent webinar explains how to modernize your data management and analytics platform.

Another approach to data solution concepts is to distinguish between them by the workloads they address:

  • Traditional BI and reporting workloads are covered mainly by structured data from DWH. Here, the team of data engineers is responsible for sourcing, integrating, and modeling of data, development of reports, dashboards, and data marts. This approach is time-consuming and expensive but well justified for the most important organizational data being used by a wide group of business users, including CxOs and senior management.
  • Self-service BI allows business users to perform data sourcing and aggregation, as well as reporting and dashboarding. In this case, a team of data engineers and analysts may monitor and support this solution and serve business users.
  • Ad-hoc querying allows business users to source data and query a wide set of available data, often unstructured and stored in different systems.
  • Data science workloads cover the needs of data scientists, such as querying big data and the use of data science tools.
  • Machine learning production pipelines support models created by data scientists, so that the models can learn, monitor, and adjust themselves.

Snowflake, Oracle Exadata, Teradata, Microsoft Parallel Data Warehouse, and AWS are among the leading data platform offerings that can support the workloads described above.

Don’t: Choose a solution without understanding whether it suits your specific needs, whether it is cost-efficient, and whether it provides sufficient scaling and flexibility.

Do: Choose the cloud solution, technology provider, tools and concepts based on your type of corporate data and your business needs, to avoid incompatibilities.

Step 6: Validate Your Solution with an MVP

Move forward by building a simple MVP to demonstrate your data solution’s functionality, and engage with users to get early, real-life feedback. This is a cost-effective way to understand the real potential of the solution for your business.

Don’t: Start by building a mature product.

Do: Demonstrate all the benefits of the future project through a simple MVP. Create a PoC to design and validate the elements of your solution.

Step 7: Create a Scaled Deployment Roadmap and Evolve Your Solution

The next step in your journey is to generate a roadmap with all project delivery points and metrics included. Good data solution implementation approaches take into account three threads: incremental implementation of business use cases, increments of architecture and tooling foundation, and gradual business adoption of the new data capability and operating model. Once the roadmap is ready, start building your data solution. At this point, it would make sense to work in partnership with an experienced consultant who can share their knowledge and experience with your team.

Don’t: Neglect the consultant’s assistance and the chance to learn from their experience.

Do: Try to learn from your partner, and invest in relevant team education to keep up with the latest technology news and market trends.

Step 8: Monitor and Optimize

In the old days, data platform capacity was planned before functionality was deployed to users. In the modern cloud and self-service reality, capacity planning may happen only after deployment, but it still has to happen.

Don’t: Leave your data platform unmonitored once it is deployed. Otherwise, storage and compute costs may grow out of control.

Do: Regularly monitor data platform workloads and pipelines to identify whether your solution needs any optimization.
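As a minimal sketch of what such monitoring could look like, the function below flags months in which platform cost grew faster than an allowed rate. The cost figures and the 20% threshold are hypothetical; in practice the numbers would come from your cloud provider’s billing data, and the threshold would be set by your own budget.

```python
def flag_cost_growth(monthly_costs: list[float], max_growth: float = 0.20) -> list[int]:
    """Return indexes of months whose cost grew by more than max_growth
    relative to the previous month."""
    flagged = []
    for i in range(1, len(monthly_costs)):
        previous, current = monthly_costs[i - 1], monthly_costs[i]
        # Skip months with no previous spend to avoid dividing by zero.
        if previous > 0 and (current - previous) / previous > max_growth:
            flagged.append(i)
    return flagged


# Hypothetical monthly storage and compute costs, for illustration only.
costs = [1000.0, 1050.0, 1400.0, 1450.0, 2000.0]
flagged = flag_cost_growth(costs)
print(flagged)
```

A flagged month is a prompt to look at which workloads or pipelines changed, not proof that something is wrong: growth may simply reflect wider adoption.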

Conclusion

The entire process of integrating data solutions may seem very time-consuming. Most companies mistakenly think that it’ll take months to implement data warehousing in their business. But in reality, with the right process facilitation, you can benefit from the first results in just weeks.

So don’t neglect the steps described in this article. And if you need additional information or consultation, feel free to contact the DataArt team for more help.
