Data Lake vs. Data Warehouse - what strategy is working at your organization?
<mention id="620cf9aef5b0130001e423c5" displayname="Paul Nichols"></mention> The problem domain is business apps, including CRM, ERP, etc. We have a data lake today as well, but it lacks governance and structure. We are finding that the self-service model works, but users have to spend a lot of time massaging the data before they can report on it.<br><br>Which solution are you leveraging today?
I would not even try to make a recommendation without a good working knowledge of your platform and/or problem domains. It all depends upon the size and scale of the Data Lake you are envisioning, how it is to be used, and what you are attempting to accomplish.<br><br>As with everything else, one size does not fit all, and I have no idea about the size, scope, and overall CRM/ERP solutions you are looking to address, not to mention the overall budget. :)<br><br>I guess my main question is: are these large enterprise CRM and ERP solutions like PeopleSoft or SAP? Or are they smaller CRM and ERP solutions like Dynamics 365, ePROMIS, Salesforce, etc.? Are they currently cloud-based, or do they reside on legacy hardware in true HA or non-HA (hot or passive standby) environments?<br><br>In the past, for large-scale Fortune 500-1000 type enterprise solutions, we have used both Databricks and Cloudera based solutions. We also have some experience with IBM Data Lake for SAP and PeopleSoft migrations.<br><br>If you have already employed some type of earlier Hadoop repository, Databricks has some pretty good migration and management tools. Cloudera began as a managed Hadoop offering, so they are pretty good at migrating earlier Hadoop and Spark stacks. If you are using a cloud provider like AWS, Azure, or Google, you might want to spec and/or compare their Data Lake pricing with the other major Data Lake solutions (two of which are mentioned above).<br><br>For smaller Data Lakes (like where I am now), we base it on more of a "roll your own" solution. We are not ready to scale to a traditional Hadoop type repo, so we are primarily basing our solution on DataStax Cassandra. We are not handling copious amounts of unstructured data; our primary data (outside of corporate and MDM type data) is IoT-based, using message-based protocols and communication layers between our Kubernetes-based microservices. These are hosted on our private clouds (sister corporation).
<br><br>There are some litigation-related requirements around this data, but our primary objective for a Data Lake is reporting, analytics, and predictive analysis on the IoT data we are collecting across several business verticals (with some overlap and cross-cutting business domains).<br><br>Our data warehouse solution handles most of our internal business apps, which are relatively small, but it saves us money on compliance and retention requirements and provides cost and performance improvements through archival opportunities. We have to transfer part of this corporate data into our IoT core solutions because of contractual agreements, billing, and SLAs, but not for much else.<br><br>Best of luck in finding the right solution. Shop around, and if you are selecting a vendor, I strongly suggest having your senior tech staff get on a couple of calls with the vendors and let them walk you through demos of the environment(s) under consideration (preferably a demo that is as close as possible to your problem domain).<br><br>Ask the vendors for architectural blueprints, overall tech stacks, and of course business success stories. If you have the latitude and budget for a preliminary PoC (after selecting a vendor), I highly recommend it.<br><br>I sincerely hope this helps and does not further muddy the waters. :)<br><br>If you are using AWS today, a good solution is to augment it with Snowflake.<br><br>Of course<br><br>
Yes. We are successfully using both.
Howard, we are also planning to leverage both, as there is a lot of self-service reporting as well. We should connect on this sometime.
<mention id="5f80322fab92d9cdfb7986d8" displayname="Mudit Agarwal"></mention> sounds good.
Data warehousing.
A Data Warehouse has been working for us over the years. A Data Lake is something that I am interested in implementing down the road.
We utilize both, but our preference is the Data Lake.
As with all solutions, it depends upon how they are being used, what problem domains you are attempting to solve, the competency of the staff using the technologies, and the overall cost-to-benefit ratio.