April, 2022

Personal Knowledge Management Workflow for a Deeper Life — as a Computer Scientist

Simon Späti

With burnout and mental stress at every level of our lives, I find my Personal Knowledge Management (PKM) system even more valuable. As a human, I forget lots of things. As a dad, I have more responsibilities, including remembering everything related to my kid. As a developer and knowledge worker, I reuse code snippets and create new things. That’s why a PKM system, such as a Second Brain, that stores all of it in a sustainable way is crucial to me.

What is the difference between a data lake and a data warehouse?

Start Data Engineering

With the data ecosystem growing fast, new terms are coming up every week. Some of the most popular ones include “data lakes” and “data warehouses”. If you are trying to understand the differences between a data lake and a data warehouse, or are frustrated by vendor marketing content aimed at selling their lake/warehouse

DAG Dependencies in Apache Airflow: The Ultimate Guide

Marc Lamberti

DAG Dependencies in Apache Airflow might be one of the most popular topics. I have received countless questions about DAG dependencies: Is it possible? How? What are the best practices? And the list goes on. It’s funny because wondering how to do that comes naturally, even to beginners. Do we like to complexify things by nature? Maybe, but that’s another question 😉 At the end of this article, you will be able to spot when you need to create DAG Dependencies, which metho

The 8 Basic Statistics Concepts for Data Science

KDnuggets

Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Reasons for Data Mesh on Pulsar

Jesse Anderson

Data mesh is quickly becoming a way for companies to roll out their data strategy. If you haven’t already learned about data mesh, I suggest doing so. It comes with organizational and technical changes. I think a crucial part of your data mesh revolves around the choice of publish/subscribe technologies. At the crux of data mesh is a desire for flexibility.

Telco 5G Returns Will Come from Enterprise Data Solutions

Cloudera

This blog post was written by Dean Bubley, industry analyst, as a guest author for Cloudera. Communications service providers (CSPs) are rethinking their approach to enterprise services in the era of advanced wireless connectivity and 5G networks, as well as with the continuing maturity of fibre and Software-Defined Wide Area Network (SD-WAN) portfolios.

Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs

Data Engineering Podcast

Summary: There are very few tools that are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful library for flexibly instrumenting all of your data systems to understand the entire lifecycle of your data from source to productionized model. In this episode Andy Dang explains why the project was created, how you can apply it to your existing data systems, and how it functions to provide detailed context for being able to gain insight into all o

How Netflix Content Engineering makes a federated graph searchable

Netflix Tech

By Alex Hutter, Falguni Jhaveri and Senthil Sayeebaba. Over the past few years, Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform. GraphQL federation enables domain teams to independently build and operate their own Domain Graph Services (DGS) and, at the same time, connect their domain with other domains in a unified GraphQL schema exposed by a federated gateway.

Naïve Bayes Algorithm: Everything You Need to Know

KDnuggets

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem, used in a wide variety of classification tasks. In this article, we will explore the Naïve Bayes algorithm and all its essential concepts so that there is no room for doubt.
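
As a rough illustration only (not the article’s code), the sketch below is a minimal categorical Naïve Bayes classifier in plain Python. It scores each class as log P(class) plus the sum of log P(feature value | class), which relies on the “naïve” assumption that features are conditionally independent given the class, and it uses add-one (Laplace) smoothing so an unseen value never produces log(0). The function names `train_naive_bayes` and `predict` and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> how often value appears with label
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    # feature_values[feature_index] -> set of distinct values seen in training
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, row):
    """Return the label maximizing log P(label) + sum_i log P(x_i | label)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing so unseen values never give log(0)
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "mild")))  # prints "yes"
```

Working in log space avoids numeric underflow when many small probabilities are multiplied; the class with the highest summed score is the prediction.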

Responsible AI: Ways to Avoid the Dark Side of AI Use

AltexSoft

“AI systems (will) take decisions that have ethical grounds and consequences.” Prof. Dr. Virginia Dignum, Umeå University. On March 23, 2016, Microsoft released its AI-based chatbot Tay via Twitter. The bot was trained to generate its responses based on interactions with users. But there was a catch. Various users started posting offensive tweets toward the bot, resulting in Tay replying in the same language.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speakers: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Becoming an AI-first Organization

Cloudera

The term “AI-first” has received its share of attention lately, especially in the boardroom where strategies to gain a competitive advantage are always welcome. But before a company embarks on an AI-first strategy, it pays to understand what it is and how it will transform the organization. If you’re AI-first, that means you have figured out how to leverage artificial intelligence to boost organizational agility so you can continuously adapt operational processes to deliver the right business ou

Building a Dependable Real-Time Betting App with Confluent Cloud and Ably

Confluent

Our everyday digital experiences are in the midst of a revolution. Customers increasingly expect their online experiences to be interactive, immersive, and real time by default. The need to satisfy […].

Operational Analytics At Speed With Minimal Busy Work Using Incorta

Data Engineering Podcast

Summary: A huge amount of effort goes into modeling and shaping data to make it available for analytical purposes. This is often due to the need to simplify the final queries so that they are performant for visualization or limited exploration. In order to cut down the level of effort involved in making data usable, Matthew Halliday and his co-founders created Incorta as an end-to-end, in-memory analytical engine that removes barriers to insights on your data.

Stop Trying to be a Digital Bank

Teradata

Digitization is necessary but not sufficient to meet evolving customer demands & create the bank of the future. Use data analytics to help customers achieve their goals, not to deliver better apps.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

5 Different Ways to Load Data in Python

KDnuggets

Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
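
The five techniques the article reviews are not listed in this excerpt, so the following is only a hedged sketch of two common standard-library approaches, not necessarily among the article’s five: `csv.DictReader` for delimited text and `json.loads` for JSON. To keep the example self-contained, the sample data is a hypothetical in-memory string wrapped in `io.StringIO`; with real files you would pass an opened file object instead.

```python
import csv
import io
import json

# Hypothetical in-memory samples; with real files, use open(path, newline="") instead
CSV_TEXT = "name,age\nAda,36\nGrace,45\n"
JSON_TEXT = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'


def load_csv(text):
    """Approach 1: csv.DictReader yields one dict per row, with every value as a string."""
    reader = csv.DictReader(io.StringIO(text))
    # Convert the numeric column explicitly, since CSV carries no type information
    return [{**row, "age": int(row["age"])} for row in reader]


def load_json(text):
    """Approach 2: json.loads parses the whole document into Python objects at once."""
    return json.loads(text)


csv_records = load_csv(CSV_TEXT)
json_records = load_json(JSON_TEXT)
print(csv_records == json_records)  # prints "True"
```

The contrast is the point: JSON preserves numbers as numbers, while CSV values arrive as strings and need explicit conversion.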

Evolution of ML Fact Store

Netflix Tech

By Vivek Kaushal. At Netflix, we aim to provide recommendations that match our members’ interests. To achieve this, we rely on Machine Learning (ML) algorithms. ML algorithms can only be as good as the data that we provide to them. This post will focus on the large volume of high-quality data stored in Axion, our fact store, which is leveraged to compute ML features offline.

The Sprint towards Digital Healthcare

Cloudera

The pandemic changed our healthcare behaviors. Planned hospital and doctor visits were reduced, while telemedicine, for physical and mental health, increased. As healthcare providers and insurers/payers worked through massive amounts of new data, our health insurance practice was there to help. I have noticed growing excitement among health insurers around the world exploring these data-driven types of capabilities, and I am looking forward to experiencing more of this in my personal life while I

Announcing Multi-Year Microsoft Partnership to Accelerate Cloud Data Streaming

Confluent

We’re pleased to share a new multi-year partnership between Confluent and Microsoft to accelerate enterprises’ journey to cloud data streaming on Azure. Today’s announcement builds upon the partnership agreement we […].

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus and as the complexity of the task increases: the number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Connecting To The Next Frontier Of Computing With Quantum Networks

Data Engineering Podcast

Summary: The next paradigm shift in computing is coming in the form of quantum technologies. Quantum processors have gained significant attention for their speed and computational power. The next frontier is in quantum networking for highly secure communications and the ability to distribute computation across quantum processing units without costly translation between quantum and classical systems.

Emerging Risks are Systemic

Teradata

Managing the new class of emerging risks requires infusing the principles of resiliency and efficient risk analytics into traditional risk management frameworks.

Uncertainty Quantification in Artificial Intelligence-based Systems

KDnuggets

The article summarizes the plethora of uncertainty quantification (UQ) methods using Bayesian techniques, shows issues and gaps in the literature, suggests further directions, and epitomizes AI-based systems within the Financial Crime domain.

Reflections of a Rockset UXer

Rockset

It is often said that time flies when you are having fun, and I couldn't agree more. I have been at Rockset for almost three years now, and it is still so interesting to me. On one hand, I am just getting started and have so much more to do; on the other, I am so proud of the distance we have covered in the last few years! Our customers tell us that the work we are doing matters to them: “Rockset made me a hero on day three of my new job.”

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Data Is Now a Team Sport

Cloudera

This week I participated in an informative event that Cloudera hosted with TechCrunch: Data and the Culture Transformation. The event was moderated by tech industry analyst Maribel Lopez, and we were joined by Shirley Collie, chief health analytics actuary at Discovery Health in South Africa. The conversations focused on how company data cultures are rapidly evolving and delivering new levels of value to businesses with the emergence of data ecosystems.

Introducing Current 2022: The Next Generation of Kafka Summit

Confluent

Data streaming is a new category of technology that is reshaping the way businesses operate, but there hasn’t been a place for everyone in the ecosystem to come together and […].

What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?

Data Engineering Podcast

Summary: Putting machine learning models into production and keeping them there requires investing in well-managed systems that handle the full lifecycle of data cleaning, training, deployment, and monitoring. This requires a repeatable and evolvable set of processes to keep them functional. The term MLOps has been coined to encapsulate all of these principles, and the broader data community is working to establish a set of best practices and useful guidelines for streamlining adoption.

Big tech versus the airlines – who’s going to win in the modern retailing battle?

Teradata

Find out why data analytics and connectivity will be the difference between retailing taking off and being grounded.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

4 Factors to Identify Machine Learning Solvable Problems

KDnuggets

The near future holds incredible possibilities for machine learning to solve real-world problems. But we need to be able to determine which problems are solvable by ML and which are not.

3 Simple Steps For Snowflake Cost Optimization Without Getting Too Crazy

Monte Carlo

Most data pros know Snowflake’s pricing model is consumption-based: you pay for what you use. What many don’t know is that Snowflake actually WANTS you to optimize your costs and has provided helpful features to rightsize your consumption. Waste isn’t good for anyone. Instead of spinning cycles on inefficient SQL queries, the data cloud provider would rather have you focus those Snowflake credits on projects like building data apps.

A Window Into the Future of Data in Motion and What It Means for Businesses

Cloudera

Modern businesses have vast amounts of data at their fingertips and are acutely aware of how enterprise data strategies positively impact business outcomes. Despite this, only a handful of organisations interact with all stages of the data life cycle process to truly distill information that distinguishes future-ready businesses from the rest. Much potential remains untapped when businesses do not translate their data into actionable insights from the point it is created, eroding the usefulness

Kafka Summit London 2022: The Full Recap

Confluent

It’s official: Kafka Summit is back! Technically, it never went away—it just went online. But this week in London, Kafka Summit returned in all its glory to welcome over 1,200 […].

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization, and AI. Learn what entity resolution is, why it matters, how it works, and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.
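
To make “fuzzy matching” concrete, here is a minimal, hedged sketch of rule-based entity resolution in plain Python. It is not the AI-based approach the teaser promotes. It normalizes names, scores every pair with `difflib.SequenceMatcher.ratio()`, and merges pairs at or above a similarity threshold into clusters using union-find. The function names, the 0.85 threshold, and the sample company names are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve_entities(records, threshold=0.85):
    """Group records whose pairwise similarity meets the threshold (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair: O(n^2), fine for a demo but not for large datasets
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


records = ["Acme Corp.", "ACME Corp", "Acme Corporation", "Globex Inc.", "Globex, Inc"]
groups = resolve_entities(records)
print(groups)
# [['Acme Corp.', 'ACME Corp'], ['Acme Corporation'], ['Globex Inc.', 'Globex, Inc']]
```

Note that “Acme Corporation” stays separate at this threshold (its similarity to “acme corp” is 0.72), which shows how sensitive rule-based matching is to the cutoff. Production systems typically also add a “blocking” step so that only plausible candidate pairs are compared, instead of all pairs.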