April, 2022

Personal Knowledge Management Workflow for a Deeper Life — as a Computer Scientist

Simon Späti

With burnout and mental stress at every level of our lives, I find my Personal Knowledge Management (PKM) system even more valuable. As a human, I forget lots of things. As a dad, I have more responsibilities with remembering all things related to my kid. As a developer and knowledge worker, I re-use code snippets or create new things. That’s why a PKM system such as a Second Brain to store all of it in a sustainable way is crucial to me.

Data Scientist, Data Engineer & Other Data Careers, Explained

KDnuggets

In this article, we will have a look at five distinct data careers, and hopefully provide some advice on how to get one's feet wet in this convoluted field.

Telco 5G Returns Will Come from Enterprise Data Solutions

Cloudera

This blog post was written by Dean Bubley, industry analyst, as a guest author for Cloudera. Communications service providers (CSPs) are rethinking their approach to enterprise services in the era of advanced wireless connectivity and 5G networks, as well as with the continuing maturity of fibre and Software-Defined Wide Area Network (SD-WAN) portfolios.

What is the difference between a data lake and a data warehouse?

Start Data Engineering

With the data ecosystem growing fast, new terms are coming up every week. Some of the most popular ones include “data lakes” and “data warehouses”. If you are: trying to understand the differences between a data lake and a data warehouse; frustrated by vendor marketing content aimed at selling their lake/warehouse

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

DAG Dependencies in Apache Airflow: The Ultimate Guide

Marc Lamberti

DAG Dependencies in Apache Airflow might be one of the most popular topics. I have received countless questions about DAG dependencies: Is it possible? How? What are the best practices? And the list goes on. It’s funny because it comes naturally to wonder how to do that even when we are beginners. Do we like to complicate things by nature? Maybe, but that’s another question 😉 At the end of this article, you will be able to spot when you need to create DAG Dependencies, which metho

How Apache Kafka Works: An Introduction to Kafka’s Internals

Confluent

It’s not difficult to get started with Apache Kafka®. Learning resources can be found all over the internet, especially on the Confluent Developer site. If you are new to Kafka, […].

More Trending

15 Python Coding Interview Questions You Must Know For Data Science

KDnuggets

Solving the Python coding interview questions is the best way to get ready for an interview. That’s why we’ll lead you through 15 examples and five concepts these questions cover.

Becoming an AI-first Organization

Cloudera

The term “AI-first” has received its share of attention lately, especially in the boardroom where strategies to gain a competitive advantage are always welcome. But before a company embarks on an AI-first strategy, it pays to understand what it is and how it will transform the organization. If you’re AI-first, that means you have figured out how to leverage artificial intelligence to boost organizational agility so you can continuously adapt operational processes to deliver the right business ou

How Netflix Content Engineering makes a federated graph searchable

Netflix Tech

By Alex Hutter, Falguni Jhaveri and Senthil Sayeebaba. Over the past few years Content Engineering at Netflix has been transitioning many of its services to use a federated GraphQL platform. GraphQL federation enables domain teams to independently build and operate their own Domain Graph Services (DGS) and, at the same time, connect their domain with other domains in a unified GraphQL schema exposed by a federated gateway.

Gain Visibility Into Your Entire Machine Learning System Using Data Logging With WhyLogs

Data Engineering Podcast

Summary There are very few tools which are equally useful for data engineers, data scientists, and machine learning engineers. WhyLogs is a powerful library for flexibly instrumenting all of your data systems to understand the entire lifecycle of your data from source to productionized model. In this episode Andy Dang explains why the project was created, how you can apply it to your existing data systems, and how it functions to provide detailed context for being able to gain insight into all o

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Building a Dependable Real-Time Betting App with Confluent Cloud and Ably

Confluent

Our everyday digital experiences are in the midst of a revolution. Customers increasingly expect their online experiences to be interactive, immersive, and real time by default. The need to satisfy […].

Big tech versus the airlines – who’s going to win in the modern retailing battle?

Teradata

Find out why data analytics and connectivity will be the difference between retailing taking off and being grounded.

The 8 Basic Statistics Concepts for Data Science

KDnuggets

Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.

The Sprint towards Digital Healthcare

Cloudera

The pandemic changed our healthcare behaviors. Planned hospital and doctor visits were reduced while telemedicine, for physical and mental health, increased. As healthcare providers and insurers/payers worked through mass amounts of new data, our health insurance practice was there to help. I have noticed a growing excitement with health insurers around the world exploring these data-driven types of capabilities, and I am looking forward to experiencing more of this in my personal life while I

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Responsible AI: Ways to Avoid the Dark Side of AI Use

AltexSoft

“AI systems (will) take decisions that have ethical grounds and consequences.” Prof. Dr. Virginia Dignum from Umeå University. On March 23, 2016, Microsoft released its AI-based chatbot Tay via Twitter. The bot was trained to generate its responses based on interactions with users. But there was a catch. Various users started posting offensive tweets toward the bot, resulting in Tay making replies in the same language.

Operational Analytics At Speed With Minimal Busy Work Using Incorta

Data Engineering Podcast

Summary A huge amount of effort goes into modeling and shaping data to make it available for analytical purposes. This is often due to the need to simplify the final queries so that they are performant for visualization or limited exploration. In order to cut down the level of effort involved in making data usable, Matthew Halliday and his co-founders created Incorta as an end-to-end, in-memory analytical engine that removes barriers to insights on your data.

Announcing Multi-Year Microsoft Partnership to Accelerate Cloud Data Streaming

Confluent

We’re pleased to share a new multi-year partnership between Confluent and Microsoft to accelerate enterprises’ journey to cloud data streaming on Azure. Today’s announcement builds upon the partnership agreement we […].

Stop Trying to be a Digital Bank

Teradata

Digitization is necessary, but not sufficient, to meet evolving customer demands & create the bank of the future. Use data analytics to help customers achieve their goals, not deliver better apps.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Machine Learning Books You Need To Read In 2022

KDnuggets

I have a list of Machine Learning books you need to read in 2022; beginner, intermediate, expert, and for everybody.

Data Is Now a Team Sport

Cloudera

This week I participated in an informative event that Cloudera hosted with TechCrunch: Data and the Culture Transformation. The event was moderated by tech industry analyst Maribel Lopez, and we were joined by Shirley Collie, chief health analytics actuary at Discovery Health in South Africa. The conversations focused on how company data cultures are rapidly evolving and delivering new levels of value to businesses with the emergence of data ecosystems.

The Next Wave of ‘Ops’ Advances on the Data Center

DataKitchen

Connecting To The Next Frontier Of Computing With Quantum Networks

Data Engineering Podcast

Summary The next paradigm shift in computing is coming in the form of quantum technologies. Quantum processors have gained significant attention for their speed and computational power. The next frontier is in quantum networking for highly secure communications and the ability to distribute across quantum processing units without costly translation between quantum and classical systems.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Introducing Current 2022: The Next Generation of Kafka Summit

Confluent

Data streaming is a new category of technology that is reshaping the way businesses operate, but there hasn’t been a place for everyone in the ecosystem to come together and […].

Emerging Risks are Systemic

Teradata

Managing the new class of emerging risks requires infusing the principles of resiliency and efficient risk analytics into traditional risk management frameworks.

How to Determine the Best Fitting Data Distribution Using Python

KDnuggets

Approaches to data sampling, modeling, and analysis can vary based on the distribution of your data, and so determining the best fit theoretical distribution can be an essential step in your data exploration process.

A Window Into the Future of Data in Motion and What It Means for Businesses

Cloudera

Modern businesses have vast amounts of data at their fingertips and are acutely aware of how enterprise data strategies positively impact business outcomes. Despite this, only a handful of organisations interact with all stages of the data life cycle process to truly distill information that distinguishes future-ready businesses from the rest. Much potential remains untapped when businesses do not translate their data into actionable insights from the point it is created, eroding the usefulness

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Dapper Data Podcast: DataKitchen and DataOps – Episode #54 w/ Chris Bergh

DataKitchen

What Does It Really Mean To Do MLOps And What Is The Data Engineer's Role?

Data Engineering Podcast

Summary Putting machine learning models into production and keeping them there requires investing in well-managed systems to manage the full lifecycle of data cleaning, training, deployment and monitoring. This requires a repeatable and evolvable set of processes to keep it functional. The term MLOps has been coined to encapsulate all of these principles and the broader data community is working to establish a set of best practices and useful guidelines for streamlining adoption.

Kafka Summit London 2022: The Full Recap

Confluent

It’s official: Kafka Summit is back! Technically, it never went away—it just went online. But this week in London, Kafka Summit returned in all its glory to welcome over 1,200 […].

Evolution of ML Fact Store

Netflix Tech

By Vivek Kaushal. At Netflix, we aim to provide recommendations that match our members’ interests. To achieve this, we rely on Machine Learning (ML) algorithms. ML algorithms can be only as good as the data that we provide to them. This post will focus on the large volume of high-quality data stored in Axion, our fact store that is leveraged to compute ML features offline.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.