Introductory Pandas Tutorial
KDnuggets
MARCH 31, 2022
A gentle introduction to data analysis with Pandas.
Start Data Engineering
MARCH 18, 2022
It can be difficult to know where to begin when starting a data engineering side project. If you have wondered What data to use for your data project?
Data Engineering Podcast
MARCH 27, 2022
Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor. Privacera is an enterprise grade solution for cloud and hybrid data governance built on top of the robust and battle tested Apache Ranger project.
Confluent
MARCH 9, 2022
Imagine your team wants to design a data streaming architecture and you’re in charge of creating the prototype. Within a few minutes, you provision a fully managed Apache Kafka® cluster […].
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Cloudera
MARCH 24, 2022
Earlier this month, the multi-national carrier MTN announced a rebranding, and along with its logo refresh, announced that it was moving to focus on being a technology provider. The new look “aligns with our evolution from a telecommunications company to a technology company,” said Nompilo Morafo, Chief Corporate Affairs officer at the company. Across APAC too, telcos are looking at the shift to becoming technology companies, and last week’s TMForum Leadership Summit “The Tech Driven Telco” s
Teradata
MARCH 24, 2022
Teradata stopped conducting business in Russia earlier this month, and has ceased customer interactions & services with all Russian accounts. Teradata fully supports & is complying with all sanctions.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Monte Carlo
MARCH 14, 2022
The companies we talk to are diligently building their data product or platform. This includes migrating to Snowflake, integrating with Databricks, moving towards a data mesh, or generally investing in their data stack. Increasingly, we are seeing data departments modernize their team structure with data product managers at the helm of such projects.
Data Engineering Podcast
MARCH 27, 2022
Summary At the foundational layer many databases and data processing engines rely on key/value storage for managing the layout of information on the disk. RocksDB is one of the most popular choices for this component and has been incorporated into popular systems such as ksqlDB. As these systems are scaled to larger volumes of data and higher throughputs the RocksDB engine can become a bottleneck for performance.
Confluent
MARCH 15, 2022
We are excited to announce ksqlDB 0.24! It comes with a slew of improvements and new features. Access to Apache Kafka® record headers will enable a whole host of new […].
Cloudera
MARCH 30, 2022
Sometimes it takes a billion-dollar mistake to bring the murkier side of data ethics into sharp focus. Equifax found this out to their own cost in 2017 when they failed to protect the data of almost 150 million users globally. The catastrophic breach was bad enough on its own — but Equifax waited three months to go public with the news. As the public furore rose to a crescendo, the credit organization dragged its feet on disclosing exactly what kind of information had been leaked.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Teradata
MARCH 31, 2022
In honor of Women's History Month, we are spotlighting Molly Treese, Teradata's Chief Legal Officer, as she looks back at her career in law & recounts the importance of inclusion in the workplace.
KDnuggets
MARCH 25, 2022
GitHub's Copilot code generation tool is currently only available via approved request. Here are 4 Copilot alternatives that you can use in your programming today.
Rock the JVM
MARCH 31, 2022
Akka, Cats, and Cassandra in a larger Scala project integrating multiple pieces of the Scala ecosystem
Data Engineering Podcast
MARCH 20, 2022
Summary Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user product creates a significant shift in requirements for your data layer. The Pinot OLAP datastore was created for this purpose, optimizing for low latency queries on rapidly updating datasets with highly concurrent queries.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Confluent
MARCH 25, 2022
Logging is an important component of managing service availability, security, and customer experience. It allows Site Reliability Engineers (SREs), developers, security teams, and infrastructure teams to gain insights into how […].
Cloudera
MARCH 10, 2022
As an official sponsor of International Women’s Day, Cloudera is excited to celebrate Women’s History Month and International Women’s Day, and to take up the mantle of this year’s theme #BreakTheBias. Even in industries where women are underrepresented, like tech, women have made a lot of progress. Progress over many decades has slowly transformed the workplace into an environment where women’s strengths are recognized and valued.
Teradata
MARCH 29, 2022
Consumers expect personalized experiences when they interact with a brand. But organizations are losing the ability to listen to their customers via digital channels. Fixing this is critical.
KDnuggets
MARCH 25, 2022
Most companies look at it like it’s one big technology, and assume the vendors’ offerings might differ in product quality and price but ultimately be largely the same. Truth is, NLP is not one thing; it’s not one tool, but rather a toolbox.
Rockset
MARCH 31, 2022
I’ve been working as a data and software engineer for more than 20 years. Not long after I joined my current employer Sounding Board, I had to normalize nested JSON arrays in a complex document schema so that I could join the child records to other collections and then denormalize data into a single result set, and I had to do it fast. On top of that, I had to make that data available to our custom-built application via a secure RESTful endpoint with a less than one second response time.
Data Engineering Podcast
MARCH 20, 2022
Summary Data assets and the pipelines that create them have become critical production infrastructure for companies. This adds a requirement for reliability and management of up-time similar to application infrastructure. In this episode Francisco Alberini and Mei Tao share their insights on what incident management looks like for data platforms and the teams that support them.
Confluent
MARCH 30, 2022
From fraud detection and predictive analytics, to real-time customer experiences and cyber security, stream processing has countless benefits for use cases big and small. By unlocking the power of continuous […].
Cloudera
MARCH 8, 2022
Bias is everywhere. We’re surrounded by it. And it’s natural. We are alive today as a species because of biases. But it has a tangible impact on our personal and professional lives. Biases shape us and our experience. As primary caregivers, women have felt the impact of biases and expectations more keenly during the pandemic. Last year women in my network felt like they were being expected to do everything at home and at work.
Teradata
MARCH 24, 2022
In honor of Women's History Month, we are spotlighting Claire Bramley, Teradata's Chief Financial Officer, as she looks back at her career in finance and tech.
KDnuggets
MARCH 31, 2022
Let's revisit the automated machine learning project TPOT, and get back up to speed on using open source AutoML tools on our way to building a fully-automated prediction pipeline.
U-Next
MARCH 31, 2022
Let’s face it! Product Management CAN BE TOUGH, but only if you haven’t laid your hands on the best training experience for Product enthusiasts in all its glory: the PG Certificate Program in Product Management by IIM Indore & Jigsaw. Several present-day Product Experts started their journeys with this exclusive 6-month program & found multiple doors of opportunities, wide open to welcome them.
Data Engineering Podcast
MARCH 13, 2022
Summary Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it as a holistic approach to understanding the computational and logical elements that power your analytical capabilities. In this episode Tristan Spaulding, head of product at Acceldata, explains the multi-dimensional nature of gaining visibility into your running data platform and how they have architected their platform to assist in that endeavor.
Confluent
MARCH 10, 2022
Decentralized architectures continue to flourish as engineering teams look to unlock the potential of their people and systems. From Git, to microservices, to cryptocurrencies, these designs look to decentralization as […].
Cloudera
MARCH 11, 2022
With the launch of CDP Public Cloud 7.2.14, Cloudera Streams Messaging for Data Hub deployments has gotten some powerful new features! In this release, the Streams Messaging templates in Data Hub will come with Apache Kafka 2.8 and Cruise Control 2.5, providing new core features and fixes. KConnect has been added and gains additional capabilities with new connectors and Stateless Apache NiFi capabilities which can run NiFi Flows as connectors.
Teradata
MARCH 16, 2022
Neither crystal balls nor black boxes will provide the agility needed for accurate demand forecasting in today’s retail & CPG environment. Learn more about new approaches to FDP.
KDnuggets
MARCH 24, 2022
A tensor is a container which can house data in N dimensions, along with its linear operations, though there is nuance in what tensors technically are and what we refer to as tensors in practice.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!