Introductory Pandas Tutorial
KDnuggets
MARCH 31, 2022
A gentle introduction to data analysis with Pandas.
Data Engineering Podcast
MARCH 27, 2022
Summary: Data governance is a practice that requires a high degree of flexibility and collaboration at both the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds further stress to an already complex endeavor. Privacera is an enterprise-grade solution for cloud and hybrid data governance, built on top of the robust, battle-tested Apache Ranger project.
Cloudera
MARCH 30, 2022
Sometimes it takes a billion-dollar mistake to bring the murkier side of data ethics into sharp focus. Equifax found this out to their own cost in 2017 when they failed to protect the data of almost 150 million users globally. The catastrophic breach was bad enough on its own — but Equifax waited three months to go public with the news. As the public furore rose to a crescendo, the credit organization dragged its feet on disclosing exactly what kind of information had been leaked.
Teradata
MARCH 31, 2022
In honor of Women's History Month, we are spotlighting Molly Treese, Teradata's Chief Legal Officer, as she looks back at her career in law & recounts the importance of inclusion in the workplace.
KDnuggets
MARCH 31, 2022
Let's revisit the automated machine learning project TPOT, and get back up to speed on using open source AutoML tools on our way to building a fully-automated prediction pipeline.
Data Engineering Podcast
MARCH 27, 2022
Summary: At the foundational layer, many databases and data processing engines rely on key/value storage to manage the layout of information on disk. RocksDB is one of the most popular choices for this component and has been incorporated into popular systems such as ksqlDB. As these systems scale to larger data volumes and higher throughputs, the RocksDB engine can become a performance bottleneck.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Teradata
MARCH 29, 2022
Consumers expect personalized experiences when they interact with a brand. But organizations are losing the ability to listen to their customers via digital channels. Fixing this is critical.
KDnuggets
MARCH 30, 2022
Create a data science learning path with courses from the world’s most prestigious university.
Confluent
MARCH 30, 2022
From fraud detection and predictive analytics, to real-time customer experiences and cyber security, stream processing has countless benefits for use cases big and small. By unlocking the power of continuous […].
Elder Research
APRIL 1, 2022
The post Get with the Times appeared first on Elder Research.
Monte Carlo
APRIL 1, 2022
Today, on April 1st, Monte Carlo announced the release of Data Observability Small Batch, a next-generation platform for locally sourced, small-batch data. The solution was painstakingly crafted by artisan developers to serve a new wave of data engineers who are nostalgic for data platforms the way they used to be. “The world is tired of over-processed data, mass-marketed to them in over-hyped dashboards,” says Barr Moses, CEO, Monte Carlo.
Rock the JVM
MARCH 31, 2022
Akka, Cats, and Cassandra in a larger Scala project integrating multiple pieces of the Scala ecosystem
Rockset
MARCH 31, 2022
I’ve been working as a data and software engineer for more than 20 years. Not long after I joined my current employer, Sounding Board, I had to normalize nested JSON arrays in a complex document schema so that I could join the child records to other collections and then denormalize the data into a single result set, and I had to do it fast. On top of that, I had to make that data available to our custom-built application via a secure RESTful endpoint with a sub-second response time.
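The normalize-then-join step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation (the teaser does not show any code): the `engagements` and `coaches` collections, the nested `sessions` array, and the helpers `unnest` and `inner_join` are all hypothetical names chosen for the example.

```python
from typing import Any

Record = dict[str, Any]


def unnest(docs: list[Record], array_field: str, parent_key: str = "id") -> list[Record]:
    """Flatten a nested JSON array: emit one row per child, tagged with its parent's key."""
    rows: list[Record] = []
    for doc in docs:
        for child in doc.get(array_field, []):
            row = {f"parent_{parent_key}": doc[parent_key]}
            row.update(child)  # copy the child's own fields alongside the parent key
            rows.append(row)
    return rows


def inner_join(left: list[Record], right: list[Record], key: str) -> list[Record]:
    """Hash join on `key`: denormalize matching right-hand fields into each left row."""
    index: dict[Any, list[Record]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined: list[Record] = []
    for left_row in left:
        for right_row in index.get(left_row.get(key), []):
            merged = dict(left_row)
            merged.update({k: v for k, v in right_row.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical sample data: parent documents with a nested "sessions" array.
engagements = [
    {"id": "e1", "sessions": [{"coach_id": "c1", "minutes": 30},
                              {"coach_id": "c2", "minutes": 45}]},
    {"id": "e2", "sessions": [{"coach_id": "c1", "minutes": 60}]},
]
coaches = [{"coach_id": "c1", "name": "Ana"}, {"coach_id": "c2", "name": "Ben"}]

result = inner_join(unnest(engagements, "sessions"), coaches, key="coach_id")
for row in result:
    print(row)
```

Building a dictionary index on the join key keeps the join roughly linear in the size of both inputs; in a real system this work would more likely be pushed down into the database's query engine than done in application code.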
KDnuggets
MARCH 28, 2022
Let me walk you through the top 13 data science skills that you should have to become a successful data scientist. Following this outline, you’ll have a great path of digestible steps to educate yourself and be prepared to apply for data scientist positions.
U-Next
MARCH 31, 2022
Let’s face it: product management can be tough, but only if you haven’t found the right training. The PG Certificate Program in Product Management by IIM Indore & Jigsaw is built for product enthusiasts. Several of today’s product experts started their journeys with this exclusive six-month program and found many doors of opportunity open to them.
DareData
MARCH 30, 2022
Data developers are often constrained by the data they were given. In data science and data engineering projects, it is not unusual to add extra data from other sources, even ones outside the organization. But that extra data can arrive in non-standard formats that require new processing techniques. For example, when it comes to geographical data, many governments provide open spatial data about their territory.
A Cloud Guru: Data Engineering
MARCH 30, 2022
Considering your options when it comes to Google Cloud (GCP) certification paths? This post covers the various GCP certifications, what each one covers, what it could mean for your career, and how you can set (and achieve) your own personal goals. The post Which Google Cloud certification is best for me?
KDnuggets
MARCH 31, 2022
The most ambitious Enterprise MLOps conference is coming to New York City on May 5-6, bringing the data science community together in-person for a one-of-a-kind event. This year, you’ll hear from 30+ speakers across dozens of industries. Save 50% with the promo code KDN. Register now!
U-Next
MARCH 31, 2022
With the onset of the 5th industrial revolution, the world is moving closer to embracing new technologies in almost every walk of life. In business, those who upskill and transform into the best professional versions of themselves are bound to be at the forefront of this revolution. Sales, too, cannot rely on traditional methods for much longer.
Monte Carlo
MARCH 30, 2022
Say it with me: your data will never be perfect. Any team striving for completely accurate data will be sorely disappointed. Data testing, anomaly detection, and cataloging are important steps, but technology alone will not solve your data quality problem. Like any entropic system, data breaks. And as we’ve learned building solutions to curb the causes and downstream impact of data issues, it happens more often than you think.
Booking.com Engineering
MARCH 30, 2022
Introduction: We are going to review the subtleties and complications of comparing objects for equality in Java: where the problem originates, why it matters, Kotlin’s approach to the problem, and some recommendations on the topic. Determining whether two entities are the same is a fundamental operation in mathematics, and in programming we implement it through the weaker notion of equivalence; the difference is that we are content with equality across a specific subset of properties […]
KDnuggets
APRIL 1, 2022
Several factors must be taken into consideration when designing experiments for data collection.
Rockset
MARCH 29, 2022
At Dimona, a leading Latin American apparel company founded 55 years ago in Brazil, our business is t-shirts. We design them, manufacture them, and sell them to consumers online and through our five retail stores in Rio de Janeiro. We also supply B2B companies for their customers in Brazil and the United States. We’ve come a long way since 2011, when I joined Dimona to launch our first website.
Monte Carlo
MARCH 29, 2022
You may not have heard the term data downtime, but I’m willing to bet you’ve experienced it and the cost of bad data firsthand. Urgent ping from your CEO about “missing data” in a critical report? Duplicate tables wreaking havoc in your Snowflake warehouse, all titled some variation of “Mikes_Table_GOOD-V3.”? Or, perhaps you’ve unintentionally made a decision based on bad data from last year’s forecasts?
Elder Research
MARCH 28, 2022
The post Previewing the MYOB Podcast appeared first on Elder Research.
KDnuggets
MARCH 31, 2022
A loss function measures how wrong the model is in terms of its ability to estimate the relationship between x and y. Find out about several common loss functions here.
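The teaser does not say which loss functions the article covers, so the three below are an assumption: standard choices (mean squared error, mean absolute error, and binary cross-entropy) written in plain Python to show how each one turns the gap between true values `y` and predictions into a single number. The function names and sample values are illustrative, not taken from the article.

```python
import math


def _check(y_true, y_pred):
    # Both sequences must be non-empty and aligned element by element.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")


def mse(y_true, y_pred):
    """Mean squared error: squaring makes large misses dominate the loss."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss for 0/1 labels; p_pred holds predicted probabilities of class 1."""
    _check(y_true, p_pred)
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


y = [3.0, 5.0, 2.0]
y_hat = [2.5, 5.0, 4.0]
print(mse(y, y_hat))  # (0.25 + 0 + 4) / 3
print(mae(y, y_hat))  # (0.5 + 0 + 2) / 3
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

With these sample predictions, the single miss of 2 contributes 4 of the 4.25 total squared error, which illustrates why MSE reacts more strongly to outliers than MAE does.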
AltexSoft
MARCH 28, 2022
Vacation and short-term rentals are experiencing a post-COVID renaissance. The data clearly shows a stable, worldwide increase in demand for alternative accommodations, from apartments to farm stays to igloos. The data also indicates that more and more companies in the sector tie their bright future to… data. According to the Global Vacation Rental Report 2022, 40 percent of property managers rely on market business intelligence (BI) or analytics services, a big leap compared to just 13 per […]
Confluent
MARCH 28, 2022
When Sanjana Kaundinya chose Confluent for her first job out of college, she was eager to learn as much as possible—and in the two years since, that’s exactly what she’s […].