5 Tricky SQL Queries Solved
KDnuggets
AUGUST 19, 2022
Explaining the approach to solving a few complex SQL queries.
Confluent
AUGUST 17, 2022
Confluent Hackathon ‘22: Using Apache Kafka, a Raspberry Pi, and a camera, Simon Aubury builds a detection and monitoring system to better understand wildlife population trends over time.
Cloudera
AUGUST 17, 2022
What if you could access all your data and execute all your analytics in one workflow, quickly, with only a small IT team? CDP One is a new service from Cloudera that is the first data lakehouse SaaS offering with cloud compute, cloud storage, machine learning (ML), streaming analytics, and enterprise-grade security built in. Data practitioners can now produce end-to-end analytic pipelines through one service.
Data Engineering Podcast
AUGUST 13, 2022
Summary: Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first solution to this problem, but they are only helpful if you know what you are looking for. In this episode Shinji Kim discusses the challenges of data discovery and how to collect and preserve additional context about each piece of information so that you can find what you need when you don’t even know what you’re looking for yet.
KDnuggets
AUGUST 15, 2022
ETL sits at the foundation of producing effective machine learning algorithms. Let’s go through the steps that show why ETL is important to machine learning.
Teradata
AUGUST 17, 2022
In conversations with C-level executives at banks and financial institutions, one theme always crops up: how do we change our operating model to be more agile and customer-focused in a digital-first world?
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Data Engineering Podcast
AUGUST 13, 2022
Summary: Data engineers have typically left data labeling to data scientists or other roles because it is a manual, process-heavy undertaking, focusing instead on building automation and repeatable systems. Watchful is a platform that makes labeling a repeatable, scalable process by codifying domain expertise.
KDnuggets
AUGUST 18, 2022
If you’re considering a career in data science, it’s important to understand how these two fields differ, and which one might be more appropriate for someone with your skills and interests.
Cloudera
AUGUST 15, 2022
Cloudera’s August Volunteer Spotlight is Thatiane Freire, account executive for the public sector, located in Brasília, Brazil, and one of the company’s Cloudera Cares Ambassadors. Thatiane volunteers with a local organization called Casa de Caridade Inacio Daniel, which began with the goal of meeting the day-to-day, foundational needs of the homeless community in Brasília.
dbt Developer Hub
AUGUST 17, 2022
When running a job that has over 1,700 models, how do you know what a “good” runtime is? If the total process takes 3 hours, is that fantastic or terrible? While there are many possible answers depending on dataset size, complexity of modeling, and historical run times, the crux of the matter is normally “did you hit your SLAs?” However, in the cloud computing world where bills are based on usage, the question is really “did you hit your SLAs and stay within budget?”
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Confluent
AUGUST 19, 2022
Learn the basics of Kafka security, including authentication, authorization, encryption, and audit logs for compliant, secure data streaming within any Kafka system.
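As a rough illustration of the authentication and encryption pieces, the sketch below builds Java client properties for a SASL_SSL connection. The property keys are standard Kafka client settings, but the choice of the SCRAM-SHA-512 mechanism, the broker address, the truststore path, and the credentials are hypothetical placeholders, and none of it is taken from the linked article. It uses only `java.util.Properties`, so it runs without the Kafka client library.

```java
import java.util.Properties;

public class SecureClientConfig {
    // Builds client properties for an encrypted, authenticated connection.
    // All argument values passed in main are hypothetical placeholders.
    static Properties secureProps(String bootstrap, String user, String password,
                                  String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        // TLS for encryption in transit, SASL for authentication
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"" + user + "\" password=\"" + password + "\";");
        // Truststore used to verify the brokers' TLS certificates
        props.put("ssl.truststore.location", truststorePath);
        props.put("ssl.truststore.password", truststorePassword);
        return props;
    }

    public static void main(String[] args) {
        Properties p = secureProps("broker1.example.com:9093", "app-user", "change-me",
                                   "/etc/kafka/client.truststore.jks", "truststore-secret");
        System.out.println(p.getProperty("security.protocol")); // SASL_SSL
    }
}
```

The resulting `Properties` object is what you would pass to a `KafkaProducer` or `KafkaConsumer` constructor. Authorization (ACLs) and audit logging are configured on the broker side and are not covered by this sketch.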
KDnuggets
AUGUST 17, 2022
Density-based clustering algorithm explained with scikit-learn code example.
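DBSCAN is the best-known density-based clustering algorithm. Assuming that is the algorithm the article covers, here is a minimal from-scratch Java sketch of its core loop; the article itself uses scikit-learn, whose `DBSCAN` class exposes the same two knobs as `eps` and `min_samples`. The class and method names are hypothetical. A point is a core point if at least `minPts` points (itself included) lie within distance `eps`; clusters grow outward from core points, and points reachable from no core point are labeled noise (-1).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class Dbscan {
    static final int NOISE = -1;
    static final int UNVISITED = Integer.MIN_VALUE;

    /** Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise. */
    static int[] cluster(double[][] points, double eps, int minPts) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> neighbors = regionQuery(points, i, eps);
            if (neighbors.size() < minPts) { labels[i] = NOISE; continue; }
            labels[i] = clusterId; // i is a core point: start a new cluster
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) labels[j] = clusterId; // border point, not expanded
                if (labels[j] != UNVISITED) continue;
                labels[j] = clusterId;
                List<Integer> jNeighbors = regionQuery(points, j, eps);
                if (jNeighbors.size() >= minPts) queue.addAll(jNeighbors); // j is core: expand
            }
            clusterId++;
        }
        return labels;
    }

    // Brute-force Euclidean neighbor search; includes the point itself.
    private static List<Integer> regionQuery(double[][] points, int idx, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < points.length; k++) {
            double sum = 0;
            for (int d = 0; d < points[idx].length; d++) {
                double diff = points[k][d] - points[idx][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) result.add(k);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {1.0, 1.0}, {1.1, 1.0}, {0.9, 1.1}, // dense group A
            {5.0, 5.0}, {5.1, 5.1}, {4.9, 5.0}, // dense group B
            {9.0, 0.0}                          // isolated point
        };
        System.out.println(Arrays.toString(cluster(pts, 0.5, 3))); // [0, 0, 0, 1, 1, 1, -1]
    }
}
```

The neighbor search is an O(n²) scan for clarity; library implementations typically speed it up with spatial indexes such as k-d trees or ball trees.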
Pinterest Engineering
AUGUST 18, 2022
Ankita Girish Wagh | Senior Software Engineer, Storage and Caching Introduction and Motivation At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near real time secondary indexing service). The HBase Ecosystem, though having various advantages like strong consistency at row level in high volume requests, flexible schema, low latency access to data, Hadoop integration, etc. canno
U-Next
AUGUST 18, 2022
Introduction: Data Science Projects for Beginners. You have your sights set on a lucrative Data Science position that literally screams “you” in the job title. You know that you possess the Data Science expertise needed for the position. The issue is that you have nothing to show for your broad Data Science skill set. Anyone can claim to be a good data scientist on their CV, but hiring managers want to see examples to support that claim.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Monte Carlo
AUGUST 17, 2022
Over the years, I’ve helped companies of all sizes build and maintain data systems, from my days as a data engineer at Facebook to my current role as an end-to-end data solutions consultant. As a YouTuber and blogger, I’ve connected with data engineers from all over the world. And these days, everyone seems to share a common concern: how do we make sure the data we rely on to make all of our important business decisions is actually reliable?
KDnuggets
AUGUST 19, 2022
Looking to sort out the difference between Type I and Type II errors? Read on for more.
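A Type I error is rejecting a null hypothesis that is actually true (a false positive); a Type II error is failing to reject a null hypothesis that is actually false (a false negative). The hypothetical Java sketch below counts both over a handful of made-up test outcomes; the data and names are illustrative only.

```java
public class ErrorTypes {
    /** Returns {typeI, typeII} counts for paired ground-truth/decision arrays. */
    static int[] countErrors(boolean[] nullIsTrue, boolean[] rejected) {
        if (nullIsTrue.length != rejected.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int typeI = 0, typeII = 0;
        for (int i = 0; i < rejected.length; i++) {
            if (nullIsTrue[i] && rejected[i]) typeI++;     // false positive
            if (!nullIsTrue[i] && !rejected[i]) typeII++;  // false negative
        }
        return new int[]{typeI, typeII};
    }

    public static void main(String[] args) {
        // Hypothetical outcomes of six tests
        boolean[] nullIsTrue = {true, true, true, false, false, false};
        boolean[] rejected   = {false, true, false, true, false, false};
        int[] counts = countErrors(nullIsTrue, rejected);
        System.out.println("Type I errors: " + counts[0]);  // 1
        System.out.println("Type II errors: " + counts[1]); // 2
    }
}
```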
Rockset
AUGUST 16, 2022
Streams for Everyone: If you have come this far, it means you have already considered or are considering using event streaming in your data architecture for the wide variety of benefits it can offer. Or perhaps you are looking for something to support a Data Mesh initiative because that’s all the rage right now. In either case, both Amazon Kinesis and Apache Kafka can help, but which one is the right fit for you and your goals?
U-Next
AUGUST 18, 2022
The power function in Java lets users work with mathematical equations and procedures. Read on to learn about it in detail. An Introduction to Power Functions in Java. Java provides a large library for calculating many complex mathematical equations and procedures. This library is the Math class, which is contained in the java.lang package.
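A minimal sketch of `Math.pow(double a, double b)`, the JDK method that raises `a` to the power `b` and always returns a `double`. Because `Math` lives in `java.lang`, no import is needed. The `intPow` helper is a hypothetical convenience wrapper, not part of the JDK; casting the result back to `long` is only safe when the true result is a whole number within `long` range.

```java
public class PowerDemo {
    // Hypothetical helper: Math.pow returns a double, so cast back to long
    // only when the result is known to be a whole number that fits in a long.
    static long intPow(int base, int exponent) {
        return (long) Math.pow(base, exponent);
    }

    public static void main(String[] args) {
        System.out.println(Math.pow(2, 10)); // 1024.0
        System.out.println(Math.pow(2, -2)); // 0.25
        System.out.println(Math.pow(5, 0));  // 1.0 (any base to the power 0)
        System.out.println(intPow(3, 4));    // 81
    }
}
```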
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Monte Carlo
AUGUST 16, 2022
When it comes to trusting your data, Monte Carlo, the creator of the data observability category, and dbt Labs, creators of dbt, are better together. “Why didn’t my job run?” “What happened to this dashboard?” “Why is this column missing?” “What went wrong with my data?!” If you’ve been on the receiving end of a broken data pipeline, these questions probably look familiar to you.
KDnuggets
AUGUST 16, 2022
This blog outlines a solution to the Kaggle Titanic challenge that employs Privacy-Preserving Machine Learning (PPML) using the Concrete-ML open-source toolkit.
Rock the JVM
AUGUST 15, 2022
Learn how to effortlessly set up an HTTP server with zio-http: the powerful HTTP library in the ZIO ecosystem
U-Next
AUGUST 18, 2022
A name “identifies” either a single thing or a particular class of objects, which can be an idea, a countable physical object, or an uncountable physical substance. For an in-depth understanding, read the full blog. Introduction to Java Identifiers. A program’s basic building blocks are variables, methods, and classes. There is no use in writing a program that does not include classes, methods, and variables.
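Java's rules for identifiers can be checked with the JDK methods `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`: a name may start with a letter, a currency symbol such as `$`, or a connecting character such as `_`, may continue with those plus digits, and must not be a reserved word. In the sketch below, the `isValidIdentifier` helper and its keyword set are hypothetical; the set is deliberately partial, and the check works on `char` values, so supplementary Unicode characters are not handled.

```java
import java.util.Set;

public class IdentifierCheck {
    // Partial reserved-word list for illustration only; the full list is in
    // the Java Language Specification. "_" has been a keyword since Java 9.
    private static final Set<String> RESERVED = Set.of(
        "class", "int", "public", "static", "void", "new", "return",
        "true", "false", "null", "_");

    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || RESERVED.contains(name)) return false;
        if (!Character.isJavaIdentifierStart(name.charAt(0))) return false;
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"total", "_count", "$price", "2ndPlace", "my-var", "class"};
        for (String c : candidates) {
            System.out.println(c + " -> " + isValidIdentifier(c));
        }
    }
}
```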
Monte Carlo
AUGUST 15, 2022
When you’re the first data hire at a startup, the sky’s the limit—and that can be incredibly overwhelming. Who do you hire first? What tools should you invest in? What KPIs should you measure? And much more. No matter how you cut it, you don’t have an instruction manual, and given how fast the data landscape is evolving, it’s hard to find (let alone follow) best practices for building a data team from scratch.
KDnuggets
AUGUST 16, 2022
High data availability may help power digital transformation, but data management systems are needed to keep that data organized and make it accessible. Read this article to see why data management is important to data science.
U-Next
AUGUST 18, 2022
Introduction to the 7 Ps of Marketing. A strategic marketing framework helps us define targets based on the existing position of a firm. The strategy outlines how those goals will be met, including the target market and the firm’s position. So we need to specify the techniques to make this strategy a reality, which is where the 7 Ps of marketing come into play.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Data Science Blog: Data Engineering
AUGUST 14, 2022
Many Process Mining projects mainly revolve around the selection and introduction of the right Process Mining tools. Relying on the right tool is of course an important aspect of a Process Mining project. Depending on whether the process analysis project is a one-time affair or daily process monitoring, different tools are pre-selected. Whether, for example, a BI system has already been established and whether a sophisticated authorization concept is required for the process analyses also play
KDnuggets
AUGUST 15, 2022
The second part covers the list of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps.
Propel Data
AUGUST 17, 2022
We've picked Recharts, Echarts, React ChartJS 2, and VISX as the best charting libraries for data visualization and data analytics in React.
U-Next
AUGUST 17, 2022
Introduction: Data Science Projects for Beginners. You have your sights set on a lucrative Data Science position that literally screams “you” in the job title. You know that you possess the Data Science expertise needed for the position. The issue is that you have nothing to show for your broad Data Science skill set. Anyone can claim to be a good data scientist on their CV, but hiring managers want to see examples to support that claim.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m