Sat, Oct 15, 2022 - Fri, Oct 21, 2022

Pollen’s enormous debt left behind: exclusive details

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe. Pollen, the events festival tech startup, went bankrupt in August after raising more than $200M in venture funding. In an exclusive investigative article, I covered the events and details leading up to this bankruptcy.

Rust for Data Engineering

Simon Späti

Will Rust kill Python for Data Engineers? If you only came here to know this, my answer is no. Betteridge’s Law strikes again! But then again, you have to ask: was Python made for Data Engineering in the first place? Rust may not replace Python outright, but it has taken over more and more of the JavaScript tooling ecosystem, and a growing number of projects are trying to do the same for Python and Data Engineering.

Independent Anniversary

Jesse Anderson

I have a calendar reminder that tells me when I founded Big Data Institute. It just told me I founded the company eight years ago. The reminder is called “Independent Anniversary.” It’s the day I split off and executed my vision for an independent, big data consulting company. Independence has all sorts of manifestations. For you, it’s an independent look at technology and vendors from someone who’s worked at a vendor (Cloudera) and worked in distributed systems for even longer.

Frameworks for Approaching the Machine Learning Process

KDnuggets

This post summarizes two distinct frameworks for approaching machine learning tasks, followed by a distilled third. Do they differ considerably (or at all) from each other, or from other such processes available?

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

Data Engineering Podcast

The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Dremio is one of the companies leading the development of products and services that support the open lakehouse. In this episode, Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team builds and contributes to.

Working With Sparse Features In Machine Learning Models

KDnuggets

Sparse features can cause problems like overfitting and suboptimal results in learning models, and understanding why this happens is crucial when developing models. Multiple methods, including dimensionality reduction, are available to overcome issues due to sparse features.
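
As a rough illustration of what "sparse" means in practice, the Python sketch below measures how often each feature is non-zero and drops the features that almost never are. This density filter is a simple feature-selection step swapped in for illustration; it is not one of the dimensionality-reduction methods the article discusses. The function names `feature_density` and `drop_sparse_features` and the `min_density` threshold are hypothetical.

```python
def feature_density(rows):
    """Return, for each column, the fraction of rows where it is non-zero."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    counts = [0] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                counts[j] += 1
    return [c / n_rows for c in counts]


def drop_sparse_features(rows, min_density):
    """Keep only columns that are non-zero in at least `min_density` of rows.

    Returns the indices of the kept columns and the reduced rows.
    """
    density = feature_density(rows)
    keep = [j for j, d in enumerate(density) if d >= min_density]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Hypothetical toy data: 4 samples, 4 features
rows = [
    [1, 0, 0, 3],
    [0, 0, 0, 1],
    [2, 0, 5, 0],
    [1, 0, 0, 2],
]
print(feature_density(rows))  # [0.75, 0.0, 0.25, 0.75]
keep, reduced = drop_sparse_features(rows, min_density=0.5)
print(keep)  # [0, 3]
```

Columns 1 and 2 are non-zero in 0% and 25% of rows, so a 0.5 threshold discards them. Filtering like this throws information away; for large, genuinely sparse matrices, a dimensionality-reduction method such as truncated SVD is a common alternative that compresses features instead of deleting them.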

Speeding Up The Time To Insight For Supply Chains And Logistics With The Pathway Database That Thinks

Data Engineering Podcast

Logistics and supply chains have come under increased stress and scrutiny in recent years. To stay ahead of customer demands, businesses need to react quickly and intelligently to changes, which requires fast and accurate insights into their operations. Pathway is a streaming database engine that embeds artificial intelligence into the storage, with functionality designed to support the spatiotemporal data that is crucial for shipping and logistics.

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

We say ‘xerox’ when speaking of any photocopy, whether or not it was made by a Xerox Corporation machine. We describe searching for information on the Internet with just one word: ‘google’. We ‘photoshop pictures’ instead of editing them on the computer. And COVID-19 made ‘zoom’ a synonym for a videoconference. Kafka can continue the list of brand names that became generic terms for an entire type of technology.

Cloudera Uses CDP to Reduce IT Cloud Spend by $12 Million

Cloudera

Like all of our customers, Cloudera depends on the Cloudera Data Platform (CDP) to manage our day-to-day analytics and operational insights. Many aspects of our business live within this modern data architecture, providing all Clouderans the ability to ask, and answer, important questions for the business. Clouderans continuously push for improvements in the system, with the goal of driving up confidence in the data.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

7 Free Platforms for Building a Strong Data Science Portfolio

KDnuggets

Outshine others and increase your odds of getting hired by maintaining a data science portfolio with projects, resumes, blogs, and reports.

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Netflix Tech

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, and Pawan Dixit. At Netflix, data and machine learning (ML) pipelines are widely used and have become central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations. A large number of batch workflows run daily to serve various business needs.

5 Steps To A Successful Data Warehouse Migration

Monte Carlo

Platform and data warehouse migrations aren’t something you do every day or even every few years, but they’re becoming much more frequent as organizations seek to modernize their data infrastructure with the new capabilities offered by Snowflake, Databricks, Google, AWS, and others. [Editor’s note: We agree. Cloud database migrations were listed in our latest ebook, The 22 Hottest Trends In Data Right Now.] Migrations are like Schrödinger’s cat.

Public or On-Prem? Telco giants are optimizing the network with the Hybrid Cloud

Cloudera

The telecommunications industry continues to develop hybrid data architectures to support data workload virtualization and cloud migration. However, while the promise of the cloud remains essential, not just for data workloads but also for network virtualization and B2B offerings, the sheer volume and scale of data in the industry require careful management of the “journey to the cloud.”

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

A Data Science Portfolio That Will Land You The Job in 2022

KDnuggets

Check out this article on crafting a data science portfolio that will get you that job. And learn 4 resume mistakes to avoid at any cost.

Public SQL Endpoints in Rockset

Rockset

Making use of real-time data for analytics is a deeply collaborative project. We’ve helped data engineers, data architects, engineering leaders, ML teams, and product managers connect the dots between various systems to deliver on Rockset’s promise of fast queries on fresh data. Not only are we collaborating with customers on analytics projects, we use our own product daily and collaborate across teams internally.

Data and Analytics Keep the Wheels on the Bus!

Teradata

The complexity of modern vehicles makes it difficult to spot the root causes that prevent them from working. Mechanics, operators, and OEMs must step into a new era of digital, data-based diagnostics.

Cybersecurity: A Big Data Problem

Cloudera

Information technology has been at the heart of governments around the world, enabling them to deliver vital citizen services, such as healthcare, transportation, employment, and national security. All of these functions rest on technology and share a valuable commodity: data. Data is produced and consumed in ever-increasing amounts and therefore must be protected.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

5 Free Courses to Master Calculus

KDnuggets

Calculus is one of the foundational pillars of understanding the mathematics behind machine learning algorithms. The post shares five free courses to help you master calculus and learn its real-world applications.

Building Real-Time Recommendations with Kafka, S3, Rockset and Retool

Rockset

Real-time customer 360 applications are essential in allowing departments within a company to have reliable and consistent data on how a customer has engaged with the product and services. Ideally, when someone from a department has engaged with a customer, you want up-to-date information so the customer doesn’t get frustrated and repeat the same information multiple times to different people.

Hypothesis Testing: A Step-by-Step Guide With Easy Examples

U-Next

When we hear the word ‘hypothesis,’ the first thing that comes to mind is a kind of theory. Assuming and explaining theories is a fundamental part of Business Analytics. In the past few years, the field of Business Analytics has grown rapidly and made several advancements. As the number of people interested in its statistical applications in business has increased, the concept of hypothesis testing has grabbed everyone’s attention.

Using Kafka Connect Securely in the Cloudera Data Platform

Cloudera

In this post, I will demonstrate how Kafka Connect is integrated in the Cloudera Data Platform (CDP), allowing users to manage and monitor their connectors in Streams Messaging Manager, while also touching on security features such as role-based access control and sensitive information handling. If you are a developer moving data in or out of Kafka, an administrator, or a security expert, this post is for you.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

KDnuggets Top Posts for September 2022: Free Python for Data Science Course

KDnuggets

Free Python for Data Science Course • 7 Machine Learning Portfolio Projects to Boost the Resume • Free Algorithms in Python Course • How to Select Rows and Columns in Pandas • 5 Data Science Skills That Pay & 5 That Don't • Everything You’ve Ever Wanted to Know About Machine Learning • Free SQL and Database Course • 7 Data Analytics Interview Questions & Answers.

React SEO: How To Optimize React Websites for SEO

Trio

React enables much of the modern web you’re familiar with: fluid, responsive, and animation-rich websites. It’s no wonder that React.js is the most used JavaScript framework for web development, according to the 2021 State of JavaScript survey.

What Is Data Collection? Methods, Types, Tools, and Techniques

U-Next

The primary goal of data collection is to gather high-quality information that provides answers to all of the open-ended questions. Businesses and management can obtain high-quality information by collecting the data they need to make educated decisions. Gathering data is necessary to draw conclusions and decide what is factual, which increases the quality of the information.

Apache Hop 2.1.0 is available

know.bi

The Apache Hop team just released version 2.1.0. This new release is the result of four and a half months of work on over 200 tickets and comes packed with new functionality, bug fixes and improvements.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

Implementing Adaboost in Scikit-learn

KDnuggets

It is called Adaptive Boosting because the weights assigned to each instance are updated after every round, with higher weights given to instances that were misclassified; in this sense, the algorithm ‘adapts’.
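
To make the reweighting concrete, here is a minimal from-scratch Python sketch of one round of discrete (two-class) AdaBoost, assuming labels in {-1, +1} and a weak learner whose predictions have already been computed. It illustrates the classic update rule rather than scikit-learn's internals (`AdaBoostClassifier` builds on the multi-class SAMME algorithm); the function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting (hypothetical helper).

    weights: current instance weights, summing to 1
    y_true, y_pred: labels in {-1, +1}
    Returns (alpha, new_weights).
    """
    # Weighted error: total weight of the instances the weak learner got wrong
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clamp to avoid log(0) or division by zero for a perfect (or perfectly wrong) learner
    err = min(max(err, 1e-10), 1 - 1e-10)
    # The learner's vote: the lower its error, the larger its say
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a mistake (weight scaled up by e^alpha)
    # and +1 for a correct prediction (weight scaled down by e^-alpha)
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    # Renormalize so the weights form a distribution again
    return alpha, [w / total for w in new]


alpha, new_weights = adaboost_round(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[1, 1, -1, -1],
    y_pred=[1, -1, -1, -1],  # the second instance is misclassified
)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
# 0.5493 [0.1667, 0.5, 0.1667, 0.1667]
```

With uniform starting weights and one mistake out of four, the weighted error is 0.25, so alpha = 0.5 * ln(3) ≈ 0.549. The misclassified instance's weight rises from 0.25 to 0.5 while each correct one drops to about 0.167, which pushes the next weak learner to focus on the instance the previous one got wrong.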

PostgreSQL vs. MySQL: 10 Key Differences 

Meltano

PostgreSQL and MySQL are among the most popular open-source relational database management systems (RDBMSs) worldwide. Both enable businesses to organize and interlink large amounts of data, allowing for effective data management. For all their similarities, PostgreSQL and MySQL differ from one another in many ways. In this PostgreSQL vs. MySQL comparison, we analyze the crucial differences between the two database management systems to discover how they work and when to use them.

Building Properties with AWS Step Functions

Booking.com Engineering

Developers love new technologies and are always eager to try things hands on. As a community, we thrive on problems that pave the way to new learning opportunities. Maintaining the operations of a legacy system is tedious. Developing a new feature on top of this is another challenge altogether. This post is about one such legacy process that is responsible for creating new properties on our platform, namely PropertyBuilder.

Stronger together: Python, dataframes, and SQL

dbt Developer Hub

For years working in data and analytics engineering roles, I treasured the daily camaraderie sharing a small office space with talented folks using a range of tools - from analysts using SQL and Excel to data scientists working in Python. I always sensed that there was so much we could work on in collaboration with each other - but siloed data and tooling made this much more difficult.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.