April 2021

People in Data (my favorite for Q1 2021): Taylor Brownlow (Head of Data @ Count)

François Nguyen

This is my second article on “Why do you find Data so interesting after all these years?”, and my answer is always “it is not about the subject, it is about the people”. A distinctive and instantly recognizable style: I was reading the article “Is the Tableau Era Coming to an End?”, which had no author listed, and long before the conclusion I was telling myself “this looks like an article from Taylor Brownlow”. It is clearly not easy, with so many authors on the Data topic, to have a dist…

Writing memory efficient data pipelines in Python

Start Data Engineering

Contents: Introduction; 1. Using generators (using a generator expression, using generator yield, mini-batching, reading in batches from a database, pros & cons); 2. Using distributed frameworks (pros & cons); Conclusion; Further reading; References. Introduction: If you are wondering how to write memory-efficient data pipelines in Python, or you are working with a dataset that is too large to fit into memory, then this post is for you.
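The generator and mini-batching techniques listed above can be sketched with the standard library alone. This is a minimal illustration, not the post's own code: it assumes a hypothetical orders table with id and amount columns, and it uses Python's built-in sqlite3 module as the database. Rows are pulled with fetchmany(), filtered by a generator expression, and summed without ever holding the full table in memory.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def read_rows(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[tuple]:
    """Yield rows one at a time, fetching them from the database in batches."""
    # Hypothetical table and columns, used only for illustration.
    cursor = conn.execute("SELECT id, amount FROM orders")
    while True:
        batch = cursor.fetchmany(batch_size)  # at most batch_size rows in memory
        if not batch:
            break
        yield from batch


def mini_batches(items: Iterable, size: int) -> Iterator[List]:
    """Group any iterable into lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def total_large_amounts(conn: sqlite3.Connection, threshold: float) -> float:
    # Generator expression: rows are filtered lazily as they stream in.
    large = (amount for _id, amount in read_rows(conn) if amount > threshold)
    return sum(large)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)",
        [(float(i),) for i in range(1, 11)],
    )
    print(total_large_amounts(conn, threshold=5))  # 40.0
    print(list(mini_batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
    conn.close()
```

Peak memory here is bounded by batch_size rows rather than by the size of the table; the trade-off is more round trips to the database, so the batch size is a tuning knob rather than a fixed value.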

Flipr: Making Changes Quickly and Safely at Scale

Uber Engineering

Uber’s many software systems require a high volume of changes every day. Because of our systems’ size and complexity, it is a significant challenge to implement these changes without unintended consequences, ultimately slowing down developer productivity. Flipr is a … The post Flipr: Making Changes Quickly and Safely at Scale appeared first on Uber Engineering Blog.

What’s New in Apache Kafka 2.8

Confluent

I’m proud to announce the release of Apache Kafka 2.8.0 on behalf of the Apache Kafka® community. The 2.8.0 release contains many new features and improvements. This blog post highlights […].

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

Summary: The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode Maxime Beauchemin discusses how data engineers can use Superset to provide self service access to data and deliver analytics.

Drinking our own champagne – Cloudera upgrades to CDP Private Cloud

Cloudera

Like most of our customers, Cloudera’s internal operations rely heavily on data. For more than a decade, Cloudera has built internal tools and data analysis primarily on a single production CDH cluster. This cluster runs workloads for every department – from real-time user interfaces for Support to providing recommendations in the Cloudera Data Platform (CDP) Upgrade Advisor to analyzing our business and closing our books.

How to gather requirements to re-engineer a legacy data pipeline

Start Data Engineering

Contents: Introduction; Gathering requirements (0. Understand the current state of the data pipeline; 1. Think like the end user; 2. Know the why; 3. End user interviews; 4. Reduce the scope; 5. End user walkthrough for proposed solution; 6. Timelines & deliverables); Deliver iteratively; Conclusion; Further reading; References. Introduction: As data engineers, you will have to re-engineer legacy data pipelines.

Making Customer Experience Your Competitive Advantage

Teradata

Customers expect organizations to know them, provide relevant & personalized experiences, and be good stewards of their data. Yet many businesses still struggle with this. Why?

How to Survive a Kafka Outage

Confluent

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Moving Machine Learning Into The Data Pipeline at Cherre

Data Engineering Podcast

Summary: Most of the time, when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served…
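The episode describes an NLP-based system; the sketch below is not that system. It is a deliberately simple, rule-based stand-in that shows the core idea behind resolving messy addresses: normalize each raw string to a canonical key, then treat records that share a key as the same entity. The abbreviation table and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "apartment": "apt",
    "north": "n",
    "south": "s",
    "east": "e",
    "west": "w",
}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and map long forms to abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def resolve(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Group record ids whose addresses normalize to the same key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, address in records.items():
        groups[normalize_address(address)].append(record_id)
    return dict(groups)


if __name__ == "__main__":
    print(resolve({
        "a": "123 Main Street, Apt. 4B",
        "b": "123 MAIN ST APT 4B",
        "c": "9 Elm Road",
    }))
    # {'123 main st apt 4b': ['a', 'b'], '9 elm rd': ['c']}
```

Exact matching on a normalized key only catches variants the rules anticipate, which is one reason a team might move to a learned model for this step.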

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speakers: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Relationship intelligence will shape the workplace of the future

Cloudera

Our latest Influential Women in Data session featured Brenda Le Sueur from Cambridge Assessments. Brenda has worked across many organisations and continents, but what has always been crucial to her is relationships – how we cultivate them, how we nurture them and how they, in turn, define us. I sat down with Brenda to ask her about her journey as a woman in tech and understand more about the impact of relationships on our career.

DataOps Enables Your Data Fabric

DataKitchen

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

7 Things that Make SQLite Unique and Awesome

Grouparoo

I became very close with SQLite in the few weeks it took me to build out Grouparoo's SQLite plugin. Through that process I came to find that SQLite is not like the others. It has a handful of quirks, caveats, and gotchas when compared to other databases like MySQL and PostgreSQL. Here are seven of those quirks that I find most interesting: 1. SQLite is serverless. Unlike other databases, SQLite doesn't require a separate server process to run.
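To make the "serverless" point concrete, here is a small sketch using Python's standard sqlite3 module (the article itself concerns a Grouparoo plugin, not Python). The database lives in an ordinary file, and "connecting" simply opens that file inside the current process; there is no server to install, start, or authenticate against. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical file location; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# The SQLite engine runs inside this process and reads/writes the file directly.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
conn.commit()
conn.close()

# A second, independent connection sees the committed data straight from the file.
conn2 = sqlite3.connect(db_path)
rows = conn2.execute("SELECT name FROM users").fetchall()
conn2.close()
print(rows)  # [('ada',)]
```

Passing ":memory:" instead of a path gives a private, in-process database that disappears when the connection closes.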

Meet the New Analytics Superhero - The CFO

Teradata

The CFO’s broad remit & natural ownership of core financial data can provide the foundation for an enhanced role that leverages data analytics to enable new value opportunities.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Building the Confluent UI with React Hooks – Benefits and Lessons Learned

Confluent

Updating a fundamental paradigm in your React app can be as easy as search and replace, or at other times, as difficult as convincing your entire frontend engineering to buy […].

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand

Data Engineering Podcast

Summary: “Business as usual” is changing, with more companies investing in data as a first-class concern. As a result, the data team is growing and introducing more specialized roles. In this episode Josh Benamram, CEO and co-founder of Databand, describes the motivations for these emerging roles, how these positions affect the team dynamics, and the types of visibility that they need into the data platform to do their jobs effectively.

Apache Ozone and Dense Data Nodes

Cloudera

This post was co-authored by two Cisco employees as well: Karthik Krishna and Silesh Bijjahalli. Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in a data platform strategy: it provides the basis on which all compute engines and applications are built.

10 Upcoming Data Science Platforms for Massive Disruption

DataKitchen

The post 10 Upcoming Data Science Platforms for Massive Disruption first appeared on DataKitchen.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speakers: Anindo Banerjea, CTO at Civio, and Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

Monte Carlo, the data reliability company, today announced a partnership with Snowflake, the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

Apple Migration Tips for M1 Macs

Grouparoo

Last week, I upgraded to an M1 MacBook Pro. I got it configured for development, and 48 hours later, through a series of unfortunate events and a hardware failure, I ended up with a second M1 MacBook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though: about 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

Debuting a Modern C++ API for Apache Kafka

Confluent

Morgan Stanley uses Apache Kafka® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. […].

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Summary: One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

#ClouderaLife Spotlight: Suzy Tonini, Talent Researcher

Cloudera

As we continue to work toward diversity, equality, and inclusion in every aspect of our company culture and beyond, we’ve learned so much from our employees’ unique perspectives on allyship. One such employee is Suzy Tonini, a Talent Researcher with a globe-trotting childhood. Growing up with parents who worked for the U.S. State Department, Suzy had the opportunity to hop from country to country with her family, experiencing a variety of cultures.

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

The post DevOps and agile still hindered by enterprise silos, inertia first appeared on DataKitchen.

Reshaping the supermarket post-pandemic

Retail Insight

Social distancing and a life lived largely online have been the reality for over a year. But, as the world gradually emerges from lockdown, has the shape of retail really changed forever?

Understanding Types with SQLite and Node.js

Grouparoo

Two fun facts about SQLite: The initial release was more than 20 years ago! It is the most widely used database (and likely one of the most widely deployed pieces of software). And here are a few of my opinions on SQLite: It's super cool. We don't talk about it enough. It's actually really easy to use (which is likely why it's so widely used).
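One type-related quirk worth seeing firsthand is that SQLite uses dynamic typing with "type affinity": a column's declared type is a preference, not a hard constraint. The sketch below uses Python's sqlite3 module rather than Node.js (an assumption made to keep every example here in one language) and inspects the storage class SQLite actually chose for each value via the built-in typeof() SQL function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring INTEGER gives the column INTEGER affinity; it does not reject other types.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [("123",), ("abc",), (1.5,)])

rows = conn.execute("SELECT x, typeof(x) FROM t ORDER BY rowid").fetchall()
conn.close()

for value, storage_class in rows:
    print(repr(value), storage_class)
# 123 integer   <- the text '123' was converted to an integer by the affinity
# 'abc' text    <- non-numeric text is stored as-is
# 1.5 real      <- not losslessly convertible to an integer, so it stays real
```

Client libraries therefore cannot rely on the declared column type alone; they have to look at what each stored value actually is.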

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Announcing ksqlDB 0.17.0

Confluent

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

How to Approach Your Data Engineering Transformation

Silectis

Should you build your own tooling, take a “best of breed” approach, or buy a turnkey data engineering platform? We’ve got you covered. Data Engineering Platforms: Build, Best of Breed, or Buy? Every company wants to be data-driven. Modern organizations that thrive based on data have a common strength: a solid data engineering practice.

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

This is part 4 in this blog series. You can read part 1 here and part 2 here, and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC) and focused on Data Collection.

CFO Analytics - CFO of the Future

Teradata

As finance teams evolve into providers of strategic insights, leveraging analytics will bring a new user base and new insights, and reposition the CFO as a predictor of the future.

What Is Entity Resolution? How It Works & Why It Matters

Entity resolution, sometimes referred to as data matching or fuzzy matching, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.
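Since entity resolution is also called fuzzy matching, a toy version helps show the basic mechanism: score every pair of records for similarity and link the pairs above a threshold. This sketch uses Python's standard-library difflib.SequenceMatcher as the similarity measure, which is an assumption for illustration; the AI-based approaches mentioned above use more sophisticated models. The 0.85 threshold is arbitrary.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(names: List[str], threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Return every pair of names whose similarity meets the threshold.

    Compares all pairs (O(n^2)); larger systems usually add a "blocking"
    step first so that only plausible candidate pairs are scored.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


if __name__ == "__main__":
    print(fuzzy_matches(["Jon Smith", "John Smith", "Jane Doe"]))
    # [('Jon Smith', 'John Smith', 0.947)]
```

The threshold trades precision for recall: lowering it links more true duplicates but also more records that merely look alike.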