November, 2020

article thumbnail

A Data Scientist in Engineering Wonderland

Team Data Science

As a data scientist, I always felt a missing link between my developed models and putting them in the production process. Yes, I can create a pipeline, write a model, get results, and interpret the results, but if I cannot scale it, these all will sit on my Jupiter notebooks. This thought led me to my data engineering adventure. I am confident that learning data engineering will make me a better data scientist.

article thumbnail

How to Pull Data from an API, Using AWS Lambda

Start Data Engineering

Introduction If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in a cloud storage, then serverless functions are a good choice. This post aims to answer questions like the ones shown below My company does not have the budget to purchase a tool like fivetran, What should I use to pull data from an API ?

AWS 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

How Netflix Scales its API with GraphQL Federation (Part 1)

Netflix Tech

Netflix is known for its loosely coupled and highly scalable microservice architecture. Independent services allow for evolving at different paces and scaling independently. Yet they add complexity for use cases that span multiple services. Rather than exposing 100s of microservices to UI developers, Netflix offers a unified API aggregation layer at the edge.

IT 144
article thumbnail

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety of queries both small and large; there are many circumstances where Impala queries small amounts of data; when end users are iterating on a use case, filterin

Metadata 143
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Analysing historical and live data with ksqlDB and Elastic Cloud

Confluent

Building data pipelines isn’t always straightforward. The gap between the shiny “hello world” examples of demos and the gritty reality of messy data and imperfect formats is sometimes all too […].

Cloud 139
article thumbnail

Streaming Data Integration Without The Code at Equalum

Data Engineering Podcast

Summary The first stage of every good pipeline is to perform data integration. With the increasing pace of change and the need for up to date analytics the need to integrate that data in near real time is growing. With the improvements and increased variety of options for streaming data engines and improved tools for change data capture it is possible for data teams to make that goal a reality.

More Trending

article thumbnail

How to Make the Most of Big Data Analytics in Your Business

Teradata

Big data's growth and its impact on business is undeniable. But how do you make the most of your data analytics to create real business value? Find out more.

article thumbnail

Keeping Netflix Reliable Using Prioritized Load Shedding

Netflix Tech

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Everyone slows to a crawl, sometimes for a minor issue or sometimes for no reason at all.

article thumbnail

Announcing the 2020 Data Impact Award Winners

Cloudera

What a fantastic 24-hours it has been here at Cloudera. During the first-ever virtual broadcast of our annual Data Impact Awards (DIA) ceremony, we had the great pleasure of announcing this year’s finalists and winners. Streamed to hundreds of people around the globe, we were able to come together to celebrate some incredible successes. . In a year marked by unusual events, and disruption to our “normal” lives, it was a pleasure to recognize our customers’ most impressive achievements.

Medical 123
article thumbnail

Digital Transformation in Style: How Boden Innovates Retail Using Apache Kafka

Confluent

As a clothing retailer with more than 1.5 million customers worldwide, Boden is always looking to capitalise on business moments to drive sales. For example, when the Duchess of Cambridge […].

Retail 135
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Keeping A Bigeye On The Data Quality Market

Data Engineering Podcast

Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of your data pipelines it can be overwhelming to stay up to date with how they all compare.

Hadoop 100
article thumbnail

The Journey Begins

Team Data Science

Week 1: 10/9/20 - 10/16/20 In my quest to further improve my overall data science skills, I pulled the trigger on October 9th, 2020, and enrolled in a Data Engineering boot camp lead by Andreas Kretz. First a little bit about myself. I have a background in Aerospace Engineering and have been in the industry for close to 15 years now. A little more than a year ago, I decided to pivot to Machine Learning and Data Science.

article thumbnail

Risk-Based Wealth Management: What the Insurance Industry Gets Wrong

Teradata

Product-centric processes degrade customer experience. Insurers must insulate consumers from internal & regulatory-driven controls by placing them in the center of the customer experience.

article thumbnail

Simple streaming telemetry

Netflix Tech

Introducing gnmi-gateway: a modular, distributed, and highly available service for modern network telemetry via OpenConfig and gNMI By: Colin McIntosh, Michael Costello Netflix runs its own content delivery network, Open Connect , which delivers all streaming traffic to our members. A backbone network underlies a large portion of the CDN, and we also run the high capacity networks that support our studios and corporate offices.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Veterans Day: What Service Means to Clouderan Vets

Cloudera

Around the world, a number of countries celebrate November 11 as a day to give thanks and recognition for their veterans. Originally designated to honor the end of World War I ( Armistice Day and Remembrance Day ), in some countries it is now used to pay respect to all veterans ( Veterans Day ). . Year after year, we use this time to express our support and appreciation to those who have served in the military.

article thumbnail

IBM and Confluent Announce Strategic Partnership

Confluent

I’m excited to announce a new strategic partnership with IBM. As part of this partnership, IBM will be reselling Confluent Platform, enabling its customers to leverage their existing IBM relationships […].

IT 128
article thumbnail

Self Service Data Management From Ingest To Insights With Isima

Data Engineering Podcast

Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of impressive tools and products built to fulfill this mission, it is still an uphill battle to tie everything together into a cohesive and reliable platform.

article thumbnail

Branding Yourself

Team Data Science

Week 2: 10/16/20 - 10/23/20 Week 2 of the course consists of Modules 3 & 4. If you have not read my first blog go here. Module 3 focuses on creating a professional LinkedIn profile. Your LinkedIn profile is the world's access to you and how you want to be seen professionally. Below is a screenshot. So here, I have a professionally taken photograph, what I am interested in below, and the 'About' section that summarizes Me.in a professional sense.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

Boost Your Customer Experience with Better Payment Conversions

Teradata

With digital payments on the rise, payment processing has become more complex. Fortunately, advanced data technologies can create better customer experience via streamlined payment processes.

article thumbnail

5 things you should know about Real-Time Analytics

A Cloud Guru: Data Engineering

Running analytics on real-time data is a challenge many data engineers are facing today. But not all analytics can be done in real time! Many are dependent on the volume of the data and the processing requirements. Even logic conditions are becoming a bottleneck. For example, think about join operations on huge tables with more […] The post 5 things you should know about Real-Time Analytics appeared first on A Cloud Guru.

article thumbnail

How a modern data platform supports government fraud detection

Cloudera

November 15-21 marks International Fraud Awareness Week – but for many in government, that’s every week. From bogus benefits claims to fraudulent network activity, fraud in all its forms represents a significant threat to government at all levels. Some experts estimate the U.S. government loses nearly 150 billion dollars due to potential fraud each year, McKinsey & Company reports.

article thumbnail

Use Cases and Architectures for HTTP and REST APIs with Apache Kafka

Confluent

This blog post presents the use cases and architectures of REST APIs and Confluent REST Proxy, and explores a new management API and improved integrations into Confluent Server and Confluent […].

article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

Building A Cost Effective Data Catalog With Tree Schema

Data Engineering Podcast

Summary A data catalog is a critical piece of infrastructure for any organization who wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy to use and cost effective option for organizations to build their data catalogs.

Building 100
article thumbnail

Building a Search Engine for Afterpay’s Shop Directory

Afterpay Tech

Photo by Markus Winkler on Unsplash By Jose Picado , Qiao Wang , and Yi Li Context Our Shop Directory is used by our consumers to discover stores, brands, and products. It is incredibly valuable for our retail partners: Almost 20 million referrals per month in the 2020 July - Sept quarter. The Shop Directory, available on the Web and our mobile app, contains nearly 64,000 stores, and each store sells 1000s of products.

article thumbnail

Connect Teradata Vantage to Salesforce Data With Azure Data Factory

Teradata

This "how-to" guide will help you to connect Teradata Vantage using the Native Object Store feature to query Salesforce data sourced by Microsoft Azure Data Factory.

Data 59
article thumbnail

Data Quality at Airbnb

Airbnb Tech

Part 2 — A New Gold Standard Authors: Vaughn Quoss, Jonathan Parks, Paul Ellwood Introduction At Airbnb, we’ve always had a data-driven culture. We’ve assembled top-notch data science and engineering teams, built industry-leading data infrastructure, and launched numerous successful open source projects, including Apache Airflow and Apache Superset.

article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

How insurers can better deliver at “The Moment of Truth”

Cloudera

It’s all about the Customer. Customers today expect services to be highly personalized. In a digital world tuned to understand your likes, dislikes, interests and preferences we expect a similar level of customization in all aspects of our lives. Insurance is no different. Insurance is not something the average consumer thinks about every day but when a life changing event happens, insurance becomes extremely important.

Insurance 115
article thumbnail

How Real-Time Stream Processing Safely Scales with ksqlDB, Animated

Confluent

Software engineering memes are in vogue, and nothing is more fashionable than joking about how complicated distributed systems can be. Despite the ribbing, many people adopt them. Why? Distributed systems […].

Process 122
article thumbnail

Add Version Control To Your Data Lake With LakeFS

Data Engineering Podcast

Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. In order to address these challenges the team at Treeverse created LakeFS to introduce version control capabilities to your storage layer.

Data Lake 100
article thumbnail

Web Scraping Using R.!

Data Science Blog: Data Engineering

In this blog, I’ll show you, How to Web Scrape using R.? What is R.? R is a programming language and its environment built for statistical analysis, graphical representation & reporting. R programming is mostly preferred by statisticians, data miners, and software programmers who want to develop statistical software. R is also available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.

article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.