Sat. May 22, 2021 - Fri. May 28, 2021

The Ethics of AI Comes Down to Conscious Decisions

Cloudera

This blog post was written by Pedro Pereira as a guest author for Cloudera. Right now, someone somewhere is writing the next fake news story or editing a deepfake video. An authoritarian regime is manipulating an artificial intelligence (AI) system to spy on technology users. No matter how good the intentions behind the development of a technology, someone is bound to corrupt and manipulate it.

Algorithm 120

Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse

Data Engineering Podcast

Summary: The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse platform that was built from the ground up for speed, and can work across clouds and all the way to the edge.

Announcing ksqlDB 0.18.0

Confluent

We’re pleased to announce ksqlDB 0.18.0! This release includes pull queries on table-table joins and support for variable substitution in the Java client and ksqlDB’s migration tool. We’ll step through […].

Java 85

My (Seemingly) Random Walk to Netflix

Netflix Tech

Part of our series on who works in Analytics at Netflix, and what the role entails. By Sean Barnes, Studio Production Data Science & Engineering. I am going to tell you a story about a person who works for Netflix. That person grew up dreaming of working in the entertainment industry. They attended the University of Southern California, double majored in data science and television & film production, and graduated summa cum laude.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

The Four Upgrade and Migration Paths to CDP from Legacy Distributions

Cloudera

The move into any new technology requires planning and coordinated effort to ensure a successful transition. This blog will describe the four paths to move from a legacy platform such as Cloudera CDH or HDP into CDP Public Cloud or CDP Private Cloud. The four paths are In-place Upgrade, Side-car Migration, Rolling Side-car Migration, and Migrate to Public Cloud.

Cloud 86

Easily Build Advanced Similarity Search With The Pinecone Vector Database

Data Engineering Podcast

Summary: Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems, their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch, Edo Liberty founded Pinecone to build a database that works natively with vectors.

Database 100

More Trending

Intelligent Document Processing: Technology Overview

AltexSoft

Whatever the industry, various documents accompany at least a quarter of business operations. Healthcare, for example, is filled with millions of patient records and medical forms. In transportation, these can be maintenance and driver logs. The documents often come in semi-structured and unstructured data formats, which makes them difficult to process quickly and accurately.

Auditing to external systems in CDP Private Cloud Base

Cloudera

Cloudera is trusted by regulated industries and Government organisations around the world to store and analyze petabytes of highly sensitive or confidential information about people, healthcare data, financial data or just proprietary information sensitive to the customer itself. Anybody who is storing customer information, healthcare, financial or sensitive proprietary information will need to ensure they are taking steps to protect that data and that includes detecting and preventing inadverte

Systems 78

Asynchronous APIs in CRM and marketing tools

Grouparoo

When integrating with Destinations, there are generally two main approaches made available by API providers: single or batched. With the "single" approach, one API request usually affects a single profile in the destination. The "batched" approach, which you can read more about here, allows you to affect multiple profiles in a single API request.
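
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Grouparoo's implementation or any provider's actual API: `send_one` and `send_many` are hypothetical callables standing in for whatever request a destination exposes, and `batch_size` is an arbitrary placeholder for a provider's limit.

```python
from typing import Callable, Dict, Iterator, List

Profile = Dict[str, object]


def chunked(items: List[Profile], size: int) -> Iterator[List[Profile]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_single(profiles: List[Profile],
                send_one: Callable[[Profile], None]) -> int:
    """'Single' approach: one request per profile. Returns the request count."""
    requests = 0
    for profile in profiles:
        send_one(profile)  # hypothetical call: affects exactly one profile
        requests += 1
    return requests


def sync_batched(profiles: List[Profile],
                 send_many: Callable[[List[Profile]], None],
                 batch_size: int = 100) -> int:
    """'Batched' approach: one request per group of profiles. Returns the request count."""
    requests = 0
    for batch in chunked(profiles, batch_size):
        send_many(batch)  # hypothetical call: affects every profile in the batch
        requests += 1
    return requests
```

With five profiles and a batch size of two, the single approach issues five requests while the batched approach issues three, which is why batching tends to matter once rate limits come into play.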

Process 52

Data Transformations Using the Data Build Tool

Ripple Engineering

At Ripple, we are moving towards building complex business models out of raw data. To do this successfully, we need to automate our historically manual processes. Even with a digital-first approach, many of our internal processes were done by hand, making them great candidates to be automated. A prime example of this was the process of managing our data transformation workflows.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Off-shore, On-shore or Not Sure? How Data Can Help Solve the Shared Services Conundrum

Teradata

This post isn't an attack on off-shoring your Shared Service Center. Instead, it is to caution you against leaping in or out of off-shoring initiatives before you have the full picture.

Data 52

Pushing Past Pilot Paralysis to Launch and Scale IIOT Use Cases

Cloudera

With billions of industrial IoT (IIOT) devices in place, generating massive volumes of data from “the edge,” the potential for proof of concept success for use cases in the factory can be paralyzing. While the value of this digital revolution, aka Industry 4.0, is clear, realizing the full promise has been slow. Research and real-life experience from Accenture shows that many manufacturers get stuck early on or can’t get beyond proof-of-concept pilots to scale.

What is Azure Data Factory? A beginner’s guide to ADF

A Cloud Guru: Data Engineering

With Microsoft Build 2021 currently underway, what better time to take a beginner-friendly deep dive into Azure Data Factory? In this post, we’ll talk about what Azure Data Factory is, how to get started using it, and what you might use it for. Keep up with all things Azure in the ACG original series Azure […] The post What is Azure Data Factory?

Data 52

Are MySQL columns names case sensitive?

Grouparoo

There is a debate among a very specific set of people about what case to use in SQL queries. This debate is made possible by the fact that, generally, it does not matter. I believed this to be true even about identifiers like column names. For example, both of these queries return the same data even though the "real" column is defined in lowercase.

MySQL 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Will Open Banking Enhance the Quality of Daily Life?

Teradata

Banking organizations embracing an interest in improving the quality of their customers’ lives will be rewarded with the sustained inspiration needed to anticipate & deliver personalized services.

Banking 52

Session-based Recommender Systems

Cloudera

Recommendation systems have become a cornerstone of modern life, spanning sectors that include online retail, music and video streaming, and even content publishing. These systems help us navigate the sheer volume of content on the internet, allowing us to discover what’s interesting or important to us. The classic modeling approaches to recommendation systems can be broadly categorized as content-based, as collaborative filtering-based, or as hybrid approaches that combine aspects of the two.

Systems 63

DS Building Blocks - A quick guide on experimentation for Non-Technical Users

DareData

Do you get overwhelmed when your data team rambles on about correlation, causality, A/B testing and other terms? Or you are a manager with some projects that include statistics and machine learning and you feel that you should contribute more on guiding your team to the correct outcome? These types of situations are common for business and non-technical users.

Superset and AWS Athena Tutorial - Data Lake

Preset

Visualize your data lake using AWS Athena and Apache Superset™.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

We Rise as One in our Mission to Eradicate Racism

Teradata

Teradata reinforces its pledge to diversity, equity, and inclusion. We are committed to eradicating racism and expanding diversity into all aspects of our business.

IT 52

What's a Typical Data Scientist Career Path like in 2023?

ProjectPro

Is "becoming a data scientist" one of your resolutions for 2021? Data science careers have seen tremendous growth over the years. On top of commanding high data scientist salaries (the average data scientist salary is $96,501), data science beginners can expect growth opportunities to level up in their data science career as they upskill and gain experience.

Compare and Contrast Search Indexing With Real-Time Converged Indexing

Rockset

Let's compare and contrast search indexing with real-time converged indexing and explain what converged indexing is, how it's similar, how it's different, how the architecture is set up, and then review some of the details of how it is different in terms of operations. When you talk about serverless systems and cloud-native systems, there's a huge advantage that we have in the cloud and we really want to spend some time talking about initial setup, in terms of day two operations.

MongoDB 40

Watch: Generating Data for Story-Driven Demos

Silectis

At a recent DC Data Engineering Meetup, a community group Silectis created and sponsors, we had the pleasure of having Tim Tutt, CEO of Night Shift Development, as our guest speaker. Through Tim’s presentation about using data engineering to enable data analytics through the masses, our teams started to notice how complementary our products are.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Business Processing Dashboards

FreshBI

Business Processing Dashboards Explained: A Business Workflow is a visual representation of steps through a business activity used to support and validate your Business Intelligence Objectives. Best Benefits of Business Processing Dashboards: Improved Business Intelligence: Business Processing Dashboards will assist your business to reach extraordinary achievements based on your live dashboards.

Process 52

8 Feature Engineering Techniques for Machine Learning

ProjectPro

“Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” - Prof. Andrew Ng. Data Scientists spend 80% of their time doing feature engineering because it's a time-consuming and difficult process. Understanding features and the various techniques involved to deconstruct this art can ease the complex process of feature engineering.

What Is a Serverless Database and Why Use One

Rockset

The move to serverless has been a fast one. Of AWS users, over half have adopted Lambda, but serverless isn't just Lambda functions. Serverless is a way to utilize infrastructure to build applications and services without needing to provision or scale out servers. This can be an advantage when it comes to development because developers and engineers don’t need to manage as much in terms of infrastructure.

Shorten time to critical insights with Streaming SQL

Cloudera

Data and analytics have become second nature to most businesses, but merely having access to the vast volumes of data from these devices will no longer suffice. Leading enterprises realize that the speed of data presents a new frontier for competitive differentiation. It is imperative for organizations to reduce time-to-insights to gain a competitive advantage by responding decisively to competitors, fine-tuning operations, and serving fickle customers.

SQL 78

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Democratizing Data Through Search and Natural Language Processing in Cloudera Data Visualization

Cloudera

Since the release of Cloudera Data Visualization (DV) back in Oct 2020, our primary mission has been to expand access to data analytics and predictive insights across enterprise businesses. Since that launch, we’ve worked tirelessly to deliver best-in-class data visualization, dashboarding, and predictive applications capabilities across our cloud and on-premises infrastructures through Cloudera’s machine learning and data warehousing products, all without additional costs, moving data or pur

Process 74