January 2020

I wanna be a data scientist, but… how?

KDnuggets

It’s easy to say "I wanna be a data scientist," but where do you start? How much time does it take to become attractive to employers? Do you need a Master’s degree? Do you need to know every mathematical concept ever derived? The journey might be long, but this plan will help you keep moving forward toward your career goal.

Featuring Apache Kafka in the Netflix Studio and Finance World

Confluent

Netflix spent an estimated $15 billion to produce world-class original content in 2019. When stakes are so high, it is paramount to enable our business with critical insights that help […].

Data Privacy and Why it Matters to Our Customers

Teradata

People want control over their personal data, but are also willing to trade it away for convenience. When does the exploitation of our data become unethical? Read more!

Engineering SQL Support on Apache Pinot at Uber

Uber Engineering

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform. As Uber’s operations became more complex and we offered additional features and …

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Pay Down Technical Debt In Your Data Pipeline With Great Expectations

Data Engineering Podcast

Data pipelines are complicated, business-critical pieces of technical infrastructure. Unfortunately, they are also difficult to test, which leads to a significant amount of technical debt and slower iteration cycles. In this episode, James Campbell describes how he helped create the Great Expectations framework to give you control of and confidence in your data delivery workflows, and discusses the challenges of validating and monitoring the quality and accuracy of your data…

Simulating Cohorts

Grouparoo

In the last post, I made the case that the way to make the biggest difference in a metric like retention is to increase how many tests you can run each month. It turns out that going from 1 to 4 tests a month makes a huge difference, especially as those cohorts build on each other over time. To prove this out, I built a spreadsheet. Because I learned even more from creating the spreadsheet than from writing the blog post, I thought I'd give those learnings some airtime, too.
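
The compounding effect can be sketched in a few lines of Python. This is not the author's spreadsheet: it is a hypothetical model in which each month's tests produce an expected number of wins (tests multiplied by an assumed win rate), and each win permanently raises retention by an assumed relative lift for that month's cohort and every later one. The parameter names and defaults (base_retention, win_rate, lift_per_win) are illustrative placeholders, not figures from the post.

```python
from typing import List


def simulate_cohort_retention(tests_per_month: int,
                              months: int = 12,
                              base_retention: float = 0.20,
                              win_rate: float = 0.25,
                              lift_per_win: float = 0.02) -> List[float]:
    """Return the retention rate of each monthly cohort (hypothetical model).

    Each month, tests_per_month * win_rate tests are expected to succeed.
    Every expected win multiplies retention by (1 + lift_per_win), and the
    improvement carries over to all later cohorts, so gains compound.
    """
    retention = base_retention
    cohorts = []
    for _ in range(months):
        expected_wins = tests_per_month * win_rate
        retention *= (1 + lift_per_win) ** expected_wins
        cohorts.append(min(retention, 1.0))  # retention cannot exceed 100%
    return cohorts


if __name__ == "__main__":
    for tests in (1, 4):
        final = simulate_cohort_retention(tests)[-1]
        print(f"{tests} test(s)/month -> month-12 cohort retention: {final:.1%}")
```

Because every cohort inherits all earlier wins, the gap between 1 and 4 tests a month widens each month rather than staying constant, which is the "cohorts build on each other" effect the post describes.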

More Trending

Pipeline to the Cloud – Streaming On-Premises Data for Cloud Analytics

Confluent

This article shows how you can offload data from on-premises transactional (OLTP) databases to cloud-based datastores, including Snowflake and Amazon S3 with Athena. I’m also going to take the opportunity […].

Analytics in the Hybrid Cloud – An Architect’s Perspective

Teradata

The hybrid cloud is not just a consideration, but for many of our customers, already a reality. Read more to learn best practices when considering a hybrid or multi-cloud environment.

Case Study: Standard Cognition Uses Rockset to Deliver Data APIs and Real-Time Metrics for Vision AI

Rockset

Walk into a store, grab the items you want, and walk out without having to interact with a cashier or even use a self-checkout system. That’s the no-hassle shopping experience of the future you’ll get at the Standard Store, a demonstration store showcasing the AI-powered checkout pioneered by Standard Cognition. The company uses computer vision to remove the need for checkout lines of any sort in physical retail locations.

Replatforming Production Dataflows

Data Engineering Podcast

Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business, unexpected events can require a change in your platform architecture. In this episode, the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed its offering.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Shots You Get to Take

Grouparoo

At Grouparoo, we have been interviewing a lot of marketers. The overall learning is that it's a hard job. The biggest reason is that they need data to make their campaigns work but do not have the means to get that data. Basically, they need engineers to prioritize writing code that gets the data into the tool they are using. That rarely happens.

Top 9 Mobile Apps for Learning and Practicing Data Science

KDnuggets

This article covers the top 9 mobile apps for learning and practicing data science, which can help you become more productive.

Streams and Tables in Apache Kafka: Elasticity, Fault Tolerance, and Other Advanced Concepts

Confluent

Now that we’ve learned about the processing layer of Apache Kafka® by looking at streams and tables, as well as the architecture of distributed processing with the Kafka Streams API […].

Not Just SQL Anymore! Using R and Python with Vantage

Teradata

Learn about the different ways to use R and Python with Vantage and the pros and cons of each option. Read more from our Teradata expert.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

RocksDB Is Eating the Database World

Rockset

A Brief History of Distributed Databases

The era of Web 2.0 brought with it a renewed interest in database design. While traditional RDBMSs served the data storage and processing needs of the enterprise world well from their commercial inception in the late 1970s until the dotcom era, the large amounts of data processed by the new applications, and the speed at which that data needed to be processed, required a new approach.

Planet Scale SQL For The New Generation Of Applications With YugabyteDB

Data Engineering Podcast

The modern era of software development is defined by ubiquitous access to elastic computing infrastructure and easy deployment automation. This has led to a class of applications that can quickly scale to serve users worldwide, which in turn requires a new class of data storage that can accommodate that demand without having to re-architect your system at each stage of growth.

Top 10 Technology Trends for 2020

KDnuggets

With multiple emerging technologies integrated in the past year alone, AI development continues at a fast pace. Building on the science and technology advancements of 2019, we predict 10 trends for 2020 and beyond.

The Book to Start You on Machine Learning

KDnuggets

This book is intended for beginners in Machine Learning who are looking for a practical approach: learning by building projects and studying the different Machine Learning algorithms within a specific context.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Top 5 must-have Data Science skills for 2020

KDnuggets

The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market.

A Comprehensive Guide to Natural Language Generation

KDnuggets

Follow this overview of Natural Language Generation (NLG), covering its applications in theory and practice. It also traces the evolution of NLG architectures, from simple gap-filling to dynamic document creation, and summarizes the most popular NLG models.

7 Resources to Becoming a Data Engineer

KDnuggets

The volume of data is estimated to grow by 8,650% between 2010 and 2025, reaching 175 zettabytes. This growth has created an enormous need for data engineers who can build an organization's big data platform to be fast, efficient, and scalable.
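
The headline figure can be unpacked with a little arithmetic. Reading "8,650% growth" as final = start × (1 + 8650/100), 175 zettabytes in 2025 implies about 2 zettabytes in 2010, and a compound annual growth rate (CAGR) follows from the 15-year span. Both derived numbers come from this calculation, not from the article, and the helper names are illustrative.

```python
def implied_start(final: float, growth_pct: float) -> float:
    """Starting value implied by a percentage increase: final = start * (1 + g/100)."""
    return final / (1 + growth_pct / 100)


def cagr(start: float, final: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (final / start) ** (1 / years) - 1


start_2010 = implied_start(175, 8650)        # zettabytes in 2010 -> 2.0
rate = cagr(start_2010, 175, 2025 - 2010)    # over 15 years
print(f"Implied 2010 volume: {start_2010:.1f} ZB")
print(f"Implied annual growth: {rate:.1%}")
```

Under this reading, the quoted figures correspond to data volume growing by roughly 35% per year, compounded.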

10 Python Tips and Tricks You Should Learn Today

KDnuggets

Check out this collection of 10 Python snippets that can be taken as a reference for your daily work.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Predict Electricity Consumption Using Time Series Analysis

KDnuggets

Time series forecasting is a technique for predicting future values from a sequence of observations ordered in time. In this post, we take a small forecasting problem and solve it end to end, learning time series forecasting along the way.
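
To make forecasting from a time-ordered sequence concrete, here is a minimal sketch of one of the simplest methods, simple exponential smoothing, in plain Python. It was chosen for brevity and is not necessarily the model the article uses. The hourly consumption values below are made up for illustration.

```python
from typing import List, Sequence, Tuple


def ses_forecasts(series: Sequence[float], alpha: float = 0.5) -> Tuple[List[float], float]:
    """Simple exponential smoothing.

    Returns (in_sample, next_value): in_sample[i] is the forecast for series[i]
    made using only series[:i]; next_value is the forecast for the step after
    the last observation. The level is seeded with the first observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    in_sample = []
    for value in series:
        in_sample.append(level)                      # forecast made before seeing `value`
        level = alpha * value + (1 - alpha) * level  # blend the new observation into the level
    return in_sample, level


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observations and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hourly consumption readings (kWh), for illustration only.
usage = [10.0, 12.0, 11.0, 13.0]
fitted, next_hour = ses_forecasts(usage, alpha=0.5)
print("forecast for next hour:", next_hour)
print("in-sample MAE:", mean_absolute_error(usage, fitted))
```

A smaller alpha smooths more heavily. Electricity consumption usually shows strong daily and weekly seasonality, which this method ignores, so seasonal models are a natural next step.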

Why Python is One of the Most Preferred Languages for Data Science?

KDnuggets

Why do most data scientists love Python? Learn more about how so many well-developed Python packages can help you accomplish your crucial data science tasks.

The 5 Most Useful Techniques to Handle Imbalanced Datasets

KDnuggets

This post explains the various techniques you can use to handle imbalanced datasets.
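
As an example of one common technique (not a reproduction of the post's own list), random oversampling duplicates randomly chosen minority-class samples until every class is as frequent as the largest one. Below is a minimal pure-Python sketch. The function name is hypothetical, and in practice it should be applied only to the training split so that duplicated rows don't leak into evaluation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Duplicate random minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)  # keep every original row
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):  # zero iterations for the majority class
            i = rng.choice(indices)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = list(range(10))
y = [0] * 8 + [1] * 2  # 8:2 imbalance
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling keeps all majority-class data but can encourage overfitting to the repeated rows. Undersampling the majority class makes the opposite trade-off.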

7 Steps to a Job-winning Data Science Resume

KDnuggets

A resume plays a key role in bagging that dream data science job. We break down the nuances of a job-winning data science resume so that you can go ahead and transform your own resume.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Artificial Intelligence Books to Read in 2020

KDnuggets

Here are some AI-related books that I’ve read and recommend for you to add to your 2020 reading list!

How to Convert a Picture to Numbers

KDnuggets

Reducing images to numbers makes them amenable to computation. Let's take a look at the why and the how using Python and Numpy.
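
Here is a short sketch of the idea in NumPy (not the article's code). It treats a tiny hypothetical 2x2 RGB image as an array, collapses it to grayscale with the common ITU-R BT.601 luma weights, and flattens it into a feature vector. A real image file could be loaded into the same kind of array, for example by passing the result of Pillow's Image.open to np.asarray.

```python
import numpy as np

# Hypothetical 2x2 RGB image: axes are (row, column, channel), values 0..255.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],        # red, green
     [[0, 0, 255], [255, 255, 255]]],   # blue, white
    dtype=np.uint8,
)


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Weighted sum of R, G, B per pixel (ITU-R BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights  # contracts the channel axis


gray = to_grayscale(image)          # shape (2, 2), values in 0..255
features = (gray / 255.0).ravel()   # scale to 0..1 and flatten to a 1-D vector
print(image.shape, gray.shape, features.shape)
```

Converting to float before the weighted sum avoids uint8 overflow. Scaling to 0..1 is a common choice before feeding pixels to a model.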

The Data Science Interview Study Guide

KDnuggets

Preparing for a job interview can be a full-time job, and Data Science interviews are no different. Here are 121 resources that can help you study and quiz your way to landing your dream data science job.

Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero

KDnuggets

This post presents a pipeline for building a KNN model in R and evaluating it with various metrics.
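
The post builds its model in R. To keep all examples here in one language, the sketch below shows the same core idea in plain Python: a point is classified by majority vote among its k nearest training points (Euclidean distance), and the predictions are scored with accuracy, one of the simplest evaluation metrics. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Majority vote among the k training points closest to `query`."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort by distance only, so labels never need to be comparable.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    # Ties go to the label seen first, i.e. the one with the nearer neighbor.
    return votes.most_common(1)[0][0]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.5, 1.5), (8.5, 8.5), (2, 2)]
test_y = ["a", "b", "a"]
preds = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(preds, accuracy(test_y, preds))
```

An odd k avoids most voting ties in two-class problems. Because distances drive every prediction, features are usually scaled to comparable ranges before fitting.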

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases and corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.