Sat.Aug 10, 2019 - Fri.Aug 16, 2019

How to Become More Marketable as a Data Scientist

KDnuggets

As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost exceed the utility? Once identified, …
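
The key question above (does a table's maintenance cost exceed its utility?) can be illustrated with a simple comparison. This is a hypothetical sketch, not Uber's actual method: the `TableStats` fields, the table names, the cost figures, and the use of query volume as a proxy for utility are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table metrics, all in the same arbitrary cost unit."""
    name: str
    etl_cost: float          # assumed cost of running the table's ETL jobs
    storage_cost: float      # assumed cost of storing the table
    queries_per_month: int   # assumed proxy for how useful the table is
    value_per_query: float   # assumed value credited to each query

    @property
    def maintenance_cost(self) -> float:
        return self.etl_cost + self.storage_cost

    @property
    def utility(self) -> float:
        return self.queries_per_month * self.value_per_query


def deprecation_candidates(tables):
    """Return names of tables whose maintenance cost exceeds their utility,
    with the largest cost-minus-utility gap first."""
    flagged = [t for t in tables if t.maintenance_cost > t.utility]
    flagged.sort(key=lambda t: t.maintenance_cost - t.utility, reverse=True)
    return [t.name for t in flagged]


# Hypothetical tables and numbers, for illustration only
tables = [
    TableStats("trips_daily", 40.0, 10.0, 500, 1.0),
    TableStats("legacy_promo_snapshot", 30.0, 25.0, 2, 1.0),
    TableStats("tmp_backfill_2017", 5.0, 60.0, 0, 1.0),
]
print(deprecation_candidates(tables))
```

In practice, measuring utility is the hard part; query counts are only one possible signal.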

Digging Into Data Replication At Fivetran

Data Engineering Podcast

The extract and load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations available, it is also among the most difficult tasks we encounter. Fivetran is a platform that does the hard work for you, replicating information from your source systems into whichever data warehouse you use.
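
A common way to implement incremental extract and load is a high-water mark: remember the largest "last updated" value copied so far, then fetch only newer rows on the next run. The sketch below shows that generic technique with two in-memory SQLite databases; it is not Fivetran's implementation, and the `customers` table and its `updated_at` column are hypothetical.

```python
import sqlite3

# Hypothetical table shared by the source system and the warehouse
SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"


def replicate_incremental(source, dest, last_cursor):
    """Copy rows changed since last_cursor from source to dest.

    Returns the new high-water mark so the next run resumes from there.
    """
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_cursor,),
    ).fetchall()
    # Upsert by primary key so updated rows overwrite their old versions
    dest.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dest.commit()
    return rows[-1][2] if rows else last_cursor


source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for conn in (source, dest):
    conn.execute(SCHEMA)

source.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ada", 100), (2, "Grace", 105)])
cursor = replicate_incremental(source, dest, last_cursor=0)  # initial full load

source.execute("UPDATE customers SET name = 'Grace H.', updated_at = 110 WHERE id = 2")
source.execute("INSERT INTO customers VALUES (3, 'Edsger', 112)")
cursor = replicate_incremental(source, dest, cursor)  # copies only the changed rows

print(cursor, dest.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

The upsert makes re-running a batch harmless. This simple design does not capture deleted rows, and with a strict `>` comparison it can miss a late-arriving row that shares the exact high-water-mark value, which is part of why replication is harder than it looks.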

Top 10 Reasons to Attend Kafka Summit

Confluent

Yes, the other definition of event sourcing. 1. Keynotes from leading technologists. At Kafka Summit SF, you’ll get to hear incredible keynotes from leading technologists, including Jay Kreps and Neha Narkhede, original co-creators of Apache Kafka®. In the past, we’ve featured Chris D’Agostino, James Watters, Martin Kleppmann, and Martin Fowler. This time around, we’re delighted to have Devendra Tagare, Engineering Manager of Streaming Platforms from Lyft, and Chris Kasten, VP of Walmart Clou

Understanding Cancer using Machine Learning

KDnuggets

The use of machine learning (ML) in medicine is becoming increasingly important. One example application is cancer detection and analysis.

How Human Growth Defines the Future of Digital Disruption

Teradata

Contrary to popular belief, in today's technology-enabled, digitally-disrupted world, it's the human element that matters the most in business.

Shoulder Surfers Beware: Confluent Now Provides Cross-Platform Secret Protection

Confluent

Compliance requirements often dictate that services should not store secrets as cleartext in files. These secrets may include passwords, such as the values for the ssl.key.password, ssl.keystore.password, and ssl.truststore.password configuration parameters (as shown below), or any other sensitive data in configuration files or log files. Here is a snippet from a properties file with standard SSL configurations that users often don’t want in cleartext: security.inter.broker.protocol=SSL
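
Part of the concern above is secrets leaking into log files. The sketch below illustrates one generic mitigation, masking password values before a properties file is printed or logged. It is not Confluent's Secret Protection feature, which addresses secrets stored in the files themselves; the function name, the `.password` suffix rule, and the sample values are assumptions, and only simple `key=value` lines are handled.

```python
def redact_properties(text, sensitive_suffixes=(".password",)):
    """Return properties-file text with the values of sensitive keys masked.

    Assumption: keys ending in one of sensitive_suffixes hold secrets.
    Only 'key=value' lines are handled; comments and other lines pass through.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_entry = stripped and not stripped.startswith(("#", "!")) and "=" in stripped
        if is_entry:
            key = stripped.partition("=")[0].strip()
            if key.endswith(sensitive_suffixes):
                out.append(f"{key}=[hidden]")
                continue
        out.append(line)
    return "\n".join(out)


# Hypothetical configuration with made-up placeholder secrets
config = """security.inter.broker.protocol=SSL
ssl.keystore.location=/path/to/kafka.server.keystore.jks
ssl.keystore.password=example-secret
ssl.key.password=example-secret
ssl.truststore.password=example-secret"""

print(redact_properties(config))
```

Masking only protects log output; the cleartext values still sit in the original file.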

Statistical Modelling vs Machine Learning

KDnuggets

At times it may seem that machine learning can be done without a sound statistical background, but people who believe this are missing important nuances. Code written to make ML easier does not remove the need for an in-depth understanding of the problem.

The Power of Prioritization in Data Management

Teradata

Find out how the early architectural decisions surrounding the Teradata Database are still making a critical contribution to performance today.

Tableau Operational Dashboards and Reporting on DynamoDB - Evaluating Redshift and Athena

Rockset

Organizations speak of operational reporting and analytics as the next technical challenge in improving business processes and efficiency. In a world where everyone is becoming an analyst, live dashboards surface up-to-date insights and operationalize real-time data to provide timely decision-making support across multiple areas of an organization.

Kafka Connect Improvements in Apache Kafka 2.3

Confluent

With the release of Apache Kafka® 2.3 and Confluent Platform 5.3 came several substantial improvements to the already awesome Kafka Connect. Not sure what Kafka Connect is, or need convincing of its awesomeness? Didn’t realise that it’s part of Apache Kafka and solves all your streaming integration needs? Check out my Kafka Summit London talk, From Zero to Hero with Kafka Connect, and if you want to hear more talks like this, be sure to come to Kafka Summit San Francisco.

12 NLP Researchers, Practitioners & Innovators You Should Be Following

KDnuggets

Check out this list of NLP researchers, practitioners and innovators you should be following, including academics, developers, entrepreneurs, and more.

6 Key Concepts in Andrew Ng’s “Machine Learning Yearning”

KDnuggets

If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts it covers, from one of the field's best practitioners and teachers, to better understand how to use these tools.

Learn how to use PySpark in under 5 minutes (Installation + Tutorial)

KDnuggets

Apache Spark is one of the hottest and largest open-source projects in data processing, with rich high-level APIs for programming languages such as Scala, Python, Java and R. It makes it practical to bring big data and machine learning together.

Command Line Basics Every Data Scientist Should Know

KDnuggets

Check out this introductory guide to completing simple tasks with the command line.

The Easy Way to Do Advanced Data Visualisation for Data Scientists

KDnuggets

Creating effective data visualisations is a core skill for data scientists. This tutorial will guide you through how to easily develop interactive visualisations using the Python library plotly.

Domain-Specific Language Processing Mines Value From Unstructured Data

KDnuggets

Processing unstructured text data in real time is challenging when applying NLP or NLU. Find out how an alternative, called Domain-Specific Language Processing, can mine valuable information from data by following your guidance and using the language of your business.

What is Poisson Distribution?

KDnuggets

A solid overview of the Poisson distribution: why it is needed, how it compares with the binomial distribution, a mathematical derivation of its formula, and more.
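
The Poisson probability mass function is P(X = k) = λ^k e^(-λ) / k!, where λ is the mean number of events per interval. A standard way to relate it to the binomial distribution is that Binomial(n, p = λ/n) approaches Poisson(λ) as n grows. The sketch below checks that numerically; the values λ = 3 and k = 2 are arbitrary choices for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


lam, k = 3.0, 2
print(f"Poisson(lambda={lam}): P(X={k}) = {poisson_pmf(k, lam):.5f}")
# Keep the mean n * p fixed at lam while the number of trials grows
for n in (10, 100, 10_000):
    print(f"Binomial(n={n}, p={lam}/{n}): P(X={k}) = {binomial_pmf(k, n, lam / n):.5f}")
```

As n increases, the binomial values converge on the Poisson value, which is why the Poisson distribution is used for counts of rare events over many opportunities.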

How Concerned Should You be About Predictor Collinearity? It Depends…

KDnuggets

Predictor collinearity (also known as multicollinearity) can be problematic for your regression models. Check out these rules of thumb about when, and when not, to be concerned.
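
A common way to quantify collinearity is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With exactly two predictors, that R² equals the squared Pearson correlation r², so the sketch below uses the two-predictor shortcut only. The data are made up, and widely quoted thresholds such as VIF > 5 or VIF > 10 are general rules of thumb, not necessarily the ones the article recommends.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors.

    Regressing x1 on x2 gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    Perfectly collinear inputs (r = +/-1) would divide by zero.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up data for illustration
x1 = [1, 2, 3, 4, 5]
x2_nearly_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly 2 * x1
x2_uncorrelated = [1, -1, 0, -1, 1]               # zero correlation with x1

print(f"VIF (nearly collinear): {vif_two_predictors(x1, x2_nearly_collinear):.1f}")
print(f"VIF (uncorrelated):     {vif_two_predictors(x1, x2_uncorrelated):.1f}")
```

With more than two predictors, each VIF requires a full regression of that predictor on all the others rather than a single correlation.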

PyTorch Lightning vs PyTorch Ignite vs Fast.ai

KDnuggets

Here, I will attempt an objective comparison of all three frameworks by laying out the similarities and differences found in their tutorials and documentation.

How Creating an AI Study Group Boosted My Skills and Got Me a Job

KDnuggets

The amount of time I had to put in to organize the AI Society sometimes left me sleep-deprived, but it was definitely worth it. It was also one of the main reasons I got the job in machine learning after all. I hope that this article will inspire you to create your own AI study group!

A 2019 Guide to Semantic Segmentation

KDnuggets

Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels could include person, car, flower, or piece of furniture, to mention just a few. We’ll now look at a number of research papers covering state-of-the-art approaches to building semantic segmentation models.
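
Linking each pixel to a class label usually comes down to taking, at every pixel, the class with the highest score from the model's per-class output maps. The sketch below shows that final step plus intersection-over-union (IoU), a common segmentation metric, in plain Python. The tiny 2x3 image, the class indices (0 = background, 1 = person), and the scores are hypothetical; real models produce these maps with a neural network.

```python
def label_map(scores):
    """scores[c][i][j] is the model's score for class c at pixel (i, j).

    Returns a 2-D list giving each pixel its highest-scoring class index.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


def iou(pred, truth, cls):
    """Intersection-over-union of predicted and true pixels for one class."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p == cls and t == cls
            union += p == cls or t == cls
    return inter / union if union else float("nan")


# Hypothetical scores for a 2x3 image: class 0 = background, class 1 = person
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.3, 0.6]],  # background scores
    [[0.1, 0.8, 0.9], [0.2, 0.7, 0.4]],  # person scores
]
truth = [[0, 1, 1], [0, 1, 1]]

pred = label_map(scores)
print(pred)
print(f"IoU for class 1: {iou(pred, truth, 1):.2f}")
```

Averaging the per-class IoU over all classes gives mean IoU, a metric frequently reported when comparing segmentation models.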

U. of Miami: Faculty Positions, with expertise in AI/Data Science/ML or related areas [Miami, FL]

KDnuggets

The positions require research and teaching expertise in AI/Data Science, or related areas including Data Extraction, Data Visualization, Machine Learning, and Intelligent Actuators.

Data Driven Government – Speakers Highlights

KDnuggets

The lineup of experienced, thought-leading speakers at Data Driven Government, Sep 25 in Washington, DC, will explain how to use data and analytics to more effectively accomplish your mission, increase efficiency, and improve evidence-based policymaking.

Postdoctoral position (2 years) in multivariate analysis and deep learning

KDnuggets

Help develop new e-science methods that fundamentally integrate deep learning and multivariate analysis. The postdoc position is full-time for a period of two years.

Introducing the Plato Research Dialogue System: Building Conversational Applications at Uber’s Scale

KDnuggets

While the process of building simple, domain-specific chatbots has gotten way easier, building large scale, multi-agent conversational applications remains a massive challenge. Recently, the Uber engineering team open sourced the Plato Research Dialogue System, which is the framework powering conversational agents across Uber’s different applications.

PhD student position in computational science with focus on chemistry

KDnuggets

Umeå University, Sweden, is seeking a PhD student in computational science with a focus on chemistry. The position covers 4 years of research, including graduate-level courses.

Top KDnuggets tweets, Aug 07-13: Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners To Follow

KDnuggets

Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners & Innovators You Should Be Following; Knowing Your Neighbours: Machine Learning on Graphs.

Cambridge Analytica whistleblower Chris Wylie to headline Big Data LDN 2019 keynote programme

KDnuggets

Chris Wylie, the whistleblower who exposed Cambridge Analytica, will headline the Big Data LDN 2019 programme, along with over 100 speakers, at this free-to-attend event on Nov 13-14 in London.

Top Stories, Aug 5-11: Knowing Your Neighbours: Machine Learning on Graphs; What is Benford’s Law and why is it important for data science?

KDnuggets

Also: Deep Learning for NLP: ANNs, RNNs and LSTMs explained!; Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment; 25 Tricks for Pandas; Getting Started with Data Science; Data Science: Scientific Discipline or Business Process?
