Starbucks Offer Analysis

Overview

As part of the Udacity Machine Learning Nanodegree, I completed a project exploring how machine learning can optimize targeted advertising. The digital revolution has flooded the advertising world with data, creating both opportunities and challenges for marketers. Inefficient ad spend and irrelevant ads reaching the wrong audience are significant problems that targeted advertising aims to solve.[1]

This project explores how machine learning can optimize targeted advertising using simulated customer data from Starbucks, similar to what they might collect from their mobile app. By analyzing how customers react to different offers, we can build predictive models that uncover hidden patterns in customer behavior. These models can continuously refine marketing efforts and improve the effectiveness of advertising campaigns.[2]

In this project, I compare the performance of traditional machine learning algorithms from the Scikit-Learn library with deep learning neural networks from PyTorch to determine the best approach for predicting customer responses in this context. The ultimate goal is to provide a practical example of how machine learning can help create more effective and efficient targeted advertising.

View my Udacity Machine Learning Nanodegree Certification

Objective

This project aims to optimize the efficiency of Starbucks' mobile app offers using machine learning. By analyzing simulated transaction data, the goal is to identify the most relevant offers for each customer, maximizing engagement and purchases while reducing unnecessary ad spend.

Starbucks regularly sends a variety of offers to its mobile app users, including discounts, BOGO (Buy One Get One Free) deals, and informational messages. Each offer has a validity period, and users receive different offers at different times. Our dataset includes user demographics, transaction details, and whether offers were viewed—key information for understanding their impact. The objective is to predict which offers customers are most likely to engage with and complete before the expiration date.

Offers are categorized as successful or unsuccessful based on customer actions:

  • Discount and BOGO Offers: A successful discount or BOGO offer is one where the following sequence occurs within the offer's validity period:
    1. Offer Received
    2. Offer Viewed
    3. Transaction(s) Completed (meeting the minimum spend/quantity required by the offer)

    Successful discount and BOGO flow

    Figure 1: Flowchart of a successful discount or BOGO offer. All steps must occur within the offer validity period.


  • Informational Offers: A successful informational offer follows this sequence within the offer's validity period:
    1. Offer Received
    2. Offer Viewed
    3. Transaction Completed

    Successful informational flow

    Figure 2: Flowchart of a successful informational offer. All steps must occur within the offer validity period.


    These sequences are designed to ensure the customer was aware of the offer and that it influenced their purchase decision. Other events can occur between these steps without disqualifying an offer as successful. For example, a customer might receive an informational offer, make a transaction, view the offer, and then make another transaction. If all these events happen within the validity period, it's still considered a successful offer.

  • Unsuccessful Offers: Offers that do not meet the above criteria are considered unsuccessful. Examples include:

    • Customers never viewed the offer.
    • Customers viewed the offer but didn't make a transaction.
    • Customers didn't make enough transactions to meet the offer's requirements (for discount or BOGO).
    • Customers completed the offer before viewing it.

By building and comparing machine learning models, this project aims to predict successful offer completions, enhancing the customer experience and optimizing advertising strategies.

Analysis

Datasets Overview

The data sets are provided in JSON files by Udacity. They contain simulated data that mimics Starbucks customer data as briefly touched upon in the previous section. Three files are provided:

  • portfolio.json – Offer IDs and meta information about each offer.
  • profile.json – Demographic data for each customer.
  • transcript.json – Records of transactions and information on offers viewed, received, or completed.

portfolio.json

This data set describes the different offers Starbucks sends to users of its app. It contains 10 rows and 6 columns.

  • reward (int) - Reward given for completing an offer.
  • channels (list of strings) - Medium used (email, mobile, social, web).
  • difficulty (int) - Minimum required spend to complete an offer.
  • duration (int) - Number of days before an offer expires.
  • offer_type (string) - Type of offer (BOGO, discount, informational).
  • id (string) - Offer ID.

profile.json

This data set contains the distinct users and some basic demographic data. It contains 17,000 rows and 5 columns.

  • gender (str) - Gender of the customer (Male, Female, Other).
  • age (int) - Age of the customer.
  • id (str) - Customer ID.
  • became_member_on (int) - Date when the customer created an app account.
  • income (float) - Customer's income.

transcript.json

This data set contains all the offer events and all the transactions that occur. It contains 306,534 rows and 4 columns.

  • person (str) - Customer ID.
  • event (str) - Record description (transaction, offer received, offer viewed, offer completed).
  • value (dict of strings) - Either an offer ID or transaction amount, depending on the record.
  • time (int) - Time in hours since the start of the test.

The combination of these datasets provides a comprehensive view of customer behavior, enabling analysis to determine which offers are most effective for various customer demographics and behaviors.

Data Exploration

Portfolio Data

The portfolio contains ten offers: four discounts, four BOGOs (buy-one-get-one-free), and two informational offers. Non-informational offers have difficulty (cost) ranging from $5 to $20 and rewards from $2 to $10, with durations between 3 and 10 days. Since all offers use email as a communication channel, this data won't be useful for analysis and will be removed during preprocessing.

Profile Data

The dataset contains 17,000 unique customer IDs, which is ideal for joining with other data. However, a plot of customer ages revealed an anomaly: a large number of customers with an age of 118 (Figure 3). Further investigation showed that these rows also had missing gender and income data, suggesting that 118 was used as a placeholder for missing values. After removing these 2,175 rows (13% of the data), the age distribution appears more realistic (Figure 4).

Distribution of Customer Ages Before Removing Null Values

Figure 3: Distribution of customer ages before removing null values. The large spike at 118 represents missing data.

Distribution of Customer Ages After Removing Null Values

Figure 4: Distribution of customer ages after removing null values (age = 118). The distribution appears more realistic after data cleaning.

The dataset shows a significant gender imbalance, with more males than females (Figure 5). There are also a few customers who identify as neither male nor female. While this gender imbalance is noticeable, it shouldn't pose an issue for our analysis.

Customer Gender Ratio

Figure 5: Pie chart showing the distribution of customer genders in the dataset.

The income distribution appeared as expected. I also examined it by gender, as shown in Figure 6 below, to confirm it still appeared as expected. While there are some interesting patterns, nothing indicates bad data. Therefore, I do not need to do any filtering based on income.

Customer income distribution histogram, broken down by gender, showing the expected income patterns for both male and female customers.
Figure 6: Histograms of customer income, broken down by gender.

Examining the relationship between age and income, we see a positive correlation, although the trend isn't as smooth as expected (Figure 7). This suggests the sample data might not perfectly reflect real-world patterns.

Average Customer Income by Age

Figure 7: Line graph showing the average customer income by age.

The distribution of customer account creation dates skews heavily towards recent dates, with a peak in late 2017 (Figure 8). Again, this hints that the sample data might deviate from real-world scenarios.

Customer Account Creation Date Distribution

Figure 8: Histogram showing the distribution of customer account creation dates.

Transcript Data

The transcript data contains 306,534 rows and 17,000 unique customers (the same ones from the profile data), with no null values. Each customer appears an average of 18 times across the 30-day period. Analyzing event times reveals that "offer received" events cause spikes in activity, followed by peaks in "offer viewed" and "offer completed" events (Figure 9).

Event Counts by Time

Figure 9: Histograms showing the distribution of event types over the 30-day period.

Figure 10 shows the breakdown of event types. "Transactions" are the most frequent, followed by "offers received", "offers viewed", and "offers completed".

Event Type Distribution

Figure 10: Pie chart showing the proportion of each event type in the transcript data.

Analyzing the transaction amounts (from the parsed `value` column) reveals a wide range, with most under $50, but also some very large and very small transactions. While the large transactions are likely valid, the very small ones (under $0.25) might be erroneous and could skew the analysis of informational offers. These small transactions will be filtered out during preprocessing.

Figure 11 shows that while all ten offer types are sent out equally, their view and completion rates differ.

Offer Ratios by Event Type

Figure 11: Pie charts showing the proportion of each offer type, broken down by event type (received, viewed, completed).

Transcript and Portfolio Data

To identify relationships between offer characteristics and customer response, I joined the transcript and portfolio data.

I found that the number of channels used to communicate an offer is highly correlated with the likelihood of an offer being viewed (0.85), and this correlation is statistically significant. However, the correlation with offer completion is weaker (0.56).

Interestingly, the duration of an offer has a weak negative correlation with views. This is unexpected, as I anticipated a positive relationship. Further analysis suggests that the communication channel might be more important than duration in influencing views. For example, the discount offer with the lowest view rate was only sent via two channels (not mobile or social). The correlation between duration and offer completion is weak but positive. Difficulty is moderately negatively correlated with offer completion (-0.49), as expected.

The social channel shows the strongest correlation with offer views (0.96), with a statistically significant p-value. However, there is no strong correlation between the social channel and offer completion. Surprisingly, the email channel has a higher correlation with offer completion (0.43) than the social channel does (0.31).

The correlation between an offer being viewed and an offer being completed is modest (0.42) and not statistically significant. This suggests that higher view rates don't necessarily translate to higher completion rates. While social media might be effective for generating views, more targeted channels like mobile or email might be better for driving conversions.
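As a rough illustration of how these figures were obtained, the sketch below computes Pearson correlations and p-values for a few of the pairs discussed above. It assumes a joined, offer-level DataFrame named `offers` with binary indicator columns (`viewed`, `completed`, `social`) and numeric columns (`n_channels`, `duration`, `difficulty`); these names are illustrative rather than the exact ones used in the notebook.

```python
from scipy.stats import pearsonr

# `offers` is assumed to be the joined transcript/portfolio data, with
# illustrative column names (see lead-in above).
pairs = [
    ("n_channels", "viewed"),
    ("n_channels", "completed"),
    ("duration", "viewed"),
    ("difficulty", "completed"),
    ("social", "viewed"),
    ("viewed", "completed"),
]

for x, y in pairs:
    r, p = pearsonr(offers[x], offers[y])
    print(f"{x:>12} vs {y:<10} r = {r:+.2f}  p = {p:.3g}")
```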

Algorithms and Techniques

To process the data and maximize offer efficiency, I'll use a linear neural network from PyTorch. While neural networks have proven successful in various domains like handwriting and speech recognition, their performance in classification tasks can sometimes be less optimal than traditional methods, depending on the complexity of the problem.[3] This project will explore whether a linear neural network can outperform classic classification models on this specific dataset.

Benchmark Model

To compare the performance of the neural network, I'll establish a benchmark model using a machine learning algorithm from the sklearn library. I'll evaluate several algorithms, including Gaussian Naive Bayes, Support Vector Classifier (SVC), AdaBoost, Random Forest, and Decision Tree, to identify the most suitable one for this classification problem. The chosen model will then be optimized through several refinement iterations, involving feature engineering, principal component analysis (PCA) to reduce dimensionality, and hyperparameter tuning with GridSearchCV. The optimized benchmark model will then be compared to the neural network (also optimized in a similar manner) to assess their predictive accuracy.

Evaluation Metrics

Model performance will be evaluated using a combination of accuracy, precision, and recall. The choice of the primary metric will depend on the balance of classes in the dataset. If there's significant class imbalance, the F1-score (which combines precision and recall) will be a more suitable measure than accuracy alone.

Furthermore, the business context suggests that optimizing for recall (true positive rate), while maintaining a good balance with overall accuracy and F1-score, might be the most beneficial approach. This is because Starbucks likely prioritizes identifying as many potentially successful offers as possible (true positives) and avoiding missed opportunities (false negatives), while overall accuracy and F1-score keep wasted advertising spend and customer dissatisfaction from irrelevant offers in check. Based on my initial analysis and the business context, I believe an F1-score of around 70% would represent a significant improvement over baseline performance and achieve a good balance between maximizing successful offers and minimizing wasted resources.

Methodology

Data Preprocessing

Filtering

The first step of the data preprocessing was removing the profile data that contained mostly null values. Before doing that, I wanted to make sure I wasn't removing too much of the overall data. If I were to find that I was removing too much, I would have had to consider alternative strategies, such as populating the empty values with estimated data. However, this would have been problematic because age, income, and gender were all missing for these customers, making any estimations very crude.

I had already found that 12.8% of the customers had missing demographic values, but I wanted to find out how much of the overall transcript data was associated with these customers. I joined the transcript and profile data to calculate this percentage. I found that 11.0% of the total events and only 10.8% of transactions were associated with these customers. Thankfully, these null-value customers were less prolific spenders than the rest. Since I would be losing slightly less data than expected by removing these rows, I decided to delete them.
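A minimal sketch of this filtering step is shown below; the file paths are assumed, and the printed percentages correspond to the quantities described above.

```python
import pandas as pd

# File paths and read options are assumed from the project setup.
profile = pd.read_json("profile.json", orient="records", lines=True)
transcript = pd.read_json("transcript.json", orient="records", lines=True)

# Customers with missing demographics (income/gender null, age placeholder 118).
null_ids = set(profile.loc[profile["income"].isna(), "id"])
print(f"{len(null_ids) / len(profile):.1%} of customers have missing demographics")

in_null = transcript["person"].isin(null_ids)
txn = transcript["event"] == "transaction"
print(f"{in_null.mean():.1%} of all events belong to these customers")
print(f"{(in_null & txn).sum() / txn.sum():.1%} of transactions belong to them")

# Keep only customers with complete demographic data.
profile = profile.dropna(subset=["gender", "income"])
transcript = transcript[~in_null]
```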

I also deleted the aforementioned 'email' channel. All offers contain 'email,' so it wouldn't be informative for the model. Removing it makes things clearer and helps reduce the number of features.

The only other data I wanted to delete was the transactions for less than 25 cents, for reasons explained earlier regarding informational offers. To do this, I needed to unpack the value column in the transcript data. I removed these values and later checked the impact of this decision on my classification of the informational offers. I found that only 53 offers moved from successful to unsuccessful. This is about 0.34% of all the informational offers. I was satisfied that I wasn't changing the classes too much, and I believe it will make a small improvement to the algorithm. I also confirmed that the classification of BOGO and discount offers didn't change, as expected, because their classification logic doesn't directly rely on transactions.
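The sketch below illustrates how the nested `value` column can be unpacked and the sub-$0.25 transactions removed. It continues from the filtered `transcript` above; the exact dictionary keys (and hence resulting column names) are assumptions about the raw data.

```python
import pandas as pd

# Flatten the nested `value` dictionaries into their own columns.
value_df = pd.json_normalize(transcript["value"].tolist())
transcript = pd.concat(
    [transcript.drop(columns="value").reset_index(drop=True),
     value_df.reset_index(drop=True)],
    axis=1,
)

# The raw dictionaries are not entirely consistent about the key used for
# the offer identifier, so consolidate any variants into a single column.
if "offer id" in transcript.columns and "offer_id" in transcript.columns:
    transcript["offer_id"] = transcript["offer_id"].fillna(transcript["offer id"])
    transcript = transcript.drop(columns="offer id")
elif "offer id" in transcript.columns:
    transcript = transcript.rename(columns={"offer id": "offer_id"})

# Drop the suspiciously small transactions flagged during exploration.
tiny = (transcript["event"] == "transaction") & (transcript["amount"] < 0.25)
print(f"Removing {tiny.sum()} tiny transactions")
transcript = transcript[~tiny]
```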

Classification

Determining the classes for the machine learning models was one of the most complex aspects of this project, due to the possibility of overlapping offers and edge cases. To address this, I first added an expiry time for each offer, calculated from the offer's start time and duration. Any offer completed before its expiry would be considered successful.

To handle overlapping offers, I implemented a series of forward and backward fills, ensuring that transactions were associated with the most recently viewed offer. For instance, if a customer received two offers but only viewed the first one, all subsequent transactions would be attributed to that first offer. This required careful consideration of the order of events and the potential for interactions between different offers.

I also filled forward expiry times from "offer received" events to "offer viewed" and "transaction" events, ensuring that only viewed offers influenced transaction classifications. This process involved multiple steps and careful validation to ensure accuracy.
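The sketch below shows a greatly simplified version of this expiry logic. It assumes the transcript has been joined with the portfolio so each offer event carries its `duration` in days, and it omits the extra book-keeping needed for the overlapping-offer edge cases described above.

```python
# Simplified sketch: attach an expiry time to each "offer received" event
# and carry it forward onto later events for the same customer.
events = transcript.merge(
    portfolio[["id", "duration"]], left_on="offer_id", right_on="id", how="left"
).drop(columns="id")

received = events["event"] == "offer received"
# time is in hours since the start of the test; duration is in days.
events.loc[received, "expiry"] = (
    events.loc[received, "time"] + events.loc[received, "duration"] * 24
)

# Forward-fill within each customer. The real logic is more selective:
# only offers that were actually viewed may influence later transactions.
events = events.sort_values(["person", "time"])
events["expiry"] = events.groupby("person")["expiry"].ffill()
```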

To verify the logic, I conducted spot checks on edge cases, such as the one illustrated in Figure 12, where a transaction is correctly associated with an earlier viewed offer rather than a more recent unviewed one. This edge case is particularly interesting because it includes an offer being completed, yet it is flagged as unsuccessful because that more recent offer was never viewed.

Edge case spot check showing an informational offer being correctly prioritised

Figure 12: Example of correctly classified overlapping offers. The informational offer received at time 336 and viewed at time 372 is correctly associated with the transaction at time 414.

A transaction with an offer_id means the customer viewed an offer prior to making the purchase. Informational offers with a transaction occurring before the expiry time can now be classed as successful; the expiry value carried onto the transaction belongs to the last viewed offer, so it is safe to make this determination. Likewise, discount and BOGO offers completed before their expiry time can now be classed as successful. They also carry the expiry value of the last viewed offer, so I don't need to explicitly check again that the offer was viewed prior to completion.

To simplify the classification process, I split the data into informational and non-informational offers. While not strictly necessary, this improved clarity and made it easier to verify the logic for each offer type.

Classifying Informational Offers

To classify informational offers, I marked transactions occurring before the expiry time as successful, using a new column called `successful_offer` with values of 1 for successful and 0 for unsuccessful offers.

To propagate this classification back to the corresponding "offer received" events, I used a two-step process. First, I backfilled the `successful_offer` values to the "offer viewed" events. Then, I used the shift function to assign the `successful_offer` value to the "offer received" event immediately preceding the "offer viewed" event. This approach avoids incorrectly marking earlier "offer received" events as successful in cases where a user receives the same offer multiple times and only the later ones lead to a successful transaction.

For example, consider the following sequence of events with a successful transaction:

  1. Received
  2. Received
  3. Viewed
  4. Received
  5. Transaction = True

In this case, only the second "received" offer should be marked as successful. My two-step process ensures this by first marking the "viewed" offer as successful and then using the shift function to mark only the immediately preceding "received" offer.

Finally, I selected only the "offer received" events, resulting in a dataset of accurately classified informational offers for use in the machine learning models.
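The following is a condensed sketch of that two-step process; the column names are illustrative and the real notebook handles further edge cases. `info` is assumed to hold the informational-offer events and transactions, with the forward-filled `expiry` column from the previous step.

```python
import numpy as np

info = info.sort_values(["person", "time"]).copy()

# Mark transactions that happen before the carried-forward expiry.
qualifying = (info["event"] == "transaction") & (info["time"] <= info["expiry"])
info.loc[qualifying, "successful_offer"] = 1

# Step 1: backfill the flag so the preceding "offer viewed" row picks it up,
# then blank it out everywhere else.
info["viewed_success"] = info.groupby("person")["successful_offer"].bfill()
info.loc[info["event"] != "offer viewed", "viewed_success"] = np.nan

# Step 2: among the offer events only, shift the flag up one row so that
# just the "offer received" row immediately preceding a successful
# "offer viewed" row is marked as successful.
offer_events = info[info["event"].isin(["offer received", "offer viewed"])].copy()
offer_events["successful_offer"] = (
    offer_events.groupby("person")["viewed_success"].shift(-1).fillna(0).astype(int)
)

classified_info = offer_events[offer_events["event"] == "offer received"]
```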

Classifying Discount and BOGO Offers

Similar to informational offers, I marked completed offers as successful if they occurred before their expiry time, using the `successful_offer` column.

One edge case I encountered involved a single transaction completing two offers simultaneously (Figure 13). In such cases, I decided to mark both offers as successful, as it's unclear which offer was the primary driver of the transaction. It's possible that the customer was influenced by both offers or was already planning to make a purchase that would complete both offers.

Edge case spot check showing two offers being completed by a single transaction
Figure 13: Example of a single transaction completing two offers simultaneously.

To classify these offers, I used a similar approach as with the informational offers, using a combination of fill and shift functions to associate the completed offer with the correct offer received event. For example, the following event sequence:

  1. Received
  2. Received
  3. Viewed
  4. Received
  5. Transaction
  6. Completed = True

would result in only the second "received" offer being marked as successful:

  1. Received = False
  2. Received = True
  3. Viewed = True
  4. Received = False
  5. Transaction
  6. Completed = True

Finally, I selected only the "offer received" events and combined this data with the classified informational offers from the previous section.

The Classes

After filtering for "offer received" events, I analyzed the distribution of positive and negative classes (Figure 14). The classes are slightly imbalanced, but not significantly.

Distribution of Offer Classes

Figure 14: Bar chart showing the number of successful (positive) and unsuccessful (negative) offers after filtering for 'offer received' events.

Data Preparation

All the categorical data must be converted to binary data for the machine learning algorithms. The channels column contained nested lists, so I used the pandas explode function, assigned each exploded value a 1, pivoted the data, and joined it back onto the portfolio table. These steps are effectively a form of one-hot encoding. The other categorical fields were more straightforward, and I was able to convert them to binary columns directly with the pandas get_dummies function.
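A sketch of this encoding step, assuming the `portfolio` DataFrame described earlier:

```python
import pandas as pd

# Explode the nested channel lists, flag each occurrence, and pivot back
# to one binary column per channel.
channels = portfolio[["id", "channels"]].explode("channels")
channels["flag"] = 1
channel_dummies = channels.pivot_table(
    index="id", columns="channels", values="flag", fill_value=0
)
# Drop the uninformative "email" channel, as discussed earlier.
channel_dummies = channel_dummies.drop(columns="email")

portfolio = portfolio.drop(columns="channels").join(channel_dummies, on="id")

# The remaining categorical fields can be one-hot encoded directly.
portfolio = pd.get_dummies(portfolio, columns=["offer_type"])
```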

I also had to convert the became_member_on field to a number. I finally renamed some field names and deleted unneeded fields. The data was now prepared for the implementation of the algorithms.

Implementation

Benchmark Model

First I will build and refine my benchmark model. I set my features to be reward, difficulty, duration, mobile, social, web, bogo, discount, informational, age, became_member_on, income, female, male and other. I set the label to be successful_offer.

I'll use sklearn's `QuantileTransformer` to scale the features. This method is robust to outliers and skewed data, which are present in some of the features. To avoid data leakage, I'll apply this transformation within a pipeline while splitting the data into training (85%) and testing (15%) sets. This split allows for sufficient training data while reserving an adequate portion for testing. The same testing data will be used for both the benchmark and neural network models to ensure a fair comparison. (The neural network model will use a 70/15/15 split for training, validation, and testing.)

To select the best algorithm for this classification problem, I tested several common sklearn models: `DecisionTreeClassifier`, `AdaBoostClassifier`, `RandomForestClassifier`, `GaussianNB`, and `SVC`. The `RandomForestClassifier` significantly outperformed the others in terms of both accuracy (0.6976) and the F1-score for successful offers (0.6352) (see Figure 15 for the confusion matrix). This model's weighted F1-score is 0.6958. I'll now focus on optimizing this model, aiming to increase the recall (true positive rate) while maintaining a good overall accuracy and F1-score.

Confusion Matrix for Initial RandomForestClassifier

Figure 15: Confusion matrix for the initial RandomForestClassifier model using default parameters.
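The sketch below mirrors this bake-off. `X` and `y` are assumed to be the prepared feature matrix and `successful_offer` labels, and the exact pipeline details in the notebook may differ.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 85/15 train/test split (stratified so both sets keep a similar class balance).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

candidates = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GaussianNB": GaussianNB(),
    "SVC": SVC(random_state=42),
}

for name, clf in candidates.items():
    # Scaling inside the pipeline avoids leaking test-set statistics.
    pipe = make_pipeline(QuantileTransformer(), clf)
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)
    print(f"{name:<14} acc={accuracy_score(y_test, preds):.4f} "
          f"f1={f1_score(y_test, preds):.4f}")
```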

Benchmark Refinement

The first approach I took to refining my model was to explore feature reduction. It is possible that too many features were causing my model to overfit the training data, making it weaker at predicting the classes in the testing data. First, I checked the importance of each feature in my RandomForestClassifier model, as can be seen in Figure 16 below. Age, income, and became_member_on are the most important features, and social also plays a relatively significant part. This tallies with the correlations I found in the Data Exploration section.

Feature Importance in RandomForestClassifier

Figure 16: Feature importance plot for the initial RandomForestClassifier model.

To explore further dimensionality reduction, I performed Principal Component Analysis (PCA). I obtained the eigenvalues and plotted them on a scree plot. A common method for deciding how many principal components to keep is to take all the components before the elbow in the scree plot.[4] These first four components capture 0.78 of the variance, as illustrated in the second image in Figure 17 below. However, after retraining the `RandomForestClassifier` using only these four components, I found that both accuracy (0.6779) and the F1-score (0.6155) were worse than before. Therefore, I decided against using PCA for this dataset.

PCA Scree Plot and Explained Variance

Figure 17: Scree plot of eigenvalues and cumulative explained variance for PCA, suggesting the first four principal components capture most of the variance in the data.
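A minimal sketch of the scree-plot check, assuming `X_train` from the split above:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer

# Fit PCA on the scaled training features and plot the eigenvalues.
X_scaled = QuantileTransformer().fit_transform(X_train)
pca = PCA().fit(X_scaled)

plt.plot(range(1, len(pca.explained_variance_) + 1), pca.explained_variance_, "o-")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue")
plt.show()

# Variance captured by the components before the elbow.
print(pca.explained_variance_ratio_[:4].sum())
```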

Next, I used sklearn's `GridSearchCV` function to try to find the optimal hyperparameter settings for the `RandomForestClassifier`. I selected a range of hyperparameters and ran `GridSearchCV` with stratified 10-fold cross-validation (`StratifiedKFold`), tuning the model for maximum recall. However, I discovered that the best parameters for recall were very similar to the default parameters, and not much improvement was to be found.
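A sketch of this search is shown below; the parameter grid is illustrative rather than the exact grid used in the notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

pipe = make_pipeline(QuantileTransformer(), RandomForestClassifier(random_state=42))

# Illustrative grid; step names follow make_pipeline's lower-cased class names.
param_grid = {
    "randomforestclassifier__n_estimators": [100, 200, 300],
    "randomforestclassifier__max_depth": [None, 10, 20],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="recall",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 4))
```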

Since my initial refinement strategies weren't yielding significant improvements, I decided to go back to the data set and try to extract additional features. I thought that using information prior to an offer being sent/received would prove beneficial. I calculated the total spend of each customer by the time an offer was received, using the pandas `cumsum` function and then forward-filling to each offer received. I also counted the total number of individual transactions each customer made prior to an offer being received. Then, I also counted the total number of previous offers they received, viewed, and completed prior to each additional offer being received. This was done by one-hot encoding the events and then applying the same cumulative sum and forward-fill approach as before.
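The sketch below reproduces the spirit of these features on the unpacked `transcript`; the column names are assumed, and the notebook's exact forward-filling details may differ.

```python
import pandas as pd

transcript = transcript.sort_values(["person", "time"])

# Running spend and transaction count per customer at each point in time.
is_txn = transcript["event"] == "transaction"
transcript["total_spend"] = (
    transcript["amount"].where(is_txn, 0).groupby(transcript["person"]).cumsum()
)
transcript["total_transactions"] = (
    is_txn.astype(int).groupby(transcript["person"]).cumsum()
)

# Running counts of previous offer events, via one-hot encoding + cumsum.
event_dummies = pd.get_dummies(transcript["event"]).astype(int)
for col in ["offer received", "offer viewed", "offer completed"]:
    transcript[f"prev_{col.replace(' ', '_')}"] = (
        event_dummies[col].groupby(transcript["person"]).cumsum()
        - event_dummies[col]  # exclude the current event itself
    )
```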

After incorporating these new features, I retrained the `RandomForestClassifier` with its default parameters and finally saw some improvement. Both the accuracy (0.7053) and the F1-score (0.6373) increased slightly. I checked the feature importance of this new data set to see if my new features were prominent, and they did appear to play an important part, as can be seen in Figure 18 below.

Feature Importance with New Features

Figure 18: Feature importance plot for the RandomForestClassifier model after adding new features.

To address potential overfitting from the new features, I tried PCA again, but it still decreased performance. So, I created a correlation matrix (Figure 19) and observed strong correlations between `completed`, `received`, and `viewed`. I decided to keep only `received`, as it had the highest feature importance among the three. This improved accuracy and F1-score, suggesting that the other two features were contributing to overfitting.

Correlation Matrix of New Features

Figure 19: Correlation matrix of the newly engineered features, showing strong correlations between completed, received, and viewed.
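A minimal sketch of that check, with assumed column names for the engineered counts:

```python
# Inspect collinearity among the engineered count features and keep only
# `received`, the most important of the three.
count_cols = ["received", "viewed", "completed"]
print(X[count_cols].corr().round(2))

X = X.drop(columns=["viewed", "completed"])
```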

Finally, I used `GridSearchCV` to tune the hyperparameters of the model with the refined features. I found that optimizing for recall resulted in the best performance when `max_depth` was set to 20 and `n_estimators` to 300. This gave a recall score of 0.6139 and an overall accuracy of 0.7151. The confusion matrix for this refined model is shown in Figure 20.

Confusion Matrix for Refined Benchmark Model

Figure 20: Confusion matrix for the refined RandomForestClassifier model using the new features and tuned hyperparameters.

Neural Network Model

For this model, I used a linear neural network with the same preprocessed dataset (including the new features) and the same `QuantileTransformer` scaling as the benchmark model. I used a 70/15/15 split for training, validation, and testing, and converted the data to PyTorch tensors, ensuring proper shuffling and a similar class distribution across the splits.

The network architecture consists of three hidden layers with 256 neurons each. I used the Adam optimizer with a learning rate and weight decay of 0.0001, a dropout factor of 0.1, and weighted the cross-entropy loss based on class frequencies in the training set. Initially, I trained for 200 epochs but observed that validation performance plateaued after 60 epochs. To avoid overfitting, I implemented early stopping by selecting the model with the best validation performance and minimal loss after each epoch. The final model achieved an accuracy of 0.7148, comparable to the benchmark model. However, the recall for the positive class was lower, at 0.5398 (Figure 21).

Confusion Matrix for Initial Neural Network Model

Figure 21: Confusion matrix for the initial linear neural network model.
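The sketch below captures the architecture and training loop described above; the tensor names (`X_train_t`, `y_train_t`, `X_val_t`, `y_val_t`) and the full-batch update are simplifications rather than the notebook's exact implementation.

```python
import copy

import torch
import torch.nn as nn

class OfferNet(nn.Module):
    """Three hidden layers of 256 units with ReLU and dropout of 0.1."""

    def __init__(self, n_features, hidden=256, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, 2),
        )

    def forward(self, x):
        return self.net(x)

model = OfferNet(n_features=X_train_t.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Weight the cross-entropy loss by inverse class frequency in the training set.
counts = torch.bincount(y_train_t)
criterion = nn.CrossEntropyLoss(weight=counts.sum() / (2.0 * counts.float()))

best_state, best_val_loss = None, float("inf")
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

    # Early stopping: keep the weights with the lowest validation loss.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val_t), y_val_t).item()
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)
```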

Neural Network Refinement

To improve the model's recall while maintaining accuracy, I adjusted the weights in the cross-entropy loss function to give more emphasis to the positive class. After experimenting with different weights, I settled on 1.35, which resulted in a recall of 0.6326 and an accuracy of 0.7210. The confusion matrix for this refined model is shown in Figure 22.

Confusion Matrix for Refined Neural Network Model

Figure 22: Confusion matrix for the refined linear neural network model.

Results

Model Evaluation and Results

Both the benchmark Random Forest model and the refined neural network were evaluated on the same unseen test data (15% of the entire dataset). To assess their performance, I used a combination of accuracy, precision, and recall, calculated using sklearn's metric functions.

As mentioned earlier, the primary focus was on achieving a high recall score for the positive class (successful offers), as this is most important to Starbucks. Recall measures the proportion of actual successful offers that were correctly identified. While maintaining good accuracy and F1-score is important, prioritizing recall aligns with Starbucks' likely goal of maximizing the successful offers, even if it means slightly increasing the risk of sending out some ineffective offers.

The refined neural network model slightly outperformed the benchmark model across all metrics (see table below). This is a noteworthy result, as it demonstrates the potential of neural networks to achieve comparable or even superior performance to classical machine learning models, even in scenarios where simpler models might be expected to perform well.

| Metric   | Benchmark Model (Random Forest) | Refined Neural Network |
|----------|---------------------------------|------------------------|
| Accuracy | 0.7151                          | 0.7210                 |
| Recall   | 0.6139                          | 0.6326                 |
| F1-score | 0.7125                          | 0.7191                 |

Conclusion

This project was an open-ended exploration of machine learning for targeted advertising, with the specific objective emerging after thorough data exploration. This open-endedness presented some initial challenges, such as defining the scope of the project and determining the most appropriate evaluation metrics. One surprising outcome was the strong performance of the neural network in classifying the data, which initially wasn't anticipated.

The findings from this project have practical implications for real-world marketing campaigns. By leveraging machine learning, businesses like Starbucks can potentially improve the efficiency of ad spend, target offers more effectively, and increase customer engagement.

Final Thoughts

This project was a challenging but rewarding experience. The Udacity tutorials and support from the tutors were invaluable throughout the process. Through this project, I significantly deepened my understanding of data science and machine learning, motivating me to continue exploring this field.

Future Work

This project revealed several potential avenues for further exploration and improvement. For example, the new features based on past customer behavior were particularly effective, suggesting that additional features related to past interactions could yield even better results. Calculating the average time lag between offer receipt and viewing for each customer might provide valuable insights.

Another interesting direction would be to refine the classification of offers. Currently, the model classifies offers as "successful" or "unsuccessful," with an emphasis on correctly identifying successful offers (recall). However, a more nuanced approach could involve identifying "very unsuccessful" offers, where a customer completes an offer and receives a discount but would have purchased the product even without the offer. This would help Starbucks avoid unnecessary discounts and further optimize marketing spend.

References

  1. Choi, J.-A., & Lim, K. (2020). Identifying Machine Learning Techniques for Classification of Target Advertising. ICT Express, 175–180.
  2. Chen, Y., Kapralov, M., Canny, J., & Pavlov, D. Y. (2009). Factor Modelling for Advertisement Targeting. Advances in Neural Information Processing System, 324–332.
  3. Siswantoro, J., Prabuwono, A. S., Abdullah, A., & Idrus, B. (2016). A linear model based on Kalman filter for improving neural network classification performance. Expert Systems with Applications, 112–122.
  4. Zhu, M., & Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 918–930.

GitHub Repository

The complete project code and data can be found on GitHub.