When you are shopping in the supermarket, do you ever notice how some items are placed next to each other? For example, do you notice that chips and soda are often placed next to each other? They are not placed there without reason — retailers have cold, hard data to back up the association between buying chips and buying soda. One powerful technique used to uncover such associations is Association Rule Mining (ARM).
Association Rule Mining (also known as Association Rule Learning) is a key concept in data mining that uncovers interesting correlations and relationships within data. It is used in product recommendation, customer behaviour analysis, and anomaly detection. At its core, association rule mining helps find the likelihood that one set of items appears given the presence of another.
In this post, we’ll look at the key concepts involved in ARM, list out some algorithms used in ARM, and give an example of ARM using the mlxtend library.
Key Concepts
Let us start with some basic definitions.
Items and Itemsets
The most fundamental definition is for items.
An item is a binary attribute.
For example, an item in a shopping cart is a product — the shopping cart could either contain the item or not[^1]. But it doesn't have to represent an actual product. As another example, if we think of the object as a book, an item could be whether the book is red, or blue, or green. Note that each of these colours is a separate item — we are just interested in the question "is the book red" or "is the book blue".
Another important concept is a group of items.
An itemset is a collection of items. It is a subset of all the possible items.
In the shopping cart example, the set $\{\text{Milk}, \text{Bread}, \text{Butter}\}$ is one itemset (representing that the shopping cart contains milk, bread, and butter). For the book example, one possible itemset is $\{\text{Red}, \text{Hardback}, \text{US Trade Size}\}$ (which represents the idea that the book is a red, hardback book printed in US trade size).
Transactions and Rules
A transaction is an itemset together with a unique transaction ID. The full list of transactions that we are considering is called the database.
For example, in the context of a supermarket, a transaction could be the products that the customer bought along with the invoice number. For the book example, the transaction could be the attributes of the book along with the International Standard Book Number (ISBN) of the book. In any case, the transaction should be uniquely identifiable within the database.
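To make these definitions concrete, here is a minimal sketch, in Python with illustrative names, of how a transaction database could be represented: each unique transaction ID maps to an itemset. (The contents happen to be the supermarket database we use in the Metrics section below.)

```python
# A toy database: each transaction pairs a unique ID with an itemset.
# frozenset is a natural fit since itemsets are unordered and duplicate-free.
database = {
    1: frozenset({"Milk", "Bread", "Fruit"}),
    2: frozenset({"Butter", "Eggs", "Fruit"}),
    3: frozenset({"Beer", "Diapers"}),
    4: frozenset({"Milk", "Bread", "Butter", "Eggs", "Fruit"}),
    5: frozenset({"Bread"}),
}
```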
We can now define rules.
Suppose $X$ and $Y$ are disjoint itemsets (i.e., $X$ and $Y$ do not have a common item). An association rule, denoted $X \Rightarrow Y$, means that if a transaction contains $X$, then the transaction also contains $Y$. The itemset $X$ is called the antecedent and $Y$ is called the consequent.
For the shopping cart example, we could define the rule $\{\text{Milk}, \text{Bread}\} \Rightarrow \{\text{Cereal}\}$, which means that if a shopping cart contains both milk and bread, then it must contain cereal. Here the antecedent is that the shopping cart contains both milk and bread and the consequent is that the shopping cart also contains cereal.
Metrics
To measure the usefulness of the created association rule, we have a few metrics. To aid in the description of these metrics, consider the following database of transactions taken from Wikipedia:
| Transaction ID | Itemset |
|---|---|
| 1 | $\{\text{Milk}, \text{Bread}, \text{Fruit}\}$ |
| 2 | $\{\text{Butter}, \text{Eggs}, \text{Fruit}\}$ |
| 3 | $\{\text{Beer}, \text{Diapers}\}$ |
| 4 | $\{\text{Milk}, \text{Bread}, \text{Butter}, \text{Eggs}, \text{Fruit}\}$ |
| 5 | $\{\text{Bread}\}$ |
Let $X$ and $Y$ be two disjoint itemsets, $X \Rightarrow Y$ be an association rule, and $D$ be the database above.
Support
Support is an indication of the frequency of occurrence of either an itemset or an association rule within the database.
The support of an itemset is the number of transactions that contain all items in the itemset divided by the size of the database[^2]. Formally, the support of an itemset $X$ is

$$\mathrm{supp}(X) = \frac{|\{(i, t) \in D : X \subseteq t\}|}{|D|}$$

where $D$ is the database containing transactions $(i, t)$ (with $i$ being the transaction ID and $t$ being the transaction itemset).
Consider the itemset $\{\text{Fruit}\}$. The number of transactions that contain $\{\text{Fruit}\}$ is 3 (transactions 1, 2, and 4), so its support is $3/5$ (i.e., $0.6$). For the itemset $\{\text{Milk}, \text{Bread}\}$, only transactions 1 and 4 contain all items in $\{\text{Milk}, \text{Bread}\}$ and thus its support is $2/5 = 0.4$. In general, the larger the itemset $X$, the smaller its support.
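As a quick sanity check of these numbers, here is a small helper, reusing the `database` dictionary sketched earlier, that computes the support of an itemset directly from the definition.

```python
def support(itemset, database):
    """Fraction of transactions whose itemset contains every item in `itemset`."""
    matches = sum(1 for t in database.values() if itemset <= t)  # <= is the subset test
    return matches / len(database)

print(support(frozenset({"Fruit"}), database))          # 0.6 (transactions 1, 2, and 4)
print(support(frozenset({"Milk", "Bread"}), database))  # 0.4 (transactions 1 and 4)
```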
What about the support of an association rule?
The support of an association rule $X \Rightarrow Y$ is the support of the itemset containing the union of items from $X$ and $Y$. Written another way,

$$\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y).$$
Think of the support of an association rule this way: for a rule to be ‘supported’, the items in the antecedent and consequent must be present in the itemset of the transaction. This is the same thing as saying that the transaction must contain the items in the union of both antecedent and consequent itemsets, hence the definition.
For example, the support for the association rule $\{\text{Milk}\} \Rightarrow \{\text{Bread}\}$ is the support of $\{\text{Milk}, \text{Bread}\}$, which is $0.4$ as we found earlier.
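In code, the rule support is then a one-liner on top of the `support()` helper above.

```python
def rule_support(antecedent, consequent, database):
    """supp(X => Y) = supp(X union Y)."""
    return support(antecedent | consequent, database)

print(rule_support(frozenset({"Milk"}), frozenset({"Bread"}), database))  # 0.4
```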
Confidence
The confidence of an association rule can be thought of as the likelihood that the association rule is actually true — when we see $X$ in a transaction's itemset, how likely will $Y$ also be there?
The confidence of an association rule $X \Rightarrow Y$ is the proportion of transactions containing $X$ that also contain $Y$. Formally, the confidence of the association rule is

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}.$$
As an example, let's find $\mathrm{conf}(\{\text{Fruit}\} \Rightarrow \{\text{Bread}\})$. One can calculate that $\mathrm{supp}(\{\text{Fruit}, \text{Bread}\}) = 2/5 = 0.4$. Earlier, we also found that $\mathrm{supp}(\{\text{Fruit}\}) = 0.6$, which means that the confidence is $0.4 / 0.6 = 2/3 \approx 0.67$. This suggests that whenever fruit is bought, bread is also bought about $67\%$ of the time. As another example, let's calculate $\mathrm{conf}(\{\text{Milk}\} \Rightarrow \{\text{Bread}\})$. We found earlier that $\mathrm{supp}(\{\text{Milk}, \text{Bread}\}) = 0.4$. One can also see that $\mathrm{supp}(\{\text{Milk}\}) = 0.4$ (transactions 1 and 4), which means that the confidence is $0.4 / 0.4 = 1$. This denotes the idea that every time a customer buys milk, they also buy bread.
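Here is the corresponding sketch for confidence, again building on the `support()` helper from before.

```python
def confidence(antecedent, consequent, database):
    """conf(X => Y) = supp(X union Y) / supp(X)."""
    return support(antecedent | consequent, database) / support(antecedent, database)

print(confidence(frozenset({"Fruit"}), frozenset({"Bread"}), database))  # 0.666... (about 67%)
print(confidence(frozenset({"Milk"}), frozenset({"Bread"}), database))   # 1.0
```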
Lift
The lift of an association rule tells us how well the rule predicts the occurrence of the consequent given the antecedent, as compared to a random (independence) model. This is defined formally below.
The lift of an association rule $X \Rightarrow Y$ is the ratio of the observed support to the expected support if the occurrences of $X$ and $Y$ were independent of each other. Symbolically,

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X) \times \mathrm{supp}(Y)}.$$
Let us consider the rule $\{\text{Milk}\} \Rightarrow \{\text{Bread}\}$. Its support is $0.4$ as found earlier, with $\mathrm{supp}(\{\text{Milk}\}) = 0.4$ and $\mathrm{supp}(\{\text{Bread}\}) = 0.6$, so the lift of this rule is $0.4 / (0.4 \times 0.6) = 5/3 \approx 1.67$. A lift of more than 1 indicates that the two itemsets are dependent on each other, so this rule is potentially useful. Consider instead the rule $\{\text{Butter}\} \Rightarrow \{\text{Bread}\}$. Here its support is $1/5 = 0.2$ (since transaction 4 is the only transaction containing both $\{\text{Butter}\}$ and $\{\text{Bread}\}$), with the support of $\{\text{Butter}\}$ and $\{\text{Bread}\}$ being $0.4$ and $0.6$ respectively. Hence the lift of the rule is $0.2 / (0.4 \times 0.6) = 5/6 \approx 0.83$, which actually indicates that $\{\text{Butter}\}$ and $\{\text{Bread}\}$ are substitutes for each other since the presence of $\{\text{Butter}\}$ appears to have a negative impact on the presence of $\{\text{Bread}\}$.
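And the same for lift, which lets us reproduce both calculations above (it reuses the `support()` helper and `database` dictionary sketched earlier).

```python
def lift(antecedent, consequent, database):
    """Observed rule support relative to the support expected under independence."""
    return support(antecedent | consequent, database) / (
        support(antecedent, database) * support(consequent, database)
    )

print(lift(frozenset({"Milk"}), frozenset({"Bread"}), database))    # 1.666... (> 1)
print(lift(frozenset({"Butter"}), frozenset({"Bread"}), database))  # 0.833... (< 1)
```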
A Summary of Algorithms
Let’s now look at a quick summary of the common association rule mining algorithms. We’ll not go into too much detail about how they work, but we will highlight the main pros and cons of the algorithms.
These algorithms are used to generate frequent itemsets, which are itemsets that have a support of at least a certain threshold. Another step is needed to convert these frequent itemsets into association rules. That's where the other two metrics come in — if a generated association rule does not meet a certain confidence (or lift), it will be discarded. For example, in mlxtend, the `association_rules()` function helps convert the frequent itemsets into association rules[^3].
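To make the two-step process concrete, here is a hedged brute-force sketch of the second step, converting a single frequent itemset into rules that meet a confidence threshold. It reuses the helpers from the Metrics section and is only meant to illustrate the idea behind functions like `association_rules()` (see footnote 3), not to reproduce them.

```python
from itertools import combinations

def rules_from_itemset(itemset, database, min_confidence=0.8):
    """Try every non-trivial split of `itemset` into antecedent and consequent,
    keeping only the rules whose confidence meets the threshold."""
    kept = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(itemset, size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            if confidence(antecedent, consequent, database) >= min_confidence:
                kept.append((antecedent, consequent))
    return kept

# Keeps Milk => Bread (confidence 1) but drops Bread => Milk (confidence 2/3 < 0.8).
print(rules_from_itemset(frozenset({"Milk", "Bread"}), database))
```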
Apriori
One of the earliest association rule mining algorithms, Apriori was proposed by Agrawal and Srikant in 1994 for frequent itemset identification. It is so named because it uses prior knowledge (i.e., a priori knowledge) of frequent itemset properties.
However, a big limitation is its poor performance when a large proportion of transactions in the database (>70%) contain a frequent itemset. When this occurs, its speed drops rapidly and generating the frequent itemsets takes much longer. The Apriori algorithm also needs to scan the database $n + 1$ times, where $n$ is the size of the largest frequent itemset that still surpasses the support threshold.
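If you would like to try Apriori yourself, mlxtend exposes it with the same interface as the `fpgrowth()` function we use later in this post; this sketch assumes the one-hot encoded dataframe `df` that we build in the example section below.

```python
from mlxtend.frequent_patterns import apriori

# Same call shape as fpgrowth(); df is the one-hot encoded dataframe built below.
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)
```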
ECLAT
Due to the poor performance of the Apriori algorithm, several alternatives were developed. One of these, the Equivalence Class Clustering and Bottom-Up Lattice Traversal (ECLAT) algorithm proposed by Zaki in 2000, is a backtracking algorithm that works on the frequent itemset lattice graph.
Although ECLAT is faster than Apriori, it comes with reduced interpretability: it does not produce confidence or lift metrics. Skipping those computations is partly why it is faster, but it does reduce the ability to interpret its results.
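mlxtend does not ship an ECLAT implementation, but the core idea is easy to sketch: store the data vertically, mapping each item to the set of transaction IDs that contain it, so that the support of a larger itemset becomes a set intersection. The sketch below, using the `database` dictionary from earlier, is illustrative only and omits the lattice traversal itself.

```python
from collections import defaultdict

# Vertical (tid-list) layout: item -> set of transaction IDs containing it.
tidlists = defaultdict(set)
for tid, itemset in database.items():
    for item in itemset:
        tidlists[item].add(tid)

# The support of {Milk, Bread} is the size of the intersection of the tid-lists.
milk_bread_tids = tidlists["Milk"] & tidlists["Bread"]
print(len(milk_bread_tids) / len(database))  # 0.4, matching our earlier calculation
```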
FP-Growth
Frequent Pattern Growth (FP-Growth) is an algorithm introduced by Han, Pei, and Yin in 2000 that quickly emerged as a popular alternative to the Apriori algorithm. In particular, FP-Growth forgoes candidate frequent itemset generation, which makes it faster than Apriori. It also provides more metrics than ECLAT while being on par with ECLAT's speed.
An Example With mlxtend
With the background on the key concepts of association rule mining and the main algorithms used, let’s try our hand at actually finding these association rules in the database given above using the mlxtend library.
Setting Up
First install the mlxtend library. This will also install some useful data processing libraries like NumPy and Pandas.
```bash
pip install mlxtend~=0.23.4
```

> **IMPORTANT:** Make sure that `pip` refers to the Python 3 `pip` command. Otherwise, use `pip3` instead.
Loading the Database
Let’s load our itemsets in the database so that we can use mlxtend’s association rule mining tools on it.
```csv
id,itemset
1,"Milk,Bread,Fruit"
2,"Butter,Eggs,Fruit"
3,"Beer,Diapers"
4,"Milk,Bread,Butter,Eggs,Fruit"
5,"Bread"
```

Assuming that the database is named `database.csv`, we can load it in a Pandas dataframe by running the following code.
```python
import pandas as pd

orig_df = pd.read_csv("database.csv", index_col="id")
```

But what we actually need is a list of itemsets, not a database! With some processing we can achieve this:
```python
rows = orig_df.values.tolist()
dataset = [row[0].split(",") for row in rows]  # row[0] since each row has only one column
```

We can now use `dataset` to generate the frequent itemsets and then the association rules.
Generating Frequent Itemsets
With our dataset created, we are almost ready to generate the frequent itemsets. However, we first need to encode the itemsets in the 'canonical' transaction form[^4]. We can do this using the `TransactionEncoder()` class provided in mlxtend:

```python
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
```

We can now use mlxtend's `frequent_patterns` module on `df` to generate the frequent itemsets. Let's use the `fpgrowth()` function, specifying a minimum support value of 0.4 and saying that we want the column names to appear in the itemsets (instead of the column indices).
```python
from mlxtend.frequent_patterns import fpgrowth

frequent_itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)
```

Let's take a look at what the `frequent_itemsets` dataframe contains:

```python
print(frequent_itemsets)
```

```
    support               itemsets
0       0.6                (Fruit)
1       0.6                (Bread)
2       0.4                 (Milk)
3       0.4                 (Eggs)
4       0.4               (Butter)
5       0.4         (Fruit, Bread)
6       0.4          (Milk, Bread)
7       0.4          (Fruit, Milk)
8       0.4   (Milk, Fruit, Bread)
9       0.4          (Eggs, Fruit)
10      0.4         (Butter, Eggs)
11      0.4        (Butter, Fruit)
12      0.4  (Butter, Fruit, Eggs)
```

Generating Association Rules
With the `frequent_itemsets` dataframe, we can now find the appropriate association rules. Luckily we don't have to manually write the brute-force code ourselves — mlxtend's `frequent_patterns` module also provides the `association_rules()` function to generate them for us. Here we specify that the association rules that are output should have a confidence of at least 0.8. We could of course use lift in place of confidence; we'll leave this as an exercise for the reader.
```python
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)
```

Taking a look at the `rules` dataframe reveals some interesting rules (the output is truncated for brevity):
```python
print(rules)
```

```
        antecedents      consequents  antecedent support  consequent support  support  confidence      lift
0            (Milk)          (Bread)                 0.4                 0.6      0.4         1.0  1.666667
1            (Milk)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
2     (Fruit, Milk)          (Bread)                 0.4                 0.6      0.4         1.0  1.666667
3     (Bread, Milk)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
4    (Fruit, Bread)           (Milk)                 0.4                 0.4      0.4         1.0  2.500000
5            (Milk)   (Fruit, Bread)                 0.4                 0.4      0.4         1.0  2.500000
6            (Eggs)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
7          (Butter)           (Eggs)                 0.4                 0.4      0.4         1.0  2.500000
8            (Eggs)         (Butter)                 0.4                 0.4      0.4         1.0  2.500000
9          (Butter)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
10  (Butter, Fruit)           (Eggs)                 0.4                 0.4      0.4         1.0  2.500000
11   (Butter, Eggs)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
12    (Eggs, Fruit)         (Butter)                 0.4                 0.4      0.4         1.0  2.500000
13         (Butter)    (Eggs, Fruit)                 0.4                 0.4      0.4         1.0  2.500000
14           (Eggs)  (Butter, Fruit)                 0.4                 0.4      0.4         1.0  2.500000
```

From the generated rules, we can see that all of them have a confidence of 1.0. Some of the rules' lift scores are also quite high — rule 4 ($\{\text{Fruit}, \text{Bread}\} \Rightarrow \{\text{Milk}\}$), for example, has a lift value of 2.5. Some of the rules are quite unexpected as well — rule 6 ($\{\text{Eggs}\} \Rightarrow \{\text{Fruit}\}$), for example, has a confidence of 1 and a lift value of 1.666667.
> **NOTE:** The `rules` dataframe also includes other metrics, which we have omitted in the output above and will not explore here. Interested readers can consult mlxtend's official documentation to find out more about these metrics.
Filtering for certain rules comes down to manipulation of the rules dataframe. For example, to filter rules that have Butter as an antecedent, we can run:
```python
rules[rules["antecedents"] == {"Butter"}]
```
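Since `rules` is an ordinary dataframe, richer filters are just as easy. For instance, a small sketch combining two metric thresholds:

```python
# Keep only rules with perfect confidence and a lift comfortably above 1.
strong_rules = rules[(rules["confidence"] >= 1.0) & (rules["lift"] > 2)]
print(strong_rules[["antecedents", "consequents", "lift"]])
```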
Also, if only the support metric data is needed (or if the other metrics cannot be calculated), we can set `support_only=True` when calling the `association_rules()` function:

```python
# Only the rule support is computed; the other metric columns are filled with NaNs.
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.4, support_only=True)
```

Challenges of Association Rule Mining
Although association rule mining only requires a few lines of code (as demonstrated earlier), there are a few challenges that need to be addressed when dealing with real-world data.
- Sparse data: If the itemsets in each transaction in the database only contain a few items, but there is a large variety of items, data processing and storage may become an issue since most of the data in `te_ary` and `df` (see above) is going to be empty. One mitigation is sketched after this list.
- Thresholding: The threshold values of `min_support=0.4` and `min_threshold=0.8` were not chosen at random — they were chosen to ensure that we don't get too many or too few rules. For real data, these values would have to be adjusted so that the processing time doesn't become too long and the rules generated stay relevant.
- Meaningless, Trivial, or Repeated Rules: One must be careful to watch out for rules that are meaningless in the context of the other generated rules. For example, in the `rules` dataframe above, notice that rule 0 says that $\{\text{Milk}\} \Rightarrow \{\text{Bread}\}$ but rule 2 says that $\{\text{Fruit}, \text{Milk}\} \Rightarrow \{\text{Bread}\}$. But clearly $\{\text{Milk}\}$ is a subset of $\{\text{Fruit}, \text{Milk}\}$, and thus rule 2 is unnecessary. One must be careful to filter out rules that are unhelpful.
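For the sparse-data point above, one mitigation is to keep the one-hot encoding sparse from the start. mlxtend's `TransactionEncoder` can return a SciPy sparse matrix, which Pandas can wrap without densifying; a minimal sketch, assuming the same `dataset` as before:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
# sparse=True makes transform() return a SciPy sparse matrix instead of a dense array.
te_ary = te.fit(dataset).transform(dataset, sparse=True)
sparse_df = pd.DataFrame.sparse.from_spmatrix(te_ary, columns=te.columns_)
```

mlxtend's documentation shows its frequent-pattern functions working with such sparse dataframes, so `sparse_df` can be passed in place of `df` above.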
Conclusion
Association rule mining is an oft-forgotten type of unsupervised machine learning, but the value of its applications to product recommendation, bioinformatics, anomaly detection, web usage mining, customer behaviour analysis, and more cannot be overstated. I hope that this post managed to cover the basics of this wonderful approach to uncovering hidden relationships between the variables in your datasets.
Happy association rule mining!
Bibliography
- Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Record, 22(2), 207-216. https://doi.org/10.1145/170036.170072
- Wikipedia contributors. (2025, January 25). Association rule learning. In Wikipedia, The Free Encyclopedia. Retrieved 10:20, May 2, 2025, from https://en.wikipedia.org/w/index.php?title=Association_rule_learning&oldid=1271695648
- Agrawal, R., & Srikant, R. (1994, September). Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, 487-499. https://www.vldb.org/conf/1994/P487.PDF
- Zaki, M. J. (2000, May-June). Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372-390. https://doi.org/10.1109/69.846291
- Han, J., Pei, J., & Yin, Y. (2000, May 16). Mining Frequent Patterns without Candidate Generation. Proceedings of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD ‘00), 1-12. https://doi.org/10.1145/342009.335372
- Heaton, J. (2017, January 30). Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms. arXiv. https://arxiv.org/pdf/1701.09042
- Raschka, S. (n.d.). association_rules: Association rules generation from frequent itemsets. Retrieved 14:30, May 3, 2025, from https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
Image Credits
Cover image by Buzz from Pixabay, cropped from the original.
Footnotes
[^1]: Of course, in real life a shopping cart could contain any quantity of a product (like two or three bags of chips). Here we are just interested in whether the shopping cart has the product, not how many of that product is in the shopping cart.

[^2]: This is similar to how the probability of an event $E$, commonly denoted $P(E)$, occurring within a sample space of size $N$ is the count of $E$ within the sample space divided by $N$. In fact, some authors denote $\mathrm{supp}(X)$ by $P(X)$, but this is a bit confusing for my tastes.

[^3]: In particular, according to the source code for `association_rules()`, it appears that it will generate all possible combinations of antecedents and consequents for a given frequent itemset, and keep only those whose metric exceeds a threshold value. Formally, for a given frequent itemset $Z$, it finds all sets $X$ (the antecedent) and $Y = Z \setminus X$ (the consequent) such that the metric of $X \Rightarrow Y$ exceeds the threshold value.

[^4]: In the original formulation for transactions in association rule mining by Agrawal, Imieliński, and Swami in 1993, transactions were simply a pair $(i, \mathbf{t})$ where $i$ is the transaction ID and $\mathbf{t}$ is an array of binary values with $t_k = 1$ if the $k$-th item (in a list of all items) is present and $t_k = 0$ otherwise. However, we opt to let $t$ be an itemset to avoid confusion.