Package 'niarules'

Title: Numerical Association Rule Mining using Population-Based Nature-Inspired Algorithms
Description: A framework for mining numerical association rules with nature-inspired optimization algorithms. Inspired by the 'NiaARM' 'Python' and 'NiaARM' 'Julia' packages, it brings numerical association rule mining to the R programming language. See Fister Jr., Iglesias, Galvez, Del Ser, Osaba and Fister (2018) <doi:10.1007/978-3-030-03493-1_9>.
Authors: Iztok Jr. Fister [aut, cre, cph]
Maintainer: Iztok Jr. Fister <[email protected]>
License: MIT + file LICENSE
Version: 0.2.0
Built: 2025-03-04 13:42:19 UTC
Source: https://github.com/firefly-cpp/niarules

Help Index


Add an attribute to the "rule" list.

Description

This function adds an attribute to the existing rules list and returns the updated list.

Usage

add_attribute(rules, name, type, border1, border2, value)

Arguments

rules

The current rules list.

name

The name of the feature in the rule.

type

The type of the feature in the rule.

border1

The first border value in the rule.

border2

The second border value in the rule.

value

The value associated with the rule.

Value

The updated rules list.

Examples

rules <- list()
new_rules <- add_attribute(rules, "feature1", "numerical", 0.2, 0.8, "EMPTY")

Build rules based on a candidate solution.

Description

This function takes a candidate solution vector and a features list and builds a rule.

Usage

build_rule(solution, features)

Arguments

solution

The solution vector.

features

The features list.

Value

A rule.


Calculate the border value based on feature information and a given value.

Description

This function calculates the border value for a feature based on the feature information and a given value.

Usage

calculate_border(feature_info, value)

Arguments

feature_info

Information about the feature.

value

The value to calculate the border for.

Value

The calculated border value.

Examples

feature_info <- list(type = "numerical", lower_bound = 0, upper_bound = 1)
border_value <- calculate_border(feature_info, 0.5)

Calculate the fitness of an association rule.

Description

This function calculates the fitness of an association rule using support and confidence.

Usage

calculate_fitness(supp, conf)

Arguments

supp

The support of the association rule.

conf

The confidence of the association rule.

Value

The fitness of the association rule.
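
The manual does not state how support and confidence are combined. As a rough illustration only, the sketch below assumes a weighted average of the two measures. The name `sketch_fitness` and the weights `w_supp` and `w_conf` are hypothetical and are not part of the package.

```r
# Hypothetical sketch: combine support and confidence into one score.
# ASSUMPTION: a weighted average; the package's actual formula may differ.
sketch_fitness <- function(supp, conf, w_supp = 1, w_conf = 1) {
  (w_supp * supp + w_conf * conf) / (w_supp + w_conf)
}

# With equal weights this is the plain mean of the two measures.
score <- sketch_fitness(supp = 0.4, conf = 0.8)
print(score)
```

Because both inputs lie in [0, 1], a score built this way also stays in [0, 1], which makes it easy to compare candidate rules.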


Calculate the selected category based on a value and the number of categories.

Description

This function calculates the selected category based on a given value and the total number of categories.

Usage

calculate_selected_category(value, num_categories)

Arguments

value

The value to calculate the category for.

num_categories

The total number of categories.

Value

The calculated selected category.

Examples

selected_category <- calculate_selected_category(0.3, 5)

Check if the attribute conditions are satisfied for an instance.

Description

This function checks if the attribute conditions specified in the association rule are satisfied for a given instance row.

Usage

check_attribute(attribute, instance_row)

Arguments

attribute

An attribute with type and name information.

instance_row

A row representing an instance in the dataset.

Value

TRUE if conditions are satisfied, FALSE otherwise.
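
To make the idea of an attribute check concrete, here is a small stand-alone sketch. It assumes an attribute is a list with the fields used by add_attribute (name, type, border1, border2, value), that numerical borders are inclusive, and that categorical attributes are matched by equality. The function name `attribute_holds` is hypothetical.

```r
# Hypothetical stand-in for an attribute check; not the package's own code.
# ASSUMPTION: numerical borders are inclusive, categories match by equality.
attribute_holds <- function(attribute, instance_row) {
  observed <- instance_row[[attribute$name]]
  if (attribute$type == "numerical") {
    observed >= attribute$border1 && observed <= attribute$border2
  } else {
    as.character(observed) == attribute$value
  }
}

# One instance (a single-row data frame) and two example attributes.
instance_row <- data.frame(temperature = 22, weather = "sun",
                           stringsAsFactors = FALSE)
numeric_attr <- list(name = "temperature", type = "numerical",
                     border1 = 20, border2 = 26, value = "EMPTY")
category_attr <- list(name = "weather", type = "categorical",
                      border1 = 1, border2 = 1, value = "rain")

numeric_ok <- attribute_holds(numeric_attr, instance_row)
category_ok <- attribute_holds(category_attr, instance_row)
```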


Calculate the cut point for an association rule.

Description

This function calculates the cut point, denoting which part of the vector belongs to the antecedent and which to the consequent of the mined association rule.

Usage

cut_point(sol, num_attr)

Arguments

sol

The cut value from the solution vector.

num_attr

The number of attributes in the association rule.

Value

The cut point value.


Implementation of Differential Evolution metaheuristic algorithm.

Description

This function uses Differential Evolution, a stochastic population-based optimization algorithm, to search for numerical association rules with high fitness. As a metaheuristic, it does not guarantee a globally optimal rule.

Usage

differential_evolution(
  d = 10,
  np = 10,
  f = 0.5,
  cr = 0.9,
  nfes = 1000,
  features,
  data,
  is_time_series = FALSE
)

Arguments

d

Dimension of the problem (default: 10).

np

Population size (default: 10).

f

The differential weight, controlling the amplification of the difference vector (default: 0.5).

cr

The crossover probability, determining the probability of a component being replaced (default: 0.9).

nfes

The maximum number of function evaluations (default: 1000).

features

A list containing information about features, including type and bounds.

data

A data frame representing instances in the dataset.

is_time_series

A boolean indicating whether the dataset is time series.

Value

A list containing the best solution, its fitness value, the number of function evaluations, and a list of the identified association rules.

References

Storn, R., & Price, K. (1997). "Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces." Journal of Global Optimization, 11(4), 341–359. doi:10.1023/A:1008202821328
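
For readers unfamiliar with the algorithm, the following is a minimal DE/rand/1/bin sketch that maximizes a toy objective on [0, 1]^d. It reuses the parameter names d, np, f, cr and nfes from the usage above, but `toy_fitness`, `de_sketch` and the names of the returned list elements are hypothetical. It does not mine rules and is not the package's implementation.

```r
# Toy objective with its maximum (1) at x = 0.5 in every dimension.
toy_fitness <- function(x) 1 - mean((x - 0.5)^2)

# Minimal DE/rand/1/bin sketch (requires np >= 4).
de_sketch <- function(objective, d = 5, np = 10, f = 0.5, cr = 0.9,
                      nfes = 1000) {
  pop <- matrix(runif(np * d), nrow = np, ncol = d)
  fit <- apply(pop, 1, objective)
  evals <- np
  while (evals < nfes) {
    for (i in seq_len(np)) {
      if (evals >= nfes) break
      # Three distinct population members, all different from i.
      idx <- sample(setdiff(seq_len(np), i), 3)
      mutant <- pop[idx[1], ] + f * (pop[idx[2], ] - pop[idx[3], ])
      # Binomial crossover; j_rand forces at least one mutant component.
      take <- runif(d) < cr
      take[sample.int(d, 1)] <- TRUE
      trial <- ifelse(take, mutant, pop[i, ])
      trial <- pmin(pmax(trial, 0), 1)  # keep the trial inside [0, 1]
      trial_fit <- objective(trial)
      evals <- evals + 1
      # Greedy selection: keep the trial if it is at least as good.
      if (trial_fit >= fit[i]) {
        pop[i, ] <- trial
        fit[i] <- trial_fit
      }
    }
  }
  best <- which.max(fit)
  list(best_solution = pop[best, ], best_fitness = fit[best],
       num_evaluations = evals)
}

set.seed(1)
result <- de_sketch(toy_fitness)
```

The evaluation budget nfes, rather than a generation count, bounds the run, which matches how the budget is described in the arguments above.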


Evaluate a candidate solution, with optional time series filtering.

Description

This function evaluates the fitness of an association rule using support and confidence. If time series data is used, it restricts evaluation to the specified time range.

Usage

evaluate(solution, features, instances, is_time_series = FALSE)

Arguments

solution

A vector representing a candidate solution.

features

A list containing information about features.

instances

A data frame representing dataset instances.

is_time_series

A boolean flag indicating if time series filtering is required.

Value

A list containing fitness and identified rules.

References

Fister, I., Iglesias, A., Galvez, A., Del Ser, J., Osaba, E., & Fister, I. (2018). "Differential evolution for association rule mining using categorical and numerical attributes." In Intelligent Data Engineering and Automated Learning–IDEAL 2018: 19th International Conference, Madrid, Spain, November 21–23, 2018, Proceedings, Part I (pp. 79-88). Springer International Publishing. doi:10.1007/978-3-030-03493-1_9

Fister Jr, I., Podgorelec, V., & Fister, I. (2021). "Improved nature-inspired algorithms for numeric association rule mining." In Intelligent Computing and Optimization: Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020) (pp. 187-195). Springer International Publishing. doi:10.1007/978-3-030-68154-8_19


Extract feature information from a dataset, excluding timestamps.

Description

This function analyzes the given dataset and extracts information about each feature.

Usage

extract_feature_info(data, timestamp_col = "timestamp")

Arguments

data

The dataset to analyze.

timestamp_col

Optional. The name of the timestamp column to exclude from features.

Value

A list containing information about each feature, including type and bounds/categories.
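
The sketch below illustrates one way such a feature list could be derived from a data frame. The field names type, lower_bound and upper_bound follow the calculate_border example in this manual. The `categories` field and the function name `describe_features` are assumptions made for illustration.

```r
# Hypothetical sketch of feature extraction; not the package's own code.
describe_features <- function(data, timestamp_col = "timestamp") {
  # Every column except the timestamp column is treated as a feature.
  feature_cols <- setdiff(names(data), timestamp_col)
  info <- lapply(feature_cols, function(col) {
    values <- data[[col]]
    if (is.numeric(values)) {
      list(type = "numerical",
           lower_bound = min(values, na.rm = TRUE),
           upper_bound = max(values, na.rm = TRUE))
    } else {
      # ASSUMPTION: categorical features store their distinct values.
      list(type = "categorical",
           categories = sort(unique(as.character(values))))
    }
  })
  names(info) <- feature_cols
  info
}

# Toy dataset with made-up column names.
data <- data.frame(
  timestamp = c("04/03/2025 13:00:00", "04/03/2025 14:00:00",
                "04/03/2025 15:00:00"),
  temperature = c(18, 22, 25),
  weather = c("rain", "sun", "sun"),
  stringsAsFactors = FALSE
)
info <- describe_features(data)
str(info)
```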


Get the position of a feature.

Description

This function returns the position of a feature in the vector, considering the type of the feature.

Usage

feature_position(features, feature)

Arguments

features

The features list.

feature

The name of the feature to find.

Value

The position of the feature.

Examples

features <- list(
  feature1 = list(type = "numerical"),
  feature2 = list(type = "categorical"),
  feature3 = list(type = "numerical")
)
position <- feature_position(features, "feature2")

Fix Borders of a Numeric Vector

Description

This function ensures that all values greater than 1.0 are set to 1.0, and all values less than 0.0 are set to 0.0.

Usage

fix_borders(vector)

Arguments

vector

A numeric vector to be processed.

Value

A numeric vector with borders fixed.
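
The clamping described above can be sketched in base R with pmax and pmin. The helper name `clamp_unit` is hypothetical and only illustrates the behavior; it is not the package's own code.

```r
# Hypothetical helper: clamp every element of a numeric vector to [0, 1].
clamp_unit <- function(x) {
  # pmax raises values below 0 to 0; pmin lowers values above 1 to 1.
  pmin(pmax(x, 0), 1)
}

clamped <- clamp_unit(c(-0.3, 0.25, 1.7))
print(clamped)
```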


Format Rule Parts

Description

This function formats the parts of an association rule into a string.

Usage

format_rule_parts(parts)

Arguments

parts

A list containing parts of an association rule.

Value

A formatted string representing the rule parts.


Map solution boundaries to time series instances.

Description

This function maps the lower and upper bounds of the solution vector to a subset of the dataset.

Usage

map_to_ts(lower, upper, instances)

Arguments

lower

The lower bound in [0, 1].

upper

The upper bound in [0, 1].

instances

The full dataset.

Value

A list with 'low', 'up', and 'filtered_instances'.
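
As an illustration of the mapping, the sketch below turns two fractions in [0, 1] into row indices and returns the matching slice. The rounding rule (ceiling, with a minimum index of 1) and the function name `slice_by_fraction` are assumptions. Only the names of the returned elements are taken from the Value description above.

```r
# Hypothetical sketch: map fractions in [0, 1] to a row range.
# ASSUMPTION: indices are obtained with ceiling() and lower <= upper.
slice_by_fraction <- function(lower, upper, instances) {
  n <- nrow(instances)
  low <- max(1, ceiling(lower * n))
  up <- max(low, min(n, ceiling(upper * n)))
  list(low = low, up = up,
       filtered_instances = instances[low:up, , drop = FALSE])
}

# Toy time-ordered dataset with eight rows.
instances <- data.frame(step = 1:8, value = seq(10, 80, by = 10))
window <- slice_by_fraction(0.25, 0.75, instances)
print(window$filtered_instances)
```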


Implementation of Particle Swarm Optimization (PSO) metaheuristic algorithm.

Description

This function uses PSO, a stochastic population-based optimization algorithm, to search for numerical association rules with high fitness. As a metaheuristic, it does not guarantee a globally optimal rule.

Usage

particle_swarm_optimization(
  d = 10,
  np = 10,
  w = 0.7,
  c1 = 1.5,
  c2 = 1.5,
  nfes = 1000,
  features,
  data,
  is_time_series = FALSE
)

Arguments

d

Dimension of the problem (default: 10).

np

Population size (default: 10).

w

Inertia weight (default: 0.7).

c1

Cognitive coefficient (default: 1.5).

c2

Social coefficient (default: 1.5).

nfes

The maximum number of function evaluations (default: 1000).

features

A list containing information about features, including type and bounds.

data

A data frame representing instances in the dataset.

is_time_series

A boolean indicating whether the dataset is time series.

Value

A list containing the best solution, its fitness value, the number of function evaluations, and a list of the identified association rules.

References

Kennedy, J., & Eberhart, R. (1995). "Particle swarm optimization." Proceedings of ICNN'95 - International Conference on Neural Networks, 4, 1942–1948. IEEE. doi:10.1109/ICNN.1995.488968
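
The roles of w, c1 and c2 are easiest to see in the velocity update. The following is a minimal global-best PSO sketch that maximizes a toy objective on [0, 1]^d. `toy_fitness`, `pso_sketch` and the names of the returned list elements are hypothetical. The sketch does not mine rules and is not the package's implementation.

```r
# Toy objective with its maximum (1) at x = 0.5 in every dimension.
toy_fitness <- function(x) 1 - mean((x - 0.5)^2)

# Minimal global-best PSO sketch.
pso_sketch <- function(objective, d = 5, np = 10, w = 0.7, c1 = 1.5,
                       c2 = 1.5, nfes = 1000) {
  pos <- matrix(runif(np * d), nrow = np, ncol = d)
  vel <- matrix(0, nrow = np, ncol = d)
  pbest <- pos
  pbest_fit <- apply(pos, 1, objective)
  evals <- np
  g <- which.max(pbest_fit)
  gbest <- pbest[g, ]
  gbest_fit <- pbest_fit[g]
  while (evals < nfes) {
    for (i in seq_len(np)) {
      if (evals >= nfes) break
      r1 <- runif(d)
      r2 <- runif(d)
      # Inertia + pull toward personal best + pull toward global best.
      vel[i, ] <- w * vel[i, ] +
        c1 * r1 * (pbest[i, ] - pos[i, ]) +
        c2 * r2 * (gbest - pos[i, ])
      pos[i, ] <- pmin(pmax(pos[i, ] + vel[i, ], 0), 1)  # stay in [0, 1]
      fit <- objective(pos[i, ])
      evals <- evals + 1
      if (fit > pbest_fit[i]) {
        pbest[i, ] <- pos[i, ]
        pbest_fit[i] <- fit
        if (fit > gbest_fit) {
          gbest <- pos[i, ]
          gbest_fit <- fit
        }
      }
    }
  }
  list(best_solution = gbest, best_fitness = gbest_fit,
       num_evaluations = evals)
}

set.seed(1)
result <- pso_sketch(toy_fitness)
```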


Calculate the dimension of the problem, excluding timestamps.

Description

This function calculates the dimension of the optimization problem from the feature information, excluding timestamps.

Usage

problem_dimension(feature_info, is_time_series = FALSE)

Arguments

feature_info

A list containing information about each feature.

is_time_series

Boolean indicating if time series data is present.

Value

The calculated dimension based on the feature types.


Read a CSV Dataset

Description

Reads a dataset from a CSV file and optionally parses a timestamp column.

Usage

read_dataset(
  dataset_path,
  timestamp_col = "timestamp",
  timestamp_formats = c("%d/%m/%Y %H:%M:%S", "%H:%M:%S %d/%m/%Y")
)

Arguments

dataset_path

A string specifying the path to the CSV file.

timestamp_col

A string specifying the timestamp column name (default: "timestamp").

timestamp_formats

A vector of date-time formats to try for parsing timestamps.

Value

A data frame containing the dataset.
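
Trying several date-time formats in turn can be sketched in base R with as.POSIXct. The function name `parse_timestamps`, the use of the UTC time zone and the decision to stop with an error when no format fits are assumptions; the manual does not say how the package handles these cases.

```r
# Hypothetical sketch: return the first parse in which every value succeeds.
parse_timestamps <- function(x, formats) {
  for (fmt in formats) {
    parsed <- as.POSIXct(x, format = fmt, tz = "UTC")
    if (!anyNA(parsed)) {
      return(parsed)
    }
  }
  stop("No supplied format parses every timestamp.")
}

formats <- c("%d/%m/%Y %H:%M:%S", "%H:%M:%S %d/%m/%Y")

# Matches the first format (day/month/year, then time).
date_first <- parse_timestamps(
  c("04/03/2025 13:42:19", "05/03/2025 08:00:00"), formats
)

# Matches only the second format (time, then day/month/year).
time_first <- parse_timestamps("13:42:19 04/03/2025", formats)
```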


Simple Random Search

Description

This function generates a vector of random solutions for a specified length.

Usage

rs(candidate_len)

Arguments

candidate_len

The length of the vector of random solutions.

Value

A vector of random solutions between 0 and 1.

Examples

candidate_len <- 10
random_solutions <- rs(candidate_len)
print(random_solutions)
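
For comparison, a candidate vector of uniform random values in [0, 1] can be drawn in base R with runif. The helper name `random_candidate` is hypothetical; whether the package uses runif internally is an assumption.

```r
# Hypothetical helper: draw a candidate solution uniformly from [0, 1].
random_candidate <- function(candidate_len) {
  runif(candidate_len, min = 0, max = 1)
}

set.seed(42)  # fix the seed so the draw is reproducible
candidate <- random_candidate(10)
```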

Calculate support and confidence for an association rule.

Description

This function calculates the support and confidence for the given antecedent and consequent in the dataset instances.

Usage

supp_conf(antecedent, consequent, instances, features)

Arguments

antecedent

The antecedent part of the association rule.

consequent

The consequent part of the association rule.

instances

A data frame representing instances in the dataset.

features

A list containing information about features, including type and bounds.

Value

A list containing support and confidence values.
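
The two measures follow their standard definitions: support is the share of instances that satisfy both sides of the rule, and confidence is the share of antecedent-matching instances that also satisfy the consequent. The sketch below computes both on a toy data frame. The column names and the rule are made up, and the code does not use the package's rule representation.

```r
# Toy dataset; column names are made up for illustration.
instances <- data.frame(
  temperature = c(18, 22, 25, 30, 21),
  weather = c("rain", "sun", "sun", "sun", "rain"),
  stringsAsFactors = FALSE
)

# Rule: temperature in [20, 26]  =>  weather == "sun".
antecedent_ok <- instances$temperature >= 20 & instances$temperature <= 26
consequent_ok <- instances$weather == "sun"

n_antecedent <- sum(antecedent_ok)
n_both <- sum(antecedent_ok & consequent_ok)

# Support: rows matching both sides, relative to all rows.
support <- n_both / nrow(instances)
# Confidence: rows matching both sides, relative to antecedent matches.
confidence <- if (n_antecedent > 0) n_both / n_antecedent else 0
```

Here three rows match the antecedent and two of them also match the consequent, so support is 2/5 and confidence is 2/3.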


Write Association Rules to CSV file

Description

This function writes association rules to a CSV file. For time series datasets, it writes start and end timestamps instead of indices.

Usage

write_association_rules_to_csv(
  rules,
  file_path,
  is_time_series = FALSE,
  timestamps = NULL
)

Arguments

rules

A list of association rules.

file_path

The file path for the CSV output.

is_time_series

A boolean flag indicating if time series information should be included.

timestamps

A vector of timestamps corresponding to the time series data.

Value

No explicit return value. The function writes association rules to a CSV file.