Sequential data, meaning data with a time dependency, is very common in business, ranging from credit card transactions to healthcare records to stock market prices. But privacy regulations limit and dramatically slow down access to useful data that is essential to research and development. This creates a demand for highly representative, yet fully private, synthetic sequential data, which is challenging to produce, to say the least.
Generating synthetic time-series and sequential data is more challenging than generating tabular data, where normally all the information regarding one individual is stored in a single row. In sequential data, information can be spread across many rows, as with credit card transactions, and preserving the correlations between rows (the events) and columns (the variables) is key. Furthermore, the length of the sequences is variable; some cases may comprise just a few transactions while others may have thousands.
Generative models for sequential data and time series have been studied extensively. However, many of these efforts have resulted in relatively poor synthetic data quality and low flexibility. In many cases the models are designed to be specific to each problem, thus requiring detailed domain knowledge.
In this post, we describe and apply an extended version of a recent powerful method to generate synthetic sequential data — DoppelGANger. It is a framework based on Generative Adversarial Networks (GANs) with some innovations that make possible the generation of synthetic versions of complex sequential datasets.
We build on this work by introducing two innovations:
- A learning strategy to speed up the convergence of the GAN and avoid mode collapse.
- A well-designed noise in the discriminator to make the process differentially private without degrading the quality of the data, using a modified version of the moments accountant strategy to improve the stability of the model.
Common approaches to sequential data generation
Most of the models for time-series data generation use one of the following approaches:
Dynamic stationary processes that work by representing each point in the time series as a sum of deterministic processes with some noise added. This is a widely used approach for modeling time-series with techniques like bootstrapping. However, some prior knowledge of long-term dependencies, like cyclical patterns, has to be incorporated to constrain the deterministic process. This makes it very difficult to model datasets with complex, unknown correlations.
Markov Models are a popular approach for modeling categorical time series by representing system dynamics as a conditional probability distribution. Variants, such as Hidden Markov models, have also been used for modeling the distributions of time series. The problem with this approach is its inability to capture long-term complex dependencies.
Autoregressive (AR) models are dynamic stationary processes where each point in the sequence is represented as a function of the previous n points. Variants such as ARIMA are widely used and can be very powerful. However, AR models, like Markov models, have a fidelity problem: they produce simplistic models incapable of capturing complex temporal correlations.
Recurrent Neural Networks (RNNs) have recently been used for time-series modeling in deep learning. Like autoregressive and Markov models, RNNs use a sliding window of previous timesteps to determine the next points in time. RNNs also store an internal state variable that captures the entire history of the time series. RNNs such as long short-term memory networks (LSTMs) have had great success in learning discriminative models of time-series data, which predict a label conditioned on a sample. However, RNNs are unable to learn certain simple time-series distributions.
GAN-based methods, or generative adversarial network models, have emerged as a popular technique for generating or augmenting datasets, especially with images and videos. However, GANs give poor fidelity on networking data, which has both complex temporal correlations and mixed discrete-continuous data types. Although GAN-based time-series generation exists (for instance, for medical time series), such techniques fail on more complex data, exhibiting poor autocorrelation scores on long sequences and being prone to mode collapse. This is because the data distribution is heavy-tailed and variable in length, which seems to affect GANs considerably.
Introducing DoppelGANger for generating high-quality, synthetic time-series data
In this section, we explore DoppelGANger, a recent model for generating synthetic sequential data. We use this GAN-based model, whose generator is composed of recurrent units, to generate synthetic versions of transactional data using two datasets: bank transactions and road traffic. We used a modified version of DoppelGANger to address the limitations of generative models for sequential data.
Traditional Generative Adversarial Networks, or GANs, struggle to model sequential data due to the following issues:
- They don’t capture complex correlations between temporal features and their associated (immutable) attributes: For instance, depending on the owner characteristics (age, income, etc), credit card patterns in transactions are very distinct.
- Long-term correlations within time series, such as diurnal patterns: These correlations are qualitatively very different from those found in images, which have a fixed dimension and do not need to be generated pixel by pixel.
DoppelGANger incorporates some innovative ideas, like:
- using two networks (a MultiLayer Perceptron, MLP, and a recurrent network) to capture temporal dependencies
- decoupled attribute generation to better capture correlations between time series and their attributes, e.g., age, location, and gender of users
- batched generation: generation of small stacked batches for long sequences
- decoupled normalization: the addition of normalization factors to the generator to constrain the range of features
DoppelGANger decouples the generation of attributes from time series while feeding attributes to the time series generator at each timestep. This contrasts with conventional approaches, where attributes and features are generated jointly.
DoppelGANger’s conditional generation architecture also offers the flexibility to change the attribute distribution and to generate the features conditioned on the attributes. This also helps to hide the attribute distribution, thus increasing privacy.
Another neat feature of this model is how it handles extreme events, a very challenging problem. It’s not uncommon for sequential data to have a wide range of feature values across samples — some products may have thousands of transactions while others just a few. For GANs this is problematic as it is a sure recipe for mode collapse — samples will contain only the most common items and ignore the rare events. For images — the focus of almost all efforts on GANs — this isn’t an issue since the distributions are smooth. This is why the authors of DoppelGANger proposed an innovative way to handle these cases: auto-normalization. It consists in normalizing the data features prior to training and adding the minimum and maximum range of features as two additional attributes to each sample.
In the generated data, these two attributes usually scale features back to a realistic range. This is done in three steps:
- Generate attributes using the MultiLayer Perceptron (MLP) generator.
- With the generated attributes as inputs, generate the two “fake” (max/min) attributes using another MLP.
- With the generated real and fake attributes as inputs, generate the features.
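The auto-normalization idea described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the DoppelGANger implementation: the function names `auto_normalize` and `auto_denormalize` are hypothetical, a single continuous feature is assumed, and each sequence is scaled to [-1, 1] using its own half-sum (max+min)/2 and half-range (max-min)/2, which are then kept as the two extra attributes.

```python
import numpy as np

def auto_normalize(x, eps=1e-8):
    """Scale each sequence to [-1, 1] and return its per-sample range descriptors.

    x: array of shape (n_samples, n_steps) holding one continuous feature.
    """
    x_max = x.max(axis=1, keepdims=True)
    x_min = x.min(axis=1, keepdims=True)
    centre = (x_max + x_min) / 2.0      # extra attribute 1: (max + min) / 2
    half_range = (x_max - x_min) / 2.0  # extra attribute 2: (max - min) / 2
    x_norm = (x - centre) / (half_range + eps)
    extra_attributes = np.concatenate([centre, half_range], axis=1)
    return x_norm, extra_attributes

def auto_denormalize(x_norm, extra_attributes, eps=1e-8):
    """Invert auto_normalize using the two extra attributes."""
    centre = extra_attributes[:, :1]
    half_range = extra_attributes[:, 1:2]
    return x_norm * (half_range + eps) + centre

# Two toy accounts with very different scales
x = np.array([[1.0, 2.0, 3.0],
              [100.0, 5000.0, 2500.0]])
x_norm, extra = auto_normalize(x)
x_back = auto_denormalize(x_norm, extra)
```

After normalization both sequences live on the same scale, so the generator does not have to model the wide range directly; the two extra attributes carry the information needed to map generated features back to a realistic range.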
Training the DoppelGANger model on bank transactions data
First, we evaluated DoppelGANger on a dataset of bank transactions. The data used for training is synthetic, so we know the real distributions, and can be accessed here. Our aim was to show that this model was able to learn the time dependencies in the data.
How to prepare the data?
We assume sequential data is composed of a set of sequences with maximum length Lmax — in our case we consider Lmax = 100. Each sequence contains a set of attributes A (fixed quantities) and features F (transactions). In our case, the only attribute is the initial bank Balance and the features are: Amount of the transaction (positive or negative) and two additional categories describing the transaction: Flag and Description.
To run the model we need three NumPy arrays:
- data_feature: training features, in NumPy float32 array format. The size is [(number of training samples) x (maximum length) x (total dimension of features)]. Categorical features are stored by one-hot encoding.
- data_attribute: Training attributes, in NumPy float32 array format. The size is [(number of training samples) x (total dimension of attributes)].
- data_gen_flag: An array of flags indicating the activation of features. The size is [(number of training samples) x (maximum length)].
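To make the three array shapes concrete, here is a minimal sketch that packs variable-length sequences into padded arrays. The helper name `pad_sequences` and the toy data are assumptions made for illustration; the actual preprocessing used in this post is the `format_data` function shown further below.

```python
import numpy as np

def pad_sequences(sequences, attributes, max_len):
    """Pack variable-length sequences into fixed-size arrays.

    sequences: list of arrays, each of shape (length_i, feature_dim)
    attributes: list of arrays, each of shape (attribute_dim,)
    """
    n = len(sequences)
    feature_dim = sequences[0].shape[1]
    data_feature = np.zeros((n, max_len, feature_dim), dtype=np.float32)
    data_gen_flag = np.zeros((n, max_len), dtype=np.float32)
    for i, seq in enumerate(sequences):
        length = min(len(seq), max_len)   # truncate sequences longer than max_len
        data_feature[i, :length] = seq[:length]
        data_gen_flag[i, :length] = 1.0   # 1 where a real time step exists
    data_attribute = np.asarray(attributes, dtype=np.float32)
    return data_feature, data_attribute, data_gen_flag

# Toy example: two sequences of length 3 and 5, 4 feature columns, 1 attribute
rng = np.random.default_rng(0)
sequences = [rng.random((3, 4)), rng.random((5, 4))]
attributes = [np.array([0.2]), np.array([-0.5])]
data_feature, data_attribute, data_gen_flag = pad_sequences(
    sequences, attributes, max_len=6)
print(data_feature.shape, data_attribute.shape, data_gen_flag.shape)
# (2, 6, 4) (2, 1) (2, 6)
```

Positions beyond the end of a sequence stay at zero in both `data_feature` and `data_gen_flag`, which is how the padded time steps are distinguished from real ones.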
Additionally, we need a list of objects of class Output that contains the data type for each variable, normalization, and cardinality. In this case, it is:
data_feature_outputs = [
    output.Output(type_=OutputType.CONTINUOUS, dim=1, normalization=Normalization.ZERO_ONE, is_gen_flag=False),  # time intervals between transactions (Dif)
    output.Output(type_=OutputType.DISCRETE, dim=20, is_gen_flag=False),  # binarized Amount
    output.Output(type_=OutputType.DISCRETE, dim=5, is_gen_flag=False),   # Flag
    output.Output(type_=OutputType.DISCRETE, dim=44, is_gen_flag=False)   # Description
]
The first element of the list is the time interval between events Dif, followed by the 1-hot encoded transaction value (Amount), followed by the Flag, and the fourth is the transaction Description. All gen_flags are set to False since it’s an internal flag to be later modified by the model itself.
The attribute is encoded as a continuous variable with normalization between -1 and 1 to account for negative balances:
data_attribute_outputs = [output.Output(type_=OutputType.CONTINUOUS,dim=1,normalization=Normalization.MINUSONE_ONE,is_gen_flag=False)]
The only attribute used in this simulation is the initial balance. The balance at each step is simply updated by adding the corresponding transaction amount.
We used Hazy processors to pre-process each sequence and reshape it in the right format.
from hazy_trainer.processing import HazyProcessor

n_bins = 20
processor_dict = {
    "by_type": {
        "float": {
            "processor": "FloatToOneHot",  # "FloatToBin"
            "kwargs": {"n_bins": n_bins}
        },
        "int": {
            "processor": "IntToFloat",
            "kwargs": {"n_bins": n_bins}
        },
        "category": {
            "processor": "CatToOneHot",
        },
        "datetime": {
            "processor": "DtToFloat",
        }
    }
}
processor = HazyProcessor(processor_dict)
Now we are going to read the data and process it using the function format_data. The auxiliary variables categories_n and categories_cum store respectively the cardinality and the cumulative sum of the cardinality of the variables.
data = pd.read_csv('data.csv', nrows=100000)  # read the data
categorical = ['Amount', 'Flag', 'Description']
continuous = ['Balance', 'Dif']
cols = categorical + continuous

processor = HazyProcessor(processor_dict)  # call the Hazy processor
processor.setup_df(data[cols])  # set up the processor

# number of categories in each categorical variable
categories_n = []
for cat in categorical:
    categories_n.append(len(processor.column_to_columns[cat]['process']))

# cumulative sum of the cardinalities, with a leading 0 so the values can be used as column offsets
categories_cum = [0] + list(np.cumsum(categories_n))

def format_data(data, cols, nsequences=1000, Lmax=100, cardinality=70):
    '''
    cols: list of columns to be processed
    nsequences: number of sequences to use for training
    Lmax: maximum sequence length
    cardinality: total dimension of the encoded features
    '''
    idd = list(data.Account_id.unique())  # unique account ids
    data.Date = pd.to_datetime(data.Date)  # format date
    # dummy values to normalize the processors
    data['Dif'] = np.random.randint(0, 30*24*3600, size=data.shape[0])
    data_all = np.zeros((nsequences, Lmax, cardinality))
    data_attribut = np.zeros((nsequences))
    data_gen_flag = np.zeros((nsequences, Lmax))
    real_df = pd.DataFrame()
    for i, ids in enumerate(idd[:nsequences]):
        user = data[data.Account_id == ids]
        user = user.sort_values(by='Date')
        # time difference between consecutive transactions, in seconds
        user['Dif'] = user.Date.diff(1).iloc[1:]
        user['Dif'] = user['Dif'].dt.seconds
        user = user[cols]
        real_df = pd.concat([real_df, user])
        processed_df = processor.process_df(user)
        # the first processed column is the attribute (Balance), the rest are features
        data_attribut[i] = processed_df['Balance'].values[0]
        processed_array = np.asarray(processed_df.iloc[:, 1:])
        data_gen_flag[i, :len(user)] = 1
        data_all[i, :len(user), :] = processed_array
    return data_all, data_attribut, data_gen_flag
Data
The data consist of roughly 10 million bank transactions from which we will use just a sample of 100,000 containing 5,000 unique accounts with an average of 20 transactions per account. We consider the following fields:
- Date of the transaction
- Amount of transaction
- Balance
- Transaction Flag (5 levels)
- Description (44 levels)
Below is the head of the data used:
As mentioned before, the temporal information will be modeled as the time difference between two consecutive transactions (in seconds).
Running the code
We ran the code for only 100 epochs using the following parameters:
import sys
import os
sys.path.append("..")
import matplotlib.pyplot as plt
from gan import output
sys.modules["output"] = output
import numpy as np
import pickle
import pandas as pd
from gan.doppelganger import DoppelGANger
from gan.load_data import load_data
from gan.output import Output, OutputType, Normalization
import tensorflow as tf
from gan.network import DoppelGANgerGenerator, Discriminator, \
    RNNInitialStateType, AttrDiscriminator
from gan.util import add_gen_flag, normalize_per_sample, \
    renormalize_per_sample

sample_len = 10
epoch = 100
batch_size = 20
d_rounds = 2
g_rounds = 1
d_gp_coe = 10.0
attr_d_gp_coe = 10.0
g_attr_d_coe = 1.0
Note that the generator is composed of a list of layers with the softmax activation function for categorical inputs and linear activation for continuous variables. Both generator and discriminator are optimized using the Adam algorithm with a specified learning rate and momentum.
Now we prepare the data to feed the network. The real_attribute_mask is a list of True/False with the same length as the number of attributes. False if the attribute is (max-min)/2 or (max+min)/2; otherwise True. First we instantiate the generator and the discriminator:
# create the necessary input arrays
data_all, data_attribut, data_gen_flag = format_data(data, cols)

# normalise data
(data_feature, data_attribute, data_attribute_outputs,
 real_attribute_mask) = normalize_per_sample(
    data_all, data_attribut, data_feature_outputs,
    data_attribute_outputs)

# add generation flag to features
data_feature, data_feature_outputs = add_gen_flag(
    data_feature, data_gen_flag, data_feature_outputs, sample_len)

generator = DoppelGANgerGenerator(
    feed_back=False,
    noise=True,
    feature_outputs=data_feature_outputs,
    attribute_outputs=data_attribute_outputs,
    real_attribute_mask=real_attribute_mask,
    sample_len=sample_len,
    feature_num_units=100,
    feature_num_layers=2)
discriminator = Discriminator()
attr_discriminator = AttrDiscriminator()
We used a neural network composed of two layers of 100 neurons for the generator and the discriminator. All data were normalized or 1-hot encoded. Then we train the model with the following parameters:
checkpoint_dir = "./results/checkpoint"
sample_path = "./results/time"
epoch = 100
batch_size = 50
g_lr = 0.0001
d_lr = 0.0001
vis_freq = 50
vis_num_sample = 5
d_rounds = 3
g_rounds = 1
d_gp_coe = 10.0
attr_d_gp_coe = 10.0
g_attr_d_coe = 1.0
extra_checkpoint_freq = 30
num_packing = 1
Some notes on training
If the data is large, you should use a larger number of epochs. The authors suggest 400 but, in our experiments, we found that we could go as high as 1000 without the networks degenerating into mode collapse. Also, consider that the number of epochs is related to batch size: smaller batches need more epochs and a lower learning rate.
For those new to neural networks, batch, stochastic, and mini-batch gradient descent are the three main flavors of gradient descent. Batch size controls the accuracy of the estimate of the error gradient when training neural networks. The user should be aware of the trade-offs between batch size, speed, and stability during learning. Larger batches are usually paired with larger learning rates and the network learns faster, but training can also be less stable, which is particularly problematic for GANs because of the mode collapse problem.
As a rule of thumb, the learning rates of the generator and the discriminator should be small and similar to each other. In our case, we use 0.0001 for both (g_lr and d_lr above) rather than the library default.
Another important parameter is the number of rounds of the generator and the discriminator. Wasserstein GAN (WGAN) requires two components to work properly: a constraint on the discriminator (weight clipping in the original WGAN, or a gradient penalty, which is what the d_gp_coe and attr_d_gp_coe coefficients above weight) and more rounds of the discriminator (d_rounds) than of the generator (g_rounds). Normally the discriminator runs between 3 and 5 rounds for each round of the generator. Here we use d_rounds=3 and g_rounds=1.
In order to speed up the training, we used a cyclical learning rate for the generator and a fixed one for the discriminator.
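The exact schedule is not detailed here, so the following is only a sketch of one common choice, the triangular cyclical learning rate, written in plain Python. The function name `triangular_lr` and the values of `base_lr` and `step_size` are assumptions for illustration; only the upper value of 0.0001 matches the g_lr used in this post.

```python
import math

def triangular_lr(step, base_lr=1e-5, max_lr=1e-4, step_size=1000):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size steps,
    falls back to base_lr over the next step_size steps, and repeats.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Learning rate at a few points of the first cycle
schedule = [triangular_lr(s) for s in (0, 500, 1000, 1500, 2000)]
```

In a training loop, the value returned for the current step would be fed to the generator's optimizer, while the discriminator keeps its fixed rate.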
The directory sample_path stores a set of samples collected at different checkpoints, which is useful for verification purposes. Visualizations of the loss functions can be done using TensorBoard on the checkpoint directory that you provide. You can control the frequency of checkpoints with the parameter extra_checkpoint_freq.
Be aware that this may take up a lot of disk space. The simulation took less than ten minutes on a MacBook Pro.
# note: sample_dir (the directory where generated samples are written) is not
# defined in the snippets above and must be set before running this block
run_config = tf.ConfigProto()
tf.reset_default_graph()  # if you are using spyder
with tf.Session(config=run_config) as sess:
    gan = DoppelGANger(
        sess=sess,
        checkpoint_dir=checkpoint_dir,
        sample_dir=sample_dir,
        time_path=sample_path,
        epoch=epoch,
        batch_size=batch_size,
        data_feature=data_feature,
        data_attribute=data_attribute,
        real_attribute_mask=real_attribute_mask,
        data_gen_flag=data_gen_flag,
        sample_len=sample_len,
        data_feature_outputs=data_feature_outputs,
        data_attribute_outputs=data_attribute_outputs,
        vis_freq=vis_freq,
        vis_num_sample=vis_num_sample,
        generator=generator,
        discriminator=discriminator,
        attr_discriminator=attr_discriminator,
        d_gp_coe=d_gp_coe,
        attr_d_gp_coe=attr_d_gp_coe,
        g_attr_d_coe=g_attr_d_coe,
        d_rounds=d_rounds,
        g_rounds=g_rounds,
        g_lr=g_lr,
        d_lr=d_lr,
        num_packing=num_packing,
        extra_checkpoint_freq=extra_checkpoint_freq)
    gan.build()
    gan.train()
Synthetic data generation
After the model is trained, you can use the generator to create synthetic data from noise. There are two ways to do it:
- Unconditional generation from pure noise
- Conditional generation on attributes
In the first case, we generate attributes and features. In the second, we explicitly specify which attributes we want to condition the feature generation with so that only features are generated.
Below is the code to generate samples from pure noise (given_attribute=None):
run_config = tf.ConfigProto()
total_generate_num_sample = 1000
with tf.Session(config=run_config) as sess:
    gan = DoppelGANger(
        sess=sess,
        checkpoint_dir=checkpoint_dir,
        sample_dir=sample_dir,
        time_path=sample_path,  # same path as used for training
        epoch=epoch,
        batch_size=batch_size,
        data_feature=data_feature,
        data_attribute=data_attribute,
        real_attribute_mask=real_attribute_mask,
        data_gen_flag=data_gen_flag,
        sample_len=sample_len,
        data_feature_outputs=data_feature_outputs,
        data_attribute_outputs=data_attribute_outputs,
        vis_freq=vis_freq,
        vis_num_sample=vis_num_sample,
        generator=generator,
        discriminator=discriminator,
        attr_discriminator=attr_discriminator,
        d_gp_coe=d_gp_coe,
        attr_d_gp_coe=attr_d_gp_coe,
        g_attr_d_coe=g_attr_d_coe,
        d_rounds=d_rounds,
        g_rounds=g_rounds,
        num_packing=num_packing,
        extra_checkpoint_freq=extra_checkpoint_freq)

    # build the network
    gan.build()

    length = int(data_feature.shape[1] / sample_len)
    real_attribute_input_noise = gan.gen_attribute_input_noise(
        total_generate_num_sample)
    addi_attribute_input_noise = gan.gen_attribute_input_noise(
        total_generate_num_sample)
    feature_input_noise = gan.gen_feature_input_noise(
        total_generate_num_sample, length)
    input_data = gan.gen_feature_input_data_free(
        total_generate_num_sample)

    # load the weights / change the path accordingly
    gan.load(checkpoint_dir + '/epoch_id-100')

    # generate features, attributes and lengths
    features, attributes, gen_flags, lengths = gan.sample_from(
        real_attribute_input_noise, addi_attribute_input_noise,
        feature_input_noise, input_data, given_attribute=None,
        return_gen_flag_feature=False)

    # denormalise accordingly
    features, attributes = renormalize_per_sample(
        features, attributes, data_feature_outputs,
        data_attribute_outputs, gen_flags,
        num_real_attribute=1)
We need a few extra steps to process the generated samples into a sequence format and return vectors in a 1-hot encoding format.
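nfloat = len(continuous)          # continuous columns: Balance (attribute) and Dif (feature)
n_attr = attributes.shape[-1]     # number of real attributes (here 1: Balance)
# dummy first row: id column + attribute columns + feature columns
synth = np.zeros(1 + n_attr + features.shape[-1])
for i in range(features.shape[0]):
    # one row per time step: attribute columns followed by feature columns
    v = np.zeros((features.shape[1], n_attr + features.shape[-1]))
    v[:, :n_attr] = attributes[i]        # the attribute is constant along the sequence
    v[:, n_attr] = features[i, :, 0]     # the single continuous feature (Dif)
    for j, c in enumerate(categories_cum[:-1]):
        # columns of the j-th categorical variable in the generated features
        ac = features[i, :, nfloat + categories_cum[j] - 1:
                      nfloat + categories_cum[j + 1] - 1]
        # turn the softmax output into a one-hot vector
        a_hot = np.zeros((ac.shape[0], categories_n[j]))
        a_hot[np.arange(ac.shape[0]), ac.argmax(axis=1)] = 1
        v[:, nfloat + categories_cum[j]:nfloat + categories_cum[j + 1]] = a_hot
    # prepend the sequence index as an id column
    v = np.concatenate([np.array([i] * len(ac))[np.newaxis].T, v], axis=1)
    synth = np.vstack([synth, v])
# drop the dummy first row and the id column before formatting back
df = pd.DataFrame(synth[1:, 1:], columns=processed_df.columns)
formated_df = processor.format_df(df)
formated_df['account_id'] = synth[1:, 0]  # add account_id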
Below we present some comparisons between synthetic (generated) and real data. We can observe that, overall, the distribution of the generated data matches that of the real data relatively well (Fig 8 and Fig 9).
Figure 8: Histograms of sequence length (top) time intervals between Transactions (middle) and Flags (bottom) for generated vs real data.
The only exception is the distribution of the variable Amount, as shown in Figure 9. This is due to the fact that this variable has a non-smooth distribution. To solve this issue we discretized it into 20 levels resulting in a much better match.
Figure 9: Amount real vs generated using a continuous encoding (top) and binarised one-hot encoding (bottom).
We then used the Hazy metrics to calculate the Similarity Score. This score is a mean of three scores: Histogram and histogram2D similarity (how much the real and synthetic data histograms overlap) and Mutual Information between columns. This score establishes how well the synthetic data preserves the correlations between columns.
We got a similarity score of 0.57 when treating Amount as a continuous variable and 0.63 when we binarised it into 20 bins. The Similarity Score was obtained as follows:
from hazy_trainer.evaluation.similarity import Similarity

sim = Similarity(metrics=['hist', 'hist2d', 'mi'])
score = sim.score(real_df[cols], formated_df[cols])
print(score['similarity']['score'])
However, we’ve noticed that this number does not really tell the whole story since it does not explicitly measure the temporal coherence of the synthetic data sequences — it treats each row independently.
For that purpose, we used an additional key metric: autocorrelation, which measures how an event at time t is related to events occurring at time t − Δ, where Δ is a time lag. We compare the autocorrelation of the real and synthetic data in the following way:
$$AC = \frac{\sum_{i=1}^{T} \left(A_{\mathrm{real},i} - A_{\mathrm{synthetic},i}\right)^2}{\sum_{i=1}^{T} \left(A_{\mathrm{real},i}\right)^2}$$
Below are the autocorrelation plots for the total amount spent (aggregated by day) on real and synthetic data. We can see that the two have very similar patterns.
This will only work for numerical data. For categorical, we can use mutual information. For our data, we got AC = 0.71
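As an illustration of the autocorrelation comparison, here is a minimal NumPy sketch. It assumes that A_i denotes the sample autocorrelation at lag i and that T is the number of lags; the function names `autocorrelation` and `ac_score` and the toy series are hypothetical, and the way the score reported in this post was computed may differ in its details.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag], x[lag:]) / denom
                     for lag in range(1, max_lag + 1)])

def ac_score(real, synthetic, max_lag=30):
    """Relative squared difference between the two autocorrelation curves."""
    a_real = autocorrelation(real, max_lag)
    a_synth = autocorrelation(synthetic, max_lag)
    return np.sum((a_real - a_synth) ** 2) / np.sum(a_real ** 2)

# Toy daily totals: a weekly pattern plus noise
rng = np.random.default_rng(0)
t = np.arange(365)
real = 100 + 20 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 5, size=t.size)
synthetic = 100 + 20 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 5, size=t.size)
noise_only = rng.normal(100, 5, size=t.size)

score_similar = ac_score(real, synthetic)   # same weekly pattern
score_noise = ac_score(real, noise_only)    # no temporal structure
```

With the formula exactly as written above, two identical autocorrelation curves give a value of 0, and a series without temporal structure gives a value close to 1.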
The traffic dataset
To test the capabilities of the sequential data generator further, we ran it on another, more challenging dataset: the Metro Interstate Traffic Volume Data Set. It contains hourly traffic data from 2012 to 2018. As we can see in the next figures, the data is relatively coherent over time, with some daily and weekly patterns and large hourly variability. The synthetic data produced by the generator has to reproduce all these trends.
The daily patterns can be quite complex as seen in the next figure containing traffic over the first month (October 2012):
In order to generate good quality synthetic data, the network has to predict the right daily, weekly, monthly, and even yearly patterns, so long-term correlations are important.
Figure 15: Some more distributions of the data.
In terms of autocorrelation, we can see a smooth daily correlation, which makes sense since most traffic has a symmetric behavior: high intensity in the morning is correlated with high intensity in the evening.
Running the model
In this case, the sequence lengths are fixed. To prepare the data, we generated 50,000 sequences using a sliding window of monthly and weekly data. This dataset is much larger than the previous and we expected the model to behave smoothly without mode collapse.
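A sliding window of this kind can be sketched as follows. The helper name `sliding_windows`, the one-day stride, and the toy series are assumptions for illustration; the window sizes and strides actually used to obtain the 50,000 sequences are not specified here.

```python
import numpy as np

def sliding_windows(series, window, stride=1):
    """Cut a 1-D series into overlapping fixed-length sequences."""
    series = np.asarray(series)
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

hourly = np.arange(24 * 30)   # toy stand-in for 30 days of hourly traffic
weekly = sliding_windows(hourly, window=24 * 7, stride=24)
print(weekly.shape)           # (24, 168)
```

Each row is one week of hourly values, and consecutive rows start one day apart, so a single long series yields many overlapping training sequences.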
In this case, we also had a larger number of attributes. Some, like Day of the week and Month, were constructed from the data:
- Temperature
- Rain_1h
- Snow_1h
- Clouds_all
- Weather_description
- Weather_main
- Holiday
- Day of the week
- Month
As features, we have only the hourly traffic volume. Since we want to capture this variable with the highest granularity, all numeric values were discretized into 20 bins, except the traffic volume that was discretized into 50 bins. The model ran for 200 epochs with a batch size of 20 and the same learning rate as before.
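The discretization step can be illustrated with a short NumPy sketch. It assumes equal-width bins, which may not be what the Hazy FloatToOneHot processor does internally; the function names `to_one_hot_bins` and `from_one_hot_bins` and the sample values are hypothetical.

```python
import numpy as np

def to_one_hot_bins(values, n_bins):
    """Discretize a continuous variable into equal-width bins and one-hot encode it."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # use only the inner edges so that bin indices fall in 0..n_bins-1
    idx = np.digitize(values, edges[1:-1])
    one_hot = np.zeros((values.size, n_bins), dtype=np.float32)
    one_hot[np.arange(values.size), idx] = 1.0
    return one_hot, edges

def from_one_hot_bins(one_hot, edges):
    """Map each one-hot row back to the centre of its bin."""
    centres = (edges[:-1] + edges[1:]) / 2.0
    return centres[one_hot.argmax(axis=1)]

traffic = np.array([0.0, 120.0, 3500.0, 7280.0])   # toy hourly volumes
encoded, edges = to_one_hot_bins(traffic, n_bins=50)
decoded = from_one_hot_bins(encoded, edges)
```

Decoding returns bin centres, so the reconstruction error is at most half a bin width; using more bins for the traffic volume (50 instead of 20) reduces that error for the variable we care most about.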
Results
Figure 17 contains a real and generated sample. We can see that the cyclic patterns are well kept and data looks realistic.
To test the quality of the generated data, we present some metrics — see table 2:
- Similarity — measured by the overlap of histograms and mutual information
- Auto-correlation — the ratio between real and synthetic over 30 time lags
- Utility — measured by the relative ratio of forecasting error when trained with real and synthetic data
We used as a baseline an LSTM (long short-term memory) model with bootstrapping. This LSTM model is composed of two layers with 100 neurons each and uses a sliding window of 30 hours. The attributes were added through a dense layer and concatenated to the last hidden layer of the network.
As we can see from Table 2, DoppelGANger, trained with weekly data, performs relatively well, outperforming by a good margin the bootstrapping technique.
We also added the Sequential Mutual Information (SMI) metric. It evaluates the mutual information on a matrix containing T columns, where each column corresponds to the event occurring at time steps t, t-1, t-2, …, t-T, and averages it over a subset of attributes.
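The Hazy implementation of SMI is not shown here, so the following is only one possible reading of the description above, for a single discretized variable: build a matrix whose columns are the sequence at lags 0 to T, compute the mutual information between the current value and each lagged column, and average. The function names `mutual_information` and `sequential_mi` are hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)   # contingency table of co-occurrences
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def sequential_mi(seq, max_lag):
    """Average mutual information between x_t and x_{t-k} for k = 1..max_lag."""
    seq = np.asarray(seq)
    # column k holds the sequence shifted back by k steps
    lagged = np.column_stack(
        [seq[max_lag - k: len(seq) - k] for k in range(max_lag + 1)])
    return float(np.mean([mutual_information(lagged[:, 0], lagged[:, k])
                          for k in range(1, max_lag + 1)]))

rng = np.random.default_rng(0)
periodic = np.tile([0, 1, 2, 3], 50)        # fully predictable sequence
random_seq = rng.integers(0, 4, size=200)   # no temporal structure
smi_periodic = sequential_mi(periodic, max_lag=3)
smi_random = sequential_mi(random_seq, max_lag=3)
```

Under this reading, a synthetic dataset preserves temporal structure well when its SMI is close to that of the real data; the periodic toy sequence scores close to log(4) while the random one scores close to zero.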
We should note that the model can be conditioned on the attributes, so we can generate samples for a specific weather condition or day of the week or month.
Experiments on Differential Privacy
In the original work, the authors introduced differential privacy in the model through the well-known technique of adding noise to the discriminator and clipping its gradients — the DPGAN.
However, they found that as soon as the privacy budget, ε, becomes relatively small (meaning that the synthetic data gets safer), the data also starts losing quality, as measured by temporal coherence with respect to the real data. This could represent a major problem if the end use of the data is to extract detailed temporal information, like causality between events.
Based on recent work around PPGAN (Privacy-preserving Generative Adversarial Network), we introduced some modifications to the noise injected into the gradients of the discriminator. The moments accountant treats the privacy loss as a random variable, using its moment-generating function to control the variable’s density distribution. This property makes the PPGAN model training more stable. The difference with DPGAN is particularly significant when generating very long sequences.
The noise is given by the following expression:
$$\phi = f + \mathcal{N}\left(0, \sigma^2 \Delta f^2\right)$$
where $\Delta f$ is the sensitivity of a query $f$ over two neighboring points $x$ and $x'$:
$$\Delta f = \max \lVert f(x) - f(x') \rVert_2$$
This expression means that most informative points — the highest sensitivity — will get more noise added to the gradient, thus not compromising the quality from other points. By using this carefully designed noise, we were able to preserve 88 percent of the autocorrelation up to ε = 1 on the traffic data.
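To illustrate how noise proportional to the sensitivity can be added to a gradient, here is a minimal NumPy sketch of the generic Gaussian mechanism with gradient clipping. It is not the modified PPGAN noise used in our experiments: the function name `noisy_clipped_gradient` is hypothetical, and clipping to `clip_norm` is assumed as the way to bound the sensitivity Δf.

```python
import numpy as np

def noisy_clipped_gradient(grad, clip_norm, sigma, rng):
    """Clip a gradient to L2 norm clip_norm, then add Gaussian noise.

    Clipping bounds the sensitivity by clip_norm, so the noise standard
    deviation sigma * clip_norm plays the role of sigma * Delta f.
    """
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, sigma * clip_norm, size=grad.shape)
    return clipped + noise

rng = np.random.default_rng(0)
grad = np.array([3.0, 4.0])   # L2 norm 5
private_grad = noisy_clipped_gradient(grad, clip_norm=1.0, sigma=1.1, rng=rng)
clipped_only = noisy_clipped_gradient(grad, clip_norm=1.0, sigma=0.0, rng=rng)
```

A larger `sigma` gives a smaller privacy budget ε at the cost of noisier discriminator updates, which is the quality trade-off discussed above.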
Conclusions
Synthetic sequential data generation is a challenging problem that has not yet been fully solved. Through the tests presented above, we showed that GANs are an effective way to address this problem.
Learn more about Hazy synthetic data generation and request a demo at Hazy.com.
This article was originally published on Medium and re-published to TOPBOTS with permission from the author.