Security and Privacy considerations in Artificial Intelligence & Machine L ...

Where are the assets that need to be protected?

Security and Privacy considerations in Artificial Intelligence & Machine Learning ― Part 2: The New Assets

Note: This is part-2 of a series of articles on ‘Security and Privacy in Artificial Intelligence & Machine Learning’. Here are the links to all articles (so far):

Part-1 Part-2 (this article)

In the previous article, we looked at the key challenges that emerge from an end-to-end view of the AI & ML ecosystem and workflows from a ‘traditional’ cybersecurity standpoint (without going much into the AI & ML specifics of the workflows). In this part, we will begin zooming in on the AI & ML components, starting with an exploration of the interesting new assets that AI & ML bring into the picture. This exercise will make us aware of these assets so that we can treat them alongside other critical information assets when we perform threat modeling and choose security techniques for the end-to-end system. With the assets identified, it becomes easier to consider the ways attackers could breach the Confidentiality, Integrity or Availability (CIA) of these assets and the impact each type of breach may have on the business, and then to systematically design and implement appropriate protections against those attacks.

In the previous article, we discussed how large volumes of sensitive business data may be involved at various points in end-to-end AI & ML workflows, and how a handful of security and privacy challenges already arise from (a) a proliferation of new tools and frameworks, (b) new types and combinations of systems/sub-systems and (c) the new stakeholders involved. We will move forward from there and peel back the other layers to identify interesting assets ‘downstream’ of the volumes of business data that are used for ‘learning’.

Interesting ‘new assets’ that AI & ML introduce

At a very simple level, many ML algorithms (especially ones concerned with prediction or classification) essentially try to solve a numerical problem that looks like:

y = w.x + b

Here ‘x’ represents the inputs (or features) and ‘y’ the corresponding outputs or outcomes as observed in past data.

So, in a home sales prediction context, ‘x’ may be the attributes of homes that influence their price (such as the built-up area, the yard size, the locality, the condition, etc.) and ‘y’ may be the prices that home sales have fetched in the last few months. The task of the algorithm is to discover the optimal ‘w’ and ‘b’ that explain the past data and that can be used to make good predictions of ‘y’ given previously unseen inputs ‘x’. This process of working out ‘w’ and ‘b’ (referred to collectively as ‘weights’ or ‘w’ hereafter) is called ‘training’ or ‘learning’.

Once the algorithm has ‘learned’ the weights ‘w’, we can use them to predict what a home newly placed on the market will likely sell for. (In most real-world problems, working out ‘w’ involves computationally intense and expensive operations on very large matrices.)
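To make the training step concrete, here is a minimal sketch (standard-library Python only) that learns ‘w’ and ‘b’ by batch gradient descent on mean squared error. The home data is made up for illustration: two hypothetical features (built-up area and yard size, both in thousands of square feet) and prices in thousands of dollars, generated so that the ‘true’ answer is w = [150, 20] and b = 50. Gradient descent is only one way to fit these weights; real systems typically rely on optimized linear-algebra libraries.

```python
# Minimal sketch: learn w and b in y = w.x + b by batch gradient descent.
# The data below is hypothetical and generated from price = 150*area + 20*yard + 50.

def predict(w, b, x):
    """Compute y = w.x + b for a single input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train(xs, ys, learning_rate=0.05, epochs=10000):
    """Fit w and b by minimizing mean squared error over all examples."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # prediction error on this example
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient; the constant factor 2 from the
        # squared-error derivative is folded into the learning rate.
        w = [wj - learning_rate * gj / m for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / m
    return w, b


# Features: (built-up area, yard size), both in 1000s of sq ft; prices in $1000s.
homes = [(1.0, 2.0), (1.5, 1.0), (2.0, 3.0), (2.5, 2.0), (3.0, 4.0)]
prices = [240.0, 295.0, 410.0, 465.0, 580.0]

w, b = train(homes, prices)
print(w, b)                        # close to [150.0, 20.0] and 50.0
print(predict(w, b, (2.0, 2.5)))   # close to 400.0 for a previously unseen home
```

The learned ‘w’ and ‘b’ are just a handful of numbers here, but they condense everything the algorithm extracted from the past sales.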

Against the backdrop of this brief overview, let us look at the interesting new assets that emerge from AI & ML:

1. Features

In many ML problems, data scientists work closely with domain experts to devise the best ‘representation’ of the data for machine learning algorithms. This is called ‘feature engineering’ and can take a lot of effort and insight. Domain expertise supplies the intuition about which features might be worth considering (or not), and data scientists or statisticians help figure out the most appropriate ways to factor in those features. A good choice and combination of features can therefore yield better results even when starting from the same training data, which makes the artifacts of ‘feature engineering’ important assets from a data protection standpoint.
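As a minimal sketch of what a hand-engineered feature artifact can look like, the hypothetical function below turns a raw home record into a numeric vector ‘x’. The field names, the locality list, the condition scores and the specific transformations are all assumptions made for illustration; in practice they would come out of the collaboration between domain experts and data scientists described above, which is exactly why they are worth protecting.

```python
# Hypothetical feature-engineering choices. The vocabulary and scores below
# encode domain knowledge, which is what makes them a valuable artifact.
LOCALITIES = ["downtown", "suburb", "rural"]
CONDITION_SCORES = {"poor": 0.0, "fair": 1.0, "good": 2.0, "excellent": 3.0}


def engineer_features(raw, sale_year):
    """Turn one raw home record (a dict) into a numeric feature vector."""
    features = [
        raw["built_up_sqft"] / 1000.0,            # area in thousands of sq ft
        raw["yard_sqft"] / raw["built_up_sqft"],  # yard-to-built-up ratio
        float(sale_year - raw["year_built"]),     # age of the home when sold
        CONDITION_SCORES[raw["condition"]],       # ordinal condition score
    ]
    # One-hot encode the locality so the model does not assume an ordering.
    features += [1.0 if raw["locality"] == loc else 0.0 for loc in LOCALITIES]
    return features


home = {"built_up_sqft": 2000, "yard_sqft": 500, "year_built": 1995,
        "locality": "suburb", "condition": "good"}
print(engineer_features(home, sale_year=2019))
# [2.0, 0.25, 24.0, 2.0, 0.0, 1.0, 0.0]
```

Two teams starting from the same raw records but different versions of such a function can end up with noticeably different model quality.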

(These days, it is becoming more common for the model to ‘learn’ these features by itself ― especially in larger systems. The technique is called ‘feature learning’ or ‘representation learning’ and the rationale is to feed in all available inputs and let the algorithm (internally) figure out which features matter and which don’t. When that happens, the ‘features’ remain internal to the model. That is, there is no explicit artifact called ‘features’ to worry about protecting. However, where features are hand-engineered they represent a valuable artifact that needs to be treated just like any other information asset.)

[Figure: A typical ‘deep’ neural network]

2. Model Hyper-parameters

Most machine learning algorithms have several ‘settings’ that can be tweaked to modify the behavior of the algorithm. These settings can be thought of as ‘design choices’ that define the structure and behavior of the underlying machine learning model. For example, in the case of linear regression trained by gradient descent, the ‘learning rate’ influences how fast (or whether) the model converges in its search for the optimal weights. In the case of deep neural networks (see the figure above), there are many other choices, such as the number of layers (the depth of the network), the number of neurons in each layer (the width of the layer), the batch size to use during training, the number of passes (epochs) to make over the training data, the optimization method to use, and so on.
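The effect of the learning rate can be seen even on a toy problem. The sketch below (data and step count are made up for illustration) fits a single slope ‘w’ in y = w.x by gradient descent with three different learning rates: a very small one converges slowly, a moderate one converges quickly, and one that is too large makes the search diverge. For this particular data, any learning rate above roughly 0.21 diverges; that threshold depends on the data and is not a general rule.

```python
def fit_slope(xs, ys, learning_rate, steps=50):
    """Fit y = w*x (no bias term) by gradient descent on mean squared error."""
    w = 0.0
    m = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)**2) with respect to w
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / m
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with a true slope of 2

for lr in (0.01, 0.1, 0.3):
    print(f"learning_rate={lr}: w after 50 steps = {fit_slope(xs, ys, lr):.6g}")
# 0.01 is still slightly short of 2, 0.1 reaches 2, 0.3 blows up (diverges)
```

In a deep network the same kind of sensitivity applies to many settings at once, which is why a well-tuned combination takes effort to find.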

These settings are called “hyper-parameters” because their choice influences the eventual “parameters” (i.e., the coefficients or weights) that the model learns from the training data. In larger problems, dozens of such choices may be involved, and it takes much work to discover and settle on the combination that provides the desired outcomes. In problem contexts where the data itself is not unique (e.g., image recognition in a scenario where millions of images are available to all parties), these hyper-parameters represent a competitive edge. In other words, once you have invested a lot of hard work in creating a model that produces great results, its hyper-parameters are no different from any other ‘high value asset’ (HVA) of your organization, and it becomes important to think about protecting them wherever they may reside.
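One possible integrity control (an illustration, not something this article prescribes) is to keep hyper-parameters in a canonical serialized form and attach an HMAC tag, so that tampering is detected before the values are used for training. The sketch below uses Python's standard `hmac`, `hashlib` and `json` modules; the hyper-parameter names and values are hypothetical, and the hard-coded key stands in for one that would normally come from a key-management system. A tag protects integrity only; keeping the hyper-parameters confidential would additionally require encryption and access control.

```python
import hashlib
import hmac
import json

# Hypothetical hyper-parameters for a deep neural network (values are illustrative).
HYPERPARAMS = {
    "num_layers": 4,
    "neurons_per_layer": [128, 64, 32, 1],
    "learning_rate": 0.001,
    "batch_size": 64,
    "epochs": 20,
    "optimizer": "adam",
}


def sign(params, key):
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of params."""
    payload = json.dumps(params, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(params, key, tag):
    """Recompute the tag and compare in constant time before trusting params."""
    return hmac.compare_digest(sign(params, key), tag)


# Illustration only: a real key would come from a secrets manager, not source code.
key = b"example-key-for-illustration-only"
tag = sign(HYPERPARAMS, key)
print(verify(HYPERPARAMS, key, tag))  # True

tampered = dict(HYPERPARAMS, learning_rate=0.1)
print(verify(tampered, key, tag))     # False
```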

3. Weights or Coefficients

Similar to hyper-parameters, the weights/coefficients (the ‘w’ and the ‘b’ from “y = w.x + b” above) learned by the model represent all the invaluable ‘insights’ that the model has gleaned from the millions of records of data it pored over in the training phase. Future predictions from the model are a simple (and often quick) mathematical operation on the new data point using these weights.
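To illustrate how little is needed to reuse learned weights, the sketch below stores a hypothetical set of weights for a two-feature home-price model (values made up for illustration) as JSON, reloads them and makes a prediction. The prediction is a single dot product plus the bias, so anyone holding a copy of ‘w’ and ‘b’ can reproduce the model's outputs exactly, without the training data or the compute that produced them.

```python
import json

# Hypothetical learned weights for a two-feature home-price model
# (area and yard size in 1000s of sq ft; prices in $1000s).
learned = {"w": [150.0, 20.0], "b": 50.0}


def predict(model, x):
    """A prediction is just y = w.x + b: one dot product plus the bias."""
    return sum(wi * xi for wi, xi in zip(model["w"], x)) + model["b"]


# Serializing the weights is all it takes to move the model's "insights"
# elsewhere; the reloaded copy makes exactly the same predictions.
exported = json.dumps(learned)
reloaded = json.loads(exported)
print(predict(reloaded, (2.0, 2.5)))  # 400.0
```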

Just like ‘hyper-parameters’, these weights are ‘reusable’. Moreover, they are even more ready-made for reuse than hyper-parameters. Using a technique called ‘transfer learning’, other
