
---
abstract: |
    Bilateral multi-issue negotiation is an important class of real-life
    negotiations. Negotiation problems usually involve constraints such as
    a complex and unknown opponent utility that must be estimated in real
    time, or time discounting. Under such constraints, effective automated
    negotiation agents must adjust their behavior to the characteristics
    of their opponents and negotiation scenarios.
    In this paper, we propose an automated agent that estimates the
    opponent's strategy by characterizing it using the Thomas-Kilmann
    Conflict Mode Instrument. Our agent tries to reach a speedy compromise
    by adjusting its strategy from stubborn to compromising based on the
    strategy of the opponent. In our experiments, we demonstrate that the
    proposed agent achieves excellent outcomes and greater chances of
    reaching fair agreements, coming very close to Pareto-optimal
    agreements.
nocite: '[@Baarslag2017; @Klein2003; @Lewis2017; @Fujita2016; @Akhbari2013; @Hao2014; @Dirkzwager2014; @Hausken2009; @Nguyen2017; @Mell; @Lin2017; @Ilany2016; @Bosse2005; @Aydogan2017; @Lin2017; @Kaddouci2009; @FARATIN2002205; @Baarslag2013a; @Baarslag2017a; @Zuckerman2015]'
title: |
    Adapting Human-Agent Multi-Issue Bilateral Negotiations
    using the
    Thomas-Kilmann Conflict Mode Instrument
---

Introduction
============

Negotiation, the process of joint decision making, is an inescapable
phenomenon in our society. It is an important process in forming
alliances and reaching trade agreements. Negotiation arises in almost
every social and organizational setting, yet many avoid it out of fear
or lack of skill, and this contributes to income inequality, political
gridlock and social injustice [@eisenberg2009settlement].
This has led to an increasing focus on the design of autonomous
negotiators capable of automatically and independently negotiating with
others. Automated negotiation research originates in various
disciplines including economics, social science, game theory and
artificial intelligence, and is fueled by a number of benefits that
computerized negotiation can offer, including better (win-win) deals,
and reductions in time, costs, stress and cognitive effort on the part
of the user. Agents that negotiate with a human opponent are studied as
part of the Human-Agent Negotiation domain.

Such automated agents can be used side-by-side with a human negotiator
engaging in important negotiations. Agents can make negotiations faster
as well as assist people who are less qualified in the negotiation
process. Another application of automated agents is in e-commerce,
where agents can negotiate with humans to sell items faster by making a
trade-off between price and services such as after-sales support.

One problem of particular interest in Human-Agent Negotiation is
Multi-Issue Bilateral Negotiation, which involves two participants who
negotiate over more than one issue or item. Applications of solutions
to Multi-Issue Bilateral Negotiation are far-reaching in trade and
commerce. The problem of modeling an automated agent for bilateral
negotiation is not new and has been well studied in the field of
Multi-Agent Systems, i.e. Agent-Agent Negotiations
[@Aydogan2017; @Baarslag2014; @Baarslag2017a; @Fukuta2016; @Fujita; @Bosse2005].
To the best of our knowledge, however, work on multi-issue Human-Agent
Negotiation is still in progress.

To properly fulfill its role in an ever-changing environment, a
negotiation agent must balance and adhere to different aspects of
autonomous behavior, including self-reliance and the capability and
freedom to perform its actions, while at the same time remaining
interdependent in its joint activity with the user.

Most studies in the domains of Human-Agent and Agent-Agent negotiation
use Bayesian and other sophisticated mathematical models, and assume
complete information, to identify the opponent's preferences
[@FARATIN2002205; @Oprea2003]. However, these methods tend to overfit
and are not suited for Human-Agent negotiation scenarios where the
agent must deal with incomplete information and the opponent may change
strategy frequently. Moreover, in a Human-Agent Negotiation setting,
the agent must be able to adapt its negotiation strategy to deal with
the different types of human negotiators it may encounter.

Studies in the Agent-Agent negotiation domain hold much relevance for
Human-Agent negotiation. Fujita [@Fujita2014] proposed the use of past
negotiation session data to learn about the opponent's negotiation
style by characterizing the opponent in terms of a known
conflict-handling style, the Thomas-Kilmann Conflict Mode Instrument
(TKI).

This paper adapts Fujita's idea to Human-Agent negotiations, working
with only real-time data from the current session and without past
session data.
We propose an adaptive strategy that adjusts the speed of compromising
in real time to be compatible with the opponent's strategy, determined
using TKI.

To retain the flexibility of our agent in responding to various
situations, in place of mathematical models we use simple heuristics,
Most Changed Least Preferred (MCLP) and Most Offered Most Preferred
(MOMP), to estimate the opponent's preferences.

We performed 24 trials with the agent described in this paper and
reached 23 negotiated agreements. Our proposed strategies enable our
agent to reach fast agreements, with an average time of 6 minutes and
30 seconds. The outcomes of the negotiations mostly result in the best
distribution of items between the participants, with a Pareto-optimal
efficiency of 97.7%.

Negotiation Environment
=======================

The interaction between negotiating parties is regulated by a
negotiation protocol that defines the rules of how and when
proposals/offers can be exchanged. The protocol that our agent conforms
to is the alternating-offers protocol for bilateral negotiation, in
which the negotiating parties exchange offers in turns.

Agent and human take turns in the negotiation. The human is expected to
start the negotiation, and the agent is informed about the action taken
by the opponent. The two parties take turns in selecting the next
negotiation action. The possible actions are:

Accept:
:   This action indicates that the agent accepts the opponent's last
    bid.

Offer:
:   This action indicates that the agent proposes a new bid.

End:
:   This action indicates that the agent terminates the entire
    negotiation, resulting in the lowest possible score for both
    participants.

We divide the total negotiation time into *n* time windows.
In each time window $t \in Time$ of the negotiation, if the negotiation
has not terminated earlier, each participant can propose a possible
agreement, and the opponent can either accept or reject the offer. If
the action was an *Offer*, the participant is subsequently asked to
determine its next action and the turn taking goes to the next round.
If it is not an *Offer*, the negotiation has finished: the turn taking
stops and the final score (utility of the last bid) is determined for
each of the agents, as follows:

-   The action of the participant is an *Accept*. This action is
    possible only if the opponent actually made a bid. The last bid of
    the opponent is taken, and the utility of that bid is determined in
    the utility spaces of agents A and B.

-   The action returned is an *End*. The score of both agents is set to
    the lowest score.

The parties negotiate over *issues*, and every issue has an associated
range of alternatives or *values*. A negotiation outcome consists of a
mapping of every issue to a value, and the set $\Omega$ of all possible
outcomes is called the negotiation domain. The domain is common
knowledge to the negotiating parties and stays fixed during a single
negotiation session. Both parties have certain preferences prescribed
by a preference profile over $\Omega$. These preferences can be modeled
by means of a utility function $U$ that maps a possible outcome
$\omega \in \Omega$ to a real-valued number in the range $[0, 1]$.
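As an illustration, such a utility function can be sketched as a linear additive model. This is an assumption for the sake of example; the paper does not fix the functional form, and the issue names, weights and value scores below are hypothetical:

```python
def utility(outcome, weights, value_scores):
    """Map an outcome {issue: value} to a utility in [0, 1].

    weights: {issue: weight}, assumed to sum to 1.
    value_scores: {issue: {value: score in [0, 1]}}.
    """
    return sum(weights[issue] * value_scores[issue][value]
               for issue, value in outcome.items())

# Hypothetical two-issue domain: price level and after-sales service.
weights = {"price": 0.7, "service": 0.3}
value_scores = {
    "price": {"low": 1.0, "mid": 0.5, "high": 0.0},
    "service": {"none": 0.0, "basic": 0.5, "full": 1.0},
}
u = utility({"price": "mid", "service": "full"}, weights, value_scores)
# 0.7 * 0.5 + 0.3 * 1.0 = 0.65
```

Because the weights sum to 1 and every value score lies in $[0, 1]$, the result is guaranteed to stay in $[0, 1]$.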
While the domain information is common knowledge, the preference
profile of the participants is private.

For a particular bid at time $t$, $bid_t$, the utility to the agent is
represented as $U_A(bid_t)$ and the utility to the human is represented
as $U_H(bid_t)$.

[Figure [fig:M1]: Agent architecture. The Opponent Model receives
offers from the human and computes the TKI index, opponent preferences
and utility; the Bidding Strategy generates offers; the Acceptance
Strategy returns an acceptance/rejection and counter offer to the
human.]

Agent Design
============

Our agent, *Puffin*, is designed along the BOA model as described in
detail by Baarslag [@Baarslag2014]. As shown in Figure [fig:M1], the
agent *Puffin* primarily consists of 3 components:

1.  Acceptance Strategy: We aggregate received offers in time windows
    and accept if the current offer is better than those in the
    previous time windows.

2.  Opponent Model: Simple heuristics and the Thomas-Kilmann Conflict
    Mode Instrument (TKI) are used to estimate and model the human's
    preferences and strategy.

3.  Bidding Strategy: An adaptive strategy that adjusts the speed of
    compromising depending on the opponent's strategy, as estimated by
    the Opponent Model.

Acceptance Strategy
-------------------

The acceptance strategy of our agent determines whether to accept the
current offer or to wait for better offers in the future.
To do so, we use a conditional model which aggregates offers received
from the human in time windows, and for every received offer, accepts
if:

-   the current offer is better than all offers in the previous time
    window, or

-   the current offer is better than the average utility of offers in
    the previous time window, or

-   the current offer is better than any offer seen before.

Opponent Modeling using Heuristics
----------------------------------

Our agent estimates the alternatives the opponent will offer in the
future based on the opponent's offers. To do so, we use a heuristic
common in the negotiation domain: *the first bid made by the opponent
is the most preferred bid* [@Baarslag2014]. The best bid is the
selection of the most preferred value for each issue, and thereby
immediately reveals which values are the best for each issue. Most
negotiations start with the best bid stated outright.

Adding to this, we propose the following set of heuristics:

1.  Most Changed Least Preferred (MCLP): There is an inverse relation
    between the preference of an issue and the number of times its
    value is significantly changed.

2.  Most Offered Most Preferred (MOMP): There is a direct relation
    between the preference of a value and the frequency with which it
    is offered.

To elaborate the reasoning for the above heuristics: if the value of a
particular issue/item is often significantly changed, then it can be
assumed that it is preferred less by the opponent, since they do not
care enough about it to request it consistently. Had the issue been
important to the opponent, they would have tried to maximize the value
of that issue and its value would not have changed frequently. Further,
if a particular value of an issue maximizes gains for the opponent,
they will offer it more frequently.
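The two heuristics can be sketched as simple counters over the opponent's bid history. This is a minimal sketch, not the paper's implementation; the data structures are illustrative, and "significantly changed" is simplified here to "changed at all":

```python
from collections import Counter

def rank_issues_and_values(bids):
    """Estimate opponent preferences from a list of bids.

    bids: list of {issue: value} dicts, in the order offered.
    Returns (issue_ranking, value_frequency):
      - issue_ranking: issues from most to least preferred
        (MCLP: fewest value changes => most preferred).
      - value_frequency: per issue, a Counter of offered values
        (MOMP: most frequently offered => most preferred).
    """
    issues = list(bids[0].keys())
    changes = {i: 0 for i in issues}                   # MCLP counters
    value_frequency = {i: Counter() for i in issues}   # MOMP counters
    for prev, curr in zip(bids, bids[1:]):
        for i in issues:
            if curr[i] != prev[i]:
                changes[i] += 1
    for bid in bids:
        for i, v in bid.items():
            value_frequency[i][v] += 1
    issue_ranking = sorted(issues, key=lambda i: changes[i])
    return issue_ranking, value_frequency

# Hypothetical history: "price" never changes, "service" keeps moving,
# so MCLP ranks "price" as the more preferred issue.
bids = [
    {"price": "high", "service": "none"},
    {"price": "high", "service": "basic"},
    {"price": "high", "service": "full"},
]
ranking, freq = rank_issues_and_values(bids)
```

MOMP is then read off `freq`: for each issue, `freq[issue].most_common(1)` gives the value the opponent is estimated to prefer most.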
This observation correlates directly with our second proposed heuristic
(MOMP), which states a direct relation between the preference of a
value and its frequency of offering.

Also, to make use of the preference exchanges that might take place
between the agent and the human, we use the Minimax Preference
Algorithm by Mell and Gratch [@Mell], which helps eliminate spurious
alternatives from our Opponent Model.

Thomas-Kilmann Conflict Mode Instrument Index
---------------------------------------------

In our agent Puffin's design, we use the Thomas-Kilmann Conflict Mode
Instrument (TKI) to determine the opponent's strategy and adapt the
agent's bidding strategy.

An opponent's strategy is predictable based on earlier encounters or an
experience profile, and can be characterized in terms of some global
style, such as a negotiation style or a known conflict-handling style.
One important style is the Thomas-Kilmann Conflict Mode Instrument
(TKI) [@Fukuta2016].

The TKI is designed to measure a person's behavior in conflict
situations. "Conflict situations" are those in which the interests or
concerns of two people appear to be incompatible. In such a situation,
an individual's behavior has two dimensions: (1) assertiveness, the
extent to which the person attempts to satisfy their own concerns, and
(2) cooperativeness, the extent to which the person attempts to satisfy
the opponent's concerns. These two basic dimensions of behavior define
five different modes for responding to conflict situations:
**Competing**, **Accommodating**, **Avoiding**, **Collaborating**, and
**Compromising**, as Fig. [fig:M2] shows.

![ Thomas-Kilmann Conflict Mode
Instrument[]{data-label="fig:M2"}](../TKI_orig)

### Estimating Opponent Strategy using Real-Time Data

We use real-time negotiation information to judge the opponent's TKI.
Using the mean of previous bids as a measure of the opponent's
cooperativeness and the variance of bids as a measure of their
assertiveness, we judge the TKI index of the opponent.
To judge assertiveness and cooperativeness, we use the same criteria as
proposed by Fujita [@Fujita2014]. The major difference of our approach
from Fujita [@Fujita2014] is that we use only current-session data to
estimate the cooperativeness and assertiveness of the opponent.

To the best of our knowledge, while TKI has been introduced in
Agent-Agent Negotiations, it has not been applied in the Human-Agent
Negotiation domain in previous works. Previous work compares the
current opponent to opponents in the past to estimate their TKI mode;
such a mechanism is susceptible to biased previous negotiation
sessions.

Unlike Fujita [@Fujita2014], which uses historical negotiation data in
the Agent-Agent domain to estimate TKI, our agent *Puffin* uses only
current data. The agent retains no knowledge of previous negotiation
sessions. The current bid of the opponent is compared to the opponent's
past bids in the same negotiation session. This gives the agent the
flexibility to adapt to all kinds of opponents (hardline, cooperative,
collaborative, passive), since the opponent is being compared to
themselves. This method also makes the agent impervious to bias that
might come from biased historical data.

  **Condition**            **Cooperativeness**   **Condition**                    **Assertiveness**
  ------------------------ --------------------- -------------------------------- -------------------
  $U_H(bid_t) > \mu_h$     Uncooperative         $\sigma^2(t) > \sigma_{h}^2$     Passive
  $U_H(bid_t) = \mu_h$     Neutral               $\sigma^2(t) = \sigma_{h}^2$     Neutral
  $U_H(bid_t) < \mu_h$     Cooperative           $\sigma^2(t) < \sigma_{h}^2$     Assertive

Source: Fujita K. Efficient Strategy Adaptation for Complex Multi-times
Bilateral Negotiations [@Fujita2014]

Table [Table:1] relates the conditions to assertiveness and
cooperativeness.
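These criteria can be sketched directly in code. This is a minimal sketch following Fujita's conditions, assuming $\mu_h$ and $\sigma_h^2$ are computed from the opponent's previous bids in the current session only:

```python
from statistics import mean, pvariance

def classify_tki(history, current_utility):
    """Classify the opponent's current bid against their previous bids.

    history: utilities U_H of the opponent's previous bids (>= 2 bids).
    current_utility: U_H(bid_t) of the current bid.
    Returns a (cooperativeness, assertiveness) pair.
    """
    mu_h = mean(history)                            # mean of previous bids
    var_prev = pvariance(history)                   # variance of previous bids
    var_now = pvariance(history + [current_utility])

    if current_utility > mu_h:
        cooperativeness = "Uncooperative"
    elif current_utility < mu_h:
        cooperativeness = "Cooperative"
    else:
        cooperativeness = "Neutral"

    if var_now > var_prev:        # bids spread out => passive
        assertiveness = "Passive"
    elif var_now < var_prev:      # bids confined => assertive
        assertiveness = "Assertive"
    else:
        assertiveness = "Neutral"
    return cooperativeness, assertiveness

# A bid well below the opponent's running mean reads as cooperative,
# and the jump in variance reads as passive.
mode = classify_tki([0.9, 0.85, 0.8], 0.5)
```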
In summary, when $U_H(bid_t)$ (utility of the opponent's bid in round
$t$) is lower than $\mu_h$ (mean utility of the previous offers from
the human), our agent regards the opponent as cooperative. On the other
hand, when $U_H(bid_t)$ is higher than $\mu_h$, our agent regards the
opponent as uncooperative. In addition, our agent evaluates
assertiveness by comparing the variance of the bids including the
current proposal to the variance of the previous proposals so far in
the session. If the variance for the bid at time $t$ is greater than
the previous variance, then the bids are spread out and the opponent
can be seen as passive and can be coerced. In the other case, where the
variance of previous bids is higher than the current one, the bids are
confined to a smaller region and the opponent is assertive of their
decision.

### Adapting Bidding Strategy

To determine the right offer to make in order to reach agreement, our
agent utilizes an adaptive bidding strategy that makes use of the
Thomas-Kilmann Conflict Mode Instrument index as described earlier.
Studying empirical results, we find that the best bid to make depends
on the negotiation time left as well as on the previous bid received
from the opponent. Such a strategy takes the estimated preferences of
the opponent into account, thereby increasing the chances of the bid
being accepted.

In concrete terms, when $j$ negotiation rounds have been completed (out
of a total of $n$ rounds), and the utility of the last bid by the human
is $U_A(A[j-1])$, the strategy is to make a bid with utility closest
to:

$$target(j,\alpha) = U_{min} + (U_{max} - U_{min}) \cdot \left( 1 - U_A(A[j-1]) \cdot \left(\frac{j}{n}\right)^{1/\alpha} \right)$$

Where:

$\alpha$:
:   Concession rate, determined from the human's TKI index, calculated
    by the Opponent Model.

$U_{max}$:
:   Maximum target utility of the agent (set to 0.99).

$U_{min}$:
:   Minimum target utility of the agent (set to 0.3).

The values for $U_{max}$ and $U_{min}$ were determined empirically.
Any target utility less than $0.3$ will almost always result in
negotiation timeouts or participants leaving halfway through the
negotiation.

![ With increasing $\alpha$, the agent concedes faster (at earlier
rounds). The flattest line represents the least value of $\alpha$,
i.e. the stubborn agent. []{data-label="fig:M3"}](../bidding)

As the equation above shows, the speed of compromising is decided by
$\alpha$ in $target(j,\alpha)$. $\alpha$ is set very close to 0
initially. We compare the mean and variance of the utility of the past
bids with the current bid made by the human opponent to determine the
cooperativeness and assertiveness of the human. According to the
cooperativeness and assertiveness scale for TKI, $\alpha$ is increased
when the opponent is "accommodating" or "compromising." By introducing
this adjustment algorithm, our agent can adjust its strategy from
stubborn to cooperative, becoming compatible with the human opponent
and reaching agreement faster.

The exact algorithm for adapting the agent's strategy by adjusting
$\alpha$ is given in Algorithm [Algorithm:1]. In lines 1-2, we define
the initial conditions for the algorithm, set at the beginning of the
negotiation session. Line 4 refers to the bid received from the human
opponent. In lines 5-8, the mean and variance are calculated, to be
subsequently used in lines 10-16 for cooperativeness and lines 17-23
for assertiveness. In lines 24-27, if the assertiveness and
cooperativeness point towards the human's TKI being **Compromising**
(line 24) or **Accommodating** (line 25), then $\alpha$ is increased by
0.1 to increase the speed of compromise. Line 27 adds the current bid
to the array of bids, which is used in subsequent iterations in lines
5-8.

![image](../time)

Figure [fig:M3] highlights the concept of adapting the speed of
compromising in this paper. The figure is presented as a plot of the
agent's utility with respect to the $\alpha$ value selected.
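The target-utility formula and the $\alpha$-adjustment rule described above can be sketched as follows. This is a minimal sketch, with $U_{max} = 0.99$ and $U_{min} = 0.3$ as in the paper and the 0.1 increment from the adjustment algorithm; function and variable names are illustrative:

```python
U_MAX, U_MIN = 0.99, 0.3   # empirically chosen bounds from the paper

def target(j, n, alpha, last_opponent_utility):
    """Target utility after j of n rounds, given U_A(A[j-1])."""
    return U_MIN + (U_MAX - U_MIN) * (
        1 - last_opponent_utility * (j / n) ** (1 / alpha))

def adjust_alpha(alpha, tki_mode):
    """Concede faster when the opponent is compromising/accommodating."""
    if tki_mode in ("Compromising", "Accommodating"):
        alpha += 0.1
    return alpha

# With a tiny alpha, (j/n)^(1/alpha) vanishes and the target stays near
# U_MAX even late in the session (a stubborn agent); a larger alpha
# lowers the target utility at the same round (a conceding agent).
stubborn = target(j=9, n=10, alpha=0.01, last_opponent_utility=0.8)
conceding = target(j=9, n=10, alpha=0.9, last_opponent_utility=0.8)
```

Against a cooperative human, repeated calls to `adjust_alpha` walk $\alpha$ upward in steps of 0.1, which is exactly what moves the agent from the stubborn curve to the conceding curves in Fig. [fig:M3].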
Plugging values of $\alpha$ ranging from 0.01 to 0.91 into the proposed
equation for $target(j,\alpha)$, we get the resultant graph. Here the
values of the utility to the agent of the opponent's bid,
$U_A(A[j-1])$, are generated randomly and put into the equation. This
shows that as the value of $\alpha$ is increased, the agent concedes
faster.

For the initial value of $\alpha$, the agent never concedes. This would
be the case when the opponent is never cooperative or accommodating.
However, if the opponent compromises and keeps accommodating the
agent's bids by reducing its utility, the agent will compromise as well
and try to reach agreement faster.

Experimental Trials and Results
===============================

We conducted 24 trials, of which 23 reached negotiated agreement.

The human and agent negotiated over a set of 4 items, each 5 in
quantity. Human and agent had predefined, dissimilar interests over the
set of 4 items. The maximum score a participant could get was 50
points. The sessions had a time limit of 10 minutes.

To test the effectiveness of the agent's strategies, each trial used a
fresh instance of the agent with no knowledge of past negotiation
sessions. At the end of each session, the time of negotiated agreement
and the points for both human and agent were recorded.

Observations
------------

![ The best-case negotiation scenario is the Pareto-optimal frontier,
specified by the bounding line. Results from negotiation trials are
marked as dots. []{data-label="fig:trials"}](../trials_2)

The results from the trials are plotted in Figures [fig:time] and
[fig:trials].

Figure [fig:trials] shows the resultant points of human and agent on
reaching agreement in the negotiation sessions. Most of the agreements
clustered around the 36-36 mark, which is the most fair agreement point
for the agent and human, since the points are equally distributed
between the two participants.

Figure [fig:time] shows the time taken in the trials to reach
negotiated agreement.
19 of the 23 agreements completed before round 11, i.e. around 7
minutes (420 seconds) after the negotiations began. On average, it took
6.5 minutes, or 390 seconds, to reach agreement with our agent
*Puffin*. The maximum time to agreement was 493 seconds, or 8 minutes
and 13 seconds. The minimum time to agreement was 290 seconds, i.e. 4
minutes and 50 seconds. This reinforces our belief that the use of the
Thomas-Kilmann Conflict Mode Instrument helps reach faster and fairer
agreements.

Benchmarks
----------

The best-case scenario for negotiated agreement is the Pareto-optimal
frontier, which is the theoretical limit; it is impossible to do better
than that.

Plotting the Pareto-optimal frontier along with the results in Fig.
[fig:trials], we see that most of our trial results cluster quite near
the Pareto-optimal frontier. Taking the root-mean-square distance of
the trial results from the Pareto-optimal frontier as a measure of
efficiency, we get an efficiency score of **97.7%** for our agent
Puffin. This metric can be understood as follows: on average, a
negotiated agreement comes within 2.3% (by RMS distance) of the
Pareto-optimal distribution of points between the human and agent.

Conclusion
==========

This paper focused on bilateral multi-issue negotiation, which is an
important class of real-life negotiations. We proposed a novel agent
that estimates the preferences of the opponent using simple heuristics
(MCLP and MOMP). Our agent estimates the opponent's strategy by means
of the Thomas-Kilmann Conflict Mode Instrument and adapts its own
strategy accordingly. Through 24 trials, we demonstrated that the
proposed method results in a marked improvement in reaching optimal
negotiated agreements.

In our future work, we will incorporate in the agent the ability to
work with partial offers, which will help emulate real-life
negotiations where agreements are reached by first agreeing on a subset
of items.
We will also incorporate the ability to effectively use emotional
exchange to elicit compromise from the opponent in certain situations.
