      Using Vector Representation of Propositions and Actions for STRIPS Action Model Learning

2019-01-17

      Wei Gao and Dunbo Cai

(1. Beijing Aerospace Control Center, Beijing 100094, China; 2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China)

Abstract: Action model learning has become a hot topic in knowledge engineering for automated planning. A key problem for learning action models is to analyze state changes before and after action executions from observed “plan traces”. To support such an analysis, a new approach is proposed to partition the propositions of plan traces into states. First, vector representations of propositions and actions are obtained by training a neural network called Skip-Gram, borrowed from the area of natural language processing (NLP). Then, a type of semantic distance among propositions and actions is defined based on their similarity measures in the vector space. Finally, the k-means and k-nearest neighbor (kNN) algorithms are exploited to map propositions to states. This approach is called state partition by word vector (SPWV), and it is implemented on top of a recent action model learning framework by Rao et al. Experimental results on the benchmark domains show that SPWV leads to a lower error rate of the learnt action model, compared to the probability-based approach for state partition developed by Rao et al.

      Key words: automated planning; action model learning; vector representation of propositions

Automated planning for intelligent agents is the problem of arranging a sequence of actions to achieve goals, given the current situation and the set of available actions. Actions are models of how agents can change the state of the world. In the action representation languages STRIPS[1] and PDDL (Planning Domain Definition Language)[2], an action consists of a set of preconditions and a set of effects, both of which are logical propositions. Preconditions specify under what conditions the action can execute, while effects specify how the values of propositions change after the execution. Usually, action models are designed from scratch by domain experts. However, for many application domains, preconditions and effects are complex and difficult for humans to figure out. Therefore, building action models is a challenging knowledge engineering task.

Wang[3] was the first to propose automatically building action models, with the aim of helping human experts. Along this direction, more advanced techniques have been developed[4-11]. Yang et al.[6] proposed an algorithm for STRIPS-style action model learning, which utilized MAX-SAT techniques to reason about constraints on mapping propositions to action preconditions or effects. Zhuo et al.[7] developed a method using Markov Logic Networks to learn the logical quantifiers ∀ and ∃, and logical implications, for actions. Lanchas et al.[11] used regression trees to predict action durations. Rao et al.[9] extended the learning problem from deterministic action models to non-deterministic models in a partially observable environment.

In a partially observable environment, one important problem is the analysis of state changes before and after the execution of an action in a plan trace. Specifically, two neighboring propositions p and p′ observed in a plan trace cannot be assumed to belong to the same state, because some action a may have executed between the times at which the two hold, while that execution was not observed. In other words, we cannot determine for which action p is a precondition and for which action p′ is an effect. For the same reason, if an action is observed next to a proposition, we cannot assume that the proposition is one of the preconditions of that action. To handle the problem of partial observability, Amir and Chang[8] used a kind of logical formula, called a transition belief formula, to track the possible transitions among world states. Rao et al.[9] extracted state-change knowledge under two assumptions: actions make the least modification to states, and each action has an equal opportunity of affecting a changed literal. These assumptions are made purely from the perspective of probability changes, and the relations among propositions are not considered.

This paper exploits the recently proposed distributed representation of words from the work on Word2vec[12]. Specifically, words are represented by vectors, and this representation is obtained by training a kind of neural network called Skip-Gram. The representation is useful in many natural language processing (NLP) tasks, e.g., named entity recognition[13], latent semantic analysis[14], and automated text summarization[15-16]. Notably, the vector of a word carries some of the word's meaning. This property is utilized here to partition propositions into states. The vectors also provide a way to define semantic distances among the propositions and actions of our problem. Further, these semantic distances are used to develop a novel state partition method based on the k-means and kNN[17] algorithm frameworks. Extensive experiments were conducted to evaluate our method on the International Planning Competition (IPC) domains. The results show that our method is superior to the state partition method designed by Rao et al.[9], both in terms of the error rate of state partition and the error rate of STRIPS action model learning.

1 Background

This section introduces important concepts and notation from predicate logic, STRIPS planning, action model learning, and the vector representation of words (a technique from NLP).

      1.1 Predicates, propositions and literals

In mathematical logic, each predicate has a name and a set of parameters. For example, the predicate At(x, y) has the name At and two parameters x, y, and can be used to represent that an agent x is at the location y. Propositions are instantiated predicates. For example, if a1 is an agent and Beijing is a location, then At(a1, Beijing) is a proposition. Usually, for a proposition p, we use p to denote that p is true, and use its negation ¬p to denote that p is false. Propositions and negations of propositions are called literals. In this paper, we use a set of literals {l1, …, lk} to denote their conjunction l1 ∧ … ∧ lk.
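As a concrete illustration (not part of the paper's formalism; all names below are hypothetical), grounded propositions and literals can be held as simple tuples:

from typing import NamedTuple, Tuple

class Proposition(NamedTuple):
    """A grounded predicate, e.g. At(a1, Beijing)."""
    name: str
    args: Tuple[str, ...]

class Literal(NamedTuple):
    """A proposition or its negation."""
    prop: Proposition
    positive: bool

p = Proposition("At", ("a1", "Beijing"))
state = {Literal(p, True)}   # the conjunction containing only At(a1, Beijing)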

      1.2 STRIPS action model

A STRIPS action model (aka action schema or operator) o is of the form

      〈name(o),para(o),pre(o),add(o),del(o)〉

where name(o) is the name of the action model, para(o) is its parameters, and pre(o), add(o), and del(o) are the preconditions, add effects, and delete effects of o, respectively. name(o) is a string of characters. para(o) is a set of typed objects; a typed object consists of a name and a type (both are strings). We refer to name(o) and para(o) as the head of o. pre(o), add(o), and del(o) are sets of predicates. Please refer to Fig.1 for an example of an action represented in STRIPS for the Rovers domain from IPC. For a detailed description of the syntax and semantics of STRIPS, please refer to Ref.[18].

(:action sample_soil
 :parameters (?x - rover ?s - store ?p - waypoint)
 :precondition (and
   (at ?x ?p)
   (at_soil_sample ?p)
   (equipped_for_soil_analysis ?x)
   (store_of ?s ?x)
   (empty ?s))
 :effect (and
   (not (empty ?s))
   (full ?s)
   (have_soil_analysis ?x ?p)
   (not (at_soil_sample ?p))))

      Fig.1 Action model expressed in STRIPS from the Rover domain

An action is one instantiation of an action model. For example, if r is a rover, s a store, and w a waypoint, then sample_soil(r, s, w) is an action.
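For illustration only (the model is defined formally above; the class and field names are hypothetical), a minimal sketch of the five-tuple and of instantiating its head might look as follows:

from dataclasses import dataclass
from typing import Dict, List, Tuple

# A lifted predicate such as ("at", ("?x", "?p")).
Predicate = Tuple[str, Tuple[str, ...]]

@dataclass
class ActionModel:
    """STRIPS action model <name, para, pre, add, del>."""
    name: str
    para: List[Tuple[str, str]]        # (variable, type) pairs
    pre: List[Predicate]
    add: List[Predicate]
    dele: List[Predicate]              # "del" is a Python keyword

    def instantiate(self, binding: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
        """Ground the head of the model, e.g. sample_soil(r, s, w)."""
        return self.name, tuple(binding[v] for v, _ in self.para)

sample_soil = ActionModel(
    name="sample_soil",
    para=[("?x", "rover"), ("?s", "store"), ("?p", "waypoint")],
    pre=[("at", ("?x", "?p")), ("empty", ("?s",))],                 # abbreviated
    add=[("full", ("?s",)), ("have_soil_analysis", ("?x", "?p"))],
    dele=[("empty", ("?s",)), ("at_soil_sample", ("?p",))],
)
print(sample_soil.instantiate({"?x": "r", "?s": "s", "?p": "w"}))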

      1.3 Action model learning

      Action model learning is the problem of building the structures (preconditions and effects) of each action model in a domain, given a set of plan traces from the domain. A plan trace is composed of observations of facts and actions.

Definition 1 An observation of a fact is of the form (p, x1, …, xk), where p is a predicate name, the xi are objects, and k ∈ ℕ is a constant.

Definition 2 An observation of an action is of the form (a, x1, …, xm), where a is an action name, the xi are objects, and m ∈ ℕ is a constant.

Definition 3 A plan trace is a sequence of observations T = (v1, v2, …, vt), where each vi is either an observation of a fact or an observation of an action, and t ∈ ℕ is the number of steps.

Definition 4 An action model learning (AML) problem is to construct the preconditions and effects of the action models in a planning domain, given:

a finite set of predicates P,

a finite set of action model heads {(name(o), para(o))},

a finite set of plan traces.

In other words, for each action model, its preconditions and effects are unknown, and we develop a method to fill in these missing parts. Solving this problem is called the learning of action models[6].
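To make the input concrete, a plan trace can be held as a plain list of observation tuples; the sketch below is illustrative only, and the object names are hypothetical:

from typing import List, Tuple

# An observation is a name plus its objects, e.g. ("at", "rover0", "waypoint1")
# for a fact, or ("sample_soil", "rover0", "store0", "waypoint1") for an action.
Observation = Tuple[str, ...]
PlanTrace = List[Observation]

trace: PlanTrace = [
    ("at", "rover0", "waypoint1"),
    ("empty", "store0"),
    ("sample_soil", "rover0", "store0", "waypoint1"),
    ("full", "store0"),
]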

We adopt the learning process of Rao et al.[9], which consists of the following steps:

      ① State partition: partition propositions in the plan traces into a set of states.

② Precondition extraction: associate propositions as preconditions of an action based on changes among states.

      ③ Effects extraction: associate propositions as effects of an action based on changes among states.

      1.4 Vector representation of words

Words are symbols with meanings. However, such symbols are difficult to handle with mathematical approaches, so researchers in NLP design representations of words that support such approaches. A classical method is to represent words as vectors of 0s and 1s. The first idea is the one-hot representation[12]. It assigns a total order to the words in a corpus. The vector of every word has the same length: the number of words in the corpus. The vector of a word is defined by setting a 1 at the word's position and 0 elsewhere. For example, the corpus {He, She, They} has a one-hot representation of the words “He”, “She”, “They” with the vectors [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively. However, this representation loses the meanings of words[12].
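A minimal sketch of the one-hot construction (illustrative only):

def one_hot(corpus):
    """Map each word of a corpus to a 0/1 vector of length |corpus|."""
    index = {w: i for i, w in enumerate(sorted(corpus))}
    return {w: [1 if i == index[w] else 0 for i in range(len(corpus))]
            for w in corpus}

print(one_hot({"He", "She", "They"}))
# {'He': [1, 0, 0], 'She': [0, 1, 0], 'They': [0, 0, 1]}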

A recent well-known vector representation is Word2vec, proposed by Mikolov et al.[12, 19]. They designed two kinds of neural networks, CBOW and Skip-Gram, to learn word vectors, with the hope that the representation carries the meanings of words. The motivation is that the meaning of a word depends on its context (i.e., the words around it), so the networks are trained with pairs of words that occur within a window of a certain length. It has been shown that the vectors obtained by CBOW can use a set of surrounding words to predict the word among them, while those obtained by Skip-Gram can predict the surrounding words for a given word[19]. This paper uses Skip-Gram to analyze the meanings of propositions and actions.
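To illustrate how the training pairs are formed, a sketch under the assumption of a symmetric window (not the exact training code of Ref.[12]):

def skip_gram_pairs(sentence, window=2):
    """Yield (center, context) word pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

print(skip_gram_pairs(["the", "rover", "samples", "soil"], window=1))
# [('the', 'rover'), ('rover', 'the'), ('rover', 'samples'), ...]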

      2 Building the Embedding of Propositions and Actions

First, propositions and actions in plan traces are converted into words. Second, plan traces are viewed as sentences from which the training samples (pairs of co-occurring words) are generated, and a Skip-Gram network is trained with those samples. Third, a semantic distance is defined for a pair of words via a similarity measure on their vectors.

      2.1 Mapping plan traces to sentences

As discussed in Section 1.3, propositions and actions in a plan trace have a certain form, but they do not directly correspond to words as we require. Specifically, the name and arguments of a proposition are separated, which makes them look like separate words. For example, the proposition At(a1, Beijing) corresponds to three words: the first word is “At”, the second is “a1”, and the third is “Beijing”. To make all the parts of a proposition form a single word, we concatenate them together with the - (hyphen) operator. For example, the proposition At(a1, Beijing) is transformed into At-a1-Beijing, which is a single word.

Definition 5 A sentence mapping of a plan trace T = (v1, v2, …, vt) is a sequence of words (an artificial sentence) φ = (w1, w2, …, wt), where each word wi (i = 1, …, t) is formed by concatenating the parts of vi with the - (hyphen) operator.
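A minimal sketch of this sentence mapping (the observation format is the illustrative tuple form used above, not prescribed by the paper):

from typing import List, Tuple

Observation = Tuple[str, ...]          # e.g. ("at", "a1", "Beijing")

def to_sentence(trace: List[Observation]) -> List[str]:
    """Turn each observation into a single hyphen-joined word."""
    return ["-".join(obs) for obs in trace]

trace = [("at", "a1", "Beijing"), ("move", "a1", "Beijing", "Wuhan")]
print(to_sentence(trace))   # ['at-a1-Beijing', 'move-a1-Beijing-Wuhan']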

      2.2 Building vector representation of words

When a set of plan traces is mapped into a set of sentences, these sentences form a corpus (note that corpus is a term borrowed from NLP). Given a corpus C, we use C to train a Skip-Gram neural network to build the vector representation of the words in C.

We use the Skip-Gram neural network implemented in the open-source library Deeplearning4j (https://deeplearning4j.org/), and configure the network with the parameters shown in Tab.1.

      Tab.1 Configuration for the parameters of the Skip-Gram neural network
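The paper trains the network with Deeplearning4j; as an illustrative stand-in, an equivalent Skip-Gram model can be trained with the gensim library in a few lines. The hyperparameter values below are placeholders, not the settings of Tab.1:

from gensim.models import Word2Vec

# Each sentence is a mapped plan trace (Definition 5); these examples are made up.
sentences = [["at-a1-Beijing", "move-a1-Beijing-Wuhan", "at-a1-Wuhan"]]

# sg=1 selects the Skip-Gram architecture; all other hyperparameters here
# are illustrative placeholders rather than the configuration of Tab.1.
model = Word2Vec(sentences, sg=1, vector_size=100, window=5,
                 min_count=1, epochs=50)

vec = model.wv["at-a1-Beijing"]   # the learnt vector of one proposition-word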

      2.3 Semantic distances among propositions and actions

In this paper, the semantic distance between a pair of propositions is considered in terms of how frequently the two propositions occur in the same state. We then use this semantic distance to identify propositions that belong to the same state, so that a sequence of propositions can be partitioned into different states.

Given two words w and w′, the cosine of the angle between their vectors w and w′ is used to measure their similarity[12]. Specifically,

Similarity(w, w′) = cos(w, w′) = (w · w′)/(‖w‖ ‖w′‖) (1)

It can be seen that the value of Similarity(w, w′) lies in [-1, 1]. To normalize the values for semantic analysis of words, we define

SemDistance(w, w′) = (1 − Similarity(w, w′))/2 (2)

In Eq.(2), the relationships between SemDistance(w, w′) and Similarity(w, w′) are as follows (a small numerical check is given after the list):

      ① if Similarity(w,w′)=1, then SemDistance(w,w′)=0, which predicts that the two words have similar meanings.

② if Similarity(w,w′)=0, then SemDistance(w,w′)=0.5, which says little about the relation between the two words in terms of semantic meaning.

      ③ if Similarity(w,w′)=-1, then SemDistance(w,w′)=1, which predicts that the two words have different meanings.
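A small numerical check of Eqs.(1) and (2), using illustrative vectors only:

import numpy as np

def similarity(w1, w2):
    """Eq.(1): cosine similarity of two word vectors."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def sem_distance(w1, w2):
    """Eq.(2): map a similarity in [-1, 1] to a distance in [0, 1]."""
    return (1.0 - similarity(w1, w2)) / 2.0

a = np.array([1.0, 0.0])
print(sem_distance(a, a))                      # 0.0   (similarity  1)
print(sem_distance(a, np.array([0.0, 1.0])))   # 0.5   (similarity  0)
print(sem_distance(a, -a))                     # 1.0   (similarity -1)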

To show the usefulness of SemDistance in characterizing the semantic distance between propositions, we present an example in Tab.2. The example is taken from the IPC Rovers domain, trained on the plan traces obtained from the first problem (named pfile01) of the domain. A representative case is the pair of propositions “at_soil_sample-waypoint” and “available-rover”. Their SemDistance is 0.045 (close to 0), based on which we predict that the two propositions may occur in the same state. In fact, “at_soil_sample-waypoint” and “available-rover” can be in the same state.

      Tab.2 Example distances among propositions on the first task in the rovers domain

      3 State Partition Using Word Vectors

Using SemDistance as the semantic measure of propositions and actions, in this section we develop an algorithm for partitioning the propositions in plan traces into states.

      3.1 Identifying co-occurrence of propositions

For a set of plan traces, we first compute the semantic distances between pairs of propositions for every plan trace. We then sort the distances in ascending order. For example, the set of distances {0.3, 0.1, 0.2} is ordered into the sequence 〈0.1, 0.2, 0.3〉. Given such a sequence, we want to predict how many hidden states there are. We use k-means to develop a clustering algorithm for this purpose (Fig.2).

The algorithm is illustrated in Fig.2. Our first note is that it uses a fixed number K of clusters. Here, we set K = 5, for the following reason.

Observation 1 For two neighboring actions in a plan trace, the number of actions that occurred between them but were not observed should be less than 5. Suppose the event e, “an action occurred but was not observed”, has probability 0.5. Then the event e′, “5 such events happen consecutively”, has probability 0.5^5 = 0.031 25 < 5%, so e′ is a small-probability event. By this observation there are at most 5 hidden states between two observed actions, and we therefore set K = 5.

Input: A sequence of semantic distances D = {x1, x2, …, xm};

The expected number of clusters K

Output: A clustering of D, of the form C = {C1, C2, …, CK}

Randomly select K values from D as initial means {μ1, μ2, …, μK}

Repeat

Ci ← ∅ (1 ≤ i ≤ K)

For 1 ≤ j ≤ m Do

dji ← ‖xj − μi‖² (1 ≤ i ≤ K)

λj ← the index i that minimizes dji

Cλj ← Cλj ∪ {xj}

EndFor

For 1 ≤ i ≤ K Do

μ′i ← the mean of the values in Ci

If μ′i ≠ μi Then

μi ← μ′i

EndIf

EndFor

Until {μ1, μ2, …, μK} does not change

      Fig.2 Hidden states cluster algorithm

Our second note concerns the idea of the algorithm in Fig.2. The algorithm does not work directly on the propositions, but on the SemDistance values between them. It is used to predict what SemDistance values the pairs of propositions in the same state (i.e., co-occurring propositions) may have. In the following subsection, we propose an algorithm that uses this information to map each proposition to a particular state.
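A sketch of this clustering step, using scikit-learn's k-means in place of the hand-written loop of Fig.2 (the distance values are made up for illustration):

import numpy as np
from sklearn.cluster import KMeans

# SemDistance values collected from one plan trace (illustrative numbers).
distances = np.array([0.04, 0.05, 0.45, 0.48, 0.51, 0.93, 0.95]).reshape(-1, 1)

# K = 5 in the paper; fewer clusters are used here only because the toy data
# contains just a few distinct distance levels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(distances)
print(kmeans.labels_)           # cluster index of each distance value
print(kmeans.cluster_centers_)  # the learnt means, one per cluster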

      3.2 Mapping propositions into states based on their co-occurrence

Our mapping method is based on the co-occurrence of propositions. Specifically, for a proposition p, suppose the proposition pairs 〈p, q〉, 〈p, l〉, and 〈p, s〉 are predicted to co-occur in states s1, s2, and s2, respectively. Because the number of co-occurrences in s2 is larger than that in s1, we predict that p belongs to s2. Using this method and the idea of kNN, our algorithm (Fig.3) maps propositions to states. In other words, the algorithm partitions a sequence of propositions into a set of states, and every state is at a different time step. The difference between two states can be seen as the changes made by an action, and can therefore be used to model the preconditions and effects of actions. We call our method of partitioning propositions State Partition by Word Vector (SPWV).

Input: A set of propositions F and their vectors;

The function SemDist( );

A set of clusters of distances C = {C1, C2, …, Ck}

Output: A set of mappings from propositions to states Par = {(p, s) | p is a proposition, s is a state}

Par ← {}

For p ∈ F Do

temp ← the k nearest distances SemDist(p, q), for q ∈ F and q ≠ p

For c ∈ C Do

count(c) ← the number of elements of temp that belong to c

EndFor

Par ← Par ∪ {(p, s)}, where s is the state associated with the cluster c that maximizes count(c)

EndFor

      Fig.3 State partition algorithm
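An illustrative Python rendering of Fig.3, under the assumption that each distance cluster stands for one candidate state (all vectors, names, and cluster centres are made up):

import numpy as np

def sem_distance(w1, w2):
    cos = np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return (1.0 - cos) / 2.0                       # Eq.(2)

def partition(vectors, centers, k=3):
    """Map each proposition to the cluster (candidate state) that receives
    the most votes among its k nearest neighbours' distances."""
    props = list(vectors)
    par = {}
    for p in props:
        dists = sorted(sem_distance(vectors[p], vectors[q])
                       for q in props if q != p)[:k]
        # Vote: assign each of the k distances to its closest cluster centre.
        votes = [int(np.argmin([abs(d - c) for c in centers])) for d in dists]
        par[p] = max(set(votes), key=votes.count)
    return par

vectors = {
    "at-r-w1":  np.array([1.0, 0.1]),
    "empty-s1": np.array([0.9, 0.2]),
    "avail-r":  np.array([0.95, 0.15]),
    "full-s1":  np.array([-0.8, 0.6]),
}
print(partition(vectors, centers=[0.05, 0.5, 0.95], k=2))
# e.g. {'at-r-w1': 0, 'empty-s1': 0, 'avail-r': 0, 'full-s1': 2}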

      4 Experiments

In this section, we evaluate SPWV on International Planning Competition (IPC) benchmarks. The set of benchmarks includes the following domains: Depots, DriverLog, Rovers, Zenotravel, Airport, Pipesworld, and Satellite, all of which are STRIPS domains. For comparison, we take the action model learning algorithm developed in Ref.[9], which is based on two assumptions about the probability distribution of propositions. We call its state partition step state partition via probability (SPP), and refer to the full action model learning method as SPP-AML. Replacing SPP in SPP-AML with SPWV yields a new action model learner, which we call SPWV-AML.

The aim of our experiments is to evaluate SPWV with respect to the error rate in state partition and the error rate in action model learning. For the first purpose, we compare SPWV to SPP; for the second, we compare SPWV-AML to SPP-AML. For the rules that determine the two kinds of error rate, please refer to Ref.[9].

      4.1 Experimental setup

We use 5-fold cross-validation. The plan traces for sampling and training are generated in the following way: the first 10 tasks in a domain are used for plan trace generation, 10 random plan traces are generated for each task, and only the first 100 steps of each plan trace are kept. In this way, we obtain 100 plan traces. Further, we take the prefixes of increasing length of every plan trace as the test plan traces. Therefore, we have 100×100 = 10 000 samples, of which 8 000 are used for training and the rest for validation. The randomness in a plan trace is controlled by the observation probability γ.
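A minimal sketch of how the 10 000 prefix samples could be generated and split (the trace source below is a placeholder; the paper's trace generator is not shown):

import random

def prefix_samples(traces):
    """Every prefix of every trace becomes one sample: 100 traces of
    length 100 give 100 x 100 = 10 000 samples."""
    return [trace[:n] for trace in traces for n in range(1, len(trace) + 1)]

traces = [[f"step{i}" for i in range(100)] for _ in range(100)]  # placeholder traces
samples = prefix_samples(traces)
random.shuffle(samples)
train, valid = samples[:8000], samples[8000:]   # 8 000 / 2 000 split
print(len(samples), len(train), len(valid))     # 10000 8000 2000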

The test bed has a 4-core CPU running at 1.63 GHz and 8 GB of memory. Our SPWV method was implemented in Java using Deeplearning4j, and the other modules of SPWV-AML were implemented in C++.

      4.2 Results and analysis

The first result concerns the error rate in state partition. We observed that the error rates of both SPP and SPWV decrease as γ (the observation probability) increases. This is reasonable, as a method then has more information to reason about. Moreover, the advantage of our SPWV method over SPP grows as γ increases; in particular, SPWV is superior to SPP when γ = 0.9 (shown in Tab.3).

The second result concerns the error rate in action model learning. The results show that both SPP-AML and our SPWV-AML improve as γ increases. Notably, SPWV-AML is consistently better than SPP-AML on all the benchmark domains. It is reasonable to conclude that the good performance of SPWV-AML is due to the low error rate achieved by SPWV in the state partition step.

      5 Conclusions

In this paper, a recent word embedding technique from the NLP community is utilized to predict the semantic distance between propositions. To our knowledge, this is the first work in automated planning that exploits this advancement from NLP. The properties of word embeddings we rely on are that they can be manipulated conveniently in a formal way and that they carry the meanings of words. We show how these two properties can be used to define a reasonable distance measure between propositions. The resulting state partition method is shown to be superior to a representative method, and the resulting action model learner performs better on the IPC benchmarks.

Further improvements to the proposed state partition method are possible, such as using a more recent word embedding model, or partitioning propositions using both the semantic distances among them and their parameter structures[20].
