Experiment Spec File Format
An experiment specification file goes in the directory experiment_specs/, and must end with .json.
As you could probably guess from the required extension, this file should be formatted as
a json object (with an extension or two; see below). This file specifies all of the parameters
that define a PRA experiment, grouped into four chunks (note that each of these bullet points has
links to further documentation about the set of parameters):
- The graph to be used in the experiment. There are a lot of possible parameters to put here. See the graph page for more information. The name of this set of parameters in the json object must be graph.
- Relation metadata. If you know things about the relations in your graph, such as what their domains and ranges are, or whether they have known inverses, that information is specified in this directory. If you don’t know any of this information, you can safely leave this out. The name of this parameter in the json object must be relation metadata, and its value should just be a string. The string can be a name, which will make the code search for a directory with that name under relation_metadata/, or it can be a fully qualified path.
- A description of the training / testing split to use for the experiment. This tells the PRA code which relations to learn models for, and which node pairs to use as training and testing data for each relation. The name of this parameter in the json object must be split. And its value must be a string, as with the relation metadata parameter. The only difference is the code will look under splits/ if just a name is given.
- PRA-specific parameters. The previous three sets of parameters specify the data to use for PRA - what the graph is like, what we know about the relations in the graph, and what to use for training and testing. These parameters specify how PRA should work; things like how many random walks to do, whether to use the vector space walks described in my EMNLP 2014 paper, or whether to compute a standard PRA matrix or just see what paths you can find in the data. The name of this in the json object must be pra parameters.
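To make the name-versus-path rule for relation metadata and split concrete, here is a minimal Python sketch. It is not the PRA code itself: the function name resolve_location, the base_dir argument, and the choice to treat a "fully qualified path" as an absolute path are all assumptions made for illustration.

```python
import os

# Default directories named in the text above, keyed by spec parameter.
DEFAULT_DIRS = {
    "relation metadata": "relation_metadata",
    "split": "splits",
}


def resolve_location(key, value, base_dir="."):
    """Return the directory that a string-valued spec parameter points to.

    Assumption: a fully qualified path means an absolute path. Anything
    else is treated as a name under the parameter's default directory.
    """
    if os.path.isabs(value):
        return value
    return os.path.join(base_dir, DEFAULT_DIRS[key], value)
```

With this sketch, a split named synthetic/very_easy resolves to a directory under splits/, while an absolute path is returned unchanged.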
The name of the file (including any subdirectories under experiment_specs/) defines the result
directory where the output will be put.
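As a rough illustration of that mapping, the sketch below derives a result directory from a spec file name. The root directory name results is purely hypothetical (the text above does not say where results live), and result_dir_for is an illustrative helper, not a function from the PRA codebase.

```python
import os


def result_dir_for(spec_file, specs_root="experiment_specs", results_root="results"):
    """Map a spec file path to a result directory.

    Assumption: `results_root` is a made-up name. The documentation only
    says that the spec's name, including subdirectories, defines the
    result directory.
    """
    relative = os.path.relpath(spec_file, specs_root)
    name, extension = os.path.splitext(relative)
    if extension != ".json":
        raise ValueError("experiment specs must end with .json")
    return os.path.join(results_root, name)
```

Under these assumptions, a spec stored at experiment_specs/synthetic/very_easy.json would write its output under a directory ending in synthetic/very_easy.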
Examples
There are lots of examples in the PRA codebase. Those are the files that I actually used to run the experiments in my recent papers, so they should be functional with current code and are a good source to look at when trying to create your own. I’ve put a few of them here with a little bit of explanation.
Simple, with generated data
Here’s an example of a fully specified and functional experiment spec:
{
  "graph": {
    "name": "test_graph",
    "relation sets": [
      {
        "type": "generated",
        "generation params": {
          "name": "synthetic/very_easy",
          "num_entities": 10000,
          "num_base_relations": 20,
          "num_base_relation_training_duplicates": 5,
          "num_base_relation_testing_duplicates": 0,
          "num_base_relation_overlapping_instances": 500,
          "num_base_relation_noise_instances": 100,
          "num_pra_relations": 2,
          "num_pra_relation_training_instances": 200,
          "num_pra_relation_testing_instances": 50,
          "num_rules": 5,
          "min_rule_length": 1,
          "max_rule_length": 4,
          "rule_prob_mean": 0.6,
          "rule_prob_stddev": 0.2,
          "num_noise_relations": 2,
          "num_noise_relation_instances": 100
        }
      }
    ]
  },
  "split": "synthetic/very_easy",
  "pra parameters": {
    "mode": "learn models",
    "features": {
      "path finder": {
        "walks per source": 100,
        "path finding iterations": 3,
        "path accept policy": "paired-only"
      },
      "path selector": {
        "number of paths to keep": 1000
      },
      "path follower": {
        "walks per path": 50,
        "matrix accept policy": "all-targets"
      }
    },
    "learning": {
      "l1 weight": 0.005,
      "l2 weight": 1
    }
  }
}
Don’t worry too much about all of the individual parameters - you can look at the links above to
get a better description of each of them. For now, just pay attention to the format. You can copy
and paste this spec file, and it should just work; the specified graph comes from the data
generator in the code base, so there are no other input files you need to have for this to work.
ExperimentRunner will look for the graph, see that it does not exist in the graphs/ directory,
and try to create it. Creating the graph will look for the relation set under relation_sets/,
see that it’s not there, and generate the data.
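The look-then-create behaviour described above can be sketched in Python as follows. This is only an outline of the control flow, not the actual ExperimentRunner logic: generate_relation_set and build_graph are hypothetical callbacks standing in for the data generator and graph builder, and using the generation params name as the relation set's location under relation_sets/ is an assumption.

```python
import os


def ensure_graph(graph_spec, base_dir, generate_relation_set, build_graph):
    """Return the graph directory, creating the graph only if it is missing.

    `generate_relation_set` and `build_graph` are hypothetical callbacks;
    they are not part of the PRA API.
    """
    graph_dir = os.path.join(base_dir, "graphs", graph_spec["name"])
    if os.path.isdir(graph_dir):
        return graph_dir  # created by an earlier run, so nothing to do

    for relation_set in graph_spec["relation sets"]:
        if relation_set.get("type") != "generated":
            continue  # only generated relation sets can be created on demand
        params = relation_set["generation params"]
        # Assumption: the relation set lives under relation_sets/<name>.
        relation_set_path = os.path.join(base_dir, "relation_sets", params["name"])
        if not os.path.exists(relation_set_path):
            generate_relation_set(params, relation_set_path)

    build_graph(graph_spec, graph_dir)
    return graph_dir
```

Calling this twice with the same spec generates and builds on the first call and simply returns the existing directory on the second, which is the behaviour that lets later experiments refer to the graph by name.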
After this has run once and the graph has been created, I could alternatively specify the graph just with a name, instead of with a nested json object. So in subsequent experiments, this specification would work:
{
  "graph": "test_graph",
  "split": "synthetic/very_easy",
  "pra parameters": {
    "mode": "learn models",
    "features": {
      "path finder": {
        "walks per source": 100,
        "path finding iterations": 3,
        "path accept policy": "paired-only"
      },
      "path selector": {
        "number of paths to keep": 1000
      },
      "path follower": {
        "walks per path": 50,
        "matrix accept policy": "all-targets"
      }
    },
    "learning": {
      "l1 weight": 0.005,
      "l2 weight": 1
    }
  }
}
If the graph is just a name, ExperimentRunner will look under the graphs/ directory for
something with that name. You could also specify a path, if you have the graph stored elsewhere.
Be careful with this, though - ExperimentRunner does not guarantee an order that the experiments
will run in, and so if you’re only creating the graph once, and referring to it like this in the
other experiments, you might try to use it before it’s created. To solve this issue, keep reading.
More complex, with load statements
If you consider the specification above, you might notice that it contains a complete set of parameters both for a data set and for running PRA. If you want to use the same data in another experiment, or if you use the same PRA parameters across multiple experiments, you’ll have to repeat all of these parameters every time they’re used. Unless you use load statements.
The code that reads these specification files will accept a load keyword to read another file
containing parameters. For example, you could have a file, pra_params.json, containing the PRA
parameters that you tend to reuse:
{
  "mode": "learn models",
  "features": {
    "path finder": {
      "walks per source": 100,
      "path finding iterations": 3,
      "path accept policy": "paired-only"
    },
    "path selector": {
      "number of paths to keep": 1000
    },
    "path follower": {
      "walks per path": 50,
      "matrix accept policy": "all-targets"
    }
  },
  "learning": {
    "l1 weight": 0.005,
    "l2 weight": 1
  }
}
Then in your experiment specification, you can put in a load statement to read those parameters:
{
  ...
  "pra parameters": "load pra_params"
}
Note the format - it is load, space, name, where the extension is dropped. If this is what the
load statement looks like, the code will look under param_files/ for a file with that name
(e.g., it will check for param_files/pra_params.json in this example). You can alternatively
give a fully specified path for the parameters you are loading.
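Here is a small Python sketch of how a value-position load statement could be turned into a file name, following the rule just described. The helper resolve_load is illustrative only, and treating a fully specified path as an absolute path that is used unchanged is an assumption, since the text does not say whether such paths carry the .json extension.

```python
import os

LOAD_PREFIX = "load "


def resolve_load(value, base_dir="."):
    """Return the file a "load <name>" string refers to, or None.

    Assumption: an absolute path after the keyword is used exactly as
    given; a bare name is looked up as param_files/<name>.json.
    """
    if not isinstance(value, str) or not value.startswith(LOAD_PREFIX):
        return None  # an ordinary parameter value, not a load statement
    target = value[len(LOAD_PREFIX):].strip()
    if os.path.isabs(target):
        return target
    return os.path.join(base_dir, "param_files", target + ".json")
```

Ordinary string values such as "learn models" pass through as None, so a caller can tell load statements apart from regular parameters.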
This kind of a load works just fine for a lot of cases, but does not work very well if you want to
use these parameters as defaults, but override them in some experiments. If the pra parameters
key in the json is used for a load statement, it can’t also contain other overrides. So, you can
also have a load statement at the beginning of a file, and those parameters can be overridden. For
example, say we change pra_params.json to instead look like this:
{
  "pra parameters": {
    "mode": "learn models",
    "features": {
      "path finder": {
        "walks per source": 100,
        "path finding iterations": 3,
        "path accept policy": "paired-only"
      },
      "path selector": {
        "number of paths to keep": 1000
      },
      "path follower": {
        "walks per path": 50,
        "matrix accept policy": "all-targets"
      }
    },
    "learning": {
      "l1 weight": 0.005,
      "l2 weight": 1
    }
  }
}
The only difference here is that the parameters are nested under pra parameters, so they show up
where they’re supposed to in an experiment specification. Now, in my experiment spec, I can do
the following:
load pra_params
{
  ...
  "pra parameters": {
    "features": {
      "path follower": "matrix multiplication"
    }
  }
}
And I can have my default parameters set, and only specify here the things that I want done differently from my defaults. This allows for relatively easy specification of parallel experiments, where just one thing changes across the set.
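One way to picture the defaults-plus-overrides behaviour is a recursive merge, sketched below in Python. This is an assumption about the semantics, not a description of the PRA code: I am guessing that nested objects are merged key by key, and split_load_header and deep_merge are illustrative names. Accepting more than one leading load line is also an assumption.

```python
import json


def split_load_header(text):
    """Separate leading "load <name>" lines from the json object after them."""
    names = []
    lines = text.lstrip().splitlines()
    while lines and lines[0].startswith("load "):
        names.append(lines.pop(0)[len("load "):].strip())
    return names, json.loads("\n".join(lines))


def deep_merge(defaults, overrides):
    """Overlay `overrides` on `defaults` recursively, mutating neither."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested objects
        else:
            merged[key] = value  # scalars and new keys replace or extend
    return merged
```

For example, if the loaded defaults set both learning weights and the spec body only contains "l2 weight" under "learning", the merged result keeps the default "l1 weight" and takes the new "l2 weight".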
Also note that the files that are loaded with a load statement can themselves have load
statements, so you can go as deep with this as you care to. And, in a few rare circumstances, you
may want to delete a parameter that was specified in something you loaded, instead of just
overriding it. You can do that with the delete keyword, such as
"matrix accept policy": "delete".
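The effect of the delete keyword can be sketched as a pass over the already-merged parameters that drops every key whose value is the string "delete". How the PRA code actually implements deletion is not stated here, so treat drop_deleted as a hypothetical helper that only illustrates the outcome.

```python
def drop_deleted(params):
    """Return a copy of `params` without keys whose value is "delete".

    Assumption: deletion is applied after overrides have replaced the
    loaded defaults, and it applies at any nesting depth.
    """
    if not isinstance(params, dict):
        return params
    return {
        key: drop_deleted(value)
        for key, value in params.items()
        if value != "delete"
    }
```

Given a path follower whose "matrix accept policy" has been overridden with "delete", the result keeps "walks per path" and no longer contains the policy key at all.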
And that, along with reading the links above on what each of these parameters actually means,
should be enough to get you started using this. There are some examples of experiment
specifications and parameter files that I actually use in the examples/ directory of the PRA
codebase.