| Modifier and Type | Method and Description |
|---|---|
| `void` | `SimpleUriCollector.addNewUri(CrawleableUri uri, CrawleableUri newUri)` |
| `void` | `UriCollector.addNewUri(CrawleableUri uri, CrawleableUri newUri)`<br>Adds the given new URI to the list of URIs collected for the given URI. |
| `default void` | `UriCollector.addNewUri(CrawleableUri uri, org.apache.jena.graph.Node newUri)`<br>Adds the given new URI to the list of URIs collected for the given URI. |
| `default void` | `UriCollector.addNewUri(CrawleableUri uri, String newUri)`<br>Adds the given new URI to the list of URIs collected for the given URI. |
| `default void` | `UriCollector.addTriple(CrawleableUri uri, org.apache.jena.graph.Triple triple)`<br>Adds the given triple to the list of URIs collected from the given URI. |
| `void` | `SimpleUriCollector.closeSinkForUri(CrawleableUri uri)` |
| `long` | `SimpleUriCollector.getSize(CrawleableUri uri)` |
| `long` | `UriCollector.getSize(CrawleableUri uri)`<br>Returns the total number of URIs that have been collected. |
| `Iterator<byte[]>` | `SimpleUriCollector.getUris(CrawleableUri uri)` |
| `Iterator<byte[]>` | `UriCollector.getUris(CrawleableUri uri)`<br>Returns a list of serialized `CrawleableUri` instances that have been collected for the given URI. |
| `void` | `SimpleUriCollector.openSinkForUri(CrawleableUri uri)` |
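
The collector methods above appear to describe a per-URI lifecycle: a sink is opened for a crawled URI, newly found URIs are added to it, its size and serialized contents can be read, and it is closed again. A minimal in-memory sketch of that lifecycle follows. It assumes plain `String` URIs in place of `CrawleableUri` and UTF-8 bytes in place of the real serialization; `InMemoryUriCollectorSketch` is a hypothetical class, not Squirrel's `SimpleUriCollector`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory stand-in for a UriCollector. Plain Strings replace
 * CrawleableUri, and UTF-8 bytes replace the real serialization format.
 */
class InMemoryUriCollectorSketch {
    // One insertion-ordered, duplicate-free set of new URIs per crawled URI
    private final Map<String, Set<String>> collected = new HashMap<>();

    void openSinkForUri(String uri) {
        collected.put(uri, new LinkedHashSet<>());
    }

    void addNewUri(String uri, String newUri) {
        Set<String> sink = collected.get(uri);
        if (sink == null) {
            throw new IllegalStateException("Sink not opened for " + uri);
        }
        sink.add(newUri);
    }

    long getSize(String uri) {
        return collected.getOrDefault(uri, Collections.emptySet()).size();
    }

    // Mirrors the Iterator<byte[]> return type: each URI is "serialized" as UTF-8
    Iterator<byte[]> getUris(String uri) {
        List<byte[]> serialized = new ArrayList<>();
        for (String newUri : collected.getOrDefault(uri, Collections.emptySet())) {
            serialized.add(newUri.getBytes(StandardCharsets.UTF_8));
        }
        return serialized.iterator();
    }

    // This sketch simply discards the collected URIs when the sink is closed
    void closeSinkForUri(String uri) {
        collected.remove(uri);
    }
}
```

Deduplicating with a `LinkedHashSet` is a choice made for this sketch; the documentation above does not say whether duplicate URIs are kept.
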

| Modifier and Type | Method and Description |
|---|---|
| `CrawleableUri` | `CrawleableUriFactoryImpl.create(String uri)` |
| `CrawleableUri` | `DefaultCrawleableUriFactory.create(String uri)` |
| `CrawleableUri` | `CrawleableUriFactory.create(String uri)`<br>Creates a `CrawleableUri` from the given URI string. |
| `CrawleableUri` | `CrawleableUriFactoryImpl.create(URI uri)` |
| `CrawleableUri` | `DefaultCrawleableUriFactory.create(URI uri)` |
| `CrawleableUri` | `CrawleableUriFactory.create(URI uri)`<br>Creates a `CrawleableUri` from the given URI instance. |
| `CrawleableUri` | `CrawleableUriFactory4Tests.create(URI uri, InetAddress ipAddress, UriType type)` |
| `CrawleableUri` | `CrawleableUriFactoryImpl.create(URI uri, UriType type)` |
| `CrawleableUri` | `DefaultCrawleableUriFactory.create(URI uri, UriType type)` |
| `CrawleableUri` | `CrawleableUriFactory.create(URI uri, UriType type)` |
| `protected CrawleableUri` | `CrawleableUriFactoryImpl.filter(CrawleableUri createdUri)`<br>Returns the given `CrawleableUri` instance if all local `CrawleableUriFactoryImpl.filters` marked it as a good URI. |
| `static CrawleableUri` | `CrawleableUri.fromByteArray(byte[] bytes)`<br>Deprecated. Use the JSON deserialization instead. |
| `static CrawleableUri` | `CrawleableUri.fromByteBuffer(ByteBuffer buffer)`<br>Deprecated. Use the JSON deserialization instead. |
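
`CrawleableUriFactoryImpl.filter` is documented to return the created URI only if all local filters mark it as good. The sketch below illustrates that pattern under stated assumptions: `java.net.URI` stands in for `CrawleableUri`, each filter is a `Predicate<URI>`, and a rejected URI is signalled by `null`, which the documentation does not specify. `FilteringUriFactorySketch` is hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of a filtering URI factory. java.net.URI stands in for
 * CrawleableUri, and Predicate<URI> stands in for the factory's local filters.
 */
class FilteringUriFactorySketch {
    private final List<Predicate<URI>> filters;

    FilteringUriFactorySketch(List<Predicate<URI>> filters) {
        this.filters = filters;
    }

    /** Parses the string and passes the result through filter(...). */
    URI create(String uri) {
        return filter(URI.create(uri));
    }

    /** Returns createdUri if every filter accepts it; null is an assumed "rejected" value. */
    protected URI filter(URI createdUri) {
        for (Predicate<URI> f : filters) {
            if (!f.test(createdUri)) {
                return null;
            }
        }
        return createdUri;
    }
}
```
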

| Modifier and Type | Method and Description |
|---|---|
| `static List<CrawleableUri>` | `UriUtils.createCrawleableUriList(ArrayList uris, UriType type)`<br>Deprecated. |
| `static List<CrawleableUri>` | `UriUtils.createCrawleableUriList(Collection<String> seedUris)` |
| `static List<CrawleableUri>` | `UriUtils.createCrawleableUriList(String[] seedUris)` |
| `static List<CrawleableUri>` | `UriUtils.getCrawleableUriList()` |

| Modifier and Type | Method and Description |
|---|---|
| `protected CrawleableUri` | `CrawleableUriFactoryImpl.filter(CrawleableUri createdUri)`<br>Returns the given `CrawleableUri` instance if all local `CrawleableUriFactoryImpl.filters` marked it as a good URI. |
| `static String` | `UriUtils.generateFileName(CrawleableUri curi, String fileEnding)` |

| Modifier and Type | Field and Description |
|---|---|
| `protected Hashtable<CrawleableUri,InMemoryKnownUriFilter.UriInfo>` | `InMemoryKnownUriFilter.uris`<br>Key: the crawled (known) URI. Value: information about the URI (see `InMemoryKnownUriFilter.UriInfo`), including the reference list. |

| Modifier and Type | Method and Description |
|---|---|
| `List<CrawleableUri>` | `KnownUriFilter.getOutdatedUris()`<br>Returns all `CrawleableUri`s that have to be recrawled. |
| `List<CrawleableUri>` | `AbstractKnownUriFilterDecorator.getOutdatedUris()` |
| `List<CrawleableUri>` | `InMemoryKnownUriFilter.getOutdatedUris()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `KnownUriFilter.add(CrawleableUri uri, long nextCrawlTimestamp)`<br>Adds the given URI to the list of already known URIs. |
| `void` | `AbstractKnownUriFilterDecorator.add(CrawleableUri uri, long nextCrawlTimestamp)` |
| `void` | `InMemoryKnownUriFilter.add(CrawleableUri uri, long nextCrawlTimestamp)` |
| `void` | `KnownUriFilter.add(CrawleableUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp)`<br>Adds the given URI to the list of already known URIs together with the time at which it was crawled. |
| `void` | `AbstractKnownUriFilterDecorator.add(CrawleableUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp)` |
| `void` | `InMemoryKnownUriFilter.add(CrawleableUri uri, long lastCrawlTimestamp, long nextCrawlTimestamp)` |
| `boolean` | `AbstractKnownUriFilterDecorator.isUriGood(CrawleableUri uri)` |
| `boolean` | `UriFilter.isUriGood(CrawleableUri uri)`<br>Returns `true` if the given `CrawleableUri` object fulfills the requirements imposed by this filter. |
| `boolean` | `InMemoryKnownUriFilter.isUriGood(CrawleableUri uri)` |

| Constructor and Description |
|---|
| `InMemoryKnownUriFilter(Hashtable<CrawleableUri,InMemoryKnownUriFilter.UriInfo> uris, boolean frontierDoesRecrawling)`<br>Constructor. |
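
The known-URI filter entries describe recording URIs with crawl timestamps, checking whether a URI is good, and returning URIs that have to be recrawled, with `InMemoryKnownUriFilter` taking a `frontierDoesRecrawling` flag. The hypothetical `KnownUriFilterSketch` below shows one way these pieces could fit together. It assumes `String` URIs, epoch-millisecond timestamps, that "good" means "not yet known", and that no URIs are outdated when recrawling is disabled. Unlike the documented `getOutdatedUris()`, it takes the current time as a parameter so that it can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory known-URI filter. Strings stand in for CrawleableUri;
 * timestamps are epoch milliseconds supplied by the caller.
 */
class KnownUriFilterSketch {
    private static final class Info {
        final long lastCrawl;
        final long nextCrawl;

        Info(long lastCrawl, long nextCrawl) {
            this.lastCrawl = lastCrawl;
            this.nextCrawl = nextCrawl;
        }
    }

    private final Map<String, Info> uris = new HashMap<>();
    private final boolean frontierDoesRecrawling;

    KnownUriFilterSketch(boolean frontierDoesRecrawling) {
        this.frontierDoesRecrawling = frontierDoesRecrawling;
    }

    void add(String uri, long lastCrawlTimestamp, long nextCrawlTimestamp) {
        uris.put(uri, new Info(lastCrawlTimestamp, nextCrawlTimestamp));
    }

    /** Assumption: a URI is "good" (worth crawling) if it has not been seen before. */
    boolean isUriGood(String uri) {
        return !uris.containsKey(uri);
    }

    /** URIs whose next crawl time has passed; always empty if recrawling is disabled. */
    List<String> getOutdatedUris(long now) {
        List<String> outdated = new ArrayList<>();
        if (!frontierDoesRecrawling) {
            return outdated;
        }
        for (Map.Entry<String, Info> e : uris.entrySet()) {
            if (e.getValue().nextCrawl <= now) {
                outdated.add(e.getKey());
            }
        }
        return outdated;
    }
}
```
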

| Modifier and Type | Method and Description |
|---|---|
| `CrawleableUri` | `GsonUriSerializer.CrawleableUriAdapter.read(com.google.gson.stream.JsonReader in)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `GsonUriSerializer.CrawleableUriAdapter.write(com.google.gson.stream.JsonWriter out, CrawleableUri uri)` |

| Modifier and Type | Method and Description |
|---|---|
| `Set<CrawleableUri>` | `UriHashCustodian.getUrisWithSameHashValues(Set<HashValue> hashValuesForComparison)`<br>Gets all URIs that share at least one hash value with the given set. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `UriHashCustodian.addHashValuesForUris(List<CrawleableUri> uris)`<br>Adds the hash values for the given URIs. |
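
`UriHashCustodian.getUrisWithSameHashValues` returns every URI that shares a hash value with the given set. One straightforward way to support this is an inverted index from hash value to URIs, sketched below. `HashCustodianSketch` and its `addHashValuesForUri(uri, hashValues)` method are hypothetical, and `String` stands in for both `CrawleableUri` and `HashValue`; the documented `addHashValuesForUris` takes only the list of URIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical custodian: an inverted index from hash value to the URIs carrying it.
 * Strings stand in for both CrawleableUri and HashValue.
 */
class HashCustodianSketch {
    private final Map<String, Set<String>> urisByHash = new HashMap<>();

    void addHashValuesForUri(String uri, Set<String> hashValues) {
        for (String h : hashValues) {
            urisByHash.computeIfAbsent(h, k -> new HashSet<>()).add(uri);
        }
    }

    /** All URIs that share at least one hash value with the given set. */
    Set<String> getUrisWithSameHashValues(Set<String> hashValuesForComparison) {
        Set<String> result = new HashSet<>();
        for (String h : hashValuesForComparison) {
            result.addAll(urisByHash.getOrDefault(h, Collections.emptySet()));
        }
        return result;
    }
}
```
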

| Modifier and Type | Method and Description |
|---|---|
| `List<CrawleableUri>` | `Frontier.getNextUris()`<br>Returns the next chunk of URIs that should be crawled, or `null`. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Frontier.addNewUri(CrawleableUri uri)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Frontier.addNewUris(List<CrawleableUri> newUris)`<br>Adds the given list of URIs to the `Frontier`. |
| `void` | `Frontier.crawlingDone(List<CrawleableUri> uris)`<br>Should be called after a list of URIs has been requested using the `Frontier.getNextUris()` method and crawling has finished. |
| `void` | `ExtendedFrontier.informAboutDeadWorker(int idOfWorker, List<CrawleableUri> lstUrisToReassign)`<br>Informs the frontier that a worker has died so that the frontier can react to it. |
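
The `Frontier` methods imply a request/acknowledge cycle: URIs are added, handed out in chunks by `getNextUris()`, confirmed with `crawlingDone(...)`, and possibly handed back when a worker dies. The hypothetical `FrontierSketch` below models that cycle in memory with `String` URIs. The chunk size, the duplicate check in `addNewUri`, and re-queueing a dead worker's URIs at the front of the queue are assumptions of this sketch, not documented behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical frontier sketch: Strings stand in for CrawleableUri, and the
 * chunk size, duplicate check and re-queueing policy are assumptions.
 */
class FrontierSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> known = new HashSet<>();
    private final Set<String> inProgress = new HashSet<>();
    private final int chunkSize;

    FrontierSketch(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    void addNewUri(String uri) {
        // Only URIs that were never seen before enter the queue
        if (known.add(uri)) {
            queue.addLast(uri);
        }
    }

    void addNewUris(List<String> newUris) {
        newUris.forEach(this::addNewUri);
    }

    /** Next chunk of URIs to crawl, or null if nothing is queued. */
    List<String> getNextUris() {
        if (queue.isEmpty()) {
            return null;
        }
        List<String> chunk = new ArrayList<>();
        while (!queue.isEmpty() && chunk.size() < chunkSize) {
            String uri = queue.pollFirst();
            inProgress.add(uri);
            chunk.add(uri);
        }
        return chunk;
    }

    /** Called once the URIs obtained from getNextUris() have been crawled. */
    void crawlingDone(List<String> uris) {
        inProgress.removeAll(uris);
    }

    /** Puts the still-unfinished URIs of a dead worker back at the front of the queue. */
    void informAboutDeadWorker(int idOfWorker, List<String> lstUrisToReassign) {
        for (String uri : lstUrisToReassign) {
            if (inProgress.remove(uri)) {
                queue.addFirst(uri);
            }
        }
    }

    int queued() {
        return queue.size();
    }
}
```
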

| Modifier and Type | Field and Description |
|---|---|
| `private CrawleableUri` | `CrawlingActivity.uri`<br>The URI of the crawling activity. |

| Modifier and Type | Method and Description |
|---|---|
| `CrawleableUri` | `CrawlingActivity.getCrawleableUri()` |
| `CrawleableUri` | `CrawlingActivity.getUri()` |

| Modifier and Type | Method and Description |
|---|---|
| `static void` | `ActivityUtil.addStep(CrawleableUri uri, Class<?> clazz)`<br>Attaches a step with the given class to the `CrawlingActivity` of the given URI, if it exists. |
| `static void` | `ActivityUtil.addStep(CrawleableUri uri, Class<?> clazz, String... actions)`<br>Attaches a step with the given class and the given actions to the `CrawlingActivity` of the given URI, if it exists. |

| Constructor and Description |
|---|
| `CrawlingActivity(CrawleableUri uri, String workerUri)`<br>Constructor. |
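
`ActivityUtil.addStep` attaches a step to a URI's `CrawlingActivity` only if such an activity exists. The sketch below shows that "attach if present, otherwise do nothing" behavior. `CrawlingActivitySketch`, `ActivityUtilSketch`, the URI-keyed registry, and the textual step format are all hypothetical; the documentation does not say where the activity is stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical activity: records the processing steps applied to one URI. */
class CrawlingActivitySketch {
    final String uri;
    final String workerUri;
    final List<String> steps = new ArrayList<>();

    CrawlingActivitySketch(String uri, String workerUri) {
        this.uri = uri;
        this.workerUri = workerUri;
    }
}

/** Hypothetical helper mirroring the "attach a step if an activity exists" behavior. */
class ActivityUtilSketch {
    // Assumption: activities are looked up in a map keyed by URI
    static final Map<String, CrawlingActivitySketch> ACTIVITIES = new HashMap<>();

    static void addStep(String uri, Class<?> clazz, String... actions) {
        CrawlingActivitySketch activity = ACTIVITIES.get(uri);
        if (activity == null) {
            return; // no activity registered for this URI: silently skip
        }
        String step = clazz.getSimpleName();
        if (actions.length > 0) {
            step += " " + Arrays.toString(actions);
        }
        activity.steps.add(step);
    }
}
```
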

| Modifier and Type | Field and Description |
|---|---|
| `protected SortedMap<IpUriTypePair,List<CrawleableUri>>` | `InMemoryQueue.queue` |

| Modifier and Type | Method and Description |
|---|---|
| `Iterator<AbstractMap.SimpleEntry<InetAddress,List<CrawleableUri>>>` | `IpAddressBasedQueue.getIPURIIterator()`<br>Goes through the queue and collects all IP addresses together with their URIs. |
| `Iterator<AbstractMap.SimpleEntry<InetAddress,List<CrawleableUri>>>` | `InMemoryQueue.getIPURIIterator()` |
| `List<CrawleableUri>` | `UriQueue.getNextUris()`<br>Returns the next chunk of URIs that should be crawled, or `null`. |
| `List<CrawleableUri>` | `AbstractIpAddressBasedQueue.getNextUris()` |
| `protected abstract List<CrawleableUri>` | `AbstractIpAddressBasedQueue.getUris(IpUriTypePair pair)` |
| `protected List<CrawleableUri>` | `InMemoryQueue.getUris(IpUriTypePair pair)` |

| Modifier and Type | Method and Description |
|---|---|
| `protected abstract void` | `AbstractIpAddressBasedQueue.addToQueue(CrawleableUri uri)` |
| `protected void` | `InMemoryQueue.addToQueue(CrawleableUri uri)` |
| `void` | `UriQueue.addUri(CrawleableUri uri)`<br>Adds the given `CrawleableUri` instance to the queue. |
| `void` | `AbstractIpAddressBasedQueue.addUri(CrawleableUri uri)` |
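
`IpAddressBasedQueue` groups queued URIs by IP address, and the in-memory variant keys a `SortedMap` by `IpUriTypePair`. The hypothetical `IpQueueSketch` below illustrates the grouping with a `TreeMap` keyed by the IP address as a string. It ignores the URI type, takes the IP address explicitly instead of reading it from the URI, and assumes that `getNextUris()` hands out all URIs of one IP address at a time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical IP-grouped queue. The IP address is passed in as a String key
 * instead of being read from a CrawleableUri, and the URI type is ignored.
 */
class IpQueueSketch {
    // Keys are compared as strings, i.e. lexicographically
    protected final SortedMap<String, List<String>> queue = new TreeMap<>();

    void addUri(String ip, String uri) {
        queue.computeIfAbsent(ip, k -> new ArrayList<>()).add(uri);
    }

    /** Removes and returns all URIs of the smallest IP key, or null if the queue is empty. */
    List<String> getNextUris() {
        if (queue.isEmpty()) {
            return null;
        }
        return queue.remove(queue.firstKey());
    }

    /** Walks the queue and pairs every IP address with a copy of its URI list. */
    Iterator<AbstractMap.SimpleEntry<String, List<String>>> getIPURIIterator() {
        List<AbstractMap.SimpleEntry<String, List<String>>> entries = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : queue.entrySet()) {
            List<String> copy = new ArrayList<>(e.getValue());
            entries.add(new AbstractMap.SimpleEntry<>(e.getKey(), copy));
        }
        return entries.iterator();
    }
}
```

Because the keys are plain strings, their order is lexicographic rather than numeric; a real implementation would compare `InetAddress` values (and URI types) instead.
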

| Modifier and Type | Field and Description |
|---|---|
| `List<CrawleableUri>` | `UriSet.uris` |
| `List<CrawleableUri>` | `CrawlingResult.uris` |

| Constructor and Description |
|---|
| `CrawlingResult(List<CrawleableUri> uris)` |
| `CrawlingResult(List<CrawleableUri> uris, String idOfWorker)` |
| `UriSet(List<CrawleableUri> uris)` |

| Modifier and Type | Method and Description |
|---|---|
| `default void` | `UnstructuredDataSink.addData(CrawleableUri uri, byte[] data)`<br>Stores the given data for the given URI. |
| `void` | `UnstructuredDataSink.addData(CrawleableUri uri, InputStream stream)`<br>Stores the data from the given stream for the given URI. |
| `default void` | `UnstructuredDataSink.addData(CrawleableUri uri, String data)`<br>Stores the given data for the given URI. |
| `void` | `SinkBase.closeSinkForUri(CrawleableUri uri)`<br>Closes the resources necessary for storing the data of the given URI. |
| `void` | `SinkBase.openSinkForUri(CrawleableUri uri)`<br>Opens the sink to process data for the given URI. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `InMemorySink.addData(CrawleableUri uri, byte[] data)` |
| `void` | `InMemorySink.addData(CrawleableUri uri, InputStream stream)` |
| `void` | `InMemorySink.addTriple(CrawleableUri uri, org.apache.jena.graph.Triple triple)` |
| `void` | `InMemorySink.closeSinkForUri(CrawleableUri uri)` |
| `void` | `InMemorySink.openSinkForUri(CrawleableUri uri)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `TripleBasedSink.addTriple(CrawleableUri uri, org.apache.jena.graph.Triple triple)`<br>Adds a triple for the given URI. |
| `List<org.apache.jena.graph.Triple>` | `AdvancedTripleBasedSink.getTriplesForGraph(CrawleableUri uri)`<br>Gets all `Triple`s for the given URI. |
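
The sink entries suggest an open/add/close lifecycle per URI, in which only the `InputStream` variant of `UnstructuredDataSink.addData` is abstract while the `byte[]` and `String` variants are `default` methods. A sketch of that layering follows. `DataSinkSketch` and `InMemorySinkSketch` are hypothetical, `String` stands in for `CrawleableUri`, triple handling is omitted to avoid a Jena dependency, and the way the defaults delegate (UTF-8 encoding, wrapping in a `ByteArrayInputStream`) is an assumption. The sketch needs Java 9 or later for `InputStream.transferTo`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sink interface: only the stream variant must be implemented. */
interface DataSinkSketch {
    void openSinkForUri(String uri);

    void closeSinkForUri(String uri);

    void addData(String uri, InputStream stream);

    // Assumed default: wrap the bytes in a stream and delegate
    default void addData(String uri, byte[] data) {
        addData(uri, new ByteArrayInputStream(data));
    }

    // Assumed default: encode the string as UTF-8 and delegate
    default void addData(String uri, String data) {
        addData(uri, data.getBytes(StandardCharsets.UTF_8));
    }
}

/** Hypothetical in-memory implementation that buffers data per open URI. */
class InMemorySinkSketch implements DataSinkSketch {
    private final Map<String, ByteArrayOutputStream> open = new HashMap<>();
    final Map<String, byte[]> closed = new HashMap<>();

    @Override
    public void openSinkForUri(String uri) {
        open.put(uri, new ByteArrayOutputStream());
    }

    @Override
    public void addData(String uri, InputStream stream) {
        ByteArrayOutputStream buffer = open.get(uri);
        if (buffer == null) {
            throw new IllegalStateException("Sink not opened for " + uri);
        }
        try {
            stream.transferTo(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Moves the buffered bytes of a URI into the "closed" map
    @Override
    public void closeSinkForUri(String uri) {
        ByteArrayOutputStream buffer = open.remove(uri);
        if (buffer != null) {
            closed.put(uri, buffer.toByteArray());
        }
    }
}
```
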

| Modifier and Type | Method and Description |
|---|---|
| `CrawleableUri` | `UriProcessorInterface.recognizeInetAddress(CrawleableUri uri)`<br>Recognizes the IP address of the given `CrawleableUri`. |
| `CrawleableUri` | `UriProcessor.recognizeInetAddress(CrawleableUri uri)` |
| `CrawleableUri` | `UriProcessorInterface.recognizeUriType(CrawleableUri uri)`<br>Recognizes the type of the given `CrawleableUri`. |
| `CrawleableUri` | `UriProcessor.recognizeUriType(CrawleableUri uri)` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawleableUri` | `UriProcessorInterface.recognizeInetAddress(CrawleableUri uri)`<br>Recognizes the IP address of the given `CrawleableUri`. |
| `CrawleableUri` | `UriProcessor.recognizeInetAddress(CrawleableUri uri)` |
| `CrawleableUri` | `UriProcessorInterface.recognizeUriType(CrawleableUri uri)`<br>Recognizes the type of the given `CrawleableUri`. |
| `CrawleableUri` | `UriProcessor.recognizeUriType(CrawleableUri uri)` |
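
`recognizeInetAddress` is documented only as recognizing the IP address of a URI. One plausible approach, shown below as a hypothetical sketch, resolves the URI's host with `java.net.InetAddress.getByName`. Unlike the documented method, which returns a `CrawleableUri`, this sketch returns the `InetAddress` directly and uses `null` when the URI has no host or the host cannot be resolved.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Hypothetical sketch of IP recognition (java.net.URI stands in for CrawleableUri). */
class InetAddressRecognizerSketch {
    /** Resolves the URI's host to an IP address, or returns null if that is not possible. */
    static InetAddress recognizeInetAddress(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            return null; // e.g. URNs have no host component
        }
        try {
            // For an IP literal no DNS lookup is performed; host names are resolved
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return null;
        }
    }
}
```
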

| Modifier and Type | Field and Description |
|---|---|
| `private List<CrawleableUri>` | `WorkerInfo.urisCrawling`<br>List of all URIs that the worker is currently crawling. |

| Modifier and Type | Method and Description |
|---|---|
| `List<CrawleableUri>` | `WorkerInfo.getUrisCrawling()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Worker.performCrawling(CrawleableUri uri)`<br>Crawls the given URI and adds the new URIs found while crawling to the list of new URIs. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Worker.crawl(List<CrawleableUri> uris)`<br>Crawls the given URIs and sends the URIs found while crawling to the frontier. |

| Constructor and Description |
|---|
| `WorkerInfo(boolean workerSendsAliveMessages, List<CrawleableUri> urisCrawling, Date dateLastAlive)` |

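
`Worker.crawl` is documented as crawling a list of URIs and sending the URIs found to the frontier. The sketch below shows that control flow with hypothetical functional interfaces standing in for fetching, link extraction, and the frontier connection. Reporting the chunk as done after forwarding the new URIs mirrors `Frontier.crawlingDone`, but the ordering is an assumption. Unlike the documented `performCrawling(CrawleableUri)`, the hypothetical `WorkerSketch.performCrawling` receives the list of new URIs explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical worker loop. The link extractor and the frontier callbacks are
 * plain functional interfaces standing in for fetchers, analyzers and a
 * Frontier client.
 */
class WorkerSketch {
    private final Function<String, List<String>> extractLinks;
    private final Consumer<List<String>> sendNewUrisToFrontier;
    private final Consumer<List<String>> reportCrawlingDone;

    WorkerSketch(Function<String, List<String>> extractLinks,
                 Consumer<List<String>> sendNewUrisToFrontier,
                 Consumer<List<String>> reportCrawlingDone) {
        this.extractLinks = extractLinks;
        this.sendNewUrisToFrontier = sendNewUrisToFrontier;
        this.reportCrawlingDone = reportCrawlingDone;
    }

    /** Crawls one URI and appends the URIs found on it to newUris. */
    void performCrawling(String uri, List<String> newUris) {
        newUris.addAll(extractLinks.apply(uri));
    }

    /** Crawls all given URIs, forwards the new URIs, then reports the chunk as done. */
    void crawl(List<String> uris) {
        List<String> newUris = new ArrayList<>();
        for (String uri : uris) {
            performCrawling(uri, newUris);
        }
        sendNewUrisToFrontier.accept(newUris);
        reportCrawlingDone.accept(uris);
    }
}
```
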
Copyright © 2017–2019. All rights reserved.