edpop_explorer package¶
Module contents¶
- class edpop_explorer.BasePreparedQuery¶
Bases: object
Empty base dataclass for prepared queries. For prepared queries that can be represented by a single string, do not inherit from this class; use a plain string instead.
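As an illustration of the distinction above, a reader whose query needs more than one value could define a prepared-query dataclass along the following lines. This is a minimal sketch: BasePreparedQuery is redefined as a stand-in so that the snippet runs without the package installed, and YearRangeQuery with all of its fields is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasePreparedQuery:
    """Stand-in for edpop_explorer.BasePreparedQuery (an empty base dataclass)."""


@dataclass
class YearRangeQuery(BasePreparedQuery):
    """Hypothetical prepared query that does not fit in a single string."""

    text: str
    year_from: Optional[int] = None
    year_to: Optional[int] = None


# A query that is just text would simply be passed around as a plain str.
query = YearRangeQuery(text="almanak", year_from=1600, year_to=1650)
```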
- class edpop_explorer.BibliographicalRecord(from_reader: Type[Reader])¶
Bases: Record
Python representation of edpoprec:BibliographicalRecord.
This subclass adds fields that are specific for bibliographical records.
- class edpop_explorer.BiographicalRecord(from_reader: Type[Reader])¶
Bases: Record
Python representation of edpoprec:BiographicalRecord.
This subclass adds fields that are specific for biographical records.
- class edpop_explorer.CERLReader¶
Bases: GetByIdBasedOnQueryMixin, Reader
A generic reader class for the CERL databases on the data.cerl.org platform.
This is an abstract class. To use it, derive from this class, set the API_URL, API_BY_ID_BASE_URL and LINK_BASE_URL constant attributes, and implement the _convert_record class method.
- API_BY_ID_BASE_URL: str¶
The base URL of the API for retrieving single records, of the form https://data.cerl.org/<CATALOGUE>/.
- API_URL: str¶
The base URL of the search API, of the form
https://data.cerl.org/<CATALOGUE>/_search.
- DEFAULT_RECORDS_PER_PAGE: int = 10¶
The number of records to fetch at a time using the fetch() method if not specified by the user.
- LINK_BASE_URL: str¶
The base URL for user-friendly representations of single records.
- MAXIMUM_RECORDS_PER_PAGE: int | None = 100¶
Maximum number of records to fetch per page. If not set, there is no predefined limit.
- additional_params: Dict[str, str] | None = None¶
- fetch_range(range_to_fetch: range) range¶
Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.
- Parameters:
range_to_fetch – The range of records to fetch. step values of ranges other than 1 are not supported and may be ignored.
- Returns:
The range of record indexes that has actually been fetched.
- classmethod transform_query(query) str¶
Return a version of the query that is prepared for use in the API.
This method does not have to be called directly; prepare_query() can be used instead.
- class edpop_explorer.DatabaseFileMixin¶
Bases: object
Mixin that adds a method prepare_data to a Reader class, which will make the database file available in the database_path attribute as a pathlib.Path object. If the constant attribute DATABASE_URL is given, the database will be downloaded from that URL if the data is not yet available. The database file is expected to be stored in the application data directory under the filename specified in the constant attribute DATABASE_FILENAME, which has to be specified by the user of this mixin.
- DATABASE_FILENAME: str¶
The filename (not the full path) under which the database is expected to be stored.
- DATABASE_LICENSE: str | None = None¶
A URL that contains the license of the downloaded database file.
- DATABASE_URL: str | None = None¶
The URL to download the database file from. If this attribute is None, automatically downloading the database file is not supported.
- database_path: Path | None = None¶
The path to the database file. Will be set by the prepare_data method.
- prepare_data() None¶
Prepare the database file by confirming that it is available, and if not, by attempting to download it.
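The check-then-download behaviour described for prepare_data can be sketched roughly as follows. This is not the package's implementation: prepare_database, its parameters, the injected download callable, the file name and the choice of FileNotFoundError are all assumptions made so that the example runs without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def prepare_database(
    data_dir: Path,
    filename: str,
    url: Optional[str],
    download: Callable[[str, Path], None],
) -> Path:
    """Return the path to the database file, downloading it first if needed."""
    path = data_dir / filename
    if path.exists():
        return path  # already available, nothing to do
    if url is None:
        # Mirrors DATABASE_URL = None: automatic download is not supported.
        raise FileNotFoundError(f"{path} is missing and no URL is configured")
    data_dir.mkdir(parents=True, exist_ok=True)
    download(url, path)
    return path


def fake_download(url: str, target: Path) -> None:
    """Hypothetical downloader that replaces a real HTTP request."""
    target.write_text(f"downloaded from {url}")


with tempfile.TemporaryDirectory() as tmp:
    database_path = prepare_database(
        Path(tmp), "catalogue.sqlite3", "https://example.org/db", fake_download
    )
    downloaded = database_path.exists()
```

Passing the downloader in as a callable keeps the availability check separate from the transport, which makes the logic easy to exercise in isolation.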
- class edpop_explorer.DigitizationField(original_text: str)¶
Bases: Field
- description: str | None = None¶
- iiif_manifest: str | None = None¶
- preview_url: str | None = None¶
- property summary_text: str | None¶
- url: str | None = None¶
- class edpop_explorer.Field(original_text: str)¶
Bases: object
Python representation of edpoprec:Field.
This base class has two user-defined subfields: original_text (which is required and should be passed to the constructor), and unknown. User-defined subfields are simple object attributes and can be accessed directly. In addition, this base class defines an automatic subfield normalized_text, which is a read-only property that is only available if normalization is supported by the field; this is not the case for this base class. In those cases, it is still possible to set this field using the set_normalized_text method. Except for original_text, all subfields are optional and are None by default. Use to_graph() to obtain an RDF graph. The subject node is by default a blank node, but this may be overridden by setting the subject_node attribute.
Subclasses should override the _rdf_class attribute to the corresponding RDF class. Subclasses can define additional subfields by adding additional public attributes and by registering them in the SUBFIELDS constant attribute. To register them, a constructor __init__ should be defined that first calls the parent’s constructor and then adds the subfields one by one using self.SUBFIELDS.append(('<attribute-name>', EDPOPREC.<rdf-property-name>, '<datatype>')), where <datatype> is any of the datatypes defined in the DATATYPES constant of this module. Subclasses may furthermore define the _normalized_text private method.
- authority_record: str | None = None¶
Subfield – may contain the URI of an authority record
- normalize() NormalizationResult¶
Perform normalization on this field, based on the normalizer attribute. Subclasses of Field may predefine a normalizer function, but this can always be overridden.
- normalizer: Callable | None = None¶
- original_text: str¶
Subfield – text of this field according to the original record.
- subject_node: Node¶
This field’s subject node if converted to RDF. This is a blank node by default.
- property summary_text: str | None¶
- to_graph() Graph¶
Create an rdflib RDF graph according to the current data.
- unknown: bool | None = None¶
Subfield – indicates whether the value of this field is explicitly marked as unknown in the original record.
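The subfield registration pattern described for Field subclasses might look roughly like this. The Field class below is a reduced stand-in, not the real one; the property names, the datatype strings and the ShelfmarkField subclass are placeholders, and keeping SUBFIELDS as a per-instance list is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# (attribute name, RDF property, datatype); plain strings replace rdflib terms here
Subfield = Tuple[str, str, str]


class Field:
    """Stand-in for edpop_explorer.Field, reduced to subfield registration."""

    def __init__(self, original_text: str) -> None:
        self.original_text = original_text
        self.unknown: Optional[bool] = None
        # Per-instance list, so appending in a subclass does not leak (assumption).
        self.SUBFIELDS: List[Subfield] = [
            ("original_text", "EDPOPREC.originalText", "string"),
            ("unknown", "EDPOPREC.unknown", "boolean"),
        ]


class ShelfmarkField(Field):
    """Hypothetical subclass that adds one subfield."""

    holding_institution: Optional[str] = None

    def __init__(self, original_text: str) -> None:
        super().__init__(original_text)  # call the parent's constructor first
        self.SUBFIELDS.append(
            ("holding_institution", "EDPOPREC.holdingInstitution", "string")
        )


field = ShelfmarkField("KW 123 A 4")
field.holding_institution = "Example Library"
registered = [name for name, _, _ in field.SUBFIELDS]
```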
- exception edpop_explorer.FieldError¶
Bases: Exception
- class edpop_explorer.GetByIdBasedOnQueryMixin¶
Bases: ABC
Mixin for readers that are based on an API that has no special way of retrieving single records; instead, these readers fetch single records using a list query. To use, make sure to override the _prepare_get_by_id_query method, which defines the list query that should be used.
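The idea of retrieving a single record through a list query can be sketched as below. Only the _prepare_get_by_id_query name comes from the description above; the in-memory CATALOGUE, the search function, the class names and the use of LookupError are hypothetical stand-ins for a real API and for the package's own classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

CATALOGUE: List[Dict[str, str]] = [
    {"id": "a1", "title": "First record"},
    {"id": "b2", "title": "Second record"},
]


def search(query: str) -> List[Dict[str, str]]:
    """Hypothetical list-query API that understands 'key=value' queries only."""
    key, _, value = query.partition("=")
    return [rec for rec in CATALOGUE if rec.get(key) == value]


class GetByIdViaQuery(ABC):
    """Stand-in for the mixin: get_by_id is built on top of a list query."""

    @classmethod
    @abstractmethod
    def _prepare_get_by_id_query(cls, identifier: str) -> str:
        """Return the list query that selects the record with this identifier."""

    @classmethod
    def get_by_id(cls, identifier: str) -> Dict[str, str]:
        results = search(cls._prepare_get_by_id_query(identifier))
        for record in results:
            if record["id"] == identifier:
                return record
        raise LookupError(f"No record with identifier {identifier!r}")


class DemoReader(GetByIdViaQuery):
    @classmethod
    def _prepare_get_by_id_query(cls, identifier: str) -> str:
        return f"id={identifier}"


record = DemoReader.get_by_id("b2")
```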
- class edpop_explorer.LazyRecordMixin¶
Bases: ABC
Abstract mixin that adds an interface for lazy loading to a Record.
To use, implement the fetch() method and make sure that it fills the record’s data attributes and its Fields and that the fetched attribute is set to True.
- abstractmethod fetch() None¶
- fetched: bool = False¶
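A minimal sketch of the lazy-loading contract described above, assuming a simplified record that stores its data in a dict. LazyRecord and DemoRecord are stand-ins written for this example; the real mixin is combined with a Record subclass and may differ in detail.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LazyRecord(ABC):
    """Stand-in for a record class that uses LazyRecordMixin."""

    fetched: bool = False
    data: Optional[Dict[str, str]] = None

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier

    @abstractmethod
    def fetch(self) -> None:
        """Load the full record and set ``fetched`` to True."""


class DemoRecord(LazyRecord):
    def fetch(self) -> None:
        if self.fetched:
            return  # already loaded, do not load twice
        # A real implementation would query the external source here.
        self.data = {"id": self.identifier, "title": "Loaded on demand"}
        self.fetched = True


record = DemoRecord("x42")
fetched_before = record.fetched
record.fetch()
```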
- class edpop_explorer.LocationField(original_text: str)¶
Bases: Field
- COUNTRY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/country')¶
- LOCALITY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/locality')¶
- location_type: URIRef | None = None¶
- class edpop_explorer.Marc21BibliographicalRecord(from_reader: Type[Reader])¶
Bases: Marc21DataMixin, BibliographicalRecord
A combination of BibliographicalRecord and Marc21DataMixin.
- class edpop_explorer.Marc21Data(fields: List[Marc21Field] = <factory>, controlfields: Dict[str, str] = <factory>, picafields: List[Marc21Field] = <factory>, raw: dict = <factory>)¶
Bases: RawData
Python representation of the data inside a Marc21 record.
- controlfields: Dict[str, str]¶
- fields: List[Marc21Field]¶
- get_all_subfields(fieldnumber: str, subfield: str, picaxml=False) List[str]¶
Return a list of subfields that matches the requested field number and subfield. May return an empty list if the field and subfield do not occur.
- get_fields(fieldnumber: str, picaxml=False) List[Marc21Field]¶
Return a list of fields with a given field number. May return an empty list if field does not occur.
- get_first_field(fieldnumber: str, picaxml=False) Marc21Field | None¶
Return the first occurrence of a field with a given field number. May be useful for fields that appear only once, such as 245. Return None if the field is not found.
- get_first_subfield(fieldnumber: str, subfield: str | tuple[str], picaxml=False) str | None¶
Return the requested subfield of the first occurrence of a field with the given field number. Return None if the field is not found or if the subfield is not present on the first occurrence of the field. subfield may be a tuple; in that case, a concatenation of all given subfields is returned.
- picafields: List[Marc21Field]¶
- raw: dict¶
- to_dict() dict¶
Give a dict representation of the raw data.
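The lookup methods of Marc21Data can be illustrated with simplified stand-in dataclasses. The sketch below mirrors the documented names but is not the package's code: it leaves out the picaxml parameter, the control fields and the tuple form of subfield, and the sample field contents are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Marc21Field:
    """Stand-in mirroring the documented Marc21Field attributes."""

    fieldnumber: str
    indicator1: Optional[str] = None
    indicator2: Optional[str] = None
    subfields: Dict[str, str] = field(default_factory=dict)


@dataclass
class Marc21Data:
    """Stand-in with simplified versions of the documented lookup methods."""

    fields: List[Marc21Field] = field(default_factory=list)

    def get_fields(self, fieldnumber: str) -> List[Marc21Field]:
        return [f for f in self.fields if f.fieldnumber == fieldnumber]

    def get_first_field(self, fieldnumber: str) -> Optional[Marc21Field]:
        matches = self.get_fields(fieldnumber)
        return matches[0] if matches else None

    def get_first_subfield(self, fieldnumber: str, subfield: str) -> Optional[str]:
        first = self.get_first_field(fieldnumber)
        return first.subfields.get(subfield) if first else None

    def get_all_subfields(self, fieldnumber: str, subfield: str) -> List[str]:
        return [
            f.subfields[subfield]
            for f in self.get_fields(fieldnumber)
            if subfield in f.subfields
        ]


data = Marc21Data(fields=[
    Marc21Field("245", subfields={"a": "Example title"}),
    Marc21Field("700", subfields={"a": "First contributor"}),
    Marc21Field("700", subfields={"a": "Second contributor"}),
])
title = data.get_first_subfield("245", "a")
contributors = data.get_all_subfields("700", "a")
missing = data.get_first_field("999")
```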
- class edpop_explorer.Marc21DataMixin¶
Bases: object
A mixin that adds a data attribute to a Record class to contain an instance of Marc21Data.
- data: Marc21Data | None = None¶
- show_record() str¶
- class edpop_explorer.Marc21Field(fieldnumber: str, indicator1: str | None = None, indicator2: str | None = None, subfields: Dict[str, str] = <factory>, description: str | None = None)¶
Bases: object
Python representation of a single field in a Marc21 record.
- description: str | None = None¶
- fieldnumber: str¶
- indicator1: str | None = None¶
- indicator2: str | None = None¶
- subfields: Dict[str, str]¶
- exception edpop_explorer.NotFoundError¶
Bases: ReaderError
- class edpop_explorer.RawData¶
Bases: ABC
Base class to store raw original data of a record. Only defines an abstract method to_dict.
- abstractmethod to_dict() dict¶
Give a dict representation of the raw data.
- class edpop_explorer.Reader¶
Bases: ABC
Base reader class (abstract).
This abstract base class provides a common interface for all readers. To use, instantiate a subclass, set a query using the prepare_query() or set_query() method, call fetch() and subsequently fetch_next() until you have the number of results that you want. The attributes number_of_results, number_fetched and records will be updated after fetching.
To create a concrete reader, make a subclass that implements the fetch_range() and transform_query() methods and set the READERTYPE and CATALOG_URIREF attributes. fetch_range() should populate the records, number_of_results, number_fetched and range_fetched attributes.
- ALLOW_EMPTY_QUERY = False¶
If True, it is possible to enter an empty query to fetch all records.
- CATALOG_URIREF: URIRef | None = None¶
- DEFAULT_RECORDS_PER_PAGE: int = 10¶
The number of records to fetch at a time using the fetch() method if not specified by the user.
- DESCRIPTION: str | None = None¶
Information about the contents of the corresponding catalogue, to be used in user interfaces.
- FETCH_ALL_AT_ONCE = False¶
True if the reader is configured to fetch all records at once, even if the user only needs a subset.
- IRI_PREFIX: str | None = None¶
The prefix to use to create an IRI out of a record identifier. If an IRI cannot be created with a simple prefix, the identifier_to_iri and iri_to_identifier methods have to be overridden.
- MAXIMUM_RECORDS_PER_PAGE: int | None = None¶
Maximum number of records to fetch per page. If not set, there is no predefined limit.
- READERTYPE: str | None = None¶
The type of the reader: either BIOGRAPHICAL or BIBLIOGRAPHICAL (both defined in the edpop_explorer package).
- SHORT_NAME: str | None = None¶
Short name of the corresponding catalogue, to be used in user interfaces.
- adjust_start_record(start_number: int) None¶
Skip the given number of first records and start fetching afterwards.
This functionality may be ignored by readers that can only load all records at once; generally these are readers that return lazy records.
- classmethod catalog_to_graph() Graph¶
Create an RDF representation of the catalog that this reader supports as an instance of EDPOPREC:Catalog.
- fetch(number: int | None = None) range¶
Perform an initial or subsequent query. Most readers fetch a limited number of records at once; this number depends on the reader but may be adjusted using the number parameter. Other readers fetch all records at once and ignore the number parameter. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. Returns the range of record indexes that has been fetched.
- abstractmethod fetch_range(range_to_fetch: range) range¶
Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.
- Parameters:
range_to_fetch – The range of records to fetch. step values of ranges other than 1 are not supported and may be ignored.
- Returns:
The range of record indexes that has actually been fetched.
- property fetching_exhausted: bool¶
Return True if all results have been fetched.
- property fetching_started: bool¶
True if fetching has started, otherwise False. Once fetching has started, the query can no longer be changed.
- generate_identifier() str¶
Generate an identifier for this reader that is unique for the combination of reader type and prepared query. This identifier can be used when the reader has to be reused across sessions by pickling and unpickling.
Note: while the identifier is guaranteed to be unique, there is no guarantee that the generated identifier is the same for every combination of reader type and prepared query.
- get(index: int, allow_fetching: bool = True) Record¶
Get a record with a specific index. If the record is not yet available, fetch additional records to make it available.
- Parameters:
index – The number of the record to get.
allow_fetching – Allow fetching the record from an external source if it was not yet fetched.
- abstractmethod classmethod get_by_id(identifier: str) Record¶
Get a single record by its identifier.
- classmethod get_catalog_slug() str | None¶
- classmethod identifier_to_iri(identifier: str) str¶
- classmethod iri_to_identifier(iri: str) str¶
- property number_fetched: int¶
The number of results that has been fetched so far, or 0 if no fetch has been performed yet.
- number_of_results: int | None = None¶
The total number of results for the query, or None if fetching has not yet started and the number is not yet known.
- prepare_query(query: str) None¶
Prepare a query for use by the reader’s API. Updates the prepared_query attribute.
- prepared_query: str | BasePreparedQuery | None = None¶
A transformed version of the query, available after calling prepare_query() or set_query().
- records: Dict[int, Record]¶
The records that have been fetched as instances of (a subclass of) Record.
- set_query(query: str | BasePreparedQuery) None¶
Set an exact query. Updates the prepared_query attribute.
- abstractmethod classmethod transform_query(query: str) str | BasePreparedQuery¶
Return a version of the query that is prepared for use in the API.
This method does not have to be called directly; prepare_query() can be used instead.
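The usage pattern described for Reader (prepare a query, then fetch page by page until everything has been retrieved) can be sketched with an in-memory stand-in. DemoReader follows the documented attribute and method names but does not derive from the real Reader; the SOURCE list and the substring matching are hypothetical.

```python
from typing import Dict, List, Optional

SOURCE: List[str] = [f"record {i}" for i in range(25)]  # hypothetical catalogue


class DemoReader:
    """Stand-in that follows the documented Reader interface in memory."""

    DEFAULT_RECORDS_PER_PAGE = 10

    def __init__(self) -> None:
        self.prepared_query: Optional[str] = None
        self.records: Dict[int, str] = {}
        self.number_of_results: Optional[int] = None

    @classmethod
    def transform_query(cls, query: str) -> str:
        return query.strip().lower()

    def prepare_query(self, query: str) -> None:
        self.prepared_query = self.transform_query(query)

    @property
    def number_fetched(self) -> int:
        return len(self.records)

    @property
    def fetching_exhausted(self) -> bool:
        if self.number_of_results is None:
            return False  # nothing fetched yet
        return self.number_fetched >= self.number_of_results

    def fetch_range(self, range_to_fetch: range) -> range:
        matches = [r for r in SOURCE if self.prepared_query in r]
        self.number_of_results = len(matches)
        # Only the part of the requested range that exists is fetched.
        stop = min(range_to_fetch.stop, len(matches))
        actual = range(range_to_fetch.start, max(range_to_fetch.start, stop))
        for index in actual:
            self.records[index] = matches[index]
        return actual

    def fetch(self, number: Optional[int] = None) -> range:
        count = number or self.DEFAULT_RECORDS_PER_PAGE
        start = self.number_fetched
        return self.fetch_range(range(start, start + count))


reader = DemoReader()
reader.prepare_query("  RECORD ")
fetched_ranges = []
while not reader.fetching_exhausted:
    fetched_ranges.append(reader.fetch())
```

With 25 matching records and a page size of 10, the last call returns a shorter range than requested, matching the documented behaviour of fetch_range() when part of the range does not exist.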
- exception edpop_explorer.ReaderError¶
Bases: Exception
Generic exception for failures in the Reader class. More specific errors derive from this class.
- class edpop_explorer.Record(from_reader: Type[Reader])¶
Bases: object
Python representation of edpoprec:Record.
This base class provides some basic attributes, an infrastructure to define fields and a method to convert the record to RDF. While this is a non-abstract base class, no fields are defined here; these should be added in subclasses.
Record and its subclasses should be created by calling the constructor with the Reader class as the from_reader parameter and by setting the data, link, identifier and subject_node attributes (all are optional but recommended), as well as the fields that are defined by the subclass. Fields are set using the attribute with the same name: set them to an instance of Field or to None. The basic attributes and the fields are None by default.
Subclasses should override the _rdf_class attribute to the corresponding RDF class. They should define additional fields by adding additional public attributes defaulting to None and by registering them in the _fields attribute. To register them, a constructor __init__ should be defined that first calls the parent’s constructor and then adds the fields by adding tuples to _fields in the form ('<attribute-name>', EDPOPREC.<rdf-property-name>, <Field class name>).
- fetch() None¶
Fetch the full contents of the record if this record works with lazy loading (i.e., if the record’s class derives from LazyRecordMixin). If the record is not lazy, this method does nothing.
- from_reader: Type[Reader]¶
The Reader class that this record originates from, as passed to the constructor.
- get_data_dict() dict | None¶
Convenience function to get the record’s raw data as a dict, or None if it is not available.
- identifier: str | None = None¶
Unique identifier used by the source catalog.
- property iri: str | None¶
A stable IRI based on the identifier attribute. None if the identifier attribute is not set.
- link: str | None = None¶
A user-friendly link where the user can find the record.
- property subject_node: Node¶
A subject node based on the identifier attribute. If the identifier attribute is not set, a blank node.
- to_graph() Graph¶
Return an RDF graph for this record.
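The way a Record subclass registers its fields, as described in the class documentation above, might be sketched as follows. Field, Reader and Record are reduced stand-ins so that the snippet runs on its own; PamphletRecord, the property name string and the per-instance _fields list are assumptions of this sketch rather than the package's actual code.

```python
from typing import List, Optional, Tuple, Type


class Field:
    """Stand-in for edpop_explorer.Field."""

    def __init__(self, original_text: str) -> None:
        self.original_text = original_text


class Reader:
    """Stand-in for edpop_explorer.Reader."""


class Record:
    """Stand-in for edpop_explorer.Record, reduced to field registration."""

    def __init__(self, from_reader: Type[Reader]) -> None:
        self.from_reader = from_reader
        self.identifier: Optional[str] = None
        self.link: Optional[str] = None
        # Tuples of (attribute name, RDF property, Field class).
        self._fields: List[Tuple[str, str, Type[Field]]] = []


class PamphletRecord(Record):
    """Hypothetical subclass that registers a single field."""

    title: Optional[Field] = None

    def __init__(self, from_reader: Type[Reader]) -> None:
        super().__init__(from_reader)  # call the parent's constructor first
        self._fields.append(("title", "EDPOPREC.title", Field))


record = PamphletRecord(from_reader=Reader)
record.identifier = "demo-1"
record.title = Field("Een nieuw liedeken")
field_names = [name for name, _, _ in record._fields]
```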
- exception edpop_explorer.RecordError¶
Bases: Exception
- class edpop_explorer.SRUMarc21BibliographicalReader¶
Bases: SRUMarc21Reader, Marc21BibliographicalReaderMixin, ABC
Subclass of SRUMarc21Reader that adds functionality to create instances of BibliographicalRecord.
This subclass assumes that the Marc21 data follows the standard Marc21 format for bibliographical data. See: https://www.loc.gov/marc/bibliographic/
- READERTYPE: str | None = 'bibliographical'¶
The type of the reader: either BIOGRAPHICAL or BIBLIOGRAPHICAL (both defined in the edpop_explorer package).
- records: List[Marc21BibliographicalRecord]¶
The records that have been fetched as instances of (a subclass of) Record.
- class edpop_explorer.SRUMarc21Reader¶
Bases: SRUReader
Subclass of SRUReader that adds Marc21 functionality.
This class is still abstract; to create concrete readers, the _get_link(), _get_identifier() and _convert_record() methods should be implemented.
- abstractmethod classmethod _convert_record(sruthirecord: dict) Record¶
Convert the output of sruthi into an instance of (a subclass of) Record.
- abstractmethod classmethod _get_link(data: Marc21Data) str | None¶
Get a public URL according to the Marc21 data or None if it is not available.
- abstractmethod classmethod _get_identifier(data: Marc21Data) str | None¶
Get the unique identifier from the Marc21 data or None if it is not available.
- marcxchange_prefix: str = ''¶
- picaxml_prefix: str = 'info:srw/schema/5/picaXML-v1.0:'¶
- class edpop_explorer.SRUReader¶
Bases: GetByIdBasedOnQueryMixin, Reader
Subclass of Reader that adds basic SRU functionality using the sruthi library.
This class is still abstract and subclasses should implement the transform_query() and _convert_record() methods and set the attributes sru_url and sru_version.
The _prepare_get_by_id_query() method by default returns the transformed version of the identifier as a query, which normally works, but this may be optimised by overriding it.
- abstractmethod classmethod _convert_record(sruthirecord: dict) Record¶
Convert the output of sruthi into an instance of (a subclass of) Record.
- fetch_range(range_to_fetch: range) range¶
Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.
- Parameters:
range_to_fetch – The range of records to fetch. step values of ranges other than 1 are not supported and may be ignored.
- Returns:
The range of record indexes that has actually been fetched.
- prepare_query(query) None¶
Prepare a query for use by the reader’s API. Updates the prepared_query attribute.
- query: str | None = None¶
- session: Session¶
The Session object of the requests library.
- sru_additional_schema: str | None = None¶
Additional SRU schema for which an additional request is made. The results are merged into the SRU results. If None (default), do not make an additional request.
- sru_schema: str | None = None¶
The requested SRU schema. If None (default), use the default schema of the SRU provider.
- sru_url: str¶
URL of the SRU API.
- sru_version: str¶
Version of the SRU protocol. Can be ‘1.1’ or ‘1.2’.
- abstractmethod classmethod transform_query(query: str) str¶
Return a version of the query that is prepared for use in the API.
This method does not have to be called directly; prepare_query() can be used instead.
- edpop_explorer.bind_common_namespaces(graph: Graph) None¶
Bind the RDF namespaces that are in use across this package to the specified graph.
These are: RDF, RDFS, EDPOPREC.