The CARMEN data sharing portal project: what have we learned?

L.S. Smith 1, J. Austin 2, S.J. Eglen 3, T. Jackson 2, M. Jessop 2, B. Liang 2, M. Weeks 2, E. Sernagor 4

1 Computer Science and Mathematics, Stirling University; 2 Computer Science, University of York; 3 DAMTP, Centre for Mathematical Sciences, University of Cambridge; 4 Institute of Neuroscience, Newcastle University: UK.

Summary

The UK-based CARMEN project is a portal-based collaborative facility for neuroscientists (and in particular electrophysiologists) to share data and tools for working on data. It started in 2006, and has been providing a gradually improving service for about six years. Given the experience that we have gained from running this service, what have we learned? What would we do differently if we were to start again? Is there still interest in this type of capability, or has the world moved onwards? Neuroinformaticians are already convinced of the importance of data sharing, so it is important to find out what is impeding uptake. We recently put out a questionnaire to all registered users of CARMEN, and we now have some feedback from registered users, and (perhaps equally importantly) from people who registered and did not end up using the system. Here we attempt to understand the issues raised.

CARMEN is much more than a repository. Data and metadata can be uploaded to (and downloaded from) the repository, and shared with other neuroscientists. Services process the uploaded datasets to produce new data, and workflows composed from these services can also operate on datasets. Extensive metadata can be attached to the data, and a search facility allows data and services to be located in the repository. Fine-grained security of access is available. The system is currently targeted towards electrophysiology data, predominantly MEA and EEG data. For details see http://www.carmen.org.uk.
NDF (Neural Data Format), a novel open data format, provides a single unified internal data format. By using services to translate to this format, services and workflows can cope with the many different formats generated by different recording platforms. Details may be found at http://www.carmen.org.uk/standards/CarmenDataSpecs.pdf. HDF5 is a data model, library and file format for storing and managing data. NDF is not based on HDF5: the HDF5 working group of the INCF Electrophysiology Task Force is developing a new standard for electrophysiological signals based on HDF5, but this is not yet complete. An HDF5 format was used in (Eglen et al 2014), and a service to convert this to NDF was written so that the CARMEN facilities could be utilised.

Reference: [Eglen et al. 2014] S. J. Eglen, M. Weeks, M. Jessop, J. Simonotto, T. Jackson, E. Sernagor, A data repository and analysis framework for spontaneous neural activity recordings in the developing retina, GigaScience 3: 3, 2014.

We acknowledge the contribution of the UK Research Councils (EPSRC grant EP/E002331/1 and BBSRC grant BB/I001042/1) towards this work.

Conclusion: CARMEN is free, and is ready to be used for further longitudinal analyses of electrophysiological (and other neural time-series) datasets from related areas (e.g. hippocampus, auditory brainstem, ECoG, slice preparations, etc.). Yet usage has remained low. The data paper (Eglen et al 2014) provided good publicity, and has resulted in new enquiries. Clearly, sharing datasets, metadata and analysis techniques can have real impact, but the experimental neuroscience community has been slow to take up the opportunities. We have tried to identify the issues that are impeding take-up.

Figure: CARMEN workflow editing screen

What was CARMEN used for?

Data Sharing: Most users wanted and used data sharing facilities. Some also used it for data backup. Some felt forced to use it for data sharing.

Data Analysis: Some users used data analysis facilities.
Few used the workflow facilities. Many said they would have used these more, but…it was too difficult, or too complicated, or …

One group (Eglen et al 2014) has produced a major data paper, and this has given the project publicity. It has featured in a Nature podcast, and been accessed over 9000 times since it was published on 26 March 2014. This shows that sharing datasets, metadata and analysis techniques can have real impact.

What stood in the way of more usage?

Slow upload speeds: Unfortunately, this is almost certainly a problem at the lab end, as the data upload speeds at the server are high.

Difficulty in using facilities: Some users found service deployment and service usage difficult. Workflows seem to have proven too hard to use for nearly all users.

General difficulty with the user interface: Some users simply found the user interface too difficult or perhaps too off-putting. Some commented that being essentially forced to serialise their work was off-putting (because it made running services etc. too slow). The Java-based system has meant that users are continually encountering security notices.

Lack of integration with their existing data storage and analysis techniques: Users would like the portal to integrate with their existing workflow, so that use of CARMEN is more seamless.

How should Data Sharing projects like CARMEN react?

Ease of use. Systems need to be easy to use: that's partly a matter of a good user interface on the portal (if portal based), but that needs both careful design and integration with a range of browsers. Further, the methods for adding dynamic functionality change (specifically, from Java to JavaScript).

Ease of use includes integration with how users normally work. But that's very difficult, if only because there is a large range of existing software out there, so (e.g.) integrating the system with MATLAB would suit some users, whereas others would like it integrated with the software from their equipment manufacturer. Difficult.
Provide some really simple services to get users started. For example, simple visualisation services allow users to inspect datasets before attempting any more sophisticated processing. This both increases users' confidence in using the system, and spares them from spending time analysing poor-quality datasets.

Aim to be very useful for some classes of users: Trying to be useful to a large class of users means that a range of services (etc.) needs to be maintained. So far CARMEN has been at its most useful when associated with specific classes of project, so that specific services and visualisation tools can be rolled out that support these projects. This type of project can help to bring other classes of project on board: the marginal cost of supporting new specific areas decreases with time.

Figure: Simple data visualisation